Message boards : Number crunching : minirosetta 2.16
Author | Message |
---|---|
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
This reverts minirosetta to 2.14 because of the recent memory problem. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
These two both failed after 11 sec with the new app! Maybe something to do with the changeover? EDIT: I have an oct_ task running O.K. now. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338631824 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74504_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5 ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ============================================================================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338631880 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74521_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5 ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Another one with the same problem on a different rig, 14 sec this time. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338630619 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74213_0 <core_client_version>6.2.14</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5 ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0 |
Same error on WU 338667991 (https://boinc.bakerlab.org/rosetta/result.php?resultid=370646341). My wingman also had the same error. Snippet from log: Incorrect function. (0x1) - exit code 1 (0x1) ... ERROR: bad line in file minirosetta_database\scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5 ERROR:: Exit from: ..\..\src\core\scoring\ScoreFunction.cc line: 204 |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
My apologies for the new errors. I forgot that those jobs are using an option that's associated with 2.15. I just cancelled those jobs. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I got this one this morning; it must have been sent before you stopped them. Same as the others, it ran for 11 sec. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338812571 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_85437_0 Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5 ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Two more with the same old error, downloaded earlier today. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338799412 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_83642_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338804158 mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_84286_0 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The weights error is not an OC error; that is a program error. If you get a Windows debugger dump with a 0xc error (I think that's the one), that could be due to OC or just some problem with the task and your memory. If you get a ton of memory errors all in a row, then your OC speed is too high. On my machine I can push Rosie up to 3.0 GHz (this is as fast as I can go without crashing Google Earth); the official clock speed on my machine is 2.5 GHz. Could those errors be related to OC? I contributed to the project in the past, have recently started contributing again, and have seen no errors on my side. |
mickey Joined: 11 Jan 10 Posts: 2 Credit: 20,113 RAC: 0 |
I have found one little issue in the x86_64 Linux app. When workunits are started for the first time (i.e. from 0.000% progress) they work normally and the graphics are displayed correctly, but sometimes I find the graphics switching continuously from "Stage:" to "App suspended" without showing which stage is running. The % complete value keeps increasing, as do the step counter and the CPU time, but the top graphs ("Searching...", "Accepted", "Low Energy" and "Accepted Energy") are empty. I think this problem appears when the app is stopped (for a system reboot or any other reason), and to get the app running correctly again I have to reset the project, because new workunits also go into this state. Here is the stderr of one corrupted run at 17% done: [2010-10-13 2: 3:34:] :: BOINC:: Initializing ... ok. [2010-10-13 2: 3:34:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev38513.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_mem_widd_run03_centroid_A_2a9h_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 [2010-10-13 9:55: 8:] :: BOINC:: Initializing ... ok.
[2010-10-13 9:55: 8:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev38513.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_mem_widd_run03_centroid_A_2a9h_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage1 ... success! Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage2 ... success! Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_1 ... success! # cpu_run_time_pref: 21600 Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_2 ... success! Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_3 ... success! I don't know whether it is only a graphics issue or a real problem with the app, but I think it's strange behaviour. Edit: my system config is Fedora Core 13 x86_64, with BOINC version 6.10.45 installed from the Fedora repositories; other computer info here |
mickey Joined: 11 Jan 10 Posts: 2 Credit: 20,113 RAC: 0 |
OK guys, calm down. :) I have just checked the graphics of the workunit I am crunching again, and they work now. So I think what I posted before is only a visualization issue and not an application one. |
eruda Joined: 6 Apr 06 Posts: 1 Credit: 6,508,987 RAC: 0 |
It keeps happening that some tasks of the app suddenly seem to get corrupted and the BOINC Manager fails to notice it. I have a task that has been running for over 60 hours and is stuck at 16%, so I have no choice but to terminate it manually. I don't know much about the technical issues; sorry for not being able to provide the error log. |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
NP_961412.1_boinc_boinc_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22319_804_1 The WU ended after only 1201 seconds with one model completed, despite a 43200-second run time preference, and received the outcome "validate error". This is a resend (no reply) of a task originally issued on the 6th, before 2.15 was rescinded, though obviously I crunched it with 2.16. A third copy has now been issued. Snags |
Snowfall Joined: 10 Dec 06 Posts: 2 Credit: 4,832,916 RAC: 99 |
My computer hanged after about 5 and a half hours of computation (with about the same amount left to go), then, after a reset, when I restarted BOINC it suddenly uploaded the workunit back to the server instead of continuing it. I logged in to see why it failed, but it seems that its outcome state is "Success". I don't think this is right. Is this supposed to happen? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=339761570 |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
My computer hanged after about 5 and a half hours of computation (with about the same amount left to go), then, after a reset, when I restarted BOINC it suddenly uploaded the workunit back to the server instead of continuing it. The task log you linked to ends with an error, so no, that isn't the expected behaviour. However, you must have been able to return complete data for at least one model/decoy from the WU in order for the system to recognise that part of the upload as valid. |
Snowfall Joined: 10 Dec 06 Posts: 2 Credit: 4,832,916 RAC: 99 |
Thank you for the quick reply. Yes, I did notice that part of the error log; that's why I was intrigued by the apparently successful outcome state. I had never considered that it could be partly successful, though :). |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Linking to a post reporting that tasks with "lrlx" in the name seem to cause some problems. Rosetta Moderator: Mod.Sense |
Pilgrim57 Joined: 31 Jul 08 Posts: 3 Credit: 1,965,851 RAC: 0 |
For over a week now I have had a lot of work units ending in a computation error just after starting, halfway through, or just before completion; see task IDs 373614961, 373614960, 373614958, 372516143, 372515901, 370066639, 371496237 and 371496257. I have reduced the run time from 6 hours to 4 and now to the default 3. My PC is overclocked, but it seems these errors started with 2.16. Any help appreciated. |
Bikermatt Joined: 12 Feb 10 Posts: 20 Credit: 10,552,445 RAC: 0 |
Has anyone else noticed PCS_ tasks running poorly on Linux? They run longer than the default and produce fewer models than on my similarly equipped Win 7 box. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one failed after 4 sec. PCS_2RN2_v1.frag_1-41_SAVE_ALL_OUT_22378_8_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=341585851 ERROR: ERROR: FragmentIO: could not open file boinc_aafrag_1-41_09_05.200_v1_3.gz ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
PCS_2RN2_atensor.frag_1-100_SAVE_ALL_OUT_22378_16_0 died at 3.5 seconds ERROR: ERROR: FragmentIO: could not open file boinc_aafrag_1-100_09_05.200_v1_3.gz ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 258 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
©2024 University of Washington
https://www.bakerlab.org