Message boards : Number crunching : Rosetta@Home version 3.31
Author | Message |
---|---|
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,283,940 RAC: 1,099 |
No one else has created a thread for 3.31 yet, so I thought I would. I haven't seen any problems with it yet except for one very memory-hungry workunit - over 1 GB. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Guess I'm the first to get an error; this one ran for 3 hrs 15 min on my 4 hr run preference. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=461602719 secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_5651_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> [2012- 5-17 16:32:16:] :: BOINC:: Initializing ... ok. [2012- 5-17 16:32:16:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_secY_hybrid_04a_secY_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ERROR: dis==0 in pairtermderiv! ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 468 called boinc_finish </stderr_txt> ]]> |
E the P Joined: 5 Jun 06 Posts: 36 Credit: 28,333,251 RAC: 0 |
I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this? |
E the P Joined: 5 Jun 06 Posts: 36 Credit: 28,333,251 RAC: 0 |
I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this? I think I've solved my own problem. I increased the memory available to 75% (when the computer is in use). That decreased the thrashing, and my PC is much faster. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,283,940 RAC: 1,099 |
I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this? On the other hand, I've noticed my computer runs much slower when I increase the available memory above about 3.5 GB (the computer has 8 GB). |
Mark Henderson Joined: 24 May 06 Posts: 9 Credit: 643,001 RAC: 0 |
All of the Hybrid WUs I complete are WAY exceeding the time and only giving 20 credits. Just FYI. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Got this error after 18sec. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=460793916 ab_11_29__optpps_T5311_optpps_03_09_35686_276619_1 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3 ERROR: core::util::switch_to_residue_type_set fails ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
A couple of failures on W7, both work units crashing after 10 hours on a 6 hour preference Task 506906466 (secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_60017_0) Watchdog active. # cpu_run_time_pref: 21600 Hbond tripped: [2012- 5-18 19: 5:27:] BOINC:: CPU time: 36058.2s, 14400s + 21600s[2012- 5-19 3:39:15:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 36058.2 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_60017_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> Similar result for task 506906470 |
Robert Gammon Joined: 9 Nov 07 Posts: 14 Credit: 969,848 RAC: 0 |
A couple of failures on W7, both work units crashing after 10 hours on a 6 hour preference I see a similar problem with 3.31 running on Ubuntu Linux 12.04 64 bit. Looks like all WUs must be abandoned, as there is a clear issue with v3.31. My WUs are 4 hour WUs that run in the expected amount of time. All 3.31 WUs have been returned as client error. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I'm sorry that you are so upset about this that you felt it was necessary to post twice about your personal choice on handling the situation. Please don't let credit get in the way of the big picture here. CASP 10 (just like CASP 9, and CASP 8, ...and CASP 7, ... ... and CASP6) will show the significance of what you are contributing to here. The methods developed in BakerLab are consistently proven to be amongst the best and most applicable across the world's entire scientific community. I know the issues behind the credit will be addressed, indeed they always are. It happens and in recognition that it happens, and that your continued crunching is important to helping resolve it, the project awards credits to reported failures with a nightly run that has been in place for many years now. When this script grants credits, you have to display the WU details to see the awarded credit. Rosetta Moderator: Mod.Sense |
Robert Gammon Joined: 9 Nov 07 Posts: 14 Credit: 969,848 RAC: 0 |
I'm sorry that you are so upset about this that you felt it was necessary to post twice about your personal choice on handling the situation. Here is one of my failed work units from Rosetta 3.31 running on Ubuntu 12.04 64 bit under BOINC 7.0.28. Are you saying that this WU was granted credit in spite of the Client Error status??? Task ID 507128901 Name ab_11_29__optpps_T5441_optpps_03_09_35686_298319_0 Workunit 462102711 Created 19 May 2012 13:50:48 UTC Sent 19 May 2012 13:51:46 UTC Received 19 May 2012 22:42:03 UTC Server state Over Outcome Client error Client state New Exit status 0 (0x0) Computer ID 1543142 Report deadline 29 May 2012 13:51:46 UTC CPU time 10030.05 stderr out <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> [2012- 5-19 13:46:14:] :: BOINC:: Initializing ... ok. [2012- 5-19 13:46:14:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _00001 Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 Starting work on structure: _00005 Starting work on structure: _00006 Starting work on structure: _00007 Starting work on structure: _00008 Starting work on structure: _00009 ====================================================== DONE :: 1 starting structures 10029.7 cpu seconds This process generated 9 decoys from 9 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 87.855182957171 Granted credit 87.855182957171 application version --- |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Are you saying that this wu was granted credit in spite of the Client Error status??? Granted credit 87.855182957171 Yes, that's exactly what I'm saying. I'm also saying that your work on that task still delivered some scientifically useful results: This process generated 9 decoys from 9 attempts Rosetta Moderator: Mod.Sense |
Felix Kaeufer Joined: 3 Feb 12 Posts: 2 Credit: 821,233 RAC: 737 |
I've got 9 Rosetta units at the moment, 8 of them running, but they crash after some time and start again from the beginning. My memory limit is set to 75% while the computer is in use and 95% while it is not. Sometimes they restart after seconds, sometimes after a long time. There are no other tasks apart from a GPU-Collatz task, which runs only if the computer has been inactive for 3 minutes. What can I do about this permanent waste of resources? It happened again after 3 minutes and 18 seconds. Maybe I don't have enough memory (4 GB), so I decided to stop all except one. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Felix, you might try clicking the preference to keep suspended tasks in memory (that "memory" is actually VIRTUAL memory). If they are not kept in memory, they lose their start if BOINC decides that task is the best one to suspend to live within the memory preference. Rosetta Moderator: Mod.Sense |
Felix Kaeufer Joined: 3 Feb 12 Posts: 2 Credit: 821,233 RAC: 737 |
Keeping them in memory was ticked, now there are many ibercivis tasks, and the only rosetta unit working, works fine now. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Two different errors today, first one ran for 14sec / The second ran my full time of 4hrs, 2min then got a validate error for some reason. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462925986 ab_11_29__optpps_T5311_optpps_03_09_35686_285348_0 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3 ERROR: core::util::switch_to_residue_type_set fails ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> ================================================================================ https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462885310 2LIS_VR14_Swanson_perturbation_pathdock_SAVE_ALL_OUT_pd_2LIS_VR14_out_393_50340_7_0 Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ====================================================== DONE :: 530 starting structures 14393.3 cpu seconds This process generated 530 decoys from 530 attempts ====================================================== BOINC :: WS_max 8.81443e-280 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,283,940 RAC: 1,099 |
P. P. L., I've seen a few things before that suggest that Rosetta@Home's validator tends to have problems if the decoy count gets past 100. If so, you may need a manual validation for the workunit where the decoy count reached 530; don't expect this to happen very quickly. |
bob Joined: 23 Apr 09 Posts: 2 Credit: 8,854,738 RAC: 0 |
It appears that 3.31 is still having issues. Over the past four days, 14 out of 25 of my 3.31 runs (56 percent) have ended with client errors. I have better odds at a casino. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Got another one of these. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462885410 2LIS_VR14_Swanson_perturbation_pathdock_SAVE_ALL_OUT_pd_2LIS_VR14_out_67_50380_7_1 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 ====================================================== DONE :: 583 starting structures 14399.1 cpu seconds This process generated 583 decoys from 583 attempts ====================================================== BOINC :: WS_max 8.81443e-280 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid |
Charles Tomaras Joined: 18 Aug 09 Posts: 11 Credit: 25,604,406 RAC: 24,744 |
Seems about 80% of my recent work units end up with "compute error." I've stopped running Rosetta because of this new problem which appeared with the latest updates. Keep checking back to see if it's solved but ran some more Rosetta stuff last night and it's still screwed up. My computer is fine and runs SETI on all cores with zero errors or issues. Here's what I'm getting on one of the errant tasks: <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> [2012- 5-29 23: 5:18:] :: BOINC:: Initializing ... ok. [2012- 5-29 23: 5:18:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _00001 Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 Starting work on structure: _00005 Starting work on structure: _00006 Starting work on structure: _00007 </stderr_txt> ]]> |
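
For anyone comparing the stderr excerpts posted above, here is a minimal Python sketch that pulls out the run-time preference, the CPU seconds used, the decoy counts, and the first ERROR message from a pasted log. The function names `summarize_stderr` and `overrun_seconds` are hypothetical, and the regular expressions are modelled only on the log lines quoted in this thread, so other Rosetta versions may need different patterns. This is not an official BOINC or Rosetta tool.

```python
import re

# Patterns modelled on the stderr excerpts quoted in this thread (assumption:
# other application versions may format these lines differently).
PREF_RE = re.compile(r"cpu_run_time_pref:\s*(\d+)")
DONE_RE = re.compile(r"DONE\s*::\s*(\d+) starting structures\s+([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"generated (\d+) decoys from (\d+) attempts")
# Capture the message after "ERROR: " up to the following "ERROR::" line or end of line.
ERROR_RE = re.compile(r"ERROR: (.*?)(?=\s+ERROR::|$)", re.MULTILINE)


def summarize_stderr(text):
    """Extract run-time preference, CPU time, decoy counts and first error message."""
    pref = PREF_RE.search(text)
    done = DONE_RE.search(text)
    decoys = DECOY_RE.search(text)
    error = ERROR_RE.search(text)
    return {
        "pref_seconds": int(pref.group(1)) if pref else None,
        "cpu_seconds": float(done.group(2)) if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "error": error.group(1) if error else None,
    }


def overrun_seconds(summary):
    """CPU seconds used beyond the run-time preference, or None if a value is missing."""
    if summary["pref_seconds"] is None or summary["cpu_seconds"] is None:
        return None
    return summary["cpu_seconds"] - summary["pref_seconds"]


if __name__ == "__main__":
    # Shortened copy of the Windows 7 log posted by svincent above.
    sample = (
        "Watchdog active. # cpu_run_time_pref: 21600 Hbond tripped: "
        "DONE :: 1 starting structures 36058.2 cpu seconds "
        "This process generated 1 decoys from 1 attempts called boinc_finish"
    )
    summary = summarize_stderr(sample)
    print(summary)
    print("overrun (s):", overrun_seconds(summary))
```

Run against svincent's excerpt, the overrun comes out at roughly four hours past the 6 hour preference, which lines up with the "14400s + 21600s" figure printed in that same log. Treat that as an observation about one log, not a statement of how the watchdog is implemented.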
©2024 University of Washington
https://www.bakerlab.org