Rosetta@Home version 3.31

Message boards : Number crunching : Rosetta@Home version 3.31

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,332,891
RAC: 3,380
Message 73088 - Posted: 17 May 2012, 20:44:51 UTC

No one else has created a thread for 3.31 yet, so I thought I would.

I haven't seen any problems with it yet except for one very memory-hungry workunit - over 1 GB.
ID: 73088 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73091 - Posted: 17 May 2012, 22:12:55 UTC
Last modified: 17 May 2012, 22:13:53 UTC

Hi.

Guess I'm the first to get an error. This one ran for 3 hrs 15 min on my 4 hr run preference.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=461602719

secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_5651_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 5-17 16:32:16:] :: BOINC:: Initializing ... ok.
[2012- 5-17 16:32:16:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_secY_hybrid_04a_secY_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 468
called boinc_finish

</stderr_txt>
]]>
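
For anyone collecting these reports, here is a minimal Python sketch that pulls the error message and the source location out of a stderr dump like the one above. The function name `summarize_error` and the regular expressions are my own assumptions based only on the two ERROR lines shown here; other Rosetta errors may be formatted differently.

```python
import re

# Sample text copied from the stderr output above.
STDERR = """\
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 468
called boinc_finish
"""

# "ERROR:: Exit from: <file> line: <number>" marks where the application gave up.
EXIT_RE = re.compile(r"^ERROR:: Exit from: (\S+) line: (\d+)", re.MULTILINE)
# "ERROR: <message>" (single colon, then a space) carries the readable reason.
MSG_RE = re.compile(r"^ERROR: (.+)$", re.MULTILINE)


def summarize_error(stderr_text):
    """Return (message, source_file, line_number); missing parts are None."""
    msg = MSG_RE.search(stderr_text)
    loc = EXIT_RE.search(stderr_text)
    return (
        msg.group(1).strip() if msg else None,
        loc.group(1) if loc else None,
        int(loc.group(2)) if loc else None,
    )


print(summarize_error(STDERR))
```

Grouping reports by the file and line number should make it easier to tell whether two failed workunits hit the same problem.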
ID: 73091 · Rating: 0
E the P
Joined: 5 Jun 06
Posts: 36
Credit: 28,333,251
RAC: 0
Message 73095 - Posted: 18 May 2012, 14:20:37 UTC

I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this?
ID: 73095 · Rating: 0
E the P
Joined: 5 Jun 06
Posts: 36
Credit: 28,333,251
RAC: 0
Message 73098 - Posted: 18 May 2012, 17:13:47 UTC - in response to Message 73095.  

I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this?


I think I've solved my own problem. I increased the available memory to 75% (when the computer is in use). That decreased the thrashing, and my PC is much faster.
ID: 73098 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,332,891
RAC: 3,380
Message 73099 - Posted: 18 May 2012, 17:32:42 UTC - in response to Message 73098.  

I've noticed that my computer functions MUCH slower ever since version 3.31 was rolled out. A lot of disk thrashing. Has anyone else experienced this?


I think I've solved my own problem. I increased the available memory to 75% (when the computer is in use). That decreased the thrashing, and my PC is much faster.


On the other hand, I've noticed my computer runs much slower when I increase the available memory above about 3.5 GB (the computer has 8 GB).
ID: 73099 · Rating: 0
Mark Henderson
Joined: 24 May 06
Posts: 9
Credit: 643,001
RAC: 0
Message 73102 - Posted: 18 May 2012, 19:27:13 UTC

All of the Hybrid WUs I complete are WAY exceeding the time and only giving 20 credits. Just FYI.
ID: 73102 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73106 - Posted: 18 May 2012, 21:44:16 UTC

Hi.

Got this error after 18 sec.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=460793916

ab_11_29__optpps_T5311_optpps_03_09_35686_276619_1


Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME
can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3

ERROR: core::util::switch_to_residue_type_set fails

ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 73106 · Rating: 0
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 73110 - Posted: 19 May 2012, 23:22:39 UTC

A couple of failures on Windows 7: both work units crashed after 10 hours on a 6-hour preference.

Task 506906466 (secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_60017_0)

Watchdog active.
# cpu_run_time_pref: 21600
Hbond tripped: [2012- 5-18 19: 5:27:]
BOINC:: CPU time: 36058.2s, 14400s + 21600s[2012- 5-19 3:39:15:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 36058.2 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>secY_hybrid_04a_secY_SAVE_ALL_OUT_IGNORE_THE_REST_50172_60017_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

Similar result for task 506906470
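
As a rough check of the numbers in that log, here is a small Python sketch that parses the "CPU time" line and compares the CPU time with the sum of the two figures after it. The 21600s matches the cpu_run_time_pref shown above; reading the 14400s as a fixed extra allowance before the task is ended is my assumption, not something the log states. The function name `watchdog_overrun` is made up for illustration.

```python
import re

# The line as it appears in the stderr output above.
LINE = "BOINC:: CPU time: 36058.2s, 14400s + 21600s[2012- 5-19 3:39:15:] :: BOINC"

# Captures the CPU time and the two values that are added together.
CPU_RE = re.compile(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s")


def watchdog_overrun(line):
    """Return (cpu_seconds, limit_seconds, exceeded), or None if no match."""
    m = CPU_RE.search(line)
    if m is None:
        return None
    cpu = float(m.group(1))
    limit = int(m.group(2)) + int(m.group(3))  # 14400 + 21600 = 36000
    return cpu, limit, cpu > limit


print(watchdog_overrun(LINE))  # (36058.2, 36000, True)
```

36058.2 seconds is just over 10 hours, which fits the run time I saw, so the task seems to have been stopped shortly after passing the 36000-second mark.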
ID: 73110 · Rating: 0
Robert Gammon
Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 73112 - Posted: 20 May 2012, 0:53:34 UTC - in response to Message 73110.  

A couple of failures on Windows 7: both work units crashed after 10 hours on a 6-hour preference.



I see a similar problem with 3.31 running on Ubuntu Linux 12.04 64-bit.

Looks like all WUs must be abandoned, as there is a clear issue with v3.31.

My WUs are 4-hour WUs that run in the expected amount of time. All 3.31 WUs have been returned as client errors.
ID: 73112 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73116 - Posted: 20 May 2012, 14:45:25 UTC

I'm sorry that you are so upset about this that you felt it was necessary to post twice about your personal choice on handling the situation.

Please don't let credit get in the way of the big picture here. CASP 10 (just like CASP 9, and CASP 8, ...and CASP 7, ... ... and CASP6) will show the significance of what you are contributing to here. The methods developed in BakerLab are consistently proven to be amongst the best and most applicable across the world's entire scientific community.

I know the issues behind the credit will be addressed; indeed, they always are. Failures happen, and in recognition of that, and because your continued crunching is important to helping resolve them, the project awards credit to reported failures with a nightly script that has been in place for many years now. When this script grants credit, you have to display the WU details to see the awarded credit.
Rosetta Moderator: Mod.Sense
ID: 73116 · Rating: 0
Robert Gammon
Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 73117 - Posted: 20 May 2012, 15:06:30 UTC - in response to Message 73116.  

I'm sorry that you are so upset about this that you felt it was necessary to post twice about your personal choice on handling the situation.

Please don't let credit get in the way of the big picture here. CASP 10 (just like CASP 9, and CASP 8, ...and CASP 7, ... ... and CASP6) will show the significance of what you are contributing to here. The methods developed in BakerLab are consistently proven to be amongst the best and most applicable across the world's entire scientific community.

I know the issues behind the credit will be addressed; indeed, they always are. Failures happen, and in recognition of that, and because your continued crunching is important to helping resolve them, the project awards credit to reported failures with a nightly script that has been in place for many years now. When this script grants credit, you have to display the WU details to see the awarded credit.


Here is one of my failed work units from Rosetta 3.31 running on Ubuntu 12.04 64-bit under BOINC 7.0.28. Are you saying that this WU was granted credit in spite of the Client Error status???


Task ID 507128901
Name ab_11_29__optpps_T5441_optpps_03_09_35686_298319_0
Workunit 462102711
Created 19 May 2012 13:50:48 UTC
Sent 19 May 2012 13:51:46 UTC
Received 19 May 2012 22:42:03 UTC
Server state Over
Outcome Client error
Client state New
Exit status 0 (0x0)
Computer ID 1543142
Report deadline 29 May 2012 13:51:46 UTC
CPU time 10030.05
stderr out
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 5-19 13:46:14:] :: BOINC:: Initializing ... ok.
[2012- 5-19 13:46:14:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
======================================================
DONE :: 1 starting structures 10029.7 cpu seconds
This process generated 9 decoys from 9 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 87.855182957171
Granted credit 87.855182957171
application version ---

ID: 73117 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73118 - Posted: 20 May 2012, 17:40:45 UTC
Last modified: 20 May 2012, 17:44:38 UTC

Are you saying that this wu was granted credit in spite of the Client Error status???


Granted credit 87.855182957171


Yes, that's exactly what I'm saying.

I'm also saying that your work on that task still delivered some scientifically useful results:
This process generated 9 decoys from 9 attempts

Rosetta Moderator: Mod.Sense
ID: 73118 · Rating: 0
Felix Kaeufer
Joined: 3 Feb 12
Posts: 2
Credit: 838,409
RAC: 1,355
Message 73150 - Posted: 24 May 2012, 17:41:16 UTC
Last modified: 24 May 2012, 18:00:07 UTC

I've got 9 Rosetta units at the moment, 8 of them running, but they crash after some time and start again from the beginning. My memory limit is set to 75% while the computer is in use and 95% while it is not. Sometimes they restart after seconds, sometimes after a long time. There are no other tasks apart from a GPU Collatz task, which runs only if the computer has been inactive for 3 minutes. What can I do about this constant waste of resources?
It happened again after 3 minutes and 18 seconds. Maybe I don't have enough memory (4 GB), so I decided to stop all except one.
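
To put rough numbers on the memory guess, here is a back-of-the-envelope Python sketch. It assumes the worst case that every task needs about 1 GB, which is the figure reported for one memory-hungry workunit in the first post; typical tasks may need less, and the helper `tasks_that_fit` is made up for illustration.

```python
def tasks_that_fit(total_ram_gb, allowed_fraction, per_task_gb):
    """How many tasks of a given size fit inside the RAM share BOINC may use."""
    budget_gb = total_ram_gb * allowed_fraction
    return int(budget_gb // per_task_gb)


# 4 GB machine, 75% allowed while in use, 95% while idle, ~1 GB per task (assumed).
print(tasks_that_fit(4, 0.75, 1.0))  # 3
print(tasks_that_fit(4, 0.95, 1.0))  # 3
```

Under that assumption only about 3 tasks fit at once, so running 8 would keep pushing BOINC over the limit and force it to suspend tasks.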
ID: 73150 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73153 - Posted: 24 May 2012, 18:44:52 UTC

Felix, you might try ticking the preference to keep suspended tasks in memory (that "memory" is actually VIRTUAL memory). If they are not kept in memory, they lose their progress since the last checkpoint whenever BOINC decides that task is the best one to suspend in order to stay within the memory preference.
Rosetta Moderator: Mod.Sense
ID: 73153 · Rating: 0
Felix Kaeufer
Joined: 3 Feb 12
Posts: 2
Credit: 838,409
RAC: 1,355
Message 73154 - Posted: 24 May 2012, 18:51:45 UTC

Keeping them in memory was already ticked. Now there are many Ibercivis tasks, and the only Rosetta unit still running works fine.
ID: 73154 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73157 - Posted: 25 May 2012, 2:23:14 UTC

Hi.

Two different errors today. The first one ran for 14 sec. The second ran my full time of 4 hrs 2 min, then got a validate error for some reason.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462925986

ab_11_29__optpps_T5311_optpps_03_09_35686_285348_0

Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME
can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3

ERROR: core::util::switch_to_residue_type_set fails

ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

================================================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462885310

2LIS_VR14_Swanson_perturbation_pathdock_SAVE_ALL_OUT_pd_2LIS_VR14_out_393_50340_7_0

Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 530 starting structures 14393.3 cpu seconds
This process generated 530 decoys from 530 attempts
======================================================
BOINC :: WS_max 8.81443e-280

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid

ID: 73157 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,332,891
RAC: 3,380
Message 73158 - Posted: 25 May 2012, 4:16:47 UTC - in response to Message 73157.  

P. P. L.,

I've seen a few things before that suggest that Rosetta@Home's validator tends to have problems if the decoy count gets past 100. If so, you may need a manual validation for the workunit where the decoy count reached 530; don't expect this to happen very quickly.
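
If you want to check your own results against that guess, here is a minimal Python sketch that reads the decoy count from the stderr text and flags anything above 100. The threshold is only my hunch from above, not a documented validator limit, and the function names are made up for illustration.

```python
import re

# Matches the summary line printed at the end of a run.
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def decoy_count(stderr_text):
    """Return the number of decoys reported in the stderr text, or None."""
    m = DECOY_RE.search(stderr_text)
    return int(m.group(1)) if m else None


def may_need_manual_validation(stderr_text, threshold=100):
    """True if the reported decoy count is above the (assumed) threshold."""
    count = decoy_count(stderr_text)
    return count is not None and count > threshold


print(may_need_manual_validation("This process generated 530 decoys from 530 attempts"))
```

This cannot be the whole story, though: a task earlier in this thread with only 9 decoys was also marked invalid.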
ID: 73158 · Rating: 0
bob
Joined: 23 Apr 09
Posts: 2
Credit: 8,854,738
RAC: 0
Message 73160 - Posted: 26 May 2012, 10:28:57 UTC

It appears that 3.31 is still having issues. Over the past four days, 14 out of 25 of my 3.31 runs (56 percent) have ended with client errors. I have better odds at a casino.
ID: 73160 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73161 - Posted: 26 May 2012, 23:04:41 UTC

Hi.

Got another one of these.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=462885410

2LIS_VR14_Swanson_perturbation_pathdock_SAVE_ALL_OUT_pd_2LIS_VR14_out_67_50380_7_1

Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 583 starting structures 14399.1 cpu seconds
This process generated 583 decoys from 583 attempts
======================================================
BOINC :: WS_max 8.81443e-280

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid

ID: 73161 · Rating: 0
Charles Tomaras
Joined: 18 Aug 09
Posts: 11
Credit: 25,925,380
RAC: 30,304
Message 73169 - Posted: 30 May 2012, 13:45:54 UTC

It seems about 80% of my recent work units end up with "compute error." I've stopped running Rosetta because of this new problem, which appeared with the latest updates. I keep checking back to see if it's solved, but I ran some more Rosetta work last night and it's still screwed up. My computer is fine and runs SETI on all cores with zero errors or issues. Here's what I'm getting on one of the errant tasks:

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2012- 5-29 23: 5:18:] :: BOINC:: Initializing ... ok.
[2012- 5-29 23: 5:18:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007

</stderr_txt>
]]>


ID: 73169 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org