Report Problems with Rosetta Version 5.16 I

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 17111 - Posted: 26 May 2006, 5:39:59 UTC - in response to Message 17093.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response a few comments below, which says this workunit had issues that have now been fixed.


It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.
Alain Maes

Joined: 23 Nov 05
Posts: 3
Credit: 4,079,510
RAC: 4,339
Message 17124 - Posted: 26 May 2006, 12:07:48 UTC

Workunit https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18060667 got stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour. Good that I caught it early. Workunit aborted. First time ever I had to do this for Rosetta.

Kind regards

Alain
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17133 - Posted: 26 May 2006, 14:28:02 UTC - in response to Message 17124.  
Last modified: 26 May 2006, 14:28:51 UTC

Workunit https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18060667 got stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour. Good that I caught it early. Workunit aborted. First time ever I had to do this for Rosetta.

Kind regards

Alain

Some workunits take a long time (over an hour) to reach their first structure, and this is normal behavior. They stay at 1.041% the whole time, until they reach their first structure.

So this work unit you aborted was likely OK.

There is a "watchdog" in the code that deletes work units that are truly stuck for a long time. I generally wait at least 4-5 hours before I start to get concerned.
Regards,
Bob P.
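
For readers wondering what "truly stuck for a long time" could mean in practice, here is a minimal Python sketch of a stall check. It is not the Rosetta watchdog: the `Sample` type, the `is_stuck` function, and the 4-hour threshold are all assumptions made for illustration, with the threshold loosely based on the 4-5 hour rule of thumb above.

```python
from dataclasses import dataclass

# Hypothetical threshold, loosely based on the "4-5 hours" rule of thumb.
# It is NOT taken from the Rosetta source code.
STALL_LIMIT_SECONDS = 4 * 3600


@dataclass
class Sample:
    elapsed_seconds: float  # wall-clock time since the task started
    percent_done: float     # progress value reported by the task


def is_stuck(samples, stall_limit=STALL_LIMIT_SECONDS):
    """Return True if the reported progress has not changed for at
    least `stall_limit` seconds of wall-clock time."""
    if not samples:
        return False
    latest = samples[-1]
    # Walk backwards to find the earliest sample in the current
    # unbroken run of identical progress values.
    stalled_since = latest.elapsed_seconds
    for sample in reversed(samples):
        if sample.percent_done != latest.percent_done:
            break
        stalled_since = sample.elapsed_seconds
    return latest.elapsed_seconds - stalled_since >= stall_limit


# A task that has sat at 1.041% for under an hour is not flagged.
early = [Sample(0, 0.0), Sample(535, 1.041), Sample(3600, 1.041)]
print(is_stuck(early))
```

Under this rule a task sitting at 1.041% for less than an hour would simply be left alone, which matches the advice to wait before aborting.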
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17136 - Posted: 26 May 2006, 14:44:12 UTC - in response to Message 17124.  

...stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour.

Can you explain what you mean by that? Do you mean that the WU was in a "running" status for an hour, but only shows 8:55min. of CPU time?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Buffalo Bill
Joined: 25 Mar 06
Posts: 71
Credit: 1,630,458
RAC: 0
Message 17146 - Posted: 26 May 2006, 16:50:36 UTC
Last modified: 26 May 2006, 16:52:06 UTC

Errored out. Max. disk usage exceeded.

JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_1618_0

http://www.boinc.bakerlab.org/rosetta/result.php?resultid=21291895
Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17147 - Posted: 26 May 2006, 16:56:12 UTC - in response to Message 17146.  

Errored out. Max. disk usage exceeded.

JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_1618_0

http://www.boinc.bakerlab.org/rosetta/result.php?resultid=21291895


Can anyone identify the units that will cause this problem so I may delete them from my queue?

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17148 - Posted: 26 May 2006, 17:00:37 UTC
Last modified: 26 May 2006, 17:03:49 UTC

Failure:
5/25/2006 8:42:49 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_8465_0: exceeded disk limit: 139256061.000000 > 100000000.000000
5/25/2006 8:42:49 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_8465_0 (Maximum disk usage exceeded)

I crunch with a 24hr run time preference. Looks like it got about 15hrs into it.

Seeing a trend here?
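
To compare failures like these across machines, the "exceeded disk limit" lines can be pulled apart with a short script. The following is a rough Python sketch; the function name `parse_disk_limit` and the returned field names are invented for illustration, and the pattern is based only on the log lines quoted in this thread, so other BOINC client versions may word the message differently.

```python
import re

# Pattern inferred from the messages quoted in this thread; both
# "Aborting task" and "Aborting result" wordings appear in the posts.
DISK_LIMIT_RE = re.compile(
    r"Aborting (?:task|result) (?P<name>\S+): "
    r"exceeded disk limit: (?P<used>[\d.]+) > (?P<limit>[\d.]+)"
)


def parse_disk_limit(line):
    """Return task name, bytes used, limit, and overshoot, or None
    if the line is not a disk-limit abort message."""
    match = DISK_LIMIT_RE.search(line)
    if match is None:
        return None
    used = float(match.group("used"))
    limit = float(match.group("limit"))
    return {
        "name": match.group("name"),
        "used": used,
        "limit": limit,
        "over_by": used - limit,
    }


line = (
    "5/25/2006 8:42:49 PM|rosetta@home|Aborting task "
    "JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_8465_0: "
    "exceeded disk limit: 139256061.000000 > 100000000.000000"
)
info = parse_disk_limit(line)
print(info["name"], info["over_by"])
```

Collecting the task names this way makes the trend easier to see: the failures reported so far in this thread all share the `t285__SAVE_ALL_OUT_530` stem.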
Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17153 - Posted: 26 May 2006, 17:31:02 UTC - in response to Message 17111.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response a few comments below, which says this workunit had issues that have now been fixed.


It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.



If this is true, I don't understand why 5.16 is still being run on Ralph.
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17155 - Posted: 26 May 2006, 17:36:06 UTC - in response to Message 17153.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.



If this is true, I don't understand why 5.16 is still being run on Ralph.

I read this as the code being included within the workunit itself (the JUMP_ALLBARCODE_t285__SAVE_ALL_OUT), and not within the 5.16 application, but maybe I misread this.

Regards,
Bob P.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17170 - Posted: 26 May 2006, 20:43:54 UTC
Last modified: 26 May 2006, 20:47:51 UTC

MAPRELAX_TEST_hom003_1fna__511_7114_0 using rosetta version 5.16
Shows VM size as 1,332,400K and steadily growing after 11.5hrs of crunching. Windows had to extend VM, and PC is getting sluggish.
Windows XP Pro SP 1, BOINC 5.4.9
...and attempting to display the graphic just brings up a totally black screen. No text, no boxes, no proteins.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17204 - Posted: 27 May 2006, 3:18:41 UTC

After experiencing many, MANY fatal Windows errors in Ralph with both 5.12 and 5.16 WUs, I turned off the screensaver, and all results after that worked fine. I turned the screensaver back on, and wuid=18092117 failed tonight. Screensaver in the background, Windows error box on top of it.
Tony


Result ID 21609303
Name FRA_t289_hom001_3_LOOPRLX_IGNORE_THE_REST_dect289_3_t289_3_03_4.pdb_541_65_0
Workunit 18092117
Created 26 May 2006 6:29:12 UTC
Sent 26 May 2006 8:27:04 UTC
Received 27 May 2006 3:13:28 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 2 Jun 2006 8:27:04 UTC
CPU time 12781.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 3043036
# cpu_run_time_pref: 28800

</stderr_txt>


Validate state Invalid
Claimed credit 50.6089010366131
Granted credit 0
application version 5.16
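
The exit status in the report above is shown twice, as a signed decimal and as hex. A small Python sketch of the conversion follows; the helper names are made up for illustration, and the lookup table contains only two well-known Windows NTSTATUS values rather than a complete list. The name tells you which status code Windows reported, not why the application triggered it.

```python
# Two well-known Windows NTSTATUS codes; this table is deliberately
# tiny and is only meant to illustrate the lookup.
KNOWN_NTSTATUS = {
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def describe_exit_status(code):
    """Format an exit code as hex and add a name if one is known."""
    unsigned = to_unsigned32(code)
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_status(-1073741811))
```

Masking with `0xFFFFFFFF` is what turns -1073741811 into 0xc000000d, the same pair of values printed in the result page.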
Alain Maes

Joined: 23 Nov 05
Posts: 3
Credit: 4,079,510
RAC: 4,339
Message 17243 - Posted: 27 May 2006, 14:15:58 UTC - in response to Message 17136.  

...stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour.

Can you explain what you mean by that? Do you mean that the WU was in a "running" status for an hour, but only shows 8:55min. of CPU time?



Indeed, 8:55 min CPU time, running but no further progress.
Kind regards

Alain
