Posts by [DPC]Charley

1) Message boards : Number crunching : No checkpoint in more than 1 hour - Largescale_large_fullatom... (Message 13836)
Posted 15 Apr 2006 by [DPC]Charley
Post:
Yes, you're right.
Rosetta only saves after each completed model. With these large models, you have to complete a model in one go, or it starts over from the beginning when the project is unloaded (switching to another project for a little while, rebooting your computer, you get the idea).
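The behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Rosetta's actual code: the function name run_models, the step counts, and the interrupt_at parameter are all hypothetical and exist purely to show why an interruption in the middle of a model loses that model's progress.

```python
def run_models(n_models, steps_per_model, interrupt_at=None, checkpoint=0):
    """Simulate a run that saves progress only after each completed model.

    Hypothetical illustration; not the real Rosetta checkpoint logic.
    Returns (checkpoint, steps_done): the number of completed models saved
    so far, and the number of steps executed in this session.
    """
    steps_done = 0
    for model in range(checkpoint, n_models):
        for _ in range(steps_per_model):
            if interrupt_at is not None and steps_done == interrupt_at:
                # Project unloaded mid-model: work inside this model is lost,
                # because the checkpoint still points at the last full model.
                return checkpoint, steps_done
            steps_done += 1
        checkpoint = model + 1  # progress is saved only here
    return checkpoint, steps_done


# Three small models, interrupted halfway through the third one.
saved, done = run_models(3, 100, interrupt_at=250)
# Resuming restarts the third model from its first step.
saved_after, done_after = run_models(3, 100, checkpoint=saved)
# One very large model, interrupted just before the end: nothing is saved.
big_saved, big_done = run_models(1, 1000, interrupt_at=999)
```

With many small models an interruption costs at most one model's worth of steps; with a single large model it costs the whole run, which matches the experience reported in this thread.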
2) Message boards : Number crunching : Largescale WU horror (Message 13831)
Posted 15 Apr 2006 by [DPC]Charley
Post:
Target time doesn't make a difference.
System specs just might make a difference in how long the units actually take, but they'll most likely always take longer than your preferred time.
After 2 hours and 20 minutes, I'm still at model 1, step 300-and-some-small-change (1.5622%). Had another one of those yesterday evening, also took several hours (4 hours 34 minutes) to complete.

Just checked my other PC. That one finished a largescale_large_fullatom_relax in just over the preferred time: it took 2 hours and 10 minutes (pref == 2h on all machines). So it seems that there are "normal" WUs available too.

The other machine and this one are nearly identical; the only relevant differences are memory type and overclocked vs. not overclocked. (Irrelevant: WinXP Home vs. Pro, graphics card, and some peripherals.)
3) Message boards : Number crunching : Miscellaneous Work Unit Errors - II (Message 13652)
Posted 13 Apr 2006 by [DPC]Charley
Post:
Result

Incorrect function. (0x1) - exit code 1 (0x1)

ALL_TOPOLOGY_CODES_1scjB_434_1346_2
4) Message boards : Number crunching : Please ABORT these 4 stuck workunits (Message 13600)
Posted 12 Apr 2006 by [DPC]Charley
Post:
Great, thanks for the warning. Just aborted one of those that seemed to be getting stuck on this box :)
5) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13532)
Posted 12 Apr 2006 by [DPC]Charley
Post:
Got another two units stuck at 1%, aborted them:
TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_355 after 6 hours
and
TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_479_0 after 10 hours (seriously stuck, no counters increase except for the time)
6) Message boards : Number crunching : UNHANDLED EXCEPTION V4.97 (Message 13420)
Posted 10 Apr 2006 by [DPC]Charley
Post:
It's a known problem. Update your client to 4.98 and it should be ok.

How do I update the client (I thought all that "magically" happened through the manager)? I have new WUs further down my queue that are listed as 4.98, which I assume will change over, but I still have several 4.97 units. Do I abort these?


Yup, just abort the lot (4.97). Most if not all of them will give you nothing but a headache anyway ;)
7) Message boards : Number crunching : UNHANDLED EXCEPTION V4.97 (Message 13414)
Posted 10 Apr 2006 by [DPC]Charley
Post:
It's a known problem. Update your client to 4.98 and it should be ok.
8) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13363)
Posted 9 Apr 2006 by [DPC]Charley
Post:
Workunit: FA_RLXpt_hom007_1ptq__361_230
Reason: Stuck at 1.042% (after almost 9 hours)
Stop: Manual
Link: Workunit - Result
9) Message boards : Number crunching : Miscellaneous Work Unit Errors (Message 13242)
Posted 8 Apr 2006 by [DPC]Charley
Post:
I'm getting tons of errors on the HBLR_1* stuff as well.
Out of 11 work units, 9 returned an error taking from about 1 minute to a couple of minutes short of an hour on 1 box (193403, for the admins).

Error codes:
8-4-2006 5:56:24|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_425_5187_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 6:03:32|rosetta@home|Unrecoverable error for result HBLR_1.0_2tif_425_7375_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 6:04:41|rosetta@home|Unrecoverable error for result HBLR_1.0_1n0u_425_9208_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 7:02:35|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_425_9364_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 7:30:31|rosetta@home|Unrecoverable error for result HBLR_1.0_1ogw_425_9448_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 7:43:45|rosetta@home|Unrecoverable error for result HBLR_1.0_1di2_426_203_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 12:05:59|rosetta@home|Unrecoverable error for result HBLR_1.0_2tif_426_571_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 16:08:16|rosetta@home|Unrecoverable error for result HBLR_1.0_2tif_426_3762_0 ( - exit code -1073741819 (0xc0000005))
8-4-2006 16:10:55|rosetta@home|Unrecoverable error for result HBLR_1.0_2reb_426_4608_0 ( - exit code -1073741819 (0xc0000005))
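The log lines above show the same exit code twice, once as a signed decimal (-1073741819) and once as hex (0xc0000005). The two are the same 32-bit value, and a minimal sketch of the conversion is shown here. The helper names are made up for illustration. On Windows, 0xC0000005 is the status code documented as STATUS_ACCESS_VIOLATION, which suggests the application crashed on an invalid memory access rather than exiting on its own.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# Exit code as reported in the log lines of this post.
reported = -1073741819
as_hex = hex(to_unsigned32(reported))
round_trip = to_signed32(0xC0000005)
print(as_hex, round_trip)
```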


Second machine (181715) is also pumping out errors. They take from 150 to 1230 seconds. These are the first units it's doing with 4.97.
Error codes:
08/04/2006 16:26:31|rosetta@home|Unrecoverable error for result HBLR_1.0_1ogw_426_283_1 ( - exit code -1073741819 (0xc0000005))
08/04/2006 16:49:21|rosetta@home|Unrecoverable error for result HBLR_1.0_1r69_426_428_1 ( - exit code -1073741819 (0xc0000005))
08/04/2006 16:53:59|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_426_4883_0 ( - exit code -1073741819 (0xc0000005))


Number three (193007) isn't doing any better, 4 out of 4 errors. Can't reach those error codes right now.

Number four (187877) is doing slightly better, with only 1 error so far out of 4 units.
Error codes:
8-4-2006 10:52:59|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_426_753_0 ( - exit code -1073741819 (0xc0000005))
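For anyone wanting to compare error logs across machines, a rough sketch of parsing these message lines follows. It assumes the pipe-separated "date time|project|message" layout seen in this post, and it assumes that the third underscore-separated field of a result name (for example 1mky) identifies the target; neither assumption is confirmed by the project. The function names are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this post (an assumption,
# not an official BOINC log format specification).
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+)\|(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)$"
)


def parse_line(line):
    """Return a dict of fields for an error line, or None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def tally_by_target(lines):
    """Count errors per target, taking the third '_' field as the target ID."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        parts = record["result"].split("_")
        if len(parts) > 2:
            counts[parts[2]] += 1
    return counts


sample = [
    "8-4-2006 5:56:24|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_425_5187_0 ( - exit code -1073741819 (0xc0000005))",
    "8-4-2006 6:03:32|rosetta@home|Unrecoverable error for result HBLR_1.0_2tif_425_7375_0 ( - exit code -1073741819 (0xc0000005))",
    "8-4-2006 7:02:35|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_425_9364_0 ( - exit code -1073741819 (0xc0000005))",
]
counts = tally_by_target(sample)
first = parse_line(sample[0])
```

The same tally could be grouped by machine instead of by target to compare boxes, given each machine's log as a separate list of lines.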


All boxen are running Windows XP Home or Pro and Rosetta 4.97.
The differences I can make out on my four boxen:
SP1 generates fewer errors than SP2. (Box 4 is still on SP1.)
Pentium generates fewer errors than AMD. (Box 4 is a P3 733MHz; the other boxen are AMD XP 2500+ and AMD 64 3700+.)
Important note: this is not statistically relevant data of course, we need more people for that ;)






©2021 University of Washington
https://www.bakerlab.org