Problems with Rosetta version 5.78

Message boards : Number crunching : Problems with Rosetta version 5.78

TimL

Joined: 16 Sep 06
Posts: 17
Credit: 15,480,956
RAC: 0
Message 46149 - Posted: 13 Sep 2007, 20:33:00 UTC

Got these messages this morning:

14/09/2007 6:28:49 AM|rosetta@home|Scheduler RPC succeeded
14/09/2007 6:28:49 AM|rosetta@home|Message from server: Project encountered internal error: shared memory
14/09/2007 6:28:49 AM|rosetta@home|Deferring communication for 1 hr 0 min 0 sec
14/09/2007 6:28:49 AM|rosetta@home|Reason: project is down
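
For anyone who wants to pick server errors like these out of a saved message log, here is a minimal Python sketch. It assumes the pipe-delimited `timestamp|project|message` layout and the day/month/year, 12-hour timestamps seen in the lines above. The names `LogEntry`, `parse_log_line` and `server_errors` are hypothetical helpers for illustration, not part of BOINC.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    project: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if malformed."""
    # Split on the first two pipes only, so the message text is kept whole.
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        # Assumed format: day/month/year with a 12-hour clock and AM/PM.
        when = datetime.strptime(stamp, "%d/%m/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return LogEntry(when, project, message)


def server_errors(lines: Iterable[str]) -> List[LogEntry]:
    """Return the entries whose message was relayed from the project server."""
    entries = (parse_log_line(line) for line in lines)
    return [e for e in entries
            if e is not None and e.message.startswith("Message from server:")]
```

Run over the four lines above, `server_errors` would keep only the "Project encountered internal error: shared memory" entry.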

Wits End
Joined: 16 Apr 07
Posts: 4
Credit: 29,477
RAC: 0
Message 46160 - Posted: 14 Sep 2007, 0:03:14 UTC - in response to Message 45712.  

I'm running v5.78 on a 600 MHz machine. Three of the last nine WUs that I've uploaded reported a "Validate error" (103801478/94222135; 104683771/94998439; and 104848402/95151762). In total, these three WUs represent just shy of 19 CPU hours, with a combined credit claim of just over 58.

As you might imagine, wasting 19 hours of CPU time because one in every three WUs is rejected has me a bit frustrated with R@H! (I'm also running World Community Grid and Seti@Home, neither of which is producing errors.) Is anyone else experiencing similar problems with v5.78?

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 46183 - Posted: 14 Sep 2007, 11:20:35 UTC

Here is another one:

104430230 94762288 9 Sep
Jmarks

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 46188 - Posted: 14 Sep 2007, 12:23:55 UTC

I realize that 5.78 is (unpleasant) history now, but as a historical record, and in the hope that this information will make future versions of Rosetta more stable, I offer the following:

Five out of six Capri WUs were killed by the watchdog. The casualties were:

WU 95336696
WU 95336686
WU 95336658
WU 95336625
WU 95366073

I won't try to link them all here, but they are all individually linked in my posts in this thread.

The sole survivor of the batch was:

WU 95336673

Even though it was a success, and this is a thread for problems, I have linked it here since there doesn't seem to be a place to post the (rare) successes in v5.78...

This represents about 90 hours on a pretty decent computer, about half of which was accomplished AFTER the problem with 5.78 was identified and (we hope) fixed.

If and when a situation like this arises again, I would specifically request a notice on the home page asking users to ABORT the WUs in question. I suspect the project would be better served thereby.

Respectfully,
David Emigh
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Wits End
Joined: 16 Apr 07
Posts: 4
Credit: 29,477
RAC: 0
Message 46204 - Posted: 14 Sep 2007, 16:30:28 UTC - in response to Message 45712.  
Last modified: 14 Sep 2007, 16:32:16 UTC


v5.80 appears to have inherited the same problem (Message 46203)!

hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 46248 - Posted: 15 Sep 2007, 4:09:07 UTC

WU

**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -457.996 for 900 seconds
**********************************************************************
GZIP SILENT FILE: .xx1he8.out
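
The watchdog output above suggests a check that ends a run once the score has stayed unchanged for a fixed interval (900 seconds in this report). Here is a minimal Python sketch of that kind of check. It is an illustration only, not Rosetta's actual watchdog code, and the class name `StuckScoreWatchdog`, its `timeout` and `clock` parameters, and the `report` method are all hypothetical.

```python
import time
from typing import Callable, Optional


class StuckScoreWatchdog:
    """Flags a run whose score has not changed for `timeout` seconds."""

    def __init__(self, timeout: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the check can be tested without waiting
        self.last_score: Optional[float] = None
        self.last_change = clock()

    def report(self, score: float) -> bool:
        """Record the latest score; return True if the run looks stuck."""
        now = self.clock()
        if score != self.last_score:
            # The score moved, so restart the stuck timer.
            self.last_score = score
            self.last_change = now
            return False
        # Same score as before: stuck once the timeout has elapsed.
        return now - self.last_change >= self.timeout
```

In this sketch the caller would pass the current score to `report` at regular intervals and end the run the first time it returns `True`. It covers only the "stuck" half of the message; a real watchdog would presumably also cap total run time ("going too long").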

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,705,781
RAC: 1,723
Message 46312 - Posted: 15 Sep 2007, 23:33:36 UTC
Last modified: 15 Sep 2007, 23:36:02 UTC

Same problem with this WU:

**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 117.804 for 900 seconds
**********************************************************************

That's 3 work units this week that got stuck and gave me only 20 points.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,705,781
RAC: 1,723
Message 46334 - Posted: 16 Sep 2007, 9:02:27 UTC
Last modified: 16 Sep 2007, 9:05:09 UTC

This WU got stuck:
# cpu_run_time_pref: 21600
# random seed: 1279344
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -231.574 for 900 seconds
**********************************************************************

Yet another 20 points instead of the actual credit. That's 4 of them now out of over 20 WUs.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,705,781
RAC: 1,723
Message 46335 - Posted: 16 Sep 2007, 9:05:58 UTC

More errors:

104452248 94782448 10 Sep 2007 0:41:23 UTC 15 Sep 2007 1:03:19 UTC Over Validate error Done 21,553.84 --- ---
104452246 94782446 10 Sep 2007 0:41:23 UTC 15 Sep 2007 1:03:19 UTC Over Validate error Done 21,609.22 --- ---

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,705,781
RAC: 1,723
Message 46469 - Posted: 17 Sep 2007, 20:00:50 UTC




©2024 University of Washington
https://www.bakerlab.org