Workunit series 1hz6a

Message boards : Number crunching : Workunit series 1hz6a

Christian Barrett
Joined: 17 Sep 05
Posts: 11
Credit: 14,933
RAC: 0
Message 3165 - Posted: 14 Nov 2005, 9:20:56 UTC

I just received new workunits for this series, and the first two I started crashed during the run. Is this series stable? I haven't had any problems with the other series.

11/13/2005 6:30:43 PM|rosetta@home|Pausing result 1hz6A_abrelaxmode_random_length20_jitter02_omega_00594_0 (removed from memory)
11/13/2005 6:30:45 PM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_00594_0 ( - exit code -1073741819 (0xc0000005))
11/13/2005 6:30:45 PM||request_reschedule_cpus: process exited

and

11/14/2005 1:07:15 AM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_sim_aneal_01678_0 ( - exit code -1073741819 (0xc0000005))

I know from the FAQ that these are general client errors, but they didn't start until this new series.
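In the log lines above, BOINC prints each exit code twice: once as a signed decimal number and once in hex. Both forms are the same 32-bit value, and 0xC0000005 is the Windows STATUS_ACCESS_VIOLATION status (the program tried to access memory it was not allowed to). The same conversion applies to the -164 (0xffffff5c) code reported later in the thread. A minimal Python sketch, assuming the code is a 32-bit two's-complement value; the helper names are hypothetical and exist only for illustration:

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF turns a negative value into its 32-bit unsigned form.
    return f"0x{code & 0xFFFFFFFF:08x}"


def hex_to_exit_code(hex_str: str) -> int:
    """Convert a hex status (e.g. '0xc0000005') back to a signed 32-bit exit code."""
    value = int(hex_str, 16)
    # If the top bit is set, the value is negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


# The two exit codes that appear in this thread.
for code in (-1073741819, -164):
    print(code, exit_code_to_hex(code))
```

Running it prints `-1073741819 0xc0000005` and `-164 0xffffff5c`, matching the pairs in the BOINC log lines.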
Vester
Joined: 2 Nov 05
Posts: 257
Credit: 3,359,447
RAC: 15,632
Message 3169 - Posted: 14 Nov 2005, 11:56:23 UTC

No problems on my two computers which have uploaded about 60 of these jobs.
stephan_t
Joined: 20 Oct 05
Posts: 129
Credit: 35,464
RAC: 0
Message 3172 - Posted: 14 Nov 2005, 12:43:17 UTC

Mine has been stuck at 70% for the past 6 hours. Never had any problems before.
Team CFVault.com
http://www.cfvault.com

Christian Barrett
Joined: 17 Sep 05
Posts: 11
Credit: 14,933
RAC: 0
Message 3226 - Posted: 14 Nov 2005, 22:22:50 UTC - in response to Message 3172.  

> Mine has been stuck at 70% for the past 6 hours. Never had any problems before.

Well, I have had 3 fail on me now. The fourth has run two or three times as long as the previous set but hasn't hit a client error yet. We shall see.
Foxfire
Joined: 3 Nov 05
Posts: 12
Credit: 582,360
RAC: 0
Message 3229 - Posted: 14 Nov 2005, 22:27:24 UTC - in response to Message 3226.  

> > Mine has been stuck at 70% for the past 6 hours. Never had any problems before.
>
> Well, I have had 3 fail on me now. The fourth has run two or three times as long as the previous set but hasn't hit a client error yet. We shall see.

I haven't had any failures yet, but my WUs went from an average of 10-15 min (1.5 weeks ago) to an average of 130 min now.
Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 3232 - Posted: 14 Nov 2005, 23:09:30 UTC
Last modified: 14 Nov 2005, 23:12:24 UTC

I haven't had any errors with this new series. Some of them can take much longer than other recent protein WUs, but this is normal and should be no cause for alarm.

@Stephan: Are you still stuck at 70%?
stephan_t
Joined: 20 Oct 05
Posts: 129
Credit: 35,464
RAC: 0
Message 3236 - Posted: 14 Nov 2005, 23:24:57 UTC
Last modified: 14 Nov 2005, 23:26:53 UTC

Hello there. No, I'm no longer stuck; I used some patience, and it ended up taking 8 h 30 min in total to process. It was this result for this WU on this box here.

Got me a cool 52.54 points.

EDIT: I should add that the benchmarks for this machine look very low. It's not the first time this has happened (I posted a graph of my RAC for that box before), so I reckon it might be overheating and the P4 is throttling back. Will investigate.
Team CFVault.com
http://www.cfvault.com

Red Squirrel
Joined: 26 Sep 05
Posts: 13
Credit: 3,613
RAC: 0
Message 3239 - Posted: 15 Nov 2005, 0:35:43 UTC

Yes, I've got one of the 1hz6a work units, and it has reached 90% after 3 hours 30 minutes. Most of the other work units have taken just over an hour.
I wonder if we could be given some idea of how long a WU is going to take compared with the original WUs that were handed out. The project team must have some idea how complex each different protein WU is.
Christian Barrett
Joined: 17 Sep 05
Posts: 11
Credit: 14,933
RAC: 0
Message 3254 - Posted: 15 Nov 2005, 5:40:27 UTC - in response to Message 3239.  
Last modified: 15 Nov 2005, 5:44:23 UTC

> Yes, I've got one of the 1hz6a work units, and it has reached 90% after 3 hours 30 minutes. Most of the other work units have taken just over an hour.
> I wonder if we could be given some idea of how long a WU is going to take compared with the original WUs that were handed out. The project team must have some idea how complex each different protein WU is.

Mine ran for 4 hours and 40 min before I got this:
11/14/2005 5:48:49 PM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_sim_aneal_03386_0 ( - exit code -164 (0xffffff5c))

That's 4 failures out of 4 different units, and they failed at various points during the run.

The unit is here:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1680973
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 3257 - Posted: 15 Nov 2005, 6:01:55 UTC

Christian,

This is probably due to the increased size of the work units and/or an older client version. If you are running multiple projects, be sure to keep the application in memory and set the "Switch between applications every" option in your general preferences to at least 2 hours. You may want to try the most recent version of the BOINC client. I have reduced the size of new work units, but there will likely be large work units in the future (larger proteins, longer methods, etc.).
Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 3263 - Posted: 15 Nov 2005, 6:46:06 UTC

No problems for me with these longer ones; they went from about 1 hour to 3 hours. Those of us who came from FaD won't have a problem with even longer ones. At FaD there were some that took a couple of DAYS, even on my machines...
Christian Barrett
Joined: 17 Sep 05
Posts: 11
Credit: 14,933
RAC: 0
Message 3423 - Posted: 16 Nov 2005, 15:49:12 UTC - in response to Message 3257.  

> Christian,
>
> This is probably due to the increased size of the work units and/or an older client version. If you are running multiple projects, be sure to keep the application in memory and set the "Switch between applications every" option in your general preferences to at least 2 hours. You may want to try the most recent version of the BOINC client. I have reduced the size of new work units, but there will likely be large work units in the future (larger proteins, longer methods, etc.).

OK, thanks. I am upgrading to the new 5.2.* tonight. I didn't want to upgrade earlier because I was running a spinup for another project and was worried about stability, but they assured me it won't crash. We shall see.




©2024 University of Washington
https://www.bakerlab.org