LONG..... work unit

Message boards : Number crunching : LONG..... work unit


Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 14678 - Posted: 26 Apr 2006, 15:52:11 UTC
Last modified: 26 Apr 2006, 15:52:34 UTC

I have a condition on two of my computers that I have not seen before.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13402092

It's an HBLR_1.0 unit.

It does not appear to be stuck, just VERY SLOW. It gains about 0.01% per 200 seconds. It has been running for 16 hours 50 minutes and is now at 5.15%. It looks like others have had "trouble" with this unit before (I am the third to receive it). Is it worth letting it continue? At this rate it would take about 13 more days to finish, which is beyond the due date. Its app is Rosetta 5.01, running on Win XP with SP2.
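A quick way to sanity-check the "13 days" figure is to extrapolate linearly from the numbers in the post. The sketch below is only illustrative: `remaining_seconds` is a hypothetical helper, and it assumes progress stays linear, which a unit stuck in a loop would not actually honor. It compares the rate averaged over the whole run with the most recent rate of 0.01% per 200 seconds.

```python
SECONDS_PER_DAY = 86_400

def remaining_seconds(percent_done: float, rate_percent_per_s: float) -> float:
    """Seconds left to reach 100%, assuming progress stays linear at the given rate."""
    return (100.0 - percent_done) / rate_percent_per_s

elapsed_s = 16 * 3600 + 50 * 60   # 16 h 50 min = 60,600 s
percent_done = 5.15

# Rate averaged over the whole run so far
average_rate = percent_done / elapsed_s
# Most recent rate reported in the post: 0.01% per 200 s
recent_rate = 0.01 / 200

for label, rate in [("average", average_rate), ("recent", recent_rate)]:
    days = remaining_seconds(percent_done, rate) / SECONDS_PER_DAY
    print(f"{label} rate: about {days:.1f} more days")
```

This prints roughly 12.9 days at the average rate and 22.0 days at the recent rate, so the 13-day estimate matches the average since the unit started, while the current pace would be slower still.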

On the other computer, there is also an HBLR_1.0 work unit that appears to have been completed successfully by another person but was re-issued to me anyway. It's a much faster computer, but it has been running for 5 hours and is only 2.91% complete.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13420677


Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 14682 - Posted: 26 Apr 2006, 16:03:30 UTC - in response to Message 14678.  
Last modified: 26 Apr 2006, 16:11:52 UTC




Per the FAQ in thread 1453:


"3. More aggressive full atom sampling:

HBLR_1.0_xxxx_ROT_TRIALS_TRIE
The final stage of Rosetta's folding strategy consists of fine movements that try to fit the protein pieces together in atomic detail (the "fullatom" stage, often abbreviated FA). These simulations use David Baker's latest energy terms (the "HBLR_1.0" refers to the weight on long-range hydrogen bonding) using an aggressive minimization protocol ("rotamer trials") that is made efficient with a neat graph representation within Rosetta (the "trie")."


So I take it these units are by their nature complex, require a lot of computing time, and are therefore sloooooooooooooooooooooow. They will be completed for sure; they will be as exciting as watching two sloths mating and they will test your faith in your computer's data-processing power, but they will be completed. :)

I cannot tell you why the second unit was resent.



“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 14683 - Posted: 26 Apr 2006, 16:11:49 UTC

If it is still doing Model #1 and/or switches between AbInitio and FullAtom (or is on Model #1 with many red dots in the lower-right RMSD/Energy chart), I'd abort it, as it's probably in an endless loop (this particular bug is a new 5.01 issue...). See:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1447#14324

I see, but I think it's in an endless loop, because:

1/ It has "switched" from Full-atom-relax Model 1 / Step 34k+, back to Ab-initio and Model 1 / Step 1 in front of my eyes (I enabled the graphics to monitor it for a while). In the past it was always Ab-initio -> Full-atom-relax -> done this model, process next Model, right?

Oddly, the WU graphics show several (~14) red dots (energy minima), which AFAIK should mean that at least 14 models were processed. But the Model # remains at 1.

2/ It has been running for 15.5 hr already (my time setting is 8 hr/WU) on a P4 CPU that has never taken more than 4 hr per model in the past.

Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 14692 - Posted: 26 Apr 2006, 17:55:25 UTC - in response to Message 14682.  





WOUZA!!!! I processed a 2-hour one of these on RALPH and it finished without error. (Insert dancing emoticon)

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 14714 - Posted: 27 Apr 2006, 2:53:16 UTC
Last modified: 27 Apr 2006, 2:54:48 UTC

1 day 4 hrs and still going... currently at 8.31% complete. Is there a timer on these units so that they self-abort if they take too long? The app is Rosetta 5.01.
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 14716 - Posted: 27 Apr 2006, 3:28:34 UTC - in response to Message 14714.  


That definitely looks like one of those WUs that creeps forward in percent complete but is actually in an endless loop and will never finish.

I suspect it will auto-abort eventually, but that could take several days.

Once we get the new client with the watchdog (maybe early next week), stuck WUs should auto-end after about an hour.
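The thread doesn't say how the new client's watchdog decides a unit is stuck, so the following is only a minimal sketch of the general idea, not Rosetta's actual code. It assumes "stuck" means no new model has completed within a timeout (one hour here, matching the estimate above). The class `ModelWatchdog` and its methods are hypothetical names. It watches the model count rather than percent complete, because the posts above describe percent complete creeping forward while the Model # stayed at 1.

```python
import time
from typing import Callable

class ModelWatchdog:
    """Hypothetical watchdog: flags a work unit as stuck when no new model
    has completed within `timeout_s` seconds. Not Rosetta's real implementation."""

    def __init__(self, timeout_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the logic can be tested
        self.last_model = 0             # models completed as of the last advance
        self.last_progress_time = clock()

    def report(self, models_completed: int) -> None:
        """Call periodically with the number of models finished so far.
        Only an increase in the model count resets the timer; percent
        complete is deliberately ignored."""
        if models_completed > self.last_model:
            self.last_model = models_completed
            self.last_progress_time = self.clock()

    def is_stuck(self) -> bool:
        """True once no model has completed for longer than the timeout."""
        return self.clock() - self.last_progress_time > self.timeout_s
```

For example, driving it with a fake clock: if model 1 finishes at t = 1800 s and nothing else completes, `is_stuck()` is still false at t = 5000 s but becomes true at t = 5500 s, more than an hour after the last finished model.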




©2024 University of Washington
https://www.bakerlab.org