Help us solve the 1% bug!

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 13198 - Posted: 8 Apr 2006, 0:59:19 UTC - in response to Message 13195.  

PS: AFAIK, the only info needed when reporting a stuck WU is the WU number, e.g. #11819500 in this case (or just its name). If you just abort it, the project will also know the random seed (it shows in the stderr.txt output of the resultid).


I believe they would like to know the exact percentage complete that the WU was stuck at.


Yes, we need to know this. The name of the work unit is also helpful, as we can then see at a glance whether particular types of work units are having the most problems.
ID: 13198 · Rating: 0
Corgi
Joined: 17 Oct 05
Posts: 2
Credit: 389,209
RAC: 0
Message 13233 - Posted: 8 Apr 2006, 14:26:17 UTC - in response to Message 13198.  

PS: AFAIK, the only info needed when reporting a stuck WU is the WU number, e.g. #11819500 in this case (or just its name). If you just abort it, the project will also know the random seed (it shows in the stderr.txt output of the resultid).


I believe they would like to know the exact percentage complete that the WU was stuck at.


Yes, we need to know this. The name of the work unit is also helpful, as we can then see at a glance whether particular types of work units are having the most problems.


Heh, I'd rather be providing too much than too little. Would you remind me where I can find the ID # and date for any suspect workunits again, please?

Corgi

ID: 13233 · Rating: 0
Desti

Joined: 16 Sep 05
Posts: 50
Credit: 3,018
RAC: 0
Message 13250 - Posted: 8 Apr 2006, 17:08:16 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=16615299

2 hours of CPU time and still at 1%; I will abort it now.
LUE
ID: 13250 · Rating: 0
Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 13256 - Posted: 8 Apr 2006, 17:57:31 UTC - in response to Message 13233.  

Heh, I'd rather be providing too much than too little. Would you remind me where I can find the ID # and date for any suspect workunits again, please?


I go to the particular PC's host page on Rosetta's BOINC server, i.e. in your case it'd be https://boinc.bakerlab.org/rosetta/results.php?hostid=23940, and click on the "Work Unit ID" to see which one has the name you see in BOINC. Obviously this isn't easy to do if one has 50 nameless PCs and/or a PC that downloads 30 WUs at a time.

Probably just reporting the WU name, e.g. HBLR_1.0_1di2_425... etc., along with the % done (or Model/Step #s?) at which it got stuck, is just as useful and easier after all, as others suggested.
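
For anyone who wants to script this, here is a rough sketch in Python of the lookup and report described above. It only builds the two page URLs that appear in this thread (results.php?hostid=... and result.php?resultid=...) and formats a one-line report with the WU name and % done. The function names and the report layout are my own assumptions for illustration, not anything the project defines.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the links posted in this thread.
BASE = "https://boinc.bakerlab.org/rosetta"


def host_results_url(host_id: int) -> str:
    """URL of the page listing all results for one host (hypothetical helper)."""
    return f"{BASE}/results.php?{urlencode({'hostid': host_id})}"


def result_url(result_id: int) -> str:
    """URL of a single result page, where stderr.txt output is shown (hypothetical helper)."""
    return f"{BASE}/result.php?{urlencode({'resultid': result_id})}"


def stuck_report(wu_name: str, percent_done: float,
                 result_id: Optional[int] = None) -> str:
    """Format a one-line stuck-WU report; the layout is an assumption."""
    parts = [f"WU name: {wu_name}", f"stuck at: {percent_done:g}%"]
    if result_id is not None:
        parts.append(f"result: {result_url(result_id)}")
    return " | ".join(parts)


if __name__ == "__main__":
    print(host_results_url(23940))
    print(stuck_report("<WU name as shown in BOINC>", 1, 16615299))
```

The result ID is optional here because, as noted above, the WU name plus the % done is probably enough on its own.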
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 13256 · Rating: 0
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 13304 - Posted: 9 Apr 2006, 4:38:33 UTC - in response to Message 13250.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=16615299

2 hours of CPU time and still at 1%; I will abort it now.

Not all WUs that run for a few hours at 1% are failing or hung. There is more information in the FAQs on this subject and how to tell if the WU is hung. You can get to the FAQs from the link in my signature below.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 13304 · Rating: 0