Message boards : Number crunching : Report stuck & aborted WU here please - II
| Author | Message |
|---|---|
dag Send message Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
12 hours, 1% - Linux https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13955810 TRUNCATE_TERMINI_FULLRELAX_1enh__433_645 10 hours, 1% - Windoz https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13960661 TRUNCATE_TERMINI_FULLRELAX_1ptq__433_697 dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
adrianxw Send message Joined: 18 Sep 05 Posts: 662 Credit: 12,167,519 RAC: 0 |
Find a post that does a link, then click on "reply to this post" for that post. Look at the quoted text in the editing window and it will show how they did it. You're right, it does. I'd not noticed that before! Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
Pardon my ignorance, but how does one technically do a link? Thanks! In BBCode you use the opening and closing "square brackets" characters, "[" and "]". Thank you both very much! I have saved these responses for my future reference! Regards, Bob P. |
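For anyone else saving this for reference, here is a minimal sketch of what the bracket syntax described above looks like once assembled. It assumes the board accepts the common BBCode `[url]` tag in its two usual forms; the helper name `bbcode_link` is made up for illustration and is not part of the forum software.

```python
from typing import Optional


def bbcode_link(url: str, text: Optional[str] = None) -> str:
    """Build a BBCode link string (hypothetical helper, for illustration only).

    With no link text the URL itself is displayed; with link text the
    URL goes inside the opening tag after an equals sign.
    """
    if text is None:
        return f"[url]{url}[/url]"
    return f"[url={url}]{text}[/url]"


# Example using a work unit address already posted in this thread.
wu = "https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13955810"
print(bbcode_link(wu))
print(bbcode_link(wu, "stuck WU"))
```

Typing the printed text straight into the message box should give the same result as quoting a post that already contains a link.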
JDHalter Send message Joined: 3 Nov 05 Posts: 13 Credit: 722,679 RAC: 0 |
Here's another 1% hang...again at 1.04%...on a 3rd machine. https://boinc.bakerlab.org/rosetta/result.php?resultid=17036887 |
|
cwangersky Send message Joined: 6 Nov 05 Posts: 6 Credit: 325,556 RAC: 0 |
Here's an odd one... Thank you -- I'll give that a try. |
|
Robert J Send message Joined: 7 Oct 05 Posts: 3 Credit: 397,467 RAC: 0 |
This work unit was stuck at 1.04% for over six hours. Windows XP SP2. TRUNCATE_TERMINI_FULLRELAX_1ptq__433_663_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=17026441
|
|
RC Send message Joined: 27 Sep 05 Posts: 13 Credit: 262,048 RAC: 0 |
This unit stuck at 1.04% for 5.5 hours on Linux with Rosetta 4.98: TRUNCATE_TERMINI_FULLRELAX_1enh__433_593_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=17018950 |
|
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Just a reminder to those who are posting stuck WU's -- please abort the 4 work units below. We know why they're hanging, are not sending out anymore, and are giving credits to any of these jobs that have timed out! Thanks. Found a bug! David Baker and I just tracked down the problem with these 4 workunits. It's a stupid infinite loop that only occurs with proteins with lengths of exactly 44 residues using one particular mode of Rosetta -- somehow no one in our group had ever looked at a protein exactly that size! So TallGuy-13088, you predicted right ... |
|
Bill Hepburn Send message Joined: 18 Sep 05 Posts: 14 Credit: 14,975,271 RAC: 0 |
This one stuck at 1.04% for over 13 hours. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13967978 |
|
RC Send message Joined: 27 Sep 05 Posts: 13 Credit: 262,048 RAC: 0 |
Another one (almost 6 hours at 1.04% on Mac OS X) - aborted. https://boinc.bakerlab.org/rosetta/result.php?resultid=17050793 |
|
K1100LTSE Send message Joined: 28 Feb 06 Posts: 7 Credit: 192,387 RAC: 0 |
abort by gui Windows 20 hours, 1.043% https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13969518
|
|
K1100LTSE Send message Joined: 28 Feb 06 Posts: 7 Credit: 192,387 RAC: 0 |
abort by gui windows 20.15 Hour, 1.042% https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13977888
|
|
Grutte Pier [Wa Oars]~MAB The Frisian Send message Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
Just a reminder to those who are posting stuck WU's -- please abort the 4 work units below. We know why they're hanging, are not sending out anymore, and are giving credits to any of these jobs that have timed out! Thanks. Random, I suppose. I let a WU keep running (the graphics were moving), and it either timed out or aborted itself, with no credits granted. Can't be positive anymore about this project. Have run it for about 5 months; another fortnight and then it's over. |
|
Chilcotin Send message Joined: 5 Nov 05 Posts: 15 Credit: 16,969,500 RAC: 0 |
Workunit aborted after 23 hours. Stuck at 1.04 %. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13940837 Link edit: looks like this may be one of the 4 already flagged in the postings above ... |
|
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
The 1.4 stalls are still coming. I am very tired of aborting them and losing the tens of thousands of points that are NOT granted in wasted CPU time. If this project is going to keep letting out BAD WU's, Rosetta needs to find a way to purge these bad WU's from their servers once they are found to cause problems like these have, and/or send commands to the user's client to delete or abort the bad WU's on any upload or download to the Rosetta servers. Keeping all the bad WU's on the Rosetta servers and forcing us to run them in order to purge them from the system is unfair to us and damages the project's reputation. If this continues without relief, people will start to abandon this project. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
|
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
The 1.4 stalls are still coming. I am very tired of aborting them and losing the tens of thousands of points that are NOT granted in wasted CPU time. again, we are very sorry for the problems of the recent days. we have spent most of today taking steps to ensure that these problems do not occur again. all the problem work units have been cancelled, and everything should be back to normal very soon (once the jobs that have already been downloaded have left your machines). since CASP is starting soon, and many of the proteins will be larger, we wanted to do some calculations on a broader range of sizes. before pursuing this much further, we need some way of ensuring that these jobs are only sent out to machines appropriate for them, which is difficult with the current BOINC setup; we hope Rom can help us with this. |
rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
since CASP is starting soon, and many of the proteins will be larger, we wanted to do some calculations on a broader range of sizes. before pursuing this much further, we need some way of ensuring that these jobs are only sent out to machines appropriate for them, which is difficult with the current BOINC setup; we hope Rom can help us with this. Beta 5.00 under Ralph@home preliminarily seems to be successfully processing work units that had previously failed under earlier versions of Rosetta. So it may not be the machines that are at fault, but the underlying Rosetta software itself (which seems to be on the way to being cleaned up if these early successes continue to hold up). Regards, Bob P. |
|
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
since CASP is starting soon, and many of the proteins will be larger, we wanted to do some calculations on a broader range of sizes. before pursuing this much further, we need some way of ensuring that these jobs are only sent out to machines appropriate for them, which is difficult with the current BOINC setup; we hope Rom can help us with this. When you upload information to the server, does it verify who it's coming from, or just blindly accept it, and then process it to see if it came from an actual machine running Rosetta? If BOINC sends a request from hostid=121218 for another workunit, can't the amount of RAM (and CPU speed) be looked up from the database that displays this info: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=121218 Oops.. it doesn't list speed.. just the text cpuID. (Speed would thus be based on the floating point and integer ratings.) And then use something like this to determine what to send to each machine? (Ram = Ram / number of CPU cores) If hostid(121218).Ram > 750 Megs, then send EvenBiggerRamWU. If hostid(121218).Ram > 500 Megs, then send BigRamWU. If hostid(121218).Ram > 225 Megs, then send NormalWU. |
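The tiered check suggested above could be sketched roughly like this. It is only an illustration of the idea, not how the BOINC scheduler or the Rosetta servers actually assign work: the function name, the `TIERS` table and the choice to return `None` for hosts under the lowest threshold are all assumptions, while the thresholds and work unit class names are the ones from the post.

```python
from typing import Optional

# (minimum MB of RAM per core, work unit class), highest tier first.
# Thresholds and names come from the post; the table itself is hypothetical.
TIERS = [
    (750, "EvenBiggerRamWU"),
    (500, "BigRamWU"),
    (225, "NormalWU"),
]


def pick_workunit_class(total_ram_mb: float, cpu_cores: int) -> Optional[str]:
    """Return the largest work unit class the host's per-core RAM allows."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    ram_per_core = total_ram_mb / cpu_cores  # Ram = Ram / number of cores
    for threshold_mb, wu_class in TIERS:
        if ram_per_core > threshold_mb:
            return wu_class
    # The post does not say what hosts below 225 MB per core should get.
    return None


print(pick_workunit_class(2048, 2))
print(pick_workunit_class(512, 1))
print(pick_workunit_class(256, 1))
```

Checking the largest threshold first matters: a host with 1 GB per core also passes the 225 MB test, so testing in ascending order would hand every machine a NormalWU.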
|
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
But what about the removal of bad WU's from your servers? You must set up a way to stop the resending of the BAD WU's. Letting the system purge itself is not right. You have the capability to do auto upgrades; you should have the capability to auto-abort bad WU's on the client side. To let bad WU's run on your system or ours is a BAD THING. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
|
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
But what about the removal of bad WU's from your servers? You must set up a way to stop the resending of the BAD WU's. Letting the system purge itself is not right. You have the capability to do auto upgrades; you should have the capability to auto-abort bad WU's on the client side. To let bad WU's run on your system or ours is a BAD THING. The bad WU's are removed from our servers, but we can't remove them from your machines. Hopefully there will be no more bad WU's at all so this won't be a problem anymore. |
©2026 University of Washington
https://www.bakerlab.org