Report stuck & aborted 5.01 WU here please - III

TCU Computer Science

Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 14898 - Posted: 28 Apr 2006, 17:54:56 UTC

Four more 5.01 WUs were aborted this morning:

50.1 hrs
https://boinc.bakerlab.org/rosetta/result.php?resultid=18296499
HB_BARCODE_30_5croA_351_21027

51.9 hrs
https://boinc.bakerlab.org/rosetta/result.php?resultid=18296492
HB_BARCODE_30_1a19A_351_28780_3

53.0 hrs
https://boinc.bakerlab.org/rosetta/result.php?resultid=18296362
HBLR_1.0_1dtj_ROT_TRIALS_TRIE_449_27

89.6 hrs
https://boinc.bakerlab.org/rosetta/result.php?resultid=18037119
FA_RLXfn_hom001_1fna__357_63

XS_DDT's_Cattle_Prods
Joined: 24 Mar 06
Posts: 12
Credit: 1,180,072
RAC: 0
Message 14945 - Posted: 29 Apr 2006, 1:45:13 UTC

So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time?

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14956 - Posted: 29 Apr 2006, 3:19:54 UTC - in response to Message 14945.  

So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time?


As far as I have been able to determine, the work units are granted what they claim. If they do not claim any credit (mostly results from Windows 98 machines), then I think they have been getting 30 credits. But I have nothing definitive from the project on this; it is from my own observations, so don't hold the project to these numbers until we hear from them directly.

Moderator9
ROSETTA@home FAQ
Moderator Contact

[DPC]Division_Brabant~OldButNotSoWise
Joined: 23 Jan 06
Posts: 42
Credit: 371,797
RAC: 0
Message 14974 - Posted: 29 Apr 2006, 9:20:42 UTC - in response to Message 14945.  

So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time?


Checked that with my error results.

I think that's indeed what happens. This one crunched for over 5 days.

Result ID: 17773392
Name: HBLR_1.0_2tif_420_9913_1
Workunit: 13429467
Created: 20 Apr 2006 21:56:28 UTC
Sent: 21 Apr 2006 4:39:59 UTC
Received: 26 Apr 2006 19:26:25 UTC
Server state: Over
Outcome: Client error
Client state: Computing
Exit status: -177 (0xffffff4f)
Computer ID: 147219
Report deadline: 5 May 2006 4:39:59 UTC
CPU time: 369442.734375
stderr out:

<core_client_version>5.3.12.tx36</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>
# random seed: 1574792
# cpu_run_time_pref: 7200

</stderr_txt>

Validate state: Invalid
Claimed credit: 1792.64206533905
Granted credit: 300
Application version: 5.01
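The numbers in the record above can be sanity-checked with a minimal Python sketch. The helper names are illustrative only and are not part of BOINC or Rosetta; the values are copied from the result shown, and both the CPU time and `cpu_run_time_pref` are assumed to be in seconds. It checks that -177 and 0xffffff4f are the same 32-bit value, and compares the CPU time with the wall-clock time between "Sent" and "Received".

```python
from datetime import datetime


def as_unsigned_32(value: int) -> str:
    """Format a signed exit code as a 32-bit hex value."""
    return f"0x{value & 0xFFFFFFFF:08x}"


def seconds_to_hours_days(seconds: float):
    """Convert a time in seconds to (hours, days)."""
    hours = seconds / 3600
    return hours, hours / 24


# Values copied from the result record above.
exit_status = -177
cpu_time_s = 369442.734375
run_time_pref_s = 7200  # "# cpu_run_time_pref" from stderr, assumed seconds (2 hours)
sent = datetime(2006, 4, 21, 4, 39, 59)
received = datetime(2006, 4, 26, 19, 26, 25)

hours, days = seconds_to_hours_days(cpu_time_s)
elapsed = received - sent  # wall-clock time the result was out

print(as_unsigned_32(exit_status))
print(f"CPU time: {hours:.1f} h ({days:.2f} days)")
print(f"Sent to received: {elapsed}")
print(f"CPU time / preferred run time: {cpu_time_s / run_time_pref_s:.1f}")
```

The "over 5 days" figure matches the wall-clock interval between sending and receiving; the CPU time alone is somewhat shorter.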

Hassan
Joined: 7 Mar 06
Posts: 4
Credit: 750,146
RAC: 0
Message 15026 - Posted: 29 Apr 2006, 17:32:21 UTC - in response to Message 14956.  

So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time?


As far as I have been able to determine, the work units are granted what they claim. If they do not claim any credit (mostly results from Windows 98 machines), then I think they have been getting 30 credits. But I have nothing definitive from the project on this; it is from my own observations, so don't hold the project to these numbers until we hear from them directly.


Result ID: 18122796
Name: HBLR_1.0_1dtj_420_3452_3
Workunit: 13389346
Created: 24 Apr 2006 15:15:38 UTC
Sent: 24 Apr 2006 20:33:06 UTC
Received: 27 Apr 2006 5:24:19 UTC
Server state: Over
Outcome: Client error
Client state: Computing
Exit status: -177 (0xffffff4f)
Computer ID: 175797
Report deadline: 8 May 2006 20:33:06 UTC
CPU time: 130784.75
stderr out:

<core_client_version>5.2.13</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>
# random seed: 1597253
# cpu_run_time_pref: 7200
# random seed: 1597253
# random seed: 1597253

</stderr_txt>

Validate state: Invalid
Claimed credit: 1165.54456988046
Granted credit: 300
Application version: 5.01
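For anyone comparing several of these records, here is a rough Python sketch that pulls a few fields out of the pasted text and shows how much of the claim was granted. The function name and the field list are my own choices for illustration, not a BOINC API; the sample text is copied from the record above.

```python
# Field labels to look for; chosen for illustration, not an official list.
FIELDS = ("Exit status", "CPU time", "Claimed credit", "Granted credit")


def parse_record(text: str) -> dict:
    """Map each known label to the rest of its line (with or without a colon)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        for label in FIELDS:
            if line.startswith(label):
                record[label] = line[len(label):].lstrip(": ").strip()
    return record


sample = """\
Exit status -177 (0xffffff4f)
CPU time 130784.75
Claimed credit 1165.54456988046
Granted credit 300
"""

record = parse_record(sample)
claimed = float(record["Claimed credit"])
granted = float(record["Granted credit"])
cpu_hours = float(record["CPU time"]) / 3600  # CPU time assumed to be seconds

print(f"{cpu_hours:.1f} CPU hours, granted {granted / claimed:.0%} of the claim")
```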


XS_lv_dicedealer
Joined: 3 Jan 06
Posts: 16
Credit: 1,761,309
RAC: 0
Message 15044 - Posted: 29 Apr 2006, 21:07:31 UTC

Here is another stuck 5.01 WU

https://boinc.bakerlab.org/rosetta/result.php?resultid=18392691

I have since exorcised my farm of the 5.01s and the 5.06s... this one slipped by me though.

Thanks for all your hard work in getting these snags worked out. The R@H team deserves a pat on the back for trying to get this fixed so quickly for us crunchers.

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15070 - Posted: 30 Apr 2006, 5:13:20 UTC - in response to Message 14945.  
Last modified: 30 Apr 2006, 5:15:50 UTC

So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time?



HERE is more information on this question

Moderator9
ROSETTA@home FAQ
Moderator Contact

belldandy from pleiades
Joined: 2 Nov 05
Posts: 6
Credit: 102,731
RAC: 0
Message 15082 - Posted: 30 Apr 2006, 12:07:28 UTC

Two WUs that I aborted because they were taking way too much time. They didn't hang, though.

https://boinc.bakerlab.org/rosetta/result.php?resultid=17827510
FACONTACTS_NOFILTERS_1r69__441_248_1

https://boinc.bakerlab.org/rosetta/result.php?resultid=17773776
HBLR_1.0_2tif_420_9927_1
Campeones everywhere!

[DPC]Alexcj
Joined: 21 Mar 06
Posts: 3
Credit: 8,374
RAC: 0
Message 15319 - Posted: 2 May 2006, 19:31:27 UTC

I also have a WU that is taking way too long to complete.
It is progressing, though, and I would like to see it finished.
It's HB_BARCODE_30_1bm8__351_34196_3,
although I don't think it is going to finish in time.

Is it helpful for the project to have it progress as much as possible?


Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15329 - Posted: 2 May 2006, 19:57:46 UTC - in response to Message 15319.  

Is it helpful for the project to have it progress as much as possible?

Alex, in general, yes, it's helpful. In your case, other work units are running about 10,000 seconds, and it looks like you aborted that one after 443,000 seconds. That's 5+ days! Unless you changed your preference to be 4 days... you were more than patient with that one.

I see it was crunched on release 5.01. The newer release has the "watchdog" and it should find WUs such as this and end them much sooner, thus saving you those days of wondering.
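To illustrate the general idea only: the thread does not say how the watchdog actually decides, so the sketch below is a hypothetical CPU-time check in Python, not the Rosetta implementation. The function name and the `max_factor` multiplier are assumptions made up for the example; the 7,200-second preference is the `cpu_run_time_pref` value seen in other posters' stderr output and may not match Alex's setting.

```python
def watchdog_should_stop(cpu_seconds: float,
                         preferred_seconds: float,
                         max_factor: float = 4.0) -> bool:
    """Hypothetical rule: stop once CPU time exceeds max_factor x the preferred run time."""
    return cpu_seconds > preferred_seconds * max_factor


PREFERRED_S = 7_200  # assumed preference (2 hours), taken from other posts' stderr

# A typical work unit (about 10,000 s) versus the one aborted at about 443,000 s.
for cpu_s in (10_000, 443_000):
    verdict = "stop" if watchdog_should_stop(cpu_s, PREFERRED_S) else "keep running"
    print(f"{cpu_s:>7} s ({cpu_s / 86_400:.2f} days): {verdict}")
```

With any rule of this shape, a work unit like the one above would be ended after hours rather than days.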

So, you did the right things here. You were patient and didn't end it in a sudden panic after 2 hrs and 1 minute, you reported it here, and you're now crunching more WUs. And I believe you will find the current release has resolved problems like this as well, so it shouldn't happen again.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

McPDR
Joined: 3 Nov 05
Posts: 5
Credit: 149,760
RAC: 0
Message 15375 - Posted: 3 May 2006, 1:38:35 UTC

Here is one I just aborted after it had spent almost 80 hours trying to process:
https://boinc.bakerlab.org/rosetta/result.php?resultid=17896566
It had been up to about 10% done, then I rebooted and it went back down to 1%. 80 hours sure is a lot of time...

Thanks, McPDR

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15383 - Posted: 3 May 2006, 2:07:46 UTC - in response to Message 15375.  

That was run under version 5.01. The new version will prevent this from happening again. In the meantime, you will get some credit for the work.
Moderator9
ROSETTA@home FAQ
Moderator Contact

McPDR
Joined: 3 Nov 05
Posts: 5
Credit: 149,760
RAC: 0
Message 15870 - Posted: 11 May 2006, 2:46:43 UTC - in response to Message 15383.  

Thanks for your help... Unfortunately, I did not receive any credit (out of 620+) for this result that I had to abort. And now, when I check it today, it has been deleted from the system.

Help please!

Thanks, McPDR

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15885 - Posted: 11 May 2006, 4:15:52 UTC - in response to Message 15870.  
Last modified: 11 May 2006, 4:53:12 UTC

The credit would not have shown up in the stats listing page. It would only have shown in the result for that work unit. The project has so far been unable to find a way to display the credit in the stats overview page, because the credit is awarded at the result level. The maximum credit would have been 300 points. The credit should show in your credit awards on any of the stats sites, and in your project credits. The fact that the work unit has been removed from your stats indicates that the process has awarded the credit. The process is run each day.

EDIT: I have just looked at your BOINCstats report. If you follow the link and scroll down the page to the graph of daily credit awards, you will see a 300-point spike in your stats for May 4. That is most likely the credit granted for the work unit in question. A similar spike in credit is shown for the same date in the daily stats charts at the bottom of the same page. That award also jumped you up significantly in the world standings.
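Putting the observations from this thread together, the award rule seems to behave roughly like the Python sketch below. This is only a guess pieced together from posts here (the 300-point maximum, and the earlier unconfirmed observation of about 30 credits for results that claim nothing); it is not the project's actual code, and the function and constant names are invented for the example.

```python
MAX_CREDIT = 300.0      # maximum stated in this thread
NO_CLAIM_CREDIT = 30.0  # unconfirmed observation for results that claim no credit


def estimated_grant(claimed: float) -> float:
    """Estimate the credit granted to a stuck or aborted 5.01 result."""
    if claimed <= 0:
        return NO_CLAIM_CREDIT
    return min(claimed, MAX_CREDIT)


# Claims reported earlier in the thread, plus a zero claim for comparison.
for claimed in (1792.64206533905, 1165.54456988046, 0.0):
    print(f"claimed {claimed:10.2f} -> estimated grant {estimated_grant(claimed):.0f}")
```

Under this reading, a result that claimed 620+ credits would be granted 300, which is consistent with the 300-point spike described above.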



Moderator9
ROSETTA@home FAQ
Moderator Contact

McPDR
Joined: 3 Nov 05
Posts: 5
Credit: 149,760
RAC: 0
Message 16005 - Posted: 12 May 2006, 4:00:01 UTC - in response to Message 15885.  

Thanks! I do see it now.

©2024 University of Washington
https://www.bakerlab.org