Report stuck & aborted 5.01 WU here please - III

Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III

Hassan

Joined: 7 Mar 06
Posts: 4
Credit: 750,146
RAC: 0
Message 14784 - Posted: 27 Apr 2006, 18:57:32 UTC
Last modified: 27 Apr 2006, 18:59:11 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=18122796

2006-04-26 21:35:57 [rosetta@home] Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2006-04-26 22:23:18 [rosetta@home] Aborting result HBLR_1.0_1dtj_420_3452_3: exceeded CPU time limit 130783.200508
2006-04-26 22:23:18 [rosetta@home] Unrecoverable error for result HBLR_1.0_1dtj_420_3452_3 (Maximum CPU time exceeded)
2006-04-26 22:23:19 [---] request_reschedule_cpus: process exited
2006-04-26 22:23:19 [rosetta@home] Computation for result HBLR_1.0_1dtj_420_3452_3 finished
2006-04-26 22:23:19 [rosetta@home] Starting result PROD_ABINITIO_ALPHABETABAR_1tul__447_85936_0 using rosetta version 501

Mine appears to have auto-aborted. That's almost 1200 claimed credit and 0 granted, so is that a waste, or do I still get credit?

Result ID 18122796
Name HBLR_1.0_1dtj_420_3452_3
Workunit 13389346
Created 24 Apr 2006 15:15:38 UTC
Sent 24 Apr 2006 20:33:06 UTC
Received 27 Apr 2006 5:24:19 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -177 (0xffffff4f)
Computer ID 175797
Report deadline 8 May 2006 20:33:06 UTC
CPU time 130784.75
stderr out <core_client_version>5.2.13</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>
# random seed: 1597253
# cpu_run_time_pref: 7200
# random seed: 1597253
# random seed: 1597253

</stderr_txt>


Validate state Invalid
Claimed credit 1165.54456988046
Granted credit 0
application version 5.01
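
Editor's aside on the exit status in the record above: the hexadecimal value in parentheses appears to be the same number read as an unsigned 32-bit integer (two's complement). The sketch below shows that conversion in both directions. The function names are illustrative assumptions and are not part of BOINC or Rosetta.

```python
# Illustrative helpers (hypothetical names, not a BOINC API).

def exit_status_hex(status: int) -> str:
    """Render a signed exit status as an unsigned 32-bit hex string."""
    return f"0x{status & 0xFFFFFFFF:08x}"


def exit_status_from_hex(text: str) -> int:
    """Convert an unsigned 32-bit hex string back to a signed integer."""
    value = int(text, 16)
    # Values with the top bit set represent negative numbers.
    return value - (1 << 32) if value >= (1 << 31) else value


print(exit_status_hex(-177))               # 0xffffff4f
print(exit_status_hex(-197))               # 0xffffff3b
print(exit_status_from_hex("0xffffff4f"))  # -177
```

Both values reported in this thread, -177 (0xffffff4f) and -197 (0xffffff3b), are consistent with this reading.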

ID: 14784 · Rating: 0

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 14787 - Posted: 27 Apr 2006, 19:08:30 UTC - in response to Message 14784.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=18122796


Mine appears to have auto-aborted. That's almost 1200 claimed credit and 0 granted, so is that a waste, or do I still get credit?

Result ID 18122796
Name HBLR_1.0_1dtj_420_3452_3
Workunit 13389346


That's an amazing WU. It was first sent out on April 6th. After the deadline passed without a result, it was sent out three more times, and all three failed. The first host then returned it as VALID, with a reported runtime of only 6800 seconds and with app 4.83. Whoppa!
ID: 14787 · Rating: 0

[DPC]Division_Brabant~OldButNotSoWise
Joined: 23 Jan 06
Posts: 42
Credit: 371,797
RAC: 0
Message 14794 - Posted: 27 Apr 2006, 20:36:28 UTC
Last modified: 27 Apr 2006, 20:46:51 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=18190274
https://boinc.bakerlab.org/rosetta/result.php?resultid=18190275

Both ran for more than 15 hours and claimed to be at 5%.

I learned my lesson after a job crunched for more than 5 days and ended with a CPU time exceeded error, so I aborted the jobs.

This happens more and more. Very annoying.



ID: 14794 · Rating: 0

[DPC]FOKschaap~_mcintosh_
Joined: 4 Dec 05
Posts: 5
Credit: 118,303
RAC: 0
Message 14798 - Posted: 27 Apr 2006, 22:16:13 UTC
Last modified: 27 Apr 2006, 22:17:05 UTC

AB_CASP6_t216__458_807 was stuck at 1%; aborted.

AB_CASP6_t216__456_2307 was stuck at 1%, but it seems that the job was successfully crunched by another user.

This is strange because I have a much faster CPU than the one it was completed with: that host finished in 9,834.69 sec, and mine was still at 1% after 5,698.12 sec.
ID: 14798 · Rating: 0

Ackiss
Joined: 13 Apr 06
Posts: 1
Credit: 607,960
RAC: 0
Message 14803 - Posted: 28 Apr 2006, 0:06:52 UTC

4/27/2006 7:49:18 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1a68__351_24311_2 (aborted via GUI RPC)


5.01 error. Had over 85 hours cpu time and claimed to be just over 11% done. Should've been watching more closely. I wasn't aware there was a problem until I checked the news.

Ackiss
ID: 14803 · Rating: 0

mewbysea
Joined: 29 Jan 06
Posts: 17
Credit: 15,880,002
RAC: 3,012
Message 14809 - Posted: 28 Apr 2006, 0:49:13 UTC

Aborted this workunit ID 12454830. Two other computers failed to return this WU, or any others, so they may *still* be crunching it! See HB_BARCODE_30_1ctf_351_39751

Result ID 18270294

Stopped at 20:05 hours and 8.518%

Run on an HP D530, Pentium 4 @ 2.66 GHz (stock), under WIN XP Home.

ID: 14809 · Rating: 0

Delk
Joined: 20 Feb 06
Posts: 25
Credit: 995,624
RAC: 0
Message 14811 - Posted: 28 Apr 2006, 1:23:16 UTC

I thought FACONTACTS_RECENTER* were meant to be removed from the workunit queue on your end? Why are known bad units still being sent out? This one is now back in queue for the next lucky recipient.


name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803_1
WU name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803
project URL: https://boinc.bakerlab.org/rosetta/
report deadline: Thu May 11 13:42:38 2006
app version num: 501
checkpoint CPU time: 38443.460000
current CPU time: 40928.280000
fraction done: 0.016453
VM usage: 0.000000
resident set size: 0.000000
estimated CPU time remaining: 67723.265742
supports graphics: no


https://boinc.bakerlab.org/rosetta/result.php?resultid=18357585
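
Editor's aside on the status dump above: a naive linear extrapolation from "current CPU time" and "fraction done" gives a rough idea of how far from finishing this task was. The sketch below assumes progress is linear in CPU time, which is only an illustration and is not how the BOINC client computes its own "estimated CPU time remaining". The function name is hypothetical.

```python
# Naive linear extrapolation (an assumption for illustration only).

def projected_total_cpu_seconds(current_cpu_seconds: float, fraction_done: float) -> float:
    """Project total CPU time, assuming progress is linear in CPU time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return current_cpu_seconds / fraction_done


# Values copied from the status dump in the post above.
current_cpu = 40928.28
fraction_done = 0.016453
client_estimate_remaining = 67723.265742

total = projected_total_cpu_seconds(current_cpu, fraction_done)
remaining = total - current_cpu

print(f"projected total: {total / 86400:.1f} days")
print(f"projected remaining: {remaining:.0f} s "
      f"(client estimate: {client_estimate_remaining:.0f} s)")
```

Under that assumption the projection comes out at several weeks of CPU time, far above the client's own estimate, which fits the "stuck" behaviour reported in this thread.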
ID: 14811 · Rating: 0

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14816 - Posted: 28 Apr 2006, 2:56:45 UTC
Last modified: 28 Apr 2006, 3:11:30 UTC

Help has arrived.

Version 5.06 has been released. This should put an end to the hangs and long run times that have been a problem for many of you reporting in this thread.

Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. Many of you have asked if credit will be awarded for these failed Work Units, and the answer is yes.

The credit is awarded on Fridays for failed Work Units.

For errors resulting from Rosetta Version 5.01, continue to report here.
For errors relating to Rosetta Version 5.06 report here

For information on the new Version and what it is supposed to do see this post.
For a message from Dr. Baker about the work unit runs in preparation for CASP see his journal entry here.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 14816 · Rating: 0

mhhall
Joined: 28 Mar 06
Posts: 7
Credit: 10,193,127
RAC: 2
Message 14817 - Posted: 28 Apr 2006, 3:03:54 UTC - in response to Message 14816.  

[snip]

Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. Many of you have asked if credit will be awarded for these failed Work Units, and the answer is yes.
[snip]

My current work unit has been running since Monday evening and shows progress
at 5.90% and CPU time of 148:47:21. Does this meet the criteria of
"running well"?

I don't mind letting this process continue to run...
I'm just worried that it's not going to finish anyway...

Mike Hall / Engineering Solutions, Inc.
ID: 14817 · Rating: 0

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14818 - Posted: 28 Apr 2006, 3:06:18 UTC - in response to Message 14817.  

[snip]

Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. Many of you have asked if credit will be awarded for these failed Work Units, and the answer is yes.
[snip]

My current work unit has been running since Monday evening and shows progress
at 5.90% and CPU time of 148:47:21. Does this meet the criteria of
"running well"?

I don't mind letting this process continue to run...
I'm just worried that it's not going to finish anyway...

Mike Hall / Engineering Solutions, Inc.

If it has been running longer than four (4) times your run time preference setting then abort it.
Moderator9
ROSETTA@home FAQ
Moderator Contact
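
Editor's aside: the rule of thumb in the reply above (abort if CPU time exceeds four times the run time preference) can be written as a small check. This is a minimal sketch with hypothetical helper names. The 4-hour preference used for the first example is an assumption, taken from the default that another poster in this thread mentions; the 7200-second preference in the second example comes from the `cpu_run_time_pref` line in the first post's stderr output.

```python
# Hypothetical helpers illustrating the "4x run time preference" rule.

def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' CPU time string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def should_abort(cpu_seconds: float, run_time_pref_seconds: float, factor: float = 4.0) -> bool:
    """True if CPU time exceeds factor times the run time preference."""
    return cpu_seconds > factor * run_time_pref_seconds


# CPU time of 148:47:21, assuming a 4-hour (14400 s) preference.
cpu = hms_to_seconds("148:47:21")
print(cpu, should_abort(cpu, 4 * 3600))  # 535641 True

# CPU time 130784.75 s with cpu_run_time_pref 7200, from the first post.
print(should_abort(130784.75, 7200))     # True
```

By this rule, both work units were well past the point where aborting was advised.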
ID: 14818 · Rating: 0

eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 14827 - Posted: 28 Apr 2006, 5:31:03 UTC
Last modified: 28 Apr 2006, 5:46:19 UTC

I just realized there was a thread to even report this. I'd have to go through all my results just to see which and when. I'm new to rosetta and figured this out when all my team was having the same issues with the same WU's. Checked previous results on the WU's and no one had finished. I know it was at least 8 WU's for my server host alone. Do I need to pull up all the facts for you?

-edit-
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14041906
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14925740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14560586
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13409472
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408713
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408400
Hope I listed them correctly. Again, this is one user, but I guess it's their job to report theirs. The ones with the lower CPU time at abort were stuck at 1% for approx. 1+ hrs, to my recollection.
-edit-

Sincerely,

F. Bulson
Endonet Inc.

ID: 14827 · Rating: 0

Team_Elteor_Borislavj~Intelligence
Joined: 7 Dec 05
Posts: 14
Credit: 56,027
RAC: 0
Message 14830 - Posted: 28 Apr 2006, 7:17:28 UTC

I lost 44 hours of CPU time. I'm aborting it now; it hung at 1.04%.
HBLR_1.0_1ogw_420_6744_2


ID: 14830 · Rating: 0

anbrager
Joined: 21 Nov 05
Posts: 3
Credit: 535,292
RAC: 0
Message 14835 - Posted: 28 Apr 2006, 7:54:31 UTC

Hi,

I have aborted Result ID 18179436 "HBLR_1.0_1b72_ROT_TRIALS_TRIE_449_13_1"
after 10.5 hours at 5%.
ID: 14835 · Rating: 0

Team_Elteor_Borislavj~Intelligence
Joined: 7 Dec 05
Posts: 14
Credit: 56,027
RAC: 0
Message 14841 - Posted: 28 Apr 2006, 8:54:45 UTC - in response to Message 14830.  

I lost 44 hours of CPU time. I'm aborting it now; it hung at 1.04%.
HBLR_1.0_1ogw_420_6744_2


Can I get my credit for it?
https://boinc.bakerlab.org/rosetta/result.php?resultid=18194769
ID: 14841 · Rating: 1

Aglarond
Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 14843 - Posted: 28 Apr 2006, 9:21:58 UTC
Last modified: 28 Apr 2006, 9:25:12 UTC

Hi, I just aborted FA_RLXpt_hom003_1ptq__361_16_3. It had been running for more than 12 hours, while my runtime preference is the default (4 hours). Also, Rhiju suggested in this post that WUs with 1ptq in the title should be aborted. Why did I get a WU that was flagged as bad about 2 weeks ago?
ID: 14843 · Rating: 0

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14854 - Posted: 28 Apr 2006, 12:13:31 UTC
Last modified: 28 Apr 2006, 12:16:58 UTC

Help has arrived.

Version 5.06 has been released. This should put an end to the hangs and long run times that have been a problem for many of you reporting in this thread.

Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. If a Work Unit has run longer than 4 times your preferences "Time" setting, then it should be aborted.

Many of you have asked if credit will be awarded for these failed Work units, and the answer is yes. The credit is awarded on Fridays for failed Work Units.

For errors resulting from Rosetta Version 5.01, continue to report here.
For errors relating to Rosetta Version 5.06 report here

For information on the new Version and what it is supposed to do see this post.
For a message from Dr. Baker about the work unit runs in preparation for CASP see his journal entry here.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 14854 · Rating: 0

Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 14866 - Posted: 28 Apr 2006, 13:23:48 UTC

Going nowhere, the last 5.01 on this box, 38 minutes for 1.06% and a rising completion time:

18432562
Name AB_CASP6_t216__456_5100_0
Workunit 15217912
Created 27 Apr 2006 16:50:33 UTC
Sent 27 Apr 2006 21:04:16 UTC
Received 28 Apr 2006 13:20:57 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 190714
Report deadline 11 May 2006 21:04:16 UTC
CPU time 2293.640625
ID: 14866 · Rating: 0

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14867 - Posted: 28 Apr 2006, 13:29:22 UTC - in response to Message 14866.  

Going nowhere, the last 5.01 on this box, 38 minutes for 1.06% and a rising completion time:

18432562
Name AB_CASP6_t216__456_5100_0
Workunit 15217912
Created 27 Apr 2006 16:50:33 UTC
Sent 27 Apr 2006 21:04:16 UTC
Received 28 Apr 2006 13:20:57 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 190714
Report deadline 11 May 2006 21:04:16 UTC
CPU time 2293.640625


It is normal for the completion time to rise between checkpoints

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 14867 · Rating: 0

Rebel Alliance
Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 14883 - Posted: 28 Apr 2006, 16:26:01 UTC

Result ID 18229643
Name HBLR_1.0_1hz6_ROT_TRIALS_TRIE_449_43_2
Workunit 14630563
Created 25 Apr 2006 17:03:34 UTC
Sent 25 Apr 2006 17:10:07 UTC
Received 28 Apr 2006 16:24:09 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 168844
Report deadline 9 May 2006 17:10:07 UTC
CPU time 101762.765625

ID: 14883 · Rating: 0

K1100LTSE
Joined: 28 Feb 06
Posts: 7
Credit: 192,387
RAC: 0
Message 14886 - Posted: 28 Apr 2006, 16:41:06 UTC


ID: 14886 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org