Report stuck & aborted 5.01 WU here please - III

borkescomputers

Joined: 3 Apr 06
Posts: 2
Credit: 5,910,922
RAC: 0
Message 14232 - Posted: 21 Apr 2006, 6:39:21 UTC

21-4-2006 8:33:35|rosetta@home|Unrecoverable error for result HBLR_1.0_1di2_420_5587_1 (aborted via GUI RPC)

Why: the job had been running for about 5 hours. I looked at the GUI and saw it was running in a loop: after the atom relax stage it went back to the ab initio stage, with the same model number and the same %.

https://boinc.bakerlab.org/rosetta/result.php?resultid=17754739
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14234 - Posted: 21 Apr 2006, 6:51:25 UTC
Last modified: 21 Apr 2006, 7:16:14 UTC

As the older threads were getting very long, I have started a new thread. The old thread can be found here. This thread is for reporting errors related to version 5.01 or greater. Please provide a link to the result ID in your report.
Moderator9
ROSETTA@home FAQ
Moderator Contact
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 14264 - Posted: 21 Apr 2006, 14:27:35 UTC

I have 4 long runners running here. Normal run time is 2 hrs +/-.

4/21/2006 2:01:44 AM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1bgf__448_895_1 using rosetta version 501 (1.52% @ 6 hr)

4/20/2006 11:29:01 PM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1r69__448_100_1 using rosetta version 501 (2.44% @ 9.2 hr)

4/21/2006 2:49:55 AM|rosetta@home|Starting result HBLR_1.0_1hz6_420_4593_1 using rosetta version 501 (3.68% @ 6.1 hr)

4/21/2006 2:33:19 AM|rosetta@home|Starting result HBLR_1.0_1mky_420_5056_1 using rosetta version 501 (4.11% @ 6.7 hr)

I am going to work. I will check on them again later today to see what is up with them.
If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------
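
Side note for anyone collecting reports like the one above: the pasted client messages have a regular `timestamp|project|message` shape, so the result names can be pulled out mechanically. The following is only a minimal sketch in Python. The function names, the grouping rule in `batch_of`, and the regular expressions are assumptions made for illustration, not part of BOINC or Rosetta. The timestamp is left as a plain string because its format differs between posters in this thread.

```python
import re
from collections import Counter

# The four "Starting result" lines from the post, without the hand-typed
# progress notes that the poster appended to each line.
LOG_LINES = [
    "4/21/2006 2:01:44 AM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1bgf__448_895_1 using rosetta version 501",
    "4/20/2006 11:29:01 PM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1r69__448_100_1 using rosetta version 501",
    "4/21/2006 2:49:55 AM|rosetta@home|Starting result HBLR_1.0_1hz6_420_4593_1 using rosetta version 501",
    "4/21/2006 2:33:19 AM|rosetta@home|Starting result HBLR_1.0_1mky_420_5056_1 using rosetta version 501",
]

START_RE = re.compile(r"Starting result (\S+) using rosetta version (\d+)")

# Assumed naming pattern: a batch prefix, then an underscore, then a short
# token that starts with a digit (e.g. "1bgf"), then another underscore.
BATCH_RE = re.compile(r"^(.*?)_\d[0-9A-Za-z]{3,4}_")


def parse_start(line):
    """Return (timestamp, project, result_name, app_version), or None."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    match = START_RE.search(message)
    if match is None:
        return None
    return timestamp, project, match.group(1), int(match.group(2))


def batch_of(result_name):
    """Guess the batch prefix of a result name; fall back to the full name."""
    match = BATCH_RE.match(result_name)
    return match.group(1) if match else result_name


# Count how many of the reported results belong to each batch.
batch_counts = Counter(
    batch_of(parsed[2])
    for parsed in map(parse_start, LOG_LINES)
    if parsed is not None
)
print(batch_counts)
```

Grouping by prefix makes it easier to see which batches the stuck results come from, which is the kind of pattern this thread is trying to surface.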
pb

Joined: 30 Nov 05
Posts: 6
Credit: 65,632
RAC: 0
Message 14269 - Posted: 21 Apr 2006, 15:22:12 UTC - in response to Message 14234.  

I had 2 long runners, which I aborted:

https://boinc.bakerlab.org/rosetta/result.php?resultid=17777186

https://boinc.bakerlab.org/rosetta/result.php?resultid=17555605 [this one was restarting over and over again, so I aborted it just after a restart; it shows 129 seconds, but in fact it ran for about 10 hours]

Will I get credit for the first one?

PS. Here is one more:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14105384
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14270 - Posted: 21 Apr 2006, 15:25:10 UTC - in response to Message 14269.  
Last modified: 21 Apr 2006, 15:27:32 UTC

Please see this post.
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 14303 - Posted: 21 Apr 2006, 21:06:04 UTC - in response to Message 14264.  

Well, back after 7 hrs and they are all still running, each up only about 0.05%. At this rate it will take over 200 hrs to complete, so I am aborting them all, with about 12 to 15 hrs on each.

I will try to post links later, after the upload.

biodoc

Joined: 19 Feb 06
Posts: 14
Credit: 30,298,003
RAC: 7,011
Message 14310 - Posted: 21 Apr 2006, 22:16:18 UTC

I just aborted 2 "long runners" (17+ hours; about 2% complete):

Result ID 17758781
Name FACONTACTS_RECENTER_NOFILTERS_1scjB_448_993_1

Result ID 17757908
Name FACONTACTS_RECENTER_NOFILTERS_1tit__448_560_1


Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14340 - Posted: 22 Apr 2006, 4:54:59 UTC - in response to Message 14310.  

I'm stopping the sendout of these jobs. On some machines there appears to be an incompatibility between 5.01 and these jobs. Please abort FACONTACTS_RECENTER jobs and HBLR_1.0 jobs.

K1100LTSE
Joined: 28 Feb 06
Posts: 7
Credit: 192,387
RAC: 0
Message 14343 - Posted: 22 Apr 2006, 5:19:06 UTC

I just aborted 2 "long runners":
1. FACONTACTS_RECENTER_NOFILTERS_1cei__448_30_1
7.35 hours
https://boinc.bakerlab.org/rosetta/result.php?resultid=17753170

2. PROD_ABINITIO_7STRANDBAR_1tul__447_35477_0
5.05 hours
https://boinc.bakerlab.org/rosetta/result.php?resultid=17757187


Rebel Alliance

Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 14344 - Posted: 22 Apr 2006, 5:52:58 UTC

FACONTACTS_RECENTER_NOFILTERS_1vls__448_282_1

<core_client_version>5.3.12.tx36</core_client_version>
<message>aborted by user
</message>
<stderr_txt>
# random seed: 2488779
# random seed: 2488779
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# random seed: 2488779

</stderr_txt>

Aborted after 12 hours on a 2600xp; it was at 1.8 percent.
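
Side note on reading a stderr dump like the one above: the same `# random seed:` line appears three times, with a "No heartbeat" exit in between. A rough way to spot this in many reports is to count those lines. The sketch below is in Python and rests on an assumption that is not confirmed anywhere in this thread: that the application prints the seed line once each time it starts, so repeats hint at restarts. The function name `count_markers` is made up for illustration.

```python
import re

# The stderr section exactly as pasted in the post above.
STDERR = """<stderr_txt>
# random seed: 2488779
# random seed: 2488779
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# random seed: 2488779

</stderr_txt>"""


def count_markers(stderr_text):
    """Return (list of seed values seen, number of heartbeat exits)."""
    seeds = re.findall(r"^# random seed: (\d+)$", stderr_text, flags=re.MULTILINE)
    heartbeat_exits = len(
        re.findall(r"^No heartbeat from core client", stderr_text, flags=re.MULTILINE)
    )
    return seeds, heartbeat_exits


seeds, heartbeat_exits = count_markers(STDERR)
# ASSUMPTION: one seed line per application start (not verified here).
print(f"seed lines: {len(seeds)}, distinct seeds: {len(set(seeds))}, "
      f"heartbeat exits: {heartbeat_exits}")
```

If the assumption holds, several seed lines with the same value would be consistent with a task that keeps starting over from the same point, but the dump alone does not prove that.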
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14345 - Posted: 22 Apr 2006, 5:54:45 UTC - in response to Message 14343.  

Actually the second result returned a lot of data! Cool.
You shouldn't get any more of the first kind of workunit -- please abort
any you have in your queue, and post here if you get more. Thanks!

[B^S] Gamma^Ray
Joined: 20 Apr 06
Posts: 12
Credit: 21,284
RAC: 0
Message 14348 - Posted: 22 Apr 2006, 6:48:42 UTC


Just aborted a "long runner" after 8:17:39 CPU time at around 1.8% complete.

FACONTACTS_RECENTER_NOFILTERS_1elwA_448_330_1
Result ID 17832302
Workunit 14535551
XS_DDT's_Cattle_Prods

Joined: 24 Mar 06
Posts: 12
Credit: 1,180,072
RAC: 0
Message 14349 - Posted: 22 Apr 2006, 7:09:36 UTC
Last modified: 22 Apr 2006, 7:10:13 UTC

OK, I'm running Ubuntu 64-bit, and I've got an HBLR_1.0_1mky_420_6443_1 that's been running for 16:56 with 14.69% done, and a FACONTACTS_RECENTER_NOFILTERS_1who_448_991_1 that's been running for 10:59 with 3.49% done. Should I abort them, or will they finish sometime soon?

This is node # 207728
Delk

Joined: 20 Feb 06
Posts: 25
Credit: 995,624
RAC: 0
Message 14351 - Posted: 22 Apr 2006, 7:25:44 UTC

name: FACONTACTS_RECENTER_NOFILTERS_1gvp__448_830_1
WU name: FACONTACTS_RECENTER_NOFILTERS_1gvp__448_830
project URL: https://boinc.bakerlab.org/rosetta/
report deadline: Fri May 5 10:23:27 2006
stderr_out: app version num: 501
checkpoint CPU time: 72062.600000
current CPU time: 72824.870000
fraction done: 0.024063
VM usage: 0.000000
resident set size: 0.000000
estimated CPU time remaining: 98151.432619

https://boinc.bakerlab.org/rosetta/result.php?resultid=17756235

Manually aborted. It was meant to be an 8 hr workunit and was obviously stuck. I noticed this was a recycled unit that someone else had aborted previously.
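
Side note on the figures above: a quick way to judge whether a task is stuck is to project its total CPU time from the current CPU time and the fraction done, assuming progress stays linear. The sketch below is in Python and is only that naive straight-line projection. It is not how the BOINC client computes its own "estimated CPU time remaining" (that method is not shown in this thread), and the function name is made up for illustration.

```python
def projected_total_cpu_seconds(cpu_seconds, fraction_done):
    """Straight-line projection: total = elapsed CPU time / fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_seconds / fraction_done


# Values copied from the task details in the post above.
current_cpu = 72824.87       # "current CPU time", in seconds
fraction = 0.024063          # "fraction done"
reported_remaining = 98151.432619  # "estimated CPU time remaining", in seconds

total = projected_total_cpu_seconds(current_cpu, fraction)
remaining = total - current_cpu

print(f"projected total:     {total / 3600:.0f} h")
print(f"projected remaining: {remaining / 3600:.0f} h")
print(f"reported remaining:  {reported_remaining / 3600:.0f} h")
```

For these numbers the straight-line projection comes out at roughly 840 hours in total, far beyond the intended 8 hours, which supports the poster's reading that the workunit was stuck.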
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 14360 - Posted: 22 Apr 2006, 8:45:10 UTC
Last modified: 22 Apr 2006, 8:48:51 UTC

OK, here is a small list of all the long runners I aborted today.
I am sure the total topped 200 hrs.

https://boinc.bakerlab.org/rosetta/result.php?resultid=17751455

https://boinc.bakerlab.org/rosetta/result.php?resultid=17768645

https://boinc.bakerlab.org/rosetta/result.php?resultid=17586899

https://boinc.bakerlab.org/rosetta/result.php?resultid=16971548

https://boinc.bakerlab.org/rosetta/result.php?resultid=17752270

https://boinc.bakerlab.org/rosetta/result.php?resultid=17751466

https://boinc.bakerlab.org/rosetta/result.php?resultid=17786078

https://boinc.bakerlab.org/rosetta/result.php?resultid=17757931

https://boinc.bakerlab.org/rosetta/result.php?resultid=17774722

https://boinc.bakerlab.org/rosetta/result.php?resultid=17754712

https://boinc.bakerlab.org/rosetta/result.php?resultid=17760449

https://boinc.bakerlab.org/rosetta/result.php?resultid=17748955

https://boinc.bakerlab.org/rosetta/result.php?resultid=17763107

https://boinc.bakerlab.org/rosetta/result.php?resultid=17764880

https://boinc.bakerlab.org/rosetta/result.php?resultid=17757632

https://boinc.bakerlab.org/rosetta/result.php?resultid=17721400

https://boinc.bakerlab.org/rosetta/result.php?resultid=17764855

https://boinc.bakerlab.org/rosetta/result.php?resultid=17755473

Tallguy-13088

Joined: 14 Dec 05
Posts: 9
Credit: 843,378
RAC: 0
Message 14373 - Posted: 22 Apr 2006, 13:44:56 UTC

Folks,

Just as an FYI at this point, I have two "long runners" on two separate machines. The first is HBLR_1.0_1djt_420_4640_1 running on a dual Xeon P4 @ 2.8 GHz. The numbers are CPU_Time: 18.49 Hrs, Complete: 7.68% and To Completion: 20.12 Hrs. The second is HBLR_1.0_1n0u_420_9492_1 running on a 3.2 GHz P4 with CPU_Time: 19.26 Hrs, Complete: 7.28% and To Completion: 20.45 Hrs. Both are version 5.01.

The graphics seem to be updating and both work units are apparently making forward progress (tons of data points). Both appear to be swapping between 2+ seconds per step and the normal "fast" multiple steps per second. Obviously the green data points are the "fast steppers".

I plan on running these up to about 50 hours apiece before aborting them (if they appear to require significantly more time). My goal is to give you guys as much diagnostic info as possible which translates into run-time.
XS_DDT's_Cattle_Prods

Joined: 24 Mar 06
Posts: 12
Credit: 1,180,072
RAC: 0
Message 14378 - Posted: 22 Apr 2006, 14:45:15 UTC - in response to Message 14349.  
Last modified: 22 Apr 2006, 14:45:38 UTC

OK, these are still going, and I can't check graphics in Linux, so I'm aborting.
FACONTACTS_RECENTER_NOFILTERS_1who_448_991_1 & HBLR_1.0_1mky_420_6443_1. HBLR ran for 24:09:13 and FACONTACTS was at 18:12:00.

https://boinc.bakerlab.org/rosetta/result.php?resultid=17758649 - this is the HBLR
https://boinc.bakerlab.org/rosetta/result.php?resultid=17758516 - This is the FACONTACTS
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 14379 - Posted: 22 Apr 2006, 14:45:58 UTC
Last modified: 22 Apr 2006, 15:33:49 UTC

Two long-runners here, both churned from previous machines:

HBLR_1.0_1hz6_420_9949 previously aborted on a PC and a Mac, 3h 40m for 4% so far, graphics normal.

NO_TERM_STRAND_1ogw_423_8642 previously time limit expired, 4h 45m for 4.64%, graphics very slow.

I'll let 'em run if they stay running.

Edit: The Turing Halting Problem made real. Smart chap, A. Turing.
Rebel Alliance

Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 14383 - Posted: 22 Apr 2006, 15:27:26 UTC

And another one on the same 2600xp:
Name FACONTACTS_RECENTER_NOFILTERS_1ail__448_912_1
Workunit 14575111

Funny thing is there are 2 running on my opty 165 right now and they seem to be doing just fine.
Grutte Pier [Wa Oars]~Ytsmabeer

Joined: 10 Nov 05
Posts: 2
Credit: 100,205
RAC: 0
Message 14385 - Posted: 22 Apr 2006, 15:50:05 UTC

When I come back from dinner I am going to abort a long runner (17 hours, 14%).

Would be nice to get points.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13422021
