Report stuck & aborted WU here please

Delk
Joined: 20 Feb 06
Posts: 25
Credit: 995,624
RAC: 0
Message 12666 - Posted: 25 Mar 2006, 8:21:38 UTC - in response to Message 12647.  

work-units aborted at 1%:

FA_RLXci_hom029_2ci2I_362_311_0 after 109,513.44 secs
FA_RLXpt_hom002_1ptq__361_439_0 after 207,726.50 secs
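For readers who want those run times in hours, here is a minimal sketch of the conversion. The two figures are copied from the list above; the helper name `to_hours` is only illustrative.

```python
# Elapsed times quoted in the report above, in seconds.
reported = {
    "FA_RLXci_hom029_2ci2I_362_311_0": 109_513.44,
    "FA_RLXpt_hom002_1ptq__361_439_0": 207_726.50,
}


def to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


for name, secs in reported.items():
    print(f"{name}: {to_hours(secs):.1f} h")
```

That works out to roughly 30.4 hours and 57.7 hours respectively.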

Maybe it's time to stop doing longer work units, so I can see when servers haven't reported results in the last few hours...


Result IDs: 14740903 & 14592911

Team_Elteor_Borislavj~Intelligence
Joined: 7 Dec 05
Posts: 14
Credit: 56,027
RAC: 0
Message 12670 - Posted: 25 Mar 2006, 9:54:21 UTC

HB_BARCODE_30_4ubpA_351_16734_0 still stuck at 1% after 9 hours of crunching with 100% load!

mewbysea
Joined: 29 Jan 06
Posts: 17
Credit: 15,917,465
RAC: 1,790
Message 12678 - Posted: 25 Mar 2006, 12:31:26 UTC

FA_RLXpt_hom004_1ptq_361_127_0 stuck at 83.81%.
WU ID = 11670028; Result ID = 14405752
PC (153231) = Dell 8400, P4 (HT) 3.2 GHz (stock), WIN XP (SP2)
Aborted after over 30 hours of crunching.


Grutte Pier [Wa Oars]~Nemesis
Joined: 8 Nov 05
Posts: 3
Credit: 386,730
RAC: 0
Message 12691 - Posted: 25 Mar 2006, 16:14:46 UTC
Last modified: 25 Mar 2006, 16:15:49 UTC

After a bogus WU on one of my PCs cost me over 300 credits (it was hanging for a long time), I went through all of my WUs. This is a list of all my recent WUs that were aborted with an error:

Intel(R) Pentium(R) M processor 1.73GHz
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11410757
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10507541
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10454400
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10309222
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10644097

AuthenticAMD mobile AMD Athlon(tm) XP 2000+
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11665942
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11639527
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11478408

Intel(R) Pentium(R) 4 CPU 1.60GHz (@2.40GHz)
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11045076
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11068185
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11008648
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10993712
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10976761
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10961239
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10961160
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10931034
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10928750
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10808712

AuthenticAMD mobile AMD Athlon(tm) XP-M 2800+ (LV)
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10419709
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10421027
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10438624
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10529395
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10455024
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10417302
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10390604
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10095664
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10064299
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10015662

AuthenticAMD AMD Sempron(tm) Processor 3000+
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10387309
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9956247

AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10629816
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10459506
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10283291
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10059896
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9544176
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=5796746
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=5733085

AuthenticAMD AMD Sempron(tm) 2400+
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11452627
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11439431
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10345630
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10727823
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10548190

I'm running Rosetta for its medical purpose, but I think there are over 1000 credits in the list above...

TA_GeoffS
Joined: 16 Dec 05
Posts: 2
Credit: 704,640
RAC: 0
Message 12700 - Posted: 25 Mar 2006, 19:56:56 UTC
Last modified: 25 Mar 2006, 19:59:37 UTC

I'll try to be more vigilant about noting the status of a WU when I kill it, but I don't think any of these were 1% issues. They were well into the WU and stuck (no progress over a 20-minute span, graphic not moving at all; should I be looking for something else?). All machines are dedicated crunchers with very little else being done on them.
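The "no progress over a 20 minute span" rule of thumb can be written down as a small check. This is only a sketch under stated assumptions: the `samples` list of (time, percent complete) readings is hypothetical and would have to be noted by hand or collected by some other means, and `is_stalled` is an illustrative name, not part of BOINC or Rosetta.

```python
from datetime import datetime, timedelta


def is_stalled(samples, window=timedelta(minutes=20)):
    """Return True if percent complete did not change over the last `window`.

    samples: list of (datetime, percent_complete) tuples, sorted by time.
    Returns False when the history is too short to cover the window.
    """
    if not samples:
        return False
    last_time, last_pct = samples[-1]
    cutoff = last_time - window
    # Find the latest reading taken at or before the cutoff (the baseline).
    baseline = None
    for i, (t, _) in enumerate(samples):
        if t <= cutoff:
            baseline = i
    if baseline is None:
        return False  # not enough history yet
    # Stalled only if every reading from the baseline onward is identical.
    return all(pct == last_pct for _, pct in samples[baseline:])


start = datetime(2006, 3, 25, 12, 0)
stuck = [(start + timedelta(minutes=10 * i), 1.0) for i in range(4)]
moving = stuck[:3] + [(start + timedelta(minutes=30), 29.5)]
print(is_stalled(stuck), is_stalled(moving))
```

A fixed window is a blunt instrument: another report in this thread describes a WU that sat at 1% for 35 minutes and then jumped to 29.5%, so a 20-minute window can flag work that is actually still progressing.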

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11711529 (68k CPU seconds, 358 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11682684 (117k CPU seconds, 606 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11011944 (142k CPU seconds, 732 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11733000 (43k CPU seconds, 247 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11732999 (71k CPU seconds, 406 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11733500 (60k CPU seconds, 347 pts)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11580576 (3k CPU seconds, 19 pts)
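To put the list above in perspective, here is a short sketch that totals it. The figures are copied from the list; the "k" values are rounded, so the totals are approximate, and "pts" is taken as given without assuming exactly what it measures.

```python
# (workunit ID, approximate CPU seconds, pts) as listed above.
aborted = [
    (11711529, 68_000, 358),
    (11682684, 117_000, 606),
    (11011944, 142_000, 732),
    (11733000, 43_000, 247),
    (11732999, 71_000, 406),
    (11733500, 60_000, 347),
    (11580576, 3_000, 19),
]

total_secs = sum(secs for _, secs, _ in aborted)
total_pts = sum(pts for _, _, pts in aborted)
print(f"total: {total_secs / 3600:.0f} CPU hours, {total_pts} pts")

# Per-workunit rate, to see whether the lost work is comparable across WUs.
for wuid, secs, pts in aborted:
    print(f"{wuid}: {pts / (secs / 3600):.1f} pts per CPU hour")
```

The totals come to about 140 CPU hours and 2715 pts across the seven workunits.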

Rossmor35
Joined: 24 Sep 05
Posts: 4
Credit: 84,870
RAC: 0
Message 12711 - Posted: 26 Mar 2006, 13:33:04 UTC


This WU was stuck at 1% for 6.5 hrs before I aborted it.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11960938

Hoogie
Joined: 4 Nov 05
Posts: 13
Credit: 1,572,894
RAC: 0
Message 12712 - Posted: 26 Mar 2006, 14:21:12 UTC
Last modified: 26 Mar 2006, 14:24:22 UTC

The following workunit, 12125177 (HB_BARCODE_30_1c8cA_351_20458), has stopped at Model 1 Step 20167. This is repeatable, and I have aborted it.

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 12713 - Posted: 26 Mar 2006, 16:31:31 UTC

This WU https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11694953 got stuck at Model 1 step 20690, at 1% and with 100% CPU load.
After a restart it got stuck at the same place again.

Aborted

Anders n


Rich
Joined: 30 Nov 05
Posts: 5
Credit: 594,384
RAC: 0
Message 12719 - Posted: 27 Mar 2006, 10:56:10 UTC

Good morning.

Below is a work unit I just aborted at 1% after 16 hrs or so. I assume you can pull up the result codes. Let me know if there is more information y'all usually collect and report. I just discovered this thread and will make an effort to report more often.

Hope y'all find a solution. I get these about once every 2 weeks. What is really frustrating to me is coming home from travel and finding several days wasted on a 1% work unit. However, I understand that it is a work in progress.

Take care and have a good day.

Rich Seyfert

Work unit name = FA_RLX56_hom014_256bA_362_392_0
Rich Seyfert
Eatontown, NJ
SeyfertR@att.net

casio7131
Joined: 10 Oct 05
Posts: 35
Credit: 149,748
RAC: 0
Message 12721 - Posted: 27 Mar 2006, 12:59:35 UTC
Last modified: 27 Mar 2006, 13:05:12 UTC

stuck at 1% after 11h40min:
27/03/2006 10:32:47 PM|rosetta@home|Pausing task HB_BARCODE_30_5croA_351_23561_0 (left in memory)
https://boinc.bakerlab.org/rosetta/result.php?resultid=14998210

command executed: projects/boinc.bakerlab.org_rosetta/rosetta_4.82_windows_intelx86.exe cc 5cro A -abrelax -stringent_relax -more_relax_cycles -output_chi_silent -vary_omega -rand_envpair_res_wt -rand_SS_wt -farlx -ex1 -ex2 -silent -barcode_from_fragments -new_centroid_packing -barcode_from_fragments_length 30 -ssblocks -barcode_mode 3 -omega_weight 0.5 -jitter_frag -jitter_variation gauss -output_silent_gz -nstruct 10 -paths ccfrags200.txt -relax_score_filter -filter1 -85 -filter2 -95 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -increase_cycles 10 -cpu_run_time 7200 -constant_seed -jran 3349200
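For anyone comparing command lines between stuck workunits, here is a rough sketch that splits the command above into positional arguments and options. The rule for deciding which options take a value (the next token is not itself an option name) is a guess for illustration, not Rosetta's actual argument handling, and `parse_command` is a hypothetical helper.

```python
import shlex

# The command line exactly as reported above.
CMD = (
    "projects/boinc.bakerlab.org_rosetta/rosetta_4.82_windows_intelx86.exe "
    "cc 5cro A -abrelax -stringent_relax -more_relax_cycles -output_chi_silent "
    "-vary_omega -rand_envpair_res_wt -rand_SS_wt -farlx -ex1 -ex2 -silent "
    "-barcode_from_fragments -new_centroid_packing "
    "-barcode_from_fragments_length 30 -ssblocks -barcode_mode 3 "
    "-omega_weight 0.5 -jitter_frag -jitter_variation gauss -output_silent_gz "
    "-nstruct 10 -paths ccfrags200.txt -relax_score_filter -filter1 -85 "
    "-filter2 -95 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 "
    "-increase_cycles 10 -cpu_run_time 7200 -constant_seed -jran 3349200"
)


def is_number(token: str) -> bool:
    """True for tokens such as '30', '0.5' or '-85'."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_command(cmd: str):
    """Split a command into (executable, positional args, options dict)."""
    tokens = shlex.split(cmd)
    exe, rest = tokens[0], tokens[1:]
    positional, options = [], {}
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok.startswith("-") and not is_number(tok):
            nxt = rest[i + 1] if i + 1 < len(rest) else None
            # Assumption: a following non-option token is this option's value.
            if nxt is not None and (not nxt.startswith("-") or is_number(nxt)):
                options[tok] = nxt
                i += 2
            else:
                options[tok] = True
                i += 1
        else:
            positional.append(tok)
            i += 1
    return exe, positional, options


exe, positional, options = parse_command(CMD)
print(positional)
print(options["-cpu_run_time"], options["-nstruct"], options["-jran"])
```

If `-cpu_run_time 7200` is a target run time in seconds (the name suggests so, but that is an assumption), it corresponds to 2 hours, well short of the 11h40min reported here.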

i've looked at it for a further 10-20 min and it didn't seem to have moved any more. i will restart boinc now and see what happens.
---
after restart, it has stuck again (at the same point). workunit aborted.

TA_GeoffS
Joined: 16 Dec 05
Posts: 2
Credit: 704,640
RAC: 0
Message 12723 - Posted: 27 Mar 2006, 13:56:50 UTC - in response to Message 12721.  

Rich
Joined: 30 Nov 05
Posts: 5
Credit: 594,384
RAC: 0
Message 12753 - Posted: 28 Mar 2006, 12:26:38 UTC

Workunit: FA_RLXub_hom008_4ubpA_362_450_0 stuck at 1% for 20 hrs. URL: https://boinc.bakerlab.org/rosetta/result.php?resultid=14787271.
Rich Seyfert
Eatontown, NJ
SeyfertR@att.net

[AF>Libristes>Jip] otax
Joined: 25 Sep 05
Posts: 1
Credit: 312,969
RAC: 0
Message 12762 - Posted: 28 Mar 2006, 17:14:21 UTC

Hello,

This is my list of WU client errors:

FA_RLX56_hom007_256bA_362_202
FA_RLXch_hom015_2chf__362_223
FA_RLXwi_hom026_1wit__362_411
FA_RLXac_hom021_2acy__362_430
FA_RLXch_hom017_2chf__362_264
FA_RLXci_hom024_2ci2I_362_380
FA_RLXpt_hom006_1ptq__361_347
FA_RLXpt_hom002_1ptq__361_380
FA_RLXwh_hom024_1who__362_476
FA_RLXwh_hom017_1who__362_476

For a total of about 60 hours... (on 3 PCs in 2 days)

Otax.


Brf
Joined: 17 Jan 06
Posts: 1
Credit: 901,500
RAC: 0
Message 12769 - Posted: 28 Mar 2006, 21:54:25 UTC

I have FA_RLXai_hom028_1aiu_359_210_0 stuck at 46.06%. If I close BOINC or reboot, it starts up again, the CPU time resets to 55 minutes, and it runs until the CPU time is at 57 minutes and 57 seconds, then gets stuck at Model 2, Step 21273. The CPU time continues counting up, but will rewind to 55 minutes if I restart BOINC.

John Perko
Joined: 1 Jan 06
Posts: 3
Credit: 604,568
RAC: 0
Message 12770 - Posted: 28 Mar 2006, 22:18:04 UTC

3/28/2006 4:17:39 PM|rosetta@home|Starting result HB_BARCODE_30_2chf__351_32846_0 using rosetta version 482

The above WU was running for 35 minutes (out of a total time of 2:35). At that point, I turned on the graphic and saw that it was stuck at 1%. A second later it jumped to 29.5% and started filling up the graphs in the graphic box, which were previously empty.

TCU Computer Science
Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 12772 - Posted: 28 Mar 2006, 23:01:32 UTC

The following were aborted today. All were stuck at 1.00% after running for 20+ hours

ID=12326404 name = HB_BARCODE_30_1c8cA_351_32403
ID=12261321 name = HB_BARCODE_30_256bA_351_28680
ID=12034212 name = HB_BARCODE_30_1bk2__351_16205
ID=11076727 name = FA_RLXb3_hom001_1b3aA_359_347
ID=11972587 name = FA_RLXb3_hom010_2chf__362_384
ID=11761822 name = FA_RLXur_hom004_1urnA_362_308

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12778 - Posted: 29 Mar 2006, 3:18:20 UTC - in response to Message 12772.  

[quote]The following were aborted today. All were stuck at 1.00% after running for 20+ hours

ID=12326404 name = HB_BARCODE_30_1c8cA_351_32403
ID=12261321 name = HB_BARCODE_30_256bA_351_28680
ID=12034212 name = HB_BARCODE_30_1bk2__351_16205
ID=11076727 name = FA_RLXb3_hom001_1b3aA_359_347
ID=11972587 name = FA_RLXb3_hom010_2chf__362_384
ID=11761822 name = FA_RLXur_hom004_1urnA_362_308[/quote]


that is not good. with the jobs currently released, this problem should be greatly reduced, and from the "percent complete" we will be able to tell where the problem is.

RC
Joined: 27 Sep 05
Posts: 13
Credit: 262,048
RAC: 0
Message 12787 - Posted: 29 Mar 2006, 13:25:35 UTC
Last modified: 29 Mar 2006, 13:26:20 UTC

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=12388765

I suspended this unit after it remained at 1% for almost 4 hours. After suspending BOINC I tried running rosetta standalone for a while; it went to 17% within 15 minutes. When I restarted BOINC and resumed processing on this unit, it reset itself to zero, so I aborted it.

Grutte Pier [Wa Oars]~Nemesis
Joined: 8 Nov 05
Posts: 3
Credit: 386,730
RAC: 0
Message 12788 - Posted: 29 Mar 2006, 13:53:31 UTC - in response to Message 12691.  

I'm wondering whether the claimed credits will be awarded for the bogus WUs listed in my Message 12691?

Laurenu2
Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 12789 - Posted: 29 Mar 2006, 13:57:41 UTC - in response to Message 12778.  

[quote]that is not good. with the jobs currently released, this problem should be greatly reduced, and from the "percent complete" we will be able to tell where the problem is.[/quote]

Yes, on the stuck units, if you restart BOINC it resets the timer to 0.
I aborted another 4 WUs today; that brings the total to 9 since Sunday.
Sorry, I am not much good at gathering info. I just hope the returned WUs will help give you the info you need to stop this bug.
If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------

©2025 University of Washington
https://www.bakerlab.org