Message boards : Number crunching : Report stuck & aborted WU here please - II
| Author | Message |
|---|---|
|
biodoc Joined: 19 Feb 06 Posts: 14 Credit: 30,720,744 RAC: 0 |
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13936782 Please advise; should I terminate? |
|
keyboards Joined: 3 Mar 06 Posts: 36 Credit: 74,787 RAC: 0 |
Aborting 7485_largescale_large_fullatom_relax_dec7485_1_47_1.pdb_432_95. Completed 1.76% after 2 hours with no further advance. Set for 2 hours. !!Stupidity should be PAINFUL!! |
|
biodoc Joined: 19 Feb 06 Posts: 14 Credit: 30,720,744 RAC: 0 |
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode. I've aborted this work unit after six hours. https://boinc.bakerlab.org/rosetta/result.php?resultid=17002591 |
Purple Rabbit Joined: 24 Sep 05 Posts: 28 Credit: 4,536,152 RAC: 0 |
This one ran for 6 hours stuck at 1.04%. I restarted BOINC and the WU began again at zero. It quickly ran up to 1.04%, but seemed to have hung again according to the graphics display. I aborted the WU after 14 minutes (the second time). TRUNCATE_TERMINI_FULLRELAX_1enh__433_53_0 using rosetta version 498 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13904970 |
|
Christian Barrett Joined: 17 Sep 05 Posts: 11 Credit: 14,933 RAC: 0 |
Here is one that cost me dearly: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13792169 10 Apr 2006 22:50:35 UTC 12 Apr 2006 3:54:52 UTC Over Client error Done 70,199.00 105.31 ---
|
|
JT.Ault Joined: 9 Dec 05 Posts: 1 Credit: 829,315 RAC: 0 |
home1 rosetta@home 4/11/2006 9:48:33 PM Unrecoverable error for result TRUNCATE_TERMINI_FULLRELAX_2tif__433_106_0 (aborted via GUI RPC) https://boinc.bakerlab.org/rosetta/result.php?resultid=16970267 Exit status -197 (0xffffff3b) application version 4.98 Stuck at 1.04% |
Team_Elteor_Borislavj~Intelligence Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
One of my clients hasn't contacted bakerlab since 23 March; I will investigate this evening why it died... |
|
[DPC]Charley Joined: 18 Mar 06 Posts: 9 Credit: 295,915 RAC: 0 |
Got another two units stuck at 1% and aborted them: TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_355 after 6 hours and TRUNCATE_TERMINI_FULLRELAX_1b3aA_433_479_0 after 10 hours (seriously stuck; no counters increased except for the time). |
|
Christian Hagen Joined: 26 Sep 05 Posts: 5 Credit: 46,795 RAC: 0 |
I also got a WU stuck at 1% and aborted it after 2.5 hours: https://boinc.bakerlab.org/rosetta/result.php?resultid=17029338 TRUNCATE_TERMINI_FULLRELAX_1enh__433_691_0 |
|
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH Another failed project: 17027763 12417632 11 Apr 2006 20:35:20 UTC 12 Apr 2006 12:28:07 UTC Over Client error Computing 44,064.20 136.62 This makes at least 5 projects with crashes and more than 5 CPU days wasted in total. What the hell is happening? To say I am frustrated is an understatement. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
|
Laurenu2 Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH Well, don't feel too bad, Jose. I seem to have to abort 60 to 100 hours of wasted CPU time every DAY. Just today I aborted 7 WUs STUCK at 1.04%, for a total of 80 hours. DAVID, what are you going to do about solving this problem??? Any end in sight? Babysitting your client consumes a lot of my time. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
|
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
2 WUs here stuck at 1.04% TRUNCATE_TERMINI_FULLRELAX_1ptq_433_485_0 TRUNCATE_TERMINI_FULLRELAX_1enh_433_558_0 There are two more in this series to come; I'll abort the stuck ones and see what happens. Edit: the subsequent WUs seem to be running ok, although one of them had already been aborted elsewhere. Anyway, they're both past 8% so fingers crossed. NB: my default is 4 hours and the two units above are the first to have stuck. |
|
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
ARGGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHH sounds to me like things are worse than they were a week ago, is this correct? the only change is that we increased the default run time from 2 hours to 4 hours, which reduces network traffic at the cost of an increased chance of work unit errors (because they are longer). we can set the default back to two hours and see if it helps. anyway--main question--are people seeing more stuck work units now than 7-10 days ago? |
rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
"anyway--main question--are people seeing more stuck work units now than..." Rom (or someone) should probably do an analysis to see what (if any) common factors there are for the errored units, and the overall frequency. Knock on wood (although with limited sampling), I have kept my run time at 8 hours and have not had any problems with 4.98. Regards, Bob P. |
arminius Joined: 23 Sep 05 Posts: 8 Credit: 883,822 RAC: 0 |
|
|
Robert Everly Joined: 8 Oct 05 Posts: 27 Credit: 665,094 RAC: 0 |
Just got my first stuck WU. Yay me :( Anyway, it's https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13923483 It's currently at 8+49 CPU time, stuck at 1.042%. It has exceeded both the default run time and my run time setting. I have suspended the WU. BOINC 5.2.13. Please advise as to what to do with this WU. |
|
jomebrew Joined: 31 Mar 06 Posts: 2 Credit: 25,914,516 RAC: 0 |
I have a couple of these on my Linux system. I would appreciate some help on a clean way to abort these on Linux. I have been hacking client_state.xml and deleting files in the slots directory. There has to be a better way. Warning! PRODUCTION_ABINITIO_CENTROID_PACKING_1ctf__429_247_0 was started at 2006-04-09 20:52:34 but has not finished! Warning! HBLR_1.0_2reb_426_994_0 was started at 2006-04-09 23:07:09 but has not finished! Warning! 7449_largescale_large_fullatom_relax_dec7449_1_05_6.pdb_431_53_0 was started at 2006-04-09 20:58:18 but has not finished! Warning! PRODUCTION_ABINITIO_CENTROID_PACKING_1vls__428_262_0 was started at 2006-04-09 21:19:28 but has not finished! Warning! 7485_largescale_large_fullatom_relax_dec7485_1_05_8.pdb_432_129_0 was started at 2006-04-09 21:50:01 but has not finished! Warning! TRUNCATE_TERMINI_FULLRELAX_1ptq__433_587_0 was started at 2006-04-11 17:55:43 but has not finished! |
|
n7zfi Joined: 7 Apr 06 Posts: 1 Credit: 4,623,875 RAC: 0 |
Running on Windows XP Pro, I have a WU stuck at 1.04%. The graphics appear to be locked up; nothing is moving even though the CPU utilization clock keeps ticking. The WU in question is: TRUNCATE_TERMINI_FULLRELAX_1ptq_433_906_0 I have suspended it after 1:34:22 of run time. The other WUs progress past that point in a few minutes. |
|
snoekbaars Joined: 16 Mar 06 Posts: 2 Credit: 12,136 RAC: 0 |
Work unit aborted at 48% - CPU time used ~24 hours. Time needed to completion only going up. Nothing moved in the graphics. WU Name "FA_RLXpt_hom003_1ptq__361_156_3" - Application "rosetta 4.98" Workunit = 11684527; Result ID = 16802748; System = Intel P4 3.0GHz, Win-XP SP 2 The workunit still reports "in progress" at the time of writing this message. The workunit was aborted manually ("Aborted via GUI RPC"). |
|
Grutte Pier [Wa Oars]~MAB The Frisian Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
Just again had a WU that was running for more than 6 hours at 1.17%, and when I checked it again another one had started, which has now been running for 45 minutes at 1.06%, but I cannot find that other WU in my results. Better to test-drive a project like this more thoroughly before letting so many people waste their money. If I go on this month, it will be the last anyway. Rather fed up with it. No fun at all anymore. |
©2026 University of Washington
https://www.bakerlab.org