Message boards : Number crunching : Report stuck & aborted WU here please
Author | Message |
---|---|
Rebirther Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
PRODUCTION_ABINITIO_INCREASECYCLES50_1ten__312_1196_0 is at 1% after 4h, doing 2 million steps again and again. Last entry in stdout: Starting score3 moves... kk,score3,low_score,rms_err,low_rms,rms_min,naccept 0 -61.034 -61.034 11.848 11.848 8.788 15290 converged 2.07775331 108316 converged 2.71214509 112159 converged 2.55168295 125540 converged 2.40547872 129158 converged 1.95232618 132867 converged 2.9595387 137826 converged 2.75581789 140668 converged 2.2488966 144434 converged 2.80967975 158799 converged 2.50006342 162126 converged 2.39554954 169710 converged 2.04850674 183329 converged 1.99719334 187299 1 -40.606 -78.377 11.668 12.506 8.788 20134 converged 2.95864868 138902 converged 2.02392745 173508 converged 2.22490144 324940 2 -12.235 -78.377 8.579 12.506 6.717 26539 converged 2.71321511 126774 converged 1.66896379 159099 The time of the stdout file is no longer updated, but its content is; still at model 1! Restarting BOINC didn't solve the problem, it only gave a new random seed. What can I do? |
Team TMR Joined: 2 Nov 05 Posts: 21 Credit: 1,583,679 RAC: 0 |
Another one, 9442770. Over 8 hours in and still stuck on 1%. It's running Rosetta 4.82 too, so I guess that didn't fix the 1% problem then. Max CPU setting is 2 hours. |
Team TMR Joined: 2 Nov 05 Posts: 21 Credit: 1,583,679 RAC: 0 |
Well it finished eventually, at 8hr 39mins. But it never did get off 1% as far as I could see. |
Rebirther Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
Damn, everything is going wrong here: another one fell back from 45% to 15% after 14h :(. I will cancel them all and wait for a fix. (Rosetta 4.82, BOINC 5.2.13). I have never had any problems before... 2 of 3 failed :o https://boinc.bakerlab.org/rosetta/result.php?resultid=11747939 https://boinc.bakerlab.org/rosetta/result.php?resultid=11748069 |
Stwainer Joined: 9 Nov 05 Posts: 27 Credit: 4,406,829 RAC: 0 |
I had the following WU stuck at 1% for 2 hours: PRODUCTION_ABINITIO_INCREASECYCLES50_1dhn__312_608_0 |
Jon Kennedy Joined: 1 Oct 05 Posts: 6 Credit: 418,027 RAC: 0 |
This workunit was stuck at 1% after 27h35m: https://boinc.bakerlab.org/rosetta/result.php?resultid=11510637 Nothing occurred on the machine to interrupt crunching. Message log: 2/19/2006 5:04:22 PM|rosetta@home|Starting result PRODUCTION_ABINITIO_RANDOMFRAG_1urnA_309_445_0 using rosetta version 481 2/19/2006 5:04:24 PM|rosetta@home|Started upload of PRODUCTION_ABINITIO_RANDOMFRAG_1ughI_309_445_0_0 2/19/2006 5:04:31 PM|rosetta@home|Finished upload of PRODUCTION_ABINITIO_RANDOMFRAG_1ughI_309_445_0_0 2/19/2006 5:04:31 PM|rosetta@home|Throughput 23263 bytes/sec 2/20/2006 8:26:06 PM||request_reschedule_cpus: project op 2/20/2006 8:26:10 PM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 2/20/2006 8:26:10 PM|rosetta@home|Reason: Requested by user 2/20/2006 8:26:10 PM|rosetta@home|Reporting 7 results 2/20/2006 8:26:15 PM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 2/20/2006 10:35:53 PM|rosetta@home|Unrecoverable error for result PRODUCTION_ABINITIO_RANDOMFRAG_1urnA_309_445_0 (aborted via GUI RPC) The next WU, PRODUCTION_ABINITIO_RANDOMFRAG_2acy__309_389_0, is also seemingly stuck at 1% after 37+ minutes... <sigh> |
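The message log quoted above uses a `timestamp|project|message` layout, so the wall-clock time between the start of the result and its abort can be worked out mechanically. The following is only a minimal sketch, assuming the US-style 12-hour timestamps seen in that log; the helper names `parse_log_line` and `wall_clock_between` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Timestamp layout assumed from the quoted log (month/day/year, 12-hour clock).
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_log_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, LOG_TIME_FORMAT), project, message


def wall_clock_between(start_line, end_line):
    """Return the wall-clock time elapsed between two log lines."""
    start, _, _ = parse_log_line(start_line)
    end, _, _ = parse_log_line(end_line)
    return end - start


# The two relevant lines, copied from the log in the post above.
started = ("2/19/2006 5:04:22 PM|rosetta@home|Starting result "
           "PRODUCTION_ABINITIO_RANDOMFRAG_1urnA_309_445_0 using rosetta version 481")
aborted = ("2/20/2006 10:35:53 PM|rosetta@home|Unrecoverable error for result "
           "PRODUCTION_ABINITIO_RANDOMFRAG_1urnA_309_445_0 (aborted via GUI RPC)")

elapsed = wall_clock_between(started, aborted)
print(elapsed)
```

Note that this gives wall-clock time, which need not match the CPU time BOINC reports for the result (the 27h35m mentioned in the post).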
Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0 |
I have one stuck on a Mac G4 laptop running OS 10.4.5. The WU is here. The application version is 4.82. The previous "owner" had a client error on this WU. This will be the result ID if I can make it finish. The WU is stuck at 1% complete after 2:15 of CPU time. My time setting is 2 hours. It has completed 97345 steps but shows 1% complete. The WU name is -PRODUCTION_ABINITIO_QUADRUPLELONGRANGEANTIPARALLEL_1acf__311_807 Regards Phil We must look for intelligent life on other planets as it is becoming increasingly apparent we will not find any on our own. |
Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0 |
"This workunit was stuck at 1% after 27h35m:" I wonder if we actually have stuck WUs or if they are just some of the ones that used to take 30 hours. Both yours and mine are "PRODUCTION_ABINITIO_xxxx". I just watched the screen saver for a while and it is running over 100,000 steps on the first model, but it is running. It could be that it is just doing more steps per model and therefore taking longer to checkpoint, and that would delay the percent complete. It has run over 20 min, and all the WUs I have seen since the new version of the software was released have only taken about 5 mins per model. |
Peter Ingham Joined: 27 Sep 05 Posts: 14 Credit: 4,215,134 RAC: 3 |
FYI, I've just aborted a WU stuck at 1% after 175K seconds. Name: PRODUCTION_ABINITIO_RANDOMFRAG_1vcc__309_441 WU: 9337995 ResultID: 11582797 |
KwintenB Joined: 24 Nov 05 Posts: 6 Credit: 183,329 RAC: 0 |
I've got a WU that has already been crunching for 51h, and I have now suspended it. Is there any chance that I'll get points for this job if I abort it? This is obviously a project fault. Details of the WU: 19/02/2006 04:12:00|rosetta@home|Starting result PRODUCTION_ABINITIO_DBFLAGS_1lis__307_738_0 using rosetta version 481 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9300013 https://boinc.bakerlab.org/rosetta/result.php?resultid=11470130 |
arklms Joined: 17 Dec 05 Posts: 7 Credit: 177,488 RAC: 0 |
PRODUCTION_ABINITIO_INCREASECYCLES50_1tul__317_178_0 Appears stuck on 1%. Can't start it from the DOS window, it crashed. |
arklms Joined: 17 Dec 05 Posts: 7 Credit: 177,488 RAC: 0 |
PRODUCTION_ABINITIO_INCREASECYCLES50_1tul__317_178_0 I just clicked on the Rosetta graphics, which crashed the computer. Upon reboot, it's 17% and ongoing. Strange, but true. |
Daral Joined: 13 Jan 06 Posts: 13 Credit: 870,334 RAC: 0 |
Got a 1% error for 1 hr 21 minutes. Work Unit Production_Abinitio_increasecycles50_1ten_317_127_0 Running it from command line now with seed 1037999 seems to also get stuck on the first iteration. It's run over 512k steps and is still on the first model. |
Nico Joined: 29 Sep 05 Posts: 1 Credit: 548,959 RAC: 0 |
PRODUCTION_ABINITIO_QUADRUPLELONGRANGEANTIPARALLEL_1tul__311_863 is stuck at 1%: (I requested 2h WUs and this one has been running for more than 2h now and is still at 1%) http://666kb.com/i/117ucnv1ep5vl.gif |
O&O Joined: 11 Dec 05 Posts: 25 Credit: 66,900 RAC: 0 |
Hello David, PRODUCTION_ABINITIO_1acf__250_809_2 My computer did 13.32 hours on this WU ... before it errored out with exit status -177 (0xffffff4f) and "Maximum CPU time exceeded". What about the ... 131.16 claimed credits? Regards, O&O |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"FYI, I've just aborted a WU stuck at 1% after 175K seconds" Some of the PRODUCTION_ABINITIO work units take a long time to pass 1%, in some cases over 4 1/2 hours on a reasonably fast machine. Before aborting them and losing ALL the time spent, you should check the graphic display and make certain the work unit is not actually still running. In many cases these work units take well over 700,000 steps to complete a single model, and the work unit will look hung between model completions. If it is hung, you can usually preserve some of the time spent on it by restarting BOINC. This will in most cases cause the work unit to run successfully to completion. The project team is aware of this and is making an adjustment in the WUs to fix the problem, but it will take a few days to two weeks for the old work units to run through the system. Moderator9 ROSETTA@home FAQ Moderator Contact |
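The moderator's advice amounts to this: the percentage only moves when a model completes, so the step counter on the graphics display is the better signal of whether a work unit is alive. A rough sketch of that check follows, assuming you note the step count by hand at two or more points in time. The function `looks_hung` and the sample values are illustrative only and do not correspond to any BOINC or Rosetta interface.

```python
def looks_hung(step_samples, min_progress=1):
    """Return True if the step counter did not advance across the samples.

    step_samples: step counts read off the graphics display at different
    times, oldest first (hypothetical, manually collected input).
    min_progress: smallest increase that counts as "still running".
    """
    if len(step_samples) < 2:
        raise ValueError("need at least two samples to compare")
    return step_samples[-1] - step_samples[0] < min_progress


# Illustrative readings: the percentage stays at 1% in both cases.
print(looks_hung([97345, 97345]))   # step counter frozen
print(looks_hung([97345, 512000]))  # step counter still climbing
```

A frozen counter suggests restarting BOINC as described above, while a climbing counter suggests simply waiting for the first model to finish.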
Runaway1956 Joined: 5 Nov 05 Posts: 19 Credit: 535,400 RAC: 0 |
Well, glad I stopped in to look around. After the upgrade to 4.81, it seemed that none of my previously downloaded WUs wanted to run. Which was odd, as I'd already returned a number of similar WUs from the same batch. I put those all on hold, and ran some of the newer WUs, which said they were for 4.81. I got the 1% glitch on about 4 of them. Hit reset. Everything goes away, and BOINC downloads some new WUs. Same thing. 1% lasts about 2 1/2 eternities. Was about to hit reset again, but decided to come here..... Thanks guys. I'll let the little monster run. "FYI, I've just aborted a WU stuck at 1% after 175K seconds" |
XS_team_germany Joined: 2 Jan 06 Posts: 6 Credit: 1,469,591 RAC: 0 |
I uploaded these results today and I received no credit for them:
Host: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=166212 The above work units ran 7+ hours each. :( |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
For people having many work Unit Errors!! I have received an e-mail from Dr. Baker with information for any of you who are having a lot of Work Unit errors. "Could you help us to recommend to people having problems with lots of WU to set the target run time to a smaller value like 2 hours. We think there aren't any new bugs, just with longer run times it is more likely for a WU to have problems." So if you are having a lot of errors please reset your Time setting to 2 hours and see if that helps. Moderator9 ROSETTA@home FAQ Moderator Contact |
Marie Lucie Joined: 9 Dec 05 Posts: 5 Credit: 40,616 RAC: 0 |
Hello, I made the change in the Rosetta settings as requested and I got an error again. It ran 53 minutes and then ... 25/02/2006 10:28:41|rosetta@home|Unrecoverable error for result HBLR_1.0_1hz6_321_998_0 ( - exit code -1073741819 (0xc0000005)) 25/02/2006 10:28:42||request_reschedule_cpus: process exited 25/02/2006 10:28:42|rosetta@home|Computation for result HBLR_1.0_1hz6_321_998_0 finished I have one WU remaining. We will see. |
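Two posts in this thread quote exit codes in both decimal and hexadecimal form: -177 (0xffffff4f) and -1073741819 (0xc0000005). The hexadecimal form is simply the same signed value viewed as an unsigned 32-bit number. A small sketch of that conversion is below; the helper name `as_unsigned32` is made up for illustration. On Windows, 0xC0000005 is the status code for an access violation, while the -177 case was reported together with "Maximum CPU time exceeded".

```python
def as_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


# The two exit codes quoted in the posts above.
for code in (-177, -1073741819):
    print(code, hex(as_unsigned32(code)))
```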
©2024 University of Washington
https://www.bakerlab.org