Message boards : Number crunching : Report problems with Rosetta version 5.36
Author | Message |
---|---|
Jack Shaftoe Send message Joined: 30 Apr 06 Posts: 115 Credit: 1,307,916 RAC: 0 |
> ...another "disk space exceeded" error Same here, 3 straight WU's, plenty of disk space. Darnit! |
dag Send message Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
Why don't you call them zombies? dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
Jack Shaftoe Send message Joined: 30 Apr 06 Posts: 115 Credit: 1,307,916 RAC: 0 |
> ...another "disk space exceeded" error For what it's worth, they are all FRA_t369 tasks. Something is wrong with those WU's. I've lost every single one of them with this error - about 15 so far. I'm aborting the remaining 8 in queue. Team Starfire World BOINC |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
> ...another "disk space exceeded" error Not all. Over half have been FRA_, but other proteins besides t369. Just take a look through these which I listed in an earlier post: A1 + A2 + A3 + B1 + B2 + B3 + B4 + C1 + C2 + C3 that is 5 x FRA_t362, and mix of others. So far these have just hit three of my boxes and not the other 5 that are currently running Rosetta. R~~ |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
> ... Perhaps we should call this the Norwegian Blue app ;-) > Why don't you call them zombies? To me a zombie task is one that is still known by the operating system (e.g. still in memory, or still has open files, etc.) but has either permanently stopped running or can never complete because it has lost a process on which it depends. In Linux terms, a zombie process has a pid that ps / kill / etc. still recognise as valid even if we as humans can see that the pid will never be selected to run again. I am only making my best guess about what is happening here, and I may well be wrong -- but if my guess is right then these are not zombies in that sense: they are completely dead processes as far as the OS is concerned, but the client has not yet noticed. It is the same kind of issue, but arising at the level of the client rather than the OS. So I'd understand a claim that these are zombies to be implying a different diagnosis to mine -- usefully so, as the two diagnoses may lead on to divergent solutions. Thank you for an enlightening question. R~~ |
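A rough illustration of the Linux sense of "zombie" described above, as a Python sketch. It assumes a Linux machine with /proc mounted; the helper names `child_state` and `make_zombie` are made up for this example and have nothing to do with the BOINC client or Rosetta itself.

```python
import os
import time


def child_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie():
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: poll until the child has exited, but do not reap it yet.
    deadline = time.time() + 5
    state = child_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    os.kill(pid, 0)     # signal 0 only checks the pid; it succeeds, so the pid is still valid
    os.waitpid(pid, 0)  # reaping the child removes the zombie entry

    try:
        os.kill(pid, 0)
        still_known = True
    except ProcessLookupError:
        still_known = False
    return state, still_known
```

The distinction drawn in the post is that a zombie like this one is still known to the OS until its parent reaps it, whereas a task that is fully gone from the OS and merely still listed by the client would be a different situation.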
Buffalo Bill Send message Joined: 25 Mar 06 Posts: 71 Credit: 1,630,458 RAC: 0 |
|
Rudy Toody Send message Joined: 18 Jul 06 Posts: 4 Credit: 280,134 RAC: 0 |
I've had to abort two WUs and one disappeared on its own. All three had "DUMMY" in the name and all three were being viewed in the graphic window when the problems occurred. I haven't had this problem with any of the other WUs. |
Conan Send message Joined: 11 Oct 05 Posts: 150 Credit: 4,186,270 RAC: 2,938 |
>> A follow up on my Screensaver lockup and failing workunit problems. Since upgrading to Boinc Client Version 5.4.11, I have had 1 'lack of disc space' error and the last 3 workunits on the 4800+ machine have gone through with no problems, even with the Boinc screensaver on. So perhaps, with Boinc software updates, the server side of things might no longer be 100% backward compatible with older client versions? My version was the previous stable recommended one (5.2.13). Anyway, all appears to be working. Considering that only 1 or 2 out of a dozen worked to completion in the previous batch with the screensaver on (some worked less than 2 minutes before failing), I am a lot happier, well at least for the moment. > We live in a world of problems, some we create ourselves, some others create for us. All I can say is "It's not my fault, I was probably asleep at the time".< |
Jerry Camden Send message Joined: 26 Sep 05 Posts: 1 Credit: 226,493 RAC: 0 |
I just had a C++ Error dialog. ResultID is 46651597. Messages in BOINC manager.... 11/11/2006 7:48:31 PM|rosetta@home|Restarting task 1bkrA_BOINC_ABINITIO_SAVE_ALL_OUT_DUMMYMODEL__1364_1405_0 using rosetta version 536 11/12/2006 5:28:37 AM|rosetta@home|Unrecoverable error for result 1bkrA_BOINC_ABINITIO_SAVE_ALL_OUT_DUMMYMODEL__1364_1405_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3)) 11/12/2006 5:28:37 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds 11/12/2006 5:28:37 AM|rosetta@home|Computation for task 1bkrA_BOINC_ABINITIO_SAVE_ALL_OUT_DUMMYMODEL__1364_1405_0 finished |
sslickerson Send message Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
I just opened graphics on this WU and the screen froze--Control, Alt, Delete--and returns an error. Tim |
Conan Send message Joined: 11 Oct 05 Posts: 150 Credit: 4,186,270 RAC: 2,938 |
>> A follow up on my Screensaver lockup and failing workunit problems. Alas, I spoke too soon, as the next 5 WU's all failed. All have debugging info with the WU:- https://boinc.bakerlab.org/rosetta/result.php?resultid=46588030 https://boinc.bakerlab.org/rosetta/result.php?resultid=46656845 these 2 had exit code 1073741819 https://boinc.bakerlab.org/rosetta/result.php?resultid=46507566 this one had 'Maximum Disk Usage Exceeded' https://boinc.bakerlab.org/rosetta/result.php?resultid=46656798 https://boinc.bakerlab.org/rosetta/result.php?resultid=46656799 These 2 have the 'Stuck' problem, 'exit code 2147483645' also 'Breakpoint Encountered' Guess I am not as happy as I thought I was. |
Rudy Toody Send message Joined: 18 Jul 06 Posts: 4 Credit: 280,134 RAC: 0 |
> I just opened graphics on this WU and the screen froze--Control, Alt, Delete--and returns an error. I tried the same thing on my second PC (different graphics setup) and, within 10 seconds, it froze. If I don't peek at them, they run to completion. |
Conan Send message Joined: 11 Oct 05 Posts: 150 Credit: 4,186,270 RAC: 2,938 |
> Sorry Rosetta team but I am sick of this, another 2 out of 3 failed. I will switch the screensaver back off; at least I can process some work that way. These 2 have no debugging information. https://boinc.bakerlabs.org/rosetta/result.php?resultid=46742010 exit code 1073807364 https://boinc.bakerlabs.org/rosetta/result.php?resultid=46754204 another 'Stuck' one, watchdog killed with a validate error. I could not reproduce the problem in Ralph as I have received no Ralph WU's for many days now. |
Conan Send message Joined: 11 Oct 05 Posts: 150 Credit: 4,186,270 RAC: 2,938 |
> Sorry Rosetta team but I am sick of this, another 2 out of 3 failed. I will switch the screensaver back off; at least I can process some work that way. This one was running when I switched, and after 2 hours it was at 1.02%. It eventually failed as being 'Stuck' with a 'breakpoint encountered' error; it has debugging data https://boinc.bakerlab.org/rosetta/result.php?resultid=46742003 |
scsimodo Send message Joined: 17 Sep 05 Posts: 93 Credit: 946,359 RAC: 0 |
|
Buffalo Bill Send message Joined: 25 Mar 06 Posts: 71 Credit: 1,630,458 RAC: 0 |
|
Killersocke@rosetta Send message Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
As a newbie here, this morning I found these workunits in error: Workunit 41576568 DOC_1MLC_R061030_st_model_09_1383_1382_0 and next this one, Workunit 41577287 DOC_2SIC_R061030_st_model_09_1389_1428_0 Validate error - screensaver version 5.4 crashed. And finally: 14.11.2006 06:08:44|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_10_1383_1421_0 ( - exit code 1073807364 (0x40010004)) |
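The exit codes quoted in this thread are easier to compare in hex, as the client log above already shows for 1073807364 (0x40010004). The sketch below is only a guess at how to read them: it assumes the other two codes (1073741819 and 2147483645) lost a leading minus sign when they were posted, and the `KNOWN` table is a hand-picked list of standard Windows status names, not anything taken from Rosetta or BOINC. The function names are hypothetical.

```python
def as_hex32(code):
    """Format an exit code as an unsigned 32-bit hex value."""
    return f"0x{code & 0xFFFFFFFF:08X}"


# Assumption: a few well-known Windows status values, chosen for this example.
KNOWN = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
    0x40010004: "DBG_TERMINATE_PROCESS",
}


def describe(code):
    """Try the code as posted, then its negation (in case a minus sign was dropped)."""
    for candidate in (code, -code):
        value = candidate & 0xFFFFFFFF
        if value in KNOWN:
            return as_hex32(candidate), KNOWN[value]
    return as_hex32(code), "unknown"


for posted in (1073741819, 2147483645, 1073807364):
    print(posted, *describe(posted))
```

Read this way, 2147483645 lines up with the 'Breakpoint Encountered' text reported alongside it, which is some support for the dropped-sign assumption, but it remains an assumption.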
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Welcome to Killersocke. I just wanted to point out, to anyone looking, that you had three WUs so far, the first two failed, the third was successful... and all three were for the brand spankin' new v5.40. And if you check the v5.40 thread you will see Chu found a problem with docking work units running under the new application version. So, hopefully this issue is already addressed. It's unfortunate that your first two WUs failed. But it is sorta like "mistakes". Doing it once is a "learning experience", doing it a second time is a "mistake". Well, with Rosetta, the entire project is a learning experience. We're helping break new ground in science. So they are constantly changing, enhancing and improving the application to try different approaches or test various ideas on how to devise better models. You will note that your first WU has already received credit. There is a daily job that runs to grant credit even for the failed WUs. After all, there wasn't anything that you did to cause it. It is part of the learning experience. The other two WUs should receive credit tomorrow. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
César Send message Joined: 8 Feb 06 Posts: 1 Credit: 14,964 RAC: 0 |
Work unit has been running for days, I have aborted it. s002_BOINC_ABRELAX_SAVE_ALL_OUT_hom001__1313_31444_0 I'll try the new Rosetta version since the Moderator said it will solve this kind of problem. If it doesn't, I'll switch to other projects. I don't think it is reasonable to provide the idle time of our computers and then have to babysit the application to make it behave. Management of tasks should be automatic (cleaning up those that exceed a reasonable time, and not requiring users to report them manually or check pages of bad work units manually). Sharing the computing environment with other BOINC tasks should be handled well. Sorry. |
JohanDM Send message Joined: 22 Nov 05 Posts: 1 Credit: 219,288 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org