Message boards : Number crunching : Problems with Rosetta version 5.98
Author | Message |
---|---|
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 4 |
...and AGAIN! This one crashed and put the MessageBox up on my Vista system leaving a core dead until I clicked OK. I am suspending Rosetta at my remote sites - it is clearly unreliable at the moment. 31st July... 1fe6__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1fe6_-crystal_foldanddock__3560_41915_0 Today... 1fe6__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1fe6_-crystal_foldanddock__3560_97958_0 ... somewhat similar. The one 5th August was different. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Guido Platteau Joined: 11 Sep 06 Posts: 2 Credit: 283,392 RAC: 0 |
Validate errors: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167201441 https://boinc.bakerlab.org/rosetta/result.php?resultid=183051417 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167201385 https://boinc.bakerlab.org/rosetta/result.php?resultid=183051388 Client errors: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166338183 https://boinc.bakerlab.org/rosetta/result.php?resultid=182123248 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166333019 https://boinc.bakerlab.org/rosetta/result.php?resultid=182115595 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166311099 https://boinc.bakerlab.org/rosetta/result.php?resultid=182091686 |
ConflictingEmotions Joined: 5 Jun 08 Posts: 10 Credit: 3,081,990 RAC: 0 |
I aborted wuid 167183863 because it hung at 100% but used CPU time way beyond what was expected. The watchdog got the other attempt, so I am pointing it out as it may expose some error with Rosetta beta. |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
I'm getting multiple errors: dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
One Pelican Joined: 8 Aug 08 Posts: 3 Credit: 856 RAC: 0 |
New to Rosetta. Have a task OR8C BOINC MFR RELAX PICKED 4370 510 0, runtime 06.15.0, progress 98%. Runtime should be at 4 hrs. Can see that RMSD = 0, Energy = -397. Is this task good or should I abort? Ver 5.98, AMD 4800 2 Core, XP Ver 3. |
One Pelican Joined: 8 Aug 08 Posts: 3 Credit: 856 RAC: 0 |
Oops, sorry. Task has just a minute ago completed. |
One Pelican Joined: 8 Aug 08 Posts: 3 Credit: 856 RAC: 0 |
Aug-2008 19:58:39 [rosetta@home] Computation for task OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1 finished 09-Aug-2008 19:58:39 [rosetta@home] Output file OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1_0 for task OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1 absent |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
The following OR3d__BOINC_MFR_ABRELAX_PICKED_4322_ WUs crunched for the usual length of time and gave the normal message about the number of decoys produced, but then errored out with a -161 error. This happened for both me and the other person who crunched each WU. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167140030 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167149129 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167113344 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167126016 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Just installed BOINC Manager 6.2.16 and downloaded new work, as the old work from 5.10.45 did not register on 6.2. Now I get this from Beta 5.98 after getting new work downloaded. These come from m5xx tasks: https://boinc.bakerlab.org/rosetta/result.php?resultid=183863626 https://boinc.bakerlab.org/rosetta/result.php?resultid=183862800 https://boinc.bakerlab.org/rosetta/result.php?resultid=183862783 https://boinc.bakerlab.org/rosetta/result.php?resultid=183862782 https://boinc.bakerlab.org/rosetta/result.php?resultid=183862769 https://boinc.bakerlab.org/rosetta/result.php?resultid=183862768 After this it gets too painful to post all the error results, but needless to say, between these and stuff from 1.32 that failed, I chewed up over 30 work units. I repaired and restarted BOINC Manager and got some new work that seems to be running OK; hope this mess quits. <core_client_version>6.2.16</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> trouble finding Rama_smooth_dyn.dat_ss_6.4 ERROR:: Exit from: .read_paths.cc line: 360 </stderr_txt> ]]> 0 secs compute time |
Matthew Maples Joined: 19 Oct 06 Posts: 5 Credit: 135,659 RAC: 0 |
Repeated crashes (Stopped Working and has been closed) and Compute Errors in Vista X64 on my desktop, both old and new BOINC versions, both in Rosetta Beta and Mini Rosetta |
Robert Gammon Joined: 9 Nov 07 Posts: 14 Credit: 969,848 RAC: 0 |
Boinc 5.10.45 on XP SP2 Rosetta Beta 5.28 Most of the time, the workunit progresses normally, executing 0.05% or so per tick (on my machine 3-5 seconds) UNTIL WE GET TO ABOUT 90%. Work then slows WAY down, executing 0.001% per tick (same schedule as before at about 3-5 seconds per tick). This behavior is work unit specific as some complete in as little as 1.5 hours, with no hangups in the 90+% range, while others take 3+ to 4+ hours. The last one, which I will upload tomorrow is M624_BOINC_MFR_ABRELAX_PICKED_4395_9734_0. It took an exceptionally long time to complete, some 4:43:07. |
DaBrat and DaBear Joined: 9 Aug 08 Posts: 16 Credit: 213,180 RAC: 0 |
I am having the same issue with 5.98. This is the second WU that has progressed to the exact time of 9:55 remaining and stayed there for over an hour. It seems to be progressing but, as the previous person said, at .001 every 30 mins. Should I even bother or just kill it? BTW my preferences are set at the default three hours and it was estimated to complete in 2:46; we are now into hour 4. All crunching at that same 9:55. It was crunching along as expected until it hit this time mark. I killed the last one. |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
"I am having the same issue with 5.98. This is the second WU that has progressed to the exact time of 9:55 remaining and stayed there for over an hour. It seems to be progressing but, as the previous person said, at .001 every 30 mins. Should I even bother or just kill it?" You made a similar post in the "Tasks end prematurely" thread. Mod.Sense and I tried to explain that what you are seeing is normal Rosetta behavior. My rather long explanation is here. It is immediately followed by an important clarifying post from Mod.Sense. Perhaps it would be helpful if you reviewed those posts and then asked for clarification on any specific points. As a quick test of this particular task, you could open the graphics window, where you will see a model number and a step number. I strongly suspect that you are still working on the first model. If you watch for a few moments you should see the step number change. This tells you the app is not stuck and you should not abort it. Snags |
DaBrat and DaBear Joined: 9 Aug 08 Posts: 16 Credit: 213,180 RAC: 0 |
I believe in your previous post you called me a 'lurker'. Shame on you. I do not believe anything I have posted qualifies me as a 'lurker'... one step up from a troll. I believe that my stats, with the short period of time I have been crunching this project, qualify me as a newb. I don't believe that your answer really addressed my question about premature task ending, since the question was why this only seems to happen at restart. And yes, I use my comp for other things than Rosie, and the last task simply completed itself when I logged back on to Windows from Linux at less than half the estimated time. Not the first time. Or are you suggesting that I just happen to need to log off when I have a short task? Those are almost lottery chances. Further, the reason the post was made here is that, if you refer to the home page, there was mention of tasks hanging at the end of processing for this particular model, and the request was that any issues be posted 'here' in this thread. Now this behaviour may be particular to Rosetta, but most of the time, regardless of the complexity of the model, the completion time is adjusted during the crunch... say the model says 2:540 to completion, it may not count down as quickly because of a time period overrun. This is the first series of models I have ever seen that count down as normal and then use the ten minutes of remaining time to crunch 4 hours of calculations. I believe that is the confusion with these models. Not only that, but once these run into that hang problem, it defaults all remaining models to the finish time, my latest being 6+ hours, and causes Rosetta to go into panic mode, usurping any other projects you may be working on, while on the other hand taking days of crunching and returning WUs under the default to get the overall estimated completion time back to normal. 
Since my normal completion time on the comp is about 2:46, when Rosetta gets the 6 hours blues and defaults them all to that completion time (instead of something midway or an average of completion times)... my other projects are kicked off their cores. At this rate, and given the time it takes for Rosie to get the normal processing time back for remaining WUs, nothing else will crunch on my machine if I run into a 5.98 every two days. BTW the 6 hour WU attempted and returned 1 decoy with no errors. |
DaBrat and DaBear Joined: 9 Aug 08 Posts: 16 Credit: 213,180 RAC: 0 |
So I guess the best option would be to change my time preferences to 6 hours... that way they will all download with a 6 hour completion time, and if they complete in two I'll get a new WU somewhere in that window... But wait... on the days when the server is short of work, I will simply be crunching empty space. Nah, at least CPN will get SOME crunch time. Maybe three would be a better option... oh wait, that is the default. Right now she is running in panic mode for tasks due over a week from now. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"Right now she is running in panic mode for tasks due over a week from now." This is why I suggest only changing runtime preference gradually. Sounds like you either had a lot of tasks downloaded before the new preference, or your Rosetta resource share is low enough that the work would miss the deadline if not run in "high priority" (panic) mode. BOINC's best guess is that these tasks are in danger of missing their deadlines. But don't worry, it keeps track of the time used and pays back other projects once it gets caught up as compared to the deadlines. If you approach the deadlines and still have too many tasks, you can set your preference back lower again. But you are right, some of them will find they complete early. "This is the first series of models I have ever seen that count down as normal and then use the ten minutes of remaining time to crunch 4 hours of calculations. I believe that is the confusion with these models." This has been the expected behavior for over a year. That is why I would prefer to discuss the issue in a separate thread from the release-specific issues. And tasks are released several times a month that have runtimes over 3 hours on most machines to complete a single model. But the number of such tasks released is usually fairly limited. The task you describe that ran with a 6 hr preference, and that only completed one model, would have run for the same amount of time with any runtime preference lower than the time it took for that one model. And at any time after the preferred runtime, the client would have shown about 10 minutes remaining. Rosetta Moderator: Mod.Sense |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
"I believe in your previous post you called me a 'lurker'. Shame on you. I do not believe anything I have posted qualifies me as a 'lurker'... one step up from a troll. I believe that my stats, with the short period of time I have been crunching this project, qualify me as a newb." A lurker is someone who reads but does not post. Since you had already posted in the thread, I was clearly not referring to you. I may have even been referring to myself :) I suppose on a purely social board some people might find lurkers creepy, but here and on similar boards I imagine quite large numbers of people read the boards to gather information about the project, discover solutions to their technical problems, etc., without ever posting. They are lurkers and there is nothing disreputable or irresponsible about their behavior. In fact, many project boards have locked threads with titles such as "read here first" and "If you're new have a view". Who knows how many newbs come to the boards intending to post, find their answers in one of those threads and end up never posting at all. The project is actually encouraging newbs to behave as lurkers, not posters! (This is written in good humour and I hope you will read it with the same. At any rate, the term was neither directed at you nor used as an insult.) "I don't believe that your answer really addressed my question about premature task ending, since the question was why this only seems to happen at restart." You are right, and I acknowledged that in my post on the other thread. In none of your previous posts, however, did you mention finishes immediately following restarts (though other posters did). You only mentioned the timings, and it wasn't clear to me that you understood that the information you have provided about the timings is not evidence of premature exits or stalled WUs or, in fact, a problem of any kind. Per Mod.Sense's request I'll make any further response in the other thread. Snags |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
A minor anomaly... The BOINC Manager button 'Show Graphics' is inhibited for new 'AA2A' Work Units when viewing a 'localhost', and I expect that is desired or required due to the stated large size of the proteins. However, if viewing the same task over a remote connection (e.g., via port 31413), the button is not inhibited. In this case, if the button is clicked, a graphics window is generated that is blank except for the lines separating the various sections of the graphic. No adverse effects on the remote task were noted. FYI. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
AA2A_4_modeling_1_AA2A_1_AA2A_2RH1_align_4467_2456_0 <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2950319 ERROR:: Exit from: .pack.cc line: 5278 this ran 4961.234 seconds and died |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
These long AA2A work units don't have checkpoints so any stop loses 3+ hours. Is this normal or is there a problem? In the stdout file there is a checkpoint warning. WARNING!! cant restore counts from checkpoint file:mc_checkpoint |
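
Editor's note on Mod.Sense's point earlier in the thread that a task which needs longer than the runtime preference for a single model runs for the same time under any lower preference: the sketch below is a rough illustration only, not Rosetta's actual code. It assumes a simplified rule (a model that has started always runs to completion, and a new model is started only while elapsed time is below the preference). The function name `simulate_task` and its parameters are hypothetical.

```python
def simulate_task(model_seconds, runtime_pref_seconds):
    """Simplified, assumed model of a Rosetta task (not the real scheduler).

    model_seconds: time each successive model (decoy) would take, in seconds.
    runtime_pref_seconds: the user's target runtime preference, in seconds.
    Returns (models_completed, total_seconds).
    """
    elapsed = 0
    completed = 0
    for duration in model_seconds:
        # Assumption: stop starting new models once the preference is reached.
        if elapsed >= runtime_pref_seconds:
            break
        # Assumption: a model that has started always runs to completion.
        elapsed += duration
        completed += 1
    return completed, elapsed


HOUR = 3600

# A task whose single model takes 6 hours runs 6 hours under a 3 hour
# preference and under a 1 hour preference alike.
print(simulate_task([6 * HOUR] * 3, 3 * HOUR))
print(simulate_task([6 * HOUR] * 3, 1 * HOUR))

# Short models: the task keeps starting models until the preference is reached.
print(simulate_task([HOUR // 2] * 10, 3 * HOUR))
```

Under these assumptions the overrun is driven by the length of the first model rather than by the preference, which matches the behavior described for the long-running tasks in this thread.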
©2024 University of Washington
https://www.bakerlab.org