Problems with Rosetta version 5.98

adrianxw
Joined: 18 Sep 05
Posts: 608
Credit: 9,745,637
RAC: 4,638
Message 54979 - Posted: 7 Aug 2008, 11:45:45 UTC
Last modified: 7 Aug 2008, 11:49:48 UTC

...and AGAIN! This one crashed and put the MessageBox up on my Vista system leaving a core dead until I clicked OK.

I am suspending Rosetta at my remote sites - it is clearly unreliable at the moment.

31st July...
1fe6__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1fe6_-crystal_foldanddock__3560_41915_0
Today...
1fe6__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1fe6_-crystal_foldanddock__3560_97958_0

... somewhat similar. The one on 5th August was different.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Guido Platteau
Joined: 11 Sep 06
Posts: 2
Credit: 283,392
RAC: 0
Message 54993 - Posted: 8 Aug 2008, 9:09:04 UTC
Last modified: 8 Aug 2008, 9:10:00 UTC

Validate errors:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167201441
https://boinc.bakerlab.org/rosetta/result.php?resultid=183051417

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167201385
https://boinc.bakerlab.org/rosetta/result.php?resultid=183051388

Client errors:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166338183
https://boinc.bakerlab.org/rosetta/result.php?resultid=182123248

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166333019
https://boinc.bakerlab.org/rosetta/result.php?resultid=182115595

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=166311099
https://boinc.bakerlab.org/rosetta/result.php?resultid=182091686

ConflictingEmotions
Joined: 5 Jun 08
Posts: 10
Credit: 3,081,990
RAC: 0
Message 54996 - Posted: 8 Aug 2008, 16:31:06 UTC

I aborted wuid 167183863 because it hung at 100% and took CPU time way beyond what was expected. The watchdog got the other attempt, so I am pointing it out as it may expose some error in Rosetta Beta.

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 54997 - Posted: 8 Aug 2008, 16:42:20 UTC

I'm getting multiple errors:


dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

One Pelican
Joined: 8 Aug 08
Posts: 3
Credit: 856
RAC: 0
Message 55006 - Posted: 9 Aug 2008, 11:35:50 UTC

New to Rosetta.
I have a task, OR8C BOINC MFR RELAX PICKED 4370 510 0.
Runtime is 06.15.0 and progress is 98%. Runtime should be at 4 hrs.
Can see that RMSD = 0
Energy = -397
Is this task good or should I abort?

Ver 5.98 AMD 4800 2 Core. XP Ver 3.

One Pelican
Joined: 8 Aug 08
Posts: 3
Credit: 856
RAC: 0
Message 55007 - Posted: 9 Aug 2008, 11:38:21 UTC

Oops, sorry.
The task completed just a minute ago.

One Pelican
Joined: 8 Aug 08
Posts: 3
Credit: 856
RAC: 0
Message 55010 - Posted: 9 Aug 2008, 19:37:44 UTC

09-Aug-2008 19:58:39 [rosetta@home] Computation for task OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1 finished
09-Aug-2008 19:58:39 [rosetta@home] Output file OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1_0 for task OR3d__BOINC_MFR_ABRELAX_PICKED_4322_1062_1 absent

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 55012 - Posted: 10 Aug 2008, 4:12:30 UTC

The following OR3d__BOINC_MFR_ABRELAX_PICKED_4322_ WUs crunched for the usual length of time and gave the normal message about the number of decoys produced, but then errored out with a -161 error. This happened for both me and the other person who crunched each WU.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167140030
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167149129
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167113344
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=167126016

Greg_BE
Joined: 30 May 06
Posts: 4875
Credit: 4,503,902
RAC: 933
Message 55021 - Posted: 10 Aug 2008, 14:16:36 UTC
Last modified: 10 Aug 2008, 14:23:11 UTC

just installed boinc mgr 6.2.16 and downloaded new work as the old work from 5.10.45 did not register on 6.2

now i get this from Beta 5.98 after getting new work downloaded
these come from m5xx tasks


https://boinc.bakerlab.org/rosetta/result.php?resultid=183863626
https://boinc.bakerlab.org/rosetta/result.php?resultid=183862800
https://boinc.bakerlab.org/rosetta/result.php?resultid=183862783
https://boinc.bakerlab.org/rosetta/result.php?resultid=183862782
https://boinc.bakerlab.org/rosetta/result.php?resultid=183862769
https://boinc.bakerlab.org/rosetta/result.php?resultid=183862768
after this it gets too painful to post all the error results, but needless to say, between these and the stuff from 1.32 that failed, i chewed up over 30 work units.

i repaired and restarted boinc mgr and got some new work that seems to be running ok, hope this mess quits.



<core_client_version>6.2.16</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
trouble finding Rama_smooth_dyn.dat_ss_6.4
ERROR:: Exit from: .read_paths.cc line: 360

</stderr_txt>
]]>


0 secs compute time
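
For anyone collecting a batch of these, the exit location can be pulled out of the stderr text with a small script. This is only a minimal sketch in Python, assuming the exit line always has the shape shown above; the function name `find_exit_location` is made up for illustration and is not part of BOINC or Rosetta.

```python
import re

# Shape of the exit line as it appears in the stderr text above, e.g.
# "ERROR:: Exit from: .read_paths.cc line: 360" (assumed to be stable).
EXIT_RE = re.compile(r"ERROR:: Exit from:\s*(\S+)\s+line:\s*(\d+)")


def find_exit_location(stderr_text):
    """Return (source_file, line_number) for the first exit line, or None."""
    match = EXIT_RE.search(stderr_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


sample = """trouble finding Rama_smooth_dyn.dat_ss_6.4
ERROR:: Exit from: .read_paths.cc line: 360
"""
print(find_exit_location(sample))  # ('.read_paths.cc', 360)
```

Grouping failed results by that pair would show quickly whether they all die at the same place.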

Matthew Maples
Joined: 19 Oct 06
Posts: 5
Credit: 135,659
RAC: 0
Message 55057 - Posted: 12 Aug 2008, 19:54:06 UTC

Repeated crashes ("stopped working and has been closed") and compute errors on Vista x64 on my desktop, with both old and new BOINC versions, in both Rosetta Beta and Mini Rosetta.

Robert Gammon
Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 55091 - Posted: 15 Aug 2008, 23:26:41 UTC

Boinc 5.10.45 on XP SP2
Rosetta Beta 5.98

Most of the time, the workunit progresses normally, executing 0.05% or so per tick (on my machine 3-5 seconds) UNTIL WE GET TO ABOUT 90%. Work then slows WAY down, executing 0.001% per tick (same schedule as before at about 3-5 seconds per tick).

This behavior is work unit specific, as some complete in as little as 1.5 hours, with no hangups in the 90+% range, while others take 3+ to 4+ hours. The last one, which I will upload tomorrow, is M624_BOINC_MFR_ABRELAX_PICKED_4395_9734_0. It took an exceptionally long time to complete, some 4:43:07.
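
Taking those rough figures at face value, the two rates imply very different times per percentage point of progress. A quick back-of-the-envelope sketch in Python (the function name is made up for illustration, and the inputs are only the approximate per-tick numbers quoted above):

```python
def seconds_per_percent(percent_per_tick, seconds_per_tick):
    """Seconds needed to advance the progress bar by one percentage point."""
    return seconds_per_tick / percent_per_tick


# 0.05% per tick is the normal rate, 0.001% per tick the slow rate;
# a tick is roughly 3 to 5 seconds on this machine.
for label, rate in (("normal", 0.05), ("slow", 0.001)):
    low = seconds_per_percent(rate, 3)
    high = seconds_per_percent(rate, 5)
    print(f"{label}: {low:.0f}-{high:.0f} s per 1% of progress")
```

At the slow rate a single percent works out to roughly 50 to 83 minutes, so a full 10% would need more than eight hours. Since the whole task finished in 4:43:07, either the slow phase did not last for the whole final 10% or the per-tick figures are only rough.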

DaBrat and DaBear
Joined: 9 Aug 08
Posts: 16
Credit: 213,180
RAC: 0
Message 55392 - Posted: 29 Aug 2008, 21:59:55 UTC
Last modified: 29 Aug 2008, 22:04:14 UTC

I am having the same issue with 5.98. This is the second WU that has progressed to exactly 9:55 remaining and stayed there for over an hour. It seems to be progressing but, as the previous person said, at .001 every 30 mins. Should I even bother or just kill it?


BTW my preferences are set at the default three hours and it was estimated to complete in 2:46; we are now into hour 4, all crunching at that same 9:55. It was crunching along as expected until it hit this time mark. I killed the last one.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,366,007
RAC: 1,190
Message 55395 - Posted: 30 Aug 2008, 3:22:16 UTC - in response to Message 55392.  

I am having the same issue with 5.98. This is the second WU that has progressed to exactly 9:55 remaining and stayed there for over an hour. It seems to be progressing but, as the previous person said, at .001 every 30 mins. Should I even bother or just kill it?


BTW my preferences are set at the default three hours and it was estimated to complete in 2:46; we are now into hour 4, all crunching at that same 9:55. It was crunching along as expected until it hit this time mark. I killed the last one.


You made a similar post in the "Tasks end prematurely" thread. Mod.Sense and I tried to explain that what you are seeing is normal Rosetta behavior. My rather long explanation is here. It is immediately followed by an important clarifying post from Mod.Sense. Perhaps it would be helpful if you reviewed those posts and then asked for clarification on any specific points.

As a quick test of this particular task you could open the graphics window where you will see a model number and a step number. I strongly suspect that you are still working on the first model. If you watch for a few moments you should see the step number change. This tells you the app is not stuck and you should not abort it.

Snags

DaBrat and DaBear
Joined: 9 Aug 08
Posts: 16
Credit: 213,180
RAC: 0
Message 55411 - Posted: 31 Aug 2008, 11:48:18 UTC
Last modified: 31 Aug 2008, 12:06:42 UTC

I believe in your previous post you called me a 'lurker'. Shame on you. I do not believe anything I have posted qualifies me as a 'lurker'... one step up from a troll. I believe that my stats, with the short period of time I have been crunching this project, qualify me as a newb.

I don't believe that your answer really addressed my question of premature task ending, since the question was why this only seems to happen at restart. And yes, I use my comp for other things than rosie, and the last task simply completed itself when I logged back on to Windows from Linux, at less than half the estimated time. Not the first time. Or are you suggesting that I just happen to need to log off when I have a short task? Those are almost lottery chances.

Further, the reason the post was made here is that, if you will refer to the home page, there was mention of tasks hanging at the end of processing for this particular model, and the request was that any issues be posted 'here' in this thread.

Now this behaviour may be particular to Rosetta, but most of the time, regardless of the complexity of the model, the completion time is adjusted during the crunch... say the model says 2:540 to completion, it may not count down as quickly because of a time period overrun. This is the first series of models I have ever seen that count down as normal and use the ten minutes of remaining time to crunch 4 hours of calculations. I believe that is the confusion with these models.

Not only that, but once these run into that hang problem, it defaults all remaining models to the finish time, my latest being 6+ hours, and causes Rosetta to go into panic mode, usurping any other projects you may be working on, while on the other hand taking days of crunching and returning WUs under the default to get the overall estimated completion time back to normal.

Since my normal completion time on the comp is about 2:46, when Rosetta gets the 6 hours blues and defaults them all to that completion time (instead of something midway or an average of completion times)... my other projects are kicked off their cores. At this rate, and with the time it takes for rosie to get the normal processing time back for remaining WUs, nothing else will crunch on my machine if I run into a 5.98 every two days.


BTW the 6 hour WU attempted and returned 1 decoy with no errors.

DaBrat and DaBear
Joined: 9 Aug 08
Posts: 16
Credit: 213,180
RAC: 0
Message 55415 - Posted: 31 Aug 2008, 12:34:58 UTC
Last modified: 31 Aug 2008, 12:40:32 UTC

So I guess the best option would be to change my time preferences to 6 hours... that way they will all download with a 6 hour completion time and if they complete in two I'll get a new WU somewhere in that window... But wait.... on the days when the server is short of work, I will simply be crunching empty space. Nah, at least CPN will get SOME crunch time. Maybe three would be a better option... oh wait, that is the default.

Right now she is running in panic mode for tasks due over a week from now.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55421 - Posted: 31 Aug 2008, 15:41:28 UTC - in response to Message 55415.  

Right now she is running in panic mode for tasks due over a week from now.


This is why I suggest only changing runtime preference gradually. Sounds like you either had a lot of tasks downloaded before the new preference, or, your Rosetta resource share is low enough that the work would miss the deadline if not run in "high priority" (panic) mode. BOINC's best guess is that these tasks are in danger of missing their deadlines. But don't worry, it keeps track of the time used and pays back other projects once it gets caught up as compared to the deadlines.

If you approach the deadlines and still have too many tasks, you can set your preference back lower again. But you are right, some of them will find they complete early.
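
To illustrate the kind of check involved, here is a deliberately simplified sketch in Python. It is not BOINC's actual scheduler code, which is more involved; the function name, the task list format, and the single `share_fraction` number are all assumptions made for the example. It works through the queued tasks in deadline order and reports whether any would finish late at the project's normal share of the machine.

```python
def deadline_at_risk(tasks, share_fraction, cores=1):
    """Return True if any queued task would finish after its deadline.

    tasks: list of (remaining_cpu_hours, hours_until_deadline) pairs.
    share_fraction: fraction of each core the project normally receives
    (a stand-in for the resource share, assumed constant here).
    """
    rate = share_fraction * cores  # CPU hours gained per wall-clock hour
    elapsed = 0.0
    # Work through the queue earliest deadline first.
    for remaining, deadline in sorted(tasks, key=lambda t: t[1]):
        elapsed += remaining / rate
        if elapsed > deadline:
            return True
    return False


week = 168  # hours
print(deadline_at_risk([(6.0, week)] * 20, share_fraction=0.25, cores=2))  # True
print(deadline_at_risk([(3.0, week)] * 20, share_fraction=0.25, cores=2))  # False
```

With twenty tasks that each need 6 hours and a quarter share of two cores, the queue cannot finish inside a week, which is the sort of situation where the client switches to high priority. The same queue at 3 hours per task fits comfortably.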

This is the first series of models I have ever seen that count down as normal and use the ten minutes of remaining time to crunch 4 hours of calculations. I believe that is the confusion with these models.


This has been the expected behavior for over a year. That is why I would prefer to discuss the issue in a separate thread from the release-specific issues. And tasks are released several times a month that have runtimes over 3 hours on most machines to complete a single model. But the number of such tasks released is usually fairly limited. The task you describe that ran with a 6hr preference, and that only completed one model, would have run for the same amount of time with any runtime preference lower than the time it took for that one model. And any time after the preferred runtime, the client would have shown about 10 minutes remaining.
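
The relationship between the runtime preference and the time a task actually takes can be sketched as follows. This is a simplified model in Python, not the application's real logic: the function name and the rule "stop once the preference has been reached" are assumptions for illustration, and the real rule for deciding whether to start another model may differ in detail. The one property it is meant to show is that a model, once started, is always finished.

```python
HOUR = 3600


def simulate_task(model_seconds, runtime_pref_seconds):
    """Run models back to back and stop once the preference is reached.

    model_seconds: time each successive model would take, in seconds.
    Returns (models_completed, total_seconds). At least one model is
    always completed, however low the preference is set.
    """
    elapsed = 0
    completed = 0
    for seconds in model_seconds:
        elapsed += seconds
        completed += 1
        if elapsed >= runtime_pref_seconds:
            break
    return completed, elapsed


# A single model that needs 6 hours runs 6 hours under any lower preference.
print(simulate_task([6 * HOUR], 3 * HOUR))  # (1, 21600)
print(simulate_task([6 * HOUR], 1 * HOUR))  # (1, 21600)
# Short 40-minute models: the preference decides how many get done.
print(simulate_task([40 * 60] * 10, 3 * HOUR))  # (5, 12000)
```

For tasks with short models the preference controls how many models are produced. For a task whose first model is long, the preference makes no difference to the runtime.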



Rosetta Moderator: Mod.Sense

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,366,007
RAC: 1,190
Message 55425 - Posted: 31 Aug 2008, 17:04:07 UTC - in response to Message 55411.  

I believe in your previous post you called me a 'lurker'. Shame on you. I do not believe anything I have posted qualifies me as a 'lurker'... one step up from a troll. I believe that my stats, with the short period of time I have been crunching this project, qualify me as a newb.


A lurker is someone who reads but does not post. Since you had already posted in the thread I was clearly not referring to you. I may have even been referring to myself :) I suppose on a purely social board some people might find lurkers creepy, but here and on similar boards I imagine quite large numbers of people read the boards to gather information about the project, discover solutions to their technical problems, etc. without ever posting. They are lurkers and there is nothing disreputable or irresponsible about their behavior. In fact, many project boards have locked threads with titles such as "read here first" and "If you're new have a view". Who knows how many newbs come to the boards intending to post, find their answers in one of those threads and end up never posting at all. The project is actually encouraging newbs to behave as lurkers, not posters! (This is written in good humour and I hope you will read it with the same. At any rate, the term was neither directed at you nor used as an insult.)


I don't believe that your answer really addressed my question of premature task ending, since the question was why this only seems to happen at restart.


You are right and I acknowledged that in my post on the other thread. In none of your previous posts however did you mention finishes immediately following restarts (though other posters did). You only mentioned the timings and it wasn't clear to me that you understood that the information you have provided about the timings is not evidence of premature exits or stalled wus or in fact a problem of any kind.

Per Mod.Sense's request I'll make any further response in the other thread.

Snags

R.L. Casey
Joined: 7 Jun 06
Posts: 91
Credit: 2,727,582
RAC: 6
Message 55777 - Posted: 15 Sep 2008, 16:17:40 UTC
Last modified: 15 Sep 2008, 16:20:15 UTC

A minor anomaly...
The BOINC Manager button 'Show Graphics' is inhibited for new 'AA2A' Work Units when viewing a 'localhost', and I expect that is desired or required due to the stated large size of the proteins. However, if viewing the same task over a remote connection (e.g., via the default GUI RPC port 31416), the button is not inhibited. In this case, if the button is clicked, a graphics window is generated that is blank except for the lines separating the various sections of the graphic. No adverse effects on the remote task were noted. FYI.

Greg_BE
Joined: 30 May 06
Posts: 4875
Credit: 4,503,902
RAC: 933
Message 55812 - Posted: 16 Sep 2008, 18:24:44 UTC

AA2A_4_modeling_1_AA2A_1_AA2A_2RH1_align_4467_2456_0
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2950319
ERROR:: Exit from: .pack.cc line: 5278

this ran 4961.234 seconds and died

Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 55874 - Posted: 18 Sep 2008, 22:08:29 UTC

These long AA2A work units don't have checkpoints so any stop loses 3+ hours. Is this normal or is there a problem?

In the stdout file there is a checkpoint warning.

WARNING!! cant restore counts from checkpoint file:mc_checkpoint
