Posts by DaBrat and DaBear

1) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 56865)
Posted 12 Nov 2008 by DaBrat and DaBear
Post:
Rosetta/BOINC does not validate against partial results. It should.

The typical Rosetta task runs multiple decoys (each of which I believe is an *independent* simulation). I had such a task terminate because it came up with a NaN while calculating decoy 7. The results from the 6 previous, correctly completed decoys were discarded.

I looked at the 'Workunit Details' page and saw that another system was credited with successfully completing that same task. The catch: it did only 5 decoys.

There is something fundamentally unfair when ALL the work from a system that did more crunching gets discarded, while work from a system that crunched less is accepted.


I got the same thing on both machines: tasks that returned 7 decoys hit either a NaN or a validate error, though no errors were reported. Got one that ran sometime today for over 9 hours on a 3 GHz dual core with 4 GB of memory that wasn't being used for anything else. Hope I get more than 9 credits for this one.
2) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 56863)
Posted 12 Nov 2008 by DaBrat and DaBear
Post:
This appeared to run smoothly but was marked invalid.

http://boinc.bakerlab.org/rosetta/result.php?resultid=205965356


Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 871503
Report deadline 18 Nov 2008 2:48:02 UTC
CPU time 9476.889
stderr out <core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>

======================================================
DONE :: 1 starting structures 9476.44 cpu seconds
This process generated 7 decoys from 7 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 30.7157035506969
Granted credit 0
application version 1.40
3) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 56850)
Posted 11 Nov 2008 by DaBrat and DaBear
Post:
Nothing but the following... an 8-plus-hour run for 9 credits.

http://boinc.bakerlab.org/rosetta/result.php?resultid=206158806
4) Message boards : Number crunching : Tasks end prematurely. (Message 55416)
Posted 31 Aug 2008 by DaBrat and DaBear
Post:
Argh! Thanks for the correction, Mod.Sense. That "always under 3 hours" should be "almost always under 3 hours". I really should have caught that, as my older PPC Mac probably runs over its preferred runtime more than most. (Yep, just checked: my DCF is 1.18.) I was thinking about this thread as I took a walk early this morning, and by the time I got to the keyboard I was pretty focused on explaining why the "to completion" time doesn't match the estimate given by the project. On most, if not all, other projects this estimate is invisible to the cruncher, and its appearance/use here on Rosetta as the "target CPU run time" seems to lead to a fair amount of confusion. If you know about BOINC but not Rosetta, or vice versa, you could easily think something was wrong when in fact everything was operating just as it should.

WCG has some projects that run similar types of tasks, in that they run as many models as they can within a set period of time. WCG has chosen 8 hours as the magic number, and so far I have never seen one of those tasks run less than eight hours or more than a minute or so over eight hours. So either the runtime for each model is very, very, very short, or the app calls for a finish at the eight-hour mark regardless of the progress of the last model. I couldn't swear to it, but I would bet on the latter explanation. If BOINC thinks the task will take eight hours and it takes eight hours, the DCF will remain 1, and the "to completion" time (shown in the BOINC manager) for the next task will match the estimate (invisible to the cruncher) given to BOINC by the app. The "CPU time" and "to completion" columns will always (er, maybe I should say almost always!) add up to eight, and the percentage in the "progress" column will (should) always make sense. But the trade-off is the CPU time wasted on that last model before a strict task time limit forces the app to abandon it. Given the many different types of tasks here on Rosetta, and the considerable variation in runtime per model, I would guess that a strict task time limit here would result in a lot of wasted CPU time. Personally, I prefer coping with the variation.
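To make that trade-off concrete, here is a rough Python sketch comparing the two policies described above. It is purely illustrative: the per-model times are made up, and `hard_cutoff` and `fit_check` are hypothetical names, not the actual WCG or Rosetta logic. The fit-check rule (start another model only if the average model time so far still fits under the target) is an assumption about how a "run as many models as fit" app might decide when to stop.

```python
def hard_cutoff(model_hours, target):
    """Strict time limit: the model still running when `target` is reached is abandoned.

    Returns (models_kept, cpu_hours_used, cpu_hours_wasted).
    """
    elapsed = 0.0
    done = 0
    for t in model_hours:
        if elapsed + t > target:
            # The app is stopped mid-model; that partial work is thrown away.
            return done, target, target - elapsed
        elapsed += t
        done += 1
    return done, elapsed, 0.0


def fit_check(model_hours, target):
    """Flexible limit (hypothetical rule): before starting another model, check
    whether the average model time so far would still fit under `target`.

    Returns (models_kept, cpu_hours_used, cpu_hours_wasted).
    """
    elapsed = 0.0
    done = 0
    for t in model_hours:
        if done > 0 and elapsed + elapsed / done > target:
            break  # probably not enough time for another model, so finish now
        elapsed += t  # a model is never interrupted, so it may overrun the target
        done += 1
    return done, elapsed, 0.0


if __name__ == "__main__":
    even = [1.5] * 6
    print(hard_cutoff(even, 8.0))  # (5, 8.0, 0.5)
    print(fit_check(even, 8.0))    # (5, 7.5, 0.0)
```

With six 1.5-hour models and an 8-hour target, both policies keep 5 models, but the hard cutoff spends half an hour on a sixth model that gets thrown away, while the fit check simply stops at 7.5 hours. With uneven model times the fit check can also finish well early or run past the target, which is the kind of variation described above.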


Thanks for the info on checkpoints. I've always been scared of poking about in the BOINC files, but I think I'd like to try this. Give me a week or so to search through several forums for everything that could go wrong and then to build up my courage to try it anyway :)

Snags



I responded to your somewhat long-winded and eloquent post here:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4213
5) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 55415)
Posted 31 Aug 2008 by DaBrat and DaBear
Post:
So I guess the best option would be to change my time preferences to 6 hours... that way they will all download with a 6-hour completion time, and if they complete in two I'll get a new WU somewhere in that window... But wait... on the days when the server is short of work, I will simply be crunching empty space. Nah, at least CPN will get SOME crunch time. Maybe three would be a better option... oh wait, that is the default.

Right now she is running in panic mode for tasks due over a week from now.
6) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 55411)
Posted 31 Aug 2008 by DaBrat and DaBear
Post:
I believe in your previous post you called me a 'lurker'. Shame on you. I do not believe anything I have posted qualifies me as a 'lurker'... one step up from a troll. I believe that my stats, given the short period of time I have been crunching this project, qualify me as a newb.

I don't believe that your answer really addressed my question of premature task ending, since the question was why this only seems to happen at restart. And yes, I use my comp for other things than Rosie, and the last task simply completed itself at less than half the estimated time when I logged back on to Windows from Linux. Not the first time. Or are you suggesting that I just happen to need to log off when I have a short task? Those are almost lottery odds.

Further, the reason the post was made here is that, if you refer to the home page, there was mention of tasks hanging at the end of processing for this particular model, and the request was that any issues be posted 'here' in this thread.

Now this behaviour may be particular to Rosetta, but most of the time, regardless of the complexity of the model, the completion time is adjusted during crunching... say the model shows 2:540 to completion; it may not count down as quickly because of a time period overrun. This is the first series of models I have ever seen that count down as normal and then use the ten minutes of remaining time to crunch 4 hours of calculations. I believe that is the confusion with these models.

Not only that, but once these run into that hang problem, it defaults all remaining models to that finish time (my latest being 6+ hours) and causes Rosetta to go into panic mode, usurping any other projects you may be working on, while on the other hand taking days of crunching and returning WUs under the default to get the overall estimated completion time back to normal.

Since my normal completion time on this comp is about 2:46, when Rosetta gets the 6-hour blues and defaults them all to that completion time (instead of something midway or an average of completion times)... my other projects are kicked off their cores. At this rate, given the time it takes for Rosie to get the normal processing time back for the remaining WUs, nothing else will crunch on my machine if I run into a 5.98 task every two days.
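The lopsided recovery described above (one long task inflates every estimate at once, then it takes many normal tasks to bring them back down) is roughly how BOINC's duration correction factor (DCF) behaves: it is raised quickly after a task runs long and lowered only gradually after tasks finish on time or early. The Python sketch below models that under stated assumptions: the threshold (1.1) and smoothing weight (0.1) are illustrative, the 3-hour and 6-hour figures are made up, and `update_dcf` and `tasks_to_recover` are hypothetical helpers, not the BOINC client's actual code.

```python
def update_dcf(dcf, actual_hours, raw_estimate_hours):
    """Simplified duration correction factor (DCF) update.

    ASSUMPTION: only the general shape of the BOINC client's rule is modelled
    (jump up at once after a long task, drift down slowly after short ones);
    the 1.1 threshold and 0.1 weight are illustrative values.
    """
    raw_ratio = actual_hours / raw_estimate_hours          # error of the project's own estimate
    adj_ratio = actual_hours / (raw_estimate_hours * dcf)  # error of the estimate BOINC displayed
    if adj_ratio > 1.1:
        return raw_ratio                # ran long: adopt the new ratio immediately
    return 0.9 * dcf + 0.1 * raw_ratio  # ran short or on time: move only 10% of the way


def tasks_to_recover(dcf, raw_estimate_hours, actual_hours, tolerance=0.1):
    """Count on-time tasks needed before DCF is back within `tolerance` of the true ratio."""
    true_ratio = actual_hours / raw_estimate_hours
    count = 0
    while dcf - true_ratio > tolerance:
        dcf = update_dcf(dcf, actual_hours, raw_estimate_hours)
        count += 1
    return count


if __name__ == "__main__":
    # One task that "hangs" for 6 hours against a 3-hour estimate.
    dcf = update_dcf(1.0, actual_hours=6.0, raw_estimate_hours=3.0)
    print(dcf)  # 2.0: every queued 3-hour task now shows about 6 hours to completion
    # How many normal 3-hour tasks it takes before estimates look sane again.
    print(tasks_to_recover(dcf, raw_estimate_hours=3.0, actual_hours=3.0))  # 22
```

With these made-up numbers, a single 6-hour task doubles every estimate, and it takes 22 on-time tasks for the DCF to drift back within 0.1 of normal. Inflated estimates like these can also push the client into high-priority ("panic") mode for tasks that are not actually at risk of missing their deadlines.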


BTW, the 6-hour WU attempted and returned 1 decoy with no errors.
7) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 55392)
Posted 29 Aug 2008 by DaBrat and DaBear
Post:
I am having the same issue with 5.98. This is the second WU that has progressed to exactly 9:55 remaining and stayed there for over an hour. It seems to be progressing, but as the previous person said, at .001 every 30 mins. Should I even bother or just kill it?


BTW, my preferences are set at the default three hours, and it was estimated to complete in 2:46; we are now into hour 4, all of it sitting at that same 9:55. It was crunching along as expected until it hit this time mark. I killed the last one.
8) Message boards : Number crunching : Tasks end prematurely. (Message 55333)
Posted 27 Aug 2008 by DaBrat and DaBear
Post:
LOL, neither is mine, and this has nothing to do with that setting. If it is not set, it defaults to three hours, even though the first batch that loaded onto my comp said 6 hours for some odd reason. Whenever a new task is downloaded to your comp, it has an estimated time of completion, I assume based on your CPU benchmarks.

I believe the previous poster's situation is the same as mine. At 10 mins before completion, even though the task may not read 99% complete, it simply completes itself and uploads. Usually on mine it reads about 95% complete. No errors, no nothing. As a matter of fact, I usually wind up with more credit granted than claimed, and this behaviour seems to be specific to the minis, at least in my case. The task can have an estimated completion time of 2:50 but will end at 2:18 - 2:44 with less than 98% showing. Maybe it is BOINC reading that it doesn't have time for another decoy in the time remaining.
9) Message boards : Number crunching : Tasks end prematurely. (Message 55329)
Posted 27 Aug 2008 by DaBrat and DaBear
Post:
Nah, usually when I first attach, Rosie downloads tasks to my computer with ridiculously long completion times such as 6.5 hours... after a while they even out to about 2:50 a WU. It will show them all at this completion time or less, but no matter what the estimated completion time... it ends and exits prematurely.

Credits are great, usually more than I request when reported, and no errors showing.
10) Message boards : Number crunching : needs psipred_ss2 to run filters (mini 1.32 tasks) (Message 55326)
Posted 27 Aug 2008 by DaBrat and DaBear
Post:
I actually came here looking for the answer to this question myself... Maybe someone will be along shortly to enlighten us.
11) Message boards : Number crunching : Tasks end prematurely. (Message 55325)
Posted 27 Aug 2008 by DaBrat and DaBear
Post:
Not sure if this is an issue or not. I have watched several tasks get exactly down to the 10 min mark and then terminate and report. Minis as well. The tasks always start with a long completion time when I attach my comp. Even if the completion time was 6 hours to begin with, the task will sometimes hit 2:40 or less and report... not sure what that means.
12) Message boards : Number crunching : Servers running, but no work available?? (Message 55324)
Posted 27 Aug 2008 by DaBrat and DaBear
Post:
Just when you thought it was safe to go back into the water... lol. I was here when this happened before. As a matter of fact, I wound up here because of server issues with SETI. Was waiting on my comp to finish crunching its last CPN and no work from Rosie. What IS a girl to do?
13) Message boards : Number crunching : Servers running, but no work available?? (Message 55149)
Posted 17 Aug 2008 by DaBrat and DaBear
Post:
Hopefully something will give soon... I am crunching CPN on the same comp, so it's not like my comp is sitting here idle. I will be patient. Heck, my other comp is crunching SETI... lol, at least when it can lately.
14) Message boards : Number crunching : Servers running, but no work available?? (Message 55108)
Posted 16 Aug 2008 by DaBrat and DaBear
Post:
Forgive my lack of knowledge, but is the "Target CPU Runtime" preference the option that changes how much work your client will retrieve? I've only recently started running BOINC with any sort of regularity. Thanks. :)



You would adjust that setting in BOINC under 'Advanced' > 'Preferences' > 'Network usage'... this is where you set your buffer for how many days of work you would like.
15) Message boards : Number crunching : Why no R@h work downloading? I also run SETI (Message 55106)
Posted 16 Aug 2008 by DaBrat and DaBear
Post:
Learn something new every day. I was actually browsing, looking for the server issues, and instead found out why I got that big 'no work' message from the server after one of my comps crunched nothing but Rosie for a few days during the SETI outage... thanks so much.
16) Message boards : Number crunching : Servers running, but no work available?? (Message 55105)
Posted 16 Aug 2008 by DaBrat and DaBear
Post:
Shoulda checked here first... lol!!! Was wondering if CPN was keeping Rosie from sending me work... both of my boxes will be out of work within the day. OK for one, since I was going to exclusively run Rosie on the other... but the other... not so thrilled.

Maybe it's because of all of us SETI crunchers finding our way here over the past three weeks with the server issues over there... lol






©2024 University of Washington
https://www.bakerlab.org