Message boards : Number crunching : 600,000 second/165 Hour/7 day WU!!!
Author | Message |
---|---|
cloaked_chaos Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
I was looking through my recent BOINC history because I haven't checked it in a while, and noticed that one of the WUs took almost 600,000 seconds to complete!!! It stopped with a "max CPU time exceeded" error. This WU should have stopped LONG before the almost 7 whole days it took. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=8103040 The weirdest part about it is that another person completed this WU in 2,434.05 seconds with no error. I would like to know if I am going to get the 2,175.86 credit that was claimed for this WU. :( |
Keck_Komputers Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0 |
I don't know why this happened, but the task errored out so there will be no credit granted. BOINC WIKI BOINCing since 2002/12/8 |
cloaked_chaos Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
Can I please get a professional opinion on this? I would really like to know why it happened. |
Grutte Pier [Wa Oars]~MAB The Frisian Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
It looks like an MCTE (max CPU time exceeded) problem, so I assume you could report it here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1008 Perhaps it is time for D.B. to report something about this problem? Credits or not? |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
It was most likely the 1% bug. It can also happen if you do not keep the app in memory when preempted and your client changes projects before the work unit is able to make its first prediction. If this was the case, you can prevent it by setting "Leave applications in memory while preempted?" to yes in your general preferences, or by setting "Switch between applications every" to at least two hours. I'll grant credit for this extreme circumstance. We may consider granting credit for all time out errors in the future. |
cloaked_chaos Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
"I'll grant credit for this extreme circumstance. We may consider granting credit for all time out errors in the future." Thank you very much. |
Grutte Pier [Wa Oars]~MAB The Frisian Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
|
cloaked_chaos Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
I wonder, is there any way to tell whether or not I have the record for longest WU? |
Darren Joined: 6 Oct 05 Posts: 27 Credit: 43,535 RAC: 0 |
"Had it on a machine only running R@H 24/7 (no switching or something else) and still don't know why it happened." Keep in mind that if you don't leave applications in memory, even with only one project they still get removed when BOINC automatically runs the benchmarks. Perhaps that's what got you on that one. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"Had it on a machine only running R@H 24/7 (no switching or something else) and still don't know why it happened." This is correct. As for having the record for the longest WU, I am afraid not. There have been larger ones. Usually this happens only on "launch and forget" systems. Systems that are attended do not often have this problem, because people intervene. The new application should help prevent this. Moderator9 ROSETTA@home FAQ Moderator Contact |
cloaked_chaos Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
"I'll grant credit for this extreme circumstance. We may consider granting credit for all time out errors in the future." I am really wondering why I never actually received credit for this work unit, even after being promised it would be granted to me... |
James Joined: 27 Mar 06 Posts: 4 Credit: 23,809 RAC: 0 |
"I'll grant credit for this extreme circumstance. We may consider granting credit for all time out errors in the future." Change your max timeout settings, perhaps using tux's XML script (not the OPTIMIZED client, the 'calibration' client that won't artificially inflate your benchmarks) that comes with his BOINC client. This should have been 'killed' way before 600k seconds. For example, Rosetta runs 120 minute work units. I 'kill' all WUs that do not complete after 145 minutes. You can 'tweak' your preferences:) As for the credit issue, I have sympathy because I have participated in the climate projects and had unrecoverable errors at 50+ percent (you know, the MASSIVE, as in WEEKS/MONTHS, WUs). I did get credit though. Change your settings so it doesn't happen again. This part isn't addressed to you: Credit should be granted for 'real' processor usage. Rosetta, unlike say Einstein, does not calibrate WU times. It's getting to be pretty sickening in general because there are 3800s/2.+ GHz machines that are claiming massive amounts of credit based upon unreal benchmarks. I overclock my 4800 from a stock 2.4 GHz to 2.7 GHz for each core, and I know that a 3800 can't get 3 times my floating-point and integer scores :) The same is true for the 2 GHz Intels that are doing the same thing. I'm not necessarily upset about the 'cheating', but it encourages others to do the same and it creates almost amusing benchmarks on the top computers pages. The 1 percent error is annoying; so is the fact that Rosetta has yet to incorporate a calibration feature like, say, Einstein's that grants credit where credit is deserved, not manipulated artificially. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
"For example, Rosetta runs 120 minute work units. I 'kill' all WUs that do not complete after 145 minutes. You can 'tweak' your preferences:)" James, Rosetta has adjustable run-time WUs, where it keeps creating as many new "predicted models" from the same "raw protein data" as will fit in your time settings. Currently, the default is only 2 hr, but the max WU runtime is 24 hr (and used to be 4 CPU days = 96 hours, before the project reduced it so they can give a time-to-live of 24 hr for every WU). While 120 min is the default, many people run every WU for much longer, e.g. I use 8 hr myself. Also, I have to add that I've been crunching Rosetta on 3 P4 PCs for the past 3 months and I've had just ONE case of the 1% bug so far on WinXP (plus some problems 2.5 months ago on a massively underspec'ed Linux machine, which have since been solved).
Since you keep mentioning Einstein as a model to follow, where did you read that they do this kind of calibration? (Web address please.) My BOINC massively underclaims credit for Einstein (using akosf's app, my PCs complete a WU in 1/4th of the time it used to take). From looking at my results, Einstein just uses a quorum of 3 and grants the credit of the middle claim, e.g. wu6428418. My BOINC's claim was for 13.99 credits, someone else's was 56, and all 3 of us received the middle one of 41 credits. A project which uses a quorum of 3, 4, etc. is effectively slashing the usable CPU speed to 1/3rd or 1/4th of the donated CPU speed. I see this as an ultimate waste of donated resources and personally have stopped crunching for projects which did this just to appease credit-obsessed people, unless there was a valid science reason. Anyway, afaik the "credit calibration" feature you mentioned is used in SETI-Beta and I hope Rosetta and other projects will use it as soon as it goes mainstream. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Michael Kirberger Joined: 16 Dec 05 Posts: 1 Credit: 11,041 RAC: 0 |
Hello, I think I have such a WU, too. After 2:28 h only 1.40% of the work is done. If there is no faster progress, I will need between 500 and 600 h to complete this WU (I think yesterday I had 1.56% after 5 h, but I did not reach a checkpoint, so the work started again; now I will run the computer 24 h until this WU is done). The WU is 7486_largescale_large_fullatom_relax_dec7486_1_02_9.pdb_435_36_0. Should I compute or abort this WU? If I compute, I want the credits for this work :-). Bye Michael Kirberger |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
Hello, No, you don't have to abort this WU. Look in this thread about these big molecules. It will run for a long time at about 1.5 - 2% and then it will finish in a snap. You'll need to let it stay in memory and not shut your computer down while running it. I did that myself last night, and now I'm back to zero with the one I have in my cache at the moment. :-( "I'm trying to maintain a shred of dignity in this world." - Me |
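
Dimitris Hatzopoulos's post above describes what he observed in his own Einstein results: a quorum of 3 where every result is granted the middle of the claimed credits. A minimal Python sketch of that rule follows, assuming the granted value is simply the median of an odd number of claims. The function name `granted_credit` is hypothetical and this is not project code; the claim of 41 is the "middle one" quoted in the post.

```python
from statistics import median


def granted_credit(claims):
    """Return the middle (median) claim of a quorum.

    Hypothetical helper: assumes an odd number of claims, as in the
    quorum of 3 described in the post, so that a single middle claim exists.
    """
    if len(claims) % 2 == 0:
        raise ValueError("need an odd number of claims to have a middle one")
    return median(claims)


# Claims quoted in the post: 13.99, 56, and the middle claim of 41.
claims = [13.99, 56.0, 41.0]
print(granted_credit(claims))
```

Under this rule a single very low or very high claim does not change what is granted, which is why the 13.99 claim in the post was still awarded 41 credits.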
©2024 University of Washington
https://www.bakerlab.org