Problems with Rosetta version 5.93

csbyseti
Joined: 24 Dec 05
Posts: 11
Credit: 5,202,425
RAC: 5,894
Message 50959 - Posted: 25 Jan 2008, 8:20:08 UTC - in response to Message 50954.  

Ended by watchdog, and running beyond their runtime target are two rather different things.


I think I mean the same thing as FalconFly.
The 2h4o WUs have a problem. I restarted one WU on the quad: the CPU time jumped down to 1h:xx (last working checkpoint?) and it seemed to be running. It finished with a wrong CPU time of 11337 sec (3h:08), but with a heartbeat error.

https://boinc.bakerlab.org/rosetta/result.php?resultid=135428650

On the X2 the CPU time jumped down to 0h:0x after the restart (from 6h:59). It seems to run but doesn't make any progress until the watchdog stops it. This WU would consume 4x3h + 6h:59 = 19h of CPU time.
If such a WU is stopped and restarted by the scheduler and its CPU time is reset each time, it will be a never-ending loop.
ID: 50959 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 50962 - Posted: 25 Jan 2008, 14:32:19 UTC

If such a WU is stopped and restarted by the scheduler and its CPU time is reset each time, it will be a never-ending loop.


The watchdog will catch such a thing and abort it for you. In this case, it would notice that the task was restarted 5 times from the exact same point. In other words, "I've started this thing 5 times and never reached a checkpoint, so I'm going to abort it".

The basic idea is that whatever it is about that task is not well suited to how you are using your computer, and so the watchdog ends it, reports it back, and gets another task, which will tend to have different behavior.
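
A minimal sketch of the restart check described above, assuming the watchdog keeps a list of the checkpoints a task has resumed from. The function name, the list, and the constant are hypothetical; only the figure of 5 restarts comes from the post, and the real Rosetta watchdog may track this differently.

```python
# Hypothetical illustration; not the actual Rosetta watchdog code.
MAX_RESTARTS_FROM_SAME_POINT = 5  # figure quoted in the post above


def should_abort(resume_points):
    """Return True if the last 5 restarts all resumed from the same checkpoint.

    resume_points: checkpoint identifiers, one per restart, oldest first.
    """
    recent = resume_points[-MAX_RESTARTS_FROM_SAME_POINT:]
    if len(recent) < MAX_RESTARTS_FROM_SAME_POINT:
        # Not enough restarts yet to conclude the task is stuck.
        return False
    # All recent restarts began at the same point: no checkpoint was reached.
    return len(set(recent)) == 1
```

A task that reaches a new checkpoint between restarts breaks the run of identical resume points, so it would not be aborted under this rule.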
Rosetta Moderator: Mod.Sense
ID: 50962 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50965 - Posted: 25 Jan 2008, 17:36:45 UTC
Last modified: 25 Jan 2008, 18:25:05 UTC

Here's a snapshot of what others might be describing about the 2h4o WUs. This is on my wife's laptop, which was set to a 1 hour run time pref, but I changed it at some point last night to 6 hours (note: I changed it before I knew about this one; her laptop is WAAAAY out in the dining room, which never sees meals on the table, so I'm seldom there). Either way, we're way past that. Its longest recorded decoy (out of 912 recorded) was a 1gida which lasted 16627 seconds (4.61 hrs). I suspended the other projects to see what happens with it.



[edit] After 1 hour of run time the CPU time has progressed one hour, and the "% complete" has progressed from 98.558 to 98.664, but the "to comp" has remained unchanged.
ID: 50965 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50974 - Posted: 25 Jan 2008, 20:31:20 UTC

I can't edit after 60 min.

After 3 hours the cpu time seems right, % comp has progressed up to 98.846, and "to comp" has gone up one second to 00:09:54. Hmmm, at .1%/hour there's just 11 more hours to go making it 25 hours/decoy...gotta be a record. I'll not post again until it's nearly over. (yes...I know 98.848% seems like it'd be nearly over...LOL)
ID: 50974 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 50975 - Posted: 25 Jan 2008, 21:14:45 UTC

Astro, the estimates are based on the time to completion as compared to the target runtime... except for the final 10min to completion. They make increasingly fine adjustments to show things are still moving forward, but the client really doesn't know how long that model is going to take.

Once the model completes, your time to completion will zip from wherever it was near 10min to zero. You can't take a .1% adjustment and extrapolate that into a prediction of the final time to completion. The last 10-12min of the time to completion do not work that way. And the time prior to that is just based on the time spent, as compared to target runtime. So, until you've completed a model on that task, there really isn't a great method to arrive at a true predicted time to completion. For most tasks, which take less than an hour per model, this method works fairly well. These 6+hr per model tasks are basically the worst case for the time estimate calculations.
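
A rough sketch of the estimate as described above, assuming remaining time is simply the target runtime minus the CPU time spent, held at a floor of about 10 minutes once that difference gets small. The function and the fixed floor are simplifications introduced here; the actual client makes finer adjustments near the end that this does not model.

```python
# Hypothetical simplification of the estimate; not the actual client code.
FINAL_WINDOW_S = 600.0  # "final 10min" from the post; treated as a fixed floor


def estimated_remaining(cpu_time_s, target_runtime_s):
    """Estimated seconds to completion while the current model is still running."""
    # Before the final window: target runtime minus time already spent.
    # After that: the estimate cannot drop further until the model completes.
    return max(target_runtime_s - cpu_time_s, FINAL_WINDOW_S)
```

With a 6 hour target, one hour in gives 18000 seconds remaining, while 15 hours in still gives 600 seconds, which is why a long-running model sits near 10 minutes for hours.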
Rosetta Moderator: Mod.Sense
ID: 50975 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50976 - Posted: 25 Jan 2008, 21:28:14 UTC - in response to Message 50975.  

These 6+hr per model tasks are basically the worst case for the time estimate calculations.

I do think something's wrong.

For the last 3 hours, it's progressed .1%/hour. If that were true for the full length of the WU, and given I'm 15 hours into it, then I should only be at 1.5% complete. At some point the "% comp" had to have progressed faster, and then at some point went into slow motion mode. I'm aware of how the "to comp" works and have NO issue with that. Also, if I'd had a 1 hr, 2 hr, 3 hr, and shortly a 4 hour run time preference, then this one would have been ended by the watchdog. Of course, I'm assuming it'll finish at all. If the .1%/hour holds, then a 6 hour pref would have been ended by the watchdog (have to wait and see total run time before I can say that definitively).

I guess, if these are really that long, then admin should change the % comp mechanism, and say something about having some "unusually LARGE" WUs in the system ATM. Otherwise, you're going to get a lot of questions, and who knows how many users will "abort" just because they don't know it might be "normal".

Heck, I feel that I'm doing them a favor even running it as my gut feeling (without admin acknowledgement that this is normal) is I'm going to get nada for a days work.
ID: 50976 · Rating: 0
hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,263,150
RAC: 144
Message 50977 - Posted: 25 Jan 2008, 21:49:18 UTC

Another 2h4o oddity - I had this one, work unit 123329393:

2h4o__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK-2h4o_-native__2668_12846

that ran 12.53 hours of CPU time! My outcome says "Success" and "Done" (and I got plenty of credit for it) - but when I look at the details, I see:

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 1745555
# cpu_run_time_pref: 10800
# random seed: 1745555
# cpu_run_time_pref: 10800
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 45123.3 seconds. Greater than 4X preferred time: 10800 seconds
**********************************************************************
GZIP SILENT FILE: .xx2h4o.out

But it shows a "Validate state" of VALID. I certainly am not complaining about the credit, but how can it be done and valid if Watchdog shut it down??
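
The numbers in that log can be checked directly. A minimal sketch, assuming the watchdog's rule is exactly "CPU time greater than 4 times the run time preference", as the message states; the function name is made up for illustration.

```python
# Hypothetical check; mirrors the wording of the log, not the real source code.
WATCHDOG_MULTIPLIER = 4  # "Greater than 4X preferred time" in the log


def watchdog_limit_exceeded(cpu_time_s, run_time_pref_s):
    """True once CPU time passes 4x the preferred run time."""
    return cpu_time_s > WATCHDOG_MULTIPLIER * run_time_pref_s


# Values from the stderr output above.
print(watchdog_limit_exceeded(45123.3, 10800))  # True: the limit is 43200 s (12 h)
print(round(45123.3 / 3600, 2))                 # 12.53 hours of CPU time
```

So the 12.53 hours reported for the task is the 12 hour limit for a 3 hour preference plus a little overrun before the watchdog acted.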
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 50977 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50981 - Posted: 25 Jan 2008, 22:48:34 UTC
Last modified: 25 Jan 2008, 22:54:53 UTC

That's kind of my point. It was ended by the watchdog at 4X his/her 3 hour run pref, at 12 hours + a bit. The Task ID shows NO decoy info at all. Was any scientifically worthwhile work performed? Or is it just credit for time served?? This is going to be very typical of all participants except those with a "cpu run time pref" exceeding 8-12 hours (depending on processor, etc).

And now that I think about my one WU and my 6 hour run time, it looks like I should bump that up to the next step above 6 hrs or suffer the same fate as everyone else. It's at 99.005 percent after 16:38:00, so it is holding to the .1%/hour.

[edit] moved up to 8 hour pref
ID: 50981 · Rating: 0
hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,263,150
RAC: 144
Message 50982 - Posted: 25 Jan 2008, 23:17:18 UTC

What's the default on CPU runtime limits? I looked at my settings and target CPU runtime is "not selected", so I have whatever you get when you don't specify. Do we have a consensus on what it should be?? I'm a little confused.
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 50982 · Rating: 0
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 50983 - Posted: 25 Jan 2008, 23:35:41 UTC - in response to Message 50982.  

What's the default on CPU runtime limits? I looked at my settings and target CPU runtime is "not selected", so I have whatever you get when you don't specify. Do we have a consensus on what it should be?? I'm a little confused.

I've got mine set for 1 day; however, I think it defaults to 3 hours.
ID: 50983 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50984 - Posted: 25 Jan 2008, 23:39:27 UTC
Last modified: 25 Jan 2008, 23:41:44 UTC

Yes, "not selected" means the default of 3 hours.

Also, that's set on the "web", so your client must "call home" in order to see and apply the change. This happens when it gets/reports work. But if you want it to change in the middle of a run, you must do a "project update". You can manually update the projects from the "projects" tab on the manager. Highlight the project name in the right hand box by clicking on it. Then click the "update" button to the left.
ID: 50984 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50986 - Posted: 26 Jan 2008, 11:16:21 UTC
Last modified: 26 Jan 2008, 11:32:07 UTC

Here's an updated pic of the same WU, taken 17+ hours after the first pic posted. I changed my runtime pref to 8 last night, but that only gets me up to 32 hours before the watchdog kicks in. Perhaps I'll go to a 12 hour pref so it'll be able to finish normally, as long as it doesn't take more than "48 HOURS to do ONE decoy" on a Mobile AMD64 3700 w/1 G ram. Boy, if the others I have in cache take anywhere near this long....all my Boincsimap, Einstein, and the rest of my Rosetta will be past the deadline.

Also notice the rate of completion seems to be continually slowing (at least I assume so), since it's only progressed .3% overnight while I slept, instead of the .1%/hour I was seeing. At 28+ hours, this decoy has already taken more than 6 times its previous "longest decoy".




[Edit] I went to 12 hours "run time pref", so hopefully it'll finish in the next 19 hours.
ID: 50986 · Rating: 0
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,428,086
RAC: 0
Message 50987 - Posted: 26 Jan 2008, 12:37:22 UTC - in response to Message 50986.  



[Edit] I went to 12 hours "run time pref", so hopefully it'll finish in the next 19 hours.


You really are on a mission to find out how long it will take.
Go for it, Astro!
ID: 50987 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50988 - Posted: 26 Jan 2008, 12:48:43 UTC - in response to Message 50987.  
Last modified: 26 Jan 2008, 12:55:36 UTC



[Edit] I went to 12 hours "run time pref", so hopefully it'll finish in the next 19 hours.


You really are on a mission to find out how long it will take.
Go for it, Astro!

Someone's got to be the guinea pig. "show me the little plastic wheel, and I'll take her for a spin".

I'll take this posting opportunity for an update:

CPU time 30:21:01, 99.453% complete, 00:09:56 remaining. Using the benchmark claiming method, this WU is worth 429.75 credits so far. I wonder....
ID: 50988 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50991 - Posted: 26 Jan 2008, 13:48:16 UTC
Last modified: 26 Jan 2008, 13:48:37 UTC

Strange. hedera received 88 of his 98 claimed for his watchdog ended task resultid=135513724. I wonder what the difference was?
ID: 50991 · Rating: 0
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,428,086
RAC: 0
Message 50992 - Posted: 26 Jan 2008, 15:19:40 UTC - in response to Message 50991.  

Strange. hedera received 88 of his 98 claimed for his watchdog ended task resultid=135513724. I wonder what the difference was?


And i received 92 of 94 claimed for resultid 135481414.
I hope Astro gets more than 20 credits for his job, but it probably won't be 400+.
ID: 50992 · Rating: 0
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,428,086
RAC: 0
Message 50995 - Posted: 26 Jan 2008, 15:24:42 UTC

sorry for the triple-post. I had some problems with my connection.
ID: 50995 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50997 - Posted: 26 Jan 2008, 15:47:14 UTC - in response to Message 50995.  
Last modified: 26 Jan 2008, 16:32:51 UTC

sorry for the triple-post. I had some problems with my connection.

up to 461 credits now. LOL

Say, you do know that you can "edit" your posted messages as long as you do so within 60 min of the original post. You should see an "edit" box on each of your previous posts. You could (only if you wanna) delete everything and just put "deleted" or some other message into all but the intended one. At that point a nice moderator might come along and hide those extra posts. Anyway, just wanted you to know. Hope you enjoy the rest of the weekend

32:49:08 cpu time, 99.494% complete with 00:09:57 remaining.

[edit] made a progress chart. Given the curve, I doubt it'll ever finish.
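
One possible reading of the figures posted in this thread, offered as an assumption rather than a description of the actual client: if the displayed percentage is roughly CPU time divided by CPU time plus the remaining-time estimate, and that estimate stays pinned near 10 minutes, the percentage creeps toward 100% ever more slowly without reaching it. The helper names below are made up for this check.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def percent_complete(cpu_time_s, remaining_s):
    """Assumed display formula: time spent over time spent plus time remaining."""
    return 100.0 * cpu_time_s / (cpu_time_s + remaining_s)


# (CPU time, time remaining, reported % complete) as posted in this thread.
reported = [
    ("30:21:01", "00:09:56", 99.453),
    ("32:49:08", "00:09:57", 99.494),
]
for cpu, remaining, shown in reported:
    modelled = percent_complete(hms_to_seconds(cpu), hms_to_seconds(remaining))
    print(f"{cpu}: modelled {modelled:.3f}%, reported {shown}%")
```

Under that assumption both reported values are matched to within 0.01 percentage points, which would be consistent with Mod.Sense's point that the percentage cannot be extrapolated into a finish time.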

ID: 50997 · Rating: 0