Message boards : Number crunching : GIANT work unit
Author | Message |
---|---|
MattDavis Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
I saw a similar thread but my problem is the opposite, so I'll put it here. I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount! On my Athlon 3500+ the work unit has been going for 5 hours, and has 7 left to go, according to BOINC. However, it's been stuck at 1% completion since last night. Should I cancel it or let it go? |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
"I saw a similar thread but my problem is the opposite, so I'll put it here." Try exiting your BOINC Manager and starting it again. This sounds like a "hang" and can be jump-started by doing this! Try this instead of aborting it. [b]"I'm trying to maintain a shred of dignity in this world." - Me[/b] |
Ib Rasmussen Joined: 27 Sep 05 Posts: 16 Credit: 211,416 RAC: 0 |
"I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount!" I had one of those on a 3.1GHz P4 under BOINC 4.45. I aborted it after 7 hours, when it still said 1% done. |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
"I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount!" Hi Ib, and welcome! These WUs can be saved by exiting the BOINC client and starting it again. So sometimes they can be saved instead of aborting them. [b]"I'm trying to maintain a shred of dignity in this world." - Me[/b] |
MattDavis Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
Thanks for the advice, Fuzzy. I hate to abort work units. I exited BOINC and started it back up again. I'll report back in the morning to see if it's still at 1%. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Matt, So far (knock wood), all that have "stuck" at 1% did successfully restart and complete ... Why they get stuck, well, I have nary a clue. They consume CPU as expected so they don't show up in BOINC View as a problem, other than having to look at the % done and if it stays 1%, well, time to kick it ... :) |
MattDavis Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
It looks like the restart fix worked. I hope this isn't a pattern, however, because I don't keep a daily watch on several of my BOINC computers. Does the "random" word in the workunit name mean anything specific? |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Yes, without order ... :) If I understood David Kim, they are using our resources to work on several different "scatter" techniques. The sub-text is that, as with LHC@Home, the project has a little "giddiness" from the wealth of resources at hand. But, in this case Rosetta@Home seems a little more aggressive (to me at least) in exploiting the wealth. The problem with all search is how to cover the area, avoid the traps of "local minima" (when looking for the minimum - if you don't hunt far enough past the trough, you can think that the local minimum is the global one ...), and yet not waste resources. I am still struggling with the input space to be searched, and again from other descriptions, it is a multi-dimensional space that we see as an output in only two dimensions. So, the "map" we have been shown is definitely not the territory. For that, I am looking for some research material that might help me visualize the input space at least ... |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,359 RAC: 10 |
I had one WU on my Mac that "stuck" at 1%. Restarted BOINC, it again went to 1% and sat. Finally aborted it. Had another that I thought was stuck, but after a couple of hours, it moved... eventually ran all night, around 8 hours, before completing. When the others are running 20 minutes to an hour, this is rather extreme variation! But, of course, when it did complete, it got a ton of credit... The only real problem with the variance in run times is the difficulty in BOINC maintaining an accurate correction factor; on the Mac for Rosetta, after the monster WU, it's at 2.236239 - backwards from the normal 0.8 or so that machine has on all the other projects... it seems the Mac has gotten quite a few "big" WUs (but not THAT big), while my Athlon has gotten a large percentage of very quick WUs. Since I only joined just over a day ago, maybe it'll all average out in time. |
madmaxventi Joined: 27 Sep 05 Posts: 5 Credit: 98,591 RAC: 0 |
Hello, my WU stands at 5h45min and 8.33%. After a restart of the BOINC Manager, the time to completion is 60h34min!! The PC is a 700 MHz AMD with Fedora Core 4. Is this normal?? Or should I reset the WU? Sorry for the bad English, I'm German ;) |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,359 RAC: 10 |
I haven't had another "giant", but I now have enough WUs completed to be puzzled/concerned about something... My AMD/Windows box has done 93 WUs, average time 1,432 seconds. (Shortest 513, longest 5002.) My Mac Mini, normally about half the speed, in that same time has done _8_ WUs, average time 13,801 seconds. That's almost TEN TIMES slower. (Shortest was 5,349, longest 27,710.) Now, either I've gotten incredibly unlucky and gotten 8 'pretty large' WUs on the Mac, while getting 93 'pretty small' WUs on the PC, or the Mac Rosetta application is pretty bad... The Mac has contributed about 40% of my credit, so that's not a big issue - but if the problem _IS_ the Mac app, and the project wants to get the most work done possible, then the Mac app needs to be looked at. I've attached a G3 laptop, just to see if it's a G4-specific issue, but that's a very slow machine anyway, the first WU won't even be done for 10 more hours... I was originally thinking that this would be a good project to run on it as the WUs were small... |
madmaxventi Joined: 27 Sep 05 Posts: 5 Credit: 98,591 RAC: 0 |
OK, the WU is finished and uploaded ;) 35,572.28 sec of work |
Divide Overflow Joined: 17 Sep 05 Posts: 82 Credit: 921,382 RAC: 0 |
"... I was originally thinking that this would be a good project to run on it as the WUs were small..." Bill, there can be dramatic variation in the time it takes to complete different WUs. Your AMD/Windows box has been getting a lot of the quick WUs and your Mac has been getting some of the longer ones. This will average out over the long run. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,359 RAC: 10 |
"I've attached a G3 laptop, just to see if it's a G4-specific issue, but that's a very slow machine anyway, the first WU won't even be done for 10 more hours..." Well, that "10-hour" WU on the G3 ran for 22 hours. :-( The AMD average for 127 WUs is 25 minutes (1,531 sec), 17.4 credits/hr even with an average credit of 7.38. The G4 average for 12 WUs is 3:46 (13,602 sec), 14.3 credits/hr, in spite of an average credit of 54.09! The G3... well, with 1 WU, it's 22:01 (79,264 sec), 3.9 credits/hr. If/when the G3 ever finishes the second WU it downloaded, I'll detach it from the project. I'll leave the G4 attached for a while with a low resource share - maybe at some point the number of "smaller" WUs will even out on it, 13,602 sec average is definitely a lot better than my first couple of WUs (27,710!), but I still think the Mac's performance is not "right" compared to other projects - I suspect there is a LOT of improvement that could be made in Rosetta's Mac application. The Mac Mini is slower than an overclocked 3700+, generally about 2-2.5X slower - but not 8-10X slower! (On SETI, optimized apps on both, the AMD avg is 3,400 sec, the Mini 8,802 sec.) My 3700 obviously loves Rosetta. I'm seriously considering making this the "main" project for that, my fastest machine... I'll record here the duration correction factor for the three machines, which is probably a fairly good indicator of the performance of the science application vs. the benchmarks... AMD 3700+ (WinXP): 0.37; Mac Mini G4 (OS X): 1.5; iBook G3 (OS X): 2.2 !!!! |
The Pirate Joined: 22 Sep 05 Posts: 20 Credit: 7,090,933 RAC: 0 |
|
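
The "stuck at 1%" symptom described in this thread (the task keeps consuming CPU while the percent done does not move) can be written down as a simple check. The Python below is only a rough sketch, not anything BOINC or Rosetta@home provides: the `Sample` record, the `looks_stuck` function, and both thresholds are hypothetical names and values chosen for illustration, and how you would collect the samples from a running client is left open.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One observation of a task (hypothetical record, not a BOINC type)."""
    cpu_seconds: float    # CPU time the task has consumed so far
    fraction_done: float  # progress in [0, 1], e.g. 0.01 for "1%"


def looks_stuck(samples: Sequence[Sample],
                min_cpu_seconds: float = 3600.0,
                min_progress: float = 0.001) -> bool:
    """Return True if CPU time grew by at least min_cpu_seconds between
    the first and last sample while progress grew by less than
    min_progress. Both thresholds are illustrative assumptions."""
    if len(samples) < 2:
        return False  # not enough data to compare
    first, last = samples[0], samples[-1]
    cpu_used = last.cpu_seconds - first.cpu_seconds
    progress = last.fraction_done - first.fraction_done
    return cpu_used >= min_cpu_seconds and progress < min_progress


if __name__ == "__main__":
    # Five hours of CPU time with progress pinned at 1%: flagged.
    print(looks_stuck([Sample(0.0, 0.01), Sample(5 * 3600.0, 0.01)]))
    # Two hours of CPU time with progress moving from 5% to 10%: not flagged.
    print(looks_stuck([Sample(0.0, 0.05), Sample(2 * 3600.0, 0.10)]))
```

A check like this only says "worth a restart"; as Tern's post shows, some work units that look stuck for a couple of hours are merely slow, so a generous CPU-time threshold is probably the safer choice.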
©2024 University of Washington
https://www.bakerlab.org