GIANT work unit

Message boards : Number crunching : GIANT work unit

MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 1727 - Posted: 25 Oct 2005, 12:06:03 UTC

I saw a similar thread but my problem is the opposite, so I'll put it here.

I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount!

On my Athlon 3500+ the work unit has been going for 5 hours, and has 7 left to go, according to BOINC. However, it's been stuck at 1% completion since last night. Should I cancel it or let it go?

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 1728 - Posted: 25 Oct 2005, 12:09:41 UTC - in response to Message 1727.  

Try exiting your BOINC manager and starting it again. This sounds like a "hang", and a restart can often jump-start it!

Try this instead of aborting it.



"I'm trying to maintain a shred of dignity in this world." - Me

Ib Rasmussen
Joined: 27 Sep 05
Posts: 16
Credit: 211,416
RAC: 0
Message 1733 - Posted: 25 Oct 2005, 15:19:43 UTC - in response to Message 1727.  

MattDavis wrote: "I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount!"


I had one of those on a 3.1GHz P4 under BOINC 4.45. I aborted it after 7 hours, when it still said 1% done.


Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 1734 - Posted: 25 Oct 2005, 15:26:27 UTC - in response to Message 1733.  

MattDavis wrote: "I have a work unit with RANDOM in the title, but rather than taking a SMALL amount of time it's taking a HUGE amount!"

Ib Rasmussen wrote: "I had one of those on a 3.1GHz P4 under BOINC 4.45. I aborted it after 7 hours, when it still said 1% done."



Hi Ib, and welcome!

These WUs can often be saved by exiting the BOINC client and starting it again, so you don't always have to abort them.



MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 1743 - Posted: 25 Oct 2005, 19:16:34 UTC

Thanks for the advice, Fuzzy. I hate to abort work units.

I exited BOINC and started it back up again. I'll check in the morning and report back on whether it's still at 1%.

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 1755 - Posted: 25 Oct 2005, 21:10:20 UTC

Matt,

So far (knock wood), all that have "stuck" at 1% did successfully restart and complete ...

Why they get stuck, well, I have nary a clue. They consume CPU as expected so they don't show up in BOINC View as a problem, other than having to look at the % done and if it stays 1%, well, time to kick it ... :)
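For anyone who can't watch every box by hand, the "CPU keeps climbing but % done stays put" check can be roughed out in a few lines. This is only a sketch: the `Sample` record, the `looks_stuck` name and the two-hour threshold are made up for illustration, and getting the numbers out of BOINC is left to you.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a task, taken by hand or by a script (hypothetical record)."""
    cpu_hours: float      # CPU time the task has consumed so far
    fraction_done: float  # progress as reported, 0.0 to 1.0


def looks_stuck(samples, min_cpu_hours=2.0):
    """Return True if the task burned at least min_cpu_hours of CPU
    while its reported progress did not move at all."""
    if len(samples) < 2:
        return False
    last = samples[-1]
    # Walk backwards to the earliest reading that shows the same progress
    # as the most recent one.
    start = last
    for s in reversed(samples):
        if s.fraction_done != last.fraction_done:
            break
        start = s
    return (last.cpu_hours - start.cpu_hours) >= min_cpu_hours


stuck_task = [Sample(0.0, 0.0), Sample(1.0, 0.01), Sample(5.0, 0.01)]
healthy_task = [Sample(0.0, 0.0), Sample(1.0, 0.2), Sample(5.0, 0.9)]
print(looks_stuck(stuck_task), looks_stuck(healthy_task))
```

The threshold is a judgment call: set it too low and a WU that is merely slow gets flagged as hung.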

MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 1758 - Posted: 25 Oct 2005, 22:12:58 UTC

It looks like the restart fix worked. I hope this isn't a pattern, however, because I don't keep a daily watch on several of my BOINC computers.

Does the "random" word in the workunit name mean anything specific?

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 1776 - Posted: 26 Oct 2005, 7:21:33 UTC

Yes, without order ... :)

If I understood David Kim, they are using our resources to work on several different "scatter" techniques.

The sub-text is that, as with LHC@Home, the project has a little "giddiness" from the wealth of resources at hand. But in this case Rosetta@Home seems a little more aggressive (to me at least) in exploiting the wealth.

The problem with any search is how to cover the area and avoid the traps of "local minima" (when looking for the minimum, if you don't hunt far enough past the trough, you can think that a local minimum is the global one ...), and yet not waste resources.
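To make the local-minimum trap concrete, here is a toy one-dimensional sketch. It is not Rosetta's method; the function `f`, the step size and the starting points are all invented for illustration. A single downhill walk from the wrong side settles in the shallow trough, while trying several scattered starts and keeping the lowest result finds the deeper one.

```python
def f(x):
    # Toy "energy" with two troughs: a shallow one near x = 1.13
    # and the deeper, global one near x = -1.30.
    return x**4 - 3 * x**2 + x


def df(x):
    # Derivative of f, used to walk downhill.
    return 4 * x**3 - 6 * x + 1


def descend(x, step=0.01, iterations=2000):
    """Plain gradient descent: keep stepping downhill from x."""
    for _ in range(iterations):
        x -= step * df(x)
    return x


def multi_start(starts):
    """Run a descent from every start and keep the lowest point found."""
    candidates = [descend(s) for s in starts]
    return min(candidates, key=f)


single = descend(2.0)                           # trapped in the shallow trough
best = multi_start([-2.0, -1.0, 0.0, 1.0, 2.0])  # scattered starts
print(f"single start: x={single:.4f}, f={f(single):.4f}")
print(f"multi start:  x={best:.4f}, f={f(best):.4f}")
```

The cost is plain to see: five starts means five times the work, which is the "not waste resources" half of the problem.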

I am still struggling with the input space to be searched, and again from other descriptions, it is a multi-dimensional space that we see as an output in only two dimensions. So, the "map" we have been shown is definitely not the territory.

For that, I am looking for some research material that might help me visualize the input space at least ...

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 1804 - Posted: 26 Oct 2005, 19:18:10 UTC

I had one WU on my Mac that "stuck" at 1%. Restarted BOINC, it again went to 1% and sat. Finally aborted it. Had another that I thought was stuck, but after a couple of hours, it moved... eventually ran all night, around 8 hours, before completing. When the others are running 20 minutes to an hour, this is rather extreme variation! But, of course, when it did complete, it got a ton of credit...

The only real problem with the variance in run times is the difficulty in BOINC maintaining an accurate correction factor; on the Mac for Rosetta, after the monster WU, it's at 2.236239 - backwards from the normal 0.8 or so that machine has on all the other projects... it seems the Mac has gotten quite a few "big" WUs (but not THAT big), while my Athlon has gotten a large percentage of very quick WUs. Since I only joined just over a day ago, maybe it'll all average out in time.
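As a rough illustration of why one monster WU can throw the correction factor off, here is a simplified model. It assumes the factor is just a multiplier on the raw run-time estimate and that it jumps up at once after an underestimate but drifts down slowly afterwards. That update rule and the numbers are assumptions made for the sketch, not the BOINC client's actual code.

```python
def estimated_runtime(raw_estimate_sec, dcf):
    """Run-time estimate after applying the duration correction factor."""
    return raw_estimate_sec * dcf


def update_dcf(dcf, raw_estimate_sec, actual_sec, down_weight=0.1):
    """Hypothetical asymmetric update: jump up immediately when a WU ran
    longer than predicted, ease down by down_weight when it ran shorter."""
    ratio = actual_sec / raw_estimate_sec
    if ratio > dcf:
        return ratio
    return dcf + down_weight * (ratio - dcf)


dcf = 0.8
dcf = update_dcf(dcf, raw_estimate_sec=1000, actual_sec=2000)  # one long WU
after_giant = dcf
dcf = update_dcf(dcf, raw_estimate_sec=1000, actual_sec=1000)  # one normal WU
after_normal = dcf
print(after_giant, after_normal, estimated_runtime(1000, after_normal))
```

Under a rule like this, a single giant WU inflates every later estimate, and it takes many ordinary WUs to pull the factor back down.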

madmaxventi
Joined: 27 Sep 05
Posts: 5
Credit: 98,591
RAC: 0
Message 1822 - Posted: 27 Oct 2005, 6:15:09 UTC

Hello, my WU is at 5h45min and 8.33%.
After restarting the BOINC manager, the time to completion shows 60h34min!!

The PC is a 700 MHz AMD with Fedora Core 4.
Is this normal?

Or should I reset the WU?

Sorry for the bad English, I'm German ;)
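For a sense of where a figure of that size comes from, here is a naive straight-line extrapolation from elapsed time and percent done. It is only a sketch and may not be how BOINC arrives at its own number; the function name is made up.

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Straight-line guess: assume the rest of the WU runs at the same
    pace as the part already completed."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# 5h45min elapsed at 8.33% done
guess = remaining_hours(5.75, 0.0833)
print(f"about {guess:.1f} hours left")
```

A guess like this is only as good as the assumption that progress is linear, which these WUs clearly don't always follow.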

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 1829 - Posted: 27 Oct 2005, 10:20:14 UTC

I haven't had another "giant", but I now have enough WUs completed to be puzzled/concerned about something...

My AMD/Windows box has done 93 WUs, average time 1,432 seconds. (Shortest 513, longest 5002.)

My Mac Mini, normally about half the speed, in that same time has done _8_ WUs, average time 13,801 seconds. That's almost TEN TIMES slower. (Shortest was 5,349, longest 27,710.) Now, either I've gotten incredibly unlucky and gotten 8 'pretty large' WUs on the Mac, while getting 93 'pretty small' WUs on the PC, or the Mac Rosetta application is pretty bad...

The Mac has contributed about 40% of my credit, so that's not a big issue - but if the problem _IS_ the Mac app, and the project wants to get the most work done possible, then the Mac app needs to be looked at. I've attached a G3 laptop, just to see if it's a G4-specific issue, but that's a very slow machine anyway, the first WU won't even be done for 10 more hours... I was originally thinking that this would be a good project to run on it as the WUs were small...

madmaxventi
Joined: 27 Sep 05
Posts: 5
Credit: 98,591
RAC: 0
Message 1830 - Posted: 27 Oct 2005, 10:59:44 UTC

OK, the WU is finished and uploaded ;)
35,572.28 sec of work.


Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 1842 - Posted: 27 Oct 2005, 15:48:25 UTC - in response to Message 1829.  

Tern wrote: "... I was originally thinking that this would be a good project to run on it as the WUs were small..."

Bill, there can be dramatic variation in the time it takes to complete different WUs. Your AMD/Windows box has been getting a lot of the quick WUs and your Mac has been getting some of the longer ones. This will average out over the long run.

Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 1921 - Posted: 29 Oct 2005, 19:10:00 UTC - in response to Message 1829.  

Tern wrote: "I've attached a G3 laptop, just to see if it's a G4-specific issue, but that's a very slow machine anyway, the first WU won't even be done for 10 more hours..."


Well, that "10-hour" WU on the G3 ran for 22 hours. :-(

The AMD average for 127 WUs is 25 minutes (1,531 sec), 17.4 credits/hr even with an average credit of 7.38. The G4 average for 12 WUs is 3:46 (13,602 sec), 14.3 credits/hr, in spite of an average credit of 54.09! The G3... well, with 1 WU, it's 22:01 (79,264 sec), 3.9 credits/hr.

If/when the G3 ever finishes the second WU it downloaded, I'll detach it from the project. I'll leave the G4 attached for a while with a low resource share - maybe at some point the number of "smaller" WUs will even out on it, 13,602 sec average is definitely a lot better than my first couple of WUs (27,710!), but I still think the Mac's performance is not "right" compared to other projects - I suspect there is a LOT of improvement that could be made in Rosetta's Mac application. The Mac Mini is slower than an overclocked 3700+, generally about 2-2.5X slower - but not 8-10X slower! (On SETI, optimized apps on both, the AMD avg is 3,400 sec, the Mini 8,802 sec.)
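For anyone who wants to check the arithmetic, the credits/hr and slowdown figures can be reproduced from the averages quoted in this post. A small sketch; the function and variable names are just for illustration.

```python
def credits_per_hour(avg_credit, avg_seconds):
    """Average credit per WU divided by average run time, scaled to one hour."""
    return avg_credit / avg_seconds * 3600


amd = credits_per_hour(7.38, 1531)    # AMD 3700+: about 17.4 credits/hr
g4 = credits_per_hour(54.09, 13602)   # Mac Mini G4: about 14.3 credits/hr

# How many times longer the Mini takes per WU than the AMD, per project.
rosetta_slowdown = 13602 / 1531
seti_slowdown = 8802 / 3400

print(f"AMD {amd:.1f}/hr, G4 {g4:.1f}/hr")
print(f"Rosetta {rosetta_slowdown:.1f}x, SETI {seti_slowdown:.1f}x")
```

The per-WU time ratio on Rosetta is skewed by the mix of big and small WUs each machine happened to get, so credits/hr is probably the fairer comparison of the two.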

My 3700 obviously loves Rosetta. I'm seriously considering making this the "main" project for that, my fastest machine...

I'll record here the duration correction factor for the three machines, which is probably a fairly good indicator of the performance of the science application vs. the benchmarks...

AMD 3700+ (WinXP) - 0.37
Mac Mini G4 (OS X) - 1.5
iBook G3 (OS X) - 2.2 !!!!

The Pirate
Joined: 22 Sep 05
Posts: 20
Credit: 7,090,933
RAC: 0
Message 1964 - Posted: 31 Oct 2005, 1:15:10 UTC

I've had a couple that stuck at 1%. Stopping BOINC and restarting has worked for me.

©2024 University of Washington
https://www.bakerlab.org