Boinc Manager 6.4.5 problem

Message boards : Number crunching : Boinc Manager 6.4.5 problem


Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 58954 - Posted: 21 Jan 2009, 1:10:34 UTC
Last modified: 21 Jan 2009, 1:13:47 UTC

I know this isn't the place to bring this up, but I'm not signed up anywhere else and I'm sure someone here can point me to a solution.

I run Boinc 6.4.5 on an AMD Quad Core. Rosetta is my only project. I set my default runtime to 4 hours and all my recent WUs have finished round about that time - very well behaved.

I set my work buffer to 2.5 days to cover for the recent outages but I've only currently got 6 WUs that aren't running (over and above the 4 that are running). Over the last 4 days I've completed 50 WUs, so I ought to have something like 25-30 waiting to run by my estimation. Isn't that right?

One odd thing is that the WUs waiting to run indicate 7h 49m to completion - double the 4 hours of default. Like I said, everything's running to time so they ought to show nearer 3h 55m or so.

Is this normal? Have I done anything wrong? Or is this just a weird bug?

Any advice appreciated.

Edit: No extra seconds of work are being requested and unfilled by the project. The manager has given me what it thinks I should have, it seems.
ID: 58954 · Rating: 0
Dagorath

Joined: 20 Apr 06
Posts: 32
Credit: 29,176
RAC: 0
Message 58957 - Posted: 21 Jan 2009, 4:11:04 UTC - in response to Message 58954.  
Last modified: 21 Jan 2009, 4:13:21 UTC

It is normal, we've all seen it on our own systems. You haven't done anything wrong. It's not really a bug, it's due to inaccuracy in numbers that can only be estimated at best, and those inaccuracies multiplying into a greater inaccuracy in the 7h 49m duration estimates.

Along with each task it sends, the project sends a few numbers that are estimates of the number of CPU operations required to crunch the task. Your computer uses those numbers plus the benchmark numbers to calculate the duration estimates you see. The problem is that neither the benchmark numbers nor the operations estimates are very accurate, so the duration estimates aren't either.

BOINC uses another number called the Duration Correction Factor (DCF) for each project in an effort to correct the duration estimates based on actual run times. The DCF is a multiplier. It starts at 1 and either increases or decreases depending on whether the duration was longer or shorter than the estimate. The DCF will eventually rise/fall to where it should be and the duration estimates will become more accurate but you have to process several (many?) tasks for that to happen. Also, anytime you reset a project, the DCF resets to 1.
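
As a rough illustration of the idea above, here is a minimal Python sketch. It assumes the estimate is simply (estimated operations / benchmark speed) x DCF, and it uses a made-up update rule that moves the DCF a fixed fraction toward the observed ratio after each task. The function names, the `step` value and the sample numbers are all hypothetical; the real BOINC client's rule may well differ.

```python
def estimated_duration_hours(est_ops, benchmark_ops_per_sec, dcf=1.0):
    """Raw estimate (operations / benchmark speed), scaled by the DCF."""
    return est_ops / benchmark_ops_per_sec * dcf / 3600.0


def update_dcf(dcf, actual_hours, raw_estimate_hours, step=0.1):
    """Hypothetical rule: move the DCF a fixed fraction toward actual/raw."""
    target = actual_hours / raw_estimate_hours
    return dcf + step * (target - dcf)


# Assumed numbers: the project's figures imply 8 hours, tasks really take 4.
raw = estimated_duration_hours(2.88e13, 1e9)  # 8.0 hours with DCF = 1
dcf = 1.0
for _ in range(50):
    dcf = update_dcf(dcf, actual_hours=4.0, raw_estimate_hours=raw)

print(round(dcf, 3), round(raw * dcf, 2))
```

With these assumed numbers the DCF drifts from 1 toward 0.5, so the displayed estimate drifts from 8 hours toward 4, which is why it takes several tasks before the estimates look right.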
BOINC FAQ Service
Official BOINC wiki
Installing BOINC on Linux
ID: 58957 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58958 - Posted: 21 Jan 2009, 5:03:22 UTC

Sid, the Number Crunching board is a fine place for such a question.

The other thing is that the 6.4.5 version of the BOINC client has had some "work fetch" problems. So, that could be part of it too. It's not making estimates as well as it used to. They'll get that figured out, and the next Rosetta version will continue the consistent runtimes, and that will help DCF settle in etc. as well.

For now, you might cut your days between connections figure in half and put the other half over on the additional days of work. So the total of the two is still the same as it was. Or maybe bump it a little higher too.
Rosetta Moderator: Mod.Sense
ID: 58958 · Rating: 0
mikey

Joined: 5 Jan 06
Posts: 1894
Credit: 8,778,115
RAC: 2,646
Message 58962 - Posted: 21 Jan 2009, 10:38:17 UTC - in response to Message 58954.  

I set my work buffer to 2.5 days to cover for the recent outages but I've only currently got 6 WUs that aren't running (over and above the 4 that are running). Over the last 4 days I've completed 50 WUs, so I ought to have something like 25-30 waiting to run by my estimation. Isn't that right?
Edit: No extra seconds of work are being requested and unfilled by the project. The manager has given me what it thinks I should have, it seems.


I have been having the same thing happen and just upped my request to 4 days' worth of work and all is fine now. I did it on that PC in the Advanced, Preferences, Network Usage section in BOINC itself. That way only that PC got more work, not all of the others that seem to be doing okay.

ID: 58962 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 58963 - Posted: 21 Jan 2009, 11:40:54 UTC - in response to Message 58957.  
Last modified: 21 Jan 2009, 11:43:14 UTC

It is normal, we've all seen it on our own systems. You haven't done anything wrong. It's not really a bug, it's due to inaccuracy in numbers that can only be estimated at best, and those inaccuracies multiplying into a greater inaccuracy in the 7h 49m duration estimates.

Along with each task it sends, the project sends a few numbers that are estimates of the number of CPU operations required to crunch the task. Your computer uses those numbers plus the benchmark numbers to calculate the duration estimates you see. The problem is that neither the benchmark numbers nor the operations estimates are very accurate, so the duration estimates aren't either.

BOINC uses another number called the Duration Correction Factor (DCF) for each project in an effort to correct the duration estimates based on actual run times. The DCF is a multiplier. It starts at 1 and either increases or decreases depending on whether the duration was longer or shorter than the estimate. The DCF will eventually rise/fall to where it should be and the duration estimates will become more accurate but you have to process several (many?) tasks for that to happen. Also, anytime you reset a project, the DCF resets to 1.

Thanks, I think I get that about the DCF - I've seen it before with my old single-core machine, how it would get messed up by a long-running model, then ease its way back to a better estimate over several WUs. With the quad-core it seemed to ease itself a quarter of the way with each WU, which also makes some kind of sense.

But from my results page I can see 80 results over the last week all about the same time with literally only 3 barely over 4 hours. It should've worked its way out long ago.

Two corrections: First, the back-up WUs increased to 8 max before dropping down to 4 once the running WUs complete, then build up again. A bit better. Second, see below.

Sid, the Number Crunching board is a fine place for such a question.

The other thing is that the 6.4.5 version of the BOINC client has had some "work fetch" problems. So, that could be part of it too. It's not making estimates as well as it used to. They'll get that figured out, and the next Rosetta version will continue the consistent runtimes, and that will help DCF settle in etc. as well.

For now, you might cut your days between connections figure in half and put the other half over on the additional days of work. So the total of the two is still the same as it was. Or maybe bump it a little higher too.

I had no idea what my connections figure was, but it turns out to be 0. The only non-0, non-blank figure on the Network Usage tab is for the Work Buffer.

On the disk and memory usage tab I show 60% memory when the computer's in use and 90% when it's not. On the processor usage tab I set it (or it defaults) to use 100% of the processors on multiprocessor systems, but the figure that's now making me very suspicious is the Use at most CPU time, which is set at 50%.

Is that the default and does it make sense or would I have changed it to that? I can't recall. Should I change it now, given I never see any system slow down when I'm working here. I'm wondering whether this is causing the time to completion on WUs to be double what I expect, so the number of WUs I'm stacking is half what I expect for the number of days I set.

I have a System Monitor sidebar gadget which shows processor usage on all 4 cores which jumps up and down constantly. That seems to reflect the settings.

Oh, and yes, I do realise this makes no sense at all. If CPU time is 50% the time to completion should still be the same amount of actual crunching time, even if the wall-clock time is double.

I'll read the comments before just changing the figure to see what happens. TIA as usual.
ID: 58963 · Rating: 0
Dagorath

Joined: 20 Apr 06
Posts: 32
Credit: 29,176
RAC: 0
Message 58965 - Posted: 21 Jan 2009, 16:29:02 UTC - in response to Message 58963.  
Last modified: 21 Jan 2009, 16:41:07 UTC

Sid Celery wrote:
I had no idea what my connections figure was, but it turns out to be 0. The only non-0, non-blank figure on the Network Usage tab is for the Work Buffer.


Yes, that's the proper way to do it if your computer is connected 24/7.

On the disk and memory usage tab I show 60% memory when the computer's in use and 90% when it's not. On the processor usage tab I set it (or it defaults) to use 100% of the processors on multiprocessor systems, but the figure that's now making me very suspicious is the Use at most CPU time, which is set at 50%.

Is that the default and does it make sense or would I have changed it to that? I can't recall. Should I change it now, given I never see any system slow down when I'm working here. I'm wondering whether this is causing the time to completion on WUs to be double what I expect, so the number of WUs I'm stacking is half what I expect for the number of days I set.


The default for the "use at most __ % CPU time" is 100%. If it's 50% then you must have changed it. That setting is often referred to as the throttle. The throttle doesn't work properly on multi-core computers and it should be left at 100% for now until they fix the throttle. On multi-core systems, anything less than 100 causes tasks to crash on various projects, not necessarily on Rosetta but maybe. I think you are probably right in saying it's causing your computer to stack less than half of the tasks you expect. If you set to 100% then you might need to wait a day or 2 for BOINC to adjust to the new setting and cache the number of tasks you expect.
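
To see why a doubled duration estimate would roughly halve the cache, here is an idealised back-of-the-envelope sketch in Python. It assumes the client simply fetches enough tasks to cover the buffer at the estimated duration; the function name is hypothetical, and BOINC's real work-fetch logic takes more into account, so only the ratio between the two results is meaningful.

```python
def tasks_to_fill_buffer(buffer_days, tasks_in_parallel, est_hours_per_task):
    """Idealised count of tasks needed to keep every core busy for the buffer."""
    return buffer_days * 24 * tasks_in_parallel / est_hours_per_task


# Figures from this thread: 2.5-day buffer, 4 cores.
with_good_estimate = tasks_to_fill_buffer(2.5, 4, 3 + 55 / 60)     # 3h 55m
with_doubled_estimate = tasks_to_fill_buffer(2.5, 4, 7 + 49 / 60)  # 7h 49m

print(round(with_good_estimate), round(with_doubled_estimate))
print(round(with_good_estimate / with_doubled_estimate, 2))
```

Under this simplification, a client that believes each task needs 7h 49m asks for about half as many tasks as one that believes 3h 55m.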

As for the 60% when the computer is in use and 90% when it's not... it depends on how much RAM you have and whether or not you have the "leave apps in memory when suspended" setting set to YES, how many projects you run and the memory requirements of other apps you run while BOINC is running. I have both of those set to 100% on my computers even though they have 1 gig RAM or less BUT I don't leave apps in memory when suspended. You'll have to experiment with those to see what works best for you. Obviously the more RAM you can give to BOINC the better but if it slows your other apps down then you need to cut back on what you give to BOINC.

I have a System Monitor sidebar gadget which shows processor usage on all 4 cores which jumps up and down constantly. That seems to reflect the settings.

Oh, and yes, I do realise this makes no sense at all. If CPU time is 50% the time to completion should still be the same amount of actual crunching time, even if the wall-clock time is double.


Yes, I see what you mean and I'm not sure what to say about it other than boost the "use at most __ % CPU time" to 100% and see what happens.

P.S. What Mod.Sense says is true, BOINC 6.4.5 does have some scheduling issues, and maybe that's contributing somewhat to what you're seeing. I would stick with 6.4.5 for now and just boost the % CPU time back to 100% and give it at least 2 days to settle out. If you change too many things too fast it gets even crazier.
ID: 58965 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 58970 - Posted: 21 Jan 2009, 20:26:15 UTC - in response to Message 58965.  
Last modified: 21 Jan 2009, 20:26:57 UTC

Sid Celery wrote:
I had no idea what my connections figure was, but it turns out to be 0. The only non-0, non-blank figure on the Network Usage tab is for the Work Buffer.

Yes, that's the proper way to do it if your computer is connected 24/7.

It is. That's good.

On the disk and memory usage tab I show 60% memory when the computer's in use and 90% when it's not. On the processor usage tab I set it (or it defaults) to use 100% of the processors on multiprocessor systems, but the figure that's now making me very suspicious is the Use at most CPU time, which is set at 50%.

Is that the default and does it make sense or would I have changed it to that? I can't recall. Should I change it now, given I never see any system slow down when I'm working here. I'm wondering whether this is causing the time to completion on WUs to be double what I expect, so the number of WUs I'm stacking is half what I expect for the number of days I set.

The default for the "use at most __ % CPU time" is 100%. If it's 50% then you must have changed it. That setting is often referred to as the throttle. The throttle doesn't work properly on multi-core computers and it should be left at 100% for now until they fix the throttle. On multi-core systems, anything less than 100 causes tasks to crash on various projects, not necessarily on Rosetta but maybe. I think you are probably right in saying it's causing your computer to stack less than half of the tasks you expect. If you set to 100% then you might need to wait a day or 2 for BOINC to adjust to the new setting and cache the number of tasks you expect.

I've been fiddling in the past - it's entirely likely I messed with something I shouldn't have and I wonder now if this was the cause of some problems I had in the past. I'll set it to 100% straight away. Many thanks. I'll only report back again if I have difficulties or a slowdown I can't resolve.

As for the 60% when the computer is in use and 90% when it's not... it depends on how much RAM you have and whether or not you have the "leave apps in memory when suspended" setting set to YES, how many projects you run and the memory requirements of other apps you run while BOINC is running. I have both of those set to 100% on my computers even though they have 1 gig RAM or less BUT I don't leave apps in memory when suspended. You'll have to experiment with those to see what works best for you. Obviously the more RAM you can give to BOINC the better but if it slows your other apps down then you need to cut back on what you give to BOINC.

I wasn't so worried about my RAM settings. I mentioned that for completeness rather than anything else. I only run Rosetta - no other projects (yet) and I do leave apps in memory when suspended. I run Vista64 (not my best choice overall) but that allowed me to have 8 GB RAM (obviously good) so lack of RAM isn't a consideration, I'd hope. All points noted though - thanks again.

P.S. What Mod.Sense says is true, BOINC 6.4.5 does have some scheduling issues, and maybe that's contributing somewhat to what you're seeing. I would stick with 6.4.5 for now and just boost the % CPU time back to 100% and give it at least 2 days to settle out. If you change too many things too fast it gets even crazier.

Again noted. 6.4.5 has been very good for me, resolving several issues I previously had. I strongly suspect everything's going to come good now I've made this one change. I'm away until the weekend so I'll check the situation on Sunday.

Great help guys. Worth my while asking the question.
ID: 58970 · Rating: 0
Dagorath

Joined: 20 Apr 06
Posts: 32
Credit: 29,176
RAC: 0
Message 58972 - Posted: 21 Jan 2009, 21:01:15 UTC - in response to Message 58970.  

I run Vista64 (not my best choice overall) but that allowed me to have 8 GB RAM (obviously good) so lack of RAM isn't a consideration, I'd hope.


Let's do the math. You're running only Rosetta at the moment. 60% of 8 GB will give Rosetta 4.8 GB. If you're running a quad-core then you can have up to 8 Rosetta apps in memory if your CPU has hyper-threading. 4.8 divided by 8 allows 600 MB per task. I don't know exactly how much RAM Rosetta tasks need but I doubt it's more than 100 MB.

Just for comparison, tasks from Superlink project need up to 1 GB apiece. 8 of those would eat all your RAM. The point is, when you add projects in the future, inquire at the project about RAM requirements or watch the task in Task Manager to see what it needs for RAM. Sometimes they need only 100 MB for 90% of the run time but have spikes up to 1 GB or more for brief periods.
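
The arithmetic above can be sketched in a few lines of Python. This is only a rough budget calculation using the figures quoted in this thread and 1 GB = 1000 MB, as in the post; the function name is hypothetical and nothing here queries BOINC itself.

```python
def per_task_budget_mb(total_ram_gb, boinc_fraction, tasks_in_memory):
    """RAM available to each task if BOINC's share is split evenly."""
    return total_ram_gb * 1000 * boinc_fraction / tasks_in_memory


# 8 GB machine, 60% allowed while the computer is in use.
print(per_task_budget_mb(8, 0.60, 8))  # eight tasks in memory (hyper-threading case)
print(per_task_budget_mb(8, 0.60, 4))  # four tasks in memory (plain quad-core)
```

Comparing the per-task budget against what a project's tasks actually use in Task Manager, including brief spikes, gives a quick check on whether a new project will fit.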

ID: 58972 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 58973 - Posted: 21 Jan 2009, 21:16:30 UTC - in response to Message 58972.  
Last modified: 21 Jan 2009, 21:17:03 UTC

Let's do the math. You're running only Rosetta at the moment. 60% of 8 GB will give Rosetta 4.8 GB. If you're running a quad-core then you can have up to 8 Rosetta apps in memory if your CPU has hyper-threading. 4.8 divided by 8 allows 600 MB per task. I don't know exactly how much RAM Rosetta tasks need but I doubt it's more than 100 MB.

My machine is an AMD, which either doesn't have HT or I don't know how to turn it on - I have 4 tasks running at a time anyway.

A quick look at Task Manager shows they use around 200 MB each with maybe a 20 MB overhead on other BOINC processes - about 800 MB total. Even allowing for occasional spikes I still ought to have plenty in hand. Physical memory available is pretty constant at 4.25 GB so it looks like I'm good. If not, I'd expect a lot of people to be shouting before me!

Thanks again for the hand-holding though. I've learned a lot.
ID: 58973 · Rating: 0
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58978 - Posted: 22 Jan 2009, 2:59:48 UTC

AMD does not have HT. Intel only.

One of the things about using the local panel for settings is that if you have more than one computer each can get different settings.

The problem is that it is counterintuitive that clicking OK causes you to start using local settings instead of the Internet settings (set here in your account) ...

If you do open the pane to see what the settings are, you have to click cancel to not start using the local settings. If you want to go back to the web settings (make sure they are what you want on the web site first) click clear at the top of the pane ... then do an update to make sure you have the latest version from the web site.
ID: 58978 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 58979 - Posted: 22 Jan 2009, 7:09:17 UTC - in response to Message 58978.  

The problem is that it is counterintuitive that clicking OK causes you to start using local settings instead of the Internet settings (set here in your account)...

If you do open the pane to see what the settings are, you have to click cancel to not start using the local settings. If you want to go back to the web settings (make sure they are what you want on the web site first) click clear at the top of the pane... then do an update to make sure you have the latest version from the web site.

Ok, this has had me confused before and there seems to be some problem I have in picking up the website settings. They don't seem to be taken up, even after pressing 'clear' then updating. Not to worry - I changed them both manually (online and offline) so they're now definitely the same and correct.

It seems to be working very quickly. Within 11 hours, completion times are already at 5h 47m, down from 7h 49m, on their way to 4h. I'm up to 14 WUs total from 8-12 and obviously WUs are all finishing more rapidly too, as I can tell from my rapidly rising RAC.

All told, great results thanks to the support I've had here. Every suggestion has helped in practical terms or in my understanding. Excellent.

I'm confident enough to go off on a tour to ensure my other team-mates are set-up ok - I've probably screwed up their machines too!
ID: 58979 · Rating: 0
Dagorath

Joined: 20 Apr 06
Posts: 32
Credit: 29,176
RAC: 0
Message 58981 - Posted: 22 Jan 2009, 13:02:42 UTC - in response to Message 58979.  
Last modified: 22 Jan 2009, 13:03:07 UTC

All told, great results thanks to the support I've had here. Every suggestion has helped in practical terms or in my understanding. Excellent.


You're welcome. Your clear and concise writing style makes it easy to help you :)

I don't know if you've found your way to any of the BOINC documentation yet so I'll leave you with a few links, see the links in my sig below. The BOINC FAQ is an excellent place to start looking for solutions to problems you might run into from time to time. It also lists most of the error messages you can run into along the way. It explains what the error messages mean and how to fix the problem they point to. The wiki does a pretty good job of explaining how BOINC works, from the basics to more advanced topics, how to install and configure it, etc. Both resources are worth bookmarking in your browser.
BOINC FAQ Service
Official BOINC wiki
Installing BOINC on Linux
ID: 58981 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2003
Credit: 38,842,874
RAC: 22,979
Message 59037 - Posted: 26 Jan 2009, 14:51:15 UTC - in response to Message 58981.  

All told, great results thanks to the support I've had here. Every suggestion has helped in practical terms or in my understanding. Excellent.

You're welcome. Your clear and concise writing style makes it easy to help you :)

You're too kind. No really... Concise? If you say so! ;)

When I started with this machine my RAC was 'languishing' in the 500s. With 6.4.5 it headed up to the high 600s (fewer WUs errored out - now none). My last few days have been around 1600! That's how much of a difference it's made - a good thousand a day at the cost of a little well-directed button pressing. No need to say any more about how pleased I am.

I don't know if you've found your way to any of the BOINC documentation yet so I'll leave you with a few links, see the links in my sig below. The BOINC FAQ is an excellent place to start looking for solutions to problems you might run into from time to time. It also lists most of the error messages you can run into along the way. It explains what the error messages mean and how to fix the problem they point to. The wiki does a pretty good job of explaining how BOINC works, from the basics to more advanced topics, how to install and configure it, etc. Both resources are worth bookmarking in your browser.

I'm getting there slowly as I find time. I started with the current issues before looking through the stickies and I'll start on the documentation before long for the final tweaks. Hopefully I won't send myself up any more blind alleys this time.

A happy and appreciative customer signs off...
ID: 59037 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org