reserve wu's not downloading

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 46869 - Posted: 23 Sep 2007, 21:45:16 UTC

Someone want to explain why the scheduler is ignoring my settings?
I have "connect every 1 day" and "keep 7 days extra work".
Since my initial download of 5.80 after the server crash, the reserve keeps going down and no new work is downloaded. There are no messages saying "no work", yet I have 4.16 days of work left at a maximum 6-hour run time per work unit.

In my other thread I saw something about the scheduler downloading new work at the last minute. That should not be happening when your prefs are set for a 7-day reserve; there should always be a 7-day reserve of work, at least logically. Or does the scheduler want to work through all the work and then download new work when the 7 days are up? I don't understand the settings. Prior to the crash it always kept the reserve topped up, so that there was always a queue of work to do even during the outage. So why is 5.80 not doing that as well?

Or is all this a BOINC problem? Is 5.10.20 having a brain fart again?
ID: 46869

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46870 - Posted: 23 Sep 2007, 23:28:36 UTC
Last modified: 23 Sep 2007, 23:29:12 UTC

I'd like to see the answer to this.

I will be out of state for a week, and unfortunately the network my crunchers are on goes offline with some frequency.

I'd hate to have only 0.5 days of Rosie reserve (as is the case now) on my quad core, have the network go down on my first day away, and leave Rosie without a quad core's worth of processing for nearly a week.

I don't have much hope that the PCs will reconnect to the network without manual intervention.

So I'd really like to have/force BOINC/Rosie to provide me with 28 days (7 days * 4 cores) worth of reserves...
ID: 46870

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 46879 - Posted: 24 Sep 2007, 1:32:28 UTC

If you use R@h's general preferences, 'Connect to network about every' should be set to the number of days you want in reserve, irrespective of the number of CPUs. R@h uses the benchmarks run on each PC to figure out how many WUs to send for your cache. So 7 days should work for your quad-core CPU too, because R@h already knows how much your PC will do.
Jmarks
ID: 46879

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46880 - Posted: 24 Sep 2007, 1:44:42 UTC - in response to Message 46879.  

I have this set to "4 days", and I have less than half a day of reserve.

> If you use R@h's general preferences, 'Connect to network about every' should be set to the number of days you want in reserve, irrespective of the number of CPUs. R@h uses the benchmarks run on each PC to figure out how many WUs to send for your cache. So 7 days should work for your quad-core CPU too, because R@h already knows how much your PC will do.

ID: 46880

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 46881 - Posted: 24 Sep 2007, 2:06:37 UTC - in response to Message 46880.  
Last modified: 24 Sep 2007, 2:17:06 UTC

> I have this set to "4 days", and I have less than half a day of reserve.



If you have not changed any of your other settings, maybe something is wrong with R@h. I would try rebooting your PC.

PS: I changed my target CPU run time from 0.5 days to 1 day. Because of this my PC has not asked for work, so I do not know whether R@h is down.
Jmarks
ID: 46881

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 46884 - Posted: 24 Sep 2007, 2:31:01 UTC

I've seen several similar reports. The question always boils down to this:
"how much work did your client request from Rosetta?".

And the reason I always ask that question is that people tend to assume that Rosetta is in control of all of this, and therefore is the cause of such low/no work conditions.

I suspect that more recent and perhaps beta BOINC versions have some issues in the work scheduler. They've also been adding more overrides and controls over work fetch policies for each specific machine.

If your machine is not requesting new work from Rosetta, then the problem is with the BOINC client: its code needs to be issuing requests more frequently.

If your machine is requesting new work, but not enough seconds of work to fill out your cache, then again, the problem is with the BOINC client.

If your machine is requesting new work and gets back a "no work from project" message, then you have a Rosetta issue.
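
As a rough illustration of this triage, here is a minimal sketch that classifies lines copied from BOINC Manager's Messages tab. The exact wording of the client's log messages varies between versions, so the patterns below are illustrative assumptions, not the definitive format:

import re

def triage(log_lines):
    """Guess whether a work shortage is a client-side or project-side issue."""
    requested = None
    no_work = False
    for line in log_lines:
        # e.g. "Requesting 86400 seconds of new work" (wording varies by version)
        m = re.search(r"[Rr]equesting (\d+) seconds", line)
        if m:
            requested = int(m.group(1))
        if "No work" in line:
            no_work = True
    if requested is None:
        return "Client never asked for work: BOINC work-fetch issue"
    if no_work:
        return "Client asked, project sent none: Rosetta-side issue"
    if requested == 0:
        return "Client asked for 0 seconds: BOINC work-fetch issue"
    return "Client asked for %d seconds; check whether that fills the cache" % requested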
Rosetta Moderator: Mod.Sense
ID: 46884

Sailor
Joined: 19 Mar 07
Posts: 75
Credit: 89,192
RAC: 0
Message 46886 - Posted: 24 Sep 2007, 3:37:28 UTC

I had the same problem some days ago with my AMD 4000+ and Spinhenge.

Here is how I worked around it:

Select "Extras", then "Preferences" in the BOINC Manager. There, go into "Network usage" and look for "Additional work buffer".
Once I changed my settings there, BOINC started requesting work right away, like I wanted.

Sidenote: this will override the settings for all projects on your local PC; to me it didn't matter, as the only project I run on that machine is Spinhenge. Also, in the future the machine won't take changes made through the project's web interface unless you select "Clear" in the preferences.
http://www.MIAteam.eu
ID: 46886

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46887 - Posted: 24 Sep 2007, 4:23:50 UTC - in response to Message 46886.  
Last modified: 24 Sep 2007, 4:26:17 UTC

Thanx! I'll test for a day or two before I go away to see how this works on the quad core.

> I had the same problem some days ago with my AMD 4000+ and Spinhenge.
>
> Here is how I worked around it:

ID: 46887

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 46892 - Posted: 24 Sep 2007, 6:18:27 UTC - in response to Message 46887.  

That's what I am saying... for me, I have exactly that set in that very menu of BOINC Manager (Advanced - Preferences - Network usage: connect every 1 day, additional work 7 days), and the scheduler ignores it.

Note: I see that 5.69 gave me some work, but still not a 7-day buffer.

> Thanx! I'll test for a day or two before I go away to see how this works on the quad core.
>
> > I had the same problem some days ago with my AMD 4000+ and Spinhenge.
> >
> > Here is how I worked around it:

ID: 46892

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46899 - Posted: 24 Sep 2007, 13:50:37 UTC - in response to Message 46892.  
Last modified: 24 Sep 2007, 14:19:21 UTC

I have 5.10.13 and just tried it now.

I set a 1-day buffer, for a test.

I have 3-hour WUs set, and a quad core. In 24 hours each core can complete 8 WUs, and with 4 cores that's 32 WUs per day.

Just checked BOINC Manager: 4 WUs being crunched, 28 WUs waiting, for a total of 32.
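
The arithmetic checks out; as a minimal sketch, using the poster's own numbers (3-hour WUs, 4 cores, 1-day buffer):

def expected_queue(buffer_days, n_cores, hours_per_wu):
    # each core finishes 24 / hours_per_wu WUs per day, all cores in parallel
    wus_per_core_per_day = 24 / hours_per_wu
    return buffer_days * n_cores * wus_per_core_per_day

print(expected_queue(1, 4, 3))  # -> 32.0: matches the 4 running + 28 waiting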

Seems to be working for me.

Maybe there's a difference in this functionality between 5.10.13 and 5.10.20?

It would seem to be a BOINC Manager issue, at least relative to your installation/settings.

> That's what I am saying... for me, I have exactly that set in that very menu of BOINC Manager (Advanced - Preferences - Network usage: connect every 1 day, additional work 7 days), and the scheduler ignores it.
>
> Note: I see that 5.69 gave me some work, but still not a 7-day buffer.

ID: 46899

Sailor
Joined: 19 Mar 07
Posts: 75
Credit: 89,192
RAC: 0
Message 46903 - Posted: 24 Sep 2007, 14:07:41 UTC

Hm, I've got 5.10.20 and it works... maybe just reset Rosetta, Greg?
http://www.MIAteam.eu
ID: 46903

Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 46912 - Posted: 24 Sep 2007, 17:03:21 UTC - in response to Message 46869.  
Last modified: 24 Sep 2007, 17:13:20 UTC

> Someone want to explain why the scheduler is ignoring my settings?
> I have "connect every 1 day" and "keep 7 days extra work".
> Since my initial download of 5.80 after the server crash, the reserve keeps going down and no new work is downloaded. There are no messages saying "no work", yet I have 4.16 days of work left at a maximum 6-hour run time per work unit. [...]

Well, if I didn't mis-count, your computer currently has 34 unfinished WUs, and for a single core with a 6-hour run-time preference that means at least 8.5 days of expected run time. But it's possible you don't have 34 WUs, since Rosetta@home apparently hasn't enabled resending of "lost" WUs...

But anyway, on the assumption that something really is wrong, here are some basic debugging tips...


If you look at the Tasks tab, is the running task marked "High Priority"? If so, work requests for that task's project are blocked until it is no longer in deadline trouble. Setting "connect every N days" too large compared to the deadline is one way to be permanently in deadline trouble; that shouldn't happen if it's set to 1 day with a ~10-day deadline, but if you've got any WUs with 2-3 day or shorter deadlines, that's the likely culprit.
I'm not sure whether there are any limitations on "Cache additional N days"...

Note, for anyone else who thinks they've got this problem: "High Priority" only shows up in BOINC v5.10.14 and later.
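
A very rough sketch of the idea behind that check; this greatly simplifies the client's actual round-robin simulation, and the numbers below are made up:

def in_deadline_trouble(hours_of_work_queued, hours_until_deadline):
    # crude version of the client's check: can the queued work finish in time?
    return hours_of_work_queued > hours_until_deadline

# a 6 h WU with a 2-day (48 h) deadline is fine on its own...
print(in_deadline_trouble(6, 48))        # False
# ...but not behind a 7-day (168 h) queue on the same core
print(in_deadline_trouble(6 + 168, 48))  # True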


Are you running multiple projects? If I'm not mis-remembering, if at least n_cpu tasks are in deadline trouble, even if they're for other projects, all work requests are blocked. Also, if by chance Rosetta@home has negative long_term_debt, it won't normally ask for work.

A couple of obvious ones, but just to mention them: setting "No new work", suspending a task, or suspending network activity will also block work requests.
The wrong venue or wrong override values could also be an option, but shouldn't be for you, since you're looking at the "advanced preferences"...

Still nothing?
At the bottom of the computer's summary page there are these 4 fields:
% of time BOINC client is running
While BOINC running, % of time work is allowed
Average CPU efficiency
Result duration correction factor

Both %-values should be close to 100%, and "Average CPU efficiency" close to 1. If they are significantly less, they can block work requests.
As for "Result duration correction factor", I'm not sure what would be "correct" with a 6-hour run-time preference, but maybe somewhere around 1-2... If it's significantly higher than 2, it can again block work requests...


Another method, to take a closer look at whether BOINC has "lost its marbles": make a text file and place it in the BOINC directory. Name the file "cc_config.xml", and it should include these lines:

<cc_config>
<log_flags>
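<!-- rr_simulation: log the client's deadline simulation for queued tasks -->
<!-- work_fetch_debug: log each work-fetch decision and how much work is requested -->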
<rr_simulation>1</rr_simulation>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
</cc_config>

After saving the file, just click "Advanced / Read config file" in BOINC Manager.

Now, debug logging very quickly makes a large log, so when you're finished debugging the "easy" way is to re-edit the file, change each 1 to 0, save, and "Read config file" again. This makes it easy to re-enable debug logging later. ;)

If the log shows one or more results in "deadline trouble", the project is blocked from asking for work...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 46912

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 46917 - Posted: 24 Sep 2007, 17:26:57 UTC
Last modified: 24 Sep 2007, 17:28:00 UTC

Ahh... it is possible the high-priority thing was messing things up.
That would explain a lot of the holdup in getting work.
Now that I am home and able to check my computer, I see I NOW have enough work to keep my system busy for the 8 days I told it to request. The last download was around 12 UTC.
ID: 46917

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 46920 - Posted: 24 Sep 2007, 18:02:48 UTC

What do you guys think about the hard-disk-space preferences in the BOINC client? If you, for example, request 7 days of work but don't have enough gigabytes to store those 7 days, you may only get 4 days' worth, because your settings don't match each other and one overrules the other.
ID: 46920

Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 46923 - Posted: 24 Sep 2007, 18:20:25 UTC - in response to Message 46920.  

> What do you guys think about the hard-disk-space preferences in the BOINC client? If you, for example, request 7 days of work but don't have enough gigabytes to store those 7 days, you may only get 4 days' worth, because your settings don't match each other and one overrules the other.

With too little disk (or memory) you'll still continue asking for work, but the scheduler reply will tell you what is wrong and suggest possible solutions. It's possible you'll get a 24-hour deferral.

Now, if I'm not mis-remembering, the BOINC defaults for disk are "use at most 100 GB", "use at most 50%", and "leave at least 0.1 GB free", so as long as you keep to the defaults it shouldn't be a problem. Still, AFAIK some projects have changed the defaults...
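
As a rough sketch of how those three disk preferences combine, the defaults quoted above are the assumption here, and the most restrictive limit wins:

def disk_allowance_gb(disk_total_gb, disk_free_gb,
                      max_gb=100.0, max_pct=50.0, leave_free_gb=0.1):
    by_cap = max_gb                          # "use at most 100 GB"
    by_pct = disk_total_gb * max_pct / 100   # "use at most 50% of total"
    by_free = disk_free_gb - leave_free_gb   # "leave at least 0.1 GB free"
    return max(0.0, min(by_cap, by_pct, by_free))

# an 80 GB disk with only 3 GB free allows just 2.9 GB for BOINC
print(disk_allowance_gb(disk_total_gb=80, disk_free_gb=3))  # -> 2.9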

For memory, if you've got very little it's possible to be hit by the 90% rule introduced in v5.8.xx, but more commonly it's the 50% "while in use" limit users will see. Looking at other projects, some Predictor WUs demanded over 1 GB, so many users may have been hit by this...

And there's a fairly new message, something along the lines of "download speed too slow", from WCG on their Africa/climate WUs...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 46923

TA_JC
Joined: 7 Nov 05
Posts: 13
Credit: 6,250,245
RAC: 7,979
Message 46946 - Posted: 25 Sep 2007, 0:28:04 UTC

Well, we'll see... my box just started its last WU, and the other projects are suspended. Last time it downloaded a bunch of WUs at the end, but it hasn't kept the queue full at all, just like others are experiencing.
ID: 46946

TA_JC
Joined: 7 Nov 05
Posts: 13
Credit: 6,250,245
RAC: 7,979
Message 46947 - Posted: 25 Sep 2007, 1:47:26 UTC

Hmmm... once again, BOINC let the queue run down until I had an idle core, then it downloaded about 8 days' work. At least I got work, I guess! :P
ID: 46947

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 46977 - Posted: 25 Sep 2007, 17:13:15 UTC

The 5.80s are diminishing in my queue and the 5.69s are taking over, but Rosie is back to normal for me on filling up my queue.
ID: 46977

Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 47009 - Posted: 26 Sep 2007, 5:04:02 UTC - in response to Message 46947.  

> Hmmm... once again, BOINC let the queue run down until I had an idle core, then it downloaded about 8 days' work. At least I got work, I guess! :P

This behaviour often means the cache size is too large, at least if you're running with a large "Connect every N days"...


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 47009

Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 47050 - Posted: 26 Sep 2007, 21:38:52 UTC - in response to Message 47009.  

> > Hmmm... once again, BOINC let the queue run down until I had an idle core, then it downloaded about 8 days' work. At least I got work, I guess! :P
>
> This behaviour often means the cache size is too large, at least if you're running with a large "Connect every N days"...


DOH! Why didn't I think of this before? Deadlines here are ~11 days, so any combination of connect and extra-work settings totalling more than ~5 days can cause this kind of problem, due to the excessive queue length. Total queue length should not be more than 40% of the deadline for BOINC to work smoothly.
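
A quick check of that rule of thumb as a sketch; the ~11-day deadline and 40% ceiling are the values from this post:

def queue_is_safe(connect_days, extra_days, deadline_days=11, max_fraction=0.4):
    # total queue must stay under 40% of the project's deadline
    return connect_days + extra_days <= max_fraction * deadline_days

print(queue_is_safe(1, 7))  # 8-day queue vs a 4.4-day ceiling -> False, expect trouble
print(queue_is_safe(1, 3))  # 4-day queue -> True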
BOINC WIKI

BOINCing since 2002/12/8
ID: 47050