Why no R@h work downloading? I also run SETI

Message boards : Number crunching : Why no R@h work downloading? I also run SETI

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 47470 - Posted: 6 Oct 2007, 23:41:14 UTC - in response to Message 47466.  
Last modified: 6 Oct 2007, 23:42:36 UTC

Read this wiki thread on work buffer size. Near the bottom it says:

Recommendation for Multiple Project Systems
If you are running several BOINC Powered Projects on a computer we suggest a maximum value of 2 to 4 days divided by the number of Attached Projects if you use a dial-up connection. If you have an always on Internet connection we suggest a setting of 0.1 days (2.4 hours).

There is a chart above this in the wiki.

There is also the issue of debt to SETI, in addition to buffer and connect times.
That's a whole different topic.

John McLeod VII wrote some time ago on the BOINC fora:
Settings larger than (report deadline - 1 day) / 2 are just about guaranteed to download a batch and run it to completion before downloading the next batch.
So it seems that Rosetta would work best with caches of less than 6 days; 4 days is better. And this includes the connect interval plus any additional days of cache in your preferences override.
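
As a rough illustration of the rule of thumb quoted above, here is a minimal Python sketch. The function name is made up for this example, and the 10-day deadline is the Rosetta@home figure given elsewhere in this thread, not something taken from the BOINC source.

```python
def max_cache_days(report_deadline_days: float) -> float:
    """Largest cache setting (in days) suggested by the quoted rule of thumb:
    (report deadline - 1 day) / 2. Larger settings are expected to produce
    the "download a batch, run it dry, download again" pattern."""
    return (report_deadline_days - 1.0) / 2.0


# Assumed 10-day deadline for Rosetta@home work units.
print(max_cache_days(10.0))  # 4.5
```

With a 10-day deadline the threshold comes out at 4.5 days, which fits the "4 days is better" advice.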


Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 47486 - Posted: 7 Oct 2007, 13:28:30 UTC - in response to Message 47387.  
Last modified: 7 Oct 2007, 13:49:21 UTC

"Since I/we are not alone, I would suggest that RAH put this problem on the News page and state a resolution will be available within ___ days and have a hyperlink to the solution once it is available."

Short answer, since it is not a bug, no bug-fix will be made.

Longer answer...

With the re-design of work-request and cpu-scheduling for v5.8.xx and later v5.10.xx, some changes were made to meet the following requirements:

A; If at all possible, return all work by the deadline.
B; Keep all CPUs busy.
C; Keep cache filled with enough work.

Requirement A is the most important one, both from most BOINC projects' point of view and for the user, since in most BOINC projects returning work after the deadline means no credit and the result is not used for anything.

To meet the A-requirement, some of the scheduling rules being used are:
1; If "connect every N days" > "deadline", don't assign wu at all.
2; If "expected run-time" > "deadline", don't assign wu at all.
3; If "connect every N days" > 1/2 * "time to deadline", block work-request to project.
4; If "expected run-time" > 0.9 * "time to deadline", block work-request to project.
5; For 3 & 4, add an additional "switch between project every N minutes" safety-margin. For most users, this means 1 hour.

If multi-project, one project being blocked normally doesn't mean other projects can't continue downloading work, but it is a possibility in some instances.

Blocking work-request due to #3 or #4 can be due to a single wu, or due to the sum of many wu's.

Rules #1 and #2 are checked and enforced by the scheduling server, and are only overridden if the computer doesn't have a single wu for the particular project, in which case one, and only one, wu is assigned.
Rules #3, #4 and #5 are checked and enforced by the client, and are only overridden if a CPU is idle.
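
The five rules and their overrides can be sketched in Python as below. This is only an illustration of the description in this post, not the actual BOINC code: the class and function names are invented, all times are in hours, and the way the rule 5 safety margin is added to the left-hand side of rules 3 and 4 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Hypothetical snapshot of one project's queue; all times in hours."""
    connect_every: float          # "connect every N days" preference
    expected_run_time: float      # wall-clock time still needed for queued work
    deadline: float               # full deadline of a newly issued wu
    time_to_deadline: float       # time left before the queued work is due
    switch_interval: float = 1.0  # "switch between projects" setting (rule 5)
    has_work: bool = True         # host already holds a wu for this project
    cpu_idle: bool = False


def server_refuses(state: ProjectState) -> bool:
    """Rules 1 and 2, enforced by the scheduling server."""
    if not state.has_work:
        return False  # override: one wu is still assigned
    return (state.connect_every > state.deadline
            or state.expected_run_time > state.deadline)


def client_blocks_request(state: ProjectState) -> bool:
    """Rules 3 to 5, enforced by the client."""
    if state.cpu_idle:
        return False  # override: an idle CPU always allows a request
    # Assumption: the rule 5 margin is simply added before comparing.
    rule3 = state.connect_every + state.switch_interval > 0.5 * state.time_to_deadline
    rule4 = state.expected_run_time + state.switch_interval > 0.9 * state.time_to_deadline
    return rule3 or rule4
```

For instance, `ProjectState(connect_every=12, expected_run_time=13, deadline=23, time_to_deadline=23)` is accepted by the server check but blocked by the client check, because 12 + 1 hours exceeds half of 23 hours.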

Why the rules?
#1 should be self-evident: if the client says "I won't connect again for 2 days", there is no hope that a wu with a 1-day deadline can be returned by the deadline.

#2 should also be fairly self-explanatory: if the client reports "runs 50% of the time" and a wu needs 1.5 days of cpu-time and has a 2-day deadline, the client again has no hope of returning it by the deadline, since it will need 3 days of run-time to get 1.5 days of cpu-time.

#3, on the other hand, is not immediately apparent to everyone. Say a wu has 13 hours left to run, 23 hours to deadline, and "connect every 12 hours". Based on run-time and time to deadline, there's no problem. But "connect every 12 hours" means the client will next connect in 12 hours, by which time 12 of the 13 hours of crunch-time are used. The second connection is another 12 hours away, so even though only 1 hour is left to crunch, the client will not connect again until after the deadline...
Also remember, it can be due to a sum of wu's. If, for example, a host has 10 wu's each taking 6 hours to crunch and 80 hours until deadline, with "connect every 48 hours" it still won't connect often enough to report all wu's before the deadline.
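
The numbers in the examples for #2 and #3 can be checked with a few lines of Python. The helper names are invented for this sketch, and the rule 5 safety margin is left out to keep the arithmetic the same as in the text.

```python
def wall_clock_needed(cpu_time: float, fraction_running: float) -> float:
    """Wall-clock time needed to accumulate cpu_time when the client
    only runs for the given fraction of the time."""
    return cpu_time / fraction_running


def connects_too_rarely(connect_every: float, time_to_deadline: float) -> bool:
    """Rule 3 without the safety margin: connect interval > half the time left."""
    return connect_every > 0.5 * time_to_deadline


# Rule 2: 1.5 days of cpu-time at 50% uptime against a 2-day deadline.
print(wall_clock_needed(1.5, 0.5))        # 3.0 days, more than the 2 days allowed

# Rule 3, single wu: 13 h left to crunch, 23 h to deadline, connect every 12 h.
print(connects_too_rarely(12, 23))        # True, since 12 > 11.5

# Rule 3, many wu's: 10 x 6 h = 60 h of crunching fits into 80 h,
# yet connecting every 48 h is still too rare, since 48 > 40.
print(10 * 6 <= 80, connects_too_rarely(48, 80))  # True True
```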

#4: since most wu's don't take exactly the expected run-time, and usage patterns can vary, a safety margin of 10% was added, so wu's can take 10% longer than expected and still meet the deadline.

#5 is mostly for internal client-scheduling-reasons, but normally has little influence on anything.


So, what are the effects of these rules for someone only running Rosetta@home, knowing Rosetta@home has a 10-day deadline?
#1 and #2 have little effect, even with a 10-day cache. Possibly you'll get 1 wu less than a full 10 days, but that's all.

#3, on the other hand: the moment "connect every N days" > 4.71 days, you will only ask for work when a CPU is idle. You'll be assigned many wu's at once, but will not ask again until a CPU is idle again.
With v5.10.xx, anyone with a permanent connection is recommended to use zero here, or possibly 0.01 days or so, and instead use the "additional days" setting for cache size.

#4 gives the most "interesting" pattern. With a 10-day cache setting, on day 0 you'll fill up with up to 10 days of work. After crunching 1 day, your cached work has dropped to only 9 days, less than the cache setting, so why not refill? Answer: your 9 days of cached work have 9 days until deadline, meaning you don't have the 10% "safety margin". This continues: after 2 days of crunching you've got 8 days cached with 8 days until deadline, and so on until a CPU goes idle...
So, if "cache additional days" + "connect every N days" > 8.9 days, you'll fill up to a full cache, but only refill when empty or when a CPU is idle...
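
The rule #4 pattern can be reproduced with a toy simulation. This is a simplified sketch and not the client's real logic: it assumes a single CPU crunching full time, that all work fetched on day 0 shares one 10-day deadline, and it ignores the extra one-hour margin from rule 5. The function name is hypothetical.

```python
def days_until_next_request(cache_days: float,
                            deadline_days: float = 10.0,
                            margin: float = 0.9) -> int:
    """Whole days crunched before the client is allowed to ask for work again."""
    cached = cache_days           # days of work in the cache
    to_deadline = deadline_days   # days until that work is due
    day = 0
    while cached > 0:
        blocked = cached > margin * to_deadline   # rule 4
        if cached < cache_days and not blocked:
            return day            # below the cache setting and not blocked: refill
        cached -= 1
        to_deadline -= 1
        day += 1
    return day                    # cache ran dry: the idle CPU overrides the block


print(days_until_next_request(10))  # 10: no refill until the cache is empty
print(days_until_next_request(8))   # 1: refills after the first day
```

In this simplified model the tipping point sits at 9 days; the 8.9-day figure in the post is slightly more conservative.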


For anyone running v5.10.14 or later: if you look at the Tasks tab and see one of the running tasks marked "High Priority", it means "this project is currently blocked from asking for more work".

If multi-project, one project being blocked doesn't normally mean other projects can't still give you more work, keeping your cache full. But long-term debt, resource share and so on, and deadlines in other projects, can still influence things and be the reason for no work request.

But in any case, to fix "High Priority" in Rosetta@home:
Decrease "connect every N days" to less than 4.7 days. If permanently connected, use 0.1 days or less.
Decrease "cache additional days" until "additional days" + "connect every N days" is less than 8.9 days.
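
The two limits above can be turned into a quick preference check. This is a hedged sketch: the function name is invented, and the 4.7-day and 8.9-day thresholds are simply the figures from this post for Rosetta@home's 10-day deadline.

```python
from typing import List


def rosetta_cache_warnings(connect_every_days: float,
                           additional_days: float) -> List[str]:
    """Return a warning for each limit from this post that the settings exceed."""
    warnings = []
    if connect_every_days >= 4.7:
        warnings.append('"connect every N days" should be below 4.7 days')
    if connect_every_days + additional_days >= 8.9:
        warnings.append('"connect every N days" + "additional days" '
                        'should be below 8.9 days')
    return warnings


print(rosetta_cache_warnings(0.1, 3.0))       # []
print(len(rosetta_cache_warnings(5.0, 5.0)))  # 2
```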

Now, if you're currently breaking rule #4, changing your preferences won't have any immediate effect. But hopefully, when the cache finally hits empty and you refill, you won't hit the limits a second time.

And, just to be on the safe side, my recommendation would be max 4 days for "connect every N days", but only if you really need it because you connect infrequently, and max 8 days of total cache.
Or, run multiple projects. :)


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

BorgVampire
Joined: 5 Jan 06
Posts: 3
Credit: 584,439
RAC: 0
Message 50816 - Posted: 19 Jan 2008, 2:07:47 UTC

If BOINC is not downloading work units for Rosetta because you already have a lot of work units for another project, you can force Rosetta to download more by suspending the other project for a few seconds. Rosetta will then download more work units, but your computer may not be able to complete both projects' work load on time. To solve that you could leave your computer on for longer periods; however, if you already leave your computer on 24/7 that will obviously be impossible, and you may have unfinished work units that go beyond their deadlines.

Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 50821 - Posted: 19 Jan 2008, 9:45:16 UTC

While BorgVampire's suggestion will work, it is not really good practice. When you force BOINC to download tasks like that, it takes that much longer to settle into a more predictable rhythm.

Ingleside gave a very good explanation. One thing to remember is that when you change your queue size it can take up to double the longer setting for things to settle down.
BOINC WIKI

BOINCing since 2002/12/8

BorgVampire
Joined: 5 Jan 06
Posts: 3
Credit: 584,439
RAC: 0
Message 50822 - Posted: 19 Jan 2008, 11:34:55 UTC
Last modified: 19 Jan 2008, 11:37:20 UTC

What I would love to see is BOINC deliberately slowing faster projects down so that all projects end up with an equal amount of work finished. Also, if a project goes down for a period, when it comes back up I would like BOINC to slow the other project down to allow it to catch up. That way both projects would stay on an equal footing throughout the years. Obviously there would need to be a setting in your preferences for which projects you would like this to happen with, but I think it would be fairly easy to program that in just using the results tables. At the moment I try to do this by setting Rosetta at 53% and SETI at 47%, but they are never equal for long.

Angus
Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 50827 - Posted: 20 Jan 2008, 7:18:09 UTC

It's the same old thing.

Users want and expect the projects to crunch according to resource share. If I have 3 projects with equal shares, I want each one to crunch 8 hours a day. With task switching set to 1 hour, it should dang-well stop the one it's doing every hour and switch to the next project.

Simple, easy to understand - none of this complicated horse-pucky. If a deadline don't get met, so what? There's plenty of other users wanting to download work, and it'll git done.



Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Aux10
Joined: 16 Aug 08
Posts: 1
Credit: 137,736
RAC: 0
Message 55099 - Posted: 16 Aug 2008, 20:07:05 UTC

I just added R@H through BAM account manager. I have one computer running SETI, Prime Grid, and Einstein and another just running Prime Grid. I just added a new computer to run R@H. The project was successfully attached to the computer through BAM but it is not receiving any work. The other computers are receiving and running all tasks just fine. Why not the new one?

Path7
Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 55100 - Posted: 16 Aug 2008, 20:52:14 UTC - in response to Message 55099.  

I just added R@H through BAM account manager. I have one computer running SETI, Prime Grid, and Einstein and another just running Prime Grid. I just added a new computer to run R@H. The project was successfully attached to the computer through BAM but it is not receiving any work. The other computers are receiving and running all tasks just fine. Why not the new one?

Hello Aux10,

Welcome to Rosetta@home.
Like many other R@h crunchers, you can't download workunits because the server isn't producing any WUs; see the homepage, upper right-hand corner:
Server Status as of 16 Aug 2008 20:45:33 UTC
[ Scheduler running ] Queued: 0
I don't know why no WUs are being produced; hopefully the techs will find out soon.

Have a nice day,
Path7.

DaBrat and DaBear
Joined: 9 Aug 08
Posts: 16
Credit: 213,180
RAC: 0
Message 55106 - Posted: 16 Aug 2008, 22:17:58 UTC

Learn something new every day. I was actually browsing looking for the server issues and instead found out why I got that big 'no work' message from the server after one of my comps crunched nothing but Rosie for a few days during the SETI outage... thanks so much.

RamonS
Joined: 19 Jun 08
Posts: 3
Credit: 13,195,479
RAC: 0
Message 55150 - Posted: 18 Aug 2008, 0:46:10 UTC

Hi!
Seems that the queue is still not filling up as of 8:37PM EDT on August 17 2008. I also joined other projects and some, too, have nothing in the queue (Lattice) while others work fine. Makes me believe that it is not a client issue.
Happy to have my little servers run for this cause, nobody wants to look at my website anyway. ;)

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55161 - Posted: 18 Aug 2008, 12:38:19 UTC

Ramon, all the posts from August 16-18 time frame were caused by a server problem. Others have been referred to this thread over time to explain more about how the BOINC client works in scheduling things. So, the problems over the weekend were not caused at the client. But, at other times, they can be.
Rosetta Moderator: Mod.Sense

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 55163 - Posted: 18 Aug 2008, 13:46:12 UTC - in response to Message 55161.  

Mod - the page showing the server status showed all green, yet the homepage showed 0 queued and the server page showed something like only 1000 or 10000 ready to send. If it was a server problem, then why the all green? It seems that at times the status page does not reflect issues that are seen by us trying to download work. Was the server that had the trouble not part of the list shown on the status page?


Ramon, all the posts from August 16-18 time frame were caused by a server problem. Others have been referred to this thread over time to explain more about how the BOINC client works in scheduling things. So, the problems over the weekend were not caused at the client. But, at other times, they can be.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55167 - Posted: 18 Aug 2008, 16:34:38 UTC

Greg, I am not in a position to know about the details behind the scenes to address your specific question. I know there have been problems in the past where the servers were all active, but the file server behind them was having congestion and performance issues. And in such cases new work was being produced, but not fast enough to keep up with demand.
Rosetta Moderator: Mod.Sense

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 55170 - Posted: 18 Aug 2008, 18:15:03 UTC - in response to Message 55167.  

OK, thanks for that.
Is there a 'duty watch' person as such for weekends?
Since this last problem happened on a weekend, maybe there should be someone who watches the system or the boards for such problems?

Greg, I am not in a position to know about the details behind the scenes to address your specific question. I know there have been problems in the past where the servers were all active, but the file server behind them was having congestion and performance issues. And in such cases new work was being produced, but not fast enough to keep up with demand.

WBT112
Joined: 11 Dec 05
Posts: 11
Credit: 1,382,693
RAC: 0
Message 55190 - Posted: 20 Aug 2008, 15:30:11 UTC

Hello everybody,

I guess my problem has something to do with the server problems discussed here, since it appeared at the weekend after upgrading BOINC, so I'll post it here:

I still get no work from the project.

2 other machines on the same internet connection are running fine again, but my quad core is only getting work from WCG, not from Rosetta or R@lph (it did get work before). The machine just fetches the scheduler list (whatever this is) for Rosetta and R@lph over and over again, with no result or error message.

I lowered the share for WCG, and I've even detached it and downgraded to BOINC 5, with no result.

Any ideas or suggestions?



Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55191 - Posted: 20 Aug 2008, 16:00:08 UTC

WBT, please see the description in the original post of this thread. Is your host requesting new work from Rosetta? It would be helpful if you could copy a few of the messages here.
Rosetta Moderator: Mod.Sense

WBT112
Joined: 11 Dec 05
Posts: 11
Credit: 1,382,693
RAC: 0
Message 55192 - Posted: 20 Aug 2008, 18:38:40 UTC

That's it, and this happens over and over again after the 24:00 hours (or earlier if I click the button).
I guess it may be a BOINC or connection problem, but then why are WCG and the other machines running without problems?

I'll do a clean BOINC install tomorrow and find out. Hope my description isn't too confusing.

20.08.2008 16:37:50||Starting BOINC client version 5.10.45 for windows_x86_64
20.08.2008 16:37:50||log flags: task, file_xfer, sched_ops
20.08.2008 16:37:52||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
20.08.2008 16:37:52||Data directory: C:\Program Files\BOINC
20.08.2008 16:37:53||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
20.08.2008 16:37:53||Processor features: fpu tsc pae nx sse sse2 pni
20.08.2008 16:37:53||OS: Microsoft Windows Vista: , Service Pack 1, (06.00.6001.00)
20.08.2008 16:37:53||Memory: 4.00 GB physical, 8.17 GB virtual
20.08.2008 16:37:53||Disk: 111.79 GB total, 71.31 GB free
20.08.2008 16:37:54||Local time is UTC +2 hours
20.08.2008 16:37:55|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: not assigned yet; location: (none); project prefs: default
20.08.2008 16:37:55|ralph@home|URL: http://ralph.bakerlab.org/; Computer ID: not assigned yet; location: (none); project prefs: default
20.08.2008 16:37:56||No general preferences found - using BOINC defaults
20.08.2008 16:37:56||Reading preferences override file
20.08.2008 16:37:56||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:37:56||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:37:56||Preferences limit disk usage to 9.31GB
20.08.2008 16:39:47|rosetta@home|Fetching scheduler list
20.08.2008 16:42:26||Fetching configuration file from http://boincstats.com/bam/get_project_config.php
20.08.2008 16:47:09||Fetching configuration file from http://bam.boincstats.com/get_project_config.php
20.08.2008 16:47:50||Contacting account manager at http://bam.boincstats.com/
20.08.2008 16:49:42|rosetta@home|Resetting project
20.08.2008 16:49:42|rosetta@home|Detaching from project
20.08.2008 16:49:46|ralph@home|Resetting project
20.08.2008 16:49:46|ralph@home|Detaching from project
20.08.2008 16:50:17||Account manager: BAM Host-ID: 118955
20.08.2008 16:50:17||Account manager contact succeeded
20.08.2008 16:50:17||Attaching to https://boinc.bakerlab.org/rosetta/
20.08.2008 16:50:17||Attaching to http://www.worldcommunitygrid.org/
20.08.2008 16:50:17||General prefs: from http://bam.boincstats.com/ (last modified 20-Aug-2008 16:50:41)
20.08.2008 16:50:17||Host location: none
20.08.2008 16:50:17||General prefs: using your defaults
20.08.2008 16:50:17||Reading preferences override file
20.08.2008 16:50:17||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:50:17||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:50:17||Preferences limit disk usage to 9.31GB
20.08.2008 16:50:42|http://www.worldcommunitygrid.org/|Master file download succeeded
20.08.2008 16:50:47|http://www.worldcommunitygrid.org/|Sending scheduler request: Project initialization.  Requesting 1 seconds of work, reporting 0 completed tasks
20.08.2008 16:50:57|World Community Grid|Scheduler request succeeded: got 1 new tasks
20.08.2008 16:50:57||General prefs: from World Community Grid (last modified 17-Aug-2008 11:27:09)
20.08.2008 16:50:57||Host location: none
20.08.2008 16:50:57||General prefs: using your defaults
20.08.2008 16:50:57||Reading preferences override file
20.08.2008 16:50:57||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:50:57||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:50:57||Preferences limit disk usage to 9.31GB
20.08.2008 16:50:59|World Community Grid|Started download of wcg_hcc1_img_6.06_windows_intelx86



...later:

20.08.2008 20:07:34|https://boinc.bakerlab.org/rosetta/|Fetching scheduler list

and at the same time on the 'project page' of boinc:

Waiting scheduler request, Project initialization, Communication delayed for 24:00 h (I translated this line into English, so don't take every word of the error message too literally)
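
For anyone wanting to scan a longer message log for the same symptom, here is a small Python sketch. It assumes the `timestamp|project|message` layout and the day.month.year date format seen in the log above (the date format depends on the locale), and the function names are made up for this example.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Set


class LogEntry(NamedTuple):
    when: datetime
    project: str   # empty for client-wide messages
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.strip().split("|", 2)
    when = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
    return LogEntry(when, project, message)


def stuck_on_scheduler_list(lines: Iterable[str]) -> Set[str]:
    """Projects that logged 'Fetching scheduler list' but never logged
    a successful scheduler request."""
    fetching, answered = set(), set()
    for line in lines:
        if line.count("|") < 2:
            continue  # not a log line (blank line, forum text, ...)
        entry = parse_line(line)
        if entry.message.startswith("Fetching scheduler list"):
            fetching.add(entry.project)
        elif entry.message.startswith("Scheduler request succeeded"):
            answered.add(entry.project)
    return fetching - answered


sample = [
    "20.08.2008 16:39:47|rosetta@home|Fetching scheduler list",
    "20.08.2008 16:50:57|World Community Grid|Scheduler request succeeded: got 1 new tasks",
]
print(stuck_on_scheduler_list(sample))  # {'rosetta@home'}
```

Note that the client labels a project by URL until the first successful contact, so the same project can appear under two names in one log.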

WBT112
Joined: 11 Dec 05
Posts: 11
Credit: 1,382,693
RAC: 0
Message 55205 - Posted: 21 Aug 2008, 16:15:32 UTC - in response to Message 55192.  

There was no edit button, so please excuse my double posting.

Just for your interest: reinstalling BOINC didn't work. I'll keep Rosetta in my project list and crunch for WCG with this host. Maybe it is Vista that doesn't like Rosetta, or the Vista firewall is blocking Rosetta only on this host (shared connection). Whatever it is, I don't like it.





robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,496
Message 55227 - Posted: 22 Aug 2008, 15:25:17 UTC - in response to Message 55205.  

there was no edit button so excuse my double posting please.

just for your interest: reinstalling boinc didn't work. I'll keep rosetta in my project list and crunch for wcg with this host. maybe it is vista that doesn't like rosetta or the vista firewall is blocking rosetta only on this host (shared connection). whatever it is, I don't like it.



I'm running rosetta under BOINC 5.10.45 also, under Vista SP1, but a 32-bit version. After a recent problem with rosetta not generating workunits as fast as they were downloaded, I haven't seen any such problems.

WBT112
Joined: 11 Dec 05
Posts: 11
Credit: 1,382,693
RAC: 0
Message 55488 - Posted: 2 Sep 2008, 17:49:03 UTC

In case anyone is interested in what happened in the meantime:

I crunched for WCG for about a week on this machine, and just 5 minutes ago the master file download finally succeeded (I wasn't even here)!

I still don't have any idea why, but I'll donate some of this computer's computing power to Rosetta again.

bye :)

©2024 University of Washington
https://www.bakerlab.org