too many WUs downloaded

Message boards : Number crunching : too many WUs downloaded

mikus

Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 11410 - Posted: 26 Feb 2006, 5:04:10 UTC - in response to Message 11371.  
Last modified: 26 Feb 2006, 5:28:12 UTC

How these applications work out in practice is largely a mystery to me.
The only difference between the Linux and the other versions is that there are no graphics for Linux yet.

All this time I was running with (Linux) 4.80 (which did *not* support "Target CPU run time"). After ABORTING the past-expiration WUs, I had to wait until __ALL__ work queued on my system (including one stray WU with a complete-by date in March) had been processed by my system AND had been reported to the server -- only THEN did the project start downloading any new work (or 4.81 itself) to me!!

[To avoid another "flood" of WUs, as soon as the request (for 256200 seconds of work - that's what I currently have specified in my General preferences) was sent to the server, I *manually* clicked in BOINC Manager for: "No more work from Rosetta".] I'm now watching 9 WUs being downloaded (preceded by 4.81). [That would be correct if each ran for 8 hours. But before today I had already set my "Target CPU run time" value in the Rosetta preferences to 10 hours. Oh, well! Just another example of unexpected behavior to keep in mind.] I __hope__ I caught the client in time, and NO MORE seconds of work will be requested today. It is taking *effort* to juggle the application's actions to avoid another "too many WUs downloaded" situation.


--------
P.S. What I feared in my initial post to this thread did happen -- I __did__ get OODLES of 'client errors' attributed to me (for all those "too many WUs downloaded" WUs that I then had to manually ABORT after they had expired).
.
ID: 11410
Terminal*

Joined: 23 Nov 05
Posts: 6
Credit: 7,845,878
RAC: 0
Message 11437 - Posted: 27 Feb 2006, 0:59:57 UTC

So isn't it bad for that unit of work if you tell it to finish in 2 hours? Wouldn't it be more thorough if it went for as long as it can... like SETI... when it finishes, it finishes. Here you can actually tell it WHEN to finish? What's the point in that? Shouldn't it just run until it's finished?
ID: 11437
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11443 - Posted: 27 Feb 2006, 4:34:13 UTC - in response to Message 11410.  

... [To avoid another "flood" of WUs, as soon as the request (for 256200 seconds of work - that's what I currently have specified in my General preferences) was sent to the server, I *manually* clicked in BOINC Manager for: "No more work from Rosetta".] I'm now watching 9 WUs being downloaded (preceded by 4.81). [That would be correct if each ran for 8 hours. But before today I had already set my "Target CPU run time" value in the Rosetta preferences to 10 hours. Oh, well! Just another example of unexpected behavior to keep in mind.] I __hope__ I caught the client in time, and NO MORE seconds of work will be requested today. It is taking *effort* to juggle the application's actions to avoid another "too many WUs downloaded" situation.


--------
P.S. What I feared in my initial post to this thread did happen -- I __did__ get OODLES of 'client errors' attributed to me (for all those "too many WUs downloaded" WUs that I then had to manually ABORT after they had expired).
.



I have updated the FAQ on the time setting to try to better explain how this all works in practice. But the system MAY have got it right by giving you only 9 WUs. Remember, some of them may run longer than the time setting you have, because they will run to the completion time you asked for, but they will also always complete at least one model. So if your settings would allow 9.9 models to complete, it will run shorter than you expect. But if one model would take 10.5 hours it will run longer.
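
A minimal sketch of that rule, assuming a fixed time per model (the real run length depends on the particular models in the WU):

    # Simplified illustration of the run-time rule described above; not the
    # actual Rosetta code. A WU runs whole models up to the target CPU time,
    # but it always completes at least one model.

    def estimated_runtime_hours(target_hours: float, hours_per_model: float) -> float:
        """Approximate CPU hours a WU runs for a given target setting."""
        if hours_per_model >= target_hours:
            # A single model exceeds the target: the WU still completes it.
            return hours_per_model
        # Otherwise run whole models, stopping before the target would be passed.
        whole_models = int(target_hours // hours_per_model)
        return whole_models * hours_per_model

    print(estimated_runtime_hours(10, 1.01))  # ~9.09 h: shorter than the 10 h target
    print(estimated_runtime_hours(10, 10.5))  # 10.5 h: one large model runs longer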

Assuming you complete a number of WUs OK, the quota will rise very quickly.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 11443
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 11444 - Posted: 27 Feb 2006, 4:42:24 UTC - in response to Message 11437.  

Shouldn't it just run until it's finished?


Regardless of the max CPU time setting, it'll always finish at least one model, even if it's a large WU that takes 10 hours on your machine and your max CPU time is set to less than that. The WU will get passed out until the project gets its 10,000 or so models back.

The larger the max CPU time setting, the less frequent the communications with the project servers, and the less likely the system will be overwhelmed with communication requests -- as happened when lots of 15-minute WUs were handed out and we became, in effect, a distributed denial of service attack on the project servers.
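
A rough, illustrative calculation of that trade-off (the numbers are assumptions, not project data): with a CPU crunching around the clock, shorter target run times mean proportionally more WU completions and therefore more scheduler contacts per day.

    # Illustrative only: how the target run time scales the number of WU
    # completions (and hence scheduler contacts) per CPU per day.

    def wu_completions_per_day(cpu_hours_per_day: float, target_hours_per_wu: float) -> float:
        return cpu_hours_per_day / target_hours_per_wu

    print(wu_completions_per_day(24, 0.25))  # 15-minute WUs: ~96 per CPU per day
    print(wu_completions_per_day(24, 8.0))   # 8-hour WUs: 3 per CPU per day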
ID: 11444
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11445 - Posted: 27 Feb 2006, 4:58:04 UTC - in response to Message 11444.  

Shouldn't it just run until it's finished?


Regardless of the max CPU time setting, it'll always finish at least one model, even if it's a large WU that takes 10 hours on your machine and your max CPU time is set to less than that. The WU will get passed out until the project gets its 10,000 or so models back.

The larger the max CPU time setting, the less frequent the communications with the project servers, and the less likely the system will be overwhelmed with communication requests -- as happened when lots of 15-minute WUs were handed out and we became, in effect, a distributed denial of service attack on the project servers.


That is a fair comment, and a reasonable statement of the "possibilities". But you have to consider that what the project is trying to do is accommodate people who want large queues but have low bandwidth, while at the same time allowing people who want short queues and have no bandwidth issues to run that way as well. While it may not be perfect right now, there have been significant changes in the right direction. From what I can see the Max time error issue is gone as a routine problem and is now rare. The number of actual 1% hangs is far less now than only a few weeks ago, and people are beginning to understand how to tell whether they actually have one or not.

There will always be sporadic errors on any project; that is the nature of the game. But this all has to be taken in the context of 40,000 users running the project versus how many errors are reported.

But your basic premise is correct: all WUs will run for at least 2 hours (the lowest setting available) and can run for up to 4 days (the highest setting), and one way or another the project will get its 10,000 results.

They are watching very closely now for early signs of the kind of problem that occurred in December, and they will work to head off trouble as they just did about a week ago. Now they have the Ralph test project to try fixes offline, so testing is no longer done in the production system. They will get all of these issues taken care of, but it will take a little time.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 11445
mikus

Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 11450 - Posted: 27 Feb 2006, 8:09:24 UTC - in response to Message 11444.  

The larger the max CPU time setting, the less frequent the communications with the project servers, and the less likely the system will be overwhelmed with communication requests -- as happened when lots of 15-minute WUs were handed out and we became, in effect, a distributed denial of service attack on the project servers.

I have no argument with what you are saying. Yes, allowing longer-running WUs *does* mean that participants download from the server less frequently.

But it will take a client change to overcome the problem for which this thread was originally opened -- when downloads are SLOW, the current client can __re-request__ (more) work before the download of the *earlier-requested* work has completed. In that case, the second request (plus any follow-on requests) results in TOO MANY WUs DOWNLOADED, no matter what the preference settings are.
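
A minimal sketch of the kind of client change being described, assuming the client could count earlier-requested work that is still in transfer (illustrative only, not the actual BOINC work-fetch code):

    # Sketch: request only the shortfall after counting in-flight downloads,
    # so a slow connection does not trigger repeated work requests.

    def seconds_to_request(queue_target_s: float,
                           ready_to_run_s: float,
                           downloading_s: float) -> float:
        """Ask for work only if ready-to-run plus in-transfer work falls short."""
        shortfall = queue_target_s - (ready_to_run_s + downloading_s)
        return max(0.0, shortfall)

    # Hypothetical split: 256200 s already requested, most still downloading.
    print(seconds_to_request(256200, 30000, 226200))  # 0.0 -> no re-request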
.
ID: 11450
mikus

Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 11451 - Posted: 27 Feb 2006, 8:31:09 UTC - in response to Message 11443.  
Last modified: 27 Feb 2006, 8:35:07 UTC

To avoid another "flood" of WUs, as soon as the request (for 256200 seconds of work - that's what I currently have specified in my General preferences) was sent to the server, I *manually* clicked in BOINC Manager for: "No more work from Rosetta". I'm now watching 9 WUs being downloaded (preceded by 4.81). That would be correct if each ran for 8 hours. But before today I had already set my "Target CPU run time" value in the Rosetta preferences to 10 hours. Oh, well! Just another example of unexpected behavior to keep in mind.

I have updated the FAQ on the time setting to try to better explain how this all works in practice. But the system MAY have got it right by giving you only 9 WUs. Remember, some of them may run longer than the time setting you have, because they will run to the completion time you asked for, but they will also always complete at least one model. So if your settings would allow 9.9 models to complete, it will run shorter than you expect. But if one model would take 10.5 hours it will run longer.

Assuming you complete a number of WUs OK, the quota will rise very quickly.

You did not catch what I was complaining about:
- The client asked for 256200 seconds (about 72 hours) of work. (correct)
- The SERVER sent 9 WUs (as though each would run for 8 hours; 9 WUs times 8 hours/WU = 72 hours total). IF the server had realized that my "Target CPU run time" was set to 10 hours, 9 WUs represent __90__ hours of work (instead of the 72 hours that had been requested). I had expected the server to send me 8 WUs -- 8 WUs times 10 hours/WU = 80 hours total (the closest multiple of 10, the hours/WU, that exceeds 72). To me, that total of 9 WUs makes yet another case of TOO MANY WUs DOWNLOADED.
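
That arithmetic, worked through under the assumption that the server prices every WU at one fixed hours-per-WU figure (the reply below notes this is only approximate):

    # Worked version of the arithmetic above; assumes a fixed hours-per-WU
    # estimate on the server side.
    import math

    requested_h = 256200 / 3600                    # ~71.2 h, i.e. roughly 72 h of work

    wus_if_8h_each = math.ceil(requested_h / 8)    # 9 WUs -> 72 h worth sent
    wus_if_10h_each = math.ceil(requested_h / 10)  # 8 WUs -> 80 h, what was expected

    print(requested_h, wus_if_8h_each, wus_if_10h_each)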
.

ID: 11451
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11456 - Posted: 27 Feb 2006, 14:19:14 UTC - in response to Message 11451.  
Last modified: 28 Feb 2006, 3:50:51 UTC

To avoid another "flood" of WUs, as soon as the request (for 256200 seconds of work - that's what I currently have specified in my General preferences) was sent to the server, I *manually* clicked in BOINC Manager for: "No more work from Rosetta". I'm now watching 9 WUs being downloaded (preceded by 4.81). That would be correct if each ran for 8 hours. But before today I had already set my "Target CPU run time" value in the Rosetta preferences to 10 hours. Oh, well! Just another example of unexpected behavior to keep in mind.

I have updated the FAQ on the time setting to try to better explain how this all works in practice. But the system MAY have got it right by giving you only 9 WUs. Remember, some of them may run longer than the time setting you have, because they will run to the completion time you asked for, but they will also always complete at least one model. So if your settings would allow 9.9 models to complete, it will run shorter than you expect. But if one model would take 10.5 hours it will run longer.

Assuming you complete a number of WUs OK, the quota will rise very quickly.

You did not catch what I was complaining about:
- The client asked for 256200 seconds (about 72 hours) of work. (correct)
- The SERVER sent 9 WUs (as though each would run for 8 hours; 9 WUs times 8 hours/WU = 72 hours total). IF the server had realized that my "Target CPU run time" was set to 10 hours, 9 WUs represent __90__ hours of work (instead of the 72 hours that had been requested). I had expected the server to send me 8 WUs -- 8 WUs times 10 hours/WU = 80 hours total (the closest multiple of 10, the hours/WU, that exceeds 72). To me, that total of 9 WUs makes yet another case of TOO MANY WUs DOWNLOADED.
.





One would think that the above would cause your system to get 8, but because of run-length variations that calculation is really only approximate. The system could very well have a few of these WUs that will run less than 10 hours because of model-length variation; some of them may run 5 or 6 hours short of 10 hours if the model size is large. So it is still possible that you have the right amount of WUs. For example, if all 9 of those WUs are large-model WUs that run 2 hours short of the 10-hour time setting, that would represent 18 hours off the time you are expecting them to run. The model size is the wild card.

In any case you will have to allow the system to sort this out by not making any adjustments to the parameters for a while, so it can see what is happening and adjust.


Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 11456
mikus

Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 11660 - Posted: 4 Mar 2006, 23:17:31 UTC

For today's download of work, I did NOT interfere in any way. And everything *was* done more or less correctly. Yesterday, I had changed (at the website) my "time between connects" to 5 days. I've drawn the following two conclusions from today's download:

(1)
When the client calculates the amount of work to ask for, it only looks at the 'ready to run' WUs on its queue, and DOES NOT factor in already-requested work that is still in the process of being downloaded. This affects me, because I have a slow connection and my downloads take a LONG time -- that's why I started this topic in the first place.

(2)
However, today the client asked for work only twice -- once when the connection was first established (by me manually un-suspending communication); and once when it realized that the "time between connects" at the website had been upped from 3 days to 5 days. (This second request was for slightly too much -- probably because not ALL of the first request had been downloaded by the time the second request was made.)

That is how I *wanted* the client to behave -- NOT to periodically re-request work while downloads were still in progress. The most likely explanation for it behaving properly is that it had now established a "history" that WUs on my system take 10 hours each (as set in Rosetta preferences "CPU time").

On the earlier runs that I complained about, the new WUs had *longer* run times than the WUs for which the client had "history" -- possibly it was those short __"history"__ values that caused the client to repeat and repeat its (ever-diminishing) work-request calculations at four-minute intervals (each time causing MORE and MORE work to be scheduled for download).
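
A simplified sketch of that hypothesis (not the real BOINC estimator): if the client's historical hours-per-WU estimate is too low, the queued work looks smaller than it really is, and the client keeps asking for more.

    # Illustration of how a too-short per-WU "history" estimate makes the
    # queue look underfilled, prompting repeated work requests.

    def apparent_shortfall_hours(queue_target_h: float, queued_wus: int,
                                 estimated_h_per_wu: float) -> float:
        return max(0.0, queue_target_h - queued_wus * estimated_h_per_wu)

    # 72-hour queue target, 9 queued WUs that will really take ~10 h each:
    print(apparent_shortfall_hours(72, 9, 10))  # 0.0 -> estimate is right, no more requests
    print(apparent_shortfall_hours(72, 9, 2))   # 54.0 -> short "history" keeps requesting more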
.
ID: 11660