Problems with download of WUs: Either no work or overcommitted.

Message boards : Number crunching : Problems with download of WUs: Either no work or overcommitted.


Martin P.

Joined: 26 May 06
Posts: 38
Credit: 168,333
RAC: 0
Message 17609 - Posted: 4 Jun 2006, 12:53:29 UTC

Hi,

I run SETI@Home, Einstein and Rosetta. Rosetta is set to 20%. The problem is that once Rosetta has finished all its WUs, it never downloads any new ones. Even when the long_term_debt is highly positive (e.g. 30,000 or more), it does not download any WUs. The only way to force a download is to pause the other projects, but then it downloads so many WUs that the computer is overcommitted for many days.

I currently run at "Contact server every 3 days". Even when I set this to 0.3 days before suspending the other projects and reset it after the download, it still downloads too many WUs.

This is what I tried:
1. Set "Contact server every 3 days" to 0.3 days.
2. Set SETI@Home and Einstein to "No new work"
3. Suspend SETI@Home and Einstein
4. Rosetta downloads some WUs
5. Set SETI@Home and Einstein to "Allow new work"
6. Restart SETI@Home and Einstein
7. Set "Contact server every xx days" back to 3 days.
8. Now Rosetta downloads even more WUs, which should not happen since SETI and Einstein are both active -> computer is overcommitted.

Is there a solution to this problem? Resetting long_term_debt to 0.0 on all projects does not help either.

ID: 17609 · Rating: 0
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17643 - Posted: 5 Jun 2006, 4:23:41 UTC - in response to Message 17609.  

Is there a solution to this problem? Resetting long_term_debt to 0.0 on all projects does not help either.

Perhaps "short term" debt is the issue?

If it were me... I'd let it run the way it wants to for 4 days. So after 4 days I'd update to Rosetta; if it still sends no work, then I'd do the "reset" project with Rosetta. In general, BOINC will handle it and ensure Rosetta gets its proper resource share.

I see both your machines seem to be getting many client errors. Have you been able to track that down?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 17643 · Rating: 0
Martin P.

Message 17661 - Posted: 5 Jun 2006, 9:04:42 UTC - in response to Message 17643.  


I see both your machines seem to be getting many client errors. Have you been able to track that down?


These client errors are there because I had to stop several WUs so that the other projects get at least some share. As I described before, Rosetta downloads 12 to 15 WUs and therefore always gets "overcommitted". When I manually delete 10 of these and leave only 3-4, the commitment returns to normal after the first 2 WUs are finished.
ID: 17661 · Rating: 0
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17669 - Posted: 5 Jun 2006, 13:49:04 UTC

Getting BOINC to do what it should is a difficult task. Most of the time I don't understand the decisions of the BOINC scheduler and find them illogical. However, as Feet1st has pointed out, BOINC is designed to achieve the defined share over the long term, which can mean several weeks if BOINC initially screwed it up. Perhaps you could leave BOINC alone for a week and then check what it did? Alternatively, you could crunch your projects one after another: one week SETI only, one week Einstein, one week Rosetta, or any fraction you like.
ID: 17669 · Rating: 0
Feet1st

Message 17673 - Posted: 5 Jun 2006, 15:41:58 UTC - in response to Message 17661.  

...I had to stop several WUs so that the other projects get at least some share.

That is the root cause then. You must have received too much work, gone into "earliest deadline first" scheduling, and crunched a lot of Rosetta WUs. So now BOINC is just trying to pay back that time to the other projects. And it "knows" that is the "plan" for the next 3 days, so it doesn't download any Rosetta WUs.
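
The "pay back" idea can be pictured with a toy debt ledger. The sketch below is only an illustration in Python, not BOINC's actual scheduler code: the function name `update_debts`, the 40/40 shares for SETI@Home and Einstein, and the one-day accounting period are all assumptions made for the example (the thread only states that Rosetta is set to 20%).

```python
# Toy model of per-project "debt". This is NOT BOINC's real scheduler code.
# Each project is entitled to CPU time in proportion to its resource share;
# its debt grows when it runs less than that and shrinks when it runs more.

def update_debts(debts, shares, cpu_seconds_used):
    """Return new debts (in seconds) after one accounting period.

    debts:            dict project -> current debt in seconds
    shares:           dict project -> resource share (any positive weights)
    cpu_seconds_used: dict project -> CPU seconds actually used this period
    """
    total_share = sum(shares.values())
    total_used = sum(cpu_seconds_used.values())
    new_debts = {}
    for project in shares:
        entitled = total_used * shares[project] / total_share
        new_debts[project] = debts[project] + entitled - cpu_seconds_used[project]
    return new_debts


# Assumed shares: only Rosetta's 20% comes from the thread.
shares = {"rosetta": 20, "seti": 40, "einstein": 40}
debts = {name: 0.0 for name in shares}

# One day in which only Rosetta ran (e.g. while in earliest-deadline-first mode).
used = {"rosetta": 86400, "seti": 0, "einstein": 0}
debts = update_debts(debts, shares, used)
print(debts)
```

In this toy model, a day spent entirely on Rosetta leaves Rosetta with a large negative debt while the other two projects are owed time, which is roughly the situation in which a client would hold off on fetching more Rosetta work.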

I for one would like to keep a WU for each project at the ready. But BOINC doesn't think that way. It's OK. It will sort itself out if you let it, but that means letting it run without suspending, aborting, etc. These actions, trying to FORCE it to do things, only confuse the BOINC manager as it tries to get things back to normal.

What may have started the whole thing could be that you adjusted your WU runtime preference in Rosetta? The initial reaction to that is that BOINC THINKS the new WUs it's downloading will take your previous runtime (usually about 10,000 seconds), but then, when it goes to crunch them, it finds they take as long as your NEW WU runtime preference. It WILL learn over time that the WUs take longer and will adjust the expected runtimes, but again it takes some time to "learn" about the change. In general, make changes to WU runtime gradually.
ID: 17673 · Rating: 1
Martin P.

Message 17680 - Posted: 5 Jun 2006, 16:22:14 UTC - in response to Message 17673.  

Hi Feet1st,

thanks for this answer. It sounds reasonable and logical, so I will try as you suggested.


ID: 17680 · Rating: 0
Martin P.

Message 18774 - Posted: 16 Jun 2006, 8:05:49 UTC - in response to Message 17643.  


If it were me... I'd let it run the way it wants to for 4 days. So after 4 days I'd update to Rosetta; if it still sends no work, then I'd do the "reset" project with Rosetta. In general, BOINC will handle it and ensure Rosetta gets its proper resource share.

I see both your machines seem to be getting many client errors. Have you been able to track that down?


I followed your advice and let it run for several days without any interference. I did not get any new work for 5 days, but tonight it downloaded 30 work units and is overcommitted again!

Obviously the scheduling of Rosetta does not work at all.

ID: 18774 · Rating: 0
Ulrich Metzner
Joined: 17 Sep 05
Posts: 22
Credit: 405,640
RAC: 0
Message 18777 - Posted: 16 Jun 2006, 9:36:29 UTC

The same illogical behaviour here too using BOINC 5.4.9.
It only switches between overcommitted / EDF mode and fetching no work for days.
greetz, Uli

ID: 18777 · Rating: 0
tralala

Message 18778 - Posted: 16 Jun 2006, 9:36:49 UTC - in response to Message 18774.  


The scheduling of Rosetta is the scheduling of BOINC, and yes, the BOINC scheduler does not work. You may try to reduce your reconnect time to something below 2 days and you will see fewer overcommitted messages. Besides, being overcommitted is in fact not a problem, other than that it sounds stupid. You (probably) won't miss deadlines, and that's the important thing.
ID: 18778 · Rating: 0
Martin P.

Message 18781 - Posted: 16 Jun 2006, 10:07:52 UTC - in response to Message 18778.  


tralala,

can't be. Works perfectly with all other projects. Never had a problem with E@H, SETI@Home and Pirates on my Macs and additional Climateprediction on my P4. Must be a Rosetta problem then.

ID: 18781 · Rating: 0
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 18793 - Posted: 16 Jun 2006, 12:29:56 UTC - in response to Message 18781.  


Have you read the FAQs that discuss the time parameter? If you are constantly adjusting the system it will never settle down. You have to balance the connect time and the time setting with reasonable values and then let the system settle itself out.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 18793 · Rating: 0
Martin P.

Message 18797 - Posted: 16 Jun 2006, 13:42:23 UTC - in response to Message 18793.  


Hi Moderator,

yes, I did read the FAQ. I did not touch the system for 5 days, it's set to connect every 3 days, the time-to-switch is set to 36 minutes and "Keep in memory" is enabled. Shorter connect times are not acceptable because SETI@Home and sometimes even E@H tend to break down for several days on a regular basis.

ID: 18797 · Rating: 0
Feet1st

Message 18801 - Posted: 16 Jun 2006, 14:51:09 UTC

I've been running for months and not changed preferences and BOINC settled in and I never miss a deadline, always have work, and am never overcommitted... then I upgraded BOINC. Now it tends to be jumpy and pull the overcommitted trigger.

So, if you are using a newer BOINC version, perhaps you are seeing the same side effects.

The only thing that is Rosetta-specific would be the WU runtime preference as described in the FAQs. And it gets a bit odd when you first make a change to that preference. Say you start with the default 3hr preference and you've got your 3 day cache. So, that's 24 WUs (48 if you've got a dual core). Then you update the preference to a longer runtime, say 8 hrs. The WUs presently on deck will run with that new 8hr preference once you've updated to the project to get the preference change. So now all of a sudden you've got 8 days of work rather than 3. And the estimated runtimes of the WUs won't reflect the 8hrs until you've crunched a few and updated to the project.
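
The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the numbers quoted above; the helper names are made up for illustration, and it assumes every WU runs for exactly the runtime preference.

```python
HOURS_PER_DAY = 24

def queued_work_units(cache_days, runtime_hours, cpus=1):
    """How many WUs fill a cache of `cache_days` at `runtime_hours` per WU."""
    return int(cache_days * HOURS_PER_DAY * cpus / runtime_hours)

def queue_length_days(work_units, runtime_hours, cpus=1):
    """How many days the same WUs take once each one runs `runtime_hours`."""
    return work_units * runtime_hours / (HOURS_PER_DAY * cpus)

wus = queued_work_units(3, 3)               # 3-day cache, 3 hr WUs -> 24
wus_dual = queued_work_units(3, 3, cpus=2)  # same on a dual core -> 48
days_after = queue_length_days(wus, 8)      # those 24 WUs at 8 hrs each -> 8.0 days

print(wus, wus_dual, days_after)
```

The same queue that looked like 3 days of work becomes 8 days as soon as the preference changes, even though nothing new was downloaded.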

So, have you recently changed your WU runtime preference? Or, put another way, when you download a new WU, is the estimated runtime matching your WU runtime preference in your Rosetta Preferences?
ID: 18801 · Rating: 0
Moderator9
Volunteer moderator

Message 18806 - Posted: 16 Jun 2006, 15:19:40 UTC - in response to Message 18797.  

...Hi Moderator,

yes, I did read the FAQ. I did not touch the system for 5 days, it's set to connect every 3 days, the time-to-switch is set to 36 minutes and "Keep in memory" is enabled. Shorter connect times are not acceptable because SETI@Home and sometimes even E@H tend to break down for several days on a regular basis.


What setting are you using for the "Time" Parameter? The longer time settings are designed for longer connection intervals, and the shorter times are for shorter intervals. But you have to be careful not to play with the setting too much if you are running a very long work queue.

But here is the problem. With the long connect time the system will always want to load a lot of work units. If I read your reply correctly, you are processing three projects. The fact that these other projects might go down should not affect the way you are setting your system. With (at least) three projects you will always have work on your system.

For example, I run the same projects as you (in fact others as well). My connect time is set to 0.2 days. There are never more than 3 or 4 work units in my queue for any single project, yet there is always work. If a project is offline for a time, the system simply processes work for one of the other projects until the problem gets fixed.
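
To get a feel for why the connect interval matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the client asks for about (connect interval × the project's share of the CPU) worth of work and divides that by the estimated runtime per WU; the real BOINC work-fetch logic is more involved. The 10,000-second estimate and the 20% share are taken from earlier in the thread, and the function name is hypothetical.

```python
import math

SECONDS_PER_DAY = 86400

def work_units_requested(connect_days, share_fraction, est_runtime_s=10_000, cpus=1):
    """Simplified guess at how many WUs would fill the queue (an assumption,
    not BOINC's real formula)."""
    seconds_wanted = connect_days * SECONDS_PER_DAY * share_fraction * cpus
    return max(1, math.ceil(seconds_wanted / est_runtime_s))

long_interval = work_units_requested(3, 0.2)     # 3 days, 20% share
short_interval = work_units_requested(0.2, 0.2)  # 0.2 days, 20% share
others_paused = work_units_requested(3, 1.0)     # 3 days, Rosetta alone

print(long_interval, short_interval, others_paused)
```

Under these assumptions a 3-day interval asks for several times more work than a 0.2-day interval, and far more again if the other projects are suspended while the request is made.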

My systems are never forced into EDF mode, and they are very stable.

In your case you are forcing the system to run in "bursts". This will prevent the system from stabilizing for a very long time. Under the conditions you describe it could take 30 to 40 days for your setup to become truly stable, depending on your "time" setting.

ID: 18806 · Rating: 0
Martin P.

Message 19235 - Posted: 24 Jun 2006, 18:48:34 UTC - in response to Message 18806.  
Last modified: 24 Jun 2006, 18:51:51 UTC

I did not change the time settings at all, and I am not willing to wait "30 to 40 days" and accumulate weeks of long-term debt until Rosetta (probably) decides to download a reasonable amount of work. As I mentioned before: this works perfectly with all other BOINC projects after 1 or 2 days. I am also not a computer guru and simply do not understand some of the explanations in the FAQ (it might be a language problem as well, since my mother tongue is German).

I realized that a new version was released. Let's hope and see if that one solves this problem, otherwise I will just quit this project.


ID: 19235 · Rating: -1




©2024 University of Washington
https://www.bakerlab.org