Message boards : Number crunching : Downloaded way too many tasks at once?
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Suddenly my laptop, which runs 3 cores on Rosetta, is set to queue a 3+3 hour buffer, and has worked perfectly for ages, downloaded about 250 tasks of 7.5 hours each. This would have taken 26 days to complete, and they have a 3 day deadline. Server hiccup? I've aborted enough so I only have a 2 day queue. After checking the logs, it seems it was downloading one at a time repeatedly for the last few hours. I shall reboot the machine in case BOINC's scheduler is mixed up, but has anyone else seen this happen? Can anyone check the tasks sent to my machine to see what happened? It's this one: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3792849 |
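The figures in the post above can be checked with a little arithmetic. The sketch below assumes all 3 cores run Rosetta tasks continuously and that every task takes the full 7.5 hours; the function names are illustrative only and are not part of BOINC.

```python
import math


def queue_days(tasks: int, hours_per_task: float, cores: int) -> float:
    """Wall-clock days needed to clear the queue with `cores` tasks running in parallel."""
    return tasks * hours_per_task / cores / 24


def max_tasks_by_deadline(deadline_days: float, hours_per_task: float, cores: int) -> int:
    """Largest number of tasks that can finish before the deadline."""
    return int(deadline_days * 24 * cores // hours_per_task)


def tasks_for_buffer(buffer_hours: float, hours_per_task: float, cores: int) -> int:
    """Tasks needed to keep every core busy for `buffer_hours`."""
    return math.ceil(buffer_hours * cores / hours_per_task)


print(round(queue_days(250, 7.5, 3), 1))   # days to finish 250 tasks
print(max_tasks_by_deadline(3, 7.5, 3))    # tasks that fit in a 3 day deadline
print(tasks_for_buffer(6, 7.5, 3))         # tasks for a 3+3 hour buffer
```

Under these assumptions, 250 tasks come to roughly 26 days of work, while only about 28 could finish inside the 3 day deadline and a 6 hour buffer needs about 3.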
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
It was just that one computer, and you didn't change any of the local settings (such as cache size)? Even so, it still shouldn't have occurred, as the project recently changed the way Rosetta determines how much work to send. Instead of using the Estimated completion time, it's using the Target CPU Runtime. So even if a new application is released (which has no processing history at all, and so a very, very low Estimated completion time), it should stop a system from getting more work than it can handle. Grant Darwin NT |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Quoting Grant (SSSF): "It was just that one computer, and you didn't change any of the local settings (such as cache size)?" The other 3 computers didn't do anything unusual. I changed nothing, apart from shutting it down to move it to another room yesterday (along with 2 others which didn't play up). At most its local IP could have changed, but the Rosetta end probably only sees my public IP, of which I only have one. All the tasks were showing an estimated time of 7.5 hours, as they usually do. I did notice some different task names beginning SR5A that I haven't seen before, but they still have the 7.5 hour estimate like the rest. After aborting most of the tasks and rebooting it, it's not trying to get any more. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Quoting Grant (SSSF): "It was just that one computer, and you didn't change any of the local settings (such as cache size)?" The same machine did it again. And, I think, for the same reason. It was shut down (uncleanly, if that matters) and rebooted. It's this one: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3792849 Let me know if I can provide some logs to show the problem. I can probably make it do it again by crashing it. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
I had similar behaviour when we switched to the 4.20 app. Fortunately I noticed and set the machines to “no new tasks”. After they had returned their first batch of work, the estimates had adjusted to the usual 8 hours run time. Another possibility: do you use an app_config file with a max concurrent statement? There was a bug in BOINC 7.16 where it would do as you described. It is supposedly fixed in 7.16.6; I'm not sure whether it made it into the Windows 7.16.5 version or not. BOINC blog |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Quoting MarkJ: "I had similar behaviour when we switched to the 4.20 app. Fortunately I noticed and set the machines to “no new tasks”. After they had returned their first batch of work, the estimates had adjusted to the usual 8 hours run time." The estimates are fine, it's just downloading way too much - it shows an estimate of a month for tasks that have a 3 day deadline, and my buffer is 3-6 hours. It's only done it twice, and both times shortly after rebooting. Quoting MarkJ: "Another possibility: do you use an app_config file with a max concurrent statement? There was a bug in BOINC 7.16 where it would do as you described. It is supposedly fixed in 7.16.6; I'm not sure whether it made it into the Windows 7.16.5 version or not." Yes! And that's the only machine with one. The reason I did so is that it has 4 cores and 8GB RAM. 4 Rosettas can sometimes make it run out of RAM, and a core is (stupidly) left idle. I had to use the file to tell BOINC to limit Rosetta to 3 WUs, so it runs a Universe on the other core. You'd think it would work this out for itself. 0.5GB left, Rosetta is too big, so obviously try to fit in a smaller project's task?! BOINC really annoys me sometimes with its gross stupidity. I'm using 7.16.5 in Windows 10. 4 CPU cores, no GPUs on that machine. |
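For readers unfamiliar with the file being discussed, here is a minimal sketch of an app_config.xml that caps one application at 3 running tasks, generated from Python. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are BOINC's documented format; the app name `rosetta` is an assumption and should be checked against the project's applications list, and `build_app_config` is a hypothetical helper, not part of BOINC. The poster's actual file was not shown in the thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting one app to `max_concurrent` running tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# "rosetta" is an assumed app name; 3 matches the limit described in the post.
print(build_app_config("rosetta", 3))
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory. Note that this is the kind of configuration that the 7.16 work-fetch bug mentioned above is said to affect.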
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
Quoting MarkJ: "Another possibility: do you use an app_config file with a max concurrent statement? There was a bug in BOINC 7.16 where it would do as you described. It is supposedly fixed in 7.16.6; I'm not sure whether it made it into the Windows 7.16.5 version or not." I’ll ask on the BOINC mailing list whether the fix made it into the Windows 7.16.5 build, or whether the fix didn’t work. I haven’t managed to get a 7.16.6 for Linux yet. BOINC blog |
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
Quoting Mr P Hucker: "0.5GB left, Rosetta is too big, so obviously try to fit in a smaller project's task?! BOINC really annoys me sometimes with its gross stupidity." Not doing what you find obvious could well have been a design decision. Not all people are idiots, you know. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,857 |
|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Quoting Mr P Hucker: "0.5GB left, Rosetta is too big, so obviously try to fit in a smaller project's task?! BOINC really annoys me sometimes with its gross stupidity." Quoting floyd: "Not doing what you find obvious could well have been a design decision. Not all people are idiots, you know." There can be no sensible reason for that. Imagine you have a cardboard box that you're going to use to transport some stuff somewhere, and you can fit 3 large objects into it. It's technically big enough for 3.5 objects, but you can't cut those objects in half. But you need to carry some smaller objects too, so why not put them in as well? Empty space is pointless. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
Quoting MarkJ: "I haven’t managed to get a 7.16.6 for Linux yet." That is DA's attempt at building under Ubuntu. I'm not sure which version of Ubuntu, probably 18.04. I'll wait for the Debian maintainers' version. They got it as far as Bullseye, but that has a newer Glibc and GCC than Buster (the current release of Debian). BOINC blog |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
Quoting Mr P Hucker: "Yes! And that's the only machine with one." Peter, can you provide a log with the sched_op_debug flag turned on? You'd have to allow work fetch. If it's the bug I raised, then it will ask for work as the current tasks get close to completion. It will request one task at a time (repeatedly) until you turn off work fetch. If you could send it via private message, that would be good, and I will pass it on to the mailing list. I only need a few work requests, no need for the whole log. BOINC blog |
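As a rough illustration of the request above, the sched_op_debug flag is a log flag set in the client's cc_config.xml. The sketch below builds a minimal file with just that flag enabled; `build_cc_config` is a hypothetical helper, and an existing cc_config.xml should be edited rather than overwritten, since it may already hold other settings.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags: list[str]) -> str:
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")


# Enables extra detail about scheduler requests in the event log.
print(build_cc_config(["sched_op_debug"]))
```

The file lives in the BOINC data directory. After the client rereads its config files, the event log should show more detail for each work request, which is the part of the log being asked for here.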
floyd Joined: 26 Jun 14 Posts: 23 Credit: 10,268,639 RAC: 0 |
Quoting Mr P Hucker: "There can be no sensible reason for that. Imagine you have a cardboard box" I understand your thoughts, I just disagree when you say there's no alternative to your suggested action. Let's get back to BOINC; your cardboard box example is a little too simplified. So you have some tasks running, and a big Rosetta task and some other smaller tasks waiting. The Rosetta task doesn't fit, so you run a smaller one instead and have made sure there are no unused resources. But that's not the end of the story: the big one is still waiting. Now when a task ends and you get some free memory, you could face the same situation, and again you decide to do something else first. Can you be sure the big task will ever run? You rely on your luck there. You always do as much as possible, but you can't be sure that all will be done eventually. There is an alternative if you don't have enough memory: keep what you have and wait for more. That way you'll eventually have enough, but you don't do as much work as you could. This is not ignoring the situation; there's a plan behind it, but with other objectives than yours. These are just two simple alternatives, both with their advantages and drawbacks. You could of course improve them while adding complexity. I'm not trying to start an argument about the best decision here, I just want to show there's more than one. If someone doesn't do what you think they should, that doesn't necessarily mean they haven't thought about it. In fact they could have thought carefully about it and decided otherwise. None of us know. |
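The two alternatives described above can be compared in a toy model. This is only a sketch under invented assumptions (4 memory units, a big task needing 2, an endless supply of 1-unit small tasks that each run for 4 ticks and finish one per tick); it is not how the BOINC client actually schedules, and every name and number in it is hypothetical.

```python
def simulate(policy: str, capacity: int = 4, big_mem: int = 2,
             small_mem: int = 1, task_len: int = 4, ticks: int = 20):
    """Return (tick at which the big task started, or None; idle memory-unit-ticks)."""
    # Start with memory full of small tasks whose end times are staggered.
    running = [(end, small_mem) for end in range(1, capacity // small_mem + 1)]
    big_start = None
    idle = 0
    for now in range(1, ticks + 1):
        # Drop finished tasks and see how much memory is free.
        running = [(end, mem) for end, mem in running if end > now]
        free = capacity - sum(mem for _, mem in running)
        # The big task starts as soon as it fits.
        if big_start is None and free >= big_mem:
            running.append((now + task_len, big_mem))
            big_start = now
            free -= big_mem
        # "backfill" always fills gaps with small tasks;
        # "reserve" leaves gaps empty until the big task has started.
        if policy == "backfill" or big_start is not None:
            while free >= small_mem:
                running.append((now + task_len, small_mem))
                free -= small_mem
        idle += free
    return big_start, idle


print(simulate("backfill"))  # (None, 0): never idle, but the big task never runs
print(simulate("reserve"))   # (2, 1): one unit idle for one tick, big task runs
```

In this contrived setup, always backfilling keeps memory fully used but starves the big task for the whole run, while waiting costs a little idle capacity and lets it start. Whether a periodic reassessment of running tasks would break the starvation, as argued in the reply that follows, is not modelled here.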
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,839,945 RAC: 13,173 |
Quoting floyd: "I understand your thoughts, I just disagree when you say there's no alternative to your suggested action. Let's get back to BOINC; your cardboard box example is a little too simplified. So you have some tasks running, and a big Rosetta task and some other smaller tasks waiting. The Rosetta task doesn't fit, so you run a smaller one instead and have made sure there are no unused resources. But that's not the end of the story: the big one is still waiting. Now when a task ends and you get some free memory, you could face the same situation, and again you decide to do something else first. Can you be sure the big task will ever run? You rely on your luck there. You always do as much as possible, but you can't be sure that all will be done eventually. There is an alternative if you don't have enough memory: keep what you have and wait for more. That way you'll eventually have enough, but you don't do as much work as you could. This is not ignoring the situation; there's a plan behind it, but with other objectives than yours. These are just two simple alternatives, both with their advantages and drawbacks. You could of course improve them while adding complexity. I'm not trying to start an argument about the best decision here, I just want to show there's more than one. If someone doesn't do what you think they should, that doesn't necessarily mean they haven't thought about it. In fact they could have thought carefully about it and decided otherwise. None of us know." Every x minutes, running tasks are changed, as per: BOINC Manager, Options, Computing preferences, "Switch between tasks every x minutes". So even if the RAM were full of small tasks and Rosetta could never get in, BOINC will reassess every x minutes, realise Rosetta is falling behind the project weighting that's set, stop all tasks, put Rosetta in first, then add others to refill the CPU. |
©2024 University of Washington
https://www.bakerlab.org