Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
kotenok2000 Joined: 22 Feb 11 Posts: 288 Credit: 540,373 RAC: 0 |
I hope they will still get points when the script runs, because each task would still generate unique data. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"I'd like to comment." I understand the issue better now. Irrespective of fault, it seems like all Boinc projects are having problems coexisting with Folding@home, as evidenced by Grant's comment: "And the same issue is happening with your other projects." This is only a problem to the extent that tasks miss deadlines, which is what you have, so check these settings in turn: 1. Ensure "at most xx% of CPU time" is set to 100% for all Boinc tasks. 2. You may think Rosetta is set to 8hrs, but every one of your tasks runs to 43,200secs of CPU time, which is 12hrs. So go to your account online and, within the rosetta@home preferences, reaffirm that "Target CPU run time" is set explicitly to 8hrs and click Update Preferences. Rosetta certainly thinks it's set to 12hrs. 3. If you still can't complete tasks within the deadline, reduce your cache size in Boinc, so you don't download more tasks than you can complete before the deadline. I think Point 2 will be the solution. Rosetta is a bit weird when non-default runtimes are set. Tasks are all downloaded as if they're 8hr tasks, and only when a task gets close to that runtime does it adjust the remaining time up toward 12hrs. So it runs 4hrs longer, and then the rest of the cache is still projected as if each task will take 8hrs. It's been programmed *not* to adjust based on past history. I forget why, but I do recall when it was deliberately made to work that way. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"This will solve the *entirety* of your problems, while (coincidentally) massively increasing your contribution to *all* the projects you run within your preferred settings." "He's running Folding at home as well." Ta, I didn't pick up the Folding@home involvement - that explains part of it. But I do think it's the Target CPU time aspect that's tipping things over the edge - partly because I'm set to 12hr tasks too and it is a bit weird, but I run a small enough cache and only two projects so it never affects me. The part about using 12hr tasks not changing the projected runtime of the rest of the Rosetta cache is something that was brought in... about 4 years ago. I'm pretty sure that's not a coincidence. Which means the tipping point *is* a Rosetta issue after all. |
Joined: 28 Mar 20 Posts: 1895 Credit: 18,534,891 RAC: 0 |
"I'd like to comment." No you don't, you just ignore what you are told as to how to fix it. Twice now. "The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not." And since it is occurring with a BOINC project - actually all of your BOINC projects, not just this one - might it be somewhat obvious that those of us here doing BOINC work might have some idea of what is actually going on? While those at Folding - unless they do BOINC work as well - won't have the slightest idea of what you are complaining to them about? And if you had paid the slightest bit of attention to the responses I gave you previously, you would understand what the problem is & how to fix it. "I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me." The third option would be to fix it so that both can co-exist; hundreds (if not a thousand +) of other people have done so. Twice I have told you what the problem is. Twice I have told you how you could fix the problem. And twice you have ignored completely everything you were told that would allow you to sort it out. So, yeah, not doing either of them is probably the best option for you. Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1895 Credit: 18,534,891 RAC: 0 |
"Which means the tipping point *is* a Rosetta issue after all" Nope. If it took 12 hours to do 12 hours of work, there'd be no problem. But because it takes 24hrs to do 12hrs work, it's a big problem. Even set to 8 hours, it would still take 16hrs, so still Panic mode. Make it so the CPU isn't overcommitted, and all would be OK. His problem is purely down to it taking 2-4 times longer than it should to process any BOINC Tasks, because the CPU is also processing Folding work on the same CPU cores/threads. X cores/threads trying to process X+1 or X+2 applications (each using 100% of a core/thread) is always going to cause problems. As long as the number of applications being run is equal to or less than the number of cores/threads, all will be well, so limiting the number of cores/threads available to BOINC so that Folding has as many as it needs (1, 2, 4 or however many that is) would sort it out. Of course if "Use at most xx % of CPU time" is anything other than 100%, that would just add to the issues of doing Folding on the same cores/threads as BOINC work (as would any GPU Tasks from BOINC projects that require 1 core/thread per GPU Task being run to support it, and that too can be resolved, although it's more difficult than it needs to be). Grant Darwin NT |
Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
I remember from when I was running Folding@Home also that Folding@Home expects to use entire CPU cores, not just the available threads in that CPU core. An easy way to handle this is to start the Folding@Home program at least a full minute before starting any BOINC program. |
Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"I've got 15 tasks returned after deadline and they've all validated and credited." Good for you. I have a lot of "cancelled by the server". |
Joined: 16 Jun 07 Posts: 29 Credit: 5,504,906 RAC: 7 |
Same here, I lost a hundred tasks :( I crunch for Ukraine. Join our team forums about Rosetta@home |
Jean-David Beyer Joined: 2 Nov 05 Posts: 221 Credit: 7,572,744 RAC: 0 |
"I've got 15 tasks returned after deadline and they've all validated and credited." So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task and, before that task returns its result or times out, it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"Which means the tipping point *is* a Rosetta issue after all" "Nope." I don't completely agree. It's not just that a 12hr task (which Rosetta reports to Boinc as 8hrs for the bulk of its run) is taking 20-32hrs to complete, it's that the next tasks in the cache are showing 8hrs to Boinc but will also take 20-32hrs. Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks. 14-22hrs less processing time to complete tasks will make a huge difference to whether Panic mode arises. I'd guess *all* the difference. This is only an issue if the cache is set above a day. It can be made to work by ensuring Rosetta tasks only run for the time Adrian already thought they were set to (8hrs rather than the 12hrs they actually run for). It can certainly be solved your way, but that gets a bit fiddly imo and doesn't resolve the confusion Rosetta runtime introduces. I'd rather my solution if I were him too, especially if RAM and disk space don't come into the equation. And we already know Adrian didn't like your solution, so let's see what he thinks of my alternative. It's entirely up to him. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"I've got 15 tasks returned after deadline and they've all validated and credited." It was a very early call - in the first few hours. In the end I had 13 cancelled by the server, none of which had started to run. However, I did have 1 task that ran to completion, but came up with a validate error because the previous host reported it late. On balance, it could've been a lot worse on a 16-thread machine. I'll live with it. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"I've got 15 tasks returned after deadline and they've all validated and credited." It's a consequence of the whole site being down. It seems like, once the site came back up, it timed out tasks that had missed their deadline straight away and reissued them, but the host didn't re-poll the server until its timer ran out - could've been 4-5hrs after the site came back up - to report they were completed. It's just unfortunate. |
Joined: 28 Mar 20 Posts: 1895 Credit: 18,534,891 RAC: 0 |
"Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks." All that does is stop Panic mode from occurring most of the time; there will still be times where it does occur (because all the other projects are also taking longer to complete their Tasks than they expect to). Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms. "It can certainly be solved your way, but that gets a bit fiddly imo and doesn't resolve the confusion Rosetta runtime introduces." How is it fiddly? I'm changing one value, and fixing the cause of the problem (overcommitted CPU). You're changing one value, and fixing the symptom (Panic mode occurring). In both cases, only one value needs to be changed. Although it does require some thought to fix the problem, to determine what % "Use at most..." should be set to. 87% leaves 1 core/thread free for non-BOINC work (7/8=0.875). 75% leaves 2 cores/threads free for non-BOINC work (6/8=0.75). Not really a big effort required IMHO. Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1895 Credit: 18,534,891 RAC: 0 |
And once again we've got problems. The Validators & Assimilators are down, so the backlog of that work continues to pile up. And if it backs up enough, then the disks end up full & things crash and fall over all over again. Edit- looks like they're all on the one server- boinc-process Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
"Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks." "All that does is stop Panic mode from occurring most of the time- there will still be times where it does occur (because of all the other projects all taking longer to complete their Tasks than they expect to as well)." Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution. It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention. Missing deadlines has all sorts of consequences on both sides of the server divide. Meeting deadlines has none. For some reason I now want to quote Mr Micawber from Charles Dickens' David Copperfield: “Annual income twenty pounds, annual expenditure nineteen pounds, nineteen and six, result happiness. Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.” Point being, the detail isn't relevant as long as you succeed. "Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms." First, I dispute that the sharing of cores & threads is a) a problem and b) one that needs fixing. Second, I dispute that it's any business of the user as long as the computer doesn't crash and completes its work successfully and within the envelope of time allowed. If the user is happy for more tasks to be running simultaneously, outside of their individual planned time, but still within the overall deadline, that's entirely up to them. Your alternative is a smaller number of tasks run for each project, but with a core/thread dedicated to them, which is fine but will fall flat when there's a lack of task availability. It's a choice. I recognise it, but I wouldn't personally opt for your one either. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,506,558 RAC: 3,757 |
I was going to edit the last post, but decided it's worth a new message. I notice adrianxw hasn't reappeared here to comment, so I looked at his tasks and he's taken Rosetta off "no new tasks". I believe he's now set Target Run Time to the default. Not to 8hrs explicitly, but the default. That is "Not Selected". However, his completed tasks now run for ~10,800secs rather than 43,200secs, taking ~15,000secs rather than ~112,000secs. This will definitely provide a solution for him imo. Fine. At some point somewhere - and quite recently - Rosetta's default appears to have changed to 3hrs, meaning tasks get completed and used up far more quickly than intended. And I'm not sure about this, but I think Boinc is forced to assume and schedule Rosetta tasks to run for 8hrs, which is now not right. Can people check what they have set up? Is it 8hrs or "Not Selected"? Do tasks run for 8hrs, or for 3hrs with "Not Selected"? I believe it's the latter. What does Boinc assume the runtime will be at download? Something's gone wrong imo. |
kotenok2000 Joined: 22 Feb 11 Posts: 288 Credit: 540,373 RAC: 0 |
It sets 8 hours for 4.20 and 3 for 6.05 |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution." Eliminating the reason for the panic mode is the entire solution; everything else is a workaround, which might fail as soon as something changes (new WU type, new project, whatever) or even before. "It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention." It's not just not pretty: heavily overcommitting the system might slow down overall production. With hyperthreading CPUs in particular, many people leave 1-2 threads for non-BOINC stuff. |
Joined: 28 Mar 20 Posts: 1895 Credit: 18,534,891 RAC: 0 |
"Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution." No, Panic mode doesn't mean they won't be completed. It means there is a high risk of them not being completed if not processed immediately. Which fixing the overcommitted CPU does resolve. "It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention." No one in their right mind would think taking 12 hours to do 6 hours' work is good (which is double the time required; on another project it's taking them 4 times as long). You may dispute that, but it doesn't make it any less true. "Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms." "First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing." And it needs fixing because the poster keeps complaining about it. If they don't complain about it, then no, it doesn't need fixing. "If the user is happy for more tasks to be running simultaneously, outside of their individual planned time, but still within the overall deadline, that's entirely up to them." Yep. But in this case it may cause problems with deadlines, resulting in Panic Mode, which the poster has an issue with, so it is an issue that should be addressed. Why fix the symptom, when fixing the problem would result in more work being done? Even with fewer cores/threads available to BOINC, the amount of work done for BOINC would be almost triple what it presently is. "Your alternative being a smaller number of tasks run for each project, but with a core/thread dedicated to them, which is fine but will fall flat when there's a lack of task availability." Why would you think that????? All my setting does is stop 9 things, or 10 things or more, from trying to run on 8 cores/threads at the same time. It does not in any way stop cores/threads from being used by different projects at the same time. What it does stop is BOINC trying to use cores/threads that are being heavily used by non-BOINC applications. If there are 10 projects with work, or only one, all available cores/threads will be used. Grant Darwin NT |
tazzduke Joined: 2 Jul 09 Posts: 2 Credit: 1,293,224 RAC: 0 |
Greetings, Well I have 3 systems running at the moment, all using the default location in preferences with target CPU run time set to 2hrs: Ryzen 5700x (#1), 8c/16t, only using 8c, CPU times averaging 3hrs (Win11); Ryzen 5700x (#2), 8c/16t, only using 8c, CPU times averaging 2hrs (Win11); Dual Xeon E5-2470v2 (20c/40t), only using 8c, CPU times averaging 2hrs (Linux Mint 21.3), with LHC using some cores as well. I have set work fetch preferences to 0.1 days & 0.1 days, which keeps a small amount of workunits in cache on each machine; it's how I like it. But as Grant has already mentioned, the validators are still offline, as pendings are growing. I also fine-tuned the core usage on these machines, as I have app_config files in each project, 'cause I am sometimes running various other projects, again my preference only. When pushing hard on some projects and I start using the hyperthreads, I still, as a rule, leave 2 threads in reserve on each CPU for the OS and GPU to use, again my preference only. Hope you have a good day :-) |
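
The core-count arithmetic argued over in this thread (7/8 = 0.875 to leave one thread free, and 12hrs of work taking 24hrs on a shared CPU) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the slowdown model assumes every busy process gets an equal share of the available threads, and nothing here reflects how the BOINC client itself rounds the percentage or schedules tasks.

```python
def percent_for_boinc(total_threads: int, free_threads: int) -> float:
    """Exact percentage of threads left for BOINC after reserving some.

    Hypothetical helper: it only reproduces the 7/8 and 6/8 arithmetic
    from the thread, not the client's own rounding.
    """
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if not 0 <= free_threads < total_threads:
        raise ValueError("free_threads must be between 0 and total_threads - 1")
    return 100.0 * (total_threads - free_threads) / total_threads


def estimated_wall_hours(cpu_hours: float, busy_processes: int,
                         total_threads: int) -> float:
    """Rough wall-clock time for a task needing `cpu_hours` of CPU time.

    Assumption: when more CPU-bound processes than threads are running,
    each process gets an equal share, so the task slows down by the
    ratio busy_processes / total_threads. Real schedulers are messier.
    """
    if total_threads <= 0 or busy_processes <= 0:
        raise ValueError("counts must be positive")
    if busy_processes <= total_threads:
        return cpu_hours  # no contention: 12hrs of work takes 12hrs
    return cpu_hours * busy_processes / total_threads
```

For example, `percent_for_boinc(8, 1)` gives 87.5 and `percent_for_boinc(8, 2)` gives 75.0, matching the figures quoted above. With an assumed 16 busy processes on 8 threads, `estimated_wall_hours(12, 16, 8)` gives 24.0 and `estimated_wall_hours(8, 16, 8)` gives 16.0, which is the "still 16hrs even at 8hrs" case.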
©2025 University of Washington
https://www.bakerlab.org