Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
When I first saw your msg, for some reason none of the other very good replies and suggestions were showing, nor were your useful images, so I've only just caught up.
@Sid: Very good of you to go through all that so thoroughly for me! I've implemented most of your suggestions. And thanks for the tip about where application priority is assigned in Windows -- never been in that corner of Task Manager before.
I do now suspect that it was expired work units that may have hung up Rosetta. Since it fetched new ones and resumed processing yesterday, I'm now waiting to see when BOINC will fetch new work units for WCG, which it has yet to do since it finished the batch it processed yesterday. Today I've been messing with various commands to probe how that all works. Funny to think that after almost 20 years of running BOINC, I've learned more about it in the last two days than in the entire preceding time.
Thanks again!
Eric
system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,217,610 RAC: 822 |
Mikey: I'm glad it worked for you!!! Now it depends on your resource share setting for each project how often you get tasks from each project: the higher the resource share, the more tasks per day you will run, as long as they are available. Think of the total resource share as 100%, with each project getting a percentage of it. What's easiest is to set Rosetta at, say, 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems. With Rosetta's 3-day deadline, if you have 3 days of work, NO other projects will crunch, because their deadlines will be further out than 3 days. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
[quote]Anyway, restricting the number of cores can't help with memory, Boinc will always use as many cores as possible until it hits your set memory limit.[/quote]
No it won't. If you limit it to 1 core, even if you have 128 of them, it will only use the one core for BOINC work. I limited the number of cores to use on one of my systems for a while to avoid out-of-memory errors until I upgraded the RAM.
Grant
Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
[quote]@Sid: Very good of you to go through all that so thoroughly for me! I've implemented most of your suggestions. And thanks for the tip about where application priority is assigned in Windows -- never been in that corner of Task Manager before.[/quote]
Neither had I until Boinc came along for me. I don't think I had any other reason to consider it for anything else I've ever done on my computer.
[quote]I do now suspect that it was expired work units that may have hung up Rosetta. Since it fetched new ones and resumed processing yesterday, I'm now waiting to see when BOINC will fetch new work units for WCG, which it has yet to do since it finished the batch it processed yesterday. Today I've been messing with various commands to probe how that all works. Funny to think that after almost 20 years of running BOINC, I've learned more about it in the last two days than in the entire preceding time. Thanks again![/quote]
I don't think it was expired work units. I suspect it was all down to the number of tasks you were aiming to run within limited RAM, which is why allocating a bit more RAM, or reducing the number of tasks running at the same time (thereby needing less RAM), is giving you the room to run successfully. Unnecessarily suspending tasks when they could run is the rest of it. All WCG projects are far less demanding of RAM, so there's never going to be a problem with them. I'll be interested to see what happens after you give it a few days. Hopefully all issues will be solved. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
[quote]The idea that IBM might also have done a security check on it makes me think I should use that version as I have very little faith in the standard Boinc Manager.[/quote]
What security problem could Boinc possibly have? You are in control of which projects it communicates with.
[quote]Suspend when computer is on battery: Untick. Your computer doesn't run on battery so the selection is redundant. It's intended for laptops/portables that aren't plugged into the mains.[/quote]
I use that option on my desktop, as it has a UPS.
[quote]Suspend GPU computing when computer is in use: Untick. None of the projects you run offer GPU computing. And clear the field that mentions mouse/keyboard input - also redundant.[/quote]
Does it matter what it's set to if it's redundant?
[quote]You may not have picked up on it, but Grant mentioned that Rosetta (and I think tasks from all projects under Boinc) run at 'Low' or 'Idle' priority.[/quote]
Except LHC, which runs in a virtual machine and doesn't seem to behave properly.
[quote]If you open the task manager under Windows 10, go to the "Details" tab, find a Rosetta task, right-click on it and hover the mouse over the "Set Priority" option, a little sub-window will open showing the priority Rosetta runs at.[/quote]
Actually, priorities in Windows 10 are abysmal. I've often seen things at low priority getting more CPU than things at normal.
[quote]Sometimes circumstances can arise when there's a conflict between Rosetta and other programs you run, but it's generally something pretty specialist.[/quote]
I've got BoincTasks on the monitor to my right, so I can see what all 6 machines are doing all the time. Correction: it's spilled onto two monitors now. 76 tasks at once.
[quote]In the usage limits section:[/quote]
I use TThrottle to do a similar thing and it doesn't seem to cause problems. Except TThrottle is better, as you set a temperature and it adjusts the % continuously. I have it on the machine in here because it's too loud at full blast if the room is warm.
[quote]And note that running all 4 cores at 100%, 100% of the time, with nothing set in the "when to suspend" section, will push up your temperatures as well, so there are two reasons you may want to go back to 75% of cores if it's problematic.[/quote]
Apart from laptops, I've never known a CPU overheat, even on stock fans.
[quote]Just be sure to keep your cache sizes small so you don't run into deadline problems. With Rosetta's 3-day deadline, if you have 3 days of work, NO other projects will crunch, because their deadlines will be further out than 3 days.[/quote]
Actually the others will crunch, as Boinc downloads small amounts of work each time, not the whole 3 days; it tops the cache up. If Rosetta has been using the CPU all of the time, the next call for work will be to another project.
[quote]No it won't. If you limit it to 1 core, even if you have 128 of them, it will only use the one core for BOINC work.[/quote]
I didn't mean that. I meant that if you do not restrict cores, Boinc will restrict them for you when your RAM runs out. |
Christopher Graesser Joined: 26 Jan 16 Posts: 3 Credit: 1,192,390 RAC: 0 |
In the meantime, my BOINC client works fine again; it seems to have been a temporary issue. All other projects worked fine and the Rosetta website responded slowly, but after quite some time it worked well again. Thanks to all. |
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Well, the adventure continues, with a new question about BOINC's work unit triage decisions. Can anyone explain the scenario in the task listing shown in the screenshot below? Note: with my current settings, BOINC processes three work units at a time.
system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
[quote]Since the Rosetta one (red arrows) has been waiting its turn, and has the earliest deadline in the list, I expect BOINC will start on it next.[/quote]
It doesn't do it that way, or Rosetta would hog your machine for a week, as the other projects all have longer deadlines. It picks the next task according to your project weights: e.g. if you have Rosetta and WCG on even weights, it will try to do the same amount of each overall (averaged over some days; I'm not sure how many). |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
[quote]Well, the adventure continues, with a new question about BOINC's work unit triage decisions. In the below task listing, can anyone explain this scenario, referring to the screenshot below? [snip][/quote]
BOINC does NOT automatically choose to start the task with the nearest deadline first. It's more like:
High-priority tasks (usually due to a deadline less than 24 hours away) first.
Projects in order of how much work is needed to restore the balance, skipping those that have no tasks ready to start.
Within each project, usually from the oldest to the newest download time.
If multiple tasks are marked with the same download time, something I have not identified. I suspect it is the order in which the last of each task's input files was downloaded.
You don't have to like this, but complaining on Rosetta@home is unlikely to change it. |
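To make that ordering concrete, here is a minimal Python sketch, assuming a simplified model of the rules above: the `Task` class, `run_order` function, the per-project `deficit` numbers and the task names are all hypothetical, and this is not code from the BOINC client. The unidentified tie-breaker (same download time) is not modelled; Python's `sorted` is stable, so such ties simply keep their input order.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the selection order described above.
# It is NOT taken from the BOINC client source.

HIGH_PRIORITY_HOURS = 24.0  # assumption: "deadline in less than 24 hours"


@dataclass
class Task:
    name: str
    project: str
    hours_to_deadline: float
    download_seq: int  # smaller number = downloaded earlier


def run_order(tasks, deficit):
    """Sort ready tasks into the order they would be started.

    deficit maps project name -> how far that project is behind its
    resource share (bigger = more work needed to restore balance).
    Projects with no ready tasks never appear in `tasks`, so they are
    skipped automatically.
    """
    def key(task):
        urgent = task.hours_to_deadline < HIGH_PRIORITY_HOURS
        return (
            0 if urgent else 1,               # 1. high-priority tasks first
            -deficit.get(task.project, 0.0),  # 2. most-behind project next
            task.download_seq,                # 3. oldest download first
        )
    return sorted(tasks, key=key)


# Illustrative numbers only: the Rosetta task has the earliest deadline,
# but WCG is further behind its share, so WCG work is started first.
tasks = [
    Task("rosetta_A", "Rosetta", 60.0, 1),
    Task("wcg_B", "WCG", 150.0, 2),
    Task("wcg_C", "WCG", 150.0, 3),
]
deficit = {"Rosetta": -2.0, "WCG": 3.0}
print([t.name for t in run_order(tasks, deficit)])  # ['wcg_B', 'wcg_C', 'rosetta_A']
```

If the Rosetta task's deadline drops under the assumed 24-hour window, the first sort key moves it to the front regardless of the project balance.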
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
[quote]Why the WCG and not the Rosetta? I can only guess it has to do with the "switch between tasks" setting, which I currently have set to a little longer than a Rosetta work unit requires.[/quote]
No, it's not to do with the Switch between Tasks setting; it is all about your Resource share settings. As BOINC does work for each project, it keeps track of the amount of work done. It then balances what is done between the different projects in order to meet your Resource share settings. When you add or remove projects, change cache settings, or increase (and even more so decrease) the amount of time BOINC can process work, it has to re-juggle the work it does to match the work done on projects with the debt owed to other projects. It takes time for things to settle down. If you keep tweaking things, they will never settle down.
[quote]Since Rosetta units require a longer processing time and carry tighter deadlines than the WCG OpenPandemics, I would think it best for BOINC to be toiling on two Rosettas and one WCG at a time, thereby completing 2 of the former and 5 of the latter every 8 hours. But I don't see how to tweak BOINC to achieve that if it is in fact a valid pursuit. Any ideas?[/quote]
Leave it alone and let it sort itself out. The larger your cache, the more projects you run and the less time your projects have to actually do work, the longer it will take for things to settle down (think months). If BOINC is able to run whenever the system is running, the system is running, and you run with no cache, then things will settle down within a week or 2.
NB: "Use at most xx% of CPU time" is best set at 100%. Reduce the number of cores in use ("Use at most xx% of the CPUs") if heat is an issue (or improve the system cooling). "Suspend when non-BOINC CPU usage is above xx%" is best not selected at all. If BOINC processing does affect another programme, you can use the Exclusive Application option to stop BOINC when that particular programme is running.
Grant
Darwin NT |
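The "debt" bookkeeping can be illustrated with a minimal Python sketch, assuming one core, fixed one-hour time slices, and every project always having a task ready. The `simulate` function and the 50/25/25 shares are hypothetical; the real BOINC client uses more elaborate accounting, so treat this only as an illustration of why the split converges on the resource shares when left alone.

```python
# Hypothetical sketch of resource-share "debt" balancing on one core.
# Simplified illustration of the idea, not BOINC's real algorithm.

def simulate(shares, slices, slice_hours=1.0):
    """Give `slices` time slices to whichever project is most owed.

    shares: project -> resource share (any positive numbers; only the
    ratios matter). Returns CPU hours given to each project. Assumes every
    project always has a task ready to run.
    """
    total = sum(shares.values())
    fraction = {p: s / total for p, s in shares.items()}
    done = {p: 0.0 for p in shares}
    elapsed = 0.0
    for _ in range(slices):
        # debt = the share of the time so far the project was entitled to,
        # minus the time it actually got
        debt = {p: fraction[p] * elapsed - done[p] for p in shares}
        pick = max(debt, key=debt.get)  # ties go to the first project listed
        done[pick] += slice_hours
        elapsed += slice_hours
    return done


print(simulate({"Rosetta": 50, "WCG": 25, "Other": 25}, 100))
# {'Rosetta': 50.0, 'WCG': 25.0, 'Other': 25.0}
```

In this toy model any short run looks lopsided (the first two slices go to Rosetta and WCG, none to Other), and only over many slices does the total match the shares, which is the "let it settle" point above.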
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Thanks for that input, guys! I thought it might be a case of settling in to a routine. I've tweaked the settings according to Grant's suggestions, and will now sit back for a stretch and see how things shake out.
Eric
system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,217,610 RAC: 822 |
[quote]Thanks for that input, guys! I thought it might be a case of settling in to a routine. I've tweaked the settings according to Grant's suggestions, and will now sit back for a stretch and see how things shake out.[/quote]
It also had to do with the fact that the 2nd of July is still 3 days away!!! Boinc thinks your Rosetta task will take 8 hours, and there are still 7 of those 8-hour periods between right now and the deadline on the 2nd of July: plenty of time to do a single workunit. |
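The arithmetic behind that reply can be sketched in a few lines of Python. The function name `runtimes_before_deadline` is hypothetical, and the 56 hours left are an illustrative figure chosen to match "7 of those 8-hour periods"; this is not how BOINC itself reports deadline slack.

```python
# Hypothetical helper illustrating the reply above: how many copies of a
# task's estimated runtime fit between "now" and its deadline.

def runtimes_before_deadline(hours_to_deadline, estimated_runtime_h):
    if estimated_runtime_h <= 0:
        raise ValueError("estimated runtime must be positive")
    return int(hours_to_deadline // estimated_runtime_h)


# Illustrative numbers: an 8-hour Rosetta estimate with 56 hours left.
spare = runtimes_before_deadline(56, 8)
print(spare)       # 7
print(spare >= 1)  # True: one task fits easily, so there is no urgency yet
```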
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...
Eric
system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
yoerik Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
[quote]Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...[/quote]
Project weight can do that, and so can the "switch between tasks every ____ minutes" setting. Either can affect it; I've noticed the same thing with the same 2 projects. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
[quote]Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...[/quote]
It's typical if BOINC has decided that it has given enough work to the project of the workunit in progress to catch up on balancing. It's also typical if the newly started workunit has come close enough to its deadline to put it into high-priority mode. This might be blocked if the main memory is not large enough to hold both workunits at once. |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Peter: you were in one of your moods again, weren't you... lol
[quote]Does it matter what it's set to if it's redundant?[/quote]
In principle, there's no point constantly monitoring or polling for something you already know isn't there. There's often a wait for a response until a timeout, every moment of which is wasted. And discussing the subject here made me rethink some of my own settings, some of which I've now removed following that logic.
[quote]Apart from laptops, I've never known a CPU overheat, even on stock fans.[/quote]
You said you use TThrottle to slow your PCs to prevent that. For me, any throttling from full turbo is a personal affront, so if I'm not on the verge of overheating at all times I'm disappointed! |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
[quote]Why the WCG and not the Rosetta? I can only guess it has to do with the "switch between tasks" setting, which I currently have set to a little longer than a Rosetta work unit requires.[/quote]
[quote]No, it's not to do with the Switch between Tasks setting; it is all about your Resource share settings.[/quote]
Yes. And from the few occasions I've seen it come into operation over the years, I'd agree with Robert's list of the order in which tasks are run. More so with the setting Eric is using: he's taking "Switch between tasks" out of the equation altogether unless a task runs longer than 500 minutes.
[quote]NB: "Use at most xx% of CPU time" is best set at 100%. Reduce the number of cores in use ("Use at most xx% of the CPUs") if heat is an issue (or improve the system cooling).[/quote]
As I suggested earlier, yes. After making that change, and having already slightly increased the RAM allocated to Boinc, the only remaining change is to try running the 4th core (CPUs at 100%) to see if everything now fits into RAM. An extra 10% (800 MB) might make all the difference. |
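The "extra 10% (800 MB)" point is simple arithmetic on 8 GB, and a minimal Python sketch shows why it can decide whether a 4th task fits. Everything here is an assumption for illustration: the helper names are hypothetical, the 50% and 60% memory limits stand in for whatever percentages were actually set, 8 GB is taken as 8000 MB to match the 800 MB figure, and 1150 MB per task is an invented figure, not a measured Rosetta or WCG value.

```python
# Hypothetical sketch: how many concurrent tasks fit in the RAM BOINC is
# allowed to use. Per-task memory is an illustrative assumption only.

TOTAL_MB = 8000  # "8 GB", using 1 GB = 1000 MB to match the 800 MB figure


def ram_budget_mb(total_ram_mb, percent_allowed):
    """RAM BOINC may use under a given percentage limit."""
    return total_ram_mb * percent_allowed / 100


def tasks_that_fit(total_ram_mb, percent_allowed, per_task_mb):
    """Whole tasks of per_task_mb that fit inside the budget."""
    return int(ram_budget_mb(total_ram_mb, percent_allowed) // per_task_mb)


print(ram_budget_mb(TOTAL_MB, 60) - ram_budget_mb(TOTAL_MB, 50))  # 800.0
print(tasks_that_fit(TOTAL_MB, 50, 1150))  # 3
print(tasks_that_fit(TOTAL_MB, 60, 1150))  # 4
```

With these made-up numbers, the extra 800 MB is exactly what lets a 4th task run on a quad-core; with other per-task sizes it may not be enough.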
Stevie G Joined: 15 Dec 18 Posts: 108 Credit: 866,895 RAC: 389 |
[quote]What's easiest is to set Rosetta at say 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems.[/quote]
I was trying to get the six Rosetta tasks completed by their 2 July deadline, but that didn't work out. Today I had 13 download failures and one upload failure, and they're all gone. Rosetta is trying to download 6 more, but they are taking very long and they may fail as well.
Oops. I paused my other projects and all six just now downloaded successfully. They are due on 4 July. I have to suspend most of them to let Asteroids and WCG catch up, but I will let them resume in a short while.
Is it possible that these things happen because this "low-spec" computer was trying to run two Rosetta tasks while two Asteroids tasks and one WCG task were running or waiting to run? Maybe I should add some more RAM? It only has 8 gigs.
Steven Gaber
Oldsmar, FL |
Keith Myers Joined: 29 Mar 20 Posts: 97 Credit: 332,619 RAC: 10 |
[quote]Apart from laptops, I've never known a CPU overheat, even on stock fans.[/quote]
Ha ha ha LOL. I had to comment, as I have in fact been plagued with a CPU Overtemp F1 error on my daily driver since I put it together. The baffling thing is that it was tripping at 80 °C when the CPU spec for thermal throttling is 95 °C and self-protection is at 105 °C. Yet others with the same motherboard and CPU said they had no issues running to 105 °C before they hit an overtemp error. This is on a system with excellent custom cooling that runs my normal BOINC load at around 70 °C. Only when stress testing would I overtemp. (That's also ignoring the period when a certain BIOS revision sporadically turned off all the motherboard fan headers and, as expected, the system would overtemp and shut down. It took the mobo OEM six months to fix that BIOS issue.) Thankfully I finally figured it out after several years, with some help from the OCN forum. It was just a benign BIOS setting on my host that did not behave normally or as expected: a quirk of this system not replicated on others. |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
[quote]What's easiest is to set Rosetta at say 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems.[/quote]
8 GB of RAM ought to be plenty for a 2-core processor. Have you looked at the previous advice in this thread and compared it to your own settings (even though the advice was for a different machine)? There should be plenty for you to consider. Boinc <ought> to be able to give your other projects enough time to complete before their deadlines without you having to suspend them. The longer you can run without interfering, the better Boinc will be able to decide for you. |
©2024 University of Washington
https://www.bakerlab.org