Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97809 - Posted: 29 Jun 2020, 2:03:14 UTC - in response to Message 97807.  
Last modified: 29 Jun 2020, 2:19:59 UTC

When I first saw your msg for some reason none of the other very good replies and suggestions were showing, nor were your useful images, so I've only just caught up.
If I can make some further suggestions...

@Sid: Very good of you to go through all that so thoroughly for me! I've implemented most of your suggestions. And thanks for the tip about where application priority is assigned in Windows -- never been in that corner of Task Manager before.
I do now suspect that it was expired work units that may have hung up Rosetta. Since it fetched new ones and resumed processing yesterday, I'm now waiting to see when BOINC will fetch new work units for WCG, which it has yet to do since it finished the batch it processed yesterday. Today I've been messing with various commands to probe how that all works. Funny to think that after almost 20 years of running BOINC, I've learned more about it in the last two days than in the entire preceding time. Thanks again!
Eric

system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97809 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 97810 - Posted: 29 Jun 2020, 2:46:18 UTC - in response to Message 97808.  

Mikey:

Thanks for the response.

I uninstalled WCG, downloaded BOINC again, reinstalled the BOINC Manager and all the problems were solved. Got my projects back, but still no Rosetta tasks.

I think you may be right, that WCG may have been installed with BAM. It was making my computer run r-e-a-l-l-y slowly. It changed my Asteroids completion time from around 3 hours to over 4:30 hours.

So I added WCG in the BOINC Manager and it downloaded one task. We'll see how that works.

My machine is back up to speed now and waiting for more Rosetta.

Maybe you guys were correct in saying this box is too low-spec for that kind of work.

Steven Gaber
Oldsmar, FL


I'm glad it worked for you!!!

Now it depends on your resource share settings how often you get tasks for each project: the higher the resource share, the more tasks per day you will run, as long as they are available. Think of the maximum resource share as percentage parts of 100, with each project getting a share. What's easiest is to set Rosetta at, say, 50%, WCG at 25% and some other project at 25%, and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems. With Rosetta's 3 day deadline, if you have 3 days of work NO other projects will crunch, because their deadlines will be further out than 3 days.
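As a rough illustration of the resource share arithmetic (a minimal sketch, not BOINC's own code; the function name and the example share values are made up for illustration), each project's long-run fraction of computing is its share divided by the sum of all shares:

```python
def share_fractions(shares):
    """Map each project's resource share to its fraction of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Hypothetical shares matching a 50/25/25 split.
fractions = share_fractions({"Rosetta": 50, "WCG": 25, "Other": 25})
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%}")
```

The shares do not have to add up to 100; two projects both set to 100 would simply get half each.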
ID: 97810 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 97811 - Posted: 29 Jun 2020, 7:22:12 UTC - in response to Message 97797.  

Anyway, restricting the number of cores can't help with memory, Boinc will always use as many cores as possible until it hits your set memory limit.
No it won't.
If you limit it to 1 core, even if you have 128 of them, it will only use the one core for BOINC work.
I limited the number of cores to use on one of my systems for a while to avoid out of memory errors until I upgraded the RAM.
Grant
Darwin NT
ID: 97811 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 97817 - Posted: 29 Jun 2020, 14:56:48 UTC - in response to Message 97809.  

@Sid: Very good of you to go through all that so thoroughly for me! I've implemented most of your suggestions. And thanks for the tip about where application priority is assigned in Windows -- never been in that corner of Task Manager before.

Neither had I until Boinc came along for me. I don't think I had any other reason to consider it for anything else I've ever done on my computer.

I do now suspect that it was expired work units that may have hung up Rosetta. Since it fetched new ones and resumed processing yesterday, I'm now waiting to see when BOINC will fetch new work units for WCG, which it has yet to do since it finished the batch it processed yesterday. Today I've been messing with various commands to probe how that all works. Funny to think that after almost 20 years of running BOINC, I've learned more about it in the last two days than in the entire preceding time. Thanks again!

I don't think it was expired work units. I suspect it was all down to the number of tasks you were aiming to run within limited RAM.
Which is why allocating a bit more RAM - or - reducing the tasks running at the same time (thereby needing less RAM) is giving you the room to run successfully.
And unnecessarily suspending tasks when they could run is the rest of it.
And all WCG projects are way less demanding of RAM so there's never going to be a problem with them.

I'll be interested to see what happens after you give it a few days. Hopefully all issues will be solved.
ID: 97817 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97818 - Posted: 29 Jun 2020, 19:03:22 UTC - in response to Message 97803.  

The idea that IBM might also have done a security check on it makes me think I should use that version as I have very little faith in the standard Boinc Manager.


What security problem could Boinc possibly have? You are in control of what projects it communicates with.

Suspend when computer is on battery: Untick. Your computer doesn't run on battery so the selection is redundant. It's intended for laptops/portables that aren't plugged into the mains.


I use that option on my desktop as it has a UPS.

Suspend GPU computing when computer is in use: Untick. None of the projects you run offer GPU computing. And clear the field that mentions mouse/keyboard input - also redundant.


Does it matter what it's set to if it's redundant?

You may not have picked up on it, but Grant mentioned that Rosetta (and I think tasks from all projects under Boinc) run at 'Low' or 'Idle' priority.


Except LHC which is in a virtual machine and doesn't seem to behave properly.

If you open the task manager under Windows 10, go to the "Details" column, find a Rosetta task, right-click on it and hover the mouse over the "Set Priority" option, a little sub-window will open showing the priority Rosetta runs at.
On mine it shows 'Low'. Do the same on any other program you can identify and it'll show 'Normal' - a higher priority than 'Low'.
That means, when you type, move the mouse, play music, watch video, or run any other program on your PC, they will use the CPU ahead of Rosetta - or Rosetta defers priority to anything else you ask your PC to do.


Actually priorities in Windows 10 are abysmal. I've often seen things at low priority getting more CPU than things at normal.

Sometimes circumstances can arise when there's a conflict between Rosetta and other programs you run, but it's generally something pretty specialist.
Recently, when work was hard to come by here, the only way I could tell I'd run out of Rosetta tasks was that the fans went quieter and the room became less hot. Certainly not for responsiveness of my PC.


I've got Boinctasks on the monitor to my right, I can see what all 6 machines are doing all the time.
Correction, it's spilled into two monitors now. 76 tasks at once.

In the usage limits section:
Use at most 75% of CPU time: Unless you have issues with temperatures, using anything other than 100% can lead to task errors, particularly if tasks are starting & stopping a lot. Over-high temperatures are the only reason I'd think of reducing from 100%. But see below too.


I use TThrottle to do a similar thing and it doesn't seem to cause problems. Except TThrottle is better, as you set a temperature and it adjusts the % continuously. I have it on the machine in here because it's too loud at full blast if the room is warm.

And note that running all 4-cores at 100%, 100% of the time, with nothing set in the "when to suspend" section, that will push up your temperatures as well, so there's two reasons you may want to go back to 75% cores if it's problematic.


Apart from laptops, I've never known a CPU overheat, even on stock fans.

Just be sure to keep your cache sizes small so you don't run into deadline problems. With Rosetta's 3 day deadline, if you have 3 days of work NO other projects will crunch, because their deadlines will be further out than 3 days.


Actually the others will crunch, as Boinc downloads small amounts of work each time, not the whole 3 days, it tops it up. If Rosetta has been using the CPU all of the time, the next call for work will be from another project.

No it won't. If you limit it to 1 core, even if you have 128 of them, it will only use the one core for BOINC work.
I limited the number of cores to use on one of my systems for a while to avoid out of memory errors until I upgraded the RAM.


I didn't mean that, I meant if you do not restrict cores, then Boinc will restrict for you when your RAM runs out.
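
To put the two views side by side, here is a rough sketch (a simplification, not BOINC's actual logic; the function name, the floor of one core, and the per-task RAM figure are assumptions for illustration). The number of tasks that can run at once is capped both by the core limit and by how many tasks fit in the RAM BOINC is allowed to use, so whichever cap is lower wins:

```python
import math


def max_concurrent_tasks(n_cores, cpu_pct, ram_total_mb, ram_pct, task_ram_mb):
    """Estimate how many tasks can run at once.

    n_cores      -- logical cores in the machine
    cpu_pct      -- "use at most X% of the CPUs" setting
    ram_total_mb -- installed RAM in MB
    ram_pct      -- percentage of RAM BOINC is allowed to use
    task_ram_mb  -- assumed RAM needed per task (hypothetical figure)
    """
    # Assumption: at least one core is always usable.
    by_cores = max(1, math.floor(n_cores * cpu_pct / 100))
    by_ram = math.floor((ram_total_mb * ram_pct / 100) / task_ram_mb)
    return min(by_cores, by_ram)


# Hypothetical 4-core, 8 GB machine; tasks assumed to need 1500 MB each.
print(max_concurrent_tasks(4, 100, 8000, 50, 1500))  # RAM cap applies: 2
print(max_concurrent_tasks(4, 25, 8000, 50, 1500))   # core cap applies: 1
```

With no core restriction the RAM limit decides, and with a one-core restriction the core limit decides, so both statements hold depending on which setting is tighter.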
ID: 97818 · Rating: 0
Christopher Graesser
Joined: 26 Jan 16
Posts: 3
Credit: 1,192,390
RAC: 0
Message 97820 - Posted: 29 Jun 2020, 19:37:22 UTC

In the meantime, my BOINC Client works fine again, seems to be a temporary issue. All other projects worked fine, the rosetta website responded slowly, but after quite some time, it worked well again. Thanks to all.
ID: 97820 · Rating: 0
EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97824 - Posted: 29 Jun 2020, 20:37:30 UTC

Well, the adventure continues, with a new question about BOINC's work unit triage decisions. In the below task listing, can anyone explain this scenario, referring to the screenshot below?
Note: With my current settings, BOINC processes three work units at a time.

  1. I'm watching the three WCG work units (yellow box) just before the top one reaches 100%.
  2. Since the Rosetta one (red arrows) has been waiting its turn, and has the earliest deadline in the list, I expect BOINC will start on it next.
  3. But it starts processing the next WCG unit instead (green arrows).


Why the WCG and not the Rosetta? I can only guess it has to do with the "switch between tasks" setting, which I currently have set to a little longer than a Rosetta work unit requires.
Since Rosetta units require a longer processing time and carry tighter deadlines than the WCG OpenPandemics, I would think it best for BOINC to be toiling on two Rosettas and one WCG at a time, thereby completing 2 of the former and 5 of the latter every 8 hours. But I don't see how to tweak BOINC to achieve that if it is in fact a valid pursuit. Any ideas? Am I delving too deep, awakening a Balrog?
Eric



system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97824 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97825 - Posted: 29 Jun 2020, 20:52:49 UTC - in response to Message 97824.  

Since the Rosetta one (red arrows) has been waiting its turn, and has the earliest deadline in the list, I expect BOINC will start on it next.


It doesn't do it that way, or Rosetta would hog your machine for a week as the other projects all have longer deadlines. It picks the next task according to your project weights - eg. if you have Rosetta and WCG on even weights, then it will try to do the same amount of each overall (averaged over some days - not sure how many).
ID: 97825 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 97826 - Posted: 29 Jun 2020, 20:54:06 UTC - in response to Message 97824.  
Last modified: 29 Jun 2020, 20:56:43 UTC

Well, the adventure continues, with a new question about BOINC's work unit triage decisions. In the below task listing, can anyone explain this scenario, referring to the screenshot below?
Note: With my current settings, BOINC processes three work units at a time.

[snip]

BOINC does NOT automatically choose to start the task with the nearest deadline first.

It's more like:

  1. High priority (usually due to a deadline in less than 24 hours) tasks first.
  2. Projects in order by how much work is needed to restore the balance, skipping those that have no tasks ready to start.
  3. Within each project, usually from the oldest to the newest download time.
  4. If multiple tasks are marked with the same download time, something I have not identified. I suspect it is the order in which the last of each task's input files was downloaded.

You don't have to like this, but complaining on Rosetta@home is unlikely to change this.
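
The ordering described above can be sketched roughly as a sort key. This is only an illustration of the list as written, not BOINC's real scheduler; the `Task` fields, the 24-hour threshold, and the `project_deficit` numbers are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    project: str
    hours_to_deadline: float
    download_order: int  # lower means downloaded earlier


def run_order(tasks, project_deficit, high_priority_hours=24.0):
    """Sort tasks: high priority first, then by project deficit, then by age.

    project_deficit maps project name to how much work that project is
    "owed" to restore balance (larger means further behind).
    """
    def key(task):
        high_priority = task.hours_to_deadline < high_priority_hours
        return (
            0 if high_priority else 1,
            -project_deficit.get(task.project, 0.0),
            task.download_order,
        )

    return sorted(tasks, key=key)


# Hypothetical queue: the Rosetta task has the earliest deadline, but it is
# not yet high priority and WCG is further behind on its share.
tasks = [
    Task("rosetta_1", "Rosetta", 72.0, 1),
    Task("wcg_1", "WCG", 120.0, 2),
    Task("wcg_2", "WCG", 120.0, 3),
]
for task in run_order(tasks, {"Rosetta": 1.0, "WCG": 5.0}):
    print(task.name)
```

In this made-up queue the two WCG tasks come out ahead of the Rosetta one, which is the behaviour Eric saw; drop the Rosetta task's `hours_to_deadline` below 24 and it jumps to the front.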
ID: 97826 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 97827 - Posted: 29 Jun 2020, 20:59:33 UTC - in response to Message 97824.  

Why the WCG and not the Rosetta? I can only guess it has to do with the "switch between tasks" setting, which I currently have set to a little longer than a Rosetta work unit requires.
No, it's not to do with the Switch between Tasks setting; it is all about your Resource share settings.

As BOINC does work for each project, it keeps track of the amount of work done. It will then balance what is done between the different projects, in order to meet your Resource share settings.
When you add or remove projects, change cache settings, or increase (and even more so decrease) the amount of time BOINC can process work, it then has to re-juggle the work it does to match the work done on projects with the debt owed to other projects.
It takes time for things to settle down. If you keep tweaking things, then they will never settle down.
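
A toy model of that balancing (a sketch under stated assumptions, not the real BOINC accounting: the names, the one-hour time slice, and the "run whichever project is furthest behind its share" rule are all simplifications made up for illustration):

```python
def simulate(shares, hours):
    """Each hour, run the project furthest below its target share of work."""
    total_share = sum(shares.values())
    done = {name: 0.0 for name in shares}
    for hour in range(1, hours + 1):
        # Deficit = work the project should have had by now minus work done.
        deficits = {
            name: hour * shares[name] / total_share - done[name]
            for name in shares
        }
        chosen = max(deficits, key=deficits.get)
        done[chosen] += 1.0
    return done


# Hypothetical 50/25/25 split run for 100 one-hour slices.
print(simulate({"Rosetta": 50, "WCG": 25, "Other": 25}, 100))
```

Left alone, the hours done drift toward the share ratio. Changing the shares or the available time part-way through leaves the work already done out of line with the new targets, which is the re-juggling period described above.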


Since Rosetta units require a longer processing time and carry tighter deadlines than the WCG OpenPandemics, I would think it best for BOINC to be toiling on two Rosettas and one WCG at a time, thereby completing 2 of the former and 5 of the latter every 8 hours. But I don't see how to tweak BOINC to achieve that if it is in fact a valid pursuit. Any ideas?
Leave it alone and let it sort itself out.
The larger your cache, the more projects you run, the less time your projects have to actually do work, the longer it will take for things to settle down (think months).
If BOINC is able to run whenever the system is running, and the system is running, and you run with no cache, then things will settle down within a week or 2.




NB- Use at most xx% of CPU time is best set at 100%. Reduce the number of cores in use (Use at most xx% of the CPUs) if heat is an issue (or improve the system cooling).
Suspend when non-BOINC CPU usage is above xx% is best not being selected at all. If BOINC processing does affect another programme, you can use the Exclusive Application option to stop BOINC when that particular programme is running.
Grant
Darwin NT
ID: 97827 · Rating: 0
EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97828 - Posted: 29 Jun 2020, 21:15:12 UTC

Thanks for that input, guys! I thought it might be a case of settling in to a routine. I've tweaked the settings according to Grant's suggestions, and will now sit back for a stretch and see how things shake out.
Eric

system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97828 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 97830 - Posted: 29 Jun 2020, 23:14:18 UTC - in response to Message 97828.  
Last modified: 29 Jun 2020, 23:19:04 UTC

Thanks for that input, guys! I thought it might be a case of settling in to a routine. I've tweaked the settings according to Grant's suggestions, and will now sit back for a stretch and see how things shake out.
Eric


It also had to do with the fact that the 2nd of July is still 3 days away!!! Boinc thinks your Rosetta task will take 8 hours, and there are still 7 of those 8-hour blocks between right now and the deadline on the 2nd of July: plenty of time to do a single workunit.
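
The deadline check works out roughly like this (a back-of-the-envelope sketch; the exact deadline time on 2 July is not shown in this thread, so the 08:00 UTC value below is an assumption picked only for the example, and the function name is made up):

```python
from datetime import datetime, timezone


def runtimes_before_deadline(now, deadline, est_hours):
    """Return how many whole estimated runtimes fit before the deadline."""
    remaining_hours = (deadline - now).total_seconds() / 3600
    return int(remaining_hours // est_hours)


now = datetime(2020, 6, 29, 23, 14, tzinfo=timezone.utc)
deadline = datetime(2020, 7, 2, 8, 0, tzinfo=timezone.utc)  # assumed time
print(runtimes_before_deadline(now, deadline, 8))
```

As long as that number stays well above 1, there is no reason for the task to be pushed to the front of the queue.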
ID: 97830 · Rating: 0
EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97831 - Posted: 30 Jun 2020, 0:06:53 UTC

Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...
Eric



system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97831 · Rating: 0
yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 97832 - Posted: 30 Jun 2020, 0:22:18 UTC - in response to Message 97831.  

Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...
Eric


Project weight can do that; so can the "switch projects every ____ minutes" setting. Either can affect it. I've noticed the same thing with the same 2 projects.
ID: 97832 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 97833 - Posted: 30 Jun 2020, 1:42:43 UTC - in response to Message 97831.  

Follow-up to my previous post, just out of curiosity: Why would BOINC interrupt a work unit in progress to start another? I promise I'll hold off on questions now...
Eric

Typical if it decided that it has given enough work to the project of the workunit in progress to catch up on balancing.

Also typical if the newly started workunit has come close enough to its deadline to put it into high priority mode.

This might be blocked if the main memory is not large enough to keep both workunits there at once.
ID: 97833 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 97834 - Posted: 30 Jun 2020, 2:16:37 UTC - in response to Message 97818.  

Peter: you were in one of your moods again, weren't you... lol

Does it matter what it's set to if it's redundant?

In principle, there's no point constantly monitoring or polling for something you already know isn't there. There's often a wait for a response until a timeout, every moment of which is a waste.
And discussing the subject here made me rethink some of my own settings, some of which I've now removed following that logic.

Apart from laptops, I've never known a CPU overheat, even on stock fans.

You said you use TThrottle to slow your PCs to prevent that.
For me, any throttling from full turbo is a personal affront, so if I'm not on the verge of overheating at all times I'm disappointed!
ID: 97834 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 97835 - Posted: 30 Jun 2020, 2:49:36 UTC - in response to Message 97827.  
Last modified: 30 Jun 2020, 2:51:28 UTC

Why the WCG and not the Rosetta? I can only guess it has to do with the "switch between tasks" setting, which I currently have set to a little longer than a Rosetta work unit requires.
No, it's not to do with the Switch between Tasks setting- it is all about your Resource share settings

Yes. And I'd agree with Robert's list of the order of priorities in which tasks run, from the few occasions I've seen it come into operation over the years.
And more so with the setting Eric is using. He's taking 'Switch between tasks' out of the equation altogether unless a task runs longer than 500 minutes.

NB- Use at most xx% of CPU time is best set at 100%. Reduce the number of cores in use (Use at most xx% of the CPUs) if heat is an issue (or improve the system cooling).
Suspend when non-BOINC CPU usage is above xx% is best not being selected at all. If BOINC processing does affect another programme, you can use the Exclusive Application option to stop BOINC when that particular programme is running.

As I suggested earlier, yes.
After making that change, and having already slightly increased the RAM allocated to Boinc, the only remaining change is to see if the 4th core can be run (CPUs 100%) to see if it'll now fit into RAM.
An extra 10% (800 MB) might make all the difference.
ID: 97835 · Rating: 0
Stevie G
Joined: 15 Dec 18
Posts: 107
Credit: 865,910
RAC: 814
Message 97861 - Posted: 1 Jul 2020, 23:13:56 UTC - in response to Message 97810.  

[quote]What's easiest is to set Rosetta at say 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems.[/quote]

I was trying to get the six Rosetta tasks completed by their 2 July deadline, but that didn't work out. Today I had 13 download failures and one upload failure and they're all gone. Rosetta is trying to download 6 more, but they are taking very long and I think they may fail as well.

OOopps. I paused my other projects and all six just now downloaded successfully. They are due on 4 July. I have to suspend most of them to let Asteroids and WCG catch up. But I will let them resume in a short while.

Is it possible that these things happen because this "low-spec" computer was trying to run two Rosetta tasks while two Asteroids and one WCG tasks were running or waiting to run?

Maybe I should add some more RAM? It only has 8 gigs.

Steven Gaber
Oldsmar, FL
ID: 97861 · Rating: 0
Keith Myers
Joined: 29 Mar 20
Posts: 97
Credit: 332,619
RAC: 25
Message 97864 - Posted: 1 Jul 2020, 23:54:59 UTC

Apart from laptops, I've never known a CPU overheat, even on stock fans.

Ha ha ha LOL.

I had to comment as I have in fact been plagued with a CPU Overtemp F1 error on my daily driver since I put it together. Baffling thing is that it was tripping at 80 °C when the CPU spec for thermal throttling is 95 °C and self-protection at 105 °C.
But others with the same motherboard and CPU would state they had no issues running to 105 °C before they hit an overtemp error.

This is on a system with excellent custom cooling and runs with my normal BOINC load with temps around 70 °C. Only when stress testing would I overtemp.

Also ignoring the time period when a certain BIOS revision turned off all motherboard fan headers sporadically and as expected the system would overtemp and shut down. Took the mobo OEM six months to fix that BIOS issue.

Thankfully I finally figured it out after several years with some OCN forum help. Just was a benign BIOS setting on my host that did not behave normally or as expected. A quirk for this system not replicated on others.
ID: 97864 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 97866 - Posted: 2 Jul 2020, 0:03:53 UTC - in response to Message 97861.  

What's easiest is to set Rosetta at say 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems.


I was trying to get the six Rosetta tasks completed by their 2 July deadline, but that didn't work out. Today I had 13 download failures and one upload failure and they're all gone. Rosetta is trying to download 6 more, but they are taking very long and I think they may fail as well.

OOopps. I paused my other projects and all six just now downloaded successfully. They are due on 4 July. I have to suspend most of them to let Asteroids and WCG catch up. But I will let them resume in a short while.

Is it possible that these things happen because this "low-spec" computer was trying to run two Rosetta tasks while two Asteroids and one WCG tasks were running or waiting to run?

Maybe I should add some more RAM? It only has 8 gigs.

8 GB RAM ought to be plenty for a 2-core processor.
Have you looked at the previous advice in this thread and compared to your own settings (even though the advice was for a different machine)? There should be plenty for you to consider.
Boinc <ought> to be able to give your other projects enough time to complete before their deadlines without you having to suspend them. The longer you can run without interfering, the better Boinc will be able to decide for you.
ID: 97866 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org