Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100685 - Posted: 3 Mar 2021, 2:24:20 UTC - in response to Message 100682.  

Thank you for your detailed reply.

Are you using Web based preferences, or settings in the BOINC Manager?

The settings in the manager. I assume that is "local preferences."


With your computing preferences, what "Usage limits" & "When to suspend" values do you have?

Usage limits
Use at most 100 % of the CPUs <<-- This
Use at most 100 % of CPU time <<-- This

When to suspend
Suspend when computer is on battery <--This is checked
Suspend when computer is in use <--not checked
Suspend GPU computing when computer is in use <--not checked
'In use' means mouse/keyboard input in last 3 minutes <-- this would be the value if checked
Suspend when no mouse/keyboard input in last --- minutes <-- this line does not appear in my options
Suspend when non-BOINC CPU usage is above --- % <-- would be 25%, but not checked
Compute only between --- <-- this line does not appear in my options

If it's set to suspend at any time, check that nothing running on that system meets any of those settings' values, e.g. some system or other process using CPU time and stopping the Tasks from starting.

I unplugged/replugged the power cable and verified that it suspends when on battery. This laptop is the one that I use exclusively for Zoom calls. I have nothing running on it when not using Zoom, and I halt BOINC when making a call (about once per week). I've also gone into the system monitor and killed things like Nemo, Mint Update, bluetooth, etc to maximize available memory.


Check that something isn't hogging system RAM, and hitting the limits that stop BOINC from processing work.

System Monitor currently shows 1.2 GB of memory used out of 3.3 GB available.

The "top" utility in the terminal shows 3336 MiB total; 195 MiB free; 805 MiB used; 2334 MiB cache/buffer.
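For comparing figures like these: on Linux, top and the desktop System Monitor typically both derive their numbers from /proc/meminfo, but they can define "used" differently, so the two totals need not agree. A minimal sketch, assuming the usual `Name:   value kB` line format; the function names are illustrative and not part of BOINC or any tool:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # The first token is the number; the optional "kB" unit means KiB.
            fields[name.strip()] = int(parts[0])
    return fields


def summary_mib(fields: dict[str, int]) -> dict[str, float]:
    """Convert the headline fields from KiB to MiB."""
    wanted = ("MemTotal", "MemFree", "MemAvailable")
    return {name: fields[name] / 1024 for name in wanted if name in fields}
```

On the laptop itself, `summary_mib(parse_meminfo(open("/proc/meminfo").read()))` gives the raw totals. MemAvailable is the kernel's estimate of memory available to new programs without swapping, which is a better guide than MemFree.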


In the BOINC Manager, you can select one of the Tasks ready to start, Suspend it, then Resume it a few seconds later & see if that kick starts things.

I've tried this; also tried updating; suspending all tasks and updating and restarting; everything that I can think of.


And even with just Rosetta as your only project, with its very short deadlines, no cache (or an extremely small one, e.g. 0.1 + 0.01 days) is the best way to go.

I read this in one of your responses to someone else and have tried this, but it does not affect the problem. I also went into web preferences and changed everything to match this laptop, but it doesn't work.
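As a quick sanity check on that suggestion: BOINC's two buffer settings ("Store at least ... days of work" and "Store up to an additional ... days of work") are in days, so 0.1 + 0.01 works out to less than one 8-hour Rosetta task. A trivial sketch of the arithmetic (the function name is just for illustration):

```python
def buffer_hours(min_days: float, extra_days: float) -> float:
    """Total work buffer, in hours, from the two BOINC buffer settings (in days)."""
    return (min_days + extra_days) * 24


TASK_HOURS = 8.0  # Rosetta's default target CPU run time
hours = buffer_hours(0.1, 0.01)
print(f"buffer = {hours:.2f} h; shorter than one task: {hours < TASK_HOURS}")
```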

One other thing: all three of my computers are running different revisions of BOINC (is that normal?). The version for the affected computer is 7.16.6. I found a reference on the Internet (and the Internet is never wrong [snurk]) that said that this revision is unstable. I don't know if that is currently true. Is there a way to use a different revision and see if that helps?

Thank you for your efforts in trying to help. Any idea where I can go from here? What perplexes me is that it once worked, but now does not, even though I didn't change anything in the settings, or even use the computer at all prior to the onset. Yet deleting and then reinstalling did not cure it. Restarting the computer does not help.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100686 - Posted: 3 Mar 2021, 5:57:02 UTC - in response to Message 100685.  

Thank you for your efforts in trying to help. Any idea where I can go from here? What perplexes me is that it once worked, but now does not, even though I didn't change anything in the settings, or even use the computer at all prior to the onset. Yet deleting and then reinstalling did not cure it. Restarting the computer does not help.
From the sounds of things, there's no obvious reason for them not to be running.


Grasping at straws here- set it to use the Web based preferences. Once that is done, you need to click Update on the Manager to make sure it has the current Web based settings. If it then works, then go back to the Manager based settings & see if it keeps working.

If it's still not starting- in the Manager, Options, Event log options, select cpu_sched_debug, cpu_sched, and cpu_sched_status and save them.
Exit BOINC (give it a few seconds as it can take a while to fully exit), then restart and once it's running again, go to Tools, Event log & see what's there.
With luck, one of those flags will either show why (or give an indication of why) it's not starting any of the Tasks.
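The same three flags can also be switched on by editing cc_config.xml in the BOINC data directory, which is handy on a machine without a display. A minimal sketch that writes out the XML, assuming you start from an empty config (overwriting an existing cc_config.xml this way would discard any other settings in it); the helper name is hypothetical, while the element names are the log flag names given above:

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags: list[str]) -> str:
    """Return cc_config.xml text with each named event-log flag switched on."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        ET.SubElement(flags, name).text = "1"  # 1 = flag enabled
    ET.SubElement(root, "options")  # left empty here
    return ET.tostring(root, encoding="unicode")


config = build_cc_config(["cpu_sched", "cpu_sched_debug", "cpu_sched_status"])
print(config)
```

After saving the output as cc_config.xml, Options, Read config files in the Manager (or restarting the client) should pick it up.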

If there's nothing that gives us any sort of hint as to what's going on, then your best option would be to post about the issue in the BOINC forum. There's a good chance someone there will have an idea of what is going on with that system.
Grant
Darwin NT

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100689 - Posted: 4 Mar 2021, 2:59:41 UTC - in response to Message 100686.  
Last modified: 4 Mar 2021, 3:02:20 UTC

Grasping at straws here- set it to use the Web based preferences. Once that is done, you need to click Update on the Manager to make sure it has the current Web based settings. If it then works, then go back to the Manager based settings & see if it keeps working.

It's running again. For the sake of some other poor soul who runs into a similar problem, I will document the sequence of events leading to the breakthrough.

* Changed to web-based preferences, nothing changed. Saved the settings as my local preferences; again no change.

* Added the three flags that you suggested; exited BOINC, restarted computer. Upon restart of BOINC, the task at the top of the list had a new status: "Waiting for memory." Fiddling around with different things and updating did not start the tasks or change the status of the lead task.

* Went into prefs, selected the tab Disk and Memory, and under Memory changed the parameter "When computer is in use, use at most" from 50% to 90%. And at this, the tasks took off and started running again.

* Not being content with success, and wanting answers, I changed the parameter back to 50%, and it is still running.

Based on my experience, plus that of one other user I read about, the underlying issue is that BOINC seems to think it does not have enough available memory to begin running tasks, so that is where corrective efforts should be directed. If any new information comes my way, I will pass it along. Meanwhile, I'm back in the saddle again, yee-haw!

Many thanks for your assistance with this problem.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100690 - Posted: 4 Mar 2021, 4:59:28 UTC - in response to Message 100689.  
Last modified: 4 Mar 2021, 5:07:26 UTC

We both forgot the first rule of computer problem solving- re-boot.


Upon restart of BOINC, the task at the top of the list had a new status: "Waiting for memory."
Probably worth posting this issue to the BOINC boards.
What usually happens is that if a Task uses more RAM than is available, or requires more RAM than is available in order to start, then its status becomes "Waiting for memory."
It shouldn't be "Ready to start" (even though it is), if it can't actually start for some reason.

The issue will eventually recur; when it does, check the Event log to see if there are any messages about insufficient RAM, even if it still shows the Tasks as "Ready to start."


* Went into prefs, selected the tab Disk and Memory, and under Memory changed the parameter "When computer is in use, use at most" from 50% to 90%. And at this, the tasks took off and started running again.
I've got my system set to 95% for both memory limit settings.
With Rosetta you generally need to allow 1.3 GB of RAM per Task. Many need much less than that; some need 4 GB or more. So with the amount of RAM that system has, the number of cores/threads its CPU has, and the 50% limit on RAM usage, I would expect you to run into the issue again in the future.
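To put rough numbers on that: the sketch below is a simplified model, not BOINC's actual scheduler (which looks at each task's actual and estimated memory use). It just divides the RAM BOINC is allowed to use by the 1.3 GB-per-task rule of thumb. The 3336 MiB total is the figure top reported on the laptop earlier in the thread; the 4-thread CPU is an assumption, since the core count wasn't posted.

```python
import math


def max_concurrent_tasks(ram_mib: float, limit_pct: float,
                         per_task_mib: float, threads: int) -> int:
    """How many tasks fit under a percent-of-RAM cap, capped at the thread count."""
    budget_mib = ram_mib * limit_pct / 100  # RAM BOINC may use
    by_memory = math.floor(budget_mib / per_task_mib)
    return min(by_memory, threads)


RAM_MIB = 3336             # total reported by top on the laptop
PER_TASK_MIB = 1.3 * 1024  # ~1.3 GB per Rosetta task (rule of thumb)
THREADS = 4                # assumed; not stated in the thread

for pct in (50, 90):
    n = max_concurrent_tasks(RAM_MIB, pct, PER_TASK_MIB, THREADS)
    print(f"{pct}% limit -> {n} task(s) at once")
```

Under this model only one typical task fits at 50% and two at 90%, while a single 4 GB task would not fit at either setting, which is consistent with tasks sitting at "Waiting for memory."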



NB
I have nothing running on it when not using Zoom, and I halt BOINC when making a call (about once per week).
In the BOINC Manager, Options, Exclusive applications. If you put the zoom executable name there, BOINC will automatically suspend processing when Zoom is running, and restart when you're done.
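For reference, as far as I'm aware the Exclusive applications list is stored as exclusive_app entries under options in cc_config.xml, so it can also be set by editing that file. A minimal sketch that adds an entry to existing config text without disturbing anything else; "zoom" is an assumed executable name (check the actual process name in System Monitor), and the helper name is hypothetical:

```python
import xml.etree.ElementTree as ET


def add_exclusive_app(config_xml: str, exe_name: str) -> str:
    """Return cc_config.xml text with `exe_name` listed as an exclusive application."""
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:  # create <options> if it's missing
        options = ET.SubElement(root, "options")
    existing = {el.text for el in options.findall("exclusive_app")}
    if exe_name not in existing:  # don't add the same app twice
        ET.SubElement(options, "exclusive_app").text = exe_name
    return ET.tostring(root, encoding="unicode")


print(add_exclusive_app("<cc_config><options/></cc_config>", "zoom"))
```

As with any cc_config.xml change, Options, Read config files (or a client restart) applies it.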
Grant
Darwin NT

Bill F
Joined: 29 Jan 08
Posts: 48
Credit: 1,612,566
RAC: 1,117
Message 100691 - Posted: 4 Mar 2021, 5:04:44 UTC - in response to Message 100199.  

People deserve a proper break and I'm not inclined to demand they're dragged in to suit, what are essentially, hobbyists.
No need to go in, just do a remote login & restart. If it fixes it, good. If not, then it can wait till they do go in.

If it wasn't for people in these forums, no-one would've thought of this...

But people do and start coming out with stuff about "community" and "Rosetta is us", which I will always find head-shakingly bizarre #Cranks
Which is probably why Rosetta is such a small project - that very lack of community.

Right, it's the "community" that motivates people to join and contribute to projects and hang around.
Not even sure what "community" means in this context. Traffic in forums by a few dozen people?
The kind of community that, when tasks stop coming down in the middle of a Xmas holiday, immediately sees comments about the loss of good will? Some "community".
Is this some kind of joke I'm not getting? Even though I'm laughing anyway?
Astonishing self-obsession and lack of proportionality.


For a small project it has a good base: not counting users who do not allow data export, 32,000+ users on 88,000+ systems contribute daily.

Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.

Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 100693 - Posted: 4 Mar 2021, 15:42:20 UTC - in response to Message 100690.  

I have nothing running on it when not using Zoom, and I halt BOINC when making a call (about once per week).
In the BOINC Manager, Options, Exclusive applications. If you put the zoom executable name there, BOINC will automatically suspend processing when Zoom is running, and restart when you're done.

I'd suggest to not suspend it at all and see if some issues occur. So far (and I've been crunching since 2003, usually on pretty old hardware), except for GPU applications, I have never seen any reason to suspend BOINC: if a non-BOINC application needs 100% of the CPU, it will get it.

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100694 - Posted: 5 Mar 2021, 6:33:25 UTC - in response to Message 100693.  

I'd suggest to not suspend it at all and see if some issues occur....if a non-BOINC application needs 100% of the CPU, it will get it.


I might try the exclusion thing that Grant suggested. The reason that I turned off BOINC was that on this laptop, which is optimized for weight and not performance, the latency of the handoff from BOINC to Zoom was a problem, not that it made the handoff at all.

wolfman1360
Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 100701 - Posted: 7 Mar 2021, 19:11:56 UTC - in response to Message 100694.  

I'd suggest to not suspend it at all and see if some issues occur....if a non-BOINC application needs 100% of the CPU, it will get it.


I might try the exclusion thing that Grant suggested. The reason that I turned off BOINC was that on this laptop, which is optimized for weight and not performance, the latency of the handoff from BOINC to Zoom was a problem, not that it made the handoff at all.

This is what I use, especially for meetings involving video and / or screen share. Just make sure you have enough memory to keep tasks suspended indefinitely without thrashing the swap. Not that Zoom uses a lot.

Michael E.@ team Carl Sagan
Joined: 5 Apr 08
Posts: 16
Credit: 1,947,553
RAC: 210
Message 100702 - Posted: 7 Mar 2021, 19:24:34 UTC - in response to Message 100701.  

The problem I reported earlier to Grant with tasks not running has been solved. It may apply to the memory issues reported recently.

That is, tasks you expected to run were not running. I did not see anything in the event log that provided a clue (but I may need to enable certain messages).

My PC has a small SSD, so I was careful about how much disk space gets used. The same applies to memory use; I check the Windows Task Manager to see how much memory processes are using. On my son's 8 GB RAM PC, I cannot run two Rosetta tasks at the same time.

If you see a task that should be running but is not, in Preferences (Options > Computing Preferences in the Advanced View), tap/click the Disk and Memory tab and check the settings there.

Mike

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 100704 - Posted: 8 Mar 2021, 15:32:39 UTC - in response to Message 100691.  

People deserve a proper break and I'm not inclined to demand they're dragged in to suit, what are essentially, hobbyists.
No need to go in, just do a remote login & restart. If it fixes it, good. If not, then it can wait till they do go in.

If it wasn't for people in these forums, no-one would've thought of this...

But people do and start coming out with stuff about "community" and "Rosetta is us", which I will always find head-shakingly bizarre #Cranks
Which is probably why Rosetta is such a small project - that very lack of community.

Right, it's the "community" that motivates people to join and contribute to projects and hang around.
Not even sure what "community" means in this context. Traffic in forums by a few dozen people?
The kind of community that, when tasks stop coming down in the middle of a Xmas holiday, immediately sees comments about the loss of good will? Some "community".
Is this some kind of joke I'm not getting? Even though I'm laughing anyway?
Astonishing self-obsession and lack of proportionality.


For a small project it has a good base: not counting users who do not allow data export, 32,000+ users on 88,000+ systems contribute daily.

Bill F

It's not that they don't allow it; it's that they didn't notice the communist EU ruling and have to tick a box. The EU thinking they can control the entire world.

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100776 - Posted: 20 Mar 2021, 7:34:09 UTC

Me again. I added a new host today. For all of the other hosts in my account, when they download new work and it is queued and waiting to run, the "Time Remaining" value is 8:00:00 hours.

However, I see that the queued work for my new host is 9:00:04 hours [sic].

I have dug through the message boards and see references to making changes to this, like it is user-defined, but nothing about how to actually change it.

My questions: why the difference? Should I change it to 8:00:00 to be consistent with the other hosts, and if so, how to do that?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100777 - Posted: 20 Mar 2021, 9:05:43 UTC - in response to Message 100776.  
Last modified: 20 Mar 2021, 9:08:41 UTC

My questions: why the difference? Should I change it to 8:00:00 to be consistent with the other hosts, and if so, how to do that?
Don't worry about it. It should be 8 hours, and after it's processed several Tasks it will start heading towards 8 hours & will get there eventually as work is returned.


Edit: I'd run the BOINC Manager benchmarks on the i7; it's showing the default values, which are way less than what that system is capable of.
Grant
Darwin NT

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100778 - Posted: 20 Mar 2021, 9:55:14 UTC - in response to Message 100776.  

The initial run time estimate is always off for new hosts. As Grant said: don’t worry about it; give it a couple of days for client and server to agree on how long tasks will run for. Despite the inaccurate estimate, you should find that they will finish after 8 hours (of CPU time, not wall time) regardless. (Indeed they did.)
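Why the estimate drifts rather than jumping straight to 8 hours: each returned task nudges the estimate toward what was actually observed. BOINC's real estimation (split between client and server) is more involved than this; the sketch below only assumes a simple exponential moving average with a hypothetical weight of 0.3, starting from the 9:00:04 shown for the new host.

```python
def update_estimate(estimate_h: float, observed_h: float, weight: float = 0.3) -> float:
    """Blend one observed run time into the running estimate (hypothetical EMA)."""
    return (1 - weight) * estimate_h + weight * observed_h


estimate = 9 + 4 / 3600  # 9:00:04, the initial estimate on the new host
history = []
for _ in range(10):
    estimate = update_estimate(estimate, observed_h=8.0)  # each task ran ~8 h
    history.append(round(estimate, 3))
print(history)
```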

You can change the target CPU run time in your project preferences. Reasons to reduce it include having a machine with severely limited availability (though if you can’t dedicate 8 hours of CPU time inside 72 hours of wall time, Rosetta@home might not be best suited for you anyway); reasons to increase it might be to reduce the total amount of network traffic between client and server. The credits per hour will be (more or less) the same whatever you choose.
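The availability point is simple arithmetic: a task's target CPU time has to fit inside the deadline window. A sketch using the 72-hour window mentioned above, assuming one task per core and ignoring the time other queued tasks also need; it also shows why longer targets mean fewer task downloads per core per day (the function name is illustrative):

```python
def required_share(target_cpu_h: float, deadline_h: float = 72.0) -> float:
    """Fraction of wall time a core must give one task to meet the deadline."""
    return target_cpu_h / deadline_h


for target_h in (4, 8, 12):
    share = required_share(target_h)
    tasks_per_day = 24 / target_h  # per fully busy core
    print(f"{target_h:2d} h target: needs {share:.0%} of the window, "
          f"~{tasks_per_day:.1f} tasks/core/day")
```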

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100783 - Posted: 21 Mar 2021, 0:37:27 UTC - in response to Message 100777.  

Edit: I'd run the BOINC Manager benchmarks on the i7; it's showing the default values, which are way less than what that system is capable of.


Tools/Run CPU Benchmarks completed. Do I need to do anything after that? I'm not a smart man, something that the following anecdote will no doubt confirm.

The system in question is operating off of an external SSD in an enclosure (don't ask). That SSD used to be in another, different homegrown desktop box, and was the only install of Linux that I ever did where I put the / and /home in different partitions.

Little did I know that, by taking that fateful step four years ago, I lit a fuse that reached the powder only yesterday. I found that BOINC had shut down because of a lack of available disk space. This happened because, first, BOINC apparently writes to the / partition and not the /home partition (which seems a bit strange to me, but whatever); and second, something about BOINC activity on that box is causing a firestorm of log messages in the /var/log/journal folder, eating up yet more disk space. So that host was down for a time before I discovered this.
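For anyone chasing the same problem: on Debian-based distributions such as Mint, the packaged BOINC client normally keeps its data under /var/lib/boinc-client (worth confirming on your install), which is on the / partition, as is the systemd journal under /var/log/journal. A minimal sketch, using only the Python standard library, to see how much room a partition has and how big a directory has grown; the helper names are illustrative:

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def dir_size_bytes(path: str) -> int:
    """Total size of regular files under `path`; unreadable entries are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # listing errors are ignored
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:  # file vanished (e.g. log rotation) or can't be stat'ed
                continue
    return total
```

For example, compare free_gib("/") with free_gib("/home"), and check dir_size_bytes("/var/log/journal") (a complete count may need root). If the journal turns out to be the culprit, systemd's journalctl --vacuum-size option can trim it.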

I don't know if any of the foregoing is affecting whatever you are seeing about that box, but thought that I would mention it.

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100784 - Posted: 21 Mar 2021, 0:42:29 UTC - in response to Message 100778.  

wall time


I just wanted you to know that this has now become a treasured part of my vocabulary.

ME: blah blah blah in wall time.

COLLEAGUE: What's wall time?

ME: Oh (chuckles genially) just a term from this protein folding project that I'm involved with online.

COLLEAGUE: Do you get paid for that?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100785 - Posted: 21 Mar 2021, 1:01:56 UTC - in response to Message 100783.  

Tools/Run CPU Benchmarks completed. Do I need to do anything after that?
Nothing else to do; the values have been captured. We see them on the details page for that computer. You should see your credits per task on that machine increase significantly now that BOINC has measured the performance.

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100786 - Posted: 21 Mar 2021, 10:43:15 UTC - in response to Message 100785.  

You should see your credits per task on that machine increase significantly now that BOINC has measured the performance.


Thank thee (that's Quaker).

Can you provide me with an Idiot's Guide to All Things BOINC explanation of why the size of the lift in my pickup truck affects the credit I get for completing a task?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100788 - Posted: 21 Mar 2021, 12:46:04 UTC - in response to Message 100786.  

The bigger the lift, the more you can lift in one go…

Rosetta@home tasks are fixed duration, not fixed work. The faster the machine, the more work each task accomplishes in that time, and thus the more credit awarded.
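A toy illustration of "fixed duration, not fixed work": give two hosts the same 8-hour target and count how many whole models (decoys) each finishes. The per-model times are made up, and the sketch assumes credit tracks the number of models completed, which is a simplification of how the project actually grants credit.

```python
def models_in_window(target_s: float, seconds_per_model: float) -> int:
    """Whole models a host finishes within a fixed run time (simplified)."""
    return int(target_s // seconds_per_model)


TARGET_S = 8 * 3600  # 8-hour target CPU run time
for host, s_per_model in (("slower host", 1800.0), ("faster host", 900.0)):
    print(f"{host}: {models_in_window(TARGET_S, s_per_model)} models in 8 h")
```

Twice the speed gives twice the models (16 versus 32 here) in the same eight hours.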

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100793 - Posted: 21 Mar 2021, 23:34:15 UTC - in response to Message 100788.  
Last modified: 21 Mar 2021, 23:34:43 UTC

fixed duration


Doesn't have quite the same pizzazz as "wall time," but it gets the point across.

However, I question the point. A scan down my completed tasks list shows a variety of different times (CPU and wall), so it doesn't look like it is fixed. Also, the progress indicators seem to show % completed, which is independent of the time.

If it's fixed duration, wouldn't every task run for 8 hours or whatever and then just end? What am I missing?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100794 - Posted: 22 Mar 2021, 0:19:28 UTC - in response to Message 100793.  

Well, all right – tasks aren’t strictly fixed duration: they work to a target run time. They process work in chunks (a.k.a. ‘models’ or ‘decoys’), and only consider whether to stop or continue at the end of each chunk. Those chunks can take different lengths of time to process, which leads to some variation in the total run time of each task. Basically: at the end of each chunk the task decides whether it thinks it has time to complete another one without going over the target run time. If not, it will finish early. If so, it will start another chunk – and if that chunk happens to take longer than average, the task will overrun.
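A sketch of the stop-or-continue rule described above, assuming the task compares elapsed time plus the average model time so far against the target; Rosetta's real heuristic may differ in detail, and the model times below are invented. It reproduces both outcomes: finishing early, and overrunning when a late model is slow.

```python
def run_task(model_times_s: list[float], target_s: float) -> tuple[int, float]:
    """Simulate a target-run-time task; return (models completed, elapsed seconds).

    After each model, start another only if elapsed time plus the average
    model time so far still fits within the target.
    """
    elapsed = 0.0
    done = 0
    for t in model_times_s:
        elapsed += t  # a model always runs to completion
        done += 1
        average = elapsed / done
        if elapsed + average > target_s:
            break  # next model probably won't fit: stop here
    return done, elapsed


TARGET_S = 8 * 3600
early = run_task([3000.0] * 20, TARGET_S)  # steady 50-minute models
late = run_task([2000.0] * 13 + [4000.0] + [2000.0] * 5, TARGET_S)  # one slow model
print(early, late)  # (9, 27000.0) (14, 30000.0)
```

The first task stops 30 minutes short of 8 hours; the second overruns by 20 minutes because its 14th model took twice as long as the others.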

With or without CPU benchmarks, there’s a substantial amount of variation in number of models completed (and thus credit granted), even for tasks of the same type within the same batch.

The task percentage-complete calculations can be pretty inaccurate, due to the unpredictability of model duration.

Your tasks have almost all finished within a few minutes of the 8-⁠hour target. The one outlier is the Robetta task which finished 20 minutes early. That’s common; they seem to have quite different characteristics from the ‘normal’ tasks.



©2024 University of Washington
https://www.bakerlab.org