Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100783 - Posted: 21 Mar 2021, 0:37:27 UTC - in response to Message 100777.  

Edit: I'd run the BOINC Manager benchmarks on the i7; it's showing the default values, which are way less than what that system is capable of.


Tools/Run CPU Benchmarks completed. Do I need to do anything after that? I'm not a smart man, something that the following anecdote will no doubt confirm.

The system in question is operating off of an external SSD in an enclosure (don't ask). That SSD used to be in another, different homegrown desktop box, and was the only install of Linux that I ever did where I put the / and /home in different partitions.

Little did I know that, by taking that fateful step four years ago, I lit a fuse that reached the powder only yesterday. I found that BOINC shut down because of a lack of available disk space. This happened because, first, BOINC apparently writes to things in the / partition and not the /home partition (which seems a bit strange to me, but whatever); and second, something about BOINC activity on that box is causing a firestorm of entries in the /var/log/journal folder, eating up yet more disk space. So that host was down for a time before I discovered this.

I don't know if any of the foregoing is affecting whatever you are seeing about that box, but thought that I would mention it.
ID: 100783 · Rating: 0
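
Illustrative sketch (not part of the original post): one way to check the two things described in Message 100783, namely how full the / partition is and how much space the journal folder takes. It assumes the journal lives in /var/log/journal, as mentioned in the post. The function names are made up for this example, and the location of the BOINC data directory is not stated in the thread, so it is not checked here.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of regular files under path; unreadable entries are skipped."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                continue  # file vanished or is not readable
    return total


def disk_report(partition="/", log_dir="/var/log/journal"):
    """Describe free space on a partition and the space taken by a log folder."""
    usage = shutil.disk_usage(partition)
    log_bytes = directory_size_bytes(log_dir)
    gib = 1024 ** 3
    return (
        f"{partition}: {usage.free / gib:.1f} GiB free of {usage.total / gib:.1f} GiB; "
        f"{log_dir}: {log_bytes / gib:.2f} GiB"
    )


if __name__ == "__main__":
    print(disk_report())
```

Without elevated privileges some journal files may not be readable, in which case the folder size will be undercounted. On systemd machines, `journalctl --disk-usage` reports the journal size directly.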
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100784 - Posted: 21 Mar 2021, 0:42:29 UTC - in response to Message 100778.  

wall time


I just wanted you to know that this has now become a treasured part of my vocabulary.

ME: blah blah blah in wall time.

COLLEAGUE: What's wall time?

ME: Oh (chuckles genially) just a term from this protein folding project that I'm involved with online.

COLLEAGUE: Do you get paid for that?
ID: 100784 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 765
Message 100785 - Posted: 21 Mar 2021, 1:01:56 UTC - in response to Message 100783.  

Tools/Run CPU Benchmarks completed. Do I need to do anything after that?
Nothing else to do; the values have been captured. We see them on the details page for that computer. You should see your credits per task on that machine increase significantly now that BOINC has measured the performance.
ID: 100785 · Rating: 0
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100786 - Posted: 21 Mar 2021, 10:43:15 UTC - in response to Message 100785.  

You should see your credits per task on that machine increase significantly now that BOINC has measured the performance.


Thank thee (that's Quaker).

Can you provide me with an Idiot's Guide to All Things BOINC explanation about why the size of the lift in my pickup truck affects the credit I get for completing a task?
ID: 100786 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 765
Message 100788 - Posted: 21 Mar 2021, 12:46:04 UTC - in response to Message 100786.  

The bigger the lift, the more you can lift in one go…

Rosetta@home tasks are fixed duration, not fixed work. The faster the machine, the more work each task accomplishes in that time, and thus the more credit awarded.
ID: 100788 · Rating: 0
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100793 - Posted: 21 Mar 2021, 23:34:15 UTC - in response to Message 100788.  
Last modified: 21 Mar 2021, 23:34:43 UTC

fixed duration


Doesn't have quite the same pizzazz as "wall time," but it gets the point across.

However, I question the point. A scan down my completed tasks list shows a variety of different times (CPU and wall), so it doesn't look like it is fixed. Also, the progress indicators seem to show % completed, which is independent of the time.

If it's fixed duration, wouldn't every task run for 8 hours or whatever and then just end? What am I missing?
ID: 100793 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 765
Message 100794 - Posted: 22 Mar 2021, 0:19:28 UTC - in response to Message 100793.  

Well, all right – tasks aren’t strictly fixed duration: they work to a target run time. They process work in chunks (a.k.a. ‘models’ or ‘decoys’), and only consider whether to stop or continue at the end of each chunk. Those chunks can take different lengths of time to process, which leads to some variation in the total run time of each task. Basically: at the end of each chunk the task decides whether it thinks it has time to complete another one without going over the target run time. If not, it will finish early. If so, it will start another chunk – and if that chunk happens to take longer than average, the task will overrun.

With or without CPU benchmarks, there’s a substantial amount of variation in number of models completed (and thus credit granted), even for tasks of the same type within the same batch.

The task percentage-complete calculations can be pretty inaccurate, due to the unpredictability of model duration.

Your tasks have almost all finished within a few minutes of the 8-⁠hour target. The one outlier is the Robetta task which finished 20 minutes early. That’s common; they seem to have quite different characteristics from the ‘normal’ tasks.
ID: 100794 · Rating: 0
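
Illustrative sketch (not part of the original post): the stop-or-continue rule described in Message 100794, written as a small simulation. How the real application estimates whether another model will fit is not stated in the thread; this sketch assumes it uses the average duration of the models completed so far. The names `run_task` and `next_duration` are hypothetical.

```python
def run_task(target_seconds, next_duration):
    """Run models until another one is not expected to fit in the target time.

    next_duration is a hypothetical callable returning the seconds the next
    model takes; a real task cannot know this in advance.
    """
    elapsed = 0.0
    models = 0
    while True:
        # Assumed estimate: the next model costs the average of those so far.
        expected = elapsed / models if models else 0.0
        if elapsed + expected > target_seconds:
            break  # finish early rather than plan to overrun
        elapsed += next_duration()
        models += 1
    return models, elapsed


if __name__ == "__main__":
    target = 8 * 3600  # the 8-hour target run time mentioned in the thread
    print(run_task(target, lambda: 1000.0))  # slower machine
    print(run_task(target, lambda: 500.0))   # faster machine
```

Under these assumptions, a machine that needs a steady 1,000 s per model completes 28 models and stops 800 s short of the 28,800 s target, while one that needs 500 s per model completes 57 models in nearly the same wall time. A task can also overrun: with a 2,500 s target and models taking 1,000 s and then 2,000 s, the second model is started because it is expected to fit, and the task ends at 3,000 s.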
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100795 - Posted: 22 Mar 2021, 2:05:24 UTC - in response to Message 100794.  

Basically: at the end of each chunk the task decides whether it thinks it has time to complete another one without going over the target run time.


This is really interesting. So what happens in the event that a task ends early and has a chunk left over? Does it get added to a different task? It seems like, at some time or another, you would have a task which is mainly composed of "orphan" chunks.
ID: 100795 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 308
Credit: 9,148,058
RAC: 1,394
Message 100798 - Posted: 22 Mar 2021, 9:56:40 UTC - in response to Message 100795.  

Basically: at the end of each chunk the task decides whether it thinks it has time to complete another one without going over the target run time.


This is really interesting. So what happens in the event that a task ends early and has a chunk left over? Does it get added to a different task? It seems like, at some time or another, you would have a task which is mainly composed of "orphan" chunks.


My understanding is that Work units consist of a protein chain in an initial state. Each task within the work unit then takes a random seed value which determines where and how to start folding that protein in the search for the lowest energy configuration so there are effectively a near infinite number of tasks that can be performed.

When the work unit is returned any promising configurations found can be used as the starting point for another work unit or a particularly good configuration can be accepted as a working model for the protein.

I would be interested in any corrections to this understanding from those that know.
ID: 100798 · Rating: 0
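
Illustrative sketch (not part of the original post): a toy version of the idea in Message 100798 that each model starts from a different random seed and searches for a low-energy configuration. This is not Rosetta's algorithm or energy function; `toy_energy` is a made-up one-dimensional landscape, and the search simply keeps downhill moves. All names are hypothetical.

```python
import math
import random


def toy_energy(x):
    """A made-up bumpy 1-D landscape standing in for a real energy function."""
    return 0.1 * x * x + math.sin(3.0 * x)


def run_model(seed, steps=2000, step_size=0.2):
    """One 'model': start at a seed-dependent point and keep only downhill moves."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    best = toy_energy(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        energy = toy_energy(candidate)
        if energy < best:
            x, best = candidate, energy
    return x, best


def best_of(seeds):
    """Run one model per seed and return the lowest-energy (x, energy) pair."""
    return min((run_model(seed) for seed in seeds), key=lambda result: result[1])


if __name__ == "__main__":
    print(best_of(range(20)))
```

Because each seed lands in a different valley of the landscape, individual models get stuck in different local minima, and running many seeds is what gives a chance of finding a deep one. The number of possible seeds is effectively unlimited, which matches the point that a task can always start another model.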
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 765
Message 100799 - Posted: 22 Mar 2021, 10:41:25 UTC - in response to Message 100795.  

It doesn’t matter. The aim is not to examine every possibility: given the unfathomably large search space, it cannot be. With hundreds of thousands of work units in each batch, it is statistically insignificant whether any individual task completes N or N⁠+⁠1 models. The probability that the ‘orphan’ is the one that will cure all the world’s ills is negligible.

What I imagine does happen is that if any regions of the search space look ‘interesting’ they will be studied more closely in a subsequent batch of work. That is one of the reasons the task deadlines are so short: the results of one batch are analysed rapidly to guide the choice of parameters for the next.
ID: 100799 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1174
Credit: 12,979,536
RAC: 6,328
Message 100802 - Posted: 22 Mar 2021, 16:13:50 UTC - in response to Message 100798.  

[snip]

My understanding is that Work units consist of a protein chain in an initial state. Each task within the work unit then takes a random seed value which determines where and how to start folding that protein in the search for the lowest energy configuration so there are effectively a near infinite number of tasks that can be performed.

When the work unit is returned any promising configurations found can be used as the starting point for another work unit or a particularly good configuration can be accepted as a working model for the protein.

I would be interested in any corrections to this understanding from those that know.

Some of them are like that. Some are, instead, one step each from a list of starting points.

Some are two proteins, to see if these two will bind together.

I'm sure that there are more varieties I don't yet know about.
ID: 100802 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 308
Credit: 9,148,058
RAC: 1,394
Message 100803 - Posted: 22 Mar 2021, 16:54:49 UTC - in response to Message 100802.  

[snip]

My understanding is that Work units consist of a protein chain in an initial state. Each task within the work unit then takes a random seed value which determines where and how to start folding that protein in the search for the lowest energy configuration so there are effectively a near infinite number of tasks that can be performed.

When the work unit is returned any promising configurations found can be used as the starting point for another work unit or a particularly good configuration can be accepted as a working model for the protein.

I would be interested in any corrections to this understanding from those that know.

Some of them are like that. Some are, instead, one step each from a list of starting points.

Some are two proteins, to see if these two will bind together.

I'm sure that there are more varieties I don't yet know about.


Thanks, I’m learning slowly :-)
ID: 100803 · Rating: 0
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100804 - Posted: 23 Mar 2021, 2:08:32 UTC - in response to Message 100799.  

given the unfathomably large search space


Okay, scratch the infinite monkey approach then.

Where does the excess potential energy go?
ID: 100804 · Rating: 0
FrankMeade

Joined: 8 Mar 20
Posts: 1
Credit: 11,549,660
RAC: 0
Message 100823 - Posted: 24 Mar 2021, 15:52:59 UTC

I am getting a little sick of the "waiting for memory" practice of suspending computation of a task when there is no shortage of available memory
ID: 100823 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 308
Credit: 9,148,058
RAC: 1,394
Message 100824 - Posted: 24 Mar 2021, 16:19:36 UTC - in response to Message 100823.  

I am getting a little sick of the "waiting for memory" practice of suspending computation of a task when there is no shortage of available memory


Is it restricted to certain computer(s) and / or certain project(s)?
ID: 100824 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 765
Message 100825 - Posted: 24 Mar 2021, 16:43:29 UTC - in response to Message 100823.  

Is this happening when BOINC is trying to switch between projects to satisfy resource share settings? Rosetta tasks allocate hundreds of megabytes each; if they’re being kept in memory when something else is trying to run, there may well be a shortage. Try deselecting Computing preferences » Leave non-GPU tasks in memory while suspended.

Aside: Running the benchmarks on this 3900X will get it earning the right amount of credit. (There may be other machines that need it too; I didn’t go through them all…)
ID: 100825 · Rating: 0
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,454,871
RAC: 258
Message 100827 - Posted: 25 Mar 2021, 1:51:46 UTC - in response to Message 100825.  

Is this happening when BOINC is trying to switch between projects to satisfy resource share settings?


I see it, on multiple machines, and I'm only running Rosetta.

It's peculiar. There's clearly enough free main memory to run a project, not to mention the swap. Diddling with the values in preferences doesn't seem to wake it up, either. It just goes away on its own.

I've written it off to a cost of doing business, but if there is a fix for it I wouldn't mind at all.
ID: 100827 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 823
Credit: 50,847,643
RAC: 3,002
Message 100828 - Posted: 25 Mar 2021, 2:00:07 UTC - in response to Message 100827.  

I see it, on multiple machines, and I'm only running Rosetta.

It's peculiar. There's clearly enough free main memory to run a project, not to mention the swap.
Several of your machines have only 8 GB memory. This isn't enough for 12 cores.
You need at least 1 GB/core to run Rosetta (I usually have 32 GB on my 12-core machines, and more on the larger ones).
Just looking at the "free" memory doesn't do it. Rosetta (and all other BOINC projects too) need to reserve a certain amount to run.
They won't leave home without it.
ID: 100828 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1325
Credit: 13,624,379
RAC: 211
Message 100831 - Posted: 25 Mar 2021, 8:09:00 UTC - in response to Message 100828.  
Last modified: 25 Mar 2021, 8:13:20 UTC

I see it, on multiple machines, and I'm only running Rosetta.

It's peculiar. There's clearly enough free main memory to run a project, not to mention the swap.
Several of your machines have only 8 GB memory. This isn't enough for 12 cores.
You need at least 1 GB/core to run Rosetta (I usually have 32 GB on my 12-core machines, and more on the larger ones).
And 32GB for 24 cores isn't enough either if you want to use all of them to do Rosetta work.
I generally allow 1.3 GB of RAM per task: you need to leave enough for the operating system and any other programmes that might be running as well. With huge core/thread-count systems (32+), you'd probably be OK with 800 MB or even less per core, as you don't often get many high-RAM-requirement tasks (2 GB+) at any given time.



I've written it off to a cost of doing business, but if there is a fix for it I wouldn't mind at all.
Just adding more RAM won't necessarily fix it; you need to allow BOINC to actually use what is there.
In your Account settings, under Computing preferences, Memory:
    When computer is in use, use at most 95 %
    When computer is not in use, use at most 95 %
    Leave non-GPU tasks in memory while suspended
Make sure "Leave non-GPU tasks in memory while suspended" is not selected.
Works for me.
Grant
Darwin NT
ID: 100831 · Rating: 0
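
Illustrative sketch (not part of the original posts): the memory rules of thumb from Messages 100828 and 100831 as arithmetic. The per-task figures (1 GB and 1.3 GB) are the posters' own estimates, not project requirements, and the function names are hypothetical.

```python
import math


def ram_needed_gb(tasks, per_task_gb):
    """RAM budget for running the given number of tasks at once."""
    return tasks * per_task_gb


def max_tasks(ram_gb, per_task_gb):
    """How many tasks fit in the given RAM under a per-task allowance."""
    return math.floor(ram_gb / per_task_gb)


if __name__ == "__main__":
    for per_task in (1.0, 1.3):
        print(f"8 GB at {per_task} GB/task: {max_tasks(8, per_task)} tasks")
    print(f"12 tasks at 1.3 GB/task: {ram_needed_gb(12, 1.3):.1f} GB")
```

By either rule of thumb, an 8 GB machine cannot keep 12 cores busy: it fits 8 tasks at 1 GB each or 6 at 1.3 GB each, while 12 tasks at 1.3 GB each would call for about 15.6 GB.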
Sid Celery

Joined: 11 Feb 08
Posts: 1815
Credit: 33,424,497
RAC: 8,352
Message 100836 - Posted: 25 Mar 2021, 18:39:20 UTC

FWIW

Scheduler down
Project down for maintenance

Just got a 1hr delay after an update attempt
ID: 100836 · Rating: 0




©2022 University of Washington
https://www.bakerlab.org