Rosetta needs 6675.72 MB RAM: is the restriction really needed?

Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101730 - Posted: 5 May 2021, 8:11:42 UTC - in response to Message 101728.  

05-May-2021 07:48:46 [Rosetta@home] Scheduler request completed: got 0 new tasks
05-May-2021 07:48:46 [Rosetta@home] No tasks sent
05-May-2021 07:48:46 [Rosetta@home] Rosetta needs 6675.72 MB RAM but only 6117.47 MB is available for use.
This is on computers with 8 GB of RAM + 8 GB of swap space.
And once such tasks are finally downloaded, they usually use less than 1 GB of RAM each, and the computers run 4-8 R@H tasks simultaneously without any problems.
But for the last month they usually cannot get any, because the server thinks there is not enough RAM for even one task and refuses to send any work.

Pure stupidity.

I know...
On my 8Gb laptop, I've amended the amount of RAM allocated within BOINC (Options / Computing preferences, Disk & Memory tab) as follows, and work comes down ok.

When Computer is in use, use at most 75%
When Computer is not in use, use at most 99%

I don't believe it's the amount of RAM you have, but how much of it is allocated to BOINC in that setting. An 8Gb machine should work with all the tasks currently being issued.
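
To put rough numbers on that, here is a minimal sketch. It assumes the server simply compares a task's declared RAM requirement with total RAM multiplied by the "use at most" percentage; the function names and the 8192 MB total are illustrative assumptions, not BOINC code, and the thread does not establish which of the two percentages the server actually takes.

```python
def usable_ram_mb(total_ram_mb: float, max_percent: float) -> float:
    """RAM BOINC is allowed to use under a 'use at most N%' preference."""
    return total_ram_mb * max_percent / 100.0


def can_receive(task_needs_mb: float, total_ram_mb: float, max_percent: float) -> bool:
    """True if the declared requirement fits inside the allowed RAM."""
    return task_needs_mb <= usable_ram_mb(total_ram_mb, max_percent)


TOTAL_MB = 8192.0        # assumption: a nominal 8 GB machine
TASK_NEEDS_MB = 6675.72  # figure taken from the scheduler message quoted above

for pct in (50, 75, 99):
    allowed = usable_ram_mb(TOTAL_MB, pct)
    print(f"{pct}%: {allowed:.2f} MB allowed, task fits: "
          f"{can_receive(TASK_NEEDS_MB, TOTAL_MB, pct)}")
```

Under these assumptions the 6675.72 MB requirement does not fit at 75% of 8192 MB but does fit at 99%, which is consistent with raising the preference being enough to get work.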
ID: 101730
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101778 - Posted: 10 May 2021, 2:07:05 UTC - in response to Message 101675.  

some new messages from the server till the 30th of April ...

Before :

29-Apr-2021 09:16:56 [Rosetta@home] Scheduler request completed: got 0 new tasks
29-Apr-2021 09:16:56 [Rosetta@home] No tasks sent
29-Apr-2021 09:16:56 [Rosetta@home] Rosetta needs 6675.72 MB RAM but only 2012.49 MB is available for use.
29-Apr-2021 09:16:56 [Rosetta@home] Rosetta needs 3814.70 MB RAM but only 2012.49 MB is available for use.


Now :

30-Apr-2021 09:07:57 [Rosetta@home] Requesting new tasks for CPU
30-Apr-2021 09:08:00 [Rosetta@home] Scheduler request completed: got 0 new tasks
30-Apr-2021 09:08:00 [Rosetta@home] No tasks sent
30-Apr-2021 09:08:00 [Rosetta@home] Rosetta needs 6675.72 MB RAM but only 2012.49 MB is available for use.
30-Apr-2021 09:08:00 [Rosetta@home] Rosetta needs 3814.70 MB RAM but only 2012.49 MB is available for use.
30-Apr-2021 09:08:00 [Rosetta@home] Rosetta needs 3337.86 MB RAM but only 2012.49 MB is available for use.


no worries because > 01-May-2021 15:07:28 [Rosetta@home] Not requesting tasks: don't need (job cache full)

Thanks for this information.
I expected at least a 5% reduction in RAM demands and hoped for 10% and it looks like they've delivered a 12.5% reduction.
In terms of what I suggested in order to make 4Gb RAM machines viable again at Rosetta, they've delivered.
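
The 12.5% figure can be checked against the scheduler messages quoted above, where the lowest requirement went from 3814.70 MB to 3337.86 MB. A quick sketch (the function name is my own):

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Percentage by which a requirement dropped."""
    return (before_mb - after_mb) / before_mb * 100.0


# Lowest "Rosetta needs ..." value on 29 April vs. 30 April
print(round(percent_reduction(3814.70, 3337.86), 1))  # 12.5
```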

Separately, I've proposed a way of going back to the pre-April RAM & Disk req'ts on <some> tasks so that 2Gb machines can participate again, but because it requires the researchers to do something they haven't had to do before I haven't had any feedback on that yet. Several people here have quite rightly reported that the maximum resources that tasks call on while they're actually running never reaches 1Gb, let alone 2Gb so it ought to be quite possible.

To be honest I'm tiptoeing my way toward something they don't want to do, but they're throwing away so much resource I'm still hoping to convince them of the plain rationality of it.

I haven't been in any contact with anyone again, but it seems various changes promised have come through in the way they see fit.
From examining my current client_state.xml file for the tasks I have, they show the following RAM & Disk req'ts:

rb_05_09_74941_72921_ab_t000__robetta_cstwt_5.0_FT
RAM 1908Mb Disk 3815Mb

miniprotein_relax11
RAM 3338Mb Disk 3815Mb

jgSP_01
RAM 3338Mb Disk 3815Mb

rb_05_09_74951_72937_ab_t000__h001_robetta
RAM 3338Mb Disk 3815Mb

pre_helical_bundles
RAM 6676Mb Disk 8583Mb

sap_h15_l3_h12_l1_h9_l2
RAM 8583Mb Disk 1908Mb

rb_05_09_74865_72931__t000__ab_robetta
RAM 8583Mb Disk 3815Mb

I've ranked the task-types in order of RAM demand, then disk demand.
At the bottom end, some tasks are asking for slightly less than 2Gb - maybe not sufficiently low for some 2Gb hosts to run on, depending how they're set up, but certainly small enough for 4Gb hosts.
And at the top end, some even requiring more than seen before - up from 6.676Gb to 8.583Gb, though with small disk demands.
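
For anyone wanting to pull the same figures from their own client_state.xml, here is a minimal sketch. It assumes each workunit entry carries `rsc_memory_bound` and `rsc_disk_bound` values in bytes; the sample XML, task names and byte values below are made up for illustration, and a real file is far larger and may need more careful handling.

```python
import xml.etree.ElementTree as ET

MIB = 1024 * 1024

# Hypothetical extract; names and values are illustrative only.
SAMPLE = """<client_state>
  <workunit>
    <name>example_large_task</name>
    <rsc_memory_bound>9000000000.000000</rsc_memory_bound>
    <rsc_disk_bound>2000000000.000000</rsc_disk_bound>
  </workunit>
  <workunit>
    <name>example_small_task</name>
    <rsc_memory_bound>2000000000.000000</rsc_memory_bound>
    <rsc_disk_bound>4000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""


def workunit_bounds(xml_text: str):
    """Return (name, RAM MB, disk MB) per workunit, sorted by RAM then disk."""
    root = ET.fromstring(xml_text)
    rows = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="?")
        ram_mb = float(wu.findtext("rsc_memory_bound", default="0")) / MIB
        disk_mb = float(wu.findtext("rsc_disk_bound", default="0")) / MIB
        rows.append((name, round(ram_mb), round(disk_mb)))
    return sorted(rows, key=lambda row: (row[1], row[2]))


for name, ram_mb, disk_mb in workunit_bounds(SAMPLE):
    print(f"{name}: RAM {ram_mb}Mb Disk {disk_mb}Mb")
```

To run it on a real file, read the file's text and pass it to `workunit_bounds` in place of `SAMPLE`.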

Hopefully some people with more constrained hosts are seeing something coming through. Not quite back to how it was, but close, while more capable machines are getting tasks commensurate with their greater capacity.

And going back to the proxy I'm using for downloadability again - In Progress tasks
Pre increase in RAM & Disk req'ts - 550k IP
Extreme RAM & Disk demands - 318k IP
Current In Progress - 407k - 26% below max, 28% above min - continuing improvement

This figure went up to 431k, then dropped to 380k when we had problems last week

Now back up to 432k - 21.5% below max, 35.8% above min
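
The percentages above are simple ratios against the 550k peak and the 318k low. A small sketch for recomputing them as the figure moves (the names are my own):

```python
PEAK_K = 550  # In Progress before the increase in RAM & Disk requirements
LOW_K = 318   # In Progress at the point of extreme RAM & Disk demands


def position(current_k: float):
    """Return (% below the peak, % above the low) for an In Progress count."""
    below_peak = (PEAK_K - current_k) / PEAK_K * 100
    above_low = (current_k - LOW_K) / LOW_K * 100
    return round(below_peak, 1), round(above_low, 1)


for count in (407, 432):
    print(count, position(count))
```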
ID: 101778
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101794 - Posted: 12 May 2021, 3:52:21 UTC - in response to Message 101778.  

I haven't been in any contact with anyone again, but it seems various changes promised have come through in the way they see fit.
From examining my current client_state.xml file for the tasks I have, they show the following RAM & Disk req'ts:

rb_05_09_74941_72921_ab_t000__robetta_cstwt_5.0_FT
RAM 1908Mb Disk 3815Mb

miniprotein_relax11
RAM 3338Mb Disk 3815Mb

jgSP_01
RAM 3338Mb Disk 3815Mb

rb_05_09_74951_72937_ab_t000__h001_robetta
RAM 3338Mb Disk 3815Mb

pre_helical_bundles
RAM 6676Mb Disk 8583Mb

sap_h15_l3_h12_l1_h9_l2
RAM 8583Mb Disk 1908Mb

rb_05_09_74865_72931__t000__ab_robetta
RAM 8583Mb Disk 3815Mb

I've ranked the task-types in order of RAM demand followed by disk-demand
At the bottom end, some tasks are asking for slightly less than 2Gb - maybe not sufficiently low for some 2Gb hosts to run on, depending how they're set up, but certainly small enough for 4Gb hosts.
And at the top end, some even requiring more than seen before - up from 6.676Gb to 8.583Gb, though with small disk demands.

Hopefully some people with more constrained hosts are seeing something coming through. Not quite back to how it was, but close, while more capable machines are getting tasks commensurate with their greater capacity.

And going back to the proxy I'm using for downloadability again - In Progress tasks
Pre increase in RAM & Disk req'ts - 550k IP
Extreme RAM & Disk demands - 318k IP
Current In Progress - 407k - 26% below max, 28% above min - continuing improvement

This figure went up to 431k, then dropped to 380k when we had problems last week

Now back up to 432k - 21.5% below max, 35.8% above min

Hmm... not sure if my cache is representative, but it's pretty much all "pre_helical_bundles" tasks demanding a lot of RAM again and in progress tasks have plummeted to 382k as a result.
I haven't got a clue what's going on now tbh <sigh>
ID: 101794
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101795 - Posted: 12 May 2021, 7:19:51 UTC - in response to Message 101794.  
Last modified: 12 May 2021, 7:24:26 UTC

Hmm... not sure if my cache is representative, but it's pretty much all "pre_helical_bundles" tasks demanding a lot of RAM again and in progress tasks have plummeted to 382k as a result.
I haven't got a clue what's going on now tbh <sigh>
I'm pretty sure the "pre_helical_bundles" were the ones that first got the larger configuration values, at the time 20 million Jobs were released. It's going to take a while yet for the rest of those Tasks with the excessive values to clear out of the system.

I think we worked it out as around 2 months- but that was if they were the only ones being processed. With some other Tasks coming through & being processed, that means it will take longer still for the mis-configured ones to finally clear from the system completely.
Grant
Darwin NT
ID: 101795
MJH333
Joined: 29 Jan 21
Posts: 18
Credit: 5,748,861
RAC: 1
Message 101797 - Posted: 12 May 2021, 9:06:18 UTC - in response to Message 101795.  

Hi Grant
May I ask you a quick question about this?
My 4 core laptop (no SMT) is attempting to run 4 pre_helical_bundles tasks, but one is shown as "Waiting for memory". I have set RAM usage to 95% in Computing preferences (whether or not in use) and the system monitor (in Linux Mint) shows that I am using only 2.6GiB of 7.6GiB (34.3%) memory. So I am puzzled as to why the 4th task is not running. The system monitor also says "Cache 4.6GiB". Is that counting against the 95% limit? I tried a 99% limit but that made no difference.
Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!

Mark
ID: 101797
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101798 - Posted: 12 May 2021, 9:30:31 UTC - in response to Message 101797.  

Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!
Nope, the problem is that there is an (extremely large) batch of work that was incorrectly configured for its minimum RAM & Disk requirements.
Unless you actually have enough free RAM to meet those requirements, BOINC won't let one of those Tasks run until the RAM requirement can be met (even though the actual usage values are only a fraction of the required values). The same goes for the disk space requirements.

If you set your BOINC Manager to Advanced view & look at Tools, Event log, you should see some messages there relating to how much RAM you have, and how much RAM BOINC thinks it will need in order to run the Task when it tries to get more work, or start the paused Task.
Grant
Darwin NT
ID: 101798
MJH333
Joined: 29 Jan 21
Posts: 18
Credit: 5,748,861
RAC: 1
Message 101799 - Posted: 12 May 2021, 10:05:34 UTC - in response to Message 101798.  

Nope, the problem is that there is a (extremely large) batch of work that was incorrectly configured for it's minimum RAM & Disk requirements.
Unless you've actually got enough RAM that is free that meets those requirements, then BOINC won't let one of those Tasks run until the RAM/Disk available requirements are met (even though the actual usage values are only a fraction of the required values). Same for the disk space requirements.

If you set your BOINC Manager to Advanced view & look at Tools, Event log, you should see some messages there relating to how much RAM you have, and how much RAM BOINC thinks it will need in order to run the Task when it tries to get more work, or start the paused Task.


Thank you, that's really helpful.

Mark
ID: 101799
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101807 - Posted: 15 May 2021, 0:25:00 UTC - in response to Message 101798.  

Any thoughts you have on this would be much appreciated. I'm a newbie cruncher, so I'm probably doing something wrong!
Nope, the problem is that there is a (extremely large) batch of work that was incorrectly configured for it's minimum RAM & Disk requirements.
Unless you've actually got enough RAM that is free that meets those requirements, then BOINC won't let one of those Tasks run until the RAM/Disk available requirements are met (even though the actual usage values are only a fraction of the required values). Same for the disk space requirements.

If you set your BOINC Manager to Advanced view & look at Tools, Event log, you should see some messages there relating to how much RAM you have, and how much RAM BOINC thinks it will need in order to run the Task when it tries to get more work, or start the paused Task.

I'm not sure we can still say that these pre_helical_bundles tasks were misconfigured. Obviously a lot have been worked through, and there were a lot to start with, but the total queued to run is down at 14m now, and it may be that this is what was intended, with some other newer tasks demanding even more RAM.
When other task-types get exhausted, it doesn't seem like it's a whole 2 months until more come through - it's been a week or less (no idea how or why).

I've taken a glance at my tasks while I'm currently away and new different ones have just started coming down and they must require less RAM as In Progress tasks have quickly shot up by 60k to nearly 440k.

I've given up trying to understand it from afar. It is what it is.

What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.

When it happens to me on my laptop, I set No New Tasks, suspend all unstarted tasks, then as each running task ends, more RAM becomes available. At the point there's enough for the problem task to start, I then unsuspend one task at a time, then find all my cores can run again.

I know it's a faff, but it's the only way I've found to get around it.
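
For anyone who would rather script the first part of that than click through the Manager, here is a rough sketch that only builds the `boinccmd` command lines and prints them; it does not run anything. The project URL and the task names are assumptions for illustration: use the URL and task names your own client shows.

```python
import shlex

# Assumption: use whatever project URL your own client lists for Rosetta.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def no_new_tasks(url: str = PROJECT_URL) -> list:
    """Command line equivalent to setting 'No new tasks' for the project."""
    return ["boinccmd", "--project", url, "nomorework"]


def set_task(task_name: str, action: str, url: str = PROJECT_URL) -> list:
    """Command line to suspend or resume one task by name."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action}")
    return ["boinccmd", "--task", url, task_name, action]


def workaround_commands(unstarted_tasks: list) -> list:
    """No new tasks first, then suspend every task that has not started."""
    commands = [no_new_tasks()]
    commands += [set_task(name, "suspend") for name in unstarted_tasks]
    return commands


# Hypothetical task names, for illustration only.
for command in workaround_commands(["example_task_a_0", "example_task_b_0"]):
    print(shlex.join(command))
```

Resuming one task at a time later is the same call with "resume", for example `set_task("example_task_a_0", "resume")`.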
ID: 101807
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101808 - Posted: 15 May 2021, 0:44:38 UTC - in response to Message 101807.  
Last modified: 15 May 2021, 0:50:13 UTC

What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.
It's not a case of how much is needed to start or to actually run (the tasks that can use up to 4GB of RAM only need several hundred MB or so when they first start); it's purely about the maximum they claim they will need.

The problem is the configuration value that says xxGB is required (even though it isn't). As you've noted, if Tasks configured for a lower RAM size are already running, then one that claims it needs a huge amount of RAM can't start, because of the RAM already in use. But if the Task with the large configured RAM requirement is already running, then Tasks that say they require much, much less can start up without issue.
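
A toy model of that ordering effect, as a sketch only. It assumes a Task may start when its declared requirement fits in the RAM still free, and that once running it only occupies what it actually uses. This illustrates the behaviour described in this thread; it is not BOINC's real scheduling code, and the 7000 MB limit and per-Task usage figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    declared_mb: float  # configured RAM requirement
    actual_mb: float    # what the Task really uses once running


def start_in_order(tasks, usable_mb):
    """Try to start Tasks in the given order; return (running, waiting) names."""
    free_mb = usable_mb
    running, waiting = [], []
    for task in tasks:
        if task.declared_mb <= free_mb:
            running.append(task.name)
            free_mb -= task.actual_mb  # only real usage is consumed
        else:
            waiting.append(task.name)  # "Waiting for memory"
    return running, waiting


USABLE_MB = 7000.0  # hypothetical amount of RAM BOINC may use
big = Task("big", declared_mb=6676, actual_mb=450)
smalls = [Task(f"small{i}", declared_mb=1908, actual_mb=400) for i in range(3)]

print(start_in_order([big] + smalls, USABLE_MB))  # big first: all four run
print(start_in_order(smalls + [big], USABLE_MB))  # smalls first: big waits
```

With these made-up numbers, the same four Tasks either all run or leave one core idle, depending purely on which one starts first.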


But yes, it is very similar to the days of DOS & Config.sys, and spending hours changing the order in which commands were loaded. Those that needed heaps of RAM to begin, but less to actually run, had to be loaded before all the commands with lower startup RAM requirements, so you could get everything running that you needed to support the hardware and software you were using.
Grant
Darwin NT
ID: 101808
MJH333
Joined: 29 Jan 21
Posts: 18
Credit: 5,748,861
RAC: 1
Message 101811 - Posted: 15 May 2021, 9:18:39 UTC - in response to Message 101807.  

When it happens to me on my laptop, I set No New Tasks, suspend all unstarted tasks, then as each running task ends, more RAM becomes available. At the point there's enough for the problem task to start, I then unsuspend one task at a time, then find all my cores can run again.

I know it's a faff, but it's the only way I've found to get around it.


Sid and Grant,

Thank you for your further thoughts on this. I find it puzzling, as yesterday I had 4 of these pre-helical-bundle tasks running at the same time on the laptop whereas the day before it could only manage 3. But anyway, next time it happens I will try Sid’s suggestion for unclogging the bottleneck - thanks for that.

Mark
ID: 101811
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101812 - Posted: 15 May 2021, 10:34:22 UTC - in response to Message 101808.  

What has become obvious is that the RAM required to <start> running isn't the same as the actual RAM required <while> running, so it always looks like there's plenty of RAM left over, just not all cores utilised.
It's not a case of how much is needed to start or to actually run (the tasks that can use up to 4GB of RAM only need several hundred MB or so when they first start)- it's just about how much the maximum they claim they will need is.

I know that's the case. What I mean by "not misconfigured" is that they've consciously been configured in the way we see, whether it's being used for an individual task or makes sense to us at our end or not.

On the 4-core laptop I'm using right now, 3 tasks are "pre_helical_bundles" which we know are set up for 6675Mb RAM (and 8583Mb disk space) but are likely to have been created before these adjustments were made and you're suggesting are "misconfigured". But their "Virtual memory size" while running are only 420Mb, 472Mb and 493Mb.

The 4th core is running a relatively new task "f60030e2d399cf97bd574292ff707fcd_fae0a51cf659d300dc90ab2264960253_barrel6_L2L5L5L7L5L7L3_relax_SAVE_ALL_OUT_1393099_4" (should I call this "barrel6"?) whose Virtual memory size is only 380Mb, but looking at my "client_state.xml" file, it's set up to ask for 8583Mb RAM and 1907Mb disk space. You might call this misconfigured too, but given it came after the adjustments were made, it would be deliberate, so who's to call it misconfigured? The individual task is way out of line, but if it's configured to cover the entire batch issued, which it is aiui, the only person who'd know is the researcher themselves.

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
ID: 101812
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101813 - Posted: 15 May 2021, 11:10:17 UTC - in response to Message 101812.  

The 4th core is running a relatively new task "f60030e2d399cf97bd574292ff707fcd_fae0a51cf659d300dc90ab2264960253_barrel6_L2L5L5L7L5L7L3_relax_SAVE_ALL_OUT_1393099_4" (should I call this "barrel6 ?) whose Virtual memory size is only 380Mb, but looking at my "client_state.xml" file, it's set up to ask for 8583Mb RAM and 1907 Disk space. You might call this misconfigured too, but given it's after the adjustments made, it would be deliberate, so who's to call it misconfigured?
Me.
The peak working set size for all of the Tasks I've done of that type so far is less than 500MB.

Asking for 17 times more RAM than is necessary indicates it's not right- that makes it mis-configured. While there may be some Tasks in the batch that will need more RAM, I haven't seen any work in the past where the difference between the most & least RAM actually used has been double, let alone 17 times, the average amount.



I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12 months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...).
So why now set such high RAM requirements, when the largest Task I've seen in months has used only 1.5GB, which is still way, way less than even the lowest of the current minimum RAM requirement values being used?
Grant
Darwin NT
ID: 101813
Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,754,561
RAC: 1,307
Message 101814 - Posted: 15 May 2021, 11:27:05 UTC
Last modified: 15 May 2021, 11:28:17 UTC

Hi, new sorts of tasks downloaded on 2Gb units:

77701868c29166a607c77ce7756b607a_1763459782ee1b1b0b72a3468e89a34a_1kq1A_L4L5L9L8L5L4

d41df164c88ed4dcbf254bff465f63e5_8e524ec3969623492e0a5eae34774420_3p8dA_L5L3L5L2

3df5dd9111667971bcada1d166814b33_e897b3f59ad592e5402d6abb8b3ace32_3p8dA_L8L6L6L5L1_fold_SAVE_ALL_OUT_1393099_1_0ec5c2bb9db494ab3c1d59f

2afac2d72d_f9c431c9b3a056e80ecf570278d61987_4a2iV_L4L6L6L7L4L7L4_relax_SAVE_ALL_OUT_1393099_1_08d9c39eaf118c9ff7a35f70d9874f3f4_90739e1dee

d21dc6d7ebd40926db81cf_3fp9A_L6L2L14L4L1_relax_SAVE_ALL_OUT_1393099_1_042d55c3f5f7911bff2bf18342941018d_a2fb045ec1f86b7c4804027d9a9a8aed_

1kq1A_L5L6L6L6L4_relax_SAVE_ALL_OUT_1393099_6_0

pd1_pd1_r8b09_373_fragments_fold_SAVE_ALL_OUT_1393000_304_0




and some pictures of the 3 computers in use ...

https://www.casimages.com/i/210515013640487009.png.html
https://www.casimages.com/i/210515013640709918.png.html
https://www.casimages.com/i/210515013908653872.png.html
ID: 101814
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101820 - Posted: 15 May 2021, 21:07:23 UTC - in response to Message 101813.  

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...).
So why now set such high RAM requirements, when the largest Task i've seen in months has used only 1.5GB- which is still way, way less than even the lowest of the current minimum RAM requirement values being used.
Having said that, the amount of In progress work is the highest it's been since the problems started (449,519), and the Successes last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

As long as these excessive requirement values are about, it looks like it's going to be a case of whether things line up or not. If they do: a system owes debt to Rosetta, it's got no other Tasks already running and so is able to start a larger RAM requirement Task, then the next Task(s) it tries to run are smaller RAM requirement Tasks, so they start up OK. So we end up with plenty of work being done.
But if things don't line up: some small RAM requirement Tasks are already running, so the system can't start the large RAM requirement Task. If it has a cache (and worse yet, a large cache) then it may load up with more work for its other project(s). So we end up with much less Rosetta work being done, and it will be some time before the cached work for the other projects has been cleared and the system can get and run more work for Rosetta. Even then, that only holds if the first Tasks it gets aren't small RAM requirement Tasks; if they are, it will just continue to result in Tasks waiting for RAM (that they don't actually need).
Grant
Darwin NT
ID: 101820
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101828 - Posted: 16 May 2021, 19:11:01 UTC - in response to Message 101820.  

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...).
So why now set such high RAM requirements, when the largest Task i've seen in months has used only 1.5GB- which is still way, way less than even the lowest of the current minimum RAM requirement values being used.
Having said that, the amount of In progress work is the highest it's been since the problems started (449,519), and the Successes last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

As long as these excessive requirement values are about, it looks like it's going to be a case of whether things line up or not- a system is owed debt to Rosetta, it's got no other Tasks already running & so is able to start a larger RAM requirement Task, then the next Tasks(s) it tries to run are smaller RAM requirement Tasks, so they start up OK. So we end up with plenty of work being done.
But if things don't line up- some small RAM requirement Tasks are already running, so it can't start the large RAM requirement Task. If they have a cache (and worse yet it's a large cache) then the system may load up with more work for their other project(s). So we end up with much less Rosetta work being done, and it will be some time before their cached work for the other projects has been cleared & they can get & run more work for Rosetta- if the first Tasks they get aren't of the small RAM requirement Tasks in which case it will just continue to result in Tasks waiting for RAM (that they don't actually need).

I take all your points. I'm on the user side of the fence too and I superficially agree that's how it appears.

Trouble is, all the reasons and causes and needs are on the other side of the server divide, so as there's no way I can know anything about them, there's similarly no way I can tell if anything we're seeing is necessary or not. Nor am I going to tell people who are a million times more knowledgeable than me that they're setting everything up brainlessly wrong. Especially having already revisited their assumptions.
Maybe I'm too squeamish.

That 'In Progress' tasks have hit 449k (18.4% below the pre-April peak but 41.2% above the low) seems more a sign that 2Gb & 4Gb hosts are able to run more tasks because of settings made by the same guys who decided on the huge settings. That also makes me think they're doing only what they need and have reasons to do.

And the other major factor is that, if they have a particular batch of work which makes particularly large resource demands because of the nature of the questions they want answered, they're not going to stop asking those questions just because 50% or more hosts don't have the capacity to assist in answering them. Because up to 50% will have the capacity and the bottom line is getting the answer to their question and nothing else, even if that means it takes a little longer to do.

We've asked the question whether hosts with fewer resources can continue to contribute on some work, and they've gone away and changed things so that they can, which is what Kissagogo27 is telling us above - great news.
Beyond that, we get into the sphere of the project only asking for the answer to questions that are no larger than they were before. It's apparent that's not the nature of this project.
And some hosts will no longer be able to contribute here with only the same resources asked of them several years ago. And new hosts will arrive who do have those resources available.
ID: 101828
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 8,895,413
RAC: 249
Message 101833 - Posted: 16 May 2021, 23:57:56 UTC - in response to Message 101820.  

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...).
So why now set such high RAM requirements, when the largest Task i've seen in months has used only 1.5GB- which is still way, way less than even the lowest of the current minimum RAM requirement values being used.
Having said that, the amount of In progress work is the highest it's been since the problems started (449,519), and the Successes last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

As long as these excessive requirement values are about, it looks like it's going to be a case of whether things line up or not- a system is owed debt to Rosetta, it's got no other Tasks already running & so is able to start a larger RAM requirement Task, then the next Tasks(s) it tries to run are smaller RAM requirement Tasks, so they start up OK. So we end up with plenty of work being done.
But if things don't line up- some small RAM requirement Tasks are already running, so it can't start the large RAM requirement Task. If they have a cache (and worse yet it's a large cache) then the system may load up with more work for their other project(s). So we end up with much less Rosetta work being done, and it will be some time before their cached work for the other projects has been cleared & they can get & run more work for Rosetta- if the first Tasks they get aren't of the small RAM requirement Tasks in which case it will just continue to result in Tasks waiting for RAM (that they don't actually need).


I'm running Rosetta on a machine with 16GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.
ID: 101833
Sid Celery
Joined: 11 Feb 08
Posts: 2059
Credit: 40,463,815
RAC: 885
Message 101834 - Posted: 17 May 2021, 0:53:42 UTC - in response to Message 101833.  

I think the only point I'm making is that if you think it might get better after the huge number of pre_helical_bundles tasks are worked through, I wouldn't personally bank on it.
Which means things won't get any better than they are now, and may even get worse if we get greater numbers of Tasks that are configured for such ridiculously excessive amounts of RAM above & beyond the maximum that they will actually use.

It's been, what, 12months since they announced the batch of larger RAM requirement Tasks that were going to be released? And at the time they were no more than 4GB (apart from a later batch that had some sort of memory leak...).
So why now set such high RAM requirements, when the largest Task i've seen in months has used only 1.5GB- which is still way, way less than even the lowest of the current minimum RAM requirement values being used.
Having said that, the amount of In progress work is the highest it's been since the problems started (449,519), and the Successes last 24hrs would probably be the highest it's been as well (it's a shame we don't have a graph for that value).

As long as these excessive requirement values are about, it looks like it's going to be a case of whether things line up or not. If they do: a system owes debt to Rosetta, it's got no other Tasks already running & so is able to start a larger RAM requirement Task, then the next Task(s) it tries to run are smaller RAM requirement Tasks, so they start up OK. So we end up with plenty of work being done.
But if things don't line up: some small RAM requirement Tasks are already running, so it can't start the large RAM requirement Task. If they have a cache (and worse yet it's a large cache) then the system may load up with more work for their other project(s). So we end up with much less Rosetta work being done, and it will be some time before their cached work for the other projects has been cleared & they can get & run more work for Rosetta. And if the first Tasks they then get aren't small RAM requirement Tasks, it will just continue to result in Tasks waiting for RAM (that they don't actually need).

I'm running Rosetta on a machine with 16 GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.

Yeah, it's 8Gb hosts (who should be ok tbh, depending on settings), 4Gb hosts (for whom it's on the cusp) and particularly 2Gb hosts (who are almost completely excluded, but ought to have bits and pieces coming through now) that are the issue. When or if hosts upgrade it'll improve over time, but never quite get back to where it was, though all these tasks ought to be much more productive than they were before on the 60% - 81% of hosts that haven't been affected throughout.
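
To put rough numbers on the "depending on settings" point, here is a minimal sketch. It assumes the server simply compares the stated requirement with total RAM multiplied by the memory percentage allowed in the Boinc computing preferences; the real scheduler logic may well differ. The function names and the round 8192/4096/2048 MB totals are assumptions for illustration; 6675.72 MB is the figure from the server message.

```python
# Rough model (assumption) of the check behind
# "Rosetta needs 6675.72 MB RAM but only N MB is available for use".

REQUIRED_MB = 6675.72  # requirement quoted in the server message


def available_mb(total_ram_mb: float, max_percent: float) -> float:
    """RAM Boinc may use, given the 'use at most N%' memory preference."""
    return total_ram_mb * max_percent / 100.0


def can_receive(total_ram_mb: float, max_percent: float,
                required_mb: float = REQUIRED_MB) -> bool:
    """True if the host would pass the assumed server-side RAM check."""
    return available_mb(total_ram_mb, max_percent) >= required_mb


if __name__ == "__main__":
    # Hypothetical hosts with nominal 8, 4 and 2 GB of RAM.
    for total in (8192, 4096, 2048):
        for percent in (75, 99):
            verdict = "ok" if can_receive(total, percent) else "refused"
            print(f"{total} MB at {percent}%: "
                  f"{available_mb(total, percent):.2f} MB available, {verdict}")
```

Under these assumptions a nominal 8Gb host falls short at 75% (6144 MB, in the same range as the 6117.47 MB in the log quoted earlier in the thread) but passes at 99%, while 4Gb and 2Gb hosts cannot pass this particular requirement at any percentage.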
ID: 101834
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101836 - Posted: 17 May 2021, 5:45:26 UTC - in response to Message 101828.  

And the other major factor is that, if they have a particular batch of work which makes particularly large resource demands because of the nature of the questions they want answered, they're not going to stop asking those questions just because 50% or more hosts don't have the capacity to assist in answering them. Because up to 50% will have the capacity and the bottom line is getting the answer to their question and nothing else, even if that means it takes a little longer to do.
That's all well and good, but it is absolutely insane to set such high minimum requirements for Tasks that don't come anywhere close to using the amounts they are requiring, as it stops many systems from being able to process them, or results in cores/threads going unused by Rosetta that are available for its use.
As I mentioned before, we have had high RAM requirement Tasks on the project before: Tasks that required more than double the amount of RAM of any Task I have seen since this excessive configuration value issue started. And people were able to continue processing the existing Tasks at the time without issue, as none of them had excessive minimum RAM or disk space requirements above & beyond what they actually required.

Setting a limit that is double what is actually required, just in case, is one thing. But to have a requirement that is 17 times larger than the largest value ever used is beyond ridiculous, and results in them having fewer resources to process the work they want done. If they really want this work processed, then they should make use of the resources that are available & not block systems that are capable of processing it by having unrealistic & excessive configuration values.
Grant
Darwin NT
ID: 101836
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1611
Credit: 16,521,941
RAC: 2,054
Message 101837 - Posted: 17 May 2021, 5:52:43 UTC - in response to Message 101833.  

I'm running Rosetta on a machine with 16 GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.
Since you make use of only half of your available cores/threads then it's not surprising that you're not having issues. If you were to use all of your cores & threads, then with so little RAM that system would be having issues just like all the others are.
Grant
Darwin NT
ID: 101837
Bryn Mawr

Joined: 26 Dec 18
Posts: 386
Credit: 11,703,822
RAC: 554
Message 101839 - Posted: 17 May 2021, 8:21:10 UTC - in response to Message 101837.  

I'm running Rosetta on a machine with 16 GB of RAM. Rosetta is running 8 tasks at once, 2 other projects are using the other 2 available cores, and I'm not having any problems getting and returning tasks.
Since you make use of only half of your available cores/threads then it's not surprising that you're not having issues. If you were to use all of your cores & threads, then with so little RAM that system would be having issues just like all the others are.


Surely the point is how many cores are used for Rosetta, not how many cores are in use overall.

I run a 3700x and a 3900. They’re restricted to 5 & 6 Rosetta WUs respectively but also run 3/4 CPDN WUs, and the rest of the cores are WCG or TN-Grid, so all 16/24 cores are running constantly. All within 16 GB per machine with zero problems.

I wouldn’t consider filling either machine with just Rosetta with or without the current config problem because of the L3 cache requirements.
ID: 101839

©2024 University of Washington
https://www.bakerlab.org