Huge RAM usage by some of latest WUs

Mad_Max

Joined: 31 Dec 09
Posts: 207
Credit: 23,094,662
RAC: 13,103
Message 91678 - Posted: 12 Feb 2020, 8:32:02 UTC
Last modified: 12 Feb 2020, 9:05:43 UTC

Hello.

One of my computers crashed today. When I started digging into why, it turned out it had run out of RAM.
A second one was in a "swap of death" state (swapping non-stop for hours while doing almost no useful work).
More digging showed that the cause of both the out-of-RAM crash and the non-stop swapping was Rosetta.

I see HUGE RAM usage by some of the latest WUs: from 1.5 to 3.5 GB of RAM per running WU.

You can see a lot of tasks currently using 1400-1600 MB of RAM, with ~2800 MB as the peak value.
Before the crash and reboot, a few tasks peaked at ~3200-3500 MB, and then the system ran out of both RAM and disk swap space.

Usual consumption for R@H is in the 300-1000 MB range. Are these WUs something completely new?
Or is it just a bug, like a memory leak?

These are all Rosetta 4.07 WUs, and their names start with "rb_02_xx" (where xx = 29, 08, 09 and 10).
I guess they are Robetta WUs generated on 29 JAN, 08 FEB, 09 FEB and 10 FEB.

I was forced to limit the maximum number of concurrently running R@H units using the "max concurrency" setting in the app config.
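
For anyone who wants to do something similar, here is a minimal sketch (not the exact file used above) that generates an `app_config.xml` capping the number of simultaneously running tasks for the whole project. `project_max_concurrent` is a standard element of BOINC's `app_config.xml`; the function name `build_app_config` and the limit of 2 are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(limit):
    """Return app_config.xml text that limits concurrently running tasks.

    `limit` is the maximum number of tasks of this project that the
    BOINC client may run at the same time.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical limit; pick a value that fits your own RAM.
    print(build_app_config(2))
```

The resulting file belongs in the project's folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit takes effect.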

Some example WUs
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861215
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861165
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861118
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861128
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861130
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861138
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861090
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861114
https://boinc.bakerlab.org/rosetta/result.php?resultid=1121613378
Mad_Max

Joined: 31 Dec 09
Posts: 207
Credit: 23,094,662
RAC: 13,103
Message 91680 - Posted: 12 Feb 2020, 10:31:00 UTC
Last modified: 12 Feb 2020, 10:41:20 UTC

The longer they run, the more RAM they consume.

Now > 3000 MB per WU after ~5 hours of running:
rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7217
and
rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7469

Looks very much like a memory leak. The growth is not linear, though: RAM usage jumps after each stage of the computation finishes and a new one begins.
Smells like data/objects not being released properly after use.
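
One rough way to check whether growth like this is gradual or comes in jumps is to log a task's memory at regular intervals and look at the differences between consecutive samples. The sketch below is only an illustration: the sample values are made up, the function names are hypothetical, and the 200 MB threshold is an arbitrary assumption.

```python
def find_jumps(samples_mb, threshold_mb=200):
    """Return (index, increase) pairs where memory rose by more than
    threshold_mb between two consecutive samples."""
    jumps = []
    for i in range(1, len(samples_mb)):
        delta = samples_mb[i] - samples_mb[i - 1]
        if delta > threshold_mb:
            jumps.append((i, delta))
    return jumps


def jump_share(samples_mb, threshold_mb=200):
    """Fraction of the total growth that came from large jumps (0.0 to 1.0)."""
    total = samples_mb[-1] - samples_mb[0]
    if total <= 0:
        return 0.0
    jumped = sum(delta for _, delta in find_jumps(samples_mb, threshold_mb))
    return jumped / total


if __name__ == "__main__":
    # Hypothetical memory readings in MB, taken at equal intervals.
    samples = [600, 620, 1400, 1430, 2200, 2230, 3000]
    print(find_jumps(samples))
    print(jump_share(samples))
```

A share close to 1 means nearly all of the growth happened in a few steps. Stepwise growth on its own does not prove a leak, since each stage could legitimately need more data than the previous one.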
Trotador

Joined: 30 May 09
Posts: 108
Credit: 272,283,990
RAC: 1,873
Message 91681 - Posted: 12 Feb 2020, 11:39:05 UTC

Yes, I see the same behavior for all rb_02_08_15652_15556__xx units
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91683 - Posted: 12 Feb 2020, 15:22:32 UTC - in response to Message 91680.  
Last modified: 12 Feb 2020, 15:23:53 UTC

I think the moderator says that happens on the development versions. In any case, I am glad to see my memory used.
I have 16 GB on a Ryzen 2600 (using 11 cores) and 32 GB on a Ryzen 3700x (using 15 cores), and haven't run out yet, though I see over 3 GB used on several of them.

Thanks for the warning.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91687 - Posted: 13 Feb 2020, 7:00:26 UTC - in response to Message 91683.  
Last modified: 13 Feb 2020, 7:01:23 UTC

I just got my first work unit suspended "waiting for memory" on my Ryzen 2600 (with 16 GB).
There was about 1 GB available.

So I will continue on my Ryzen 3700x (32 GB). That should work for the foreseeable future.
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 91696 - Posted: 14 Feb 2020, 6:16:56 UTC

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets.

We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91698 - Posted: 14 Feb 2020, 10:19:41 UTC - in response to Message 91696.  

We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

Great! I couldn't ask for more. I have re-arranged my machines so that Rosetta has plenty of memory.
Throw them at us, though I won't be surprised if it causes a lot of problems. I hope people check here to see what is going on.
Nick Name

Joined: 12 Aug 09
Posts: 3
Credit: 2,487,614
RAC: 1
Message 91702 - Posted: 14 Feb 2020, 19:24:39 UTC - in response to Message 91696.  

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets.

We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

This is exciting, but these types of jobs should be accompanied by a News notice so that users aren't surprised. I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large. Most users are not going to be able to run these without problems.
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 91703 - Posted: 14 Feb 2020, 22:11:54 UTC - in response to Message 91702.  

Agreed, sorry we didn't add a memory requirement for these jobs.
jringo

Joined: 15 Aug 17
Posts: 12
Credit: 2,628,933
RAC: 0
Message 91724 - Posted: 17 Feb 2020, 12:31:57 UTC

We're using BOINC Network to spread the word that R@H is working on corona virus problems.

This is the sort of news that would be a great public driver! This news will not only bring cycles from other BOINC projects to yours (likely only temporary -- to solve an immediate and tangible problem -- so don't feel guilty), but would likely bring a significant number of people into the BOINC network at large.

Always feel free to reach out if you'd like help getting a PR made up.

Good luck on the project!

email: boinc.network@gmail.com
discord: https://discord.gg/wPRafUq
twitter: @BOINCNetwork
retalaznstyle

Joined: 18 Feb 20
Posts: 1
Credit: 0
RAC: 0
Message 91729 - Posted: 18 Feb 2020, 2:33:56 UTC - in response to Message 91696.  

Hi, mod here from coronavirus subreddit. Do you have a post for new users who want to sign up for the coronavirus research efforts via rosetta?
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91730 - Posted: 18 Feb 2020, 3:24:40 UTC - in response to Message 91702.  

I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large.

Yes, exactly. That would allow the use of machines with more memory where they are needed, while the ordinary machines can do the ordinary work.

Also, is more capacity needed? Just ask and we will do it, but we need to know what the need is.
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 114,354,288
RAC: 52,174
Message 91731 - Posted: 18 Feb 2020, 8:31:36 UTC - in response to Message 91729.  
Last modified: 18 Feb 2020, 8:33:11 UTC

Duplicate...
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 114,354,288
RAC: 52,174
Message 91732 - Posted: 18 Feb 2020, 8:32:43 UTC - in response to Message 91729.  

Hi! Can you ask people just to install BOINC and choose Rosetta, and explain the following?

There is a huge pool of Rosetta tasks, so if some people were to pull out and run the Coronavirus tasks, the rest of us will just end up running more of the other tasks as that is all that would be left.

Does that make sense?

Danny
JP

Joined: 18 Feb 20
Posts: 2
Credit: 40,554
RAC: 0
Message 91733 - Posted: 18 Feb 2020, 11:22:45 UTC - in response to Message 91732.  

Hi Danny,

-I am a brand new user to both BOINC & Rosetta. I was brought here from the COVID-19 Reddit post.

-I do not have a clear understanding of how tasks are distributed and/or prioritized among users.

-If it is possible, I am trying to clarify the best course of action, with the simplest set of instructions, to communicate to a wide, non-technical audience how best to use Rosetta for COVID-19 related tasks.

-I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta their computing power will be added to the pool working on all tasks - including COVID-19 related jobs.

Further, if there were the capability to allocate one's own computing power to specific (COVID-19) tasks, it would force the resources of other users to be allocated to other, non-specified (non-COVID-19) tasks, rendering the power of any task specification moot.

-I am running Rosetta now and, unless I missed it, I do not see where I could specify or prioritize particular tasks. It appears that this is not an option anyway.

-It seems the best course of action is to simply download BOINC & run Rosetta?

Thank you for any clarification you could provide.

Best,
-JP
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91735 - Posted: 18 Feb 2020, 13:25:00 UTC - in response to Message 91733.  

-I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta their computing power will be added to the pool working on all tasks - including COVID-19 related jobs.

Further, if there were the capability to allocate one's own computing power to specific (COVID-19) tasks, it would force the resources of other users to be allocated to other, non-specified (non-COVID-19) tasks, rendering the power of any task specification moot.

-I am running Rosetta now and, unless I missed it, I do not see where I could specify or prioritize particular tasks. It appears that this is not an option anyway.

-It seems the best course of action is to simply download BOINC & run Rosetta?

As a long-time Rosetta user, I can answer that. Yes, you just work on the pool of all the tasks. In fact, unless you can figure out their obscure nomenclature, you don't even know which ones are for COVID-19.

That is fine with me. It doesn't matter on which machine which particular task is run, as long as they have enough resources.
And if they run out of work, then they have more than enough.
JP

Joined: 18 Feb 20
Posts: 2
Credit: 40,554
RAC: 0
Message 91736 - Posted: 18 Feb 2020, 14:29:04 UTC - in response to Message 91735.  

Thank you Jim1348!

Cheers,
-JP
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 91737 - Posted: 18 Feb 2020, 22:25:01 UTC - in response to Message 91733.  


-It seems the best course of action is to simply download BOINC & run Rosetta?


Yes, with the expectation that your contributed effort will benefit research teams that use Rosetta to study COVID-19, as well as other protein structures. Your efforts also benefit the team at University of Washington that is developing improvements to Rosetta, which makes this type of computational structure prediction possible.
Rosetta Moderator: Mod.Sense
Mad_Max

Joined: 31 Dec 09
Posts: 207
Credit: 23,094,662
RAC: 13,103
Message 91747 - Posted: 19 Feb 2020, 13:36:47 UTC - in response to Message 91696.  
Last modified: 19 Feb 2020, 13:44:30 UTC

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced and there is a mad rush to determine structures for possible drug targets.

We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

So it's not a memory leak, just an abnormally big protein model (compared to average R@H work)? 1273 amino acid residues, if I got it right?

Is there any work on developing a multi-threaded app for such big targets, so as not to waste huge amounts of RAM on a complete copy of the dataset for each working thread?
Modern computers are getting more and more CPU cores/threads, and just running a separate copy on each thread means more and more "overhead" in RAM, disk and Internet (bandwidth) usage, because the use of all these resources is multiplied by the number of tasks running. A multi-threaded app shares all of this and only needs multiple CPU cores/threads.

The usual (common) setup for non-server computers is about 1 GB of RAM per CPU thread.
2 GB per thread is much rarer. And there are almost no "consumer", "office" or "home" computers with >2 GB of RAM per CPU thread.
So you cannot just throw out tasks that consume >=3 GB of RAM per thread and expect everything to work OK. There WILL be problems on the majority of computers.
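
To put rough numbers on this, here is a small sketch that counts how many tasks of a given size fit into RAM at 1 GB and 2 GB per CPU thread. The 8-thread machine and the function name `tasks_that_fit` are hypothetical, and the calculation optimistically ignores memory used by the OS and other programs.

```python
def tasks_that_fit(threads, ram_per_thread_gb, task_ram_gb):
    """Number of tasks that fit in RAM, capped at one task per CPU thread."""
    total_ram_gb = threads * ram_per_thread_gb
    return min(threads, int(total_ram_gb // task_ram_gb))


if __name__ == "__main__":
    threads = 8  # hypothetical 8-thread desktop
    for ram_per_thread in (1, 2):
        for task_ram in (1, 3):
            n = tasks_that_fit(threads, ram_per_thread, task_ram)
            print(f"{ram_per_thread} GB/thread, {task_ram} GB tasks: "
                  f"{n} of {threads} threads busy")
```

Under these assumptions, 3 GB tasks keep only 2 of 8 threads busy at 1 GB per thread and 5 of 8 at 2 GB per thread, while 1 GB tasks fill all 8 threads in both cases.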

On the other hand, if a multi-threaded app were available, then using even 5-10 GB of RAM for a single large model would be acceptable for most volunteer computers. It would also help with the runtimes of the biggest models on older CPUs: really big models often get aborted by the watchdog on old (or just slow, like Intel Atom or AMD Puma/Jaguar/Bobcat) CPUs for exceeding the maximum allowed runtime (8+4 = 12 hours MAX by default) before the very first model/decoy is calculated, and the CPU time spent is wasted.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91760 - Posted: 20 Feb 2020, 23:47:55 UTC - in response to Message 91747.  
Last modified: 20 Feb 2020, 23:49:35 UTC

Is there any work on developing a multi-threaded app for such big targets, so as not to waste huge amounts of RAM on a complete copy of the dataset for each working thread?
Modern computers are getting more and more CPU cores/threads, and just running a separate copy on each thread means more and more "overhead" in RAM, disk and Internet (bandwidth) usage, because the use of all these resources is multiplied by the number of tasks running. A multi-threaded app shares all of this and only needs multiple CPU cores/threads.

Isn't the Internet bandwidth the same? With multi-threaded you run fewer work units at a time, but you download/upload correspondingly more often.

I think the only real saving is memory. Most multi-threaded projects now allow you to select how many threads (cores) you want to use on a single work unit. I usually select "1" or "2", since that is usually more efficient. Most MT projects run less efficiently the more threads you use. I am not sure why that is the case, but it is said that on some of them, one thread may finish early before the others, and have nothing to do. There may be other reasons.
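
One textbook explanation for this (not something confirmed for any particular project here) is Amdahl's law: if only a fraction p of the work can run in parallel, the speedup on n threads is 1 / ((1 - p) + p / n), so the efficiency per thread falls as n grows. A small sketch, where p = 0.9 is an arbitrary assumption and the function names are just for illustration:

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speedup predicted by Amdahl's law for the given number of threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def efficiency(parallel_fraction, threads):
    """Speedup divided by thread count (1.0 means perfect scaling)."""
    return amdahl_speedup(parallel_fraction, threads) / threads


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8):
        print(f"{n} threads: speedup {amdahl_speedup(p, n):.2f}, "
              f"efficiency {efficiency(p, n):.2f}")
```

With these assumed numbers the efficiency drops from about 0.91 on 2 threads to about 0.59 on 8, which would be consistent with 1 or 2 threads per work unit being the more efficient choice.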

I usually have plenty of memory, though having a choice is nice. But I expect that not all tasks are suitable for MT.
©2024 University of Washington
https://www.bakerlab.org