Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,017,068
RAC: 357
Message 103541 - Posted: 26 Nov 2021, 14:52:54 UTC - in response to Message 103540.  

The easiest way would be to go to the devices list page, click on "details" and scroll down till you find "VirtualBox VM jobs". Click on skip.

This way you won't receive the Rosetta Python tasks, which are the ones that ask for over 7 GB of RAM but actually use only a fraction of that (100 MB, etc.). However, this may mean that there are times when no Rosetta work is sent to your device, because the standard Rosetta 4.2 application doesn't always have work available.
ID: 103541
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103542 - Posted: 26 Nov 2021, 15:11:52 UTC - in response to Message 103541.  

The easiest way would be to go to the devices list page, click on "details" and scroll down till you find "VirtualBox VM jobs". Click on skip.

Good find. I wish they would do it the other way also, and allow us to skip the regular Rosettas.
But this is a start.
ID: 103542
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103547 - Posted: 26 Nov 2021, 19:12:24 UTC - in response to Message 103370.  

I pointed that out about a year ago when MIP on WCG (which uses Rosetta) went in-house. They didn't need the crunchers any more.


I don't think so.
In-house HPC has some requirements:
- a lot of high-performance hardware
- a large, well-prepared IT team
- simulations that are as homogeneous as possible

Rosetta@home does not meet these requirements. WCG (when it was IBM) had them.
When IPD/BakerLab needs large amounts of computational power for work that cannot be split up on BOINC, they always use an external source: AWS, Azure, TACC, etc.

But maybe I'm wrong...


TACC has their own supercomputer (I was trying that project for a time) to run protein folding. They toss scraps off to BOINC.
Not worth wasting your time there.
I don't know about the rest.

RAH has a neural network AI now that takes care of the majority of their work. 2 million tasks to process yet we get a little something here and there? Now they have python, but how long that will last is a good question.
I once went through the Robetta page and saw very little assigned to BOINC.
ID: 103547
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103548 - Posted: 26 Nov 2021, 19:25:18 UTC - in response to Message 103520.  

Wishing you all the best with your treatments and your life.
Put your systems where you think they will work best.
If you want COVID related work, SiDock and QuChem are both out there.
If you're focusing on cancer, then stay with WCG and its Mapping Cancer Markers and Childhood Cancer projects.
My mother-in-law died of a rare, untreatable form of abdominal cancer, so that is why I got started here; I thought they were looking at cancer stuff.
Later on I discovered WCG and joined up with their cancer projects.

I don't know what all FAH has, but it's random stuff for a wide variety of science from what I can tell.
ID: 103548
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103551 - Posted: 26 Nov 2021, 20:45:01 UTC - in response to Message 103548.  

SiDock certainly does COVID. That is their only work at the moment, though they will move to other stuff later.

As for QuChemPedia, I don't think they do that sort of thing. But it is a good project otherwise.
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=78#709

FAH has been very BIG on COVID. I am sure that if the new Omicron variant needs them, they will be there.
https://foldingathome.org/2021/09/27/covid-moonshot-wellcome-trust-funding/?lng=en

WCG/OPNG is always short of work; they have too many crunchers. Their CPU work would be done faster on a GPU anyway.

Rosetta has done its share too, but where they are at the moment is a big question. They don't tell us.
ID: 103551
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103552 - Posted: 27 Nov 2021, 0:14:41 UTC - in response to Message 103551.  

SiDock certainly does COVID. That is their only work at the moment, though they will move to other stuff later.

As for QuChemPedia, I don't think they do that sort of thing. But it is a good project otherwise.
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=78#709

FAH has been very BIG on COVID. I am sure that if the new Omicron variant needs them, they will be there.
https://foldingathome.org/2021/09/27/covid-moonshot-wellcome-trust-funding/?lng=en

WCG/OPNG is always short of work; they have too many crunchers. Their CPU work would be done faster on a GPU anyway.

Rosetta has done its share too, but where they are at the moment is a big question. They don't tell us.



WCG has plenty of CPU work. I have 6 MCM running and another 85 in queue. They are fast processing. About 90 minutes. OpenPandemics is running COVID. It runs about 2 hours on CPU. I just picked up on QuChem because it was Vbox and it looked interesting. I'm sure their research will help something related to COVID eventually or something else health related. So many things to study and model.

RAH doesn't talk about anything anymore short of the news bites on the homepage.
Dr. B used to write in his journal here about every 2 weeks or so, or whenever they discovered something interesting. Now it's nothing. It's a shame. I'll have to read the homepage sometime and see what they say. I have so many things going on that I don't pay attention to the homepage.
ID: 103552
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103553 - Posted: 27 Nov 2021, 0:18:19 UTC - in response to Message 103540.  

How can I limit the number of simultaneous R@H tasks? I have 64GB installed, and R@H is consuming the lot, making other projects wait for memory. 10 R@H jobs are running just now, and using all the memory. I'd prefer to limit them to, say, 4, but unlike other projects I can't see a way to influence this.



Simple version is this: you can't. You can try messing with resource share, but from what I have read, that's a long term thing and may or may not affect your CPU count. Other than this there is no way.
I messed around with app_config but that can make a mess of things. The only thing I have found is to use a bunch of other projects that have core totals and use them to occupy your system.

Jim_1348 would know more about this than me.
ID: 103553
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1481
Credit: 14,600,258
RAC: 15,249
Message 103554 - Posted: 27 Nov 2021, 0:28:59 UTC - in response to Message 103552.  

I'll have to read the homepage sometime and see what they say. I have so many things going on I don't pay attention to the homepage.
No news for over 12 months.
Grant
Darwin NT
ID: 103554
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103555 - Posted: 27 Nov 2021, 2:13:00 UTC - in response to Message 103553.  

Simple version is this: you can't. You can try messing with resource share, but from what I have read, that's a long term thing and may or may not affect your CPU count. Other than this there is no way.
I messed around with app_config but that can make a mess of things. The only thing I have found is to use a bunch of other projects that have core totals and use them to occupy your system.

Jim_1348 would know more about this than me.

Well there isn't really a good way at the moment. Normally you would use an app_config.xml file with a "project_max_concurrent" tag, but that produces excessive downloads due to a BOINC bug. It should be fixed eventually.
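
For reference, here is a minimal sketch of what such a file could contain, generated with Python's standard library. The "project_max_concurrent" tag is the one named above; the helper name build_app_config and the limit of 4 are illustrative assumptions only, and the caveat about the BOINC download bug still applies.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Return the text of an app_config.xml that caps a project's running tasks.

    build_app_config is a hypothetical helper name used only for this sketch.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    # project_max_concurrent limits how many tasks of the project run at once.
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: allow at most 4 tasks from the project at the same time.
    print(build_app_config(4))
```

The resulting text would be saved as app_config.xml in that project's folder inside the BOINC data directory, and the client then needs to re-read its config files (or be restarted) before the limit takes effect.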

You can set the resource share of each project to get the number of work units you want. But that will take a few days to stabilize, and with the different memory requirements of the regular Rosettas and the pythons, it is something of a hit-or-miss affair. You will probably have to straighten out the mess often.
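
As a rough illustration of how resource share works: it is a relative weight rather than a percentage, so a project's long-term fraction is its share divided by the sum of all shares. The project names and share values below are made up for the example, and BOINC only drifts toward these fractions over days; it does not enforce them task by task.

```python
from typing import Dict


def share_fractions(shares: Dict[str, float]) -> Dict[str, float]:
    """Convert BOINC-style resource shares into long-term fractions of computing time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: share / total for name, share in shares.items()}


if __name__ == "__main__":
    # Hypothetical setup: Rosetta at half the weight of two other projects.
    example = {"Rosetta@home": 50, "SiDock@home": 100, "Einstein@home": 100}
    for name, fraction in share_fractions(example).items():
        print(f"{name}: {fraction:.0%}")
```

On a 10-core machine that 20% works out to roughly 2 cores on average, but the memory requirements of the pythons can still override it at any given moment.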

I just devote an entire machine to Rosetta. The pythons will then limit themselves when they reach the maximum memory limit, though that usually is less than the full number of cores.
Or you can set the "...use at most XX% of the processors" in BOINC manager to limit the number of cores.

I like the idea of Michael Goetz, to use separate BOINC instances. You can use one for each project. That is fairly foolproof.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103516#103516

However, I have set up multiple BOINC instances before, and am familiar with it. If you have not done so, it is a bit of a hassle the first time, but easy enough afterward.
https://www.overclock.net/threads/guide-setting-up-multiple-boinc-instances.1628924/

Or just do the regular Rosettas when they are available and save yourself some hassle, as Falconet suggests.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103541#103541

Good luck.
ID: 103555
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103557 - Posted: 27 Nov 2021, 8:43:59 UTC - in response to Message 103554.  

I'll have to read the homepage sometime and see what they say. I have so many things going on I don't pay attention to the homepage.
No news for over 12 months.


Not surprised
ID: 103557
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103562 - Posted: 27 Nov 2021, 13:02:50 UTC - in response to Message 103555.  

Beginning to think I might just go back to 4.2 hit and miss if Python does not settle down.
I put resource share to 50% last night, but I am still getting a lot of Python.
It's leaving my system with unused cores, I think due to memory.
I have 24 gigs, and last night my calculation came to 22 gigs just for BOINC, with the remaining amount covering FAH and system usage. Currently, with 3 Pythons, it's 22.88 gigs. 3 SiDock, Einstein and Prime make it 23.34 gigs. That leaves 657 MB, and BOINC can't run anything on that. So I end up with 7 cores not doing anything.
Not what I expected.
But this calculation conflicts with HWINFO, which says only 10.6 is being used and there is 13.8 free.
Windows Task manager says only 54% maximum is being used.
So why the conflicting information between BOINC and Windows?
BOINC memory is set for 100% and Processors is 99%
ID: 103562
Profile robertmiles

Joined: 16 Jun 08
Posts: 1224
Credit: 13,845,730
RAC: 1,859
Message 103564 - Posted: 27 Nov 2021, 14:14:15 UTC - in response to Message 103562.  
Last modified: 27 Nov 2021, 14:17:36 UTC

[snip]

But this calculation conflicts with HWINFO which says only 10.6 is being used and there is 13.8 free.
Windows Task manager says only 54% maximum is being used.
So why the conflicting information between BOINC and Windows?
BOINC memory is set for 100% and Processors is 99%

I suspect that one is including memory reserved but not actually used, and one is not.

Also, BOINC has a setting for how much of the computer's memory it can use.

The rest of the memory is then left for things like the operating system (usually either Windows or Linux).
ID: 103564
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103569 - Posted: 27 Nov 2021, 17:16:13 UTC - in response to Message 103564.  

[snip]

But this calculation conflicts with HWINFO which says only 10.6 is being used and there is 13.8 free.
Windows Task manager says only 54% maximum is being used.
So why the conflicting information between BOINC and Windows?
BOINC memory is set for 100% and Processors is 99%

I suspect that one is including memory reserved but not actually used, and one is not.

Also, BOINC has a setting for how much of the computer's memory it can use.

The rest of the memory is then left for things like the operating system (usually either Windows or Linux).



Ok..so if I am giving it 100% memory and it is using it all, but not using it efficiently to run other tasks to keep the other cores busy, then what DO I set it at?? Because this isn't what I want it to do. So either resource share has to go down some more for RAH to eliminate it using 2-3 cores or change the memory settings?
ID: 103569
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,017,068
RAC: 357
Message 103572 - Posted: 27 Nov 2021, 18:11:05 UTC - in response to Message 103569.  
Last modified: 27 Nov 2021, 18:13:40 UTC

The problem is that BOINC is reserving all that RAM because the Pythons say they need all that RAM. By reserve, BOINC is simply taking into account that the Pythons say they need X RAM to run and therefore will only start other tasks with the remainder of non-reserved RAM.
The actual amount of RAM used is far lower than the one that is reserved.
RAM in reserve does not equal RAM in use which is why you see BOINC saying it doesn't have enough memory to run other tasks while Windows Task Manager says you're only using 54% of available RAM.


On my Ryzen 1400 with 16 GB of RAM, I can run 2 Pythons plus 6 MCM tasks. If I tried running Einstein@home CPU tasks, I probably couldn't run 6 because BOINC is told it needs to reserve a lot of RAM for the Pythons.

16 GB of RAM means I can run 2 Pythons with BOINC reserving 7,629 MB of RAM (from the log on my laptop, which can't run these tasks) for each Python. That means I have 16384 MB - 15258 MB (reserved for the 2 Pythons) = 1126 MB of RAM available for the 6 remaining threads, barely over 1 GB. If an Einstein@home CPU app says it needs 350 MB of RAM to run, BOINC will only run 3 of those Einstein tasks while the other 3 threads remain unused, because BOINC can't find enough RAM to reserve for each of those remaining threads. With the 2 Pythons plus the 3 Einstein tasks, BOINC would only find a measly 76 MB of RAM, which is not enough for what a single Einstein@home task asks for, but possibly enough for some other task of some other project.
While BOINC can't find more than 76 MB of RAM, it doesn't mean that the system only has 76 MB of available RAM. It could have 10 GB available for all I know.
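
The arithmetic above can be sketched in a few lines of Python. This is only a simplified model of the reservation logic as described in this post, not BOINC's actual scheduler; it assumes BOINC is allowed to use 100% of memory, and the function name tasks_that_fit is made up for the example.

```python
def tasks_that_fit(total_mb, reserved_mb, per_task_mb, free_threads):
    """Return (tasks started, MB left unreserved) under a simple reservation model.

    A task is only started if its full declared RAM requirement can still be
    reserved, regardless of how much RAM the running tasks really use.
    """
    remaining = total_mb - reserved_mb
    count = min(free_threads, remaining // per_task_mb)
    leftover = remaining - count * per_task_mb
    return count, leftover


if __name__ == "__main__":
    total_mb = 16 * 1024          # 16 GB machine = 16384 MB
    python_reserve_mb = 7629      # per Python task, taken from the log mentioned above
    reserved = 2 * python_reserve_mb  # 15258 MB for 2 Pythons

    started, leftover = tasks_that_fit(total_mb, reserved, per_task_mb=350, free_threads=6)
    print(started, leftover)  # 3 Einstein tasks start, 76 MB stays unreserved
```

The leftover figure is memory BOINC has not reserved, not memory the system has free, which is why Task Manager can show plenty of RAM available at the same time.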

If it is causing too much trouble on your computer, I think you should set Rosetta to receive no new work and see if they change the amount of RAM required, which is something Admin said he would ask about. Or simply skip the Pythons and run the 4.20's whenever they are available.
ID: 103572
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 103573 - Posted: 27 Nov 2021, 18:57:44 UTC - in response to Message 103572.  

The problem is that BOINC is reserving all that RAM because the Pythons say they need all that RAM. By reserve, BOINC is simply taking into account that the Pythons say they need X RAM to run and therefore will only start other tasks with the remainder of non-reserved RAM.
The actual amount of RAM used is far lower than the one that is reserved.
RAM in reserve does not equal RAM in use which is why you see BOINC saying it doesn't have enough memory to run other tasks while Windows Task Manager says you're only using 54% of available RAM.


On my Ryzen 1400 with 16 GB of RAM, I can run 2 Pythons plus 6 MCM tasks. If I tried running Einstein@home CPU tasks, I probably couldn't run 6 because BOINC is told it needs to reserve a lot of RAM for the Pythons.

16 GB of RAM means I can run 2 Pythons with BOINC reserving 7,629 MB of RAM (from the log on my laptop, which can't run these tasks) for each Python. That means I have 16384 MB - 15258 MB (reserved for the 2 Pythons) = 1126 MB of RAM available for the 6 remaining threads, barely over 1 GB. If an Einstein@home CPU app says it needs 350 MB of RAM to run, BOINC will only run 3 of those Einstein tasks while the other 3 threads remain unused, because BOINC can't find enough RAM to reserve for each of those remaining threads. With the 2 Pythons plus the 3 Einstein tasks, BOINC would only find a measly 76 MB of RAM, which is not enough for what a single Einstein@home task asks for, but possibly enough for some other task of some other project.
While BOINC can't find more than 76 MB of RAM, it doesn't mean that the system only has 76 MB of available RAM. It could have 10 GB available for all I know.

If it is causing too much trouble on your computer, I think you should set Rosetta to receive no new work and see if they change the amount of RAM required, which is something Admin said he would ask about. Or simply skip the Pythons and run the 4.20's whenever they are available.



Ahh! very good explanation. Yes they should lower the RAM, if it is not going to be used, then why grab it?
Well then I am going to abandon Python for now and watch the threads or you could send me a message when you see something about lowering RAM requirements. It's killing my other projects.
ID: 103573
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 103574 - Posted: 28 Nov 2021, 1:13:28 UTC
Last modified: 28 Nov 2021, 2:11:35 UTC

Front page - Total queued jobs: 0
Server status - Tasks ready to send: 27 .. Tasks in progress: 85407
Is this the big cleanout of the RPP memory monsters?
Looks like it will be a quiet weekend for Rosetta crunching.
Let's hope the VM comes back with a much reduced memory footprint.
Till then I will pop over to Cosmo and give it a quick scrub with my VB machine
{which has turned into a crash test dummy of errors and aborts of late}
ID: 103574
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 103576 - Posted: 28 Nov 2021, 15:40:58 UTC - in response to Message 103229.  
Last modified: 28 Nov 2021, 15:44:10 UTC

I wish they'd use KVM/QEMU instead of Virtualbox for Linux. It's the much more efficient method of virtualization on Linux that doesn't require installing external DKMS modules since it's supported directly by the Linux kernel. That said, I don't see why we're even using virtualization when a sandboxed namespace does the job just as well. Anyway, call me when there's interest in seeking open source contributors to transition from Python to Rust.


I've been wondering if some of this could use things like Docker. That would remove the need for virtualization; after all, Python runs natively on Linux. Besides Docker, there are things like LXC (https://linuxcontainers.org/). But I'd guess setup is an issue.
I'd also guess it isn't as 'cross platform' as VirtualBox.
ID: 103576
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 103577 - Posted: 28 Nov 2021, 15:55:14 UTC
Last modified: 28 Nov 2021, 16:08:45 UTC

No more work? Really?
As of 28 Nov 2021, 12:00:15 UTC [ Scheduler running ]
Total queued jobs: 0
In progress: 66,178
Successes last 24h: 48,304
Users (last day): 1,376,949 (+11)
Hosts (last day): 4,479,172 (+48)
Credits last 24h: 10,490,148
Total credits: 140,755,449,638
TeraFLOPS estimate: 104.901
ID: 103577
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,017,068
RAC: 357
Message 103578 - Posted: 28 Nov 2021, 16:10:58 UTC - in response to Message 103577.  

I haven't seen COVID-19 work at Rosetta@home since last year (July or August?), except for the odd Robetta work unit. Most Robetta work nowadays goes to RoseTTAFold, so we don't crunch that.

I'm sure there will be more work soon, COVID or not.
ID: 103578
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,017,068
RAC: 357
Message 103579 - Posted: 28 Nov 2021, 16:12:46 UTC - in response to Message 103573.  

Ahh! very good explanation. Yes they should lower the RAM, if it is not going to be used, then why grab it?
Well then I am going to abandon Python for now and watch the threads or you could send me a message when you see something about lowering RAM requirements. It's killing my other projects.



I think if you subscribe to a thread you automatically receive an email notification.
ID: 103579



©2024 University of Washington
https://www.bakerlab.org