Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Bryn Mawr
Joined: 26 Dec 18
Posts: 325
Credit: 9,229,618
RAC: 427
Message 103824 - Posted: 14 Dec 2021, 15:28:02 UTC

Run time 1 days 2 hours 45 min 39 sec
CPU time 25 min 55 sec

Sitting there doing very little

MJH333
Joined: 29 Jan 21
Posts: 12
Credit: 2,413,136
RAC: 959
Message 103825 - Posted: 14 Dec 2021, 16:44:37 UTC - in response to Message 103823.  

I wonder if anyone could help me with something that has been puzzling me.

I have two 4C/4T Intel laptops running RAH. When the 4.20 tasks ran out recently I tried running Pythons on them on Linux without much success.

I then adopted the suggestion from Jim1348 of switching to Windows and trying VirtualBox 5.2.44, which worked (thank you!).

The only problem then was that I could only run 2 tasks at a time instead of 4. So I bought some more memory for one of the laptops, doubling it from 8GB to 16GB. But that laptop would still only run 3 tasks at a time.

Does anyone have any idea why that would be? I had assumed that, if I can run 2 tasks on 8GB, I would certainly be able to run 4 tasks on 16GB. But that assumption appears to have been wrong.

For the moment, I've switched back to 4.20 only, in order to use all the cores. But if anyone has any ideas as to why I can't run 4 Pythons at a time, or otherwise as to how to troubleshoot this issue, I would be very grateful.

I'm happy to buy more memory for the other laptop as well, but it seems a bit of a waste of money just to run 3 Pythons instead of 2.

Cheers,
Mark

Kompakki
Joined: 14 Jul 14
Posts: 3
Credit: 14,907,010
RAC: 2,059
Message 103826 - Posted: 14 Dec 2021, 17:45:02 UTC - in response to Message 103825.  

I can't say what the reason is, but on my machines one Python (VirtualBox) work unit has usually taken 7.5 GB of RAM.

Greg_BE
Joined: 30 May 06
Posts: 5573
Credit: 5,565,689
RAC: 916
Message 103827 - Posted: 14 Dec 2021, 22:57:23 UTC
Last modified: 14 Dec 2021, 22:59:41 UTC

Grab eFMer's BoincTasks program so you can monitor the CPU usage and memory size of what BOINC runs on your system. I've had a few tasks that ran 12 hours, and when I looked at the CPU figure in BoincTasks I saw it was only 0.08 of a CPU and there was no progress in the percent done.

I share my system with other projects, but with my "massive" memory of 24 GB, I was only running 3 Pythons and 2 GPU tasks plus FAH. The old Pythons were huge memory hogs. I don't recall exactly, I think 7 gigs a task (ah, Kompakki has the total: 7.5 GB). [This would limit you to 2 tasks.]
Now with the newer tasks it is only 2861 MB per task.
But I am still only running 3 tasks.

But at 7 gigs per task (old Pythons) your 16 gigs doesn't allow for any more than 2.
When I had 3 old Pythons, that was 21 gigs of memory gone.

Wait and see if you get the new stuff from Python.
Cages is small, only 686 MB.
The aaa* stuff is 2861 MB per task.
That is what I am running right now.
With those kinds of numbers your machines should pick up 4 tasks.
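
The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate, not how BOINC itself decides what to run: the function name and the `reserved_mb` parameter are made up for illustration, 1 GB is treated as 1024 MB, and the per-task sizes are simply the figures quoted in this thread.

```python
def max_concurrent_tasks(total_ram_mb, per_task_mb, reserved_mb=0):
    """Whole number of tasks whose memory fits in RAM after setting aside reserved_mb.

    Hypothetical helper: a plain floor division, ignoring anything the
    BOINC client or VirtualBox may do on top of this.
    """
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_task_mb <= 0:
        return 0
    return int(usable_mb // per_task_mb)


# Per-task sizes quoted in this thread, in MB.
TASK_SIZES_MB = {"old python": 7500, "aaa*": 2861, "cages": 686}

if __name__ == "__main__":
    for ram_gb in (16, 24):
        for name, size_mb in TASK_SIZES_MB.items():
            count = max_concurrent_tasks(ram_gb * 1024, size_mb)
            print(f"{ram_gb} GB RAM, {name} ({size_mb} MB): up to {count} tasks")
```

Passing a non-zero `reserved_mb` (for the operating system and other projects) lowers the counts, which is probably closer to what a real machine sees.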

Grant Morphett
Joined: 12 May 07
Posts: 4
Credit: 11,826,166
RAC: 2,230
Message 103828 - Posted: 15 Dec 2021, 1:01:26 UTC

I've had to stop getting new tasks for Rosetta. I've been processing with the project since 2007.

I have a 5950X (32 CPU threads) with 64 GB of RAM and a 3060 Ti, all running Xubuntu 21.10. My system software and BOINC are fully up to date.

The issue is the same as others are encountering with the vbox64 tasks. They seem to take forever and make almost no progress. The Elapsed counter is ticking over at 1 second per second, but the Remaining (estimated) value shows 3 seconds and isn't moving.

So let's assume the estimated time remaining is busted - fine. When I look at my processor usage, even though the Rosetta tasks are taking a "slot" they aren't actually using the CPU. If I set 100% CPU utilisation I expect to see 32 processes running. But if I have 4 Rosetta tasks then I only see 28 processes running (from other BOINC projects) and the 4 Rosetta tasks aren't using the CPU. When I check the Rosetta tasks with ps I can see they aren't using any CPU.

I have 2 remaining Rosetta tasks and when I click properties they show as below:

Application rosetta python projects 1.03 (vbox64)
Name boinc_cages_IL_2728657_17535
State Running
Received Tue 07 Dec 2021 21:24:22
Report deadline Fri 10 Dec 2021 21:24:21
Estimated computation size 80,000 GFLOPs
CPU time 00:27:29
CPU time since checkpoint 00:00:00
Elapsed time 3d 18:47:54
Estimated time remaining 00:00:03
Fraction done 99.999%
Virtual memory size 1.53 GB
Working set size 7.45 GB
Directory slots/8
Process ID 4062
Progress rate 1.080% per hour
Executable vboxwrapper_26198_x86_64-pc-linux-gnu



Application rosetta python projects 1.03 (vbox64)
Name aagb-mPPR-ACBC-LARE-B3PHE_pp_2_2649998_1
State Running
Received Wed 08 Dec 2021 11:21:11
Report deadline Sat 11 Dec 2021 11:21:10
Estimated computation size 80,000 GFLOPs
CPU time 00:27:10
CPU time since checkpoint 00:00:01
Elapsed time 3d 18:10:43
Estimated time remaining 00:00:04
Fraction done 99.999%
Virtual memory size 1.53 GB
Working set size 2.79 GB
Directory slots/3
Process ID 5141
Progress rate 1.080% per hour
Executable vboxwrapper_26198_x86_64-pc-linux-gnu
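
One way to put a number on "not using the CPU" is to compare the CPU time with the Elapsed time from the properties above. The following is a minimal sketch, assuming the durations are formatted as `HH:MM:SS` or `Nd HH:MM:SS` as shown; the function names, the 5% threshold and the one-hour minimum are arbitrary choices for illustration, not anything BOINC defines.

```python
import re


def duration_to_seconds(text):
    """Convert 'HH:MM:SS' or 'Nd HH:MM:SS' into a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)d\s+)?(\d+):(\d{2}):(\d{2})", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_time, elapsed):
    """Fraction of the elapsed time that the task actually spent on the CPU."""
    elapsed_s = duration_to_seconds(elapsed)
    if elapsed_s == 0:
        return 0.0
    return duration_to_seconds(cpu_time) / elapsed_s


def looks_stalled(cpu_time, elapsed, threshold=0.05, min_elapsed_s=3600):
    """Hypothetical rule of thumb: old enough to judge, yet almost no CPU used."""
    if duration_to_seconds(elapsed) < min_elapsed_s:
        return False
    return cpu_efficiency(cpu_time, elapsed) < threshold


if __name__ == "__main__":
    # The two tasks listed above.
    for cpu, elapsed in (("00:27:29", "3d 18:47:54"), ("00:27:10", "3d 18:10:43")):
        print(f"{cpu} CPU over {elapsed}: {cpu_efficiency(cpu, elapsed):.2%}, "
              f"stalled={looks_stalled(cpu, elapsed)}")
```

Both tasks come out at well under 1% CPU efficiency, which fits the description of a task holding a slot while doing almost nothing.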


If anyone has resolved this problem or can explain what is going wrong, I'd love to hear it.

Thanks, Grant.

.clair.
Joined: 2 Jan 07
Posts: 186
Credit: 23,196,106
RAC: 1,163
Message 103829 - Posted: 15 Dec 2021, 1:14:27 UTC

With only 27 min of CPU time and over 3 days of elapsed time, my way to fix that kind of task is the `abort` button.
Those are just time-wasters.

Jonathan
Joined: 4 Oct 17
Posts: 41
Credit: 1,333,730
RAC: 0
Message 103830 - Posted: 15 Dec 2021, 1:16:51 UTC - in response to Message 103828.  

Those tasks have hung or the processing inside the VM crashed. Just abort the work unit(s).
If you don't want to run the VM tasks here, and just legacy tasks, go to your individual computer details and look at the bottom where it says "VirtualBox VM jobs". Just turn off the option. The button is only visible if you have VirtualBox installed and the project was allowing you those tasks.

Grant Morphett
Joined: 12 May 07
Posts: 4
Credit: 11,826,166
RAC: 2,230
Message 103831 - Posted: 15 Dec 2021, 2:26:26 UTC - in response to Message 103829.  

Agreed, but I don't want to have to "monitor" BOINC tasks and manage them. I just let it run.

Thanks.

Grant Morphett
Joined: 12 May 07
Posts: 4
Credit: 11,826,166
RAC: 2,230
Message 103832 - Posted: 15 Dec 2021, 2:27:32 UTC - in response to Message 103830.  

Those tasks have hung or the processing inside the VM crashed. Just abort the work unit(s).
If you don't want to run the VM tasks here, and just legacy tasks, go to your individual computer details and look at the bottom where it says "VirtualBox VM jobs". Just turn off the option. The button is only visible if you have VirtualBox installed and the project was allowing you those tasks.


Great - I have made that setting. We will see what happens. I really appreciate your advice.

Thanks.

Greg_BE
Joined: 30 May 06
Posts: 5573
Credit: 5,565,689
RAC: 916
Message 103833 - Posted: 15 Dec 2021, 7:55:23 UTC - in response to Message 103832.  

Those tasks have hung or the processing inside the VM crashed. Just abort the work unit(s).
If you don't want to run the VM tasks here, and just legacy tasks, go to your individual computer details and look at the bottom where it says "VirtualBox VM jobs". Just turn off the option. The button is only visible if you have VirtualBox installed and the project was allowing you those tasks.


Great - I have made that setting. We will see what happens. I really appreciate your advice.

Thanks.



Grant - you need to look at the memory size. Old Pythons use 7.5 gigs per process.
New tasks are under 3 gigs (2861 MB at most).
At 7 gigs a task, that will chew up your 64 really fast.
If a task does not finish within 12 hours of uninterrupted run time, then yes, something went wrong in the process and you will have to abort.

Which VBox are you using? Is that v5.4?
Python does not like the newest VBox.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,236,277
RAC: 1,160
Message 103837 - Posted: 15 Dec 2021, 21:45:56 UTC
Last modified: 15 Dec 2021, 21:47:20 UTC

I'm using VirtualBox 6.0, under Windows 10. The Python tasks run properly, with one oddity. They will only start if there is at least 7.45 GB of available memory, but if they reserve less, only the reserved amount counts against what is available for starting yet another Python task.
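
The oddity described above can be modelled with a toy simulation. This is a sketch of the behaviour as observed in this post, not BOINC's actual scheduling code: the function name, the `max_slots` cap and the example free-memory figures are assumptions, 7.45 GB is the start threshold reported here, and 2.79 GB is the working set size of one of the tasks listed earlier in this thread.

```python
def simulate_starts(free_gb, start_threshold_gb=7.45, reserved_per_task_gb=2.79, max_slots=4):
    """Count how many tasks start under the observed rule.

    A task starts only while at least start_threshold_gb is free, but once
    started it takes only reserved_per_task_gb away from the free pool.
    """
    started = 0
    while started < max_slots and free_gb >= start_threshold_gb:
        free_gb -= reserved_per_task_gb
        started += 1
    return started


if __name__ == "__main__":
    for free in (7.0, 13.0, 14.0, 30.0):
        print(f"{free:.1f} GB free at the start: {simulate_starts(free)} task(s) started")
```

Under these assumptions, the number of tasks that start depends on how much memory is free when each one launches rather than on total RAM divided by task size, which might be one reason a machine starts fewer tasks than simple division suggests.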

Greg_BE
Joined: 30 May 06
Posts: 5573
Credit: 5,565,689
RAC: 916
Message 103838 - Posted: 15 Dec 2021, 23:08:11 UTC - in response to Message 103837.  

I'm using VirtualBox 6.0, under Windows 10. The Python tasks run properly, with one oddity. They will only start if there is at least 7.45 GB of available memory, but if they reserve less, only the reserved amount counts against what is available for starting yet another Python task.



Weird, because when I started with Python I was getting 3 at a time running (VBox 5.4), and at 7.38 GB or something like that each (7.5 rounded) they were using up most of my memory (24 GB); the rest went to the system and to running a few of the other CPU projects.

But as you see, the multi-letter ones are the monsters. So just figure on 7.4 to 7.5 GB per task. And depending on your memory load from other stuff, you might only be able to run 1.
Cages work is only 686 MB. So maybe when those come your way you will see an increase in the number of cores used.


I don't have any really good answers.
About the only thing I can think of is there might be something in the scheduler that gives out only a max of x number of units no matter the memory size. Weird...but an idea.

Grant(SSSF) or Jim1348 might be able to help you better.

Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,107
Message 103855 - Posted: 17 Dec 2021, 20:30:35 UTC
Last modified: 17 Dec 2021, 20:35:32 UTC

I also see that the latest pythons are now down to a reasonable size (686 MB).
There are still a few of the 2861 MB ones around, but I don't see anything larger.


I'm using VirtualBox 6.0, under Windows 10. The Python tasks run properly, with one oddity. They will only start if there is at least 7.45 GB of available memory, but if they reserve less, only the reserved amount counts against what is available for starting yet another Python task.
That should change with the smaller ones, but all my machines have more memory, so I can't check it.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,236,277
RAC: 1,160
Message 103856 - Posted: 17 Dec 2021, 22:42:08 UTC - in response to Message 103855.  
Last modified: 17 Dec 2021, 22:44:47 UTC

I also see that the latest pythons are now down to a reasonable size (686 MB).
There are still a few of the 2861 MB ones around, but I don't see anything larger.


I'm using VirtualBox 6.0, under Windows 10. The Python tasks run properly, with one oddity. They will only start if there is at least 7.45 GB of available memory, but if they reserve less, only the reserved amount counts against what is available for starting yet another Python task.
That should change with the smaller ones, but all my machines have more memory, so I can't check it.

You can still check it, but expect more Python tasks to be already running when it decides whether to start another one. I should have written "free memory" instead of "available memory". I observed it on a computer with 32 GB of main memory.

Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,107
Message 103857 - Posted: 17 Dec 2021, 22:58:36 UTC - in response to Message 103856.  

You can still check it, but expect more Python tasks to be already running when it decides whether to start another one. I should have written "free memory" instead of "available memory". I observed it on a computer with 32 GB of main memory.

That is quite possible, but I am running all my pythons on Ubuntu now. It seems that "free" and "available" memory have different meanings on Linux than Windows anyway. Linux puts a lot of stuff in cache and buffers; I add to that with a large write-cache. I expect most of it can be reclaimed for running apps as necessary, in addition to the "free" memory.

Have you checked the memory settings in BOINC manager?

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,236,277
RAC: 1,160
Message 103858 - Posted: 17 Dec 2021, 23:03:26 UTC - in response to Message 103857.  

You can still check it, but expect more Python tasks to be already running when it decides whether to start another one. I should have written "free memory" instead of "available memory". I observed it on a computer with 32 GB of main memory.

That is quite possible, but I am running all my pythons on Ubuntu now. It seems that "free" and "available" memory have different meanings on Linux than Windows anyway. Linux puts a lot of stuff in cache and buffers; I add to that with a large write-cache. I expect most of it can be reclaimed for running apps as necessary, in addition to the "free" memory.

Have you checked the memory settings in BOINC manager?

I've checked them many times, usually during the changes I made to allow more Python tasks to run at the same time.

Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,107
Message 103859 - Posted: 17 Dec 2021, 23:46:02 UTC - in response to Message 103858.  

I've checked them many times, usually during the changes I made to allow more Python tasks to run at the same time.

I figured you had. It is such a strange problem we can only hope that it will go away.
But if not, I would detach from Rosetta and try again. There may be something corrupted.

Jean-David Beyer
Joined: 2 Nov 05
Posts: 77
Credit: 4,124,489
RAC: 4,237
Message 103861 - Posted: 18 Dec 2021, 17:20:58 UTC - in response to Message 103857.  

That is quite possible, but I am running all my pythons on Ubuntu now. It seems that "free" and "available" memory have different meanings on Linux than Windows anyway. Linux puts a lot of stuff in cache and buffers; I add to that with a large write-cache. I expect most of it can be reclaimed for running apps as necessary, in addition to the "free" memory.


I am running Red Hat Enterprise Linux release 8.5 (Ootpa).
If I run the free program, it gives the following. I am running very few BOINC tasks at the moment.
$ free
              total        used        free      shared  buff/cache   available
Mem:       65435808     6337708     2390400      142808    56707700    58224084
Swap:      16375804      932608    15443196


So I have about 64 Gigabytes of RAM (total): 6.3 GBytes are used, about 2.4 GBytes are free, 142 MBytes are shared (on Linux this is mostly tmpfs and shared memory rather than shared libraries), and 56 GBytes are output buffers and input cache. About 58 GBytes are available, which is roughly the free memory plus the part of buff/cache that the kernel estimates it can reclaim. The input cache can be claimed immediately, but the output buffers must be written out before they can be used. Anyhow, free and available memory are considerable at the moment. When running 8 BOINC tasks, these figures will change somewhat.

I have about 16 GBytes of disk allocated for paging (swap), of which almost 1 GByte is used.
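
The relationship between these columns can be checked with a short Python sketch. It parses the `free` output pasted above (values in KiB, the default unit) and is only an illustration: the `parse_free` helper is hypothetical and assumes the column layout shown here, which can differ between versions of `free`.

```python
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:       65435808     6337708     2390400      142808    56707700    58224084
Swap:      16375804      932608    15443196
"""


def parse_free(text):
    """Turn the output of `free` into {'Mem': {...}, 'Swap': {...}} with integer KiB values."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip() stops at the shorter sequence, so the Swap row only gets total/used/free.
        rows[label.rstrip(":")] = dict(zip(headers, (int(v) for v in values)))
    return rows


if __name__ == "__main__":
    mem = parse_free(FREE_OUTPUT)["Mem"]
    kib_per_gib = 1024 ** 2
    print(f"used + free + buff/cache = {mem['used'] + mem['free'] + mem['buff/cache']} KiB")
    print(f"total                    = {mem['total']} KiB")
    print(f"free + buff/cache        = {(mem['free'] + mem['buff/cache']) / kib_per_gib:.1f} GiB")
    print(f"available                = {mem['available'] / kib_per_gib:.1f} GiB")
```

In this sample, used, free and buff/cache add up exactly to the total, while "available" is a little smaller than free plus buff/cache, since not all of the cache can be given back to applications.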

Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,107
Message 103862 - Posted: 18 Dec 2021, 19:24:16 UTC - in response to Message 103861.  

I have a Ryzen 3900X running all python boinc_cages on 23 cores at the moment under Ubuntu 20.04.3
It has 96 GB of memory.

$ free
              total        used        free      shared  buff/cache   available
Mem:       98808092    35710244     3313196       18312    59784652    62096680
Swap:       1952764        4096     1948668

About the only thing I learn from it is that I have much more memory than I need now.
But it allows me to have a 20 GB write cache, with a 12-hour latency. That should do it.

Grant Morphett
Joined: 12 May 07
Posts: 4
Credit: 11,826,166
RAC: 2,230
Message 103870 - Posted: 20 Dec 2021, 22:09:10 UTC - in response to Message 103833.  
Last modified: 20 Dec 2021, 22:13:10 UTC

I'm using VBox v6.1. My job requires a lot of virtualisation so I'm always running the latest stable VirtualBox.

Interesting that Python doesn't like it. I'm running Python 3.9.7, which I am guessing isn't "old Python".

Any legacy Rosetta is running fine after I made the changes suggested above, so I'm happy for the moment. Again, BOINC for me is set and forget - the only thing I adjust is the %CPU it uses on my desktop whilst I'm working. Overnight it gets 100%; for email, doco, etc. it gets 80%; if I'm coding it gets none.

Thanks, Grant.

©2022 University of Washington
https://www.bakerlab.org