Rosetta leaking memory (or even threads)?

Message boards : Number crunching : Rosetta leaking memory (or even threads)?

Anders Sjöqvist

Joined: 23 Feb 09
Posts: 8
Credit: 541,727
RAC: 0
Message 73222 - Posted: 5 Jun 2012, 8:44:33 UTC

I have a FreeBSD machine that performs occasional cron jobs, but they are fairly cheap. I also have some things like nginx and Squid running, but they are just rejecting connections. There's also no X Windows on the machine to eat any CPU cycles. Basically, Rosetta has access to the full capacity of the computer.

The computer is not a very powerful one, with two cores and hyper-threading, 2 GB of RAM and 5 GB of swap, but I think it should be powerful enough to run Rosetta. However, apart from the four minirosetta threads running at 100%, there are 13 idle threads, the majority of which are nano-sleeping. One thread also seems to be starving (waiting on a futex).

Last night, Rosetta consumed all of the RAM and all of the swap space, causing my logs to fill up with error messages.

Here's a list of all of the processes running on the computer, except for a few hidden system processes:

last pid: 20518;  load averages:  4.61,  4.53,  4.44                up 62+22:22:33  09:49:29
45 processes:  5 running, 40 sleeping
CPU:  0.3% user, 96.5% nice,  2.2% system,  1.1% interrupt,  0.0% idle
Mem: 1268M Active, 242M Inact, 379M Wired, 54M Cache, 213M Buf, 25M Free
Swap: 5003M Total, 2429M Used, 2574M Free, 48% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
20365 boinc       1 155  i31   819M   735M CPU2    2  63:41 100.00% minirosetta_3.31_i6
20366 boinc       1 155  i31   819M   735M nanslp  2   0:02  0.00% minirosetta_3.31_i6
20367 boinc       1 155  i31   819M   735M nanslp  0   0:00  0.00% minirosetta_3.31_i6
19920 boinc       1 155  i31   791M   168M CPU1    1 240:32 100.00% minirosetta_3.31_i6
19921 boinc       1 155  i31   791M   168M nanslp  0   0:07  0.00% minirosetta_3.31_i6
19922 boinc       1 155  i31   791M   168M nanslp  1   0:00  0.00% minirosetta_3.31_i6
15990 boinc       1 155  i31   697M   220K nanslp  3   1:04  0.00% minirosetta_3.30_i6
35537 boinc       1 155  i31   513M  1620K nanslp  2   0:05  0.00% minirosetta_3.31_i6
28665 boinc       1 155  i31   508M   208K nanslp  1   0:52  0.00% minirosetta_3.30_i6
90066 boinc       1 155  i31   504M  1396K nanslp  1   0:34  0.00% minirosetta_3.31_i6
90065 boinc       1 155  i31   504M  1396K futex   1   0:04  0.00% minirosetta_3.31_i6
20245 boinc       1 155  i31   484M   242M RUN     0 103:07 100.00% minirosetta_3.31_i6
20246 boinc       1 155  i31   484M   242M nanslp  0   0:04  0.00% minirosetta_3.31_i6
20247 boinc       1 155  i31   484M   242M nanslp  2   0:00  0.00% minirosetta_3.31_i6
20427 boinc       1 155  i31   313M   230M CPU3    3  38:33 100.00% minirosetta_3.31_i6
20428 boinc       1 155  i31   313M   230M nanslp  0   0:01  0.00% minirosetta_3.31_i6
20429 boinc       1 155  i31   313M   230M nanslp  1   0:00  0.00% minirosetta_3.31_i6
77868 anders      2  20    0 85360K  9228K kqread  2 179:34  0.00% rtorrent
20465 anders      1  20    0 51532K  4348K select  1   0:00  0.00% sshd
20463 root        1  21    0 51532K  4316K sbwait  3   0:00  0.00% sshd
 1477 root        1  20    0 46872K   504K select  1   0:00  0.00% sshd
 1453 boinc       1 155  i31 38576K  2864K select  3 295:09  0.00% boinc_client
39018 www         1  20    0 36608K     0K kqread  3   0:01  0.00% <nginx>
39017 root        1  52    0 36608K     0K pause   2   0:00  0.00% <nginx>
77866 anders      1  20    0 23100K   312K select  2   1:34  0.00% screen
 1389 root        1  20    0 22368K   796K select  3   8:55  0.00% ntpd
 1490 smmsp       1  20    0 20420K   656K pause   0   0:04  0.00% sendmail
 1484 root        1  20    0 20420K   648K select  0   2:49  0.00% sendmail
20466 anders      1  20    0 17624K  2532K wait    0   0:00  0.00% bash
20471 anders      1  20    0 16740K  2084K CPU0    3   0:02  0.00% top
 1496 root        1  20    0 14296K   352K nanslp  2   0:40  0.00% cron
 1434 root        1  21    0 12356K   208K bpf     2  53.4H  0.00% knockd
 1215 root        1  20    0 12220K   520K select  1   0:24  0.00% syslogd
 1396 root        1  20    0 12220K   168K select  1  20:32  0.00% powerd
 1573 root        1  52    0 12220K    88K ttyin   3   0:00  0.00% getty
 1566 root        1  52    0 12220K    88K ttyin   1   0:00  0.00% getty
 1568 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1569 root        1  52    0 12220K    88K ttyin   0   0:00  0.00% getty
 1567 root        1  52    0 12220K    88K ttyin   1   0:00  0.00% getty
 1570 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1571 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1572 root        1  52    0 12220K    88K ttyin   3   0:00  0.00% getty
 1040 root        1  20    0 10372K   600K select  2   0:00  0.00% devd
 1039 _dhcp       1  20    0 10092K   612K select  0   0:06  0.00% dhclient
 1001 root        1  34    0 10092K   476K select  0   0:05  0.00% dhclient
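
(A minimal Python sketch for tallying the STATE column of a listing like the one above. The helper name tally_states and the five-row sample are illustrative assumptions; the rows themselves are copied from the listing, and the column positions assume this exact top layout.)

```python
from collections import Counter

# A few rows copied from the listing above; paste the whole listing to tally it all.
SAMPLE = """\
20365 boinc       1 155  i31   819M   735M CPU2    2  63:41 100.00% minirosetta_3.31_i6
20366 boinc       1 155  i31   819M   735M nanslp  2   0:02  0.00% minirosetta_3.31_i6
90065 boinc       1 155  i31   504M  1396K futex   1   0:04  0.00% minirosetta_3.31_i6
20245 boinc       1 155  i31   484M   242M RUN     0 103:07 100.00% minirosetta_3.31_i6
 1453 boinc       1 155  i31 38576K  2864K select  3 295:09  0.00% boinc_client
"""


def tally_states(top_rows, command_prefix="minirosetta"):
    """Count rows per STATE for commands starting with command_prefix."""
    counts = Counter()
    for line in top_rows.splitlines():
        fields = line.split()
        # Expected columns: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        if len(fields) < 12 or not fields[11].startswith(command_prefix):
            continue
        state = fields[7]
        # Rows on a CPU (CPU0, CPU1, ...) or runnable (RUN) are folded into "running".
        if state.startswith("CPU") or state == "RUN":
            state = "running"
        counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, count in sorted(tally_states(SAMPLE).items()):
        print(state, count)
```

Run over the full listing instead of the sample, the same tally should give 4 running, 12 nanslp and 1 futex rows, which matches the 4 busy and 13 idle minirosetta rows described earlier.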


Sometimes, things like this happen (look at the next listing). Note the memory consumption for the first process, and that there's only 8 MB of swap space available. (I guess this is what happened last night.)

last pid: 20559;  load averages:  2.98,  3.08,  3.56                up 62+22:37:40  10:04:36
43 processes:  4 running, 39 sleeping
CPU:  0.4% user, 73.0% nice,  3.1% system,  1.0% interrupt, 22.5% idle
Mem: 1403M Active, 128M Inact, 378M Wired, 54M Cache, 213M Buf, 3688K Free
Swap: 5003M Total, 4995M Used, 8116K Free, 99% Inuse, 2312K In, 308K Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
20365 boinc       1 155  i31  3174M  1309M swread  0  69:30  0.29% minirosetta_3.31_i6
19920 boinc       1 155  i31   798M   136M CPU2    2 253:45 99.76% minirosetta_3.31_i6
19921 boinc       1 155  i31   798M   136M nanslp  1   0:07  0.00% minirosetta_3.31_i6
19922 boinc       1 155  i31   798M   136M nanslp  1   0:00  0.00% minirosetta_3.31_i6
---------------------- cut ----------------------


Can anyone help me? Is Rosetta leaking memory, or might it even be leaking processes? Why should I have 17 Rosetta processes running, each locking up several hundred megabytes of memory? Is that normal? Shouldn't the preferences prevent BOINC from using up that much memory? Is my computer simply not powerful enough? Should I remove Rosetta to ensure stability?

The system has only been running for 62 days.

Thanks!
ID: 73222 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,786,082
RAC: 1,109
Message 73223 - Posted: 5 Jun 2012, 11:30:53 UTC - in response to Message 73222.  

I think the problem is your memory settings. I think you are starting a new process every time Boinc uses up all the available free memory, as new tasks use less memory. The key is to go into your account and change some lines in Computing Preferences. Specifically, these are my settings:
Use at most 75% of page file (swap space)
Use at most 85% of memory when computer is in use (enforced by version 5.8+)
Use at most 90% of memory when computer is not in use (enforced by version 5.8+)

If you look at yours, they are most likely nearer to 50%, and with only 2 GB of RAM and Rosie's larger units, you will easily exceed that.
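
To put rough numbers on those percentages, here is a small Python sketch. It assumes the 2 GB of RAM and 5003 MB of swap reported in the first post; the 50% figure is only the guess made above, and budget_mb is a hypothetical helper, not anything BOINC provides.

```python
# Toy arithmetic: turn percentage limits into megabytes.
# RAM and swap sizes are taken from the first post in this thread.
RAM_MB = 2048
SWAP_MB = 5003


def budget_mb(total_mb, percent):
    """Return the share of total_mb that a percentage limit allows."""
    return total_mb * percent / 100.0


if __name__ == "__main__":
    # 50% is a guess at the poster's setting, not a confirmed value.
    for label, pct in [("guessed setting", 50), ("in use", 85), ("not in use", 90)]:
        print(f"{label:>15}: {budget_mb(RAM_MB, pct):.0f} MB of RAM")
    print(f"{'swap at 75%':>15}: {budget_mb(SWAP_MB, 75):.0f} MB of swap")
```

At a 50% limit, the four tasks would have about 1024 MB of RAM between them, which the larger tasks in the listing (several hundred MB resident each) could plausibly exceed.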
ID: 73223 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73233 - Posted: 5 Jun 2012, 22:44:54 UTC

Right, BOINC is trying to run within the memory constraints currently configured (perhaps the default settings). It is normal for R@h tasks to take a lot of memory to run. BOINC apparently starts a task, and as the task runs for a while it accumulates more and more information to keep track of, so its memory footprint grows. It eventually crosses the threshold BOINC wants to enforce, so BOINC suspends that task (keeping it in memory, which apparently is your configured preference and is also the setting I'd recommend) and begins another task. The new task at first runs in less memory, so it gets to run for a while, gradually grows its memory footprint, and the process repeats itself.

There are many ways to address the situation; it just depends upon your goals:
Configure BOINC to run on fewer than all of your CPUs.
Allow BOINC to use more memory.
Add memory to the machine.
Attach to an additional project that uses less memory, so that some of the active tasks will run in a smaller footprint and BOINC will intermix them with the tasks that require a larger memory footprint.
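
The suspend-and-start cycle described above can be sketched as a toy model in Python. This is not BOINC's scheduler: it assumes every task starts at the same size and grows at the same rate, that only running tasks count against the limit, and that suspended tasks stay in memory. All names and numbers are made up for illustration.

```python
def simulate(limit_mb, start_mb, growth_mb, slots, steps):
    """Toy model of the cycle: running tasks grow each step; when their total
    exceeds limit_mb, the largest one is suspended (still held in memory)
    and a fresh, small task takes its slot."""
    if slots * start_mb > limit_mb:
        raise ValueError("even freshly started tasks exceed the limit")
    running = [start_mb] * slots
    suspended = []
    for _ in range(steps):
        running = [mb + growth_mb for mb in running]
        while sum(running) > limit_mb:
            biggest = max(running)
            running.remove(biggest)      # stop running the largest task ...
            suspended.append(biggest)    # ... but keep its memory allocated
            running.append(start_mb)     # and start a new, small task
    return running, suspended


if __name__ == "__main__":
    running, suspended = simulate(limit_mb=1024, start_mb=100,
                                  growth_mb=50, slots=4, steps=20)
    print("running tasks (MB):", running)
    print("suspended tasks:", len(suspended), "holding", sum(suspended), "MB")
```

In this made-up run the running tasks always fit under the limit, while the suspended ones keep piling up and together hold more than the limit itself. That is one way a listing can end up with far more task processes than cores, with the excess spilling into swap.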
Rosetta Moderator: Mod.Sense
ID: 73233 · Rating: 0
Anders Sjöqvist

Joined: 23 Feb 09
Posts: 8
Credit: 541,727
RAC: 0
Message 73240 - Posted: 6 Jun 2012, 17:49:17 UTC - in response to Message 73233.  

Thanks for the replies, both of you! However, I don't really get why allowing less memory would make BOINC consume more. Is the limit per process, rather than for BOINC as a total?

I found a certain workunit that had previously caused an "Out Of Memory" exception on a 16GB Windows machine. This seems to have been downloaded just before I first noticed the problems.

You offered some good advice on what I should do in the future. I don't think I'll add more RAM just to run BOINC. Rather, I bought a cheap computer with a low energy footprint specifically to do simple things, and I just wanted it to help humanity when it wasn't doing anything else. Limiting the number of cores or finding a second project might be good options, though, and I'll look into that soon.

For now, I first disallowed new downloads, and when the remaining work was done I shut everything off. Interestingly, different processes required different amounts of effort to shut down. A few of them, among them a couple of processes that were launched in early May, responded to none of HUP, TERM, INT or QUIT, and I had to resort to KILL. Still, I decided that my system had become so unstable from running out of memory several times (for example, the port knocking daemon didn't work anymore) that I'd better reboot. Strangely enough, the boinc_enable="YES" that I believe has always worked before didn't work anymore, and I had to rewrite it to boinc_client_enable="YES". Weird...
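
The escalation described above (try the polite signals first, fall back to KILL) can be sketched in Python. This is a hedged illustration for POSIX systems only; stop_gracefully and the two-second grace period are assumptions of the sketch, not something BOINC or FreeBSD provides.

```python
import signal
import subprocess
import sys

POLITE_SIGNALS = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT, signal.SIGQUIT)


def stop_gracefully(proc, signals=POLITE_SIGNALS, grace_seconds=2.0):
    """Send each signal in turn and wait up to grace_seconds for the process
    to exit. If none of them works, send SIGKILL. Returns the signal that was
    sent last."""
    for sig in signals:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace_seconds)
            return sig
        except subprocess.TimeoutExpired:
            continue  # still alive, try the next signal
    proc.kill()  # SIGKILL cannot be caught or ignored
    proc.wait()
    return signal.SIGKILL


if __name__ == "__main__":
    # Demo on a throwaway child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    used = stop_gracefully(child)
    print("stopped after sending", signal.Signals(used).name)
```

A process that ignores or blocks the polite signals, like the stuck ones mentioned above, would make the function fall through to SIGKILL.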

It's just too bad that I had a "% of time BOINC client is running" at 99.9997%, which is now at 99.774%. :(
ID: 73240 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73243 - Posted: 6 Jun 2012, 21:30:58 UTC
Last modified: 6 Jun 2012, 21:32:18 UTC

Actually Mikey was speculating that your memory setting was near 50% and suggesting you do something more like his, which is 85%. And he said "new tasks use less memory"... he probably should have added "...at first". As I said, it is normal that it grows as progress into a given model continues. ...however, yes, it sounds like you got a task that used an extraordinary amount of memory.
Rosetta Moderator: Mod.Sense
ID: 73243 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,786,082
RAC: 1,109
Message 73249 - Posted: 7 Jun 2012, 12:18:22 UTC - in response to Message 73243.  

ModSense is correct. What I MEANT to say, and SHOULD have said, is that new tasks start out using just a little bit of memory, then use more and more memory as they progress. At some point along the way they exceed the amount your settings allow, so Boinc stops them and starts another one; after all, the new one does not take as much memory! NO, the units are NOT designed to tell Boinc how big they can get, so Boinc sees the small amount in the beginning and assumes it will be consistent throughout, which of course it isn't!
ID: 73249 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org