Posts by Anders Sjöqvist

1) Message boards : Number crunching : Rosetta leaking memory (or even threads)? (Message 73240)
Posted 6 Jun 2012 by Anders Sjöqvist
Post:
Thanks for the replies, both of you! However, I don't really get why allowing less memory would make BOINC consume more. Is the limit per process, rather than for BOINC as a total?

I found a certain workunit that had previously caused an "Out Of Memory" exception on a 16GB Windows machine. This seems to have been downloaded just before I first noticed the problems.

You offered some good advice on what I should do in the future. I don't think I'll add more RAM just to run BOINC. Rather, I bought a cheap computer with low energy footprint specifically to do simple things, and I just wanted it to help humanity when it wasn't doing anything else. Limiting the number of cores or finding a second project might be good options, though, and I'll soon look into that.

For now, I first disallowed new downloads, and when it was done I shut everything off. Interestingly, different processes required different amounts of effort to shut down. A few of them, including a couple of processes that were launched in early May, responded to none of HUP, TERM, INT or QUIT, and I had to resort to KILL. Still, I decided that my system had become so unstable from running out of memory several times (for example, the port knocking daemon didn't work anymore) that I'd better reboot. Strangely enough, the boinc_enable="YES" setting that I believe had always worked before no longer worked, and I had to change it to boinc_client_enable="YES". Weird...
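
The escalation described above (polite signals first, KILL only as a last resort) can be sketched in Python. This is only an illustration, assuming a POSIX system; the function names, the signal order and the grace period are hypothetical choices and not something BOINC provides.

```python
import os
import signal
import time

# Order taken from the post: try the catchable signals first, KILL last.
ESCALATION = [signal.SIGHUP, signal.SIGTERM, signal.SIGINT,
              signal.SIGQUIT, signal.SIGKILL]


def is_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_process(pid, grace_seconds=5.0, send=os.kill, alive=is_alive,
                 sleep=time.sleep):
    """Send increasingly forceful signals until the process is gone.

    Returns the last signal that was sent, or None if the process was
    already gone. The send/alive/sleep parameters exist only so the
    logic can be exercised without a real process.
    """
    last = None
    for sig in ESCALATION:
        if not alive(pid):
            break
        try:
            send(pid, sig)
        except ProcessLookupError:
            break  # the process exited between the check and the signal
        last = sig
        waited = 0.0
        while alive(pid) and waited < grace_seconds:
            sleep(0.1)
            waited += 0.1
    return last
```

Calling stop_process(pid) on a stuck task would return signal.SIGKILL if nothing gentler worked, which matches what happened here. Note that a child that has exited but has not yet been reaped by its parent still counts as alive for this check.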

It's just too bad that I had a "% of time BOINC client is running" at 99.9997%, which is now at 99.774%. :(
2) Message boards : Number crunching : Rosetta leaking memory (or even threads)? (Message 73222)
Posted 5 Jun 2012 by Anders Sjöqvist
Post:
I have a FreeBSD machine that performs occasional cron jobs, but they are fairly cheap. I also have some things like nginx and Squid running, but they are just rejecting connections. There's also no X Windows on the machine to eat any CPU cycles. Basically, Rosetta has access to the full capacity of the computer.

The computer is not a very powerful one, with two cores and hyper-threading, 2 GB of RAM and 5 GB of swap, but I think it should be powerful enough to run Rosetta. However, apart from the four minirosetta threads running at 100%, there are 13 idle threads, the majority of which are nano-sleeping. One thread also seems to be starving (waiting on futex).

Last night, Rosetta consumed all of the RAM and all of the swap space, causing my logs to fill up with error messages.

Here's a list of all of the processes running on the computer, except for a few hidden system processes:

last pid: 20518;  load averages:  4.61,  4.53,  4.44                up 62+22:22:33  09:49:29
45 processes:  5 running, 40 sleeping
CPU:  0.3% user, 96.5% nice,  2.2% system,  1.1% interrupt,  0.0% idle
Mem: 1268M Active, 242M Inact, 379M Wired, 54M Cache, 213M Buf, 25M Free
Swap: 5003M Total, 2429M Used, 2574M Free, 48% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
20365 boinc       1 155  i31   819M   735M CPU2    2  63:41 100.00% minirosetta_3.31_i6
20366 boinc       1 155  i31   819M   735M nanslp  2   0:02  0.00% minirosetta_3.31_i6
20367 boinc       1 155  i31   819M   735M nanslp  0   0:00  0.00% minirosetta_3.31_i6
19920 boinc       1 155  i31   791M   168M CPU1    1 240:32 100.00% minirosetta_3.31_i6
19921 boinc       1 155  i31   791M   168M nanslp  0   0:07  0.00% minirosetta_3.31_i6
19922 boinc       1 155  i31   791M   168M nanslp  1   0:00  0.00% minirosetta_3.31_i6
15990 boinc       1 155  i31   697M   220K nanslp  3   1:04  0.00% minirosetta_3.30_i6
35537 boinc       1 155  i31   513M  1620K nanslp  2   0:05  0.00% minirosetta_3.31_i6
28665 boinc       1 155  i31   508M   208K nanslp  1   0:52  0.00% minirosetta_3.30_i6
90066 boinc       1 155  i31   504M  1396K nanslp  1   0:34  0.00% minirosetta_3.31_i6
90065 boinc       1 155  i31   504M  1396K futex   1   0:04  0.00% minirosetta_3.31_i6
20245 boinc       1 155  i31   484M   242M RUN     0 103:07 100.00% minirosetta_3.31_i6
20246 boinc       1 155  i31   484M   242M nanslp  0   0:04  0.00% minirosetta_3.31_i6
20247 boinc       1 155  i31   484M   242M nanslp  2   0:00  0.00% minirosetta_3.31_i6
20427 boinc       1 155  i31   313M   230M CPU3    3  38:33 100.00% minirosetta_3.31_i6
20428 boinc       1 155  i31   313M   230M nanslp  0   0:01  0.00% minirosetta_3.31_i6
20429 boinc       1 155  i31   313M   230M nanslp  1   0:00  0.00% minirosetta_3.31_i6
77868 anders      2  20    0 85360K  9228K kqread  2 179:34  0.00% rtorrent
20465 anders      1  20    0 51532K  4348K select  1   0:00  0.00% sshd
20463 root        1  21    0 51532K  4316K sbwait  3   0:00  0.00% sshd
 1477 root        1  20    0 46872K   504K select  1   0:00  0.00% sshd
 1453 boinc       1 155  i31 38576K  2864K select  3 295:09  0.00% boinc_client
39018 www         1  20    0 36608K     0K kqread  3   0:01  0.00% <nginx>
39017 root        1  52    0 36608K     0K pause   2   0:00  0.00% <nginx>
77866 anders      1  20    0 23100K   312K select  2   1:34  0.00% screen
 1389 root        1  20    0 22368K   796K select  3   8:55  0.00% ntpd
 1490 smmsp       1  20    0 20420K   656K pause   0   0:04  0.00% sendmail
 1484 root        1  20    0 20420K   648K select  0   2:49  0.00% sendmail
20466 anders      1  20    0 17624K  2532K wait    0   0:00  0.00% bash
20471 anders      1  20    0 16740K  2084K CPU0    3   0:02  0.00% top
 1496 root        1  20    0 14296K   352K nanslp  2   0:40  0.00% cron
 1434 root        1  21    0 12356K   208K bpf     2  53.4H  0.00% knockd
 1215 root        1  20    0 12220K   520K select  1   0:24  0.00% syslogd
 1396 root        1  20    0 12220K   168K select  1  20:32  0.00% powerd
 1573 root        1  52    0 12220K    88K ttyin   3   0:00  0.00% getty
 1566 root        1  52    0 12220K    88K ttyin   1   0:00  0.00% getty
 1568 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1569 root        1  52    0 12220K    88K ttyin   0   0:00  0.00% getty
 1567 root        1  52    0 12220K    88K ttyin   1   0:00  0.00% getty
 1570 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1571 root        1  52    0 12220K    88K ttyin   2   0:00  0.00% getty
 1572 root        1  52    0 12220K    88K ttyin   3   0:00  0.00% getty
 1040 root        1  20    0 10372K   600K select  2   0:00  0.00% devd
 1039 _dhcp       1  20    0 10092K   612K select  0   0:06  0.00% dhclient
 1001 root        1  34    0 10092K   476K select  0   0:05  0.00% dhclient
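
A listing like the one above can be summarised with a few lines of Python. This is a rough sketch: it assumes the 12-column layout shown here, and it assumes that rows with identical SIZE and RES values are threads sharing one address space, which the listing suggests but does not prove. The function names and the 50% "busy" threshold are hypothetical.

```python
from collections import defaultdict


def to_megabytes(field):
    """Convert a top size field such as '819M' or '220K' to megabytes."""
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(field[:-1]) * factors[field[-1]]


def summarize(top_rows, command_prefix="minirosetta", busy_threshold=50.0):
    """Count matching rows, busy rows and distinct (SIZE, RES) groups."""
    groups = defaultdict(list)
    busy = 0
    for line in top_rows:
        cols = line.split()
        # Columns: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        if len(cols) != 12 or not cols[11].startswith(command_prefix):
            continue
        size, res = cols[5], cols[6]
        groups[(size, res)].append(cols[0])
        if float(cols[10].rstrip("%")) >= busy_threshold:
            busy += 1
    return {
        "rows": sum(len(pids) for pids in groups.values()),
        "busy": busy,
        "groups": len(groups),
        # Count RES once per group, on the assumption that it is shared.
        "resident_mb": sum(to_megabytes(res) for _, res in groups),
    }


# Six rows copied from the listing above.
sample = """\
20365 boinc       1 155  i31   819M   735M CPU2    2  63:41 100.00% minirosetta_3.31_i6
20366 boinc       1 155  i31   819M   735M nanslp  2   0:02  0.00% minirosetta_3.31_i6
20367 boinc       1 155  i31   819M   735M nanslp  0   0:00  0.00% minirosetta_3.31_i6
19920 boinc       1 155  i31   791M   168M CPU1    1 240:32 100.00% minirosetta_3.31_i6
19921 boinc       1 155  i31   791M   168M nanslp  0   0:07  0.00% minirosetta_3.31_i6
19922 boinc       1 155  i31   791M   168M nanslp  1   0:00  0.00% minirosetta_3.31_i6
""".splitlines()

summary = summarize(sample)
print(summary)
```

If the shared-memory assumption holds, summing RES over every row would overstate the real memory use, which is why the sketch counts each (SIZE, RES) pair only once.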


Sometimes, things like this happen (look at the next listing). Note the memory consumption for the first process, and that there's only 8 MB of swap space available. (I guess this is what happened last night.)

last pid: 20559;  load averages:  2.98,  3.08,  3.56                up 62+22:37:40  10:04:36
43 processes:  4 running, 39 sleeping
CPU:  0.4% user, 73.0% nice,  3.1% system,  1.0% interrupt, 22.5% idle
Mem: 1403M Active, 128M Inact, 378M Wired, 54M Cache, 213M Buf, 3688K Free
Swap: 5003M Total, 4995M Used, 8116K Free, 99% Inuse, 2312K In, 308K Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
20365 boinc       1 155  i31  3174M  1309M swread  0  69:30  0.29% minirosetta_3.31_i6
19920 boinc       1 155  i31   798M   136M CPU2    2 253:45 99.76% minirosetta_3.31_i6
19921 boinc       1 155  i31   798M   136M nanslp  1   0:07  0.00% minirosetta_3.31_i6
19922 boinc       1 155  i31   798M   136M nanslp  1   0:00  0.00% minirosetta_3.31_i6
---------------------- cut ----------------------


Can anyone help me? Is Rosetta leaking memory, or might it even be leaking processes? Why should I have 17 Rosetta processes running, each locking up several hundred megabytes of memory? Is that normal? Shouldn't the preferences prevent BOINC from using up that much memory? Is my computer simply not powerful enough? Should I remove Rosetta to ensure stability?

The system has only been running for 62 days.

Thanks!
3) Message boards : Number crunching : Weird benchmarking results, or am I doing something wrong? (Message 71906)
Posted 28 Dec 2011 by Anders Sjöqvist
Post:
It is winter up north... you could save money by using the heater less?


Heh, maybe. But firstly, yesterday (Tuesday) we had a maximum of 11 °C (52 °F), which I don't consider too cold. Secondly, I'm going away tomorrow morning and will be away for a couple of weeks. I don't need additional heating when I'm away. Thirdly, heating is included in my rent, but not electricity. :)

I wonder how much more a PC uses when it steps up to a higher frequency.


Some guy measured and wrote about it in the FreeBSD forum. I got the impression that I should optimally run the CPU at about half of the maximum speed.
4) Message boards : Number crunching : Weird benchmarking results, or am I doing something wrong? (Message 71905)
Posted 28 Dec 2011 by Anders Sjöqvist
Post:
You can get a fairly good idea if you copy a page of valid results and paste them into your favorite spreadsheet program, and then for each record divide the granted credit by the cpu time (to give the credit per cpu-second), then multiply by 3600 x 24 to give RAC per core per day, then take the average of those values and multiply by the number of threads of Rosetta running...


Thanks a lot! This calculation gives me a RAC approximation of 294 for the old Athlon with one core, and a total of 452 for the two cores with hyper-threading on my Atom. So the new one is a bit faster, but not that much.
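
The spreadsheet recipe quoted above can also be written as a short Python sketch. The function name and the sample records are hypothetical and are not real results from either machine; only the arithmetic (credit per CPU-second, times 3600 x 24, averaged, times the number of threads) comes from the quoted advice.

```python
from statistics import mean

SECONDS_PER_DAY = 3600 * 24


def estimate_rac(results, threads):
    """Approximate RAC from (granted_credit, cpu_seconds) pairs.

    Each record is converted to credit per core per day, the values are
    averaged, and the average is multiplied by the number of Rosetta
    threads running at the same time.
    """
    per_core_per_day = [
        credit / cpu_seconds * SECONDS_PER_DAY
        for credit, cpu_seconds in results
        if cpu_seconds > 0  # skip records without any CPU time
    ]
    return mean(per_core_per_day) * threads


# Hypothetical records: (granted credit, CPU time in seconds).
sample = [(20.0, 10000.0), (30.0, 12000.0)]
print(round(estimate_rac(sample, threads=4), 1))
```

Because every record is weighted equally, a single task with very low granted credit pulls the average down noticeably, so it helps to use a full page of valid results as suggested.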

Still, I can't really figure the credits. For task 472923334, I claimed 47 credits but was granted only 2.4, although the program ran for over 25000 seconds. There was apparently a SIGSEGV. Does that have anything to do with it? Is there something wrong with my computer?

Another example is work unit 429608498, which was processed by two computers. I claimed 42 credits but was granted only 11.7, while the other computer was granted 43. How come?

This is what makes me nervous. I'm afraid that something's wrong with my setup.
5) Message boards : Number crunching : Weird benchmarking results, or am I doing something wrong? (Message 71883)
Posted 25 Dec 2011 by Anders Sjöqvist
Post:
Thanks for your answers, guys!

My understanding is that in Linux the different levels of 'nice' are the background tasks whereas 'user' is equivalent to normal priority in Windows. Basically, the higher the nice value, the lower the priority so BOINC tasks have a high nice value (although BOINC itself doesn't). I'm not sure where i31 comes from as nice runs from -20 to 19... is that data from your BSD machine?


I was also confused by it, but yes, the data was from the BSD machine.
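
The relationship described in the quote (a higher nice value means a lower scheduling priority) can be observed from Python on a Unix-like system. This is a minimal sketch, assuming a POSIX platform where os.nice is available; the helper names and the increment of 5 are arbitrary. It does not explain the i31 notation in the listing.

```python
import os
import subprocess
import sys


def current_nice():
    """Return this process's nice value without changing it."""
    # os.nice(0) adds an increment of zero and returns the resulting niceness.
    return os.nice(0)


def child_nice(increment):
    """Start a child interpreter that raises its own niceness and reports it."""
    code = f"import os; print(os.nice({increment}))"
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    print("parent nice:", current_nice())
    print("child nice :", child_nice(5))
```

An unprivileged process can raise its nice value but not lower it again, which is why the increment is applied inside a throwaway child process.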

[EDIT] Just noticed I mis-read - it's two cores with hyperthreading (so 4 threads) rather than 1 core, 2 threads. My guess is your two computers would get a similar RAC on Rosetta but would be interested to see...


I don't know how to make the RAC comparable. The FreeBSD machine (with the Atom processor) right now has a RAC that's about four times as high, but then it's been running for about two days more.

The Athlon computer is one of my laptops, so I don't expect to run BOINC on it that much. I just wanted to compare the results, as I happened to read somewhere that Rosetta isn't supposed to work on FreeBSD. However, I just had to install the port and attach to the project, and that was it. My concern was thus that the program would be running in some emulation mode wasting 90% of the resources, or that there'd be some computation error.

I bought the new computer so that it could be a cheap and environmentally friendly server, running 24/7. I was considering running Rosetta only if it could run at whatever clock speed the CPU is currently using, rather than triggering an increase in frequency, but I don't know how to set that up.
6) Questions and Answers : Unix/Linux : Work can't be restarted after reboot on Ubuntu 11.10 (Message 71864)
Posted 23 Dec 2011 by Anders Sjöqvist
Post:
This sorted itself out. Here's what the installation instructions said:

After the installation is finished the daemon is configured to start up automatically every time the computer is turned on. You can temporarily disable or re-enable this by modifying a setting in the file /etc/default/boinc-client:
# Set this to 1 to enable and to 0 to disable the init script.
ENABLED="1"

I didn't want the program to start automatically, but I still wanted to have the option of starting it whenever I wanted to. It turned out that changing that flag to 0 made it impossible to start it at all, not just "when the computer is turned on", as stated in the instructions. After I changed it to 1, the program started without any fuss.
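
For illustration, here is a small Python sketch that reads the ENABLED setting out of text in the format quoted above. The function name is hypothetical, and treating a missing setting as disabled is an assumption of this sketch; the real init script may behave differently.

```python
def init_script_enabled(defaults_text):
    """Return True if the last ENABLED= assignment in the text is "1"."""
    enabled = False  # assumption: no assignment means disabled
    for raw_line in defaults_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if separator and key.strip() == "ENABLED":
            # Drop surrounding quotes before comparing.
            enabled = value.strip().strip("\"'") == "1"
    return enabled


# The two lines quoted from the installation instructions.
sample = '''# Set this to 1 to enable and to 0 to disable the init script.
ENABLED="1"
'''
print(init_script_enabled(sample))
```

As the post notes, setting the flag to 0 blocked manual starts through the init script as well, not only the automatic start at boot.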
7) Message boards : Number crunching : Weird benchmarking results, or am I doing something wrong? (Message 71859)
Posted 22 Dec 2011 by Anders Sjöqvist
Post:
I recently started running Rosetta@home on two computers (one of them stopped working, but that's beside the point). The older one has an AMD Athlon QI-46 processor running at 2.1GHz with one core. The benchmarking says "Measured floating point speed: 1922.96 million ops/sec" and "Measured integer speed: 8611.64 million ops/sec". The new one has an Intel Atom CPU D510 running at 1.66GHz. Sure, it's slower, but with two cores and hyper-threading I'd expect it to be somewhat quicker than the other. However, the benchmarking says "Measured floating point speed: 921.31 million ops/sec" and "Measured integer speed: 2260.48 million ops/sec". The older one is twice as fast at floating point calculations and almost four times as fast with integers. How can the new one be so slow?
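
The ratios mentioned above can be checked with a few lines of Python. The figures are the ones reported by the two BOINC benchmarks in this post; the variable and function names are only illustrative.

```python
# Benchmark figures from the post, in million ops/sec.
athlon = {"floating point": 1922.96, "integer": 8611.64}
atom = {"floating point": 921.31, "integer": 2260.48}


def speed_ratios(faster, slower):
    """Return how many times larger each figure is on the first machine."""
    return {kind: faster[kind] / slower[kind] for kind in faster}


ratios = speed_ratios(athlon, atom)
for kind, ratio in ratios.items():
    print(f"{kind}: the Athlon figure is {ratio:.2f} times the Atom figure")
```

This gives ratios of about 2.1 for floating point and 3.8 for integers, in line with "twice as fast" and "almost four times as fast". The comparison says nothing about how the figures scale across cores or hyper-threads.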

I don't know if it matters, but BOINC reports that it is getting a lot of resources. It says "% of time BOINC client is running: 100 %" and "While BOINC running, % of time work is allowed: 99.9964 %". Here's what top says:
last pid:  3360;  load averages:  5.08,  4.97,  4.90              up 0+13:13:32  13:21:50
36 processes:  5 running, 31 sleeping
CPU:  0.0% user, 99.1% nice,  0.7% system,  0.2% interrupt,  0.0% idle
Mem: 1384M Active, 122M Inact, 357M Wired, 55M Cache, 213M Buf, 51M Free
Swap: 5003M Total, 10M Used, 4993M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 3080 boinc       1 155  i31   473M   381M CPU2    2 137:54 100.00% minirosetta_3.19_i6
 3104 boinc       1 155  i31   483M   391M RUN     0 127:55 100.00% minirosetta_3.19_i6
 3139 boinc       1 155  i31   467M   375M CPU3    3 106:34 100.00% minirosetta_3.19_i6
 3101 boinc       1 155  i31   465M   373M CPU1    1 128:34 99.56% minirosetta_3.19_i6
 3360 anders      1  20    0 16740K  2088K CPU0    3   0:00  0.10% top
 1417 boinc       1 155  i31 34460K  4504K select  2   2:46  0.00% boinc_client
 1374 root        1  20    0 12220K   768K select  2   0:13  0.00% powerd
 1367 root        1  20    0 22368K  1580K select  1   0:05  0.00% ntpd
 3081 boinc       1 155  i31   473M   381M nanslp  2   0:03  0.00% minirosetta_3.19_i6
 3102 boinc       1 155  i31   465M   373M nanslp  3   0:03  0.00% minirosetta_3.19_i6
 3105 boinc       1 155  i31   483M   391M nanslp  0   0:03  0.00% minirosetta_3.19_i6
 3140 boinc       1 155  i31   467M   375M nanslp  0   0:03  0.00% minirosetta_3.19_i6
 1451 root        1  20    0 20420K  1532K select  0   0:02  0.00% sendmail
 1461 root        1  20    0 14296K  1020K nanslp  2   0:00  0.00% cron
 1193 root        1  20    0 12220K   952K select  2   0:00  0.00% syslogd

I don't know what it means that "nice" uses up all the resources instead of "user" (I've googled it but didn't find an answer). Does it matter? Is there anything I can do to speed it up, or is there something that I've misunderstood?

Thanks!
8) Questions and Answers : Unix/Linux : Work can't be restarted after reboot on Ubuntu 11.10 (Message 71857)
Posted 22 Dec 2011 by Anders Sjöqvist
Post:
I recently installed BOINC using packages on my new Ubuntu system. I could start it and connect to Rosetta, but before finishing the first work unit I needed to restart my computer. I shut down BOINC Manager and checked the box "Stop running science applications when exiting the Manager". After the reboot I couldn't get it started again. BOINC Manager can't connect to localhost, and I've read a lot of instructions while trying to get it working from the command line. For example, "sudo /etc/init.d/boinc-client restart" doesn't start anything (I checked with "ps -A"). Neither BOINC Manager nor "boinccmd --get_state" is able to connect. I also tried running "boinc". It starts, but claims that the computer isn't attached to any projects, and even then BOINC Manager can't connect to localhost.

How do I get it started again? I only have a couple more days before the deadline of my work units.

Thanks (and sorry for the newbie questions)!






©2024 University of Washington
https://www.bakerlab.org