1)
Message boards :
Number crunching :
Rosetta v4.08 x86_64-pc-linux-gnu or Rosetta v4.07 i686-pc-linux-gnu
(Message 91004)
Posted 7 Aug 2019 by Jean-David Beyer Post: The Rosetta developers have been repeatedly skeptical about my performance improvement estimates. That is not a surprise. Developers are sensitive about their work and frequently think they know more than they do. I had to explain to many compiler developers why their "really neat improvement" was not going to make the impact they forecast. The application developers are farther away from performance problems than the compiler developers.
When I was working on optimizers, another part of my department was working on hardware design for a new 32-bit processor. The hardware designers were even farther away from performance problems than the compiler developers. They found that a benchmark program the marketing department thought was important often multiplied by two, so they were going to design in a special floating-point multiply-by-two instruction. I pointed out that in a normal workload, multiplying floating-point numbers by two was seldom done and, furthermore, that due to the construction of the benchmark program, I could guarantee that the compiler-optimizer would never generate the floating-point multiply-by-two instruction. (The value 2 was in an external variable that the compiler-optimizer could not see.) I suggested that a much better use of the chip area would be a larger instruction cache. But they would not do that; they designed their fancy new instruction, and we never generated it.
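To make the multiply-by-two point concrete, here is a toy instruction selector. It is only a sketch: the operand classes, the function name and the mnemonics FMUL2 and FMUL are invented for illustration and do not come from the processor or compiler described in the post. It shows why a special instruction that needs a compile-time constant 2 is never chosen when the 2 lives in an external variable.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A literal whose value is visible at compile time."""
    value: float


@dataclass(frozen=True)
class Extern:
    """A variable defined in another file; its value is unknown here."""
    name: str


@dataclass(frozen=True)
class Reg:
    """A value already held in a register."""
    name: str


Operand = Union[Const, Extern, Reg]


def select_multiply(lhs: Reg, rhs: Operand) -> str:
    """Pick an instruction for lhs * rhs (hypothetical mnemonics)."""
    if isinstance(rhs, Const) and rhs.value == 2.0:
        # Only a compile-time constant 2 can match the special pattern.
        return f"FMUL2 {lhs.name}"
    if isinstance(rhs, Const):
        return f"FMUL {lhs.name}, #{rhs.value}"
    # Extern or Reg: the value is only known at run time,
    # so the selector must emit the general multiply.
    return f"FMUL {lhs.name}, {rhs.name}"


print(select_multiply(Reg("f0"), Const(2.0)))     # FMUL2 f0
print(select_multiply(Reg("f0"), Extern("two")))  # FMUL f0, two
```

The second call is the benchmark's situation: the operand happens to hold 2 at run time, but the selector cannot know that, so the special instruction goes unused.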
2)
Message boards :
Number crunching :
Rosetta v4.08 x86_64-pc-linux-gnu or Rosetta v4.07 i686-pc-linux-gnu
(Message 90969)
Posted 4 Aug 2019 by Jean-David Beyer Post: [quote]Their efforts to "hyper optimize" the binary by pulling functions "inline" is based on running 1 copy on a large, idle machine. The result is "sub optimized" results when running 2 or more WU on a machine that strain a critical resource .. like the instruction cache. I am running 36 copies on a machine and the negative impact of inlining functions is pretty obvious.[/quote]
I guess it really depends on what compiler and optimizer are used. Long ago, a friend and I worked for Bell Labs, doing a post-compiler assembler-level optimizer for their C compiler. One of the optimizations we did was to expand functions in-line, though not if they were "too big" or obviously recursive. By itself, this saved the call and return overhead, which really mattered only in short, fast functions. But sometimes it also gave the optimizer a better view of what was going on. In one benchmark, a function was called 10,000 times, but the loop was outside that function. Expanded inline, the optimizer noticed that everything inside the loop had the same value each time around, so all those instructions were moved outside the loop, greatly speeding up execution. Then the live-dead analysis eliminated the single remaining computation because the values were never used. Even the loop overhead and the function call and return overhead were removed.
As for running more than one instance of a program at the same time, actual RAM use could be reduced because only one instance of the code need be in RAM, independent of the number of processes using that code. And if the working sets were comparable, these days with large instruction caches (my 4-core Xeon processor has 10 MB SmartCache) the working sets of the instances could well be pretty much the same, so execution time for both might not degrade at all compared to running the programs sequentially.
For short programs, this might not matter, but for programs like climateprediction.net that can take weeks or months to run, this could be quite significant.
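The benchmark story in this post (inline expansion, then moving invariant code out of the loop, then live-dead elimination) can be followed stage by stage. The sketch below applies each step by hand to a made-up function; Python itself does none of this, and the helper name and constants are assumptions chosen only to mirror the 10,000-iteration loop described in the post.

```python
def scale(x, factor):
    """Short helper: the kind of function worth expanding inline."""
    return x * factor


def original(n=10_000):
    # Stage 0: the loop is outside the helper and calls it every
    # iteration; the result is never used afterwards.
    for _ in range(n):
        unused = scale(3.0, 2.0)
    return 0


def after_inlining(n=10_000):
    # Stage 1: the call is replaced by the helper's body,
    # removing the call and return overhead.
    for _ in range(n):
        unused = 3.0 * 2.0
    return 0


def after_hoisting(n=10_000):
    # Stage 2: the expression has the same value on every iteration,
    # so it is computed once, before the loop.
    unused = 3.0 * 2.0
    for _ in range(n):
        pass
    return 0


def after_dead_code_elimination(n=10_000):
    # Stage 3: 'unused' is never read and the empty loop has no effect,
    # so the computation and the loop overhead both disappear.
    return 0


stages = (original, after_inlining, after_hoisting, after_dead_code_elimination)
print(all(stage() == original() for stage in stages))  # True
```

Every stage returns the same result; only the amount of work changes. Stage 2 is not possible until stage 1 has exposed the helper's body to the loop, which is the "better view" the post refers to.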
3)
Message boards :
Number crunching :
Problems with version 5.96
(Message 53986)
Posted 25 Jun 2008 by Jean-David Beyer Post: I am close to being out of here! I started crunching Rosetta because it would run for weeks without any attention; it sure wasn't because of the very low BOINC credit. I have gotten a few "stuck jobs", if by that you mean some that get to 100% complete, time remaining: --, but keep running for quite a while. I just assumed this was similar to those that run 2x or 3x longer for the last 4% than they took for the first 96%, so I let them continue to run for a while. They ultimately finished. I have not checked whether they finished correctly or with an error.
4)
Message boards :
Number crunching :
Problems with version 5.96
(Message 53973)
Posted 24 Jun 2008 by Jean-David Beyer Post: On about half of the jobs, when I reach around 95% completed, progress simply crawls. The time to completion stops changing, but the percentage increments extremely slowly. I assume the job is progressing, but I don't know. I guess this is normal, but sometimes, like today, it bugs me. I have two hyperthreaded Xeons (32-bit) and 8 GBytes RAM running Linux kernel 2.6.18-92.1.1.el5PAE on one machine and two Pentium III processors and 512 MBytes RAM running Linux kernel 2.6.9-67.0.15.ELsmp on the other machine. In each case, Rosetta runs up to about 96% complete in a relatively short period of time, and time remaining is usually on the order of 10 minutes. Right now, it has used up about 8 hours since getting to 96% complete (and it took only about three hours to get to 96%). This is time actually consumed by the process, not wall-clock time. I just wish the time remaining would more accurately reflect the time needed to complete. Rosetta is not the worst offender in this regard. Some projects have the time remaining actually increasing as the time consumed increases.
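The numbers in this post are consistent with a simple linear extrapolation from the progress fraction. The sketch below is an assumption about how such an estimate could be formed, not a description of what the BOINC client actually computes; the function name is invented for illustration.

```python
def naive_remaining_hours(cpu_hours_so_far: float, fraction_done: float) -> float:
    """Linear extrapolation: assume the rest runs at the average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = cpu_hours_so_far / fraction_done
    return projected_total - cpu_hours_so_far


# About three CPU hours to reach 96%, as reported in the post.
estimate = naive_remaining_hours(3.0, 0.96)
print(f"{estimate * 60:.1f} minutes")  # 7.5 minutes
```

That matches the "order of 10 minutes" shown at 96%. With 8 further CPU hours already consumed, a linear estimate of 0.125 hours is off by a factor of at least 64, which is why an estimate based only on the progress fraction misleads whenever the last few percent do not run at the same rate as the rest.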
5)
Message boards :
Number crunching :
Preemption Failures on Linux
(Message 47607)
Posted 10 Oct 2007 by Jean-David Beyer Post: I, too, have problems with rosetta, and Mod.Sense suggested I post here.
First of all, I have two 3.06 GHz Hyperthreaded Xeon processors, 8 GBytes RAM, and a dedicated disk partition of 16 GBytes for BOINC stuff. I run Red Hat Enterprise Linux 5 with (at the moment) kernel 2.6.18-8.1.14.el5PAE. Swap space is set up as two partitions of 2 GBytes each. My network connection is Verizon FiOS with 20 Megabit/second download speed and 5 Megabit/second upload speed. I usually get these speeds.
As far as BOINC is concerned, I say to leave applications in memory when they are suspended, use all 4 processors, switch applications every 60 minutes, and use at most 100% of the processor time. Use at most 15.75 GBytes of disk space, leave at least .1 GByte free, and use at most 98% available disk space. Use at most 75% of swap space, 75% of memory when computer is in use and 90% of memory when computer is not in use. (Computer is turned on about 100% of the time.) For Rosetta, I say give the application 11.11% resource share.
The original problem I thought I had was that a rosetta application ran up about 4 hours of time, which is about what I expect, then indicated that there was -- left to complete the work unit, progress 100%, and so on. But it continued running a long time (about 30 hours), really running up CPU time; i.e., it did not freeze. As Mod.Sense suggested, I stopped the BOINC client by running /etc/rc.d/init.d/boinc stop. This shut down everything _except_ the rosetta applications. I nominally had one running, but pstree revealed (in part) something like this (before shutting down):
 ─su───boinc─┬─hadam3_4.07_i68─┬─hadam3_um_4.07_───{hadam3_um_4.07_}
│            │                 └─2*[{hadam3_4.07_i68}]
│            ├─2*[hadcm3trans_5.4─┬─hadcm3transum_5───{hadcm3transum_5}]
│            │                    └─2*[{hadcm3trans_5.4}]]
│            ├─malariacontrol_───{malariacontrol_}
│            ├─rosetta_beta_5.───rosetta_beta_5.───2*[rosetta_beta_5.]
│            ├─setiathome-5.27───setiathome-5.27───2*[setiathome-5.27]
│            └─wcg_faah_autodo───3*[{wcg_faah_autodo}]
(This is one that, as far as I know, is actually running correctly.) Now this time, when everything seems to be running correctly, stopping the boinc client causes all the boinc applications to stop too.
6)
Message boards :
Number crunching :
Silly Newbie Tricks - Suspending a work unit
(Message 47597)
Posted 10 Oct 2007 by Jean-David Beyer Post: Since then it has run up more than 37 hours. I propose to let it run another day or so and see what happens. Note that when I exited BOINC it did not manage to kill the rosetta processes. I seem to remember that this is always the case. Could there be a problem in either the BOINC client, or the rosetta application that makes this happen? I do not care what my preferred run time is. Would it make sense for me to increase it? |
7)
Message boards :
Number crunching :
Silly Newbie Tricks - Suspending a work unit
(Message 47582)
Posted 10 Oct 2007 by Jean-David Beyer Post: [quote]I guess I would suggest ending BOINC and restarting.[/quote] Progress report, sort of. I probably did not lose any credit, at least not yet. After the BOINC client scheduler got around to it, it resumed that 100% progress work unit and it ran quite a few hours more. Then it started another part of the same work unit (same line in boincmgr), reset the run time to 0, but still indicated 100% progress with no time remaining. Since then it has run up more than 37 hours. I propose to let it run another day or so and see what happens.
8)
Message boards :
Number crunching :
Silly Newbie Tricks - Suspending a work unit
(Message 47458)
Posted 6 Oct 2007 by Jean-David Beyer Post: [quote]I guess I would suggest ending BOINC and restarting.[/quote] I do not see why my machine would have any trouble getting memory for a BOINC application. I have 8 GBytes RAM and allow 75% of it to BOINC when the machine is busy (whatever that means) and 95% when the machine is not busy. Typically, 75% of the RAM is devoted to the input cache, although that can go down somewhat when I run a PostgreSQL database application. I tried stopping BOINC and everything stopped except for the rosetta programs, which kept running. The one with all the time on it was the parent of the other three. I killed them and restarted BOINC, and all seems to be running normally. I assume I lost 30 hours of credit for that mess.
9)
Message boards :
Number crunching :
Silly Newbie Tricks - Suspending a work unit
(Message 47444)
Posted 6 Oct 2007 by Jean-David Beyer Post: [quote] You assume correctly. Most rosetta work units seem to complete in 5 to 8 hours for me. This one announced it was 100% complete and had no time remaining at about 5 hours, but it has now run up 22 hours 17 minutes. According to "top" command, it has consumed 1338:07 (minutes:seconds) time. If I knew it was running something important, I would just let it run, but most of this time has run up after boincmgr announced the process was complete. Also I do not understand the excess rosetta processes.
  PID  PPID USER  PR NI S  VIRT  RES  SHR SWAP %MEM %CPU   TIME+ P COMMAND
 2420  4627 boinc 39 19 R 56500  45m   20 9632  0.6   74 1342:07 0 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
 4629  4627 boinc 34 19 S 35760 5900 3148  29m  0.1    0 1:07.95 0 hadcm3trans_5.41_i686-pc-linux-gnu hadcm3inct_cmus_1920_160_65869824 1085_ocean.year yafbg
 2421  2420 boinc 34 19 S 56500  45m   20 9632  0.6    0 0:00.13 2 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
 2422  2421 boinc 34 19 S 56500  45m   20 9632  0.6    0 0:00.51 1 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
 2423  2421 boinc 35 19 S 56500  45m   20 9632  0.6    0 0:00.04 2 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 1
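As a check on the figures in this post, the CPU time quoted from top can be converted to hours. The sketch assumes the minutes:seconds layout the post describes (with optional hundredths, as in the shorter entries); the function name is made up for illustration.

```python
from typing import Tuple


def top_time_to_hms(field: str) -> Tuple[int, int, float]:
    """Convert a 'minutes:seconds' CPU-time field to (hours, minutes, seconds)."""
    minutes_text, seconds_text = field.split(":")
    hours, minutes = divmod(int(minutes_text), 60)
    return hours, minutes, float(seconds_text)


print(top_time_to_hms("1338:07"))  # (22, 18, 7.0)
print(top_time_to_hms("1:07.95"))  # (0, 1, 7.95)
```

So 1338:07 is 22 hours 18 minutes of CPU time, in line with the 22 hours 17 minutes reported a moment earlier in the same post.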
10)
Message boards :
Number crunching :
Silly Newbie Tricks - Suspending a work unit
(Message 47429)
Posted 6 Oct 2007 by Jean-David Beyer Post: [quote] Is rosetta@home one of these? This morning, after about 5 hours, the boincmgr indicated that rosetta@home reached 100% complete, yet it has been running about 10 hours since then. And really running, not stalled. I am running 5.8.16 of the BOINC client and boincmgr. rosetta_5.69_i686-pc-linux-gnu is the program itself. This is a Red Hat Enterprise Linux 5 system with two 3.06 GHz hyperthreaded Xeon processors and 8 GBytes RAM.
$ ps -fu boinc
UID     PID  PPID  C STIME TTY      TIME     CMD
boinc  2420  4627 86 03:52 ?        15:04:04 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc  2421  2420  0 03:52 ?        00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc  2422  2421  0 03:52 ?        00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc  2423  2421  0 03:52 ?        00:00:00 rosetta_beta_5.80_i686-pc-linux-gnu xx mcr1 _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -abrelax -output_c
boinc  4627  4625  0 Sep29 ?        00:11:16 /home/boinc/BOINC/boinc
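The PID and PPID columns in this listing are enough to work out which processes hang off the long-running rosetta process, for example before killing leftovers by hand. A minimal sketch, with the (PID, PPID) pairs copied from the listing and a helper name invented for illustration:

```python
from collections import defaultdict

# (PID, PPID) pairs copied from the ps listing in the post.
PS_ROWS = [(2420, 4627), (2421, 2420), (2422, 2421), (2423, 2421), (4627, 4625)]


def descendants(rows, root):
    """Return every PID below 'root' in the parent/child tree, sorted."""
    children = defaultdict(list)
    for pid, ppid in rows:
        children[ppid].append(pid)

    found, stack = [], [root]
    while stack:
        # .get avoids creating empty entries for leaf processes.
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


print(descendants(PS_ROWS, 2420))  # [2421, 2422, 2423]
print(descendants(PS_ROWS, 4627))  # [2420, 2421, 2422, 2423]
```

Process 2420, the one accumulating all the CPU time, is a child of the BOINC client (4627) and the ancestor of the three idle rosetta entries.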
11)
Questions and Answers :
Unix/Linux :
Difficulties downloading new work
(Message 39439)
Posted 15 Apr 2007 by Jean-David Beyer Post: Trying to download 5.59 work units. It has been trying to download three work units, lots of files. Each file has run up over an hour of time trying to download, but has gotten no bytes on any of them. Examining system status indicates that all servers are up and work units available. Messages indicate that servers may be down, but the Rosetta site indicates they are all up. Dual Hyperthreaded Xeon (*86) system with 8 GBytes of RAM running RHEL 3 at the moment. |
12)
Questions and Answers :
Unix/Linux :
some WU's stop executing on linux
(Message 30286)
Posted 30 Oct 2006 by Jean-David Beyer Post: G'day NilsB. I also have this problem. I noticed it yesterday and it is still stuck today. Work unit 1n0u_HIGHFREQ_ABRELAX_7_1_NATIVe_ONLY_BARCODE__1312_9043_0. It has accumulated 00:58:44. The BOINC client gives it an hour of CPU from time to time and it seems to use none of it. I am running Red Hat Enterprise Linux 3 ES (up to date) on a dual 3.06 GHz Xeon hyperthreaded processor with 8 GBytes RAM, and this leaves one hyperthreaded processor idle all the time it is scheduled. Other Rosetta applications run just fine and one completed sometime yesterday. You say "the programme will eventually stop itself." How long is eventually? Because eventually I will wish to abort it.
13)
Questions and Answers :
Unix/Linux :
Work unit way too slow, I think.
(Message 13627)
Posted 13 Apr 2006 by Jean-David Beyer Post: [quote]P.S.: If the Work Unit is really Hung - 1. suspend the Work Unit, BOINC Manager -> Work (tab) -> click on the Work Unit click the Suspend button (on the left hand side) then Resume button, wait for the computer to re-start the Work Unit (it will need to finish the new Work Unit it started, if it had another available) and see if it's still stuck, give it about 20min.[/quote]
It took more than 20 minutes because the BOINC client downloaded about 7 Predictor work units and had to do them first. But after that the rosetta process got up to about 24 hours and still did not progress.
[quote]2. Shutdown BOINC, restart BOINC see if the Work Unit is still stuck, give it about 20min.[/quote]
After shutting down BOINC, the rosetta process kept on running, with init as the parent. I killed it and then restarted BOINC. In less than 60 seconds, the rosetta process got up to 1.01%, but I do not have hope for it.
[quote]3. Reboot your computer. See if the Work Unit is still stuck, give it about 20min.[/quote]
I am not prepared to reboot the computer. What good would that do that shutting down BOINC, killing any leftover BOINC-owned processes, and restarting BOINC would not already do?
[quote]4. Abort the Work Unit, BOINC Manager -> Work (tab) -> click on the Work Unit that's stuck click the Abort button (on the left hand side).[/quote]
I will consider this if it is still stuck tomorrow.
14)
Questions and Answers :
Unix/Linux :
Work unit way too slow, I think.
(Message 13609)
Posted 13 Apr 2006 by Jean-David Beyer Post: Work unit TRUNCATE_TERMINI_FULLRELAX_1ptq_433_996_0 is taking far too long. It has used 17:07:13 as I type this and it seems to be at 1.04% complete with 20:52:22 remaining. Normally, a work unit is complete long before this. Should I kill it, or what? And if so, how? |
15)
Questions and Answers :
Getting started :
How long for Rosetta@home to get started?
(Message 2364)
Posted 5 Nov 2005 by Jean-David Beyer Post: [quote]Check your preferences, the disk space items in particular.[/quote] Disk space is not the problem (I allocated 8 GBytes to the boinc partition). There were two problems: the server being intermittently down (which turned out not to have been the major problem), and my machine, even with two 3.06 GHz hyperthreaded Xeon processors and 4 GBytes RAM, being overcommitted (with four climate prediction work units). Suspending the climate prediction stuff allowed me to download from Rosetta@home. Once that was done, I allowed climate prediction to run again (which it didn't, of course, since the deadline for the Rosetta stuff was December 1 or 2, and the deadline for the climate prediction was January 24). I let the Rosetta work units run and they took only a brief time each (less than an hour, IIRC). So things are back to normal. So being patient would not have helped unless I were extremely patient. I assume I would not have gotten any work units from Rosetta until sometime in late January 2006 when the climate-prediction stuff cleared out.
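The ordering described in this post, Rosetta work running ahead of climate prediction because its deadline came first, is what an earliest-deadline-first rule produces. The sketch below is only that rule on its own; the real BOINC scheduler also weighs resource shares and other factors, and the exact dates are assumptions filled in from the post ("December 1 or 2" taken as 1 December 2005, "January 24" as 24 January 2006).

```python
from datetime import date

# (project, report deadline); dates are assumed from the post.
tasks = [
    ("climateprediction.net", date(2006, 1, 24)),
    ("Rosetta@home", date(2005, 12, 1)),
]


def earliest_deadline_first(pending):
    """Order work so that the task due soonest runs first."""
    return sorted(pending, key=lambda task: task[1])


for project, deadline in earliest_deadline_first(tasks):
    print(project, deadline.isoformat())
# Rosetta@home 2005-12-01
# climateprediction.net 2006-01-24
```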
16)
Questions and Answers :
Getting started :
How long for Rosetta@home to get started?
(Message 2160)
Posted 3 Nov 2005 by Jean-David Beyer Post: Yesterday I registered for this project and attached to it. But I get no application program(s) and no work units. Is this normal, have I configured something incorrectly, or what? I have entries in my log saying (I wish I could copy from the boincmgr window and paste in here, but I cannot, so I hope there are no typos):
Sending scheduler request to http://boinc.bakerlab.org...
Reason: Requested by user
Note: not requesting new work or reporting results
Well why not? Sometimes it says:
... to fetch work
Requesting 692100 seconds of new work
No work from project
©2021 University of Washington
https://www.bakerlab.org