Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why?
Author | Message |
---|---|
manalog Joined: 8 Apr 15 Posts: 24 Credit: 233,155 RAC: 0 |
Hi all, I am fully aware that crunching on a Pentium 4 is extremely inefficient and contributes no more than a drop in the ocean. Nonetheless, for some days (probably less than a week) I will be in a house with an old Pentium 4 2.8 GHz; I do not pay for the electricity, and it comes from a certified renewable source. Since I am going to do this only for a short period, I don't see any issue: I am doing it just for fun and to give some "glory" to this old machine. The computer runs Debian 7 (very smoothly, actually); all the hardware is in very good condition and it has 2 GB of RAM. Yesterday I attached it to Rosetta@Home and it started crunching a bmpr1_attempt3_0_SAVE_ALL_OUT_IGNORE_THE_REST_0se1zv2k_951179_2 workunit; default runtime: 8 hours. This morning the WU was at 98.9% progress after more than 15 hours of processing, and no checkpoints had been made, that is, no decoy had been computed. The process was running at 100% CPU usage and 20% RAM usage, and when I checked the /proc/PID/fd/2 file I saw that it was still reporting its runtime. I killed the WU and it has now started another one, tgfbR2_1_SAVE_ALL_OUT_IGNORE_THE_REST_6th1hn7q_951729_3. 16471 seconds have passed and still no checkpoint or decoy. Now, I know that the processor is old (2003) and slow (single core, 2.8 GHz), BUT the absence of decoys still looks very inconsistent to me. BMPR and TGFBR2 WUs are very "easy" ones, and a single core of my Xeon L5420 completes hundreds of decoys in the same time. I know that the IPC of a Pentium 4 is much lower than that of a Core 2 Duo, but I tried to work out some proportions and this behavior is just impossible. OK, it is slower, but not one hundred times slower! More like 50%, in which case I would expect half the decoys, not no decoys! Just out of curiosity, why? Is it related to the 32-bit architecture? Moreover, I have found a couple of old P4/Xeon/Celeron Family 15 machines in the accounts of "Fold for Covid" and "Grcpool", and they are still working. 
I am just curious because this thing is puzzling me. Any suggestions for a project to switch this machine to for these days? Not Tn-Grid, because supporting a P4 with SSE2 requires a recompilation; I have done it, but the binary is on another computer that I cannot access, and I cannot recompile it now because yesterday, while trying to upgrade from Debian 7 to 10, I broke apt and I do not want to fix it. So I cannot install the dependencies required to compile Tn-Grid for SSE2, and I do not want to crunch it without SIMD as a matter of principle. WCG's MIP? Thank you. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I agree, you should be able to complete some models. Just not as many per unit time. Are the tasks actually getting CPU time? Or are they just shown as "running" for their status. Normally I would tell a Windows user to go to task manager to see if they are using CPU, and to confirm with the WU properties the actual CPU time is increasing. On Linux, I guess have a look at top, and see if the tasks are getting CPU. Although you did say they ran 100% CPU. So it sounds like you've already checked this. There is nothing about an older CPU that should impact checkpointing either. As you allude to, a checkpoint is taken at least at the end of each model, and often at various times within a model as well. What have you set for your preference on how often to request tasks to checkpoint? It is always possible that the first model of the WU happened to go rogue. How did the next machine do with the reassigned tasks? Rosetta Moderator: Mod.Sense |
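Editorial note: on Linux, one way to confirm that a task is really accumulating CPU time (the check Mod.Sense suggests doing with top) is to read the process counters from /proc/PID/stat. The following is only a minimal sketch and not part of BOINC or Rosetta: the function names and the sample line are hypothetical, and the field positions are taken from the proc(5) manual page.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + system CPU seconds from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' before splitting the fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime_ticks = int(fields[11])  # field 14 in proc(5): user-mode ticks
    stime_ticks = int(fields[12])  # field 15 in proc(5): kernel-mode ticks
    return (utime_ticks + stime_ticks) / ticks_per_second


def read_cpu_seconds(pid):
    """Read the CPU time a live process has accumulated so far (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as handle:
        return cpu_seconds_from_stat(handle.read(), os.sysconf("SC_CLK_TCK"))


# Hypothetical stat line for a task with roughly 15 hours of CPU time,
# assuming 100 clock ticks per second.
sample = ("1234 (rosetta task) R 1 1234 1234 0 -1 4194304 100 0 0 0 "
          "5400000 6000 0 0 39 19 1 0 100 0 0")
print(cpu_seconds_from_stat(sample, 100))
```

Calling read_cpu_seconds twice a minute apart with the task's real PID should show an increase of close to 60 seconds if the task has a core to itself; a much smaller increase would suggest that something else is taking the CPU.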
manalog Joined: 8 Apr 15 Posts: 24 Credit: 233,155 RAC: 0 |
Indeed, CPU usage was around 99.8% all the time, with some slight drops and recoveries now and then, but only in the order of a few points. The process was certainly alive; my only worry was that it could have been stuck in some loop. 15 hours with no decoys is very strange, even on a Pentium 4. Now it is crunching MIP; it still has to finish its first workunit, but the timing does not seem too different from a single core of an old dual-core laptop that I successfully used on Rosetta and that produced several decoys in eight hours (except, of course, for the COVID workunits of April, but that was an exceptional case). I am checking the other host (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1091797531); it is a very powerful computer and it is set to an 8-hour runtime. We will see; I will probably give Rosetta another try on this host. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority. Rosetta Moderator: Mod.Sense |
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,183,306 RAC: 4,551 |
"The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority." The task page had over 15 hours of CPU time. |
manalog Joined: 8 Apr 15 Posts: 24 Credit: 233,155 RAC: 0 |
No, the CPU was always doing R@H and nothing else. Even the desktop environment was shut down. Now it is running WCG-MIP workunits and the CPU time is almost the same as the elapsed time. I am waiting for my wingman's result; then I will try another workunit, choosing a task that usually produces hundreds of decoys in a short time on a faster CPU. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,849,790 RAC: 15,242 |
"The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority." The P4 system has 2 GB of memory and is running Windows 7. Windows wants most of that 2 GB, and the Rosetta WUs are paging. If you check the available memory (using Task Manager or some other tool) you will find that there are only a couple of hundred MB of memory available for running Rosetta. It needs double that memory or it may never finish a decoy. I would expect the disk activity to be 100%. |
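Editorial note: the original post says the machine runs Debian 7, where the equivalent of the Task Manager check is /proc/meminfo. The sketch below is an assumption-laden illustration, not a BOINC tool: the function names and sample figures are hypothetical, and on older kernels that do not report MemAvailable it falls back to the rough approximation MemFree + Buffers + Cached.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (KiB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def available_kib(values):
    """Estimate the memory that a new task could use without paging."""
    if "MemAvailable" in values:
        return values["MemAvailable"]
    # Older kernels: rough approximation only, labelled as an assumption.
    return values["MemFree"] + values["Buffers"] + values["Cached"]


# Hypothetical figures for a 2 GB machine under memory pressure.
sample_meminfo = (
    "MemTotal:        2062000 kB\n"
    "MemFree:          150000 kB\n"
    "Buffers:           20000 kB\n"
    "Cached:            30000 kB\n"
)
print(available_kib(parse_meminfo(sample_meminfo)) // 1024, "MiB available")
```

On the real machine the text would come from open("/proc/meminfo").read(). A result of only a couple of hundred MiB while a task is running would be consistent with the paging explanation above; a much larger figure would argue against it.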
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,095,330 RAC: 1,272 |
It seems the same machine is now running Linux and returning results. I started long ago, in 2008, on a Pentium 4 @ 2.6 GHz, and while the credit was much lower than nowadays, it had enough processing power to do something. |
Bryn Mawr Joined: 26 Dec 18 Posts: 387 Credit: 11,964,584 RAC: 12,926 |
"It seems the same machine is now running Linux and returning results." I fear the researchers have made the WUs more complex since then to make use of the extra power. |
manalog Joined: 8 Apr 15 Posts: 24 Credit: 233,155 RAC: 0 |
I am replying just to tell you how this experiment is going (in case anyone is curious about it). After successfully crunching 178 hours of WCG's MIP (which uses an older version of Rosetta), I decided to switch the old computer to Rosetta@Home again. Now it is working properly, and in 150 hours it has never got stuck on a WU again. I realized that "JHR" workunits crunch very well on this P4; on the other hand, yesterday a cd86 workunit arrived and, after 4 hours of processing, no decoys had been made. Now it is suspended, the deadline is tomorrow, and I am probably going to abort it. That's weird, because a similar WU on one core of another machine, only twice as fast as the P4, computed more than 700 decoys in 8 hours. So I'm wondering whether the problem with such an old computer lies not in the speed of decoy-making itself but rather in a kind of intolerance for some WUs, which on other computers run even faster than the JHR ones that my P4 loves. RAM is not an issue, as it has 2 GB of RAM per core, which is a very good amount for Rosetta. Bah. By the way, I am going to turn it off in no more than one or two weeks. Crunching on a 2.8 GHz P4 is just a complete waste of power: with a quick proportion, I worked out that this computer is 4 times slower than a single core of a mid-range i5 laptop, and that laptop is quad-core, so the P4 is 16 times slower than a computer (not even brand new) that uses half the electricity! I don't know whether to overclock it a bit, even though it is summer and it is very hot here in Rome. I remember that several years ago I ran this P4 2.8 at 3.06 GHz and it was very stable. |
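Editorial note: the "quick proportion" in the post above can be written out as a short calculation. This is a sketch under the poster's own figures (4 times slower per core, four cores, half the electricity); the function names are hypothetical, and it assumes that throughput scales perfectly with the number of cores, which real workloads rarely achieve.

```python
def whole_machine_speedup(per_core_speedup, cores):
    """Throughput ratio of the multi-core machine over the single-core one."""
    # Assumption: perfect scaling across cores.
    return per_core_speedup * cores


def work_per_energy_ratio(throughput_ratio, power_ratio):
    """How much more work per unit of electricity the faster machine does."""
    # power_ratio is the faster machine's power draw divided by the slower one's.
    return throughput_ratio / power_ratio


laptop_vs_p4 = whole_machine_speedup(per_core_speedup=4, cores=4)
print(laptop_vs_p4)                              # 16
print(work_per_energy_ratio(laptop_vs_p4, 0.5))  # 32.0
```

Under these assumptions the laptop does 16 times the work in the same time and, because it draws half the power, about 32 times the work per unit of electricity.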
©2024 University of Washington
https://www.bakerlab.org