No decoys on an old Pentium 4. Just a curiosity: why?

Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why?

manalog

Joined: 8 Apr 15
Posts: 24
Credit: 233,155
RAC: 0
Message 97963 - Posted: 8 Jul 2020, 16:02:53 UTC

Hi all,
I am fully aware that crunching on a Pentium 4 is extremely inefficient and contributes a drop in the ocean. Nonetheless, for a few days (probably less than a week) I will be in a house with an old Pentium 4 2.8 GHz; I do not pay for the electricity, and it comes from a certified renewable source. Since I am going to do this only for a short time, I don't see any issue: I am doing it just for fun and to give some "glory" to this old machine.
The computer runs Debian 7 (very smoothly, actually); all the hardware is in very good condition and it has 2 GB of RAM. Yesterday I attached it to Rosetta@Home and it started crunching the workunit bmpr1_attempt3_0_SAVE_ALL_OUT_IGNORE_THE_REST_0se1zv2k_951179_2 (default runtime: 8 hours).
This morning the WU was at 98.9% progress after more than 15 hours of processing, and no checkpoint had been written, that is, no decoy had been computed. The process was running at 100% CPU usage and 20% RAM usage, and when I checked /proc/PID/fd/2 I saw that it was still reporting its runtime. I killed the WU, and it has now started another one, tgfbR2_1_SAVE_ALL_OUT_IGNORE_THE_REST_6th1hn7q_951729_3. 16471 seconds have passed and there is still no checkpoint or decoy.
Now, I know that the processor is old (2003) and slow (single core, 2.8 GHz), BUT the absence of decoys still looks very odd to me. BMPR and TGFBR2 WUs are very "easy" ones, and a single core of my Xeon L5420 completes hundreds of decoys in the same time. I know that the IPC of a Pentium 4 is much lower than that of a Core 2 Duo, but I tried to work out the proportions and this behavior is just impossible. OK, it is slower, but not a hundred times slower! More like 50%, in which case I would expect half the decoys, not none!
Just out of curiosity, why? Is it related to the 32-bit architecture? Moreover, I have found a couple of old Family 15 P4/Xeon/Celeron hosts in the "Fold for Covid" and "Grcpool" accounts, and they are still working. I am just curious because this is puzzling me.
Any suggestions for a project to switch this machine to for these few days? Not Tn-Grid, because supporting a P4 with SSE2 requires a recompilation. I have done it, but the binary is on another computer that I cannot access, and I cannot recompile now because yesterday, while trying to upgrade from Debian 7 to 10, I broke apt and I do not want to fix it. So I cannot install the dependencies needed to compile Tn-Grid for SSE2, and as a matter of principle I do not want to crunch it without SIMD. WCG's MIP?
Thank you.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 97965 - Posted: 8 Jul 2020, 20:31:22 UTC

I agree, you should be able to complete some models. Just not as many per unit time.

Are the tasks actually getting CPU time, or are they just shown with a status of "running"? Normally I would tell a Windows user to open Task Manager to see whether the tasks are using CPU, and to confirm in the WU properties that the actual CPU time is increasing. On Linux, I guess you could have a look at top and see whether the tasks are getting CPU. You did say they ran at 100% CPU, though, so it sounds like you have already checked this.
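
One way to make that check on Linux is to read the CPU time the kernel has accounted to the task and compare it with wall-clock time. The sketch below is a minimal Python example, not part of BOINC or Rosetta: the function names are hypothetical, and it assumes a Linux /proc filesystem where fields 14 and 15 of /proc/<pid>/stat hold user and kernel time in clock ticks, as described in proc(5).

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + kernel CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before tokenizing the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime_ticks = int(fields[11])  # field 14 in proc(5): user-mode time
    stime_ticks = int(fields[12])  # field 15 in proc(5): kernel-mode time
    return (utime_ticks + stime_ticks) / ticks_per_second


def cpu_seconds(pid):
    """Read the accumulated CPU time of a running process (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as stat_file:
        return cpu_seconds_from_stat(stat_file.read(), os.sysconf("SC_CLK_TCK"))
```

Calling cpu_seconds(pid) twice, a minute apart, and seeing a difference close to 60 would suggest the task really is being scheduled rather than merely listed as running.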

There is nothing about an older CPU that should impact checkpointing either. As you allude to, a checkpoint is taken at least at the end of each model, and often at various times within a model as well. What have you set for your preference on how often to request tasks to checkpoint?

It is always possible that the first model of the WU happened to go rogue. How did the next machine do with the reassigned tasks?
Rosetta Moderator: Mod.Sense
manalog

Joined: 8 Apr 15
Posts: 24
Credit: 233,155
RAC: 0
Message 97970 - Posted: 8 Jul 2020, 23:46:59 UTC - in response to Message 97965.  

Indeed, CPU usage was around 99.8% all the time, with some slight drops and recoveries now and then, but only on the order of a few points. The process was certainly alive; my only worry was that it could have been stuck in some loop. 15 hours with no decoys is very strange, even for a Pentium 4. Now it is crunching MIP; it still has to finish its first workunit, but the timing does not seem too different from a single core of an old dual-core laptop that I successfully used on Rosetta and that produced several decoys in eight hours (except, of course, for the COVID workunits of April, but that was an exceptional case).
I am checking the other host (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1091797531): it is a very powerful computer and it is set to an 8-hour runtime. We will see; I will probably give Rosetta another try on this host.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 97987 - Posted: 9 Jul 2020, 20:17:58 UTC

The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority.
Rosetta Moderator: Mod.Sense
Falconet

Joined: 9 Mar 09
Posts: 353
Credit: 1,183,306
RAC: 4,551
Message 97990 - Posted: 9 Jul 2020, 22:35:27 UTC - in response to Message 97987.  

The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority.


The task page had over 15 hours of CPU time.
manalog

Joined: 8 Apr 15
Posts: 24
Credit: 233,155
RAC: 0
Message 97993 - Posted: 10 Jul 2020, 1:17:28 UTC - in response to Message 97987.  

No, the CPU was always doing R@H and nothing else. Even the desktop environment was shut down. Now it is running WCG-MIP workunits and the CPU time is almost the same as the elapsed time. I am waiting for my wingman's result; then I will try another workunit, choosing a task that usually produces hundreds of decoys in a short time on a faster CPU.
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 22,849,790
RAC: 15,242
Message 97997 - Posted: 10 Jul 2020, 14:58:24 UTC - in response to Message 97987.  
Last modified: 10 Jul 2020, 14:59:23 UTC

The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority.


The P4 system has 2 GB of memory and is running Windows 7. Windows wants most of that 2 GB, so the Rosetta WUs are paging. If you check the available memory (using Task Manager or some other tool), you will find that there are only a couple of hundred MB available for running Rosetta. It needs double that, or it may never finish a decoy. I would expect the disk activity to be at 100%.
Daedalus

Joined: 1 Aug 08
Posts: 39
Credit: 10,095,330
RAC: 1,272
Message 98009 - Posted: 11 Jul 2020, 13:53:23 UTC

It seems the same machine is now running Linux and returning results.

I started back in 2008 on a Pentium 4 @ 2.6 GHz, and while the credit was much lower than nowadays, it had enough processing power to do something.
Bryn Mawr

Joined: 26 Dec 18
Posts: 387
Credit: 11,964,584
RAC: 12,926
Message 98010 - Posted: 11 Jul 2020, 14:12:55 UTC - in response to Message 98009.  

It seems the same machine is now running Linux and returning results.

I started back in 2008 on a Pentium 4 @ 2.6 GHz, and while the credit was much lower than nowadays, it had enough processing power to do something.


I fear the researchers have made the WUs more complex since then to make use of the extra power.
manalog

Joined: 8 Apr 15
Posts: 24
Credit: 233,155
RAC: 0
Message 98407 - Posted: 3 Aug 2020, 10:03:46 UTC

I am replying just to tell you how this experiment is going (in case anyone is curious about it). After successfully crunching 178 hours of WCG's MIP (which uses an older version of Rosetta), I decided to switch the old computer back to Rosetta@Home. Now it is working properly, and in 150 hours it has not got stuck on a WU again. I noticed that "JHR" workunits crunch very well on this P4; on the other hand, yesterday a cd86 workunit arrived and, after 4 hours of processing, no decoys had been made. It is now suspended, the deadline is tomorrow, and I am probably going to abort it. That's weird, because a similar WU on one core of another machine, only twice as fast as the P4, computed more than 700 decoys in 8 hours.
So I'm wondering whether the problem with such an old computer lies not in the raw speed of decoy-making itself but in a kind of intolerance for certain WUs, which on other computers run even faster than the JHR ones that my P4 loves. RAM is not an issue, as it has 2 GB of RAM per core, which is a very good amount for Rosetta. Bah. By the way, I am going to turn it off within one or two weeks at most. Crunching on a 2.8 GHz P4 is just a complete waste of power: from a quick proportion, this computer is 4 times slower than a single core of a mid-range i5 laptop, and since that laptop is quad core, the P4 is 16 times slower than a computer (not even brand new) that uses half the electricity!
I don't know whether to overclock it a bit, even though it is summer and it is very hot here in Rome. I remember that several years ago I ran this P4 2.8 at 3.06 GHz and it was very stable.
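
The proportion above can be spelled out as a short calculation. This is only a sketch using the rough figures quoted in the post (4 times slower per core, a quad-core laptop, about half the electricity); the variable names are illustrative, and the overclock estimate assumes that throughput scales linearly with clock speed, which the post does not claim.

```python
# Rough figures quoted in the post.
p4_vs_i5_core_slowdown = 4   # the P4 is about 4x slower than one i5 core
i5_cores = 4                 # the laptop is quad core
laptop_power_ratio = 0.5     # the laptop uses about half the electricity

# How many times more work the whole laptop does in the same time.
throughput_ratio = p4_vs_i5_core_slowdown * i5_cores

# How many times more work the laptop does per unit of energy.
work_per_energy_ratio = throughput_ratio / laptop_power_ratio

# Assumption (not from the post): throughput scales linearly with clock.
overclock_gain = 3.06 / 2.8 - 1

print("throughput ratio:", throughput_ratio)
print("work per unit of energy ratio:", work_per_energy_ratio)
print("overclock gain: {:.1%}".format(overclock_gain))
```

Under these assumptions the overclock would recover under a tenth of the speed, which is small next to the 16-fold throughput gap.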

©2024 University of Washington
https://www.bakerlab.org