Single Core mode delivers far better results!

Message boards : Number crunching : Single Core mode delivers far better results!

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58621 - Posted: 7 Jan 2009, 16:18:22 UTC

Here's an odd one...

I have a Core 2 Duo T8300 laptop. It has 2 cores at 2.4GHz each. When I disable one core in the BIOS, the remaining core benefits from Intel Dynamic Acceleration, giving it an extra 200MHz. So it is now 2.6GHz, simple right? 8.33% faster.

If you look at my results here, look at the results returned on the 6th Jan.
They each receive ~150 credits/WU; these were the ones run in single core mode.
Now look at the ones returned 7th Jan: they each receive ~50 credits, a third as much. These were run in dual core mode. They are all the same "abinitio_norelax_homfrag" units.
Now, is it possible that because the former results had the whole cache and bus to themselves, they did far more work per unit? Or is it more likely that, because of the problems of the last few days, the credit "giver-outer" has had some health problems?
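
As a rough sanity check, the clock boost and the credit gap can be put side by side. This is only a sketch using the figures quoted in the post (2.4GHz, 2.6GHz, ~150 and ~50 credits); the variable names are made up for illustration and nothing here is measured.

```python
# Figures quoted in the post; none of them are measured here.
base_clock_ghz = 2.4      # both cores enabled
boosted_clock_ghz = 2.6   # one core enabled, with Intel Dynamic Acceleration

credit_single_core = 150  # approximate credit per work unit, single core mode
credit_dual_core = 50     # approximate credit per work unit, dual core mode

# Relative clock gain from the 200MHz boost.
clock_gain = boosted_clock_ghz / base_clock_ghz - 1

# Ratio of per-work-unit credit between the two modes.
credit_ratio = credit_single_core / credit_dual_core

print(f"Clock gain from the boost: {clock_gain:.2%}")
print(f"Per-WU credit ratio:       {credit_ratio:.1f}x")
```

If those figures hold, the clock boost alone would account for only a small part of the per-WU difference.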

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 116,976,146
RAC: 80,753
Message 58624 - Posted: 7 Jan 2009, 17:02:52 UTC - in response to Message 58621.  

I see quite a lot of variation between different work units, so I'd be careful about drawing any conclusions too soon. I'm sure the extra cache would help, but isn't it turned off when you turn the core off?

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58631 - Posted: 7 Jan 2009, 18:21:01 UTC - in response to Message 58624.  

CPU-Z still reports 3MB of cache when only one core is active. If anyone else has the ability to turn off multi-core support and see if they get the same results, that would be useful.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58633 - Posted: 7 Jan 2009, 18:30:36 UTC

Keep in mind that when two cores are running, you are counting it as 2 CPU seconds per second of time on your wrist watch.

So, you seem to be pointing to 50 vs 150 credits as being a 3X difference. But because hyperthreading into two cores records CPU time against two tasks at a time, you should expect to see roughly a 2x difference.

So, say 1 core was getting 150 credits for a given WU run length. If I split that one into 2 HT cores, and each earns 50 credits for roughly the same WU length, that's 100 credits earned in that time, because I did the same on two tasks at once. So, really, you need to be comparing 100 with 150.

And since both HT threads need floating point operations, they aren't able to run full out the way a single thread can. They also contend for the same amount of L2 cache on the CPU.

So, yes, running with HT active (i.e. 2 tasks at once) will NOT double your credit per hour. And yes, it's a bit unexpected that you didn't earn something more than 150 with two cores active. But, yes, credit between tasks varies significantly and is very difficult to benchmark. You really have to look at about a month of completed work each way to have enough data to draw any conclusions.
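
The 100 vs 150 comparison can be written out as a small calculation. A minimal sketch, assuming every task earns the quoted credit over roughly the same run length; `credit_per_period` is a hypothetical helper for illustration, not anything from BOINC or Rosetta.

```python
def credit_per_period(credit_per_task, concurrent_tasks):
    """Total credit earned in one run-length period across all running tasks."""
    return credit_per_task * concurrent_tasks

# Figures quoted in the thread: ~150 credits/WU alone, ~50 credits/WU with two at once.
single_total = credit_per_period(150, 1)
dual_total = credit_per_period(50, 2)

# Aggregate comparison, rather than the per-task 3x figure.
aggregate_ratio = single_total / dual_total

print(f"One task at a time:  {single_total} credits per period")
print(f"Two tasks at a time: {dual_total} credits per period")
print(f"Aggregate ratio:     {aggregate_ratio:.1f}x")
```

Under that equal-run-length assumption the gap shrinks from 3x per task to 1.5x in aggregate, which is still the opposite of what two active cores would be expected to deliver.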
Rosetta Moderator: Mod.Sense

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 116,976,146
RAC: 80,753
Message 58641 - Posted: 7 Jan 2009, 19:31:40 UTC - in response to Message 58633.  

it's a C2D CPU - not P4/i7! ;)

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58650 - Posted: 7 Jan 2009, 21:07:25 UTC - in response to Message 58641.  

Indeed dcdc, so the question remains ;)

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58653 - Posted: 7 Jan 2009, 21:12:10 UTC
Last modified: 7 Jan 2009, 21:14:04 UTC

So... I take it that what I missed is that you actually deactivated one of the independent cores? So it's not HT.

Sounds like there are other resource contention issues. Memory, disk, L2, coprocessors... and 2GB should be plenty of memory, and Rosetta does not use much disk access, so neither of those should be an issue.

...again, you need more data to expect to see comparable numbers.
Rosetta Moderator: Mod.Sense

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58659 - Posted: 7 Jan 2009, 21:47:21 UTC

I am just curious because, although I know it's not conclusive proof, the arrow certainly seems to be pointing toward one core potentially doing more work than two due to cache and bus issues.

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 58663 - Posted: 8 Jan 2009, 3:58:24 UTC

My guess is that the laptop went into a low power state with a very low clock speed when both cores were enabled, and with only one core enabled it ran at normal clock speed for some reason.

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58666 - Posted: 8 Jan 2009, 10:15:03 UTC

Nah I made sure it was running at the same clockspeed both times :)

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58693 - Posted: 9 Jan 2009, 5:08:34 UTC - in response to Message 58666.  

Then the implication is that you have a large contention problem in your system. Hard to know where, but, chip set and memory bus come immediately to mind. With only one core to feed the limitations do not form so much of a bottleneck. When you have two cores to feed, the minor issue of slightly slow memory or sub-optimal chip set becomes a large factor in movement of data.

Though cache covers a multitude of system sins, it cannot cover them all ... and again, if the cache sizes are small compared to the working set you will get thrashing and higher bus traffic.

I am no longer up on all the latest and have no way to be able to say with certainty as to what is the cause here ... but a couple things to look at ... one is to get a better MB (assuming this is not a laptop) and from there faster memory ... or both at the same time ...

I know at times I could get a cheap MB processor combination cheaper than the CPU alone at Frys and had a habit of buying that, and a higher quality MB ... then I would give away the e-cheapo as it was not worth my time ... YMMV

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58699 - Posted: 9 Jan 2009, 15:21:27 UTC

I figured it would be something to do with the cache. It is a laptop, and the processor has a 3MB cache. When running two concurrent Rosetta tasks, each task will inevitably have only 1.5MB of cache before having to defer to memory along the sluggish FSB. I would not have thought this would translate into such a large increase in productivity though. That said, this is a good thing, as it avoids overheating issues and means the whole thing runs a bit quieter, while still knocking out a healthy output :)

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 116,976,146
RAC: 80,753
Message 58706 - Posted: 9 Jan 2009, 21:49:05 UTC
Last modified: 9 Jan 2009, 21:49:43 UTC

Apparently the cache does all remain active if you disable a core on Core 2 CPUs, so I guess that's where the speedup is. I thought (and I think Paul thought the same?) that the cache was per-core and so half would be disabled also...

from: http://setiathome.berkeley.edu/forum_thread.php?id=33307

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58707 - Posted: 9 Jan 2009, 22:12:51 UTC

Cache is used to cover up the sin of slow memory. Cache hit rates run anywhere from the 90 to 99% mark depending on the working set size. Working set is the technical geek term for the amount of code and data we are currently working on.

A cache miss causes you to access the next level up of memory be it another cache or main memory ... or even the disk drive in the case of a VM miss ...
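
To see why the hit rate matters so much, here is a sketch of the usual average-access-time estimate (hit time plus miss rate times miss penalty). The 90% and 99% hit rates come from the post; the 5 ns hit time and 100 ns miss penalty are made-up round numbers for illustration, not measurements of any real CPU.

```python
def average_access_time_ns(hit_rate, hit_time_ns, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
HIT_TIME_NS = 5.0
MISS_PENALTY_NS = 100.0

for hit_rate in (0.99, 0.90):
    amat = average_access_time_ns(hit_rate, HIT_TIME_NS, MISS_PENALTY_NS)
    print(f"hit rate {hit_rate:.0%}: about {amat:.1f} ns per access")
```

With these made-up latencies, dropping from a 99% to a 90% hit rate raises the average access time from about 6 ns to about 15 ns, so a working set that stops fitting in cache can cost far more than the raw hit-rate change suggests.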

Most current L1 cache stores the actual microprogram code (some form of RISC code) that the i386 instructions are decoded into before execution (no one, to my knowledge still executes i386 code directly, it is all emulated on a "hidden" RISC architecture). L2 caches store the i386 code and may or may not have interchanges with the other caches on the die (assuming multiple cores) ... L3 caches, well, they tend to be more integrated and allow more sharing ...

Laptops, by their nature have, in general, less of everything ...

Without testing it is hard to say if the second core's cache is available when the core itself is turned off ... it is certainly possible, but I would still guess that the off-chip communication is to blame ... you may also want to look to see if there are more power saving mode settings. Another possibility is that they are stretching the clock for the main memory and it takes a couple of clock cycles to get back up to speed on a cache miss ... but that saves power ... only one core running, less contention when a miss goes out to main memory ...

This is what makes computers so fun ... so many interacting bits and pieces ... every one touching every other one ... and laptops are one of the places where the biggest compromises are made ...

The other thing you really have to look at is if the speed up is such that the throughput goes up or not ... lots of people hated HT on their processors because run times increased per task and they could not see that the increase was swamped by running two tasks at the same time ... slightly longer run times, twice the number of tasks processed ... 1.5 times or more productivity ... I measured up to 1.8 times the throughput with the right mix of projects ... YMMV :)
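
The run-time versus throughput trade-off can be sketched as follows, assuming throughput is simply the number of concurrent tasks divided by the per-task slowdown. The slowdown values are hypothetical, chosen only so the results land near the 1.5x and 1.8x figures mentioned in the post; `throughput_gain` is an illustrative helper.

```python
def throughput_gain(concurrent_tasks, per_task_slowdown):
    """Throughput relative to running a single task alone at full speed.

    per_task_slowdown is how many times longer each task takes when
    it shares the processor with the others (1.0 means no slowdown).
    """
    return concurrent_tasks / per_task_slowdown

# Hypothetical slowdowns for two tasks sharing one processor.
for slowdown in (1.11, 1.33, 2.0):
    gain = throughput_gain(2, slowdown)
    print(f"each task {slowdown:.2f}x slower -> {gain:.2f}x the throughput")
```

In this simple model, running two tasks at once only stops paying off when each task takes at least twice as long as it would alone.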

FoldingSolutions
Joined: 2 Apr 06
Posts: 129
Credit: 3,506,690
RAC: 0
Message 58709 - Posted: 9 Jan 2009, 23:45:58 UTC

I think that is a pretty conclusive and well informed answer.
The T8300 in this laptop has a TDP of 35W (and actually operates quite a bit below that, as anyone who has touched a 40W light bulb could testify), and from a bit of light inference using Folding@home statistics and comparisons, it actually performs on par with a desktop model of the same clock speed. The mini-rosetta application is 6MB in size so would obviously benefit from a larger cache, which as Paul says is just used to mask the shortcomings of the slow FSB in current generation Intels. Thanks to all for taking the time to think about this :)
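
As a back-of-the-envelope illustration of the cache argument, the sketch below compares how much of the application could sit in cache with one task versus two. It uses the 3MB cache and 6MB application size quoted in this thread, and it assumes (loosely) that the application size stands in for the working set and that two tasks split the shared cache evenly; real caches do not partition this neatly.

```python
CACHE_MB = 3.0      # cache size reported in the thread
APP_SIZE_MB = 6.0   # mini-rosetta application size quoted in the thread

def cached_fraction(concurrent_tasks):
    """Fraction of one task's footprint that fits in its share of the cache,
    assuming an even split of the cache between tasks (an idealization)."""
    share_mb = CACHE_MB / concurrent_tasks
    return min(1.0, share_mb / APP_SIZE_MB)

for tasks in (1, 2):
    print(f"{tasks} task(s): {cached_fraction(tasks):.0%} of the footprint fits in cache")
```

Under those assumptions a lone task can keep half of its footprint in cache and each of two tasks only a quarter, which fits the direction of the observed effect but says nothing about its size.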

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58716 - Posted: 10 Jan 2009, 19:58:52 UTC - in response to Message 58709.  

For some of us, it is all we think about ... :(

One of the more fun things is to watch history repeating: those techniques and hardware innovations that were first used in mainframes are now down to our lowly PCs ...

From cache memories to multiple cores/multiple CPUs, and specialty processors ...

If you want to know where PCs are going ... look to the history of the mainframe ...

What fascinates me now is the seeming lack of interest in the outcome of the HP "Dynamo" project ... though that is likely because the S/W and H/W guys for PCs are not sitting in the same room (as it were) ... still, the speed gain potential from that technique was impressive ... especially when it comes at zero hardware cost ...

©2024 University of Washington
https://www.bakerlab.org