Boinc on VMWare?

jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 29902 - Posted: 23 Oct 2006, 22:12:27 UTC
Last modified: 23 Oct 2006, 22:13:52 UTC

Has anyone run the BOINC client on a VM Windows Server?

The reason I am asking is that we are going to (soon) put in around 24 blades running VMware for a dev environment. I was wondering whether, at night, while nothing is going on, BOINC would run well enough to be worth installing separate instances of a server on each blade. I figure the VMs would either have the hardware to themselves at night or be running alongside maybe one or two other VMs. The plan right now is to use 2-processor, 16 GB RAM servers.

Has anyone run BOINC on a virtual machine?

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29925 - Posted: 24 Oct 2006, 8:49:38 UTC - in response to Message 29902.  

I've run BOINC on a desktop in VMware; it works, but for your configuration I have no idea. Best bet would be to test with two computers.

Though can you not install it directly onto the blade itself?

If there is more than one VM running on a single blade, they will most likely fight for processor time, but that depends on how the VMs are run. I'm not sure how they run in that sort of setup.


Team mauisun.org

Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29931 - Posted: 24 Oct 2006, 9:49:33 UTC

One of the interesting problems with Virtualization is the priority of the different guests.

If you run two VMware guests, one with BOINC in it and another doing some other task, the VMM will see two guests: one hogging the CPU and the other doing whatever it is doing. Depending on how the scheduler works, it may give more time to Rosetta than the owner of the other VM wants to give. Whether you can get that to work smoothly depends on how configurable the scheduling is. I suspect you don't want someone else's "overnight test" to suffer from running Rosetta, right?
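A toy model of that scheduling concern, assuming the hypervisor does some form of weighted (proportional-share) CPU allocation. The function `split_cpu` and the guest names are hypothetical; this is not VMware's scheduler, only a sketch of how per-guest weights change who wins when both guests want the CPU.

```python
def split_cpu(demands, shares, capacity=1.0):
    """Work-conserving proportional-share split of one CPU.

    demands:  fraction of the CPU each guest would use if unconstrained
    shares:   relative weight of each guest (hypothetical tuning knob)
    Returns the fraction of the CPU each guest actually receives.
    """
    alloc = {}
    active = dict(demands)      # guests still waiting for an allocation
    remaining = capacity
    while active:
        total = sum(shares[vm] for vm in active)
        fair = {vm: remaining * shares[vm] / total for vm in active}
        # Guests that need no more than their fair slice get their full demand.
        done = [vm for vm in active if active[vm] <= fair[vm]]
        if not done:
            # Everyone left wants more than their slice: split by weight.
            alloc.update(fair)
            break
        for vm in done:
            alloc[vm] = active.pop(vm)
            remaining -= alloc[vm]
    return alloc


# Equal weights: the overnight test wants 60% of the CPU but only gets 50%.
equal = split_cpu({"boinc": 1.0, "test": 0.6}, {"boinc": 1, "test": 1})
# Low weight for the BOINC guest: the test gets its 60%, BOINC takes the rest.
weighted = split_cpu({"boinc": 1.0, "test": 0.6}, {"boinc": 1, "test": 9})
```

With equal weights the BOINC guest slows the other guest down; with a low weight it only soaks up what the other guest leaves idle, which is the behaviour you would want for a background cruncher.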

--
Mats

jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 29937 - Posted: 24 Oct 2006, 15:03:28 UTC

Most testing is done during the day, but you are right on the overnight testing.

I figure I will just run Rosetta on a few blades that never get used at night and not worry about the rest. This would still give me around 8 more physical servers running Boinc.
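The "only at night" idea can be sketched as a time-of-day check. BOINC's computing preferences offer a time-of-day restriction of this kind; the function below is only an illustration of the wrap-around logic, and the 20:00 to 06:00 window is an assumed example, not a value from this thread.

```python
def may_crunch(hour, start_hour=20, end_hour=6):
    """Return True if `hour` (0-23) lies inside the allowed window.

    The window may wrap past midnight (for example 20:00 to 06:00).
    Equal bounds are treated here as "no restriction".
    """
    if start_hour == end_hour:
        return True
    if start_hour < end_hour:
        # Window within a single day, e.g. 08:00 to 17:00.
        return start_hour <= hour < end_hour
    # Window wraps around midnight.
    return hour >= start_hour or hour < end_hour
```

For example, `may_crunch(23)` and `may_crunch(3)` are inside the assumed night window, while `may_crunch(12)` is not.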

Thanks for the input

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29939 - Posted: 24 Oct 2006, 15:22:17 UTC - in response to Message 29937.  

Hardly seems worth it for only eight more ;-)

What 'blades' are they (CPU-wise)?

Though one advantage is that you only need to create one VM image. You then just copy that image across to the other servers. I'd guess a small or tiny Linux install would give you the smallest footprint.


I don't know how into VMware you are, but VMworld is happening very shortly; you may want to take a look.
Team mauisun.org

jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 29940 - Posted: 24 Oct 2006, 15:27:29 UTC

They are Intel(R) Xeon(TM) CPU 3.40GHz x 2 per blade.
HP BL20Ps, but last year's model, so no dual cores.

I guess a small Linux would be good.

On VMworld, I received the invite, and I am currently pestering my manager to foot the bill. :-)

J.G.

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 29951 - Posted: 24 Oct 2006, 18:20:45 UTC - in response to Message 29902.  

jaxom1 wrote: "Has anyone run the BOINC client on a VM Windows Server?"



Hi Jaxom1: I am. Contact me via the team BB.

dag
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30105 - Posted: 27 Oct 2006, 12:25:13 UTC - in response to Message 29902.  
Last modified: 27 Oct 2006, 12:29:23 UTC

jaxom1 wrote: "Has anyone run BOINC on a virtual machine?"


Yes, but I recommend VMware ESX (native machine) or VMware on Linux rather than the Win2K version. Neither of the two I mentioned is susceptible to Microsoft viruses. Yes, a client partition could get infected, but the remainder of your frame will be clean.

VMware has a server version in development and they will basically give you the keys to it. It's a bit more manual work than the ESX version, but if it does not cost you anything, it's at least a good learning tool.

VMware's emulation is very good as far as emulations go. I have had code run faster and better on a VM than on native hardware. I attribute this to very diligently optimized routines under the hood of VMware and the fact that a lot of the instructions are shared when running multiples of the same client OS.

As far as number crunching goes, I would not expect a lot. Data sets for BOINC nearly always exceed the available cache and force cache misses, so you will be accessing main memory for your data. I have had mixed performance under VMware. I have played with partition caches and system caches and still don't feel I have maximized performance yet. But you might be surprised how well it runs. Don't take my information as a deterrent. Give it a shot. Who knows, it could be my platform that sucks!!


I am still looking (both inside and outside of VMware) for that one best crunching platform. I am thinking that it's going to turn out to be some form of AMD Hammer processor with disabled L2 caches and 32-bit OS code, but I will keep at my efforts with as open a mind as possible.


Looking for a team ??? Join BoincSynergy!!


Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 30107 - Posted: 27 Oct 2006, 13:44:10 UTC - in response to Message 30105.  

netwraith wrote: "Yes, but I recommend VMware ESX (native machine) or VMware on Linux rather than the Win2K version. Neither of the two I mentioned is susceptible to Microsoft viruses. Yes, a client partition could get infected, but the remainder of your frame will be clean."

Agreed.

netwraith wrote: "VMware has a server version in development and they will basically give you the keys to it. It's a bit more manual work than the ESX version, but if it does not cost you anything, it's at least a good learning tool.

VMware's emulation is very good as far as emulations go. I have had code run faster and better on a VM than on native hardware. I attribute this to very diligently optimized routines under the hood of VMware and the fact that a lot of the instructions are shared when running multiples of the same client OS."

There are certainly opportunities for "rewriting the OS" with the method that VMware uses, which may share things, etc.


netwraith wrote: "As far as number crunching goes, I would not expect a lot. Data sets for BOINC nearly always exceed the available cache and force cache misses, so you will be accessing main memory for your data. I have had mixed performance under VMware. I have played with partition caches and system caches and still don't feel I have maximized performance yet. But you might be surprised how well it runs. Don't take my information as a deterrent. Give it a shot. Who knows, it could be my platform that sucks!!"

Sorry to say this, but now you're either confused or "talking rubbish". Rosetta will have L2-cache-misses if you have less than 512KB of cache in the processor (or you're running unusually large models).
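A minimal sketch of why the working-set size relative to the cache matters, assuming a fully associative LRU cache and a purely cyclic sweep over the data. Real L2 caches are set-associative and real access patterns are messier, so treat the numbers as illustrative. The 64-byte line size (512 KB / 64 B = 8192 lines) is an assumption, and `lru_hit_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def lru_hit_rate(working_set_lines, cache_lines, passes=4):
    """Hit rate of an LRU cache swept cyclically over a working set."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


L2_LINES = 512 * 1024 // 64                  # 8192 lines, assuming 64-byte lines
fits = lru_hit_rate(6000, L2_LINES)          # working set smaller than the cache
too_big = lru_hit_rate(10000, L2_LINES)      # working set larger than the cache
```

When the working set fits, only the first pass misses; once it is larger than the cache, a cyclic sweep under LRU evicts every line just before it is needed again, so the hit rate collapses.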

Partition caches and system caches have ABSOLUTELY NOTHING to do with the processor caches, however, so you're probably confusing some other form of cache with the processor caches. Perhaps it's something to do with the "binary translation" that VMware uses to run the "strange x86 instructions" in virtual machines?

The only real performance effect of virtualization that I'd expect to see is that more processing is needed to run the OS code [because there's extra work involved in monitoring the virtual machines]. That, plus any other virtual machine sharing the same hardware, will be the only reason it runs slower on a VMware-based machine than on the "average" machine. Of course, the code inside the VMM and the OS will use the L1 and L2 caches and thus "throw out" some useful data, which will have to be reloaded when the application needs it, but that will happen ONLY when the VMM is running, and hopefully the overhead of running the VMM is only a few percent of the total CPU capacity available.


netwraith wrote: "I am still looking (both inside and outside of VMware) for that one best crunching platform. I am thinking that it's going to turn out to be some form of AMD Hammer processor with disabled L2 caches and 32-bit OS code, but I will keep at my efforts with as open a mind as possible."


You absolutely, definitely should NOT run with L2 caches disabled. They do help, whatever you're doing, and if you turn the L2 cache off, you'll definitely lose between 50 and 90% of the system performance, depending on how many of your data/code accesses miss in the cache when the L2 cache is enabled. There are very few special cases where caches make the system run slower (one is copying HUGE amounts of data, more than half the L2 cache size or so, from one location in memory to another without actually processing the data in between). Since none of these known "slower with cache than without" cases is what DC applications do most of the time, we'll ignore that special case.
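A back-of-the-envelope way to see the cost of disabling L2 is the textbook average memory access time (AMAT) formula. The latencies and hit rates below are assumed round numbers for illustration, not measurements from any machine in this thread, and `amat` is a hypothetical helper.

```python
def amat(l1_hit_rate, l2_hit_rate, l1_cycles=3, l2_cycles=12,
         mem_cycles=200, l2_enabled=True):
    """Average memory access time in CPU cycles for a two-level cache."""
    if l2_enabled:
        # An L1 miss pays for the L2 lookup, plus main memory if L2 also misses.
        l1_miss_penalty = l2_cycles + (1 - l2_hit_rate) * mem_cycles
    else:
        # With L2 disabled, every L1 miss goes straight to main memory.
        l1_miss_penalty = mem_cycles
    return l1_cycles + (1 - l1_hit_rate) * l1_miss_penalty


with_l2 = amat(0.95, 0.90)                        # 4.6 cycles per access
without_l2 = amat(0.95, 0.90, l2_enabled=False)   # 13.0 cycles per access
```

Under these assumptions each memory access takes almost three times as long without L2. How much of that shows up in total run time depends on how memory-bound the code is.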

AMD's processors have better memory access speeds on an L2 cache miss because there's an internal memory controller. When the processor sees that the data is not in the L1 cache, it looks in the L2 cache and, at the same time, asks the memory controller to start reading the data from memory. If the L2 cache has the data, it tells the memory controller "Oh, found it, you can go on to something else", but if it misses in the L2 cache, it's already 10 or so cycles into the actual memory fetch. That's 10 cycles at 2+GHz, so not memory bus cycles, of course...
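The overlap described above can be put into numbers with a small sketch. The 10-cycle L2 lookup and 2 GHz clock come from the post; the 200-cycle memory latency is an assumed figure, and both helpers are hypothetical.

```python
def miss_latency(l2_cycles=10, mem_cycles=200, overlapped=True):
    """Cycles to fetch data from memory after an access misses in L1 and L2."""
    if overlapped:
        # The memory request starts alongside the L2 lookup,
        # so the lookup time is hidden behind the memory fetch.
        return max(l2_cycles, mem_cycles)
    # Otherwise the memory request only starts once the L2 miss is known.
    return l2_cycles + mem_cycles


def cycles_to_ns(cycles, clock_ghz):
    """Convert CPU cycles to nanoseconds at the given clock rate."""
    return cycles / clock_ghz


saved_cycles = miss_latency(overlapped=False) - miss_latency(overlapped=True)
saved_ns = cycles_to_ns(saved_cycles, 2.0)   # 10 cycles at 2 GHz is 5 ns
```

The saving per L2 miss is small, but it applies to every miss, so it matters most for code whose data does not fit in the cache.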

--
Mats

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 30221 - Posted: 29 Oct 2006, 9:59:02 UTC

I believe there will not be much choice as to which VMware you get, since the HPs come with VMware ESX ;-)
http://h18004.www1.hp.com/products/servers/software/vmware/vmwarecert-bl.html




Team mauisun.org

jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 30260 - Posted: 30 Oct 2006, 0:47:25 UTC

Oh.. I guess I could run Microsoft's Virtual Server. But yes, I am going to run ESX 3.X.

I think I will set up a Linux client on some and Windows on others, just to see which works best. Then I will switch them all to the "best for this instance" client.

netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30290 - Posted: 30 Oct 2006, 14:55:14 UTC - in response to Message 30107.  
Last modified: 30 Oct 2006, 15:10:31 UTC

Mats:
You don't have to be sorry. Grab O'Reilly's book "High Performance Computing" and do a bit more reading. Sometimes L1 is not L1 (in the case of RISC cores emulating x86 instructions instead of running them natively). And try a number-crunching test with L2 disabled, if you have a system that can do it. Yes, the OS response will be painfully slow, but you might get a shock from the crunching!! VMware emulates memory caching per partition as well (at least one version I have does; perhaps this could be a beta and not production, if so, sorry). And until VMware or similar becomes public domain, we are not going to know what they emulate and what they try with native instructions. (They do have a choice on an x86 CPU; emulators on unrelated processors can be easier, since everything has to be emulated!!) Finally, I have one system here that has closely coupled MP cores and more (and larger) caches than anything I have ever seen. While this machine is absolutely the best general data processor per GHz I have, the caches do not seem to help the crunching.

The best-performing cruncher I have is a large-L1 AMD Hammer with a BIOS that does not recognize the processor and misconfigures the L2; it is about 30% better per GHz than anything else. It reports a 128K L2 cache, but it's really missing the L2 and repeating the L1 information. The L1 cache keeps the system together fine and I only get one stage of cache-miss overhead on data.

BTW, I only experiment with *NIXen, no Windows. Your mileage may, of course, vary.





Looking for a team ??? Join BoincSynergy!!


Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 30295 - Posted: 30 Oct 2006, 16:50:26 UTC - in response to Message 30290.  

Haven't got O'Reilly's book at hand at the moment.

There's only one case that I can think of where HPC calculations are worse due to the L2 cache being used, and that is "cache-thrashing". It is a well-known fact that if you walk through a huge amount of memory without actually using it much, not only does the cache not add to the performance, it will also reduce it: every eviction of a cache line causes a write to the memory location it held, plus a read of the new cache line (which is completely unnecessary if you're about to overwrite the rest of that cache line with new data).

This is why you should write code that resembles, for example, the "stream" benchmark, so that it works on a small buffer (pre-load the data, calculate into a temporary buffer) and writes the result out to memory using uncached writes (movnt* instructions). Many times this will improve performance by around 40% compared to well-written original code, but of course only if the data set is bigger than about half the L2 cache size.
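A rough traffic count shows where a gain of that size can come from. The sketch assumes a write-allocate cache, a plain copy of data far larger than the cache, and a run limited purely by memory bandwidth. `copy_traffic_bytes` is a hypothetical helper, not part of any benchmark.

```python
def copy_traffic_bytes(n_bytes, non_temporal=False):
    """Memory-bus bytes moved when copying n_bytes that do not fit in cache."""
    read_source = n_bytes
    write_destination = n_bytes
    # With ordinary cached stores, each destination line is first read into
    # the cache (write-allocate) even though it is about to be overwritten.
    # Non-temporal stores (movnt*) skip that read.
    read_destination = 0 if non_temporal else n_bytes
    return read_source + read_destination + write_destination


n = 64 * 1024 * 1024
cached = copy_traffic_bytes(n)                        # 3 * n bytes on the bus
streaming = copy_traffic_bytes(n, non_temporal=True)  # 2 * n bytes on the bus
speedup_if_bandwidth_bound = cached / streaming       # 1.5
```

In this idealised model the uncached version moves a third less data, for up to 1.5 times the throughput, which is in the same ballpark as the roughly 40% quoted above.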

However, for the application we're discussing here, Rosetta, L2 cache is definitely worth having, as it operates almost without misses if you have a decent L2 cache. This is evidenced by the fact that processors of the same speed with a bigger L2 cache (Athlon64 vs. Sempron, for example) get better credit under the new credit system.

I very much doubt that the OS will differ much here - I use both Linux and Windows and I've written high-performance code for both, and I've not noticed any big differences (except that you use different compilers that produce different code and thus are more or less good at any particular problem - both have strong and weak points).

I'll see if I can make one of my systems disable the L2 cache somehow - it would be fun for a play...

--
Mats

netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30298 - Posted: 30 Oct 2006, 18:30:40 UTC - in response to Message 30295.  
Last modified: 30 Oct 2006, 18:33:01 UTC

Well.... My comments were very generalized and not Rosetta specific...

Now ... I am wondering why my Sempron 3000+ (misconfigured L2) gets better credit than my Athlon-64 3700+ ... by quite a bit... And that is with Rosetta...

Perhaps I am fighting a classical education here??... My training is in hardware anyway...

I will take what you said under advisement and see if any of it plays out in the future.. I would rather not shut my external caches off if I can help it..

BTW.. I checked the emulated BIOS from all my VMWARE versions and they all have the CACHE emulation in them...





Looking for a team ??? Join BoincSynergy!!


netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30313 - Posted: 30 Oct 2006, 20:12:21 UTC
Last modified: 30 Oct 2006, 20:13:48 UTC

Mats:

See if you can make sense of this one... I would be curious as to what you think... And no, neither machine is the odd Sempron...

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2495#30184


Looking for a team ??? Join BoincSynergy!!


Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 30372 - Posted: 31 Oct 2006, 15:32:27 UTC

L2-cache off result: 6.8 credits per hour.
Average for 40 samples (including the above result): 13.02.
In the last few days I also got another low result, so it may just be "bad luck".
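Since the 13.02 average includes the L2-off result itself, the comparison is slightly fairer against the mean of the other 39 samples. A small worked calculation using only the figures above (`mean_without` is a hypothetical helper name):

```python
def mean_without(sample, mean_all, n_all):
    """Mean of the other n_all - 1 samples, given the overall mean and one sample."""
    return (mean_all * n_all - sample) / (n_all - 1)


others = mean_without(6.8, 13.02, 40)   # about 13.18 credits per hour
ratio = 6.8 / others                    # about 0.52
```

So the single L2-off result is roughly half the typical rate. One sample is not much to go on, given that a similar low result has also turned up with the cache enabled.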

I confirmed that my hack to disable the L2 cache works by measuring the speed of a memcpy() with varying sizes of data, and it's about half as fast as the corresponding memcpy() on a comparable machine without the L2-disabling hack.
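A rough sketch of that kind of measurement: time a block copy at several buffer sizes and compare throughput. The original test would have been a C loop around memcpy(); this version uses a `bytearray` slice copy instead, so interpreter overhead dominates at small sizes and only the trend at larger sizes is meaningful. The function name and the chosen sizes are assumptions.

```python
import time


def copy_throughput(sizes, repeats=20):
    """Return {size_in_bytes: best observed copy rate in bytes per second}."""
    results = {}
    for size in sizes:
        src = bytearray(size)
        dst = bytearray(size)
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            dst[:] = src    # same-length slice assignment copies the whole buffer
            best = min(best, time.perf_counter() - t0)
        # Guard against a zero reading from a coarse timer.
        results[size] = size / best if best > 0 else float("inf")
    return results


if __name__ == "__main__":
    # 4 KB up to 16 MB, so the sweep crosses typical L1 and L2 sizes.
    for size, rate in copy_throughput([2 ** k for k in range(12, 25, 2)]).items():
        print(f"{size:>10} bytes  {rate / 1e9:6.2f} GB/s")
```

On a machine with working caches, throughput usually drops once the buffers outgrow each cache level. With L2 disabled, you would expect the drop to appear as soon as the buffers no longer fit in L1.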

As to me having a "classical education": no, I'm not educated at all, at least not in computer architecture or such. I've just worked closely with processors and such for a long time, on the software side but close to the hardware (often working with the actual hardware/software interface and writing code to exercise some particular hardware interface or feature).

I'll leave this machine running without L2-cache for a little longer, to see what happens.

--
Mats

jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 30376 - Posted: 31 Oct 2006, 16:46:58 UTC

I would like to close this thread.

Thanks.....

netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 30382 - Posted: 31 Oct 2006, 17:46:01 UTC - in response to Message 30372.  

The classical education was referring to me. I learned microprocessors in the 1980s...

Looking for a team ??? Join BoincSynergy!!


FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 30387 - Posted: 31 Oct 2006, 19:57:37 UTC - in response to Message 30376.  

Jaxom1, hit "Unsubscribe from this thread" at the top and it'll stop you getting emails.
Then you can forget about it unless the mods split the thread up. That is, if you are subscribed.
Team mauisun.org

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30395 - Posted: 31 Oct 2006, 22:29:45 UTC - in response to Message 30376.  

Hi Jaxom1

As you opened the thread, you can declare it closed. Whether people will listen is another thing; there is no way to stop the conversation continuing.

This is what you do:

1. add a post to the thread (traditionally saying "thread closed").

2. on this posting, click Edit this post. This must be done within one hour.

3. As you are the owner of the thread, you not only get the chance to change the post, you can also change the thread title. People often put "Thread closed" in the title to make it clear.

If after all that the conversation continues, just ignore it.

Or, even easier, just walk away and ignore the thread and let it continue on its own under the current title. That is up to you.

R~~