64-Bit Rosetta?

Message boards : Number crunching : 64-Bit Rosetta?

Profile XeNO

Joined: 21 Jan 06
Posts: 9
Credit: 109,466
RAC: 0
Message 15242 - Posted: 2 May 2006, 3:58:12 UTC

Are there any plans for a version of Rosetta that will take advantage of the huge performance benefit in using 64-bit processing? I can only think that good things would come of the project, especially since your AMD users would get the full bang for their buck if they're running a 64-bit OS.


ID: 15242 · Rating: 0
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15244 - Posted: 2 May 2006, 4:07:45 UTC - in response to Message 15242.  

Are there any plans for a version of Rosetta that will take advantage of the huge performance benefit in using 64-bit processing? I can only think that good things would come of the project, especially since your AMD users would get the full bang for their buck if they're running a 64-bit OS.


At this time the project is preparing for the CASP7 competition, which starts in 1 week and runs all summer. There are no plans that I am aware of to implement a 64-bit version during that time. If you use the search function for the forums, you will find this issue discussed in previous threads.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15244 · Rating: -1
Ethan
Volunteer moderator

Joined: 22 Aug 05
Posts: 286
Credit: 9,304,700
RAC: 0
Message 15245 - Posted: 2 May 2006, 4:10:38 UTC - in response to Message 15242.  

I'm only going off an old post about R@H's code, but they said most calculations are 'single precision floats'... which I take to mean 32-bit floating point operations. If their code is 32-bit, and they aren't having problems with rounding errors (I forget the geek term), then does a 64-bit .exe gain the project anything other than >4 GB memory allocation? (Try putting that on the hardware requirements page. :)

-E
ID: 15245 · Rating: 0
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15246 - Posted: 2 May 2006, 4:36:30 UTC - in response to Message 15245.  

I'm only going off an old post about R@H's code, but they said most calculations are 'single precision floats'... which I take to mean 32-bit floating point operations. If their code is 32-bit, and they aren't having problems with rounding errors (I forget the geek term), then does a 64-bit .exe gain the project anything other than >4 GB memory allocation? (Try putting that on the hardware requirements page. :)

-E

Well the project did some testing on this a while back and posted a news item about it. But I can't find the thing to link it here. In any case a full site search for "64 Bit" will provide a lot of reading on this subject going back over 7 or 8 months.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15246 · Rating: -3
Profile Leonard Kevin Mcguire Jr.

Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19022 - Posted: 20 Jun 2006, 23:46:48 UTC
Last modified: 21 Jun 2006, 0:05:32 UTC




You know, I still can not understand how everyone rants and raves about why not make a 64-bit version, yet the only reply is: can we really use quad-word registers and over 4 gigabytes of memory?

It's almost like there is a stigma associated with a 64-bit processor, as if it were only good for having 32 extra bits of processing power.

Does anyone know, or has anyone who did know forgotten, that a 64-bit chip has 8 more general-purpose registers? This means almost all function calls that use integral data types can make their calls without placing arguments onto the stack.

It also means fewer RAM reads/writes, and any value in a register has almost instant access for read/write by the processor.

A 32-bit processor can provide 32 bytes at most, using all its general-purpose registers.

A 64-bit processor can provide 64 bytes in the new registers with no loss of flexibility due to register usage, plus the original 8 registers, which now also provide 64 bytes of total space.

That's 32 bytes vs. 128 bytes, four times larger, which makes it irrelevant that Rosetta@Home only uses single-precision 32-bit floating point values and less than 2 gigabytes of memory. Who cares? That has to be a performance gain.

I know there have been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that an application got slower? Who did the benchmarking? I want to see the source code, because something was not done correctly! It should at least equal the 32-bit version, and never fall to half the speed, even if the port does not take advantage of any 64-bit features! That's just plain and simple logic. Someone is talking a lot of bull. At least post some references to some technical information about why it was slower. If you do not know why, you do not know whether the porting is being done correctly.

I checked ralph@home and did a quick search of the site using Google, and only found a small remark about releasing the source code, and that was it?
ID: 19022 · Rating: 0
Profile Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 19038 - Posted: 21 Jun 2006, 7:46:00 UTC

The way a 64-bit app can actually be slower than a 32-bit app is rooted in memory access. Since instructions and data are processed in 64-bit chunks, twice as much memory access can be needed per cycle. In most cases the efficiency of the operation more than makes up for the memory overhead.

I don't think there is a stigma against 64-bit processors; however, there is a constant battle to convince less technically inclined people that double the bits does not mean double the performance. The floating point unit is unchanged in the current 64-bit processors compared to 32-bit processors. Since most of the processing takes place in the FPU, the only place for gain is in more efficient feeding of the FPU.
BOINC WIKI

BOINCing since 2002/12/8
ID: 19038 · Rating: 0
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 19045 - Posted: 21 Jun 2006, 8:53:06 UTC

Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client.

It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either.


ID: 19045 · Rating: 0
skutnar

Joined: 31 Oct 05
Posts: 8
Credit: 133,153
RAC: 0
Message 19055 - Posted: 21 Jun 2006, 12:31:15 UTC - in response to Message 19022.  



You know, I still can not understand how everyone rants and raves about why not make a 64-bit version, yet the only reply is: can we really use quad-word registers and over 4 gigabytes of memory?

It's almost like there is a stigma associated with a 64-bit processor, as if it were only good for having 32 extra bits of processing power.

Does anyone know, or has anyone who did know forgotten, that a 64-bit chip has 8 more general-purpose registers? This means almost all function calls that use integral data types can make their calls without placing arguments onto the stack.

It also means fewer RAM reads/writes, and any value in a register has almost instant access for read/write by the processor.

A 32-bit processor can provide 32 bytes at most, using all its general-purpose registers.

A 64-bit processor can provide 64 bytes in the new registers with no loss of flexibility due to register usage, plus the original 8 registers, which now also provide 64 bytes of total space.

That's 32 bytes vs. 128 bytes, four times larger, which makes it irrelevant that Rosetta@Home only uses single-precision 32-bit floating point values and less than 2 gigabytes of memory. Who cares? That has to be a performance gain.

I know there have been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that an application got slower? Who did the benchmarking? I want to see the source code, because something was not done correctly! It should at least equal the 32-bit version, and never fall to half the speed, even if the port does not take advantage of any 64-bit features! That's just plain and simple logic. Someone is talking a lot of bull. At least post some references to some technical information about why it was slower. If you do not know why, you do not know whether the porting is being done correctly.

I checked ralph@home and did a quick search of the site using Google, and only found a small remark about releasing the source code, and that was it?


From a general standpoint, the "bitness" of the application does not equate to performance improvement or degradation. What really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will not be faster or slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture, can tell whether an application will see significant improvements.
ID: 19055 · Rating: 0
Profile dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 19077 - Posted: 21 Jun 2006, 19:59:14 UTC - in response to Message 19022.  


I know there have been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that an application got slower? Who did the benchmarking? I want to see the source code, because something was not done correctly! It should at least equal the 32-bit version, and never fall to half the speed, even if the port does not take advantage of any 64-bit features! That's just plain and simple logic. Someone is talking a lot of bull. At least post some references to some technical information about why it was slower. If you do not know why, you do not know whether the porting is being done correctly.

I checked ralph@home and did a quick search of the site using Google, and only found a small remark about releasing the source code, and that was it?


As others have said, more bits does not equal more speed.

If you have a project that does a ton of pointer dereferencing (E@H???), it's quite likely to slow down on a 64-bit system, because each pointer access that isn't in D$ (the data cache) means twice the memory bandwidth to fetch the data.

I'm sufficiently long in the tooth that I can remember the exact same story 15 years or so ago, when we jumped from 16 to 32 bits. Everyone figured that 32 bits was guaranteed to fly, and there were quite a few red faces when 16-bit versions ran faster because they had a smaller memory footprint.
ID: 19077 · Rating: 0
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 19162 - Posted: 23 Jun 2006, 14:33:16 UTC

As someone who has worked on the transition from 32-bit to 64-bit processors for some time (since before the Athlon64/Opteron was released to the public), I can tell you:

You're _ALL_ correct.

An application that actively makes use of the upper 32 bits of the widened registers will benefit from 64-bit registers. Most current 32-bit applications do not automatically do this, simply because most code just stores numbers in a variable, and if that variable is just BIG ENOUGH (16, 32 or 64 bits) the code works just fine; bigger variables (or registers) don't add anything.

The x86_64 architecture also adds further registers, and this DOES help in quite a few cases, because the compiler can avoid storing things in memory, and just keep the same thing in a register for a longer period of time. It also allows more effective passing of arguments from one function to another.

However, bigger registers can also be a drawback - particularly for pointers. A pointer is now 8 bytes (64 bits) instead of 4 bytes (32 bits), so the cache can hold only half as many pointers. So for pointer-intensive code (such as binary trees or linked lists) where lots of pointers are being read/written, the cache becomes full much sooner.

Finally, there was the comment that code size makes a difference, and yes, it does. However, at least for x86_64, the code size for a 64-bit application isn't much larger than the same application compiled for 32-bit with the same (or a very similar) compiler. This is because most of the code in a 64-bit app will actually use 32-bit integers, and 32-bit operations are exactly identical in the 32-bit and 64-bit binary. There will be longer instructions for 64-bit operations, but this is to a large extent compensated for (or even outweighed) by the higher number of registers available, and thus the reduced number of memory read/write operations needed to spill and restore registers when the compiler runs out of actual registers to keep values in.

Of course, all of this is very dependent on the actual application.

I would suspect that the biggest gain for Rosetta would be to compile it to use SSE instructions, rather than to make it 64-bit. This is because it's a primarily floating-point-intensive application, and it uses 32-bit single-precision floats, so SSE would make sense. On AMD processors, it may be that 3DNow! instructions would be even faster - but they are limited by the lack of some operations, and the gain over SSE would be small. The main gain would be from the fact that two operands can be processed at once, whilst the four operands that SSE provides will essentially be forced into a two-step operation - and although two separate operations would be needed for the 3DNow! code to do the same amount of work, the two separate instructions do not need to be contiguous, so the processor can perform them out of order, which can sometimes improve the performance... Marginally, tho'. Having one binary for SSE on AMD and Intel would reduce the number of different code bases to support and bug-fix, so that would be my suggestion.

Note also that very few compilers generate decent code for SSE in actual super-scalar mode. There's been a vectorization project for GCC, but it's still not ready for prime-time use. So it would probably require a bit of hand-coding to get anywhere close to ideal performance.

--
Mats
ID: 19162 · Rating: 0
Profile Leonard Kevin Mcguire Jr.

Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19198 - Posted: 24 Jun 2006, 7:47:48 UTC - in response to Message 19162.  
Last modified: 24 Jun 2006, 7:54:22 UTC


From a general standpoint, the "bitness" of the application does not equate to performance improvement or degradation. What really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will not be faster or slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture, can tell whether an application will see significant improvements.


The claim that the "bitness" of an application does not equate to performance improvement or degradation is incorrect, because you are setting an inflexible rule that can be disproved with:

// Passing two 32-bit values to a function in a single 64-bit register.
mov eax, ecx      // writing a 32-bit register zero-extends into rax
shl rax, 32       // shift the first value into the upper 32 bits
mov edx, edx      // zero-extend the second value, clearing rdx's upper half
or  rax, rdx      // rax now carries both 32-bit values

Just because a register is considered a whole object does not mean you can not manipulate it to store more information while keeping the pieces separate. People think: oh, a 64-bit register... hmm, I will never store a value over the 32-bit limit, so oh well, no use for the extra 32 bits. =


However, bigger registers can also be a drawback - particularly for pointers. A pointer is now 8 bytes (64 bits) instead of 4 bytes (32 bits), so the cache can hold only half as many pointers. So for pointer-intensive code (such as binary trees or linked lists) where lots of pointers are being read/written, the cache becomes full much sooner.


That's why you have 32-bit instructions for pointer access? I mean, duh? Huh?
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
Another rigid rule that will make whoever reads your post think that 64-bit is not better. Or did you just word it wrong? wtf
You don't use a 64-bit pointer if you do not need to. = Now, I bet a lot of people are jumping around going: oh no, you are wrong! OS calls are using 64-bit pointers, and the heap may even allocate things out of reach of a 32-bit pointer. I mean, these are all trivially fixable things. Writing a simple heap algorithm takes only a little time and effort. =


I would suspect that the biggest gain for Rosetta would be to compile it to use SSE instructions, rather than to make it 64-bit. This is because it's a primarily floating-point-intensive application, and it uses 32-bit single-precision floats, so SSE would make sense.

I suspect you have also realized that Rosetta@Home is primarily a data-processing application too, right? Data storage inside the processor makes a big difference.


Note also that very few compilers generate decent code for SSE in actual super-scalar mode. There's been a vectorization project for GCC, but it's still not ready for prime-time use. So it would probably require a bit of hand-coding to get anywhere close to ideal performance.

And this was exactly why I tried to argue about just setting a compiler flag in another post of mine. =|

And by god, this is not the same problem as it was 15 years ago!



ID: 19198 · Rating: 0
Profile Leonard Kevin Mcguire Jr.

Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19242 - Posted: 24 Jun 2006, 21:54:25 UTC
Last modified: 24 Jun 2006, 21:58:49 UTC


Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client.

It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either.



(I am taking a guess, given the statement that 'it was merely compiled in 64 bit.')

Alright, here is another instance of misleading statistics for a 64-bit processor. The real question is WHY this happened, not that it should be automatically accepted that for some reason the 64-bit processor is just slower.

If a program uses the platform's default integer and pointer types, and the compiler produces 64-bit code, some of those types get wider (which ones depends on the platform's data model). I could understand a speed decrease when you start making excess reads/writes and doing processing you do not need.

Just because you have 64 bits does not mean you use them, but it also does not mean you have no use for the extra processing or storage power in other relevant and nearby areas.


In my own opinion, I think some people may post negative and incorrect information to politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more and push me further down the ranks if I did not have a 64-bit machine. =)

ID: 19242 · Rating: 0
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 19255 - Posted: 24 Jun 2006, 23:32:17 UTC - in response to Message 19242.  


Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client.

It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either.



(I am taking a guess, given the statement that 'it was merely compiled in 64 bit.')

Alright, here is another instance of misleading statistics for a 64-bit processor. The real question is WHY this happened, not that it should be automatically accepted that for some reason the 64-bit processor is just slower.

If a program uses the platform's default integer and pointer types, and the compiler produces 64-bit code, some of those types get wider (which ones depends on the platform's data model). I could understand a speed decrease when you start making excess reads/writes and doing processing you do not need.

Just because you have 64 bits does not mean you use them, but it also does not mean you have no use for the extra processing or storage power in other relevant and nearby areas.


In my own opinion, I think some people may post negative and incorrect information to politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more and push me further down the ranks if I did not have a 64-bit machine. =)


Distributed Folding spent around 90% of its time going through pointers, which is the reason the developers were never interested in having it optimized for SSE, 3DNow!, or AltiVec. You've already stated that there is no speed benefit from using 64-bit pointers in such cases, and that the pointers would have to be hard-coded to 32-bit. Where is the benefit of hand optimization to 64-bit (which the developers had neither experience nor time for) going to have come from? The point is not that all 64-bit apps are slower than 32-bit apps, but that 64 bits DOES NOT speed up every application, as the 64-bit fanboys keep claiming.

If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5-year-old Athlon 64 that is still waiting for a justification for installing the copy of 64-bit WinXP I have sitting beside the machine. And if I were using this as a penile enhancement motivation, as you seem to imply, then I could probably move a couple of my systems from another project to enhance my score here.

Let's can the conspiracy theories and move back to the frightening world of reality, shall we? Get hold of the older Rosetta code that's been mentioned; get hold of the 64-bit version of the compiler that the Rosetta team is using; hand-optimize it for 64-bit mode, and then run each client for 24 hours, 3 times, on the same WU. We'll see the variability of the same client on the same hardware running the same WU, and be able to see if there's a dramatic improvement between the 64-bit version and the 32-bit version.
ID: 19255 · Rating: 0
Travis DJ

Joined: 2 May 06
Posts: 10
Credit: 537,572
RAC: 0
Message 19256 - Posted: 24 Jun 2006, 23:45:33 UTC

This is almost over my head... but from what I know:

FPU (x87) registers are 80 bits wide, and there are 8 of them. That is part of why SSE is a better option (than making a pure x64 binary): the 8 SSE registers are 128 bits wide and can hold 4 single-precision values per register, and it wouldn't require a rewrite.

What I am not sure I fully comprehend is how the FPU behaves in x64 mode. I understand that the GPRs in x64 are truly general-purpose registers, as opposed to IA-32's somewhat limited uses for the available registers. At least from Wikipedia, it's said that SSE2 replaces x87 in x64 mode, and there is the choice of 32- or 64-bit precision (I assume that's chosen by the programmer in the code).

So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit to making a 64-bit binary for Rosetta, given what the program does, as SSE/2 can fully accommodate that in 32-bit mode as it is. Right? :)

ID: 19256 · Rating: 0
Profile Leonard Kevin Mcguire Jr.

Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19266 - Posted: 25 Jun 2006, 2:11:03 UTC - in response to Message 19256.  
Last modified: 25 Jun 2006, 2:38:48 UTC


So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit to making a 64-bit binary for Rosetta, given what the program does, as SSE/2 can fully accommodate that in 32-bit mode as it is. Right? :)


The FPU and the GPRs are two separate things. The extra GPRs are a gain that x64 provides; the x87 FPU, to my knowledge, remains almost the same in long mode. The processor stores data in three places, each ordered by speed. The FPU in long mode should behave exactly as it did in protected mode as far as presentation goes; the performance I do not know for sure.

I have also read somewhere that x87 instructions in long mode should be avoided? Has anyone else read this?


1. REGISTERS
2. CACHE
3. MEMORY

x86, or 32-bit, code has 8 general-purpose registers. Programs TRY to keep as much as possible in these registers while processing. However, due to the large number of variables and/or the size of data structures, that becomes impossible. The compiler then generates instructions for swapping data from registers back out to memory/the program stack while it loads other data for current processing. Giving the program access to 8 more general-purpose registers lets the processor avoid swapping as much between its registers and memory. The registers are faster than cache! The cache is almost always controlled by the CPU, to my knowledge, although instructions exist to manipulate it. The AMD64 Athlon has a prefetch instruction that allows more detailed manipulation of the cache, but I do not know much about this.

In the case of large loops that perform many function calls in the core of the loop, a performance gain could be made for functions that are forced to place arguments onto the stack because not enough registers are free, OR the core code can make use of the extra registers and possibly leave the functions' calling convention unchanged.

There should be no worldly reason why any application in the universe could not benefit from x64 in some way, no MATTER HOW SMALL, at the VERY LEAST.
I made those words capital letters for those that can not READ. The conservatives!


You've already stated that there is no speed benefit from using 64-bit pointers in such cases, and that the pointers would have to be hard-coded to 32-bit.

Yes, I did say that.


Where is the benefit of hand optimization to 64-bit (which the developers had neither experience nor time for) going to have come from? The point is not that all 64-bit apps are slower than 32-bit apps, but that 64 bits DOES NOT speed up every application, as the 64-bit fanboys keep claiming.

Now, I'm going to stretch the limits of sanity on what amount of performance gain could be had from using long-mode instructions by telling you that any program will run faster using x64 mode. The easiest way to prove that is to take one function call in the core code of the application that pushes one or more arguments onto the stack, and let it use one of the extra GPRs instead of pushing an argument onto the stack. The AMD64 documentation already specifies that it is faster to perform a move rather than a push or pop.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf Section 3.10.9



And if I were using this as a penile enhancement motivation, as you seem to imply, then I could probably move a couple of my systems from another project to enhance my score here.

I am sorry; even ignoring the penile enhancement motivation part, what you said does not prove my "conspiracy theory" false. What does moving a couple of systems to this project prove? lol


Let's can the conspiracy theories and move back to the frightening world of reality, shall we?

Alright... I am back to reality! Sorry for skipping out. What do you need?


Get hold of the older Rosetta code that's been mentioned; get hold of the 64-bit version of the compiler that the Rosetta team is using; hand-optimize it for 64-bit mode, and then run each client for 24 hours, 3 times, on the same WU. We'll see the variability of the same client on the same hardware running the same WU, and be able to see if there's a dramatic improvement between the 64-bit version and the 32-bit version.

Well, you just told me to come back to reality; now you are saying go back to insanity, because there is a chance you were really out of reality the entire time?

I wanted to clarify that x64, or long mode, is not proposed by me to speed up floating point operations, but it can speed up basic data handling and storage operations, in case some people are becoming confused. =
ID: 19266 · Rating: 0
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 19295 - Posted: 25 Jun 2006, 22:32:37 UTC

Leonard Kevin Mcguire Jr. stated:
In my own opinion, I think some people may post negative and incorrect information to politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more and push me further down the ranks if I did not have a 64-bit machine. =)


I replied
If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5-year-old Athlon 64 that is still waiting for a justification for installing the copy of 64-bit WinXP I have sitting beside the machine. And if I were using this as a penile enhancement motivation, as you seem to imply, then I could probably move a couple of my systems from another project to enhance my score here.


After removing the stated justification for the delusional conspiracy theory (64 bit envy by those that don't have a 64 bit cpu) - he moves on to this statement:

what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol

You imply/claim I posted my knowledge of negative things about 64 bit compilations of DC apps - because I don't have a 64 bit cpu. Clicking on my name will allow you to see that the machine that has crunched my 42k credits on Rosetta IS an Athlon 64 cpu.

If you click here: https://boinc.bakerlab.org/rosetta/result.php?resultid=25537070
you'll notice that I'm running the standard 5.4.9 Boinc client, not an optimized client. If I was worried about people getting 1.5 to 2.0 times my score for the same performance - I could run an optimized Boinc client. Also, keep in mind that at present, even if you have a Rosetta, SETI, etc. client that produces 10 or 100 times the amount of work of the default Rosetta/SETI client, you'll get the exact same amount of credit as someone else who spent the same amount of time with the same Boinc client (with identical overall benchmarks).

Your opinion about my motivations in posting my knowledge of a 64 bit compilation of a DC app is quite obviously erroneous.


BennyRop stated:
Get ahold of the older Rosetta code that's been mentioned; get ahold of the 64 bit version of the compiler that the Rosetta team is using; hand optimize it for 64 bit mode, and then run each client for 24 hours 3 times on the same WU. We'll see the variability of the same client on the same hardware running the same WU - and be able to see if there's a dramatic improvement between the 64 bit version and the 32 bit version.


Leonard the 64 bit Fanboy replied:
[quote]Well. You just told me to come back to reality, now you are saying go back to insanity because there is a chance you were really out of reality the entire time?[/quote]

My statement, "Let's can the conspiracy theories - and move back to the frightening world of reality, shall we?" was in regard to ... your ridiculous conspiracy theory.

On the off chance that your 64 bit optimization skills are better than your ability to find logical, reality-based justifications for a conspiracy theory - I challenged you to show what can be done for performance by hand optimizing an older Rosetta client for 64 bit code.

Back when WinNT 3.0 came out as a $69 package with the SDK, I tried it out. The example fractal screen saver took over a minute to create the first image on my system. By hand coding the inner loop of the program in assembler that made use of my math coprocessor, I managed to get the program to create the first image in, say, 30 seconds. Compiling the original code with the compiler optimization flags turned on got the program to within 2 seconds of my hand coded assembler version. (The compiler optimizations were probably faster than my version.. although I don't remember.) Regardless of which was faster - my coding was a waste of effort for that project. Good for education, but a waste of effort, especially as I never used that screensaver again. :)

The point being that while it is possible to hand optimize a client and possibly improve the performance of that client - the effort in maintaining that client may not be justified by the performance gain. In which case, Rosetta would be better off putting 32 bit clients in the 64 bit client directories.

Keep in mind that I'm not claiming that the Rosetta client's code mix is identical or even similar to DF's. Prove that there's more than a 10% performance increase in moving to a 64 bit client and get Rosetta to produce a 64 bit client from the current code, and I'll start dual booting again.
ID: 19295 · Rating: 0
Profile Leonard Kevin Mcguire Jr.

Send message
Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19298 - Posted: 25 Jun 2006, 22:51:41 UTC
Last modified: 25 Jun 2006, 22:56:05 UTC


The point being that while it is possible to hand optimize a client and possibly improve the performance of that client - the effort in maintaining that client may not be justified by the performance gain. In which case, Rosetta would be better off putting 32 bit clients in the 64 bit client directories.


Yes it is possible to hand optimize a client, and possibly improve the performance. Yes, the effort may not be justified by the performance gain. In which case Rosetta would be better off putting 32 bit clients in 64 bit directories. I agree.


Prove that there's more than a 10% performance increase in moving to a 64 bit client and get Rosetta to produce a 64 bit client from the current code, and I'll start dual booting again.

No problem. You are welcome to set whatever threshold of performance increase is needed to deem it worthy of use. =)


Regardless of which was faster - my coding was a waste of effort for that project. Good for education, but a waste of effort, especially as I never used that screensaver again. :)

The screensaver and Rosetta@Home are two very different projects. The first was for fun, and I have no idea what role it played in advancing science.


It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either.

I still remember this quote. If you understood the implications of merely compiling something in 64 bit that was designed for a 32 bit environment, why did you even post to this thread, when you knew the entire topic is whether a 64 bit build could have performance improvements worthy of use? That's why it seemed like a political ploy: a reader who is not technically minded would just rummage through and go, "Oh.. Hmm. So 64 bit is not better." And, of course, above that you gave information relating to the statistics. That sounded kind of like a news channel releasing a presidential approval rating (yet they really got their statistics/percentages from a biased group).

The Original BennyRop Post:

Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client.

It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either.


No, DUH! Recompile for 64 bit and every pointer doubles in size, increasing the cache needed. You should work for a news channel! =)

ID: 19298 · Rating: 0
skutnar

Send message
Joined: 31 Oct 05
Posts: 8
Credit: 133,153
RAC: 0
Message 19303 - Posted: 26 Jun 2006, 3:26:18 UTC - in response to Message 19198.  


From a general standpoint, the "bitness" of the application does not equate to performance improvement or degradation. The thing that really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will be neither faster nor slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture, can tell if an application will see significant improvements.


The claim that the "bitness" of an application does not equate to performance improvement or degradation is incorrect, because you are setting an inflexible rule that can be proved incorrect with:

// I want to pass two 32-bit values to a function in one 64-bit register.
mov eax, ecx    // writing a 32-bit register zero-extends: rax = ecx
shl rax, 32     // move the first value into the upper 32 bits
mov edx, edx    // zero-extend edx into rdx (clears rdx's upper half)
or  rax, rdx    // combine: upper half = ecx, lower half = edx

Just because a register is considered a whole object does not mean you cannot manipulate it to store more information while keeping the pieces separate. People think, "Oh, a 64 bit register... hmm, I will never store a value over the 32 bit limit, oh well, no use for the extra 32 bits." =)


I think you either misread or misunderstood my statement. I'm saying that, in general, there is no direct correlation between the bitness of an application and/or of a CPU and the performance seen. I don't understand how that can be considered an "inflexible rule".
ID: 19303 · Rating: 0
tralala

Send message
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 19317 - Posted: 26 Jun 2006, 12:42:15 UTC

Nice discussion! ;-)

Would any of you be willing to actually have a look at the Rosetta source and make an educated guess whether optimizing for 64 bit (and for SSE etc.) could lead to a significant performance gain? That might soon be possible. If so, please email me: joachim@iwanuschka.de

regards

Joachim
ID: 19317 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org