64-Bit Rosetta?

Author	Message
XeNO Send message Joined: 21 Jan 06 Posts: 9 Credit: 109,466 RAC: 0	Message 15242 - Posted: 2 May 2006, 3:58:12 UTC Are there any plans for a version of Rosetta that will take advantage of the huge performance benefit in using 64-bit processing? I can only think that good things would come of the project, especially since your AMD users would get the full bang for their buck if they're running a 64-bit OS. ID: 15242 · Rating: 0 · rate: / Reply Quote

Ethan Volunteer moderator Send message Joined: 22 Aug 05 Posts: 286 Credit: 9,304,700 RAC: 0	Message 15245 - Posted: 2 May 2006, 4:10:38 UTC - in response to Message 15242. I'm only going off an old post about R@H's code, but they said most calculations are 'single precision floats'. . . which I take to mean 32-bit floating point operations. If their code is 32 bit, and they aren't having problems with rounding errors (I forget the geek term). . then does a 64 bit .exe gain the project anything other than >4gb memory allocation (try putting that on the hardware requirement page :) -E ID: 15245 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19022 - Posted: 20 Jun 2006, 23:46:48 UTC Last modified: 21 Jun 2006, 0:05:32 UTC You know I still can not understand how everyone rants and raves, about why not make a 64bit version. Yet the only reply is can we really use quad word registers, and over 4 gigabytes of memory. Its almost like there is a stigma associated with a 64bit processor, only being good for having 32 extra bits for processing power? Does anyone know, or has anyone who did know forgotton that a 64 chip has 8 more general purposes registers. This means almost all functions calls that use integral data types can make their calls with out placing arguments onto the stack. It also means fewer RAM reads/writes, and any value in a register has almost instant access for read/write by the processor. A 32bit processor can provide 32 bytes at most, using all its general purpose registers. A 64bit processor can provide 64 bytes with no loss of flexibility due to register usage, plus the original 8 registers that also provide 64 bytes of total space. Thats 32 bytes vs 128 bytes, thats four times larger, while making irelavant the fact that Rosetta@Home only uses single percision 32 bit floating point values, and less than 2 gigabytes of memory.- who cares? That has to be a performance gain. I know there has been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that a application got slower? Who did the benchmarking, and I want to see the source code because something was not done correctly! It should at least equal the 32 bit version, but never fall to half the speed even if the port does not take advantage of any 64 bit features! Thats just plain and simple logic.. Someone is talking alot of bull. I mean at least post some references to some technical information about why it was slower. If you do not know why you do not know if it is being done correctly(porting). I checked ralph@home, and did a quick search of the site using google, and this site and only found a small remark about releasing the source code and that was it? ID: 19022 · Rating: 0 · rate: / Reply Quote

Keck_Komputers Send message Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0	Message 19038 - Posted: 21 Jun 2006, 7:46:00 UTC The way a 64bit app can actually be slower than a 32bit app is rooted in memory access. Since instructions and data are processed in 64bit chunks that means twice as much memory access is needed per cycle. In most cases the efficiency of the operation more than makes up for memory overhead. I don't think there is a stigma against 64bit processors, however there is a constant battle to convince less techniclly inclined people that double the bits does not mean double the performance. The floating point unit is unchanged in the current 64bit processors compared to 32bit processors. Since most of the processing takes place in the FPU the only place for gain is in more effcient feeding of the FPU. BOINC WIKI BOINCing since 2002/12/8 ID: 19038 · Rating: 0 · rate: / Reply Quote

BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 19045 - Posted: 21 Jun 2006, 8:53:06 UTC Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client. It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either. ID: 19045 · Rating: 0 · rate: / Reply Quote

skutnar Send message Joined: 31 Oct 05 Posts: 8 Credit: 133,153 RAC: 0	Message 19055 - Posted: 21 Jun 2006, 12:31:15 UTC - in response to Message 19022. You know I still can not understand how everyone rants and raves, about why not make a 64bit version. Yet the only reply is can we really use quad word registers, and over 4 gigabytes of memory. Its almost like there is a stigma associated with a 64bit processor, only being good for having 32 extra bits for processing power? Does anyone know, or has anyone who did know forgotton that a 64 chip has 8 more general purposes registers. This means almost all functions calls that use integral data types can make their calls with out placing arguments onto the stack. It also means fewer RAM reads/writes, and any value in a register has almost instant access for read/write by the processor. A 32bit processor can provide 32 bytes at most, using all its general purpose registers. A 64bit processor can provide 64 bytes with no loss of flexibility due to register usage, plus the original 8 registers that also provide 64 bytes of total space. Thats 32 bytes vs 128 bytes, thats four times larger, while making irelavant the fact that Rosetta@Home only uses single percision 32 bit floating point values, and less than 2 gigabytes of memory.- who cares? That has to be a performance gain. I know there has been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that a application got slower? Who did the benchmarking, and I want to see the source code because something was not done correctly! It should at least equal the 32 bit version, but never fall to half the speed even if the port does not take advantage of any 64 bit features! Thats just plain and simple logic.. Someone is talking alot of bull. I mean at least post some references to some technical information about why it was slower. If you do not know why you do not know if it is being done correctly(porting). I checked ralph@home, and did a quick search of the site using google, and this site and only found a small remark about releasing the source code and that was it? From a general standpoint, the "bitness" of the application does not equate to performance improvement or degredation. The thing that really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will not be faster or slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture can tell if an application will see significant improvements. ID: 19055 · Rating: 0 · rate: / Reply Quote

dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 19077 - Posted: 21 Jun 2006, 19:59:14 UTC - in response to Message 19022. I know there has been lots of discussions regarding this, and even the fact that some say there is no performance gain. I have even read that a application got slower? Who did the benchmarking, and I want to see the source code because something was not done correctly! It should at least equal the 32 bit version, but never fall to half the speed even if the port does not take advantage of any 64 bit features! Thats just plain and simple logic.. Someone is talking alot of bull. I mean at least post some references to some technical information about why it was slower. If you do not know why you do not know if it is being done correctly(porting). I checked ralph@home, and did a quick search of the site using google, and this site and only found a small remark about releasing the source code and that was it? As other's have said, more bits does not equal more speed. If you have a project that does a ton of pointer dereferencing (E@H ???) it's quite likely to slow down on a 64 bit system, because each pointer access that isn't in D$ means twice the memory bandwidth to fetch the data. I'm sufficiently long in the tooth that I can remember the exact same story, 15 years ago or so when we jumped from 16 to 32 bit. Everyone figured that 32 bits was guaranteed to fly, and there were quite a few red faces when 16 bit versions ran faster because they had a smaller memory footprint. ID: 19077 · Rating: 0 · rate: / Reply Quote

Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0	Message 19162 - Posted: 23 Jun 2006, 14:33:16 UTC As someone who has worked on the transition from 32-bit to 64-bit processors for some time (like since before the Athlon64/Opteron was released to the public). You're _ALL_ correct. An application that actively makes use of the further 32-bits in the register layout will benefit from 64-bit registers. Most current 32-bit applications do not automatically do this... Simply because most code just stores numbers in a variable, and if that variable is just BIG ENOUGH (16, 32 or 64-bit) the code works just fine, and bigger variables (or registers) don't add anything. The x86_64 architecture also adds further registers, and this DOES help in quite a few cases, because the compiler can avoid storing things in memory, and just keep the same thing in a register for a longer period of time. It also allows more effective passing of arguments from one function to another. However, bigger registers can also be a drawback - particularly for pointers. A pointer is now 8 bytes (64-bit) instead of 4 bytes (32-bit), so the cache can hold only half as many pointers. So for pointer-intensive code (such as binary trees or linked lists) where lots of pointers are being read/written, the cache becomes fulll much sooner. Finally, there was the comment that code-size makes a difference, and yes it does. However, at least for x86_64, the code-size for a 64-bit application isn't much larger than the same application in 32-bit compiled with the same (or very similar) compiler. This is because most of the code in a 64-bit app will actually use 32-bit integers, and 32-bit operations are exactly identical in the 32-bit and 64-bit binary. There will be longer instructions for 64-bit operations, but this is to a large extent compensated (or even gained over) by the higher number of registers available and thus the reduced amount of memory read/write operations needed to store/restore the registers when the compiler runs out of actual registers to keep values in. Of course, all of this is very dependand on the actual application. I would suspect that the biggest gain for Rosetta would be to compile it to use SSE instructions, rather than to make it 64-bit. This is because it's a primarly floating point-intensive application, and it uses 32-bit single precision floats, so SSE would make sense. On AMD processors, it may be that 3DNow! instructions would be even faster - but they are limited by the lack of some operations, and the gain over SSE would be small - the main gain would be from the fact that two operands can be processed at once, whilst the four operands that SSE provides will essentially be forced into a two-step operation - and although for the same amount of work, two separate operations would be needed for the 3DNow! code, the two separate instructions do not need to be contiguous, so the processor can perform them out-of-order, which can sometimes improve the performance... Marginally, tho'. Having one binary for SSE on AMD and Intel would reduce the amount of different code-bases to support and bug-fix, so that would be my suggestion. Note also that very few compilers generate decent code for SSE in actual super-scalar mode. There's been a vectorization project for GCC, but it's still not ready for prime-time use. So it would probably require a bit of hand-coding to get anywhere close to ideal performance. -- Mats ID: 19162 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19198 - Posted: 24 Jun 2006, 7:47:48 UTC - in response to Message 19162. Last modified: 24 Jun 2006, 7:54:22 UTC From a general standpoint, the "bitness" of the application does not equate to performance improvement or degredation. The thing that really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will not be faster or slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture can tell if an application will see significant improvements. the "bitness" of a application does not equate to performance improvment or degredation - is incorrect, because you are setting a inflexible rule that can be proved incorrect with: // Passing values to a function. mov rax, [offset FunctionArg2] mov eax, [offset FunctionArg1] // I want to pass two values to a function. mov rax, ecx shl rax, 32 mov eax, edx Just because a register is considered a whole object does not mean you can not manipulate it to store more information while keeping the information seperate. People, think oh a 64 bit register... hmm I will never store a value over a 32 bit limit oh well no use for the extra 32 bits.. = However, bigger registers can also be a drawback - particularly for pointers. A pointer is now 8 bytes (64-bit) instead of 4 bytes (32-bit), so the cache can hold only half as many pointers. So for pointer-intensive code (such as binary trees or linked lists) where lots of pointers are being read/written, the cache becomes fulll much sooner. Thats why you have 32 bit instructions for pointer access? I mean duh? Huh? http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html Another rigid rule that makes ever who reads your post will think that 64 bit is not better, or did you just word it wrong? wtf You don't use a 64 bit pointer if you do not need to.. = Now, I bet alot of people are jumping around going oh no, you are wrong! OS calls are using 64bit pointers, and and the heap may even allocate things out of reach by a 32bit pointer.. I mean this is all trival fixable things. Writting a simple heap arithogram takes long a little time and effort.. = I would suspect that the biggest gain for Rosetta would be to compile it to use SSE instructions, rather than to make it 64-bit. This is because it's a primarly floating point-intensive application, and it uses 32-bit single precision floats, so SSE would make sense. I suspect you have also realized that Rosetta@Home is also a primarly data processing application too right? Data storage inside the processor, makes a big difference. ote also that very few compilers generate decent code for SSE in actual super-scalar mode. There's been a vectorization project for GCC, but it's still not ready for prime-time use. So it would probably require a bit of hand-coding to get anywhere close to ideal performance. And, this was exactly why I tried to argue about just setting a compiler flag in another post of mine. =\| And by god, this is not the same problem as it was 15 years ago! ID: 19198 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19242 - Posted: 24 Jun 2006, 21:54:25 UTC Last modified: 24 Jun 2006, 21:58:49 UTC Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client. It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either. (I am taking a guess due to the fact it is stated - 'it was merely compiled in 64 bit.') Alright, here is another indication of incorrect statistics for a 64 bit processor. The real questions is WHY this happened, not that it is automaticaly accepted that for some reason the 64 bit processor is just slower.. If a program uses the default int type, and the compiler produces 64 bit code the int becomes 64 bit. The int type is platform specific. i could understand a speed decrease when you start making excess reads/write and processing when you do not need it. Just because you have 64 bits does not mean you use it, but it does not also mean you have no uses for the extra processing or storage power in other relavent and close proximity areas. In my own opinion I think some people may post negative and incorrect information to in some way to politicaly push down any benifits/gains that could come from a 64 bit processor. Honestly, I would hate for 64 bit machines to start crunching more, and push me further down the ranks if I did not have a 64 bit machine. =) ID: 19242 · Rating: 0 · rate: / Reply Quote

BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 19255 - Posted: 24 Jun 2006, 23:32:17 UTC - in response to Message 19242. Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client. It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either. (I am taking a guess due to the fact it is stated - 'it was merely compiled in 64 bit.') Alright, here is another indication of incorrect statistics for a 64 bit processor. The real questions is WHY this happened, not that it is automaticaly accepted that for some reason the 64 bit processor is just slower.. If a program uses the default int type, and the compiler produces 64 bit code the int becomes 64 bit. The int type is platform specific. i could understand a speed decrease when you start making excess reads/write and processing when you do not need it. Just because you have 64 bits does not mean you use it, but it does not also mean you have no uses for the extra processing or storage power in other relavent and close proximity areas. In my own opinion I think some people may post negative and incorrect information to in some way to politicaly push down any benifits/gains that could come from a 64 bit processor. Honestly, I would hate for 64 bit machines to start crunching more, and push me further down the ranks if I did not have a 64 bit machine. =) Distributed Folding spent around 90% of it's time going through pointers - and is the reason the developers were never interested in having it optimized for SSE, 3Dnow, or Altivec. You've already stated that there is no speed benefit from using 64 bit pointers in such cases and the pointers would have to be hard coded to 32 bit. Where is the benefit of hand optimization to 64 bit - that the developers had neither experience or time for - going to have come from? The point is not that all 64 bit apps are slower than 32 bit apps - but that 64 bits DOES NOT speed up every application as the 64 bit fanboys keep claiming. If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5 year old Athlon 64 that is still waiting for a justification for installing copy of 64 bit WinXP I have sitting beside the machine. And if I was using this as a penile enhancement motivation as you seem to imply - then I could probably move a couple of my systems from another project to enhance my score here. Let's can the conspiracy theories - and move back to the frightening world of reality, shall we? Get ahold of the older Rosetta code that's been mentioned; get ahold of the 64 bit version of the compiler that the Rosetta team is using; hand optimize it for 64 bit mode, and then run each client for 24 hours 3 times on the same WU. We'll see the variability of the same client on the same hardware running the same WU - and be able to see if there's a dramatic improvement between the 64 bit version and the 32 bit version. ID: 19255 · Rating: 0 · rate: / Reply Quote

Travis DJ Send message Joined: 2 May 06 Posts: 10 Credit: 537,572 RAC: 0	Message 19256 - Posted: 24 Jun 2006, 23:45:33 UTC This is almost over my head..but from what I know.. FPU registers are 80-bit which is why they can store two 32-bit values and there are 8 of them. That also explains why SSE is a better option (than x64) because the 8 SSE registers are 128-bit wide and can hold 4 single precision results per register. What I am not sure I fully comprehend is how the FPU behaves in x64 mode. I understand that the GPRs in x64 are really actually general purpose registers as opposed to IA-32's somewhat limited uses on all the available registers can be used for. At least form Wikipedia, it's said that SSE2 repleaces x87 in x64 mode and there is the choice of 32 or 64 bit precision (I assume that's called by the programmer in the code). So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit from making a 64-bit binary for Rosetta given what the program does as SSE/2 can fully accomodate that in 32-bit mode as it is. Right? :) ID: 19256 · Rating: 0 · rate: / Reply Quote

Travis DJ Send message Joined: 2 May 06 Posts: 10 Credit: 537,572 RAC: 0	Message 19258 - Posted: 24 Jun 2006, 23:45:38 UTC This is almost over my head..but from what I know.. FPU registers are 80-bit which is why they can store two 32-bit values and there are 8 of them. That also explains why SSE is a better option (than making a pure x64 binary) because the 8 SSE registers are 128-bit wide and can hold 4 single precision results per register and it wouldn't require a rewrite. What I am not sure I fully comprehend is how the FPU behaves in x64 mode. I understand that the GPRs in x64 are really actually general purpose registers as opposed to IA-32's somewhat limited uses on all the available registers can be used for. At least form Wikipedia, it's said that SSE2 repleaces x87 in x64 mode and there is the choice of 32 or 64 bit precision (I assume that's called by the programmer in the code). So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit from making a 64-bit binary for Rosetta given what the program does as SSE/2 can fully accomodate that in 32-bit mode as it is. Right? :) ID: 19258 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19266 - Posted: 25 Jun 2006, 2:11:03 UTC - in response to Message 19256. Last modified: 25 Jun 2006, 2:38:48 UTC So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit from making a 64-bit binary for Rosetta given what the program does as SSE/2 can fully accomodate that in 32-bit mode as it is. Right? :) The FPU, and the GPRs are two seperate things. The extra GPRs are a gain that x64 provides, the x87 FPU aparently in my knowledge remains almost the same in long mode. The processor stores data in three ways each ordered in speed. The FPU in long mode should behave exactly as it did in protected mode as far as presentation, the performance I do not know for sure. I have also read somewhere that x87 instructions in long mode should be avoided? Has anyone else read this, even so? 1. REGISTERS 2. CACHE 3. MEMORY X86 or 32bit code has 8 general purposes registers. Programs TRY to keep as much as possible in these registers while it is processing. However, due to the large number of varibles and/or size of data structors it becomes impossible. The compiler then generates instructions for swapping data from registers back out to memory/program-stack while it loads other data for current processing. Giving the program access to 8 more general purposes registers allows the processor to not have to swap as much between its registers and memory. The registers are faster than cache! The cache is almost always controled by the CPU from my knowledge, although instructions exist to manipulate it. The AMD64 Althon has a prefetch instruction somewhere that allows more detailed manipulation of the cache, but I do not know much about this. In the case of large loops that perform many functions calls in the core of the loop, a performance gain could be made for functions that are forced to place arguments onto the stack because not enough registers are free OR core code can make use of registers, and possibly not change the functions calling convention. There should be no worldy reason why no application in the universe could not benifit from x64 in someway no MATTER HOW SMALL, at the VERY LEAST. I made those words in capital letters for those that can not READ. The conservatives! You've already stated that there is no speed benefit from using 64 bit pointers in such cases and the pointers would have to be hard coded to 32 bit. Yes, I did say that. Where is the benefit of hand optimization to 64 bit - that the developers had neither experience or time for - going to have come from? The point is not that all 64 bit apps are slower than 32 bit apps - but that 64 bits DOES NOT speed up every application as the 64 bit fanboys keep claiming. Now, I'm going to stretch the limits of sanity apon what amount of performance gain could be had from using long mode instructions by telling you that any program will run faster using x64 mode, and the most easiest way to prove that is to take one function call in the core code of the application that pushes one or more arguments onto the stack and allow it to use one of the extra GPRs instead of pushing a argument onto the stack. The AMD64 documentation already specifies that it is faster to peform a move rather than a push or pop. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf Section 3.10.9 And if I was using this as a penile enhancement motivation as you seem to imply - then I could probably move a couple of my systems from another project to enhance my score here. I am sorry, even by ignoring the penil enchacement movtivation part, what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol Let's can the conspiracy theories - and move back to the frightening world of reality, shall we? Alright.. I am back to reality! Sorry for skipping out what do you need? Get ahold of the older Rosetta code that's been mentioned; get ahold of the 64 bit version of the compiler that the Rosetta team is using; hand optimize it for 64 bit mode, and then run each client for 24 hours 3 times on the same WU. We'll see the variability of the same client on the same hardware running the same WU - and be able to see if there's a dramatic improvement between the 64 bit version and the 32 bit version. Well. You just told me to come back to reality, now you are saying go back to insanity because there is a chance you were really out of reality the entire time? I wanted to claify that x64 or long mode is not proposed by me to speed up floating point operations, but can speed up basic data handling and storage operations. In case some people are becoming confused. = ID: 19266 · Rating: 0 · rate: / Reply Quote

BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 19295 - Posted: 25 Jun 2006, 22:32:37 UTC Leonard Kevin Mcguire Jr. stated: In my own opinion I think some people may post negative and incorrect information to in some way to politicaly push down any benifits/gains that could come from a 64 bit processor. Honestly, I would hate for 64 bit machines to start crunching more, and push me further down the ranks if I did not have a 64 bit machine. =) I replied If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5 year old Athlon 64 that is still waiting for a justification for installing (the) copy of 64 bit WinXP I have sitting beside the machine. And if I was using this as a penile enhancement motivation as you seem to imply - then I could probably move a couple of my systems from another project to enhance my score here. After removing the stated justification for the stated delusional conspiracy theory (64 bit envy by those that don't have a 64 bit cpu) - he moves on to this statement: what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol You imply/claim I posted my knowledge of negative things about 64 bit compilations of DC apps - because I don't have a 64 bit cpu. Clicking on my name will allow you to see that the machine that has crunched my 42k credits on Rosetta IS an Athlon 64 cpu. If you click here: https://boinc.bakerlab.org/rosetta/result.php?resultid=25537070 you'll notice that I'm running the standard 5.4.9 Boinc client; not an optimized client. If I was worried about people getting 1.5 to 2.0 times my score for the same performance - I could run an optimized Boinc client. Also, keep in mind the fact that at present, even if you have a Rosetta, SETI, etc client that produces 10 or 100 times the amount of work as the default Rosetta/SETI client, you'll get the exact same amount of credit as someone else that spent the same amount of time with the same Boinc client (with identical overall benchmarks). Your opinion about my motivations in posting my knowledge of a 64 bit compilation of a DC app is quite obviously erroneous. BennyRop stated: Get ahold of the older Rosetta code that's been mentioned; get ahold of the 64 bit version of the compiler that the Rosetta team is using; hand optimize it for 64 bit mode, and then run each client for 24 hours 3 times on the same WU. We'll see the variability of the same client on the same hardware running the same WU - and be able to see if there's a dramatic improvement between the 64 bit version and the 32 bit version. Leonard the 64 bit Fanboy replied: [quote]Well. You just told me to come back to reality, now you are saying go back to insanity because there is a chance you were really out of reality the entire time?[quote] My statement, "Let's can the conspiracy theories - and move back to the frightening world of reality, shall we?" was in regards to ... your rediculous conspiracy theory. In the off chance that your 64 bit optimization skills are better than your finding logical justifications based on reality for a conspiracy theory - I challenged you to show what can be done for performance by hand optimizing an older Rosetta client for 64 bit code. Back when WinNT 3.0 came out as a $69 package with the SDK, I tried it out. The example fractal screen saver took over a minute to create the first image on my system. By hand coding the inner loop of the program into assembler that made use of my math co processor, I managed to get the program to create the first image in say.. 30 seconds. Compiling the original code with the compiler optimization flags turned on.. got the program to within 2 seconds of my hand coded assembler version. (The compiler optimizations were probably faster than my version.. although I don't remember.) Regardless of which was faster - my coding was a waste of effort for that project. Good for education, but a waste of effort, especially as I never used that screensaver again. :) The point being that while it is possible to hand optimize a client and possibly improve the performance of that client - the effort in maintaining that client may not be justified by the performance gain. In which case, Rosetta would be better off putting 32 bit clients in the 64 bit client directories. Keep in mind that I'm not claiming that Rosetta's client's code mix is identical or even similar to DF's. Prove that there's more than a 10% performance increase in moving to a 64 bit client and get Rosetta to produce a 64 bit client from the current code, and I'll start dual booting again. ID: 19295 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19298 - Posted: 25 Jun 2006, 22:51:41 UTC Last modified: 25 Jun 2006, 22:56:05 UTC The point being that while it is possible to hand optimize a client and possibly improve the performance of that client - the effort in maintaining that client may not be justified by the performance gain. In which case, Rosetta would be better off putting 32 bit clients in the 64 bit client directories. Yes it is possible to hand optimize a client, and possibly improve the performance. Yes, the effort may not be justified by the performance gain. In which case Rosetta would be better off putting 32 bit clients in 64 bit directories. I agree. Prove that there's more than a 10% performance increase in moving to a 64 bit client and get Rosetta to produce a 64 bit client from the current code, and I'll start dual booting again. No problems. You are welcome to what threshold of a performance increase that is needed to deem it worthy to be used. =) Regardless of which was faster - my coding was a waste of effort for that project. Good for education, but a waste of effort, especially as I never used that screensaver again. :) The screensaver and Rosetta@Home are two distinct goals in science. The first was for fun, and I have no idea what role it played in advancing science. It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either. I still remember this quote. If you understood the implications of merely compiling something in 64 bit that was designed for a 32 bit enviroment. Why did you even post to this thread when you knew the entire topic is if a 64 bit build could have performance improvments worthy of use? Thats why it seemed like a politcal ploy for a reader who is not technicaly minded just to rummage through and go, "Oh.. Hmm. So 64 bit is not better." And, of course above that you gave information relating to the statistics. That sounded kind of like a news channel releasing a presidential approval rating (Yet they really got there statistics/percentages from a biased group.). The Orginal BennyRop Post: Back at DF (Distributed Folding) they released a 64 bit client for a 64 bit flavor of a SUN OS. DF ran at a fairly consistent speed - i.e. 24 hours of crunching would produce roughly the same amount of results on Wednesday as they did on Tuesday. Stats whores tried out the 64 bit client, noticed a dramatic decline in results per day - so gave up and moved back to the 32 bit client. It wasn't optimized for 64 bit.. it was merely compiled in 64 bit. But we don't have optimized Rosetta clients yet, either. No, DUH! Use int and get a 64 bit integer, and increase the cache needed. You should work for a news channel! =) ID: 19298 · Rating: 0 · rate: / Reply Quote

skutnar Send message Joined: 31 Oct 05 Posts: 8 Credit: 133,153 RAC: 0	Message 19303 - Posted: 26 Jun 2006, 3:26:18 UTC - in response to Message 19198. From a general standpoint, the "bitness" of the application does not equate to performance improvement or degredation. The thing that really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will not be faster or slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture can tell if an application will see significant improvements. the "bitness" of a application does not equate to performance improvment or degredation - is incorrect, because you are setting a inflexible rule that can be proved incorrect with: // Passing values to a function. mov rax, [offset FunctionArg2] mov eax, [offset FunctionArg1] // I want to pass two values to a function. mov rax, ecx shl rax, 32 mov eax, edx Just because a register is considered a whole object does not mean you can not manipulate it to store more information while keeping the information seperate. People, think oh a 64 bit register... hmm I will never store a value over a 32 bit limit oh well no use for the extra 32 bits.. = I think you either misread or misunderstood my statement. I'm saying that in general, there is no direct correlation between the bitness of an application and/or of a CPU and the performance seen. I don't understand how that can be considered to be an "inflexible rule". ID: 19303 · Rating: 0 · rate: / Reply Quote

tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0	Message 19317 - Posted: 26 Jun 2006, 12:42:15 UTC Nice discussion! ;-) Would anybody of you be willing to actually have a look on the Rosetta source and make an educated guess whether optimizing for 64 BIT (and for SSE etc.) could lead to a significant performance gain? That might soon be possible. If yes, please email me: joachim@iwanuschka.de regards Joachim ID: 19317 · Rating: 0 · rate: / Reply Quote

Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0	Message 19333 - Posted: 26 Jun 2006, 23:39:04 UTC Last modified: 26 Jun 2006, 23:40:22 UTC I think you either misread or misunderstood my statement. I'm saying that in general, there is no direct correlation between the bitness of an application and/or of a CPU and the performance seen. I don't understand how that can be considered to be an "inflexible rule". It depends on what size scope is considered general. More importantly what criteria would need to be meet to consider something a general "case"? Many general cases exist, by my definition of general, that involve routines that can have a performance gain - and thus this gain no matter how small is still a gain unless we are talking about non-general routines which might be: int main(void) { return (char)1 + (char)1; } Evidently this would yeild no use from 64 BIT. We only need to store one value, but it could be run in a 64 BIT enviroment!, and not use 64 BIT instructions and have a gain as I have read somewhere... let me go find it. mov eax, 1 add eax, 1 ret Is that a general case? I do not know of many useful programs that do something like that. Would this be a correlation? int main(int a, int b, int c, int d, int e, int f, int g, int h, int i, int k) { return a + b + c + d + e + f + g + h + i + k; } add eax, ebx // a=a+b add eax, ecx // a=a+c add eax, edx // a=a+d add eax, r08 // a=a+e add eax, r09 // a=a+f add eax, r10 // a=a+g add eax, r11 // a=a+h add eax, r12 // a=a+i add eax, r13 // a=a+k ret I'm saying that in general, there is no direct correlation between the bitness of an application and/or of a CPU and the performance seen. 1. I'm saying that in general, 2. there is no direct correlation between 2A. the bitness of an application or of a CPU and the performance seen. 2B. the bitness of an application and of a CPU and the performance seen. So for 2A we mean: A application using totaly 32bit instructions VS one using totaly 64bit instructions. (In general? Every application in the world that is considered a general case application? How are we to scientificly define this general?) Quite frankly you don't use storage space you do not need. It would be a waste to use a 64 bit operation on a 32 bit value unless wraparound was a unwanted effect. A application using only 32bit instructions VS one that can use 32bit instructions and 64bit instructions as such is the 64 bit AMD processors. So for 2B we mean: A application that uses only 32bit run on a 32bit processor VS a 32bit application run on a 64 bit processor. The performance should be no less in "general". =) A application that uses 32bit and 64bit or just 64bit run on a 32 bit processor VS a 64 bit processor. It will not work. =) ID: 19333 · Rating: 0 · rate: / Reply Quote

dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 19334 - Posted: 26 Jun 2006, 23:43:56 UTC - in response to Message 19198. Last modified: 27 Jun 2006, 0:26:16 UTC And by god, this is not the same problem as it was 15 years ago! Are you sure? If it's not the same problem, why, pray tell, was NT 4 on the DEC Alpha such a miserable failure? Why was Win 2K 64 bit on IA64 (Itanium) an even worse failure? And no it's not about pointer size. I'm not sure about the Alpha, but I know for a fact that the IA64 offered a "half pointer" (i.e. 32 bit) mode to solve the 64 bit pointer overhead problem. It's because a 64 bit processor does not automatically equate to extra speed, just as 32 bits didn't automaticaly equate to more speed over 16 bits, 15 years ago. I benchmarked programs when porting to IA 64. Not just "throw a 32 bit program at the IA 64 compiler and recompile" but actually rearchitect to make use of the 64 bit nature. Yes, some programs did run faster. Some ran a hell of a lot faster. But a good many programs did not, and some ran slower. This is exactly what BennyRop is trying to tell you. There is no hard and fast rule, and there are some programs that simply cannot be made to run faster by throwing more processor bits at them. Please read the second post here. That's from Keith Davis, author of THINK, the software behind the Find-A-Drug cancer project. The point he's making is that the algorithms in Think simply could not be improved by allowing the compiler to use SSE. And don't for an instant make the mistake of underestimating his ability to optimze. By working with Intel on that code, he got about a 40 to 1 speedup after Think and UD parted company. That doesn't mean that Rosetta can't benefit from SSE (or 64 bits for that matter). But it is quite a wakeup call for the fact that sometimes those technologies just can't do any good. ID: 19334 · Rating: 0 · rate: / Reply Quote