Message boards : Number crunching : 64-Bit Rosetta?
Author | Message |
---|---|
XeNO Send message Joined: 21 Jan 06 Posts: 9 Credit: 109,466 RAC: 0 |
Are there any plans for a version of Rosetta that will take advantage of the huge performance benefit in using 64-bit processing? I can only imagine good things would come of it for the project, especially since your AMD users would get the full bang for their buck if they're running a 64-bit OS. |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
[quote]Are there any plans for a version of Rosetta that will take advantage of the huge performance benefit in using 64-bit processing? I can only imagine good things would come of it for the project, especially since your AMD users would get the full bang for their buck if they're running a 64-bit OS.[/quote] At this time the project is preparing for the CASP7 competition, which starts in one week and runs all summer. There are no plans that I am aware of to implement a 64-bit version during that time. If you use the search function for the forums, you will find this issue discussed in previous threads. Moderator9 ROSETTA@home FAQ Moderator Contact |
Ethan Volunteer moderator Send message Joined: 22 Aug 05 Posts: 286 Credit: 9,304,700 RAC: 0 |
I'm only going off an old post about R@H's code, but they said most calculations are 'single precision floats'... which I take to mean 32-bit floating point operations. If their code is 32-bit, and they aren't having problems with rounding errors (I forget the geek term)... then does a 64-bit .exe gain the project anything other than >4 GB memory allocation? (Try putting that on the hardware requirements page :) ) -E |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
[quote]I'm only going off an old post about R@H's code, but they said most calculations are 'single precision floats'... which I take to mean 32-bit floating point operations. If their code is 32-bit, and they aren't having problems with rounding errors... then does a 64-bit .exe gain the project anything other than >4 GB memory allocation?[/quote] Well, the project did some testing on this a while back and posted a news item about it, but I can't find the thing to link it here. In any case, a full site search for "64 Bit" will provide a lot of reading on this subject going back over 7 or 8 months. Moderator9 ROSETTA@home FAQ Moderator Contact |
Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
You know, I still cannot understand how everyone rants and raves about why not to make a 64-bit version, yet the only reply is whether we can really use quad-word registers and over 4 gigabytes of memory. It's almost as if there is a stigma that a 64-bit processor is only good for having 32 extra bits of processing power. Does anyone know, or has anyone who did know forgotten, that a 64-bit chip has 8 more general-purpose registers? This means almost all function calls that use integral data types can make their calls without placing arguments onto the stack. It also means fewer RAM reads/writes, and any value in a register has almost instant read/write access by the processor. A 32-bit processor can provide 32 bytes at most using all its general-purpose registers. A 64-bit processor provides 64 bytes in the 8 new registers with no loss of flexibility due to register usage, plus the original 8 registers, now 64 bits wide, provide another 64 bytes. That's 32 bytes versus 128 bytes, four times as much, which matters regardless of the fact that Rosetta@Home only uses single-precision 32-bit floating point values and less than 2 gigabytes of memory. That has to be a performance gain. I know there have been lots of discussions about this, and some even say there is no performance gain. I have even read that an application got slower? Who did the benchmarking? I want to see the source code, because something was not done correctly! A port should at least equal the 32-bit version, and never fall to half the speed, even if it does not take advantage of any 64-bit features. That's just plain and simple logic. Someone is talking a lot of bull. At least post some references to technical information about why it was slower; if you do not know why, you do not know whether the porting was done correctly. 
I checked ralph@home, and did a quick search of that site and this one using Google, and only found a small remark about releasing the source code; that was it. |
Keck_Komputers Send message Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0 |
The way a 64-bit app can actually be slower than a 32-bit app is rooted in memory access: since instructions and data are processed in 64-bit chunks, twice as much memory access can be needed per cycle. In most cases the efficiency of the operation more than makes up for the memory overhead. I don't think there is a stigma against 64-bit processors; however, there is a constant battle to convince less technically inclined people that double the bits does not mean double the performance. The floating point unit is unchanged in the current 64-bit processors compared to 32-bit processors. Since most of the processing takes place in the FPU, the only place for gain is in more efficient feeding of the FPU. BOINC WIKI BOINCing since 2002/12/8 |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Back at DF (Distributed Folding) they released a 64-bit client for a 64-bit flavor of a Sun OS. DF ran at a fairly consistent speed; i.e., 24 hours of crunching would produce roughly the same number of results on Wednesday as on Tuesday. Stats chasers tried out the 64-bit client, noticed a dramatic decline in results per day, and so gave up and moved back to the 32-bit client. It wasn't optimized for 64-bit... it was merely compiled in 64-bit. But we don't have optimized Rosetta clients yet, either. |
skutnar Send message Joined: 31 Oct 05 Posts: 8 Credit: 133,153 RAC: 0 |
From a general standpoint, the "bitness" of an application does not equate to a performance improvement or degradation. What really matters is the source code, compiler/assembler, and linker taking advantage of the architecture. In some cases, 64-bit will be neither faster nor slower, but could take up almost twice the memory footprint. Only the developers, who have access to the source code and higher-level algorithms and know how to work with the architecture, can tell whether an application will see significant improvements. |
dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
As others have said, more bits does not equal more speed. If you have a project that does a ton of pointer dereferencing (E@H?), it's quite likely to slow down on a 64-bit system, because each pointer access that isn't in the D$ (data cache) means twice the memory bandwidth to fetch the data. I'm sufficiently long in the tooth that I can remember the exact same story 15 years or so ago, when we jumped from 16-bit to 32-bit. Everyone figured that 32 bits was guaranteed to fly, and there were quite a few red faces when 16-bit versions ran faster because they had a smaller memory footprint. |
Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0 |
As someone who has worked on the transition from 32-bit to 64-bit processors for some time (since before the Athlon64/Opteron was released to the public), I can tell you that you're _ALL_ correct. An application that actively makes use of the further 32 bits in the register layout will benefit from 64-bit registers. Most current 32-bit applications do not automatically do this, simply because most code just stores numbers in a variable, and if that variable is just BIG ENOUGH (16, 32 or 64 bits) the code works just fine; bigger variables (or registers) don't add anything. The x86_64 architecture also adds further registers, and this DOES help in quite a few cases, because the compiler can avoid storing things in memory and just keep the same thing in a register for a longer period of time. It also allows more effective passing of arguments from one function to another. However, bigger registers can also be a drawback, particularly for pointers. A pointer is now 8 bytes (64-bit) instead of 4 bytes (32-bit), so the cache can hold only half as many pointers. So for pointer-intensive code (such as binary trees or linked lists) where lots of pointers are being read/written, the cache becomes full much sooner. Finally, there was the comment that code size makes a difference, and yes it does. However, at least for x86_64, the code size of a 64-bit application isn't much larger than the same application compiled for 32-bit with the same (or a very similar) compiler. This is because most of the code in a 64-bit app will actually use 32-bit integers, and 32-bit operations are exactly identical in the 32-bit and 64-bit binary. There will be longer instructions for 64-bit operations, but this is to a large extent compensated (or even gained over) by the higher number of registers available, and thus the reduced number of memory read/write operations needed to spill and restore registers when the compiler runs out of actual registers to keep values in. 
Of course, all of this is very dependent on the actual application. I would suspect that the biggest gain for Rosetta would come from compiling it to use SSE instructions, rather than from making it 64-bit. This is because it's a primarily floating-point-intensive application, and it uses 32-bit single-precision floats, so SSE would make sense. On AMD processors, 3DNow! instructions might be even faster, but they are limited by the lack of some operations, and the gain over SSE would be small. The main gain would come from the fact that two operands can be processed at once, whilst the four operands that SSE provides will essentially be forced into a two-step operation; and although two separate 3DNow! operations would be needed for the same amount of work, the two instructions do not need to be contiguous, so the processor can perform them out of order, which can sometimes improve performance... marginally, tho'. Having one SSE binary for both AMD and Intel would reduce the number of different code bases to support and bug-fix, so that would be my suggestion. Note also that very few compilers generate decent SSE code in actual super-scalar mode. There's been a vectorization project for GCC, but it's still not ready for prime-time use. So it would probably require a bit of hand-coding to get anywhere close to ideal performance. -- Mats |
Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
Saying the "bitness" of an application does not equate to a performance improvement or degradation is incorrect, because you are setting an inflexible rule that can be disproved with something like this (illustrative x86-64 assembly; the original snippet had size-mismatched operands, corrected here):

    ; packing two 32-bit values into one 64-bit register,
    ; so both can be passed in a single GPR:
    mov eax, ecx   ; first value; a write to eax zero-extends into rax
    shl rax, 32    ; shift it into the upper half
    mov edx, edx   ; zero-extend the second value into rdx
    or  rax, rdx   ; rax now carries both 32-bit values

Just because a register is considered a whole object does not mean you cannot manipulate it to store more information while keeping the pieces separate. People think, "oh, a 64-bit register... I will never store a value over the 32-bit limit, so no use for the extra 32 bits."
And you still have 32-bit instructions for pointer access. http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html Another rigid rule that makes whoever reads your post think that 64-bit is not better; or did you just word it wrong? You don't use a 64-bit pointer if you do not need to. Now, I bet a lot of people are jumping around going "oh no, you are wrong! OS calls use 64-bit pointers, and the heap may even allocate things out of reach of a 32-bit pointer." These are all trivially fixable things; writing a simple heap algorithm takes only a little time and effort.
I suspect you have also realized that Rosetta@Home is primarily a data-processing application too, right? Data storage inside the processor makes a big difference.
And this was exactly why I tried to argue about just setting a compiler flag in another post of mine. =| And by god, this is not the same problem as it was 15 years ago! |
Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
(I am taking a guess, given that it is stated 'it was merely compiled in 64 bit.') All right, here is another indication of incorrect statistics for a 64-bit processor. The real question is WHY this happened, not that it is automatically accepted that the 64-bit processor is for some reason just slower. Recompiling for 64-bit can widen types depending on the compiler's data model: under LP64, long and all pointers become 64-bit, even though int stays 32-bit. I could understand a speed decrease when you start making excess reads/writes and doing processing you do not need. Just because you have 64 bits does not mean you use them all, but it also does not mean you have no use for the extra processing or storage power in other relevant, nearby areas. In my own opinion, I think some people may post negative and incorrect information to politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more and push me further down the ranks if I did not have a 64-bit machine. =) |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Distributed Folding spent around 90% of its time chasing pointers, which is the reason the developers were never interested in having it optimized for SSE, 3DNow!, or AltiVec. You've already stated that there is no speed benefit from using 64-bit pointers in such cases, and that the pointers would have to be hard-coded to 32-bit. Where is the benefit of hand optimization for 64-bit, which the developers had neither the experience nor the time for, going to come from? The point is not that all 64-bit apps are slower than 32-bit apps, but that 64 bits DOES NOT speed up every application, as the 64-bit fanboys keep claiming. If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5-year-old Athlon 64 that is still waiting for a justification for installing the copy of 64-bit WinXP I have sitting beside the machine. And if I were using this as a penile enhancement motivation, as you seem to imply, then I could probably move a couple of my systems from another project to enhance my score here. Let's can the conspiracy theories and move back to the frightening world of reality, shall we? Get hold of the older Rosetta code that's been mentioned; get hold of the 64-bit version of the compiler that the Rosetta team is using; hand-optimize it for 64-bit mode; and then run each client for 24 hours, 3 times, on the same WU. We'll see the variability of the same client on the same hardware running the same WU, and be able to see whether there's a dramatic improvement between the 64-bit version and the 32-bit version. |
Travis DJ Send message Joined: 2 May 06 Posts: 10 Credit: 537,572 RAC: 0 |
This is almost over my head... but from what I know: x87 FPU registers are 80 bits wide (one extended-precision value each) and there are 8 of them. That is part of why SSE is a better option (than making a pure x64 binary): the 8 SSE registers are 128 bits wide and can hold 4 single-precision values per register, and it wouldn't require a rewrite. What I am not sure I fully comprehend is how the FPU behaves in x64 mode. I understand that the GPRs in x64 are truly general-purpose registers, as opposed to IA-32's somewhat limited uses for the available registers. At least from Wikipedia, it's said that SSE2 replaces x87 in x64 mode, and there is the choice of 32- or 64-bit precision (I assume that's chosen by the programmer in the code). So from what I gather from reading that and what I already know, it seems right to assume there is no real benefit to making a 64-bit binary for Rosetta, given what the program does, as SSE/SSE2 can fully accommodate that in 32-bit mode as it is. Right? :) |
Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
The FPU and the GPRs are two separate things. The extra GPRs are a gain that x64 provides; the x87 FPU, to my knowledge, remains almost the same in long mode. The FPU in long mode should behave exactly as it did in protected mode as far as presentation goes; the performance I do not know for sure. I have also read somewhere that x87 instructions in long mode should be avoided? Has anyone else read this? The processor stores data in three places, ordered by speed:
1. REGISTERS
2. CACHE
3. MEMORY
x86 (32-bit) code has 8 general-purpose registers. Programs TRY to keep as much as possible in these registers while processing. However, due to the large number of variables and/or the size of data structures, this becomes impossible. The compiler then generates instructions for spilling data from registers back out to memory or the program stack while it loads other data for current processing. Giving the program access to 8 more general-purpose registers means the processor does not have to swap as much between its registers and memory. The registers are faster than cache! The cache is almost always controlled by the CPU, to my knowledge, although instructions exist to manipulate it. The AMD Athlon 64 has a prefetch instruction that allows more detailed manipulation of the cache, but I do not know much about this. In the case of large loops that perform many function calls in the core of the loop, a performance gain could be made for functions that are forced to place arguments onto the stack because not enough registers are free, OR the core code can make use of the extra registers, possibly without changing the functions' calling convention. There should be no worldly reason why any application in the universe could not benefit from x64 in some way, no MATTER HOW SMALL, at the VERY LEAST. I made those words capital letters for those who cannot READ. The conservatives!
Yes, I did say that.
Now I'm going to stretch the limits of sanity upon what amount of performance gain could be had from using long-mode instructions by telling you that any program will run faster in x64 mode, and the easiest way to prove that is to take one function call in the core code of the application that pushes one or more arguments onto the stack and allow it to use one of the extra GPRs instead. The AMD64 documentation already specifies that it is faster to perform a move rather than a push or pop. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf Section 3.10.9
I am sorry, but even ignoring the penile enhancement motivation part, what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol
All right... I am back to reality! Sorry for skipping out; what do you need?
Well, you just told me to come back to reality; now you are saying go back to insanity, because there is a chance you were really out of reality the entire time? I wanted to clarify that x64, or long mode, is not proposed by me to speed up floating-point operations, but it can speed up basic data handling and storage operations, in case some people are becoming confused. |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Leonard Kevin Mcguire Jr. stated: [quote]In my own opinion I think some people may post negative and incorrect information to in some way politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more, and push me further down the ranks if I did not have a 64-bit machine. =)[/quote] I replied: [quote]If you're going to try to insult my motivations in posting FACTS instead of fanboy wisdom, then keep in mind that my sole machine on this project is a 2.5-year-old Athlon 64 that is still waiting for a justification for installing (the) copy of 64-bit WinXP I have sitting beside the machine. And if I were using this as a penile enhancement motivation, as you seem to imply, then I could probably move a couple of my systems from another project to enhance my score here.[/quote] After removing the stated justification for the delusional conspiracy theory (64-bit envy by those who don't have a 64-bit CPU), he moves on to this statement: [quote]what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol[/quote] You imply/claim I posted my knowledge of negative things about 64-bit compilations of DC apps because I don't have a 64-bit CPU. Clicking on my name will allow you to see that the machine that has crunched my 42k credits on Rosetta IS an Athlon 64 CPU. If you click here: https://boinc.bakerlab.org/rosetta/result.php?resultid=25537070 you'll notice that I'm running the standard 5.4.9 BOINC client, not an optimized client. If I were worried about people getting 1.5 to 2.0 times my score for the same performance, I could run an optimized BOINC client. 
Also, keep in mind that at present, even if you have a Rosetta, SETI, etc. client that produces 10 or 100 times the amount of work as the default Rosetta/SETI client, you'll get the exact same amount of credit as someone else who spent the same amount of time with the same BOINC client (with identical overall benchmarks). Your opinion about my motivations in posting my knowledge of a 64-bit compilation of a DC app is quite obviously erroneous. BennyRop stated: [quote]Get hold of the older Rosetta code that's been mentioned; get hold of the 64-bit version of the compiler that the Rosetta team is using; hand-optimize it for 64-bit mode, and then run each client for 24 hours, 3 times, on the same WU. We'll see the variability of the same client on the same hardware running the same WU, and be able to see if there's a dramatic improvement between the 64-bit version and the 32-bit version.[/quote] Leonard the 64-bit Fanboy replied: [quote]Well. You just told me to come back to reality, now you are saying go back to insanity because there is a chance you were really out of reality the entire time?[/quote] My statement, "Let's can the conspiracy theories - and move back to the frightening world of reality, shall we?" was in regard to... your ridiculous conspiracy theory. On the off chance that your 64-bit optimization skills are better than your logical justifications for a conspiracy theory, I challenged you to show what can be done for performance by hand-optimizing an older Rosetta client for 64-bit code. Back when WinNT 3.0 came out as a $69 package with the SDK, I tried it out. The example fractal screensaver took over a minute to create the first image on my system. By hand-coding the inner loop of the program into assembler that made use of my math coprocessor, I managed to get the program to create the first image in, say, 30 seconds. Compiling the original code with the compiler optimization flags turned on 
got the program to within 2 seconds of my hand-coded assembler version. (The compiler optimizations were probably faster than my version, although I don't remember.) Regardless of which was faster, my coding was a waste of effort for that project. Good for education, but a waste of effort, especially as I never used that screensaver again. :) The point being that while it is possible to hand-optimize a client and possibly improve its performance, the effort of maintaining that client may not be justified by the performance gain. In which case, Rosetta would be better off putting 32-bit clients in the 64-bit client directories. Keep in mind that I'm not claiming that Rosetta's client's code mix is identical or even similar to DF's. Prove that there's more than a 10% performance increase in moving to a 64-bit client, get Rosetta to produce a 64-bit client from the current code, and I'll start dual-booting again. |
Leonard Kevin Mcguire Jr. Send message Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
Yes, it is possible to hand-optimize a client and possibly improve the performance. Yes, the effort may not be justified by the performance gain, in which case Rosetta would be better off putting 32-bit clients in the 64-bit directories. I agree.
No problem. You are welcome to set whatever threshold of performance increase is needed to deem it worthy of use. =)
The screensaver and Rosetta@Home are two distinct goals in science. The first was for fun, and I have no idea what role it played in advancing science.
I still remember this quote. If you understood the implications of merely compiling something for 64-bit that was designed for a 32-bit environment, why did you even post to this thread, when you knew the entire topic was whether a 64-bit build could have performance improvements worthy of use? That's why it seemed like a political ploy: a reader who is not technically minded would just rummage through and go, "Oh... hmm. So 64-bit is not better." And of course, above that, you gave information relating to the statistics. That sounded kind of like a news channel releasing a presidential approval rating (yet they really got their statistics/percentages from a biased group). The original BennyRop post:
No, DUH! Use a type that widens to 64 bits and you increase the cache needed. You should work for a news channel! =) |
skutnar Send message Joined: 31 Oct 05 Posts: 8 Credit: 133,153 RAC: 0 |
I think you either misread or misunderstood my statement. I'm saying that, in general, there is no direct correlation between the bitness of an application or of a CPU and the performance seen. I don't understand how that can be considered an "inflexible rule". |
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
Nice discussion! ;-) Would any of you be willing to actually have a look at the Rosetta source and make an educated guess whether optimizing for 64-bit (and for SSE etc.) could lead to a significant performance gain? That may soon be possible. If so, please email me: joachim@iwanuschka.de regards Joachim |
©2024 University of Washington
https://www.bakerlab.org