1)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19786)
Posted 5 Jul 2006 by Leonard Kevin Mcguire Jr. Post: But, you know, even if SSE had to use doubles, I think an AMD64 would perform better using SSE doubles than x87 doubles, if only because of the reduced code size. I do not know about other processors; I just remember reading that it was recommended to use SSE instead of x87. I think an executable compiled as 32-bit causes the processor to switch into compatibility mode. 64-bit long mode removed some instructions that programs written as 32-bit may use, so the SSE performance gain might only apply when the executable is a native 64-bit exe under a 64-bit operating system. Oh, I forgot to mention: somewhere the Rosetta application is using SSE for double precision values to do some small calculations? (I could have sworn I saw it in the disassembly.) I do have an AMD64 3500+ with Win XP64 Pro installed. |
2)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19784)
Posted 5 Jul 2006 by Leonard Kevin Mcguire Jr. Post: We have a solution, and an agreement. I have never in my entire life run into this situation on an internet message board.. I am proud! |
3)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19774)
Posted 4 Jul 2006 by Leonard Kevin Mcguire Jr. Post: I agree with your findings, because I came up with similar results. I gave up on hand optimizing that one function. It was taking way too long, and it would be difficult to confirm the results as accurate in case I made a mistake. That further reinforces the fact that it is a difficult task to hand optimize something by reverse engineering it. I am not saying I could not do it, nor anyone else, but it is not an easy task by any means, and I am most likely not going to spend that much time when I have no way to verify the results, and many* other functions probably exist that need the same optimization.. whew.. lots of work. = I did do some tests where I took a randomly generated single precision value, loaded it into a 64-bit (double precision) register, and performed an operation on it. Then I loaded it into a 32-bit (single precision) register and performed the same operation. I did this many times. Of course you lose precision.. However, Rosetta is performing fst/fstp after almost every math operation on that 64-bit register, truncating it to a single precision value. I know it is not exactly the same, but in a lot of cases it yielded exactly the same results as doing the operation completely in 32-bit. Thus, maybe, depending on Rosetta's algorithms, the margin of precision error is small enough. When my tests included the truncation of the 64-bit result, it yielded an exact match, thus no precision lost? ([32BIT] = [64BIT] <*+/-> [64BIT]) might be so close to ([32BIT] = [32BIT] <*+/-> [32BIT]) that for Rosetta's algorithms it would cause no problems, thus enabling the use of SSE in single precision mode. I have seen at *least* one case where two operations were performed on a double precision value by the x87 in Rosetta, increasing the margin of precision error by an amount unknown to me because of the limits of my knowledge of floating-point internals, and this in itself could rule out the use of SSE.
However, to the above paragraph: the Rosetta application would also suffer precision changes if compiler flags were changed and the generated code changed, thus producing different results. I feel this is a potential clue that the Rosetta algorithms may not be bothered by the ([32BIT] = [64BIT] <*+/-> [64BIT]) vs ([32BIT] = [32BIT] <*+/-> [32BIT]) problem, because I would imagine the developers already realized the penalties to precision for their algorithms when building the application, most likely by examining their machine code output. So, I guess the application needs to be hand optimized, then run the un-hand-optimized version vs the hand-optimized version on a work unit with the exact same random seed to see if there is a result difference? PS: I really appreciate you taking the time to help me try to solve this very difficult question! |
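The truncation experiment described above can be reproduced in portable C. This is my own sketch, not the original test code, and the names (`sub_via_double`, `count_mismatches`) are mine: it compares subtracting in double precision and then truncating to float against subtracting directly in single precision, assuming IEEE 754 arithmetic (for a single add/subtract/multiply/divide, double has enough extra bits that the round-trip always matches the direct float result, which is consistent with the exact matches reported above).

```c
#include <assert.h>
#include <stdlib.h>

/* Subtract via double precision, then truncate the result to float,
   mimicking the x87 pattern "compute wide, fstp to a dword". */
float sub_via_double(float a, float b) {
    return (float)((double)a - (double)b);
}

/* Subtract entirely in single precision. */
float sub_single(float a, float b) {
    return a - b;
}

/* Compare the two over a batch of pseudo-random values in [0, 1];
   returns the number of mismatches observed. */
int count_mismatches(unsigned seed, int trials) {
    int mismatches = 0;
    srand(seed);
    for (int i = 0; i < trials; i++) {
        float a = (float)rand() / (float)RAND_MAX;
        float b = (float)rand() / (float)RAND_MAX;
        if (sub_via_double(a, b) != sub_single(a, b))
            mismatches++;
    }
    return mismatches;
}
```

With multiple chained operations, as in the two-operations-before-the-store case mentioned above, the equivalence no longer holds in general, which is exactly the remaining doubt.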
4)
Message boards :
Rosetta@home Science :
DISCUSSION of Rosetta@home Journal (2)
(Message 19739)
Posted 3 Jul 2006 by Leonard Kevin Mcguire Jr. Post:
If this is correct, then taking into consideration the below post by Mats Petersson: I agree with Akos, and I agree with Mats on this point: it is a daunting task to hand optimize assembler to use SSEx instructions, and it is compounded when these routines change often. However, the only people who know how much the routines change, and which routines change a lot, are the developers. So what would completely put the issue to rest is their input - if there is even a defined way for them to communicate this, it could provide useful information to make the final determination. |
5)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19595)
Posted 30 Jun 2006 by Leonard Kevin Mcguire Jr. Post: Yes, I agree completely. Thank you for posting the correction to my code snip; it is most appreciated. I just realized, too, that Rosetta@home should be okay using single precision values. I mean, if they have been running their code unoptimized for this CASP7 project, almost all the calculations in the application are being truncated back into single precision from the wide x87 format anyway, because of all those excessive load and store instructions the compiler they use is generating. lol. I do not know why I did not think of the above yesterday, sorry. =) I guess the developers know the results could* change when they decide to turn on compiler optimizations one day. =) I remember a post saying Rosetta@Home was being run lately in some form of debug mode or something, to try to diagnose the recent bugs? [edit] I checked the Rosetta@Home application's Windows build, and the x87 is set to the default precision of 64 bits, as you stated in the previous post. So, apparently Rosetta@Home is not doing 80-bit calculations, but only 64-bit internally, and it performs loads and stores in 32-bit; in conjunction with their code constantly loading and storing after almost every math operation --- using SSE should be *okay*.. |
6)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19519)
Posted 30 Jun 2006 by Leonard Kevin Mcguire Jr. Post: So essentially, by using SSE I trade 80-bit precision for a choice of 32-bit or 64-bit precision in the calculations, even though SSE uses 128-bit registers. MMX, as far as I can tell, only offers 64-bit packed integer operations, so I do not know if it would give a performance boost over x87. I think SSE2 allows 128-bit double precision operations, although my processor does not support it. I ended up doing the first part of the other function like this to get double precision:
sub esp, 0Ch
push esi
push edi
mov esi, dword ptr [esp+20h]
mov edi, dword ptr [esp+4Ch]
// todo:fix: unaligned move, data potentially allocated unaligned.
// duplicate the two low floats from esi and edi into lane pairs
// (note: shufps only rearranges 32-bit lanes; an actual float-to-double
// conversion would need cvtps2pd)
movups xmm0, xmmword ptr [esi]
shufps xmm0, xmm0, 050h
movups xmm1, xmmword ptr [edi]
shufps xmm1, xmm1, 050h
// duplicate the remaining floats from esi and edi
movups xmm2, xmmword ptr [esi]
shufps xmm2, xmm2, 0FAh
movups xmm3, xmmword ptr [edi]
shufps xmm3, xmm3, 0FAh
// vector subtraction
subpd xmm0, xmm1
subsd xmm2, xmm3 |
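The `shufps` immediates above (050h and 0FAh) are easier to check with a scalar model of the instruction's lane selection. This is my own emulation helper, not anything from the Rosetta code; it just models how the two 2-bit fields of the immediate pick from the destination and the next two pick from the source:

```c
#include <assert.h>

/* Scalar emulation of SSE SHUFPS: "shufps dst, src, imm" produces
     out[0] = dst[imm & 3]
     out[1] = dst[(imm >> 2) & 3]
     out[2] = src[(imm >> 4) & 3]
     out[3] = src[(imm >> 6) & 3]
   Temporaries are read first so out may alias dst or src. */
void shufps_emulate(float out[4], const float dst[4],
                    const float src[4], unsigned imm) {
    float d0 = dst[imm & 3];
    float d1 = dst[(imm >> 2) & 3];
    float s2 = src[(imm >> 4) & 3];
    float s3 = src[(imm >> 6) & 3];
    out[0] = d0;
    out[1] = d1;
    out[2] = s2;
    out[3] = s3;
}
```

So with both operands the same register, imm 0x50 yields {x0, x0, x1, x1} and imm 0xFA yields {x2, x2, x3, x3}: the low and high float pairs duplicated into lane pairs, matching the loads above.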
7)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19515)
Posted 29 Jun 2006 by Leonard Kevin Mcguire Jr. Post: I was doing some reading and came across: http://www.gamedev.net/community/forums/topic.asp?topic_id=363892 That was not much, but then I read: http://72.14.209.104/search?q=cache:Z7_1qzYnxqQJ:asci-training.lanl.gov/BProc/SSE.html+sse+%22single+scalar%22&hl=en&gl=us&ct=clnk&cd=2 A lot of people have always said, without qualification, that SSE is only good for doing multiple calculations with one instruction, like the name implies. However, SSE supports some single scalar instructions that perform only one operation per instruction, sort of mimicking the x87. According to the last link, using SSE with single precision scalars is faster and more efficient than the x87, but using it for double precision scalars is about equal to the x87. That sounds like good news, or is there more to this? You would have to read that last link to see what I read near the top of the page. Unfortunately, I also read this: http://72.14.209.104/search?q=cache:w4T3-RFklbYJ:www.ddj.com/184406205+x87+instructions&hl=en&gl=us&ct=clnk&cd=10&client=firefox-a That describes the fact that the x87 by default provides 80-bit precision. According to that, even though Rosetta@Home only uses single precision (32-bit) floats, that only refers to how values are written/read to/from the x87 unit. Potentially, a lot of Rosetta's calculations have become dependent on the 80-bit precision and may not function at the required accuracy using SSE? However, the article also states the workaround of using double precision with SSE to alleviate or fix the 80-bit precision problem? I am not sure if 64-bit floats would do enough justice? Nevertheless, apparently that would still equal x87 performance on modern day processors. AMD seems to recommend using SSE instead of x87 even with double precision (64-bit) floats - and I am not pulling a fanboy episode, just throwing this out there as to what is going on?
Let me rephrase that: "I *think* the article recommends using double precision SSE instructions over x87 on an AMD64". |
8)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19447)
Posted 29 Jun 2006 by Leonard Kevin Mcguire Jr. Post: I found an interesting function that could use SSE:
00604050 sub esp,0Ch
00604053 push esi
00604054 mov esi,dword ptr [esp+1Ch]
00604058 fld dword ptr [esi]
0060405A push edi
0060405B mov edi,dword ptr [esp+4Ch]
0060405F fsub dword ptr [edi]
00604061 mov eax,dword ptr [esp+88h]
00604068 fstp dword ptr [esp+0Ch]
0060406C fld dword ptr [esi+4]
0060406F fsub dword ptr [edi+4]
00604072 fstp dword ptr [esp+8]
00604076 fld dword ptr [esi+8]
00604079 fsub dword ptr [edi+8]
0060407C fstp dword ptr [esp+10h]
00604080 fld dword ptr [esp+8]
00604084 fld dword ptr [esp+0Ch]
00604088 fld dword ptr [esp+10h]
....
I replaced the x87 instructions with SSE:
00604050 83 EC 0C             sub esp,0Ch
00604053 56                   push esi
00604054 57                   push edi
00604055 8B 74 24 20          mov esi,dword ptr [esp+20h]
00604059 8B 7C 24 4C          mov edi,dword ptr [esp+4Ch]
0060405D 0F 10 06             movups xmm0,xmmword ptr [esi]
00604060 0F 10 0F             movups xmm1,xmmword ptr [edi]
00604063 0F 5C C1             subps xmm0,xmm1
00604066 0F C6 C1 4B          shufps xmm0,xmm1,4Bh
0060406A 8B 44 24 10          mov eax,dword ptr [esp+10h]
0060406E 0F 11 4C 24 04       movups xmmword ptr [esp+4],xmm1
00604073 89 44 24 10          mov dword ptr [esp+10h],eax
00604077 8B 84 24 88 00 00 00 mov eax,dword ptr [esp+88h]
0060407E 90                   nop
0060407F 90                   nop
00604080 D9 44 24 08          fld dword ptr [esp+8]
00604084 D9 44 24 0C          fld dword ptr [esp+0Ch]
00604088 D9 44 24 10          fld dword ptr [esp+10h]
...
Do you think SSE will give a significant performance gain with something like that? I had to use a shufps, because I thought/think it was swapping the first two FPs:
r[1] = a[0] - b[0]
r[0] = a[1] - b[1]
r[2] = a[2] - b[2]
The fourth lane overwrites other stuff on the stack, so I preserve it using eax - maybe all the extra overhead just kills any gain? |
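For reference, here is my guess at the C the x87 code above was compiled from: a three-component, single precision vector subtraction. The name `vec3_sub` and the argument order are my assumptions, not symbols from the Rosetta binary, and I use the plain lane order rather than the swapped order observed in the disassembly:

```c
#include <assert.h>

/* Reconstruction of what the function at 0x604050 appears to do:
   per-component single precision vector subtraction, r = a - b.
   The compiler emitted each line as an fld/fsub/fstp triple. */
void vec3_sub(float r[3], const float a[3], const float b[3]) {
    r[0] = a[0] - b[0];
    r[1] = a[1] - b[1];
    r[2] = a[2] - b[2];
}
```

A single `subps` can do all four lanes at once, but the loads and stores then touch 16 bytes, which is exactly the source of the alignment worry and the fourth-lane overwrite handled with eax above.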
9)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19445)
Posted 29 Jun 2006 by Leonard Kevin Mcguire Jr. Post: Hmmm.. So it seems you were right, and I know why, and I am not mad. It could benefit more from SSE than from 64-bit instructions, primarily because most operations work in cache when moving data around, and the rest is FP operations - and where that is not so, it is rarely-called functions with cache misses that move a lot of data around, so no big deal. I came to this conclusion after trying to figure out a way to optimize the function at file address 0x204050 in the Windows PE32 build. I used a timer, and found that this function gets called a lot. It seems to take about 3 arguments - looks just like a vector subtraction at the start. Anyway, after looking the function over and over, I could find nowhere to optimize it favorably >:) with 64-bit instructions. I am not saying there is absolutely no performance to be gained, but it is probably not going to be in the core of the application, per the 90:10 rule you stated in an earlier post. However, I did notice a few load/store operations that were kind of redundant looking, unless the rounding effect was intentional (80bit[fp reg] -> 32bit[mem]... "fstp fld"). I will look at it some more in a direction towards SSE =). At least this sorta stops the 64-bit fanboy arguments! |
10)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19443)
Posted 28 Jun 2006 by Leonard Kevin Mcguire Jr. Post: You were right about the data move routines; they stayed right about the same for each one, independent of the register size used. I must meditate on this. |
11)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19389)
Posted 28 Jun 2006 by Leonard Kevin Mcguire Jr. Post: Alright. Here is a function, unknown to me exactly what it is.
Samples  Address   Code Bytes       Instruction (CPU0 Event:0x41)
         0x494a50  56               push esi
         0x494a51  57               push edi
         0x494a52  8B 7C 24 0C      mov edi,[esp+0ch]
         0x494a56  8B F1            mov esi,ecx
         0x494a58  3B F7            cmp esi,edi
         0x494a5a  74 2D            jz $+2fh (0x494a89)
         0x494a5c  57               push edi
         0x494a5d  E8 7E F4 FF FF   call $-00000b7dh (0x493ee0)
         0x494a62  84 C0            test al,al
         0x494a64  75 08            jnz $+0ah (0x494a6e)
         0x494a66  57               push edi
         0x494a67  8B CE            mov ecx,esi
         0x494a69  E8 92 FE FF FF   call $-00000169h (0x494900)
         0x494a6e  33 C0            xor eax,eax
         0x494a70  39 46 0C         cmp [esi+0ch],eax
         0x494a73  76 14            jbe $+16h (0x494a89)
[b]
     23  0x494a75  8B 4F 08         mov ecx,[edi+08h]
     34  0x494a78  D9 04 81         fld dword ptr [ecx+eax*4]
    988  0x494a7b  8B 56 08         mov edx,[esi+08h]
    995  0x494a7e  D9 1C 82         fstp dword ptr [edx+eax*4]
         0x494a81  83 C0 01         add eax,01h
      1  0x494a84  3B 46 0C         cmp eax,[esi+0ch]
         0x494a87  72 EC            jb $-12h (0x494a75)
[/b]
         0x494a89  5F               pop edi
         0x494a8a  8B C6            mov eax,esi
         0x494a8c  5E               pop esi
      6  0x494a8d  C2 04 00         ret 0004h
This is apparently where a lot of the cache misses are occurring. Just a thought: if it is copying a large array, as in the case when I broke into the thread, the cache misses might just be too bad for such a large array. But on an AMD64-capable processor, would a 64-bit move, even though it does not erase the cache misses, be beneficial? At the cmp eax,[esi+0ch] at 0x494a84, *((unsigned long*)(esi + 0xC)) = 0x11D. In this instance it was moving (285 * 4) bytes with 285 loop iterations. EAX starts at 0x0, so it is definitely looping. Could a 64-bit move be faster than using the floating point unit to perform a data move? Apparently the floating point unit can move it faster than a 32-bit move, as the code is using the x87? (I do not know.) (The x87 might be slower; maybe the compiler just did not optimize it correctly?)
My WIN32 executable's .text section's virtual address is 0x401000. That could help pinpoint where the function is, since I'm pretty sure a linux build's CRT might offset it differently? [edit:] I actually looked at the darn code some more and figured it could not be optimized at all, since it repeatedly loads ecx and edx with the same value. You know, I just noticed from your results that it does a lot of FP store and load operations - is there potential for MMX to reduce transfers between main memory and FP registers? I do think all AMD 64-bit processors support the MMX instructions (I do not know a lot about INTEL).. I am just guessing here. And in reply to SSE being slower than the x87 instructions: I have read that SSE is actually slower, and that the only performance gain comes from using it to perform multiple operations at once. |
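The hot loop in the profile above is just an element-by-element float copy routed through the x87. As a sketch (my names, my assumption about what the arrays at [edi+8] and [esi+8] are), the loop and the wide-move alternative being asked about look like this in C:

```c
#include <assert.h>
#include <string.h>

/* Equivalent of the profiled loop: copy 'count' floats one at a time.
   The compiler emitted the body as an fld/fstp pair per element. */
void copy_floats_loop(float *dst, const float *src, unsigned count) {
    for (unsigned i = 0; i < count; i++)
        dst[i] = src[i];
}

/* The block-copy alternative: a library memcpy typically moves 8 or
   16 bytes per iteration on a 64-bit machine, which is the kind of
   wide move the question above is about. */
void copy_floats_block(float *dst, const float *src, unsigned count) {
    memcpy(dst, src, count * sizeof(float));
}
```

Either way the data still has to come from memory, so neither version removes the cache misses; the wide copy only reduces instruction count per byte moved.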
12)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19379)
Posted 27 Jun 2006 by Leonard Kevin Mcguire Jr. Post: You are a retard. I have written an operating system kernel from scratch! I should know! Yes!
I am so tired of this crap.
http://www.unix.org/whitepapers/64bit.html is all I have to say. All the "fake" stupid idiots that rant and rave about crap they don't know for sure can kiss my white butt. Optimize the Rosetta@Home application yourself, since it seems you people don't want to attempt to talk about something constructive that would help - instead every single post (almost) was a negative bunch of crap.
You wish you knew something. I can not prove it, but I know it. I am so mad, I do not care. So if you post any other CRAP above this, just know I am going to read it and laugh at you. |
13)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19337)
Posted 27 Jun 2006 by Leonard Kevin Mcguire Jr. Post: Oh, I am sorry, I was completely wrong. You are right, dgnuff. Silly me! I am so glad we have smart people like you around to explain to idiots like me that 64 bits does not equate to more performance. It took me forever to understand what everyone is talking about - wow, I have been lost before, but never this lost! Thank you lord! I have been saved from a hellish future! The world has been saved! |
14)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19333)
Posted 26 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
It depends on what scope is considered general. More importantly, what criteria would need to be met to consider something a general "case"? Many general cases exist, by my definition of general, that involve routines that can have a performance gain - and this gain, no matter how small, is still a gain, unless we are talking about non-general routines, which might be:
int main(void) { return (char)1 + (char)1; }
Evidently this would yield no benefit from 64-bit. We only need to store one value, but it could be run in a 64-bit environment, not use 64-bit instructions, and still have a gain, as I have read somewhere... let me go find it.
mov eax, 1
add eax, 1
ret
Is that a general case? I do not know of many useful programs that do something like that. Would this be a correlation?
int main(int a, int b, int c, int d, int e, int f, int g, int h, int i, int k) { return a + b + c + d + e + f + g + h + i + k; }
add eax, ebx  // a=a+b
add eax, ecx  // a=a+c
add eax, edx  // a=a+d
add eax, r8d  // a=a+e
add eax, r9d  // a=a+f
add eax, r10d // a=a+g
add eax, r11d // a=a+h
add eax, r12d // a=a+i
add eax, r13d // a=a+k
ret
1. I'm saying that in general, 2. there is no direct correlation between 2A. the bitness of an application or of a CPU and the performance seen, 2B. the bitness of an application and of a CPU and the performance seen.
So for 2A we mean: an application using only 32-bit instructions VS one using only 64-bit instructions. (In general? Every application in the world that is considered a general case application? How are we to scientifically define this general?) Quite frankly, you don't use storage space you do not need. It would be a waste to use a 64-bit operation on a 32-bit value unless wraparound was an unwanted effect. Or: an application using only 32-bit instructions VS one that can use both 32-bit and 64-bit instructions, as the 64-bit AMD processors allow.
So for 2B we mean: an application that uses only 32-bit instructions run on a 32-bit processor VS the same 32-bit application run on a 64-bit processor. The performance should be no less in "general". =) An application that uses 32-bit and 64-bit, or just 64-bit, run on a 32-bit processor VS a 64-bit processor: on the former it will not work at all. =) |
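The many-argument example above compiles as ordinary C, and the register claim behind it can be stated precisely: under the x86-64 System V ABI the first six integer arguments arrive in registers (rdi, rsi, rdx, rcx, r8, r9), so even the extra GPRs do not cover all ten arguments here - the last four still travel on the stack. A compilable version, with a name of my choosing:

```c
#include <assert.h>

/* A valid-C version of the ten-argument sum used above. On x86-64
   System V, a through f are passed in registers and g through k on
   the stack; on 32-bit x86 cdecl, all ten go on the stack. */
int sum10(int a, int b, int c, int d, int e,
          int f, int g, int h, int i, int k) {
    return a + b + c + d + e + f + g + h + i + k;
}
```

So the register-passing gain is real but bounded: it helps calls with a handful of integer arguments, which is the common case, not every case.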
15)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19298)
Posted 25 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
Yes, it is possible to hand optimize a client, and possibly improve the performance. Yes, the effort may not be justified by the performance gain, in which case Rosetta would be better off putting 32-bit clients in the 64-bit directories. I agree.
No problem. You are welcome to decide what threshold of performance increase is needed to deem it worthy of use. =)
The screensaver and Rosetta@Home are two distinct goals in science. The first was for fun, and I have no idea what role it played in advancing science.
I still remember this quote. If you understood the implications of merely compiling something in 64-bit that was designed for a 32-bit environment, why did you even post to this thread, when you knew the entire topic is whether a 64-bit build could have performance improvements worthy of use? That is why it seemed like a political ploy: a reader who is not technically minded would just rummage through and go, "Oh.. Hmm. So 64-bit is not better." And, of course, above that you gave information relating to the statistics. That sounded kind of like a news channel releasing a presidential approval rating (yet they really got their statistics/percentages from a biased group). The original BennyRop post:
No, DUH! Use int and get a 64-bit integer, and increase the cache needed. You should work for a news channel! =) |
16)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19266)
Posted 25 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
The FPU and the GPRs are two separate things. The extra GPRs are a gain that x64 provides; the x87 FPU, to my knowledge, remains almost the same in long mode. The FPU in long mode should behave exactly as it did in protected mode as far as presentation; the performance I do not know for sure. I have also read somewhere that x87 instructions in long mode should be avoided? Has anyone else read this? The processor stores data in three places, ordered by speed:
1. REGISTERS
2. CACHE
3. MEMORY
x86, or 32-bit code, has 8 general purpose registers. Programs TRY to keep as much as possible in these registers while processing. However, due to the large number of variables and/or the size of data structures, this becomes impossible. The compiler then generates instructions for swapping data from registers back out to memory/the program stack while it loads other data for current processing. Giving the program access to 8 more general purpose registers means the processor does not have to swap as much between its registers and memory. The registers are faster than cache! The cache is almost always controlled by the CPU, from my knowledge, although instructions exist to manipulate it. The AMD64 Athlon has a prefetch instruction somewhere that allows more detailed manipulation of the cache, but I do not know much about this. In the case of large loops that perform many function calls in the core of the loop, a performance gain could be made for functions that are forced to place arguments onto the stack because not enough registers are free, OR the core code can make use of the extra registers, possibly without changing the functions' calling convention. There should be no worldly reason why no application in the universe could benefit from x64 in some way, no MATTER HOW SMALL, at the VERY LEAST. I made those words capital letters for those that can not READ. The conservatives!
Yes, I did say that.
Now, I'm going to stretch the limits of sanity on what amount of performance gain could be had from using long mode instructions by telling you that any program will run faster using x64 mode. The easiest way to prove that is to take one function call in the core code of the application that pushes one or more arguments onto the stack, and let it use one of the extra GPRs instead of pushing an argument onto the stack. The AMD64 documentation already specifies that it is faster to perform a move rather than a push or pop. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf Section 3.10.9
I am sorry, but even ignoring the penile-enhancement-motivation part, what you said does not prove my "conspiracy theory" false. What does it prove that you could move a couple of systems to this project? lol
Alright.. I am back to reality! Sorry for skipping out what do you need?
Well, you just told me to come back to reality; now you are saying go back to insanity, because there is a chance you were really out of reality the entire time? I wanted to clarify that x64, or long mode, is not proposed by me to speed up floating point operations, but it can speed up basic data handling and storage operations. In case some people are becoming confused. = |
17)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19242)
Posted 24 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
(I am taking a guess due to the fact it is stated - 'it was merely compiled in 64 bit.') Alright, here is another indication of incorrect statistics for a 64-bit processor. The real question is WHY this happened, not that it is automatically accepted that for some reason the 64-bit processor is just slower.. The int type is platform specific - though note that on most 64-bit ABIs a plain recompile leaves int at 32 bits, and it is long and pointers that widen - so a blind recompile mostly grows pointer-heavy data, not every integer. I could understand a speed decrease when you start making excess reads/writes and doing processing you do not need. Just because you have 64 bits does not mean you use them, but it also does not mean you have no uses for the extra processing or storage power in other relevant and close proximity areas. In my own opinion, I think some people may post negative and incorrect information to in some way politically push down any benefits/gains that could come from a 64-bit processor. Honestly, I would hate for 64-bit machines to start crunching more and push me further down the ranks if I did not have a 64-bit machine. =) |
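The type-width claim is easy to check directly. A minimal sketch (helper names are mine): on ILP32, LP64 (64-bit Linux/BSD/macOS) and LLP64 (Win64) alike, int is 32 bits; what changes between 32- and 64-bit builds is the width of pointers and, on LP64, of long.

```c
#include <assert.h>
#include <limits.h>

/* Width of the default int type in bits. 32 on every mainstream
   32-bit and 64-bit ABI; recompiling for 64-bit does not widen it. */
int int_width_bits(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Width of a data pointer in bits: 32 on a 32-bit build, 64 on a
   64-bit build. This is where the cache-footprint growth comes from. */
int pointer_width_bits(void) {
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

So the plausible slowdown mechanism for a plain recompile is doubled pointer sizes inflating data structures and cache pressure, not int arithmetic suddenly being done in 64 bits.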
18)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19198)
Posted 24 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
the "bitness" of an application does not equate to performance improvement or degradation - is incorrect, because you are setting an inflexible rule that can be disproved with something like this (note: on x86-64 a 32-bit register write zero-extends into the full 64-bit register, so the combine has to be done with a shift and an OR):
// I want to pass two values to a function in one register.
mov eax, ecx  ; zero-extends ecx into rax
shl rax, 32   ; first value now in the upper 32 bits
mov edx, edx  ; clear the upper bits of rdx
or  rax, rdx  ; second value in the lower 32 bits
Just because a register is considered a whole object does not mean you can not manipulate it to store more information while keeping the pieces separate. People think, oh, a 64-bit register... hmm, I will never store a value over the 32-bit limit, oh well, no use for the extra 32 bits.. =
That's why you have 32-bit instructions for pointer access? I mean, duh? Huh? http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html Another rigid rule that makes whoever reads your post think that 64-bit is not better - or did you just word it wrong? wtf. You don't use a 64-bit pointer if you do not need to.. = Now, I bet a lot of people are jumping around going, oh no, you are wrong! OS calls are using 64-bit pointers, and the heap may even allocate things out of reach of a 32-bit pointer.. I mean, these are all trivially fixable things. Writing a simple heap algorithm takes a little time and effort.. =
I suspect you have also realized that Rosetta@Home is primarily a data processing application too, right? Data storage inside the processor makes a big difference.
And, this was exactly why I tried to argue about just setting a compiler flag in another post of mine. =| And by god, this is not the same problem as it was 15 years ago! |
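The pack-two-values-into-one-register idea above can be written portably with fixed-width integers; this is my own sketch with names of my choosing, and it mirrors the shift-and-OR sequence an assembler version needs:

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 32-bit values into one 64-bit word: 'hi' occupies the
   upper 32 bits, 'lo' the lower 32 bits. */
uint64_t pack2(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Recover the two halves, keeping the information separate. */
uint32_t unpack_hi(uint64_t packed) { return (uint32_t)(packed >> 32); }
uint32_t unpack_lo(uint64_t packed) { return (uint32_t)packed; }
```

Whether this wins anything depends on the calling convention and the surrounding code, but it shows the point being made: a 64-bit register is not wasted just because individual values fit in 32 bits.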
19)
Message boards :
Number crunching :
64-Bit Rosetta?
(Message 19022)
Posted 20 Jun 2006 by Leonard Kevin Mcguire Jr. Post: You know, I still can not understand how everyone rants and raves about why not make a 64-bit version, yet the only reply is: can we really use quad word registers and over 4 gigabytes of memory? It is almost like there is a stigma that a 64-bit processor is only good for having 32 extra bits of processing power. Does anyone know, or has anyone who did know forgotten, that a 64-bit chip has 8 more general purpose registers? This means almost all function calls that use integral data types can make their calls without placing arguments onto the stack. It also means fewer RAM reads/writes, and any value in a register has almost instant read/write access by the processor. A 32-bit processor can provide 32 bytes of register space at most, using all its general purpose registers. A 64-bit processor can provide 64 bytes in the new registers with no loss of flexibility, plus the original 8 registers, which also provide 64 bytes of total space. That's 32 bytes vs 128 bytes - four times larger - regardless of the fact that Rosetta@Home only uses single precision 32-bit floating point values and less than 2 gigabytes of memory - who cares? That has to be a performance gain. I know there have been lots of discussions regarding this, and even claims that there is no performance gain. I have even read that an application got slower? Who did the benchmarking? I want to see the source code, because something was not done correctly! It should at least equal the 32-bit version, and never fall to half the speed, even if the port does not take advantage of any 64-bit features! That's just plain and simple logic.. Someone is talking a lot of bull. I mean, at least post some references to some technical information about why it was slower. If you do not know why, you do not know whether the porting was done correctly.
I checked ralph@home, and did a quick search of that site and this one using google, and only found a small remark about releasing the source code, and that was it? |
20)
Message boards :
Number crunching :
Is this possible?
(Message 19020)
Posted 20 Jun 2006 by Leonard Kevin Mcguire Jr. Post:
You did not state whether you needed "host" to actually handle BOINC's protocol, only that you needed to channel "internet traffic" through one machine, which I took to mean TCP/IP, as in "internet traffic". You could use a SOCKS proxy, which BOINC supports internally. Look in the menu item Advanced, then the sub-item Options, and click the SOCKS tab on the dialog. You can find quite a few software packages that provide a SOCKS proxy; run one on the computer you want the internet traffic routed through. Also, configuring a SOCKS daemon is very easy; just test from a friend's computer to make sure that someone outside your private network can not utilize the daemon, which would present a security problem. |
©2024 University of Washington
https://www.bakerlab.org