When will CUDA support be ready?

Message boards : Number crunching : When will CUDA support be ready?



Dirk Broer

Joined: 16 Nov 05
Posts: 20
Credit: 2,008,000
RAC: 229
Message 70317 - Posted: 11 May 2011, 11:04:53 UTC - in response to Message 65686.  

I personally couldn't care less about using a graphics card for... graphics.

Personally, I only care about its ability to crunch for science, as a gp-gpu.

I don't know the answer for certain, but suspect it is "No.": Can "ATI's leading card" handle double precision floating point as well as Fermi?

Still couldn't beat ATI's leading card. Although, I don't know whether it's easier to code for an Nvidia video card or an ATI one.



Actually, the leading ATI/AMD cards of the last few years, first the HD 5970 and now the HD 6990, can run rings around the best nVidia card when it comes to double precision.

The GTX 295, nVidia's leading card of days gone by, could do 1,788 GFLOPS single precision against a very meagre 149 GFLOPS double precision.
The GTX 590, nVidia's present leading card, does 2,486 GFLOPS single precision against a still meagre 311 GFLOPS double precision.
The Quadro 6000, nVidia's less-limited leading professional card, gives 1,030 GFLOPS single precision but delivers a modest 515 GFLOPS double precision.

And ATI/AMD?
The HD 5970 delivers 4,640 GFLOPS single precision and does 928 GFLOPS double precision.
The HD 6990 delivers 5,099 GFLOPS single precision and does 1,275 GFLOPS double precision.

So I do know the answer for certain, and it is "YES": ATI's leading card can handle double precision floating point as well as Fermi, and then some.
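For perspective, the double- to single-precision ratios implied by those numbers can be worked out in a couple of lines. This is a rough sketch using only the figures quoted above, not vendor data:

```python
# Sanity check on the single- vs double-precision figures quoted above:
# what fraction of its SP throughput does each card deliver in DP?
cards = {
    "GTX 295":     (1788, 149),   # (SP GFLOPS, DP GFLOPS)
    "GTX 590":     (2486, 311),
    "Quadro 6000": (1030, 515),
    "HD 5970":     (4640, 928),
    "HD 6990":     (5099, 1275),
}

for name, (sp, dp) in cards.items():
    print(f"{name:12s} DP/SP = {dp / sp:.3f}  (~1/{round(sp / dp)})")
```

The consumer nVidia cards come out at roughly 1/12 and 1/8 of their SP rate, the professional Quadro at 1/2, and the ATI cards at about 1/5 and 1/4, which is consistent with the argument above.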

ID: 70317
Dirk Broer

Joined: 16 Nov 05
Posts: 20
Credit: 2,008,000
RAC: 229
Message 70319 - Posted: 11 May 2011, 15:40:47 UTC - in response to Message 66677.  
Last modified: 11 May 2011, 15:51:42 UTC

Over 95% of BOINC users are CPU-only users; that leaves only 5% of us who crunch with our GPUs.


Are you referring to the 316,738 people that once registered with Rosetta, or the 39,735 that are presently active? (source: nl.boincstats.com)

Or are you referring to the 2,186,474 that once registered with BOINC vs the 322,644 that are presently active? (source: nl.boincstats.com)

Considering that GP-GPU systems are quite recent, a figure of 5% of 316,738 (or of 2,186,474) suddenly increases seven- to eightfold IF all those systems are still active. In that case we are suddenly discussing a whopping 34 or even 40% of the active systems.
CAN you quote the percentages for the 39,735 and/or the 322,644?
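For what it's worth, the arithmetic can be checked in a few lines. This sketch assumes, as the post does, that the 5% GPU share applies to the once-registered totals and that all those systems stayed active; with the boincstats figures quoted above, the share of the presently active users works out to roughly 34-40%:

```python
# Figures quoted above from nl.boincstats.com.
rosetta_registered, rosetta_active = 316_738, 39_735
boinc_registered, boinc_active = 2_186_474, 322_644

gpu_share = 0.05  # "only 5% of us use our GPUs to crunch"

# If 5% of the once-registered users had GPUs and are all still active,
# what fraction of the presently active users would that be?
print(f"Rosetta: {gpu_share * rosetta_registered / rosetta_active:.1%}")
print(f"BOINC:   {gpu_share * boinc_registered / boinc_active:.1%}")

# The registered-to-active ratios behind the "seven- to eightfold" jump:
print(f"Ratios: {rosetta_registered / rosetta_active:.1f}x, "
      f"{boinc_registered / boinc_active:.1f}x")
```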
ID: 70319
sparkler99

Joined: 13 Mar 11
Posts: 7
Credit: 5,469
RAC: 0
Message 70322 - Posted: 12 May 2011, 0:31:22 UTC - in response to Message 70317.  

I personally couldn't care less about using a graphics card for... graphics.

Personally, I only care about its ability to crunch for science, as a gp-gpu.

I don't know the answer for certain, but suspect it is "No.": Can "ATI's leading card" handle double precision floating point as well as Fermi?

Still couldn't beat ATI's leading card. Although, I don't know whether it's easier to code for an Nvidia video card or an ATI one.



Actually, the leading ATI/AMD cards of the last few years, first the HD 5970 and now the HD 6990, can run rings around the best nVidia card when it comes to double precision.

The GTX 295, nVidia's leading card of days gone by, could do 1,788 GFLOPS single precision against a very meagre 149 GFLOPS double precision.
The GTX 590, nVidia's present leading card, does 2,486 GFLOPS single precision against a still meagre 311 GFLOPS double precision.
The Quadro 6000, nVidia's less-limited leading professional card, gives 1,030 GFLOPS single precision but delivers a modest 515 GFLOPS double precision.

And ATI/AMD?
The HD 5970 delivers 4,640 GFLOPS single precision and does 928 GFLOPS double precision.
The HD 6990 delivers 5,099 GFLOPS single precision and does 1,275 GFLOPS double precision.

So I do know the answer for certain, and it is "YES": ATI's leading card can handle double precision floating point as well as Fermi, and then some.




Wrong. The HD 5970/6990 will only deliver 928/1,275 GFLOPS double precision in a lab. AMD's cards also have buggy drivers. The only thing going for AMD is the fact that nVidia cripples their consumer cards; if they didn't, AMD would be screwed.
ID: 70322
Profile Geoff

Joined: 16 Feb 07
Posts: 5
Credit: 1,499,399
RAC: 0
Message 70323 - Posted: 12 May 2011, 4:32:09 UTC - in response to Message 70322.  


Wrong. The HD 5970/6990 will only deliver 928/1,275 GFLOPS double precision in a lab. AMD's cards also have buggy drivers. The only thing going for AMD is the fact that nVidia cripples their consumer cards; if they didn't, AMD would be screwed.


Why? I've been using AMD products since the 1990s. Why should I believe you that AMD (it's in CAPS, BTW) would be... how did you so eloquently put it... "screwed"?

<rolls eyes>

Let me know when they die so I can throw away all my AMD machines.
ID: 70323
sparkler99

Joined: 13 Mar 11
Posts: 7
Credit: 5,469
RAC: 0
Message 70324 - Posted: 12 May 2011, 9:24:14 UTC

AMD would be screwed, as there would be nothing going for them. nVidia have decent drivers that are widely supported; AMD drivers until recently suffered from driver resets and the 99% bug, and apparently messed up OpenCL when using multiple cards in the 11.4 update. That, and the fact that their recent processors have suffered from problems, e.g. the Phenom and Phenom II, especially the 4-core processors with a faulty 3rd core.
ID: 70324
Profile dcdc

Joined: 3 Nov 05
Posts: 1828
Credit: 107,015,916
RAC: 7,474
Message 70325 - Posted: 12 May 2011, 10:13:36 UTC

Please don't have an AMD vs Nvidia argument on here... They both generally make great products and we'd all be screwed without either of them.

Neither of them will run Rosetta at the moment so it's a pointless discussion unless you can help with a port to OpenCL or CUDA?
ID: 70325
Profile Geoff

Joined: 16 Feb 07
Posts: 5
Credit: 1,499,399
RAC: 0
Message 70390 - Posted: 25 May 2011, 5:07:59 UTC

No worries.

I'll leave hardware to the hardware gurus. All I know is my AMD machines work without any issues under Linux, and that's all that counts for me.

I've been crunching for years now, and hope in some small way that has helped.

cheers!
ID: 70390
Orgil

Joined: 11 Dec 05
Posts: 82
Credit: 169,751
RAC: 0
Message 70399 - Posted: 27 May 2011, 0:58:47 UTC
Last modified: 27 May 2011, 0:59:33 UTC

My guess is that it is all a matter of the algorithm that runs the R@H client. If the project team figure out a CPU+GPU algorithm like Seti@home or Einstein@home have, R@H should run on both, but for some reason that algorithm is still top secret somewhere on a CIA laptop. ;)
ID: 70399
Profile dgnuff

Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 70400 - Posted: 27 May 2011, 6:51:32 UTC - in response to Message 66658.  

I believe the answer is that it has been looked at and the current spec of GPUs does not make it worthwhile. In short there is no advantage to using a GPU due to its limitations. I am sure when those limitations are surpassed the GPU will again be considered as a resource to be exploited. The devil is always in the details, and the reason we have a CPU and a GPU is because they each do things differently. The CPU can be very precise while the GPU does not have to be, while scientific research MUST be very precise at times.


I used my GTX 295 on the FAH Project at Stanford for a while, and it greatly outperformed my two CPUs. All four processors would be running simultaneously, and the video card would finish its work units much more rapidly than the two CPUs. Perhaps they were smaller WUs, but my scores started to increase at a much higher rate, so the contribution must have been significant. If GPUs can be made useful and valuable by FAH at Stanford, why can't they be made useful and valuable by Rosetta? Aren't both projects studying proteins?

deesy


I can't talk about the GPU port of FAH, but as it happens, I worked in Sony's R&D department at the time a team there ported FAH to run on the Cell processor in the PS3.

Their first run at it was simply dividing up the workload among the SPUs, "throwing a compiler switch" (as others have put it) and trying it out. The PC version, when ported in this manner to the SPUs, ran at about ONE TENTH the speed it did on a PC.

The team sat down, looked at what was going on, and realized that the PC version was a complete waste of time on the SPUs. The reason, very briefly, is that FAH computes forces between adjacent atoms and does a huge amount of filtering to skip the computation if the atoms are too far apart. This filtering simply crippled the program when running on the SPUs.

So the team took a rather simpler approach. They looked at the workload and decided to compute the force for every pair of atoms, filtering be damned. They then very carefully figured out how to divide this workload up between the SPUs to maximize throughput, and thus was created the version you now see running on PS3s, which is about ten times FASTER than the PC version was at that time.

Overall this re-architecture resulted in about a hundredfold increase in throughput on the Cell, but trying to run the PS3 version on a PC would probably have caused a massive slowdown - remember how the PC version filters to be able to get things done in a reasonable timeframe?

Do you now understand why this is not a trivial problem? Some projects (Seti for example) lend themselves very well to SPU / GPU architectures because, AFAIK, Seti isn't doing much more than a boatload of FFTs on the data, an operation that lends itself extremely well to a GPU-type system.

R@H is a whole different can of worms. I don't know for certain, but I'd guess it's somewhat similar to FAH, meaning that without a total re-architecture and rewrite of the program, it's going to run so slowly on a GPU as to be worthless.

Please people, don't assume this is easy - my comments above about FAH should illustrate that it's anything but easy.

If you can find a skilled team of GPU engineers with three or four months to burn, then by all means, have a go at porting the problem. Otherwise don't tell people that it should be easy, because you really don't know what you're talking about.
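The filtering-versus-brute-force trade-off described above can be sketched in miniature. This is a toy 1-D model I made up for illustration, not FAH's actual code: the first function skips distant pairs with a data-dependent branch (cheap on a CPU, poison for wide SIMD lanes that must all take the same path), while the second does identical work for every pair, the approach the PS3 team took.

```python
def forces_filtered(atoms, cutoff=2.5):
    """CPU-style: skip pairs beyond the cutoff (a data-dependent branch)."""
    n = len(atoms)
    f = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            dx = atoms[j] - atoms[i]
            if abs(dx) > cutoff:       # the filtering described above
                continue
            mag = 1.0 / (dx * dx)      # toy repulsive inverse-square force
            s = 1.0 if dx > 0 else -1.0
            f[i] -= s * mag            # push i away from j
            f[j] += s * mag
    return f


def forces_all_pairs(atoms):
    """SPU/GPU-style: uniform work for every pair, no branches.
    Distant pairs contribute almost nothing, so the answers stay close."""
    n = len(atoms)
    f = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            dx = atoms[j] - atoms[i]
            mag = 1.0 / (dx * dx)
            s = 1.0 if dx > 0 else -1.0
            f[i] -= s * mag
            f[j] += s * mag
    return f
```

The uniform loop does more arithmetic, but every pair takes the same code path, which is what lets wide SIMD hardware run flat out instead of stalling on branches.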
ID: 70400
Profile dcdc

Joined: 3 Nov 05
Posts: 1828
Credit: 107,015,916
RAC: 7,474
Message 70404 - Posted: 27 May 2011, 9:26:19 UTC - in response to Message 70400.  

I believe the answer is that it has been looked at and the current spec of GPUs does not make it worthwhile. In short there is no advantage to using a GPU due to its limitations. I am sure when those limitations are surpassed the GPU will again be considered as a resource to be exploited. The devil is always in the details, and the reason we have a CPU and a GPU is because they each do things differently. The CPU can be very precise while the GPU does not have to be, while scientific research MUST be very precise at times.


I used my GTX 295 on the FAH Project at Stanford for a while, and it greatly outperformed my two CPUs. All four processors would be running simultaneously, and the video card would finish its work units much more rapidly than the two CPUs. Perhaps they were smaller WUs, but my scores started to increase at a much higher rate, so the contribution must have been significant. If GPUs can be made useful and valuable by FAH at Stanford, why can't they be made useful and valuable by Rosetta? Aren't both projects studying proteins?

deesy

cheers Deesy - that's very useful insight!

Maybe we should have a word with Dr A over at Seti and suggest we'll give him our GPUs in exchange for the SETI CPUs! ;)
ID: 70404
Samson

Joined: 23 May 11
Posts: 8
Credit: 257,870
RAC: 0
Message 70431 - Posted: 28 May 2011, 21:51:33 UTC

I believe it all boils down to money.

Rosetta needs a bump in PR, so corporations take more notice.

No one with half a brain believes the "GPUs are not as good" argument.

Even if a GPU has to check the data two or three times, it is still faster than a CPU.

Hell, Rosetta could go as far as doing the money raising in house, via a service like : http://www.kickstarter.com/

Get some estimates for either an AMD or Nvidia client [select the most popular].

Once you get some estimates on programming, then send it over to kickstarter or a similar escrow kind of site.

Let the community decide if GPU crunching is worth it : )
ID: 70431
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70433 - Posted: 29 May 2011, 4:28:57 UTC
Last modified: 29 May 2011, 4:31:08 UTC

No one, with half a brain, believes the "GPUs are not as good" argument.


I haven't seen anyone attempt to say GPUs "aren't as good"... simply that they are VERY good at VERY specific types of workloads, and that they are NOT any better at other types of workloads. If they are generically "good" at EVERYTHING... then why is there no MS Office GPU edition? Why can't I buy a PC without a CPU? And if they are not generically good at EVERYTHING, why do you presume they are good at the things required to run Rosetta?

I won't say you have half a brain if you do not understand the answers to the above. ...only that you can be better informed about what GPUs are designed to do, and how the hardware requirements vary for various types of algorithms.

I've noticed many of my neighbors are not driving school buses to work each day... even though buses clearly are "good" (more than 10x!) at getting more people to work as compared to a car. So... is it a money problem that more people are not driving school buses? If I volunteered to raise the money and GAVE you a bus, should I expect to see you using it a year later?
Rosetta Moderator: Mod.Sense
ID: 70433
Samson

Joined: 23 May 11
Posts: 8
Credit: 257,870
RAC: 0
Message 70434 - Posted: 29 May 2011, 7:12:29 UTC - in response to Message 70433.  

Here's the way I see it.

If you are not able to see a solution, then you are the problem. [Not you specifically].

MS Office could, no doubt, be run on a GPU, but why do that when OpenCL and CUDA can make all that easier? Windows 7 and 8 are running on ARM now; people said that would never happen. Apple switched to Intel [even though their PPC with its vector engine was, by their claim, far superior]; people said that would never happen either.

It's all 0's and 1's [code]... no matter how you slice it.

It is all about money. Everything in this world is about money. If Rosetta received a 10 million dollar grant tomorrow, I'm sure we would have a GPU client and OpenCL implementation in less than 6 months.

People talk but money does the walking. Sorry if you cannot realize this.

Also, your school bus analogy is Swiss cheese as far as logic holes go.

You seem smart enough to know the old saying "Those that say it can't be done should get out of the way of those doing it". Ok, that is some derivation but you get my point.

Also, what kind of moderator for Rosetta would shoot me down and not even dare to mention my idea of community-funded programmers? Whose side are you on...

I've got 20 bucks that says it can be done. It can also be done to surpass CPU performance and I bet there are about 10 million Indian programmers that would be happy to do it.

So why not let the community use a kickstarter type service to dictate the outcome?
ID: 70434
bookwyrm

Joined: 3 Jan 11
Posts: 3
Credit: 1,232,986
RAC: 0
Message 70436 - Posted: 29 May 2011, 13:33:32 UTC

Just saying that money can solve everything isn't really a solution. How to spend the money you have is more important.
Sure, with more money they could get more scientists, faster computers and maybe a PR department.
None of this would change the Rosetta algorithm if it's the best that the scientists have come up with so far, and that seems to be where the problem lies, if I'm not mistaken.
What the devs need is for the algorithm to stay intact if/when ported to the GPU.

Just quickly browsing the forum, I think the devs have tried moving bits of the code to CUDA but found that it was slower than CPU.
Like dgnuff says above, there are many complications in porting programs to GPU.
It may not be easy, and the result could be that the GPU version of Rosetta ends up slower than the CPU one. The people who know that best are the devs.

Rewrites of the Rosetta code have happened before: they moved the code from Fortran to C++.
Rewriting the code for GPUs might make the program run faster, but if it were that simple they would likely have done it by now.
After all, this project has been around for 5+ years.
ID: 70436
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70437 - Posted: 29 May 2011, 17:14:54 UTC - in response to Message 70434.  

Here's the way I see it.

If you are not able to see a solution, then you are the problem. [Not you specifically].

MS Office could, no doubt, be run on a GPU, but why do that when OpenCL and CUDA can make all that easier? Windows 7 and 8 are running on ARM now; people said that would never happen. Apple switched to Intel [even though their PPC with its vector engine was, by their claim, far superior]; people said that would never happen either.

It's all 0's and 1's [code]... no matter how you slice it.

It is all about money. Everything in this world is about money. If Rosetta received a 10 million dollar grant tomorrow, I'm sure we would have a GPU client and OpenCL implementation in less than 6 months.

People talk but money does the walking. Sorry if you cannot realize this.

Also, your school bus analogy is Swiss cheese as far as logic holes go.

You seem smart enough to know the old saying "Those that say it can't be done should get out of the way of those doing it". Ok, that is some derivation but you get my point.

Also, what kind of moderator for Rosetta would shoot me down and not even dare to mention my idea of community-funded programmers? Whose side are you on...

I've got 20 bucks that says it can be done. It can also be done to surpass CPU performance and I bet there are about 10 million Indian programmers that would be happy to do it.

So why not let the community use a kickstarter type service to dictate the outcome?


I don't want to go point by point and create a direct confrontation with you, but I do want to address a few of your points. Firstly, the community at Kickstarter cannot possibly have enough information to make an informed decision, because they do not know how the Rosetta algorithm works. I have never said that a GPU version will never happen, and I've never said it cannot be done. What I have said is that the current implementations do not lend themselves to the way Rosetta works and to its hardware requirements.

And I regret that you feel I have been, or am, shooting you down. That is not at all my intention. I'm simply trying to inform all who feel in line with the subject line of this thread, which by saying "be ready" implies it is not only realistic with current hardware, but being worked on. The technology was reviewed by the developers and found to be inappropriate (with current hardware) for the way Rosetta runs. It will likely be reviewed again when the current limitations are relieved by new GPU hardware.

Most who post on the subject have no idea how to write a computer program, nor understand that there are cases where a given piece of hardware is not appropriate to solving a given problem. This is where the bus analogy comes in. I can easily make the armchair observation that a bus moves more people (i.e. a GPU does more work / has more cores) than a car (or CPU). I could even argue that it does so with less energy (power requirements per FLOP). I could say that anyone who argues with these simple observations is insane. But that ignores a number of key factors that will determine whether a bus makes any sense for a given purpose. It looks only at the passenger capacity and draws a desired conclusion from the limited information. That was my point with the analogy: that even if you HAVE a bus, you might not use it due to other considerations (such as the number of people you have available to board the bus at a given time). So no one can disagree with you that computers are all "just zeros and ones", but that doesn't mean you can throw FLOPs and money at a given problem and arrive at your desired destination within the estimated time.

So I would contend that if given $10M tomorrow, the team would probably put the money to work trying to develop a universal flu vaccine, or an HIV treatment or vaccine, rather than try to reinvent 20 years of work in the hope that the non-sequential algorithm you might devise works better than the sequential Monte Carlo approach they have now for structure prediction.

You see, how long the prediction calculations take is really not the most pressing problem. The problem is to have some approach that reliably arrives at an accurate prediction. Since no one has such an algorithm at this point, the money should go into finding the algorithm, not into taking the current state-of-the-art algorithm (which still needs to improve to be really useful) and arriving at inaccurate predictions faster.

Reminder: I am a volunteer. My comments are my own and not intended to reflect opinion of the Project Team.
Rosetta Moderator: Mod.Sense
ID: 70437
Profile robertmiles

Joined: 16 Jun 08
Posts: 1194
Credit: 13,227,651
RAC: 738
Message 70443 - Posted: 30 May 2011, 14:09:52 UTC
Last modified: 30 May 2011, 14:41:06 UTC

I remember from another thread that the Rosetta@Home developers found that their algorithm was so serial in nature that it could barely use more than one of the cores in a GPU, and would therefore be expected to run SLOWER than on a CPU, since the clock rates on graphics boards are lower than on typical CPUs.

They could still run multiple copies of that algorithm in parallel, though, IF the graphics board has enough graphics memory. To estimate how worthwhile this would be, check the number of graphics boards with an amount of graphics memory that is at least a multiple of the amount now required for CPU workunits.

For example, if you wanted to do 10 times as much work in somewhat more time than a CPU workunit, you'd currently need graphics boards with around 10 times as much graphics memory as the main memory required for CPU workunits. Check the prices of such graphics cards and decide how many you plan to buy. Hint: The only ones I've found so far are part of Nvidia's Tesla series.
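That arithmetic can be sketched in a couple of lines. The per-work-unit figure here is my own assumption for illustration, not a project number:

```python
# Hypothetical numbers for illustration only.
per_workunit_mb = 500        # assumed memory footprint of one CPU work unit
gpu_memory_mb = 6 * 1024     # e.g. a 6 GB Tesla-class board

# Each independent copy of the serial algorithm needs its own working set,
# so the copy count is limited by how many footprints fit in graphics memory.
copies = gpu_memory_mb // per_workunit_mb
print(f"Independent copies that fit: {copies}")
```

With those assumed figures, one high-memory board could run about a dozen copies side by side, which is the scale of gain the post above is describing.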

I'd expect the developers to reconsider, though, when graphics boards with that much graphics memory become available at a much lower price.

Note: This does not apply to BOINC projects already using algorithms more parallel in nature, such as GPUGRID.
ID: 70443
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 64,567,451
RAC: 3,566
Message 70447 - Posted: 30 May 2011, 16:11:26 UTC - in response to Message 70443.  

Join GPU Grid. It requires very little CPU so you can continue to crunch R@H and do GPU Grid at the same time.
Thx!

Paul

ID: 70447
Profile robertmiles

Joined: 16 Jun 08
Posts: 1194
Credit: 13,227,651
RAC: 738
Message 70448 - Posted: 30 May 2011, 16:29:22 UTC - in response to Message 70447.  

Join GPU Grid. It requires very little CPU so you can continue to crunch R@H and do GPU Grid at the same time.


Note that GPUGRID does require a rather high-end GPU to meet their need for speed, though.

I've had one of my GPUs working mostly for GPUGRID for probably over a year. The other one is just too low-end to be suitable for them.
ID: 70448



©2022 University of Washington
https://www.bakerlab.org