GPU use... again

Message boards : Number crunching : GPU use... again

Profile Dragokatzov

Joined: 5 Oct 05
Posts: 25
Credit: 2,446,376
RAC: 0
Message 69683 - Posted: 21 Feb 2011, 8:55:05 UTC
Last modified: 21 Feb 2011, 9:11:26 UTC

I know there are many, many threads on this subject, and to many of us here (well, at least some) it's a dream. As I understand it, making Rosetta work on a GPU would require a lot of reprogramming, and even then it couldn't fit the whole database into the RAM that most GPUs have available. That is the issue: how can it be overcome?

Swapping is pointless.

My soldering skills are not good enough to get my Radeon 5450 up to 2 GB of RAM (haha), and I don't think anyone else here would be willing to give it a shot.

So that leaves us with buying a GPU with 2 GB+ of RAM ($$$), reworking the Rosetta application, or saying "Hello!" to GPUGrid. While GPUGrid is a great project, and I have great respect for their efforts, there are still a lot of loyal Rosetta users out there...

I looked on Tiger Direct, and they did have a few GPUs that would fit the bill, with 2 GB, 4 GB, and 6 GB of RAM. They are pretty pricey, but if you look at the cash other people dump into their crunching rigs and farms, that objection might not be so big.

So again, it's either redesigning Rosetta or apocalyptic bank robbery.

My question to the Rosetta@home team: besides reworking Rosetta, is there anything else holding you back from creating a GPU client?

I remember back in the days of SETI Classic we all thought it would be cool to compute on the GPU as well as on the state-of-the-art AMD Athlon 1700+ and Pentium 4 2.5 GHz CPUs we were running. At that time I don't think we even really knew just how powerful the GPU was. Now SETI has over 730 teraFLOPS and Folding@home 6.2 petaFLOPS... why can't Rosetta have that same kind of power?
Victory is the ONLY option!
ID: 69683
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 10,641
Message 69684 - Posted: 21 Feb 2011, 10:14:53 UTC - in response to Message 69683.  

I know there are many, many threads on this subject, and to many of us here (well, at least some) it's a dream. As I understand it, making Rosetta work on a GPU would require a lot of reprogramming, and even then it couldn't fit the whole database into the RAM that most GPUs have available. That is the issue: how can it be overcome?

Swapping is pointless.

My soldering skills are not good enough to get my Radeon 5450 up to 2 GB of RAM (haha), and I don't think anyone else here would be willing to give it a shot.

So that leaves us with buying a GPU with 2 GB+ of RAM ($$$), reworking the Rosetta application, or saying "Hello!" to GPUGrid. While GPUGrid is a great project, and I have great respect for their efforts, there are still a lot of loyal Rosetta users out there...

I looked on Tiger Direct, and they did have a few GPUs that would fit the bill, with 2 GB, 4 GB, and 6 GB of RAM. They are pretty pricey, but if you look at the cash other people dump into their crunching rigs and farms, that objection might not be so big.

So again, it's either redesigning Rosetta or apocalyptic bank robbery.

My question to the Rosetta@home team: besides reworking Rosetta, is there anything else holding you back from creating a GPU client?

I remember back in the days of SETI Classic we all thought it would be cool to compute on the GPU as well as on the state-of-the-art AMD Athlon 1700+ and Pentium 4 2.5 GHz CPUs we were running. At that time I don't think we even really knew just how powerful the GPU was. Now SETI has over 730 teraFLOPS and Folding@home 6.2 petaFLOPS... why can't Rosetta have that same kind of power?


Probably because redoing the program to fit a GPU that all of about a dozen people might buy is far from cost effective given the limited resources a project has! There are over half a dozen GPU-crunching projects for you to choose from; crunch for them for now, and maybe Rosie will make a GPU app when it becomes cost effective and fits their idea of crunching. Rosie has A TON of people willing to crunch on their CPUs, they are not losing people, and it makes no economic sense to change something that is working and should continue to work for the foreseeable future.
ID: 69684
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69685 - Posted: 21 Feb 2011, 17:10:29 UTC

I think there are a few problems, but I might be wrong and I'd be interested to be corrected if that's the case. Here's my understanding:

1. Rosetta is huge and would require a lot of work to convert it for GPU use.

2. It would then mean maintaining and updating two versions in parallel.

3. It's a relatively new area with competing languages, so it wasn't/isn't obvious whether to use CUDA or OpenCL (or Stream/Brook+?).

I think the resources to make a significant contribution would be available, though - there are lots of dedicated crunchers who would upgrade their GPU for the purpose, and if it were able to run on integrated GPUs (which I know are considerably slower), their numbers are increasing all the time. I don't know whether integrated GPUs can access sufficient memory or whether there's a limit on it - I'm thinking of AMD's Fusion and Intel's integrated offerings...
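As a side note, anyone curious about what memory limits their own card (discrete or integrated) actually reports can ask the OpenCL runtime directly. A minimal sketch using the pyopencl bindings, assuming pyopencl and a working OpenCL driver are installed:

import pyopencl as cl  # pip install pyopencl; needs a working OpenCL driver

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(dev.name.strip())
        # Total on-board memory (or the shared pool, for integrated GPUs):
        print("  global memory:", dev.global_mem_size // (1024 * 1024), "MB")
        # Largest single buffer the runtime will hand out in one allocation:
        print("  max allocation:", dev.max_mem_alloc_size // (1024 * 1024), "MB")

On many drivers the maximum single allocation is only a fraction of the total memory, which is exactly the sort of limit that would matter for holding one big database.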
ID: 69685
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69686 - Posted: 21 Feb 2011, 17:53:07 UTC

My question to the Rosetta@home team: besides reworking Rosetta, is there anything else holding you back from creating a GPU client?


If it were just a matter of porting source code and recompiling each time a new version is released, I'm sure they would do it. They already do that for whichever platforms are not their development platform in order to support Mac, Linux and Windows. So, yes, both the first-time rework and the ongoing rework are major hurdles.

I personally have not read about such massive memories available on GPUs. Assuming they are not limited in use (as to data vs. program, or partitioned out to specific cores vs. shared), then perhaps the memory constraint has now been relieved with sufficient hardware investment.

Can you link to any articles about successful ports of high-memory applications like R@h to these new high-memory GPUs? Or to architecture diagrams that outline the usage of the memory? Or to any sales figures to get an idea of how many of these things are out in the wild?

I don't mean to raise hopes. The manual conversion effort would be huge, and would take away from efforts to develop new algorithms. But it would be nice to understand if viable hardware now exists to support such an application. That would at least be the first step along a long path.
Rosetta Moderator: Mod.Sense
ID: 69686
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69688 - Posted: 21 Feb 2011, 20:51:57 UTC

GPUGrid has a CUDA app. I don't have a quick enough GPU so I've never tried it, although I was tempted to buy one for GPUGrid recently, and I might still, finances permitting.

Here's a post from last week on their forum about ATI support, which highlights the problem with the current lack of dependable standards:


Given the expected ETAs of the next two APP SDKs and some knowledge of the researchers' timetable, I don't expect another sustained attempt until around August.

If the drivers are not stable across a range of cards, then by the time you debug and find workarounds, the next APP SDK will be out, making your efforts worthless. ATI even changed the name from SDK to APP SDK. It's down to ATI to provide a usable APP SDK to work from.


This thread shows the GPUs that are recommended (i.e. quick enough and with enough memory) for GPUGrid: http://www.gpugrid.net/forum_thread.php?id=867...

I know a version of Rosetta was used on a supercomputer (BlueGene?) a year or two back. I assume that version must have been able to run massively parallel, assuming it was working on a single protein (being able to work in parallel is the only advantage a supercomputer has over R@h AFAICS) - is that parallelisation applicable to running on GPUs too?
ID: 69688
Dirk Broer

Joined: 16 Nov 05
Posts: 22
Credit: 2,947,217
RAC: 4,673
Message 69694 - Posted: 23 Feb 2011, 0:21:19 UTC - in response to Message 69684.  
Last modified: 23 Feb 2011, 0:42:31 UTC

My soldering skills are not good enough to get my Radeon 5450 up to 2 GB of RAM (haha), and I don't think anyone else here would be willing to give it a shot.


An HD 5450 is, sorry to say, a pretty crappy card to begin with. I think you should run a poll and ask what cards the people crunching for Rosetta have in their rigs (perhaps you don't even need to, given the information the project already has). I myself have three rigs with GPUs - a GTX 260, an HD 4770 and an HD 3850 - and I cannot even keep up with those who are advancing in the ranks. My next GPU will be either an HD 5850 or a GTX 560, with as much RAM on it as the budget permits.

Projects like GPUGrid, PrimeGrid, MilkyWay, Collatz, Einstein, DNETC, SETI, etc. are using the GPU to great effect, and the majority of the people working on those projects use pretty advanced GPUs (I see them passing me in the scores); 2 GB of memory (or more) will pretty soon be standard on high-end GPUs.
ID: 69694
Profile Dragokatzov

Joined: 5 Oct 05
Posts: 25
Credit: 2,446,376
RAC: 0
Message 69695 - Posted: 23 Feb 2011, 6:25:22 UTC - in response to Message 69694.  

Aside from the few negative comments here, and the one person who felt it was crucial to this thread to point out that my video card sucks when I already know it sucks, this thread has turned out more interesting than I thought.

Yes, I know my Radeon 5450 sucks; thank you very much for pointing that out. It has nothing to do with the issue I brought up.

Why make a GPU client? Why not? Why put a man on the Moon? Why climb Mount Everest? Why not... you get the picture. Yes, I admit it is a big undertaking at the current time, but heck, so was the Moon landing. Do you folks see my point?
Victory is the ONLY option!
ID: 69695
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 10,641
Message 69698 - Posted: 23 Feb 2011, 13:22:18 UTC - in response to Message 69695.  

Aside from the few negative comments here, and the one person who felt it was crucial to this thread to point out that my video card sucks when I already know it sucks, this thread has turned out more interesting than I thought.

Yes, I know my Radeon 5450 sucks; thank you very much for pointing that out. It has nothing to do with the issue I brought up.

Why make a GPU client? Why not? Why put a man on the Moon? Why climb Mount Everest? Why not... you get the picture. Yes, I admit it is a big undertaking at the current time, but heck, so was the Moon landing. Do you folks see my point?


But we did not go to the Moon to 'prove a point'; we went to the Moon so we could eventually go further (budget cuts nixed that idea, though)! Making a GPU app just to prove you can is like you sending me a hundred thousand dollars just because you can and because I am asking you to - you will not, and I am still waiting! A project must spend its money wisely or it will run out. Putting money into something they already KNOW will not benefit them (they really did do some research into it already) is not cost effective and just plain silly!
ID: 69698
Dirk Broer

Joined: 16 Nov 05
Posts: 22
Credit: 2,947,217
RAC: 4,673
Message 69700 - Posted: 23 Feb 2011, 22:36:34 UTC - in response to Message 69698.  

A project must spend its money wisely or it will run out. Putting money into something they already KNOW will not benefit them (they really did do some research into it already) is not cost effective and just plain silly!


Do you guys ever discuss things with other projects? Have you seen the increase in output of the projects using GPU applications? I was doing around 2k credit points a day on the various projects using only CPUs. Now, with 3 GPUs (not the strongest by any means) I am doing 100-200k a day. That is a hundredfold increase in productivity. If you want to go on ignoring the strengths of the new generation of GPUs, that's up to you, but you are certainly not going to make much progress within your allotted budget.

ID: 69700
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69702 - Posted: 24 Feb 2011, 1:46:06 UTC - in response to Message 69700.  

Do you guys ever discuss things with other projects? Have you seen the increase in output of the projects using GPU applications? I was doing around 2k credit points a day on the various projects using only CPUs. Now, with 3 GPUs (not the strongest by any means) I am doing 100-200k a day. That is a hundredfold increase in productivity. If you want to go on ignoring the strengths of the new generation of GPUs, that's up to you, but you are certainly not going to make much progress within your allotted budget.


As you are mentioning other projects, here is an interesting quote from Poem@home on their development of GPU processing:

5.) What are the problems with regard to P++ GPU development? Are the dev tools OK already?

The nice thing about GPU development is: you don't need special dev tools (or at least not so many). One of the reasons GPU development takes so much time is that some energy terms simply don't want to be parallelized. Even if 5 of 6 energy terms run 20 times faster, you won't see a big difference if energy term 6 still takes as long as before. GPU cores cannot really talk to each other.

Regarding app distribution, there have been some nice developments recently: on the one hand, ATI seems to bundle the OpenCL SDK with Catalyst nowadays, and on the other hand, some BOINC folks are spending time on supporting OpenCL.


As they say there, some types of work don't port easily to GPUs. You can see a massive speed-up on 90% of the calculations in a task, but if you can't get that last 10% converted properly, the whole task can still end up not much faster than CPU processing.
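The arithmetic behind that is just Amdahl's law. A quick sketch with illustrative numbers, assuming (my assumption, not POEM's) that the six energy terms take roughly equal time:

def amdahl_speedup(parallel_fraction, factor):
    """Overall speedup when only part of the work is accelerated."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / factor)

# 5 of 6 equally weighted energy terms run 20x faster, the 6th is unchanged:
print(round(amdahl_speedup(5.0 / 6.0, 20), 1))   # ~4.8x overall
# 90% of a task accelerated 20x, the other 10% still serial on the CPU:
print(round(amdahl_speedup(0.9, 20), 1))         # ~6.9x overall

So even a 20x kernel speed-up only buys a roughly 5-7x wall-clock improvement while any term stays serial.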

Another issue, as I understand it, is that all the GPUs I have heard of have dedicated memory for each "core". While the board may boast 2GB of memory, if the board has 10 GPU "cores" then they get 200 MB each and won't share. Rosetta needs much more than 200 MB for its current research.
ID: 69702
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 10,641
Message 69705 - Posted: 24 Feb 2011, 11:15:40 UTC - in response to Message 69700.  

A project must spend its money wisely or it will run out. Putting money into something they already KNOW will not benefit them (they really did do some research into it already) is not cost effective and just plain silly!


Do you guys ever discuss things with other projects? Have you seen the increase in output of the projects using GPU applications? I was doing around 2k credit points a day on the various projects using only CPUs. Now, with 3 GPUs (not the strongest by any means) I am doing 100-200k a day. That is a hundredfold increase in productivity. If you want to go on ignoring the strengths of the new generation of GPUs, that's up to you, but you are certainly not going to make much progress within your allotted budget.


I do not work for any project, I just crunch, but I have heard of GPU crunching for a while now and actually participate in it myself, as you can see by my signature below.
ID: 69705
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69707 - Posted: 25 Feb 2011, 10:11:59 UTC - in response to Message 69700.  


Do you guys ever discuss things with other projects? Have you seen the increase in output of the projects using GPU applications? I was doing around 2k credit points a day on the various projects using only CPUs. Now, with 3 GPUs (not the strongest by any means) I am doing 100-200k a day. That is a hundredfold increase in productivity.

Cross-project parity is a nice but completely flawed idea (certainly in its present guise) - some projects will get immense speedups with GPUs (evidently), some will end up slower, and many will be somewhere in-between.

I'm all for GPU crunching at Rosetta - I'd buy fast GPUs even if it only hit beta - but only if it's possible and practical, and the experts say it isn't at the moment.
ID: 69707
Dan

Joined: 6 Feb 06
Posts: 1
Credit: 5,882,550
RAC: 0
Message 69720 - Posted: 1 Mar 2011, 22:13:02 UTC - in response to Message 69686.  

My question to the Rosetta@home team: besides reworking Rosetta, is there anything else holding you back from creating a GPU client?


If it were just a matter of porting source code and recompiling each time a new version is released, I'm sure they would do it. They already do that for whichever platforms are not their development platform in order to support Mac, Linux and Windows. So, yes, both the first-time rework and the ongoing rework are major hurdles.

I personally have not read about such massive memories available on GPUs. Assuming they are not limited in use (as to data vs. program, or partitioned out to specific cores vs. shared), then perhaps the memory constraint has now been relieved with sufficient hardware investment.

Can you link to any articles about successful ports of high-memory applications like R@h to these new high-memory GPUs? Or to architecture diagrams that outline the usage of the memory? Or to any sales figures to get an idea of how many of these things are out in the wild?

I don't mean to raise hopes. The manual conversion effort would be huge, and would take away from efforts to develop new algorithms. But it would be nice to understand if viable hardware now exists to support such an application. That would at least be the first step along a long path.


There are several NVIDIA Quadros with high RAM levels, up to 6 GB in the Quadro 6000 workstation card. I don't think these are very prevalent in the crunching community, though, as they are aimed at very high-end 3D graphics work.
Dan
ID: 69720
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69722 - Posted: 2 Mar 2011, 8:03:03 UTC

Realistically, as far as I can tell, Rosetta would have to run as a single task with all GPU compute cores working on it in parallel; otherwise it would need something in the region of 80 × 512 MB for a low-end card, and many times that for a high-end one, to run one task per compute core/stream processor or whatever the marketing department chooses to call them.

If Rosetta could work in parallel on a single task, then 512 MB is enough for small existing work units, assuming there aren't additional overheads from running on the GPU.
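A back-of-the-envelope version of that memory budget (the 512 MB per task and 80 stream processors are just the illustrative figures from this thread, not measured Rosetta numbers):

per_task_mb = 512        # rough footprint of one Rosetta-sized model
stream_processors = 80   # the low-end card figure used above
card_memory_mb = 1024    # a typical 1 GB card of this era

# One independent task per stream processor:
print(stream_processors * per_task_mb, "MB needed")           # 40960 MB - hopeless
# All stream processors cooperating on a single task:
print(per_task_mb, "MB needed vs", card_memory_mb, "MB on the card")

Only the second scheme fits on current cards, and only if the code can be parallelised within a single model.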

A few years ago I believe they had some time available on a supercomputer (Roadrunner or BlueGene?) and I think they had to parallelise the code to run on it (otherwise what's the advantage over running on BOINC?), so running a single model on many cores might already be possible...
ID: 69722
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1861
Credit: 8,161,875
RAC: 8,131
Message 69723 - Posted: 3 Mar 2011, 15:47:57 UTC - in response to Message 69694.  


Projects like GPUGrid, PrimeGrid, MilkyWay, Collatz, Einstein, DNETC, SETI, etc. are using the GPU to great effect


Poem@home (with OpenCL) and Docking@home (with CUDA) are also working on GPU clients.
ID: 69723
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 69727 - Posted: 4 Mar 2011, 22:22:14 UTC - in response to Message 69722.  

A few years ago I believe they had some time available on a supercomputer (Roadrunner or BlueGene?) and I think they had to parallelise the code to run on it (otherwise what's the advantage over running on BOINC?), so running a single model on many cores might already be possible...

And how much RAM per CPU do those supercomputers have? Then compare that to what each stream processor gets.
ID: 69727
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69731 - Posted: 4 Mar 2011, 23:32:04 UTC - in response to Message 69727.  

A few years ago I believe they had some time available on a supercomputer (Roadrunner or BlueGene?) and I think they had to parallelise the code to run on it (otherwise what's the advantage over running on BOINC?), so running a single model on many cores might already be possible...

And how much RAM per CPU do those supercomputers have? Then compare that to what each stream processor gets.

I would assume they'd have a few GB per core, but I'm not sure what your point is.

My point was that if they were running 5000 tasks on 5000 cores then it's no better than running on BOINC, so I presume they ran one task on all of the cores in parallel, which is what would be needed on a GPU.
ID: 69731
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 69735 - Posted: 5 Mar 2011, 9:26:24 UTC - in response to Message 69731.  

The point is that running one task on all of the cores in parallel certainly needs a lot of RAM, and I'm pretty sure they wouldn't run it on a supercomputer if it could be done on a few of our GPUs.
ID: 69735
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,568,105
RAC: 59,147
Message 69738 - Posted: 5 Mar 2011, 10:50:21 UTC - in response to Message 69735.  

The point is that running one task on all of the cores in parallel certainly needs a lot of RAM, and I'm pretty sure they wouldn't run it on a supercomputer if it could be done on a few of our GPUs.


Not sure I follow; if all cores are working in parallel on the same model, then the model only needs to be held in memory once (on a GPU at least - on a supercomputer I imagine the networking overheads mean that each blade actually keeps its own copy in RAM?). If you're running a different task on each core, then you need enough RAM for one model per core...

ID: 69738
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 69749 - Posted: 6 Mar 2011, 22:28:28 UTC
Last modified: 6 Mar 2011, 22:30:01 UTC

It's actually already written in the posts above. The fact that all cores work on the same thing does not change the fact that each core still needs its own thread, and each thread needs some additional memory of its own, since it should be doing something different from the threads on the other cores. And while on a supercomputer each core has a gigabyte or whatever for its own use, one GPU has 2 GB for... what do the current top models have, 1000 cores?

Secondly, as I said above, I can't imagine they were running the same WUs on that supercomputer as we get here - surely something far more complex. And more complex things can usually be parallelised better, but on the other hand they need more RAM.

So I don't think there is any point in comparing a supercomputer with a GPU; they are two different worlds. If it were possible to replace a supercomputer with a few GPUs, they would all have been shut down by now. So just because something runs on a supercomputer doesn't mean it will run on a GPU.
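To put rough, purely illustrative numbers on that comparison:

gpu_memory_mb, gpu_cores = 2048, 1000    # the "2 GB / ~1000 cores" card mentioned above
node_memory_mb, node_cores = 1024, 1     # roughly 1 GB per core on a cluster node

print(gpu_memory_mb / gpu_cores, "MB of global memory per GPU core")   # ~2 MB
print(node_memory_mb / node_cores, "MB per supercomputer core")        # 1024 MB

If every GPU thread really needs its own working set, a couple of megabytes per core is a very different budget from a gigabyte per core.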
ID: 69749


