New FERMI GPU, 4x more cores, more memory

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 63583 - Posted: 3 Oct 2009, 16:28:12 UTC

Nvidia has now announced a new architecture with on-chip L1 and L2 cache memory to support more memory-intensive applications.
FERMI


Fermi makes GPU and CPU co-processing pervasive by addressing the full-spectrum of computing applications. Designed for C++ and available with a Visual Studio development environment, it makes parallel programming easier and accelerates performance on a wider array of applications than ever before – including dramatic performance acceleration in ray tracing, physics, finite element analysis, high-precision scientific computing, sparse linear algebra, sorting, and search algorithms.


Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS, or Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 63583
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 63584 - Posted: 3 Oct 2009, 16:49:40 UTC

Here's a nice article talking about why protein folding is an important and challenging field of study and how some are using GPUs for such atomic modeling.

Probing Biomolecular Machines with Graphics Processors.
(be sure to click the settings item on the menu bar to format to a readable size)
ID: 63584
zpm
Joined: 21 Mar 09
Posts: 6
Credit: 349,801
RAC: 0
Message 63587 - Posted: 4 Oct 2009, 2:50:46 UTC - in response to Message 63584.  
Last modified: 4 Oct 2009, 2:51:33 UTC

Here's a nice article talking about why protein folding is an important and challenging field of study and how some are using GPUs for such atomic modeling.

Probing Biomolecular Machines with Graphics Processors.
(be sure to click the settings item on the menu bar to format to a readable size)


Drugdiscovery@home is in the process of trying to get GPUs (ATI and Nvidia) working, and hopefully multi-threading like Aqua@home has, but it's a slow work in progress with one man doing 95% of the work. Ageless is working on the ATI app....

http://boinc.drugdiscoveryathome.com

If you're interested in helping us, PM me for an invite code.
ID: 63587
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 63588 - Posted: 4 Oct 2009, 4:00:36 UTC

Ahhhh....

and to think I'm having trouble understanding linear algebra in college :S
ID: 63588
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 63589 - Posted: 4 Oct 2009, 4:13:08 UTC - in response to Message 63588.  

Ahhhh....

and to think I'm having trouble understanding linear algebra in college :S



congratulations to you!
ID: 63589
zpm
Joined: 21 Mar 09
Posts: 6
Credit: 349,801
RAC: 0
Message 63590 - Posted: 4 Oct 2009, 6:23:48 UTC - in response to Message 63588.  
Last modified: 4 Oct 2009, 6:23:58 UTC

Ahhhh....

and to think I'm having trouble understanding linear algebra in college :S


College algebra is tough; maybe you and I should hook up and see what/where we are... my quarter just began.

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
http://boinc.drugdiscoveryathome.com
ID: 63590
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 14,016,882
RAC: 3,717
Message 63598 - Posted: 4 Oct 2009, 20:53:19 UTC - in response to Message 63583.  

Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?


Not much of a dent, since minirosetta currently requires about 500 MB of total memory to run on just one processor - several hundred times the size of that cache. If you plan to use all the cores in order to get the maximum speedup, multiply 500 MB by the number of cores to get the approximate amount of memory needed on the GPU card without a major, and therefore rather slow, rewrite of the program.

It should be easier to start with a version that runs minirosetta on only as many cores as there is enough memory for, though, with one more core used to combine the various data streams. Much less of a speedup, but still some. Also, it looks to me like the compilers that will allow languages such as C++ and Fortran to be used for the GPU code are likely to target only cards with the new GT300 series of GPU chips, and not the GPU cards sold in the past. Something to ask Nvidia about, at least.
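
For illustration, here is a minimal sketch (not project code) of the arithmetic that approach implies, using the standard CUDA runtime call to ask the card how much memory is free; the 500 MB per-task figure is just the number quoted above, not a measured value:

    // Minimal sketch: how many ~500 MB minirosetta-sized tasks would fit
    // in the memory a GPU card has free right now. TASK_BYTES is an
    // assumption taken from the figure quoted above.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t TASK_BYTES = 500UL * 1024 * 1024;   // assumed ~500 MB per task
        size_t freeBytes = 0, totalBytes = 0;
        if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
            fprintf(stderr, "no usable CUDA device\n");
            return 1;
        }
        printf("%zu MB free of %zu MB: room for %zu concurrent task(s)\n",
               freeBytes >> 20, totalBytes >> 20, freeBytes / TASK_BYTES);
        return 0;
    }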
ID: 63598
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 63604 - Posted: 5 Oct 2009, 0:35:06 UTC - in response to Message 63598.  

Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?


Not much of a dent, since minirosetta currently requires about 500 MB of total memory to run on just one processor - several hundred times the size of that cache. If you plan to use all the cores in order to get the maximum speedup, multiply 500 MB by the number of cores to get the approximate amount of memory needed on the GPU card without a major, and therefore rather slow, rewrite of the program.

It should be easier to start with a version that runs minirosetta on only as many cores as there is enough memory for, though, with one more core used to combine the various data streams. Much less of a speedup, but still some. Also, it looks to me like the compilers that will allow languages such as C++ and Fortran to be used for the GPU code are likely to target only cards with the new GT300 series of GPU chips, and not the GPU cards sold in the past. Something to ask Nvidia about, at least.


Wouldn't writing to RAM be useful?
ID: 63604
dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 118,102,747
RAC: 33,700
Message 63612 - Posted: 5 Oct 2009, 15:46:08 UTC

I believe the Baker Lab guys had to do some rewriting of Rosetta to get it working efficiently on Blue Waters (or was it a different machine?), which I assume meant getting it to run a single task in parallel rather than having each CPU run a separate simulation (otherwise it wouldn't have made use of the fact that the CPUs could communicate with each other, which I believe was the whole point of using a supercomputer rather than BOINC).

If I'm right (long odds!) then I'd guess that'd be the way to go for GPGPU as well, since you'd only need one copy of the protein in RAM (rather than one per core). I can't begin to imagine where you'd start with getting the cores to work on the same task together, though.
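
To make that concrete, here's a toy CUDA kernel along those lines; the data layout and the "energy" term are made up for illustration and have nothing to do with Rosetta's real code. One copy of the atom coordinates sits in the card's memory, and each thread scores a different candidate move against it:

    // Toy sketch: one shared copy of the protein's atom positions,
    // with every GPU thread evaluating a different candidate move.
    // scoreCandidates, moves, and the energy term are all invented.
    __global__ void scoreCandidates(const float3* atoms, int nAtoms,
                                    const float3* moves, float* scores,
                                    int nMoves) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nMoves) return;
        float e = 0.0f;
        for (int a = 0; a < nAtoms; ++a) {
            // placeholder "energy": distance of each shifted atom from the origin
            float dx = atoms[a].x + moves[i].x;
            float dy = atoms[a].y + moves[i].y;
            float dz = atoms[a].z + moves[i].z;
            e += dx * dx + dy * dy + dz * dz;
        }
        scores[i] = e;   // one score per candidate move
    }

The point is only that atoms is stored once but read by every thread, instead of each core carrying its own full copy.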
ID: 63612
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 14,016,882
RAC: 3,717
Message 63616 - Posted: 5 Oct 2009, 18:12:46 UTC - in response to Message 63604.  
Last modified: 5 Oct 2009, 19:01:16 UTC

Would the 768K L2 cache be sufficient to put a dent in the Rosetta memory requirements to run on a GPU?


Not much of a dent, since minirosetta currently requires about 500 MB of total memory to run on just one processor - several hundred times the size of that cache. If you plan to use all the cores in order to get the maximum speedup, multiply 500 MB by the number of cores to get the approximate amount of memory needed on the GPU card without a major, and therefore rather slow, rewrite of the program.

It should be easier to start with a version that runs minirosetta on only as many cores as there is enough memory for, though, with one more core used to combine the various data streams. Much less of a speedup, but still some. Also, it looks to me like the compilers that will allow languages such as C++ and Fortran to be used for the GPU code are likely to target only cards with the new GT300 series of GPU chips, and not the GPU cards sold in the past. Something to ask Nvidia about, at least.


Wouldn't writing to RAM be useful?


Yes, but just how useful is being able to write to an extra amount of memory that is less than one percent of what minirosetta needs to run on one processor, without a major rewrite of the program?

The total memory I referred to IS RAM, with a significant slowdown if swapping to the hard drive is used instead. Reaching the hard drive typically takes over a hundred times as long as reaching the RAM.

Sharing the sections of the database that contain the same values regardless of which GPU core uses them is a good second step, although I'd assume that's a rather small fraction of the total amount of memory each processor needs.
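
As a rough sketch of what that sharing could look like in CUDA (the table name and size here are hypothetical): small read-only tables fit in __constant__ memory, which all cores read from a single copy, while larger shared sections would sit once in ordinary global memory instead.

    // Sketch: a read-only parameter table stored once and read by all threads.
    // ljParams is a hypothetical name; __constant__ memory is limited to 64 KB,
    // so anything bigger would go in global memory the same way.
    __constant__ float ljParams[4096];

    __global__ void lookupParams(const int* atomType, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = ljParams[atomType[i]];   // every thread reads the one copy
    }

    // Host side, done once before any tasks launch:
    //   cudaMemcpyToSymbol(ljParams, hostTable, sizeof(float) * 4096);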

I believe that the Milkyway@home project has found a way to get some GPU acceleration without a major rewrite of a program with high memory-per-processor requirements: just don't try to get the maximum speedup by using all the GPU cores. Instead, use only as many as there is enough graphics-board memory for. Much less speedup, but it gets at least some with much less work for the project team.

Some people might consider rewriting a program in a different computer language, even without rearranging its database, to be a major rewrite. Still, even though this is required with current versions of the compiler software, it should be less of a rewrite than translating the program into a different language and drastically rearranging the database at the same time.

The new compilers Nvidia is planning to release in the next few weeks should reduce the amount of effort needed for such a rewrite; but Nvidia hasn't made it very clear whether those new compilers will work with the chips they've sold in the past as well, or only with the new GT300 series chips.
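
In the meantime, a program can at least find out at run time which class of chip it is on; Fermi/GT300-class parts report compute capability 2.0 or higher through the standard device query:

    // Sketch: detect the chip generation and pick a code path accordingly.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            fprintf(stderr, "no CUDA device found\n");
            return 1;
        }
        printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
        if (prop.major >= 2)
            printf("Fermi-class chip: assume the fuller C++ feature set\n");
        else
            printf("older chip: fall back to the pre-Fermi feature set\n");
        return 0;
    }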
ID: 63616
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,846,787
RAC: 3,070
Message 63631 - Posted: 8 Oct 2009, 9:13:37 UTC - in response to Message 63616.  

The new compilers Nvidia is planning to release in the next few weeks should reduce the amount of effort needed for such a rewrite; but Nvidia hasn't made it very clear whether those new compilers will work with the chips they've sold in the past as well, or only with the new GT300 series chips.


Maybe it's time to upgrade?! Actually, it may be time to have multiple versions available depending on which generation of GPU a user has. That could even lead to different types of work units being made available depending on the GPU: high-end GPUs can crunch all units, lower-end ones only some. In short, keep what they have and add to it, rather than replace it. Yes, that could be a whole lot more work down the road support-wise, but it should be a bit easier in the short term. And then as time goes on and more and better GPUs become available and more popular (the 400s?), the pre-300 ones could be dropped. Each project kind of does this already, although they do it with the CPU and type of OS, i.e. Mac, Linux, Windows, etc.

Microsoft has always said that keeping Windows backwards compatible has been the sticking point to making Windows all that it can be. Keep making units like they do now, just make a new version and new units just for it. Maybe even make the new units better and more detailed as far as the research end goes, to take advantage of the new cards' capabilities.
ID: 63631
