Accelerate protein structure comparison with GPU

Message boards : Rosetta@home Science : Accelerate protein structure comparison with GPU


Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 73826 - Posted: 13 Sep 2012, 8:50:32 UTC
Last modified: 13 Sep 2012, 8:50:53 UTC

The RMSD after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops which can obscure local regions of similarity. A more sophisticated measure of structure similarity, TM-score, avoids these problems, and is one of the measures used by the community wide experiments of Critical Assessment of protein Structure Prediction (CASP) to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a new and novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications.
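For a sense of what gets parallelised: once the optimal superposition is found, the TM-score itself is just the sum TM = (1/L_target) * sum_i 1/(1 + (d_i/d_0)^2), where d_i is the distance between the i-th pair of aligned residues and d_0 = 1.24*(L_target - 15)^(1/3) - 1.8. Below is a minimal sketch of that per-residue sum as a CUDA kernel; this is only my illustration, not the authors' TM-score-GPU code, and it assumes the Kabsch/quaternion superposition has already been applied.

```
#include <cstdio>
#include <math.h>
#include <cuda_runtime.h>

// One thread per aligned residue pair: compute 1 / (1 + (d_i/d0)^2).
// The optimal superposition (the Kabsch/quaternion step) is assumed to have
// been applied already, so the kernel only sees the inter-residue distances.
__global__ void tm_terms(const float* d, float d0, float* term, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r = d[i] / d0;
        term[i] = 1.0f / (1.0f + r * r);
    }
}

int main() {
    const int L = 150;                          // length of the target protein
    float d0 = 1.24f * cbrtf(L - 15.0f) - 1.8f; // standard TM-score scale factor

    // Toy distances between aligned residue pairs (all 2 Angstrom here).
    float h_d[L];
    for (int i = 0; i < L; ++i) h_d[i] = 2.0f;

    float *dev_d, *dev_t;
    cudaMalloc(&dev_d, L * sizeof(float));
    cudaMalloc(&dev_t, L * sizeof(float));
    cudaMemcpy(dev_d, h_d, L * sizeof(float), cudaMemcpyHostToDevice);

    tm_terms<<<(L + 255) / 256, 256>>>(dev_d, d0, dev_t, L);

    float h_t[L];
    cudaMemcpy(h_t, dev_t, L * sizeof(float), cudaMemcpyDeviceToHost);

    double tm = 0.0;                            // reduce on the host for simplicity
    for (int i = 0; i < L; ++i) tm += h_t[i];
    printf("TM-score = %.3f\n", tm / L);

    cudaFree(dev_d);
    cudaFree(dev_t);
    return 0;
}
```

The expensive part in practice is the search over superpositions, which is what the hybrid Kabsch/quaternion step in the paper is aimed at.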


GPU
ID: 73826
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 73830 - Posted: 13 Sep 2012, 19:29:05 UTC - in response to Message 73829.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!
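For what it's worth, discrete NVIDIA cards can already imitate the APU situation: CUDA lets you map page-locked host memory into the GPU's address space ("zero-copy"), so a kernel reads system RAM directly instead of a copy in video memory. A rough sketch of mine, nothing to do with Rosetta's code:

```
#include <cstdio>
#include <cuda_runtime.h>

// Kernel works directly on host memory through a mapped ("zero-copy") pointer.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);    // enable mapped host memory

    const int n = 1024;
    float* h_data;                            // page-locked host allocation
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float* d_data;                            // device-side alias of the same memory
    cudaHostGetDevicePointer(&d_data, h_data, 0);

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();                  // make GPU writes visible to the CPU

    printf("h_data[0] = %.1f\n", h_data[0]);  // prints 2.0, no cudaMemcpy anywhere
    cudaFreeHost(h_data);
    return 0;
}
```

On a discrete card every such access still crosses the PCIe bus, so it only pays off for data touched once; on an APU or integrated GPU the same pattern really is free, which is the point you're making.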
ID: 73830
Matt

Send message
Joined: 7 Sep 10
Posts: 8
Credit: 1,240,825
RAC: 0
Message 74041 - Posted: 18 Oct 2012, 4:06:54 UTC - in response to Message 73830.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!



How much memory is required? My GPU has 2 gigs of memory. Most modern cards have at least 1 GB.
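The project would know the real requirement, but you can see what your own card reports with a few CUDA runtime calls; a tiny sketch of mine:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);   // memory on this GPU

        printf("GPU %d: %s, %.0f MB total, %.0f MB free\n",
               dev, prop.name,
               total_bytes / (1024.0 * 1024.0),
               free_bytes  / (1024.0 * 1024.0));
    }
    return 0;
}
```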
ID: 74041
mikey
Avatar

Send message
Joined: 5 Jan 06
Posts: 1894
Credit: 8,773,304
RAC: 3,957
Message 74043 - Posted: 18 Oct 2012, 10:58:34 UTC - in response to Message 73830.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!


Actually I believe you are somewhat correct... they both use system RAM to load a unit, but from there it changes. A CPU unit gets loaded into system RAM and runs from there, while a GPU unit gets loaded into the GPU card's memory and runs there. Then, as each unit finishes, system RAM is used again to unload the completed unit and load the next one into memory. The advantage of using a GPU is that its memory can be MUCH faster than a CPU's memory, BUT to offset that speed it is VERY particular about the things it can do well. Add in the fact that MOST GPUs are used for gaming and not crunching, and the GPU makers' software is NOT designed for us crunchers. And YES, you MUST use the GPU maker's compute software to use a GPU for crunching; the basic Windows or Linux display drivers are just enough to show the pretty pictures on the screen and are not NEARLY good enough to allow crunching.
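To put that load-run-unload cycle into concrete terms, here is a rough sketch of mine (nothing to do with Rosetta's actual code) of consecutive work units being staged through GPU memory, using two CUDA streams so the next unit's upload can overlap the current unit's kernel:

```
#include <cstdio>
#include <cuda_runtime.h>

const int UNIT = 1 << 20;      // elements per "work unit"
const int N_UNITS = 4;         // number of consecutive work units

__global__ void crunch(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i] + 1.0f;      // stand-in for the real work
}

int main() {
    // Two sets of buffers/streams so unit k+1 can upload while unit k computes.
    float* h_buf[2];
    float* d_buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaHostAlloc(&h_buf[b], UNIT * sizeof(float), cudaHostAllocDefault); // pinned
        cudaMalloc(&d_buf[b], UNIT * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    for (int u = 0; u < N_UNITS; ++u) {
        int b = u % 2;                              // ping-pong between the two buffers
        cudaStreamSynchronize(stream[b]);           // previous unit on this buffer is done
        for (int i = 0; i < UNIT; ++i) h_buf[b][i] = (float)u;   // "load the next unit"

        cudaMemcpyAsync(d_buf[b], h_buf[b], UNIT * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);      // system RAM -> GPU RAM
        crunch<<<(UNIT + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], UNIT);
        cudaMemcpyAsync(h_buf[b], d_buf[b], UNIT * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);      // GPU RAM -> system RAM
    }
    cudaDeviceSynchronize();
    printf("last unit, first element: %.1f\n", h_buf[(N_UNITS - 1) % 2][0]);

    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(h_buf[b]);
        cudaFree(d_buf[b]);
        cudaStreamDestroy(stream[b]);
    }
    return 0;
}
```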
ID: 74043
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1829
Credit: 116,393,926
RAC: 71,810
Message 74045 - Posted: 18 Oct 2012, 12:55:23 UTC

Hi Matt

You might have 1 or 2 GB on your GPU, but it's shared between hundreds of GPU cores, so unless the software can work in parallel it's still a small amount of RAM per core compared to your CPU.

I'm not sure whether current APUs use a unified address space... but I believe that would help with porting programs like Rosetta to take advantage of the GPU part of APUs.

My understanding is that it's the compilers that need to be able to automatically identify suitable bits of code to run on those GPUs. As the Rosetta program seems to be essentially a huge software suite capable of many different protein-modelling tasks, I wouldn't be surprised if some bits of Rosetta could be GPU-accelerated, but I would expect it would only be on APUs rather than discrete cards, because if it's just certain sections of code being pushed out to the GPU then there's a big overhead for transferring to a discrete GPU and back, since they don't share the same memory.
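That transfer-versus-compute trade-off is easy to measure directly. A small sketch of mine with an arbitrary toy kernel, timing the upload, the kernel and the download separately with CUDA events; if the two copies dominate, pushing that section of code out to a discrete card is a net loss:

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Time the interval between two recorded events, in milliseconds.
static float elapsed(cudaEvent_t a, cudaEvent_t b) {
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main() {
    const int n = 1 << 24;                  // ~16M floats, ~64 MB per buffer
    size_t bytes = n * sizeof(float);

    float *h_x = (float*)malloc(bytes), *h_y = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    cudaEvent_t e[4];
    for (int i = 0; i < 4; ++i) cudaEventCreate(&e[i]);

    cudaEventRecord(e[0]);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);    // upload
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(e[1]);
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, d_x, d_y, n);      // compute
    cudaEventRecord(e[2]);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);     // download
    cudaEventRecord(e[3]);
    cudaEventSynchronize(e[3]);

    printf("upload   %.2f ms\nkernel   %.2f ms\ndownload %.2f ms\n",
           elapsed(e[0], e[1]), elapsed(e[1], e[2]), elapsed(e[2], e[3]));

    cudaFree(d_x); cudaFree(d_y); free(h_x); free(h_y);
    return 0;
}
```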

It's all speculation though - if there's a potential speed-up available and the benefit outweighs the cost then I'm sure they'll work on it.

Danny
ID: 74045
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 74058 - Posted: 20 Oct 2012, 9:15:04 UTC - in response to Message 74045.  

I wouldn't be surprised if some bits of Rosetta could be GPU-accelerated, but I would expect it would only be on APUs rather than discrete cards, because if it's just certain sections of code being pushed out to the GPU then there's a big overhead for transferring to a discrete GPU and back, since they don't share the same memory.

It's all speculation though - if there's a potential speed-up available and the benefit outweighs the cost then I'm sure they'll work on it.

Danny


WCG has recently released a GPU version of its HCC project (a WU that runs in 90 minutes on the CPU now finishes in 5!!). Every single WU starts on the CPU for 15-20 seconds, runs on the GPU until 99.4%, then "returns" to the CPU to complete.
I think it's a great solution if you still need some CPU calculation...
ID: 74058
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76183 - Posted: 14 Nov 2013, 21:34:52 UTC - in response to Message 74058.  
Last modified: 14 Nov 2013, 21:40:40 UTC

Memory problem?

The CUDA programming model assumes a device with a weakly-ordered memory model, that is:
• The order in which a CUDA thread writes data to shared memory, global memory, page-locked host memory, or the memory of a peer device is not necessarily the order in which the data is observed being written by another CUDA or host thread;
• The order in which a CUDA thread reads data from shared memory, global memory, page-locked host memory, or the memory of a peer device is not necessarily the order in which the read instructions appear in the program for instructions that are independent of each other.
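That weak ordering mostly matters when one thread publishes data for another thread to pick up; the documented fix is an explicit memory fence. A tiny sketch of mine along the lines of the producer/consumer pattern in the CUDA docs (it assumes both blocks are resident on the GPU at the same time, which holds for a launch this small):

```
#include <cstdio>
#include <cuda_runtime.h>

__device__ volatile float result;
__device__ volatile int ready = 0;

// Block 0 publishes a value; block 1 waits until it can safely read it.
__global__ void producer_consumer() {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        result = 42.0f;         // write the data...
        __threadfence();        // ...make it visible device-wide first...
        ready = 1;              // ...and only then raise the flag
    } else if (blockIdx.x == 1 && threadIdx.x == 0) {
        while (ready == 0) { }  // spin on the flag
        __threadfence();        // order the flag read before the data read
        printf("consumer saw result = %.1f\n", result);
    }
}

int main() {
    producer_consumer<<<2, 32>>>();   // two small blocks, co-resident on any GPU
    cudaDeviceSynchronize();
    return 0;
}
```

Without the fences, the flag could be observed before the data it guards, which is exactly the weak ordering described above.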

CUDA programming
ID: 76183
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76184 - Posted: 14 Nov 2013, 21:38:35 UTC

Floating-Point Operations per Second for the CPU and GPU:

[chart from the CUDA C Programming Guide comparing theoretical peak GFLOP/s of CPUs and GPUs over time]

CPU and GPU
ID: 76184
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 76186 - Posted: 15 Nov 2013, 10:54:22 UTC - in response to Message 76183.  

CUDA programming


OpenCL 2.0

Shared Virtual Memory
Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
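On the CUDA side the closest analogue is managed ("unified") memory, which also lets host and device share pointer-containing structures. A minimal sketch of mine (NVIDIA-specific, not the OpenCL API quoted above) of a linked list built on the CPU and walked on the GPU:

```
#include <cstdio>
#include <cuda_runtime.h>

// A linked-list node in managed memory, so the same pointers are valid
// on both the host and the device.
struct Node {
    float value;
    Node* next;
};

__global__ void sum_list(const Node* head, float* out) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        float s = 0.0f;
        for (const Node* n = head; n != nullptr; n = n->next)
            s += n->value;           // walk host-built pointers on the GPU
        *out = s;
    }
}

int main() {
    // Build a 3-node list on the host, entirely in managed memory.
    Node* head = nullptr;
    for (int i = 0; i < 3; ++i) {
        Node* n;
        cudaMallocManaged(&n, sizeof(Node));
        n->value = (float)(i + 1);
        n->next = head;
        head = n;
    }

    float* out;
    cudaMallocManaged(&out, sizeof(float));

    sum_list<<<1, 1>>>(head, out);
    cudaDeviceSynchronize();         // required before the host touches managed data again

    printf("sum over list = %.1f\n", *out);   // prints 6.0

    while (head) { Node* next = head->next; cudaFree(head); head = next; }
    cudaFree(out);
    return 0;
}
```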


ID: 76186
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76189 - Posted: 16 Nov 2013, 13:36:58 UTC

When adding CUDA acceleration to existing applications, the relevant Visual Studio project files must be updated to include CUDA build customizations. For Visual Studio 2010 or 2012, this can be done using one of the following two methods:

1. Open the Visual Studio 2010 or 2012 project, right-click on the project name, and select Build Customizations..., then select the CUDA Toolkit version you would like to target.

2. Alternatively, you can configure your project to always build with the most recently installed version of the CUDA Toolkit. First add a CUDA build customization to your project as above. Then right-click on the project name and select Properties. Under CUDA C/C++, select Common, and set the CUDA Toolkit Custom Dir field to $(CUDA_PATH). Note that the $(CUDA_PATH) environment variable is set by the installer...
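If you'd rather not touch the Visual Studio project files at all, the Toolkit's command-line compiler is enough on its own; a minimal sketch (hypothetical file name, the nvcc invocation is in the comment):

```
// hello.cu -- smallest possible CUDA program, built without any IDE integration:
//   nvcc -o hello hello.cu
// (nvcc ships with the CUDA Toolkit and drives the host C++ compiler for you.)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello_kernel() {
    printf("hello from thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

int main() {
    hello_kernel<<<2, 4>>>();       // 2 blocks of 4 threads each
    cudaDeviceSynchronize();        // wait so the device-side printf is flushed
    return 0;
}
```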

CUDA Toolkit
ID: 76189
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 76402 - Posted: 6 Feb 2014, 15:40:44 UTC - in response to Message 76189.  

When adding CUDA acceleration to existing applications, the relevant Visual Studio project files must be updated to include CUDA build customizations.


Or, you can use SPIR
http://www.khronos.org/spir

ID: 76402




©2024 University of Washington
https://www.bakerlab.org