Accelerate protein structure comparison with GPU

Message boards : Rosetta@home Science : Accelerate protein structure comparison with GPU


Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 73826 - Posted: 13 Sep 2012, 8:50:32 UTC
Last modified: 13 Sep 2012, 8:50:53 UTC

The RMSD after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops which can obscure local regions of similarity. A more sophisticated measure of structure similarity, TM-score, avoids these problems, and is one of the measures used by the community wide experiments of Critical Assessment of protein Structure Prediction (CASP) to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a new and novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications.
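For a sense of what gets parallelised: once the optimal superposition is found, the TM-score itself is just the sum TM = (1/L_target) * sum_i 1/(1 + (d_i/d_0)^2), where d_i is the distance between the i-th pair of aligned residues and d_0 = 1.24*(L_target - 15)^(1/3) - 1.8. Below is a minimal sketch of that per-residue sum as a CUDA kernel; this is only my illustration, not the authors' TM-score-GPU code, and it assumes the Kabsch/quaternion superposition has already been applied.

```
#include <cstdio>
#include <math.h>
#include <cuda_runtime.h>

// One thread per aligned residue pair: compute 1 / (1 + (d_i/d0)^2).
// The optimal superposition (the Kabsch/quaternion step) is assumed to have
// been applied already, so the kernel only sees the inter-residue distances.
__global__ void tm_terms(const float* d, float d0, float* term, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r = d[i] / d0;
        term[i] = 1.0f / (1.0f + r * r);
    }
}

int main() {
    const int L = 150;                          // length of the target protein
    float d0 = 1.24f * cbrtf(L - 15.0f) - 1.8f; // standard TM-score scale factor

    // Toy distances between aligned residue pairs (all 2 Angstrom here).
    float h_d[L];
    for (int i = 0; i < L; ++i) h_d[i] = 2.0f;

    float *dev_d, *dev_t;
    cudaMalloc(&dev_d, L * sizeof(float));
    cudaMalloc(&dev_t, L * sizeof(float));
    cudaMemcpy(dev_d, h_d, L * sizeof(float), cudaMemcpyHostToDevice);

    tm_terms<<<(L + 255) / 256, 256>>>(dev_d, d0, dev_t, L);

    float h_t[L];
    cudaMemcpy(h_t, dev_t, L * sizeof(float), cudaMemcpyDeviceToHost);

    double tm = 0.0;                            // reduce on the host for simplicity
    for (int i = 0; i < L; ++i) tm += h_t[i];
    printf("TM-score = %.3f\n", tm / L);

    cudaFree(dev_d);
    cudaFree(dev_t);
    return 0;
}
```

The expensive part in practice is the search over superpositions, which is what the hybrid Kabsch/quaternion step in the paper is aimed at.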


GPU
ID: 73826
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 73830 - Posted: 13 Sep 2012, 19:29:05 UTC - in response to Message 73829.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!
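For what it's worth, discrete NVIDIA cards can already imitate the APU situation: CUDA lets you map page-locked host memory into the GPU's address space ("zero-copy"), so a kernel reads system RAM directly instead of a copy in video memory. A rough sketch of mine, nothing to do with Rosetta's code:

```
#include <cstdio>
#include <cuda_runtime.h>

// Kernel works directly on host memory through a mapped ("zero-copy") pointer.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);    // enable mapped host memory

    const int n = 1024;
    float* h_data;                            // page-locked host allocation
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float* d_data;                            // device-side alias of the same memory
    cudaHostGetDevicePointer(&d_data, h_data, 0);

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();                  // make GPU writes visible to the CPU

    printf("h_data[0] = %.1f\n", h_data[0]);  // prints 2.0, no cudaMemcpy anywhere
    cudaFreeHost(h_data);
    return 0;
}
```

On a discrete card every such access still crosses the PCIe bus, so it only pays off for data touched once; on an APU or integrated GPU the same pattern really is free, which is the point you're making.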
ID: 73830
Matt

Send message
Joined: 7 Sep 10
Posts: 8
Credit: 1,240,825
RAC: 0
Message 74041 - Posted: 18 Oct 2012, 4:06:54 UTC - in response to Message 73830.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!



How much memory is required? My GPU has 2 gigs of memory. Most modern cards have at least 1 GB.
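The project would know the real requirement, but you can see what your own card reports with a few CUDA runtime calls; a tiny sketch of mine:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);   // memory on this GPU

        printf("GPU %d: %s, %.0f MB total, %.0f MB free\n",
               dev, prop.name,
               total_bytes / (1024.0 * 1024.0),
               free_bytes  / (1024.0 * 1024.0));
    }
    return 0;
}
```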
ID: 74041
mikey
Avatar

Send message
Joined: 5 Jan 06
Posts: 1894
Credit: 8,773,304
RAC: 3,957
Message 74043 - Posted: 18 Oct 2012, 10:58:34 UTC - in response to Message 73830.  

does this open the way for a hybrid CPU/GPU app?


Mmmm, I don't know.
The admins say that the main problem with Rosetta GPU work units is the limited memory on the GPU card...
By the way, if the problem is memory, why not use an AMD APU? The GPU and CPU use the same memory!


Actually I believe you are somewhat correct... they both use system RAM to load a unit, but from there it changes. A CPU unit gets loaded into system RAM and runs from there, while a GPU unit gets loaded into the GPU card's memory and runs there. Then, as each unit finishes, system RAM is used again to unload the completed unit and load the next one into memory. The advantage of using a GPU is that its memory can be MUCH faster than a CPU's memory, BUT to offset that speed it is VERY particular about the things it can do well. Add in the fact that MOST GPUs are used for gaming and not crunching, and the GPU makers' software is NOT designed for us crunchers. And YES, you MUST use the GPU maker's compute software to use a GPU for crunching; the basic Windows or Linux display drivers are just enough to show the pretty pictures on the screen and are not NEARLY good enough to allow crunching.
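To put that load-run-unload cycle into concrete terms, here is a rough sketch of mine (nothing to do with Rosetta's actual code) of consecutive work units being staged through GPU memory, using two CUDA streams so the next unit's upload can overlap the current unit's kernel:

```
#include <cstdio>
#include <cuda_runtime.h>

const int UNIT = 1 << 20;      // elements per "work unit"
const int N_UNITS = 4;         // number of consecutive work units

__global__ void crunch(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i] + 1.0f;      // stand-in for the real work
}

int main() {
    // Two sets of buffers/streams so unit k+1 can upload while unit k computes.
    float* h_buf[2];
    float* d_buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaHostAlloc(&h_buf[b], UNIT * sizeof(float), cudaHostAllocDefault); // pinned
        cudaMalloc(&d_buf[b], UNIT * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    for (int u = 0; u < N_UNITS; ++u) {
        int b = u % 2;                              // ping-pong between the two buffers
        cudaStreamSynchronize(stream[b]);           // previous unit on this buffer is done
        for (int i = 0; i < UNIT; ++i) h_buf[b][i] = (float)u;   // "load the next unit"

        cudaMemcpyAsync(d_buf[b], h_buf[b], UNIT * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);      // system RAM -> GPU RAM
        crunch<<<(UNIT + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], UNIT);
        cudaMemcpyAsync(h_buf[b], d_buf[b], UNIT * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);      // GPU RAM -> system RAM
    }
    cudaDeviceSynchronize();
    printf("last unit, first element: %.1f\n", h_buf[(N_UNITS - 1) % 2][0]);

    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(h_buf[b]);
        cudaFree(d_buf[b]);
        cudaStreamDestroy(stream[b]);
    }
    return 0;
}
```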
ID: 74043
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1829
Credit: 116,393,926
RAC: 71,810
Message 74045 - Posted: 18 Oct 2012, 12:55:23 UTC

Hi Matt

You might have 1 or 2 GB on your GPU, but it's shared between hundreds of GPU cores, so unless the software can work in parallel it's still a small amount of RAM per core compared to your CPU.

I'm not sure whether current APUs use a unified address space... but I believe that would help with porting programs like Rosetta to take advantage of the GPU part of APUs.

My understanding is that it's the compilers that need to be able to automatically identify suitable bits of code to run on those GPUs. As the Rosetta program seems to be essentially a huge software suite capable of many different protein-modelling tasks, I wouldn't be surprised if some bits of Rosetta could be GPU-accelerated, but I would expect it would only be on APUs rather than discrete cards, because if it's just certain sections of code being pushed out to the GPU then there's a big overhead for transferring to a discrete GPU and back, since they don't share the same memory.
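That transfer-versus-compute trade-off is easy to measure directly. A small sketch of mine with an arbitrary toy kernel, timing the upload, the kernel and the download separately with CUDA events; if the two copies dominate, pushing that section of code out to a discrete card is a net loss:

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Time the interval between two recorded events, in milliseconds.
static float elapsed(cudaEvent_t a, cudaEvent_t b) {
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main() {
    const int n = 1 << 24;                  // ~16M floats, ~64 MB per buffer
    size_t bytes = n * sizeof(float);

    float *h_x = (float*)malloc(bytes), *h_y = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    cudaEvent_t e[4];
    for (int i = 0; i < 4; ++i) cudaEventCreate(&e[i]);

    cudaEventRecord(e[0]);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);    // upload
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(e[1]);
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, d_x, d_y, n);      // compute
    cudaEventRecord(e[2]);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);     // download
    cudaEventRecord(e[3]);
    cudaEventSynchronize(e[3]);

    printf("upload   %.2f ms\nkernel   %.2f ms\ndownload %.2f ms\n",
           elapsed(e[0], e[1]), elapsed(e[1], e[2]), elapsed(e[2], e[3]));

    cudaFree(d_x); cudaFree(d_y); free(h_x); free(h_y);
    return 0;
}
```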

It's all speculation though - if there's a potential speed-up available and the benefit outweighs the cost then I'm sure they'll work on it.

Danny
ID: 74045
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 74058 - Posted: 20 Oct 2012, 9:15:04 UTC - in response to Message 74045.  

I wouldn't be surprised if some bits of Rosetta could be GPU-accelerated, but I would expect it would only be on APUs rather than discrete cards, because if it's just certain sections of code being pushed out to the GPU then there's a big overhead for transferring to a discrete GPU and back, since they don't share the same memory.

It's all speculation though - if there's a potential speed-up available and the benefit outweighs the cost then I'm sure they'll work on it.

Danny


WCG has recently released a GPU version of its HCC project (a WU that runs in 90 minutes on the CPU now finishes in 5!!). Every single WU starts on the CPU for 15-20 seconds, runs on the GPU until 99.4%, then "returns" to the CPU to complete.
I think it's a great solution if you still need some CPU calculation...
ID: 74058
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76183 - Posted: 14 Nov 2013, 21:34:52 UTC - in response to Message 74058.  
Last modified: 14 Nov 2013, 21:40:40 UTC

Memory problem?

The CUDA programming model assumes a device with a weakly-ordered memory model, that is:
• The order in which a CUDA thread writes data to shared memory, global memory, page-locked host memory, or the memory of a peer device is not necessarily the order in which the data is observed being written by another CUDA or host thread;
• The order in which a CUDA thread reads data from shared memory, global memory, page-locked host memory, or the memory of a peer device is not necessarily the order in which the read instructions appear in the program for instructions that are independent of each other.
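That weak ordering mostly matters when one thread publishes data for another thread to pick up; the documented fix is an explicit memory fence. A tiny sketch of mine along the lines of the producer/consumer pattern in the CUDA docs (it assumes both blocks are resident on the GPU at the same time, which holds for a launch this small):

```
#include <cstdio>
#include <cuda_runtime.h>

__device__ volatile float result;
__device__ volatile int ready = 0;

// Block 0 publishes a value; block 1 waits until it can safely read it.
__global__ void producer_consumer() {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        result = 42.0f;         // write the data...
        __threadfence();        // ...make it visible device-wide first...
        ready = 1;              // ...and only then raise the flag
    } else if (blockIdx.x == 1 && threadIdx.x == 0) {
        while (ready == 0) { }  // spin on the flag
        __threadfence();        // order the flag read before the data read
        printf("consumer saw result = %.1f\n", result);
    }
}

int main() {
    producer_consumer<<<2, 32>>>();   // two small blocks, co-resident on any GPU
    cudaDeviceSynchronize();
    return 0;
}
```

Without the fences, the flag could be observed before the data it guards, which is exactly the weak ordering described above.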

CUDA programming
ID: 76183
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76184 - Posted: 14 Nov 2013, 21:38:35 UTC

Floating-Point Operations per Second for the CPU and GPU:

[chart from the CUDA C Programming Guide comparing theoretical peak GFLOP/s of CPUs and GPUs over time]

CPU and GPU
ID: 76184
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 76186 - Posted: 15 Nov 2013, 10:54:22 UTC - in response to Message 76183.  

CUDA programming


OpenCL 2.0

Shared Virtual Memory
Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
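On the CUDA side the closest analogue is managed ("unified") memory, which also lets host and device share pointer-containing structures. A minimal sketch of mine (NVIDIA-specific, not the OpenCL API quoted above) of a linked list built on the CPU and walked on the GPU:

```
#include <cstdio>
#include <cuda_runtime.h>

// A linked-list node in managed memory, so the same pointers are valid
// on both the host and the device.
struct Node {
    float value;
    Node* next;
};

__global__ void sum_list(const Node* head, float* out) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        float s = 0.0f;
        for (const Node* n = head; n != nullptr; n = n->next)
            s += n->value;           // walk host-built pointers on the GPU
        *out = s;
    }
}

int main() {
    // Build a 3-node list on the host, entirely in managed memory.
    Node* head = nullptr;
    for (int i = 0; i < 3; ++i) {
        Node* n;
        cudaMallocManaged(&n, sizeof(Node));
        n->value = (float)(i + 1);
        n->next = head;
        head = n;
    }

    float* out;
    cudaMallocManaged(&out, sizeof(float));

    sum_list<<<1, 1>>>(head, out);
    cudaDeviceSynchronize();         // required before the host touches managed data again

    printf("sum over list = %.1f\n", *out);   // prints 6.0

    while (head) { Node* next = head->next; cudaFree(head); head = next; }
    cudaFree(out);
    return 0;
}
```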


ID: 76186
oscark

Send message
Joined: 31 Oct 07
Posts: 3
Credit: 22,448,339
RAC: 11,174
Message 76189 - Posted: 16 Nov 2013, 13:36:58 UTC

When adding CUDA acceleration to existing applications, the relevant Visual Studio project files must be updated to include CUDA build customizations. For Visual Studio 2010 or 2012, this can be done using one of the following two methods:

1. Open the Visual Studio 2010 or 2012 project, right-click on the project name, and select Build Customizations..., then select the CUDA Toolkit version you would like to target.

2. Alternatively, you can configure your project to always build with the most recently installed version of the CUDA Toolkit. First add a CUDA build customization to your project as above. Then right-click on the project name and select Properties. Under CUDA C/C++, select Common, and set the CUDA Toolkit Custom Dir field to $(CUDA_PATH). Note that the $(CUDA_PATH) environment variable is set by the installer...
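If you'd rather not touch the Visual Studio project files at all, the Toolkit's command-line compiler is enough on its own; a minimal sketch (hypothetical file name, the nvcc invocation is in the comment):

```
// hello.cu -- smallest possible CUDA program, built without any IDE integration:
//   nvcc -o hello hello.cu
// (nvcc ships with the CUDA Toolkit and drives the host C++ compiler for you.)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello_kernel() {
    printf("hello from thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

int main() {
    hello_kernel<<<2, 4>>>();       // 2 blocks of 4 threads each
    cudaDeviceSynchronize();        // wait so the device-side printf is flushed
    return 0;
}
```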

CUDA Toolkit
ID: 76189
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1868
Credit: 8,259,674
RAC: 9,401
Message 76402 - Posted: 6 Feb 2014, 15:40:44 UTC - in response to Message 76189.  

When adding CUDA acceleration to existing applications, the relevant Visual Studio project files must be updated to include CUDA build customizations.


Or, you can use SPIR
http://www.khronos.org/spir

ID: 76402




©2024 University of Washington
https://www.bakerlab.org