Rosetta@home using AVX / AVX2 ?

Author	Message
h Joined: 30 Nov 08 Posts: 1 Credit: 51,212 RAC: 0	Message 79422 - Posted: 19 Jan 2016, 7:14:30 UTC
This is my first post here. The prevailing opinion is that the Rosetta code can't go open, simply because it is compared with other such software, mostly proprietary, so others would exploit the openness of this code in known ways, or, the other way around, opening it would expose some stolen parts or ideas that may be covered by patents the project does not own. Money leads the way and we are just poor volunteers.
Because this launch is for free. The developers of Rosetta do not care about efficiency. A simple look at the executable shows that it is just renamed as 64-bit but in reality is just 32-bit, as some volunteers have already mentioned.
I want to raise a concern about the behavior of the watchdog timer in that application (3.65):
No heartbeat from core client for 30 sec - exiting
This message causes the Clean Energy Project 2 of World Community Grid to restart the application and nullify the elapsed time, for example after 12 hours of wasted electricity. I quit that project. It is just not fair. I must say that I am not here for points, badges and other virtual goodies for Pavlov's pet, but if a project is inefficient, just tell people that this is how it is and nothing can be done. Which I doubt.
Here, at least, for fair play, the elapsed time is kept correctly. But this in no way means that the time is spent efficiently. The volunteers' processors may just be producing a huge mass of random numbers and no useful results. So what. You can switch to SETI at any time and expect a close encounter of the third kind.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79424 - Posted: 20 Jan 2016, 9:19:17 UTC - in response to Message 79422.
"Because this launch is for free. The developers of Rosetta do not care about efficiency."
Thanks to the threads and discussions about optimization, I'm now convinced that they don't have adequate resources (and, perhaps, the skills) to optimize it. So, yes, opening the source code may be a solution.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79611 - Posted: 24 Feb 2016, 9:10:39 UTC - in response to Message 79422.
"The developers of Rosetta do not care about efficiency. A simple look at the executable shows that it is just renamed as 64-bit but in reality is just 32-bit, as some volunteers have already mentioned."
"This message causes the Clean Energy Project 2 of World Community Grid to restart the application and nullify the elapsed time, for example after 12 hours of wasted electricity."
This is the point. The admins say that the computational power is "enough" and that they are not sure that optimizations of the code (64-bit, SSEx, etc.) would give an advantage to the project. But they are using OUR electricity and they have to use it as well as they can. If rjs5 says that a simple 64-bit recompilation gives us 10-15% more, they have to consider this change seriously. I think it's a kind of respect for the volunteers.

dcdc Joined: 3 Nov 05 Posts: 1829 Credit: 119,061,183 RAC: 16,549	Message 79613 - Posted: 24 Feb 2016, 9:52:31 UTC - in response to Message 79422. Last modified: 24 Feb 2016, 9:54:26 UTC
"This is my first post here. The prevailing opinion is that the Rosetta code can't go open, simply because it is compared with other such software, mostly proprietary, so others would exploit the openness of this code in known ways, or, the other way around, opening it would expose some stolen parts or ideas that may be covered by patents the project does not own. Money leads the way and we are just poor volunteers."
That's not the reason. It's not open source because it is a valuable asset that is sold commercially, which provides an income stream. It also probably helps with controlling the code base, as they control who can contribute to the software.
"A simple look at the executable shows that it is just renamed as 64-bit but in reality is just 32-bit, as some volunteers have already mentioned."
That's because BOINC requires a 64-bit version for 64-bit platforms, so the 32-bit version is in a wrapper.

sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 79618 - Posted: 24 Feb 2016, 14:31:14 UTC Last modified: 24 Feb 2016, 14:33:42 UTC
as with the discussions in this thread:
CERN Engineer Details AMD Zen Processor Confirming 32 Core Implementation, SMT
https://boinc.bakerlab.org/forum_thread.php?id=6790
i'm thinking that cpu manufacturers are increasingly taking 'short cuts': they simply deliver more 'cores' and push all the hard work of performance optimization onto software developers, who must use very specific and very limited processor features such as CUDA/OpenCL that require vectorised processing on very simplified cores.
it used to be that the top-line cpu manufacturers aimed to deliver better-performing CPUs (deeper and better instruction-level parallelism, more intelligent out-of-order execution etc.), but this stance has changed drastically, to the extent that manufacturers simply build more simplified cores that provide very limited, specialised functionality (e.g. vector processing).
many of the higher-end ones are championing 'special' vector processing, e.g. opencl/cuda/hsa etc. these notably include AMD and Nvidia. little effort is spent to even attempt 'deeper and better instruction level parallelism, more intelligent out-of-order execution etc', as it requires much more effort on the part of CPU designers and manufacturers.
that said, all that vector processing / SIMD / OpenCL / CUDA / HSA / AVX etc. is not necessarily 'more efficient': it requires a huge amount of power / energy to run, in particular on the high-end GPUs. and it simply shifts the responsibility for optimization to software / application developers, while the manufacturers get away with selling more 'cores'.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79621 - Posted: 24 Feb 2016, 17:21:11 UTC - in response to Message 79618.
"little effort is spent to even attempt 'deeper and better instruction level parallelism, more intelligent out-of-order execution etc', as it requires much more effort on the part of CPU designers and manufacturers.... that said, all that vector processing / SIMD / OpenCL / CUDA / HSA / AVX etc. is not necessarily 'more efficient': it requires a huge amount of power / energy to run, in particular on the high-end GPUs."
You are thinking of the future, with ARM cores in x86 CPUs or FPGA tech in Xeon processors. But SSEx extensions exist NOW and run on modern entry-level CPUs. We are not talking about high-end GPUs (we understand that it's impossible to have GPU code for Rosetta and that OpenCL/CUDA is a "dream"), but CPUs can be used to the max!!

sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 79622 - Posted: 25 Feb 2016, 0:44:58 UTC - in response to Message 79621. Last modified: 25 Feb 2016, 1:11:54 UTC
"You are thinking of the future, with ARM cores in x86 CPUs or FPGA tech in Xeon processors. But SSEx extensions exist NOW and run on modern entry-level CPUs. We are not talking about high-end GPUs (we understand that it's impossible to have GPU code for Rosetta and that OpenCL/CUDA is a 'dream'), but CPUs can be used to the max!!"
a recent processor, on the now rather hotly discussed Intel compute stick
http://www.engadget.com/2016/01/22/intel-compute-stick-2016-review/
did away with even SSEx, yup, no SSE, just more cores & 64 bits
http://ark.intel.com/products/87383/Intel-Atom-x5-Z8300-Processor-2M-Cache-up-to-1_84-GHz
and that's one of the latest models available today.
i won't be surprised at all if Intel adopts a similar approach & introduces those GPU-style 'co-processors' that probably use, say, OpenCL vectorised processing, i.e. 1000s of simplified 'vector cores' (that do basic maths) but won't address general programs.
along with AMD, Nvidia and the rest, they would claim that their approach can achieve teraflops, petaflops on the gpu, but only for very basic, highly limited compute that addresses very specific use cases.
it is useless to have 100,000 vector processors/cores if the job at hand cannot be vectorized due to various dependencies within the algorithms/codes: it can only run on 1 of those 100,000 cores, or in the worst case it can't be run at all due to the limited functionality of those vector processors.
a simple function f(x) = f(f(x-1)) would defeat any means to parallelize it, as the result depends on the output of a previous iteration.
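The dependency argument in the post above can be illustrated with a minimal C++ sketch. It assumes the recurrence is read as "each value is computed from the previous one"; the names `step`, `iterate_serial` and `map_independent` are hypothetical and are not taken from Rosetta. The first loop carries a dependency from one iteration to the next and so cannot be spread across SIMD lanes, while the second loop has independent iterations that a compiler is free to auto-vectorize.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical update rule; any function of the previous value would do.
inline double step(double x) { return 0.5 * x + 1.0; }

// Loop-carried dependency: iteration i needs the result of iteration i-1,
// so the iterations must run one after another.
inline double iterate_serial(double x0, std::size_t n) {
    double x = x0;
    for (std::size_t i = 0; i < n; ++i) {
        x = step(x);
    }
    return x;
}

// Independent iterations: each output depends only on its own input,
// so several elements can be processed in one SSE/AVX instruction.
inline std::vector<double> map_independent(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = step(in[i]);
    }
    return out;
}
```

Whether a given compiler actually vectorizes the second loop depends on the optimization flags and the target instruction set, so this is only a sketch of the distinction, not a performance claim.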

sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 79625 - Posted: 25 Feb 2016, 2:50:40 UTC
"i won't be surprised at all if Intel adopts a similar approach & introduces those GPU-style 'co-processors' that probably use, say, OpenCL vectorised processing, i.e. 1000s of simplified 'vector cores' (that do basic maths) but won't address general programs."
actually u don't really need to wait for that, the future is here today
http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-xeon-phi-coprocessor-brief.html
http://spectrum.ieee.org/semiconductors/processors/what-intels-xeon-phi-coprocessor-means-for-the-future-of-supercomputing
https://en.wikipedia.org/wiki/Xeon_Phi

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79626 - Posted: 25 Feb 2016, 8:21:03 UTC - in response to Message 79625.
"actually u don't really need to wait for that, the future is here today http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html"
Yep, the Phi is an incredible co-processor, but I think that if the admins want to use it, they will have to rewrite a large part of the code. I'm talking about adding support for, for example, x64 and SSEx (with a SIMPLE recompilation of the source) and seeing what happens: test this new app extensively on Ralph, debug it, etc. The first tests, last year, demonstrated some improvements....

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79627 - Posted: 25 Feb 2016, 8:46:33 UTC - in response to Message 79622.
"a simple function f(x) = f(f(x-1)) would defeat any means to parallelize it, as the result depends on the output of a previous iteration."
We know the problems of parallelizing the code and we know that it's (almost) impossible on Rosetta. We are discussing "little" optimizations.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 79835 - Posted: 2 Apr 2016, 18:10:46 UTC - in response to Message 77856.
"http://code.compeng.uni-frankfurt.de/projects/vc"
They moved here: https://github.com/VcDevel/Vc

Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0	Message 80432 - Posted: 25 Jul 2016, 21:18:45 UTC
Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?
I'm pretty sure everyone is pretty sick of it being brought up again and again, as am I sick and tired of waiting for a simple, definite answer from the people who are calling the shots...
Answer A: "We're working at it and here are the preliminary results..."
Answer B: "No can do."
Not that I'm thinking about leaving rosetta@home, but I'm thinking about "emotional disinvestment". There is a link in the navbar that says "Community". Let's face it, there is no such thing. That's an 'A' for scientific effort and an 'F' for community work... just close the forum and set up a bug tracker.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 80433 - Posted: 26 Jul 2016, 8:18:09 UTC - in response to Message 80432.
"Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?"
I think that if we "see something", we will see it at the end of CASP.
"Answer A: 'We're working at it and here are the preliminary results...' Answer B: 'No can do.'"
Answer C: "We don't care"

rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,430,740 RAC: 2,270	Message 80449 - Posted: 30 Jul 2016, 15:30:44 UTC - in response to Message 80433. Last modified: 30 Jul 2016, 15:32:50 UTC
"Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?"
"I think that if we 'see something', we will see it at the end of CASP."
"Answer A: 'We're working at it and here are the preliminary results...' Answer B: 'No can do.' Answer C: 'We don't care'"
I think it is:
Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home.
https://d.justpo.st/media/images/2013/07/66f81a0a59d1786af2e10027746e2873.jpg
They should carefully evaluate the role/responsibilities of the top "project manager" first. I suspect there is some confusion about role, responsibilities and goals.
-------------
If they do not stabilize the server infrastructure, Rosetta could collapse under the weight of its own success. Then compute throughput is going to drop to zero ... regardless of how good their CASP development has been. 8-)
The last time I looked at their server hardware configuration (assuming that their description was relatively current), it looked like they would have disk I/O bottlenecks on their server and memory size problems on their client network machines. I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference.
--------------
As to AVX/AVX2? David hooked me up with 2 developers.
June 13th: Developer "F", commenting on my recommendation for homogeneous coordinates ...
"Storing 3d cartesian coordinates as homogeneous coordinates is well established practice. For example, Eigen::Geometry uses homogeneous coordinates in geometric expressions to support SIMD parallelism."
"Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring. I'd be opposed to changes that broadly affect the codebase outside of the numeric namespace, it would be much better to arrive at a solution that offers a simple typedef to replace xyzVector<Real> that offers a SIMD-compatible implementation."
----
I gave them the VTune profiles, which showed the hot instruction sequences, and hand-modified the instruction sequences to show how they shrank when using AVX. I am looking for a C++ programmer to help me with the "template" modifications. Nothing more from Developer "F".
Developer "L": after I replied to David with: "If you do find an interested developer, .... grumble, grumble, grumble, ....", Developer "L" replied ...
"I am in fact very interested in vectorization and would like to chat with you about it soon; I'm currently swamped with a few deadlines and projects, but anticipate that I'll have quite a bit more free time soon."
----
I have not heard back from Developer "L" and probably need to ping him.
--------------
I am now retired and have been decompressing. I have fixed all the family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes.

Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 80450 - Posted: 30 Jul 2016, 18:43:20 UTC
Awesome, rjs5. Awesome work.

Sid Celery Joined: 11 Feb 08 Posts: 2050 Credit: 40,394,045 RAC: 13,160	Message 80451 - Posted: 31 Jul 2016, 3:23:27 UTC - in response to Message 80449.
"They should carefully evaluate the role/responsibilities of the top 'project manager' first. I suspect there is some confusion about role, responsibilities and goals.
-------------
If they do not stabilize the server infrastructure, Rosetta could collapse under the weight of its own success. Then compute throughput is going to drop to zero... regardless of how good their CASP development has been. 8-)
--------------
I am now retired and have been decompressing. I have fixed all the family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes."
I'm no coder (far from it), but I've worked with a few, good and less good. A good one is worth their weight in gold. If anyone can wangle an on-site visit for a couple of days, they should commit to it. Keep plugging away.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 80452 - Posted: 31 Jul 2016, 18:30:08 UTC - in response to Message 80449. Last modified: 31 Jul 2016, 19:08:36 UTC
"Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home. https://d.justpo.st/media/images/2013/07/66f81a0a59d1786af2e10027746e2873.jpg"
:-O
"I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference."
Waiting for info about donations/crowdfunding.
"Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring."
I don't understand "F". He wants to see results BEFORE introducing modifications??
"I am now retired and have been decompressing. I have fixed all the family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes."
Family is the most important thing, I think.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 1933 Credit: 9,071,262 RAC: 5,440	Message 80453 - Posted: 31 Jul 2016, 19:08:09 UTC - in response to Message 80450.
"Awesome, rjs5. Awesome work."
+1
P.S. This thread was opened in Oct 2014; I hope we see "something new" before 2020 :-P

Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0	Message 80454 - Posted: 31 Jul 2016, 20:00:45 UTC - in response to Message 80453.
"Awesome, rjs5. Awesome work."
"+1 P.S. This thread was opened in Oct 2014; I hope we see 'something new' before 2020 :-P"
OMG... Tempus fugit. I could have sworn it had been only a few months.
Probable cause for the delay: the "NIH syndrome" or "We have always done it that way!"

rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,430,740 RAC: 2,270	Message 80459 - Posted: 1 Aug 2016, 20:53:05 UTC - in response to Message 80452. Last modified: 1 Aug 2016, 20:59:01 UTC
"Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring."
"I don't understand 'F'. He wants to see results BEFORE introducing modifications??"
I think that Developer "F" was talking about needing real data for a major rewrite ... "major refactoring". I think that "F" agrees with me about "homogeneous coordinates" being a sensible change.
There are MANY things that can be done to significantly improve performance without a major rewrite.
The first change I talked about was introducing "homogeneous coordinates". This is very nice because it does not "really" change the "project code". You can introduce the C++ TEMPLATE typedef changes, recompile, and you should get the EXACT SAME ANSWER with the new compile options.
The second place where substantial improvement can be accomplished with little effort is upgrading the server to steer optimized applications to target crunchers. Build optimized apps and target machine capabilities. 8-)
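The typedef idea described in the post above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not Rosetta's code: the namespace `sketch` and the types `Vec3`, `Vec4` and `Point` are hypothetical stand-ins for something like `xyzVector<Real>`, and the fourth component is simply held at 0 as padding. The padded type fills exactly one 256-bit AVX register (4 doubles) and is 32-byte aligned, and the rest of the code only sees the alias `Point`, so swapping the layout is a one-line change followed by a recompile.

```cpp
#include <cstddef>

namespace sketch {

// Plain 3-component layout: 24 bytes, does not fill a 256-bit register.
struct Vec3 {
    double x, y, z;
};

// Padded 4-component layout: the 4th lane is kept at 0.0 so that sums
// and dot products give the same value as the 3-component version.
struct alignas(32) Vec4 {
    double v[4];
    Vec4(double x, double y, double z) : v{x, y, z, 0.0} {}
};

static_assert(sizeof(Vec4) == 32, "one Vec4 should match one AVX register");

inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r(0.0, 0.0, 0.0);
    // Fixed trip count of 4 over contiguous data: easy to auto-vectorize.
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline double dot(const Vec4& a, const Vec4& b) {
    // The padding lane contributes 0.0 * 0.0 to the sum.
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

// Single switch point: client code uses Point and never names the layout.
using Point = Vec4;

}  // namespace sketch
```

Whether the results stay bit-for-bit identical in a real build also depends on compiler settings such as fused multiply-add contraction and fast-math flags, so the "same answer" property would need to be checked against reference outputs rather than assumed.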