Message boards : Number crunching : Rosetta@home using AVX / AVX2 ?
Author | Message |
---|---|
h Send message Joined: 30 Nov 08 Posts: 1 Credit: 51,212 RAC: 0 |
This is my first post here. The prevailing opinion is that the Rosetta code can't be opened, simply because it is compared with other such software, mostly proprietary, so others would exploit the openness of this code in known ways, or, the other way round, opening it could expose some stolen parts or just ideas that may be covered by patents the project does not own. Money leads the way and we are just poor volunteers. Because this launch is for free. The developers of Rosetta do not care about efficiency. Just look at the executable: it is named as 64-bit, but in reality it is just 32-bit, as some volunteers have already mentioned. I want to give a thumbs-up to the behavior of the watchdog timer in this application (3.65): "No heartbeat from core client for 30 sec - exiting". On the Clean Energy Project 2 of World Community Grid, this message causes the application to restart and resets the elapsed time, for example after 12 hours of wasted electricity. I quit that project. It is just not fair. I must say that I am not here for points, badges and other virtual goodies for Pavlov's pet, but if a project is inefficient, just tell people that this is how it is and nothing can be done. Which I doubt. Here, at least, for fair play, the elapsed time is kept correctly. But this in no way means that the time is spent efficiently. The volunteers' processors may just be producing a huge mass of random numbers and no useful results. So what. You can always switch to SETI and expect a close encounter of the third kind. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"Because this launch is for free. The developers of Rosetta do not care about efficiency." Thanks to the threads and discussions about optimization, I'm now convinced that they don't have adequate resources (and, perhaps, the skills) to optimize it. So yes, open-sourcing the code may be a solution. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"The developers of Rosetta do not care about efficiency." This is the point. The admins say that the computational power is "enough" and that they are not sure that code optimizations (64-bit, SSEx, etc.) would benefit the project. But they are using OUR electricity, and they have to use it as well as they can. If rjs5 says that a simple 64-bit recompilation gives us 10-15% more, they have to seriously consider this change. I think it's a matter of respect for the volunteers. |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,677,840 RAC: 9,700 |
"This is my first post here." That's not the reason - it's not open source because it is a valuable asset that is sold commercially, which provides an income stream. It also probably helps with controlling the codebase, as they control who can contribute to the software.
That's because BOINC requires a 64-bit version for 64-bit platforms, so the 32-bit version is in a wrapper. |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
as with the discussions in this thread: CERN Engineer Details AMD Zen Processor Confirming 32 Core Implementation, SMT https://boinc.bakerlab.org/forum_thread.php?id=6790 i'm thinking that cpu manufacturers are increasingly taking 'short cuts': they simply deliver more 'cores' and push all the hard work of performance / optimization onto the software developers, who have to use very specific and very limited processor features such as CUDA / OpenCL that require vectorised processing on very simplified cores. it used to be that the top-line cpu manufacturers aimed to deliver better-performing CPUs (deeper and better instruction-level parallelism, more intelligent out-of-order execution etc). this stance has changed drastically, to the extent that manufacturers simply build *more simplified cores* that provide very limited specialised functionality (e.g. vector processing). many of the higher-end ones are championing 'special' vector processing, e.g. opencl/cuda/hsa etc; these notably include AMD and Nvidia. little effort is spent to even attempt 'deeper and better instruction-level parallelism, more intelligent out-of-order execution etc', as it requires *much more* effort on the part of CPU designers and manufacturers. that said, all that vector processing / SIMD / OpenCL / CUDA / HSA / AVX etc is not necessarily 'more efficient': it requires a huge amount of power / energy to run, in particular on the high-end GPUs. and they simply shift the responsibility for optimization to software / application developers, while they get away with selling more 'cores' |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"little effort is spent to even attempt 'deeper and better instruction-level parallelism, more intelligent out-of-order execution etc', as it requires *much more* effort on the part of CPU designers and manufacturers...." You are thinking of the future, with ARM cores inside x86 CPUs or FPGA tech inside Xeon processors. But the SSEx extensions exist NOW and run on modern entry-level CPUs. We are not talking about high-end GPUs (we understand that it's impossible to have GPU code for Rosetta and that OpenCL/CUDA is a "dream"), but CPUs can be used to the max!! |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
a recent processor in the now rather hotly discussed Intel compute stick http://www.engadget.com/2016/01/22/intel-compute-stick-2016-review/ did away with even SSEx, yup no SSE, just more cores & 64 bits http://ark.intel.com/products/87383/Intel-Atom-x5-Z8300-Processor-2M-Cache-up-to-1_84-GHz and that's one of the latest models available today. i won't be surprised at all if Intel adopts a similar approach & introduces those GPU-style 'co-processors' that probably use, say, OpenCL vectorised processing, i.e. 1000s of simplified 'vector cores' (that do basic maths) but won't address general programs. along with AMD, Nvidia and the rest, they would claim that their approach can achieve teraflops, petaflops on the gpu, but with only very basic, highly limited compute functionality that addresses only very specific use cases. it is useless to have 100,000 vector processors/cores if the job at hand cannot be vectorized due to various dependencies within the algorithms/codes: it can only run on 1 of those 100,000 cores, or in the worst case it can't be run at all due to the limited functionality of those vector processors. a simple function applied iteratively, x(n) = f(x(n-1)), would defeat any attempt to parallelize it, as each result depends on the output of the previous iteration. |
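The point above about a result that depends on the previous iteration can be made concrete with a small C++ sketch. It is illustrative only and is not Rosetta code: the function names `scale_all` and `iterate_map`, and the logistic map used as the example recurrence, are assumptions chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so a compiler is
// free to process several elements per SIMD instruction.
std::vector<double> scale_all(const std::vector<double>& in, double k) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = k * in[i];
    }
    return out;
}

// Loop-carried dependency: step n needs the result of step n-1, so the
// steps have to run one after another no matter how many cores or vector
// lanes are available.
double iterate_map(double x0, double r, int steps) {
    assert(steps >= 0);
    double x = x0;
    for (int n = 0; n < steps; ++n) {
        x = r * x * (1.0 - x);  // logistic map, picked only as an example
    }
    return x;
}
```

The first loop is the kind of code that SSE/AVX helps with; the second is not, although many independent runs of it (different starting values, for instance) could still be executed side by side.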
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
"i won't be surprised at all if Intel adopts a similar approach & introduces those GPU-style 'co-processors' that probably use, say, OpenCL vectorised processing, i.e. 1000s of simplified 'vector cores' (that do basic maths) but won't address general programs" actually u don't really need to wait for that, the future is here today http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-xeon-phi-coprocessor-brief.html http://spectrum.ieee.org/semiconductors/processors/what-intels-xeon-phi-coprocessor-means-for-the-future-of-supercomputing https://en.wikipedia.org/wiki/Xeon_Phi |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"actually u don't really need to wait for that, the future is here today" Yep, the Phi is an incredible co-processor, but I think that, if the admins want to use it, they have to rewrite a large part of the code. I'm talking about adding support for, say, x64 and SSEx (with a SIMPLE recompilation of the source) and seeing what happens: test this new app extensively on Ralph, debug it, etc. The first tests, last year, demonstrated some improvements.... |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"a simple function" We know the problems of parallelizing the code and we know that it's (almost) impossible on Rosetta. We are discussing "little" optimizations. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
|
Dr. Merkwürdigliebe Send message Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges? I'm pretty sure everyone is pretty sick of it being brought up again and again as am I sick and tired of waiting for a simple, definite answer from the people who are calling the shots... Answer A: "We're working at it and here are the preliminary results..." Answer B: "No can do." Not that I'm thinking about leaving rosetta@home but I'm thinking about "emotional disinvestment". There is a link in the navbar that says "Community". Let's face it, there is no such thing. That's an 'A' for scientific effort and an 'F' for community work... just close the forum and set up a bug tracker. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?" I think that if we "see something", we will see it at the end of CASP. "Answer A: 'We're working at it and here are the preliminary results...'" Answer C: "We don't care" |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,055,337 RAC: 4,311 |
"Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?" I think it is: Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home. https://d.justpo.st/media/images/2013/07/66f81a0a59d1786af2e10027746e2873.jpg They should carefully evaluate the role/responsibilities of the top "project manager" first. I suspect there is some confusion about role, responsibilities and goals. ------------- If they do not stabilize the server infrastructure, Rosetta could collapse under the weight of its own success. Then compute throughput is going to drop to zero ... regardless of how good their CASP development has been. 8-) The last time I looked at their server hardware configuration (assuming that their description was relatively current), it looked like they would have disk IO bottlenecks on their server and memory size problems on their client network machines. I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference. -------------- As to AVX/AVX2? David hooked me up with 2 developers. June 13th: Developer "F", commenting on my recommendation for homogeneous coordinates ... "Storing 3d cartesian coordinates as homogeneous coordinates is well established practice. For example, Eigen::Geometry uses homogeneous coordinates in geometric expressions to support SIMD parallelism." "Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring. I'd be opposed to changes that broadly affect the codebase outside of the numeric namespace, it would be much better to arrive at a solution that offers a simple typedef to replace xyzVector<Real> that offers a SIMD-compatible implementation." ---- I gave them the VTune profiles, which showed the hot instruction sequences, and hand-modified the instruction sequences to show how they shrank when using AVX. I am looking for a C++ programmer to help me with the "template" modifications. Nothing more from Developer "F". Developer "L": after I replied to David with: "If you do find an interested developer, .... grumble, grumble, grumble, ...." Developer "L" replied ... "I am in fact very interested in vectorization and would like to chat with you about it soon; I'm currently swamped with a few deadlines and projects, but anticipate that I'll have quite a bit more free time soon." ---- I have not heard back from Developer "L" and probably need to ping him. -------------- I am now retired and have been decompressing. I have fixed all the family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes. |
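To make the "simple typedef" idea from Developer "F" easier to picture, here is a minimal C++ sketch. It is not Rosetta's code and does not reproduce its vector class: `PlainVec`, `PaddedVec`, `Coord`, `make_vec`, `add`, `dot`, `component` and the `USE_PADDED_COORDS` build flag are all hypothetical names, and `Real` is assumed to be `double`. The padded type carries a fourth lane so that one coordinate fills a 256-bit AVX register; here that lane is simply kept at zero, a simplification of true homogeneous coordinates.

```cpp
#include <cassert>

using Real = double;  // assumption: the thread only mentions a type named Real

// Hypothetical 3-component vector standing in for the existing type.
struct PlainVec {
    Real c[3];
    static constexpr int lanes = 3;
};

// Hypothetical padded variant: four lanes, 32-byte aligned, so that four
// doubles line up with one 256-bit register. c[3] is padding kept at zero.
struct alignas(32) PaddedVec {
    Real c[4];
    static constexpr int lanes = 4;
};

// The single switch point: the rest of the code only ever names Coord.
#ifdef USE_PADDED_COORDS   // hypothetical build flag
using Coord = PaddedVec;
#else
using Coord = PlainVec;
#endif

template <typename V>
V make_vec(Real x, Real y, Real z) {
    V v{};  // value-initialised, so a padding lane (if any) starts at 0
    v.c[0] = x; v.c[1] = y; v.c[2] = z;
    return v;
}

template <typename V>
Real component(const V& v, int i) {
    assert(i >= 0 && i < 3);  // callers only ever see x, y, z
    return v.c[i];
}

template <typename V>
V add(const V& a, const V& b) {
    V out{};
    for (int i = 0; i < V::lanes; ++i) out.c[i] = a.c[i] + b.c[i];
    return out;
}

template <typename V>
Real dot(const V& a, const V& b) {
    Real s = 0;
    // Same summation order for both layouts; the padded lane adds 0 * 0.
    for (int i = 0; i < V::lanes; ++i) s += a.c[i] * b.c[i];
    return s;
}
```

Because both layouts sum in the same order and the extra lane contributes only zero, `dot` and `add` return the same values for either type in this sketch. Whether the padded layout is actually faster depends on the compiler, the flags and the hot loops, which is exactly what the profiling discussion above is about.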
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2126 Credit: 41,262,745 RAC: 8,354 |
"They should carefully evaluate the role/responsibilities of the top 'project manager' first. I suspect there is some confusion about role, responsibilities and goals." I'm no coder (far from it) but I've worked with a few, good and less good. A good one is worth their weight in gold. If anyone can wangle an on-site visit for a couple of days, they should commit to it. Keep plugging away. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home." :-O "I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference." Waiting for info about donations/crowdfunding. "Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring." I don't understand "F". He wants to see results BEFORE introducing modifications?? "I am now retired and have been decompressing. I have fixed all the family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes." Family is the most important thing, I think |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1995 Credit: 9,648,924 RAC: 6,997 |
"Awesome, rjs5. Awesome work." +1 P.S. This thread was opened in Oct 2014; I hope we see "something new" before 2020 :-P |
Dr. Merkwürdigliebe Send message Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
"Awesome, rjs5. Awesome work." OMG... Tempus fugit. I could have sworn it had been only a few months. Probable cause for the delay: the "NIH syndrome" or "We have always done it that way!" |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,055,337 RAC: 4,311 |
"Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that these vector arithmetic is producing hot instructions before undertaking any major refactoring. I think that Developer "F" was talking about needing real data for a major rewrite ... "major refactoring". I think that "F" agrees with me about "homogeneous coordinates" being a sensible change. There are MANY things that can be done to significantly improve performance without a major rewrite. The first change I talked about was introducing "homogeneous coordinates". This is very nice because, it does not "really" change the "project code". You can introduce the C++ TEMPLATE typedef changes, recompile and you should get the EXACT SAME ANSWER with the new compile options. The second place where substantial improvement can be accomplished with little effort is by upgrading the server to steer optimized applications to target crunchers. Build optimized apps and target machine capabilities. 8-) |
©2024 University of Washington
https://www.bakerlab.org