Message boards : Number crunching : 300+ TeraFLOPS sustained!
Author | Message |
---|---|
Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
Looks like a big boost in CE participation has pushed Rosetta@Home well over the 300 TeraFLOP mark. I'm wondering whether this has anyone at the Baker lab thinking up new experiments that may be more viable now than in the past, or whether this little boost is still orders of magnitude away from being a game changer. I know things aren't that simplistic, and real progress likely comes from evolution of the algorithms behind the models, but I'm sure there are thresholds where new things become possible. Maybe it's not at 320 TeraFLOPS, though; maybe it's at 300 ExaFLOPS. Still interesting to ponder. Progress for the win! |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1991 Credit: 9,500,896 RAC: 12,649 |
Looks like a big boost in CE participation has pushed Rosetta@Home well over the 300 TeraFLOP mark. After all discussions about gpu/cpu optimization/etc, i think they are not so interested in additional computational power. |
ssoxcub@yahoo.com Send message Joined: 8 Jan 12 Posts: 17 Credit: 503,947 RAC: 0 |
I think they should constantly improve the code as folding@home does. From personal experience a nvidia 760 gets about 80,000, while after they improved the amd code, a r9 390 pulls down 300,000 points a day. Not sure if they could ever use a amd processor because of its math deficits. But another thought is, you can get a older cpu that would hold its own, say 8 years old, which is an extremely long time, but a 8 year old gpu would be outclassed x100 or even a x1000. |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
It would be a fun eye-popper if Rosetta@home reached the petaFLOPS benchmark. Let's keep it up :) |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
But another thought is, you can get a older cpu that would hold its own, say 8 years old, which is an extremely long time, but a 8 year old gpu would be outclassed x100 or even a x1000. Nvidia keeps improving CUDA, and supposedly making it easier to use. Maybe by the time Volta comes out, it would be worthwhile for Baker Labs to hire some smart grad student to look into it. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1991 Credit: 9,500,896 RAC: 12,649 |
Nvidia keeps improving CUDA, and supposedly making it easier to use. Maybe by the time Volta comes out, it would be worthwhile for Baker Labs to hire some smart grad student to look into it. On the other side of the moon, Khronos Group, AMD, Altera, Intel and others keep improving OpenCL and supposedly making it easier to use. Maybe by the time Vega comes out, it would be worthwhile for Baker Labs to hire some smart grad student to look into it. :-) |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
On the other side of the moon, Khronos Group, AMD, Altera, Intel and others keep improving OpenCL and supposedly making it easier to use. Maybe by the time Vega comes out, it would be worthwhile for Baker Labs to hire some smart grad student to look into it. I am happy to go either way, assuming AMD is still in business. |
Emigdio Lopez Laburu Send message Joined: 25 Feb 06 Posts: 61 Credit: 40,240,061 RAC: 0 |
Looks like a big boost in CE participation has pushed Rosetta@Home well over the 300 TeraFLOP mark. Why do you say that??? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1991 Credit: 9,500,896 RAC: 12,649 |
After all discussions about gpu/cpu optimization/etc, i think they are not so interested in additional computational power. Despite some very interesting preliminary tests, it seems that they have abandoned the optimization effort. Please read the discussions here and on Ralph's forum (here, for example): - Only one admin participates (Dekim). - This admin does not work very hard on optimizations (he has other things to do). - He says that optimizations are not so important; "precision" of simulation is more important than speed. - The optimizations are left to one volunteer (rjs5), who works on the code when he has free time. So, I'm not so optimistic. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,961,703 RAC: 17,440 |
After all discussions about gpu/cpu optimization/etc, i think they are not so interested in additional computational power. Be more optimistic ... and as patient as you can. 8-) Not all is bad. I have thought about updating status several times, but I thought that it might be more appropriate for those on the project (dekim) to disclose plans/status. He can delete this message if I am off base ... since I did not ask. There is another lab student working on incorporating my findings into their production environment. They are busy but I have been feeding them measurements and configuration files. To summarize, I built 50+ binaries with selected option combinations and expected (as I had said before) about 20% improvement. I generally measured a 20% to 40% improvement and dekim said they had confirmed those numbers internally. I also said that it would require the compiler to auto-vectorize the code to go faster than 20%. The original source code, I think, was written in Fortran, and translated to C++. Ugh! Dekim indicated that they have built and deployed a test binary based on my suggestions on Ralph. I don't know which one he is talking about but v3.73 was released about the right time. He also indicated they have introduced an optimized binary into their local production clusters ... whatever that is. They are seeing more than 2x-4x improvement on one of their design protocols executing on that cluster. I will be interested in learning why the dramatic impact. They are being careful, because this involves changing compilers and options. They are also in the middle of a big change ... notice the size of the database increased from 180mb to 270mb ... 8-) I have set my Rosetta preferences to run 24 hour jobs so I can see when (if) a better binary is introduced. The easiest way to detect a changed binary is to run with the longer CPU target times and observe those that finish before the target 86,400 second CPU time stick out. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I have set my Rosetta preferences to run 24 hour jobs so I can see when (if) a better binary is introduced. The easiest way to detect a changed binary is to run with the longer CPU target times and observe those that finish before the target 86,400 second CPU time stick out. I always run 24 hours on six cores of my i7-4790 (Win7 64-bit), and have seen several short work units since 2 April, when I started working on 3.73. 24 hour tasks You have done something very right it seems. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
@rjs5, Thanks for the update. Wanted to point out that 24hr work units, running more efficiently will simply produce more models in as close to 24hrs as they can. So, you won't notice them completing 20-40% sooner. Each time a new model is begun, a check is made to estimate whether it will complete before the runtime preference set in the user's settings. I believe the estimate is just based on time taken to complete prior models on the same task. So if model 23 completes after 23.5hrs of CPU, then the task is ended and returned. If model 23 completes after 22.5hrs, then a 24th model begins. Rosetta Moderator: Mod.Sense |
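The check Mod.Sense describes can be sketched in a few lines of Python. This is only an illustration, not Rosetta's actual code: the function name `should_start_next_model` and the use of a simple average of prior model times are assumptions (the post only says the estimate is believed to be based on the time taken by prior models in the same task).

```python
HOUR = 3600  # seconds


def should_start_next_model(cpu_elapsed_s, models_done, target_cpu_s):
    """Return True if one more model is expected to finish before the target.

    Assumption: the estimate for the next model is the average CPU time
    of the models already completed in this task.
    """
    if models_done == 0:
        # Nothing to estimate from yet, so always attempt a first model.
        return True
    avg_model_s = cpu_elapsed_s / models_done
    return cpu_elapsed_s + avg_model_s <= target_cpu_s


target = 24 * HOUR
# The two cases from the post: model 23 done after 23.5 h vs. after 22.5 h.
print(should_start_next_model(23.5 * HOUR, 23, target))  # False: task is returned
print(should_start_next_model(22.5 * HOUR, 23, target))  # True: a 24th model begins
```

Under this sketch a faster binary does not shorten a 24-hour task; it lowers the average time per model, so more models fit before the target.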
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Thanks for the update. Wanted to point out that 24hr work units, running more efficiently will simply produce more models in as close to 24hrs as they can. I think I see what you are saying. You put as many apples of various sizes in the box without overflowing. However, I have seen several tasks that run under 10,000 seconds on the above (and three other) machines in only two days. I think that is very rare, and after checking it is only on the 3.73 tasks. Also, if they are that short, you would think there would be plenty of room to fit another model in. So it seems that something is making the run times shorter than before, and preventing another model from being run. Maybe there is a limit on the total number of models? |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Thanks for the update. Wanted to point out that 24hr work units, running more efficiently will simply produce more models in as close to 24hrs as they can. It depends on the type of tasks. I'll just copy and paste what I wrote earlier and perhaps Mod.sense or DEK can correct and/or add detail as necessary: If memory serves, the 99 model limit was enacted when some tasks created output files too large to be uploaded. The limit only applies to a particular type of task. Others use the preferred cpu time plus 4 hours method to determine when to end things. When a model is completed the task calculates whether it has time left to complete another model. If the answer is no then the task wraps things up despite there appearing (to the cruncher) hours left. If the answer is yes the task will begin another model. All models aren't equal however, even within the same task so some will take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task will complete at least one model) the task will continue past the preferred cpu time. At some point though, you gotta cut your losses and so at preferred cpu time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.) Best, Snags |
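As a rough illustration of the rules Snags lists (the 99-model cap for some task types, the "time left for another model" check, and the watchdog at preferred CPU time plus 4 hours), here is a hedged Python sketch. The function `next_step`, its parameters, and the averaging estimate are hypothetical names and assumptions made for this example; the real client logic may differ.

```python
MAX_MODELS = 99              # cap from the post; applies only to some task types
WATCHDOG_GRACE_S = 4 * 3600  # "preferred cpu time plus 4 hours"


def next_step(cpu_elapsed_s, models_done, target_cpu_s,
              model_running, has_model_cap=False):
    """Return 'continue', 'watchdog_stop', 'start_model', or 'finish'."""
    if model_running:
        # A model in progress may run past the target, but only until
        # the watchdog limit is reached.
        if cpu_elapsed_s >= target_cpu_s + WATCHDOG_GRACE_S:
            return "watchdog_stop"
        return "continue"
    # Between models: capped task types stop once the cap is reached,
    # however much CPU time is left.
    if has_model_cap and models_done >= MAX_MODELS:
        return "finish"
    if models_done == 0:
        return "start_model"
    # Assumption: estimate the next model from the average so far.
    avg_model_s = cpu_elapsed_s / models_done
    if cpu_elapsed_s + avg_model_s <= target_cpu_s:
        return "start_model"
    return "finish"


# A capped task that reaches 99 models early ends long before 24 hours.
print(next_step(10_000, 99, 86_400, model_running=False, has_model_cap=True))
# A model still running at target + 4 h is cut off by the watchdog.
print(next_step(86_400 + 4 * 3600, 5, 86_400, model_running=True))
```

If the cap really works this way, it would be one possible explanation for 24-hour tasks that finish in under 10,000 seconds, though the thread does not confirm that.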
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,961,703 RAC: 17,440 |
Thanks for the update. Wanted to point out that 24hr work units, running more efficiently will simply produce more models in as close to 24hrs as they can. The 3.73 jobs hit my machines about 10am 3/31. My SkyLake 6700k with Win10 machine is only taking 86,400 seconds on tasks that run multiple structures. It looks like there are several ways of running jobs and possibly source of the confusion. ALL "24 hour" jobs that finished early, listed as CPU time (sec) -- Task ID: 9,231 -- 806473699; 9,474 -- 806473717; 10,616 -- 802461224; 10,736 -- 806473700; 11,629 -- 802461073; 19,048 -- 802461280; 19,727 -- 802461293; 25,353 -- 802461165; 28,458 -- 802461288; 31,028 -- 806739396; 31,109 -- 806739333; 31,152 -- 806739395; 32,629 -- 806739285; 32,775 -- 806739281; 32,788 -- 806739332; 32,897 -- 806739284; 74,202 -- 802461278; 86,645 -- 802461299 (multiple structures, tj_3_15_dimer_X_ZC16v1_DHR54_l3_h22_l3_v11_0_v1b_fragments_abinitio_SAVE_ALL_OUT_339362_541_0); 86,825 -- 802461222 (multiple structures, tj_3_15_dimer_X_ZC16v1_DHR54_l3_h22_l3_v11_0_v1). My Haswell Extreme Win10 machine did not get any of the "tj" jobs and no job took 24 hours. CPU time (sec) -- Task ID: 26,147 -- 806166140; 26,527 -- 806783950; 27,408 -- 806166136; 28,310 -- 806166173; 28,716 -- 806166175; 28,779 -- 806166153; 28,946 -- 806166142; 29,498 -- 806166158; 29,656 -- 806166144; 29,826 -- 806166155; 30,031 -- 806166182; 30,319 -- 806166135; 31,056 -- 806166156; 31,949 -- 806166137; 32,495 -- 806166183; 33,339 -- 806166141; 33,441 -- 806166181; 33,823 -- 806166154; 34,171 -- 806166174; 35,218 -- 806166143; 37,461 -- 806166157; 39,680 -- 806784007; 41,060 -- 806784004; 41,733 -- 806784024; 42,119 -- 806784008; 42,459 -- 806783991; 42,714 -- 806784020; 43,282 -- 806784012; 45,064 -- 806783066; 46,650 -- 806783178; 46,709 -- 806783780; 47,901 -- 806783261; 48,229 -- 806783240; 48,300 -- 806783221; 48,708 -- 806783141; 49,017 -- 806783220; 49,049 -- 806783231; 51,220 -- 806783233; 51,305 -- 806783258; 54,612 -- 806784017 |
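For anyone who wants to repeat rjs5's comparison on their own task list, the "finished before the 86,400 second target" filter can be sketched as follows. The helper names `parse_pairs` and `early_finishers` and the 5% tolerance are assumptions made for this example; the sample string reuses three entries from the post above.

```python
import re

TARGET_CPU_S = 86_400  # 24-hour CPU time preference, in seconds


def parse_pairs(text):
    """Extract (cpu_seconds, task_id) tuples from 'CPU time -- Task ID' text."""
    return [(int(cpu.replace(",", "")), int(task))
            for cpu, task in re.findall(r"([\d,]+) -- (\d+)", text)]


def early_finishers(pairs, target_s=TARGET_CPU_S, tolerance=0.05):
    """Return tasks whose CPU time is more than `tolerance` below the target."""
    cutoff = target_s * (1 - tolerance)
    return [(cpu, task) for cpu, task in pairs if cpu < cutoff]


sample = "9,231 -- 806473699 74,202 -- 802461278 86,645 -- 802461299"
for cpu, task in early_finishers(parse_pairs(sample)):
    print(f"task {task}: {cpu} s ({cpu / TARGET_CPU_S:.0%} of target)")
```

The tolerance is there because tasks that use the full preference rarely stop at exactly 86,400 seconds. As the posts above note, a task flagged this way is only a hint that something changed, since a model cap or a different task type can also end it early.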
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
My SkyLake 6700k with Win10 machine is only taking 86,400 seconds on tasks that run multiple structures. It looks like there are several ways of running jobs and possibly source of the confusion. Can you reach a conclusion yet? Is it clear that there are gains, or is that yet to be sorted out? |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,961,703 RAC: 17,440 |
My SkyLake 6700k with Win10 machine is only taking 86,400 seconds on tasks that run multiple structures. It looks like there are several ways of running jobs and possibly source of the confusion. Looks good. There are gains ... it's the "how much" that is harder to determine. Performance is always a "work in progress". That is why you have to be careful in "optimizing" something. Everyone who follows assumes the optimizations still work. Rosetta is a moving target and the run time statistics are very difficult to extract on this side of the server. The data ages out too quickly and the task name/information is buried another level deep. In very round numbers, I think there is generally 20%-50% in compiler and option "low hanging fruit". Timing of the deployed binary is out of my control and vision. Properly written code will see a 2x-4x improvement. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1991 Credit: 9,500,896 RAC: 12,649 |
I'm here since 2005, so i'm patient :-) To summarize, I built 50+ binaries with selected option combinations and expected (as I had said before) about 20% improvement. I generally measured a 20% to 40% improvement and dekim said they had confirmed those numbers internally. Not bad! The original source code, I think, was written in Fortran, and translated to C++. Ugh! Yep, i think there are still some traces of Fortran Dekim indicated that they have built and deployed a test binary based on my suggestions on Ralph. I don't know which one he is talking about but v3.73 was released about the right time. Only Dekim can answer.... |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
While we are on the subject, I am presently on Win7 64-bit. But I could go to Linux Mint 18 when it comes out. Is there an advantage? |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,961,703 RAC: 17,440 |
While we are on the subject, I am presently on Win7 64-bit. But I could go to Linux Mint 18 when it comes out. Is there an advantage? 8-} hit POST instead of PREVIEW ... If you are curious, you might install a VM and then install Mint on it. You can compare the performance of the 32-bit windows binary with the 64-bit Linux version. The last time I tried this, the VM was about 10% faster. |
©2024 University of Washington
https://www.bakerlab.org