300+ TeraFLOPS sustained!

Author	Message
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 79935 - Posted: 25 Apr 2016, 21:11:11 UTC - in response to Message 79901. Last modified: 25 Apr 2016, 21:11:36 UTC "No, host RAC suffers the same fate as the project TFlops (which is essentially the aggregation of all of the host RACs). This is part of why rjs5 has been saying it is difficult to estimate how a given improvement might actually affect overall efficiency. It's sorta like a factory that makes candy bars. And the same number of bars come off the line each day, only now several varieties are slightly larger than they used to be. But that's really hard to tell because on different days, the production consists of differing ratios of any of the ten kinds of candy bars the factory makes. Optimizations will be a great thing, and will certainly help the science behind the project, but you won't see a blip on a chart. In fact, part of what rjs5 has been asking for is a computational run, that can be repeated, to MAKE a chart, so there is something to compare to after making some changes. But even then all you can say is that one sample set, for a specific type of candy bar, is now showing an x% increase in average size. But it is possible that the change has adverse side effects on the other candy bar types, or on system environments that differ from that used for the measurements. The answer to any performance question is always 'it depends'."
But I thought the RAC was determined largely by the NUMBER of models that the WU did during execution. So if, on average, this number increases because of optimizations and whatnot, and with all else equal, the RAC should increase (?)

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 79940 - Posted: 26 Apr 2016, 13:58:05 UTC RAC comes from the history of granted credit. Granted credit is based on the number of models completed times the credit claimed for the average model of that task series. Picture two hosts, one ("A") with a BOINC benchmark of 100 and the other ("B") at 200. "A" completes 10 models and claims 600 credits (it took 6 hours). "B" reports from the same series of work that it completed 30 models. "B" will be granted 1800 credits, and whatever "B" claims for the benchmark rating and time taken to run the 30 models will drop into the running average with "A"'s report. See, it doesn't matter how long "B" took to run the work. That only affects the "claimed" credit, not the "granted" credit. The claims of all hosts are averaged together as they report and used to calculate the granted credit.
So, assuming any optimization would improve all hosts by the same degree (say 15%), the number of models completed per unit time increases 15%. But the credit claims that drive the credit system on R@h are based on time consumed and the host's benchmark rating (i.e. the claimed credit). So, "A" now reports in 11 models and a claim of ~510 credits (in slightly more than five hours), and "B", now reporting 33 models, is granted ~1530 credits, and each took slightly less crunch time to report 10% more models completed.
So this is what I meant about how the only variance you will notice is based on how the benchmark varies from the actual crunching work. The two hosts will have the same benchmark after any optimizations. But perhaps host "B" has a larger L2 cache and the optimizations work better in that environment and bring a 25% improvement to "B", whereas "A" only sees a 15% improvement. But again, the difference is hard to discern, and shows you a skew between the benchmark and the actual work more than anything else.
If you assume all of the hosts run the same number of hours per day as prior to the optimizations, the total credit claimed, and granted, per day will be the same as it was before. But all hosts complete more models per day. The science needs the models. The more, the better. Faster turn-around time is also a great thing. So optimization will help the research, but will not show up as a blip on a chart. Rosetta Moderator: Mod.Sense
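
A minimal sketch of the arithmetic in the post above, under two simplifying assumptions that are not stated in the thread: claimed credit is taken as benchmark rating times hours crunched (which reproduces the post's 100 x 6 = 600), and the running average claim per model is taken from host "A" alone. The function names are hypothetical, and the 33 models for host "B" are derived from the post's "10% more models".

```python
def claimed_credit(benchmark, hours):
    """Claimed credit, assumed here to be benchmark rating x hours crunched."""
    return benchmark * hours


def granted_credit(models_completed, avg_claim_per_model):
    """Granted credit = models completed x average claimed credit per model."""
    return models_completed * avg_claim_per_model


# Before the optimization: host A (benchmark 100) runs 6 hours for 10 models.
claim_a_before = claimed_credit(100, 6)                 # 600 credits
per_model_before = claim_a_before / 10                  # 60 credits per model
grant_b_before = granted_credit(30, per_model_before)   # 1800 credits

# After: A needs about 5.1 hours for 11 models; B reports 33 models.
claim_a_after = claimed_credit(100, 5.1)                # about 510 credits
per_model_after = claim_a_after / 11
grant_b_after = granted_credit(33, per_model_after)     # about 1530 credits

# Claimed credit per hour is unchanged, while models per hour go up.
credit_per_hour_before = claim_a_before / 6
credit_per_hour_after = claim_a_after / 5.1
models_per_hour_before = 10 / 6
models_per_hour_after = 11 / 5.1

print(f"B granted before: {grant_b_before:.0f}, after: {grant_b_after:.0f}")
print(f"A credit/hour before: {credit_per_hour_before:.0f}, after: {credit_per_hour_after:.0f}")
print(f"A models/hour before: {models_per_hour_before:.2f}, after: {models_per_hour_after:.2f}")
```

Under these assumptions the credit earned per hour of crunching stays flat while the models completed per hour rise, which is why the improvement would not appear in a credit-based chart.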

shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0	Message 79956 - Posted: 28 Apr 2016, 6:18:50 UTC - in response to Message 79940. Recent comment on this topic, but still confusing and it still feels like a penalty. If a machine "claims" a certain amount of credit, it usually receives less, and often much less--and there is nothing that I can do about it. Especially in cases where credit was 0, it makes me feel like you don't actually care about the contributions. Just an embarrassment of riches, eh? Related annoying cases are computation errors after many hours of work. NOT my fault, but your bug, and 0 credit. Ditto deadline problems. (Fortunately for Rosetta, at this point I don't even care enough to look for a better project. This is about the 3rd or 4th one I've supported, and none of them were significantly less annoying.) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)

Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,662,635 RAC: 0	Message 79959 - Posted: 28 Apr 2016, 12:33:29 UTC - in response to Message 79956. Speak for yourself, I couldn't care less about the 'credit' etc. It's a nice-to-have measurement of contribution, but in the end I do this for the science. Period. If I could 'donate' my credit to people who care so much about it, I'd gladly go with zero credit to stop everyone else complaining. lol.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 79960 - Posted: 28 Apr 2016, 14:52:10 UTC Last modified: 29 Apr 2016, 12:59:35 UTC Some machines will typically be granted less credit than claimed; other machines will swing the other way. Basically, being granted less credit than claimed is an indication that your machine does not outperform on actual R@h work by as much as it does on the BOINC benchmark. In other words, comparing your machine to some hypothetical one, your machine reports 2x better on the BOINC benchmark, but is only producing 1.8x (pick any number less than 2) more R@h results per unit time.
Tasks that fail are granted credit within 24 hours, and when this occurs, the credit granted is only visible from the task details. This change to BOINC was made specifically to reflect that the project appreciates the effort, and the fact that the science benefits from learning about and working through the failure. Rosetta Moderator: Mod.Sense
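
The 2x versus 1.8x comparison above can be sketched as a ratio. This assumes, as a simplification not spelled out in the thread, that both machines crunch for the same time, that claimed credit scales with the benchmark rating, that granted credit scales with results produced, and that the hypothetical reference machine is granted exactly what it claims. The function name is hypothetical.

```python
def granted_to_claimed_ratio(benchmark_ratio, throughput_ratio):
    """Rough granted/claimed ratio relative to a reference host.

    benchmark_ratio:  how much better this host scores on the BOINC benchmark.
    throughput_ratio: how many more R@h results it produces per unit time.
    """
    return throughput_ratio / benchmark_ratio


# The post's example: 2x on the benchmark, but only 1.8x on real work.
ratio = granted_to_claimed_ratio(benchmark_ratio=2.0, throughput_ratio=1.8)
print(f"granted / claimed is roughly {ratio:.2f}")
```

A value below 1 corresponds to being granted less than claimed; a host whose real throughput beats its benchmark would land above 1.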

Sid Celery Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 6	Message 79963 - Posted: 29 Apr 2016, 2:06:33 UTC - in response to Message 79960. "Basically, being granted less credit than claimed is an indication that your machine does not outperform on actual R@h work by as much as it does on the BOINC benchmark."
Ha! I like this. So true as well. I'm wondering whether to run something intensive while doing a new CPU benchmark so that I can get granted credits higher than claimed. Weirdly, my phone gets great credits while my overclocked desktop is always down.

Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 80002 - Posted: 3 May 2016, 6:19:49 UTC - in response to Message 79959. Same here. I just like how 'credits' measure your performance.

Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 80042 - Posted: 8 May 2016, 4:24:55 UTC It's been a while since we last heard of these optimizations. Any news? rjs5? :D

rjs5 Joined: 22 Nov 10 Posts: 274 Credit: 23,730,845 RAC: 0	Message 80054 - Posted: 9 May 2016, 14:19:57 UTC - in response to Message 80042. Well ... nothing really worthwhile. I haven't touched base with David in a couple of weeks. Using the ICC compiler and turning off the aggressive inline and unroll options that create a large code footprint seems to give a 40%+ improvement. In the 3.73 thread, I outlined the impact of running PrimeGrid in parallel with Rosetta. PrimeGrid uses the GIMPS library, which prefetches a large array of data into the caches and kicks all the other data out of the data cache. It caused a 3x slowdown in Rosetta and shows how big the code and data footprint is.
You have to use -mtune=generic -march=core2 to maintain a common binary that runs on all machines. You can get some benefit from adding the "-ax<target>" option, which generates a FAT binary with both generic and "<target>" code for SSE4.2 or AVX.
I have been looking higher up in the program source code and running some experiments to see if I can coerce the program to auto-vectorize (where the compiler is able to use PACKED SSE/AVX instructions instead of scalar), without much luck:
- removing call statements from the inner loops
- breaking complex loops up into sequential simple loops
- possibly using HUGE data pages
I will probably build a couple more test cases to exercise some of the other protocols. Interesting but going slow.

Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0	Message 80055 - Posted: 9 May 2016, 15:21:49 UTC - in response to Message 80054. "Interesting but going slow."
Nonetheless, your efforts are highly appreciated!

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80056 - Posted: 9 May 2016, 19:01:06 UTC - in response to Message 80055. Very much so. Thanks rjs5. Even if you find no silver bullet, it is a much better position than prior to your research. Rosetta Moderator: Mod.Sense

Mark Joined: 10 Nov 13 Posts: 40 Credit: 397,847 RAC: 0	Message 80061 - Posted: 10 May 2016, 11:07:11 UTC - in response to Message 80054. "Interesting but going slow."
I would also like to add my thanks to you for your efforts. It appears you could make a real impact far beyond running your own crunching.

Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,662,635 RAC: 0	Message 80076 - Posted: 11 May 2016, 13:46:55 UTC - in response to Message 80061. Same here, awesome to see someone in the community really step up and get involved. In other news, this: TeraFLOPS estimate: 600.836 ... if this keeps up, R@H could become a petaFLOPS cluster ... of x86 power at that ... that is incredibly remarkable!

sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 80186 - Posted: 16 Jun 2016, 11:51:47 UTC Last modified: 16 Jun 2016, 11:59:05 UTC 400+ TFLOPS sustained 24x7, Rosetta@home can possibly rank with the top 500 supercomputer networks :o :p lol

Computing for Humanity (Account) Joined: 8 Jan 16 Posts: 2 Credit: 492,998,458 RAC: 12	Message 80252 - Posted: 24 Jun 2016, 0:01:51 UTC 441st Place