300+ TeraFLOPS sustained!

Message boards : Number crunching : 300+ TeraFLOPS sustained!

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 79935 - Posted: 25 Apr 2016, 21:11:11 UTC - in response to Message 79901.  
Last modified: 25 Apr 2016, 21:11:36 UTC

No, host RAC suffers the same fate as the project TFlops (which is essentially the aggregation of all of the host RACs). This is part of why rjs5 has been saying it is difficult to estimate how a given improvement might actually affect overall efficiency.

It's sorta like a factory that makes candy bars. And the same number of bars come off the line each day, only now several varieties are slightly larger than they used to be. But that's really hard to tell because on different days, the production consists of differing ratios of any of the ten kinds of candy bars the factory makes.

Optimizations will be a great thing, and will certainly help the science behind the project, but you won't see a blip on a chart. In fact, part of what rjs5 has been asking for is a computational run, that can be repeated, to MAKE a chart, so there is something to compare to after making some changes. But even then all you can say is that one sample set, for a specific type of candy bar, is now showing an x% increase in average size. But it is possible that the change has adverse side effects on the other candy bar types, or on system environments that differ from that used for the measurements. The answer to any performance question is always "it depends".


But I thought the RAC was determined largely by the NUMBER of models that the WU did during execution. If so, and if on average this number increases because of optimizations and whatnot, then with all else equal the RAC should increase (?)
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79940 - Posted: 26 Apr 2016, 13:58:05 UTC

RAC comes from the history of granted credit. Granted credit is the number of models completed times the credit claimed for the average model of that task series.
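
To make "history of granted credit" concrete: as far as I know, BOINC computes RAC as an exponentially decaying average of granted credit per day, with a half-life of about one week. The sketch below is a simplified illustration under that assumption, not BOINC's actual code; the function name `update_rac` and the steady 100 credits per day are made up for the example.

```python
import math

SECONDS_PER_DAY = 86400.0
HALF_LIFE = 7 * SECONDS_PER_DAY  # assumption: one-week half-life


def update_rac(rac, credit, elapsed_seconds, half_life=HALF_LIFE):
    """Fold newly granted credit into a decaying per-day average.

    elapsed_seconds is the time since the previous update and must be > 0.
    """
    # The old average decays by half for every half_life that has passed.
    weight = math.exp(-elapsed_seconds * math.log(2) / half_life)
    elapsed_days = elapsed_seconds / SECONDS_PER_DAY
    # The new credit enters as a per-day rate over the elapsed interval.
    return rac * weight + (1.0 - weight) * (credit / elapsed_days)


# Hypothetical host that is granted 100 credits once a day, starting from zero.
rac = 0.0
history = []
for day in range(70):
    rac = update_rac(rac, credit=100.0, elapsed_seconds=SECONDS_PER_DAY)
    history.append(rac)

print(round(history[6], 2), round(history[-1], 2))
```

Under these assumptions the average is halfway to the steady daily rate after one week and then levels off, which is why a host's RAC only moves if its granted credit per day changes.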

Picture two hosts: one ("A") has a BOINC benchmark of 100 and the other ("B") 200. "A" completes 10 models and claims 600 credits (it took him 6 hours); "B" reports from the same series of work that it completed 30 models. "B" will be granted 1800 credits (30 models at the 60 credits per model that "A" claimed), and whatever "B" claims based on its benchmark rating and the time taken to run the 30 models will drop into the running average with "A"'s report. See, it doesn't matter how long "B" took to run the work. That only affects the "claimed" credit, not the "granted" credit. The claims of all hosts are averaged together as they report and used to calculate the granted credit.
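
The two-host example can be sketched as a small running average. This is only an illustration of the description above, not the project's server code: the class `TaskSeries`, the helper `claimed_credit`, the rule that the first reporter is granted its own claim, and the 8 hours assumed for "B" are all made up for the example.

```python
class TaskSeries:
    """Running average of claimed credit per model for one series of work."""

    def __init__(self):
        self.claimed_total = 0.0
        self.models_total = 0

    def average_claim_per_model(self):
        return self.claimed_total / self.models_total

    def report(self, models, claimed):
        """Grant credit from the average so far, then fold in this claim."""
        if self.models_total == 0:
            # Assumption: with no history yet, the first host gets its claim.
            granted = claimed
        else:
            granted = models * self.average_claim_per_model()
        self.claimed_total += claimed
        self.models_total += models
        return granted


def claimed_credit(benchmark, hours):
    # Hypothetical scale: a benchmark of 100 claims 100 credits per hour.
    return float(benchmark * hours)


series = TaskSeries()
granted_a = series.report(models=10, claimed=claimed_credit(100, 6))
# "B"'s run time is not given in the example; 8 hours is an assumption.
granted_b = series.report(models=30, claimed=claimed_credit(200, 8))

print(granted_a, granted_b, series.average_claim_per_model())
```

Changing the hours passed for "B" changes only what "B" contributes to the average for later reporters, not what "B" itself is granted.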

So, assuming any optimization would improve all hosts by the same degree (say 15%), the number of models completed per unit time increases 15%. But the credit claims that drive the credit system on R@h are based on time consumed and the host's benchmark rating (i.e. the claimed credit). So, "A" now reports 11 models with a claim of ~510 credits (slightly more than five hours of work), and "B", reporting 33 models from the same series, is granted ~1530 credits; each took slightly less crunch time to report 10% more models.

So this is what I meant about how the only variance you will notice is based on how the benchmark varies from the actual crunching work. The two hosts will have the same benchmark after any optimizations. But, perhaps host "B" has a larger L2 cache and the optimizations work better in that environment and bring a 25% improvement to "B" whereas "A" only sees a 15% improvement. But again, the difference is hard to discern, and shows you a skew between the benchmark and the actual work more than anything else.

If you assume all of the hosts run the same number of hours per day as prior to the optimizations, the total credit claimed, and granted, per day will be the same as it was before. But all hosts complete more models per day. The science needs the models. The more, the better. Faster turn-around time, also a great thing. So optimization will help the research, but not reflect as a blip on a chart.
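
A rough steady-state sketch of that argument, under simplifying assumptions: claims depend only on benchmark and hours, and credit per model is the pooled average claim per model across all hosts. The function `daily_totals`, the 24 hours per day, and "B"'s rate of 30 models in 8 hours are hypothetical values chosen for illustration.

```python
import math

# Hypothetical hosts: (benchmark, hours crunched per day, models per hour).
HOSTS = {
    "A": (100, 24, 10 / 6),  # 10 models in 6 hours
    "B": (200, 24, 30 / 8),  # assumed: 30 models in 8 hours
}


def daily_totals(hosts, speedup=None):
    """Return (models, claimed, granted) per host for one steady-state day."""
    speedup = speedup or {}
    claimed = {}
    models = {}
    for name, (benchmark, hours, models_per_hour) in hosts.items():
        # Claims depend only on benchmark and run time, not on models finished.
        claimed[name] = benchmark * hours
        models[name] = models_per_hour * hours * speedup.get(name, 1.0)
    # Credit per model is the pooled average claim per model.
    per_model = sum(claimed.values()) / sum(models.values())
    granted = {name: models[name] * per_model for name in hosts}
    return models, claimed, granted


before = daily_totals(HOSTS)
uniform = daily_totals(HOSTS, {"A": 1.15, "B": 1.15})
skewed = daily_totals(HOSTS, {"A": 1.15, "B": 1.25})

for label, (models, claimed, granted) in [
    ("before", before), ("uniform", uniform), ("skewed", skewed)
]:
    print(label,
          "models/day:", round(sum(models.values()), 1),
          "granted/day:", round(sum(granted.values()), 1),
          {name: round(value, 1) for name, value in granted.items()})
```

With a uniform speedup the models per day rise while every host's granted credit stays put. With the skewed speedup the total is still unchanged, and credit merely shifts toward the host that benefited more.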
Rosetta Moderator: Mod.Sense
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 79956 - Posted: 28 Apr 2016, 6:18:50 UTC - in response to Message 79940.  

Recent comment on this topic, but still confusing and it still feels like a penalty. If a machine "claims" a certain amount of credit, it usually receives less, and often much less--and there is nothing I can do about it. Especially in cases where credit was 0, it makes me feel like you don't actually care about the contributions. Just an embarrassment of riches, eh?

Related annoying cases are computation errors after many hours of work. NOT my fault, but your bug, and 0 credit. Ditto deadline problems.

(Fortunately for Rosetta, at this point I don't even care enough to look for a better project. This is about the 3rd or 4th one I've supported, and none of them were significantly less annoying.)
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,642,682
RAC: 105
Message 79959 - Posted: 28 Apr 2016, 12:33:29 UTC - in response to Message 79956.  


Speak for yourself. I couldn't care less about the 'credit' etc.; it's a nice-to-have measurement of contribution, but in the end I do this for the science. Period. If I could 'donate' my credit to people who care so much about it, I'd gladly go with zero credit to stop everyone else complaining. lol.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79960 - Posted: 28 Apr 2016, 14:52:10 UTC
Last modified: 29 Apr 2016, 12:59:35 UTC

Some machines will typically be granted less credit than claimed; other machines will swing the other way. Basically, being granted less credit than claimed indicates that your machine does not outperform on actual R@h work by as much as it does on the BOINC benchmark. In other words, compared to some hypothetical reference machine, your machine scores 2x better on the BOINC benchmark but produces only 1.8x (pick any number less than 2 there) as many R@h results per unit time.
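
As a worked version of that 2x versus 1.8x comparison, here is a minimal sketch. It assumes claims scale with benchmark times run time and that credit per model is set by a single reference machine; the function name and the reference values are hypothetical.

```python
def granted_to_claimed_ratio(benchmark_ratio, throughput_ratio,
                             ref_benchmark=100.0, ref_models_per_hour=1.0):
    """Ratio of granted to claimed credit for one hour of crunching.

    benchmark_ratio:  your benchmark divided by the reference benchmark
    throughput_ratio: your models per hour divided by the reference rate
    """
    # The reference machine sets the going rate: its claim per model.
    credit_per_model = ref_benchmark / ref_models_per_hour
    # Your claim follows your benchmark; your grant follows your model count.
    claimed = ref_benchmark * benchmark_ratio
    granted = ref_models_per_hour * throughput_ratio * credit_per_model
    return granted / claimed


ratio = granted_to_claimed_ratio(benchmark_ratio=2.0, throughput_ratio=1.8)
print(round(ratio, 3))
```

The reference values cancel out, so the ratio reduces to throughput ratio divided by benchmark ratio: below 1 when the benchmark flatters the machine, above 1 when it undersells it.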

Tasks that fail are granted credit within 24hrs, and when this occurs, the credit granted is only visible from the task details. This change to BOINC was done specifically to reflect that the project appreciates the effort, and the fact that the science benefits from learning about and working through the failure.
Sid Celery

Joined: 11 Feb 08
Posts: 1987
Credit: 38,495,587
RAC: 13,712
Message 79963 - Posted: 29 Apr 2016, 2:06:33 UTC - in response to Message 79960.  

Basically, being granted less credit than claimed indicates that your machine does not outperform on actual R@h work by as much as it does on the BOINC benchmark.

Ha! I like this. So true as well. I'm wondering whether to run something intensive while doing a new CPU benchmark so that I can get granted credits higher than claimed. Weirdly, my phone gets great credits while my overclocked desktop is always down.
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 80002 - Posted: 3 May 2016, 6:19:49 UTC - in response to Message 79959.  


Same here. I just like how 'credits' measure your performance.
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 80042 - Posted: 8 May 2016, 4:24:55 UTC

It's been a while since we last heard of these optimizations.

Any news? rjs5? :D
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,423,241
RAC: 15,512
Message 80054 - Posted: 9 May 2016, 14:19:57 UTC - in response to Message 80042.  

It's been a while since we last heard of these optimizations.

Any news? rjs5? :D


Well ... nothing really worthwhile. I haven't touched base with David in a couple weeks.

Using the ICC compiler and turning off the aggressive inline and unroll options that create a large code footprint seems to give a 40%+ improvement. In the 3.73 thread, I outlined the impact of running PrimeGrid in parallel with Rosetta. PrimeGrid uses the GIMPS library, which prefetches a large array of data into the caches and evicts everything else from the data cache. It caused a 3x slowdown in Rosetta, which shows how big Rosetta's code and data footprint is.

You have to use -mtune=generic -march=core2 to maintain a common binary that runs on all machines. You can get some benefit from adding the "-ax<target>" option that generates a FAT binary with both generic and "<target>" code for SSE4.2 or AVX.


I have been looking higher up in the program source code and running some experiments to see if I can coerce the program to auto-vectorize (where the compiler is able to use packed SSE/AVX instructions instead of scalar ones), so far without much luck. Things I have tried or am considering:

- removing call statements from the inner loops
- breaking complex loops up into sequential simple loops
- possibly using HUGE data pages


I will probably build a couple more test cases to exercise some of the other protocols.

Interesting but going slow.



Dr. Merkwürdigliebe
Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 80055 - Posted: 9 May 2016, 15:21:49 UTC - in response to Message 80054.  


Interesting but going slow.

Nonetheless, your efforts are highly appreciated!
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80056 - Posted: 9 May 2016, 19:01:06 UTC - in response to Message 80055.  


Interesting but going slow.

Nonetheless, your efforts are highly appreciated!


Very much so. Thanks rjs5. Even if you find no silver bullet, we are in a much better position than prior to your research.
Mark

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 80061 - Posted: 10 May 2016, 11:07:11 UTC - in response to Message 80054.  


Interesting but going slow.


I would also like to add my thanks to you for your efforts. It appears you could make a real impact far beyond running your own crunching.
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,642,682
RAC: 105
Message 80076 - Posted: 11 May 2016, 13:46:55 UTC - in response to Message 80061.  


Interesting but going slow.


I would also like to add my thanks to you for your efforts. It appears you could make a real impact far beyond running your own crunching


Same here, awesome to see someone in the community really step up and get involved.

In other news, this:
TeraFLOPS estimate: 600.836


.. if this keeps up, R@H could become a petaFLOPS cluster.. of x86 power at that.. that is incredibly remarkable!
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 80186 - Posted: 16 Jun 2016, 11:51:47 UTC
Last modified: 16 Jun 2016, 11:59:05 UTC

400+ Tflops sustained 24x7, rosetta@home can possibly rank with top 500 supercomputer networks :o :p lol
Cancer Computer
Joined: 8 Jan 16
Posts: 2
Credit: 473,810,214
RAC: 6,112
Message 80252 - Posted: 24 Jun 2016, 0:01:51 UTC

441st Place

©2024 University of Washington
https://www.bakerlab.org