Discussion of the new credit systen (2)

Author	Message
dcdc Send message Joined: 3 Nov 05 Posts: 1833 Credit: 123,915,244 RAC: 22,465	Message 26315 - Posted: 7 Sep 2006, 22:35:42 UTC - in response to Message 26310. Last modified: 7 Sep 2006, 22:36:45 UTC To be honest with you, I don't think the Linux issue has been resolved. As long as the perception that the current credit system still under evaluate the performance under Linux persists , the issue is there . Perception is many times more powerful than reality and that is why I would like to see reality and perception to be one and the same. I wish I know how to put an end to that. That is why I would like to see a complete statistical analysis of the issue. I posted here with what I think is the info we need to be able to see what the optimal configurations are with regard to CPU and OS. It'd be useful to have an accurate list showing the performance of different configs (the main factor bening the CPU I expect). It'd be a big help to those buing new crunchers as you can then make an informed decision, for example to go for core or x2, and how worthwhile things like cache and RAM are. However, as I posted above, finding out that an OS or hardware config isn't running the code as quickly as we'd like, and making it run faster are two very different things! ID: 26315 · Rating: 0 · rate: /

Ananas Send message Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0	Message 26316 - Posted: 7 Sep 2006, 22:46:49 UTC Last modified: 7 Sep 2006, 22:49:36 UTC Define one not too long running reference workunit, post the required parameters to run that WU on Rosetta for maybe 2 hours and ask people to send the results to you (for validation) together with the BIOS, hard- and software information you need plus the real runtime. We had that in other DC projects and many people sent results. If it is possible to force Rosetta to use a specific start value instead of the random seed, this option should of course be used. ID: 26316 · Rating: 0 · rate: /

Whl. Send message Joined: 29 Dec 05 Posts: 203 Credit: 275,802 RAC: 0	Message 26317 - Posted: 7 Sep 2006, 22:49:28 UTC - in response to Message 26301. P.S Sorry, it seems anything I say or post stirs up some poster/s here. One thing that really annoys me about your posts, is the size of those GIF files. It is a real pain in the arse scrolling all over the place to read everybody elses posts. ID: 26317 · Rating: 0 · rate: /

dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 26319 - Posted: 8 Sep 2006, 0:06:47 UTC - in response to Message 26270. Last modified: 8 Sep 2006, 0:16:30 UTC -- Deleted -- Mats already addressed the issue far better than me. ID: 26319 · Rating: 0 · rate: /

Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,522,678 RAC: 0	Message 26327 - Posted: 8 Sep 2006, 1:40:28 UTC - in response to Message 26310. I believe the optimization that is required and is known to solve the Mac issue should be implemented . To be honest with you, I don't think the Linux issue has been resolved. As long as the perception that the current credit system still under evaluate the performance under Linux persists , the issue is there . Perception is many times more powerful than reality and that is why I would like to see reality and perception to be one and the same. I wish I know how to put an end to that. That is why I would like to see a complete statistical analysis of the issue. Just in case: I do not do Linux ( too complicated for me ) and I do not belong to the Mac cult. :) I just don't like small sample statistics and conclusions based on small sample statistics. I don't like the use of " it seems to have been solved " in lieu of " it has been solved " . The compliance auditor that still lurks in me is trying to get answers. Alas it seems that my search for answers in an attempt to find solutions irritate some people. Worst, there are some people that do not understand why , if I left the project, I am still trying to look for answers. That is too complicated to answer here. Self-Exile , even though justified , is a weird state of being. Suffice to say I cared about this project and I do still care. That said. I think I should stop bothering this thread and let all that want and or care still look for answers and ways to make the system fair and to make the system attractive to all kind and type of crunchers. (Alas something that is not now.) keep doing their work. Pax Well, I'm by no means a good statisticians (that doesn't look right, but too tired trying to fix it), but, let's still play a little with numbers... Now, as I've already posted, if 10% is trying to cheat by artificially inflated claims, you can setup a table like this: Overclaim - increase in average granted credit per model: 5x - 40% 4x - 30% 3x - 20% 2x - 10% 1.5x - 5% 1.1x - 1% So, since Linux is just Underclaims, let's just expand this table a little. Going by BoincSynergy, there's 22632 Linux/Mac-computers in Rosetta, of total this is 12.7%. Note, no idea how many of the computers is actually active or not, but let's still use 12.7%. Underclaim - decrease in average granted credit per model: 10% - 1.27% 20% - 2.54% 30% - 3.81% 40% - 5.08% 50% - 6.35% 60% - 7.62% 70% - 8.89% 80% - 10.16% 90% - 11.43% 100% - 12.7% Meaning, even if all Linux/Mac-users claims zero credit for all their work, they'll only influence the average granted credit with 12.7%. Now, not sure how much more windows is claiming than Linux/Mac, but would guess on less than 2x, meaning the influence is less than 6.35% With some crunchers running "optimized" clients, they'll trying to increase average granted, and Linux/Mac unoptimized will try to decrease average granted. Does they cancel eachother out, possibly, but can't guarantee this. Anyway, since the new credit-system is the average of all results returned for a specific wu-type, the only real chance someone trying to get significant boost from a high claim is to be one of the 1st. to return. This in practice would mean running with 0.001 days cache-size, and 1h run-preference. A Linux/Mac-user can of course also try this, but if they're unlucky and is #1, they'll get much less credit than if they're #2 to return... In practice, appart for being the Lucky/Unlucky #1 to return, the granted credit will quickly average-away. So, in practice, there shouldn't be any significant (yes still unspecific) difference between platforms. That Mac is really slow crunching is a different problem, and isn't due to the BOINC-benchmark. But, being a little more specific at the end, remember, if all windows-users has returned all their wu, and by some unlucky strike of fate all Linux/Mac-users returns their result afterwards, the 1st. linux/mac-result will get the same granted as average for all the windows-users, while for the last linux/mac-result returned, you'll at the absolute worst get 12.7% less than the average for windows-users. But, remembering the table, this is if all linux/mac-users claimed zero credit, more realistically would expect windows is less than 2x higher benchmark, meaning the absolute worst-off is 6.35% lower for the last result. The other way around, all linux/mac returned before any windows-results, will be much worse, since the last windows-user will get roughly 2x (again not sure how much higher windows-benchmark is), but wouldn't expect it due to the users trying to get their credit-boost at the start... In any case, delaying crediting till 1000 results or something is in, should remove any large startup-spikes... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." ID: 26327 · Rating: 1 · rate: /

SekeRob Send message Joined: 7 Sep 06 Posts: 35 Credit: 19,984 RAC: 0	Message 26371 - Posted: 8 Sep 2006, 14:21:01 UTC - in response to Message 24684. Status report August, 23rd The new credit system went live. August, 24th, 11h23 UTC Currently all results returned are not granted credit but are set to "pending". This is due to the fact that the validator stopped working and has nothing to do with withholding credits for whatever reason. [edited because initial assumptions were wrong] I came specially over to crunch a few and see for myself how the new credit system works.....well first impressions are lasting impressions.....u must have nailed it right on the head.....getting credit for my Stock Machine on Stock WOS on my Stock BOINC 5.6.0 and the claim worked out 0.8% lower from what u computed the work was worth....totally aligned with the BOINC credit principles. Love it. ciao Coelum Non Animum Mutant, Qui Trans Mare Currunt ID: 26371 · Rating: 0 · rate: /

Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0	Message 26380 - Posted: 8 Sep 2006, 17:01:01 UTC I suppose I explain that "noticable" in my post some ten or so posts ago is equivalent to "not greatly different" or "+/- 10%". In a post in the "How much credit per hour is possible?" I showed my measurements of credit per hour per GHz as around 6.0 - 6.7 or some such. There is abour 10-12% difference between these, but that's on a relatively small set of samples, so statistically they aren't the best of numbers. I haven't got my statistics spreadsheet available here (I'm in California, not in England where my other machine happens to be), so I can't give you more detailed information at this point. But the overall general results I have seen is that (with the new credit system) the performance per core per clock-frequency is similar enough to not say that Windows or Linux is significantly different. As tralala pointed out (and I have in another post) pointed out that Linux benchmarks are quite different from the Windows ones, but the code in Rosetta is pretty similar between Linux and Windows, so the performance difference will be small. -- Mats ID: 26380 · Rating: 0 · rate: /

Jose Send message Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 26381 - Posted: 8 Sep 2006, 18:18:11 UTC - in response to Message 26380. I suppose I explain that "noticable" in my post some ten or so posts ago is equivalent to "not greatly different" or "+/- 10%". In a post in the "How much credit per hour is possible?" I showed my measurements of credit per hour per GHz as around 6.0 - 6.7 or some such. There is abour 10-12% difference between these, but that's on a relatively small set of samples, so statistically they aren't the best of numbers. I haven't got my statistics spreadsheet available here (I'm in California, not in England where my other machine happens to be), so I can't give you more detailed information at this point. But the overall general results I have seen is that (with the new credit system) the performance per core per clock-frequency is similar enough to not say that Windows or Linux is significantly different. As tralala pointed out (and I have in another post) pointed out that Linux benchmarks are quite different from the Windows ones, but the code in Rosetta is pretty similar between Linux and Windows, so the performance difference will be small. -- Mats 10% is twice what is considered the level for statistical significance . An under count of 12% basically means one in eight doesnt get counted. If those are your results , they are in no way minimal. That is a level of undercounting, under representation that is not acceptable. ID: 26381 · Rating: 0 · rate: /

casio7131 Send message Joined: 10 Oct 05 Posts: 35 Credit: 149,748 RAC: 0	Message 26419 - Posted: 9 Sep 2006, 4:12:53 UTC - in response to Message 26381. 10% is twice what is considered the level for statistical significance . An under count of 12% basically means one in eight doesnt get counted. If those are your results , they are in no way minimal. That is a level of undercounting, under representation that is not acceptable. i don't think the 10% that Mats is talking about is a significance level (in the sense of a statistical test), but is the difference in credit achieved between windows and linux. ID: 26419 · Rating: 0 · rate: /

Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0	Message 26424 - Posted: 9 Sep 2006, 5:46:31 UTC - in response to Message 26419. 10% is twice what is considered the level for statistical significance . An under count of 12% basically means one in eight doesnt get counted. If those are your results , they are in no way minimal. That is a level of undercounting, under representation that is not acceptable. i don't think the 10% that Mats is talking about is a significance level (in the sense of a statistical test), but is the difference in credit achieved between windows and linux. There's ten percent (or so) difference between the highest average and the lowest average of my machines. If I average those numbers themselves, the spread is +/- 5% (or so). I'm currently working from memory (as described in the previous post). I have four Linux machines and two Windows machines, one of which is a laptop. None of my machines have exactly the same configuration when it comes to processor type and sockets. My fastest machine (per clockspeed) is a Linux machine, so Windows certainly doesn't get a HIGHER result. In fact, I think Windows is actually the slowest machine (but it's also a socket 754 processor, which none of the others are - but I can't say if that's part of the reason why it's lower credit, or just simply because the Windows version is slower - or just that machine isn't working quite as fast for some other reason...) -- Mats ID: 26424 · Rating: 0 · rate: /

Bad_Wolf Send message Joined: 31 Jul 06 Posts: 4 Credit: 191,553 RAC: 0	Message 29435 - Posted: 16 Oct 2006, 6:58:30 UTC Last modified: 16 Oct 2006, 7:19:23 UTC Just my 2 cents opinion: If real speed is the problem, why don't add a little 10 secs benchmark before the initialization? In this way , with the WU's result and times, will come the real base to calculate the math done and the points to give. [edit] Another way could be an average speed for every single class of CPU. For each host you have the CPU used and the BOINC benchmark result... it shouldn't be difficoult to calculate such average... [/edit] ID: 29435 · Rating: 0 · rate: /

Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0	Message 29462 - Posted: 16 Oct 2006, 13:58:43 UTC - in response to Message 29435. Just my 2 cents opinion: If real speed is the problem, why don't add a little 10 secs benchmark before the initialization? In this way , with the WU's result and times, will come the real base to calculate the math done and the points to give. [edit] Another way could be an average speed for every single class of CPU. For each host you have the CPU used and the BOINC benchmark result... it shouldn't be difficoult to calculate such average... [/edit] Except that it's hard to determine from the information available to the application all the necessary parameters. For example, an Athlon 3800+ may be a single or dual core model - running at 2.4 or 2.0GHz - the dual core would thereofore per core be around 20% slower. It's possible to find out what cache-size the processor is, but finding out how fast the memory is, and how much effect the speed of the memory has is much harder [as that partly depends on what else is going on in the machine at the same time]. Running rosetta for 10 seconds without majorly changing how Rosetta works would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think. -- Mats ID: 29462 · Rating: 0 · rate: /

Bad_Wolf Send message Joined: 31 Jul 06 Posts: 4 Credit: 191,553 RAC: 0	Message 29473 - Posted: 16 Oct 2006, 18:46:13 UTC - in response to Message 29462. Except that it's hard to determine from the information available to the application all the necessary parameters. For example, an Athlon 3800+ may be a single or dual core model - running at 2.4 or 2.0GHz - the dual core would thereofore per core be around 20% slower. It's possible to find out what cache-size the processor is, but finding out how fast the memory is, and how much effect the speed of the memory has is much harder [as that partly depends on what else is going on in the machine at the same time]. Hosts' data have the number of CPUs installed, and having a big (because it's BIG) number of hosts in the database probably the average wouldn't be so far from reality Running rosetta for 10 seconds without majorly changing how Rosetta works would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think. -- Mats Maybe i didn't explain myself, sorry, english is my second language. I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data. BadWolf ID: 29473 · Rating: 0 · rate: /

Mats Petersson Send message Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0	Message 29511 - Posted: 17 Oct 2006, 12:33:53 UTC - in response to Message 29473. Except that it's hard to determine from the information available to the application all the necessary parameters. For example, an Athlon 3800+ may be a single or dual core model - running at 2.4 or 2.0GHz - the dual core would thereofore per core be around 20% slower. It's possible to find out what cache-size the processor is, but finding out how fast the memory is, and how much effect the speed of the memory has is much harder [as that partly depends on what else is going on in the machine at the same time]. Hosts' data have the number of CPUs installed, and having a big (because it's BIG) number of hosts in the database probably the average wouldn't be so far from reality Yes, but each machine will have a different setup for memory and how well that memory provides data to the CPU, which is hard to measure. The CPU performance on it's own is already being measured, and that is the basis of the current score-system. There are also other factors: If the system is getting hot or low on power (in a laptop) it may reduce the speed of the processor, which means that it takes longer to do the calculation... Running rosetta for 10 seconds without majorly changing how Rosetta works would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think. -- Mats Maybe i didn't explain myself, sorry, english is my second language. I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data. BadWolf[/quote] And that's how it works today - there as benchmark to measure integer and floating point performance, and then the machine is left to do the real task of calculating Rosetta. This however has two potential problems: 1. There are different "clients" that calculate the benchmark results differently, including people who use an "optimized" client, which gives results that aren't quite comparable to the actual calculation capacity of the processor. 2. There's no measurement of the overall system performance, just a tiny benchmark (Dhrystone for integers, Whetstone for floating point) which fits nicely in the cache of just about any processor available today (anything more than about 16KB of L1 cache and it will fit in the L1 cache) - so processors with small caches get exactly the same result as those with large ones - but in reality, a large cache will be better than a small one. The current approximation, I think, although it may not be ideal, it's a close approximation of "pay for the amount of work done". -- Mats ID: 29511 · Rating: 0 · rate: /

River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0	Message 29516 - Posted: 17 Oct 2006, 14:56:39 UTC - in response to Message 29511. Last modified: 17 Oct 2006, 15:08:05 UTC ... so processors with small caches get exactly the same result as those with large ones - but in reality, a large cache will be better than a small one... And don't overlook the length of the floating point pipeline. Two cpus may score the same float speed on the benchmark, but the data is predictable therefore the pipeline runs efficiently. Suppose the processorts come to 1GHz float speed (makes the sums nice), and one is a three stage pipe and the other a five stage pipe. The first cpu actually takes 3 ns to do a float, and gets the throughput by having three on the go at once. The second takes 5ns to do a float, but has 5 on the go at once. The snag comes when which number to calulate next is depends on the result of the last crunch. The first cpu's pipeline stalls for 2ns, the second for 4ns. This can also happen if the data are needed in a weird order (eg FFT tends to do better the shorter the pipe, an important point if you want to crunch on Einstein and perhaps on SETI). If I remember rightly, a Pentium M has a shorter pipe than a Pentium 4. If so, then an M will do better than a 4 at the same benchmarked float speed, and this advantage will increase the more often the floating results are used to make decisions in the code. So on two critical aspects of floating point performance, benchmarks measure what the chip can do at its best (no cache stalls, no pipe stalls). That is further than you'd hope from being a measure of what the same chip does under real conditions -- and on a project like Rosetta those real conditions may be very different beween different kinds of WU, seeing the project experoments with different stategies. It is worse still. We have issues of different pipes and caches. But then, if it is a dual core chip, do they share the cache, have their own separate caches, or what? If separate caches, how do the cache controllers deal with the case where both caches are trying to access the off-chip memeory at once? All thse variables, and we are not even starting to ask about different motherboards yet... For all these reasons benchmarks are very crude. It does seem to me that running a selection of similar tasks on a random selection of boxes taken from the real user pool is less crude, especially with a large enough sample. River~~ ID: 29516 · Rating: 0 · rate: /

Seventh Serenity Send message Joined: 30 Nov 05 Posts: 18 Credit: 87,811 RAC: 0	Message 29586 - Posted: 18 Oct 2006, 15:24:31 UTC I've just switched back to Rosetta@Home from WCG because of the unfairness with credit on Linux systems. I'm more for the science of course, but since Rosetta@Home is still partly based around the HIV/AIDS virus, I'll be running R@H until WCG get their fixed credit system in place. "In the beginning the universe was created. This made a lot of people very angry and is widely considered as a bad move." - The Hitchhiker's Guide to the Galaxy ID: 29586 · Rating: 0 · rate: /

River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0	Message 29594 - Posted: 18 Oct 2006, 19:24:34 UTC - in response to Message 29586. I've just switched back to Rosetta@Home from WCG because of the unfairness with credit on Linux systems. I'm more for the science of course, but since Rosetta@Home is still partly based around the HIV/AIDS virus, I'll be running R@H until WCG get their fixed credit system in place. And even if both projects were equally fair, here each credit is new science. On projects running redundancy only half (or less) of the credits are science, the other half (or more) being used to check the answers. This holds for the old and new credit systems here, of course, so technically I am off topic... ID: 29594 · Rating: 0 · rate: /

FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0	Message 29625 - Posted: 19 Oct 2006, 8:31:41 UTC There is on major thing witht he new credit system, It stops people like Jose having to search through the credits looking for client_state file manipulators and stop things like this over at XtremLabs (which the project seems totaly unaware I guess) http://xw01.lri.fr:4320/top_hosts.php Loads of Top Hosts using file manipulation (general use 'optimised' clients would never claim that high). Since XtremLab have a max credit/hr that's not a problem, just set you Pentium4, D Even the Pentium 3 and 2's in the list you'll see to just under that. That now has little effect with the Rosetta@Home's 'new' credit system. Team mauisun.org ID: 29625 · Rating: 0 · rate: /

River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0	Message 29646 - Posted: 19 Oct 2006, 17:31:28 UTC - in response to Message 29625. There is on major thing witht he new credit system, It stops people like Jose having to search through the credits looking for client_state file manipulators Well yes, if the new system spoils Jose's fun that might count as a disadvantage... ;) But actually it is still possible to inflate host stats on the new system - Run several identical hosts for a while. Detach / Re-attach all but one. Wait for the detached hosts to have all their results deleted, leaving only their credits, merge with the one "master" box. Repeat every so often. Of course, it is different from client_state manipulators in two important ways - work of that amount of credit has actually been done, all that is tricky is the assigning of it to one host. And secondly, although it unfairly raises a box in the host stats, it does not have an unfair effect on user/team stats, about which most users seem more concerned. The serious point in the above is that not all of those accused of being client_state editors were doing what was suspected. No doubt some of them were. R~~ ID: 29646 · Rating: -3 · rate: /

FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0	Message 29653 - Posted: 19 Oct 2006, 19:58:48 UTC - in response to Message 29646. There is on major thing witht he new credit system, It stops people like Jose having to search through the credits looking for client_state file manipulators Well yes, if the new system spoils Jose's fun that might count as a disadvantage... ;) But actually it is still possible to inflate host stats on the new system - Run several identical hosts for a while. Detach / Re-attach all but one. Wait for the detached hosts to have all their results deleted, leaving only their credits, merge with the one "master" box. Repeat every so often. Of course, it is different from client_state manipulators in two important ways - work of that amount of credit has actually been done, all that is tricky is the assigning of it to one host. And secondly, although it unfairly raises a box in the host stats, it does not have an unfair effect on user/team stats, about which most users seem more concerned. The serious point in the above is that not all of those accused of being client_state editors were doing what was suspected. No doubt some of them were. R~~ lol :-D Did you have a look at how many of the computer over at XtremLabs I went throuhg 3 pages and all where claiming near the maximum the project allows. As far as I know it's mainly the two top people there. I know one of them from Boinc@Hull and he certainly is and he's doing it because the other person is from Boinc@Australia. Team mauisun.org ID: 29653 · Rating: 0 · rate: /