Posts by Biggles

21) Message boards : Number crunching : Discussion of the new credit systen (2) (Message 25520)
Posted 29 Aug 2006 by Profile Biggles
Post:
By straight, I would have meant unoptimized. 3 am posts can be bad for word choices.

Precisely my point. An unoptimized benchmark is incapable of correctly measuring the potential of a CPU. If you are going to measure a CPU, you need to measure its entire potential to be fair, not its castrated potential. This is at the BOINC level, as it is not in their control whether a project optimizes their code. What use is taking a measurement you know isn't fully accurate?


Ah, but an unoptimised benchmark does properly measure the potential of a CPU that is using nothing but standard X86 instructions (or PowerPC for those of a non-PC persuasion). Rosetta doesn't use anything but X86 instructions, therefore the standard BOINC benchmark properly measures what a processor is capable of on Rosetta. The only way to increase the throughput of the processor is by using extended instruction sets like SSE, which is something Rosetta doesn't use. Is it really fair to reward processors for capabilities they have but which aren't used? If that was the way it worked then Prescott core Pentium 4s would deserve more credit than Newcastle core Athlon 64s, merely because the Prescott has SSE3 and the Newcastle doesn't. But I'm sure you realise the Newcastle is far, far faster than the Prescott.

Enough work can be completed to exceed the std benchmark,


Only with optimisations from extended instruction sets like SSE and MMX - which are only really used in optimised SETI science applications.

thus making it a false measure (or at best, incomplete).


But as I've explained, the standard BOINC benchmark is spot on for Rosetta and all other non-optimised BOINC projects. SETI and Einstein are the only projects where there have been science applications with extended instruction set optimisations.

On the other hand, the opt bench can describe the work done but exceeds the proc's throughput limit, also making it false from what you say. So which is "right"? Both and neither it would seem. One is incapable of correctly measuring full potential, the other can measure it, but uses a "naive" method to do so.


The standard BOINC benchmark correctly measures the full potential of a processor that isn't using optimisations that come from extended instruction sets like SSE and MMX. And that fairly covers all BOINC projects and their standard science applications. The only extraordinary cases are the optimised science applications that were found for SETI and Einstein - both of which now have new science applications anyway.

Remember, the optimised BOINC clients' reason for existing was to optimise the benchmarks to claim extra credit to account for the faster processing of work units on SETI when optimised science applications were used. Up to a point I agreed with this. That point was passed when one of those optimised BOINC clients, Crunch3r's 5.5.0, threw the laws of everything out the window.

One uses the lowest common denominator to score "fairly",


Yes, but that LCD in this case is standard X86 performance. As this is what is used by all BOINC projects, this is fair. If optimised science applications were freely available for all BOINC projects then it wouldn't be particularly fair.

the other attempts to score the true potential, but does so in a poor manner.


The other, basically just Crunch3r's 5.5.0 (it's the only one I know of that gives such high benches), attempts to get three or more times the credit. It can't really be said to be trying to measure true potential - the other clients already do that with relative increases in benchmark scores for the addition of things like SSE optimisations. 5.5.0 tried to get relatively more credit, because Crunch3r's optimised SETI apps were that much more efficient. That was fair and understandable on SETI, but not anywhere else.

At the end of the day, Trux had a more elegant solution for fair rewarding of SETI credit with optimised science applications. 5.5.0 worked only for SETI, but broke the BOINC ecosystem by throwing out the laws of maths and physics and returning benchmark results from out of this world.

If you had 2 theoretical machines, identical except for SSE2 enabled on one, and disabled on the other, the std benchmark would score them the exact same. If they ran an opt project, the SSE2 enabled machine would by far outproduce the disabled machine, yet would get the same score. With an opt benchmark, the difference would be shown, and the true potential would be scored.


Yup. This is the case, and to begin with, optimised BOINC clients stuck to this. But 5.5.0 went way beyond the curve and gave results that were physically impossible.

As a general purpose benchmark, it should encompass all possibilities. Yes, that would "inflate" the general scores, but in the computer world inflation is inevitable.

Anyway, I think I'm repeating myself or talking in circles, been typing this in spurts through many distractions and hours. Caring for an alzheimer's patient is not conducive to typing a long post in one sitting.

The major point is that one needs to back up from staring intently at the microdetails and look at the general picture. That being the std client cannot properly describe the potential power of modern procs. An opt client is the only way to measure the full potential. (not saying crunch3r's is the best, just a step, or attempt, in the right direction) If the work is possible, then A benchmark is also possible.


Well, the thing is the standard BOINC client does fully measure the potential power of modern processors doing just standard X86 stuff. It doesn't take into account things like SSE2, because the projects themselves aren't using it in general. If they were using it then that would be different because it should be taken account of in that case. An optimised BOINC client does take it into account, but is only necessary when the projects themselves are issuing science applications that use these extra features. When that happens, the BOINC benchmark needs to be overhauled to measure these extra capabilities. But they do not exist in normal usage.

Being as this is no longer relevant to rosy, I'll stop here. The last words are yours, tho, you've basically stated the same facts, with a different end opinion. That or your distaste was simply how crunch obtained the results to describe what was happening.

The only way this all seems to relate to the new credit system would be in that the work based units are scored in a manner that mirrors the original benchmark, which is known to be flawed. Many cannot stomach that.

Also, with the averaging, it would seem that even if Rosy were to get optimizations, more work would not actually be scored, but would instead be reaveraged back down to the "dumbed down" levels of the std bench.

The new system seems to have been a ton of effort to create a new measuring stick, that was wasted by shortening it back to the old one.


I agree with you that the new credit system is flawed in that it determines the credit value for structures based on the benchmark results. But this is averaged out over the whole project, so at least the playing field is level. It's still a bit inaccurate, but it does stop people from having the same unfair advantage that was present before the new credit system.

Therefore, I can only conclude that despite its flaws, it is a step in the right direction and that it is progress.
22) Message boards : Number crunching : Discussion of the new credit systen (2) (Message 25425)
Posted 29 Aug 2006 by Profile Biggles
Post:
@ Biggles
Apologies in advance for chopping up that post to quote a small bit of it, was too huge to do it all. Thank you for the in-depth explanation though, it was a bit cumbersome, but not confusing.

It seems we actually are saying many of the same things, yet have come to opposite conclusions. Efficiency was indeed what I was getting at, though you explained it far better. The other issue would seem to simply need a bit of clarification on the definition of benchmark. The confusion may be coming from what exactly is being measured. You seem to feel the benchmark measures actual cpu output in cycles, my opinion differs - will explain further later.

Precisely what I was getting at, the newer architectures can do more work in fewer cycles. No extra ops/sec are achieved, but it IS doing the work in 3,000 mil ops/sec that a "straight" unit would take 9,000 mil ops/sec to do.


This is really a separate issue. For instance we know a Core 2 Duo at 2 GHz is faster than an Athlon 64 at 2 GHz which is faster than a Pentium 4 at 2 GHz. That's architectural efficiency. And using a stock BOINC benchmark you would see a corresponding increase in the benchmark results.

Here's where we branch in definition of the benchmark, you are saying its measuring the actual throughput, in which case, what you say is true, it cannot exceed the proc limit. However, my understanding is that it is measuring the output versus a measuring stick, say what a straight fpu can produce. In that case, the output can appear to exceed the freq of the processor, however, all it is saying is that the work accomplished in your Athlon's 6,900 mil ops/sec or less would take a straight fpu 10,000+ mil ops/sec to do. (or 21,000ish mil ops/sec, since we're referencing tripled stats) Efficiency in action


I see where you are coming from, but the thing is there is no such thing as a straight FPU to be measured against. Really the BOINC benchmark is a measure of potential throughput. On the standard BOINC clients this benchmark is low because it's unoptimised and with no optimisations in the science application, we will not see higher throughput. If we then optimise a science application and use SSE2 in it, we will have increased the program efficiency. So if our program is now a whole lot more efficient, we may be getting throughput higher than the BOINC benchmark said we could, because it wasn't taking into account optimisations. So we then recompile BOINC and get the benchmark to take into account SSE2 like we used in the science applications, and then we get a corresponding increase in the BOINC benchmark, because using SSE2 got us higher efficiency and therefore more throughput.

Physically impossible as throughput, as stated above, yet not as a comparison. If the numbers were made up, I doubt the rosy staff would have allowed its use. Whether the optimizations were coded correctly, I cannot say. Having separate opts for MMX only, SSE only, or SSE2 certainly implied their purpose was simply to add those specific instructions to the calculation. The vast majority of users simply felt they were the "true" measure of their cpu's power... when properly programmed for.


Here we come back to the potential throughput point. The BOINC benchmark took into account the potential throughput of a processor running unoptimised code. The original, unoptimised SETI client didn't even achieve that level of throughput. It couldn't achieve maximum throughput because of things like branch prediction being wrong or cache misses or having to fetch data from main memory, all sorts of things that happen in the real world but things that don't happen in a benchmark. We frequently see real world performance being below that of a benchmark, because a benchmark is done under perfect conditions.

If we think again of a mystery processor that with a stock BOINC client gets an integer benchmark of 3,000 million ops/sec, we'd likely find that it never gets more than say 1,000 million ops/sec when running SETI. An optimised SETI application might double performance, might be more efficient and get us up to 2,000 million ops/sec. But these optimisations then get applied to the benchmark and inflate it to say 6,000 million ops/sec. I can still accept that, because it will be less than the physical maximum of the processor. Then we might get another optimised client that is 50% faster still. This gives us three times the performance of the original standard SETI application, because it is more efficient and because it uses optimisations. But rather than saying "hey, we've increased efficiency and are closer to the potential throughput that the BOINC benchmark gives" optimisers brought out new clients that inflated the benchmark another 50% and now we're on 9,000 million ops/sec and we've exceeded the physical maximum for the processor.

The problem boils down to this: the standard BOINC benchmark, even with no optimisations, gave a result, a potential throughput, that was higher than the science applications themselves achieved. And when optimised science applications appeared, optimised BOINC clients appeared with benchmarks that matched the increased efficiency of the science apps. Yes, it is fair to get twice the credit for twice the work, but inflating a benchmark is a naive way of doing this because it only really worked for SETI. Trux's calibrating client was far better because it got you the extra credit you deserved but without giving stupid benchmarks that screwed up credit claims everywhere else.

As a side point, if you try and run an SSE2 optimised BOINC client on a non-SSE2 processor it will crash. The different levels of optimisation are to give the best possible optimisations to each different processor type.

I can believe that Crunch3r's SETI client did produce 3x the work of the standard one, but that doesn't mean the processor is capable of producing that high a benchmark.


Sorry, but that statement doesn't make much sense. It can do that much work, but can't produce a bench to match the amount of work done?
I'll stand by the statement that if a cpu can produce the work, it can produce a matching benchmark.


It does if you think about it carefully. The BOINC benchmark is a measure of potential throughput, what it would achieve with 100% program efficiency. But if the standard SETI application only achieved 30% program efficiency, it isn't using all of that potential throughput. An optimised SETI application may take this up to 90+% efficiency, meaning we are using more of this potential. It doesn't mean the processor is suddenly capable of producing a higher benchmark.

What you seem to be describing is a measurement. Actual throughput. The wiki defines benchmark as "a point of reference for a measurement" not just a straight measurement. The number of cycles you were quoting is a straight measurement. The measurement given is how many cycles it would take a reference cpu to do the same work.


No, the BOINC benchmark is potential throughput, not actual throughput.

This is basically splitting hairs on terminology. When I say a BOINC benchmark gives 3,000 million ops/sec, there is no reference point. If there were, then the result would be in a form like 1.79 (number pulled from thin air), meaning 1.79 times faster than a reference machine, where we would have a defined reference machine that might be a 2.6 GHz P4.

In summary, we seem to agree on the vast majority of points, but where you see a benchmark measuring actual cycles, I see a reference relative to work done. But we both agree that a cpu can do more work than the physical cycles allow under normal circumstances, due to the efficiency of newer architectures (MMX, SSE, etc.) and proper coding.


I see the BOINC benchmark result as a measure of the potential of a processor. Whether this potential is used is up to the programmers of the science application. It should be a relative value, because faster processors will be measured by BOINC as having a higher potential.

No, I didn't mean that a CPU can do more work than the physical cycles allow under normal circumstances. The 6,900 million integer ops/sec figure from my first post was a mathematical, theoretical maximum for the 2.3 GHz Athlon XP that I was talking about. It cannot ever be exceeded, no matter what you do, be it using MMX, SSE, 3Dnow! or anything else.

A standard BOINC client would measure the potential throughput of that processor without optimisations to be about 2,000 million ops/sec. SETI/Rosetta/CPDN/whatever without optimisations will never exceed that figure in terms of how much work they do. If I were to then use an optimised SETI application that did then use SSE or something, I could then achieve more than 2,000 million ops/sec, but still less than the 6,900 million ops/sec. The SSE optimisations could then legitimately be applied to the BOINC benchmark (or measurement, if you feel that is a more accurate term) meaning that BOINC may measure a higher potential throughput than before, perhaps 5,000 million ops/sec. But it still cannot exceed the 6,900 million ops/sec figure because that is the mathematical maximum of the processor.

Optimisations allow us to achieve a higher potential throughput. But they do not allow us to do more work than the physical maximum of a CPU.

Neither of which are given any accounting in the standard client benchmark. Thus, it cannot measure the true "potential" of a cpu.


Actually, BOINC all along measured the potential of a CPU. Not exactly what it did, but what it could do. The standard BOINC client measures the potential throughput of a CPU without optimisations. An optimised BOINC client measures the potential throughput of a CPU with optimisations. What BOINC did not do is measure the actual throughput of a science application, because it actually can't.

This was also the fairest approach all along. What it means is that if you take any processor, and you use only a stock BOINC client and stock science application, then an hour on SETI would have claimed credit at the same rate as an hour on Rosetta, or Einstein, or Predictor...

Optimised clients screwed all this up. They said that your CPU had a higher potential throughput, which is only true when you have a matching optimised science application. Perhaps you are capable of 5,000 million ops/sec on SETI with an optimised science application. But on Rosetta with no optimised science application, you can only achieve the 2,000 million ops/sec level (because you aren't making use of SSE to achieve higher program efficiency). However, an optimised BOINC client like 5.5.0 says that your measured potential throughput is far higher than it actually is and results in the overclaim that everybody has seen.
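The overclaim described above can be put into numbers. The following is a minimal sketch, assuming the claimed credit is simply proportional to benchmark score times CPU time (the `scale` constant and the function name are illustrative, not BOINC's actual formula); the 2,000 and 5,000 million ops/sec figures are the example values from the post.

```python
def claimed_credit(benchmark_mops: float, cpu_hours: float, scale: float = 1.0) -> float:
    """Assumed model: claim is proportional to benchmark * CPU time."""
    return benchmark_mops * cpu_hours * scale


STOCK_BENCHMARK = 2000.0      # stock BOINC client, no SSE (example figure)
OPTIMISED_BENCHMARK = 5000.0  # optimised BOINC client (example figure)

# One hour on Rosetta, whose science application is not optimised:
# the work actually done is the same whichever BOINC client reports it.
stock_claim = claimed_credit(STOCK_BENCHMARK, cpu_hours=1.0)
optimised_claim = claimed_credit(OPTIMISED_BENCHMARK, cpu_hours=1.0)

overclaim_factor = optimised_claim / stock_claim
print(f"Same Rosetta work, {overclaim_factor:.1f}x the claimed credit")
```

Under this assumed model the claim only matches the work done when the science application gains the same speed-up as the benchmark, which is the SETI case and not the Rosetta one.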

My issue with 5.5.0 is that it gives a measured potential throughput of a CPU that is physically impossible. For brevity, I will resume calling measured potential throughput a "benchmark". Trux's BOINC clients, and Crunch3r's old clients gave benchmarks that were in the realms of physical possibility. Higher than would ever be achieved by science applications in practice, but not physically impossible. Just like the original BOINC benchmark - it also gave a benchmark result that was potential throughput but which was never achievable in practice by science applications, like I've tried (and probably failed) to explain.
23) Message boards : Number crunching : Upgrade complete, sort of (Message 25397)
Posted 29 Aug 2006 by Profile Biggles
Post:
Possibly stupid question, but you have made sure that your BOINC preferences are specified to allow more than one processor to be used on multiprocessor systems?

By the sounds of it the operating system sees and is using both cores, but BOINC and Rosetta are only using one core.
24) Message boards : Number crunching : Discussion of the new credit systen (2) (Message 25396)
Posted 29 Aug 2006 by Profile Biggles
Post:
Actually, Crunch3r's 5.5.0 client did return benchmarks that were higher than the physical abilities of a processor even if perfect efficiency was achieved.

This is my 2.3 GHz Athlon XP. It's a Barton core running at 11.5 * 200 MHz. Now look at my integer benchmarks, as given by Crunch3r's 5.5.0 client - "Measured integer speed 11476.56 million ops/sec". Thing is, an Athlon XP can only issue 3 integer instructions per clock cycle, meaning that my processor is only capable of 3 * 2,300,000,000 integer instructions per second. That means that with absolute perfect efficiency, which you don't get ever, my processor can issue 6,900 million ops/sec. Where did the extra 4,576.56 million ops/sec come from?!

This is my 2.14 GHz Duron. It runs Trux's 5.2.13 or something like that. Note its integer benchmark - "Measured integer speed 6364.27 million ops/sec". But that is physically possible because 3 * 2,140,000,000 = 6,420 million ops/sec.

I am sick and tired of people saying that Crunch3r's 5.5.0 client rewards them based on what they can contribute. No, you can't contribute something that is physically beyond the capabilities of your processor.
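The arithmetic behind that comparison can be checked in a few lines. This is a minimal sketch, assuming the upper bound is simply issue width times clock speed as the post describes; the function name and the dictionary layout are illustrative.

```python
def peak_integer_mops(clock_mhz: float, issue_width: int = 3) -> float:
    """Upper bound on integer ops/sec in millions: issue width * clock (MHz)."""
    return issue_width * clock_mhz


# (clock in MHz, measured integer benchmark in million ops/sec), from the post
machines = {
    "Athlon XP 2.3 GHz, Crunch3r 5.5.0": (2300.0, 11476.56),
    "Duron 2.14 GHz, Trux client": (2140.0, 6364.27),
}

for name, (clock_mhz, measured_mops) in machines.items():
    peak = peak_integer_mops(clock_mhz)
    ratio = measured_mops / peak
    verdict = "within the limit" if measured_mops <= peak else "exceeds the limit"
    print(f"{name}: peak {peak:.0f}, measured {measured_mops:.2f}, "
          f"{ratio:.0%} of peak ({verdict})")
```

The Athlon XP figure works out to roughly 166% of the issue limit, while the Duron figure stays just under 100%.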


Its been stated by some SETI people that the client actually DID produce 3x the work, and the opt bench was made to match it. If a proc can produce the work then it can produce the benchmark, and vice versa.


I can believe that Crunch3r's SETI client did produce 3x the work of the standard one, but that doesn't mean the processor is capable of producing that high a benchmark. I'll explain.

Say we have a processor that has a standard BOINC client benchmark of 3,000 million ops/sec. And that it does a work unit in 1.5 hours. Now that benchmark result is a measure of the processor's throughput, its potential.

Now imagine that with Crunch3r's optimised SETI application the work unit can now be done in 0.5 hours - that is three times faster. But that doesn't mean the processor is suddenly capable of 9,000 million ops/sec. It hasn't gained any more potential throughput. So what has happened?

Efficiency has been improved. The same computation is being done in fewer processor cycles. An example would be dividing operations. A divide operation is slow, it takes several clock cycles. A right shift however takes only one cycle and is the same as division by two. That's just one quick example. Another way to improve efficiency is increase parallelism in the computation. For instance we might start reading values from the cache, or from memory, before we've finished adding something else. That maximises throughput, but doesn't mean we have any more capacity. It's the same with the use of things like SSE or MMX - they do in one instruction what takes several to do normally.
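The shift-for-divide substitution mentioned above is easy to verify. A minimal sketch for non-negative integers (in Python both forms cost about the same; the saving the post describes applies to the machine instructions a compiler or assembly programmer would emit):

```python
def halve_by_division(n: int) -> int:
    """Divide by two using integer division."""
    return n // 2


def halve_by_shift(n: int) -> int:
    """Divide by two using a right shift by one bit."""
    return n >> 1


samples = [0, 1, 7, 1024, 12345]
results_match = all(halve_by_division(n) == halve_by_shift(n) for n in samples)
print("shift and divide agree:", results_match)
```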

Going back to our example numbers above, we find that whilst the standard SETI client may take 1.5 hours to do a work unit on a processor that has the potential to do 3,000 million ops/sec, it may only be getting 33% efficiency. By doing a whole lot of optimisations, we might get the efficiency up to 99%, and then it would only take 0.5 hours to do a work unit on a processor that has the potential to do 3,000 million ops/sec.
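The relationship between efficiency and run time in that example can be sketched as follows. It assumes a simple model in which the time for a work unit is the useful work divided by (potential throughput * efficiency); the function name and the unit of "work" are illustrative.

```python
def hours_per_work_unit(work: float, potential_mops: float, efficiency: float) -> float:
    """Assumed model: time = useful work / (potential throughput * efficiency)."""
    return work / (potential_mops * efficiency)


POTENTIAL_MOPS = 3000.0  # benchmark figure from the example; never changes

# Size of one work unit, derived from the standard client: 1.5 hours at 33%.
work = POTENTIAL_MOPS * 0.33 * 1.5

standard_hours = hours_per_work_unit(work, POTENTIAL_MOPS, efficiency=0.33)
optimised_hours = hours_per_work_unit(work, POTENTIAL_MOPS, efficiency=0.99)

print(f"standard: {standard_hours:.2f} h, optimised: {optimised_hours:.2f} h")
```

The work unit finishes three times sooner while `POTENTIAL_MOPS` stays at 3,000, which is the point of the example: the speed-up comes from efficiency, not from extra potential.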

What I am getting at is that the BOINC benchmark gives an indication of the throughput of a processor. Even if a processor is physically capable of issuing more instructions per second than the benchmark shows, this is because the benchmark cannot achieve higher throughput. An analogy would be a 200 mph Ferrari that we can only ever get to 120 mph because even though it is physically capable of more we aren't driving on a good enough road surface.

The final piece of the puzzle is why the benchmark scores are higher with optimised BOINC clients. If we consider something like a Trux client that turns 3,000 million ops/sec into 6,000 million ops/sec, this is because the use of SSE or SSE2 or MMX or whatever instructions allow us to get higher throughput on the benchmark. The only condition is that the throughput cannot be higher than the physical limit of the processor.

My last post showed that the Trux BOINC client came close in achieving maximum throughput for the processor on the BOINC benchmarks. This is supposed to reflect that my processor has things like SSE which can increase the maximum throughput of the processor on some computation. Thing is, whilst something like SSE raises our possible throughput, if the science application itself is still inefficient then we won't be any faster. But if we have somebody like Crunch3r optimising the SETI science application using things like SSE or other assembler tricks, we find the efficiency going up and getting closer to the maximum throughput of the processor.

The problem with Crunch3r's 5.5.0 client is that it returned results above the physical maximum for a processor. It didn't just take into account things like SSE and MMX to give a higher benchmark score (and thus to show that it's possible to get better throughput), rather it gave figures that show a processor was capable of 166% efficiency in terms of its absolute physical maximum ability.

The old Crunch3r BOINC clients were fine, as are the Trux optimised clients. They do take into account the increase in throughput that can be attributed to extra instructions as found in SSE etc. But 5.5.0 just seems to make numbers up. They are physically impossible.

Now I am no guru by any stretch on modern processor architecture, but I do know they are incredibly complex, and measuring a cpu's power with remedial math just isn't possible.


I don't quite count myself as an expert on it, but I think my understanding is rather beyond that of the layman. What I can say is that we can easily give an upper limit on the number of instructions a processor can issue in a given time. In my first post I pointed out the Athlon XP can issue three integer instructions per clock cycle, which is where I get the 6,900 million ops/sec figure from as a fixed upper limit. What I can't calculate is whether 100% efficiency can be reached. If our code works in such a way that we never issue more than two integer instructions per clock cycle then we'll never be using more than 4,600 million ops/sec on my processor from above.

Also, we have to remember that some instructions take longer to execute than others. I might be able to issue 6,900 million ops/sec but a MUL instruction will take several clock cycles to finish whereas a LODSB might only take one. Inside that one second I might be able to execute 6,900 million LODSB instructions, but only 1,150 million MUL instructions.
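Those two figures follow from dividing the peak issue rate by the cycles each instruction occupies. This is a rough sketch under a deliberately simple model with no overlap between instructions; the 6-cycle MUL cost is inferred from the post's own numbers (6,900 / 1,150), not taken from a datasheet.

```python
PEAK_ISSUE_MOPS = 6900.0  # 3 instructions/cycle * 2,300 MHz, from the post


def sustained_mops(cycles_per_instruction: float,
                   peak_mops: float = PEAK_ISSUE_MOPS) -> float:
    """Assumed model: throughput = peak issue rate / cycles per instruction."""
    return peak_mops / cycles_per_instruction


lodsb_mops = sustained_mops(1)  # single-cycle instruction
mul_mops = sustained_mops(6)    # 6 cycles is an inferred, illustrative value

print(f"LODSB: {lodsb_mops:.0f} million/sec, MUL: {mul_mops:.0f} million/sec")
```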

There are too many things happening in parallel or in different areas of the processor. So, you ask where those extra cycles can possibly come from? As I understand it, the instruction sets can act as "mini" calculators. They are simple calcs, not a full FPU or whatever, they only look for simple pieces that can be reduced or simplified, to take the load off the main math processor. Why waste a full cpu cycle to grind out 2+2, or other simple addition or subtraction bits. They take the simple load off, and let the main center do the big complex math. Each time you can reduce a number or simplify an equation, it frees up the main math proc to do more "real" calculations, so more work gets done. Each of these steps still gets counted as though it were an operation. So if your bench is reading more cycles than your proc has GHz, that'd be my guess as to why. Again, I'm not an expert in this by any means, so take it as you will. If I'm off in left field, I wouldn't mind hearing a better description myself.


An instruction set is the complete set of all the operations a processor can perform. This includes things like arithmetic operations, logical operations and memory addressing operations.

It's not that we use instructions to do simple things and leave the ALU or FPU to do complex calculations. It's that the complex calculations are broken down into their simple constituent parts, or instructions, and the processor executes each one of these instructions in turn. Taking an average of two numbers is several instructions. Firstly we need to load values from memory or registers. We then need to add the numbers together. Then we need to move the result to a memory location. Then we need to load that memory location. Then we need to divide that value. And then we need to move that result to another memory location.

Do you see now how we don't just have an average calculation done in a single processor cycle but rather how it's lots of instructions that all need at least one cycle to execute, some possibly more than one cycle? I hope I've been able to give you a bit of an understanding, although the likelihood is that I've confused you further.
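The averaging example can be traced on a toy machine. This is a minimal sketch: the instruction names, the two registers and the tiny interpreter are all made up for illustration and do not correspond to real X86 encodings, but the sequence follows the steps listed in the post.

```python
def run(program, registers, memory):
    """Execute a toy instruction list and return how many instructions ran."""
    executed = 0
    for op, dst, src in program:
        if op == "load":       # memory -> register
            registers[dst] = memory[src]
        elif op == "store":    # register -> memory
            memory[dst] = registers[src]
        elif op == "add":      # register += register
            registers[dst] += registers[src]
        elif op == "div":      # register //= immediate value
            registers[dst] //= src
        else:
            raise ValueError(f"unknown op: {op}")
        executed += 1
    return executed


memory = {"a": 6, "b": 10, "tmp": 0, "avg": 0}
registers = {"r0": 0, "r1": 0}

average_program = [
    ("load", "r0", "a"),     # load the first value
    ("load", "r1", "b"),     # load the second value
    ("add", "r0", "r1"),     # add them together
    ("store", "tmp", "r0"),  # move the sum to a memory location
    ("load", "r0", "tmp"),   # load that memory location again
    ("div", "r0", 2),        # divide the value by two
    ("store", "avg", "r0"),  # move the result to another memory location
]

instruction_count = run(average_program, registers, memory)
print(f"average = {memory['avg']}, instructions executed = {instruction_count}")
```

Even on this toy machine, one "average" costs seven instructions, each needing at least one cycle.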

All this brings me back to my original conclusion - that the 5.5.0 client returned benchmark results that are actually beyond what a processor is physically capable of producing even at perfect, 100% efficiency.
25) Message boards : Number crunching : How to fake out the new credit system (Message 25145)
Posted 27 Aug 2006 by Profile Biggles
Post:
I understand how your exploit works feet1st, but there's a way to limit its effectiveness. BOINC allows for a maximum daily work unit quota, which can be reduced for every failed work unit. If that is enforced that then limits the amount of work units a person can try cheating on.

Not perfect by a long shot, but it does limit the amount of times it can be done. And of course if a machine gets to the top host spot and has failed every work unit, it will stand out.
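How a shrinking quota caps the exploit can be sketched like this. The policy below (one unit off the quota per failed work unit, with a floor of one) is a hypothetical illustration; the names and the exact penalty are assumptions, not BOINC's actual server rules.

```python
def remaining_daily_quota(max_daily_quota: int, failed_results: int,
                          penalty_per_failure: int = 1, floor: int = 1) -> int:
    """Hypothetical policy: each failed work unit lowers the quota, down to a floor."""
    return max(floor, max_daily_quota - penalty_per_failure * failed_results)


quota_after_30_failures = remaining_daily_quota(100, failed_results=30)
quota_after_500_failures = remaining_daily_quota(100, failed_results=500)

print(quota_after_30_failures, quota_after_500_failures)
```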
26) Message boards : Number crunching : Discussion of the new credit system (Message 25142)
Posted 27 Aug 2006 by Profile Biggles
Post:
In no way I wanted to imply that the opt clients delivered falsified results. I stated so on many occasions and will do so in future. You explained very nicely how those impressive benchmarks can be achieved by utilizing all the power modern processors offer. However as you said, Rosetta can't use all those features yet so in fact the benchmark offers more of a potential speed which could be gained from such a computer if the app could utilize all the features. Whether credit should be based on the real work done or the potential a host offers is another question and I'm undecided about that. However I was and am against a system in which every host just gets what he claims. That led to laughable claims of poor hosts and as I remember the most absurd were dealt with but every day new absurd appeared and the more subtle cheats were never caught. It was just a hole in the credit system, which required constant action from the project staff and prevented them from the important tasks.


Actually, Crunch3r's 5.5.0 client did return benchmarks that were higher than the physical abilities of a processor even if perfect efficiency was achieved.

This is my 2.3 GHz Athlon XP. It's a Barton core running at 11.5 * 200 MHz. Now look at my integer benchmarks, as given by Crunch3r's 5.5.0 client - "Measured integer speed 11476.56 million ops/sec". Thing is, an Athlon XP can only issue 3 integer instructions per clock cycle, meaning that my processor is only capable of 3 * 2,300,000,000 integer instructions per second. That means that with absolute perfect efficiency, which you don't get ever, my processor can issue 6,900 million ops/sec. Where did the extra 4,576.56 million ops/sec come from?!

This is my 2.14 GHz Duron. It runs Trux's 5.2.13 or something like that. Note its integer benchmark - "Measured integer speed 6364.27 million ops/sec". But that is physically possible because 3 * 2,140,000,000 = 6,420 million ops/sec.

I am sick and tired of people saying that Crunch3r's 5.5.0 client rewards them based on what they can contribute. No, you can't contribute something that is physically beyond the capabilities of your processor.
27) Message boards : Number crunching : Cross Project Credit Equality (Message 23405)
Posted 19 Aug 2006 by Profile Biggles
Post:
Doesn't that effectively remove RALPH as the beta (alpha?) test site for Rosetta? It sounds like you're talking about one type of Ralph client application to determine the credit/decoy for a given WU type, and then using a different application on Rosetta to actually crunch the majority of the work.


What I mean is the Ralph version of Rosetta should have a timer and a benchmark. Benchmark within the science app itself to give a fair, non-optimised benchmark score. A timer also to prevent the modifying of the reported work unit run time.

For instance, if I was running Ralph, and my goal was to skew the averages of work units so that more credit was received for them, I need to use a bigger BOINC benchmark by using an optimised client, or if the benchmark is accurate, by making the work unit appear longer than it actually is. Hence the need for the actual science app - on Ralph if they use it to generate the average credits for a series of units, or Rosetta if they decide to make it more robust and a better average - to have a benchmark and a timer in it.

John MacLeod VII wrote:
Build a timer into Rosetta? That sounds a great deal like FLOPS counting. Not a bad idea, but substantially more difficult to program than using fixed credits per task, which works if tasks within a series have the same difficulty.


I don't see the difficulty in adding a few system calls like GetProcessTimes() or GetThreadTimes(). It doesn't need to be in the main body of the science code itself, rather after it. That way we have a measure of how much CPU time was actually used, as opposed to just what a BOINC client reports. This nullifies one of the ways of possibly cheating.
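As a rough illustration of the idea, here is a minimal Python sketch of measuring CPU time around the science work itself. GetProcessTimes() and GetThreadTimes() are Win32 calls; this sketch substitutes Python's time.process_time() as a portable stand-in, and the function names and the workload are hypothetical, not anything from Rosetta.

```python
import time

def run_with_cpu_timer(work, *args):
    """Run work(*args) and return (result, cpu_seconds).

    time.process_time() counts user + system CPU time of this process,
    so time spent sleeping or waiting is not included.
    """
    start = time.process_time()
    result = work(*args)
    cpu_seconds = time.process_time() - start
    return result, cpu_seconds

def fake_science_task(n):
    """Hypothetical stand-in for the real science computation."""
    total = 0
    for i in range(n):
        total += i * i
    return total

result, cpu_seconds = run_with_cpu_timer(fake_science_task, 200_000)
print(f"CPU time used: {cpu_seconds:.4f} s")
```

The point of the design is that the measurement is taken by the application after its own work finishes, so it does not depend on the run time the BOINC client reports.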
28) Message boards : Number crunching : Removing credits backdated to february. (Message 23349)
Posted 19 Aug 2006 by Profile Biggles
Post:
OK, so BOINC originally had the pipe dream that credits would be project independent. It is obvious that did not happen.
Look at projects where a quorum system is used. You cannot tell me that is fair when 3 WUs are returned and the median WU credit score is the one that all 3 computers get awarded. Let's look at QMC, if say a Conroe clocked to 4GHz returns said WU and claims 150 credits, but the other 2 WUs are returned by a P4 clocked at 1.6GHz claiming 50 credits and another by an AXP clocked at 2.0GHz claiming 75 credits... guess what, all three get 75 credits. That Conroe just got the shaft and the credits awarded cannot possibly be compared to a WU from TANPAKU (3 WUs sent out, the first one back is the score for all three). Nor can the credits/hr be compared between the two projects because one uses a quorum of 1 and the other a quorum of 3. Your argument that BOINC was originally designed to be cross-project equal shows that BOINC is a failure just for this instance.


The 4 GHz Conroe should be claiming the same amount of credit. In its simplest form Credit is benchmark score * CPU time. The Conroe should have a higher benchmark, but lower time. This means it would claim the same credit for that work unit, but of course being the faster processor it would be able to do more work units per hour. Being a faster processor does not entitle it to get more credit for the same work - which is basically what you proposed. A standard benchmark, found in something like... the standard BOINC client, should mean that all computers claim nearly the same for the same work unit, irrespective of time taken, or platform or CPU type.
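To make the arithmetic concrete, here is a small Python sketch using the simplified formula from the paragraph above (credit = benchmark score * CPU time). The benchmark scores and run times are invented for illustration, and the real BOINC formula includes scaling constants that are left out here.

```python
from statistics import median

def claimed_credit(benchmark_score, cpu_hours):
    """Simplified claim: benchmark score multiplied by CPU time."""
    return benchmark_score * cpu_hours

# Hypothetical hosts running the same work unit. The fast host benchmarks
# 3x higher but finishes in a third of the time.
slow = claimed_credit(benchmark_score=25.0, cpu_hours=3.0)
fast = claimed_credit(benchmark_score=75.0, cpu_hours=1.0)
print(slow, fast)  # both hosts claim the same credit for the same work

# Credit per hour is where the faster processor comes out ahead.
print(slow / 3.0, fast / 1.0)

# With a quorum of 3, the median claim is granted to everyone, so one
# inflated claim (150 here) does not change what is granted.
granted = median([150, 50, 75])
print(granted)
```

With an honest benchmark the quorum barely matters, since all three claims land close together; the median only starts doing work when one of the benchmarks is off.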

Let's make this clear, BOINC may have been devised to provide a common platform for multiple DC projects capable of producing scores that were comparable to one another; however, BOINC is merely the front-end that can be used by DC developers to deploy their DC project to a large installed base which can easily migrate to the newest DC project as easily as typing in a URL... that is all that BOINC is now, my friend, nothing more, nothing less.
It is now up to the developers of the DC projects to get BOINC to work as the developers wish, and I think Baker Labs did a heck of good job.


The reason BOINC is failing in regard to that original idea is because optimised clients are screwing up the amount of credit different systems claim.
29) Message boards : Number crunching : Cross Project Credit Equality (Message 23346)
Posted 19 Aug 2006 by Profile Biggles
Post:
The attempt should be made to keep parity across projects. I admit it will be impossible for it to be perfect, but it should be possible to get fairly close. The place to start would be a clear statement banning the optimized BOINC clients from participating in RALPH. If this is done and enforced, then the new method should come fairly close.


I do agree with you, but with BOINC being open source there is nothing to prevent someone from putting subtle changes into the source. A couple of percent here and there isn't really noticeable to begin with, but it adds up.

Really, in using Ralph to calculate averages for the granted work credit, the benchmark and some form of timer need to be built into Rosetta itself so that it can't be easily modified. Even then it would still be possible to hack the binaries, but it would at least be something that was challenging instead of something that any kid with a bit of programming knowledge could change.
30) Message boards : Number crunching : Removing credits backdated to february. (Message 23342)
Posted 19 Aug 2006 by Profile Biggles
Post:
I don't think this will be an issue with the new system.

Work credit takes the boinc-derived credit claims from many computers and determines an average. This average is then applied to each computer that ran that WU.

Since it's a set work for credit constant, the cpu type is irrelevant. If one computes faster than the other, it gets more credits since it's able to process more work in the same time.

Let me know if I read your post wrong.


No, you read it right. However, as of yet, you can only see those granted work credit scores in the results page for a computer. Will it be extended so that we can compare entire teams using the granted work credit scores?

Angus also makes the point that it won't be completely representative. The only way I can think of to make it completely representative is to use the Ralph scores to set an approximate value for a work unit and have that pending until all work units from a series are complete then take the average from them all.

Upside to that would be the perfect accuracy in getting an average. Downside would be the length of time credit would be pending and the increased server load in actually working out the average.
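A minimal Python sketch of that "pending until the series is complete" idea follows. The function name and the sample claims are hypothetical, and this is not a description of how the Rosetta server actually works.

```python
from statistics import mean

def grant_series_credit(claims, expected_count):
    """Return the per-work-unit credit for a series, or None while pending.

    claims: credit claims received so far for work units in the series.
    expected_count: total number of work units issued in the series.
    """
    if len(claims) < expected_count:
        return None  # still pending: not every result is back yet
    return mean(claims)

print(grant_series_credit([10, 12], expected_count=3))      # pending
print(grant_series_credit([10, 12, 14], expected_count=3))  # average granted
```

The trade-off mentioned above shows up directly: nothing is granted until the last result arrives, and the server has to hold every claim for the series until then.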
31) Message boards : Number crunching : Cross Project Credit Equality (Message 23331)
Posted 19 Aug 2006 by Profile Biggles
Post:
Why is this so darned important? There is NO more valid reason to compare results on DC projects under BOINC than there is to say distributed.net needs to have parity with SoB, or D2OL, or DPAD, or F@H, or any other of the myriad of DC projects. Each needs to stand on its own, and recruit people and CPUs based on the merits of the project and how it's run.

BOINC is simply a piece of software that is only one part of what allows a DC project to do work. It has no relevance to credit parity across projects. This is a myth that was started by David Anderson - something out of a bad dream.


It was a feature of BOINC that was always part of its original design. David Anderson was the guy behind BOINC, that makes it a lot more than a myth.

Also, remember that the combined RAC is supposed to give an indication of the total throughput of the project in terms of TeraFLOPS. That might not matter to the individual users, but it does matter to the admins when the amount of computing power they have at their disposal affects their success in getting research grants or other resources.

Lastly on the topic of cross BOINC crediting, proper parity gives rise to healthy competition. And we all know how powerful a motivator competition can be.

I will concede that cross-project parity is of relatively little importance however. I'd certainly like to see it fixed, but the whole credit system needs fixing here first.

The thing is, even if we isolate Rosetta from BOINC and don't worry about parity with other projects, the credit system still doesn't work even for just this project. The reason for that is that Athlon 64s on Windows (general statement, mostly true) are getting 3 times the points of any other system for the same amount of actual work. I already explained elsewhere that clock for clock a Barton core Athlon XP will do approximately the same amount of work as a Newcastle/Winchester/Venice core Athlon 64, but that it gets far less points.

And that is killing healthy competition. It removes a lot of incentive for many people to crunch for this project if they can't get stats that measure up to the work they are actually doing. There's a helluva lot of P4s out there, and even with optimised clients they get far less credit than an Athlon 64 would, despite doing more work than the stats would indicate. That'll drive people away. The project needs to sort out the credit issue before people leave en masse.
32) Message boards : Number crunching : Cross Project Credit Equality (Message 23326)
Posted 19 Aug 2006 by Profile Biggles
Post:
Jose, the sole point of mentioning Leiden Classical was to point out that it is a project that A) has a quorum and B) has never had optimised science apps. With those conditions met, I believe XS wouldn't have as high an RAC on that project as they do on Rosetta even if all members were able to join and get work. It was one project mentioned amongst several.

I'll put my main point again in another way - on BOINC, one hour of CPU time on one project should give around the same credit as that same machine spending an hour on another project. 1 SETI hour ~= 1 Einstein hour ~= 1 Climate Prediction hour ~= 1 Predictor hour ~= 1 Leiden Classical hour ~= 1 Rectilinear Crossing Numbers hour and so on.

Imagine there is a machine that gets around 10 credits per hour on SETI. It should get approximately the same on Einstein. But it isn't perfect, so maybe it gets 11 credits per hour on Einstein. And maybe it only gets 9 credits per hour on Malaria Control. But that same machine on Rosetta running an optimised client would be getting perhaps 30 credits per hour. That's not fair to those who use standard clients, or to those who have processors that don't produce such staggeringly outrageous benchmark scores. Nor is it fair to those who try and compete over BOINC as a whole.

I want to see that past unfairness corrected. Not only that, but I expect to use a new stats system that awards credit based on work done, not on what is claimed by dodgy benchmarks. Yet if Rosetta keeps on exporting stats based on the current claimed and granted credit then it keeps on being inaccurate.

EDIT - Another thing Jose, the DC Vault is managed by Team Ninja. They dictate what goes in it. The DC Vault also covers a lot more than just BOINC. It has no bearing on BOINC and is not what is the matter of discussion.
33) Message boards : Number crunching : Removing credits backdated to february. (Message 23294)
Posted 19 Aug 2006 by Profile Biggles
Post:
How to say it politely:

Again it is the freakin BOINC uniformity mantra thing!!!!

Who cares about BOINC?

IT is the open source foolishness in BOINC that has opened the whole can of worms to start with. THEY, their open source, their ridiculous way of benchmarking, their allowing for easy ways of interfering with the benchmarks is the root source of all problems.

That and the freaking idea of the Boinc Purists that all projects must be run as if produced by a cookie cutter combined with their propensity of using the fighting word cheat.


Lots of people care about BOINC, especially its developers. The whole purpose of BOINC was an easy-to-use, cross platform distributed computing system. It was intended to ensure fairness across all projects. If the Baker Lab don't want to fit in with BOINC then they should find their own backend. Otherwise, they should try and run it properly. I accept it's difficult which is why I won't jump on the staff and blame them, rather try and offer suggestions.


Had the issue started here, I would have been in a more forgiving mood.
But this is an old problem and has been lived through in other projects and drat, what a coincidence: the fights are always started by the same people, arguing the BOINC Purist credo and spouting the same "the others are cheaters" argument. The last project was SETI. That can be verified.

Do not compare member numbers, compare computers... one of our members runs more than 150... all on Rosetta 24/7


I know member numbers aren't everything, but they give an indication that something is amiss. Of course, you say it's 150, but since few XS members show their computers, that's a bit difficult to verify.


As to LHC...bad choice of project.... That Freaking project is out of work.
For your information our team there has only 2 active members of a total of 3. Why not get the administrator there to make new work available and maybe we can show there what a team of dedicated crunchers can do?


It was a great choice of project. Reason for that is it uses a quorum and has never had an optimised science application released for it. That means that benchmarks and credit claims are not skewed and nobody needs to run optimised BOINC clients. It was nothing about current RAC, it was about ability to ever have such an RAC on a project that didn't simply grant the claimed credit.

SIMAP, Leiden Classical, Malaria Control... those are another few that would be equally good choices. SETI and Einstein are not due to the optimised clients and those projects now returning more credit per hour than previously. Climate Prediction and its offshoots also wouldn't be good choices because they use a fixed credit system.

If XS can replicate their RAC on a project with a quorum I'll eat my words. But they can't.

See, in your obsession with optimized clients you forget the effect of overclocking and some Conroes, Kentsfields and even Power Macs that were running Rosetta 24/7. Rosetta @ Home was not a part-time hobby for XtremeSystems as it is for those BOINC Purists that bash us. For our Rosetta Team it was a full time job. A job we were glad to do.


I don't think you're fully understanding the situation then. An overclocked computer will run work units faster and therefore does more work and does deserve more credit. Overclocked computers have higher BOINC benchmarks. They claim more credit and are granted more credit, because they still claim credit in line with the work they have done. Only optimised clients claim more credit than they have done. And anyone running 5.5.0, and that includes myself, is claiming a lot more credit than they really deserve.

It was that dedication that made many whiners dislike us. So they went to the old tricks and used the cheat word.


I haven't called XS cheats. I've been very clear about that. Using 5.5.0 isn't cheating because the admins never forbade its use. They also never put anything in place to prevent extreme credit claims. XS is doing a lot more work than any other team, they are getting around three times the credit of the Dutch Power Cows because they are doing around three times the work of the Dutch Power Cows. What I'm not happy with is the fact that XS is getting more credit than SETI.Germany are across all BOINC projects when SETI.Germany is actually doing a lot more real work. Rosetta's lack of prevention of overclaiming is causing this. I don't call it cheating, I call it unfair but the fault of the admins.

Backtrack all you want. That doesn't change the fact you are one of those that used the concept cheater very loosely. A look at your Rosetta RAC says it all.


My RAC is low because I refuse to fully support a project that I feel has a lot of changes to go through. My lack of support means I also won't be writing any front page articles on Ars Technica any time soon. A look at my RAC shows that I get a granted credit that is equal to, or very close to my granted work credit. And not, like some, a granted credit that is three times my granted work credit.
34) Message boards : Number crunching : New Crediting system: questions (Message 23146)
Posted 18 Aug 2006 by Profile Biggles
Post:
both, but the granted credit is what gets reported to the stats sites.


I consider this a problem. The credit granting on Rosetta is way out of line with BOINC as a whole. A machine will earn far more credit per hour/day/week on Rosetta than on any other project if it uses an optimised client.

It's not technically cheating, but it's certainly not fair either.
35) Message boards : Number crunching : Removing credits backdated to february. (Message 23138)
Posted 18 Aug 2006 by Profile Biggles
Post:
Biggles:
Couple questions:
On what math do you base your statement that " There's no way they are truly contributing over 5 TeraFLOPS per day."
Please show me the numbers you used to make that claim. I know, you can't. There is no information to back that statement at all, just your "feeling" that we couldn't be turning out that much. Now, are we? I don't honestly know. My best guess is that on the average day we have somewhere between 2000-3000 high end computers running 24/7 on Rosetta.
We also would have liked to have had a situation where the credits claimed were based on work accomplished and not on that miserable benchmark within BOINC, but that's what we were stuck with.
That is the way it was done and that's how it was done.
You say unrealistic, we say that the optimised files allowed a level playing field between the different types of CPUs that are used.
Are the points granted higher than the stock client? Yes they are BUT the RELATIONSHIP between the points granted to the different types of systems WITH the optimised files being used became FAIR for the first time. That and only that is why we used them.
I didn't mean this as a flame to you. I can truly understand how you feel. I just felt compelled to make a few points in response.
Thanks for your time,
Movieman


I'll backtrack a bit and clarify.

When I said I don't believe XS are truly doing 5 TeraFLOPS per day what I really meant and what I should have said is that I don't believe XS are truly doing 500,000+ credits worth of work per day. Now 500,000 credits per day = 5 TeraFLOPS per day.

Reason I'm backtracking is that BOINC's idea of a TeraFLOP is rather far removed from reality. At perfect efficiency, 625 2GHz Athlons would be doing 5 TeraFLOPS. I believe XS are fielding more power than that, no matter what its form. Thus I retract the TeraFLOP statement in proper terms.
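The two conversions used above can be written out as a short Python sketch. Both constants are inferred from the figures in the post rather than taken from any official source: 100 credits per day per GFLOPS follows from "500,000 credits per day = 5 TeraFLOPS", and 4 floating point operations per clock is the value that makes 625 Athlons at 2 GHz come out at 5 TeraFLOPS.

```python
# Both constants below are assumptions inferred from the post's numbers.
CREDITS_PER_GFLOPS_DAY = 100   # implied by 500,000 credits/day = 5 TFLOPS
FLOPS_PER_CYCLE = 4            # implied by 625 x 2 GHz Athlons = 5 TFLOPS

def credits_to_tflops(credits_per_day):
    """Convert a daily credit rate into BOINC's notion of TeraFLOPS."""
    return credits_per_day / CREDITS_PER_GFLOPS_DAY / 1000

def peak_tflops(cpu_count, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak of a fleet of identical CPUs at perfect efficiency."""
    return cpu_count * clock_ghz * flops_per_cycle / 1000

print(credits_to_tflops(500_000))  # BOINC-style figure for XS's RAC
print(peak_tflops(625, 2))         # fleet needed to reach it at peak
```

Real code never sustains the theoretical peak, which is why the credit-derived figure and the actual hardware needed to earn it can sit so far apart.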

With regards to not believing that XS are truly producing a sustained 500,000 credits per day, a glance at a selection of random computers in the XS team shows that the granted work credit (the new value which should be more accurate) and the granted credit values differ wildly. I've seen between 25 and 300% of the granted work credit being claimed. Does that make sense? I'm trying to remain easy to follow.

What that means is that I reckon somewhere between 1/3 and 1/2 of all XS's claimed credit is overclaim. Given the amount of power you truly have, I expect a true RAC, based on work done and not on the claimed credit given by optimised clients, to be somewhere in the region of 250,000. Simply because of the huge over-inflation given by optimised clients.

In different terms, if all of XS was to move to another project, say LHC for example, then the RAC of XS would drop dramatically.

Consider XS having a BOINC overall RAC of ~530,000 versus SETI.Germany having an RAC of ~500,000. SETI.Germany have 2037 users with credit this week, according to BOINCstats.com. XS in contrast have 178 users with credit this week, also according to BOINCstats.com. The numbers on Rosetta just do not stack up. In fairness, I'm not singling XS out here. I believe all the stats of anyone using optimised clients are wrong. I accept XS are doing three times as much work as the Dutch Power Cows etc. Proportionally on Rosetta, your stats are correct. Across BOINC, using credits, they're way out.
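Dividing the team figures quoted above by their active user counts shows the size of the gap. This Python sketch uses only the approximate numbers from the post; the dictionary layout and variable names are just for illustration. It does not account for users running many machines each.

```python
# Approximate figures quoted in the post (RAC and users with credit this week).
teams = {
    "XS": {"rac": 530_000, "active_users": 178},
    "SETI.Germany": {"rac": 500_000, "active_users": 2037},
}

per_user = {
    name: stats["rac"] / stats["active_users"]
    for name, stats in teams.items()
}

for name, value in per_user.items():
    print(f"{name}: about {value:.0f} credits per day per active user")

ratio = per_user["XS"] / per_user["SETI.Germany"]
print(f"XS users average roughly {ratio:.1f}x the RAC of SETI.Germany users")
```

A per-user ratio this large is only an indicator, not proof of overclaiming, since a single user can run a large farm of machines.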

You say unrealistic, we say that the optimised files allowed a level playing field between the different types of CPUs that are used.
Are the points granted higher than the stock client? Yes they are BUT the RELATIONSHIP between the points granted to the different types of systems WITH the optimised files being used became FAIR for the first time. That and only that is why we used them.


You're going to need to explain this to me a bit better. I understand that Linux systems need an optimised client to be credited fairly in comparison to Windows systems with the same hardware. I understand also that P4s generally need an optimised client to be credited fairly for the work they do in comparison to an AMD based system.

So why are dual Opterons running Windows running optimised clients? What do they need the playing field levelled for? All those systems are just getting way more credit than they deserve.

My ultimate reason for wanting to see credit changes backdated is that Rosetta is tearing apart the BOINC credit model. It wasn't perfect before, not by a long shot, but it was vaguely equivalent across projects. Rosetta awards far far more credit for the work done than any other project. I object to that.
36) Message boards : Number crunching : Removing credits backdated to february. (Message 22828)
Posted 17 Aug 2006 by Profile Biggles
Post:
I think backdating any changes to February would be an excellent idea.

You should get credit based on the work you do. You do twice as much work as me, you get twice as much credit. That's fair. That's what we should want.

Consider this. Rosetta uses FPU power. Not SSE2, not MMX, or 3DNow. Just straight up FPU power. Now interestingly, the Athlon XP and the Athlon 64, clock for clock, have the same FPU speed. The FPU is almost identical across both processors. Therefore, a 2.2 GHz, 512 KB L2 cache Barton core Athlon XP 3200 will produce for Rosetta almost exactly the same amount of work in any given time as a 2.2 GHz, 512 KB L2 cache Newcastle/Winchester/Venice core Athlon 64 3500. But because of optimised clients and no quorum, they won't get the same credit.

I'm not saying it's cheating, because the admins never said that optimised clients shouldn't be used. But what it has done is grossly overinflate the credit that some people and some teams are getting. XS for instance have an RAC according to Rosetta's current system of over 500,000. There's no way they are truly contributing over 5 TeraFLOPS per day. I'm not condemning them either, I use 5.5.0 as well. But I would like a system where I don't have to overclaim to remain competitive, and I would like to see people fairly rewarded for what they do. And I do want to see it backdated to eradicate nearly a year's worth of unfair credit granting.
37) Message boards : Cafe Rosetta : Moderator contact thread archive (Message 13044)
Posted 4 Apr 2006 by Profile Biggles
Post:
Much appreciated. The team renaming has already been done and we've brought TVR inline with the rest of our BOINC teams. Only SETI and Predictor to fix now...

Cheers
Biggles
38) Message boards : Cafe Rosetta : Moderator contact thread archive (Message 12809)
Posted 30 Mar 2006 by Profile Biggles
Post:
That'd be great, thanks for your help.
39) Message boards : Cafe Rosetta : Moderator contact thread archive (Message 12781)
Posted 29 Mar 2006 by Profile Biggles
Post:
Thanks. So far I've not heard from EG, nor has he appeared on the Ars Technica forum. I haven't been on our IRC channel much, but there are some there who know of our BOINC team renaming and would have brought it to his attention. As far as I know EG hasn't contacted any other member of the team either.

So we are still effectively captainless.
40) Message boards : Cafe Rosetta : Moderator contact thread archive (Message 12690)
Posted 25 Mar 2006 by Profile Biggles
Post:
Didn't want to start a new thread unnecessarily.

I'm a member of Ars Technica Team Vino Rose. Our captain has been inactive for over four months and is unreachable. Could an admin appoint a new captain, someone who is active? After much discussion on the Ars Technica Open Forum, we've decided to do a spot of team renaming. This has been fine and straightforward where the captains are regular and active members of Ars Technica, be it on the forums or the IRC channel. However, EG, the captain of our Rosetta team, isn't.

If you could help us out with this it'd be greatly appreciated.

Thanks
Biggles





©2022 University of Washington
https://www.bakerlab.org