Discussion of the new credit system (2)

[B^S] thierry@home
Joined: 17 Sep 05
Posts: 182
Credit: 281,902
RAC: 0
Message 25331 - Posted: 28 Aug 2006, 20:59:09 UTC

Ahhhhhh good ;-). Thanks.
ID: 25331
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 2,906
Message 25343 - Posted: 28 Aug 2006, 22:27:40 UTC - in response to Message 25329.  

I've had a Pentium D for a few weeks. Is it the same problem? The Task Manager shows both CPUs at 50%. But I'd say: one chip, two cores, 50% each... seems OK. But is it OK?
Thanks


Do you have your settings so that BOINC can access only one CPU? Maybe that is why it is not maxed out?
Reno, NV
Team: SETI.USA
ID: 25343
Ananas

Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 25351 - Posted: 28 Aug 2006, 23:20:25 UTC

If two Rosetta tasks are each at 50%, that makes a total of 100% for the PC - everything is fine there, and the total load on the graph window will indeed show that 100%.

The logic is: on a dual-CPU PC, one CPU is only half the machine, and thus a single-threaded app cannot show more than 50%.
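
(A minimal sketch of that arithmetic, added for illustration - 100 divided by the number of cores is the ceiling for one single-threaded task:)

#include <stdio.h>

/* Max whole-machine CPU% that one single-threaded task can show. */
int main(void) {
    for (int ncores = 1; ncores <= 4; ncores++)
        printf("%d core(s): one task tops out at %.0f%%\n", ncores, 100.0 / ncores);
    return 0;
}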
ID: 25351
Ananas

Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 25352 - Posted: 28 Aug 2006, 23:26:12 UTC

In order to allow modem users to continue participating ....



This would be something to improve too: put this above all HTML and other PHP statements, even above the doctype, on any page that might grow a bit large:

<?php
// Buffer all page output and gzip-compress it for clients that accept gzip.
ob_start("ob_gzhandler");
?>
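
(One note on the snippet, for context: ob_gzhandler only compresses for clients that advertise gzip support, and the same effect can be enabled site-wide through the zlib.output_compression directive in php.ini - an alternative to editing each page.)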
ID: 25352
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 25354 - Posted: 29 Aug 2006, 0:13:34 UTC

Do you have your settings so that BOINC can access only one CPU? Maybe that is why it is not maxed out?

I made the mistake of running two copies of a DC app on my dual core system at work for a few months without testing to make sure that Windows was set up to deal with dual cores. Under Windows 2000 and XP, a single copy of a DC app (FaD/Rosetta, etc.) on a single-CPU system eats up 100% of the CPU (or the max available). And under Win2k or WinXP, a single copy of a DC app on a dual core system eats up 50% CPU according to Task Manager. If you go over to the "Performance" tab, and have chosen "one graph per CPU", then it'll show 100% of CPU 1 being used, and 100% of CPU 2 being used when they're both maxed out.


ID: 25354
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 25370 - Posted: 29 Aug 2006, 1:50:14 UTC - in response to Message 25354.  

...And under Win2k or WinXP, a single copy of a DC app on a dual core system eats up 50% CPU according to Task Manager. If you go over to the "Performance" tab, and have chosen "one graph per CPU", then it'll show 100% of CPU 1 being used, and 100% of CPU 2 being used when they're both maxed out.


Task Manager, on Windows, on a dual core machine, will show two Rosetta tasks, each getting roughly 50% of the system... for a total of 100% of the system. Then if you graph each CPU separately, each is 100% busy.

If it only shows ONE Rosetta task using 50%, then there is a General Preference where you can tell BOINC how many CPUs to use, and it's probably only set to one. And in that case, your graphs would show one maxed out at 100% and the other pretty idle.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 25370
FloridaBear

Joined: 25 Aug 06
Posts: 2
Credit: 99,361
RAC: 0
Message 25371 - Posted: 29 Aug 2006, 2:00:38 UTC

I hope I'm in the right thread, please excuse me if not.

Just by observation, on my Core 2 Duo (brand new), I'm getting less than 1/3 the credit per hour of CPU time that I receive from SETI. I am running an optimized SETI client, so that explains part of it, but there still seems to be some disparity. I'm not complaining, as I have decided to join Rosetta for the science, but it doesn't seem quite on par with SETI in terms of credit.

SETI: 67 credits for 7,400 seconds' CPU time = 33 credits per hour per core
Rosetta: ~30 credits for 10,700 seconds' CPU time = 10 credits per hour per core

Basically, it seems that the new system of credit might be a bit too "coarse" from my understanding of how it works. I'm going to cut back my CPU time to a target of 2 hours and see if that makes any difference. Perhaps since I have an, umm, above average CPU, I should shoot for shorter time targets. I will post back when I have a few 2 hour WUs returned.
ID: 25371
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 25374 - Posted: 29 Aug 2006, 2:09:21 UTC

Paul, welcome to Rosetta! Don't worry about changing your WU runtime preference. I mean, set it to something that works for the amount of time your machine is on. The higher you set it, the less download bandwidth you'll use getting more work. But the new credit system gives you credit for the amount of work you get done. And if you crunch Rosetta for 8 hrs a day, then regardless of whether that is 2 hrs each on 4 completed WUs, or 8 hrs on one, your credit should be virtually identical. There is some variance between WUs, and so as people try to prove what the optimal WU runtime is, they see change, but it's all going to average out over time. They are sort of using a microscope to study a forest. They're only observing a tiny sample and then trying to extrapolate from that. But, really, just set the WU runtime to whatever works for your situation.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 25374
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 2,906
Message 25381 - Posted: 29 Aug 2006, 2:45:06 UTC - in response to Message 25374.  

And if you crunch Rosetta for 8 hrs a day, then regardless of whether that is 2 hrs each on 4 completed WUs, or 8 hrs on one, your credit should be virtually identical.


If he sets his work units to the smallest setting, and keeps his queue as small as possible, he will have a better chance of hitting the "early return lotto".


Reno, NV
Team: SETI.USA
ID: 25381
Ethan
Volunteer moderator

Joined: 22 Aug 05
Posts: 286
Credit: 9,304,700
RAC: 0
Message 25385 - Posted: 29 Aug 2006, 2:53:42 UTC - in response to Message 25381.  

hitting the "early return lotto".


Does anyone have an example of this? While the credit system is a running average, it would take a lot of luck to be one of the first two or three (those making a 'quorum') to return results. It would be even rarer for someone to get two of these in their results.

I'm not sure of the actual number of WUs per run, but it's in the tens of thousands. You do the math: how hard would it be to get a new WU right when you requested work, and then return it before two or three others had a chance?
ID: 25385
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 25386 - Posted: 29 Aug 2006, 2:56:57 UTC
Last modified: 29 Aug 2006, 3:00:20 UTC

Paul:
You've got 75,189.03 seconds, 219.23 old credit, 287.22 new credit: 13.75 new credits/hour.
Test it again after a week, so you have a larger number of results to average.
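
(A quick check of that rate, using only the figures quoted above:)

#include <stdio.h>

int main(void) {
    double seconds    = 75189.03; /* total CPU time from the post */
    double new_credit = 287.22;   /* new-style credit from the post */

    /* credits per hour = credit / (seconds / 3600) */
    printf("%.2f new credits/hour\n", new_credit / (seconds / 3600.0)); /* ~13.75 */
    return 0;
}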

Take a look at some of the graphs of granted credit and you'll see why we now need a fair sized group of WUs to average.
ID: 25386
Biggles
Joined: 22 Sep 05
Posts: 49
Credit: 102,114
RAC: 0
Message 25396 - Posted: 29 Aug 2006, 3:40:28 UTC

Actually, Crunch3r's 5.5.0 client did return benchmarks that were higher than the physical abilities of a processor even if perfect efficiency was achieved.

This is my 2.3 GHz Athlon XP. It's a Barton core running at 11.5 * 200 MHz. Now look at my integer benchmark, as given by Crunch3r's 5.5.0 client - "Measured integer speed 11476.56 million ops/sec". Thing is, an Athlon XP can only issue 3 integer instructions per clock cycle, meaning that my processor is only capable of 3 * 2,300,000,000 integer instructions per second. That means that with absolute perfect efficiency, which you never get, my processor can issue 6,900 million ops/sec. Where did the extra 4,576.56 million ops/sec come from?!
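
(The upper-bound arithmetic being applied there, as a minimal sketch - all figures are the ones quoted in the post:)

#include <stdio.h>

int main(void) {
    double clock_hz    = 2.3e9; /* 2.3 GHz Barton-core Athlon XP */
    double issue_width = 3.0;   /* 3 integer instructions per clock, as claimed */

    double peak_mops = issue_width * clock_hz / 1e6;
    printf("theoretical peak:  %.0f million ops/sec\n", peak_mops);            /* 6900 */
    printf("benchmark excess:  %.2f million ops/sec\n", 11476.56 - peak_mops); /* 4576.56 */
    return 0;
}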

This is my 2.14 GHz Duron. It runs Trux's 5.2.13 or something like that. Note its integer benchmark - "Measured integer speed 6364.27 million ops/sec". But that is physically possible, because 3 * 2,140,000,000 = 6,420 million ops/sec.

I am sick and tired of people saying that Crunch3r's 5.5.0 client rewards them based on what they can contribute. No, you can't contribute something that is physically beyond the capabilities of your processor.


It's been stated by some SETI people that the client actually DID produce 3x the work, and the opt bench was made to match it. If a proc can produce the work then it can produce the benchmark, and vice versa.


I can believe that Crunch3r's SETI client did produce 3x the work of the standard one, but that doesn't mean the processor is capable of producing that high a benchmark. I'll explain.

Say we have a processor that has a standard BOINC client benchmark of 3,000 million ops/sec, and that it does a work unit in 1.5 hours. Now that benchmark result is a measure of the processor's throughput, its potential.

Now imagine that with Crunch3r's optimised SETI application the work unit can now be done in 0.5 hours - that is three times faster. But that doesn't mean the processor is suddenly capable of 9,000 million ops/sec. It hasn't gained any more potential throughput. So what has happened?

Efficiency has been improved. The same computation is being done in fewer processor cycles. An example would be dividing operations. A divide operation is slow, it takes several clock cycles. A right shift however takes only one cycle and is the same as division by two. That's just one quick example. Another way to improve efficiency is increase parallelism in the computation. For instance we might start reading values from the cache, or from memory, before we've finished adding something else. That maximises throughput, but doesn't mean we have any more capacity. It's the same with the use of things like SSE or MMX - they do in one instruction what takes several to do normally.
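
(The divide-versus-shift point in miniature - a sketch only; the equivalence holds exactly for unsigned values, and modern compilers apply this "strength reduction" automatically:)

#include <assert.h>

/* Division: historically a multi-cycle instruction. */
static unsigned half_div(unsigned x)   { return x / 2; }

/* Right shift: typically single-cycle, identical result for unsigned operands. */
static unsigned half_shift(unsigned x) { return x >> 1; }

int main(void) {
    for (unsigned x = 0; x < 1000; x++)
        assert(half_div(x) == half_shift(x));
    return 0;
}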

Going back to our example numbers above, we find that whilst the standard SETI client may take 1.5 hours to do a work unit on a processor that has the potential to do 3,000 million ops/sec, it may only be getting 33% efficiency. By doing a whole lot of optimisations, we might get the efficiency up to 99%, and then it would only take 0.5 hours to do a work unit on a processor that has the potential to do 3,000 million ops/sec.
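
(Those example numbers, worked through - illustrative figures from the paragraphs above, nothing measured:)

#include <stdio.h>

int main(void) {
    double potential = 3000.0; /* million ops/sec: the benchmarked potential */
    double eff_stock = 0.33;   /* standard client efficiency (illustrative) */
    double eff_opt   = 0.99;   /* optimised client efficiency (illustrative) */

    /* Effective throughput = potential * efficiency; the potential itself never moves. */
    printf("stock app:     %.0f million ops/sec\n", potential * eff_stock); /*  990 */
    printf("optimised app: %.0f million ops/sec\n", potential * eff_opt);   /* 2970 */

    /* Runtime scales inversely with efficiency: 1.5 h becomes 0.5 h. */
    printf("runtime: %.1f h -> %.1f h\n", 1.5, 1.5 * eff_stock / eff_opt);
    return 0;
}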

What I am getting at is that the BOINC benchmark gives an indication of the throughput of a processor. Even if a processor is physically capable of issuing more instructions per second than the benchmark shows, that is only because the benchmark code itself cannot achieve higher throughput. An analogy would be a 200 mph Ferrari that we can only ever get to 120 mph because, even though it is physically capable of more, we aren't driving on a good enough road surface.

The final piece of the puzzle is why the benchmark scores are higher with optimised BOINC clients. If we consider something like a Trux client that turns 3,000 million ops/sec into 6,000 million ops/sec, this is because the use of SSE or SSE2 or MMX or whatever instructions allow us to get higher throughput on the benchmark. The only condition is that the throughput cannot be higher than the physical limit of the processor.

My last post showed that the Trux BOINC client came close to achieving maximum throughput for the processor on the BOINC benchmarks. This is supposed to reflect that my processor has things like SSE which can increase the maximum throughput of the processor on some computations. Thing is, whilst something like SSE raises our possible throughput, if the science application itself is still inefficient then we won't be any faster. But if we have somebody like Crunch3r optimising the SETI science application using things like SSE or other assembler tricks, we find the efficiency going up and getting closer to the maximum throughput of the processor.

The problem with Crunch3r's 5.5.0 client is that it returned results above the physical maximum for a processor. It didn't just take into account things like SSE and MMX to give a higher benchmark score (and thus to show that it's possible to get better throughput); rather, it gave figures that show a processor was capable of 166% efficiency in terms of its absolute physical maximum ability.

The old Crunch3r BOINC clients were fine, as are the Trux optimised clients. They do take into account the increase in throughput that can be attributed to extra instructions as found in SSE etc. But 5.5.0 just seems to make numbers up. They are physically impossible.

Now I am no guru by any stretch on modern processor architecture, but I do know they are incredibly complex, and measuring a cpu's power with remedial math just isn't possible.


I don't quite count myself as an expert on it, but I think my understanding is rather beyond that of the layman. What I can say is that we can easily give an upper limit on the number of instructions a processor can issue in a given time. In my first post I pointed out the Athlon XP can issue three integer instructions per clock cycle, which is where I get the 6,900 million ops/sec figure from as a fixed upper limit. What I can't calculate is whether 100% efficiency can be reached. If our code works in such a way that we never issue more than two integer instructions per clock cycle then we'll never be using more than 4,600 million ops/sec on my processor from above.

Also, we have to remember that some instructions take longer to execute than others. I might be able to issue 6,900 million ops/sec but a MUL instruction will take several clock cycles to finish whereas a LODSB might only take one. Inside that one second I might be able to execute 6,900 million LODSB instructions, but only 1,150 million MUL instructions.

There are too many things happening in parallel or in different areas of the processor. So, you ask where those extra cycles can possibly come from? As I understand it, the instruction sets can act as "mini" calculators. They are simple calcs, not a full FPU or whatever; they only look for simple pieces that can be reduced or simplified, to take the load off the main math processor. Why waste a full CPU cycle to grind out 2+2, or other simple addition or subtraction bits? They take the simple load off, and let the main center do the big complex math. Each time you can reduce a number or simplify an equation, it frees up the main math proc to do more "real" calculations, so more work gets done. Each of these steps still gets counted as though it were an operation. So if your bench is reading more cycles than your proc has GHz, that'd be my guess as to why. Again, I'm not an expert in this by any means, so take it as you will. If I'm off in left field, I wouldn't mind hearing a better description myself.


An instruction set is the complete set of all the operations a processor can perform. This includes things like arithmetic operations, logical operations and memory addressing operations.

It's not that we use instructions to do simple things and leave the ALU or FPU to do complex calculations. It's that the complex calculations are broken down into their simple constituent parts, or instructions, and the processor executes each one of these instructions in turn. Taking an average of two numbers is several instructions. Firstly we need to load values from memory or registers. We then need to add the numbers together. Then we need to move the result to a memory location. Then we need to load that memory location. Then we need to divide that value. And then we need to move that result to another memory location.
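
(That breakdown, sketched in C with the implied machine steps as comments - illustrative only; a real compiler may fuse or reorder these:)

/* Averaging two numbers is several instructions, not one "operation". */
unsigned average(const unsigned *a, const unsigned *b, unsigned *out) {
    unsigned x   = *a;      /* load first value from memory    */
    unsigned y   = *b;      /* load second value               */
    unsigned sum = x + y;   /* integer add                     */
    unsigned avg = sum / 2; /* divide (or shift, if optimised) */
    *out = avg;             /* store the result back to memory */
    return avg;
}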

Do you see now how we don't just have an average calculation done in a single processor cycle but rather how it's lots of instructions that all need at least one cycle to execute, some possibly more than one cycle? I hope I've been able to give you a bit of an understanding, although the likelihood is that I've confused you further.

All this brings me back to my original conclusion - that the 5.5.0 client returned benchmark results that are actually beyond what a processor is physically capable of producing even at perfect, 100% efficiency.
ID: 25396
Hymay

Joined: 15 Jun 06
Posts: 8
Credit: 95,312
RAC: 0
Message 25417 - Posted: 29 Aug 2006, 7:49:32 UTC - in response to Message 25396.  

@ Biggles
Apologies in advance for chopping up that post to quote small bits of it; it was too huge to quote it all. Thank you for the in-depth explanation though; it was a bit cumbersome, but not confusing.

It seems we actually are saying many of the same things, yet have come to opposite conclusions. Efficiency was indeed what I was getting at, though you explained it far better. The other issue would seem to simply need a bit of clarification on the definition of benchmark. The confusion may be coming from what exactly is being measured. You seem to feel the benchmark measures actual CPU output in cycles; my opinion differs - I will explain further below.


Say we have a processor that has a standard BOINC client benchmark of 3,000 million ops/sec. And that it does a work unit in 1.5 hours. Now that benchmark result is a measure of the processor's throughput, it's potential.
Now imagine that with Crunch3r's optimised SETI application the work unit can now be done in 0.5 hours - that is three times faster. But that doesn't mean the processor is suddenly capable of 9,000 million ops/sec. It hasn't gained any more potential throughput. So what has happened?
Efficiency has been improved. The same computation is being done in fewer processor cycles.


Precisely what I was getting at: the newer architectures can do more work in fewer cycles. No extra ops/sec are achieved, but it IS doing work in 3,000 mil ops/sec that a "straight" unit would take 9,000 mil ops/sec to do.



The final piece of the puzzle is why the benchmark scores are higher with optimised BOINC clients. If we consider something like a Trux client that turns 3,000 million ops/sec into 6,000 million ops/sec, this is because the use of SSE or SSE2 or MMX or whatever instructions allow us to get higher throughput on the benchmark. The only condition is that the throughput cannot be higher than the physical limit of the processor.


Here's where we branch in definition of the benchmark. You are saying it's measuring the actual throughput, in which case what you say is true: it cannot exceed the proc limit. However, my understanding is that it is measuring the output versus a measuring stick, say what a straight FPU can produce. In that case, the output can appear to exceed the frequency of the processor; however, all it is saying is that the work accomplished in your Athlon's 6,900 mil ops/sec or less would take a straight FPU 10,000+ mil ops/sec to do (or 21,000-ish mil ops/sec, since we're referencing tripled stats). Efficiency in action.



The old Crunch3r BOINC clients were fine, as are the Trux optimised clients. They do take into account the increase in throughput that can be attributed to extra instructions as found in SSE etc. But 5.5.0 just seems to make numbers up. They are physically impossible.


Physically impossible as throughput, as stated above, yet not as a comparison. If the numbers were made up, I doubt the rosy staff would have allowed its use.
Whether the optimizations were coded correctly, I cannot say. Having separate opts for MMX only, SSE only, or SSE2 certainly implied their purpose was simply to add those specific instructions to the calculation. The vast majority of users simply felt they were the "true" measure of their CPU's power... when properly programmed for.

I can believe that Crunch3r's SETI client did produce 3x the work of the standard one, but that doesn't mean the processor is capable of producing that high a benchmark.


Sorry, but that statement doesn't make much sense. It can do that much work, but can't produce a bench to match the amount of work done?
I'll stand by the statement that if a CPU can produce the work, it can produce a matching benchmark.

What you seem to be describing is a measurement - actual throughput. The wiki defines benchmark as "a point of reference for a measurement", not just a straight measurement. The number of cycles you were quoting is a straight measurement. The measurement given is how many cycles it would take a reference CPU to do the same work.

In summary, we seem to agree on the vast majority of points, but where you see a benchmark measuring actual cycles, I see a reference relative to work done.
But we both agree that a CPU can do more work than the physical cycles allow under normal circumstances, due to the efficiency of newer architectures (MMX, SSE, etc.) and proper coding.

Neither of those is given any accounting in the standard client benchmark. Thus, it cannot measure the true "potential" of a CPU.

ID: 25417
NJMHoffmann

Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 25419 - Posted: 29 Aug 2006, 9:04:39 UTC - in response to Message 25417.  


The old Crunch3r BOINC clients were fine, as are the Trux optimised clients. They do take into account the increase in throughput that can be attributed to extra instructions as found in SSE etc. But 5.5.0 just seems to make numbers up. They are physically impossible.


Physically impossible as throughput, as stated above, yet not as a comparison. If the numbers were made up, I doubt the rosy staff would have allowed its use.


I wouldn't call it "allowed"; it was more "didn't care". In the meantime things have changed, and the Rosetta team had to take action against so-called "optimizations" that made credits almost worthless and senseless. What the projects display on their websites as "TeraFLOPs", as a measure of their throughput, is nonsense now - because of clients that benchmark something, but not the instructions actually used.

Norbert
ID: 25419
Biggles
Joined: 22 Sep 05
Posts: 49
Credit: 102,114
RAC: 0
Message 25425 - Posted: 29 Aug 2006, 11:52:51 UTC - in response to Message 25417.  

@ Biggles
Apologies in advance for chopping up that post to quote small bits of it; it was too huge to quote it all. Thank you for the in-depth explanation though; it was a bit cumbersome, but not confusing.

It seems we actually are saying many of the same things, yet have come to opposite conclusions. Efficiency was indeed what I was getting at, though you explained it far better. The other issue would seem to simply need a bit of clarification on the definition of benchmark. The confusion may be coming from what exactly is being measured. You seem to feel the benchmark measures actual CPU output in cycles; my opinion differs - I will explain further below.

Precisely what I was getting at: the newer architectures can do more work in fewer cycles. No extra ops/sec are achieved, but it IS doing work in 3,000 mil ops/sec that a "straight" unit would take 9,000 mil ops/sec to do.


This is really a separate issue. For instance, we know a Core 2 Duo at 2 GHz is faster than an Athlon 64 at 2 GHz, which is faster than a Pentium 4 at 2 GHz. That's architectural efficiency. And using a stock BOINC benchmark you would see a corresponding increase in the benchmark results.

Here's where we branch in definition of the benchmark. You are saying it's measuring the actual throughput, in which case what you say is true: it cannot exceed the proc limit. However, my understanding is that it is measuring the output versus a measuring stick, say what a straight FPU can produce. In that case, the output can appear to exceed the frequency of the processor; however, all it is saying is that the work accomplished in your Athlon's 6,900 mil ops/sec or less would take a straight FPU 10,000+ mil ops/sec to do (or 21,000-ish mil ops/sec, since we're referencing tripled stats). Efficiency in action.


I see where you are coming from, but the thing is there is no such thing as a straight FPU to be measured against. Really the BOINC benchmark is a measure of potential throughput. On the standard BOINC clients this benchmark is low because it's unoptimised and with no optimisations in the science application, we will not see higher throughput. If we then optimise a science application and use SSE2 in it, we will have increased the program efficiency. So if our program is now a whole lot more efficient, we may be getting throughput higher than the BOINC benchmark said we could, because it wasn't taking into account optimisations. So we then recompile BOINC and get the benchmark to take into account SSE2 like we used in the science applications, and then we get a corresponding increase in the BOINC benchmark, because using SSE2 got us higher efficiency and therefore more throughput.

Physically impossible as throughput, as stated above, yet not as a comparison. If the numbers were made up, I doubt the rosy staff would have allowed its use. Whether the optimizations were coded correctly, I cannot say. Having separate opts for MMX only, SSE only, or SSE2 certainly implied their purpose was simply to add those specific instructions to the calculation. The vast majority of users simply felt they were the "true" measure of their CPU's power... when properly programmed for.


Here we come back to the potential throughput point. The BOINC benchmark took into account the potential throughput of a processor running unoptimised code. The original, unoptimised SETI client didn't even achieve that level of throughput. It couldn't achieve maximum throughput because of things like branch prediction being wrong or cache misses or having to fetch data from main memory, all sorts of things that happen in the real world but things that don't happen in a benchmark. We frequently see real world performance being below that of a benchmark, because a benchmark is done under perfect conditions.

If we think again of a mystery processor that with a stock BOINC client gets an integer benchmark of 3,000 million ops/sec, we'd likely find that it never gets more than say 1,000 million ops/sec when running SETI. An optimised SETI application might double performance, might be more efficient and get us up to 2,000 million ops/sec. But these optimisations then get applied to the benchmark and inflate it to say 6,000 million ops/sec. I can still accept that, because it will be less than the physical maximum of the processor. Then we might get another optimised client that is 50% faster still. This gives us three times the performance of the original standard SETI application, because it is more efficient and because it uses optimisations. But rather than saying "hey, we've increased efficiency and are closer to the potential throughput that the BOINC benchmark gives" optimisers brought out new clients that inflated the benchmark another 50% and now we're on 9,000 million ops/sec and we've exceeded the physical maximum for the processor.

The problem boils down to this: the standard BOINC benchmark, even with no optimisations, gave a result, a potential throughput, that was higher than the science applications themselves achieved. And when optimised science applications appeared, optimised BOINC clients appeared with benchmarks that matched the increased efficiency of the science apps. Yes, it is fair to get twice the credit for twice the work, but inflating a benchmark is a naive way of doing this because it only really worked for SETI. Trux's calibrating client was far better because it got you the extra credit you deserved but without giving stupid benchmarks that screwed up credit claims everywhere else.

As a side point, if you try and run an SSE2 optimised BOINC client on a non-SSE2 processor it will crash. The different levels of optimisation are to give the best possible optimisations to each different processor type.

I can believe that Crunch3r's SETI client did produce 3x the work of the standard one, but that doesn't mean the processor is capable of producing that high a benchmark.


Sorry, but that statement doesn't make much sense. It can do that much work, but can't produce a bench to match the amount of work done?
I'll stand by the statement that if a CPU can produce the work, it can produce a matching benchmark.


It does if you think about it carefully. The BOINC benchmark is a measure of potential throughput, what it would achieve with 100% program efficiency. But if the standard SETI application only achieved 30% program efficiency, it isn't using all of that potential throughput. An optimised SETI application may take this up to 90+% efficiency, meaning we are using more of this potential. It doesn't mean the processor is suddenly capable of producing a higher benchmark.

What you seem to be describing is a measurement - actual throughput. The wiki defines benchmark as "a point of reference for a measurement", not just a straight measurement. The number of cycles you were quoting is a straight measurement. The measurement given is how many cycles it would take a reference CPU to do the same work.


No, the BOINC benchmark is potential throughput, not actual throughput.

This is basically splitting hairs on terminology. When I say a BOINC benchmark gives 3,000 million ops/sec, there is no reference point. If there was, then the result would be in a form like 1.79 (number pulled from thin air), meaning 1.79 times faster than a reference machine, where we would have a defined reference machine that might be a 2.6 GHz P4.

In summary, we seem to agree on the vast majority of points, but where you see a benchmark measuring actual cycles, I see a reference relative to work done. But we both agree that a CPU can do more work than the physical cycles allow under normal circumstances, due to the efficiency of newer architectures (MMX, SSE, etc.) and proper coding.


I see the BOINC benchmark result as a measure of the potential of a processor. Whether this potential is used is up to the programmers of the science application. It should be a relative value, because faster processors will be measured by BOINC as having a higher potential.

No, I didn't mean that a CPU can do more work than the physical cycles allow under normal circumstances. The 6,900 million integer ops/sec figure from my first post was a mathematical, theoretical maximum for the 2.3 GHz Athlon XP that I was talking about. It cannot ever be exceeded, no matter what you do, be it using MMX, SSE, 3Dnow! or anything else.

A standard BOINC client would measure the potential throughput of that processor without optimisations to be about 2,000 million ops/sec. SETI/Rosetta/CPDN/whatever without optimisations will never exceed that figure in terms of how much work they do. If I were to then use an optimised SETI application that did then use SSE or something, I could then achieve more than 2,000 million ops/sec, but still less than the 6,900 million ops/sec. The SSE optimisations could then legitimately be applied to the BOINC benchmark (or measurement, if you feel that is a more accurate term) meaning that BOINC may measure a higher potential throughput than before, perhaps 5,000 million ops/sec. But it still cannot exceed the 6,900 million ops/sec figure because that is the mathematical maximum of the processor.

Optimisations allow us to achieve a higher potential throughput. But they do not allow us to do more work than the physical maximum of a CPU.

Neither of those is given any accounting in the standard client benchmark. Thus, it cannot measure the true "potential" of a CPU.


Actually, BOINC all along measured the potential of a CPU. Not exactly what it did, but what it could do. The standard BOINC client measures the potential throughput of a CPU without optimisations. An optimised BOINC client measures the potential throughput of a CPU with optimisations. What BOINC did not do is measure the actual throughput of a science application, because it actually can't.

This was also the fairest approach all along. What it means is that if you take any processor, and you use only a stock BOINC client and stock science application, then an hour on SETI would have claimed credit at the same rate as an hour on Rosetta, or Einstein, or Predictor...

Optimised clients screwed all this up. They said that your CPU had a higher potential throughput, which is only true when you have a matching optimised science application. Perhaps you are capable of 5,000 million ops/sec on SETI with an optimised science application. But on Rosetta with no optimised science application, you can only achieve the 2,000 million ops/sec level (because you aren't making use of SSE to achieve higher program efficiency). However, an optimised BOINC client like 5.5.0 says that your measured potential throughput is far higher than it actually is and results in the overclaim that everybody has seen.

My issue with 5.5.0 is that it gives a measured potential throughput of a CPU that is physically impossible. For brevity, I will resume calling measured potential throughput a "benchmark". Trux's BOINC clients, and Crunch3r's old clients gave benchmarks that were in the realms of physical possibility. Higher than would ever be achieved by science applications in practice, but not physically impossible. Just like the original BOINC benchmark - it also gave a benchmark result that was potential throughput but which was never achievable in practice by science applications, like I've tried (and probably failed) to explain.
ID: 25425
FloridaBear

Joined: 25 Aug 06
Posts: 2
Credit: 99,361
RAC: 0
Message 25441 - Posted: 29 Aug 2006, 15:44:35 UTC - in response to Message 25386.  
Last modified: 29 Aug 2006, 15:46:00 UTC

Paul:
You've got 75,189.03 seconds, 219.23 old credit, 287.22 new credit: 13.75 new credits/hour.
Test it again after a week, so you have a larger number of results to average.

Take a look at some of the graphs of granted credit and you'll see why we now need a fair sized group of WUs to average.



Yeah, I now see that I've received many more credits for the same times for certain WUs. Thanks!

On an unrelated note, I got a bunch of failed WUs too; I don't think I changed anything to cause that... anyone else seeing that?
[EDIT]: Just received some new WUs that appear to be running fine.
ID: 25441
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 25446 - Posted: 29 Aug 2006, 16:39:54 UTC - in response to Message 25406.  
Last modified: 29 Aug 2006, 16:40:11 UTC

Biggles, I think you've got it there. But that's a bit deep for most to digest. I tried to say something similar yesterday. See if you agree that it's basically a client whose benchmark mechanism assumes you're going to be running an optimized SETI application.

Originally, SETI used to just count WUs completed. Then the optimized client came out, and correct me if I'm mistaken, but the logic was that we're all crunching the same (SETI) WUs, and if we run an optimized (SETI) client and can crunch 3x more work, then we should get 3x the credit. After all, that's fair, right?

Where I think we got into problems (on the message boards, which resulted in threads being deleted, and contributors and teams leaving) was when the SETI mindset - that this BOINC client somehow magically makes my machine 3x faster - was applied to Rosetta. The client does NOTHING for Rosetta. It's no better, and its benchmark reflects how well you can crunch Rosetta no more accurately than the standard BOINC benchmark does. So I think there were a LOT of people out there that simply didn't understand what they were running, and what was being reflected in their benchmarks and credits.

From a SETI perspective, you download some non-standard code and can crunch the same WUs at 3x the speed, and earn 3x the credit. Everyone should do it! It's pretty CLEAR that 3x is a good thing for the science!

From a Baker Lab perspective, we've got nothing against SETI folks running Rosetta. And whether they have an optimized client or not, Rosetta still runs fine.

{reworked to remove offensive terms}
From a hardcore Rosetta contributor perspective, if you claim more credit than others are getting, then some started with name calling. And they feel the project team should take a stand against this. Your client does nothing to "optimize" Rosetta work, and therefore, whether you did it by manually editing the BOINC files or by running a client with its own benchmarking process, they feel you are getting more credit than is fair.

From a SETI perspective, we're not cheating, we're "optimizing" (they didn't realize the benchmarks they run assume the application will use SSE or MMX or other enhanced features of their system... even though Rosetta does not utilize them, and is totally bogged down in floating point operations).

The new credit system surprised some people, because they fully expected that running with an optimized client would show better results... it doesn't. ...which was the point of the "hardcore Rosetta contributor" originally.

I'm neutral. I'm not on a big team. I really don't count the credits. I tried to stay out of the arguments and just interject a few facts when I could. But it seems to me the whole hoopla was a huge misunderstanding and miscommunication. There were a lot of people talking beyond their knowledge. And the facts that were accurately brought up in the discussion were often lost in the battle.

I'm very pleased to see the boards be a constructive place to be again. And I am sorry some contributors got so upset about it all that they felt they had to leave. But, as I've said before (oops, that thread got deleted!)
...if the sheriff gets the city council to create a few laws that were clearly needed, and starts enforcing the laws more strictly, and some folks leave town... let's just say I hold the sheriff in higher regard for taking action and doing what's right. And I'm proud to live in this town!


That's not meant as a slam against anyone that left. You did what you felt you had to do. But I think it was probably due to misunderstandings rather than facts.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 25446
kevint

Joined: 8 Oct 05
Posts: 84
Credit: 2,530,451
RAC: 0
Message 25448 - Posted: 29 Aug 2006, 17:03:05 UTC - in response to Message 25329.  

I've had a Pentium D for a few weeks. Is it the same problem? The Task Manager shows both CPUs at 50%. But I'd say: one chip, two cores, 50% each... seems OK. But is it OK?
Thanks


Yep, this is correct - and if you had 4 CPUs they would read 25% per CPU for 100% total.

SETI.USA


ID: 25448
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 25449 - Posted: 29 Aug 2006, 17:05:58 UTC

Biggles, I know what you mean about 3x the box's capability. But if I'm not mistaken, the optimized SETI client performs all the same math analysis on the data. But it does so with fewer actual processing steps.

It's kind of like taking the task of mowing a lawn of a given size. Two people use two different methods. One gets done in 20 minutes, the other takes an hour. They both produced the same result, a mowed lawn of a given size. They may have even used the same TOOL. But one used it better. And so I believe SETI went with granting 3x the credit for those using their tools more efficiently. The PC isn't 3x faster, but the work produced is worthy of the credit - that was the mentality there. The reason the logic doesn't apply to Rosetta is that there is only one tool (the Rosetta application as distributed by Baker Lab), and there is only one way to use it. So there is no 3x optimization available. And if/when one is developed, it will be used by everyone as the standard distribution.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 25449
Hymay

Joined: 15 Jun 06
Posts: 8
Credit: 95,312
RAC: 0
Message 25464 - Posted: 29 Aug 2006, 19:14:56 UTC - in response to Message 25425.  
Last modified: 29 Aug 2006, 19:21:00 UTC

I see where you are coming from, but the thing is there is no such thing as a straight FPU to be measured against.

By straight, I would have meant unoptimized. 3 am posts can be bad for word choices.

Really the BOINC benchmark is a measure of potential throughput. On the standard BOINC clients this benchmark is low because it's unoptimised and with no optimisations in the science application, we will not see higher throughput. If we then optimise a science application and use SSE2 in it, we will have increased the program efficiency. So if our program is now a whole lot more efficient, we may be getting throughput higher than the BOINC benchmark said we could, because it wasn't taking into account optimisations. So we then recompile BOINC and get the benchmark to take into account SSE2 like we used in the science applications, and then we get a corresponding increase in the BOINC benchmark, because using SSE2 got us higher efficiency and therefore more throughput.


Precisely my point. An unoptimized benchmark is incapable of correctly measuring the potential of a CPU. If you are going to measure a CPU, you need to measure its entire potential to be fair, not its castrated potential. This is at the BOINC level, as it is not in their control whether a project optimizes its code. What use is taking a measurement you know isn't fully accurate?

Enough work can be completed to exceed the std benchmark, thus making it a false measure (or at best, incomplete). On the other hand, the opt bench can describe the work done but exceeds the proc's throughput limit, also making it false from what you say. So which is "right"? Both and neither, it would seem. One is incapable of correctly measuring full potential; the other can measure it, but uses a "naive" method to do so.

One uses the lowest common denominator to score "fairly"; the other attempts to score the true potential, but does so in a poor manner. LCDs are never fair, and are a practice I despise from academia. Standards should be raised, never lowered so everyone can "pass" (stopping here, as that thought is not solely bench related, but spreads into school systems deteriorating, etc.).

If you had 2 theoretical machines, identical except for SSE2 enabled on one, and disabled on the other, the std benchmark would score them the exact same. If they ran an opt project, the SSE2 enabled machine would by far outproduce the disabled machine, yet would get the same score. With an opt benchmark, the difference would be shown, and the true potential would be scored. As a general purpose benchmark, it should encompass all possibilities. Yes, that would "inflate" the general scores, but in the computer world inflation is inevitable.

Anyway, I think I'm repeating myself or talking in circles; I've been typing this in spurts through many distractions and hours. Caring for an Alzheimer's patient is not conducive to typing a long post in one sitting.

The major point is that one needs to back up from staring intently at the microdetails and look at the general picture. That being: the std client cannot properly describe the potential power of modern procs. An opt client is the only way to measure the full potential (not saying Crunch3r's is the best, just a step, or attempt, in the right direction). If the work is possible, then a benchmark is also possible.
Being as this is no longer relevant to rosy, I'll stop here. The last words are yours, though; you've basically stated the same facts, with a different end opinion. That, or your distaste was simply with how Crunch3r obtained the results to describe what was happening.

The only way this all seems to relate to the new credit system would be in that the work-based units are scored in a manner that mirrors the original benchmark, which is known to be flawed. Many cannot stomach that.

Also, with the averaging, it would seem that even if Rosy were to get optimizations, more work would not actually be scored, but would instead be re-averaged back down to the "dumbed down" levels of the std bench.

The new system seems to have been a ton of effort to create a new measuring stick, that was wasted by shortening it back to the old one.



ID: 25464