Posts by Paul D. Buck

1) Message boards : Number crunching : Why so much variation? (Message 66396)
Posted 1 Jun 2010 by Paul D. Buck
Post:
Here is a pair of tasks, one generated 2 decoys, the other 8, same run times ... but look at the difference in granted credit... 0.69 vs. 53 ...

I don't know about you ... but, the methodology being used has issues ... same amount of calculation time, factor of almost 100 difference in award? I mean, I could buy the explanation if the variance was a factor of 4 with one doing 2 decoys and the other 8 ... but ... sorry, the explanation does not hold water ... one of the reasons I had deprecated my involvement in Rosetta ... which after I push it over 1M I suspect I will do again ... I grant you credits are worthless, but fairness is not ...
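To put numbers on the gap, here is a small sketch of the arithmetic using only the figures quoted above (2 decoys for 0.69 credits, 8 decoys for 53 credits, equal run time). The variable names are made up for illustration, and this is not how the project computes credit.

```python
# Figures taken from the two tasks described in the post.
tasks = {
    "task_a": {"decoys": 2, "credit": 0.69},
    "task_b": {"decoys": 8, "credit": 53.0},
}

# Ratio of total granted credit for the same run time.
credit_ratio = tasks["task_b"]["credit"] / tasks["task_a"]["credit"]

# Ratio of decoys produced, which is what the credit ratio might be expected to track.
decoy_ratio = tasks["task_b"]["decoys"] / tasks["task_a"]["decoys"]

# Credit per decoy for each task, and how far apart those two values are.
per_decoy = {name: t["credit"] / t["decoys"] for name, t in tasks.items()}
per_decoy_ratio = per_decoy["task_b"] / per_decoy["task_a"]

print(f"credit ratio:    {credit_ratio:.1f}x")
print(f"decoy ratio:     {decoy_ratio:.1f}x")
print(f"per-decoy ratio: {per_decoy_ratio:.1f}x")
```

So even after allowing for the factor of 4 in decoys, the award per decoy still differs by roughly a factor of 19.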
2) Message boards : Number crunching : Suspending due to High CPU use. (Message 66395)
Posted 1 Jun 2010 by Paul D. Buck
Post:
Yep, new feature ... and almost everyone that notices it regards it as a bad thing.

Personally I think it is another solution in search of a problem. The only time BOINC has ever gotten in my way is from GPU usage and this feature is not likely to make a dent in that because of the way GPU tasks run ... if you have lag, you will still have it because the GPU Kernel does not shut down right away even if the parent task dies ... one of the reasons I did not run AP26 tasks on the mac was that the tasks were badly made and I got too much lag on the system... I run Collatz on the Mac all day long and never notice it at all (though the Mac card is slower than dirt ...)

Anyway, as more projects update the server side you will be able to make the setting change there...


I notice significant graphics lag while running collatz (ATI Card only). So I set it to run after 1 min of inactivity. The lag shows whether I OC the card or not. Windows 7.

If you run the anon platform you can "tune" most of the versions of Collatz so that this effect is minimal. For me, on all my systems running ATI cards I have not noticed this issue to any extent... but, as always, YMMV ... :)
3) Message boards : Number crunching : Suspending due to High CPU use. (Message 66384)
Posted 31 May 2010 by Paul D. Buck
Post:
Yep, new feature ... and almost everyone that notices it regards it as a bad thing.

Personally I think it is another solution in search of a problem. The only time BOINC has ever gotten in my way is from GPU usage and this feature is not likely to make a dent in that because of the way GPU tasks run ... if you have lag, you will still have it because the GPU Kernel does not shut down right away even if the parent task dies ... one of the reasons I did not run AP26 tasks on the mac was that the tasks were badly made and I got too much lag on the system... I run Collatz on the Mac all day long and never notice it at all (though the Mac card is slower than dirt ...)

Anyway, as more projects update the server side you will be able to make the setting change there...
4) Message boards : Number crunching : Suspending due to High CPU use. (Message 66370)
Posted 30 May 2010 by Paul D. Buck
Post:
If you are using one of the later versions of BOINC, they added a new "feature" to help you ... if the CPU use on the machine for any reason goes over 30% BOINC goes to sleep... most of the projects have yet to update the server side software to accommodate this new setting, at least Rosetta has not as yet, so you either have to set it on a project that allows this (SaH, Collatz are two I know allow it)... or set it locally ...

Those of us that advised that the setting is too low got blown off by UCB (as usual) and I had it triggered multiple times one night even though I was in bed and I have no automated activities running on the machine ...

So, if you attach to SaH, set NNT, make the setting change there, and abort any tasks issued, you can still make the setting using the remote ...
5) Message boards : Number crunching : [error] negative FLOPs left -1.#IND00 (Message 66369)
Posted 30 May 2010 by Paul D. Buck
Post:
Hi

I believe that it was the DNETC project that was at fault. I had what appeared to be GPU work units running on the CPU or that is what it appeared to be in BoincManager.

I have detached from DNETC on all my PCs and the problem has gone away.

I am in the middle of a move so will retry if you want the logs in a few weeks ??

Mine also was with a DNETC task ... when the task was over the problem went away as well ... my point is that the mal-configuration of a task should not, in and of itself, cause RR Sim to fail over into NaN calculations. The problem is that because I could not reproduce it at will UCB blew me off ...

Anyway, if it is a few days or weeks ... I gotta live with that ... :)
6) Message boards : Number crunching : [error] negative FLOPs left -1.#IND00 (Message 66354)
Posted 29 May 2010 by Paul D. Buck
Post:
That is an error in the RR Sim module ... as far as I have been able to tell it is of little consequence ... however, if you can replicate it at will, can you capture a log for me so I can post it?

I need you to create a file named cc_config.xml in a text editor and add this:

<cc_config>
<log_flags>
<cpu_sched_debug>1</cpu_sched_debug>
<rr_simulation>1</rr_simulation>
</log_flags>
</cc_config>


Drop that file in the BOINC dir where the "client_state.xml" file resides ... on the "Advanced" menu click "read config file" ... check the messages log and you should see more messages ... wait a few seconds and then edit the file to set those flags to 0:

<cc_config>
<log_flags>
<cpu_sched_debug>0</cpu_sched_debug>
<rr_simulation>0</rr_simulation>
</log_flags>
</cc_config>


and re-read the config file ...

copy the messages posted and I am going to PM you my e-mail address and I will post the log to the developers (I had the same problem but I cannot reproduce it at will) ...

As far as I can tell it is not a big deal other than RR Sim is not calculating the internal operations right meaning the internal scheduling is farbled ...
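For anyone who would rather not hand-edit the file twice, here is a minimal sketch that generates the same cc_config.xml content with the two flags switched on or off. The flag names come from the post above; the helper names and the idea of scripting this are assumptions for illustration, and the output path has to be your own BOINC data directory (the one holding client_state.xml).

```python
import xml.etree.ElementTree as ET


def build_cc_config(enabled: bool) -> str:
    """Return cc_config.xml text with both debug flags set to 1 or 0."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    value = "1" if enabled else "0"
    # Flag names as given in the post; nothing else is added to the file.
    for name in ("cpu_sched_debug", "rr_simulation"):
        ET.SubElement(flags, name).text = value
    return ET.tostring(root, encoding="unicode")


def write_cc_config(path: str, enabled: bool) -> None:
    """Write the config to `path` (hypothetical helper; path is up to you)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_cc_config(enabled))


config_on = build_cc_config(True)    # use while capturing the log
config_off = build_cc_config(False)  # restore afterwards
print(config_on)
print(config_off)
```

You would still need to tell the client to re-read the config file after each change, as described above.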
7) Message boards : Number crunching : OSX GPU (Message 66349)
Posted 28 May 2010 by Paul D. Buck
Post:
Unless I'm reading this wrong, is there any ETA when Rosetta will be able to utilize GPU's, specifically on a MAC?

Right now MAC GPU use is pretty much restricted to Collatz and I think EaH though the EaH application uses almost a full core as well as the GPU (GPU use is about 30%) ... which is why at this time I am solidly doing Collatz on my Mac ...

Rosetta is not likely to have a GPU application for any platform ...


Forgive my ignorance or curiosity here Paul, but why won't they ever have one? Is it just something with the way the project is coded?

Nothing to forgive...

A GPU is like an old Cray computer, it is a Vector processor ... if your code uses vectors (arrays of numbers) and you do mostly vector math then the GPU will buy you speed just as the old vector processors used to ... if you don't, you can't ... :)

Wikipedia has a good article on vector processors ... and in their day they were the cat's pajamas for speeding up scientific research for those problems where the program is compatible ... but many problems are not that convenient ... so, vector processors are not the solution to all problems ... and neither will GPUs solve all problems ... but for those where it does work, you can see 60-100 times increases in speed ... the last time I ran a MW task on my CPUs it took about 4 hours ... my fastest GPUs do those same tasks in about 90 seconds today ...
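A rough way to picture "vector math" versus code that does not fit the mold is sketched below in plain Python (no GPU involved). The two functions are illustrative assumptions, not anything taken from Rosetta or MW: the first does independent element-wise work of the kind that maps well onto vector hardware, the second has each step depending on the previous one, which does not spread across lanes without restructuring.

```python
def saxpy(a, xs, ys):
    """Element-wise a*x + y: every output depends only on its own inputs,
    so all elements could in principle be computed at the same time."""
    return [a * x + y for x, y in zip(xs, ys)]


def running_recurrence(a, xs, start=0.0):
    """Each result feeds the next step, so the loop is inherently ordered."""
    out = []
    acc = start
    for x in xs:
        acc = a * acc + x
        out.append(acc)
    return out


xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]

vector_result = saxpy(2.0, xs, ys)
serial_result = running_recurrence(0.5, xs)

print(vector_result)
print(serial_result)
```

Real applications are a mix of both patterns, which is one reason some projects port to the GPU easily and others do not.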

The biggest positive to focus on is that the more projects move to use the GPU fully and release the CPU side, the more those of us that do have GPUs can free the CPU side to work on those projects that are still locked to the CPU and at the same time provide terrific support to the GPU projects. In *MY* case I don't do Collatz, DNETC, GPU Grid, or MW on my CPUs at all ... for me they are pure GPU side projects... that means I have that much more time on my CPUs for the other 30-some projects out there ... (~20-30K CS per day from my 500K-1M per day totals, see here).

There are other threads where I and others talk about this in more depth; you could page back in my posting history to find them. Search did not seem to find them (I did not try advanced search; look for GPU, CUDA, or vector as another way to locate the discussions). I think Dr. Baker also commented in one of the threads I commented in...
8) Message boards : Number crunching : OSX GPU (Message 66342)
Posted 27 May 2010 by Paul D. Buck
Post:
Unless I'm reading this wrong, is there any ETA when Rosetta will be able to utilize GPU's, specifically on a MAC?

Right now MAC GPU use is pretty much restricted to Collatz and I think EaH though the EaH application uses almost a full core as well as the GPU (GPU use is about 30%) ... which is why at this time I am solidly doing Collatz on my Mac ...

Rosetta is not likely to have a GPU application for any platform ...
9) Message boards : Number crunching : Rosetta@home in a single VM on different hosts (Message 66334)
Posted 26 May 2010 by Paul D. Buck
Post:
Thanks. And I know that BOINC offers me a lot of configuration options like "don't use in case of mouse action", which I give away by using it in a VM. But I want to test if other applications work in this environment too, which usually don't offer any of these configurations (like a cluster/grid). I hope starting the VM with nice level 19 is sufficient for this or I'll have to search for a different scheduler.

You should try it with those options disabled... BOINC is pretty good about getting out of the way of other tasks because BOINC and the science applications run at very low priorities ... the one exception is GPU class where if the kernels are not properly tuned you can get a very laggy system ... also the CPU side is run at a slightly higher priority so that the GPUs don't get starved for data because the CPU is patiently waiting for something to give it a slice ...

Even so, I run up to dual GPUs (once had a quad GPU system with two GTX295 cards in it) and have not really seen issues even with the GPUs with rare exceptions where the GPU application was not properly tuned ...
10) Message boards : Number crunching : Rosetta@home in a single VM on different hosts (Message 66328)
Posted 26 May 2010 by Paul D. Buck
Post:
...
Um, can you point to that paper on-line?
...

Paper: Optimizing Grid Site Manager Performance with Virtual Machines

At least I found out why the clients didn't upload finished jobs. After the wireshark results I suspected the network connection as possible source of error. The problem was the ca-bundle.crt file. Since shared folders do not work with links, I simply copied these files. And all the trouble because of this... Now I mount the NFS dirs directly via VM. And I finally could submit (some of) my old jobs.


I think what you are missing is that even though you have a VM you are still using the pc's cpu to run it, so if you are running Rosetta on the main pc and the VM pc too, you are still using the SAME cpu to do it.
...

I don't use the host for rosetta calculations. They are simply linux workstations for students and sometimes a second vm is started for windows courses ;). Until this point it seems stable.

Why I do this? I want to test the environment for all kind of background work, that's why I use the VM.

Cheers
Michael

Thanks for the paper ... interesting read ... and I am glad you found your issue(s) and are up and running. I would note two points however and that is:

First, BOINC tasks with rare exceptions are compute bound tasks and as such are not really addressed by the paper (at least not on my first read) and spawning multiple VMs on a single machine will not do anything to improve the performance of a task assigned to a VM machine as opposed to running it natively. Exceptions are the NCI tasks from the projects I mentioned before, where a person might create one or more VMs to run multiple copies of BOINC and these NCI projects increasing the number of tasks "in flight" and thus the "earning power" of that machine.

Second, BOINC itself is more or less nothing more than a different approach to virtualizing the "system" so that large compute bound tasks can be segmented and processed on weaker machines in a timely manner. GPU Grid which is also doing molecular modeling actually assigns tasks which are part of a larger problem so that they can get reasonable response times on their models ... by that I mean that the 6+ hour task you run is only part of the larger model ... your result creates a new task which after being processed may generate an additional generation ... MW is another project that is doing something similar... the point being, there are differing approaches to slicing up the problems ...

The paper is looking at how to better use the cluster resources to process a job mix and they sponsor a virtualization approach... BOINC looks at a large compute bound problem that cannot be economically processed on limited resources and solves the problem another way ...

No matter ... good luck with your experiments ...
11) Message boards : Number crunching : Upgrade choice? (Message 66316)
Posted 24 May 2010 by Paul D. Buck
Post:
Not to sound too pious about it, but the DnetC, MilkyWay@Home and Collatz projects aren't my cup of tea; I ran Dnet for 3+ years and only stopped because I feel my CPU time is better served helping humanity find cures for cancer and other diseases than cracking cryptography (hence going for Rosetta).

But thank you for informing me that there are projects out there to use the ATI cards! - Will have to keep my eye out for worthwhile projects when they become available for the ATI technology.


The problem is that for the gpu cards there is not currently a disease curing project yet. There has been talk about it here at Rosetta but Rosetta does not lend itself well to the differences between cpu and gpu precision. Cpu's are VERY precise while gpu's are only sort of precise, by comparison. And when you get out towards the 10th decimal point, one number can mean a whole lot 10 calculations later! And a hundred calculations later, you are not even in the same ballpark!!

The three are the only ones available as noted with GPU Grid and SaH (for AP tasks) on deck with ATI applications in alpha or Beta testing ... I have also heard rumors of one for Aqua as well (OpenCL) ... Einstein has said they are working on an OpenCL version of their application as well, though their CUDA application is nothing to write home about ...

There are a couple of reasons to add your GPU to those projects even if you are not that sold on the science (I would put them as MW, Collatz, and DNETC in order of scientific utility) and that is to build the use numbers more ... not to mention your own CS score ... :)
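On the precision point in the quoted reply above, here is a small sketch of how a tiny rounding difference can grow over repeated calculations. It iterates a textbook chaotic recurrence (the logistic map, chosen purely for illustration and unrelated to Rosetta's actual math) once in 64-bit floats and once with every step rounded to 32-bit. Rounding through `struct` only approximates single-precision hardware, so treat the numbers as indicative.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def logistic_double(x: float, r: float, steps: int) -> list:
    """Iterate x -> r*x*(1-x) in ordinary double precision."""
    out = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        out.append(x)
    return out


def logistic_single(x: float, r: float, steps: int) -> list:
    """Same recurrence, but each step's result is rounded to 32 bits."""
    out = []
    x = to_float32(x)
    r = to_float32(r)
    for _ in range(steps):
        x = to_float32(r * x * (1.0 - x))
        out.append(x)
    return out


double_run = logistic_double(0.2, 3.9, 200)
single_run = logistic_single(0.2, 3.9, 200)
gaps = [abs(d - s) for d, s in zip(double_run, single_run)]

first_step_gap = gaps[0]
largest_gap = max(gaps)
print(f"gap after 1 step:       {first_step_gap:.2e}")
print(f"largest gap in 200 steps: {largest_gap:.2f}")
```

The two runs agree closely at first and end up on entirely different values later, which is the "not even in the same ballpark" effect. Whether a given science application is this sensitive depends on its algorithm.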
12) Message boards : Number crunching : Not getting any work. (Message 66315)
Posted 24 May 2010 by Paul D. Buck
Post:
Sadly this is another of those areas where to this point the complaints of the users posting to the Alpha list have been unheeded by UCB... we have noted this "lumpy" behavior in a number of different ways including the one where if you have just reset your debts you will see BOINC go and get a lot of work from just about all the projects to the point where you do have a good buffer fill ... then, a week or so later you will note that the buffer is starting to be a little "thin" in that it does not really have a full bag of work ...

I usually keep the flag to reset the debts in my CC CONFIG file and set it to 1 and stop and restart BOINC every couple weeks ...
13) Message boards : Number crunching : Rosetta@home in a single VM on different hosts (Message 66314)
Posted 24 May 2010 by Paul D. Buck
Post:
For example I run a Boinc cpu project and a Boinc project that uses memory only and a Boinc gpu project on my pc's. On the real part of the box I crunch with all three but on the VM box I can only crunch the memory only project.

If you are speaking of FreeHAL it is not a memory only project though that is a popular misconception ... the CPU use is low as it is another NCI project like DepSPIDER (Closed) and Anansi (Inactive) or even WUProp and QCN... But, indeed the most common load people notice with FreeHAL is the memory footprint which depending on the version can be fairly high ...
14) Message boards : Number crunching : Rosetta@home in a single VM on different hosts (Message 66302)
Posted 23 May 2010 by Paul D. Buck
Post:
I would say this depends on the machinery you use and on the jobs you want to calculate. If the jobs calculate only on one core or they need not that much RAM, I would say that you could do more work with more than one VM. I read a paper on this topic.

Um, no ...

Ok, a Virtual Machine means that on a base computer you create "Virtual" environments because the environment on the base host is not what you want ... not enough memory, different processors, OS, or whatever... or you want to simulate many machines on one machine. In this case you launch multiple VMs that each "seem" to be complete computers. Into these VM you can install whatever programs you want ... Like I can run a VM on my Mac that emulates windows machines ... but to do that I will always have the overhead of the VM, however slight that might be and that eats into the CPU cycles available to do other things ...

Now, because some BOINC Windows programs may be much more efficient due to an optimized compiler it is certainly possible that I would be able to run windows in a VM and get more BOINC production but I have not tried that experiment for any number of reasons ...

But it sounds to me like you are not doing that ... you are trying to set up a common image of a computing environment and then just using that image on multiple and separate machines in lieu of just installing BOINC on the individual machines ... and to that point I still don't understand the interest ... if you install BOINC natively on each machine it will use that machines full capabilities as possibly limited by the constraints allowed by BOINC...

As to the issues I think that the simplest explanation as to why you cannot do the uploads is that in your configuration BOINC does not correctly follow or find the data directory.

But that still brings me back to the simplest question ... if you are trying to run BOINC on 5 machines why don't you want to install BOINC on each machine? How is all this buying you an improvement? Only one copy of BOINC? The disk footprint of BOINC is tiny in today's world ... the biggest part is the data directories and that can't be common for BOINC to work correctly (as BOINC is currently contrived)...

One way there might be to start up each of the VMs, install BOINC with each VM copy pointing to its own unique data directory and common BOINC directory ... save each VM with BOINC installed and go from there ... but, I think this is a misuse of VM technology to no essential gain ... you still have 5 separate installs of BOINC albeit each one is a VM that is running on one and only one machine ... meaning that you have no specific gain from creating a VM in the first place ... Because almost by definition virtually all VMs actually have fewer features than the base machines (at least with all the VMs I am aware of or have used ... a Xeon turns into a 486 for example) ...

Unless I completely misread your explanation ...

{edit}
I would say this depends on the machinery you use and on the jobs you want to calculate. If the jobs calculate only on one core or they need not that much RAM, I would say that you could do more work with more than one VM. I read a paper on this topic.

Um, can you point to that paper on-line?

Clarification, if on my windows boxes, I wanted to I could create multiple VMs, in each have a copy of BOINC running and run WUProp and FreeHAL and boost my credit scores and not do much at all ... for Rosetta tasks, running them natively on a single BOINC installation I will use all CPU cycles and memory to run the launched RaH tasks without using any memory or CPU to run the VM emulation. If I create two VM on that same machine and install and run BOINC on them to run RaH, I would get less done because of the overhead of the VM than I do now ... RaH tasks are compute bound, not I/O or memory bound (well, if you have way too little memory you will be all three as Windows thrashes the virtual memory in and out of the disk drive) ...
15) Message boards : Number crunching : Rosetta@home in a single VM on different hosts (Message 66288)
Posted 22 May 2010 by Paul D. Buck
Post:
It probably does get confused.

Now, I have been scratching my head trying to figure out why you are doing this... using multiple VMs on a single machine will allow you to have more tasks in flight at the same time but the overhead of the VMs is going to mean that ultimately you are doing less work than you would if you just had one BOINC installation...
16) Message boards : Number crunching : Division of Labor (Message 66284)
Posted 22 May 2010 by Paul D. Buck
Post:
Side note, there are three modules in BOINC that determine these issues I pontificated about. They are:

RR Sim, Resource Scheduler, and Work Fetch

RR Sim essentially models the computer and the work on it to determine if the right mix of work is on hand and about how long it will take to run. Resource Scheduler drives what gets run when and where and Work Fetch gets new work ... But, though they are conceptually distinct entities they all three interact in the determining what is done and when ...
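To give a feel for what a round-robin simulation of this kind does, here is a toy sketch. It is emphatically not BOINC's code: the task list, the one-hour time slice, and the function names are all assumptions for illustration. It steps through the work on hand in round-robin order and reports when each task would finish and which ones would miss their deadline.

```python
from collections import deque


def rr_simulate(tasks, ncpus, slice_hours=1.0):
    """Toy round-robin simulation.

    tasks: list of (name, remaining_hours, deadline_hours)
    Returns {name: projected_finish_time_in_hours}.
    """
    queue = deque((name, float(rem)) for name, rem, _ in tasks)
    finish = {}
    now = 0.0
    while queue:
        # Take up to ncpus tasks from the front of the line for this slice.
        running = [queue.popleft() for _ in range(min(ncpus, len(queue)))]
        for name, rem in running:
            if rem <= slice_hours:
                finish[name] = now + rem
            else:
                # Not done yet: back to the end of the line.
                queue.append((name, rem - slice_hours))
        now += slice_hours
    return finish


def missed_deadlines(tasks, finish):
    """Names of tasks whose projected finish is later than their deadline."""
    return sorted(name for name, _, deadline in tasks if finish[name] > deadline)


# Hypothetical work on hand: (name, remaining hours, deadline in hours).
tasks = [("A", 4, 6), ("B", 2, 3), ("C", 3, 4), ("D", 1, 10)]
finish = rr_simulate(tasks, ncpus=2)
missed = missed_deadlines(tasks, finish)

print(finish)
print("projected deadline misses:", missed)
```

In this made-up case task C is projected to miss its deadline under plain round-robin, which is the sort of result a scheduler could use to decide that C should run ahead of the others.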

Back on topic ... W02 for the last hour or two I have been watching is running a GPU Grid task on the Nvidia card and on the ATI card it has been running MW work ... and because I am not getting a full boatload of MW tasks I get 10-12 and run them off, idle the GPU, fetch more and repeat ... so every 15 minutes or so I lose 30-60 seconds of GPU work because BOINC will not pre-fetch enough MW work to prevent the GPU from running dry ... this is one of those effects I talked to ...
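For scale, the idle time described above works out as follows. This is simple arithmetic on the figures in the post (30-60 seconds lost roughly every 15 minutes), with the added assumption that the same pattern repeats around the clock.

```python
cycle_seconds = 15 * 60           # one fetch/run/idle cycle, per the post
idle_low, idle_high = 30, 60      # seconds of GPU idle per cycle, per the post

# Fraction of GPU time lost in each cycle.
frac_low = idle_low / cycle_seconds
frac_high = idle_high / cycle_seconds

# Assumption: the pattern repeats all day.
cycles_per_day = 24 * 3600 / cycle_seconds
idle_minutes_low = cycles_per_day * idle_low / 60
idle_minutes_high = cycles_per_day * idle_high / 60

print(f"idle fraction: {frac_low:.1%} to {frac_high:.1%}")
print(f"idle per day:  {idle_minutes_low:.0f} to {idle_minutes_high:.0f} minutes")
```

That is roughly 3% to 7% of the GPU's time, or about 48 to 96 minutes a day if the cycle never changes.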

It happens more rarely on the CPU side ... but as the man said in the movie "I seen it done..."
17) Message boards : Number crunching : Division of Labor (Message 66277)
Posted 22 May 2010 by Paul D. Buck
Post:
Paul -

Since I am in the process of building a new system centered on the new AMD X6 processor I am a little curious about your comment about "on quads and wider we see artifacts of the system not making optimal decisions on what to do when"

Mostly what happens is you will see BOINC moving along and then suddenly "panic" and start to run things in high priority mode... Or to run tasks that have later deadlines first out of a long list instead of running those that have the shortest deadline...

Most people will never see these things for a couple of reasons... first they don't use the BOINC Tasks page as their screen saver (as I do on my second monitor) ... or they do not run enough projects (well over 50% of users run only one project, I think the number for less than 5 covers 80-90% of all participants).

There are other odd things that happen, and they will happen with any Quad or better. Some of them are version dependent, meaning as you change versions, the oddities change some... because as frustrated as I get with UCB, on occasion they actually fix a bug or two that really makes BOINC work oddly ...

The bank teller analogy is the best I can suggest ... in the old days if you had 8 tellers, you had 8 lines and real frustration if you got behind the guy that was counting pennies ... that is why most banks use a single feeder line to feed the 8 tellers so that one long running customer does not hold up a select few that get angry at the bank ...
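The teller analogy can be put into numbers with a small sketch. Everything here is an illustrative assumption: two tellers instead of eight to keep it short, all customers arriving at once, customers picking lines alternately in the separate-line case, and one slow "penny counter" at the front.

```python
import heapq


def waits_separate_lines(service_times, tellers):
    """Each customer joins a fixed line (assigned alternately) and stays there."""
    free_at = [0.0] * tellers
    waits = []
    for i, service in enumerate(service_times):
        t = i % tellers
        waits.append(free_at[t])      # wait until that teller is free
        free_at[t] += service
    return waits


def waits_single_line(service_times, tellers):
    """One feeder line: the next customer goes to whichever teller frees up first."""
    free_at = [0.0] * tellers
    heapq.heapify(free_at)
    waits = []
    for service in service_times:
        start = heapq.heappop(free_at)
        waits.append(start)
        heapq.heappush(free_at, start + service)
    return waits


# Minutes of service needed; the first customer is the penny counter.
service_times = [30, 1, 1, 1, 1, 1]

separate = waits_separate_lines(service_times, tellers=2)
single = waits_single_line(service_times, tellers=2)

print("separate lines:", separate, "worst wait", max(separate))
print("single line:   ", single, "worst wait", max(single))
```

With separate lines, the people stuck behind the slow customer wait about half an hour; with a single feeder line, nobody waits more than a few minutes.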

One of the other more common issues with BOINC is inappropriate queue fill ... I have a queue of 1 day to tide me over Comcast outages which can occur at almost any time... they usually are short, but, I can have an outage that lasts 4-6 hours (one or two times a year) ... Ok, two immediate problems here ... with the GPU being able to run off a full load of MW tasks in about an hour that means that BOINC likely has not queued up enough work to tide me over ... mostly because UCB is wedded to the idea of GPU Strict FIFO rule ... so, BOINC does not cache and run work from multiple GPU only projects well ... (try it and watch closely, it can cache, but rarely has a balanced queue, the best way to illustrate this is attach to MW and Collatz with equal shares and watch as BOINC "lurches" between the two projects on a cycle that is as long as your cache size, one of the reasons my Collatz is higher than MW is because Collatz fills to 150 tasks and MW to only 48 and they run faster; in other words it is easier to get work from Collatz) ...

On W02 which I am watching right now I have a queue that is "filled" with tasks from a couple projects, mostly 11 tasks from RCN each listed at 25 hours ... yet half those tasks are likely to take seconds to minutes only ... because the run time is so variable (same issue on ABC and a couple other projects) ... So, BOINC thinks it has plenty of work on hand ... on my Mac it has 4 CPDN models ... same issue ... it thinks it has plenty of work on hand ... but it really doesn't ... not with 8 cores to keep busy ...
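A back-of-the-envelope sketch shows why estimated run times can make a queue look fuller than it is. The 11 tasks at 25 hours each come from the post; the 8-core host, the choice of 5 tasks as the "half" that finish early, and their 5-minute actual run time are hypothetical values picked for illustration.

```python
cores = 8                          # assumed host size
listed_hours = [25.0] * 11         # what the client sees, per the post

short_count = 5                    # hypothetical: the "half" that finish quickly
short_hours = 5 / 60               # hypothetical: 5 minutes each
long_count = len(listed_hours) - short_count
actual_hours = [short_hours] * short_count + [25.0] * long_count

# Hours of work per core, as estimated versus as it actually turns out.
est_buffer = sum(listed_hours) / cores
actual_buffer = sum(actual_hours) / cores

# Once the short tasks are gone, only the long ones are left to occupy cores.
idle_cores = max(0, cores - long_count)

print(f"estimated buffer: {est_buffer:.1f} hours per core")
print(f"actual buffer:    {actual_buffer:.1f} hours per core")
print(f"cores with nothing to run after the short tasks finish: {idle_cores}")
```

On the estimates the client sees well over a day of work per core, so it has no reason to fetch more; under these assumptions the real figure is under 19 hours and two cores sit idle within minutes.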

Anyway, it takes long hours of patient watching to see these patterns ... The best way to learn about issues like this is to watch the BOINC Alpha mailing list ...
18) Message boards : Number crunching : What is the credit used for? (Message 66260)
Posted 21 May 2010 by Paul D. Buck
Post:
No, there isn't. Bragging rights is all.

I do not understand at all what you are saying.

It also allows one to "measure" contributions to various projects and to set personal goals. Many do use it for the "mine is bigger" and "mine is faster" as well ... if you join a team you may also get involved with challenges and other races where the teams compete to do more on a project (or collection of projects) over a set period of time.

A few days of processing and you should start to show up on the stat sites as I have on BOINC Stats or Free DC and as you have. There are links on your account page that will take you to the various stat pages...

So, it is like running ... one stretches and warms up ... then starts to run ... but ... we run for a certain distance or time ... and we probably make measurements of how we did today ... then again tomorrow to see if we did better ... or worse ...

It is just one more thing to enjoy, or not, in BOINC ... some don't care at all (or at least say they don't care) and others can't seem to stop talking about their RAC or what they are going to do to raise it ...

So, if you are not into numbers, ignore credit... if you are into competition, join a team and start to race ... but, only do what you enjoy ...
19) Message boards : Number crunching : Division of Labor (Message 66259)
Posted 21 May 2010 by Paul D. Buck
Post:
The run-time spread is actually wider than Mikey's post indicates... because he was illustrating deadlines rather than run times ... On my fastest GPU a MW task takes about 90 seconds ... were I to run that same task on the CPU side it would take about 4 hours and change... yet I also run CPDN where the tasks run for about 300 hours ...
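The spread is easier to appreciate as ratios. This sketch only does arithmetic on the approximate figures given above (about 90 seconds, about 4 hours, about 300 hours), so the results are rough.

```python
gpu_mw_seconds = 90            # MW task on the fastest GPU
cpu_mw_seconds = 4 * 3600      # same MW task on the CPU ("4 hours and change")
cpdn_seconds = 300 * 3600      # a CPDN task

gpu_speedup = cpu_mw_seconds / gpu_mw_seconds      # GPU vs CPU on the same task
cpdn_vs_gpu_mw = cpdn_seconds / gpu_mw_seconds     # longest vs shortest task
cpdn_vs_cpu_mw = cpdn_seconds / cpu_mw_seconds     # CPDN vs MW, both on CPU

print(f"GPU vs CPU on MW:      about {gpu_speedup:.0f}x")
print(f"CPDN vs MW on the GPU: about {cpdn_vs_gpu_mw:.0f}x")
print(f"CPDN vs MW on the CPU: about {cpdn_vs_cpu_mw:.0f}x")
```

A scheduler on such a host has to juggle tasks whose run times differ by a factor on the order of ten thousand.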

RaH also allows you to select a run time ... so ... how do you balance all of that?

There is a complicated rule set that governs which task to run, when to stop, when to run alternative tasks etc. The problem with complicated rule sets is that the outcome is not always what one expects. An additional complication is that the developers don't quite see a blind spot they have developed about the internals of BOINC... that is that the internal operational "model" is still that of a single processing element ... they apply scaling factors to be sure to handle quad and 8 core systems ... but that is not the same thing ... so, on quads and wider we see artifacts of the system not making optimal decisions on what to do when ...

But the key is as has already been stated ... the resource share is honored over time, more or less, ... with the projects you have selected it should be more ...
20) Message boards : Number crunching : Minirosetta 2.10 (Message 65745)
Posted 15 Apr 2010 by Paul D. Buck
Post:
I woke to a crash of this task that put a pop-up dialog box on my computer that had to be dismissed. In the mean time, the task was "locked" so my computer was sub-optimal in running ... in essence I had an idle core ... un-good ...

