Sanity check: Do buggy work units cast doubt on all results?

Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80013 - Posted: 4 May 2016, 0:20:29 UTC

This is basically a fork from a thread about apparent teething problems with Ubuntu 16.04:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6820

As mentioned in that fork, I'd also been watching an obviously sick work unit running on my Mac. This morning, after 36 days and 10 hours, it finally reached 100%. It then went to the ready to report status, but did not get reported, even though several other work units were completed and reported past it. Then it decided to start again. From 0%. Apparently 36 days of work lost. After it ran for a short time, it went to the computation error status, and later on it finally disappeared. Good riddance.

Still, it does raise the question of how much any of the Rosetta results should be trusted. (*1)

In related news, I decided to check the credit status for the 36 days and 10 hours of work. Though I had made a note of the work unit's identity, I couldn't do it. (*2) It is perhaps reasonable that it had been purged from the machine's list, though in searching for it, I noticed that the large majority of the Mac's work units had been granted less credit than requested, often much less, while only a few received more, and only slightly more in those cases. This seems to be quite typical for all of my machines. (I could not see the posted credit information from the old thread, perhaps due to a permission problem if it was internal to someone else's account.)

My suggestion on the credit thing to increase motivation and reduce frustration is that the request should never be more than the grant. The request should always be the minimum, and if the unit is ineligible for credit, even if that's because of bugs in the Rosetta code, then the request should become zero.

It might be nice if there were a way to see why more credit was granted, but I think granting less credit should be avoided on principle.

*1: I'm reminded of an old incident, possibly involving Dijkstra, in which a major theorem was proven by a computer program, but when the program was examined, it was determined to contain bugs. Possibly one of the failed attempts on the 4-color theorem? (Obviously my memories are getting a bit fuzzy these years, but if I needed the details, I bet I could find them on the Web...)

*2: My usual work unit search is by machines, but perhaps there is a better approach from the work unit ID. Therefore I'll note that it was P37194_DF03843_1-150_EN_MAP_hyb_cst_v02_K (according to a handwritten note).
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Profile Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 80014 - Posted: 4 May 2016, 0:49:49 UTC
Last modified: 4 May 2016, 0:57:23 UTC

AFAIK, the rosetta credit system goes something like this:
Requested credit = time * synthetic benchmark run by BOINC.
Granted credit = number of models generated per WU (i.e. real work).

I reduce the allocated time per WU to something like 4 hours in order to avoid "losing time" running "faulty" WUs.
You're pulling about 6k of RAC, which is pretty high among R@H contributors.

Just set your allocated time per WU to 4hrs. Then set "leave tasks in memory while suspended" under BOINC settings, and of course, give enough RAM (in % of total; at least 500MB per WU you're planning to run), and you should be good to go.
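
As a rough check of the RAM figure above, the percentage to allow can be worked out from the number of WUs you plan to run. This is only a sketch: the 500 MB per WU is the minimum quoted in this post, while the function name and the example machine (8 GB, 4 WUs) are made up for illustration.

```python
import math

def min_ram_percent(total_ram_mb: int, tasks: int, per_task_mb: int = 500) -> int:
    """Smallest whole percentage of total RAM that covers `tasks` work units.

    per_task_mb=500 is the "at least 500MB per WU" rule of thumb from the post,
    not an official Rosetta@home requirement.
    """
    if total_ram_mb <= 0 or tasks < 0:
        raise ValueError("total_ram_mb must be positive and tasks non-negative")
    needed_mb = tasks * per_task_mb
    # Round up so the allowance is never short, and cap at 100%.
    return min(100, math.ceil(100 * needed_mb / total_ram_mb))

# Hypothetical machine: 8 GB of RAM, 4 WUs running at once.
print(min_ram_percent(8192, 4))
```

If the result is close to 100, the machine probably cannot run that many WUs comfortably alongside other programs.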
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80016 - Posted: 4 May 2016, 2:22:19 UTC

The actual compute resource required varies slightly for each model (and sometimes more than slightly) computed within a work unit. But there are basically two types of systems in the world, those that produce more work results than their performance running the BOINC benchmark would predict, and those that produce less. By your report, your machine fairly consistently is granted less credit than it claims. This indicates that while your machine is able to really scream on the BOINC benchmark, when it does actual R@h work, it is not outperforming.

The "claimed" credit comes from the BOINC Manager on each host. I believe you are trying to say a machine should never be "granted" less credit than it claims, but that is exactly why this customized credit system was developed in the first place. When you've got tens of thousands of people out there, some of them try to cheat the system. They modified their results to claim dramatically inflated amounts of credit. So the current system was developed to make the credit YOU are granted based on what your peers are reporting for how difficult it is to produce each model... rather than how difficult YOU report they are. Your report (fabricated or not) is still taken into account and rolled into the average used for granting future credit, but it was enough to remove the incentives that were in place to spoof the credit system.
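
To make the mechanism concrete, here is a minimal sketch of the idea as described in this post: each host's claim feeds a running average of claimed credit per model, and the grant is computed from that average rather than from the host's own claim. The class name, the plain arithmetic mean, the first-reporter fallback, and all the numbers are assumptions for illustration; the real R@h server code may average differently.

```python
class ModelCreditPool:
    """Running average of claimed credit per model for one batch of work units.

    Hypothetical sketch: the real server-side averaging is not documented here.
    """

    def __init__(self) -> None:
        self.total_claimed = 0.0
        self.total_models = 0

    def average_per_model(self) -> float:
        return self.total_claimed / self.total_models if self.total_models else 0.0

    def report(self, claimed_credit: float, models: int) -> float:
        """Record one result and return the credit granted for it."""
        if self.total_models == 0:
            # First reporter: no peer data yet, so fall back to its own claim.
            granted = claimed_credit
        else:
            # Grant is based on what peers claimed per model so far.
            granted = self.average_per_model() * models
        # The claim (honest or not) is rolled into the average for future grants.
        self.total_claimed += claimed_credit
        self.total_models += models
        return granted


pool = ModelCreditPool()
for _ in range(99):
    pool.report(claimed_credit=100.0, models=10)  # honest hosts: 10 credits per model

# One host claims ten times as much for the same 10 models.
inflated_grant = pool.report(claimed_credit=1_000.0, models=10)
print(inflated_grant)             # based on the peers' average, not the inflated claim
print(pool.average_per_model())   # the inflated claim is diluted across all reporters
```

In this toy version, inflating a claim changes the host's own grant not at all and moves the shared average only slightly, which is the incentive change described above.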

Here at R@h we refer to those days as the "credit wars". People were calling each other cheaters and credit whores. Others were insisting their machine should be granted credit based on what the manufacturer told them it could do (obviously "could do" when running something other than R@h work). It was very ugly. The system of issuing credit currently in place has very much created a system where credits granted are meaningful. You can't simply edit a file on your machine and receive credits.
Rosetta Moderator: Mod.Sense
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 80023 - Posted: 5 May 2016, 15:37:40 UTC - in response to Message 80013.  

As mentioned in that fork, I'd also been watching an obviously sick work unit running on my Mac. This morning, after 36 days and 10 hours, it finally reached 100%. It then went to the ready to report status, but did not get reported, even though several other work units were completed and reported past it. Then it decided to start again. From 0%. Apparently 36 days of work lost.

Apparently 36 days of doing nothing lost. If there's actually some work being done, the max. CPU time used for a WU is your "Target CPU run time" + 4 hours, so whatever you have set there, no more than 28 hours.
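
The run-time cap described here is simple arithmetic. A sketch, assuming the 4-hour grace period stated above and a hypothetical helper name; the actual watchdog lives inside the Rosetta application and may check other conditions as well:

```python
WATCHDOG_GRACE_HOURS = 4  # grace period quoted in this post

def watchdog_should_kill(cpu_hours: float, target_run_time_hours: float) -> bool:
    """True once a task's CPU time exceeds the target run time plus the grace period."""
    return cpu_hours > target_run_time_hours + WATCHDOG_GRACE_HOURS

# With a 24-hour target (implied by the 28-hour figure above) the cap is
# 28 CPU hours, so a task with 36 days of real CPU time would be far past it.
print(watchdog_should_kill(cpu_hours=27.5, target_run_time_hours=24))
print(watchdog_should_kill(cpu_hours=36 * 24, target_run_time_hours=24))
```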

Next time this happens, simply restart the WU (by restarting the BOINC client, for example). Waiting over a month after the deadline is nonsense; you could have checked the CPU usage, and then you would have seen that the WU was not doing anything.
.
Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80027 - Posted: 6 May 2016, 6:02:20 UTC - in response to Message 80023.  

As mentioned in that fork, I'd also been watching an obviously sick work unit running on my Mac. This morning, after 36 days and 10 hours, it finally reached 100%. It then went to the ready to report status, but did not get reported, even though several other work units were completed and reported past it. Then it decided to start again. From 0%. Apparently 36 days of work lost.

Apparently 36 days of doing nothing lost. If there's actually some work being done, the max. CPU time used for a WU is your "Target CPU run time" + 4 hours, so whatever you have set there, no more than 28 hours.

Next time this happens, simply restart the WU (by restarting the BOINC client, for example). Waiting over a month after the deadline is nonsense; you could have checked the CPU usage, and then you would have seen that the WU was not doing anything.


According to the CPU information I could see at the time, the work unit was doing something, using 100% of its own cycles, but the effective rate of progress was obviously quite low. My understanding is that the work allocation is only estimated, and your mileage may vary depending on the data.

My intuition was that this work unit should fail on some kind of sanity check, and yet it never did. It was also a kind of test of the stability of the Mac. (First one I've actually owned.)

On the other topic of excessive enthusiasm, I certainly appreciate that problem. It was really nuts in the first seti@home (pre-BOINC) days. However, what I was trying to suggest was that the claim algorithm should ask at the bottom end of the possible range, so that people don't see the results in a negative way and the grant always looks "generous". However, my new theory is that they 'cured' the excessive-enthusiasm problem by making the rules of the credit game too difficult for the gamesters to play.

The most important aspect did not seem to be addressed in the comments here (or elsewhere on the website). The Rosetta projects do not seem to act like they are well programmed. I've commented in more detail elsewhere about the computation errors, though I didn't note that my own view is that well-written programs do not normally terminate in an error state. Any normal outcome that you understand should not be an "error", but should have some other description. The erratic checkpointing behavior has caught my attention before, though I think it should properly be the BOINC Manager's responsibility to recognize reasonably normal shutdowns and save the current machine states. The deadlines are just confusing and apparently pointless, and I've never understood why volunteers should care or be penalized for any of that, but as a clumsy design decision, it still reflects on the reliability of the code...

At what point are the results themselves cast into question? If we are supposed to be supporting research, perhaps we're just doing rough first-pass calculations, and everything that seems significant has to be recoded more carefully for the production runs...
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80031 - Posted: 6 May 2016, 14:06:42 UTC - in response to Message 80027.  
Last modified: 6 May 2016, 14:10:15 UTC

At what point are the results themselves cast into question?


The process is designed in a way that assures the answer to that question is "no, there is no point where doubt is cast over the results". Progress and results are not reliant upon any one host, user or work unit performing as expected; i.e., the system is extremely fault-tolerant.

At a high level, to study a protein there may be 10,000-100,000 models crunched, with various constraints and assumptions pursued. The results will show about 10 models of interest that have low energy. These 10 can easily be verified and are sometimes used as another starting point for designing additional runs to further explore the possibilities more similar to the initial successes.
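
A minimal sketch of why a handful of bad or missing work units does not change the outcome, assuming each returned model is reduced to an (id, energy) pair and that selection simply keeps the lowest-energy models. The energies are random placeholders rather than Rosetta scores, and `lowest_energy_models` is a made-up name:

```python
import heapq
import random

def lowest_energy_models(models, keep=10):
    """Return the `keep` models with the lowest energy from (model_id, energy) pairs."""
    return heapq.nsmallest(keep, models, key=lambda m: m[1])

rng = random.Random(42)  # fixed seed so the sketch is repeatable
models = [(i, rng.gauss(0.0, 1.0)) for i in range(10_000)]
best = lowest_energy_models(models)

# Pretend 100 of the 10,000 models never came back (faulty work units).
lost = set(rng.sample(range(10_000), 100))
survivors = [m for m in models if m[0] not in lost]
best_after = lowest_energy_models(survivors)

best_ids = {m[0] for m in best}
overlap = len(best_ids & {m[0] for m in best_after})
print(f"{overlap} of the original 10 lowest-energy models are still selected")
```

Any lost model that happened to be among the best is replaced by the next-lowest candidate, so the set of models of interest degrades gracefully instead of failing outright.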
Rosetta Moderator: Mod.Sense
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 80034 - Posted: 6 May 2016, 16:03:28 UTC - in response to Message 80027.  

According to the CPU information I could see at the time, the work unit was doing something, using 100% of its own cycles, but the effective rate of progress was obviously quite low. My understanding is that the work allocation is only estimated, and your mileage may vary depending on the data.

Hmm... this is strange, the watchdog should kill it if the CPU time is more than 4 hours over your preference.
.
Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80060 - Posted: 10 May 2016, 1:21:44 UTC - in response to Message 80031.  
Last modified: 10 May 2016, 1:22:46 UTC

At what point are the results themselves cast into question?


The process is designed in a way that assures the answer to that question is "no, there is no point where doubt is cast over the results". Progress and results are not reliant upon any one host, user or work unit performing as expected; i.e., the system is extremely fault-tolerant.

At a high level, to study a protein there may be 10,000-100,000 models crunched, with various constraints and assumptions pursued. The results will show about 10 models of interest that have low energy. These 10 can easily be verified and are sometimes used as another starting point for designing additional runs to further explore the possibilities more similar to the initial successes.


Your [Mod.Sense's] first paragraph is certainly an interesting interpretation of what I wrote. Perhaps I confused the issue with the largely incidental reference to personal credit/points? If the software is flawed, it does not matter how many computers it runs on. From what I can see at the bottom of the trench, the quality assurance is failing to reassure--but that seems to be about par these days.

Returning to the credit question, why not put the minimal bandage on it? If the claim is higher than the grant, forget about the claim and just show the grant. (By the way, based on the latest "clarifications", I now think the reduced credit is not related to the deadlines. Which returns to the old question of why bother us with deadlines? Or perhaps I should be more curious about the occasional grants of extra credit?)
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80065 - Posted: 10 May 2016, 13:33:51 UTC

The deadlines are used by the BOINC Manager to prioritize processing of work from various projects, with various runtime requirements. I've also noticed R@h has been sending small numbers of tasks with 2-day deadlines, rather than the normal 10 or 14 days. They are doing some runs for their entries in CASP, and these require a faster turnaround than the normal research that is underway.
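
As a rough illustration of deadline-based prioritization, the sketch below simply orders queued tasks by earliest deadline. This is an assumption made for illustration: the real BOINC client scheduler weighs more factors (such as resource shares and estimated run times), and the task names are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Task:
    name: str           # hypothetical task names, not real work units
    project: str
    deadline: datetime

def run_order(tasks):
    """Earliest-deadline-first ordering of the queued tasks."""
    return sorted(tasks, key=lambda t: t.deadline)

received = datetime(2016, 5, 10, 12, 0)
queue = [
    Task("normal_research_a", "R@h", received + timedelta(days=14)),
    Task("casp_run_b", "R@h", received + timedelta(days=2)),
    Task("normal_research_c", "R@h", received + timedelta(days=10)),
]
for task in run_order(queue):
    print(task.name, task.deadline.date())
```

Under this simplified rule the 2-day CASP task jumps ahead of the normal 10- and 14-day tasks, which is the faster turnaround the project is after.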

The reason for describing the insulation from specific tasks that may not run properly is that you seem to be talking about a very small number of tasks that ran for weeks, or looked to have problems in other ways. I know that overall, the science being done on R@h is proving to deliver better structure predictions than the rest of the scientific community of the world. This is proven every two years by the results from CASP. So, I know the science is progressing, even if some of the 100,000 uncontrolled client host environments show some quirks from time-to-time.
Rosetta Moderator: Mod.Sense
Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80077 - Posted: 12 May 2016, 5:17:23 UTC - in response to Message 80065.  

The deadlines are used by the BOINC Manager to prioritize processing of work from various projects, with various runtime requirements. I've also noticed R@h has been sending small numbers of tasks with 2-day deadlines, rather than the normal 10 or 14 days. They are doing some runs for their entries in CASP, and these require a faster turnaround than the normal research that is underway.

The reason for describing the insulation from specific tasks that may not run properly is that you seem to be talking about a very small number of tasks that ran for weeks, or looked to have problems in other ways. I know that overall, the science being done on R@h is proving to deliver better structure predictions than the rest of the scientific community of the world. This is proven every two years by the results from CASP. So, I know the science is progressing, even if some of the 100,000 uncontrolled client host environments show some quirks from time-to-time.


I don't know why we seem to be having so much trouble communicating here. Writing as someone who was a professional technical editor of research papers for many years, I hope that any papers you [Mod.Sense] write for publication are easier to understand.

I am NOT saying that there is anything wrong with your prioritizing the work units in ANY way that advances the science, or for that matter for your own amusement or for any other reason. What I am trying and repeatedly failing to say is that it looks like my efforts to help you are partly wasted.

Even if you are tossing away many of the donated results, you don't have to do it in a discouraging way. Actually, I think you should use the late results to double-check the timely ones, but that's again becoming a peripheral question. More central is the again discouraging appearance of "granted" credit that is much smaller than "requested" credit, with the suggestion that the donor has somehow earned a penalty that need not exist.

By your replies, even if they are unclear, I think you are sincere and sort of appreciate my contributions (along with the contributions of so many other people), but I think my own are about to go WAY down. I've decided to stop making the moderate efforts required to complete work units in a "timely" and relatively efficient manner.

I can provide details of the situation, but I probably don't understand it anyway. The short summary is that my employer wants to save electricity, and by returning to the default settings almost none of your work units will be completed on time during the day, when I use a computer for longer periods. At home, I use more computers, but not for long periods. I have been trying to keep them running for long enough to make progress on Rosetta projects, but I'm going to stop doing so. It will be interesting to see how much my daily contribution declines.

Having said that, I suspect my new status is more similar to the bulk of your estimated 100,000 machines. Perhaps many of them are also making reduced and inefficient contributions due to your default settings?
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,460,131
RAC: 16,617
Message 80078 - Posted: 12 May 2016, 14:57:11 UTC - in response to Message 80077.  

I think that any Rosetta job started before the deadline should receive the full credit it would have received, whether or not the results are useful to the researcher. BOINC will not start a job after the deadline, and I don't think the burden should be on the cruncher to monitor these cases. I can think of no reason why this would be a problem.

I used to think (before a discussion with Greg_BE) that the CLAIMED versus GRANTED mechanism was confusing and unnecessary. Now that I think I understand what they are doing, the CLAIMED/GRANTED numbers have some value. They help identify an overloaded machine, since Rosetta is easily disrupted.

There are 3 possible relationships between these two numbers:

CLAIMED is
1. greater than the GRANTED value, which annoys people.
2. equal to the GRANTED value, which is expected.
3. less than the GRANTED value, which probably pleases people.


What appears to be happening when CLAIMED is greater than GRANTED is that the machine is running other programs that are substantially disruptive to the Rosetta job. The Rosetta work makes slower progress than expected. This is the case where the machine was able to run the small BOINC benchmarks more efficiently than the Rosetta job.


EXAMPLE of what I was thinking:
ASSUME:
Suppose you have 2 identical machines, that run the BOINC benchmarks exactly the same.
Suppose both machines are assigned the exact same Rosetta job.
Suppose the Rosetta job completes before the TARGET RUN TIME expires (this implies you can look at compute TIME to describe a fixed amount of work for this example).



CASE #1
Suppose both machines are IDLE other than the Rosetta job.
Both machines will finish the job in the same amount of time. Both will CLAIM the same amount of credit and be GRANTED the same amount of credit.
If the Rosetta CLAIM/GRANTED system worked perfectly, I would expect the CLAIMED would equal the GRANTED. Actually, I would not be surprised if the GRANTED credits exceeded the CLAIMED because few machines run a Rosetta job on an IDLE system ... where computing is most efficient.



CASE #2
Suppose one machine is IDLE and the other machine is BUSY with other jobs.
The IDLE machine will complete the Rosetta job faster than the BUSY machine because other jobs will interfere with Rosetta execution, and I think:
The BUSY machine will CLAIM more credit than the IDLE machine will CLAIM since the BUSY machine thinks it did more work.
The BUSY machine will be GRANTED less credit than CLAIMED.
The IDLE machine will (typically) be GRANTED more credit than CLAIMED rather than the "ideal" case where the CLAIMED would be GRANTED.

I say "typically" because the average machine is not running Rosetta AS efficiently as it does the benchmark code.
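
CASE #2 can be put into numbers. The sketch below assumes claimed credit is CPU time multiplied by a benchmark-derived rate and granted credit is models produced multiplied by a per-model value, as described earlier in the thread; every figure (rate, per-model credit, CPU hours) is invented for illustration.

```python
def claimed_credit(cpu_hours: float, benchmark_rate: float) -> float:
    """Claim = CPU time x credits per hour implied by the BOINC benchmark."""
    return cpu_hours * benchmark_rate

def granted_credit(models: int, credit_per_model: float) -> float:
    """Grant = models produced x the project's current credit per model."""
    return models * credit_per_model

BENCHMARK_RATE = 10.0     # credits per CPU hour; identical machines, same benchmark
CREDIT_PER_MODEL = 2.25   # assumed peer-derived value per model
MODELS_IN_JOB = 20        # the same job runs on both machines

# Assumed CPU hours: the BUSY machine needs longer for the same job.
for label, cpu_hours in (("IDLE", 4.0), ("BUSY", 5.0)):
    claim = claimed_credit(cpu_hours, BENCHMARK_RATE)
    grant = granted_credit(MODELS_IN_JOB, CREDIT_PER_MODEL)
    relation = "more than" if grant > claim else "less than" if grant < claim else "equal to"
    print(f"{label}: claimed {claim:.1f}, granted {grant:.1f} (granted is {relation} claimed)")
```

With these made-up figures the grant is identical for both machines because the work is identical; only the claim moves with the time taken, which is why a large gap between the two hints at an overloaded host.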





Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 80079 - Posted: 12 May 2016, 18:15:05 UTC - in response to Message 80077.  

The short summary is that my employer wants to save electricity, and by returning to the default settings almost none of your work units will be completed on time during the day, when I use a computer for longer periods. At home, I use more computers, but not for long periods. I have been trying to keep them running for long enough to make progress on Rosetta projects, but I'm going to stop doing so. It will be interesting to see how much my daily contribution declines.

Simply decrease the target CPU run time to whatever seems suitable to the new rules, enable "Leave applications in memory while suspended" in BOINC Manager, and allow computing when in use. And possibly reduce your WU cache.

I don't see any problem there; even a computer which is powered on for just one hour a day should be able to return results in time as long as you choose appropriate settings.
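
A back-of-the-envelope check of this claim, assuming a task needs roughly its target CPU run time to finish, the host computes a fixed number of hours per day, and no progress is lost between sessions. The helper name and the example targets are illustrative; only the 10-day deadline comes from this thread.

```python
import math

def days_to_finish(target_run_time_hours: float, compute_hours_per_day: float) -> int:
    """Whole days needed to accumulate the target CPU run time."""
    if compute_hours_per_day <= 0:
        raise ValueError("the host must compute for some time each day")
    return math.ceil(target_run_time_hours / compute_hours_per_day)

DEADLINE_DAYS = 10  # the shorter of the "normal" deadlines mentioned in this thread

for target in (4, 8, 24):
    days = days_to_finish(target, compute_hours_per_day=1)
    verdict = "in time" if days <= DEADLINE_DAYS else "too late"
    print(f"target {target} h at 1 h/day: {days} days ({verdict})")
```

This only covers a single task at a time; a large WU cache means later tasks start their countdown while earlier ones are still running, which is why a smaller cache helps.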
.
Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80086 - Posted: 16 May 2016, 1:32:07 UTC - in response to Message 80079.  

The short summary is that my employer wants to save electricity, and by returning to the default settings almost none of your work units will be completed on time during the day, when I use a computer for longer periods. At home, I use more computers, but not for long periods. I have been trying to keep them running for long enough to make progress on Rosetta projects, but I'm going to stop doing so. It will be interesting to see how much my daily contribution declines.

Simply decrease the target CPU run time to whatever seems suitable to the new rules, enable "Leave applications in memory while suspended" in BOINC Manager, and allow computing when in use. And possibly reduce your WU cache.

I don't see any problem there; even a computer which is powered on for just one hour a day should be able to return results in time as long as you choose appropriate settings.


Not allowed to suspend machines at work due to security considerations.

I also want to acknowledge rjs5's interesting and detailed explanations, but my secondary focus is on simplicity, especially from the perspective of the volunteers who are trying to donate work units.

My primary focus these days is on better use of my time, so I guess from that perspective my primary concern is with the amount of time BOINC-related and Rosetta-related distractions consume. The obvious conclusion is that I should run along, but I'll try to remember to drop by in a few days.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Sid Celery

Joined: 11 Feb 08
Posts: 1990
Credit: 38,536,805
RAC: 15,887
Message 80087 - Posted: 16 May 2016, 3:33:27 UTC - in response to Message 80086.  

The short summary is that my employer wants to save electricity, and by returning to the default settings almost none of your work units will be completed on time during the day, when I use a computer for longer periods. At home, I use more computers, but not for long periods. I have been trying to keep them running for long enough to make progress on Rosetta projects, but I'm going to stop doing so. It will be interesting to see how much my daily contribution declines.

Simply decrease the target CPU run time to whatever seems suitable to the new rules, enable "Leave applications in memory while suspended" in BOINC Manager, and allow computing when in use. And possibly reduce your WU cache.

I don't see any problem there; even a computer which is powered on for just one hour a day should be able to return results in time as long as you choose appropriate settings.

Not allowed to suspend machines at work due to security considerations.

You've misunderstood what's being said. The comment doesn't refer to suspending the whole computer, but refers to Boinc suspending processing of tasks when it exceeds the default parameters set up within it.

Default Boinc parameters are plain daft. Unadjusted, I'd be surprised if they allowed anything to be completed, which is kind of what you're reporting in the first instance.

In addition, by not ticking "Leave non-GPU tasks in memory while suspended" in Boinc's "Computing Preferences" under the Options menu (Disk and Memory tab) there are known issues with tasks constantly reverting to a previous checkpoint.

So tick it, and cast your eye over the other settings on that tab to ensure they make sense and don't impose further /restrictions/ on runtime, and hopefully it'll help /prevent/ tasks from being suspended, which appears to tie in with what you're asking.

Hope that helps.




©2024 University of Washington
https://www.bakerlab.org