Posts by Scott Brown

1) Message boards : Rosetta@home Science : Reasons some people avoid BOINC projects (Message 12519) Posted 22 Mar 2006 by Scott Brown Post: Overzealous Moderation (Rosetta only): I stopped crunching for Rosetta some time ago because I disagreed with the overzealous nature of moderation of the message boards. While I was certainly in favor of some better organization, following moderated threads became overly tedious. More importantly, I suggested that, given the increased level of moderation, a more formal (and visible) statement of Forum policies was warranted (e.g., on the main web page, etc.). This was disregarded. I have watched these boards since and find that the moderation continues at this level, sometimes to the detriment of discussion. While checking offensive language is understandable and necessary, I remain uncomfortable with the moderation of posts even to the correction of the 'tone' of one's writing (e.g., see the "Optimized Client?" thread in the crunching forum). Having been involved with research extensively (e.g., reviewer for NSF, NIH grant work, etc.), part of my discomfort comes from a research ethics/rights of research participants standpoint. I have no idea as to the number of people that share this discomfort (though from posts on other projects I know that I am not alone), so I will not even hazard a guess at its potential effect. (FYI, none of my posts--at least to my knowledge--was ever moderated here).
2) Message boards : Number crunching : Forum Section Ideas (Message 10188) Posted 29 Jan 2006 by Scott Brown Post: The thread has been quiet now for about two days. If everyone feels that what is on this thread represents a fairly complete set of the nature of the changes requested, I will send this information off to David Kim so he can look it over. ALL of the comments from EVERY post here (pro or con), will be included in the summation. I will request that either he or Dr. Baker post information outlining what they have decided as soon as they have the opportunity to look it over. After that, I would like to un-sticky the thread. Thank you all for your contributions and ideas. I would ask that you add my suggestion (from another thread) that an overarching forum policies section be placed on the Rosetta front page (similar to that at LHC@home) to this summation.
3) Message boards : Number crunching : Removal of posts containing offensive language (Message 9932) Posted 26 Jan 2006 by Scott Brown Post: I find it somewhat ironic that the very language that you use to explain offensive language would itself be considered offensive to many. The imagery of "mates...fishing" vs. "mixed company" directly implies quite strongly that women are to be protected from some sort of language that they are not equipped to handle. I realize that the number of women participating is much lower than the number of men (based on a BOINCStats.com poll), but if some of them are offended by being portrayed as such fragile beings, are you prepared to police yourself (or is sexist language not considered offensive...it is difficult to discern when there is no general forum policies document to consult and a moderator has, to at least a minor degree, used it in a post)? I don't see anything there that is sexist to women other than the way your mind has pictured what the mod has said Exactly my point...what is offensive is defined by each individual. It has been stated earlier that the moderator should interpret what is offensive and delete accordingly. The moderator is not actually interpreting anything...without a set of guidelines (not a specific word listing, but a set of general policies for forum behavior), the moderator is simply applying his or her moral value system in the process of deleting offensive posts. While I am sure that Moderator9 (and all the other moderators) are wonderful people who are doing a very difficult job, I think it is unacceptable for this to be the moderating process. I should be clear here that I am not saying that moderators should not have any discretion. Instead, they should be interpreting a set of forum policies that are explicitly provided to all forum users. 
This will provide more consistency in the moderating process and will provide forum users with a clear set of expectations (which is especially needed given the international composition of any BOINC-based project). That said, I think this is the last I will say about this. It appears from the response that few agree with just about any of my points, so I guess I will just drop it.
4) Message boards : Number crunching : Removal of posts containing offensive language (Message 9928) Posted 26 Jan 2006 by Scott Brown Post: ... I will remove posts based on language content that would be unacceptable in polite mixed company usage. ... That wording was intended to conjure an image, for each of us to use as a guide under which I will conduct my moderation duties. For most people that image is not of the sports pub, and it is not a bunch of mates out on a fishing trip. These are not normally considered polite mixed company gatherings. I find it somewhat ironic that the very language that you use to explain offensive language would itself be considered offensive to many. The imagery of "mates...fishing" vs. "mixed company" directly implies quite strongly that women are to be protected from some sort of language that they are not equipped to handle. I realize that the number of women participating is much lower than the number of men (based on a BOINCStats.com poll), but if some of them are offended by being portrayed as such fragile beings, are you prepared to police yourself (or is sexist language not considered offensive...it is difficult to discern when there is no general forum policies document to consult and a moderator has, to at least a minor degree, used it in a post)?
5) Message boards : Number crunching : Removal of posts containing offensive language (Message 9923) Posted 26 Jan 2006 by Scott Brown Post: @Moderator9 Thank you...it is an annoying job to moderate, and you have my respectful appreciation for your efforts. That said, I still maintain that, if such active moderation is to be undertaken, then a forum rules or forum policies link on the main page (similar to that at LHC@home) would be a good idea. (I guess you might also add this to your list of 6 points in the forum improvement thread).
6) Message boards : Number crunching : Removal of posts containing offensive language (Message 9808) Posted 25 Jan 2006 by Scott Brown Post: To ask for a precise list is asking the infinite as I have proved already here by using colloquial terminology detrimentally. Every swear word in every language would be hard to list so I think it's best left to interpretation of the Mod and if necessary an appeal to him from the offended. I think you may be misinterpreting my intent (probably due to my quote of Moderator9's "overly specific" terminology). I am not suggesting that precision is the issue (meaning a precise listing of terms), but rather that the project needs to provide clarity regarding the policy. For example, LHC@home provides a very explicit forum policy that is clearly noted on the main webpage. A sticky post in the forum is simply not displayed with the appropriate visibility. I also find it completely unacceptable that this be left entirely to each moderator's discretion (e.g., would each moderator agree without an overarching policy to follow?; what are the explicit sanctions for violation--or continued violation--of these rules?; etc.).
7) Message boards : Number crunching : Removal of posts containing offensive language (Message 9727) Posted 24 Jan 2006 by Scott Brown Post: Given the international flavor of all BOINC projects including Rosetta, you may want to be very explicit in your definition of offensive language. For example, you might use phrasing such as "...profanity and/or obscenity as defined using American English will be removed..." or something similar (I list American English given the location of the Rosetta project). I'd also assume that racial slurs and other such terms would be considered offensive. Since "offensive language" is open to interpretation, cultural differences, etc., I think it is necessary that you be "overly specific".
8) Message boards : Number crunching : Forum Section Ideas (Message 9707) Posted 24 Jan 2006 by Scott Brown Post: @Paul Buck Well said...and I agree. It is not a perfect forum, but no change is really needed (or at the most a modest modification similar to Einstein). @David Kim or David Baker In other posts, ideas for forum modification have been requested. I think it would be helpful to know if you intend to make such changes here and maintain them 'by hand' through BOINC upgrades as Paul Buck has noted, or is it your intention to use these suggestions to motivate BOINC (specifically Dr A) to change the baseline code using your influence as a project administrator (rather than direct requests/suggestions by users on the BOINC forums at boinc.berkeley.edu)?
9) Message boards : Number crunching : How to have the best BOINC project. (Message 6316) Posted 15 Dec 2005 by Scott Brown Post: Nice list Bill, although I agree that you haven't emphasized communication enough. Also in response to your (and Paul Buck's) comments about EAH, as a university professor myself I'd like to point out that we are at the end of the semester (in the U.S. at least). This means that someone like Bruce Allen is probably swamped with various administrative tasks before the holiday break (e.g., exams, etc.). A short drop-off in the communication frequency should be expected (as well as with some of the other U.S. university-based projects).
10) Message boards : Number crunching : code release and redundancy (Message 5559) Posted 8 Dec 2005 by Scott Brown Post: In the meantime, since it's all mysql based, couldn't you just run a few queries that pick up the average credit per WU and pinpoint the results that are abnormally high? I like this idea, but it also follows a very basic set of statistical assumptions that may be problematic. First, this can only be done on workunits for which all results are completed. On incomplete results, the downward bias produced by faster machines (noted below) would result in an artificially low threshold for what is defined as too high. Second, and more importantly, such a query assumes that we would know what the distribution of results should look like. If everything were truly random, we would likely end up with a Normal or t distribution (basically bell shaped). However, without either strong theoretical reasons for identifying the nature of the distribution or substantial empirical evidence regarding what the underlying distribution is (and assuming that this does not vary by type of workunit), I am not sure that a true cutoff point could be clearly identified (for example, such a cutoff would be very difficult indeed if the distribution is bi-modal, etc.). Furthermore, a question might also arise regarding exceptionally low claimed credits. For example, (assuming a Normal distribution) if one were to throw out all results at 3+ standard deviations above the mean, would one then also be obligated to do something with results claiming more than 3 standard deviations less than the mean or median of the distribution? This also brings up a final point: all work units are bounded by zero credit claimed, which means that the distribution of claims will necessarily vary as the mean claim moves away from zero (though the distribution will stabilize as claims reach a point sufficiently far from zero to nullify this effect).
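To make the cutoff concern above concrete, here is a minimal Python sketch of a "mean plus 3 standard deviations" rule. The function name `flag_high_claims` and all of the credit values are hypothetical and invented for illustration; this is not the project's actual query. It only shows that the same rule that catches an extreme claim in a bell-shaped sample can miss a suspicious claim when the claims fall into two clusters.

```python
import statistics

def flag_high_claims(claims, k=3.0):
    """Return (flagged, cutoff): claims more than k sample SDs above the mean."""
    mean = statistics.mean(claims)
    sd = statistics.stdev(claims)
    cutoff = mean + k * sd
    return [c for c in claims if c > cutoff], cutoff

# Hypothetical claims for a completed workunit type: one tight cluster
# around 20 credits plus a single extreme claim of 95.
unimodal = [20.0 + 0.1 * i for i in range(-10, 10)] + [95.0]

# Hypothetical bi-modal case: slow hosts claim about 10, fast hosts about 40,
# and one claim of 60 sits well above both clusters.
bimodal = (
    [10.0 + 0.1 * i for i in range(-10, 10)]
    + [40.0 + 0.1 * i for i in range(-10, 10)]
    + [60.0]
)

flagged_uni, cutoff_uni = flag_high_claims(unimodal)
flagged_bi, cutoff_bi = flag_high_claims(bimodal)

print("unimodal flagged:", flagged_uni)  # the 95.0 claim is caught
print("bimodal flagged:", flagged_bi)    # empty: the two clusters inflate the SD
```

In the bi-modal sample the gap between the two clusters dominates the standard deviation, so the cutoff lands far above the 60-credit claim. A fixed rule of this kind is only as good as the assumption about the shape of the distribution.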
11) Message boards : Number crunching : code release and redundancy (Message 5366) Posted 7 Dec 2005 by Scott Brown Post: If I recall correctly you can finish a model and the total credit was 5,700 and change. When they run the background jobs this is raised to 6,805.26 urp! ah that is the credit for the 72nd trickle ... And PrimeGrid is using only 1 result, but, variable credit. Paul, Would I be correct in assuming that PrimeGrid has fewer issues with variability in credit claims because it is more reliant on integer rather than floating-point calculations?
12) Message boards : Number crunching : code release and redundancy (Message 5364) Posted 7 Dec 2005 by Scott Brown Post: Pseudo-Redundancy That said, I still think pseudo-redundancy has a lot going for it. For all of the folks worrying about accuracy, the quorum sizes would be in the thousands. The issues about being paired with a Mac user, and the like would disappear. You wouldn't be getting "average" credit anymore than with any other quorum system. A credit amount would be assigned based on the median, but the rate at which you gained credit would depend on the precise speed of your boxes relative to this average. I am very concerned that the statistical issues involved in this discussion are not being fully understood. First, I must agree with Bill and Paul that lack of clarity regarding how, and more importantly when, the distribution of results would be assessed for assigning credit is fundamentally problematic. There are no clear criteria being proposed for this, and the potentially excessive time involved to obtain credit is likely to turn substantial numbers of users away. More importantly, the assumptions underlying this approach (particularly that unusually high and low credit claims are temporally spread via a random process) are clearly untrue. Across projects, we have observed that faster machines return units more quickly and (if not optimized) claim less credit. The opposite holds for slower machines. Optimized BOINC clients complicate this picture, but since we have no reason to suspect that faster or slower machines are more likely to use optimization, the effect should not be significant. Thus, the granted credit from these distributions will necessarily more heavily reflect the results returned by the fastest machines (and therefore, is likely to be downwardly biased). I would also point out that it has been incorrectly claimed below that this process would reduce "noise" in the results. 
Noise (or error, statistically) is the variation around a measure of central tendency (e.g., mean, median, mode, a regression parameter, etc.). Collecting thousands of results does nothing to reduce the noise in a distribution; it simply increases the accuracy by which characteristics of that distribution are estimated. In other words, it does not matter whether one uses 4, 10, or 100 units for redundancy, the actual spread of the true distribution always remains the same (i.e., this procedure does nothing to reduce the vast spread in claims based on benchmarking which is the fundamental problem--see Ingleside's posts below). Given the fundamental biases created by faster machines likely dominating the credit determining results and issues with optimized BOINC clients, the effect of a "pseudo-redundancy"-only approach is at best to mute these differences (and Paul is also right here--this needs a better name). I believe that you are too quick to dismiss FLOPs counting and to view the two options as mutually exclusive. In many ways, the complementary application of the two alternatives solves the inherent problems in both. FLOPs counting seems to reduce the fundamental problem of a large variance in claimed credit, while "pseudo-redundancy" provides more accurate estimates of distributional parameters that allow for the detection of and credit denial for claims that are likely the result of overt code manipulation.
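The distinction drawn above between the spread of claims and the precision of an estimate can be illustrated with a small simulation. This is a sketch under stated assumptions: claims are drawn from a Normal distribution with a made-up mean of 20 credits and standard deviation of 5, and the function name `spread_and_precision` is hypothetical. Real credit claims need not be Normal.

```python
import math
import random
import statistics

def spread_and_precision(n, true_mean=20.0, true_sd=5.0, seed=0):
    """Simulate n claims; return (sample SD, standard error of the mean)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    claims = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    sd = statistics.stdev(claims)   # spread of the individual claims ("noise")
    se = sd / math.sqrt(n)          # uncertainty in the estimated mean
    return sd, se

for n in (4, 10, 100, 10000):
    sd, se = spread_and_precision(n)
    print(f"n={n:>5}  sample SD={sd:5.2f}  standard error={se:6.3f}")
```

As the quorum grows, the sample standard deviation settles near the assumed value of 5 rather than shrinking, while the standard error of the mean falls roughly as one over the square root of n. More results sharpen the estimate of the distribution; they do not narrow the distribution itself.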
13) Message boards : Number crunching : The cheating thread (Message 4625) Posted 28 Nov 2005 by Scott Brown Post: Since there is no rule against using an optimized BOINC client, it cannot be called cheating. Funny. First I never said anything against optimized clients... Funny, that part of my comment was clearly directed at the thread rather than you directly (unlike my second comment). @Stephen Be very careful what you choose to post publicly... First of all, that's nonsense. Second, Poorboy was right, I tried to denounce cheating and now I'm 'made to be the bad guy'. Third, I'm getting tired of those patronizing, ominous comments about what I can or cannot say on those boards. Sorry, I wasn't clear here. I did not mean a standard forum posting, but rather was commenting on your proposed "list" creation. Such a public listing would be subject to more serious legal scrutiny. More importantly, my comment was merely a warning as I did not want to see you or the project in hot water. It sure as hell wasn't (and was not intended to be) patronizing! Finally, let's lock/delete/ignore this thread... Done.
14) Message boards : Number crunching : The cheating thread (Message 4566) Posted 28 Nov 2005 by Scott Brown Post: Well, just a quick point...cheating is, by definition, doing something that is expressly forbidden (e.g., copying someone else's test answers, using someone else's writing as your own, etc.). Since there is no rule against using an optimized BOINC client, it cannot be called cheating. @Stephen Be very careful what you choose to post publicly regarding any specific claims of cheating. Those individuals, given the 'public' nature of such a posting, could take legal recourse against both you and the project (if it allowed the posting).
15) Message boards : Number crunching : So is it cheating or not? (Message 3488) Posted 17 Nov 2005 by Scott Brown Post: Yes, the benchmarking process is flawed, but to say there is no reason to run an optimized BOINC client is incorrect. It's a useful workaround until the credit process is overhauled. I think you didn't get my point. Since the benchmarking system is fundamentally flawed and unfair, it makes no difference whether optimized clients are allowed or not in the sense of system flaws and system fairness. Not only did I not say that there was "no reason to run an optimized BOINC client", I actually use an optimized client (TRUX's 5.3.1) because I give a substantial resource share to an optimized SETI.
16) Message boards : Number crunching : So is it cheating or not? (Message 3397) Posted 16 Nov 2005 by Scott Brown Post: It's not fair. It doesn't matter to me if you "mainly run seti" or not. Let's face it, there's only a few of us that even have to be concerned about being in the top 10 (i.e., real competition), so it shouldn't be a big deal, but a level playing ground would be nice. A level playing field would be nice, but since when has this ever been the case with BOINC? I think those individuals with LINUX and sometimes MAC boxes would have liked to have fairness from day one. The simple truth (as Bill noted below) is that the benchmarking for credit idea is the problem, and a level field will never occur until the crediting system is changed!
17) Message boards : Number crunching : code release and redundancy (Message 2811) Posted 10 Nov 2005 by Scott Brown Post: I agree with most here regarding concerns over completely releasing code into the 'wild'. However, we should not forget (as Paul briefly mentioned), the benefits that have come from optimized SETI code. Quite simply, optimized clients have produced up to double throughput levels for some machines. Given Rosetta's inherent need for increased CPU power, such increased throughput is fundamentally important. Thus, I would suggest that a modified open release occur. Specifically, release the code into the wild with a standard test workunit available. Optimized clients could then be created and submitted back to Rosetta for approval. Rosetta would then, upon approval, provide the optimized client through the official website only. Put an official Rosetta signature on these clients such that any unsigned clients would not be validated. This does not address the concerns with cheating, however. Cheating by gaming benchmarks (or FLOP counts, etc.) would still be possible since such measures are produced by the BOINC core which could be unofficially optimized to return absurdly high values for these. I can see only two solutions to this problem: 1) Rosetta could lobby/demand that the UCB staff create a similar 'official stamping' process for optimized BOINC cores or 2) What Paul said...redundancy is required.
18) Message boards : Number crunching : Is it time to verify Results (Message 2626) Posted 8 Nov 2005 by Scott Brown Post: But cheating can still be a significant problem. If credits are my only (or main) motivation, then why wouldn't I push the limits in this project? I could (as Janus suggested) artificially generate high energy scores that would never be checked (based on David's post below that only low energy units are reexamined). Add in some optimized routines and, voila, I can generate credit as fast as I desire. The problem here is one that those of us from the old SETI@Home Classic days remember well. I would also suggest that David's "lottery ticket" analogy isn't quite correct. As is plainly clear in his post, maximizing resources is a premium here. In the analogy of lost lottery tickets, no opportunity costs are considered. Since lottery tickets have monetary expense, the loss is not irrelevant. For the project, David's logic applies only to the individual workunit--if it is lost, another randomly targeted unit will indeed have equal likelihood of obtaining a useful result. The problem with the analogy occurs at the project level. Given finite computing resources, lost workunits cost the project (both indirectly in the sense of donated computing time that is essentially unused and directly in how those lost workunits load the project's infrastructure). Thus, at some compositional threshold, a redundancy factor of 2 (as Janus suggested) actually does complete more useful work than the non-redundant runs. If the lost donated time were the only issue, then the threshold could simply be computed as 50% of the users or hosts. However, the difficulty lies in calculating that threshold given the inherent complexity of computing the infrastructure load. This additional loss necessarily means that the actual threshold lies below 50% of users/hosts (and could perhaps be a very small percentage). 
Thus, I would argue, a minimum level of redundancy is the best route.
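One way to read the threshold argument above is as a simple break-even calculation. The symbols below are assumptions introduced for illustration and do not appear in the original post: $N$ is the number of results the donated computing can produce, $p$ is the fraction of results that are lost or useless, and $c \ge 0$ is the extra infrastructure cost of each lost result, measured in result-equivalents. The sketch also assumes, as a simplification, that a redundancy factor of 2 yields $N/2$ usable workunits.

```latex
\begin{align*}
  \text{useful work, no redundancy:} \quad & W_1 = (1 - p)\,N - c\,p\,N \\
  \text{useful work, redundancy of 2:} \quad & W_2 = \tfrac{1}{2}\,N \\
  W_2 > W_1 \iff \tfrac{1}{2} > 1 - p\,(1 + c)
  \iff{} & p > \frac{1}{2\,(1 + c)}
\end{align*}
```

With $c = 0$ (lost donated time is the only cost) the break-even point is $p = 1/2$, the 50% figure mentioned above. Any positive infrastructure cost pushes the threshold below 50%, and the larger $c$ is, the smaller the fraction of bad results needed before redundancy pays off.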
19) Message boards : Number crunching : Rosetta crashes on pausing (Message 1215) Posted 10 Oct 2005 by Scott Brown Post: As for myself I already had my Preferences set to Leave In Memory when I joined the Rosetta Project. So it should have Propagated across to it when I Attached to the Project. I also checked later on to make sure it was showing to Leave In Memory here at this Projects Preferences ... Are you attached to other projects? If so, you need to make sure that the prefs are set to leave in memory at all of them. Otherwise, your machine will alternate between settings as it contacts the separate projects (had this happen to me when I first joined SZTAKI and forgot to switch the default pref).