21)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24947)
Posted 26 Aug 2006 by Hoelder1in Post: Found one. Very interesting. So there seems to be a portion, even among the larger and medium-sized teams, that was more or less flat last week (Phoenix Rising, AMD User, to pick some examples). I also find it interesting that some of the national teams (Czech National Team, Hungary) saw considerable gains last week. There could be several reasons for that, but one that comes to mind is that these countries might have fairly large shares of Linux users...
22)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24863)
Posted 25 Aug 2006 by Hoelder1in Post: There is a nice upward trend in active users, active teams, new users per day, and new teams per day... nice to see! Look at boincstats here. :-) ...yes, Rosetta active teams are rising (by about 1%, or 20 teams, in the last couple of days) while Einstein and SETI active teams are more or less flat. I'd like to hear Occam's take on that (haven't seen him on the boards for some time, perhaps he stopped crunching ;-).
23)
Message boards :
Rosetta@home Science :
Solid answer needed, for who do we do this?
(Message 24835)
Posted 25 Aug 2006 by Hoelder1in Post: Sorry, I'm not really finding the answer. www.rosettacommons.org Does this answer your question? -H. (Linux user trying to avoid Windows whenever possible ;-)
24)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24765)
Posted 24 Aug 2006 by Hoelder1in Post: The new RAC is calculated like the old one. It's complicated and I never understood it. The description is here. I think there is an error on the wiki page you linked to: the formula given in the text does not agree with the computer code that is listed on the page. To agree with the computer code the formula should be RAC(new) = RAC(old)*d(t) + (1-d(t))*credit(new)/(t/86400), which I think makes more sense: the new RAC is a weighted mean of the old RAC and the new credit divided by the time in days since the last update, where the weighting function d(t) decays the old RAC with a half-time of one week. Well, I agree it is somewhat confusing... ;-)
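A minimal sketch of that update rule, in case anyone wants to play with it (the function and constant names are my own, and d(t) is taken to be the usual exponential decay weight with the one-week half-time quoted above):

import math

HALF_LIFE_SECONDS = 7 * 86400  # decay half-time of one week

def update_rac(rac_old, new_credit, seconds_since_last_update):
    # Weighted mean of the decayed old RAC and the new per-day credit rate,
    # matching RAC(new) = RAC(old)*d(t) + (1-d(t))*credit(new)/(t/86400).
    t = seconds_since_last_update
    d = math.exp(-t * math.log(2) / HALF_LIFE_SECONDS)  # decay weight d(t)
    credit_per_day = new_credit / (t / 86400.0)
    return rac_old * d + (1 - d) * credit_per_day

# Example: a host with RAC 100 returns 50 credits after half a day;
# 50 credits per half day is exactly 100 credits/day, so the RAC stays at 100.
print(update_rac(100.0, 50.0, 43200))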
25)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24752)
Posted 24 Aug 2006 by Hoelder1in Post: It's a bit easier to explain downturn/upturn in terms of vacations and summer travel (in the N. hemisphere!)... "Occam's Razor! Occam's Razor!" Agreed, I hadn't checked the Einstein and SETI graphs. So the upturn doesn't only happen in Rosetta. And yes, I am a big fan of Occam's Razor. ;-)
26)
Message boards :
Rosetta@home Science :
Christoph Jansen's formula of protein size and model completion times
(Message 24743)
Posted 24 Aug 2006 by Hoelder1in Post: I guess this is good news for the project, as we only need to increase the total amount of CPU power exponentially, and not the power of individual computers! On the other hand, the power of individual computers does increase exponentially with time (the 18-month doubling time of computing power described by Moore's Law)...
27)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24734)
Posted 24 Aug 2006 by Hoelder1in Post: The median would be 30, the mean with the lowest and highest observation thrown out would be 31. Straight mean is 37. I made that same suggestion some time ago, and David Kim explained that the only numbers he has available are the cumulative claimed credit and the cumulative number of returned models for each batch of WUs, so he can't calculate the median from that. Anything else would require some lengthy messing with the database. And yes, he also mentioned a correction factor to account for outliers that would need to be reviewed occasionally.
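As a quick plain-Python illustration of the three statistics being compared (the credit values below are invented, not the actual numbers from that post):

from statistics import mean, median

claimed = [25, 28, 29, 30, 31, 33, 120]   # hypothetical claimed credits, one optimized-client outlier

trimmed = sorted(claimed)[1:-1]           # throw out the lowest and highest observation

print(median(claimed))   # 30    - untouched by the outlier
print(mean(trimmed))     # 30.2  - trimmed mean, still close
print(mean(claimed))     # ~42.3 - the straight mean is pulled up by the single outlier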
28)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24707)
Posted 24 Aug 2006 by Hoelder1in Post: Yes, there has been quite a dramatic drop in active users and hosts over the last two months - not just in Rosetta but in all of BOINC. But since yesterday, active hosts and users seem to be on the rise again in Rosetta while the numbers for all of BOINC are still low. It seems people are voting with their feet (or rather their mouse and keyboard) for the new credit system. Many new users and hosts have arrived over the past day... a good sign! Though there has been a drop in active users (and about a 10% drop in active hosts).
29)
Message boards :
Rosetta@home Science :
Christoph Jansen's formula of protein size and model completion times
(Message 24630)
Posted 24 Aug 2006 by Hoelder1in Post: After all I am a chemist... I guess I was awfully bad in the chemistry lab or else I might have become a chemist. ;-)
30)
Message boards :
Rosetta@home Science :
Christoph Jansen's formula of protein size and model completion times
(Message 24593)
Posted 24 Aug 2006 by Hoelder1in Post: I copied this very interesting post by Christoph Jansen over from the Ralph forum because my comments would have been completely off-topic there: ...I have done a check of some 20 Rosetta WUs and have found out that the time to calculate one decoy is pretty much exactly proportional to the number of amino acids in the protein to the power of 1.3. I think it is reassuring that the model completion times only grow polynomially with the size of the protein, and even with such a small exponent. One might have feared that, since the size of the parameter space grows exponentially with the number of amino acids in a protein, so do the model completion times, in which case the dependence would be: c^(number of amino acids) * (n/time) = const, or more conveniently, taking logs, (number of amino acids) * log(c) + log(n/time) = const. So perhaps it is the number of required models (to reach a desired rmsd) that scales exponentially with the size of the protein, rather than the individual model completion times? Or perhaps it is not protein size but contact order (how often the chain touches itself in the folded state - I hope I am right about that) which determines how many models are needed? Well, you can't determine this from the data you have, Christoph, but I am sure the Baker lab has figured this out. These scaling laws seem to be an excellent way to test the quality of the different algorithms (imagine that with your analysis you could determine that for one particular WU type the exponent is, say, 1.15 rather than 1.3...). This is all very interesting and thought-provoking (much more so than the credit stuff)!
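For anyone who wants to reproduce Christoph's fit from their own results, here is a rough sketch: in log-log form the exponent is just the slope of a straight-line fit. The (length, seconds per decoy) pairs below are invented placeholders, not his measurements.

import math

samples = [(60, 900), (100, 1750), (150, 2950), (250, 5800)]  # (amino acids, seconds per decoy), made up

# time = c * length**k  =>  log(time) = log(c) + k * log(length)
xs = [math.log(length) for length, secs in samples]
ys = [math.log(secs) for length, secs in samples]

n = len(samples)
mx = sum(xs) / n
my = sum(ys) / n
k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

print(f"fitted exponent k = {k:.2f}")  # close to 1.3 for these placeholder numbers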
31)
Message boards :
Number crunching :
New Crediting system: questions
(Message 23981)
Posted 21 Aug 2006 by Hoelder1in Post: These are the two main issues I want resolved ... I just wanted to point out that on your machine the averaging out of the granted work credit seems to have worked quite well. So far you have done 19 WUs for which work credit was granted. For the first 9 of these the sum of the granted work credit is 261.61, for the second set of 9 WUs it is 248.07 - a difference of just 5%. Something else to note: it would require a quite detailed and lengthy analysis to establish that one type of WU grants more work credit than another. You would need at least something like 10 WUs of the same type to average out the algorithm-based randomness on the model level. These WUs should preferably be from the same host, or else you would also have to account for the different CPU speeds. Since the WU names don't get exported in the XML, this analysis would have to be done by extracting the information from the web pages. I suggest that anyone capable of doing this analysis and writing a suitable script should go to the stock market and try to identify stocks that are doing better than others. ;-) There is one other complication: if the following statement by David Kim (from a few days ago in this thread) is still valid - We may actually have the credit/model values adjust for all work units as results come in and use ralph tests to serve as a starting point. - then, once you have done your statistical analysis, the credit/model for a specific WU will likely have been adjusted by new data coming in while you did the analysis - once you have enough data to establish that one WU grants more or less credit than another, then so does the script that calculates the credit/model on the server, and the inequality will be corrected. So bottom line, I very much doubt that we will ever see anyone trying, let alone succeeding, with this trick - in which case there would still be the countermeasures that David K mentioned...
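For what it's worth, the per-WU-type comparison described above would boil down to something like this (the WU type names, credits, and hours are all invented for illustration; real data would have to be scraped from the results pages):

from collections import defaultdict
from statistics import mean

# Hypothetical (WU type, granted credit, CPU hours) records from a single host
results = [
    ("type_A", 31.2, 3.1), ("type_A", 28.9, 2.9), ("type_A", 35.0, 3.4),
    ("type_B", 30.1, 3.0), ("type_B", 33.7, 3.2),
]

per_type = defaultdict(list)
for wu_type, credit, hours in results:
    per_type[wu_type].append(credit / hours)

# Only with something like 10 results per type does the model-level
# randomness average out enough for a real difference to show.
for wu_type, rates in per_type.items():
    print(wu_type, round(mean(rates), 2), "credits/hour over", len(rates), "results")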
32)
Message boards :
Number crunching :
New Crediting system: questions
(Message 22662)
Posted 17 Aug 2006 by Hoelder1in Post: We may actually have the credit/model values adjust for all work units as results come in and use ralph tests to serve as a starting point. I think this is an excellent idea - perhaps the sample sizes on Ralph were really a little bit too small to determine reliable averages...
33)
Message boards :
Number crunching :
New Crediting system: questions
(Message 22598)
Posted 17 Aug 2006 by Hoelder1in Post: The granted vs. claimed credit is certainly inconsistent, with granted being both higher and lower, even on the same PC. I have one that granted 50% more than claimed on one WU, but less than claimed on the other. I'm not even going to try to figure it out; I'll just keep crunching. I just did a quick test on the Windows users of my team (all using the standard client). There were 9 WUs returned since Rosetta came up again, and the average of the granted credit is very, very close to the claimed credit (to within 5% or so). There is some scatter between the granted and claimed credit values: for seven of the 9 WUs the granted credit is within +/-20% of the claimed credit. Observation for those who followed the recent activities on Ralph: I think David Kim adjusted the "credit/model correction factor" some more when moving the new system to Rosetta: while in the recent Ralph tests the granted credit on average was still about 10-15% higher than the claimed credit (for Windows/standard client users), the match now seems almost perfect - well, as far as I can tell from the 9 WUs I looked at. If you want my opinion, I think David K did a great job so far. :-)
34)
Message boards :
Number crunching :
Ok... I have a challenge for all of you...
(Message 22489)
Posted 15 Aug 2006 by Hoelder1in Post: I really think we need a nice, short 200 word summary of what Rosetta does (and a slightly longer one also). There are tidbits of excellent comparisons and information spread out over the entire forum, usually as a response to a new person in the forum. After reading several such, I believe that I basically understand, but haven't seen it all in one place. Maybe I'll try to pull it together a bit later. Well, there already are several explanations (each of them in the several-hundred-word range) of what Rosetta does, linked from the homepage: "What is Rosetta@home?" (the green link directly at the top of the homepage), "Quick guide to Rosetta and its graphics" (read the first few paragraphs), "Welcome from David Baker" (the explorer analogy), the "Rosetta@home Science FAQ", and the "Research Overview" (more detailed). And if this still isn't enough, you can go to www.bakerlab.org for additional information, including the full text of their research papers. I definitely don't want to discourage anyone from producing a well written 200 word summary of what Rosetta does, but please try to first have a look at the excellent descriptions which are already there (and linked directly from the top of the homepage). If there are things in those descriptions that are hard to understand, I am sure specific questions in the forum will be answered by the Rosetta team or some of the more knowledgeable participants.
35)
Message boards :
Number crunching :
New credit system now being tested at RALPH@home
(Message 22231)
Posted 11 Aug 2006 by Hoelder1in Post: After reading through all the comments and ideas, I'd be interested in knowing exactly how many WU are actually getting "crunched" on the average day/week, etc. I think the number you are looking for is actually listed on the homepage under server status: Successes last 24h: 132,647. If I am not mistaken, this is the number of WUs crunched in the last 24 hours, but please note that a WU takes anywhere from 3 to 24 hours to complete, depending on the user's preferences. Perhaps structures per day would be a more meaningful number. If David Kim's estimate of 2 credits/decoy that he uses for his Ralph tests is approximately right, then the structures per day would be roughly 'credits last 24 hours' divided by 2, or about 2,000,000 structures per day. I hope this helps, -H.
36)
Message boards :
Number crunching :
New credit system now being tested at RALPH@home
(Message 22183)
Posted 10 Aug 2006 by Hoelder1in Post: From what little ralph data I have we can now start to get an idea about where the 2 credits/decoy is. All data displayed is from standard boinc software and no optimized apps are involved. [edit] I've separated old ralph and "ralph-new". Oh, and CC is claimed credit/hour, GC is granted credit/hour. I've highlighted Rosetta, ralph, and ralph new for easier location. Not sure why you are posting this here, Tony. Just to avoid any confusion: the 2 credits/decoy system that currently runs on Ralph is just for preliminary software tests and will never run on Rosetta, nor will it be used on Ralph after the end of the software testing phase (presumably in a couple of days or so?). See this explanation of the new credit system by Rosetta developer David Kim that was posted on the Ralph forum: The version that will eventually run on Rosetta@home will have work unit specific credit per model values that are determined from test runs on Ralph. It will be a requirement for lab members to not only test new work units on Ralph but to also determine the average credit per model value from their test runs for production runs. The credits should remain somewhat consistent with other projects since the average values will be based on the standard boinc crediting scheme. If things look okay on Ralph, Rosetta@home will use the credit per model crediting method while Ralph will switch back to the standard method.
37)
Message boards :
Number crunching :
New credit system now being tested at RALPH@home
(Message 22018)
Posted 8 Aug 2006 by Hoelder1in Post: See this explanation by David Kim, quoted from the Ralph forum: The version that will eventually run on Rosetta@home will have work unit specific credit per model values that are determined from test runs on Ralph. It will be a requirement for lab members to not only test new work units on Ralph but to also determine the average credit per model value from their test runs for production runs. The credits should remain somewhat consistent with other projects since the average values will be based on the standard boinc crediting scheme. If things look okay on Ralph, Rosetta@home will use the credit per model crediting method while Ralph will switch back to the standard method. So the two credits per structure are just for preliminary tests - no need to be concerned...
38)
Message boards :
Number crunching :
Newly built 5.4.10 optimized windows boinc client
(Message 21606)
Posted 2 Aug 2006 by Hoelder1in Post: I guess they could just attach a couple of local Baker Lab (or otherwise trusted) computers to Ralph and use those to determine the median of the claimed credit (to make sure Ralph doesn't get 'hijacked' to drive the credits/structure up). Well, the median is really quite powerful in terms of removing all kinds of differences and extremes. For instance, the less than 10% of computers running Linux can safely be ignored as they will have no effect on the median. I also suspect that any Intel/AMD differences can be pretty much ignored for the purpose of determining the median.
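A tiny simulation sketch of why a small subpopulation barely moves the median (the distributions and the size of the platform difference are made up; this is only meant to illustrate the robustness argument):

from statistics import mean, median
import random

random.seed(1)

typical = [random.gauss(30, 3) for _ in range(90)]    # 90 hosts claiming around 30 credits
minority = [random.gauss(15, 2) for _ in range(10)]   # 10 hosts claiming about half as much

print(round(median(typical), 1), round(median(typical + minority), 1))  # the median barely moves
print(round(mean(typical), 1), round(mean(typical + minority), 1))      # the mean drops by roughly 5%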
39)
Message boards :
Number crunching :
Newly built 5.4.10 optimized windows boinc client
(Message 21598)
Posted 2 Aug 2006 by Hoelder1in Post: Hoelder1in, that's a good suggestion... definitely easier to program. The first thing I think of though is, doesn't that just move the 'optimized' client into Ralph? While it wouldn't have as direct an impact as it does on Rosetta, the median would still be shifted by the various clients. If you determine average completion times on Ralph and set X credits per hour, how would you calibrate that in terms of FLOPS? You would need to know the average speed of the Ralph participants' computers for that. So, perhaps it would be better to not use Ralph at all and instead measure the completion times on a local computer (or computers) with known benchmarks? I guess they could just attach a couple of local Baker Lab (or otherwise trusted) computers to Ralph and use those to determine the median of the claimed credit (to make sure Ralph doesn't get 'hijacked' to drive the credits/structure up).
40)
Message boards :
Number crunching :
Newly built 5.4.10 optimized windows boinc client
(Message 21589)
Posted 2 Aug 2006 by Hoelder1in Post: Perhaps by taking the average of all the Ralph times and basing it around x credits per hour? As an example, if one simulation takes an average of 1.5 hours, and the standard reward is 10 credits per hour, each simulation on Rosetta would grant 15 credits. Regardless of how fast your computer is (or the version of boinc you're running), whenever you return one simulation from that WU type, you get 15 credits. It is not really necessary to talk about completion times and credits per hour on Ralph - we could just leave the current (BOINC benchmark-based) credit system on Ralph as it is, and then, for each WU type, assign the median of the claimed credits per structure on Ralph to the returned Rosetta structures. That way the new system would not have to be calibrated; the total amount of granted credit would by definition be the same as with the current system - so no issues with cross-project or FLOPS/credit calibration.
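Spelled out as a sketch, the proposal amounts to a per-WU-type lookup table built on Ralph (the WU type names and credit values below are invented, and this is only my reading of the idea, not the scheme the project actually implemented):

from statistics import median

# Hypothetical Ralph calibration data: claimed credit per structure, keyed by WU type
ralph_claims = {
    "type_A": [14.2, 15.1, 13.8, 16.0, 14.9],
    "type_B": [9.7, 10.4, 10.1, 9.9],
}

credit_per_structure = {wu: median(claims) for wu, claims in ralph_claims.items()}

def grant_credit(wu_type, structures_returned):
    # Fixed credit per structure, independent of the client's benchmarks
    return credit_per_structure[wu_type] * structures_returned

print(grant_credit("type_A", 12))  # 14.9 * 12 = 178.8 credits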