Posts by uioped1

1) Message boards : Number crunching : R@H preferences missing? (Message 35537)
Posted 26 Jan 2007 by uioped1
Post:
It seems my preferences were reset to the default: 25% share, 3-hour runtime, and no localized (work/home) settings. Has anyone else seen this?
2) Message boards : Rosetta@home Science : Lowest energy structure and distance to true structure (Message 28987)
Posted 6 Oct 2006 by uioped1
Post:

2. I noticed that the lowest energy structure and the lowest distance to true structure are never the same. Does this mean that natural protein folding does not pursue the lowest energy? What does R@H pursue then? How can one possibly choose which prediction is the most accurate among the many that come in, if the lowest-energy criterion is not the right one?


If Rosetta worked perfectly, the lowest energy structure and the lowest RMSD structure would match.


One thing to add, based on some discussions of the R@H algorithm from quite some time ago:

The RMSD of a predicted structure can only be calculated when the actual structure is already known. The Rosetta application does not use RMSD in any part of its heuristics.

However, one result of the R@H effort has been to develop a means to estimate the RMSD of a particular set of models for a protein that R@H has returned, based on their similarity and how they relate to the other models that have been returned. So, because the project knows how many results were returned, and how similar, say, the best 1/10th of 1% of the results are, they can say "Here's the lowest energy structure we have predicted AND it's within X RMSD of the actual structure!" That was a pretty major advance by itself.
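The flavor of that estimation step can be sketched very roughly in code. This is a toy illustration only, not the project's actual method; the function name, the top-fraction cutoff, and the idea of scoring agreement among the lowest-energy models are all my own simplifications:

```python
def confidence_from_clustering(energies, pairwise_rmsd, top_frac=0.001, cutoff=3.0):
    # Toy sketch (NOT the project's actual method): rank models by energy,
    # keep the best fraction, and use how tightly those models cluster
    # (pairwise RMSD below a cutoff) as a proxy for prediction confidence.
    n = len(energies)
    k = max(2, int(n * top_frac))
    best = sorted(range(n), key=lambda i: energies[i])[:k]
    pairs = [(i, j) for idx, i in enumerate(best) for j in best[idx + 1:]]
    agree = sum(1 for i, j in pairs if pairwise_rmsd[i][j] < cutoff)
    return agree / len(pairs)
```

The intuition is just that when many independently found low-energy models agree with each other, the lowest-energy one is more likely to be near the true structure.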
3) Message boards : Number crunching : Rosetta for Intel Mac Mini? (Message 14285)
Posted 21 Apr 2006 by uioped1
Post:
For the project staff
Please see these instructions for a discussion of the support for the mac+intel platform.
Please note that this does not require compiling a new binary version; you can use the existing mac+ppc binary which will run in emulation mode.


Project instructions

To create the Mac/Intel database entry, add the following to your project.xml file

<platform>
<name>i686-apple-darwin</name>
<user_friendly_name>Mac OS X (Intel)</user_friendly_name>
</platform>

and run xadd. Add new application versions using update_versions.


Of course compiling a native Intel binary would be beneficial for performance, but this would at least get people up and running in the meantime...

I hope this is helpful!
4) Message boards : Number crunching : Validation (not for credits, but for scientific reasons) (Message 14075)
Posted 18 Apr 2006 by uioped1
Post:
Although it takes a while to crunch a result, testing whether the result is valid can be done quickly. Thus, the developers are able to test the validity of all results returned to them. That is why they don't need redundancy.


This would be true if the problem were NP-complete. I strongly suspect that results are not verifiable in polynomial time (in the case of problems where we don't know the correct answer ahead of time; I don't think we've run any like that).

Perhaps someone from the project could verify that statement?

Also, to paraphrase what some others have posted, including the moderator: from a redundancy standpoint, we are all essentially running the same WUs when we're working on the same job. It is true, as I think BennyRop was trying to say, that someone could fraudulently claim to have 'found' the best decoy by working from the known result that we've been sending out for RMSD calculations. However, when it comes to real scientific applications of Rosetta this won't be a problem, because we obviously won't be sending that info out (it's what we'll be trying to calculate), and, as the moderator points out, there's nothing we can really do about it anyway.

Finally, with regard to trust of the data: as was pointed out over in the science journal some time ago, I think they have developed a method of taking the results with the lowest energies, determining which of those have the lowest RMSD, and even estimating what that RMSD is, thus ensuring that the results are scientifically useful.
5) Message boards : Number crunching : Possible to delete a host? (not merging) (Message 13573)
Posted 12 Apr 2006 by uioped1
Post:
DK and I are looking to upgrade the DB server within the month (likely sooner, but sh*t happens) which should allow for host merging again.

As to host deleting, I'm not sure, but anecdotally I had a host that required weeks and weeks before all the WUs were flushed and I was able to delete it...


It looks like deleting a host has actually been disabled as well. See my hosts for a couple of hosts that have no results. You'll just have to take my word for it when I say that deleting them is not an option.

Normally the last line in the host's details table is "click to:", which has the options of deleting or merging the hosts. The entire line is missing, rather than just the merge-host link. It's possible that this is an easy fix...
6) Message boards : Number crunching : Any objections to reducing the maximum run time to 12-16 hours? (Message 12604)
Posted 24 Mar 2006 by uioped1
Post:
I'm running one machine on dial-up, and I've set it to 48-hour runtimes; it is just so much easier for me, uploading one job and downloading one job every two days, instead of babysitting the machine daily for each network run. That machine would probably just get shut down permanently again if the run times go down.


If you set your cache size to connect every two days, your BOINC client should download enough work for two days, regardless of whether your runtime is set to 48 hours or 12...
(Of course, this is after your machine has adjusted to the new runtime of your workunits after you make changes.)
7) Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal (Message 11738)
Posted 6 Mar 2006 by uioped1
Post:
The homolog insight is truly a leap forward. That is the sort of insight that comes along rarely and can be quite beneficial to search problems of this magnitude! Congratulations!
I wonder whether the homologs are required to be evolutionarily related, or if we could generate homologs that would serve the same purpose for the search. This might have the advantage of strengthening the value of the results (whereas using the RMSD as a heuristic would invalidate them).

In answer to hugo the hermit:
[quote]In this work unit, I have the minimum RMSD by the look of it, but too high an energy. I know that you had both the minimum energy and the minimum RMSD in the stats. I was wondering why the RMSD matters at all; should a protein be both the smallest it can be and have the least energy? Or should it just have the least energy? Is RMSD just a shortcut to working out the energy?[/quote]

RMSD is a measure of the difference between two foldings of a specific protein; I'd venture an educated guess that it stands for "Root Mean Square Deviation." For the runs we've done thus far, we've known the natural folding, so we can calculate the RMSD and use it to evaluate how well the algorithm works. For the application to be useful, however, it can't look at the RMSD to decide how to fold the protein, because we can't calculate it for proteins whose natural structures are unknown. I'm not a chemist, so I can't state this with 100% certainty, but I think that the natural structures will have the lowest energies possible; thus, in some sense, we are trying to get to where we can use the energy of a structure as an approximation of the RMSD.
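For the curious, the RMSD itself is straightforward to compute. A minimal sketch, assuming the two conformations' atoms are already matched up and superimposed (real comparisons first align the structures, e.g. with the Kabsch algorithm; that step is omitted here):

```python
import math

def rmsd(coords_a, coords_b):
    # Root Mean Square Deviation between two conformations of the same
    # protein, given matched lists of (x, y, z) atom coordinates. The
    # structures are assumed to be pre-aligned; real comparisons first
    # superimpose them (e.g. via the Kabsch algorithm).
    assert len(coords_a) == len(coords_b)
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

Identical structures give an RMSD of 0, and the value grows as atoms deviate from their counterparts.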
8) Message boards : Rosetta@home Science : Science News (Message 11568)
Posted 2 Mar 2006 by uioped1
Post:

What if you made the name a clickable link that linked to your robetta server?

WOW! That is an excellent idea! I had no idea where to look up that info before. I'd still like to see an updated English description.

I have to respectfully disagree with some earlier statements saying that lack of information about the project has been preventing participation in it. I think the responsiveness of late has been outstanding, and as the app stands now it is better than many other projects. Basically, anything you do on this front is preaching to the choir: you're attracting people who already know about the project. Of course I would love to have more info about what is going on, but at a certain point I have to say I'm a CS guy, not a biologist, so I have no idea what the info contained in the above database means...

I can't say that I know with certainty what will get the crunchers out in force for your project, but one thing that I would like to see is more platform-optimized apps. I know that someone mentioned that a lot of work had been put into optimizing the algorithm, but more can be done by using a good optimizing compiler for a specific architecture. I recognize that you're hampered by the details the BOINC architecture provides, and that you'd have to implement redundancy to prevent cheating (or errors) if you released it open source, cancelling out anything you'd stand to gain from increased crunching. Thus I'd suggest this solution:

Compile a few versions that might see the most benefit; maybe for P4 (SSE3 or SSE2), Pentium M, and one for AthlonXP/P3 on Windows and Linux. And maybe two for the newer Macs. Mainly, do the ones that would see the most benefit. You could release these as executables with an app_info.xml.
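For reference, the anonymous-platform mechanism those executables would use is driven by an app_info.xml in the project directory. A rough sketch of its shape; the application name, filename, and version number below are made up for illustration:

```xml
<app_info>
    <app>
        <name>rosetta</name>
    </app>
    <file_info>
        <name>rosetta_4.82_windows_intelx86_sse3.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>rosetta</app_name>
        <version_num>482</version_num>
        <file_ref>
            <file_name>rosetta_4.82_windows_intelx86_sse3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

With a file like this in place, the client reports itself as "anonymous platform" and runs the locally supplied binary instead of downloading the project's stock one.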

But finally, there's no substitute for partnerships/advertisement. I'm thinking about what the BBC did with CPDN recently; they managed to attract quite a few users who would never have heard about it otherwise.

Keep up the good work!
9) Message boards : Number crunching : Issues with 4.82 (Message 11392)
Posted 25 Feb 2006 by uioped1
Post:



See if this post helps you at all.



That post was definitely helpful. I think a new FAQ entry or stickied thread specifically about the effects you will see before the BOINC clients adjust to the new units would be useful. What I posted here was an attempt, but I haven't worked through my queue yet, so I don't even know that it's correct. The points I'm specifically referring to are:

- This is a temporary problem.
- It can be mitigated temporarily by doing X.
- It will happen every time you download workunits after increasing proc time.
- Aborting all your units won't fix the problem until you have completed at least a few of the new units.


Please correct me if I've got something wrong, I'd hate to be disseminating wrong info.
10) Message boards : Rosetta@home Science : Algorithm Discussion (Message 11345)
Posted 24 Feb 2006 by uioped1
Post:
Thanks for the quick response. (And thanks to the moderator for moving me to the correct location.)

This is certainly an interesting computational, as well as scientific, problem.

Some more questions that your responses brought up for me:

You mentioned two phases to the search:
The low resolution model found in the first part of the search has to be pretty close to the correct answer to "smack the nail on the head".

It appears to me that the units I'm currently running are not using a separate full-atom relax stage. Are we using the full-atom model for the whole search now, or are we splitting that up between two types of workunits? (Or do I just not know what is really going on with my workunits?)

How small an RMSD is required for the results to be scientifically interesting, outside of the computational advance? I mean, for a protein with an unknown structure, would it be useful to the scientists to know that "this structure is 95% likely to have an RMSD below 3"?

I look forward to hearing more as you discover features of the search space and make refinements to your algorithm.
11) Message boards : Number crunching : scheduling and target time working correctly? (Message 11344)
Posted 24 Feb 2006 by uioped1
Post:
...
As I mentioned, I am already doing that. My point is that I would like to let them run at their defaults for better science, but I cannot, because I am being overloaded by BOINC/Rosetta scheduling.



EDIT: I don't think you are really understanding my issue. Rosetta is downloading more work units than it can complete by the deadline without disabling the other project that works on this machine (it is dual core).


I was also having this problem. The error arises because the 'base' workunit time is set when the workunit is created, not when you download it and apply your preferences to it. So, the first time you download a workunit, the scheduler sees that the last time you ran a workunit of this type it ran in X seconds, and assumes you will continue to do so when calculating how many to give you.

It doesn't take very long for the system to figure out that something's changed and the workunits are now going to take a lot longer to complete. On my system, it was just two WUs. So, if you let your system run in EDF mode for a while, it will get straightened out.
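My mental model of that sizing behavior, as a toy calculation (the function and parameter names here are mine, not BOINC's actual internals):

```python
def workunits_to_send(requested_seconds, base_wu_seconds, correction_factor):
    # Toy model of the sizing behavior described above (names are mine, not
    # BOINC's actual internals): the server sizes a work request using the
    # estimate from when the workunit was *created*, scaled by a per-host
    # correction factor learned from completed results.
    estimated = base_wu_seconds * correction_factor
    return max(1, round(requested_seconds / estimated))

# Before any long results come back, the factor still reflects the old
# 2-hour runtime, so a 48-hour request fetches far too much work:
too_many = workunits_to_send(48 * 3600, 2 * 3600, 1.0)      # 24 workunits
# After a couple of 8-hour results, the factor adjusts (~4x) and the
# same request is sized correctly:
right_sized = workunits_to_send(48 * 3600, 2 * 3600, 4.0)   # 6 workunits
```

This matches the observed behavior: the over-fetch is a one-time transient that disappears once a couple of the longer results have been completed.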

If you're worried that some of the WUs downloaded are going to go over deadline, you can temporarily set your proc time down to two hours, and only set it back when you're about to start the last two or three of your downloaded WUs. Or just abort some of them.

Hope that helps.

[edit] all this is my best guess at reverse-engineering how the system works, so some of the specific details might be wrong. Please correct those bits, someone.[/edit]
12) Message boards : Number crunching : Issues with 4.82 (Message 11297)
Posted 24 Feb 2006 by uioped1
Post:

Not an error exactly, but a complaint: I just got my first batch of the new 4.82 workunits, which will take vastly more time than BOINC requested. I'm not clear on how the scheduler decides how many workunits will fulfill a request for x seconds of work, but apparently this was not adjusted for the new search mode. (I requested 48 hours of work and received possibly 120.)



You can adjust the run length of the WUs in your preferences. See this post in the Rosetta FAQs for details.


Ah, I hadn't realized that you could do that for units already downloaded. Still, that wouldn't have fixed the problem I experienced. At best, I can set my requested time back to two hours, and try to set it back up before I start my last result so that the time correction factor is right for next time.

Thanks for your help.

13) Message boards : Rosetta@home Science : Algorithm Discussion (Message 11275)
Posted 23 Feb 2006 by uioped1
Post:
I missed that one; thanks.

I was more wanting greater detail about how the current crop of workunit styles approaches the task of paring down and navigating the search space.

In case someone in the know wanders by, here are a few questions just to get the ball rolling: (Note, I ended up asking a lot below. Feel free to take the low-hanging fruit, at least as a start.)

First off, as I understand it, the app performs a few runs of a stochastic local search initiated at random (except for the 'barcode' units?) on the search space, using simulated annealing and some heuristics in the form of known likely positions for subsequences (except for the 'random frag' units?). Is that approach correct?
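As a point of reference for the discussion, here is a generic simulated-annealing skeleton of the kind described above. It is a toy illustration on a 1-D function, not Rosetta's actual implementation:

```python
import math
import random

def anneal(energy, neighbor, x0, t0=1.0, cooling=0.995, steps=5000, seed=1):
    # Generic simulated-annealing skeleton (a toy illustration, NOT Rosetta's
    # actual implementation): always accept downhill moves, and accept uphill
    # moves with probability exp(-dE/T), where the temperature T slowly cools.
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        ey = energy(y)
        de = ey - e
        if de <= 0 or rng.random() < math.exp(-de / t):
            x, e = y, ey
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e

# Usage: minimize a bumpy 1-D "energy landscape" starting far from the minimum.
def bumpy(x):
    return x * x + 3 * math.sin(5 * x)

best_x, best_e = anneal(bumpy, lambda x, rng: x + rng.gauss(0, 0.5), x0=4.0)
```

The early high-temperature phase lets the search hop over local minima (the "bumps"), which is exactly the property that makes the landscape questions below interesting.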

I was wondering: how much do we know about the search space? I'm not a molecular biologist or chemist, so I really have no clue how things would pan out or how much is known. Similarly, how good are our heuristics?

Extending from Paul's questions in the other thread, what is the "granularity" and coverage of the initial random trajectories of these workunits, and how does that compare with the average travel of a search trajectory?
O.K., so I made up "granularity" and "travel." What I mean is: what's the minimum spacing between trajectories, and how frequently will two trajectories overlap if they start near each other?

How prevalent are local minima, and how steep is the approach to the known results? What I mean is, using our algorithm, how right would your random guess have to be in order to smack the nail on the head and get the real folding via our annealing? How does that compare to the granularity I mentioned above?

Any answers would be read with interest, but I realize I've asked a lot more than I intended to. I don't expect the in-depth answer to everything right here, right now. I know that most everyone with answers is also likely to be fairly busy :)

Thanks in advance;
-alex.

[edit]Maybe this one would be better for the FAQ, but what do the names of the workunits mean?[/edit]
14) Message boards : Number crunching : Issues with 4.82 (Message 11273)
Posted 23 Feb 2006 by uioped1
Post:
The increased frequency of problems with version 4.82 is, we think, probably due to the increased average work unit run time. If a significant fraction of your work units are having problems, please reduce the target run time to two hours (the default is currently 8 hours); this should reduce the chance of an error during the run by a factor of four. We will also reduce the default target time to four hours. On RALPH we didn't see these problems, probably because the default time was set to one hour so we could get test results back quickly.


Not an error exactly, but a complaint: I just got my first batch of the new 4.82 workunits, which will take vastly more time than BOINC requested. I'm not clear on how the scheduler decides how many workunits will fulfill a request for x seconds of work, but apparently this was not adjusted for the new search mode. (I requested 48 hours of work and received possibly 120.)
15) Message boards : Rosetta@home Science : Algorithm Discussion (Message 11267)
Posted 23 Feb 2006 by uioped1
Post:
Can anyone point me to a discussion of the search algorithm R@H uses? A quick search led me to the random-number-generator discussion initiated by Mr. Buck, which got bogged down in minutiae rather than really getting anywhere.
I'm looking for something aimed at programmers, but something's better than nothing.






©2024 University of Washington
https://www.bakerlab.org