Improvements to Rosetta@home based on user feedback

Message boards : Number crunching : Improvements to Rosetta@home based on user feedback


Profile UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 15893 - Posted: 11 May 2006, 6:31:57 UTC - in response to Message 15859.  

I would like to know when AMS support will be made available on RALPH & Rosetta; this would mean upgrading the server to enable support for the AMS.

The majority of projects have already upgraded, and only a small minority have not yet done so, this project & RALPH being among them.

So please do this upgrade ASAP!

Thanks
Dave

While certainly circumstances may dictate otherwise over time, I would not expect to see any server upgrades during CASP without a very compelling reason.


Well, my only reason is that the AMS simplifies all BOINC processes for users: from one location they can see all the various projects and what they do, sign up to a project automatically with a couple of clicks, alter settings, and much more. As the AMS is the next big thing to arrive for BOINC, it would be sensible to upgrade to allow users to use it. We don't want to get left behind the other projects.
Join us in Chat (see the forum) Click the Sig


Join UBT
ID: 15893 · Rating: 0
Aglarond
Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 16024 - Posted: 12 May 2006, 11:55:25 UTC
Last modified: 12 May 2006, 11:55:59 UTC

I agree with Dave. Account Managers are the new feature in BOINC 5.4.9, and they are also mentioned on the main BOINC site. There is already one available on BOINCstats, and another (GridRepublic) is currently in alpha testing. I think a lot of users will start to use them in the coming months.

Aglarond
ID: 16024 · Rating: 0
Aglarond
Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 16057 - Posted: 12 May 2006, 15:28:47 UTC

I posted this idea in another thread, but now I think it belongs here:
What about kindly asking Akos Fekete, who made the optimizations on Einstein, if he could look at Rosetta and try to make optimizations here? It may be necessary to pay him for his effort, though, as he is rather busy.
ID: 16057 · Rating: 0
senatoralex85
Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16084 - Posted: 12 May 2006, 19:16:13 UTC

I think it is great to see this project's enthusiasm towards CASP. May I suggest changing the deadlines to maybe a week or so during this experiment? I also think this would be appropriate since checkpointing has met with such success thus far.
ID: 16084 · Rating: 0
Profile Kerwin
Joined: 19 Sep 05
Posts: 10
Credit: 1,773,393
RAC: 0
Message 16173 - Posted: 13 May 2006, 16:56:38 UTC - in response to Message 16057.  
Last modified: 13 May 2006, 16:59:08 UTC

Aglarond, I agree with you. I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired him as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him. Because of his S41.06 client, my crunch time for a 'long' Einstein unit now hovers in the 45-48 minute range, compared to about 2 hours 45 minutes or more with the standard client. If he could bring his magic here now, I think it would be very good considering CASP has started.

I posted this idea in another thread, but now I think it belongs here:
What about kindly asking Akos Fekete, who made the optimizations on Einstein, if he could look at Rosetta and try to make optimizations here? It may be necessary to pay him for his effort, though, as he is rather busy.



ID: 16173 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16181 - Posted: 13 May 2006, 17:41:01 UTC - in response to Message 16173.  

Aglarond, I agree with you. I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired him as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him. Because of his S41.06 client, my crunch time for a 'long' Einstein unit now hovers in the 45-48 minute range, compared to about 2 hours 45 minutes or more with the standard client. If he could bring his magic here now, I think it would be very good considering CASP has started.

I posted this idea in another thread, but now I think it belongs here:
What about kindly asking Akos Fekete, who made the optimizations on Einstein, if he could look at Rosetta and try to make optimizations here? It may be necessary to pay him for his effort, though, as he is rather busy.



I have suggested to Dr. Baker (more than once) that he consider having one or more of the prominent "optimizers" assist the project by looking over the code. There is movement on this front, but there are issues that impact this process, and they cannot be ignored.

The very nature of Rosetta work requires frequent changes in the basic approach of the code. As a result there is not a single stable code set to optimize. SETI, Einstein, and a lot of other projects do not have to deal with this issue: unlike Rosetta, their approach to the work and their code are fairly stable.

The code is not publicly released at this time. You have to remember that this is a COMPUTING RESEARCH project. The code is the actual focus of much of the research. It will be published (released), but that will occur in the normal course of publishing the research, because the code IS the research. While this model does not fit the BOINC norm for code release on many projects, it fits the science research model perfectly.

If you look at SETI and Einstein, they are actually doing the same thing. The focus of their research is pulsars and finding ET, and they have not published those results yet. But the code they are using is published because it is not the focus of their efforts. So asking Rosetta to release its code to the public is a bit like asking Einstein (the scientist, not the project) to publicly release all his notes on relativity two years before he published his theory. It just does not work that way in the scientific community. So talent has to be brought in, rather than sending the code out.

Another element of this is timing. Rosetta would have loved to have very streamlined code for CASP7, but there was simply not time. In the weeks leading up to CASP there were no fewer than five code releases, mostly to make it run better for all of us. At least three of these contained entirely new approaches to the work. There was also a focus on solving some rather intractable bugs. Now that they are down to basically one remaining intermittent problem (and it may actually be an issue with Windows itself), perhaps there will be time to bring in some fresh programming skills to optimize areas of the code that are reasonably stable.

To that end, Rom will be back very soon, and a few other people will reappear as well. But this will only work as areas of the code that lend themselves to improvements become stable.

None of this means it is not going to happen, or that it is not happening right now. It just means that the issue is not as straightforward as it is on other projects.


Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16181 · Rating: 0
Profile Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 16193 - Posted: 13 May 2006, 19:54:01 UTC - in response to Message 16173.  

I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired him as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him.


Where did you read that Akos expressed an interest in optimising Rosetta? That would be great, but when I mentioned it in passing to him, a couple of months ago in the Einstein forums, it was met with "benign indifference" (or so it seemed to me).

Akos is evidently a code wizard (I've met a couple) and his help could make a huge difference for any project.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 16193 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16225 - Posted: 14 May 2006, 5:38:40 UTC
Last modified: 14 May 2006, 19:40:29 UTC

I did contact Akos a while ago, but for the moment he wants to focus on further improvements to the Einstein code.

On the other issue: Rosetta@home "oldtimers" will remember we were planning to release the Rosetta source code several months ago in the hope that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

ID: 16225 · Rating: 0
senatoralex85
Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16236 - Posted: 14 May 2006, 6:32:17 UTC - in response to Message 16225.  


On the other issue: Rosetta@home "oldtimers" will remember we were planning to release the Rosetta source code several months ago in the hope that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

-----------------------------------------------------------------------------

Hmmm. This is an interesting issue. I am currently attached to uFluids but do not crunch for them because I lost too much work. Anyway, the code was released over a month ago on their project, and I have yet to see any progress with it. Depending on one's reasoning, I can see that both sides of the issue have good arguments.

I would agree with Dr. Baker that the code should be released in order to provide a service to the scientific community.

I do not think releasing the code to optimize it will do much good. I have observed that the majority of crunchers, such as myself, have little knowledge of "coding" and could not help even if they wanted to. The select few with the knowledge probably do not have the time or interest, or would be manipulating it for their own benefit.

Although I have been with this project almost as long as David, I soon stopped crunching due to all the errors my computer was getting at the time. Only recently have I started crunching for this project again, so I do not know the history of the discussions here.

My personal suggestion would be to continue recruiting expertise from admins on other BOINC projects, as you seem to be doing. Chrulle, the former admin over at LHC, is looking for a job. Have you talked to him?

ID: 16236 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 16295 - Posted: 15 May 2006, 9:52:14 UTC - in response to Message 16236.  

SETI
The general optimisations over at SETI were mainly done by the compilers: adding faster maths functions (IPP, etc.) and compiling for specific instruction sets (SSE, SSE2 and SSE3). There were some tweaks to caching etc., but it was mainly down to the compilers. SETI-enhanced now uses these 'maths functions' with some of the extra tweaks added, though it does not use the compiler optimisations for specific instruction sets, so that is what you'll see being released as their optimised apps.

Einstein
Akos reverse-engineered Einstein@home, since they did not release the source. He dropped in code for SSE and SSE3, but I think his best work must be the 3DNow! drop-ins, which make even AMD K6s and Athlons (pre-AthlonXP) faster.
None of it was compiled from source, though I'm assuming he is compiling now ;-)


Rosetta
Given how fast the code changes, it would need to be optimised at the source level, speeding up generic sections. I do not think instruction-set work (3DNow!, SSE->SSE3, and maybe even multi-threading for dual-cores/hyperthreaders) would work here on the model of the above.
What I believe would need to happen is that BOINC detects the instruction sets supported by the CPU (relatively easy, if not already done) and sends them to the server; the server then sends out the correct app for each CPU and OS. That way, Rosetta just needs to compile for each target, get it tested on RALPH, then release it over here. This is purely due to the pace of application changes: by the time someone compiled an optimised app, it would be out of date, and we would be forever trying to keep up.
Either that, or compile the science app to determine the instruction sets for itself at run time, though this is not as fast and bulks the program out a little.
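The server-side selection idea above can be sketched as follows. This is an illustrative sketch only, not actual BOINC scheduler code; the app names and feature flags are hypothetical:

```python
# Illustrative sketch of "server sends the correct app for the correct CPU":
# the scheduler picks the most specific pre-built variant whose required
# instruction sets the host reports. All names here are hypothetical.

# Ordered from most to least optimised; the first match wins.
APP_VARIANTS = [
    ("rosetta_sse3", {"sse3"}),
    ("rosetta_sse2", {"sse2"}),
    ("rosetta_3dnow", {"3dnow"}),
    ("rosetta_generic", set()),  # baseline fallback for any CPU
]

def pick_app(host_features):
    """Return the best app variant whose required features the host has."""
    features = set(host_features)
    for name, required in APP_VARIANTS:
        if required <= features:  # all required features present
            return name
    return "rosetta_generic"
```

The same table could drive the run-time alternative mentioned at the end: the app would detect its own CPU's features at startup and call `pick_app` to choose a code path, at the cost of shipping every variant in one larger binary.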
Team mauisun.org
ID: 16295 · Rating: 0
Profile Tallguy-13088
Joined: 14 Dec 05
Posts: 9
Credit: 843,378
RAC: 0
Message 16297 - Posted: 15 May 2006, 11:09:40 UTC - in response to Message 16295.  
Last modified: 15 May 2006, 11:12:44 UTC

In the world of the "mainframe", where I work for a living, it is not uncommon (in low-level system code) to see subroutines added that address specific optimizations at a "processor family" level or OS release level. This is typically done to address "incompatible" architectural changes (i.e. control block residency/format changes, etc.).

In the case of the multiple processor types present in a project of this nature, I suspect that this would be a complex change to the code for the developers. My concern would be that there is a strong potential for making the code too complex for the average developer to maintain. Remember, this is still "basic science", and the focus is still on developing the techniques of the process Dr. Baker and crew are working on; as such, it is subject to regular change. I'm thinking that they want to focus on developing the techniques and not be "bogged down" with the specifics of processor-level optimizations quite yet. Once that is "where they want it", they can then set off on the task of "stroking the code".

Just my 2 cents.



SETI
The general optimisations over at SETI were mainly done by the compilers: adding faster maths functions (IPP, etc.) and compiling for specific instruction sets (SSE, SSE2 and SSE3). There were some tweaks to caching etc., but it was mainly down to the compilers. SETI-enhanced now uses these 'maths functions' with some of the extra tweaks added, though it does not use the compiler optimisations for specific instruction sets, so that is what you'll see being released as their optimised apps.

Einstein
Akos reverse-engineered Einstein@home, since they did not release the source. He dropped in code for SSE and SSE3, but I think his best work must be the 3DNow! drop-ins, which make even AMD K6s and Athlons (pre-AthlonXP) faster.
None of it was compiled from source, though I'm assuming he is compiling now ;-)


Rosetta
Given how fast the code changes, it would need to be optimised at the source level, speeding up generic sections. I do not think instruction-set work (3DNow!, SSE->SSE3, and maybe even multi-threading for dual-cores/hyperthreaders) would work here on the model of the above.
What I believe would need to happen is that BOINC detects the instruction sets supported by the CPU (relatively easy, if not already done) and sends them to the server; the server then sends out the correct app for each CPU and OS. That way, Rosetta just needs to compile for each target, get it tested on RALPH, then release it over here. This is purely due to the pace of application changes: by the time someone compiled an optimised app, it would be out of date, and we would be forever trying to keep up.
Either that, or compile the science app to determine the instruction sets for itself at run time, though this is not as fast and bulks the program out a little.


ID: 16297 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16312 - Posted: 15 May 2006, 14:59:33 UTC

I agree that the application is changing too rapidly to worry about optimization. I know history tells us you can sometimes get 2x better throughput, but that is especially true for applications that were poorly coded to begin with, and for applications that don't change, so that the ONLY changes you are making pertain to optimizations.

I would just point out that releasing the source might also open the door to new platforms. Not specifically with the hope of an optimized client for a platform, but rather a straight port of the code. Once a port process is in place, it's generally pretty straightforward to keep up with new releases.

Any way around, none of this is stuff you want to play with during CASP, while you're already stretching the limits and finding proteins that just happen to fit the profile of those you feel a new algorithm that's been on the shelf may work well with.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16312 · Rating: 0
Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 16540 - Posted: 18 May 2006, 15:04:55 UTC - in response to Message 16225.  
Last modified: 18 May 2006, 15:09:03 UTC

I did contact Akos a while ago, but he for the moment wants to focus on further improvements to the Einstein code.

See this New Scientist article on Akos' Einstein work. So maybe he has now completed his Einstein code speed-up activities and is looking for new challenges... ;-)
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org
ID: 16540 · Rating: 0
Profile XeNO
Joined: 21 Jan 06
Posts: 9
Credit: 109,466
RAC: 0
Message 16624 - Posted: 19 May 2006, 9:34:56 UTC - in response to Message 16410.  

Rosetta 5.16:

(1) We're continuing our efforts to reduce the memory usage of typical Rosetta@home workunits. You can expect an even further reduction in memory footprint in our next update.

(2) We're testing a new science mode which uses the sequence and structural information from homologous proteins in an early phase of the simulation, but then returns to the target protein sequence in the final refinement phase. This mode appears to have a larger memory footprint than typical workunits, so we will only send these jobs to computers that have >1GB RAM.

(3) Also, we're trying a new feature where, at the end of a simulation, Rosetta compares its fold to the predictions made by a dozen other algorithms. (Those predictions are sent to the clients in a compressed format.) Consensus between different algorithms is usually a good sign that a prediction is right.
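The consensus check described in point (3) can be illustrated with a rough sketch. Rosetta's actual comparison method is not public; this sketch assumes pre-aligned coordinates (real structure comparison would first superimpose the models, e.g. with a Kabsch alignment) and a hypothetical agreement threshold:

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) atom coordinates, assumed already superimposed."""
    assert len(a) == len(b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(sq / len(a))

def consensus_score(our_fold, other_predictions, threshold=4.0):
    """Fraction of the other algorithms whose prediction lies within
    `threshold` angstroms RMSD of our fold; high agreement is taken
    as a good sign that the prediction is right."""
    hits = sum(1 for p in other_predictions
               if rmsd(our_fold, p) <= threshold)
    return hits / len(other_predictions)
```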



Is that greater than or equal to 1GB of RAM, or strictly greater? When my computer is not in use, I have no problem with Rosetta taking a hog's share of resources.



ID: 16624 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16635 - Posted: 19 May 2006, 13:02:53 UTC - in response to Message 16624.  

Rosetta 5.16:

(1) We're continuing our efforts to reduce the memory usage of typical Rosetta@home workunits. You can expect an even further reduction in memory footprint in our next update.

(2) We're testing a new science mode which uses the sequence and structural information from homologous proteins in an early phase of the simulation, but then returns to the target protein sequence in the final refinement phase. This mode appears to have a larger memory footprint than typical workunits, so we will only send these jobs to computers that have >1GB RAM.

(3) Also, we're trying a new feature where, at the end of a simulation, Rosetta compares its fold to the predictions made by a dozen other algorithms. (Those predictions are sent to the clients in a compressed format.) Consensus between different algorithms is usually a good sign that a prediction is right.

Is that greater than or equal to 1GB of RAM, or strictly greater? When my computer is not in use, I have no problem with Rosetta taking a hog's share of resources.

The term I have seen in discussions is "at least", so I would assume it means equal to or greater than. I have suggested that this be made a user-selectable option.
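Taking "at least" at face value, the cutoff would behave like the following minimal sketch. The function name, field names, and the opt-in flag are hypothetical (the flag reflecting the user-selectable option suggested above), not actual BOINC scheduler code:

```python
# Sketch of the scheduler-side memory check being discussed: "at least
# 1 GB" read as >=, gated by a hypothetical per-user opt-in preference.

GIB = 1024 ** 3  # 1 GB taken here to mean 2**30 bytes

def eligible_for_big_wu(host_ram_bytes, user_opt_in=True, min_ram_bytes=GIB):
    """True if this host may receive the large-memory workunits."""
    return user_opt_in and host_ram_bytes >= min_ram_bytes
```

Under this reading a host with exactly 1 GB qualifies; with a strict `>` it would not, which is the distinction the question above is asking about.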
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16635 · Rating: 0
Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16820 - Posted: 22 May 2006, 8:23:28 UTC

A plea: can merging be fixed? I have orphaned computers left, right, and centre. It's making a nonsense of the stats as well; BOINC Synergy thinks I have 6 machines when I have 3. Repeat that to some degree for every producer, and the stats become completely meaningless.

Is there a DB whiz among you?
ID: 16820 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 16830 - Posted: 22 May 2006, 12:41:00 UTC

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date, so that results are not returned too late.
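The suggested fix amounts to capping each workunit's report deadline at the CASP target's expiry date. A minimal sketch, with illustrative function and parameter names:

```python
from datetime import datetime, timedelta

def wu_deadline(issued, default_days, target_expiry):
    """Deadline for a workunit: the normal report deadline, capped at the
    CASP target's expiry so results cannot come back too late to be used."""
    return min(issued + timedelta(days=default_days), target_expiry)
```

With the dates from this post (a WU issued in late May whose default deadline falls after the June 1st expiry), the capped deadline comes out as the expiry date itself.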
ID: 16830 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16840 - Posted: 22 May 2006, 15:45:24 UTC - in response to Message 16830.  

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date, so that results are not returned too late.

Actually, there are a number of dates that are pertinent, depending on the category of CASP in which a project submits its results. The dates you are seeing are for server predictions. The predictions that Rosetta is working on are in a different category, and the reporting dates in use already take into account the dates the project needs in order to meet the CASP deadlines for the category in which it will submit its results.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16840 · Rating: -1
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16841 - Posted: 22 May 2006, 16:01:41 UTC - in response to Message 16830.  

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date, so that results are not returned too late.


Good point--thanks for catching this!
ID: 16841 · Rating: 1
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16842 - Posted: 22 May 2006, 16:05:52 UTC - in response to Message 16820.  

A plea: can merging be fixed? I have orphaned computers left, right, and centre. It's making a nonsense of the stats as well; BOINC Synergy thinks I have 6 machines when I have 3. Repeat that to some degree for every producer, and the stats become completely meaningless.

Is there a DB whiz among you?



I have brought this issue up with the project team repeatedly since the bug was found in the BOINC software. They will fix it, but the word as of last Thursday was "not yet". Testing is currently being conducted on RALPH for the next server release, and there are a few things that need adjusting before it is ready for prime time. Those issues are being worked out with the BOINC developers.

While it may seem that this is a simple issue, in fact it is not. The current structure of the database is such that merging computers not only drags the servers to a halt but, I am told, was corrupting the database. The problem is in the BOINC software; currently the only option is to turn off merging to prevent the problems.

There is no way for the project to distinguish which of your machines are real and which are ghost machines, so it is not practical for them to simply run a script to merge them. The next version of the BOINC server software is purported to have a fix for the problem. However, considering the project's key role in the ongoing CASP experiment, taking the server offline for this type of upgrade is not only risky but impractical.

The stats you cite are only adversely affected to the extent that individual machine statistics are the focus of the information desired. The vast majority of the stats are in fact collective, and not dependent on a view of a specific machine. All of the stats that depend on your total credit are still as accurate as ever; even the RAC for a particular machine will reflect the proper contribution if allowed to do so. The only stats significantly affected by this issue are those relating to the total credit for a particular machine.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16842 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org