Cheating ??

Author	Message
Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 2313 - Posted: 5 Nov 2005, 4:41:28 UTC 1hz7A_abrelaxmode_random_gauss_cheat_jitter03_00459 Are we now all going to cheat ?? ;-) I am sure this is some clever trick by the Baker lab to better find the lowest energy structure. Just wondering. More generally, all those different WU names seem to suggest slightly different search strategies, to better sample the search space, etc. I am kind of surprised that one Rosetta application (v4.78) can handle all those different flavors of the search algorithm. -H.B.

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 2317 - Posted: 5 Nov 2005, 5:15:20 UTC - in response to Message 2313.
Hah! I told David K that somebody was going to ask about this. Yes, that is exactly what we are doing--trying out many different search strategies. The different WUs differ in how we ensure diversity among the runs. I and others in my group have been working on a wide variety of ways to do this over the past year, but the problem in pre-BOINC days was that on average the results got a bit worse whenever we tried to spread things out, and since we couldn't carry out very large numbers of runs we never really saw any benefit. This has totally changed, and now it is clear from the past few weeks of BOINC runs that explicitly enforcing diversity leads to more low energy and low rmsd structures, even though the majority of the population becomes slightly higher in energy.
As you probably know, in standard molecular simulation work, each independent trajectory starts with a different random number seed. In the calculations we've run recently, we've been experimenting with additionally randomizing subsets of the parameters which determine the outcome of the trajectories, so instead of a single random starting position, we have a random state vector for each run, and hence each run will not only begin from a different place but also experience a different low resolution force field, etc. (these are the flags with names like rand_SS_wt, etc.). The "barcode" flags basically restrict a small subset of randomly selected angles to randomly selected angular ranges--again, since different subsets of angles are selected in different runs, the runs are more spread out.
As to why we can do so much with a single executable--you should see the list of command line flags!--we've tried to make as much as possible controllable from the command line because of the difficulty of sending out a new app with BOINC. We'd definitely like to be able to change the code more frequently, but this isn't practical.
So what about "cheat"? In the discussion of dimensionality the other day, I said that the search was effectively in a much smaller subspace than might be expected given the number of degrees of freedom. In fact, there are only a small number of degrees of freedom for which the native structure and the structures we have been producing differ considerably. Essentially, what the "barcode" flags are doing is combinatorially enumerating them. But I'm impatient, so I want to see what happens if I fix roughly four of these highly variable degrees of freedom (angles) to roughly the correct values. From the time it takes to compute the correct structure with this "cheat", I can estimate how long it would take without the cheat. Roughly speaking, I'm restricting 4 angles each to 1/3 of their normal range, so I could get the same result by complete enumeration in 3**4 = 81 fold more computer time.
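As a rough illustration of the arithmetic and the "barcode" idea in the post above, here is a minimal Python sketch. It is not Rosetta code: the function names, the equal-width bins over -180 to 180 degrees, the count of 120 angles, and the 10 CPU-hour figure are all assumptions made up for the example. Only the numbers 4 angles and 1/3 of the range come from the post.

```python
import random


def enumeration_factor(n_restricted: int, bins_per_angle: int) -> int:
    """Number of bin combinations a blind search would have to cover.

    Restricting each of n angles to 1/bins_per_angle of its range leaves
    bins_per_angle ** n combinations if nothing is known in advance.
    """
    return bins_per_angle ** n_restricted


def estimate_without_cheat(cpu_hours_with_cheat: float,
                           n_restricted: int = 4,
                           bins_per_angle: int = 3) -> float:
    """Scale the time measured with the "cheat" up to full enumeration."""
    return cpu_hours_with_cheat * enumeration_factor(n_restricted, bins_per_angle)


def random_barcode(n_angles: int, n_restricted: int, bins_per_angle: int,
                   rng: random.Random) -> dict:
    """Hypothetical barcode: random angles, each tied to one random range.

    Returns {angle_index: (lower_degrees, upper_degrees)}. Equal-width bins
    over [-180, 180) are an assumption of this sketch.
    """
    width = 360.0 / bins_per_angle
    chosen = rng.sample(range(n_angles), n_restricted)
    barcode = {}
    for index in chosen:
        lower = -180.0 + rng.randrange(bins_per_angle) * width
        barcode[index] = (lower, lower + width)
    return barcode


if __name__ == "__main__":
    print(enumeration_factor(4, 3))      # 81
    print(estimate_without_cheat(10.0))  # 810.0 (10 CPU hours is a made-up input)
    # Each run gets its own barcode, so different runs explore different regions.
    print(random_barcode(n_angles=120, n_restricted=4, bins_per_angle=3,
                         rng=random.Random(2005)))
```

Seeding each run's generator differently gives each run a different subset of restricted angles, which is the sense in which the runs end up "more spread out".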

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 2323 - Posted: 5 Nov 2005, 6:27:33 UTC - in response to Message 2317. Wow, thanks for the detailed response. I also read the review paper which you linked from the home page. -Hermann