Information on Ver 4.97 errors

Message boards : Number crunching : Information on Ver 4.97 errors

Profile anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 13440 - Posted: 11 Apr 2006, 7:25:29 UTC

Did the "rosetta_4.97_windows_intelx86.pdb" file give you any useful information about what happened?

Anders n
ID: 13440 · Rating: 0
XS_Duc
Joined: 30 Dec 05
Posts: 17
Credit: 310,471
RAC: 0
Message 13441 - Posted: 11 Apr 2006, 9:34:27 UTC - in response to Message 13438.  

I just got back into town an hour ago, and have not yet been able to pinpoint the source of the recent problems. But I want to apologize in any event; the scale of the problems certainly was my fault.
Here is what happened:

I wanted to test the effects of an improvement in sampling alternative sidechain conformations during the high resolution stage of the search. Tests on our in-house computers showed that this improvement resulted in consistently lower energy structures being found, and there were absolutely no signs of any run time problems. David K. sent out the new version of the code to RALPH on Thursday, and we submitted some test jobs. Friday afternoon we talked, and as there seemed to be no problems on RALPH, and the code change was relatively minor, David sent the new version out to Rosetta@home.
I was very eager to see how the improvement in sampling would affect the searches I had been carrying out in the HBLR_1.0 series of runs you all had been doing over the past month, and as I was going out of town for a few days I submitted a large number of jobs Friday evening so that there would be a clear picture when I returned. You can imagine my horror on checking up on Rosetta and RALPH in the few minutes before leaving Saturday morning! It was clear by Saturday that the test jobs I had sent out on RALPH had a high error rate on Windows, and that I had totally jumped the gun by sending out the very large set of runs on Rosetta on Friday. I'm very sorry that I did this, and about the waste of resources and confusion this caused, and I definitely learned my lesson: always make sure the RALPH tests are complete and 100% positive before submitting large scale on Rosetta.


Those who are free of sin may now pick up a stone and throw it...
We lost some time and resources, so what? It happened before and will certainly happen again, I guess.
Nothing is flawless; mistakes and errors will always be made... but they shall be forgiven and forgotten on the long road towards success.

The weak shall perish...
ID: 13441 · Rating: 1
Profile River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 14020 - Posted: 18 Apr 2006, 8:50:18 UTC - in response to Message 13298.  

Sounds like "reset project" from the projects tab. This basically aborts any WUs and reloads the application code.


I know it is too late for this thread, but I'd like to correct this please, Feet1st.

Reset is not the same as abort and reload. Reset does a forget and reload.

An abort is often useful to a project, as the error file may contain some useful info. It also allows the WU to be released to another user. For that same reason, a reset is often the more useful choice for a dodgy WU, as it does not force a re-issue before the team have had a chance to stop the WU being issued.

So both have their uses, but they are not the same.

Where a project wants the error reports, the short procedure is to go to the work tab and abort each existing work unit, and let it report in due course.

The full procedure, if you also want to force a reload, is quite complicated, as you have to force the aborted work to be flushed through.

1) set No New Work for that project
2) abort all WU separately from the Work tab
3) suspend all other projects from the projects tab to force the aborted WU to run (sounds contradictory, but this is where each WU generates the error report)
4) in the unlikely event that these get stuck, resume then suspend one of the other projects - sometimes you'll find you need to do this as many times as you have aborted WU
5) update this project
6) wait for aborted WU to disappear from work tab
7) *now* reset project if required
8) set allow new work
9) resume all other projects from the projects tab.

It is a lot to ask users to do - which is why a project may well just ask for a reset instead - a larger percentage of users will actually do it! But it still is not the same.
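
The nine steps above can be sketched as a scripted sequence. This is only a rough illustration, assuming a recent BOINC client whose `boinccmd` tool accepts `--project URL <operation>` and `--task URL NAME abort`; the client in use when this thread was written may have offered different command names. The function name, URLs and task names are hypothetical. Steps 4 and 6 (nudging stuck WUs and waiting for the aborted WUs to disappear) are left as manual checks, so the plan comes back in two phases.

```python
from typing import List, Tuple

Command = List[str]


def plan_abort_then_reset(project_url: str,
                          task_names: List[str],
                          other_project_urls: List[str],
                          reset: bool = True) -> Tuple[List[Command], List[Command]]:
    """Build boinccmd argument lists for the abort-report-reset procedure.

    Returns (flush, finish). Run `flush`, wait until the aborted WUs have
    been reported and have left the work list (steps 4 and 6, done by
    hand), then run `finish`. Nothing is executed here.
    """
    flush: List[Command] = []
    # Step 1: set No New Work for the affected project.
    flush.append(["boinccmd", "--project", project_url, "nomorework"])
    # Step 2: abort every WU separately.
    for name in task_names:
        flush.append(["boinccmd", "--task", project_url, name, "abort"])
    # Step 3: suspend all other projects so the aborted WUs get processed.
    for url in other_project_urls:
        flush.append(["boinccmd", "--project", url, "suspend"])
    # Step 5: update the affected project so the error reports are sent.
    flush.append(["boinccmd", "--project", project_url, "update"])

    finish: List[Command] = []
    # Step 7: reset only now, and only if a reload is wanted.
    if reset:
        finish.append(["boinccmd", "--project", project_url, "reset"])
    # Step 8: allow new work again.
    finish.append(["boinccmd", "--project", project_url, "allowmorework"])
    # Step 9: resume the other projects.
    for url in other_project_urls:
        finish.append(["boinccmd", "--project", url, "resume"])
    return flush, finish
```

Each returned list could be passed to `subprocess.run` one command at a time. Splitting the plan in two keeps the reset from running before the error reports have gone out, which is the whole point of the longer procedure.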

River~~
ID: 14020 · Rating: -1