Message boards : Number crunching : Hanging Rosetta??? (sorry for the crosspost)
Author | Message |
---|---|
ephman Joined: 21 Dec 05 Posts: 4 Credit: 1,410,336 RAC: 0 | Hi, I'm running the latest Linux version of BOINC on a pretty quick machine. What I'm noticing is that when a Rosetta unit is about 20% done, my CPU stops running at 100% and drops back down to normal levels. I've tried a couple of different units and the same thing happens. Is this normal? Any ideas how I can fix this? Thanks for the bandwidth, ephman |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 | Quoting ephman: "...when it's about 20% done with a rosetta unit, my cpu stops running at 100% and goes back down to normal levels..." For a few seconds? A few minutes? Half an hour? Or does it stay like that till the end of the job? If it is just for a few minutes, this is what I think is happening, and I've seen it on my slow Linux box too. If it never recovers, then after maybe an hour I'd be feeling like aborting that WU and trying my luck with another. If I am right about what is happening, the short answer is that there is nothing you can do about it. Now for the long answer: at each of the 'round number' steps in Rosetta's progress, it is changing from analysing one part of the job to doing the next. It is almost like starting Rosetta again. It will be calling on parts of the program that have not been used since startup, or since the start of the previous stage, and these may have been swapped out of RAM into virtual memory. (Cunningly, program code doesn't go into the swap file: because the code cannot have changed, it is simply re-read from its original file, a DLL on Windows or the executable or shared library on Linux.) In addition, Rosetta may need new data from its data files. Some of this may already be in the disk cache, but parts of the files that have not been accessed before are more likely to be only on the hard disk, and they have to be read from there. All of this means that Rosetta is waiting for the virtual memory manager and the disk cache manager to figure out what to do, and then waiting for the hard disk to actually do it. While the process is disk-limited, its CPU usage drops to normal levels, as you have observed. River~~ edit: PS, welcome to Rosetta and congrats on getting your first few credits! |
ephman Joined: 21 Dec 05 Posts: 4 Credit: 1,410,336 RAC: 0 | Hi, I have a 2.4 GHz Linux (2.6.12-10-686) box, so it's not too slow. The program seems to hang for more than just a few minutes, but I've never timed it. I'm going to try to be patient and let it run longer, maybe a few hours. So basically you're telling me there is nothing I can do but wait it out, right? Thanks, ephman |
N7QLT Joined: 19 Dec 05 Posts: 2 Credit: 3,753,965 RAC: 0 | I'm brand new here, so advice from long-time users is probably more pertinent. On my Linux box I am running 2 projects, and Rosetta was not switching back and forth gracefully. It seemed to hang as you are describing. I found a hint here somewhere that recommended setting the "Leave in memory" preference to YES. FWIW, Gene |
ephman Joined: 21 Dec 05 Posts: 4 Credit: 1,410,336 RAC: 0 | I'm very new to this, and I searched for the answer to this question, but how do I actually make that setting change? Thanks, ephman |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 | Quoting ephman: "i'm very new to this, and i searched for the answer for this question, but how do i actually make that setting change?" Thanks Gene, you are absolutely right. Sometimes being new is a great help, as you've just grappled with the same issue yourself. ephman, sorry about my previous post; it was a complete red herring :-( To set this pref, from this page click on the following links in turn: home, My Account, Edit/View general preferences, Edit preferences. Then select YES for the appropriate option and click save (or is it update?). THEN go to your BOINC Manager, choose the Projects tab, highlight Rosetta, and click the Update button. You may still see a minute or three when Rosetta goes quiet, but it should come back to 100% fairly quickly each time, within a few minutes at most. River~~ |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 | I am not sure if this was clear from the earlier posts. If you have the power-save option turned on for your disk drives, they WILL spin down. Then, when it is time to checkpoint, the program has to wait for the drive to spin up again, which takes about 30 seconds. One of the current issues with Rosetta@Home is that it only checkpoints at the 10/20/etc. % points. So either live with the pauses, or turn off disk power saving so the drive stays spun up. Programs like SETI@Home checkpoint at very frequent intervals, so they never let the drives spin down. Programs like Rosetta@Home and CPDN checkpoint less frequently: CPDN because its checkpoint file is so large, Rosetta@Home, um, not sure why... But this IS on the developers' list of things to do... |
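River asked whether the CPU stays low for seconds, minutes or hours, and ephman had never timed it. One way to time it on Linux is to sample the task's accumulated CPU time from `/proc/<pid>/stat` at a fixed interval. The Python sketch below assumes the Linux procfs layout described in proc(5), where `utime` and `stime` are fields 14 and 15 in clock ticks. The function names, the 60-second interval and the 50% "low" threshold are hypothetical choices, and you would look up the PID of the Rosetta task yourself with a tool such as `top` or `ps`.

```python
import os
import sys
import time


def cpu_ticks(pid):
    """Return utime + stime, in clock ticks, consumed so far by process `pid`.

    Assumes the Linux /proc/<pid>/stat layout described in proc(5).
    """
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only after the last ')'. fields[0] is then field 3 (state).
    fields = data[data.rfind(")") + 2:].split()
    # utime is field 14 and stime is field 15, i.e. fields[11] and fields[12].
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, clk_tck=None):
    """Percentage of one CPU used between two samples taken `seconds` apart."""
    if clk_tck is None:
        clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second (Unix only)
    return 100.0 * (ticks_after - ticks_before) / clk_tck / seconds


def watch(pid, interval=60.0, threshold=50.0):
    """Print a timestamped CPU reading every `interval` seconds until the process exits."""
    prev, t_prev = cpu_ticks(pid), time.monotonic()
    while True:
        time.sleep(interval)
        try:
            cur = cpu_ticks(pid)
        except (FileNotFoundError, ProcessLookupError):  # the task has finished
            print("process exited")
            return
        t_cur = time.monotonic()
        pct = cpu_percent(prev, cur, t_cur - t_prev)  # use real elapsed time
        flag = "  <-- low" if pct < threshold else ""
        print(f"{time.strftime('%H:%M:%S')}  {pct:5.1f}%{flag}")
        prev, t_prev = cur, t_cur


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(int(sys.argv[1]))  # the task's PID is the only argument
```

Saved to a file and run with the task's PID as its only argument, it prints one line per minute and marks readings below 50%, so the length of each dip can be read straight off the timestamps.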
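Paul's point about checkpoint spacing can be made concrete with a little arithmetic. The Python sketch below compares the longest stretch without a checkpoint write for a job that saves only at every 10% of progress against one that saves on a fixed timer. It is a sketch under stated assumptions, not how BOINC or Rosetta actually schedule checkpoints: progress is assumed to advance linearly with time, checkpoint writes are treated as the only disk activity, and the 4-hour run time, 60-second timer and 10-minute spin-down timeout are hypothetical example values.

```python
def milestone_checkpoints(runtime_s, step_pct=10):
    """Checkpoint times (s) for a job that saves only at every `step_pct`% of progress.

    Assumes progress advances linearly with wall-clock time.
    """
    return [runtime_s * p / 100 for p in range(step_pct, 100, step_pct)]


def periodic_checkpoints(runtime_s, interval_s):
    """Checkpoint times (s) for a job that saves every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = []
    t = interval_s
    while t < runtime_s:
        times.append(t)
        t += interval_s
    return times


def longest_idle_gap(runtime_s, write_times):
    """Longest stretch (s) between checkpoint writes, counting job start and end."""
    points = [0.0] + sorted(write_times) + [float(runtime_s)]
    return max(b - a for a, b in zip(points, points[1:]))


def drive_spins_down(runtime_s, write_times, spin_down_s):
    """True if some gap between writes exceeds the drive's spin-down timeout."""
    return longest_idle_gap(runtime_s, write_times) > spin_down_s


if __name__ == "__main__":
    RUNTIME = 4 * 3600   # hypothetical 4-hour work unit
    SPIN_DOWN = 10 * 60  # hypothetical 10-minute power-save timeout

    milestones = milestone_checkpoints(RUNTIME)
    timed = periodic_checkpoints(RUNTIME, 60)

    print(longest_idle_gap(RUNTIME, milestones))             # 1440.0
    print(drive_spins_down(RUNTIME, milestones, SPIN_DOWN))  # True
    print(longest_idle_gap(RUNTIME, timed))                  # 60.0
    print(drive_spins_down(RUNTIME, timed, SPIN_DOWN))       # False
```

Under these assumptions, saving only at each 10% point leaves 24-minute gaps, long enough for a 10-minute power-save timer to spin the drive down before the next checkpoint, while a 60-second timer never leaves the drive idle that long.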
©2024 University of Washington
https://www.bakerlab.org