Message boards : Number crunching : Report problems with Rosetta version 5.34
Author | Message |
---|---|
Soren Hedberg Joined: 30 Oct 06 Posts: 25 Credit: 3,653 RAC: 0 |
Ahhh, run always did the trick. Thank you very much for the welcome, and the help! :) I'm having a problem with R@H as well. My problem is that I set the program up to use 100% of the CPU (which, according to SpeedFan, it is doing), and I leave the program to do its work on a job that it says should take 4 hours of CPU time. However, after about 30 minutes of working at full load, it says that it has only completed 1 minute 30 seconds' worth of CPU time. What's up with that? I have ZoneAlarm and AVG Antivirus running on my computer at the same time; is it a conflict with one of these programs? |
Soren Hedberg Joined: 30 Oct 06 Posts: 25 Credit: 3,653 RAC: 0 |
OK, now it has slowed to a crawl again. No idea why, nothing has changed except I had an error (due to graphics, I think) and I had to get a new project. Any idea what is doing this? |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 86 |
> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time. Only been happening since Rosetta 5.32 and now 5.34. Also happens in Ralph@home. |
Soren Hedberg Joined: 30 Oct 06 Posts: 25 Credit: 3,653 RAC: 0 |
> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time. I'll try a quick reboot. |
Soren Hedberg Joined: 30 Oct 06 Posts: 25 Credit: 3,653 RAC: 0 |
Rebooting worked. You're the man, Conan! |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 86 |
> No worries. Glad it helped. I hope they sort the graphics problem out soon. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Seems I was wrong. The "no heartbeat" message has nothing to do with the manager. I asked a question on the Dev mail list and got this back: I am only guessing here, but 1. probably because there is a shared mem area already for other client/app communication and it is convenient to program it that way; 2. because when shmem is already being used it is less of an overhead to add a few more bytes to the shmem than to use a separate semaphore or TCP (edit: and over a large number of heartbeats (7k/hr) the total overhead may be a lot less than for using 7k semaphore changes); 3. it certainly means easier compatibility with win9x, whose process handling was very primitive -- more of an issue when BOINC was created than now, but there are still a significant number of win9x clients out there (winMe counts as 9x for this). My issue with the heartbeat system is as follows. It assumes that when the core client does not run for 31sec because the machine is being worked hard on its main mission, "obviously" the app will not have run either. The assumption is that the client will run next due to the priority difference, and by the time the app runs there will be a new heartbeat even if the interval is a lot more than 30sec. My fear is that when paging has occurred in the interval, which is not uncommon if the main mission was so intensive as to keep the box's attention for 31sec, it is very likely that the client is paged out first, as the app will have run more recently. The app is given the cpu at the end of the interval despite being a lower priority than the client, because the client is in i/o wait getting its memory back from disk. Hey presto, the app sees there is an interval of over 30sec and no heartbeat. R~~ |
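The race River~~ describes can be sketched in a few lines of Python. This is only an illustration of the reasoning in the post, not BOINC's actual code: the 30-second timeout is the figure quoted above, and `SharedMem`, `client_tick`, `app_check` and `simulate` are hypothetical names invented for the sketch.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds; figure quoted in the post, not read from BOINC source


class SharedMem:
    """Hypothetical stand-in for the client/app shared memory area."""

    def __init__(self, now):
        self.last_heartbeat = now


def client_tick(shm, now):
    # The core client refreshes the heartbeat timestamp whenever it gets to run.
    shm.last_heartbeat = now


def app_check(shm, now):
    # The app keeps running only if the last heartbeat is recent enough.
    return (now - shm.last_heartbeat) <= HEARTBEAT_TIMEOUT


def simulate(run_order, stall=31.0):
    """Both processes are starved for `stall` seconds, then run in `run_order`."""
    shm = SharedMem(now=0.0)
    now = stall
    for who in run_order:
        if who == "client":
            client_tick(shm, now)
        elif who == "app" and not app_check(shm, now):
            return "app exits: no heartbeat"
    return "app keeps running"


# Assumed case: the higher-priority client runs first after the stall.
expected_case = simulate(["client", "app"])
# Feared case: the client is stuck in i/o wait (paged out), so the app runs first.
paged_case = simulate(["app", "client"])
```

With the client running first the app survives a 31-second stall; with the order reversed the same stall ends the task, which is the failure mode the post is worried about.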
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
This task kept running for 11min after being pre-empted and then exited with a heartbeat message. It had run for 2hrs 40min since its last checkpoint, so almost 3hrs was lost. Linux, no gui, slow cpu by today's standards, bench at 250 / 400 with 368 Mb ram. Box is working as network router but with only two 100MHz net cards these cannot be taking all of its attention - especially as typical net traffic it handles is at around broadband speeds. There can be brief surges of work on its main mission when a lot of new TCP connections are set up through the box. The history is that I was trying to get all the BOINC work off the box to check something; the run time pref had started at 24hrs, was reduced to 6hrs and reduced again to 2hrs. At the reduction to 2hrs this task had just over 1 hour on the clock, and boinc_command showed it had checkpointed at about 3460 sec, just under the hour. When the task got to 3hrs 40something minutes it had not advanced the % complete, and at that time it was pre-empted by another project. While LHC was running the Rosetta task still showed 3hrs 40something as time run, but as soon as it was restarted its runtime dropped back to under 1 hour. Looking at the messages tab showed the heartbeat message occurred a good 11 min after the pre-empt, when the app should not have been still running, and when it is not surprising that the client was no longer sending them. I wonder now if this is new, or whether I have just not noticed this before. I wonder if the app is expected to do anything when it is pre-empted, and if the LHC wu had been re-started before the Rosetta wu had finished its stand-down? (I don't know the code so apologies if this suggestion seems daft to those who do). Even so, 11min seems a long time for a task to be squeezed out by an equal priority competitor. I have "leave apps in memory" set but of course the heartbeat issue inevitably loses time back to the last checkpoint.
At this point, not being willing to wait a second 2 hours, I dropped the cpu pref to 1 hour, stopped and started BOINC, and the task exited cleanly. It had done 3 decoys in just under an hour - is it plausible that it should have taken over 2hrs40min to not complete the next? Or was something already wrong by that time in addition to the weird behaviour after pre-empt? edit: Is it possible that I upset the apple cart somehow by changing the cpu pref downwards from 6hrs to 2hrs? (I have done this before without problems) |
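For anyone checking the figures in the post above, here is the arithmetic as a small Python sketch. The checkpoint time (about 3460 sec) and the 11 min overrun come from the post; the run time at pre-empt is only given as "3hrs 40something", so the 3h 40m used here is an assumed lower bound, and `hms` is a helper made up for the example.

```python
def hms(seconds):
    """Format a number of seconds as 'Xh MMm SSs'."""
    seconds = int(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m:02d}m {s:02d}s"


last_checkpoint_s = 3460                    # from boinc_command, per the post
runtime_at_preempt_s = 3 * 3600 + 40 * 60   # assumed lower bound for "3hrs 40something"
overrun_s = 11 * 60                         # task kept running 11 min after pre-empt

# CPU time thrown away when the task fell back to its last checkpoint.
lost_cpu_s = runtime_at_preempt_s - last_checkpoint_s
# Adding the 11 min the task spent running after it should have stopped.
lost_total_s = lost_cpu_s + overrun_s

print("lost since checkpoint:", hms(lost_cpu_s))
print("lost including overrun:", hms(lost_total_s))
```

Under that assumption the loss comes to a little over 2hrs 40min of CPU time, or just under 3 hours once the overrun is counted, which matches the figures quoted in the post.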
Edwin Joined: 29 Mar 06 Posts: 4 Credit: 69,961 RAC: 0 |
> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time. I did the same thing. Turned off the graphics (screensaver) and since then everything is ok. No more freezes, and all workunits are being computed with a result. It seems to me that there is a conflict with XP once the screensaver kicks in and shows the graphics. Perhaps the developers of Rosetta can look into this. greetz, Edwin |
EW-3 Joined: 1 Sep 06 Posts: 27 Credit: 2,561,427 RAC: 0 |
I may be confused, when I check results it indicates client 5.4.11. Where do I check for the Rosetta version? Or is that the same thing? FWIW - like the last poster I stopped any graphics from running (screensaver) and everything is running great. In fact it even seems to be running faster and smoother (no more hanging up at 1.5% for an hour). |
stewjack Joined: 23 Apr 06 Posts: 39 Credit: 95,871 RAC: 0 |
I may be confused, when I check results it indicates client 5.4.11. Select the task tab in the BOINC manager. Under "application" you should see rosetta 5.34.
You don't understand that the progress percentage is only updated when Rosetta stores a setpoint. If you restart your computer the WU will restart at the last setpoint. There is usually an initialization setpoint. That was your 1.5%! Rosetta does not have fixed setpoints. I once had to wait five and a half hours for my second setpoint! I have my WU set for a completion time of 4 hours, but that was an unusual WU. A Rosetta WU normally setpoints every 10 min to 1.5 hours. If you get your graphics working, you will discover that Rosetta creates a setpoint before each new model starts. Jack |
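Jack's point about restarting from the last setpoint can be shown with a short Python sketch. The setpoint times below are made up for illustration (Rosetta's real intervals vary, as he says), and `resume_point` is a hypothetical helper, not anything from Rosetta or BOINC.

```python
from bisect import bisect_right


def resume_point(setpoint_times, stop_time):
    """Return the latest setpoint at or before stop_time, or 0 if there is none.

    setpoint_times must be sorted in ascending order.
    """
    i = bisect_right(setpoint_times, stop_time)
    return setpoint_times[i - 1] if i else 0


# Hypothetical setpoint times in minutes of run time: an early
# initialization setpoint, then one before each new model.
setpoints = [4, 35, 95]

# The machine is restarted after 90 minutes of run time.
restart_from = resume_point(setpoints, stop_time=90)
work_lost = 90 - restart_from
```

In this made-up case the WU resumes from the 35 minute setpoint, so 55 minutes of work are repeated, and the displayed progress would not have moved during those 55 minutes either.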
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I may be confused, when I check results it indicates client 5.4.11. The 5.4.11 is the BOINC version your PC is running. You can see the Rosetta version either in the tasks tab, as Stewjack points out, or, once a WU is reported back and you've updated to the project, at the bottom of the WU display on the website. Rosetta Moderator: Mod.Sense |
©2025 University of Washington
https://www.bakerlab.org