Report problems with Rosetta version 5.34

Message boards : Number crunching : Report problems with Rosetta version 5.34

Soren Hedberg
Joined: 30 Oct 06
Posts: 25
Credit: 3,653
RAC: 0
Message 30336 - Posted: 31 Oct 2006, 3:47:39 UTC - in response to Message 30333.  

Ahhh, run always did the trick. Thank you very much for the welcome, and the help! :)

I'm having a problem with R@H as well. My problem is that I set the program up to use 100% of the CPU (which, according to SpeedFan, it is doing), and I leave the program to do its work on a job that it says should take 4 hours CPU time. However, after about 30 minutes of working at full load, it says that it has only completed 1 minute 30 seconds worth of CPU time. What's up with that? I have ZoneAlarm and AVG Antivirus working on my computer at the same time; is it a conflict with one of these programs?

Soren, welcome to Rosetta!

I. (Assuming that you meant to say that after 30 minutes CPU Time, the task was showing only a bit more than 1% complete):
Some time ago, longer tasks would appear to be "stuck" at one percent, so the project developers changed the Rosetta application to increase the percent complete from 1.000% by small amounts so that people would not become concerned that the task was "stuck". If this is the case, on the graphics display/screensaver you will see that the task is still working on Model 1. The percentage complete will be updated more realistically after the first model is completed.

II. (If you actually *did* intend to say that the task has used only 1 minute 30 seconds of CPU time in a half hour, as measured by a watch or clock): You can use the Windows Task Manager under the "Processes" tab to check whether the Rosetta task is using CPU. Also, check the "General Preferences" and "Rosetta preferences" from your Rosetta Account web page. In particular, check the General preference for "Do work while Computer is in use". If this is set to "No", then Rosetta will suspend itself (stop working) anytime you are working with the computer (typing, using the mouse), and for some time afterward. There are also other conditions that must be satisfied in order for Rosetta to run. These are there to limit Rosetta, if necessary, so that it cannot interfere with other work by, say, slowing response times. However, I have Rosetta set to run all the time and have never seen any significant slow-down on other work I do. (Well, perhaps rendering video might be affected, but not more typical tasks like web browsing, word processing, or email.)

Note: you can also use the BOINC Manager "Activity" tab to tell Rosetta to "Run Always". This overrides the "Preferences".

There are many, many people here that really want to help you perform your best and have fun, too. Always feel free to post questions and comments!

Again, welcome! Happy Rosetta crunching!



Soren Hedberg
Joined: 30 Oct 06
Posts: 25
Credit: 3,653
RAC: 0
Message 30342 - Posted: 31 Oct 2006, 4:41:37 UTC

OK, now it has slowed to a crawl again. No idea why, nothing has changed except I had an error (due to graphics, I think) and I had to get a new project. Any idea what is doing this?

Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 456
Message 30344 - Posted: 31 Oct 2006, 4:53:01 UTC

> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time.
Only been happening since Rosetta 5.32 and now 5.34. Also happens in Ralph@home.
Soren Hedberg
Joined: 30 Oct 06
Posts: 25
Credit: 3,653
RAC: 0
Message 30346 - Posted: 31 Oct 2006, 4:54:03 UTC - in response to Message 30344.  

> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time.
Only been happening since Rosetta 5.32 and now 5.34. Also happens in Ralph@home.


I'll try a quick reboot.

Soren Hedberg
Joined: 30 Oct 06
Posts: 25
Credit: 3,653
RAC: 0
Message 30347 - Posted: 31 Oct 2006, 4:56:53 UTC

Rebooting worked. You're the man, Conan!

Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 456
Message 30351 - Posted: 31 Oct 2006, 5:07:50 UTC

> No worries. Glad it helped. I hope they sort the graphics problem out soon.
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30352 - Posted: 31 Oct 2006, 5:14:59 UTC - in response to Message 30201.  
Last modified: 31 Oct 2006, 5:46:10 UTC

Seems I was wrong. The no heartbeat message has nothing to do with the manager. I asked a question on the Dev mail list and got this back:

davea@ssl.berkeley.edu to me, boinc_dev
More options 7:13 pm (1 minute ago)

The manager is not involved.

Applications listen for "heartbeat" messages
(sent via shared memory) from the core client.
Normally it's sent once a second.
If the application doesn't get one in 30 secs,
it prints "no heartbeat" and quits

-- David
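
To make the description above concrete, here is a minimal sketch of the app-side check in Python. It is only an illustration: the class and method names are made up, a plain Python object stands in for the shared-memory segment, and none of this is BOINC's actual code. The only figures taken from the message are the 30-second limit and the roughly once-a-second heartbeat.

```python
import time

HEARTBEAT_TIMEOUT_SECS = 30.0  # limit quoted in the message above


class HeartbeatWatchdog:
    """Hypothetical app-side watchdog; the names are illustrative, not BOINC's API."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SECS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock          # injectable so the check can be tested
        self._last_beat = clock()    # treat start-up as the first heartbeat

    def beat(self):
        """Record a heartbeat from the core client (normally about once a second)."""
        self._last_beat = self._clock()

    def expired(self):
        """True once more than `timeout` seconds have passed without a heartbeat."""
        return self._clock() - self._last_beat > self._timeout
```

An app built this way would call `expired()` between work steps and, when it returns True, print "no heartbeat" and quit.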



wow... why use shared memory for that... I mean that's what semaphores were for... I mean... shared memory has always been much slower than other methods... It's just damned convenient for shared data... (to which a heartbeat does *not* qualify)


I am only guessing here, but

1. probably because there is a shared mem area already for other client/app communication and it is convenient to program it that way

2. because when shmem is already being used it is less of an overhead to add a few bytes more to the shmem than to use a separate semaphore or TCP edit: and over a large number of heartbeats (7k/hr) the total overhead may be a lot less than for using 7k semaphore changes.

3. certainly means easier compatibility with win9x whose process handling was very primitive -- more of an issue when BOINC was created than now, but there are still a significant number of win9x clients out there (winMe counts as 9x for this)


My issue with the heartbeat system is as follows. It assumes that when the core client does not run for 31sec due to the machine being worked hard on its main mission, "obviously" the app will not have done either. The assumption is that the client will run next due to the priority difference, and by the time the app runs there will be a new heartbeat even if the interval is a lot more than 30sec.

My fear is that when paging has occurred in the interval, which is not uncommon if the main mission was so intensive as to keep the box's attention for 31sec, it is very likely that the client is paged first as the app will have run more recently. The app gets given the cpu at the end of the interval despite being a lower priority than the client, as the client is in i/o wait getting its memory back from disk. Hey presto, the app sees there is an interval of over 30sec and no heartbeat.
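
The race I am worried about can be put into a toy model. This is a sketch under my own assumptions, not anything taken from the BOINC source: the function name and the page-in delays are hypothetical, and the last heartbeat is taken to be written at t = 0 just before both processes are starved.

```python
def app_survives(stall_secs, client_page_in_secs, app_page_in_secs, timeout=30.0):
    """Return True if the app finds a fresh enough heartbeat when it next runs.

    Toy model: the last heartbeat is written at t = 0, then client and app
    are both starved for stall_secs. Each then needs some time to be paged
    back in before it can run again.
    """
    client_ready = stall_secs + client_page_in_secs  # client writes the next heartbeat
    app_ready = stall_secs + app_page_in_secs        # app next checks the heartbeat

    # If the client gets in first, the app sees the new heartbeat;
    # otherwise it still sees the stale one from t = 0.
    last_beat = client_ready if client_ready <= app_ready else 0.0
    return app_ready - last_beat <= timeout


# Client runs first after a 31 s stall: the app is fine.
print(app_survives(31, 0.0, 0.0))
# Client is stuck in i/o wait for 2 s while the app needs only 0.1 s:
# the app sees a 31.1 s gap and gives up.
print(app_survives(31, 2.0, 0.1))
```

The second call is the "hey presto" case: priority does not help the client if it is waiting on the disk when the app gets the cpu.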

R~~
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30353 - Posted: 31 Oct 2006, 5:43:20 UTC
Last modified: 31 Oct 2006, 5:50:59 UTC

This task kept running for 11min after being pre-empted and then exited with a heartbeat message. It had run for 2hrs 40min since its last checkpoint, so almost 3hrs was lost.

Linux, no gui, slow cpu by today's standards, bench at 250 / 400 with 368 Mb ram. Box is working as network router but with only two 100MHz net cards these cannot be taking all of its attention - especially as typical net traffic it handles is at around broadband speeds. There can be brief surges of work on its main mission when a lot of new TCP connections are set up through the box.

The history is that I was trying to get all the BOINC work off the box to check something, the run time pref had started at 24hrs, was reduced to 6hrs and reduced again to 2hrs. At the reduction to 2hrs this task had just over 1 hour on the clock, and boinc_command showed it had checkpointed at about 3460 sec, just under the hour.

When the task got to 3hrs 40something minutes it had not advanced the % complete and at that time was preempted by another project. While LHC was running the Rosetta task still showed 3hrs 40something as time run, but as soon as it was restarted its runtime dropped back to under 1 hour.

Looking at the messages tab showed the heartbeat message occurred a good 11 min after the pre-empt when the app should not have been still running, and when it is not surprising that the client was no longer sending them.

I wonder now if this is new, or whether I have just not noticed this before.

I wonder if the app is expected to do anything when it is pre-empted, and if LHC wu had been re-started before the Rosetta wu had finished its stand-down? (I don't know the code so apols if this suggestion seems daft to those who do). Even so, 11min seems a long time for a task to be squeezed out by an equal priority competitor.

I have "leave apps in memory" set but of course the heartbeat issue inevitably loses time back to the last checkpoint.
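
As a rough check of the figures above, here is the lost time worked out in Python. The checkpoint value (about 3460 sec) is from boinc_command as reported; the run time is an assumption, taking "3hrs 40something" as exactly 3h 40m, so the result is approximate. The function names are just for illustration.

```python
def lost_cpu_seconds(run_time_secs, last_checkpoint_secs):
    """CPU time thrown away when a task restarts from its last checkpoint."""
    return max(0, run_time_secs - last_checkpoint_secs)


def as_hms(secs):
    """Format a number of seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(int(secs), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


run_time = 3 * 3600 + 40 * 60  # assumed: "3hrs 40something" taken as 3h 40m
checkpoint = 3460              # reported checkpoint, just under the hour

print(as_hms(lost_cpu_seconds(run_time, checkpoint)))
```

That comes out a little over 2hrs 40min, which matches the time since the last checkpoint, before counting the extra 11 minutes after the pre-empt.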

At this point, not being willing to wait a second 2hours, I dropped the cpu pref to 1 hour, stopped and started BOINC, and the task exited cleanly.

It had done 3 decoys in just under an hour - is it plausible that it should have taken over 2hrs40min to not complete the next? Or was something already wrong by that time in addition to the weird behaviour after pre-empt?

edit: Is it possible that I upset the apple cart somehow by changing the cpu pref downwards from 6hrs to 2hrs? (I have done this before without problems)
Edwin
Joined: 29 Mar 06
Posts: 4
Credit: 69,961
RAC: 0
Message 30359 - Posted: 31 Oct 2006, 8:05:36 UTC - in response to Message 30344.  

> All I know Soren is that once I turned off the graphics my machine has been working fine. With them on the computer goes slow, locks up and does very little work, needing a reboot half the time.
Only been happening since Rosetta 5.32 and now 5.34.


I did the same thing. Turned off the graphics (screensaver) and since then everything is ok. No more freezes, and all workunits now complete with a result.

It seems to me that there is a conflict with XP once the screensaver kicks in and shows the graphics. Perhaps the developers of Rosetta can look into this.

greetz,

Edwin
EW-3
Joined: 1 Sep 06
Posts: 27
Credit: 2,561,427
RAC: 0
Message 30370 - Posted: 31 Oct 2006, 14:56:33 UTC

I may be confused, when I check results it indicates client 5.4.11
Where do I check for the Rosetta version? Or is that the same thing?

FWIW - like the last poster I stopped any graphics from running (screensaver) and everything is running great. In fact it even seems to be running faster and smoother (no more hanging up at 1.5% for an hour).

stewjack
Joined: 23 Apr 06
Posts: 39
Credit: 95,871
RAC: 0
Message 30374 - Posted: 31 Oct 2006, 16:02:36 UTC - in response to Message 30370.  

I may be confused, when I check results it indicates client 5.4.11
Where do I check for the Rosetta version? Or is that the same thing?


Select the task tab in the BOINC manager. Under "application" you should see rosetta 5.34.


(no more hanging up at 1.5% for an hour).

You don't understand that the progress percentage is only updated when rosetta stores a setpoint. If you restart your computer the WU will restart at the last setpoint. There is usually an initialization setpoint. That was your 1.5%!

Rosetta does not have fixed setpoints. I once had to wait five and a half hours for my second setpoint! I have my WU set for a completion time of 4 hours, but that was an unusual WU. A Rosetta WU normally setpoints every 10 min to 1.5 hours.

If you get your graphics working, you will discover that Rosetta creates a setpoint before each new model starts.

Jack
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 30388 - Posted: 31 Oct 2006, 19:57:41 UTC - in response to Message 30370.  

I may be confused, when I check results it indicates client 5.4.11
Where do I check for the Rosetta version? Or is that the same thing?

The 5.4.11 is the BOINC version your PC is running. You can see the Rosetta version, either in the tasks tab as Stewjack points out, or once a WU is reported back and you've updated to the project it is shown at the bottom of the WU display on the website.

Rosetta Moderator: Mod.Sense



©2024 University of Washington
https://www.bakerlab.org