Rosetta keeps preempting

Message boards : Number crunching : Rosetta keeps preempting

David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 14999 - Posted: 29 Apr 2006, 14:20:15 UTC
Last modified: 29 Apr 2006, 14:23:23 UTC

Rosetta keeps preempting after running for only a few minutes, e.g. 1 min, 7 min. "Switch between applications every" is set to 60 minutes, "leave in memory" is on, and the other Rosetta settings are at their defaults.

Running BOINC 5.2.13 on Windows XP Pro SP2 with Rosetta app 5.07. The only other project is S@H.

I'm not sure why Rosetta is being preempted so quickly. Any ideas? With the new checkpointing, do these short runs count as valid progress, or does the work start from the beginning each time?

Many thanks


ID: 14999
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15000 - Posted: 29 Apr 2006, 14:23:05 UTC

I don't think it's preempting due to "switch time". It's probably switching in and out of EDF (earliest deadline first) mode. Look in your "Messages" tab to see whether it is reporting switches in and out of EDF.

tony
ID: 15000
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15001 - Posted: 29 Apr 2006, 14:28:17 UTC
Last modified: 29 Apr 2006, 14:30:48 UTC

Hi

No sign of EDF messages. This is a sample:

29/04/2006 14:53:12||request_reschedule_cpus: process exited
29/04/2006 14:53:12|SETI@home|Computation for result 05mr99ab.17601.3778.529820.1.137_0 finished
29/04/2006 14:53:12|rosetta@home|Resuming result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 14:53:14|SETI@home|Started upload of 05mr99ab.17601.3778.529820.1.137_0_0
29/04/2006 14:53:24|SETI@home|Finished upload of 05mr99ab.17601.3778.529820.1.137_0_0
29/04/2006 14:53:24|SETI@home|Throughput 2763 bytes/sec
29/04/2006 15:02:01||request_reschedule_cpus: project op
29/04/2006 15:02:02|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 15:02:02|SETI@home|Starting result 05mr99ab.17601.4162.997154.1.52_1 using setiathome version 411

What does the "request_reschedule_cpus: project op" message mean? I'm not sure why completing the upload of a S@H WU caused Rosetta to be preempted.
ID: 15001
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15002 - Posted: 29 Apr 2006, 14:46:18 UTC - in response to Message 15001.  

At the end of each result, the work scheduler recomputes the work remaining and decides what to run next. The sample of the messages tab you've posted isn't very large and I don't see it "cycling" every other minute. I see that it finished a SETI result, resumed Rosetta, uploaded the SETI result, recalculated, and decided to run SETI instead. In round robin it uses CPU time per project to decide what to run, and evidently it decided you'd run enough Rosetta and needed SETI time to balance your requested resource share. I've even seen the work scheduler request 1 second of work. If it saw even a one second imbalance, it would switch projects at this time.
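To make the round-robin idea concrete, here is a minimal sketch in Python. It is not BOINC's actual scheduler code: the function name `pick_next_project` and the CPU-time numbers are made up for illustration, and the real client also takes debts and deadlines into account. It only shows how comparing CPU time used against resource share can flip the choice on a tiny imbalance.

```python
def pick_next_project(cpu_seconds, resource_share):
    """Return the project whose CPU time lags furthest behind its share.

    Simplified illustration only, not the real BOINC algorithm.
    """
    total_cpu = sum(cpu_seconds.values())
    total_share = sum(resource_share.values())

    def deficit(project):
        # CPU time the project is entitled to, minus what it actually got.
        entitled = total_cpu * resource_share[project] / total_share
        return entitled - cpu_seconds[project]

    return max(resource_share, key=deficit)


# Hypothetical numbers: equal shares, project "A" is one second ahead.
one_second_gap = pick_next_project({"A": 1801.0, "B": 1799.0}, {"A": 50, "B": 50})

# Hypothetical 90/10 split where Rosetta has already had more than its 10%.
after_rosetta_run = pick_next_project(
    {"SETI@home": 3000.0, "rosetta@home": 530.0},
    {"SETI@home": 90, "rosetta@home": 10},
)
```

With these made-up numbers the first call picks "B" and the second picks SETI@home.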

You can read the work scheduler definition in the wiki; it's a good read and might help.

tony
ID: 15002
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15005 - Posted: 29 Apr 2006, 15:31:46 UTC

OK, before this it downloaded new Rosetta work and ran the WU for 1 min before preempting, then it ran Rosetta for approx 7 mins, as in the log sample I posted. I am not sure of the checkpoint interval for the new Rosetta, so I wondered whether any useful work was being done in these brief spells of CPU time.

I thought the switch time was used so a project would get one hour and not a fraction of an hour. Maybe the scheduler needs to settle down after getting new Rosetta work.

ID: 15005
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15007 - Posted: 29 Apr 2006, 15:50:28 UTC

It may be doing what you say; I just don't see it in the small snippet of the log you posted. Post a larger section. A "project op" reschedule is done when calling home, or whenever your "switch between" interval expires (there may be other reasons for a project op that escape my memory). The point is that unless some event triggers it, a result should run for your entire "switch between" interval (unless it finishes in that time). Switching shouldn't affect your result unless you have "leave in memory" set to NO.

That downloading, running for one minute, then running for another 7 is interesting, and I'd like to see more of the log. Is this a Hyper-Threading or dual-core computer?
ID: 15007
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15011 - Posted: 29 Apr 2006, 16:22:53 UTC

CPU is a P4 2.5 GHz, from before the days of HT or dual cores :-(

I gave up on Rosetta for a while, as the lack of checkpointing seemed to be wasting CPU time whenever you rebooted etc. I saw the news on the new client, so I allowed new work, and the log is below. (I added colors to help ID events.) After the third go it seems to have settled down. Why it downloaded so much work is a concern, though: Rosetta only has a resource allocation of 10%, yet it downloaded as much work as if it had a 100% allocation for the "connect to network every" setting. I would expect Rosetta to download only its allocation, so this can only add complexity to the algorithm the scheduler has to work out.

29/04/2006 13:59:42||request_reschedule_cpus: project op
29/04/2006 13:59:44|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
29/04/2006 13:59:44|rosetta@home|Reason: To fetch work
29/04/2006 13:59:44|rosetta@home|Requesting 172800 seconds of new work
29/04/2006 13:59:49|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
29/04/2006 13:59:52|rosetta@home|Started download of rosetta_5.07_windows_intelx86.exe
29/04/2006 13:59:52|rosetta@home|Started download of jump_templates_v2.dat.gz
29/04/2006 13:59:57|rosetta@home|Finished download of jump_templates_v2.dat.gz
29/04/2006 13:59:57|rosetta@home|Throughput 135793 bytes/sec
29/04/2006 13:59:57|rosetta@home|Started download of 1di2_.psipred_ss2.gz
29/04/2006 13:59:59|rosetta@home|Finished download of 1di2_.psipred_ss2.gz
29/04/2006 13:59:59|rosetta@home|Throughput 651 bytes/sec
29/04/2006 13:59:59|rosetta@home|Started download of frags400.txt
29/04/2006 14:00:00|rosetta@home|Finished download of frags400.txt
29/04/2006 14:00:00|rosetta@home|Throughput 2073 bytes/sec
29/04/2006 14:00:00|rosetta@home|Started download of 1di2.pdb.gz
29/04/2006 14:00:02|rosetta@home|Finished download of 1di2.pdb.gz
29/04/2006 14:00:02|rosetta@home|Throughput 8956 bytes/sec
29/04/2006 14:00:02|rosetta@home|Started download of 1di2_.fasta
29/04/2006 14:00:04|rosetta@home|Finished download of 1di2_.fasta
29/04/2006 14:00:04|rosetta@home|Throughput 72 bytes/sec
29/04/2006 14:00:04|rosetta@home|Started download of aa1di2_09_05.400_v1_3.gz
29/04/2006 14:00:36|rosetta@home|Finished download of rosetta_5.07_windows_intelx86.exe
29/04/2006 14:00:36|rosetta@home|Throughput 171096 bytes/sec
29/04/2006 14:00:36|rosetta@home|Started download of aa1di2_03_05.400_v1_3.gz
29/04/2006 14:00:43|rosetta@home|Finished download of aa1di2_09_05.400_v1_3.gz
29/04/2006 14:00:43|rosetta@home|Throughput 83310 bytes/sec
29/04/2006 14:00:43|rosetta@home|Started download of 1mkyA.psipred_ss2.gz
29/04/2006 14:00:44|rosetta@home|Finished download of 1mkyA.psipred_ss2.gz
29/04/2006 14:00:44|rosetta@home|Throughput 923 bytes/sec
29/04/2006 14:00:44|rosetta@home|Started download of 1mky.pdb.gz
29/04/2006 14:00:45|rosetta@home|Finished download of 1mky.pdb.gz
29/04/2006 14:00:45|rosetta@home|Throughput 13889 bytes/sec
29/04/2006 14:00:45|rosetta@home|Started download of aa1mkyA03_05.400_v1_3.gz
29/04/2006 14:00:47|rosetta@home|Finished download of aa1di2_03_05.400_v1_3.gz
29/04/2006 14:00:47|rosetta@home|Throughput 111137 bytes/sec
29/04/2006 14:00:47|rosetta@home|Started download of aa1mkyA09_05.400_v1_3.gz
29/04/2006 14:00:48||request_reschedule_cpus: files downloaded
29/04/2006 14:00:48|SETI@home|Pausing result 05mr99ab.17601.3778.529820.1.137_0 (left in memory)
29/04/2006 14:00:48|rosetta@home|Starting result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 14:01:01|rosetta@home|Finished download of aa1mkyA03_05.400_v1_3.gz
29/04/2006 14:01:01|rosetta@home|Throughput 101046 bytes/sec
29/04/2006 14:01:01|rosetta@home|Started download of 1mkyA.fasta
29/04/2006 14:01:03|rosetta@home|Finished download of 1mkyA.fasta
29/04/2006 14:01:03|rosetta@home|Throughput 90 bytes/sec
29/04/2006 14:01:03|rosetta@home|Started download of 1b72_.psipred_ss2.gz
29/04/2006 14:01:04|rosetta@home|Finished download of 1b72_.psipred_ss2.gz
29/04/2006 14:01:04|rosetta@home|Throughput 880 bytes/sec
29/04/2006 14:01:04|rosetta@home|Started download of aa1b72_09_05.400_v1_3.gz
29/04/2006 14:01:16||request_reschedule_cpus: project op
29/04/2006 14:01:16|SETI@home|Resuming result 05mr99ab.17601.3778.529820.1.137_0 using setiathome version 411
29/04/2006 14:01:16|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 14:01:18|rosetta@home|Finished download of aa1b72_09_05.400_v1_3.gz
29/04/2006 14:01:18|rosetta@home|Throughput 149884 bytes/sec
29/04/2006 14:01:18|rosetta@home|Started download of 1b72.pdb.gz
29/04/2006 14:01:19|rosetta@home|Finished download of aa1mkyA09_05.400_v1_3.gz
29/04/2006 14:01:19|rosetta@home|Throughput 124865 bytes/sec
29/04/2006 14:01:19|rosetta@home|Finished download of 1b72.pdb.gz
29/04/2006 14:01:19|rosetta@home|Throughput 25462 bytes/sec
29/04/2006 14:01:19|rosetta@home|Started download of 1b72_.fasta
29/04/2006 14:01:19|rosetta@home|Started download of aa1b72_03_05.400_v1_3.gz
29/04/2006 14:01:20||request_reschedule_cpus: files downloaded
29/04/2006 14:01:22|rosetta@home|Finished download of 1b72_.fasta
29/04/2006 14:01:22|rosetta@home|Throughput 188 bytes/sec
29/04/2006 14:01:22|rosetta@home|Started download of aa1dtj_09_05.400_v1_3.gz
29/04/2006 14:01:26|rosetta@home|Finished download of aa1b72_03_05.400_v1_3.gz
29/04/2006 14:01:26|rosetta@home|Throughput 195509 bytes/sec
29/04/2006 14:01:26|rosetta@home|Started download of 1dtj.pdb.gz
29/04/2006 14:01:27||request_reschedule_cpus: files downloaded
29/04/2006 14:01:27|rosetta@home|Finished download of 1dtj.pdb.gz
29/04/2006 14:01:27|rosetta@home|Throughput 12664 bytes/sec
29/04/2006 14:01:27|rosetta@home|Started download of aa1dtj_03_05.400_v1_3.gz
29/04/2006 14:01:38|rosetta@home|Finished download of aa1dtj_03_05.400_v1_3.gz
29/04/2006 14:01:38|rosetta@home|Throughput 127723 bytes/sec
29/04/2006 14:01:38|rosetta@home|Started download of 1dtj_.fasta
29/04/2006 14:01:40|rosetta@home|Finished download of 1dtj_.fasta
29/04/2006 14:01:40|rosetta@home|Throughput 84 bytes/sec
29/04/2006 14:01:40|rosetta@home|Started download of 1dtj_.psipred_ss2.gz
29/04/2006 14:01:41|rosetta@home|Finished download of 1dtj_.psipred_ss2.gz
29/04/2006 14:01:41|rosetta@home|Throughput 996 bytes/sec
29/04/2006 14:01:41|rosetta@home|Started download of aa2tif_03_05.400_v1_3.gz
29/04/2006 14:01:49|rosetta@home|Finished download of aa1dtj_09_05.400_v1_3.gz
29/04/2006 14:01:49|rosetta@home|Throughput 130439 bytes/sec
29/04/2006 14:01:49|rosetta@home|Finished download of aa2tif_03_05.400_v1_3.gz
29/04/2006 14:01:49|rosetta@home|Throughput 146385 bytes/sec
29/04/2006 14:01:49|rosetta@home|Started download of 2tif.pdb.gz
29/04/2006 14:01:49|rosetta@home|Started download of aa2tif_09_05.400_v1_3.gz
29/04/2006 14:01:50||request_reschedule_cpus: files downloaded
29/04/2006 14:01:51|rosetta@home|Finished download of 2tif.pdb.gz
29/04/2006 14:01:51|rosetta@home|Throughput 9284 bytes/sec
29/04/2006 14:01:51|rosetta@home|Started download of 2tif_.fasta
29/04/2006 14:01:52|rosetta@home|Finished download of 2tif_.fasta
29/04/2006 14:01:52|rosetta@home|Throughput 78 bytes/sec
29/04/2006 14:01:52|rosetta@home|Started download of 2tif_.psipred_ss2.gz
29/04/2006 14:01:53|rosetta@home|Finished download of 2tif_.psipred_ss2.gz
29/04/2006 14:01:53|rosetta@home|Throughput 664 bytes/sec
29/04/2006 14:02:01|rosetta@home|Finished download of aa2tif_09_05.400_v1_3.gz
29/04/2006 14:02:01|rosetta@home|Throughput 242493 bytes/sec
29/04/2006 14:02:02||request_reschedule_cpus: files downloaded
29/04/2006 14:53:12||request_reschedule_cpus: process exited
29/04/2006 14:53:12|SETI@home|Computation for result 05mr99ab.17601.3778.529820.1.137_0 finished
29/04/2006 14:53:12|rosetta@home|Resuming result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 14:53:14|SETI@home|Started upload of 05mr99ab.17601.3778.529820.1.137_0_0
29/04/2006 14:53:24|SETI@home|Finished upload of 05mr99ab.17601.3778.529820.1.137_0_0
29/04/2006 14:53:24|SETI@home|Throughput 2763 bytes/sec
29/04/2006 15:02:01||request_reschedule_cpus: project op
29/04/2006 15:02:02|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 15:02:02|SETI@home|Starting result 05mr99ab.17601.4162.997154.1.52_1 using setiathome version 411
29/04/2006 15:02:03|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
29/04/2006 15:02:03|SETI@home|Reason: Requested by user
29/04/2006 15:02:03|SETI@home|Reporting 1 results
29/04/2006 15:02:08|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
29/04/2006 16:02:02|rosetta@home|Resuming result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 16:02:02|SETI@home|Pausing result 05mr99ab.17601.4162.997154.1.52_1 (left in memory)
29/04/2006 17:02:02|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 17:02:02|SETI@home|Resuming result 05mr99ab.17601.4162.997154.1.52_1 using setiathome version 411
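For anyone who wants to measure the run lengths rather than eyeball them, here is a small Python sketch that pulls the Rosetta start, resume, and pause lines out of a log in the format above and reports how long each run lasted. The function name `run_intervals` is made up for this post, and the parsing assumes the `date time|project|message` layout shown in the log; other BOINC versions may format their messages differently.

```python
from datetime import datetime

# A few lines copied from the log above.
LOG = """\
29/04/2006 14:00:48||request_reschedule_cpus: files downloaded
29/04/2006 14:00:48|rosetta@home|Starting result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 14:01:16|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 14:53:12|rosetta@home|Resuming result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 15:02:02|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
29/04/2006 16:02:02|rosetta@home|Resuming result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 using rosetta version 507
29/04/2006 17:02:02|rosetta@home|Pausing result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_3976_0 (left in memory)
"""


def run_intervals(log_text, project):
    """Return the length in seconds of each uninterrupted run of `project`."""
    intervals = []
    started = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, proj, message = line.split("|", 2)
        if proj != project:
            continue
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if message.startswith(("Starting result", "Resuming result")):
            started = when
        elif message.startswith("Pausing result") and started is not None:
            intervals.append((when - started).total_seconds())
            started = None
    return intervals


rosetta_runs = run_intervals(LOG, "rosetta@home")
```

On the lines copied here, this gives runs of 28 seconds, 8 minutes 50 seconds, and 60 minutes.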

ID: 15011
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15013 - Posted: 29 Apr 2006, 16:46:47 UTC

As "BOINC alpha testers" we're supposed to uninstall, delete folders, and do a completely clean install when new versions need testing. This means I have to re-attach to all 12 projects. I noticed that when I reattached, I'd get LOTS of work from EACH project I attached to. I reported this behavior, but it hasn't been fixed yet (that I've seen). Might be time to mention it again. I have a 3 day cache, and it was as if each project was trying to give me 3 days' worth. That's bad when deadlines are two weeks. LOL. This, of course, only happens upon attachment, so it's not a killer most of the time.
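To put rough numbers on that, here is a small Python sketch. It assumes, purely for illustration, that each project is asked for the full cache size, and compares that with a request scaled by resource share. The function name `work_request_seconds` and the scaling rule are assumptions, not how the BOINC client actually computes its requests. The 172800 figure matches the "Requesting 172800 seconds of new work" line in David's log, which is exactly two days.

```python
SECONDS_PER_DAY = 86_400


def work_request_seconds(cache_days, share_percent=100):
    """Seconds of work to request for one project (illustrative model only)."""
    return cache_days * SECONDS_PER_DAY * share_percent // 100


# A two-day cache requested in full, as in "Requesting 172800 seconds of new work".
full_request = work_request_seconds(2)

# The same cache scaled to a 10% resource share (hypothetical behaviour).
scaled_request = work_request_seconds(2, share_percent=10)

# Twelve projects each filling a 3-day cache, expressed in days of queued work.
queued_days = 12 * work_request_seconds(3) // SECONDS_PER_DAY
```

That works out to 172800 seconds for the full request, 17280 seconds (4.8 hours) for the scaled one, and 36 days of queued work against two-week deadlines.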

The scheduler does a "reschedule" each time it calls home, so in your case it did one each time it downloaded a new WU after attachment. At first it wanted to run Rosetta, then it switched to SETI and stayed there until the SETI unit was finished, then it did a project op and switched to Rosetta, then back after the next project op. Some people run small caches, so it has to do a reschedule each time it calls home just to ensure it has enough work, but because you have a larger cache, you get to see this every time. Everything you're seeing is normal, and after a while it should settle down for you. I use a 180 minute switch setting so it'll finish most WUs without switching (I should probably go to 4 hours or more now that Rosetta has a 4 hour run time pref, anyway).

Later in your log, you can see it settle down (after the initial downloads) and pretty much switch every hour, as it should. Remember, it does a reschedule each hour. That doesn't mean it'll actually switch, but it'll check the CPU run time numbers and make its best decision to keep your requested resource share honored.


Have I rambled on too much? I tried to piece it together so it would be understandable.

tony
ID: 15013
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15019 - Posted: 29 Apr 2006, 17:19:42 UTC - in response to Message 15013.  



> Have I rambled on too much? I tried to piece it together so it would be understandable.
>
> tony

LOL no, not a ramble, it made sense to me. It would be great if you could mention to the devs that each project downloads 100% of the cache size rather than an amount scaled to its resource allocation.





ID: 15019
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15025 - Posted: 29 Apr 2006, 17:30:48 UTC

The BOINC scheduling could be improved, to put it mildly. However, it does quite okay in the long run, after some confusion whenever you change things (attach to projects, change resource share, etc.). If you are concerned about BOINC not being able to finish all WUs, consider increasing the resource share. ;-) From your log it seems BOINC is not yet in EDF mode, so it should be able to return all results in time.
ID: 15025
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15029 - Posted: 29 Apr 2006, 17:40:37 UTC
Last modified: 29 Apr 2006, 17:42:21 UTC

I will mention it again. Quite frankly, I reported it and forgot to follow up on it (read: bug the heck out of him). Rom is supposed to release the next "recommended" BOINC client very, very soon. He released 5.4.7 yesterday and said:

> Howdy folks,
>
> This release should resolve the localization problems and the
> setup program should shutdown BOINC when running as a service before
> checking to see which files are in use.
>
> I'm considering this a release candidate, if no red flags are
> thrown, this is what we'll release with.
>
> ----- Rom

So I know a fix won't be in this upcoming release (unless it was fixed in the last couple of dev releases; I didn't do a clean install on the last few).
ID: 15029
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15030 - Posted: 29 Apr 2006, 17:53:07 UTC
Last modified: 29 Apr 2006, 17:56:02 UTC

mmciastro: When you start from scratch and load up 12 projects, BOINC starts out thinking there is only one project to get WUs from to fill its cache... then it sees the second project, and sees resource shares cut in half (by default), etc. You're probably overcommitted before you reach project #3. So unless it continues to grab a full cache for each project, I think BOINC is behaving as expected, and just using the information it has available at the time.

David@home: BOINC has a debt system. "Hey Rosetta, I'm gettin' a little crammed here on SETI deadlines... could I 'borrow' some CPU time? I'd gladly repay you tomorrow." When BOINC requests xxxx seconds of new work, the figure is based on an estimate of how much time BOINC expects to spend running that project in the coming days, which in turn depends on resource share, the % of time the CPU is available, and a number of other factors... but, bottom line, it may simply have ordered some "extra" work because it's "planning" to crunch a bit more Rosetta than the average over the next couple of days.
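A toy version of the debt idea, in Python, might look like the following. This is a simplified model for illustration, not the client's real bookkeeping: the name `update_debts` and the 90/10 shares are made up, and the real client uses more inputs than this.

```python
def update_debts(debts, resource_share, ran_project, elapsed_seconds):
    """Credit each project its share of the elapsed time, then charge the one that ran."""
    total_share = sum(resource_share.values())
    for project in debts:
        owed = elapsed_seconds * resource_share[project] / total_share
        used = elapsed_seconds if project == ran_project else 0.0
        debts[project] += owed - used
    return debts


# Hypothetical shares; both projects start with no debt.
debts = {"SETI@home": 0.0, "rosetta@home": 0.0}
shares = {"SETI@home": 90, "rosetta@home": 10}

# SETI runs for one hour while Rosetta waits.
update_debts(debts, shares, ran_project="SETI@home", elapsed_seconds=3600)
```

After SETI runs for an hour in this model, Rosetta is owed 360 seconds and SETI has overdrawn by the same amount, so the project with the largest positive debt is the natural pick for the next slot.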

Also keep in mind, the cache size is not an absolute thing. BOINC doesn't want to assume you WILL connect to the net exactly at the times it will want to access the net. And so I believe it actually orders twice as much work as you might otherwise figure it "should" based on # of CPUs, resource share etc. This will keep a cache of work, even if the scheduled time to report back the project is missed due to no internet access, or project down, or PC off, whatever. This is why I've seen it recommended that you not set your cache larger than one half of your WU deadlines.

[edit] oh, and as long as you "leave in memory" and aren't powering off the PC very often, your Rosetta work is progressing, even if only a little at a time. And the checkpoint will just help assure that if you've got a fair amount of work done, that it gets saved. Thus, on average, when you DO turn off your computer, you lose much less work in progress.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15030
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15031 - Posted: 29 Apr 2006, 17:59:39 UTC - in response to Message 15029.  
Last modified: 29 Apr 2006, 18:05:31 UTC


No worry, I have been involved from early on at S@H and survived the early days (I only recently changed my username to this one :-) ). BOINC is always advancing and improving. The development focus seems to be on balancing multiple active projects, but I would prefer just to have a reserve project that is only crunched when the main project's cache runs out. I have tried setting the resource share to a silly small value, but this cache download thingy means that when S@H runs out and the reserve project comes into play, the cache fills up with Rosetta, which then knocks S@H when it comes back online. I would just like a reserve project to download one WU at a time LOL.

ID: 15031
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15033 - Posted: 29 Apr 2006, 18:11:33 UTC - in response to Message 15031.  

> I would just like a reserve project to download one WU at a time LOL.

You might want to put your 2 cents' worth into River~~'s ASAP thread then.
ID: 15033
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15064 - Posted: 30 Apr 2006, 3:45:45 UTC

Not completely on topic I suppose, but I have noticed that on my machines, a manual update of SETI will almost always trigger a swap to Rosetta, or a swap from Rosetta. SETI is the only project I am running that will do this with a very high degree of consistency.

As to the download of a lot of work after pausing and restarting a project: if you are attached and just suspended, the project can still run up a debt (at least it did in earlier BOINC versions). So when the project is resumed, it would naturally load enough work to fill out the debt. Since BOINC has no clue about the "Time" setting at Rosetta, the calculation of how much to load would be based on the last work units you ran and the run length BOINC thinks they will take.

All of this will settle out if BOINC is allowed to run for a while to adjust.

Moderator9
ID: 15064
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15106 - Posted: 30 Apr 2006, 23:32:22 UTC

I believe it's simply the dispatcher. It gets kicked off whenever there's a change that might result in him changing his mind about what should be running right now. Downloading a WU is one of those many events. So is suspending a WU, or resuming a project. These can all cause it to reevaluate what it "SHOULD" be running right now, and swap out... this is why the new checkpointing implementation in Rosetta is going to show a nice bump in the daily credits chart. The dispatcher runs more than you might think, and when he's teetering on the edge, he can change his mind frequently if other events cause him to step in and reevaluate... otherwise, he chimes in on your "switch between applications every..." setting.
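As a rough sketch of that behaviour in Python: the dispatcher re-evaluates whenever an event is pending, and otherwise only when the switch interval has elapsed. The event names are the reasons that appear after `request_reschedule_cpus:` in the logs earlier in this thread; the function `should_reschedule` itself is hypothetical and is not taken from the BOINC source.

```python
# Reasons seen after "request_reschedule_cpus:" in the logs in this thread.
RESCHEDULE_EVENTS = {"process exited", "files downloaded", "project op"}


def should_reschedule(now, last_switch, switch_interval, pending_events):
    """Return True if the dispatcher should re-evaluate what to run.

    All times are in seconds. Illustrative model only.
    """
    triggered = any(event in RESCHEDULE_EVENTS for event in pending_events)
    timer_expired = (now - last_switch) >= switch_interval
    return triggered or timer_expired


# Ten minutes into a 60-minute slot, a download finishes: re-evaluate early.
early = should_reschedule(600, 0, 3600, ["files downloaded"])
# Nothing happened and the slot is not over: keep running.
quiet = should_reschedule(600, 0, 3600, [])
# No events, but the full hour is up.
hourly = should_reschedule(3600, 0, 3600, [])
```

Re-evaluating is not the same as switching: the check only decides whether the dispatcher looks again at what should be running.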
ID: 15106
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15138 - Posted: 1 May 2006, 7:11:27 UTC - in response to Message 15106.  


Definitely good news to see better checkpointing available in Rosetta.

ID: 15138
David@home
Joined: 7 Oct 05
Posts: 29
Credit: 185,330
RAC: 0
Message 15140 - Posted: 1 May 2006, 7:38:34 UTC

Just checked the WU in my logs below as it has completed. It only did just over 3 hours of CPU time. I guess those two short bursts of CPU activity meant that Rosetta decided it could not complete the fourth hour in time. I think the scheduler should keep to the one hour switch time and not preempt projects before this time.

ID: 15140
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15186 - Posted: 1 May 2006, 16:53:54 UTC - in response to Message 15140.  

> I think the scheduler should keep to the one hour switch time and not preempt projects before this time.

I agree with your point, which I believe is that having the scheduler kick in more than your preference can be disruptive. But let's look at it from the scheduler's point of view.


[a narrative of the scheduler talking to itself under David's proposed model]

"...ok so now he's resumed the only WU from this project which was previously suspended... I wonder if I need to run some work for that project... ohhh, I've got a lot of debt to that project! I should run everything I can... but David doesn't want me to interrupt anyone and I just started another project's WU 10 minutes ago, so I'll just sit with 1 of the dual processors active, because 10 minutes ago I only had one WU to work on... and I'll wait another 50 minutes before I fire up the WU David just said to 'Resume'."

Again, I agree with your point. But if I resume a WU, and the scheduler DOESN'T wake up and run what I want it to, I'm not going to be pleased either.

A similar scenario occurs when a new WU is downloaded, rather than resumed. It wasn't previously a choice for scheduling, and now it is. If the project has been down for some time, and I finally got a WU, I really should go crunch that.

It is for these reasons that the scheduler, while not the perfect answer, is doing a reasonable job. And also that project checkpointing is so key. I'm very glad they found a way to pull it off with minimal overhead. Our productivity is going to rip up the charts!
ID: 15186
Rusty Lafavour
Joined: 12 Apr 06
Posts: 4
Credit: 26,391
RAC: 0
Message 15426 - Posted: 3 May 2006, 20:06:54 UTC


ID: 15426

©2024 University of Washington
https://www.bakerlab.org