Message boards : Number crunching : Not building up any cache of work after BOINC upgrade.
Author | Message |
---|---|
muddocktor Joined: 11 May 07 Posts: 17 Credit: 14,543,886 RAC: 0 |
Hey you all, I recently upgraded the BOINC client from 6.6.36 to 6.10.18 because I wanted to run some CUDA work on Seti while keeping my CPU available for Rosetta. Before the upgrade, I had a decent cache of Rosetta work units. When I restarted BOINC after the upgrade, it trashed all the work units. That was a first for me, as I have upgraded BOINC numerous times in the past while running Seti with no problems. Now, after the client upgrade, my machine won't build up any extra work units as a cache. The work units I've been getting are really big, but right now I have a grand total of the 4 work units I'm presently processing and no extra WUs cached for Rosetta. I also have just a small supply of Seti WUs for GPU processing, which I only let run when the machine isn't in use. I've gone through my computing preferences and they still show the preferences I previously set for this project. One other thing that happened right after the upgrade: the project registered the computer as a new system, and I had to merge the two entries to get back to one. I don't like running without at least some kind of cache of work units on my machine. Do you all have any insight into why my machine won't get more work cached? |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
I just looked at your computers and they all seem to have a nice cache of more than half a dozen units each. It seems to have fixed itself. |
muddocktor Joined: 11 May 07 Posts: 17 Credit: 14,543,886 RAC: 0 |
Actually, if you look at the QX9650 computer, you will see only 4 WUs on it that aren't already finished. The 4 shown as In Progress are the ones it is actually processing. And at the very end of the task list for that computer there are 13 tasks shown as No Reply, which are the work units that the BOINC client upgrade trashed on this system. That part really cheesed me off, as I've never had a simple BOINC client upgrade totally bork my WU cache before. |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
Okay, try this, BUT BE VERY CAREFUL!!! Go to your Boinc/Data folder, right-click the file client_state.xml and open it with Notepad. In the Project section look for these lines: <long_term_debt>10.285423</long_term_debt> <duration_correction_factor>1.797723</duration_correction_factor> DO NOT EDIT THE FILE, JUST LOOK AT IT!!! Now close the file without saving and post your settings here. Both values help Boinc decide how to get work and where to spend its resources. You will have a different set of numbers for each project you are attached to. The numbers above are from Aqua, and here are my FreeHal numbers: <long_term_debt>31.437500</long_term_debt> <duration_correction_factor>100.000000</duration_correction_factor>. If you feel uncomfortable doing this, DO NOT DO IT!! We are getting technical here, and your whole Boinc installation can get messed up so badly that it will need uninstalling and re-installing, causing you to lose everything!!! transient has another good idea as well. It basically will reset any funky settings and of course fix any missing or corrupted files too. |
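
For anyone who would rather not open the file in an editor at all, the same look-but-don't-touch check can be sketched as a small read-only script. This is only a sketch under assumptions: the three value tags are the ones quoted in this thread, but the enclosing `<project>` element and the `<project_name>` child are assumptions about the file layout, and the function names are made up for illustration. The sample data reuses the Aqua and FreeHal numbers quoted above.

```python
import xml.etree.ElementTree as ET

# Tag names quoted in this thread; a tag missing for a project is skipped.
FIELDS = ("short_term_debt", "long_term_debt", "duration_correction_factor")


def read_project_debts(xml_text):
    """Return {project name: {field: value}} from client_state.xml text.

    Assumes each project sits in a <project> element with a
    <project_name> child (an assumption, not confirmed by the thread).
    """
    root = ET.fromstring(xml_text)
    result = {}
    for project in root.iter("project"):
        name = project.findtext("project_name") or "unknown"
        values = {}
        for field in FIELDS:
            text = project.findtext(field)
            if text is not None:
                values[field] = float(text)
        result[name] = values
    return result


def read_project_debts_from_file(path):
    """Open the file read-only, so nothing is ever written back."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return read_project_debts(handle.read())


# Sample built from the numbers posted above (Aqua and FreeHal).
SAMPLE = """<client_state>
  <project>
    <project_name>Aqua</project_name>
    <long_term_debt>10.285423</long_term_debt>
    <duration_correction_factor>1.797723</duration_correction_factor>
  </project>
  <project>
    <project_name>FreeHal</project_name>
    <long_term_debt>31.437500</long_term_debt>
    <duration_correction_factor>100.000000</duration_correction_factor>
  </project>
</client_state>"""

if __name__ == "__main__":
    for name, values in read_project_debts(SAMPLE).items():
        print(name, values)
```

To check a real installation, `read_project_debts_from_file` would be pointed at the client_state.xml in the BOINC data folder; because the file is opened in read mode only, the advice above about not editing it still holds.
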
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
Sorry for hijacking this thread, but for some reason I have a similar problem that only started in the last few days. I'm running BOINC 6.10.36 (the one that was withdrawn) but haven't changed any settings for a good while. My details for Rosetta are: <long_term_debt>0.000000</long_term_debt> <duration_correction_factor>1.447721</duration_correction_factor> And for WCG: <long_term_debt>-178366.388223</long_term_debt> <duration_correction_factor>1.331190</duration_correction_factor> - With a 2-day buffer I usually have 25-30 Rosetta WUs and 1 WCG, but right now I have 5 Rosetta (3 running) and 4 WCG (1 running). - Run times for Rosetta are 8 hours (WCG varies, 5-12 hours). - Resource share between the two is set at Rosetta 1500 (93.75%), WCG 100 (6.25%). One thing I have been doing recently is suspending one Rosetta job so that a WCG one keeps ticking over, as I had one or two short-deadline WUs come from WCG that kept swapping application every hour and stopping (I've just increased this to 120 mins). I'd have thought this would have consumed any WCG debt, but it seems to be doing the opposite (Ave Work done has been steady at 90-100 over the last month). As the WCG WUs keep coming down, I recently increased the Rosetta resource share from 1100 to 1500, but it doesn't seem to have helped either. (I guess when I said above that I hadn't changed any settings for a while I wasn't being strictly truthful.) Any help appreciated... ;) |
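
The percentages quoted for the resource share follow directly from the raw share numbers: each project's share divided by the sum of all shares. A minimal sketch of that arithmetic, with a made-up function name and using only the figures from the post above:

```python
def share_percentages(shares):
    """Convert raw resource-share numbers into percentages of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total resource share must be positive")
    return {name: 100.0 * value / total for name, value in shares.items()}


if __name__ == "__main__":
    # Current setting from the post: Rosetta 1500, WCG 100.
    print(share_percentages({"Rosetta": 1500, "WCG": 100}))
    # Previous setting from the post: Rosetta 1100, WCG 100.
    print(share_percentages({"Rosetta": 1100, "WCG": 100}))
```

With the current setting this gives 93.75% and 6.25%, matching the figures in the post. The earlier setting of 1100 works out to roughly 91.7% for Rosetta, so the increase to 1500 moved the split by only about two percentage points.
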
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Sid, leave it all alone. BOINC will take care of getting the work returned before deadlines. When tasks are suspended, the debts and scheduler behaviors change, and with highly variable runtimes on WCG, the scheduler already has its hands full trying to guess what it will get next. Let it work through the WCG work, then it should correct itself. I haven't heard why the BOINC release was pulled. Were there perhaps work fetch problems there? Rosetta Moderator: Mod.Sense |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
Sid Celery wrote: "Sorry for hijacking this thread but for some reason I have a similar problem that's only started in the last few days. I'm running BOINC 6.10.36 (the one that was withdrawn) but haven't changed any settings for a good while." As Mod.Sense says, just leave it alone, but the reason you are seeing the difference is the LTD, or Long Term Debt, you owe for WCG. Boinc is trying to repay that and will crunch more WCG work until the debt gets back to a more neutral number. You could stop Boinc, edit the file, and Boinc would use the new numbers; however, that doesn't fix what caused the problem in the first place and could lead to bigger problems in the long run. I agree with Mod.Sense: just leave it alone and Boinc will settle the debt and get back to crunching as usual in the next couple of weeks. And it was work fetch issues that caused the release to be pulled, but I do not know whether it was this specific problem or not. If you send a PM to Paul Buck, he would know. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
I don't know why 6.10.36 was withdrawn, but in the version history 6.10.37 just shows one fix relating to project descriptions in XP, which I assumed was the problem, so I didn't roll back as I'm running Vista. Aside from that, I've had this version for nearly 3 weeks but only had a problem in the last couple of days. Since posting I've had my first Rosetta WUs download for 2 days, but only one at a time, and they start running almost straight away in preference to the WCG ones I have, in spite of the debt. The long term debt figure for WCG is now a bigger negative number. In addition I notice the following: Rosetta: <short_term_debt>5753.092075</short_term_debt> WCG: <short_term_debt>-5753.092075</short_term_debt> By now you'll have fully realised I don't know what I'm looking at nor what I'm doing, so I'll gladly take your advice and leave well alone. I'm slightly comforted by the fact that further Rosetta WUs are coming down, albeit piecemeal, but I'm obviously still concerned in case there's a problem with supply from Rosetta in future, as I have no buffer. I like WCG but I prefer to run work from here - WCG was only intended to be a back-up when things go wrong. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
And for no apparent reason 21 WUs came down spread over 4 hours. I won't pretend to understand. The debt figures make no more sense either but everything's back to normal. I'll pretend it never happened... |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
Sid Celery wrote: "And for no apparent reason 21 WUs came down spread over 4 hours." Sometimes Boinc works in mysterious ways, and just ignoring it does in fact work. Glad you are back to normal again, here's hoping it stays that way! |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
In the last 12 hours I've completed 4 and received 9 more Rosetta WUs, so I think I'm back at the full buffer again. Panic over. I'm mopping up the last WCG WUs as well (2 out of 3 done), which have earlier deadlines, then I'll see when or if any more get called. I expect none for a few days but there's no guessing by me. If I only get Rosetta WUs for a while I'll be happy enough. I'll keep an eye out for a new Boinc manager too. |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
Sid Celery wrote: "I'll keep an eye out for a new Boinc manager too." I am glad all is going well. As for versions of Boinc, I am running several different ones here; the oldest is 6.4.5 while the newest is 6.10.18. I have one Linux machine but I don't count it in this group, mainly because I have no idea how to update it. It is working just fine, so I don't need to figure it out anyway. I have given up trying to keep up with all the updates for Boinc! They were going through one a week for a while there!! |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
I just click on the download link on the Rosetta home page occasionally. It seems to determine my OS and show the latest official Boinc release automatically (not the interim ones) on my various machines from XP through Vista to W7. |
mikey Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
Sid Celery wrote: "I just click on the download link on the Rosetta home page occasionally. It seems to determine my OS and show the latest official Boinc release automatically (not the interim ones) on my various machines from XP through Vista to W7." I do that too, I just don't act on the info. |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
Sid Celery wrote: "I don't know why 6.10.36 was withdrawn but in the version history 6.10.37 just shows one fix relating to project descriptions in XP, which I assumed was the problem so I didn't roll back as I'm running Vista. Aside from that I've had this version for nearly 3 weeks but only had a problem in the last couple of days." Everything's been running fine for a while. I upgraded to Boinc 6.10.43 and all I've done is increase my resource share to Rosetta, but it's all started happening again in the exact same way. Rosetta: <short_term_debt>67455.183998</short_term_debt> <long_term_debt>0.000000</long_term_debt> <duration_correction_factor>1.512451</duration_correction_factor> WCG: <short_term_debt>-67455.183998</short_term_debt> <long_term_debt>-201848.746995</long_term_debt> <duration_correction_factor>1.459519</duration_correction_factor> All 4 cores are running Rosetta with no backup, in spite of 2.4 days of additional buffer being requested. As soon as one Rosetta WU finishes, WCG gets to run in its place for 2 hours and another single Rosetta WU is grabbed, which then gets its turn when the 2 hours are up and runs to its end. I know it'll probably resolve itself as it did before if I leave it alone, but it's still very odd. I haven't been suspending/resuming tasks this time (honest!) |
©2025 University of Washington
https://www.bakerlab.org