Posts by PresterJohn

1) Message boards : Number crunching : unable to increase work cache size (Message 4166)
Posted 24 Nov 2005 by PresterJohn
Post:
Bill,

your xml edit did the trick. after i restarted boinc, it began downloading more WU's. i hit the 100 limit, so i don't think that machine has enough to tide me over the holidays but it's better than it was before.

much appreciated. :)

ps. sorry for the late reply. all hell broke loose yesterday on what should have been a quiet day and i was busy plugging holes...
2) Message boards : Number crunching : unable to increase work cache size (Message 4025)
Posted 23 Nov 2005 by PresterJohn
Post:
is there a way to dbl-check the version # within the client?


Help/About BOINC Manager. 5.2.7 is, I think, the latest recommended release.

Again, I know some people are paranoid about showing their computer info, but click on my name and look at what it shows for mine. It's not everything it will show YOU for your computers, but it would let us see what you're running, look at results, etc...


it shows that i'm running 5.2.6

i'll give the xml edit a try in the morning. my preference settings are all correct, ie. there is no restriction on 'Do work only between the hours of'.

thanks for the help so far. :)
3) Message boards : Number crunching : unable to increase work cache size (Message 3979)
Posted 22 Nov 2005 by PresterJohn
Post:
This project does it exactly the same way every other BOINC project does it...


I would like to emphasize this. Bill Michael is absolutely true, this is not Rosetta-specific and every single BOINC project behaves the same.


if this is standard for all BOINC projects then chalk up my remarks to ignorance since rosetta is the first boinc project i've ever crunched.

notwithstanding...there do seem to be other parameters which affect WU's and their download, such as the connect interval which David Kim has since reduced but which could be further tweaked, i think. i know of some fad users who are on dial-up and this project is not that dial-up friendly...
4) Message boards : Number crunching : unable to increase work cache size (Message 3978)
Posted 22 Nov 2005 by PresterJohn
Post:
>>5.3.x is beta code. Unsupported.

i could be misquoting the version since i am doing it off the top of my head. is there a way to dbl-check the version # within the client?
5) Message boards : Number crunching : unable to increase work cache size (Message 3977)
Posted 22 Nov 2005 by PresterJohn
Post:
is the 100 WU per day limit a per machine basis or a per user basis?


It's per machine. With your computers hidden, I can't see any information that I could use to help out... definitely MUST know BOINC version, and would help to know Result Duration Correction Factor for the machine in question. Here is mine - if yours is drastically different, that could be part of the problem.

% of time BOINC client is running 99.133 %
While BOINC running, % of time host has an Internet connection 100 %
While BOINC running, % of time work is allowed 99.9788 %
Average CPU efficiency 0.995069
Result duration correction factor 1.051188


here are the numbers for the machine in question:

% of time BOINC client is running 98.3857 %
While BOINC running, % of time host has an Internet connection 100 %
While BOINC running, % of time work is allowed 35.231 %
Average CPU efficiency 0.984731
Result duration correction factor 0.395016

i don't know why the bold-faced stuff is showing at 35%???
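
a rough way to compare the two sets of numbers above is to multiply the "client is running" and "work is allowed" fractions. this is only a sketch: the assumption that the two fractions simply multiply is mine and is not confirmed anywhere in this thread, and the function name is made up for illustration.

```python
def effective_fraction(pct_running: float, pct_work_allowed: float) -> float:
    """Approximate share of wall-clock time spent crunching.

    Assumption (not from the thread): the two percentages are independent
    and can simply be multiplied together.
    """
    return (pct_running / 100.0) * (pct_work_allowed / 100.0)


# Figures quoted in the post above.
reference_host = effective_fraction(99.133, 99.9788)
problem_host = effective_fraction(98.3857, 35.231)

print(f"reference host: {reference_host:.3f}")
print(f"problem host:   {problem_host:.3f}")
```

under that assumption the machine in question would be crunching for roughly a third of the time, versus nearly all of the time for the reference host.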
6) Message boards : Number crunching : unable to increase work cache size (Message 3971)
Posted 22 Nov 2005 by PresterJohn
Post:
however, it seems that even setting 'Connect to network every XX days' to the max of 10, i only seem to be able to d/l enough work to last for approx 3 days. what gives?!?


Are you running BOINC V5.x? Prior versions did not use the "Duration Correction Factor", and for many people, a "10-day cache" was actually more like 3 days. The newer BOINC is much more accurate. Or, as the other posters have said, you may be hitting the maximum-per-day. In that case, you may have to set it to 10 a few days before you actually need 10 days worth, and connect daily, to get that many stored up on that fast a computer.


all machines are running 5.3.x and using the same preferences for the cache size, etc.


>>In that case, you may have to set it to 10 a few days before you actually need 10 days worth

unfortunately, even prepping the machine several days in advance isn't doing the trick.

just my opinion, but the entire way the WU's are managed for this project really leaves a lot to be desired. and if Rosetta wants to take its place in the forefront of preferred DC projects, this really needs to be reviewed and amended.
7) Message boards : Number crunching : unable to increase work cache size (Message 3968)
Posted 22 Nov 2005 by PresterJohn
Post:
Yes and I should have added, also make sure you wait while the scheduler gets the WUs in - if you haven't hit the daily quota yet. Your mileage may vary, but mine does around 12 downloads, pauses for 10 minutes, does another 10 downloads, pause, etc. until either the 10 day cap or the daily quota is reached. Don't expect one big long download of 400 work units or anything like that.


that interval has since been shortened (i'm seeing 4 minutes) but i am giving the scheduler plenty of time. the machine in question hasn't downloaded any more WU's for over 2 hrs.
8) Message boards : Number crunching : unable to increase work cache size (Message 3967)
Posted 22 Nov 2005 by PresterJohn
Post:
There's also a limit because of clients that just error out all the time. One machine that errors out a WU every X seconds, can go through a lot of WUs in a day.

PresterJohn - can you determine if you're hitting your daily quota? Also note that your quota can decrease depending on how many client errors you're generating.


Andrew & Stephan,

is the 100 WU per day limit a per machine basis or a per user basis?

if it's a per machine basis, then i'm not anywhere close to hitting the 100 WU/day limit. and if it's a per user basis, IMO perhaps this is something that needs to be reconsidered for the good of the project since it seems to be set unrealistically low.

I was able to d/l 52 WU's on monday morning after uploading two days worth of completed jobs and today, i was only able to d/l 18 jobs. the queries that i'm getting are taking approx 80 minutes each to finish...thus 55 cached WU's is only about 3 days worth.
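
the arithmetic behind "about 3 days" can be sketched as below. it assumes a single cpu crunching around the clock at a steady 80 minutes per WU; the function names are hypothetical and only there for illustration.

```python
import math


def cache_days(work_units: int, minutes_per_wu: float, cpus: int = 1) -> float:
    """Days of crunching covered by a queue of work units (assumes 24h operation)."""
    total_minutes = work_units * minutes_per_wu
    return total_minutes / (cpus * 24 * 60)


def wus_needed(days: float, minutes_per_wu: float, cpus: int = 1) -> int:
    """Work units needed to keep `cpus` busy for `days` days."""
    return math.ceil(days * cpus * 24 * 60 / minutes_per_wu)


days_cached = cache_days(55, 80)
needed_for_holiday = wus_needed(5, 80)

print(f"55 WU's at 80 min each last about {days_cached:.2f} days")
print(f"5 days away needs about {needed_for_holiday} WU's")
```

with these assumptions, 5 days of work comes to fewer than 100 WU's for a single-cpu machine.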

the frustrating part of this is that i have slower machines which appear to be able to download more cached WU's than my faster machines. not anywhere even close to 5 days worth (much less 10 days), but they appear to be able to receive more than some of the other machines.

none of my machines are giving me regular client errors. during the three weeks of crunching r@h, i've had two client errors (on two different machines) and aside from a brief hiccup with stalled WU's which the 4.79 upgrade seemed to have fixed, everything appears to be working correctly.
9) Message boards : Number crunching : unable to increase work cache size (Message 3945)
Posted 22 Nov 2005 by PresterJohn
Post:
in anticipation of the long thanksgiving weekend, i want to increase the work cache on my crunchers so that they will have enough work to crunch for the 5 days that i will be away on holiday (there will not be any internet access during that time for these machines).

however, it seems that even setting 'Connect to network every XX days' to the max of 10, i only seem to be able to d/l enough work to last for approx 3 days. what gives?!?

in addition, is there a correlation between the size of the work cache and the cpu benchmark for each machine?

logically, if a powerful machine is being used, shouldn't it download more WU's than a less powerful machine (assuming they are using the same preference setting) because the faster machine will complete more WU's per 24hrs than the slower PC? my casual observation over the last two weeks doesn't seem to bear this theory out.

it's a shame that w/this project, users can't even properly download more work when needed and are bound by some arbitrary limit. i guess my machines will sit idle after all their WU's are completed because they won't be able to get enough work. a pity...
10) Message boards : Number crunching : client upgrade, stalled WU's - what is the cause and the fix??? (Message 3291)
Posted 15 Nov 2005 by PresterJohn
Post:
the stalled job that i killed on one of my machines this morning had d/l'ed 4.79 yesterday. how can i verify that it is indeed running the new version?


In the Work tab, the Application column. Mine shows "rosetta 4.79" at the moment.


yep, i noticed version # listed in the application column about a minute ago and it did say 4.78. but how exactly does the software know to use 4.79?

just now i attempted to manually force 4.79 to load by renaming the 4.78 exe. it took two restarts on boincmgr to get 4.79 to load but in the process it cleared out my queue and it attempted to download 4.78 again.

--- quoted from message log ----------------------

2005-11-15 11:01:03 [---] request_reschedule_cpus: start failed
2005-11-15 11:01:03 [rosetta@home] Computation for result 1hz7A_abrelaxmode_random_gauss_fix_bb_jitter03_110659_0 finished
2005-11-15 11:01:03 [rosetta@home] Starting result 1n0u__abrelaxmode_random_length20_jitter02_omega_16322_0 using rosetta version 479
2005-11-15 11:01:26 [rosetta@home] Finished download of rosetta_4.78_windows_intelx86.exe
2005-11-15 11:01:26 [rosetta@home] Throughput 209181 bytes/sec
2005-11-15 11:02:01 [rosetta@home] Fetching master file
2005-11-15 11:02:06 [rosetta@home] Master file download succeeded
2005-11-15 11:02:12 [rosetta@home] Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
2005-11-15 11:02:12 [rosetta@home] Reason: To fetch work
2005-11-15 11:02:12 [rosetta@home] Requesting 728251 seconds of new work, and reporting 41 results
2005-11-15 11:02:17 [rosetta@home] Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2005-11-15 11:02:18 [---] request_reschedule_cpus: files downloaded
2005-11-15 11:02:18 [---] request_reschedule_cpus: files downloaded
2005-11-15 11:02:18 [---] request_reschedule_cpus: files downloaded
2005-11-15 11:02:18 [---] request_reschedule_cpus: files downloaded
2005-11-15 11:02:18 [---] request_reschedule_cpus: files downloaded


something does not look right here!
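
to put the work request in the log above into more familiar units, here is a minimal sketch that pulls the numbers out of the "Requesting ... seconds of new work" line and converts them to days. the regular expression is written against the single log line quoted above and is an assumption about its format, not something taken from the boinc source.

```python
import re
from typing import Optional, Tuple

# One line copied from the message log quoted above.
LOG_LINE = (
    "2005-11-15 11:02:12 [rosetta@home] "
    "Requesting 728251 seconds of new work, and reporting 41 results"
)

# Pattern assumed from that one line; other client versions may word it differently.
REQUEST_RE = re.compile(
    r"Requesting (\d+) seconds of new work, and reporting (\d+) results"
)


def parse_work_request(line: str) -> Optional[Tuple[int, int]]:
    """Return (seconds_requested, results_reported), or None if the line does not match."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


parsed = parse_work_request(LOG_LINE)
seconds, reported = parsed
days = seconds / 86400  # 86400 seconds in a day

print(f"{seconds} seconds is about {days:.2f} days of work, {reported} results reported")
```

so the client in that log was asking for a little over 8 days of work in a single request.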
11) Message boards : Number crunching : client upgrade, stalled WU's - what is the cause and the fix??? (Message 3286)
Posted 15 Nov 2005 by PresterJohn
Post:
see my question #1...

the stalled job that i killed on one of my machines this morning had d/l'ed 4.79 yesterday. how can i verify that it is indeed running the new version?
12) Message boards : Number crunching : client upgrade, stalled WU's - what is the cause and the fix??? (Message 3284)
Posted 15 Nov 2005 by PresterJohn
Post:
1) for the windows version of the client, is there a way to tell what version number of the software i am running?

i see that my machine downloaded rosetta_4.79_windows_intelx86.exe but how can i tell if it is actually running 4.79? i see no mention of the 4.79 executable being started in my stdoutdae.txt


2) i've skimmed thru one or two of the related threads about WU's stuck at 1%, etc and correct me if i'm wrong, but it seems that there are a number of different possibilities and no one seems to know what exactly is the cause of the problem.

since this weekend, i've had approx 5 occurrences of stalled WU's. in two of those cases, the client kept happily trying to finish and eventually wasted 43.8 and 14.5 hrs respectively only to return a client error as the final outcome (see links below).

http://boinc.bakerlab.org/rosetta/result.php?resultid=1626055

http://boinc.bakerlab.org/rosetta/result.php?resultid=1368310

the other three occurrences were cases of 'active' stalled jobs (the latest of which i discovered 90 minutes ago), which were aborted by user intervention. all told, probably over 120 hrs of wasted time and money (electricity in nyc isn't cheap you know) doing absolutely nothing useful.

so understandably, i am not in a particularly happy mood about this and would like to know what is being done to diagnose and fix this problem.

i would rather not hear suggestions about running boincview or checking my boxes more frequently. of the two sites where i run r@h, boincview will not work for one of them because the highly secured router/switch environment locks out the boinc service port. find-a-drug users are/were accustomed to a client that ran smoothly with a minimum of user intervention and administration. an occasional bad batch of WU's being pushed out to users i can understand and live with, but unexplained, unreproducible errors which might be occurring on a frequent basis and which could result in nonproductive conditions that may last for days are almost untenable.

we have some large crunchers on our team and the extra overhead to manage and check host machines to make sure they are properly working is entirely unsatisfactory and will probably negatively impact the number of participants interested in running rosetta.

[edit] fixed typo in thread subject.
13) Message boards : Rosetta@home Science : problems downloading WU's (Message 2738)
Posted 9 Nov 2005 by PresterJohn
Post:
I changed it down to 2 minutes instead of 10.


many thanks, David! much appreciated. :)
14) Message boards : Rosetta@home Science : problems downloading WU's (Message 2719)
Posted 9 Nov 2005 by PresterJohn
Post:
No, it is our boinc server configuration file, and yes, I changed the value yesterday to reduce the amount of work units sent to a host that may be getting too many. Are you not getting enough work units to keep crunching?


actually on that new install, when i checked it this morning it was out of work (completed 6 WU's). some of my crunchers do not have 24 hr internet connectivity. (this is because of self-imposed security concerns with the nature of the work that these machines perform)

just my opinion but unless you have a very real issue with downloading abuse (or are bound by server/connectivity issues), i question why min_sendwork_interval has to be increased at all, particularly now when a host of potential members may be evaluating this project for the first time. i can see where a new user might find it disconcerting to see numerous "Not sending work" entries in their message log; it's not an ideal way to put your best foot forward for this project, imo.

if you allow your participants to define their own preferences for connect frequency, then at some point you need to trust in their being responsible enough to manage this for themselves. of course, a newbie cruncher might innocently set this to an excessive value out of ignorance. in which case, user education (in the form of public reminders on the home page) is probably a good start.

in short, empower your users rather than arbitrarily defining limits. it sounds radical but it actually works, sometimes. :)
15) Message boards : Rosetta@home Science : problems downloading WU's (Message 2683)
Posted 9 Nov 2005 by PresterJohn
Post:
David,

can i override this setting by creating my own config.xml file? there is a myriad of settings like db_name, db_host, etc...do i need to define these?

a sample file would really help...

also, was min_sendwork_interval changed in the last 24 hrs on the back-end (server-side)?

ps. [Tank] 2 thumbs up on your animated sig. cute. :-)


ackkk....it's late. need to catch some shut-eye!
16) Message boards : Rosetta@home Science : problems downloading WU's (Message 2666)
Posted 8 Nov 2005 by PresterJohn
Post:
i've been having problems all day today downloading new work units for my crunchers.

the message i'm seeing is:

Not sending work - last RPC too recent xxx sec

(xxx can range from 90 to 900)

i just did a rosetta install onto a new machine and i'm getting this same message. right now it's crunching on the single WU it downloaded as part of the initial install, but i'm not able to download anything more.

what in blazes is going on?! please tell me this is a server problem and that it's not a client issue.

i've already checked my preferences,etc and everything is OK. i did not have any problems d/l'ing WU's or doing new installs yesterday.

17) Message boards : Cafe Rosetta : thinkers from FaD team (Message 2542)
Posted 7 Nov 2005 by PresterJohn
Post:
That rules out 4 of my PC's then (the PC's can't all be online at once)... 4 port router, more than 4 PC's :(


do you have a spare hub or switch lying around? connect it to one of the router ports and you can have more pc's online. (you may need a crossover cable to do this, depends on the type of port you have on the hub/switch.)
18) Message boards : Rosetta@home Science : how do general preferences work/how often does the benchmark run (Message 2344)
Posted 5 Nov 2005 by PresterJohn
Post:
I hope I didn't come across as RTFM as it really wasn't my intention :-)


not at all. :-)

i know you didn't just list those links off the top of your head so i definitely appreciate the effort! and if you did just pull those off the top of your head, then you're a *scary* guy. [chuckle]




19) Message boards : Rosetta@home Science : how do general preferences work/how often does the benchmark run (Message 2341)
Posted 5 Nov 2005 by PresterJohn
Post:
thanks for those links...i'll take a look thru them later. i read a little of the BOINCwiki yesterday and was aware that an averaging system was used, just wanted to understand a bit more of the benchmarking and how it fits in the overall scheme of things.

i'm a tech guy and we always like to understand how things work. :) and having those answers makes it easier to answer the usual newbie questions that our own xpc members will ask about the project.

cheers.
20) Message boards : Rosetta@home Science : how do general preferences work/how often does the benchmark run (Message 2319)
Posted 5 Nov 2005 by PresterJohn
Post:
so having just 50% cpu available for rosetta should not affect credits rewarded.


beg your pardon...but how would having only 50% of cpu allocation to the rosetta client NOT affect the calculation of 'points'?

consider the following example:

PC #1: 3ghz cpu with 100% allocation of cpu time to rosetta

PC #2: 3ghz cpu with 50% allocation of cpu time to rosetta (remaining 50% of cpu time allocated to process X)

if both PC's crunch the same WU and PC #2 takes roughly twice as long as PC #1 to complete the WU, wouldn't PC #1 get awarded more pts than PC #2? PC #1 may not get double the points, but it should be significantly more, right?

------

under the find-a-drug system, pts for completed WU's were awarded by a formula:

cpu_score * completion_time_for_WU == pts

the cpu_score is calculated when crunching commences on the WU and approx every 15 minutes thereafter, another sampling benchmark occurs. if the cpu was busy running other tasks during this sampling period, then the cpu_score will decrease; this accurately reflects the real world state of cpu load/resources that was made available to the dc client. and thus the pts awarded decreases as well, given the formula used (completion time would naturally rise also because less cpu time was allocated to the client).
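
a worked version of the formula, applied to the PC #1 / PC #2 example, might look like the sketch below. the scores and times are hypothetical, and it assumes the idealized case where the sampled cpu_score halves exactly when only half the cpu is available and the completion time doubles exactly.

```python
def fad_points(cpu_score: float, completion_hours: float) -> float:
    """Points for one WU under the formula quoted above: cpu_score * completion time."""
    return cpu_score * completion_hours


# Hypothetical figures: same 3ghz cpu, PC #2 only gets half of it.
pc1_score, pc1_hours = 100.0, 2.0   # 100% of cpu time
pc2_score, pc2_hours = 50.0, 4.0    # 50% of cpu time: half the score, twice the time

pc1_points = fad_points(pc1_score, pc1_hours)
pc2_points = fad_points(pc2_score, pc2_hours)

# Points earned per hour of wall-clock time.
pc1_rate = pc1_points / pc1_hours
pc2_rate = pc2_points / pc2_hours

print(f"PC #1: {pc1_points:.0f} pts per WU, {pc1_rate:.0f} pts/hr")
print(f"PC #2: {pc2_points:.0f} pts per WU, {pc2_rate:.0f} pts/hr")
```

in this idealized case the points per WU come out equal, but PC #1 finishes WU's twice as fast and so earns points at twice the rate. real sampling would rarely scale this cleanly.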

even factoring in the averaging system that rosetta does, wouldn't PC #1 get awarded more points in the example i provided?

which probably leads to a follow-up question: is the benchmark that the client runs sensitive to the cpu load on the processor at the time of the sampling?

thanks.





©2024 University of Washington
https://www.bakerlab.org