Posts by Ingleside

1) Message boards : Number crunching : Turn off Virtualbox task from host details (Message 105603)
Posted 20 Mar 2022 by Ingleside
Post:
If you can't do the Tasks then there's no point having an option to skip/accept tasks.

The point of having the option is to not get spammed with useless "VirtualBox is not installed" notices.
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 105027)
Posted 19 Feb 2022 by Ingleside
Post:
You need a intel chip

No, you need a CPU compatible with Intel x86, but it doesn't matter whether it's a "true" Intel CPU, an AMD, or anything else.
Also, it doesn't matter whether it's a "real" 32-bit CPU or a 64-bit CPU that is also x86-compatible.

As you can see from the applications page, none of the 32-bit Linux/Windows applications mention AMD at all, but on either OS you can still download and crunch work units with your AMD CPU.
3) Message boards : Number crunching : Rosetta lost the lottery (Message 72924)
Posted 29 Apr 2012 by Ingleside
Post:
If you zero your debts and flip the cache settings from their version 6 settings all will be okay again.

"debts" is deprecated and isn't used in v7.

Instead REC = "Recent Estimated Credit" is used, and this should atleast in theory be better than the old debts-method was...
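For anyone curious what REC actually looks like, here's a minimal Python sketch of the idea, an exponentially decaying average of estimated credit per project. The half-life, the update rule, and the way it's turned into a scheduling priority are assumptions for illustration, not the exact v7 implementation:

import math

HALF_LIFE_DAYS = 10.0  # assumed decay half-life, not necessarily the client's constant

def update_rec(old_rec, new_credit, days_since_update, half_life=HALF_LIFE_DAYS):
    # Decay the old REC, then add the credit estimated since the last update.
    decay = math.pow(0.5, days_since_update / half_life)
    return old_rec * decay + new_credit

def scheduling_priority(rec, resource_share, total_resource_share):
    # A project that has recently earned more credit than its resource share
    # warrants gets a lower (more negative) priority.
    return -rec / (resource_share / total_resource_share)

rec = update_rec(0.0, new_credit=120.0, days_since_update=1.0)  # crunched ~120 credits today
rec = update_rec(rec, new_credit=0.0, days_since_update=5.0)    # idle for 5 days, REC decays
print(rec, scheduling_priority(rec, resource_share=100, total_resource_share=300))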

One effect of the changed scheduling is that fewer BOINC projects will have work on the client at any given time.
4) Message boards : Number crunching : About ready to REMOVE Rosetta@home (Message 71600)
Posted 12 Nov 2011 by Ingleside
Post:
R@h has never claimed to run in 96MB.
Settings in the WU parameter files will not affect the actual amount of memory utilized as a task runs, unless the task exceeds an upper limit.

What do you feel the problem with the parameter file is? How are you thinking it is impacting your experience running R@h?

Hmm, let's select a random Rosetta task...
<workunit>
<name>Aug20_needle_11start_flipped_select2_test_SAVE_ALL_OUT__35323_4963</name>
<app_name>minirosetta</app_name>
<version_num>317</version_num>
<rsc_fpops_est>40000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>500000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>100000000.000000</rsc_memory_bound>
<rsc_disk_bound>300000000.000000</rsc_disk_bound>
<command_line>
@needle.run.flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip Aug20_11start_flipped_select2_needle.zip -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1885853
</command_line>
<file_ref>
<file_name>Aug20_11start_flipped_select2_needle.zip</file_name>
<open_name>Aug20_11start_flipped_select2_needle.zip</open_name>
</file_ref>
<file_ref>
<file_name>needle.run.flags</file_name>
<open_name>needle.run.flags</open_name>
</file_ref>
</workunit>


Let's see: "<rsc_memory_bound>100000000.000000</rsc_memory_bound>" = 97656.25 KB = 95.36743 MB. So yes, Rosetta@home definitely claims users' computers only need 96 MB of memory to run this task...

The BOINC scheduling server uses this info, together with the user's memory settings (the higher of the two "use at most % memory" settings, even if the user has switched them around), to decide whether a given client can run such a task at all. If, for example, the user has selected a max of 90 MB, this task won't be sent to the client. If on the other hand the user has selected a max of 100 MB, the task will be downloaded, and then error out almost immediately because it exceeds the user's memory settings. It doesn't matter if the computer has 1 GB or more of memory; as long as the task exceeds the user-set memory settings, it will error out.

The behaviour of <rsc_memory_bound> was changed with the introduction of the user-set memory preferences, so even if all Rosetta tasks use 400 MB of memory or so, they won't error out as long as they stay below the user's preference limit. Only very old clients, v5.2.xx and earlier, should error out solely due to <rsc_memory_bound>. But since <rsc_memory_bound> is used by the scheduling server, it's still a bad idea to set the limit much too low, as Rosetta@home is doing. It's a big enough problem that the scheduling server doesn't take the number of cores into account when giving out work, without making it even worse by setting the limit too low.
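To make the checks above concrete, here's a rough Python sketch; it's my own simplification of the behaviour described, not actual BOINC code, and the function names are made up:

MB = 1024 * 1024
RSC_MEMORY_BOUND = 100_000_000   # bytes, from the workunit above (~95.37 MB)

def server_will_send(rsc_memory_bound, user_limit_mb):
    # Scheduling server: only send the task if the declared bound fits within
    # the higher of the two "use at most % memory" settings.
    return rsc_memory_bound <= user_limit_mb * MB

def old_client_errors_out(actual_usage_bytes, rsc_memory_bound):
    # Very old clients (v5.2.xx and earlier): error out once actual usage
    # exceeds <rsc_memory_bound> itself.
    return actual_usage_bytes > rsc_memory_bound

def newer_client_errors_out(actual_usage_bytes, user_limit_mb):
    # Newer clients: only the user-set memory preference matters.
    return actual_usage_bytes > user_limit_mb * MB

print(server_will_send(RSC_MEMORY_BOUND, 90))      # False: task is never sent
print(newer_client_errors_out(803 * MB, 100))      # True: 803 MB exceeds a 100 MB limit
print(newer_client_errors_out(803 * MB, 1024))     # False: fine with a 1 GB limit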

Hmm, this task has now hit 803 MB of memory usage, which is 8.4 times more than the limit indicates... Thankfully this computer has more than 1 GB of memory, otherwise so much memory gobbled up by a single task would definitely cause problems for other uses of the computer.

5) Message boards : Number crunching : About ready to REMOVE Rosetta@home (Message 71597)
Posted 12 Nov 2011 by Ingleside
Post:
Difference between current and prior behavior do not necessarily indicate anything is broken, nor in need of a "fix".

I'm sure they will improve it if they can.

Well, Rosetta@home apparently still claims a task only needs 96 MB of memory, but under 1 minute from start it was already using 400 MB, and that has now increased to 460 MB. Rosetta@home using the wrong WU parameters isn't anything new; it still uses the same broken WU parameter file it used roughly 4 years ago, see the old thread about System requirements????.

This problem is easily fixed by Rosetta@home, simply by updating their WU parameter file.
6) Message boards : Number crunching : Please disable upload certificates? (Message 71392)
Posted 8 Oct 2011 by Ingleside
Post:
A quick test reveals Rosetta@home still hasn't fixed its broken upload server:

08.10.2011 14:33:09 | rosetta@home | [fxd] starting upload, upload_offset -1
08.10.2011 14:33:09 | rosetta@home | Started upload of place_CwEfDw_20111005_EBOV_GP3_1jl1_ProteinInterfaceDesign_05Oct2011_33944_360_0_0
08.10.2011 14:33:09 | rosetta@home | [file_xfer] URL: http://srv6.bakerlab.org/rosetta_cgi/file_upload_handler
08.10.2011 14:33:18 | rosetta@home | [file_xfer] http op done; retval 0 (Success)
08.10.2011 14:33:18 | rosetta@home | [file_xfer] parsing upload response: <data_server_reply>    <status>0</status>    <file_size>0</file_size></data_server_reply>
08.10.2011 14:33:18 | rosetta@home | [file_xfer] parsing status: 0
08.10.2011 14:33:18 | rosetta@home | [fxd] starting upload, upload_offset 0
08.10.2011 14:33:27 | rosetta@home | [file_xfer] http op done; retval 0 (Success)
08.10.2011 14:33:27 | rosetta@home | [error] Error reported by file upload server: invalid signature
08.10.2011 14:33:27 | rosetta@home | [file_xfer] parsing upload response: <data_server_reply>    <status>-1</status>    <message>invalid signature</message></data_server_reply>
08.10.2011 14:33:27 | rosetta@home | [file_xfer] parsing status: -128
08.10.2011 14:33:27 | rosetta@home | [file_xfer] file transfer status -128 (permanent upload error)
08.10.2011 14:33:27 | rosetta@home | Giving up on upload of place_CwEfDw_20111005_EBOV_GP3_1jl1_ProteinInterfaceDesign_05Oct2011_33944_360_0_0: permanent upload error


7) Message boards : Number crunching : Servers? (Message 68298)
Posted 31 Oct 2010 by Ingleside
Post:
My machine received no new work for at least 24 hours, even though you say that the site was down for only ten hours. Do you know the algorithm for reconnecting after an outage?

It doesn't seem anyone has answered this yet...

If the user doesn't manually hit "Update", the v6.10.xx clients use the following rules:

1: If the client can't make a connection to the scheduling server, it does a random backoff of between 1 minute and 4 hours.
If it makes 10 failed connections in a row, it tries to download the project's home page. If that download also fails, the client takes a 24-hour backoff.

2: If it makes a connection but gets a message like "Project is shut down" or "Can't open database" or similar, the server puts the client on a 1-hour backoff.

3: If the project is up but the client, for one of many possible reasons, doesn't get any work, the client does a random backoff. You won't see this backoff unless you select the project on the Projects tab and hit "Properties". The random backoff starts with a 1-minute upper limit for the 1st failed work request, and for each successive failed work request the upper limit is doubled, meaning 1 minute, 2 minutes, 4 minutes, 8 minutes, ..., up to a maximum upper limit of 24 hours (see the sketch after this list).

4: Rosetta@home AFAIK runs server code old enough that, if you hit your daily quota (due to many errors), you'll be deferred until server-side midnight plus up to a 1-hour random backoff.

5: Depending on how long it's been since Rosetta@home's server code was last upgraded, it's also possible you'll get a 24-hour deferral if you hit limits like "not enough memory" or "not enough free disk space" and so on.

The client also has a couple of additional reasons for not asking for work, even when work is needed:
6: If one (or more) downloads is currently backing off, all work requests to the project are blocked. (The project-wide deferral on downloads doesn't count, only an individual download with a backoff.)
7: If the number of tasks with one or more files to upload (or currently uploading) is > 2 × the number of CPUs, all work requests to the project are blocked.

For #7, since your computers are dual-core, each counts as 2 CPUs in BOINC, meaning if you've got 5 or more tasks that want to upload file(s), work requests are blocked.
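A minimal Python sketch of rule #3's doubling random backoff; the 1-minute starting limit and 24-hour cap are from the description above, while the exact randomization is an assumption:

import random

MIN_BACKOFF = 60          # 1-minute upper limit after the 1st failed work request
MAX_BACKOFF = 24 * 3600   # the upper limit is capped at 24 hours

def work_request_backoff(consecutive_failures):
    # Upper limit doubles with each failed work request: 1, 2, 4, 8 ... minutes.
    upper = min(MIN_BACKOFF * 2 ** (consecutive_failures - 1), MAX_BACKOFF)
    return random.uniform(0, upper)

print(work_request_backoff(5) / 60)   # after the 5th failure the upper limit is 16 minutes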

As for which of these rules you've been affected by, I don't know...




BTW, since a bank analogy apparently is popular in this thread, you can look at BOINC as a "bank" that only handles 3 types of transactions: customers putting money into their account, taking money out of their account, and asking how much money they've got in their account.

For each customer, something like this will happen:
a: Customer gives their account number.
b: Bank employee looks up the account number.
c: Customer shows their identification.
d: Bank employee verifies the identification is OK (like the name being correct for the account number).
e: Customer specifies he wants to take out some money.
f: Bank employee checks how much money is available in the account.
g: If there's enough money in the account, the bank employee grabs the stack of 10-dollar bills and starts counting out 10-dollar bills until:
g1: There are no more 10-dollar bills.
g2: The account doesn't contain enough money for another 10-dollar bill.
g3: The customer's specified amount is reached.
h: The customer needs to write his signature before he gets the money the bank employee has counted out for him.
i: The bank employee records the number of 10-dollar bills given out to the customer, and the account info is updated with the new amount of money in it.

If instead the customer wants to deposit money, steps a-d are the same, while step e changes:
e: Customer specifies he wants to deposit some money.
f: Customer hands the bank employee some money.
g: Bank employee counts up how much money it is.
i: The bank employee records how much money he's got, and updates the account info with the new amount of money in it.

Or instead the customer only wants to know how much money is in his account. Steps a-d are still the same, while step e changes:
e: Customer specifies he wants to know how much money is currently in his account.
f: Bank employee checks how much money is available in the account.
g: Bank employee tells the customer how much money is in the account.


As you can see from this, even just looking up the account info is 7 steps, while handing out some money (work) is only 2 additional steps, meaning 9 steps total.

Also, when the bank employee in step g is sitting with a stack of 10-dollar bills, counting out, say, 12 bills instead of only 2 doesn't take much extra time, so for both the customer and the bank it's better that he gets 12 bills at once than that he gets only 2 bills and must stand in line 5 more times to get the additional 10...

For the bank, 12 bills at once is 9 steps, while 6 x 2 bills is 54 steps...


Well, for BOINC it's not exactly the same, since each task has its own update. But for each scheduler request you'll still need to look up user_id, host_id and preferences, update the computer info, and possibly update preferences. So, assuming on average you don't need to update preferences, each scheduler request is 4 steps + 1 per task.
Meaning, 12 tasks as a single scheduler request means 16 database hits.
12 tasks as 6 scheduler requests means 6 * 4 + 12 = 36 database hits.

And this is only counting the work requests; on top of this you get the hits when you report finished tasks...
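As a tiny Python sketch of the same back-of-the-envelope arithmetic (the "4 steps + 1 per task" figure is the simplification from above):

def database_hits(tasks_per_request, requests):
    # Rough count of database hits for fetching work, ignoring reporting.
    return requests * (4 + tasks_per_request)

print(database_hits(12, 1))   # 16 hits: 12 tasks in a single request
print(database_hits(2, 6))    # 36 hits: 12 tasks spread over 6 requests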
8) Message boards : Number crunching : Server error: can't attach shared memory (Message 57494)
Posted 2 Dec 2008 by Ingleside
Post:
Ok, I played around with some of my computers that hadn't fixed themselves yet, and I discovered that after 11 "project is down" messages, the next try will download the master file. It doesn't matter if the messages are from normal retries or from pressing update.

If you press update, you only need to wait until the "project is down" message appears in the messages tab. The usual 4 minute wait between requests doesn't apply here.

Can someone confirm if this works with other versions of BOINC? (I'm using 5.2.13)

With such an old client, it could be an idea to upgrade...

As for getting the new scheduler URL, the BOINC client since v3 (if not even earlier) will re-read the master file after 10 failed connections. "Failed connection" doesn't only count instances where the server is down/unreachable, but also instances where the client asks for work and the scheduling server doesn't give it any, 10 times in a row. So most clients should get the new URL within a few hours.

So most users should have gotten the new URL by now. If they haven't, possible reasons are:
1; When the client tried to download the master URL, the web server was unreachable (the front page, not the scheduling server). If it can't download the master URL, the BOINC client will wait 24 hours before retrying... For old clients (don't remember the exact version), the delay for a failed master-URL download is 1 week...
2; Not sure if it's a problem for anyone, but as far as the web goes, it is possible a client is getting served up an old cached copy of the front page...
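A small Python sketch of that master-file rule; the counters and delays are from the description above, while the structure is my own, not BOINC source:

def next_action(consecutive_scheduler_failures, master_fetch_failed):
    # "Failed connection" counts both an unreachable server and a reply
    # that gives no work; at 10 in a row the master URL is re-read.
    if master_fetch_failed:
        return "wait 24 hours before retrying the master URL"
    if consecutive_scheduler_failures >= 10:
        return "re-read the master URL to pick up the new scheduler address"
    return "normal scheduler backoff"

print(next_action(10, False))
print(next_action(3, False))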

9) Message boards : Number crunching : Can a 'malformed' Workunit crash BOINC? (Message 54144)
Posted 2 Jul 2008 by Ingleside
Post:
That's a good point. I don't like doing mass emails but it may be worth doing one in this case.

Many emails will bounce, get lost in spam folders and so on, and not to forget, some users have configured their account not to get email from the project. Also, even if a user does read the email, he doesn't always have ready access to all his computers so he can abort the corrupt WUs.

But BOINC does have the option to abort WUs on the client in case WUs are cancelled server-side. For this to work, you'll need to:
1; Enable <send_result_abort> on the server.
Note, this will increase database load.
2; Have users run BOINC client v5.8.17 or later for auto-aborting to work.
The unconditional task abort will possibly work with v5.5.1 and later, but to be on the safe side, use v5.8.17 or later.

There's one additional problem: for clients to get the abort message, they need to contact the scheduling server, something they don't necessarily need to do if they're stuck on a particular corrupt WU. To ensure they contact the scheduling server:
3; Use, for example, <next_rpc_delay>86400</next_rpc_delay> server-side.
(Users need to run v5.5.1 or later.)

This means that, except for computers that are connected manually, clients will connect at least once per day. If there's another batch of corrupt WUs, the majority of computers will then cancel them within 24 hours after they're cancelled server-side.

Now, #3 won't help in the current situation, where the corrupt WUs were released 14 days ago, but it will help the next time Rosetta@home releases a batch of "bad" WUs, or another bad application...
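As a rough Python sketch of how the two server-side options play out on the client (this is my own simplification of the behaviour described above, not BOINC source):

import time

NEXT_RPC_DELAY = 86400   # seconds, matching <next_rpc_delay>86400</next_rpc_delay>

def must_contact_scheduler(last_rpc_time, now=None):
    # With next_rpc_delay set, the client contacts the scheduler at least
    # once per day even if it has no other reason to.
    now = time.time() if now is None else now
    return now - last_rpc_time >= NEXT_RPC_DELAY

def apply_abort_list(client_tasks, server_cancelled):
    # With <send_result_abort> enabled, the scheduler reply lists results
    # cancelled server-side; the client drops any it still holds.
    return [t for t in client_tasks if t not in server_cancelled]

print(apply_abort_list(["good_wu_1", "corrupt_wu_7"], {"corrupt_wu_7"}))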
10) Message boards : Number crunching : Minirosetta v1.28 bug thread (Message 54041)
Posted 28 Jun 2008 by Ingleside
Post:
The errors may indicate a corrupted database file. Can you try resetting the project?

If files becoming corrupt after download is a problem, it would be an idea for Rosetta@home to enable <verify_files_on_app_start>
11) Message boards : Number crunching : Detaching and reataching (Message 53591)
Posted 8 Jun 2008 by Ingleside
Post:
Thanks for your reply. I am definitely aware of the CPID problem: had it once, been there, done that.

Would it be better to put the resource share at zero for projects that are offline for a few months? If I set NNT as you suggest, the unused resources will be distributed to the other projects evenly, without me saying what shares go where. Isn't that the way it goes with unused resources?

AFAIK any project set to "No new work", with no work left on the computer, is treated as if not attached as far as the calculation of which project to ask for more work goes, and this is even how BOINC Manager treats it when displaying resource share...

As for setting the resource share to zero, a quick test with BOINC alpha reveals it's impossible, so while it's likely possible on projects that aren't running up-to-date server code, I wouldn't recommend trying it since it may cause client problems, especially if the client tries to divide by a resource share of zero...
12) Message boards : Number crunching : Detaching and reataching (Message 53586)
Posted 7 Jun 2008 by Ingleside
Post:
I've been thinking about detaching from a couple of projects in my list. The question is what happens if I reattach to those projects in the future, say in a year or so? When I reattach, will they still recognize the old hosts, or will they create new accounts for a new host?

It's unlikely a client installation will keep the same <host_cpid> for a whole year, especially if you've attached to/detached from various projects during this time... So it's very likely you'll be assigned a new <hostid> when you re-attach after such a long time. But you can most likely merge the host IDs, so credit-wise it'll look the same. So, apart from losing a lower host ID and an older "created" time, it doesn't really matter.

Even if you've deleted a host, the user account will still keep all its credits, so whatever detaches/re-attaches you do don't really matter.

The only place it can matter is in your cross-project stats on the various BOINC stats sites. If none of your active computers is attached to some of the projects, it's possible the <cross_project_id> gets changed, leading to credits "fragmented" across what looks like multiple accounts to the stats sites. If this is a concern for you, the easy method is to keep 1 active computer attached to all projects and just set "No new work" on whatever project(s) you don't want to run actively. On any other computers you can detach whatever projects you want.
13) Message boards : Number crunching : Errors galore!! Multiple machines (Message 53566)
Posted 6 Jun 2008 by Ingleside
Post:
take a look at this error msg i found in one of his tasks:
http://boinc.bakerlab.org/rosetta/result.php?resultid=168818917

8741.23
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 10800
# random seed: 3343687
ABORT: bad to aa_rotno_to_packedrotno
aa,rot1/2/3/4: ILE 8 0 2 0 0
chi no 1 nchi 2 aav 1 is_chi_proton_rotamer(aa,aav,i) 0
ERROR:: Exit from: rotamer_functions.cc line: 1465

He has one or two others like this as well.
Later he gets a validation error after successfully completing the task.

Of course being that some of these are CASP8 that could be a cause.
They are running on rosetta 5.96

The quad machine had 5 errors in 24 hours of which 4 were program errors and 1 was a validate error. One of the dual cores has validate errors which is a RAH issue not his computer.

Another random sample of work shows a mini that crashed on 2 systems immediately.

I would call it a string of bad luck, not a hardware issue.

The "Validate errors" is a Rosetta-problem, and the wu's crashing after a couple seconds on 2 different computers is obviously buggy.

The problem, in my opinion (apart from a crappy keyboard - not my computer), is all the WUs his computer is erroring out on while someone else manages to finish them correctly...
For example, 154098541, which gives "ERROR:: Exit from: fullatom_energy.cc line: 2030"
153933379 same error
1538933379 with "ERROR:: Exit from: refold.cc line: 338"
153835521 with "ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763"
153204401 with a long string of "sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range"

And the list goes on, with both the Mini and Beta applications.

Now, only a couple are paired with another Linux host, so 2 buggy Linux applications is a possibility. For the few WUs where he is paired with another Linux host, a shorter run time or slower speed could mean his crash point is simply further into the WU, so it's not a good indication either way.

Still, his 4-core having a much higher error rate than 3 of the 8-core Linux computers in the top 60 looks suspicious to me, so taking a slightly closer look at his computer shouldn't be a big problem.

After all, at least a couple of the checks, like "overclocked or not" or "oops, the CPU is running at 100 Celsius", are easily checked (and answered)...

Now, running Gromacs, Prime95 and memory tests, on the other hand, is much more time-consuming...


BTW, one method to test whether it's a bad Rosetta application or not: download a ton of work, disable the network, exit BOINC, back up BOINC, and restart BOINC.
If one or more of the WUs gives an error, re-run the same WU from the backup.
If the backup copy crashes at the same spot (for example, the 1st copy crashed after 2h and the backup after 2h1m), it's most likely a bad WU or application.

If on the other hand the backup copy finishes without crashing, or one copy crashed after 1 hour while the other crashed after 2.5 hours, it looks more like a hardware problem than a WU/application problem...

If there aren't any errors, this method only loses the minute or so it takes to make a backup copy. And even if there are errors, re-running a couple of WUs (optimally, check 4 errors at once) will only take a couple of hours, not the 24h+ that using another test program will.

BTW, in case stopping/restarting from a checkpoint has any influence, let the WUs run from start to finish...
14) Message boards : Number crunching : Errors galore!! Multiple machines (Message 53542)
Posted 5 Jun 2008 by Ingleside
Post:
It seems things are suddenly unstable. People are suggesting my machines are showing bad memory but I don't really buy this.

Are others seeing issues with Rosetta?

Well, you didn't say whether you've tried any of the suggestions made in your last thread...

It doesn't need to be bad memory; it can be a bad CPU, or something else...

The AMD is possibly a problem with the OS or the OS drivers, or possibly access rights.

The quad... a very quick look shows a couple of WUs crashing within 1 minute; these are likely bad WUs. But there are also around 25 other crashes...

A very quick look through the top-computers list, checking 3 Linux systems from the top 60, showed some 1-minute crashes, but among the longer-running tasks there were only 4 crashes across the 3 computers...
I've no idea about the "Validation" errors, and I've not counted them; possibly this is a Rosetta server-side problem...

So maybe you're just unlucky, but with 20x the error rate of other Linux computers, I'd still guess it's a computer problem...

BTW, it doesn't need to be anything hardware-related; it could be that the Linux distribution you're using, or the libraries installed, is the reason for the errors, while the other Linux computers use other distributions/libraries and don't get the errors...
15) Message boards : Number crunching : Errors: NANs occured in hbonding! (Message 53481)
Posted 31 May 2008 by Ingleside
Post:
I'm getting a strange error on one of my machines.
You can take a look at the details of my Quad Core machine for the nitty gritty.

What does this mean? I can't find any reference to this error using a search on this site.

Any help would be appreciated.

Thanks.

~Doug

NaN stands for "Not a Number", and happens if you try dividing zero by zero or taking the square root of a negative number and so on.
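A quick Python illustration of how a NaN behaves once it appears; Rosetta itself is C++, where 0.0/0.0 and sqrt(-1.0) silently produce NaN instead of raising an exception the way Python does:

import math

nan = float("inf") - float("inf")   # one way to produce a NaN in pure Python
print(nan)                          # nan
print(math.isnan(nan))              # True
print(nan == nan)                   # False: a NaN isn't even equal to itself,
                                    # which is how code can detect that one occurred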

A quick look shows that on most WUs you're getting this error, while other users don't seem to have a problem finishing them, so it doesn't look like a problem with the WU parameters. There are some "bad" WUs, but these terminate after a couple of seconds, not 1h+ like most of your errors.
Also, it happens with both the "Mini" and "beta" applications, so another mini bug seems less likely. Running Linux, a Linux bug is possible, but neither of your 2 other Linux computers seems to have a problem, and at least one other Linux computer has finished a WU your quad errored out on.

So this can indicate a hardware problem with the quad. Check for dust bunnies, and check your system temps to see if anything is overheating. If you're overclocking, decrease the overclock, since your computer is generating garbage.

To check for CPU errors, run the Gromacs StressCPU test or the Prime95 torture test, and run a memory test to check for memory errors. Any errors reported by either of these mean an unstable computer generating wrong results. You'll most likely not get an error immediately, so it's recommended to run each test for at least 24 hours.
16) Message boards : Number crunching : Computer array crunching? (Message 53398)
Posted 27 May 2008 by Ingleside
Post:
I have always wondered: why use Prime95 instead of BOINC? Does it give you lots of info about stability that a failed WU or a system crash won't?

I've been itching to OC as well, but each time I try I get blue-screen or system freezes which I can't figure out how to resolve so I go back to stock.

Using BOINC as a test is a possibility, but you must either re-run the exact same WUs and compare the results afterwards, or run a project that uses a quorum.

Just running Rosetta@home won't be a good indication, since even if it hasn't crashed, you can still have generated an invalid result. Also, even though Rosetta@home validates all results, if you've got an "impossible" result that isn't used for scientific purposes, you'll AFAIK still get your credit...

Prime95 is an "easy" test, since if there is any calculation-errors, it will immediately be detected.

But Prime95 and the various DC projects do not use the CPU and memory in exactly the same way, so even if you've done a "stable" Prime95 run, that doesn't mean running a DC project won't use the "wrong" part of the system and start generating garbage...

Overclocking and then running 24-hour tests of Prime95, Memtest, Gromacs and so on error-free means "no problems detected yet". It does not mean "the computer won't make a wrong calculation due to overclocking once a month"...
17) Message boards : Number crunching : Claimed credit vs grant credit (Message 53397)
Posted 27 May 2008 by Ingleside
Post:
I appreciate all the attempts to explain this, but I still don't get it. So, again, what you are saying is if I have a fast machine and someone else with a slow machine finishes a work unit first and gets, say 80 credits, if I go and finish that same workunit later in half the time, I will probably not get the 80 credits, but maybe 40 or 50. If I am correct, this doesn't seem fair. I'm not mad. I'm just trying to understand the beast.

Maybe my problem is I don't really understand what a decoy is.

To make the calculation "simple":

1; "slow" computer uses 4 hours and manages to crunch 10 decoys in this time, and gets 80 credits. This is 8 credits/decoy, and 20 credits/hour.
2; "fast" computer uses 4 hours and manages to crunch 18 decoys in this time. For this, the "fast" computer gets:

8 credits/decoy * 18 decoys = 144 credits.

This also means, 36 credits/hour.


So, a little simplified, everyone gets 8 credits/decoy for this WU type, but a fast computer can generate more decoys in the same amount of time than a slow computer can, and therefore gets more credit/hour.


In practice, credit/decoy will vary somewhat as more and more results are returned, but this variation should be small.
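The same arithmetic as a tiny Python sketch (8 credits/decoy is just the illustrative figure from above):

def granted_credit(decoys, credit_per_decoy=8.0):
    # Everyone gets the same credit per decoy for a given WU type.
    return decoys * credit_per_decoy

def credit_per_hour(decoys, hours, credit_per_decoy=8.0):
    return granted_credit(decoys, credit_per_decoy) / hours

print(granted_credit(10), credit_per_hour(10, 4))   # slow computer: 80.0 credits, 20.0/hour
print(granted_credit(18), credit_per_hour(18, 4))   # fast computer: 144.0 credits, 36.0/hour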
18) Message boards : Number crunching : Looking for some help please- decreasing credit per unit (Message 53143)
Posted 18 May 2008 by Ingleside
Post:
So is this a Rosetta issue, or a BOINC issue? On this computer I can see that the host average has dropped steadily for the past month (by around 20%).

Also, a dual Xeon system I brought online is only getting about 60% of the claimed credit. With some granted credit being in the 20% range at times.

So, could it be anything to do with:

1) An OS issue? The system with Server 2003 looks to be getting the lowest granted credit. The system with Vista gets the highest granted credit.

Just my guess, but as long as you're using win32, any OS difference between NT4/win2k/XP/2003/Vista/2008 would be a couple of percent at most, except if you've got too little memory to run Vista, that is...

2) Communication problem (with BOINC / Rosetta code) On every system, whenever I look, there are task at 100% but BOINC doesn't communicate (I have tried all kinds of settings). It is almost like the next task has to finish and “push” the old one out of the way, (not all the time, say 85% of the time. I have DSL and always connected).

To avoid putting unnecessary extra load on the scheduling server and database, the BOINC client will wait to report any "ready to report" results until the next time it needs to connect. The next connection to the scheduling server happens:

1; The next time it needs to ask for more work. On average, I'd guess this happens roughly half the target run time after a result finishes.
2; If it's been 24 hours since a result was uploaded.
3; If there's less than 24 hours to the deadline.
4; If there's less than the "connect about every N days" setting left to the deadline.
5; When manually triggered by the user.
6; To send a trickle-up (CPDN).
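A minimal Python sketch of those conditions; the parameter names are mine, and the real client of course tracks this state per project and per result:

import time

DAY = 24 * 3600

def should_contact_scheduler(needs_work, result_uploaded_at, deadline,
                             connect_every_n_days, user_requested, now=None):
    now = time.time() if now is None else now
    return (needs_work                                       # 1: asking for more work anyway
            or now - result_uploaded_at >= DAY               # 2: result uploaded 24h ago
            or deadline - now < DAY                          # 3: less than 24h to deadline
            or deadline - now < connect_every_n_days * DAY   # 4: within "connect about every N days"
            or user_requested)                               # 5: manual update; 6 (trickle-up) omitted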

19) Message boards : Number crunching : minirosetta v1.19 bug thread (Message 53028)
Posted 13 May 2008 by Ingleside
Post:
Yes, that usually is the cause of it. I don't know if there's an official bug report on it. I do know it's a question that shows up in the BOINC forums every few months where the explanation is given and they claim it would be too much effort to fix.

To add a little more: there are at least 2 open Trac tickets about this, #113 and #336.

20) Message boards : Number crunching : No Work Units?? (Message 53022)
Posted 12 May 2008 by Ingleside
Post:
I'm getting this now, had the same last week but it cleared up this hasn't.

5/12/2008 11:19:38 AM|rosetta@home|Requesting 7618 seconds of new work

5/12/2008 11:19:44 AM|rosetta@home|Scheduler RPC succeeded [server version 601]

5/12/2008 11:19:44 AM|rosetta@home|Message from server: Not sending work - last request too recent: 68 sec

5/12/2008 11:19:44 AM|rosetta@home|Deferring communication for 1 min 0 sec

5/12/2008 11:19:44 AM|rosetta@home|Reason: no work from project

Server shows plenty of work!

pete.


Ah, the "client doesn't wait long enough before re-ask"-bug, a quick look reveals this was fixed in v5.10.23, while one of your clients is still running v5.10.20...

So, upgrading your BOINC client will fix this problem.


