Posts by Darrell

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 88741)
Posted 23 Apr 2018 by Darrell
Post:
I just lost a few more Rosetta 4.07 tasks after they clogged my 8GB RAM computer and then each wanted even more RAM. When are the memory estimates going to get better, so all that wasted crunching can be avoided?

I have set ALL my 8GB computers to run only a single 4.07 task at a time; I would rather waste idle crunch time than crunch time already spent (and the electricity).
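
One way to enforce a single-task limit like this is an app_config.xml in the Rosetta project directory, sketched below (I am assuming the 4.07 application's short name is "rosetta"; check client_state.xml for the actual name):

<app_config>
  <app>
    <!-- assumed short name of the Rosetta 4.07 application -->
    <name>rosetta</name>
    <!-- allow only one of these tasks to run at a time on this host -->
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>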
2) Message boards : Number crunching : Rosetta 4.0+ (Message 88563)
Posted 27 Mar 2018 by Darrell
Post:
@ Jim1348

That is a bit high. The maximum I see for the last two weeks is 1179 MB, and usually less than 700 MB. However, I have 32 GB, so they might as well use it. My other projects (on LHC and GPUGrid Quantum Chemistry) often use more.


And on my 32GB computers, I don't mind. I wasn't expecting the 4.07 version to take so much, though. I would like to restrict those tasks to the "big boys", but there doesn't seem to be a way to select or deselect them per machine. Perhaps I will just limit the tasks to a single CPU on the computers that have only 8GB.

LHC often takes more, but they run in a VM on my 32GB machines and so I can manage the load.
3) Message boards : Number crunching : Rosetta 4.0+ (Message 88533)
Posted 26 Mar 2018 by Darrell
Post:
I just found Rosetta 4.07 used 2,111,242,240 bytes (1.97 GIGAbytes) before my system crashed (i7-4770K, 8GB). This seems to be just a bit more than expected, so please take a look and fix the problem.

I run SETI, EINSTEIN, and LHC in addition to Rosetta, so Rosetta can't have the whole machine!
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81337)
Posted 16 Mar 2017 by Darrell
Post:
See original post here

It appears that Rosetta Mini 3.73 is set to tell the BOINC Manager to use the priority intended for "... coprocessor (GPU) applications, wrapper applications, and non-compute-intensive applications ...". The assigned priority "follows" (changes to the new level) when I change the <process_priority_special>N</process_priority_special> parameter, but only on the one computer where I am running LHC applications under VirtualBox. It does not change on my other computers.

Does anyone have a clue as to why, and more importantly, how to give those two applications different priorities? Is this a bug in the Rosetta Mini 3.73 server-side setup, in the application itself, or in BOINC Manager 7.6.33?

I want the VBox wrapper to run at a higher priority than compute-intensive applications, not at the same level.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81329)
Posted 14 Mar 2017 by Darrell
Post:
It appears that Rosetta Mini 3.73 is set to tell the BOINC Manager that it is a non-compute-intensive application, but we know it IS compute-intensive.


I need to retract this claim somewhat, as I have only observed it on one of my computers. Do not investigate further until/unless I can give more details as to why only one was affected.

Thanks.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81325)
Posted 13 Mar 2017 by Darrell
Post:
It appears that Rosetta Mini 3.73 is set to tell the BOINC Manager that it is a non-compute-intensive application, but we know it IS compute-intensive.

I use the BOINC Client parameters <process_priority>1</process_priority> and <process_priority_special>3</process_priority_special> at various times. They work as follows (from the BOINC Wiki):

<process_priority>N</process_priority>, <process_priority_special>N</process_priority_special>
The OS process priority at which tasks are run. Values are 0 (lowest priority, the default), 1 (below normal), 2 (normal), 3 (above normal), 4 (high) and 5 (real-time - not recommended). 'special' process priority is used for coprocessor (GPU) applications, wrapper applications, and non-compute-intensive applications, 'process priority' for all others. The two options can be used independently.

Using my values, Rosetta is running as "Above Normal", i.e., BOINC Manager treats it as a non-compute-intensive application. This should be corrected so that the BOINC Manager can assign the correct (user defined) priority to the tasks.
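
For reference, here is a minimal sketch of how those two options look in cc_config.xml with the values I quoted above (edit the file in the BOINC data directory, then restart the client or have it re-read its config files):

<cc_config>
  <options>
    <!-- OS priority for ordinary compute-intensive tasks: 1 = below normal -->
    <process_priority>1</process_priority>
    <!-- OS priority for GPU, wrapper, and non-compute-intensive tasks: 3 = above normal -->
    <process_priority_special>3</process_priority_special>
  </options>
</cc_config>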
7) Message boards : Number crunching : Unintended consequences of the new credit system? (Message 81316)
Posted 12 Mar 2017 by Darrell
Post:
.
[snip]
.
Thanks for the reference and I may pursue it... Not sure because sometimes I also wonder if some of this is my own fault.
...
In addition, I just read a comment from Mod.Sense [who seems to be one of those people] to the effect that Rosetta@home isn't much worried about the bandwidth these days.


I also rather doubt that much will be done to address your bandwidth loss issue. Therefore ...

You have two goals: 1) to eliminate or minimize bandwidth loss, and 2) to maximize computer utilization. These two goals are not 100% compatible.

ASSUMPTION: you are only willing to process WUs from Rosetta

To minimize bandwidth loss, set preferences to have Rosetta send WUs of only 1 hour duration. Set the WU storage parameters to 0+0, and use only a single thread on a computer that runs much of the time. The maximum possible bandwidth loss then approaches the size of a single WU.

This configuration, however, results in underutilization if the computer has more than one core or thread. Using all of the cores/threads in a multicore or hyperthreading CPU (or the AMD equivalent) increases the potential bandwidth loss nearly linearly with each additional WU processed in parallel. It also risks leaving threads idle if new WUs cannot be downloaded immediately after one completes (Rosetta has no work to send, the internet is unavailable, maintenance or breakage, ...).

Addressing the second goal, keeping the computer 100% busy, then requires accepting an increased risk of bandwidth loss. Increasing the work store so that one or more WUs are always waiting on the computer lets it ride out periods when downloads are unavailable without going idle. The larger the store, the longer such an outage can last while the computer stays busy, but also the larger the risk of bandwidth loss. That risk grows very non-linearly as the size of the store (in days) approaches the deadline length. Likewise, the longer the run time of a WU, the less often downloads are needed, with very little change in the size of the WU itself (IIRC).

If the assumption made above is false, then a second approach can be taken: add one or more backup projects, so that when the primary project has no work or cannot be contacted, a backup project can send work to keep the computer busy. This approach helps assure the computer stays 100% busy regardless of the WU size and days-of-storage choices made for the primary project (backup projects do not download into the store), and it eliminates the need to monitor and adjust the BOINC Manager frequently to keep things running.

A backup project is one for which the resource share is 0 (zero).

Users have to make their own decisions and choices that they believe are the best tradeoffs. I wish you luck in finding the best tradeoff for you.
8) Message boards : Number crunching : DNS Problems and Late Work Units (Message 81302)
Posted 11 Mar 2017 by Darrell
Post:
.
[snip]
.

The obvious problem is that the BOINC client may not be capable of providing the projects with the information they need to do that sort of intelligent scheduling. Obviously the client software is positioned to track the usage patterns of each client computer it is running on, but I've seen no evidence that it does so.


IIRC, the BOINC Manager decides how much work (how many WUs) to download based on:
1) how much total work is already in process,
2) how much total work is downloaded but not yet in process,
3) how large the WUs are for the project that wants work,
4) the resource share for the project in relation to the other project(s) in the recent past (history), and
5) how large the user requested work queue is.

On Rosetta, the user -may- control item 3 by:

BOINC Manager -> Rosetta@home -> Your preferences -> [login if needed here] edit preferences for the venue(s) your computer(s) use -> Target CPU run time = {x hour} -> Update preferences

and item 5 by:

BOINC Manager -> Options -> Computing preferences -> Computing -> Store at least {m} days of work -> Store up to an additional {n} days of work -> OK
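
For those who prefer editing files directly, a minimal sketch of the equivalent settings in global_prefs_override.xml (the element names are from the BOINC global preferences schema; the values are only examples):

<global_preferences>
  <!-- "Store at least" this many days of work -->
  <work_buf_min_days>0.1</work_buf_min_days>
  <!-- "Store up to an additional" this many days of work -->
  <work_buf_additional_days>1.5</work_buf_additional_days>
</global_preferences>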

If any of these parameters are changed more often than every few days, the usage history won't match the new settings, and too much or too little work may be downloaded. If the computer's usage varies widely over a few days, the same thing may happen (e.g., running 24 hours a day for 5 days, then being off for 3 days).

Using a smaller queue and smaller WU size with a consistent daily use pattern on the computer(s) reduces the risk of lost bandwidth. Assigning backup project(s) reduces the risk of idle computers. These are under user control and choice.

I feel like you [an administrator or possibly even the director of the project] should be well positioned to see exactly how many of your downloads are not returned with results before their deadlines have elapsed. All I can do is try to purge (abort) old units that I am reasonably sure will not meet their deadlines--but obviously I do have privileged information about how I use my computers and I don't need to track their usage histories to make those predictions.

I agree the project could, or possibly does, track such data, but I am guessing the payback is too small to be worth the effort. After all, how many non-advanced users (those who never touch tuning parameters) are there in relation to those of us who do? The project (and David E K) do address some of the things over which we have no control; the things we can control, we should adjust as best we can.
9) Message boards : Number crunching : Unintended consequences of the new credit system? (Message 81301)
Posted 11 Mar 2017 by Darrell
Post:
@ shanen:


.
[snip]
.
My older concern with the wasted bandwidth is probably something that should be a BOINC-level problem.
.
[snip]
.



There is a group that addresses issues at the BOINC Manager level. They can be contacted at:

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha

or, via email, send a message with subject or body 'help' to
boinc_alpha-request@ssl.berkeley.edu

Best wishes.
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81081)
Posted 24 Jan 2017 by Darrell
Post:

Your answer is much appreciated. Thank you.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81076)
Posted 22 Jan 2017 by Darrell
Post:
Sorry for that Darrell. Looks like it's been a moving target. Appears it is now something that can be done via the cc_config file.
<cc_config>
  <options>
    <zero_debts>1</zero_debts>
  </options>
</cc_config>

Ahh, thanks. I also see it is recommended to NOT use it starting with version 7. Oh well!

Do you know where the debts are stored so I could manually reset them? I am accustomed to editing potentially dangerous items (e.g., registry). Send me a PM if you don't want to post that information publicly.
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81072)
Posted 21 Jan 2017 by Darrell
Post:
@Darrell,

Thanks for the reply. Yes, the other projects and a fractional share of the time available would have that effect. As Sid points out, there is very likely a debt to R@h right now, and so in addition to short deadlines, you got a larger load of work than average. It sounds like you are now at a point where even longer deadlines would reach the "running at high priority" state, because of the fractional resource share.

So another approach would be to clear the debts so they are all equal. That would help avoid BOINC requesting more work than the current cache window requires for any of your projects. At one time, there was a debt viewer tool which you could use to zero out the record of project debts. Appears you can now do it with boinccmd.


@mod.sense: I finally looked into doing that and found another description on the BOINC Wiki that seems to be 3 years newer.

<joke> Unfortunately, someone forgot to tell the developer to include it in the program. </joke>

When I tried it, I got the "help" output rather than the result I wanted. The "--set_debts" option is not listed as valid even though the Wiki shows it.


Confirming the URL is valid:

S:\Program Files\BOINC>boinccmd.exe --get_project_config http://boinc.bakerlab.org/rosetta/
poll status: operation in progress
poll status: operation in progress
uses_username: 0
name: rosetta@home
min_passwd_length: 6


Reset the debt for Rosetta:

S:\Program Files\BOINC>boinccmd.exe --set_debts http://boinc.bakerlab.org/rosetta/ 0 0

usage: boinccmd [--host hostname] [--passwd passwd] [--unix_domain] command

default hostname: localhost
default password: contents of gui_rpc_auth.cfg
Commands:
--client_version show client version
--create_account URL email passwd name
--file_transfer URL filename op file transfer operation
op = retry | abort
--get_cc_status
--get_daily_xfer_history show network traffic history
--get_disk_usage show disk usage
--get_file_transfers show file transfers
--get_host_info
--get_message_count show largest message seqno
--get_messages [ seqno ] show messages > seqno
--get_notices [ seqno ] show notices > seqno
--get_project_config URL
--get_project_status show status of all attached projects
--get_proxy_settings
--get_simple_gui_info show status of projects and active tasks
--get_state show entire state
--get_tasks show tasks
--get_old_tasks show reported tasks from last 24 hours
--join_acct_mgr URL name passwd attach account manager
--lookup_account URL email passwd
--network_available retry deferred network communication
--project URL op project operation
op = reset | detach | update | suspend | resume | nomorework | allowmorework | detach_when_done | dont_detach_when_done
--project_attach URL auth attach to project
--quit tell client to exit
--quit_acct_mgr quit current account manager
--read_cc_config
--read_global_prefs_override
--run_benchmarks
--set_gpu_mode mode duration set GPU run mode for given duration
mode = always | auto | never
--set_host_info product_name
--set_network_mode mode duration set network mode for given duration
mode = always | auto | never
--set_proxy_settings
--set_run_mode mode duration set run mode for given duration
mode = always | auto | never
--task url task_name op task operation
op = suspend | resume | abort

S:\Program Files\BOINC>
13) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81063)
Posted 19 Jan 2017 by Darrell
Post:
Jim -- the project itself has run for over ten years with pretty solid reliability.

However, in terms of communications and responsiveness to end user reports and issues, they have been historically weak. It is an unfortunate aspect of the project.

[snip a lot]

I reluctantly agree. I predict that until the server gets upgraded and is no longer stressed, and WUs start piling up unprocessed on the server, no one at the project will care to improve how the WUs are handled.

Sad to admit, but I am an electrical engineer (retired) who graduated from the UW.
14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81055)
Posted 18 Jan 2017 by Darrell
Post:
Can anyone here tell me the REASON for such short deadlines? Today, tomorrow or next month shouldn't make much of a difference to the science, and a longer deadline would allow slower CPUs (or running less time) to participate.

I did actually answer that. They're CASP tasks, which were required to be back within a couple of days. <However> CASP finished months ago, so it seems these are re-runs of those tasks and the urgency no longer exists. But it took several years to make the deadlines of urgent tasks clear, so I'm guessing we won't get a timely response to changing them back during this time of re-runs.

tl;dr No good reason


Hmmm. If CASP (whatever that acronym stands for) had to be back within a couple of days, why are they being sent out now (months later)? Seems like a contradiction to me.

SETI, e.g., has about an EIGHT WEEK deadline on their WUs.

To be fair, Seti tasks could have a million year deadline and that would be too soon. Every minute they run is a waste of time.


Everyone has their own opinion and choices to make.

Is there a commercial reason Rosetta is pushing so hard? Or is it that their server doesn't have the capacity to store that many active WUs? I know a system upgrade is "somewhere" in the future.

Not the former, definitely the latter - that's been said here. I've just asked about the hardware upgrade followed by server software upgrade. Without either of those it's not looking good for any of us.


Agreed.

One question for you: If you're maintaining your 1+1 setting, once you get down to 1.99 days you'll get more tasks and with 2 day deadline tasks you'll <never> meet those deadlines, thereby forcing your PC into high-priority mode and preventing your GPU tasks from running. Why have you decided to have 2 days of buffer rather than 1.5? Or have you changed it now? It seems to me your problems will go away if you change, but they're bound to remain if you don't.

Edit: Also, changing 1+1 to 0+2 would be a better setting, as new (potentially 2-day) tasks will come down a full day before they are needed. This may well be the cause of 5-day tasks being pushed back and risking going into high priority. That's the only explanation I can think of for 5-day deadline tasks performing the way you describe. The first figure should only ever be non-zero if you only connect occasionally, not if you have a permanent connection (the default setting is apparently a quite understandable 0.1). 0.1+1.5 is consistent with the 8-hour default runtimes.


The first number asks to maintain "at least" this much work, and the second number asks for "up to" this many days additional. Since some projects (like this one) run out of work on an irregular basis, I want a small supply to bridge the gaps in supply. [Or you can think of it as an "occasional connection" to the supply.] Further, those numbers are for ALL projects (as I wrote in an earlier post), not just Rosetta. Today, SETI has maintenance and is offline for much or all of the day. My downloaded queue of WUs is being used to process during this period.

There is one wrinkle I have that few of the crunchers here have: I am not in the U.S. I am in Vietnam, and three of the trans-Pacific internet cables are broken (see here if interested). Thus I AM connected only intermittently, even though I would rather be continuously connected.
15) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81046)
Posted 16 Jan 2017 by Darrell
Post:
We don't get to choose whether we receive 2-, 5-, or 7-day deadline tasks, so in Darrell's case I don't blame him for his reaction when everything came down as 2-day. His setup is a lot more involved than what I have to deal with, so I guess it requires a little extra maintenance.


<sigh> I agree with you, Sid, that since I have many CPUs and GPUs running a mix of projects, and Rosetta is setting short deadlines, I must pay more attention than I would prefer.


That said, I'm getting 5-day deadlines over 2-day ones at about 2-to-1 now, so I suspect the problem has gone away for the moment, even if Darrell maintains a 2-day (1+1) buffer. I'd still recommend 0+1.5 though, to cover all eventualities.


I am still getting many 2-day deadline WUs from Rosetta, so the situation continues. I have received so many 2-day WUs that some of my 5-day WUs are being delayed to the point of nearing a HIGH PRIORITY state.

Can anyone here tell me the REASON for such short deadlines? Today, tomorrow or next month shouldn't make much of a difference to the science, and a longer deadline would allow slower CPUs (or running less time) to participate. SETI, e.g., has about an EIGHT WEEK deadline on their WUs. Is there a commercial reason Rosetta is pushing so hard? Or is it that their server doesn't have the capacity to store that many active WUs? I know a system upgrade is "somewhere" in the future.
16) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81031)
Posted 15 Jan 2017 by Darrell
Post:
Where is the reply from the Admin as to WHY these have short deadlines?


I'll start by pointing out that I am not a project admin. Yes, I know, the message board tags indicate otherwise. This is just how BOINC server code reflects my message board administrative controls.

I'm not clear how a reply explaining why the project has some 48hr deadline work is actually going to be of any relief to your situation. Since you are attached to a project that does send 48hr deadlines, a 1+1 day queue and the strong desire not to have projects go to high priority due to BOINC Manager's fear that work might otherwise not be completed in time, it would seem you would be better served to establish preferences that cause the BOINC manager not to request so much work at one time.


View the question I asked as rhetorical, meant to bring the impact to the Admin's attention.

If the queue were ONLY for Rosetta, this would work. It is for ALL the projects, so it won't work for me, since some projects seem to release WUs in batches hours or even days apart.

I should also point out that if your machine is running 24hrs a day, then once the first few short-deadline tasks complete, the BOINC Manager should be relieved of the concern that about a day's worth of work won't be completed in 48hrs. So it's not going to be in the way of your GPU work for 48hrs, or even the roughly 20hrs of estimated compute time.


That computer runs 24/7 (God and the electric company willing) and has finished ALL of the WUs about which I asked the question. It received another batch of 59 more short-deadline WUs at about 3:30 PM, so the situation is not resolved, even though I admit there may have been a time when BOINC was NOT in panic mode (HIGH PRIORITY) while I wasn't monitoring.

It is also fairly unusual for all of your current cache of work to have the short deadlines. More commonly you'll see just a fraction of the tasks with the 48hr deadlines and others with 7-10 day deadlines. And such a mix also avoids the high priority situation you happen to see today.


If there were just a few short-deadline WUs, I would not have noticed or been concerned. When there are so many that they shut down the GPU work, then I have a concern.

I will just continue to restrict Rosetta's access to my 5 computers' CPU time for the time being, since no one seems to have the real answer or to be taking any action to lengthen the deadlines or throttle the release of short-deadline WUs.

Thank you for your reply.
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81026)
Posted 15 Jan 2017 by Darrell
Post:
Why all the WUs with super short deadlines?

I finally got some new work today (1/13). I use a 1+1 day queue. Now the Rosetta work is going into panic mode, shutting off the use of some of my graphic cards from other projects (SETI and Einstein).

More than 32 new WUs came down around 6:46PM on 1/13 and have deadlines at 6:46PM on 1/15. They are 80,000 GFLOPS and should run about 4 hours each. Running 7 threads, they will require (32/7)*4 hours to complete, which is around 20 hours to all finish.

Meanwhile, 3 of my 4 graphic cards are idle due to panic mode. Not nice.

Please assign AT LEAST 7 days to process work units.

Computer is http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=2098312


I've had CPU tasks get to high priority and my GPUs still ran. Multiple systems. Your GPU tasks must each require a CPU thread all to themselves, and that's causing the issue. Not Rosetta.


I think you have the correct answer. The GPU jobs typically require "some" CPU resources (which together consume at least 1 CPU). IF ... any project's tasks get within the queue setting's expiration window, those tasks will go into HIGH PRIORITY mode and will not be interrupted. This is a BOINC design "feature".

With a complex system like this, using an " app_config.xml " file in the Rosetta project directory, which then allows CPU resources to be allocated even during HIGH PRIORITY cases like these, is the only way I know around this.


Ooops, wrong button pressed ... continuing with reply.

Yes, it is by design and I have no issue with that design. My issue is with Rosetta sending out large numbers of WUs with short deadlines (2 days in this instance). Having a 1+1 day queue (2 days) means those WUs immediately become HIGH PRIORITY and force other work off the CPUs. This is NOT fair and does not allow the normal mechanism to allocate resources to work.

I have adjusted my app_config.xml to limit Rosetta's use of my CPUs. This will have the unavoidable effect of leaving some CPU time unused when the queues of my other projects go dry during normal times. Rosetta would normally just get that CPU time, but now it cannot.
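
For anyone wanting to do something similar, a minimal sketch of such an app_config.xml in the Rosetta project directory (the limit of 6 is only an example; <project_max_concurrent> needs a reasonably recent BOINC client):

<app_config>
  <!-- cap how many Rosetta tasks may run at once, leaving threads free for other work -->
  <project_max_concurrent>6</project_max_concurrent>
</app_config>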

Where is the reply from the Admin as to WHY these have short deadlines?
18) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81024)
Posted 15 Jan 2017 by Darrell
Post:

I've had CPU tasks get to high priority and my GPUs still ran. Multiple systems. Your GPU tasks must each require a CPU thread all to themselves, and that's causing the issue. Not Rosetta.


Note: When I write "CPU" I am referring to a "logical CPU" or thread.

I know how these things work and have carefully "tuned" the mix of projects such that my 8 total CPUs run 95-100% and all four GPUs run 95+% busy. SETI runs on the GPUs supported by two CPUs (each reserves 0.34 CPUs), Rosetta gets the rest.

It IS the Rosetta short-deadline tasks going into panic mode that force my SETI GPU tasks off their two CPUs. I have four GPUs in some of my computers, such as the one I listed, and only one GPU was being used (because I run 2 SETI tasks per GPU, and 0.34*2 is less than one CPU). Rosetta was using all eight threads instead of the six it normally uses.
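
One way to get a per-GPU-task reservation like the 0.34 CPUs described above is an app_config.xml in the SETI project directory, sketched below; the app_name and plan_class here are placeholders, so substitute whatever client_state.xml actually lists for the SETI application:

<app_config>
  <app_version>
    <!-- placeholder names: use the actual app_name and plan_class from client_state.xml -->
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda</plan_class>
    <!-- each GPU task reserves about a third of a logical CPU -->
    <avg_ncpus>0.34</avg_ncpus>
    <!-- two tasks share each GPU -->
    <ngpus>0.5</ngpus>
  </app_version>
</app_config>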

Where is the Admin response?

P.S. I just checked today's downloaded WUs and many/all of them ALSO have short deadlines.
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81020)
Posted 13 Jan 2017 by Darrell
Post:
Why all the WUs with super short deadlines?

I finally got some new work today (1/13). I use a 1+1 day queue. Now the Rosetta work is going into panic mode, shutting off the use of some of my graphic cards from other projects (SETI and Einstein).

More than 32 new WUs came down around 6:46PM on 1/13 and have deadlines at 6:46PM on 1/15. They are 80,000 GFLOPS and should run about 4 hours each. Running 7 threads, they will require (32/7)*4 hours to complete, which is around 20 hours to all finish.

Meanwhile, 3 of my 4 graphic cards are idle due to panic mode. Not nice.

Please assign AT LEAST 7 days to process work units.

Computer is http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=2098312
20) Message boards : Number crunching : GPU computing (Message 80793)
Posted 27 Oct 2016 by Darrell
Post:
As someone with 14 discrete GPU cards, I support those projects that have applications that run primarily in the GPUs (Einstein, SETI).

My five computers have fairly modern CPUs, so I also give their cycles to projects that DON'T have applications for GPUs (Rosetta, LHC).

This works for me. Keeps both GPUs and CPUs busy.


