Posts by Bryn Mawr

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109440)
Posted 18 days ago by Bryn Mawr
Post:
New tasks came down about an hour ago


Sadly, still with the lower connect error
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109375)
Posted 13 Jun 2024 by Bryn Mawr
Post:
Now out of new work

This has been the best run we've had for a couple of years - bound to end at some point once everyone's offline cache runs down.
It's at this point my 12hr runtime setting ekes out my remaining work as far as possible.

What I'd re-emphasise is that the default runtime for tasks has fallen to 3hrs for some reason, which I believe to be a mistake and contradicts the forced Boinc setting of 8hrs.
As such, people should go into Boinc's Your Account option, select Rosetta@home preferences and change Target CPU run time to an explicit 8hrs rather than "not selected".
This will almost treble how long tasks run and extend the life of work batches so that we run out less, if at all, while almost trebling the credit we get for tasks too.

This should be considered a high priority for everyone imo.


I’ve always figured to leave it on default as the project scientists who set them up know their requirements better than I do.
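The "almost treble" figure in the quoted advice is just the ratio of the two targets. A quick check, assuming each core runs tasks back to back for their full target time (tasks that finish early would change the numbers):

```python
DEFAULT_TARGET_H = 3.0   # default target CPU run time quoted above
EXPLICIT_TARGET_H = 8.0  # explicit setting suggested above

def tasks_per_core_per_day(target_hours: float) -> float:
    """Tasks one core consumes per day if every task runs to its full target (an assumption)."""
    return 24.0 / target_hours

ratio = EXPLICIT_TARGET_H / DEFAULT_TARGET_H
print(f"run-time ratio: {ratio:.2f}x")  # 2.67x, i.e. "almost treble"
print(f"tasks/core/day: {tasks_per_core_per_day(DEFAULT_TARGET_H):.0f} -> "
      f"{tasks_per_core_per_day(EXPLICIT_TARGET_H):.0f}")
```

Fewer tasks pulled per core per day is what would stretch each batch further; the credit claim depends on how credit is granted and isn't modelled here.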
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109314)
Posted 30 May 2024 by Bryn Mawr
Post:
But for anyone else that's been reading these posts...

If you limit the number of cores/threads available to BOINC, you will maximise your BOINC processing. You will get the maximum possible amount of work done each day that your system is capable of, you won't have issues with deadlines (unless of course you have inappropriate cache settings), or Panic Mode or any of those types of issues.

Well, that's obviously not true.
If you limit the number of cores available to Boinc, your Boinc processing will be limited to the number of cores you've allocated, while your unallocated cores won't be used for Boinc and may or may not be fully utilised, depending on what else is going on.
Use all your cores all the time. Your computer will decide millions of times per second what it should do with its capability better than any human ever will.
Do not listen to the man behind the curtain.


One thing I’ve noticed is that my Ryzens appear to be power-limited. The TDP is 65 W and the PPT comes out at 88 W, so with all 24 cores running each core gets about 3.67 W, but with only 20 cores running each core gets about 4.4 W; the power draw is still 88 W, with the cores running at a higher frequency.

Also, running 23 cores lets the OS have its two pennies' worth without having to swap out the data for a running WU and then swap it back in again, so the WUs can run closer to 100% than the 97-98% they get when running on 24 cores.

That being said, I always run at 24 cores and let the computer sort itself out.
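The per-core figures above come from dividing a fixed package power budget by the number of busy cores. A rough sketch of that arithmetic, assuming the whole 88 W budget is shared evenly (it ignores uncore/IO-die power, so real per-core figures will be somewhat lower):

```python
PPT_WATTS = 88.0  # package power limit quoted above for a 65 W TDP Ryzen

def watts_per_core(busy_cores: int, package_watts: float = PPT_WATTS) -> float:
    """Even split of the package budget across busy cores.
    Simplification: ignores uncore/IO-die power draw."""
    if busy_cores <= 0:
        raise ValueError("busy_cores must be positive")
    return package_watts / busy_cores

for n in (24, 23, 20):
    print(f"{n} busy cores: {watts_per_core(n):.2f} W each")
```

Because the total stays pinned at the limit, each core dropped from the Boinc allocation hands its share to the others rather than saving power.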
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109257)
Posted 16 May 2024 by Bryn Mawr
Post:
You ran out of memory. Six jobs of 2.6 GB and you have 16 GB.


Ach, I thought I had 32 GB.

I remember now, the 2 sticks wouldn't play with each other :-(

The other machine has 64 GB; I'll update this one to match.

Thanks
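The diagnosis above is simple capacity arithmetic: six tasks at 2.6 GB each need 15.6 GB, leaving almost nothing of 16 GB for the OS. A sketch for sizing the task count, assuming every task peaks at the same 2.6 GB and a hypothetical 2 GB is kept back for everything else:

```python
PER_TASK_GB = 2.6     # per-task figure quoted in the reply
OS_RESERVE_GB = 2.0   # hypothetical headroom for the OS and other software

def ram_needed_gb(tasks: int, per_task_gb: float = PER_TASK_GB) -> float:
    """Peak RAM if all tasks hit their maximum at the same time."""
    return tasks * per_task_gb

def max_concurrent_tasks(total_gb: float, per_task_gb: float = PER_TASK_GB,
                         reserve_gb: float = OS_RESERVE_GB) -> int:
    """Largest task count whose combined peak fits alongside the reserve."""
    return int((total_gb - reserve_gb) // per_task_gb)

print(f"six tasks need {ram_needed_gb(6):.1f} GB")
for total in (16, 32, 64):
    print(f"{total} GB: up to {max_concurrent_tasks(total)} tasks")
```

Boinc's own computing preferences (the "use at most X% of memory" settings) are the supported way to keep the client within such a limit.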
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109250)
Posted 15 May 2024 by Bryn Mawr
Post:
A strange error; sadly I can only give a sketchy report, but I hope it’s enough :-

Host = https://boinc.bakerlab.org/rosetta/results.php?hostid=6231982

Boinc 7.24.1, Ubuntu 22.04.4

I allowed Ubuntu to update and then rebooted. After this, Boinc Manager disconnected after running for about a minute: the event log showed a Rosetta task restarting and Boinc immediately closing, having received signal 15. This repeated each time I restarted the host and the Boinc service restarted.

I have now aborted all of the Rosetta tasks and this behaviour has now stopped.

(How) can a Rosetta task kill Boinc?

Just a notification as I’ve never heard this described before.
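The "signal 15" in the event log is a POSIX signal number. On Linux it is SIGTERM, a request to terminate (the signal systemd sends by default when stopping a service) rather than a crash signal. A small sketch for decoding such numbers with Python's standard `signal` module:

```python
import signal

def describe_signal(number: int) -> str:
    """Map a numeric signal, as printed in the event log, to its name on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"

print(describe_signal(15))  # SIGTERM
```

Knowing it was SIGTERM says the client was asked to stop; it doesn't by itself say what sent the request.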
6) Message boards : Number crunching : More threads on VM tasks possible? (Message 109167)
Posted 24 Apr 2024 by Bryn Mawr
Post:
* delete *
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109146)
Posted 22 Apr 2024 by Bryn Mawr
Post:
I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask. The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.


This is not a problem with Boinc. It is not a problem with Folding. It is a problem with your configuration which is preventing the two projects, which have no way of knowing the other is there, from working together.

You have been given the configuration changes required, all that’s needed now is for you to try them.
8) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109085)
Posted 5 Apr 2024 by Bryn Mawr
Post:
28 tasks with validate error... great... but I suppose that's just the way it goes with a beta.


They might have beta in the name but these have been the production WUs for some time now.
9) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 109042)
Posted 26 Mar 2024 by Bryn Mawr
Post:
The main thing I take from this is “Try the suggestions” - you can always reverse the config changes if they don’t work.
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108983)
Posted 14 Mar 2024 by Bryn Mawr
Post:
Just downloaded 4 beta 6.05 tasks one of which immediately (0.02 seconds CPU) failed with :-

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)

</stderr_txt>
]]>

Boinc 7.24.1 and Ubuntu 22.04.4
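The command line in that stderr output also shows what the task was asked to do. The numeric run-time flags appear to be in seconds (28800 s is exactly the 8 hours discussed elsewhere); that unit is an inference, not something the log states. A minimal sketch that pulls the flags out of the quoted command line, assuming a flag's value is simply the next token when that token doesn't start with "-":

```python
import shlex
from typing import Optional

# Command line copied from the stderr output above.
COMMAND = (
    "../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu"
    " @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800"
    " -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all"
    " -database minirosetta_database -in::file::zip minirosetta_database.zip"
    " -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937"
)

def parse_flags(command: str) -> dict[str, Optional[str]]:
    """Map each -flag to the following token, or None for bare switches."""
    tokens = shlex.split(command)
    flags: dict[str, Optional[str]] = {}
    for i, tok in enumerate(tokens):
        if not tok.startswith("-"):
            continue  # executable, @flags file, or a value already consumed
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        flags[tok] = nxt if nxt is not None and not nxt.startswith("-") else None
    return flags

flags = parse_flags(COMMAND)
for name in ("-cpu_run_time", "-boinc::cpu_run_timeout"):
    seconds = int(flags[name])
    print(f"{name}: {seconds} s = {seconds / 3600:g} h")
```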
11) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108962)
Posted 9 Mar 2024 by Bryn Mawr
Post:
A couple of standout lines :-

3/9/2024 7:57:31 AM | | - Suspend if no input in last 1.00 minutes
3/9/2024 7:57:31 AM | | - Leave apps in memory if not running
3/9/2024 7:57:31 AM | | - Store at least 0.00 days of work
3/9/2024 7:57:31 AM | | - Store up to an additional 3.00 days of work
3/9/2024 7:57:31 AM | | - max disk usage: 10.00 GB

The first (suspend if no input in the last minute) will leave your tasks suspended most of the time.

The second, the 10 GB disk limit, is probably your problem: you are running a lot of projects, and Rosetta on its own can easily use 5 GB. Try increasing your disk limit to 100 GB (it’s reporting over 300 GB free, and Boinc won’t use more than it needs).

Then set all other projects to no new tasks and do an update on Rosetta, when that has finished set your other projects back to accept new tasks.

If this does not pull any tasks then post the event log entries relating to the attempt.
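Spotting those two settings can be automated. A minimal sketch that scans the preference lines quoted above; the regular expressions assume exactly the wording shown in this log, and the 100 GB threshold is just the suggestion made here:

```python
import re

# Preference lines copied from the event log quoted above.
LOG = """\
3/9/2024 7:57:31 AM | | - Suspend if no input in last 1.00 minutes
3/9/2024 7:57:31 AM | | - Leave apps in memory if not running
3/9/2024 7:57:31 AM | | - Store at least 0.00 days of work
3/9/2024 7:57:31 AM | | - Store up to an additional 3.00 days of work
3/9/2024 7:57:31 AM | | - max disk usage: 10.00 GB
"""

SUSPEND_RE = re.compile(r"Suspend if no input in last ([\d.]+) minutes")
DISK_RE = re.compile(r"max disk usage: ([\d.]+) GB")

def check_prefs(log_text: str, min_disk_gb: float = 100.0) -> list:
    """Return a warning for an idle-suspend setting and for a low disk limit."""
    warnings = []
    for line in log_text.splitlines():
        suspend = SUSPEND_RE.search(line)
        disk = DISK_RE.search(line)
        if suspend:
            warnings.append(f"tasks suspend after {float(suspend.group(1)):g} min without input")
        if disk and float(disk.group(1)) < min_disk_gb:
            warnings.append(f"disk limit {float(disk.group(1)):g} GB is below {min_disk_gb:g} GB")
    return warnings

for warning in check_prefs(LOG):
    print(warning)
```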
12) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108961)
Posted 9 Mar 2024 by Bryn Mawr
Post:
I hope I got what you wanted this time. I rebooted, started BOINC with a watch in hand, counted five minutes and then got the log file.

Thanks.


Much better :-)
13) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108956)
Posted 9 Mar 2024 by Bryn Mawr
Post:
As an alternative method, reboot your machine, restart Boinc and then copy the event log after 5 minutes.

That will guarantee that we get the head of the log file which is what we need.
14) Message boards : Number crunching : Beyond newbie Q&A (Message 108948)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Your confusion appeared to be in thinking that a WU had a time allocated to it


No, I didn't mean that either.


Then what of “The total number of queued jobs is the total number of WUs in the queue waiting to be sent” can we explain better?
15) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108947)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I continue not to see rosetta@home in the list of the projects....


I continue not to see the beginning of a log where Boinc has just been restarted.
16) Message boards : Number crunching : Beyond newbie Q&A (Message 108938)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I'm sorry, but neither answer directly responds to my question.

Maybe my question wasn't clear enough, but I'm asking the meaning of TQJ given the variable sizes, not what a work unit is or how long the server will take to process work units.


Your confusion appeared to be in thinking that a WU had a time allocated to it, so that you were getting 36-hour WUs while there were also 2-hour WUs. Each job in the queue is a WU, and that WU will run for as long as your configuration allows (or less if it completes first), completing as many decoys as it has time for. So your question as stated has no real meaning: the total number of queued jobs is the total number of queued jobs, regardless of the variable size, which is under your control.
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108928)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?


You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).
18) Message boards : Number crunching : Beyond newbie Q&A (Message 108927)
Posted 8 Mar 2024 by Bryn Mawr
Post:
How do I interpret the server status page?
Today it says "Total queued jobs: 431,868"

For most projects one job is one work unit, but here we have variable work unit sizes creating multiple models apiece.
My 36 hour work units have hundreds of models, and the beta ones have thousands.

The lowest common denominator is a two-hour work unit.
So are the total queued jobs 863,736 hours of work?
Are they 431,868 models of various complexity, completed in minutes to hours apiece?
Are they part of a pool with the in-progress models, totaling 717,885 iterative calculations bouncing between users until they reach an undisclosed limit and retire out of the pool?


Each WU will work either until it has completed or until the end of the iteration that takes it closest to your requested run time.

If you have set 36 hours as your run time it will run for 36 hours but it is the same WU that would run for 8 hours on another box with the default setup.
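The rule described above (run until the WU's decoys are done, or until the iteration that brings it closest to the requested run time) can be sketched as a loop. This is a simplified, hypothetical model, not Rosetta's actual code: decoy durations are supplied up front, and the length of the next decoy is estimated as the average of those already finished.

```python
from statistics import mean

def run_work_unit(decoy_hours: list, target_hours: float, max_decoys: int) -> tuple:
    """Hypothetical WU model. Returns (decoys completed, hours used)."""
    elapsed = 0.0
    done = 0
    for hours in decoy_hours[:max_decoys]:  # stops early if the WU runs out of decoys
        if done:
            expected = mean(decoy_hours[:done])  # estimate from decoys finished so far
            # Stop if one more decoy would end further from the target than stopping now.
            if abs(elapsed + expected - target_hours) >= abs(elapsed - target_hours):
                break
        elapsed += hours
        done += 1
    return done, elapsed

# The same WU (100 half-hour decoys) under three different run-time settings.
for target in (3.0, 8.0, 36.0):
    decoys, used = run_work_unit([0.5] * 100, target, max_decoys=100)
    print(f"target {target:g} h: {decoys} decoys in {used:g} h")
```

The point of the sketch is the one made in the reply: the WU is identical in each case, and only the number of decoys it produces changes with the run-time setting.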
19) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108926)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Did you actually exit Boinc and restart before copying that?

What is needed is the start of the file after a restart - about the first 40 lines.
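If the log has been saved to a file rather than copied from the Manager, the block after the most recent restart can be cut out with a script. A minimal sketch, assuming the client prints a start-up banner beginning "Starting BOINC client version" (check your own log), and that the saved log is the client's stdoutdae.txt in its data directory; the paths in the comment are assumptions:

```python
from pathlib import Path

# Assumed banner the client prints when it starts; verify against your own log.
STARTUP_MARKER = "Starting BOINC client version"

def startup_block(log_text: str, n: int = 40) -> list:
    """Return the first n lines of the most recent client start-up.
    Falls back to the head of the text if no start-up banner is found."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if STARTUP_MARKER in line]
    first = starts[-1] if starts else 0
    return lines[first:first + n]

def startup_block_from_file(path: Path, n: int = 40) -> list:
    """Read a saved event log and extract the start-up block."""
    return startup_block(path.read_text(errors="replace"), n)

# Example (hypothetical paths: /var/lib/boinc-client on the Ubuntu package,
# typically C:\ProgramData\BOINC on Windows):
# print("\n".join(startup_block_from_file(Path("/var/lib/boinc-client/stdoutdae.txt"))))
```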
20) Message boards : Number crunching : Run time of new tasks. (Message 108883)
Posted 27 Feb 2024 by Bryn Mawr
Post:
As I said, Folding has been running on this machine for years, the "problem" I started the thread about is VERY recent. All the tasks from that last download have completed and returned now, they show quite a variety of run times 73k to 90k. There are also other tasks downloaded the same day, but at an earlier time, that do not show the extended run time.


Then look at your logs and see what else was running on your machine during the period in question.

This is not a Boinc problem, this is within your machine.





©2024 University of Washington
https://www.bakerlab.org