Posts by Bryn Mawr

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109257)
Posted 6 days ago by Bryn Mawr
Post:
You ran out of memory. Six jobs of 2.6 GB and you have 16 GB.


Ach, I thought I had 32 GB.

I remember now, the 2 sticks wouldn't play with each other :-(

The other machine has 64 GB; I'll upgrade this one to match.

Thanks
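The arithmetic behind the quoted reply can be checked with a rough Python sketch. The 2.6 GB per task comes from the reply above; the 2 GB kept back for the OS and other programs is an assumption, and real Rosetta tasks vary in size.

```python
def tasks_that_fit(ram_gb: float, per_task_gb: float, reserve_gb: float = 2.0) -> int:
    """How many tasks of per_task_gb fit in RAM after keeping reserve_gb free.

    reserve_gb is an assumed allowance for the OS and other programs.
    """
    usable = ram_gb - reserve_gb
    if usable <= 0:
        return 0
    return int(usable // per_task_gb)


# Six jobs at 2.6 GB each need about 15.6 GB, i.e. essentially all of a 16 GB box.
needed_gb = 6 * 2.6

for ram in (16, 32, 64):
    print(f"{ram} GB RAM -> {tasks_that_fit(ram, 2.6)} tasks of 2.6 GB")
```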
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109250)
Posted 7 days ago by Bryn Mawr
Post:
A strange error; sadly I can only give a sketchy report, but I hope it’s enough :-

Host = https://boinc.bakerlab.org/rosetta/results.php?hostid=6231982

Boinc 7.24.1, Ubuntu 22.04.4

I allowed Ubuntu to update and then rebooted. After that, Boinc Manager disconnected after running for about a minute: the event log showed a Rosetta task restarting and then Boinc immediately closing, having received signal 15. This repeated each time I restarted the host and the Boinc service restarted.

I have now aborted all of the Rosetta tasks and this behaviour has now stopped.

(How) can a Rosetta task kill Boinc?

Just a notification as I’ve never heard this described before.
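For anyone decoding the event log: signal 15 is SIGTERM, the standard request for a process to shut down cleanly. The number alone does not say who sent it (a service manager stopping the client, a user's `kill`, or another program), so the system journal around that timestamp is worth checking. A minimal Python sketch that turns a signal number from a log line into its name:

```python
import signal


def describe_signal(signum: int) -> str:
    """Return e.g. 'SIGTERM (Terminated)' for a signal number from a log line."""
    name = signal.Signals(signum).name  # raises ValueError for unknown numbers
    # strsignal() was added in Python 3.8 and may return None on some platforms
    desc = signal.strsignal(signum) if hasattr(signal, "strsignal") else None
    return f"{name} ({desc})" if desc else name


print(describe_signal(15))
```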
3) Message boards : Number crunching : More threads on VM tasks possible? (Message 109167)
Posted 28 days ago by Bryn Mawr
Post:
* delete *
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109146)
Posted 22 Apr 2024 by Bryn Mawr
Post:
I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask. The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.


This is not a problem with Boinc. It is not a problem with Folding. It is a problem with your configuration which is preventing the two projects, which have no way of knowing the other is there, from working together.

You have been given the configuration changes required; all that’s needed now is for you to try them.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109085)
Posted 5 Apr 2024 by Bryn Mawr
Post:
28 tasks with validate errors... great... but I suppose that's just the way it goes with a beta.


They might have beta in the name, but these have been the production WUs for some time now.
6) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 109042)
Posted 26 Mar 2024 by Bryn Mawr
Post:
The main thing I take from this is “Try the suggestions” - you can always reverse the config changes if they don’t work.
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108983)
Posted 14 Mar 2024 by Bryn Mawr
Post:
Just downloaded four beta 6.05 tasks, one of which failed immediately (0.02 seconds CPU) with :-

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)

</stderr_txt>
]]>

Boinc 7.24.1 and Ubuntu 22.04.4
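When reporting failures like this, the most useful parts of the stderr output are the exit code line and the `ERROR` lines. A small Python sketch that pulls them out of a pasted stderr block; the regular expression is written against the format shown in the log above and is an assumption for other client versions.

```python
import re
from typing import List, Optional, Tuple

# Matches lines like "process exited with code 1 (0x1, -255)"
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+), (-?\d+)\)")


def summarize_stderr(text: str) -> Tuple[Optional[int], List[str]]:
    """Return (exit_code, error_lines) from a BOINC task's stderr text."""
    match = EXIT_RE.search(text)
    exit_code = int(match.group(1)) if match else None
    errors = [ln.strip() for ln in text.splitlines() if ln.lstrip().startswith("ERROR")]
    return exit_code, errors


# Excerpt of the stderr quoted above.
sample = """<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
</stderr_txt>"""

print(summarize_stderr(sample))
```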
8) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108962)
Posted 9 Mar 2024 by Bryn Mawr
Post:
A couple of standout lines :-

3/9/2024 7:57:31 AM | | - Suspend if no input in last 1.00 minutes
3/9/2024 7:57:31 AM | | - Leave apps in memory if not running
3/9/2024 7:57:31 AM | | - Store at least 0.00 days of work
3/9/2024 7:57:31 AM | | - Store up to an additional 3.00 days of work
3/9/2024 7:57:31 AM | | - max disk usage: 10.00 GB

The first will leave your tasks suspended most of the time.

The second, the 10 GB disk limit, is probably your problem: you are running a lot of projects, and Rosetta on its own can easily use 5 GB. Try increasing your disk limit to 100 GB (the log reports over 300 GB free, and Boinc won’t use more than it needs).
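If you prefer editing a file to clicking through the Manager, both settings can go in a local preferences override. This sketch assumes BOINC's `global_prefs_override.xml` with the `disk_max_used_gb` and `suspend_if_no_recent_input` tags (0 assumed to mean "don't suspend"); check the names against the BOINC client documentation for your version. It only builds the file text: you would merge it into any existing override file in the BOINC data directory rather than overwrite it, then have the client re-read it (for example with `boinccmd --read_global_prefs_override`).

```python
import xml.etree.ElementTree as ET


def build_prefs_override(disk_max_used_gb: float, suspend_if_no_input_min: float) -> str:
    """Build the text of a minimal global_prefs_override.xml.

    Tag names are assumed from BOINC's global preferences; verify for your version.
    """
    root = ET.Element("global_preferences")
    # Minutes of user inactivity before computing is suspended; 0 is assumed to disable it.
    ET.SubElement(root, "suspend_if_no_recent_input").text = f"{suspend_if_no_input_min:g}"
    # Upper limit on the disk space BOINC may use, in GB.
    ET.SubElement(root, "disk_max_used_gb").text = f"{disk_max_used_gb:g}"
    return ET.tostring(root, encoding="unicode")


print(build_prefs_override(disk_max_used_gb=100, suspend_if_no_input_min=0))
```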

Then set all other projects to no new tasks and do an update on Rosetta. When that has finished, set your other projects back to allow new tasks.
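The same sequence can be scripted with `boinccmd`, which accepts per-project operations such as `nomorework`, `allowmorework` and `update`. The Python sketch below only prints the commands by default (a dry run); the other project URLs are placeholders, and depending on your install `boinccmd` may need to run from the BOINC data directory or be given `--passwd` to authenticate.

```python
import subprocess
from typing import List

ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def project_op(url: str, op: str) -> List[str]:
    """One boinccmd per-project operation, e.g. nomorework / allowmorework / update."""
    return ["boinccmd", "--project", url, op]


def fetch_rosetta_first(other_urls: List[str]) -> List[List[str]]:
    """Phase 1: stop the other projects fetching, then ask Rosetta for work."""
    cmds = [project_op(u, "nomorework") for u in other_urls]
    cmds.append(project_op(ROSETTA_URL, "update"))
    return cmds


def restore_others(other_urls: List[str]) -> List[List[str]]:
    """Phase 2 (after the Rosetta update has finished): let the others fetch again."""
    return [project_op(u, "allowmorework") for u in other_urls]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder URLs standing in for "all other projects".
others = ["https://example.org/project_a/", "https://example.org/project_b/"]
run(fetch_rosetta_first(others))
```

Splitting the two phases keeps the pause explicit: check the event log for the Rosetta request before running `restore_others`.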

If this does not pull any tasks then post the event log entries relating to the attempt.
9) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108961)
Posted 9 Mar 2024 by Bryn Mawr
Post:
I hope I got what you wanted this time. I rebooted, started BOINC with a watch in hand, counted five minutes and then got the log file.

Thanks.


Much better :-)
10) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108956)
Posted 9 Mar 2024 by Bryn Mawr
Post:
As an alternative method, reboot your machine, restart Boinc and then copy the event log after 5 minutes.

That will guarantee that we get the head of the log file, which is what we need.
11) Message boards : Number crunching : Beyond newbie Q&A (Message 108948)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Your confusion appeared to be in thinking that a WU had a time allocated to it


No, I didn't mean that either.


Then what part of “The total number of queued jobs is the total number of WUs in the queue waiting to be sent” can we explain better?
12) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108947)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I continue not to see rosetta@home in the list of the projects....


I continue not to see the beginning of a log where Boinc has just been restarted.
13) Message boards : Number crunching : Beyond newbie Q&A (Message 108938)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I'm sorry, but neither answer directly responds to my question.

Maybe my question wasn't clear enough, but I'm asking the meaning of TQJ given the variable sizes, not what a work unit is or how long the server will take to process work units.


Your confusion appeared to be in thinking that a WU had a time allocated to it, and therefore that you were getting 36-hour WUs while others were getting 2-hour WUs. Each job in the queue is a WU, and that WU will run for as long as your configuration allows (or less, if it completes first), completing as many decoys as it has time for. So your question as stated has no real meaning: the total number of queued jobs is the total number of queued jobs, regardless of their variable size, which is under your control.
14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108928)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?


You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).
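When checking whether a client is older than a given release, compare version numbers numerically, part by part; plain string comparison gets it wrong because "7.6" sorts after "7.16". A Python sketch, where 7.6.33 is only an example of an old release and 7.16.0 is taken from the suggestion above:

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'7.16.11' -> (7, 16, 11)"""
    return tuple(int(part) for part in v.split("."))


def is_older_than(installed: str, minimum: str = "7.16.0") -> bool:
    return parse_version(installed) < parse_version(minimum)


print(is_older_than("7.6.33"))   # numeric comparison: older than 7.16.0
print("7.6.33" < "7.16.0")       # string comparison: wrongly says not older
```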
15) Message boards : Number crunching : Beyond newbie Q&A (Message 108927)
Posted 8 Mar 2024 by Bryn Mawr
Post:
How do I interpret the server status page?
Today it says "Total queued jobs: 431,868"

For most projects one job is one work unit, but here we have variable work unit sizes creating multiple models apiece.
My 36 hour work units have hundreds of models, and the beta ones have thousands.

The lowest common denominator is a two-hour work unit.
So are the total queued jobs 863,736 hours of work?
Are they 431,868 models of various complexity, completed in minutes to hours apiece?
Are they part of a pool with the in-progress models, totaling 717,885 iterative calculations bouncing between users until they reach a undisclosed limit and retire out of the pool?


Each WU will run either until it has completed or until the end of the iteration that brings it closest to your requested run time.

If you have set 36 hours as your run time, it will run for 36 hours, but it is the same WU that would run for 8 hours on another box with the default setup.
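The stopping rule described above can be modelled in a few lines of Python. This is a simplified sketch, not Rosetta's actual code: it assumes each decoy's run time is known in advance and keeps going only while finishing the next decoy would bring the total closer to the target than stopping now.

```python
from typing import Iterable, Tuple


def run_work_unit(target_s: float, decoy_times_s: Iterable[float]) -> Tuple[int, float]:
    """Simplified model of one WU: returns (decoys completed, elapsed seconds).

    The WU ends when it runs out of decoys (it has completed) or when one more
    decoy would leave it further from the requested run time than stopping now.
    """
    elapsed = 0.0
    done = 0
    for t in decoy_times_s:
        if abs(elapsed + t - target_s) >= abs(elapsed - target_s):
            break
        elapsed += t
        done += 1
    return done, elapsed


# The same WU (hypothetically 1,000 s per decoy, 200 decoys) under two run-time settings.
decoys = [1000.0] * 200
print(run_work_unit(8 * 3600, decoys))    # default 8 hours
print(run_work_unit(36 * 3600, decoys))   # 36 hours
```

The WU is identical in both calls; only the requested run time changes how many decoys it produces.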
16) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108926)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Did you actually exit Boinc and restart before copying that?

What is needed is the start of the file after a restart - about the first 40 lines.
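If the event log is long, the part wanted here can be cut out with a short Python sketch: find the last client start-up line and keep roughly the next 40 lines. The marker text `Starting BOINC client` and the log location are assumptions; check your own log (on Linux the client's output may be in a `stdoutdae.txt` file in the BOINC data directory, or in the system journal, depending on how it was installed).

```python
import sys
from typing import List

# Assumed wording of the first line the client logs at start-up; check your own log.
START_MARKER = "Starting BOINC client"


def head_after_last_start(lines: List[str], n: int = 40, marker: str = START_MARKER) -> List[str]:
    """Return up to n lines starting at the last start-up marker (or the file start)."""
    start = 0
    for i, line in enumerate(lines):
        if marker in line:
            start = i
    return lines[start:start + n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python log_head.py /path/to/event_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for line in head_after_last_start(f.read().splitlines()):
            print(line)
```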
17) Message boards : Number crunching : Run time of new tasks. (Message 108883)
Posted 27 Feb 2024 by Bryn Mawr
Post:
As I said, Folding has been running on this machine for years, the "problem" I started the thread about is VERY recent. All the tasks from that last download have completed and returned now, they show quite a variety of run times 73K to 90k. There are also other tasks downloaded the same day, but at an earlier time, that do not show the extended run time.


Then look at your logs and see what else was running on your machine during the period in question.

This is not a Boinc problem, this is within your machine.
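One way to see this in the task list: BOINC reports both run time (wall-clock) and CPU time, in seconds. If CPU time stays about the same while run time grows, the task spent longer waiting for a core, which points to something else competing on the machine. A Python sketch that flags such tasks; the 0.9 threshold and the task figures are hypothetical.

```python
from typing import List, Tuple


def cpu_efficiency(cpu_time_s: float, run_time_s: float) -> float:
    """Fraction of wall-clock run time the task actually spent on a CPU."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


def flag_contended(tasks: List[Tuple[str, float, float]], threshold: float = 0.9) -> List[str]:
    """Names of tasks whose CPU time is well below their run time."""
    return [name for name, cpu, run in tasks if cpu_efficiency(cpu, run) < threshold]


# Hypothetical (name, CPU time, run time) figures in seconds.
tasks = [
    ("task_a", 58_000, 60_000),  # ~97% of run time on the CPU
    ("task_b", 58_000, 90_000),  # ~64%: likely waiting on cores used by something else
]
print(flag_contended(tasks))
```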
18) Message boards : Number crunching : Run time of new tasks. (Message 108878)
Posted 26 Feb 2024 by Bryn Mawr
Post:
The runtime for the last few work units has jumped up considerably. CPU time and credit haven't changed noticeably, but run time has increased from 60 thousand to 90 thousand on the same machine, (very approximate - illustrates the point though).


On both my crunchers the run time was 10,800 seconds and is still 10,800 seconds.
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108871)
Posted 25 Feb 2024 by Bryn Mawr
Post:
Seems like 5 million tasks have become available to run !!
As of 24 Feb 2024, 8:02:19 UTC [ Scheduler running ]
Total queued jobs: 4,921,248
In progress: 115,690
Successes last 24h: 43,490


My crunchers thank you :-)
20) Message boards : Number crunching : Rosetta Beta 6.00 (Message 108764)
Posted 13 Dec 2023 by Bryn Mawr
Post:
What error shows in the stderr output? Click on the workunit in the Tasks link of your account.
Here's the output of one Task.

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu: error while loading shared libraries: libGL.so.1: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

Installation/permissions issue?


Certainly looks like it.
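A missing shared library like this can be confirmed by running `ldd` on the Rosetta binary; libraries the loader cannot find are listed as `=> not found`. A Python sketch that runs `ldd` and picks out those entries; the binary path is whatever your BOINC data directory holds under `projects/boinc.bakerlab.org_rosetta/`. On Ubuntu, `libGL.so.1` is normally provided by the `libgl1` package, but check with your package manager.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Names of libraries that ldd reports as '=> not found'."""
    return [
        line.split("=>")[0].strip()
        for line in ldd_output.splitlines()
        if "=> not found" in line
    ]


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary (Linux only) and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


# Hypothetical ldd output in the usual shape (address shortened).
example = "\tlibGL.so.1 => not found\n\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)\n"
print(missing_libraries(example))
```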





©2024 University of Washington
https://www.bakerlab.org