Posts by Bryn Mawr

1) Message boards : Number crunching : More threads on VM tasks possible? (Message 109167)
Posted 13 days ago by Bryn Mawr
Post:
* delete *
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109146)
Posted 15 days ago by Bryn Mawr
Post:
I'd like to comment.

I see a problem that I should not be seeing. I try to make headway resolving it, so I ask. The result of asking is the same each time: the BOINC folk tell me the problem is Folding, and the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice: I can support one or the other. Both are important to me.


This is not a problem with Boinc. It is not a problem with Folding. It is a problem with your configuration, which is preventing the two projects, which have no way of knowing the other is there, from working together.

You have been given the configuration changes required; all that’s needed now is for you to try them.
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109085)
Posted 5 Apr 2024 by Bryn Mawr
Post:
28 tasks with validate error... great... but I suppose that's just the way it goes with a beta.


They might have beta in the name but these have been the production WUs for some time now.
4) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 109042)
Posted 26 Mar 2024 by Bryn Mawr
Post:
The main thing I take from this is “Try the suggestions” - you can always reverse the config changes if they don’t work.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108983)
Posted 14 Mar 2024 by Bryn Mawr
Post:
Just downloaded 4 beta 6.05 tasks, one of which failed immediately (0.02 seconds CPU) with:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)

</stderr_txt>
]]>

Boinc 7.24.1 and Ubuntu 22.04.4
6) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108962)
Posted 9 Mar 2024 by Bryn Mawr
Post:
A couple of standout lines:

3/9/2024 7:57:31 AM | | - Suspend if no input in last 1.00 minutes
3/9/2024 7:57:31 AM | | - Leave apps in memory if not running
3/9/2024 7:57:31 AM | | - Store at least 0.00 days of work
3/9/2024 7:57:31 AM | | - Store up to an additional 3.00 days of work
3/9/2024 7:57:31 AM | | - max disk usage: 10.00 GB

The first (suspend if no input in the last minute) will leave your tasks suspended most of the time.

The last, the 10 GB disk limit, is probably your problem: you are running a lot of projects and Rosetta, on its own, can easily use 5 GB. Try increasing your disk limit to 100 GB (it’s reporting over 300 GB free and it won’t use more than it needs).

Then set all other projects to no new tasks and do an update on Rosetta; when that has finished, set your other projects back to accept new tasks.

If this does not pull any tasks then post the event log entries relating to the attempt.
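
For anyone who wants to check those two preference lines without reading the whole log, here is a rough Python sketch. It assumes the event log lines are worded exactly like the ones quoted above; the function name `check_prefs` and the `min_disk_gb` threshold are illustrative choices of mine, not anything BOINC defines.

```python
import re

# Patterns match the preference lines as quoted in this post; other BOINC
# versions may word them differently (assumption).
SUSPEND_RE = re.compile(r"Suspend if no input in last ([\d.]+) minutes")
DISK_RE = re.compile(r"max disk usage: ([\d.]+) GB")


def check_prefs(log_lines, min_disk_gb=100.0):
    """Return warnings for the two settings discussed above.

    min_disk_gb is an illustrative threshold, not a BOINC default.
    """
    warnings = []
    for line in log_lines:
        m = SUSPEND_RE.search(line)
        if m:
            warnings.append(
                f"tasks suspend after {float(m.group(1)):g} min without input"
            )
        m = DISK_RE.search(line)
        if m and float(m.group(1)) < min_disk_gb:
            warnings.append(
                f"disk limit {float(m.group(1)):g} GB is below {min_disk_gb:g} GB"
            )
    return warnings


sample = [
    "3/9/2024 7:57:31 AM | | - Suspend if no input in last 1.00 minutes",
    "3/9/2024 7:57:31 AM | | - Store at least 0.00 days of work",
    "3/9/2024 7:57:31 AM | | - max disk usage: 10.00 GB",
]
for warning in check_prefs(sample):
    print(warning)
```

Both checks only read the log text; they do not change any BOINC setting.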
7) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108961)
Posted 9 Mar 2024 by Bryn Mawr
Post:
I hope I got what you wanted this time. I rebooted, started BOINC with a watch in hand, counted five minutes and then got the log file.

Thanks.


Much better :-)
8) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108956)
Posted 9 Mar 2024 by Bryn Mawr
Post:
As an alternative method, reboot your machine, restart Boinc and then copy the event log after 5 minutes.

That will guarantee that we get the head of the log file which is what we need.
9) Message boards : Number crunching : Beyond newbie Q&A (Message 108948)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Your confusion appeared to be in thinking that a WU had a time allocated to it


No, I didn't mean that either.


Then which part of “The total number of queued jobs is the total number of WUs in the queue waiting to be sent” can we explain better?
10) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108947)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I continue not to see rosetta@home in the list of the projects....


I continue not to see the beginning of a log where Boinc has just been restarted.
11) Message boards : Number crunching : Beyond newbie Q&A (Message 108938)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I'm sorry, but neither answer directly responds to my question.

Maybe my question wasn't clear enough, but I'm asking the meaning of TQJ given the variable sizes, not what a work unit is or how long the server will take to process work units.


Your confusion appeared to be in thinking that a WU had a time allocated to it, so that you were getting 36 hour WUs while there were also 2 hour WUs. Each job in the queue is a WU, and that WU will run for as long as your configuration allows (or less if it completes first), completing as many decoys as it has time for. Thus your question as stated really has no meaning: the total number of queued jobs is the total number of queued jobs, regardless of the variable size, which is under your control.
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108928)
Posted 8 Mar 2024 by Bryn Mawr
Post:
I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?


You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).
13) Message boards : Number crunching : Beyond newbie Q&A (Message 108927)
Posted 8 Mar 2024 by Bryn Mawr
Post:
How do I interpret the server status page?
Today it says "Total queued jobs: 431,868"

For most projects one job is one work unit, but here we have variable work unit sizes creating multiple models apiece.
My 36 hour work units have hundreds of models, and the beta ones have thousands.

The lowest common denominator is a two-hour work unit.
So are the total queued jobs 863,736 hours of work?
Are they 431,868 models of various complexity, completed in minutes to hours apiece?
Are they part of a pool with the in-progress models, totaling 717,885 iterative calculations bouncing between users until they reach an undisclosed limit and retire out of the pool?


Each WU will work either until it has completed or until the end of the iteration that takes it closest to your requested run time.

If you have set 36 hours as your run time it will run for 36 hours but it is the same WU that would run for 8 hours on another box with the default setup.
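
To make that concrete, here is a toy Python model of the behaviour described above. It is only a sketch: the function name, the half-hour-per-decoy figure and the `max_decoys` cap are made up for illustration, and the real application's stopping rule may differ in detail.

```python
def run_work_unit(target_hours, hours_per_decoy, max_decoys):
    """Toy model of one WU: keep producing decoys until the WU is complete
    (max_decoys reached) or one more decoy would end further from the
    requested run time than stopping now."""
    elapsed = 0.0
    decoys = 0
    while decoys < max_decoys:
        after_next = elapsed + hours_per_decoy
        # Stop if finishing another decoy lands further from the target.
        if abs(after_next - target_hours) > abs(elapsed - target_hours):
            break
        elapsed = after_next
        decoys += 1
    return decoys, elapsed


# The same WU under two run-time preferences (8 h default, 36 h custom).
for hours in (8, 36):
    decoys, elapsed = run_work_unit(hours, hours_per_decoy=0.5, max_decoys=10_000)
    print(f"target {hours} h -> {decoys} decoys in {elapsed:g} h")
```

In this model the queue holds one job either way; only the number of decoys returned changes with the run time you request.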
14) Message boards : Number crunching : Why no work on my computer since Feb 17, 2024 (Message 108926)
Posted 8 Mar 2024 by Bryn Mawr
Post:
Did you actually exit Boinc and restart before copying that?

What is needed is the start of the file after a restart - about the first 40 lines.
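
If copying from the event log window is awkward, a few lines of Python can pull the head of a saved log instead. This is a minimal sketch: `log_head` is a name I made up, and the path is whatever file you saved the log to, since I am not assuming where your install keeps it.

```python
import sys
from itertools import islice


def log_head(path, n=40):
    """Return the first n lines of a text file (about 40 were asked for)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Usage: python log_head.py <path-to-saved-log>
if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(log_head(sys.argv[1])))
```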
15) Message boards : Number crunching : Run time of new tasks. (Message 108883)
Posted 27 Feb 2024 by Bryn Mawr
Post:
As I said, Folding has been running on this machine for years, the "problem" I started the thread about is VERY recent. All the tasks from that last download have completed and returned now, they show quite a variety of run times 73K to 90k. There are also other tasks downloaded the same day, but at an earlier time, that do not show the extended run time.


Then look at your logs and see what else was running on your machine during the period in question.

This is not a Boinc problem; this is within your machine.
16) Message boards : Number crunching : Run time of new tasks. (Message 108878)
Posted 26 Feb 2024 by Bryn Mawr
Post:
The runtime for the last few work units has jumped up considerably. CPU time and credit haven't changed noticeably, but run time has increased from 60 thousand to 90 thousand on the same machine, (very approximate - illustrates the point though).


On both my crunchers the run time was 10,800 and is still 10,800.
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108871)
Posted 25 Feb 2024 by Bryn Mawr
Post:
Seems like 5 million tasks have become available to run !!
As of 24 Feb 2024, 8:02:19 UTC [ Scheduler running ]
Total queued jobs: 4,921,248
In progress: 115,690
Successes last 24h: 43,490


My crunchers thank you :-)
18) Message boards : Number crunching : Rosetta Beta 6.00 (Message 108764)
Posted 13 Dec 2023 by Bryn Mawr
Post:
What is the error that shows in the stderr output? Click on the workunit in the tasks link of your account.
Here's the output of one Task.

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu: error while loading shared libraries: libGL.so.1: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

Installation/permissions issue?


Certainly looks like it.
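
The loader message in that stderr names the library it could not find. As a small sketch for anyone checking many failed tasks, the Python below pulls that name out of a task's stderr text; the function name is hypothetical and the pattern assumes the message is worded as in the output quoted above.

```python
import re

# Matches the dynamic loader message format seen in the stderr above.
MISSING_LIB_RE = re.compile(
    r"error while loading shared libraries: (\S+): cannot open shared object file"
)


def missing_libraries(stderr_text):
    """Return the shared library names the loader reported as missing."""
    return MISSING_LIB_RE.findall(stderr_text)


stderr_text = (
    "../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu: "
    "error while loading shared libraries: libGL.so.1: cannot open shared object file: "
    "No such file or directory"
)
print(missing_libraries(stderr_text))
```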
19) Message boards : Number crunching : Rosetta Beta 6.00 (Message 108761)
Posted 12 Dec 2023 by Bryn Mawr
Post:
Hi

my computer specs:

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz [Family 6 Model 60 Stepping 3]
Number of processors 8
Operating System Debian GNU/Linux 12 (bookworm) [6.1.0-15-amd64|libc 2.36]
BOINC version 7.20.5
Memory 7833.52 MB

Rosetta v4.20 x86_64-pc-linux-gnu are completing normally as expected
Rosetta Beta v6.05 x86_64-pc-linux-gnu errors out, every single one gives "Error while computing"

Is there any option to block the "beta 6*" series until this is sorted out ?

thanks


UPDATE: currently the errors are at 116 and counting.
Just for the record, this is a headless install of Debian stable and no overclocking applied to the hardware,
which has been set up just for Rosetta & WCG.


What is the error that shows in the stderr output? Click on the workunit in the tasks link of your account.
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 108748)
Posted 6 Dec 2023 by Bryn Mawr
Post:
Holy sh*t indeed. Finally got some work, all tasks errored out within 30 seconds.

Is it me? Is it the work? Anybody else have this problem?

Cheers - rumple

So far 6 errors and 99 good.

The errors are hard failures, with my wingman also failing, and appear to be a data error in the config file.

I’d say it’s all old7scaff units but I’ve had one, an old7scaff_7aa_hall_9 that’s completed and validated.

I had a lot of failures with old8scaff and old7scaff Rosetta Beta tasks too, but they were all received and returned by 30 Nov - also with a fair few successes tbf.

The new batch all seem to be named new7snme_7aa_ndif and are running fine here (7hrs+) but I haven't returned any yet to confirm they validate successfully.
They certainly don't crash after 5-20secs, which is the issue rumple seems to have.


Yes, another 100 new7 tasks completed and validated since then with no fails :-)



©2024 University of Washington
https://www.bakerlab.org