Problems and Technical Issues with Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2030
Credit: 39,909,306
RAC: 19,574
Message 109331 - Posted: 2 Jun 2024, 18:18:20 UTC - in response to Message 109315.  

Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run.
Which is exactly what my advice does- it reduces the Runtime for each and every Task, for all projects that the person does. It doesn't just do it for one Project, but for all of them.

As I said before & I will say again - your suggestion addresses the symptom, mine fixes what is actually causing the problem.
Most people prefer to fix the problem. If you're ok with just fixing the symptom, then so be it.

In this case, you're assuming the problem when there's no evidence of it being the one you describe based on the symptom.
A while back you rightly pointed out the symptom was one of scheduling. Solve the scheduling issue where Rosetta knowingly misleads Boinc, in the way I described - the end.
If you were to take a look at adrianxw's Rosetta tasks (where, tbf, he only seems to be running Rosetta tasks atm, which mislead Boinc in the other direction), no deadlines are being missed any more, so Panic mode won't be arising, let alone deadlines being missed. Nor will they if his tasks are all Beta ones.

You also ignore the fact that, with all cores usable to Boinc tasks, the Folding tasks are <additional> to those tasks.
I don't know how many Folding tasks run at a time - I assume it's one - so the inefficiency you see in Boinc tasks is entirely taken up by a 9th task running at normal priority on an 8-core machine.
How does that pan out? Neither of us knows for sure, but I'm going to suggest that almost all of that "inefficiency" disappears by the processing of a 9th task on an 8 core machine.

But for anyone else that's been reading these posts...

If you limit the number of cores/threads available to BOINC, you will maximise your BOINC processing. You will get the maximum possible amount of work done each day that your system is capable of, you won't have issues with deadlines (unless of course you have inappropriate cache settings), or Panic Mode or any of those types of issues.
Well, that's obviously not true.
It's obviously true if you actually understand what is going on.
If you don't understand, then it's not going to be obvious.

Having cut off the critical part of what I wrote about how the cores left unused by Boinc are put to use, this isn't a response to what I wrote but to what I explicitly didn't write, so it's worthless.

One final attempt to point out the obvious-

On a different situation no-one's talking about... irrelevant.
When someone raises that as their issue, bring it up again. Then it might have a point.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1914
Credit: 8,839,333
RAC: 9,528
Message 109332 - Posted: 3 Jun 2024, 6:34:24 UTC

Still the same error, again and again

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
20:49:27 (8612): called boinc_finish(1)

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109333 - Posted: 3 Jun 2024, 6:57:41 UTC - in response to Message 109331.  

...
time to end it.
I have tried my best to help you understand, but every point you make shows that you still don't understand what is happening, so it really is time for me to give up once and for all.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109336 - Posted: 5 Jun 2024, 8:26:29 UTC
Last modified: 5 Jun 2024, 9:22:36 UTC

Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109337 - Posted: 5 Jun 2024, 9:22:14 UTC - in response to Message 109336.  
Last modified: 5 Jun 2024, 9:22:52 UTC

Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present.
Now up to 24,128, and the Server Status showing several processes on boinc-process not running.
Seems to be nothing but recurring issues with that server lately.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109338 - Posted: 5 Jun 2024, 10:45:39 UTC - in response to Message 109337.  
Last modified: 5 Jun 2024, 10:47:51 UTC

Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present.
Now up to 24,128, and the Server Status showing several processes on boinc-process not running.
Seems to be nothing but recurring issues with that server lately.
Now all processes on boinc-process are down and Waiting for Validation is now up to 35,496.
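
For a rough sense of how quickly that backlog was building, the three counts quoted in this thread can be turned into a growth rate. This is only a sketch: it assumes each count was read at about the time its post was made, and the `samples` list and `growth_rates` helper are names made up for illustration.

```python
from datetime import datetime

# (post timestamp in UTC, tasks waiting for validation) as quoted in this thread.
# Assumption: each count was read at roughly the time its post was made.
samples = [
    (datetime(2024, 6, 5, 8, 26, 29), 12_640),
    (datetime(2024, 6, 5, 9, 22, 14), 24_128),
    (datetime(2024, 6, 5, 10, 45, 39), 35_496),
]


def growth_rates(points):
    """Return backlog growth in tasks per minute between consecutive samples."""
    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        rates.append((n1 - n0) / minutes)
    return rates


for rate in growth_rates(samples):
    print(f"{rate:.0f} tasks/minute")
```

On those assumptions the backlog was growing by well over a hundred tasks a minute across both intervals, which fits with the validators not running at all rather than just running slowly.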

Maybe it's gone down in sympathy with the ralph server over on Ralph. It's been down for 4-5 days now.
Grant
Darwin NT

kotenok2000
Joined: 22 Feb 11
Posts: 243
Credit: 435,550
RAC: 1,192
Message 109339 - Posted: 5 Jun 2024, 10:48:34 UTC

Everything is running as of 5 Jun 2024, 10:16:46 UTC

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109340 - Posted: 5 Jun 2024, 10:51:12 UTC - in response to Message 109339.  
Last modified: 5 Jun 2024, 10:54:37 UTC

Everything is running as of 5 Jun 2024, 10:16:46 UTC
10 minutes earlier everything on boinc-process was dead.
And the same with the ralph server at Ralph, it's showing life again as well.


BTW- check the date time stamp- that's for the Task application data.

The server status data is this one- Remote daemon status as of 5 Jun 2024, 10:45:06 UTC
It would be good if these things were updated more often.
Grant
Darwin NT

kotenok2000
Joined: 22 Feb 11
Posts: 243
Credit: 435,550
RAC: 1,192
Message 109341 - Posted: 5 Jun 2024, 10:52:16 UTC

They probably rebooted it.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109342 - Posted: 5 Jun 2024, 10:56:08 UTC - in response to Message 109341.  

They probably rebooted it.
It'd be nice if they fixed whatever it was that keeps causing it to die so they don't need to keep rebooting it.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2030
Credit: 39,909,306
RAC: 19,574
Message 109350 - Posted: 7 Jun 2024, 9:22:24 UTC - in response to Message 109342.  

They probably rebooted it.
It'd be nice if they fixed whatever it was that keeps causing it to die so they don't need to keep rebooting it.

It is very odd - it never used to happen.
Anyway, glad it got sorted before too long and they didn't need a nudge this time, seeing as I'm 2 days late in finding out.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109363 - Posted: 11 Jun 2024, 7:49:15 UTC

New work at Ralph, with new errors.
So some work has been done, but looks like there's still quite a way to go.

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_d_pred_188_16900_2_1

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
Traceback (most recent call last):
  File "C:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv1\rf2aa\predict.py", line 733, in <module>
    with zipfile.ZipFile(args.z) as z:
  File "C:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\zipfile.py", line 1268, in __init__
    self._RealGetContents()
  File "C:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\zipfile.py", line 1335, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

</stderr_txt>
]]>
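
The first traceback shows `zipfile.ZipFile(args.z)` being handed a file that isn't a zip archive. The project's script isn't visible here, so the following is only a sketch of how a caller could check the input before opening it; `open_input_archive` is a hypothetical helper, not anything from the Ralph application.

```python
import zipfile


def open_input_archive(path):
    """Open path as a zip archive, or return None if it is not a valid zip file.

    Hypothetical helper for illustration; not part of the project's script.
    """
    # zipfile.is_zipfile returns False for non-zip content and for files
    # that cannot be opened, so no exception is raised at this point.
    if not zipfile.is_zipfile(path):
        return None
    return zipfile.ZipFile(path)
```

A check like this would let a script report a truncated or missing download in its own words instead of ending with `BadZipFile`. It wouldn't fix whatever is producing the bad file in the first place.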



RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_d_pred_60_16900_5_1

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:\ProgramData\BOINC/projects/ralph.bakerlab.org\ev0\Scripts\activate.bat' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>

Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2030
Credit: 39,909,306
RAC: 19,574
Message 109364 - Posted: 11 Jun 2024, 14:51:39 UTC

Total queued jobs on the front page down to 222k
Advance warning we may be out of new tasks in the next 24hrs unless we get lucky again.
Fingers crossed.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109365 - Posted: 12 Jun 2024, 7:36:48 UTC
Last modified: 12 Jun 2024, 7:39:06 UTC

Now out of new work.
Also, although the Server status shows all green, there is a backlog of Tasks waiting on Validation.
3,078 at the moment.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1556
Credit: 15,972,297
RAC: 16,382
Message 109366 - Posted: 12 Jun 2024, 9:55:29 UTC - in response to Message 109365.  

Also, although the Server status shows all green, there is a backlog of Tasks waiting on Validation.
3,078 at the moment.
Whatever was going on before, the backlog has now cleared.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2030
Credit: 39,909,306
RAC: 19,574
Message 109367 - Posted: 12 Jun 2024, 11:43:19 UTC - in response to Message 109365.  

Now out of new work.

This has been the best run we've had for a couple of years - bound to end at some point once everyone's offline cache runs down.
It's at this point my 12hr runtime setting ekes out my remaining work as far as possible.

What I'd re-emphasise is that the default runtime for tasks has fallen to 3hrs for some reason, which I believe to be a mistake and which contradicts the forced Boinc setting of 8hrs.
As such, people should go into Boinc's Your Account option, select Rosetta@home preferences and change Target CPU run time to an explicit 8hrs rather than "not selected".
This will almost treble how long tasks run and extend the life of work batches so that we run out less, if at all, while almost trebling the credit we get for tasks too.
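
The "almost treble" figure is just the ratio of the two runtimes. As a rough sketch, assuming every task runs for exactly its target runtime and a core is never left idle (neither is guaranteed in practice), and with `tasks_per_core_per_day` being a helper name made up for illustration:

```python
def tasks_per_core_per_day(target_runtime_hours):
    """Tasks one core works through in 24 hours at a given target CPU run time."""
    return 24 / target_runtime_hours


default_beta = tasks_per_core_per_day(3)  # the 3hr default described above
explicit = tasks_per_core_per_day(8)      # an explicit 8hr setting

# How much faster the 3hr default drains a batch of tasks than the 8hr setting.
drain_ratio = default_beta / explicit
print(f"{default_beta:.0f} vs {explicit:.0f} tasks per core per day, ratio {drain_ratio:.2f}")
```

So on those assumptions a core gets through 8 tasks a day at 3hrs and 3 a day at 8hrs, which is where the factor of roughly 2.7 comes from.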

This should be considered a high priority for everyone imo.

RDTSC
Joined: 29 Jan 24
Posts: 3
Credit: 189,058
RAC: 2,729
Message 109368 - Posted: 12 Jun 2024, 12:16:14 UTC

https://boinc.bakerlab.org/rosetta/ Their home page could do with some updates; last post almost two years ago. I get it, web hosting and administration is expensive, along with preparing, running, and maintaining massive job servers. It just seems to me that a little grease, at the right points of this machine, would greatly help it function.

kotenok2000
Joined: 22 Feb 11
Posts: 243
Credit: 435,550
RAC: 1,192
Message 109369 - Posted: 12 Jun 2024, 12:19:21 UTC

Hal jobs run for three hours because subtasks are short and produce many results per task.

Other jobs run for 8 hours.

Sid Celery
Joined: 11 Feb 08
Posts: 2030
Credit: 39,909,306
RAC: 19,574
Message 109370 - Posted: 12 Jun 2024, 12:45:17 UTC - in response to Message 109369.  

Hal jobs run for three hours because subtasks are short and produce many results per task.

Other jobs run for 8 hours.

No. All mine run for 12hrs because I set them to run for 12hrs.

They don't hit a top limit of decoys and end because some internal limit has been reached.

Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime while Rosetta v4.20 rightly default to 8hrs.

So set the Rosetta@home Target CPU Runtime explicitly to 8hrs so that CPU runtime matches what Boinc is told to assume, and not to 'not selected'.

Do more work, get more credits, Boinc schedules more correctly and sooner, batches of tasks issued by Rosetta last longer. Rosetta tasks run out less often. <Everyone> wins.

The alternative is what we have now - no new tasks. Everyone loses.

kotenok2000
Joined: 22 Feb 11
Posts: 243
Credit: 435,550
RAC: 1,192
Message 109371 - Posted: 12 Jun 2024, 12:48:37 UTC

Tasks starting with RosettaVS run for 8 hours for me.



©2024 University of Washington
https://www.bakerlab.org