Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Tasks starting with RosettaVS run for 8 hours for me.

Great, but I'm not saying this about the ones that run as expected. I'm saying it about all those that don't, of which there seem to be many. Also, I don't recall seeing any RosettaVS tasks, so I don't know how they behave.
Bryn Mawr · Joined: 26 Dec 18 · Posts: 387 · Credit: 11,993,574 · RAC: 13,023
Now out of work new
I've always figured it's best to leave it at the default, as the project scientists who set the tasks up know their requirements better than I do.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1664 · Credit: 17,381,157 · RAC: 24,297
New batch of work over at Ralph, with new errors.

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_148_16902_5_1
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid. (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
Traceback (most recent call last):
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 8, in <module>
    import torch
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorch__init__.py", line 124, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchlibcaffe2_detectron_ops_gpu.dll" or one of its dependencies.
</stderr_txt>
]]>

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_e_pred_195_16901_6_1
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid. (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
Traceback (most recent call last):
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 698, in <module>
    b.write(base64.b64decode(f.read()))
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libbase64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: number of data characters (65) cannot be 1 more than a multiple of 4
</stderr_txt>
]]>

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_119_16902_6_1
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid. (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
Traceback (most recent call last):
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}',
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaRoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 1106, in forward
    msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair,
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 929, in forward
    xyz, state, alpha = self.str2str(
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchcudaampautocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaTrack_module.py", line 503, in forward
    shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aaSE3_network.py", line 96, in forward
    return self.se3(G, node_features, edge_features)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodeltransformer.py", line 185, in forward
    node_feats = self.graph_modules(node_feats, edge_feats, graph=graph, basis=basis)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodeltransformer.py", line 47, in forward
    input = module(input, *args, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersattention.py", line 162, in forward
    fused_key_value = self.to_key_value(node_features, edge_features, graph, basis)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 347, in forward
    out += self.conv_in[str(degree_in)](feature, invariant_edge_feats,
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 186, in forward
    radial_weights = self.radial_func(invariant_edge_feats[e_i:e_j])
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aa/SE3Transformerse3_transformermodellayersconvolution.py", line 118, in forward
    return self.net(features)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulescontainer.py", line 139, in forward
    input = module(input)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmodulesmodule.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnmoduleslinear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchnnfunctional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: [enforce fail at ..c10coreCPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 536870912 bytes.
</stderr_txt>]]>
Grant
Darwin NT
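The second failure is easier to read than it looks. Standard base64 turns every 3 bytes into 4 characters, so once `=` padding is ignored, the final group of a valid encoding carries 2, 3 or 4 data characters, never 1. A count of 65 (one more than a multiple of 4) therefore suggests the string that `predict.py` handed to `base64.b64decode` at line 698 was not a complete, valid encoding, possibly truncated or corrupted; the traceback alone does not say which. The minimal Python sketch below reproduces the same `binascii.Error` using a made-up 65-character string (not the real task data), and as an aside converts the 536870912-byte allocation from the third failure into MiB. The helpers `data_char_count` and `try_decode` are hypothetical, written only for this illustration.

```python
import base64
import binascii

# Standard base64 alphabet (RFC 4648); '=' is padding, not data.
B64_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")


def data_char_count(s: str) -> int:
    """Hypothetical helper: count base64 data characters, ignoring '=' and anything else."""
    return sum(1 for ch in s if ch in B64_ALPHABET)


def try_decode(s: str):
    """Hypothetical helper: return the decoded bytes, or the binascii.Error raised."""
    try:
        return base64.b64decode(s)
    except binascii.Error as exc:
        return exc


# A well-formed payload: b"hello" encodes to 'aGVsbG8=' (7 data chars + 1 pad).
good = base64.b64encode(b"hello").decode("ascii")
good_result = try_decode(good)

# A made-up damaged payload with 65 data characters. 65 % 4 == 1 is a
# remainder no base64 encoder produces, so decoding fails.
bad = "A" * 65
bad_result = try_decode(bad)

# The failed allocation from the third task, converted to MiB (2**20 bytes).
alloc_mib = 536870912 / 2**20

print(good, good_result)              # aGVsbG8= b'hello'
print(data_char_count(bad) % 4)       # 1
print(type(bad_result).__name__)      # Error (i.e. binascii.Error)
print(alloc_mib)                      # 512.0
```

So the third task died asking for a single 512 MiB block of RAM, which fits with kotenok2000's pagefile suggestion for the first one.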
kotenok2000 · Joined: 22 Feb 11 · Posts: 257 · Credit: 483,503 · RAC: 397
Did they port the Rosetta Python projects to native Windows? Try increasing the pagefile size; it helped with the GPUGRID Python project. It even uses the GPU.
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> What I'd re-emphasise is that the default runtime for tasks has fallen to 3hrs for some reason, which I believe to be a mistake and contradicts the forced Boinc setting of 8hrs.

While generally true, it's clear in my opinion that this 3hr target runtime is an error, as it's inconsistent with what Rosetta tells BOINC. It only ever slips through when a new version of the app comes out. I seem to recall it happened once before and was corrected, back in the days when the admins paid more attention to us.
If the 8hr default ever changes, I think something would be said, and seeing as no-one's saying anything these days, I doubt it will ever change without a very specific reason.
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
Ooh, 360k tasks. We live to fight another day (or two).
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1989 · Credit: 9,459,558 · RAC: 12,671
Today, a lot of the "classic" error:
ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
08:16:19 (5164): called boinc_finish(1)
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Today a lot of "classical" error

Yes, but they fail very quickly, so I'm not too worried by them. More concerning are two Validate errors after running to completion:
hal_8a_i_hal_8aa_2jp5597_d99_0001_SAVE_ALL_OUT_2978378_13_0
hal_8a_i_hal_8aa_2jp1316_d224_0001_SAVE_ALL_OUT_2978378_13_0
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Ooh, 360k tasks. We live to fight another day (or two)

It turned into 3+ days, but we're out again.
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Ooh, 360k tasks. We live to fight another day (or two)

While I know most people will have finished their outstanding tasks already, I managed to sneak in 4 extra returned tasks today, and now discover that the validators running under boinc-process are down again. Better now than at other times, I guess.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1664 · Credit: 17,381,157 · RAC: 24,297
That boinc-process server has developed a habit of regularly falling over; it was well past due for another crash.
Grant
Darwin NT
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Ooh, 360k tasks. We live to fight another day (or two)

Or maybe not better now, as 660k tasks are newly available.
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1989 · Credit: 9,459,558 · RAC: 12,671
> Or maybe not better now as 660k tasks newly available

0 WUs and a lot of daemons are down...
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Or maybe not better now as 660k tasks newly available

Yup. I would've expected 660k to last at least 2 days, but I'm not sure it lasted much more than 15hrs, unless tasks got pulled. Front page figures borked on top of boinc-process server borked.
Edit: Actually, I'm now thinking tasks did get pulled.
Unvalidated tasks were about 20k before the new batch arrived; now 160k.
In-progress tasks were about 30k; now 112k.
That implies 222k tasks were grabbed.
But the front page is locked at 7am with 660k queued, so 440k have gone missing, presumed pulled.
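Sid's back-of-envelope estimate can be checked from the figures in his post (all in thousands of tasks). This sketch assumes, as his reasoning does, that every task sent out since the batch arrived is now counted as either unvalidated or in progress. Tasks that errored out or already validated are not counted, so "grabbed" is a lower bound and the unaccounted-for figure an upper bound; the 660k itself also comes from a front-page snapshot that had stopped updating. Variable names are illustrative only.

```python
# Figures quoted in the post, in thousands of tasks.
queued_at_snapshot = 660            # front-page "queued" figure, frozen at 7am
unvalidated_before, unvalidated_after = 20, 160
in_progress_before, in_progress_after = 30, 112

# Tasks handed out since the batch arrived = growth in both counters
# (assumes nothing errored out or was validated in between).
grabbed = (unvalidated_after - unvalidated_before) + (in_progress_after - in_progress_before)

# Whatever is not accounted for is presumed pulled from the queue.
missing = queued_at_snapshot - grabbed

print(f"grabbed ~{grabbed}k, unaccounted for ~{missing}k")
# grabbed ~222k, unaccounted for ~438k
```

The 438k result is what the post rounds to 440k.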
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Or maybe not better now as 660k tasks newly available

Still the same, now nudged.
Edit while posting: the site went down and came back 5 minutes later. No apparent change yet, but there might be shortly (fingers crossed).
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1664 · Credit: 17,381,157 · RAC: 24,297
The boinc-process server is still dead, and the front page Server Status numbers are still not updated (last update 07:04 UTC yesterday).
Grant
Darwin NT
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> boinc-process server still dead, front page Server Status numbers still not updated (Last update, 07:04 UTC, yesterday).

Add it to the very long list of things I'm completely wrong about... <sigh>
I've asked. We wait.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1664 · Credit: 17,381,157 · RAC: 24,297
Just heard the fans in my system wind up. Checked BOINC and, lo and behold, Rosetta has work again.
Now if they could just get that boinc-process server, which has been dead for a while now, up and running again, all would be good.
Grant
Darwin NT
Sid Celery · Joined: 11 Feb 08 · Posts: 2109 · Credit: 40,977,896 · RAC: 19,267
> Just heard the fans in my system wind up.

Both you and this PC were ahead of me. The rest is still just as you say.
In a way, knowing whether there are tasks or not, whether they give credit or not, or how long they'll last, isn't massively different.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1664 · Credit: 17,381,157 · RAC: 24,297
Server Status on the front page is yet to update, but all the servers on the Server Status page are now green and work is still flowing.
Grant
Darwin NT
©2024 University of Washington
https://www.bakerlab.org