minirosetta 2.14

Message boards : Number crunching : minirosetta 2.14



Sid Celery

Joined: 11 Feb 08
Posts: 1987
Credit: 38,495,587
RAC: 13,712
Message 67581 - Posted: 4 Sep 2010, 2:45:26 UTC - in response to Message 67575.  


lrm_jorj_combined_torsion_it06_run01_A_rlbd_2hl7__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_318

lrm_jorj_combined_torsion_it06_run01_A_rlbd_1r26__SAVE_ALL_OUT_IGNORE_THE_RESTlr8_DECOY_21224_522

ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Exact same error for me in the following tasks:

lrm_jorj_combined_torsion_it06_run01_A_rlbd_1h75__SAVE_ALL_OUT_IGNORE_THE_RESTlr13_DECOY_21224_147_1
lrm_jorj_combined_torsion_it06_run01_A_rlbd_1o73__SAVE_ALL_OUT_IGNORE_THE_RESTlr8_DECOY_21224_118_0
lrm_jorj_combined_torsion_it06_run01_A_rlbd_2uzr__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_120_0
lrm_jorj_combined_torsion_it06_run01_A_rlbd_1s12__SAVE_ALL_OUT_IGNORE_THE_RESTlr8_DECOY_21224_213_0
lrm_jorj_combined_torsion_it06_run01_A_rlbd_2i1u__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_169_1

Also, the following error in the following tasks:
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out

td-only-2-Alg13_8-10_21413_155_1
td-only-2-RrR43_7-10_21413_131_1
td-only-2-DsbA_10-12_21413_137_1
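
For anyone sorting through many failed tasks like the ones above, here is a minimal sketch of grouping stderr reports by the "Exit from" line. The function names (`exit_site`, `tally`) and the idea of keying on file name plus line number are assumptions made for illustration; this is not part of BOINC or Rosetta.

```python
import re
from collections import Counter

# Matches lines such as "ERROR:: Exit from: src/core/fragment/Frame.cc line: 62".
EXIT_RE = re.compile(r"ERROR:: Exit from:\s*(\S+)\s+line:\s*(\d+)")

def exit_site(stderr_text):
    """Return (source file name, line number), or None if no exit line is present."""
    match = EXIT_RE.search(stderr_text)
    if match is None:
        return None
    # Normalise Windows separators so both platforms report the same file name.
    path = match.group(1).replace("\\", "/")
    return path.rsplit("/", 1)[-1], int(match.group(2))

def tally(reports):
    """Count how many reports share each exit site.

    `reports` maps a task name to its stderr text (a hypothetical input format).
    """
    return Counter(exit_site(text) for text in reports.values())

if __name__ == "__main__":
    reports = {
        "td-only-2-Alg13_8-10_21413_155_1":
            "ERROR: rsd_type_list.size()\n"
            "ERROR:: Exit from: src/core/fragment/Frame.cc line: 62\n",
    }
    for site, count in tally(reports).items():
        print(site, count)
```

Keying on the file name and line number keeps the Linux and Windows reports of the same failure in one bucket.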
ID: 67581 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67583 - Posted: 4 Sep 2010, 8:09:15 UTC

Another failed after 2sec, same problem as others.


lrm_jorj_combined_torsion_it06_run01_A_rlbd_1xd6__SAVE_ALL_OUT_IGNORE_THE_RESTlr5_DECOY_21224_606

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331225615

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_1xd6.fix.out.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

ID: 67583 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67601 - Posted: 6 Sep 2010, 1:59:12 UTC

This one failed after 4sec, same as others.


lrm_jorj_combined_torsion_it06_run01_A_rlbd_2iiy__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_814_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331916208

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

( left out bits in middle )

Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr10_2iiy.fix.out.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>



ID: 67601 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67602 - Posted: 6 Sep 2010, 5:31:34 UTC

And another one, took 12sec to die.

lrm_jorj_combined_torsion_it06_run01_A_rlbd_1l6p__SAVE_ALL_OUT_IGNORE_THE_RESTlr13_DECOY_21224_797

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331893230


<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>


Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr13_1l6p.fix.out.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

ID: 67602 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,705,781
RAC: 1,723
Message 67629 - Posted: 7 Sep 2010, 17:58:29 UTC

Thought you guys fixed the weights problem?

lrm_jorj_combined_torsion_it06_run01_A_rlbd_2i1u__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_584_0

ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database\scoring/weights/dslf_weights.wts exist
ERROR:: Exit from: ..\..\src\core\scoring\ScoreFunctionFactory.cc line: 178
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
ID: 67629 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67634 - Posted: 7 Sep 2010, 22:04:42 UTC

Hi.

The first copy of this task errored. I can't see a problem with mine, and I don't know why it got a validate error.

cs-td-2-LkR15_5-5_20162_213_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331107668

Server state: Over
Outcome: Validate error
Client state: Done
Exit status: 0 (0x0)

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<stderr_txt>

Starting work on structure: _00023
======================================================
DONE :: 1 starting structures 14182.3 cpu seconds
This process generated 23 decoys from 23 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>

ID: 67634 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67694 - Posted: 10 Sep 2010, 21:52:08 UTC

This one has failed twice, mine after 14sec.

SAXS-score-1egaB_SAVE_ALL_OUT_21827_871_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=332831626


<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>


Starting work on structure: _00001

ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 250
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

ID: 67694 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 1987
Credit: 38,495,587
RAC: 13,712
Message 67702 - Posted: 11 Sep 2010, 2:40:02 UTC

A strange one:

T0605_tjrs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21824_1219_1
ERROR: Error in traceback: pointer doesn't go anywhere!

ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79
BOINC:: Error reading and gzipping output datafile: default.out

ID: 67702 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67706 - Posted: 11 Sep 2010, 7:50:11 UTC

This ran for 3min.

fix_disulf_v4_NMR_1j0t_DISULF__BOINC_abrelax.v1_SAVE_ALL_OUT_21861_87_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=333047271

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>

Starting work on structure: _00001
# cpu_run_time_pref: 14400

ERROR: rsd_type_list.size()
ERROR:: Exit from: src/core/fragment/Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

ID: 67706 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 20
Message 67711 - Posted: 11 Sep 2010, 9:32:38 UTC

P.P.L, your core client (BOINC Manager) is a little outdated. Try updating it to the recommended version; this may help.
Have a crunching good day!!
ID: 67711 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67717 - Posted: 11 Sep 2010, 23:00:05 UTC

Hi.

As transient says, I think that "process exited with code 1 (0x1, -255)" is a generic error code they/BOINC use.

I'll stop posting that bit in future.

As for BOINC versions, as they say: if it ain't broke, don't fix it!

ID: 67717 · Rating: 0
Levent TERLEMEZ

Joined: 7 Dec 05
Posts: 18
Credit: 121,492
RAC: 0
Message 67726 - Posted: 13 Sep 2010, 7:56:31 UTC

I bought a brand new AMD Phenom(tm) II X4 925 processor and returned to my BOINC projects, but something odd has started happening. Whichever project is running (SETI, Einstein, Rosetta, or any other), and whichever WU it is among those downloaded that day, each session ends up with exactly one calculation error. What could be causing this?
Machine Specs:
XP Pro SP3
AMD Phenom(tm) II X4 925 Processor
2 GB DDR3 Ram
BOINC Ver. 6.10.58
Thanks for any answers or tips from anyone who has seen this or a similar error before.
ID: 67726 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67728 - Posted: 13 Sep 2010, 17:45:22 UTC

Levent TERLEMEZ
Looks like their task reported back with this:
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


The task was restarted five times. If the task was unable to reach a checkpoint in that time, then the task is aborted for you. But I would expect a message about too many restarts with no progress rather than the one you got.
Rosetta Moderator: Mod.Sense
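
As a rough illustration of the restart rule described in this post, here is a minimal sketch of counting restarts that happen without an intervening checkpoint. The limit of five comes from the post; the class, the function names, and the choice to abort on the fifth restart are assumptions for illustration and do not reflect the actual BOINC or Rosetta code.

```python
from dataclasses import dataclass

# Limit taken from the post above; how BOINC/Rosetta actually counts is not shown here.
MAX_RESTARTS_WITHOUT_PROGRESS = 5

@dataclass
class TaskState:
    restarts_since_checkpoint: int = 0
    aborted: bool = False

def on_checkpoint(state):
    """A checkpoint counts as progress, so the restart counter starts over."""
    state.restarts_since_checkpoint = 0

def on_restart(state):
    """Record a restart; abort once the limit is reached with no checkpoint in between."""
    state.restarts_since_checkpoint += 1
    if state.restarts_since_checkpoint >= MAX_RESTARTS_WITHOUT_PROGRESS:
        state.aborted = True
    return state.aborted

if __name__ == "__main__":
    state = TaskState()
    for attempt in range(1, 6):
        print(attempt, on_restart(state))
```

A task that checkpoints even once between restarts never reaches the limit in this sketch, which matches the "no progress" wording in the post.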
ID: 67728 · Rating: 0
Levent TERLEMEZ

Joined: 7 Dec 05
Posts: 18
Credit: 121,492
RAC: 0
Message 67729 - Posted: 13 Sep 2010, 20:49:26 UTC - in response to Message 67728.  

Levent TERLEMEZ
Looks like their task reported back with this:
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


The task was restarted five times. If the task was unable to reach a checkpoint in that time, then the task is aborted for you. But I would expect a message about too many restarts with no progress rather than the one you got.


Thanks for the reply. Sorry for taking the easy way and asking more, but is it possible for the task to be corrupted while downloading? Thanks again.


ID: 67729 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67732 - Posted: 14 Sep 2010, 14:42:50 UTC

...is it possible to be corrupted while downloading.


It is possible for corruption to occur to any data that passes over a network. However, BOINC has signatures that double check the integrity of the files you receive. When a signature mismatch is found, the error is reported differently and the task is not run.
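
To make the idea of an integrity check concrete, here is a minimal sketch that compares a downloaded file against an expected digest. This is a plain hash comparison standing in for whatever BOINC actually does; the function names, the SHA-256 default, and the chunk size are assumptions chosen for illustration.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def is_intact(path, expected_digest, algorithm="sha256"):
    """Return True when the file on disk matches the digest published for it."""
    return file_digest(path, algorithm) == expected_digest.lower()
```

A client doing this kind of check would refuse to start a task whose files fail the comparison, which is why download corruption shows up as a different error than the ones quoted in this thread.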

Generally the error about gzipping means the output file was never produced, so there is nothing to zip. That happens because the error occurred before any output was written.
Rosetta Moderator: Mod.Sense
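
The gzipping point can be sketched as follows: a wrapper that tries to compress an output file and reports when the file was never written. The helper name `gzip_output` and the `.gz` naming are assumptions for illustration, not the code the BOINC wrapper uses.

```python
import gzip
import os
import shutil

def gzip_output(path):
    """Compress `path` to `path + '.gz'`.

    Returns the name of the compressed file, or None when the input does not
    exist, which is the situation behind "Error reading and gzipping output
    datafile" when a task dies before writing any output.
    """
    if not os.path.exists(path):
        return None
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path

if __name__ == "__main__":
    result = gzip_output("default.out")
    print("nothing to zip" if result is None else result)
```

In other words, the gzip message is a follow-on symptom; the line to look at is the ERROR that precedes it.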
ID: 67732 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67737 - Posted: 15 Sep 2010, 1:48:49 UTC

I've run a few of these already with no problem; this is a different error.

Ran for 17sec.

T0585_tj_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21908_3066_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=333706692

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>


ID: 67737 · Rating: 0
Michael*

Joined: 20 Apr 10
Posts: 2
Credit: 1,334,106
RAC: 0
Message 67747 - Posted: 16 Sep 2010, 11:12:05 UTC

I have a recurring problem with Rosetta. One or more workunits get stuck at some random percentage of completion. Restarting BOINC seems to get the stuck WUs going again, but I hate to see so much of my processing potential wasted.

With 8 threads, 4 are usually doing SIMAP with each using 12 or 13 percent processing power and 4 threads doing rosetta with each using 12 or 13 percent. Right now one of the rosetta threads is using 0 percent processing power. One of the workunits is stuck at 83.030% and has been running for 11 hours and 39 minutes. Rosetta WUs never take more than 3 hours. The only mention of this WU in the messages is the one where computation started.

Just now a restart set that WU back to 34% but at least it is moving again.

I don't think it is a memory problem. I've checked the messages and there is no mention of memory running out or any other problems. BOINC uses less than half of my available RAM (6GB) and I have it set to use 70% max while the computer is active.

Any solutions or ideas about this problem would be greatly appreciated.
ID: 67747 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,853,229
RAC: 1,829
Message 67748 - Posted: 16 Sep 2010, 12:27:37 UTC

Does this look like a minirosetta 2.14 problem that triggered problems in several workunits from other BOINC projects?

9/15/2010 10:13:46 PM rosetta@home Sending scheduler request: To fetch work.
9/15/2010 10:13:46 PM rosetta@home Requesting new tasks for CPU and GPU
9/15/2010 10:13:49 PM rosetta@home Scheduler request completed: got 1 new tasks
9/15/2010 10:13:51 PM rosetta@home Started download of old_targets_calbindin_pcs_files4.zip
9/15/2010 10:14:14 PM rosetta@home Finished download of old_targets_calbindin_pcs_files4.zip
9/15/2010 10:15:46 PM rosetta@home Starting calbindin_old_targets_PCS_SAVE_ALL_OUT_21968_479_0
9/15/2010 10:16:20 PM rosetta@home Starting task calbindin_old_targets_PCS_SAVE_ALL_OUT_21968_479_0 using minirosetta version 214
9/15/2010 10:16:20 PM QMC@HOME Task qasino_b3lyp-E26_iso34.896_0 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM QMC@HOME If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:20 PM Docking Task 1g2k1ebw_mod0014crossdockinghiv1_7120_130310_0 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM Docking If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:20 PM World Community Grid Task BETA_E200366_495_A.24.C19H12N2OS2.250.1.set1d06_0 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM World Community Grid If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:20 PM malariacontrol.net Task wu_760_234_219331_0_1284580689_1 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM malariacontrol.net If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:20 PM ibercivis Task 1bm7opt_fix_gridmaps.7z__ZINC06701282_1284586816_S08_E05_0 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM ibercivis If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:20 PM boincsimap Task 10090101.156326_1 exited with zero status but no 'finished' file
9/15/2010 10:16:20 PM boincsimap If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:21 PM PrimeGrid Task pps_sr2sieve_1941162_0 exited with zero status but no 'finished' file
9/15/2010 10:16:21 PM PrimeGrid If this happens repeatedly you may need to reset the project.
9/15/2010 10:16:21 PM ibercivis Computation for task 1bm7opt_fix_gridmaps.7z__ZINC06722361_1284587828_S08_E05_0 finished

Most of the other workunits recovered enough to finish, apparently successfully.
ID: 67748 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67751 - Posted: 16 Sep 2010, 16:18:32 UTC

robertmiles: That SHOULD not be possible. But you certainly have some highly suspicious circumstantial evidence to assert otherwise. The only thing the various projects have in common that should be capable of causing a cascading crash like that is... well, the BOINC client. My instinct is that BOINC had a problem at that time and took 'em all out.

Michael*: I can't offer any suggestions. As you pointed out, suspending and resuming the task doesn't even seem to kick-start it, at least when tasks are kept in memory, so a full restart of BOINC seems to be the only way to get CPU allocated to the task again. I can only confirm that others have observed this as well, and that it seems to be rather rare.

I haven't seen what happens if BOINC reschedules that task on its own. I mean, if you suspend it, BOINC will begin another task. If you then release it, BOINC will eventually try to come back to it. At that time, does it successfully get CPU time? Or does it get no CPU while BOINC still says it is running? Something to try, anyway.
Rosetta Moderator: Mod.Sense
ID: 67751 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,853,229
RAC: 1,829
Message 67752 - Posted: 16 Sep 2010, 17:58:44 UTC
Last modified: 16 Sep 2010, 18:06:24 UTC

Could be, although if so, it left no other evidence that I can see of what went wrong. I do seem to have had problems with the SuperFetch feature of Windows Vista for some time, though. It is not documented well enough for me to see how to fix it. I need some information on how to control WHAT TYPE of information SuperFetch stores; I already have enough information on how to turn it off entirely.

On Michael*'s problem: Could that indicate that restarting from what's left in the main memory does not work adequately for that problem, but restarting from the last checkpoint on the hard drive does?
ID: 67752 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org