Rosetta@home

Problems with Minirosetta 1.76

Message boards : Number crunching : Problems with Minirosetta 1.76

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 61791 - Posted 16 Jun 2009 17:58:06 UTC

This is a minor update to fix the problems with validation.

Stacey Baird

Joined: Apr 11 06
Posts: 19
ID: 75056
Credit: 74,745
RAC: 0
Message 61803 - Posted 17 Jun 2009 5:20:03 UTC - in response to Message ID 61791.

I am still having computation errors with Rosetta Mini 1.75. Should I delete all the tasks still in the queue and start over?
____________

Mike Tyka

Joined: Oct 20 05
Posts: 96
ID: 5612
Credit: 2,190
RAC: 0
Message 61805 - Posted 17 Jun 2009 7:06:29 UTC

Which jobs are failing for you? The lb_thread_all_multi jobs can all be cancelled, sure. Let us know if anything else is consistently dying.

M


____________
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

ByRad

Joined: Apr 12 08
Posts: 8
ID: 252633
Credit: 8,231,131
RAC: 13,507
Message 61813 - Posted 17 Jun 2009 17:48:24 UTC

Rosetta Mini 1.76 is also erroneous:

2009-06-17 19:36:59 rosetta@home Started upload of lb_dk_ksync_sametemp2_hb_t311__IGNORE_THE_REST_12882_470_0_0
2009-06-17 19:38:57 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 finished
2009-06-17 19:38:57 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0_0 for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 absent
2009-06-17 19:38:57 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0
2009-06-17 19:38:57 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 using minirosetta version 176
2009-06-17 19:39:12 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 finished
2009-06-17 19:39:12 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0_0 for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 absent
2009-06-17 19:39:12 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0
2009-06-17 19:39:13 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 using minirosetta version 176
2009-06-17 19:39:17 rosetta@home Finished upload of lb_dk_ksync_sametemp2_hb_t311__IGNORE_THE_REST_12882_470_0_0
2009-06-17 19:39:28 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 finished
2009-06-17 19:39:28 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0_0 for task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 absent
2009-06-17 19:39:28 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0
2009-06-17 19:39:28 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 using minirosetta version 176
2009-06-17 19:39:42 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 finished
2009-06-17 19:39:42 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0_0 for task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 absent
2009-06-17 19:39:42 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0
2009-06-17 19:39:43 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 using minirosetta version 176
2009-06-17 19:40:13 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 finished
2009-06-17 19:40:13 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0_0 for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 absent


lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 errored after 2:06:42 h, and the rest after 13 to 29 seconds.
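The pattern in the log above (a task starts, finishes within seconds, and BOINC then reports its output file absent) can be checked mechanically. Below is a minimal Python sketch, assuming the `date time project message` line layout shown in this excerpt; the function name `summarize` and the regexes are illustrative, not part of BOINC or Rosetta.

```python
import re
from datetime import datetime

# Layout of the BOINC 6.x event-log lines quoted above:
# "YYYY-MM-DD HH:MM:SS <project> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
START_RE = re.compile(r"^Starting task (\S+) using")
FINISH_RE = re.compile(r"^Computation for task (\S+) finished")
ABSENT_RE = re.compile(r"^Output file \S+ for task (\S+) absent")


def summarize(log_text):
    """Map each finished task to (wall_seconds or None, output_absent)."""
    started = {}
    results = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(3)
        start = START_RE.match(msg)
        finish = FINISH_RE.match(msg)
        absent = ABSENT_RE.match(msg)
        if start:
            started[start.group(1)] = stamp
        elif finish:
            task = finish.group(1)
            t0 = started.get(task)
            # None when the task started before the log excerpt begins.
            secs = (stamp - t0).total_seconds() if t0 is not None else None
            results[task] = (secs, False)
        elif absent:
            task = absent.group(1)
            secs, _ = results.get(task, (None, False))
            results[task] = (secs, True)
    return results
```

Run over the excerpt above, the four tasks that start inside it show wall-clock gaps of 15, 15, 14 and 30 seconds between "Starting task" and "Computation ... finished", all with absent output; the 1029 task started before the excerpt, so its duration comes back as None.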
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 61820 - Posted 17 Jun 2009 22:12:03 UTC
Last modified: 17 Jun 2009 22:45:54 UTC

Hi.

Just got my first validate error in a long time. 26 mins, is that a record?


looprebuild_t374_decoy_5_12863_1850_0


http://boinc.bakerlab.org/rosetta/workunit.php?wuid=236714383


Server state: Over | Outcome: Validate error | Client state: Done | CPU time: 1,572.38 s

Edit: This process generated 99 decoys.

pete.
____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 61821 - Posted 18 Jun 2009 1:12:58 UTC

Hi again.

I have these two tasks running now, and I don't know if it's just the graphics or the tasks, but they both show this, on a 4 hr run time:

Searching:0

Model:0

The first one is at 3hrs 9min, 39% step:45200

The second is at 1hr 30min, 18.6% step:46400

Thu 18 Jun 2009 07:50:04 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t374__IGNORE_THE_REST_12929_477_0 using minirosetta version 176


Thu 18 Jun 2009 09:35:13 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t367__IGNORE_THE_REST_12925_477_0 using minirosetta version 176

pete.

____________


koniiiik

Joined: Dec 25 08
Posts: 3
ID: 294317
Credit: 69,586
RAC: 0
Message 61825 - Posted 18 Jun 2009 8:12:20 UTC

The previous few versions did this too, though not as often as this one: most of my tasks die on signal 4 (SIGILL, illegal instruction). For example, see http://boinc.bakerlab.org/rosetta/result.php?resultid=259586800 or any other task assigned to me; there are about 20 of them in a row with the same error.
It is probably using some special processor feature that it doesn't detect correctly. I can provide core dumps if that would help.
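Signal 4 is SIGILL on Linux. A quick way to decode the number and to see which instruction-set extensions the CPU advertises is sketched below, assuming a Linux host with /proc/cpuinfo. The flag list is a hypothetical example, since this thread does not say which extensions minirosetta 1.76 relies on; note that Linux spells SSE3 as `pni` in that file, so it is left out of the list.

```python
import signal


def describe_signal(num):
    """Name of a POSIX signal number, e.g. the one a BOINC task died on."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"unknown signal {num}"


def missing_cpu_flags(cpuinfo_text, wanted):
    """Flags from `wanted` that no 'flags' line in /proc/cpuinfo lists."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present.update(value.split())
    return sorted(set(wanted) - present)


if __name__ == "__main__":
    print(describe_signal(4))  # "SIGILL" on Linux
    # Hypothetical list of extensions to check; not taken from minirosetta.
    wanted = ["sse", "sse2", "ssse3", "sse4_1"]
    try:
        with open("/proc/cpuinfo") as fh:
            print("missing:", missing_cpu_flags(fh.read(), wanted))
    except OSError:
        print("/proc/cpuinfo not available (not a Linux host?)")
```

An empty "missing" list does not rule out a feature-detection bug; it only shows which extensions the kernel reports.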

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 61828 - Posted 18 Jun 2009 10:57:18 UTC
Last modified: 18 Jun 2009 10:59:01 UTC

Unfortunately, validate errors persist:

looprebuild_t374_decoy_6_12863_4812_0

Outcome Validate error
Client state Done
Exit status 0 (0x0)
CPU time 1097.358

stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
[...]
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4.76883339179875
Granted credit 0
application version 1.76


looprebuild_t374_nat_1_12863_4643_1
Outcome Validate error
Client state Done
Exit status 0 (0x0)
CPU time 548.7959

stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
[...]
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.38492471299452
Granted credit 0
application version 1.76
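The looprebuild validate errors whose stderr is quoted in this thread all end with "99 decoys from 99 attempts" after under half an hour of CPU time, far short of the 14400 s (4 h) `cpu_run_time_pref`. A minimal sketch for pulling those numbers out of a task's stderr, assuming the line formats shown above; the field names are illustrative.

```python
import re

# Patterns for the summary lines Rosetta writes to stderr_txt (as quoted above).
PATTERNS = {
    "run_time_pref_s": r"# cpu_run_time_pref:\s*(\d+)",
    "cpu_seconds": r"DONE :: \d+ starting structures ([\d.]+) cpu seconds",
    "decoys": r"This process generated (\d+) decoys",
    "attempts": r"decoys from (\d+) attempts",
}


def parse_summary(stderr_text):
    """Extract whichever summary numbers are present in a task's stderr."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, stderr_text)
        if match:
            summary[key] = float(match.group(1))
    return summary


def run_fraction(summary):
    """Fraction of the run-time preference actually used, or None if unknown."""
    if "cpu_seconds" in summary and summary.get("run_time_pref_s"):
        return summary["cpu_seconds"] / summary["run_time_pref_s"]
    return None
```

For the first task above this gives 1201 / 14400 ≈ 0.083 of the run-time preference and about 12 CPU seconds per decoy.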

____________

Eugene

Joined: Nov 24 06
Posts: 4
ID: 131199
Credit: 252,135
RAC: 0
Message 61831 - Posted 18 Jun 2009 13:01:26 UTC

I had crashing WUs when I had faulty RAM installed in my PC. After running a memory test and replacing the faulty RAM, everything was back to normal. Overclocking RAM can also cause random WU crashes.
____________

Saharak

Joined: Apr 28 07
Posts: 7
ID: 170710
Credit: 499,019
RAC: 1,482
Message 61834 - Posted 18 Jun 2009 16:48:34 UTC
Last modified: 18 Jun 2009 16:52:10 UTC

wu's page
task's page

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 61838 - Posted 18 Jun 2009 21:29:32 UTC - in response to Message ID 61821.
Last modified: 18 Jun 2009 21:54:36 UTC

Hi.

I had to abort these two, plus another of the same type. After I did a restart, they both went backwards: the first one went from 6 hrs and 74% back to 6 mins and 6% (everything else stayed the same), and the second one, at 4 hrs 25 mins and 52%, went back as well. I did not let the third one start.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888977

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888981

I haven't had this sort of problem before.

pete.
____________


koniiiik

Joined: Dec 25 08
Posts: 3
ID: 294317
Credit: 69,586
RAC: 0
Message 61844 - Posted 19 Jun 2009 5:54:39 UTC - in response to Message ID 61831.

I guess you were replying to my post. Well, this is very unlikely: I just completed a build of OpenOffice.org 3.1.0 without any errors. I think you would agree that building OOo is a much tougher test for the RAM than running a few minirosetta tasks.
In fact, minirosetta has been the only program dying on SIGILL on my machine ever since I first attached it to the project; the crashes just were not as frequent as they are with 1.76.
I reported this error for version 1.54 (http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4691&nowrap=true#59218) and nobody seemed to care at that time. Later I set Rosetta to assign the smallest WUs possible, and that did let most tasks run successfully, but now minirosetta seems to crash shortly after starting.
Guess I'll have to switch to a different project.

Michael Hoffmann

Joined: Jun 5 08
Posts: 8
ID: 263088
Credit: 886,216
RAC: 882
Message 61855 - Posted 19 Jun 2009 20:29:09 UTC
Last modified: 19 Jun 2009 20:29:39 UTC

I had validate errors in http://boinc.bakerlab.org/rosetta/result.php?resultid=259300075 and http://boinc.bakerlab.org/rosetta/result.php?resultid=259966541 although the tasks finished properly. Maybe this has something to do with the update?

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 61856 - Posted 19 Jun 2009 22:20:33 UTC

wRMSF_1_5_core_jumps_mixcst2_hb_t370__IGNORE_THE_REST_12928_668_0

Exit status 1 (0x1)
CPU time: 1987.828

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: ..\..\src\core\kinematics\AtomTree.cc line: 762
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 61859 - Posted 20 Jun 2009 7:40:49 UTC

Validate error:
Task ID: 259620267
Name: looprebuild_t374_decoy_5_12863_2150_0
Workunit: 236957726
======================================================
DONE :: 1 starting structures 1749.9 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

Validate state: Invalid
____________

rhb

Joined: Jan 19 07
Posts: 5
ID: 142744
Credit: 277,050
RAC: 0
Message 61863 - Posted 20 Jun 2009 14:17:14 UTC

I have one task with a missing output file, the same problem as Message 61813.

20-Jun-2009 00:37:51 [rosetta@home] Computation for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 finished
20-Jun-2009 00:37:51 [rosetta@home] Output file wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0 for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 absent

http://boinc.bakerlab.org/rosetta/result.php?resultid=259829534

Task ID 259829534
Name wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0
Workunit 237149781

InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 25455.1 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
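The `<file_xfer_error>` block is ordinary XML, so the failing file and code can be read with the standard library. A minimal sketch; the lookup table holds only the one code seen in this thread, which BOINC's `lib/error_numbers.h` defines as `ERR_NOT_FOUND`, consistent with the client's "Output file ... absent" line quoted above.

```python
import xml.etree.ElementTree as ET

# Only the code seen in this thread (from BOINC's lib/error_numbers.h).
BOINC_ERRORS = {-161: "ERR_NOT_FOUND"}


def parse_xfer_errors(message_xml):
    """Return (file_name, code, symbolic_name) for each <file_xfer_error>."""
    root = ET.fromstring(message_xml)
    errors = []
    for node in root.iter("file_xfer_error"):
        name = node.findtext("file_name", default="").strip()
        code = int(node.findtext("error_code", default="0").strip())
        errors.append((name, code, BOINC_ERRORS.get(code, "unknown")))
    return errors


sample = """<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>"""
print(parse_xfer_errors(sample))
```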
____________

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 61870 - Posted 21 Jun 2009 6:59:46 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=259823440
Task ID: 259823440
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0
Workunit: 237144213
InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18047.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid

---

http://boinc.bakerlab.org/rosetta/result.php?resultid=259799581
Task ID: 259799581
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0
Workunit: 237123479

InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18501.5 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid
____________

WinterWasp

Joined: Jun 16 09
Posts: 2
ID: 321897
Credit: 11,905
RAC: 0
Message 61871 - Posted 21 Jun 2009 8:30:28 UTC
Last modified: 21 Jun 2009 8:33:21 UTC

The following task seems to be erroneous:

looprebuild_t374_decoy_6_12863_4497

I haven't encountered any problems regarding Minirosetta 1.76 itself yet :-)

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 61889 - Posted 22 Jun 2009 20:35:36 UTC

Three quick exits with code 1:

lb_cutback_all_multi_hb_t332__IGNORE_THE_REST_1NXZA_9_12960_3_0

lb_cutback_all_multi_hb_t328__IGNORE_THE_REST_2CG4A_12_12958_2_1

lb_cutback_all_multi_hb_t305__IGNORE_THE_REST_1LARA_6_12946_6_0

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 333
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


Snags
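Both quick-exit reports in this thread (AtomTree.cc line 762, with a Windows-style path, and PairEnergy.cc line 333) follow the same two-line pattern: an `ERROR:` reason followed by `ERROR:: Exit from: <file> line: <n>`. A small sketch to pull out the reason and source location so reports can be grouped, assuming that format; backslashes are normalized so Windows and Linux paths compare alike.

```python
import re

EXIT_RE = re.compile(r"ERROR:: Exit from: (.+?) line: (\d+)")
# "ERROR: " (single colon, then a space) marks the human-readable reason.
REASON_RE = re.compile(r"^ERROR: (.+)$", re.MULTILINE)


def exit_site(stderr_text):
    """Return (reason, source_file, line) for a Rosetta exit, or None."""
    exit_m = EXIT_RE.search(stderr_text)
    if exit_m is None:
        return None
    reason_m = REASON_RE.search(stderr_text)
    reason = reason_m.group(1).strip() if reason_m else None
    # Normalize Windows separators so reports from both platforms group together.
    source = exit_m.group(1).replace("\\", "/")
    return reason, source, int(exit_m.group(2))
```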




Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC