Problems with Minirosetta 1.76

Message boards : Number crunching : Problems with Minirosetta 1.76

Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 61791 - Posted: 16 Jun 2009, 17:58:06 UTC

This is a minor update to fix the problems with validation.
ID: 61791 · Rating: 0
Stacey Baird
Joined: 11 Apr 06
Posts: 19
Credit: 74,745
RAC: 0
Message 61803 - Posted: 17 Jun 2009, 5:20:03 UTC - in response to Message 61791.  

This is a minor update to fix the problems with validation.



I am still having computation errors with Rosetta Mini 175. Should I delete all the ones still in line and start over?
ID: 61803 · Rating: 0
Mike Tyka

Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 61805 - Posted: 17 Jun 2009, 7:06:29 UTC

Which jobs are failing for you? The lb_thread_all_multi tasks can all be cancelled, sure. Let us know if anything else is consistently dying.

M


http://beautifulproteins.blogspot.com/
http://www.miketyka.com/
ID: 61805 · Rating: 0
ByRad
Joined: 12 Apr 08
Posts: 8
Credit: 15,869,002
RAC: 386
Message 61813 - Posted: 17 Jun 2009, 17:48:24 UTC

Rosetta Mini 1.76 is also erroneous:
2009-06-17 19:36:59 rosetta@home Started upload of lb_dk_ksync_sametemp2_hb_t311__IGNORE_THE_REST_12882_470_0_0
2009-06-17 19:38:57 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 finished
2009-06-17 19:38:57 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0_0 for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 absent
2009-06-17 19:38:57 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0
2009-06-17 19:38:57 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 using minirosetta version 176
2009-06-17 19:39:12 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 finished
2009-06-17 19:39:12 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0_0 for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 absent
2009-06-17 19:39:12 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0
2009-06-17 19:39:13 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 using minirosetta version 176
2009-06-17 19:39:17 rosetta@home Finished upload of lb_dk_ksync_sametemp2_hb_t311__IGNORE_THE_REST_12882_470_0_0
2009-06-17 19:39:28 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 finished
2009-06-17 19:39:28 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0_0 for task lb_dk_ksync_sametemp2_hb_t331__IGNORE_THE_REST_12890_1713_0 absent
2009-06-17 19:39:28 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0
2009-06-17 19:39:28 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 using minirosetta version 176
2009-06-17 19:39:42 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 finished
2009-06-17 19:39:42 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0_0 for task lb_dk_ksync_sametemp2_hb_t297__IGNORE_THE_REST_12876_1872_0 absent
2009-06-17 19:39:42 rosetta@home Starting lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0
2009-06-17 19:39:43 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 using minirosetta version 176
2009-06-17 19:40:13 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 finished
2009-06-17 19:40:13 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0_0 for task lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_2342_0 absent


lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 errored after 2:06:42 h,
and the rest failed after between 13 and 29 seconds.
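
For anyone wanting to tally failures like these from a saved copy of the client log, here is a minimal Python sketch. It assumes the message layout shown in the excerpt above (timestamp, project name, then the message); the function name `summarize` and the regular expressions are illustrative only and are not part of BOINC.

```python
import re
from datetime import datetime

# Patterns follow the log lines quoted in this post; other client
# versions may word or order these messages differently (assumption).
STARTED = re.compile(r"^(\S+ \S+) rosetta@home Starting task (\S+) using minirosetta version \d+")
FINISHED = re.compile(r"^(\S+ \S+) rosetta@home Computation for task (\S+) finished")
ABSENT = re.compile(r"rosetta@home Output file \S+ for task (\S+) absent")
STAMP = "%Y-%m-%d %H:%M:%S"

def summarize(lines):
    """Return {task: {"seconds": wall-clock run time or None, "absent": bool}}."""
    started, tasks = {}, {}
    for line in lines:
        if m := STARTED.match(line):
            started[m.group(2)] = datetime.strptime(m.group(1), STAMP)
        elif m := FINISHED.match(line):
            begin = started.get(m.group(2))
            end = datetime.strptime(m.group(1), STAMP)
            seconds = int((end - begin).total_seconds()) if begin else None
            tasks.setdefault(m.group(2), {"absent": False})["seconds"] = seconds
        elif m := ABSENT.search(line):
            tasks.setdefault(m.group(1), {"seconds": None})["absent"] = True
    return tasks

# Three lines copied from the log above.
log = [
    "2009-06-17 19:38:57 rosetta@home Starting task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 using minirosetta version 176",
    "2009-06-17 19:39:12 rosetta@home Computation for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 finished",
    "2009-06-17 19:39:12 rosetta@home Output file lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0_0 for task lb_dk_ksync_sametemp2_hb_t306__IGNORE_THE_REST_12880_1196_0 absent",
]
result = summarize(log)
for task, info in result.items():
    print(task, info)
```

Tasks whose start line is missing from the saved log get a run time of `None` rather than a guess.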
ID: 61813 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 61820 - Posted: 17 Jun 2009, 22:12:03 UTC
Last modified: 17 Jun 2009, 22:45:54 UTC

Hi.

Just got my first validate error in a long time. 26 mins, is that a record?


looprebuild_t374_decoy_5_12863_1850_0


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236714383


Server state: Over, Outcome: Validate error, Client state: Done, CPU time: 1,572.38

Edit: This process generated 99 decoys

pete.
ID: 61820 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 61821 - Posted: 18 Jun 2009, 1:12:58 UTC

Hi again.

I have these two tasks running now, and I don't know if it's just the graphics or the tasks, but they both show this, on a 4 hr run time:

Searching:0

Model:0

The first one is at 3hrs 9min, 39% step:45200

The second is at 1hr,30min, 18.6% step:46400

Thu 18 Jun 2009 07:50:04 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t374__IGNORE_THE_REST_12929_477_0 using minirosetta version 176


Thu 18 Jun 2009 09:35:13 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t367__IGNORE_THE_REST_12925_477_0 using minirosetta version 176

pete.
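
The figures in this post can be turned into a rough projected run time. The sketch below is plain arithmetic in Python and assumes that the reported percentage grows linearly with elapsed time, which nothing in the thread confirms; `projected_total_minutes` is a name made up for this example.

```python
def projected_total_minutes(elapsed_minutes, percent_done):
    """Linear extrapolation: total time = elapsed time / fraction complete."""
    return elapsed_minutes / (percent_done / 100.0)

target = 4 * 60  # the 4 hr run time preference, in minutes

first = projected_total_minutes(3 * 60 + 9, 39.0)    # 3 hrs 9 min at 39%
second = projected_total_minutes(1 * 60 + 30, 18.6)  # 1 hr 30 min at 18.6%

for label, total in (("first", first), ("second", second)):
    print(f"{label}: about {total / 60:.1f} h projected, target {target / 60:.0f} h")
```

Under that assumption both tasks project to roughly 8 hours, about twice the 4 hr preference.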

ID: 61821 · Rating: 0
koniiiik

Joined: 25 Dec 08
Posts: 3
Credit: 69,586
RAC: 0
Message 61825 - Posted: 18 Jun 2009, 8:12:20 UTC

As with the previous few versions, though more often with this one, most of my tasks die on signal 4, which means illegal instruction. For example, see https://boinc.bakerlab.org/rosetta/result.php?resultid=259586800 or any other task assigned to me; there are about 20 of them in a row with the same error.
It is probably using some special processor feature without correctly detecting whether it is available. I can provide core dumps if that would help.
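
As a quick check of the signal number, and of which feature flags a Linux machine reports, something like the following Python sketch may help. It assumes a POSIX system; `signal_name` and `cpu_flags` are helper names invented here, the sample text is made up rather than taken from the thread, and which instruction sets minirosetta actually requires is not stated anywhere in this discussion.

```python
import signal

def signal_name(number):
    """Map a signal number to its symbolic name on this platform."""
    return signal.Signals(number).name

def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text (Linux)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

# Made-up sample; on a real Linux host, read /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu mmx sse\n"

print(signal_name(4))
print(sorted(cpu_flags(sample)))
```

Comparing the reported flags against whatever the application was compiled for would be the next step, but that build information would have to come from the project.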
ID: 61825 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2145
Credit: 41,560,787
RAC: 9,320
Message 61828 - Posted: 18 Jun 2009, 10:57:18 UTC
Last modified: 18 Jun 2009, 10:59:01 UTC

Validate errors persist, unfortunately.

looprebuild_t374_decoy_6_12863_4812_0

Outcome Validate error
Client state Done
Exit status 0 (0x0)
CPU time 1097.358

stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
[...]
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4.76883339179875
Granted credit 0
application version 1.76


looprebuild_t374_nat_1_12863_4643_1
Outcome Validate error
Client state Done
Exit status 0 (0x0)
CPU time 548.7959

stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
[...]
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.38492471299452
Granted credit 0
application version 1.76
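
To compare many results like the two above, the summary block in stderr can be pulled apart with a few regular expressions. This is a minimal Python sketch that assumes the stderr layout quoted in this post; `parse_summary` and its dictionary keys are names chosen for illustration.

```python
import re

# Copied from the stderr excerpt above.
STDERR = """\
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
"""

def parse_summary(stderr_text):
    """Extract run time preference, CPU seconds, decoys and attempts (None if absent)."""
    patterns = {
        "run_time_pref": r"cpu_run_time_pref:\s*(\d+)",
        "cpu_seconds": r"DONE ::\s*\d+ starting structures\s+([\d.]+) cpu seconds",
        "decoys": r"generated (\d+) decoys",
        "attempts": r"decoys from (\d+) attempts",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, stderr_text)
        summary[key] = float(match.group(1)) if match else None
    return summary

summary = parse_summary(STDERR)
print(summary)
```

For both results quoted here the parsed values would show 99 decoys from 99 attempts in 1201 reported CPU seconds, against a preference of 14400 seconds.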

ID: 61828 · Rating: 0
Eugene

Joined: 24 Nov 06
Posts: 4
Credit: 252,135
RAC: 0
Message 61831 - Posted: 18 Jun 2009, 13:01:26 UTC

I had crashing WUs when I had faulty RAM installed in my PC. After running a memory test and replacing the faulty RAM, everything was back to normal. Overclocking RAM can also cause random WU crashes.
ID: 61831 · Rating: 0
Saharak

Joined: 28 Apr 07
Posts: 7
Credit: 1,170,212
RAC: 0
Message 61834 - Posted: 18 Jun 2009, 16:48:34 UTC
Last modified: 18 Jun 2009, 16:52:10 UTC

ID: 61834 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 61838 - Posted: 18 Jun 2009, 21:29:32 UTC - in response to Message 61821.  
Last modified: 18 Jun 2009, 21:54:36 UTC

Hi again.

I have these two tasks running now, and I don't know if it's just the graphics or the tasks, but they both show this, on a 4 hr run time:

Searching:0

Model:0

The first one is at 3hrs 9min, 39% step:45200

The second is at 1hr,30min, 18.6% step:46400

Thu 18 Jun 2009 07:50:04 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t374__IGNORE_THE_REST_12929_477_0 using minirosetta version 176


Thu 18 Jun 2009 09:35:13 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t367__IGNORE_THE_REST_12925_477_0 using minirosetta version 176

pete.


Hi.

I had to abort these two, plus another of the same type. After I did a restart they both went backwards: the top one went from 6 hrs and 74% back to 6 mins and 6% while everything else stayed the same, and the second one, at 4 hrs 25 mins and 52%, went back as well. I did not let the third one start.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888977

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888981

I haven't had this sort of problem before.

pete.
ID: 61838 · Rating: 0
koniiiik

Joined: 25 Dec 08
Posts: 3
Credit: 69,586
RAC: 0
Message 61844 - Posted: 19 Jun 2009, 5:54:39 UTC - in response to Message 61831.  

I had crashing WUs when I had faulty RAM installed in my PC. After running a memory test and replacing the faulty RAM, everything was back to normal. Overclocking RAM can also cause random WU crashes.

I guess you were replying to my post. Well, this is very unlikely – I just completed a build of OpenOffice.org 3.1.0, without any error. I think you would agree that building OOo is a much more difficult test for the RAM than running a few minirosetta tasks.
In fact, minirosetta has been the only program dying on SIGILL on my machine ever since I first attached my machine to the project, only that the crashes were not as frequent as they are with 1.76.
I reported this error for version 1.54 (https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4691&nowrap=true#59218) and nobody seemed to care at that time. Later I tried to set rosetta to assign the smallest WUs possible and it did indeed make it possible to run most tasks successfully but, well, now it seems minirosetta is crashing shortly after start.
Guess I'll have to switch to a different project.
ID: 61844 · Rating: 0
Michael Hoffmann
Joined: 5 Jun 08
Posts: 9
Credit: 1,307,108
RAC: 0
Message 61855 - Posted: 19 Jun 2009, 20:29:09 UTC
Last modified: 19 Jun 2009, 20:29:39 UTC

I had validate errors in https://boinc.bakerlab.org/rosetta/result.php?resultid=259300075 and https://boinc.bakerlab.org/rosetta/result.php?resultid=259966541 although the tasks finished properly. Maybe this has something to do with the update?
ID: 61855 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 61856 - Posted: 19 Jun 2009, 22:20:33 UTC

wRMSF_1_5_core_jumps_mixcst2_hb_t370__IGNORE_THE_REST_12928_668_0

Exit status 1 (0x1)
Cpu Time: 1987.828

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: ..\..\src\core\kinematics\AtomTree.cc line: 762
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


ID: 61856 · Rating: 0
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 61859 - Posted: 20 Jun 2009, 7:40:49 UTC

Validate error:
Task ID: 259620267
Name: looprebuild_t374_decoy_5_12863_2150_0
Workunit: 236957726
======================================================
DONE :: 1 starting structures 1749.9 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

Validate state: Invalid
ID: 61859 · Rating: 0
rhb

Joined: 19 Jan 07
Posts: 5
Credit: 277,050
RAC: 0
Message 61863 - Posted: 20 Jun 2009, 14:17:14 UTC

I have one task with a missing output file, the same problem as Message 61813.

20-Jun-2009 00:37:51 [rosetta@home] Computation for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 finished
20-Jun-2009 00:37:51 [rosetta@home] Output file wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0 for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 absent

https://boinc.bakerlab.org/rosetta/result.php?resultid=259829534

Task ID 259829534
Name wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0
Workunit 237149781

InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 25455.1 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
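
The upload failure above can be extracted from a saved result page with a short script. This Python sketch assumes the `<file_xfer_error>` fragment appears exactly as quoted in this post; `file_xfer_errors` is a name invented for the example, and it deliberately does not interpret the error code, whose meaning should be looked up in BOINC's own error code list.

```python
import re
import xml.etree.ElementTree as ET

# Copied from the result output above.
OUTPUT = """\
</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
"""

def file_xfer_errors(text):
    """Return (file_name, error_code) for each <file_xfer_error> element found."""
    errors = []
    # The surrounding text is not well-formed XML on its own, so each
    # element is cut out with a regex before being parsed.
    for block in re.findall(r"<file_xfer_error>.*?</file_xfer_error>", text, flags=re.S):
        node = ET.fromstring(block)
        errors.append((node.findtext("file_name"), int(node.findtext("error_code"))))
    return errors

errors = file_xfer_errors(OUTPUT)
print(errors)
```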
ID: 61863 · Rating: 0
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 61870 - Posted: 21 Jun 2009, 6:59:46 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=259823440
Task ID: 259823440
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0
Workunit: 237144213
InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18047.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid

---

https://boinc.bakerlab.org/rosetta/result.php?resultid=259799581
Task ID: 259799581
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0
Workunit: 237123479

InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18501.5 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid
ID: 61870 · Rating: 0
WinterWasp
Joined: 16 Jun 09
Posts: 2
Credit: 11,905
RAC: 0
Message 61871 - Posted: 21 Jun 2009, 8:30:28 UTC
Last modified: 21 Jun 2009, 8:33:21 UTC

The following task seems to be erroneous...

looprebuild_t374_decoy_6_12863_4497

I haven't encountered any problems regarding Minirosetta 1.76 itself yet :-)
ID: 61871 · Rating: 0
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 61889 - Posted: 22 Jun 2009, 20:35:36 UTC

Three quick exits with code 1:

lb_cutback_all_multi_hb_t332__IGNORE_THE_REST_1NXZA_9_12960_3_0

lb_cutback_all_multi_hb_t328__IGNORE_THE_REST_2CG4A_12_12958_2_1

lb_cutback_all_multi_hb_t305__IGNORE_THE_REST_1LARA_6_12946_6_0

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 333
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


Snags
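
When several tasks exit with code 1, grouping them by the source location in the error message makes it easier to see whether they share a cause. The Python sketch below assumes the two-line `ERROR:` / `ERROR:: Exit from:` pattern quoted in this post; `fatal_errors` is a hypothetical helper name.

```python
import re

EXIT_FROM = re.compile(r"^ERROR:: Exit from: (\S+) line: (\d+)")

def fatal_errors(stderr_text):
    """Pair each 'ERROR: message' with the 'Exit from' location that follows it."""
    found, message = [], None
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith("ERROR: "):
            message = line[len("ERROR: "):]
        elif (m := EXIT_FROM.match(line)) and message is not None:
            found.append((message, m.group(1), int(m.group(2))))
            message = None
    return found

# Copied from the stderr excerpt above.
STDERR = """\
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 333
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
"""

errors = fatal_errors(STDERR)
print(errors)
```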
ID: 61889 · Rating: 0
