Message boards : Number crunching : Problems with Minirosetta 1.76
Author | Message |
---|---|
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
This is a minor update to fix the problems with validation. |
Stacey Baird Joined: 11 Apr 06 Posts: 19 Credit: 74,745 RAC: 0 |
"This is a minor update to fix the problems with validation." I am still having computation errors with Rosetta Mini 175. Should I delete all the ones still in line and start over? |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Which jobs are failing for you? The lb_thread_all_multi jobs can all be cancelled, sure. Let us know if anything else is consistently dying. M http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
ByRad Joined: 12 Apr 08 Posts: 8 Credit: 15,869,002 RAC: 386 |
Rosetta Mini 1.76 is also erroneous: 2009-06-17 19:36:59 rosetta@home Started upload of lb_dk_ksync_sametemp2_hb_t311__IGNORE_THE_REST_12882_470_0_0 lb_dk_ksync_sametemp2_hb_t317__IGNORE_THE_REST_12886_1029_0 - error after 2:06:42 h, and the rest between 13 and 29 seconds. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Just got my first validate error in a long time. 26 minutes, is that a record? looprebuild_t374_decoy_5_12863_1850_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236714383 Over / Validate error / Done / 1,572.38 Edit: This process generated 99 decoys. pete. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi again. I have these two tasks running now, and I don't know if it's just the graphics or the tasks, but on a 4 hr run time they both show this: Searching: 0, Model: 0. The first one is at 3 hr 9 min, 39%, step: 45200. The second is at 1 hr 30 min, 18.6%, step: 46400. Thu 18 Jun 2009 07:50:04 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t374__IGNORE_THE_REST_12929_477_0 using minirosetta version 176 Thu 18 Jun 2009 09:35:13 EST|rosetta@home|Starting task wRMSF_1_5_core_jumps_mixcst2_hb_t367__IGNORE_THE_REST_12925_477_0 using minirosetta version 176 pete. |
koniiiik Joined: 25 Dec 08 Posts: 3 Credit: 69,586 RAC: 0 |
As with the previous few versions, though more often with this one, most of my tasks die on signal 4, which means illegal instruction. For example, see https://boinc.bakerlab.org/rosetta/result.php?resultid=259586800 or any other task assigned to me; there are about 20 of them in a row with the same error. It is probably using some special processor feature that it doesn't detect correctly. I can provide core dumps if that would help. |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,550,899 RAC: 8,846 |
Validate errors persist, unfortunately: looprebuild_t374_decoy_6_12863_4812_0 (Outcome: Validate error), looprebuild_t374_nat_1_12863_4643_1 (Outcome: Validate error) |
Eugene Joined: 24 Nov 06 Posts: 4 Credit: 252,135 RAC: 0 |
I had crashing WUs when I had faulty RAM installed on my PC. After running a memory test and replacing the faulty RAM, everything was back to normal. Also, overclocking RAM can cause some random WU crashes. |
Saharak Joined: 28 Apr 07 Posts: 7 Credit: 1,170,212 RAC: 0 |
|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi again. I had to abort these two plus another of the same type. After I did a restart they both went backwards: the top one went back from 6 hrs and 74% to 6 mins and 6% (everything else stayed the same), and the second one, at 4 hrs 25 mins and 52%, went back as well. The other one I did not let start. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888977 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=236888981 I haven't had these sorts of problems before. pete. |
koniiiik Joined: 25 Dec 08 Posts: 3 Credit: 69,586 RAC: 0 |
"I had crashing WUs when I had faulty RAM installed on my PC. After running a memory test and replacing the faulty RAM, everything was back to normal. Also, overclocking RAM can cause some random WU crashes." I guess you were replying to my post. Well, this is very unlikely: I just completed a build of OpenOffice.org 3.1.0 without any error. I think you would agree that building OOo is a much more demanding test for the RAM than running a few minirosetta tasks. In fact, minirosetta has been the only program dying on SIGILL on my machine ever since I first attached it to the project, though the crashes were not as frequent as they are with 1.76. I reported this error for version 1.54 (https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4691&nowrap=true#59218) and nobody seemed to care at that time. Later I set rosetta to assign the smallest WUs possible, and that did make it possible to run most tasks successfully, but now it seems minirosetta is crashing shortly after start. Guess I'll have to switch to a different project. |
Michael Hoffmann Joined: 5 Jun 08 Posts: 9 Credit: 1,307,108 RAC: 0 |
I had validate errors in https://boinc.bakerlab.org/rosetta/result.php?resultid=259300075 and https://boinc.bakerlab.org/rosetta/result.php?resultid=259966541 although the tasks finished properly. Maybe this has something to do with the update? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
wRMSF_1_5_core_jumps_mixcst2_hb_t370__IGNORE_THE_REST_12928_668_0 Exit status 1 (0x1) Cpu Time: 1987.828 ERROR: AtomTree::torsion_angle_dof_id: angle range error ERROR:: Exit from: ..\..\src\core\kinematics\AtomTree.cc line: 762 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
lusvladimir Joined: 18 Oct 05 Posts: 12 Credit: 1,784,854 RAC: 0 |
Validate error: Task ID: 259620267 Name: looprebuild_t374_decoy_5_12863_2150_0 Workunit: 236957726 ====================================================== DONE :: 1 starting structures 1749.9 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== Validate state: Invalid |
rhb Joined: 19 Jan 07 Posts: 5 Credit: 277,050 RAC: 0 |
I have one task with a missing output file, the same problem as Message 61813. 20-Jun-2009 00:37:51 [rosetta@home] Computation for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 finished 20-Jun-2009 00:37:51 [rosetta@home] Output file wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0 for task wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 absent https://boinc.bakerlab.org/rosetta/result.php?resultid=259829534 Task ID 259829534 Name wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0 Workunit 237149781 InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ) ====================================================== DONE :: 1 starting structures 25455.1 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t369__IGNORE_THE_REST_12927_2156_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> |
lusvladimir Joined: 18 Oct 05 Posts: 12 Credit: 1,784,854 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=259823440 Task ID: 259823440 Name: wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0 Workunit: 237144213 InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ) ====================================================== DONE :: 1 starting structures 18047.8 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> Validate state: Invalid --- https://boinc.bakerlab.org/rosetta/result.php?resultid=259799581 Task ID: 259799581 Name: wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0 Workunit: 237123479 InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ) ====================================================== DONE :: 1 starting structures 18501.5 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> Validate state: Invalid |
WinterWasp Joined: 16 Jun 09 Posts: 2 Credit: 11,905 RAC: 0 |
The following task seems to be erroneous: looprebuild_t374_decoy_6_12863_4497. I haven't encountered any problems with Minirosetta 1.76 itself yet :-) |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Three quick exits with code 1: lb_cutback_all_multi_hb_t332__IGNORE_THE_REST_1NXZA_9_12960_3_0 lb_cutback_all_multi_hb_t328__IGNORE_THE_REST_2CG4A_12_12958_2_1 lb_cutback_all_multi_hb_t305__IGNORE_THE_REST_1LARA_6_12946_6_0 ERROR: dis==0 in pairtermderiv! ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 333 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish Snags |
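
Several posts above paste the same kind of stderr summary: the "DONE :: ... cpu seconds" line, the decoy count, and sometimes a `<file_xfer_error>` block. For anyone who wants to compare such reports side by side, here is a minimal Python sketch that pulls those fields out of a pasted excerpt. The names `summarize_stderr` and `StderrSummary`, and the regular expressions, are illustrative assumptions based only on the excerpts in this thread. They are not part of minirosetta or BOINC, and the real output format may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StderrSummary:
    """Fields recovered from a pasted stderr excerpt (None if not found)."""
    cpu_seconds: Optional[float] = None
    decoys: Optional[int] = None
    attempts: Optional[int] = None
    xfer_error_code: Optional[int] = None


# Hypothetical patterns, inferred from the excerpts quoted in this thread only.
_CPU_RE = re.compile(
    r"DONE\s*::\s*\d+\s+starting structures\s+(\d+(?:\.\d+)?)\s+cpu seconds"
)
_DECOY_RE = re.compile(
    r"This process generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts"
)
_XFER_RE = re.compile(
    r"<file_xfer_error>.*?<error_code>(-?\d+)</error_code>", re.DOTALL
)


def summarize_stderr(text: str) -> StderrSummary:
    """Extract CPU time, decoy counts, and any file transfer error code."""
    summary = StderrSummary()

    cpu = _CPU_RE.search(text)
    if cpu:
        summary.cpu_seconds = float(cpu.group(1))

    decoy = _DECOY_RE.search(text)
    if decoy:
        summary.decoys = int(decoy.group(1))
        summary.attempts = int(decoy.group(2))

    xfer = _XFER_RE.search(text)
    if xfer:
        summary.xfer_error_code = int(xfer.group(1))

    return summary


if __name__ == "__main__":
    excerpt = (
        "DONE :: 1 starting structures 25455.1 cpu seconds "
        "This process generated 1 decoys from 1 attempts "
        "<file_xfer_error> <file_name>example_0_0</file_name> "
        "<error_code>-161</error_code> </file_xfer_error>"
    )
    print(summarize_stderr(excerpt))
```

Fields that do not appear in the pasted text are left as `None`, so the same function can be used on the short validate-error reports as well as the longer ones with a missing output file.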