Minirosetta v1.45 bug thread

DaveSun
Joined: 3 May 07
Posts: 5
Credit: 200,480
RAC: 0
Message 57815 - Posted: 12 Dec 2008, 13:15:25 UTC

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170. Any indication as to what may have caused this?
The task ran for the full time, so there was no indication of a problem on my end.

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 57816 - Posted: 12 Dec 2008, 13:33:02 UTC - in response to Message 57815.  

I don't know, but I noticed that your wingman on that workunit seemed to have chosen a shorter workunit size, and therefore shut down before reaching whatever caused that problem. Also, I've noticed that choosing a preferred workunit length above 10 hours seems to get me more problematic workunits, so if you get such problems often, you might want to try reducing your preferred workunit size.

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 57818 - Posted: 12 Dec 2008, 14:46:18 UTC - in response to Message 57815.  

I took another look at your results and noticed that it returned 596 decoys. I don't think I've seen a workunit return a three-digit number of decoys before, so perhaps someone should check whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit correctly.

A Few Good Men
Joined: 25 Mar 07
Posts: 14
Credit: 2,031,382
RAC: 0
Message 57819 - Posted: 12 Dec 2008, 14:56:08 UTC


The last 24 hours have produced this error on 5 WUs:

Server state Over
Outcome Client error
Client state Compute error
Exit status -226 (0xffffff1e)
Computer ID 963376
Report deadline 22 Dec 2008 1:30:07 UTC
CPU time 21570.15
stderr out
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
Can't acquire lockfile - exiting
Can't acquire lockfile - exiting
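
For anyone puzzled by the two forms of the exit status above: the hex value is the same number read as an unsigned 32-bit integer. The Python sketch below shows the conversion, and also that a cpu_run_time_pref of 86400 seconds is a 24-hour preference. The helper names are my own and are not part of BOINC. As far as I know, -226 corresponds to ERR_TOO_MANY_EXITS in the BOINC client's error numbers, which would fit the "too many exit(0)s" message, but treat that mapping as an assumption.

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def to_unsigned32_hex(value: int) -> str:
    """Format a (possibly negative) integer as an 8-digit unsigned 32-bit hex string."""
    return f"0x{value & 0xFFFFFFFF:08x}"


def seconds_to_hours(seconds: int) -> float:
    """Convert a run-time preference in seconds to hours."""
    return seconds / 3600


print(to_signed32(0xFFFFFF1E))    # -226
print(to_unsigned32_hex(-226))    # 0xffffff1e
print(seconds_to_hours(86400))    # 24.0
```

So the "Exit status" line is reporting one value twice, not two different errors.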



DaveSun
Joined: 3 May 07
Posts: 5
Credit: 200,480
RAC: 0
Message 57821 - Posted: 12 Dec 2008, 15:30:28 UTC - in response to Message 57818.  

I've been running at this setting for several months without any major trouble, and I've had several tasks that returned triple-digit decoy counts. I set it up to run for 1 day after a long time of running at less than 10 hours and having units run for what seemed like forever. This way I've not had any tasks run over my preference, and it works well for my setup. I just don't remember a task before this one that ran to completion here and did not validate.

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 57822 - Posted: 12 Dec 2008, 15:35:59 UTC - in response to Message 57821.  
Last modified: 12 Dec 2008, 15:41:31 UTC

Then perhaps the limit handled successfully is higher than 99 decoys per workunit, but not as high as 596.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 57823 - Posted: 12 Dec 2008, 16:01:25 UTC

Assertion failure in Task 213968874 (abinitio_abrelax_nohomfrag_129_B_1qgvA_5483_146_0)
Workunit 195032150, Mac OS X 10.4.11

Failed after 30 seconds

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

ERROR: Assertion failure: assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 110
called boinc_finish
# cpu_run_time_pref: 14400

</stderr_txt>
]]>
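
For readers who don't have the source handy, the assertion in the log is a bounds check: the last residue touched by a fragment insertion must not lie past the end of the pose. Here is a minimal Python rendering of the same inequality. It is only a sketch of the condition printed in the log, not the actual FragmentMover.cc code, and the numbers are made up for illustration.

```python
def fragment_fits(begin: int, size: int, total_residue: int) -> bool:
    """Same inequality as the failing assert.

    With 1-based residue numbering, a fragment of `size` residues starting
    at `begin` ends at residue begin + size - 1, which must not exceed the
    pose length.
    """
    return (begin + size - 1) <= total_residue


# Hypothetical values: a 9-residue fragment on a 108-residue pose.
print(fragment_fits(begin=100, size=9, total_residue=108))  # True, ends at 108
print(fragment_fits(begin=101, size=9, total_residue=108))  # False, ends at 109
```

A task that trips this check right at startup would exit within seconds, which is consistent with the 30-second failure above.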

Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 57826 - Posted: 12 Dec 2008, 16:57:25 UTC

When I encountered two cs_vanilla compute errors in a row I set Rosetta to NNW. That was 4 days ago. Until the software is fixed and announced here it will remain so. It behooves the project team to fix these errors ASAP rather than wait until this thread (like its predecessors) is cluttered with hundreds of posts reporting the same stuff. I do not understand this counter-productive behavior.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 57829 - Posted: 12 Dec 2008, 18:01:52 UTC

We are definitely working on it and will likely have an update within a few days, after testing on RALPH.

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57830 - Posted: 12 Dec 2008, 19:00:45 UTC - in response to Message 57823.  
Last modified: 12 Dec 2008, 19:11:45 UTC

Apologies for this - I screwed up the submit for two proteins:
1qgv and 1t2j. I tried to remove the jobs as soon as I noticed, but
around 200 WUs went out anyway. If you get a WU with either of those two protein tags, please abort it!

For the cs_vanilla jobs, a fix is going out onto RALPH@home right now. If you get cs_vanilla jobs, also feel free to abort them. We'll resubmit once the error is fixed.
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Greg_BE
Joined: 30 May 06
Posts: 5659
Credit: 5,691,837
RAC: 1,806
Message 57831 - Posted: 12 Dec 2008, 19:16:03 UTC - in response to Message 57819.  

read here for two links on how to take care of lockfiles.



DaveSun
Joined: 3 May 07
Posts: 5
Credit: 200,480
RAC: 0
Message 57834 - Posted: 12 Dec 2008, 22:05:32 UTC - in response to Message 57822.  

While that is possible, you'd think that if there were a limit it would be coded into the app, and tasks would end once the limit was reached.

Mike Francis
Joined: 24 Nov 05
Posts: 8
Credit: 623,519
RAC: 0
Message 57841 - Posted: 13 Dec 2008, 10:44:34 UTC

12/13/2008 12:39:43 AM|rosetta@home|Starting loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1
12/13/2008 12:39:43 AM|rosetta@home|Starting task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 using minirosetta version 145
12/13/2008 12:46:00 AM|rosetta@home|Computation for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 finished
12/13/2008 12:46:00 AM|rosetta@home|Output file loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1_0 for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 absent
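
If you have a lot of these messages to sift through, the log lines are easy to pick apart, since each one has the form timestamp|project|message. Below is a rough Python sketch that pulls out the "absent" output files and the elapsed time between the first and last line. It assumes the US-style timestamp format seen in this paste and an English locale for AM/PM parsing, and the function names are just for illustration.

```python
from datetime import datetime

TASK = ("loopbuild_reference_native_cst_hombench_loopbuild_t306_"
        "_IGNORE_THE_REST_1B70B_12_5533_4_1")

LOG = [
    f"12/13/2008 12:39:43 AM|rosetta@home|Starting task {TASK} using minirosetta version 145",
    f"12/13/2008 12:46:00 AM|rosetta@home|Computation for task {TASK} finished",
    f"12/13/2008 12:46:00 AM|rosetta@home|Output file {TASK}_0 for task {TASK} absent",
]


def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def absent_outputs(lines):
    """Return the names of output files the client reported as absent."""
    missing = []
    for line in lines:
        _, _, message = parse_line(line)
        words = message.split()
        if words[:2] == ["Output", "file"] and words[-1] == "absent":
            missing.append(words[2])
    return missing


def elapsed_seconds(lines):
    """Seconds between the earliest and latest timestamp in the lines."""
    times = [parse_line(line)[0] for line in lines]
    return (max(times) - min(times)).total_seconds()


print(absent_outputs(LOG))
print(elapsed_seconds(LOG))  # 377.0, i.e. 6 minutes 17 seconds
```

In this case the task ran for only about six minutes before finishing without an output file.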

(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 57842 - Posted: 13 Dec 2008, 10:52:54 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=213868228
Validate error Done 43,178.07 !!!!!

https://boinc.bakerlab.org/rosetta/result.php?resultid=213166655
https://boinc.bakerlab.org/rosetta/result.php?resultid=212932042
https://boinc.bakerlab.org/rosetta/result.php?resultid=212932029
https://boinc.bakerlab.org/rosetta/result.php?resultid=212906401
https://boinc.bakerlab.org/rosetta/result.php?resultid=212906412
https://boinc.bakerlab.org/rosetta/result.php?resultid=212906413
https://boinc.bakerlab.org/rosetta/result.php?resultid=212931903
https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
https://boinc.bakerlab.org/rosetta/result.php?resultid=212881858
https://boinc.bakerlab.org/rosetta/result.php?resultid=212692623
https://boinc.bakerlab.org/rosetta/result.php?resultid=212611598
https://boinc.bakerlab.org/rosetta/result.php?resultid=212499093
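
With a long list of result links like this, it can help to pull out the result IDs and check for repeats before anyone looks them up. A small Python sketch follows, using only four of the URLs above as sample input; the function name is my own.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

URLS = [
    "https://boinc.bakerlab.org/rosetta/result.php?resultid=212931903",
    "https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182",
    "https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182",
    "https://boinc.bakerlab.org/rosetta/result.php?resultid=212881858",
]


def result_id(url):
    """Extract the numeric resultid query parameter from a result URL."""
    return int(parse_qs(urlparse(url).query)["resultid"][0])


counts = Counter(result_id(url) for url in URLS)
duplicates = [rid for rid, n in counts.items() if n > 1]
unique_ids = sorted(counts)

print(duplicates)       # [212896182]
print(len(unique_ids))  # 3
```

Result 212896182 is listed twice in the post, so the list has one fewer distinct result than it has lines.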

Greg_BE
Joined: 30 May 06
Posts: 5659
Credit: 5,691,837
RAC: 1,806
Message 57846 - Posted: 13 Dec 2008, 13:41:58 UTC
Last modified: 13 Dec 2008, 13:44:01 UTC

1wjdA_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1wjdA-_5478_4043_0 got stuck and was showing 23.45% remaining, which is odd given that the messages in BOINC Manager showed it had started only about 5 minutes earlier, before getting interrupted by benchmark testing.
After I aborted the task, the next one started and the cores went to 100% immediately.

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 57863 - Posted: 14 Dec 2008, 13:39:28 UTC

The Server Status page is showing a problem as of 8:39 am, 12/14/08.

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 57867 - Posted: 14 Dec 2008, 18:49:46 UTC - in response to Message 57842.  

I notice that your results are the first I've seen that were run under BOINC 6.4.1. I wonder if that's the source of the problem, rather than minirosetta 1.45?

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 57868 - Posted: 14 Dec 2008, 18:50:05 UTC - in response to Message 57842.  
Last modified: 14 Dec 2008, 18:51:03 UTC

[duplicate]

mikylinux
Joined: 25 Jul 07
Posts: 3
Credit: 73,155
RAC: 0
Message 57869 - Posted: 14 Dec 2008, 19:27:25 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=213307491

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 57870 - Posted: 14 Dec 2008, 20:36:05 UTC - in response to Message 57863.  

As of 14 Dec 2008 20:26:34 UTC the Server Status Page shows:
Program rah_make_work1 on host srv3 with status "Not running".
Work units Ready to send: 1

It looks like program rah_make_work2 isn't able to handle the load all by itself.



©2024 University of Washington
https://www.bakerlab.org