Message boards : Number crunching : Minirosetta v1.45 bug thread
Author | Message |
---|---|
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this? I don't know, but I noticed that your wingman on that workunit seemed to have chosen a shorter workunit size, and therefore shut down before reaching whatever caused that problem. Also, I've noticed that choosing a preferred workunit length above 10 hours seems to get me more problematic workunits, so if you get such problems often, you might want to try reducing your preferred workunit size. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this? I took another look at your results, and noticed that it returned 596 decoys. I don't think I've seen a workunit before that returned a three-digit number of decoys, so perhaps someone should check whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit and still process them properly. |
A Few Good Men Send message Joined: 25 Mar 07 Posts: 14 Credit: 2,031,382 RAC: 0 |
Last 24 hours have produced this error on 5 WU's Server state Over Outcome Client error Client state Compute error Exit status -226 (0xffffff1e) Computer ID 963376 Report deadline 22 Dec 2008 1:30:07 UTC CPU time 21570.15 stderr out <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> # cpu_run_time_pref: 86400 # cpu_run_time_pref: 86400 # cpu_run_time_pref: 86400 # cpu_run_time_pref: 86400 # cpu_run_time_pref: 86400 Can't acquire lockfile - exiting Can't acquire lockfile - exiting |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this? Then perhaps the limit handled successfully is higher than 99 decoys per workunit, but not as high as 596. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Assertion failure in Task 213968874 (abinitio_abrelax_nohomfrag_129_B_1qgvA_5483_146_0) Workunit 195032150, Mac OS X 10.4.11 Failed after 30 seconds <core_client_version>6.2.18</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> ERROR: Assertion failure: assert( ( begin + size - 1 ) <= pose.total_residue() ); ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 110 called boinc_finish # cpu_run_time_pref: 14400 </stderr_txt> ]]> |
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
When I encountered two cs_vanilla compute errors in a row I set Rosetta to NNW. That was 4 days ago. Until the software is fixed and announced here it will remain so. It behooves the project team to fix these errors ASAP rather than wait until this thread (like its predecessors) is cluttered with hundreds of posts reporting the same stuff. I do not understand this counter-productive behavior. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
we are definitely working on it and will likely have an update within a few days after testing on ralph. |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Assertion failure in Task 213968874 (abinitio_abrelax_nohomfrag_129_B_1qgvA_5483_146_0) Apologies for this - I screwed up the submit for two proteins: 1qgv and 1t2j. I tried to remove the jobs as soon as I noticed, but around 200 WUs went out anyway. If you get a WU with either of those two protein tags, please abort it! For the cs_vanilla jobs, a fix is going out onto RALPH@home right now. If you get cs_vanilla jobs, also feel free to abort them. We'll resubmit once the error is fixed. http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Read here for two links on how to take care of lockfiles. |
Mike Francis Send message Joined: 24 Nov 05 Posts: 8 Credit: 623,519 RAC: 0 |
12/13/2008 12:39:43 AM|rosetta@home|Starting loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 12/13/2008 12:39:43 AM|rosetta@home|Starting task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 using minirosetta version 145 12/13/2008 12:46:00 AM|rosetta@home|Computation for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 finished 12/13/2008 12:46:00 AM|rosetta@home|Output file loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1_0 for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 absent |
(_KoDAk_) Send message Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=213868228 Validate error Done 43,178.07 !!!!! https://boinc.bakerlab.org/rosetta/result.php?resultid=213166655 https://boinc.bakerlab.org/rosetta/result.php?resultid=212932042 https://boinc.bakerlab.org/rosetta/result.php?resultid=212932029 https://boinc.bakerlab.org/rosetta/result.php?resultid=212906401 https://boinc.bakerlab.org/rosetta/result.php?resultid=212906412 https://boinc.bakerlab.org/rosetta/result.php?resultid=212906413 https://boinc.bakerlab.org/rosetta/result.php?resultid=212931903 https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182 https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182 https://boinc.bakerlab.org/rosetta/result.php?resultid=212881858 https://boinc.bakerlab.org/rosetta/result.php?resultid=212692623 https://boinc.bakerlab.org/rosetta/result.php?resultid=212611598 https://boinc.bakerlab.org/rosetta/result.php?resultid=212499093 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
1wjdA_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1wjdA-_5478_4043_0 got stuck and was showing 23.45% remaining, which is odd, since the messages in BOINC Manager showed it had started only about 5 minutes earlier, before being interrupted by benchmark testing. After I aborted the task, the next one started and the cores went to 100% immediately. |
rochester new york Send message Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Server Status Page is showing a problem 839am 12/14/08 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=213868228 I notice that your results are the first I've seen that were run under boinc 6.4.1. I wonder if that's the source of the problem instead of minirosetta 1.45? |
mikylinux Send message Joined: 25 Jul 07 Posts: 3 Credit: 73,155 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=213307491 |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Server Status Page is showing a problem 839am 12/14/08 As of 14 Dec 2008 20:26:34 UTC the Server Status Page shows: Program rah_make_work1 on host srv3 with status "Not running". Work units Ready to send: 1 It looks like program rah_make_work2 isn't able to handle the load all by itself. |
A Few Good Men Send message Joined: 25 Mar 07 Posts: 14 Credit: 2,031,382 RAC: 0 |
Greg_BE, I have uninstalled and reinstalled XP, reinstalled BOINC, added the save in memory clause, set standard clocks on the computer, run memtest on the RAM and a burn-in on the whole machine, and I'm still getting these errors about compute errors and locked files. At the same time SETI has no problems at all with the machine, its speed, its RAM, or anything. "A good program doesn't need 54 hoops to jump through before it works." Other machines here are running fine, but this one seems to have problems only with Rosetta@home. After a clean install and full format, I have to lean towards the Rosetta coding as the cause. Problems with this code are: it doesn't release all, or releases just some, of the processes when asked to snooze; the lockfile is always present; it says too many restarts; and it does not follow fair sharing of resources: BOINC Manager is at 50:50 and Rosetta has basically locked out all other projects. How about a nice little msi file to patch up the damage, and let's get folding. |
[AF>Slappyto] popolito Send message Joined: 8 Mar 06 Posts: 13 Credit: 1,041,105 RAC: 20 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=213877775 Exit status -1073741819 (0xc0000005) Reason: Access Violation (0xc0000005) at address 0x007FA877 read attempt to address 0xF87CC8B3 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Very strange... but I think that installing BOINC again just reinstalls the base program and does nothing to the project files. Did you go into your slots folder and erase the slots? That is where the lockfiles are located. Be sure to complete all your currently running tasks before deleting. I run BOINC from a different partition than C; perhaps you can complete your current work, then install on a different partition and see if that takes care of the problem. After I did the slot cleanup on my system, everything worked OK. Greg be |
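
Before erasing anything from the slots folder as suggested in the last post, it may help to see which slots actually still hold a lockfile. The following is a minimal sketch that only lists files and deletes nothing. It assumes slots live under `<BOINC data dir>/slots/<n>/` and that the lockfile is called `boinc_lockfile`; both are assumptions about a typical BOINC install and are not confirmed anywhere in this thread. The function name `find_slot_lockfiles` is made up for this example.

```python
from pathlib import Path

# Assumed lockfile name; adjust if your BOINC version uses a different one.
LOCKFILE_NAME = "boinc_lockfile"


def find_slot_lockfiles(boinc_data_dir):
    """Return the lockfiles found directly inside each slots/<n>/ folder.

    Read-only: nothing is modified or deleted.
    """
    slots = Path(boinc_data_dir) / "slots"
    if not slots.is_dir():
        return []
    return sorted(p for p in slots.glob(f"*/{LOCKFILE_NAME}") if p.is_file())


if __name__ == "__main__":
    import sys

    # Pass the BOINC data directory as the first argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_slot_lockfiles(data_dir):
        print(path)
```

If a lockfile shows up in a slot while no task is running there, that slot is a reasonable candidate for the cleanup described above, once all running tasks have finished.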
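
Several posts in this thread quote an exit status both as a signed decimal and as hex, for example -226 (0xffffff1e) and -1073741819 (0xc0000005). The two forms are the same 32-bit value read as signed or unsigned. A small sketch of the conversion, in case you want to match a number from one log against the other form; the function names are made up for this example and are not part of BOINC or minirosetta.

```python
def exit_status_to_hex(status: int) -> str:
    """Show a signed 32-bit exit status as unsigned hex."""
    # Masking with 0xFFFFFFFF gives the two's-complement bit pattern.
    return f"0x{status & 0xFFFFFFFF:08x}"


def hex_to_exit_status(text: str) -> int:
    """Turn a 32-bit hex string back into the signed decimal form."""
    value = int(text, 16) & 0xFFFFFFFF
    # Values with the top bit set represent negative numbers.
    return value - 0x100000000 if value >= 0x80000000 else value


print(exit_status_to_hex(-226))          # 0xffffff1e
print(exit_status_to_hex(-1073741819))   # 0xc0000005
print(hex_to_exit_status("0xc0000005"))  # -1073741819
```

Both values printed here match the pairs quoted in the posts above, so the decimal and hex numbers in those reports describe the same error rather than two different ones.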
©2025 University of Washington
https://www.bakerlab.org