Message boards : Number crunching : Problems with Minirosetta Version 1.71
Author | Message |
---|---|
Divide Overflow Send message Joined: 17 Sep 05 Posts: 82 Credit: 921,382 RAC: 0 |
|
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Several errors (Mac). Task 257226638 failed after 7 hrs (my run time preference is 3 hrs; I haven't seen other tasks overrun like this) with an Hbond tripped. Hbond tripped: [2009- 6- 8 4:54:59:] BOINC:: CPU time: 25400.8s, 14400s + 10800s[2009- 6- 8 12:44:44:] :: BOINC InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ) ====================================================== DONE :: 1 starting structures 25400.8 cpu seconds This process generated 1 decoys from 1 attempts ----- Task 257182103 failed, also with an Hbond tripped, but in a different way and much earlier, after about 12 minutes. Hbond tripped: [2009- 6- 8 1:42:17:] interpolate rotamers bin out of range: SER_p:NtermProteinFull 0 nan nan nan 3 3 10 11 2147483649 22 0 nan ERROR:: Exit from: src/core/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 589 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish ----- And [url=https://boinc.bakerlab.org/rosetta/result.php?resultid=257133352]257133352[/url] failed in a similar way. |
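The figures in the first log line are consistent with the reported 7-hour run. A minimal sketch of the arithmetic, assuming (the log does not say this) that 10800s is the 3-hour run time preference and 14400s is an extra allowance added on top of it:

```python
# Values copied from the log line "CPU time: 25400.8s, 14400s + 10800s".
cpu_time_s = 25400.8
allowance_s = 14400    # assumption: extra time allowed beyond the preference
preference_s = 10800   # assumption: the 3-hour run time preference

limit_s = allowance_s + preference_s

print(limit_s / 3600)        # the combined limit, in hours
print(cpu_time_s > limit_s)  # whether the reported CPU time exceeded it
```

Under those assumptions the combined limit is 25200 seconds (7 hours), which the task overran by roughly 200 seconds before it was ended.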
[SG] ronaldo Send message Joined: 18 Mar 07 Posts: 1 Credit: 2,000,373 RAC: 0 |
|
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Looks like 1.71 still has the lockfile problem: https://boinc.bakerlab.org/rosetta/result.php?resultid=257487687 This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem. |
Starfire Send message Joined: 1 Jan 06 Posts: 2 Credit: 301,905 RAC: 0 |
I've also run into some WUs that errored out: First type of error: 234647436 Second type of error: 234660726 234660731 Currently I have 2 more WUs running that show the same behavior in the application graphics as the 2 above: 234626282 234608536 Both have already errored out for someone else. Should I abort them? Starfire |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like 1.71 still has the lockfile problem: Robert, have you been able to capture any of the information requested in the wiki here? http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting Rosetta Moderator: Mod.Sense |
[ESL Brigade] marcsen Send message Joined: 24 May 09 Posts: 1 Credit: 96,858 RAC: 0 |
I also have errors in all "lb_thread_all_multi..." work units after ~7 hours of crunching on 2 different computers. https://boinc.bakerlab.org/rosetta/result.php?resultid=257123602 https://boinc.bakerlab.org/rosetta/result.php?resultid=257088113 https://boinc.bakerlab.org/rosetta/result.php?resultid=257122604 https://boinc.bakerlab.org/rosetta/result.php?resultid=257121758 I will now abort work units of this sort whenever I see them in the task list. |
gazzawazza Send message Joined: 4 May 07 Posts: 28 Credit: 297,648 RAC: 0 |
hi all. I've been having problems with BOINC 6.6.31 (running as a service) and Rosetta 1.71. Am running on vista home premium sp2 (32 bit) on a stock clock-speed Q6600. Please review my thread for the detail: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4933 In summary though, I've had file size mismatches (when resetting project and downloading core rosetta files again) but think I fixed that by making sure network activity is always available. Have been getting loads of task restarts, "exited with zero status but no 'finished' file" messages & recommendations to reset the project. Also, the minirosetta 1.71.exe gets locked (sometimes with the process stuck in memory), even after exiting BOINC (and all other related processes exiting ok too). Am running other projects (i.e. climateprediction, malariacontrol, world community grid) with no problems. Finally, for the record, I have one rosetta WU that has run ok, with no computation errors, no restarts, etc. All the rest have been problematic: "lr5_D_chbond_05_run2_rlbn_1u5z_SAVE_ALL_OUT_NATIVE_NOCON_12601_182_1". Regards, Gary |
PinkPenguin Send message Joined: 26 Apr 09 Posts: 5 Credit: 280,676 RAC: 0 |
Ok, seems like I am experiencing the same problem with mini rosetta 1.71 on both Linux and Windows Vista boxes. This seems to apply to all lb_thread_all_multi... work units, which give a -161 error on file transfer at the end of a job that appears to have completed OK. Pentium 4 - Linux (Fedora Core 9) - BOINC 6.4.7 https://boinc.bakerlab.org/rosetta/result.php?resultid=257118194 https://boinc.bakerlab.org/rosetta/result.php?resultid=257399741 Pentium Core Duo - Windows Vista - BOINC 6.6.31 https://boinc.bakerlab.org/rosetta/result.php?resultid=257114975 This is the error message in the output for all three examples above: <file_xfer_error> <file_name>lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> .... I suspect a bug - anyway the result is a "client error" and a depressed PC which I am having to persuade to keep crunching and, anyway, it should at least get a few points for having done its best.... the things one has to do to persuade these things to do some work ! All the best, Richard |
Mike* Send message Joined: 16 Feb 09 Posts: 5 Credit: 102,030 RAC: 0 |
I have had 5 wu all error with the same result as this one: <file_xfer_error> <file_name>lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_447_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> All were also lb_thread_all_multi... One was even reprocessed by another host and IT also has the same error. I currently have 7 successful WUs, with 6 left in cache, 3 started. Host is 1077338 (core i7, vista 64 ultimate, 12g memory) Mike |
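For anyone comparing several failed tasks, the `<file_xfer_error>` blocks quoted in these reports can be pulled out of saved task output with a short script. This is only a sketch: `find_xfer_errors` is a hypothetical helper name, and it assumes the output was saved as plain text with the tags intact, as in the posts above.

```python
import re

# Sample copied from the error output quoted above.
sample = (
    "<file_xfer_error> "
    "<file_name>lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_447_0_0</file_name> "
    "<error_code>-161</error_code> "
    "</file_xfer_error>"
)

# Matches one file_xfer_error block and captures the file name and error code.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>\s*"
    r"</file_xfer_error>",
    re.DOTALL,
)

def find_xfer_errors(text):
    """Hypothetical helper: return (file_name, error_code) pairs found in text."""
    return [(name, int(code)) for name, code in XFER_ERROR.findall(text)]

for name, code in find_xfer_errors(sample):
    print(code, name)
```

Grouping the results by file name prefix would make it easy to check whether every -161 error belongs to an lb_thread_all_multi... work unit.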
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Looks like 1.71 still has the lockfile problem: I hadn't known about that request before, but I may have a situation ready to try it now. |
Speedy Send message Joined: 25 Sep 05 Posts: 163 Credit: 808,337 RAC: 0 |
Result id 257383313 Compute error after 22,774 seconds. Is there any chance of being granted any credit? Have a crunching good day!! |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Result id 257383313 Compute error after 22,774 seconds. Is there any chance of being granted any credit? Looks like the nightly credit granting script (which finds errors and gives credit for them) gave it credit. But you have to open the specific task to see it. It doesn't show on the task list when granted by the script. Rosetta Moderator: Mod.Sense |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Looks like 1.71 still has the lockfile problem: The email address for sending the results to does not work from my address, and my BOINC directory does not contain any of the files asked for. Does the email address work from your location? Under BOINC 6.2.28 installed to let all users use it under 32-bit Vista SP2, what is the standard name of the directory containing the files asked for and are such files wanted for all subdirectories or just that first directory level? Under that BOINC, where is the slots subdirectory? A files search was unable to find it. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Robert, keep in mind that BOINC has two main directories now. One for the BOINC Manager, and one for the "data directory" where the projects and slots reside (and they could be the same). I believe the files they are looking for are located in the data directory. The one just OVER the projects and slots subdirectories. And they start with "std". You could EMail me the files and I could try to forward for you if the EMail address still isn't working for you. Rosetta Moderator: Mod.Sense |
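If the Windows search is not cooperating, the description above (a directory that sits just over the projects and slots subdirectories and holds files starting with "std") can be turned into a small search script. This is a rough sketch, not a BOINC tool: the function names are made up for illustration, and the search root passed on the command line is whatever drive or folder you suspect holds the data directory.

```python
import os
import sys

def find_boinc_data_dirs(root):
    """Yield directories under root that contain both 'projects' and 'slots'."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if "projects" in dirnames and "slots" in dirnames:
            yield dirpath

def list_std_files(data_dir):
    """Return the names of regular files in data_dir that start with 'std'."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        if name.startswith("std") and os.path.isfile(os.path.join(data_dir, name))
    )

if __name__ == "__main__":
    # Assumption: the search root is given as the first argument,
    # otherwise the current directory is searched.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for data_dir in find_boinc_data_dirs(root):
        print(data_dir)
        for name in list_std_files(data_dir):
            print("   ", name)
```

Directories the script cannot read are skipped silently, so it may need to be run from an account that has access to the data directory.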
nick n Send message Joined: 26 Aug 07 Posts: 49 Credit: 219,102 RAC: 0 |
I guess when it rains it pours here too. https://boinc.bakerlab.org/rosetta/result.php?resultid=257820669 https://boinc.bakerlab.org/rosetta/result.php?resultid=257819658 https://boinc.bakerlab.org/rosetta/result.php?resultid=257812025 https://boinc.bakerlab.org/rosetta/result.php?resultid=257786791 https://boinc.bakerlab.org/rosetta/result.php?resultid=257682526 https://boinc.bakerlab.org/rosetta/result.php?resultid=257652583 https://boinc.bakerlab.org/rosetta/result.php?resultid=257238981 https://boinc.bakerlab.org/rosetta/result.php?resultid=257148875 https://boinc.bakerlab.org/rosetta/result.php?resultid=257098736 etc...... O and also how do you hyperlink stuff so you can just click on the links above? |
Starfire Send message Joined: 1 Jan 06 Posts: 2 Credit: 301,905 RAC: 0 |
O and also how do you hyperlink stuff so you can just click on the links above? Hi, take a look at this page. Basically you have to write it like this (without the *): [*url=https://boinc.bakerlab.org/rosetta/result.php?resultid=257820669]Task 257820669[*/url] [*url=https://boinc.bakerlab.org/rosetta/result.php?resultid=257819658]Task 257819658[*/url] Starfire |
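With a long list of result IDs, the tags can also be generated rather than typed by hand. A small sketch, using the result URL pattern seen in this thread; `bbcode_link` is just an illustrative name:

```python
# URL pattern taken from the result links posted in this thread.
RESULT_URL = "https://boinc.bakerlab.org/rosetta/result.php?resultid={}"

def bbcode_link(result_id):
    """Build a [url=...]Task N[/url] tag for one result ID."""
    url = RESULT_URL.format(result_id)
    return "[url={}]Task {}[/url]".format(url, result_id)

for rid in (257820669, 257819658):
    print(bbcode_link(rid))
```

The printed lines can be pasted straight into a post.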
Speedy Send message Joined: 25 Sep 05 Posts: 163 Credit: 808,337 RAC: 0 |
Result id 257383313 was granted credit, thanks. As you can see, the top 2 tasks in the list have --- under the Granted Credit column. 257383314 & 257383313 have been granted credit by the overnight script, but the credit isn't shown under the Granted Credit column. Is there a reason for this? Thanks in advance. Have a crunching good day!! |
PinkPenguin Send message Joined: 26 Apr 09 Posts: 5 Credit: 280,676 RAC: 0 |
Regarding the problems with the lb_thread_all_multi.... work units returning a -161 error code on file transfer at the end of the work unit: I should note that several people have reported the problem and that the same error occurs on different machines, both Linux and Windows. It also occurs on both runs of the same work unit by different people. This seems to indicate a general problem with this type of WU rather than an isolated client error (for example due to anti-virus activity, as suggested in the past). The -161 error code appears to be given because there is no output file to send back. Here is the log message from stdoutdae.txt on Windows: 09-Jun-2009 04:47:13 [rosetta@home] Computation for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 finished 09-Jun-2009 04:47:13 [rosetta@home] Output file lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0 for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 absent For examples see previous messages: Message 61638 (2nd type of error). Message 61647 Message 61650 |
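To check how often this happens on a given machine, the "Output file ... absent" lines can be counted in a copy of stdoutdae.txt. A minimal sketch, assuming the log lines have the same wording as the two quoted above; `absent_outputs` is a hypothetical helper name:

```python
import re

# Two log lines copied from the stdoutdae.txt excerpt above.
sample_log = (
    "09-Jun-2009 04:47:13 [rosetta@home] Computation for task "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 finished\n"
    "09-Jun-2009 04:47:13 [rosetta@home] Output file "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0 for task "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 absent\n"
)

# Captures the missing output file name and the task it belongs to.
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")

def absent_outputs(log_text):
    """Hypothetical helper: return (output_file, task_name) pairs reported absent."""
    return ABSENT.findall(log_text)

for output_file, task in absent_outputs(sample_log):
    print(task, "->", output_file)
```

To run it against a real log, read the file into a string and pass that to `absent_outputs` in place of `sample_log`.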
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Robert, keep in mind that BOINC has two main directories now. One for the BOINC Manager, and one for the "data directory" where the projects and slots reside (and they could be the same). My current problem is that I'm having trouble finding the BOINC data directory, since for this combination of BOINC version and Windows version the frequently written files are in the data directory tree instead of the program-oriented BOINC directory. The Vista SP2 update seems to interfere with using the search function to find a directory if you know the lowest level name in the directory path, but not much more about it. Since the "std" files aren't directories, I'll try searching for them instead and see if that gets around this problem. In case I need to search for a complete filename instead, what is the current full name of the files minirosetta uses as lockfiles? |
©2024 University of Washington
https://www.bakerlab.org