Problems with Minirosetta Version 1.71

Message boards : Number crunching : Problems with Minirosetta Version 1.71

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 61614 - Posted: 8 Jun 2009, 6:46:56 UTC

A variety of errors:

lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_731_0
lb_thread_control_hb_t297__IGNORE_THE_REST_12685_817_0
lb_thread_control_hb_t297__IGNORE_THE_REST_12685_741_1
lb_thread_all_multi_hb_t312__IGNORE_THE_REST_12726_551_0
lb_thread_all_multi_hb_t286__IGNORE_THE_REST_12715_228_1

The errors include no templates to:
interpolate rotamers bin out of range: SER_p:NtermProteinFull 0 nan nan nan
3 3 10 11 2147483649 22 0 nan

This is the first time I have seen a "Not a Number" error ...
ID: 61614 · Rating: 0

Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 61623 - Posted: 8 Jun 2009, 15:11:25 UTC - in response to Message 61614.  

Had a sudden burst of faults here:

257095553
257150714
257151825

Value out of legal range and no template provided errors.

ID: 61623 · Rating: 0

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 61626 - Posted: 9 Jun 2009, 2:04:49 UTC

Several errors (Mac)

Task 257226638 failed after 7hrs (my run time preference is 3hrs: I haven't seen other tasks overrun like this) with an Hbond tripped

Hbond tripped: [2009- 6- 8 4:54:59:]
BOINC:: CPU time: 25400.8s, 14400s + 10800s[2009- 6- 8 12:44:44:] :: BOINC
InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 25400.8 cpu seconds
This process generated 1 decoys from 1 attempts

-----

Task 257182103 also failed with an Hbond tripped, but in a different way and much earlier, after about 12 minutes

Hbond tripped: [2009- 6- 8 1:42:17:]
interpolate rotamers bin out of range: SER_p:NtermProteinFull 0 nan nan nan
3 3 10 11 2147483649 22 0 nan
ERROR:: Exit from: src/core/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 589
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

-----

And task 257133352 (https://boinc.bakerlab.org/rosetta/result.php?resultid=257133352) failed in a similar way.


ID: 61626 · Rating: 0

[SG] ronaldo
Joined: 18 Mar 07
Posts: 1
Credit: 2,000,373
RAC: 0
Message 61632 - Posted: 9 Jun 2009, 11:35:19 UTC

I also have some errors:

Task ID 257076984
Task ID 257061423
ID: 61632 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,841,722
RAC: 1,590
Message 61636 - Posted: 9 Jun 2009, 14:20:31 UTC

Looks like 1.71 still has the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.
ID: 61636 · Rating: 0

Starfire
Joined: 1 Jan 06
Posts: 2
Credit: 301,905
RAC: 0
Message 61638 - Posted: 9 Jun 2009, 16:59:39 UTC
Last modified: 9 Jun 2009, 17:01:10 UTC

I've also run into some WUs that errored out:

First type of error:
234647436

Second type of error:
234660726
234660731

Currently I have 2 more WUs running that show the same behavior in the application graphics as the 2 above:
234626282
234608536




Both have already errored out for someone else. Should I abort them?
Starfire

ID: 61638 · Rating: 0

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61639 - Posted: 9 Jun 2009, 17:21:05 UTC - in response to Message 61636.  

Looks like 1.71 still has the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting
Rosetta Moderator: Mod.Sense
ID: 61639 · Rating: 0

[ESL Brigade] marcsen
Joined: 24 May 09
Posts: 1
Credit: 96,858
RAC: 0
Message 61643 - Posted: 9 Jun 2009, 18:15:40 UTC
Last modified: 9 Jun 2009, 18:17:27 UTC

I also have errors in all "lb_thread_all_multi..." work units after ~7 hours of crunching on 2 different computers.
https://boinc.bakerlab.org/rosetta/result.php?resultid=257123602
https://boinc.bakerlab.org/rosetta/result.php?resultid=257088113
https://boinc.bakerlab.org/rosetta/result.php?resultid=257122604
https://boinc.bakerlab.org/rosetta/result.php?resultid=257121758

From now on I will abort work units of this sort whenever I see one in the task list.
ID: 61643 · Rating: 0

gazzawazza
Joined: 4 May 07
Posts: 28
Credit: 297,648
RAC: 0
Message 61646 - Posted: 9 Jun 2009, 18:44:18 UTC

hi all.

I've been having problems with BOINC 6.6.31 (running as a service) and Rosetta 1.71.

I am running Vista Home Premium SP2 (32-bit) on a stock clock-speed Q6600.

Please review my thread for the detail:

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4933

In summary though, I've had file size mismatches (when resetting project and downloading core rosetta files again) but think I fixed that by making sure network activity is always available.

Have been getting loads of task restarts, "exited with zero status but no 'finished' file" messages & recommendations to reset the project.

Also, the minirosetta 1.71.exe gets locked (sometimes with the process stuck in memory), even after exiting BOINC (and all other related processes exiting ok too).

Am running other projects (i.e. climateprediction, malariacontrol, world community grid) with no problems.

Finally, for the record, I have one rosetta WU that has run ok, with no computation errors, no restarts, etc. All the rest have been problematic:

"lr5_D_chbond_05_run2_rlbn_1u5z_SAVE_ALL_OUT_NATIVE_NOCON_12601_182_1".


Regards,

Gary
ID: 61646 · Rating: 0

PinkPenguin
Joined: 26 Apr 09
Posts: 5
Credit: 280,676
RAC: 0
Message 61647 - Posted: 9 Jun 2009, 20:43:50 UTC

OK, it seems I am experiencing the same problem with minirosetta 1.71 on both Linux and Windows Vista boxes. It seems to apply to all lb_thread_all_multi... work units, which give a -161 error on file transfer at the end of a job that otherwise appears to have completed OK.

Pentium 4 - Linux (Fedora Core 9) - BOINC 6.4.7
https://boinc.bakerlab.org/rosetta/result.php?resultid=257118194
https://boinc.bakerlab.org/rosetta/result.php?resultid=257399741

Pentium Core Duo - Windows Vista - BOINC 6.6.31
https://boinc.bakerlab.org/rosetta/result.php?resultid=257114975

This is the error message in the output for all three examples above:
<file_xfer_error>
  <file_name>lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
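
For anyone collecting these from several results, here is a minimal Python sketch that pulls the file name and error code out of fragments like the one above. It assumes the input contains only well-formed `<file_xfer_error>` blocks in the shape quoted here; the function name `parse_file_xfer_errors` is made up for illustration, and real result pages may mix in other text that would need stripping first.

```python
import xml.etree.ElementTree as ET

def parse_file_xfer_errors(text):
    """Extract (file_name, error_code) pairs from <file_xfer_error> fragments."""
    # Wrap in a dummy root so several sibling fragments parse as one document.
    root = ET.fromstring("<root>" + text + "</root>")
    results = []
    for err in root.iter("file_xfer_error"):
        name = err.findtext("file_name", default="").strip()
        code = int(err.findtext("error_code", default="0"))
        results.append((name, code))
    return results

# Fragment copied from the post above.
fragment = """
<file_xfer_error>
  <file_name>lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
"""
print(parse_file_xfer_errors(fragment))
```

Grouping the returned pairs by error code would make it easy to confirm whether every failing lb_thread_all_multi result reports the same code.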


.... I suspect a bug - anyway the result is a "client error" and a depressed PC which I am having to persuade to keep crunching and, anyway, it should at least get a few points for having done its best.... the things one has to do to persuade these things to do some work !

All the best,
Richard
ID: 61647 · Rating: 0

Mike*
Joined: 16 Feb 09
Posts: 5
Credit: 102,030
RAC: 0
Message 61650 - Posted: 9 Jun 2009, 22:40:40 UTC
Last modified: 9 Jun 2009, 22:42:25 UTC

I have had 5 WUs all error out with the same result as this one:
<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_447_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

All were also lb_thread_all_multi...

One was even reprocessed by another host and IT also has the same error.

I currently have 7 successful WUs, with 6 left in cache, 3 started.

Host is 1077338 (core i7, vista 64 ultimate, 12g memory)

Mike
ID: 61650 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,841,722
RAC: 1,590
Message 61651 - Posted: 10 Jun 2009, 0:46:14 UTC - in response to Message 61639.  

Looks like 1.71 still has the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting


I hadn't known about that request before, but I may have a situation ready to try it now.
ID: 61651 · Rating: 0

Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 24
Message 61654 - Posted: 10 Jun 2009, 3:12:45 UTC
Last modified: 10 Jun 2009, 3:13:43 UTC

Result id 257383313: compute error after 22,774 seconds. Is there any chance of being granted any credit?
Have a crunching good day!!
ID: 61654 · Rating: 0

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61656 - Posted: 10 Jun 2009, 11:47:51 UTC - in response to Message 61654.  

Result id 257383313: compute error after 22,774 seconds. Is there any chance of being granted any credit?


Looks like the nightly credit granting script (which finds errors and gives credit for them) gave it credit. But you have to open the specific task to see it. It doesn't show on the task list when granted by the script.
Rosetta Moderator: Mod.Sense
ID: 61656 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,841,722
RAC: 1,590
Message 61657 - Posted: 10 Jun 2009, 12:07:30 UTC - in response to Message 61651.  

Looks like 1.71 still has the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting


I hadn't known about that request before, but I may have a situation ready to try it now.


The email address for sending the results to does not work from my address, and my BOINC directory does not contain any of the files asked for. Does the email address work from your location?

Under BOINC 6.2.28 installed to let all users use it under 32-bit Vista SP2, what is the standard name of the directory containing the files asked for and are such files wanted for all subdirectories or just that first directory level?

Under that BOINC, where is the slots subdirectory? A files search was unable to find it.
ID: 61657 · Rating: 0

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61658 - Posted: 10 Jun 2009, 12:39:06 UTC
Last modified: 10 Jun 2009, 12:40:11 UTC

Robert, keep in mind that BOINC has two main directories now. One for the BOINC Manager, and one for the "data directory" where the projects and slots reside (and they could be the same).

I believe the files they are looking for are located in the data directory, the one just above the projects and slots subdirectories. Their names start with "std".

You could EMail me the files and I could try to forward for you if the EMail address still isn't working for you.
Rosetta Moderator: Mod.Sense
ID: 61658 · Rating: 0

nick n
Joined: 26 Aug 07
Posts: 49
Credit: 219,102
RAC: 0
Message 61660 - Posted: 10 Jun 2009, 17:07:58 UTC
Last modified: 10 Jun 2009, 17:13:08 UTC

I guess when it rains it pours here too.

https://boinc.bakerlab.org/rosetta/result.php?resultid=257820669
https://boinc.bakerlab.org/rosetta/result.php?resultid=257819658
https://boinc.bakerlab.org/rosetta/result.php?resultid=257812025
https://boinc.bakerlab.org/rosetta/result.php?resultid=257786791
https://boinc.bakerlab.org/rosetta/result.php?resultid=257682526
https://boinc.bakerlab.org/rosetta/result.php?resultid=257652583
https://boinc.bakerlab.org/rosetta/result.php?resultid=257238981
https://boinc.bakerlab.org/rosetta/result.php?resultid=257148875
https://boinc.bakerlab.org/rosetta/result.php?resultid=257098736
etc. Oh, and also, how do you hyperlink stuff so you can just click on the links above?
ID: 61660 · Rating: 0

Starfire
Joined: 1 Jan 06
Posts: 2
Credit: 301,905
RAC: 0
Message 61661 - Posted: 10 Jun 2009, 17:22:41 UTC - in response to Message 61660.  

Oh, and also, how do you hyperlink stuff so you can just click on the links above?


Hi,

take a look at this page.

Basically you have to write it like this (without the *):

[*url=https://boinc.bakerlab.org/rosetta/result.php?resultid=257820669]Task 257820669[*/url]
[*url=https://boinc.bakerlab.org/rosetta/result.php?resultid=257819658]Task 257819658[*/url]
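
If you have a long list of result IDs, the same markup can be generated rather than typed. This is only a sketch: it reuses the result URL pattern from the links in this thread, and the helper name `task_link` is hypothetical.

```python
# URL pattern taken from the result links posted in this thread.
RESULT_URL = "https://boinc.bakerlab.org/rosetta/result.php?resultid={}"

def task_link(result_id):
    """Return a BBCode link for one result ID, labeled 'Task <id>'."""
    url = RESULT_URL.format(result_id)
    return "[url={}]Task {}[/url]".format(url, result_id)

# Print one BBCode line per result ID, ready to paste into a post.
for rid in (257820669, 257819658):
    print(task_link(rid))
```

The printed lines can be pasted straight into the message box; here no asterisks are needed because the output is meant to be rendered as links.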

Starfire

ID: 61661 · Rating: 0

Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 24
Message 61663 - Posted: 10 Jun 2009, 21:12:00 UTC - in response to Message 61656.  


Looks like the nightly credit granting script (which finds errors and gives credit for them) gave it credit. But you have to open the specific task to see it. It doesn't show on the task list when granted by the script.

Result id 257383313 was granted credit, thanks. As you can see, the top 2 tasks in the list have --- under the Granted Credit column. 257383314 & 257383313 have been granted credit by the overnight script, but the credit isn't shown under the Granted Credit column. Is there a reason for this? Thanks in advance.
Have a crunching good day!!
ID: 61663 · Rating: 0

PinkPenguin
Joined: 26 Apr 09
Posts: 5
Credit: 280,676
RAC: 0
Message 61664 - Posted: 10 Jun 2009, 22:23:47 UTC

Regarding the problems with the lb_thread_all_multi.... work units returning a -161 error code on file transfer at the end of the work unit.

I should note that several people have signaled the problem and that the same error occurs on different machines both Linux and Windows. It also occurs on both runs of the same work unit by different people. This seems to indicate a general problem with this type of WU rather than an isolated client error (for example due to anti-virus activity as suggested in the past).

The -161 error code appears to be given because there is no output file to send back. Here is the log message from stdoutdae.txt on windows:
09-Jun-2009 04:47:13 [rosetta@home] Computation for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 finished
09-Jun-2009 04:47:13 [rosetta@home] Output file lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0 for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 absent
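
To check whether other hosts show the same pattern, a small Python sketch along these lines could scan stdoutdae.txt for "absent" output-file messages. The regular expression is modeled only on the two log lines quoted above, so treat the exact wording as an assumption (other BOINC versions may phrase it differently); the name `find_absent_outputs` is invented for this example.

```python
import re

# Pattern modeled on the stdoutdae.txt lines quoted above; the message
# wording in other BOINC versions may differ (assumption).
ABSENT_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) \[(?P<project>[^\]]+)\] "
    r"Output file (?P<file>\S+) for task (?P<task>\S+) absent\s*$"
)

def find_absent_outputs(lines):
    """Return (timestamp, project, task, file) for each 'absent' log line."""
    hits = []
    for line in lines:
        m = ABSENT_RE.match(line.strip())
        if m:
            hits.append((m["stamp"], m["project"], m["task"], m["file"]))
    return hits

# The two log lines from the post above.
sample = [
    "09-Jun-2009 04:47:13 [rosetta@home] Computation for task "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 finished",
    "09-Jun-2009 04:47:13 [rosetta@home] Output file "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0 for task "
    "lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 absent",
]
for hit in find_absent_outputs(sample):
    print(hit)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.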


For examples, see previous messages:
Message 61638 (2nd type of error).
Message 61647
Message 61650
ID: 61664 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org