minirosetta v1.19 bug thread

James Thompson
Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 52876 - Posted: 6 May 2008, 0:37:02 UTC

We have an updated version of minirosetta, v1.19, which should fix some of the stability issues with v1.15. Post minirosetta v1.19 bugs here.

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 52900 - Posted: 7 May 2008, 17:55:14 UTC

Here is an access violation error after 68,000+ seconds of CPU time:

Reason: Access Violation (0xc0000005) at address 0x005C3051 write attempt to address 0x00000024

There is a large and detailed debugger message.
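
For anyone triaging reports like this one, the exception line can be pulled apart mechanically. The following is a minimal Python sketch, assuming the line always follows the format quoted above. The function name `parse_access_violation` and the 64 KiB "near null" threshold are illustrative choices, not anything BOINC defines.

```python
import re

# Pattern for the "Reason:" line as quoted in this thread.
# The exact format is assumed from these samples only.
AV_RE = re.compile(
    r"Access Violation \((0x[0-9A-Fa-f]+)\) at address (0x[0-9A-Fa-f]+) "
    r"(read|write) attempt to address (0x[0-9A-Fa-f]+)"
)

def parse_access_violation(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = AV_RE.search(line)
    if m is None:
        return None
    target = int(m.group(4), 16)
    return {
        "code": m.group(1),
        "instruction": int(m.group(2), 16),
        "access": m.group(3),
        "target": target,
        # Hypothetical heuristic: a target below 64 KiB is treated as "near null".
        "near_null": target < 0x10000,
    }

line = ("Reason: Access Violation (0xc0000005) at address 0x005C3051 "
        "write attempt to address 0x00000024")
info = parse_access_violation(line)
print(info["access"], hex(info["target"]), info["near_null"])
```

A target address this small often suggests a null pointer being used with a small offset, but that is only a guess without the full debugger dump.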
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

glaesum
Joined: 16 Oct 06
Posts: 21
Credit: 508,632
RAC: 0
Message 52910 - Posted: 8 May 2008, 12:54:00 UTC

things must be going pretty well as the thread is so quiet...

good news too with win98 OS - the 1.19 app is running, completing and validating although an error message is still getting thrown up. no idea if this matters or not.

on all three wus completed so far this is the message:

Task ID 161439715
Name score13_hb_envtest62_A_1ctf__3171_14411_0
Workunit 147493846
Received 8 May 2008 11:10:33 UTC
Outcome Success

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
AllocateAndInitializeSid Error 120
failed to create shared mem segment
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 13875.8 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>
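
The summary lines in a stderr block like the one above can be turned into numbers for comparing tasks. This is a rough Python sketch, assuming the `DONE ::` and "generated N decoys" lines always look as they do in this sample; `summarize_stderr` is a hypothetical helper name.

```python
import re

def summarize_stderr(text):
    """Extract the run-time preference, CPU seconds and decoy counts from stderr text."""
    pref = re.search(r"# cpu_run_time_pref: (\d+)", text)
    done = re.search(r"DONE ::\s+(\d+) starting structures\s+([\d.]+) cpu seconds", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    if not (pref and done and decoys):
        return None
    cpu_seconds = float(done.group(2))
    n_decoys = int(decoys.group(1))
    return {
        "pref_seconds": int(pref.group(1)),
        "cpu_seconds": cpu_seconds,
        "decoys": n_decoys,
        "attempts": int(decoys.group(2)),
        # Average CPU time per decoy; None if no decoy was produced.
        "seconds_per_decoy": cpu_seconds / n_decoys if n_decoys else None,
    }

SAMPLE = """\
AllocateAndInitializeSid Error 120
failed to create shared mem segment
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 13875.8 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
"""
summary = summarize_stderr(SAMPLE)
print(summary)
```

For this task the run used 13875.8 of the preferred 14400 CPU seconds and produced 3 decoys from 3 attempts, which is consistent with the "Success" outcome despite the two warning lines at the top.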

work unit ID nos are:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147390671
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147405464
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147493846

radu
Joined: 7 May 08
Posts: 4
Credit: 66,301
RAC: 0
Message 52911 - Posted: 8 May 2008, 13:22:26 UTC
Last modified: 8 May 2008, 13:24:08 UTC

I get a crash when I detach from the project.
I'm not sure if this is a minirosetta bug.
Log messages seem to show that minirosetta was running when the crash occurred.

I'm running Gentoo linux 2.6.24-r7.
boinc-5.10.45

Logs:
08-May-2008 16:07:47 [rosetta@home] Starting task fa_max_dis_9-2vik_-test_2008-5-6_3222_134_0 using minirosetta version 119
08-May-2008 16:09:29 [rosetta@home] Resetting project
08-May-2008 16:09:30 [rosetta@home] Detaching from project
SIGSEGV: segmentation violation
Stack trace (9 frames):
/usr/bin/boinc_client[0x46cbf9]
/lib/libpthread.so.0[0x2aba6d950ed0]
/usr/bin/boinc_client[0x40afec]
/usr/bin/boinc_client[0x43060e]
/usr/bin/boinc_client[0x4310bc]
/usr/bin/boinc_client[0x422319]
/usr/bin/boinc_client[0x4516a4]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2aba6ddfdb74]
/usr/bin/boinc_client(__gxx_personality_v0+0x1b1)[0x4048f9]

Exiting...
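
The raw addresses in a trace like this can be resolved to function names with `addr2line`, provided the binary was built with debug symbols. Below is a small Python sketch that collects the `boinc_client` frames and builds the command line. It assumes the frame format shown above; `frames_for` is a hypothetical helper, and the sketch only prints the command rather than running it.

```python
import re

# One frame per line: module path, optional (symbol+offset), then [address].
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^)]*)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)

def frames_for(trace, module_suffix):
    """Return the addresses of all frames whose module path ends with module_suffix."""
    addrs = []
    for raw in trace.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m and m.group("module").endswith(module_suffix):
            addrs.append(m.group("addr"))
    return addrs

TRACE = """\
/usr/bin/boinc_client[0x46cbf9]
/lib/libpthread.so.0[0x2aba6d950ed0]
/usr/bin/boinc_client[0x40afec]
/usr/bin/boinc_client[0x43060e]
/usr/bin/boinc_client[0x4310bc]
/usr/bin/boinc_client[0x422319]
/usr/bin/boinc_client[0x4516a4]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2aba6ddfdb74]
/usr/bin/boinc_client(__gxx_personality_v0+0x1b1)[0x4048f9]
"""

addrs = frames_for(TRACE, "boinc_client")
# -f prints function names, -C demangles C++ symbols, -e names the executable.
command = ["addr2line", "-f", "-C", "-e", "/usr/bin/boinc_client"] + addrs
print(" ".join(command))
```

With a stripped binary the output will be mostly `??`, so this is only useful on a build that still has its symbols.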

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 52913 - Posted: 8 May 2008, 13:56:23 UTC - in response to Message 52911.  

I get a crash when I detach from the project.
I'm not sure if this is a minirosetta bug.
Log messages seem to show that minirosetta was running when the crash occurred.

It is quite possible (and logical IMO) that the client forcibly terminates all related processes upon detach. Otherwise it could not clean up client_state.xml, slots/ and projects/.

Peter

radu
Joined: 7 May 08
Posts: 4
Credit: 66,301
RAC: 0
Message 52914 - Posted: 8 May 2008, 15:32:48 UTC - in response to Message 52913.  
Last modified: 8 May 2008, 15:37:21 UTC

I get a crash when I detach from the project.
I'm not sure if this is a minirosetta bug.
Log messages seem to show that minirosetta was running when the crash occurred.

It is quite possible (and logical IMO) that the client forcibly terminates all related processes upon detach. Otherwise it could not clean up client_state.xml, slots/ and projects/.

Peter

I'm new to BOINC so I don't know how the detach operation is handled.

I don't use the gui manager and boinc_client appears to be the only BOINC related process running:

$ ps -e | grep boinc
6279 ? 00:00:05 boinc_client

Anyway killing related processes should not generate segmentation faults, so it's clearly an error in boinc_client.
I don't know if it has anything to do with minirosetta though.

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 52915 - Posted: 8 May 2008, 15:42:05 UTC - in response to Message 52914.  

I get a crash when I detach from the project.
I'm not sure if this is a minirosetta bug.
Log messages seem to show that minirosetta was running when the crash occurred.

It is quite possible (and logical IMO) that the client forcibly terminates all related processes upon detach. Otherwise it could not clean up client_state.xml, slots/ and projects/.

I'm new to BOINC so I don't know how the detach operation is handled.

Anyway killing related processes should not generate segmentation faults, so it's clearly an error in boinc_client.

I'm sorry, you are right. I was thinking of Rosetta crashing and overlooked that it was actually the client that crashed. Of course it should not. (And the application should also exit cleanly if asked to by the client.)

I don't know if it has anything to do with minirosetta though.

It should not. Which client, 5.10.45?

Peter

radu
Joined: 7 May 08
Posts: 4
Credit: 66,301
RAC: 0
Message 52916 - Posted: 8 May 2008, 15:45:30 UTC - in response to Message 52915.  

It should not. Which client, 5.10.45?

yes, 5.10.45

Rob
Joined: 16 Oct 06
Posts: 3
Credit: 121,375
RAC: 0
Message 52917 - Posted: 8 May 2008, 18:55:53 UTC

Someone forgot to post the Minirosetta 1.19 details on the version thread.

Alexander Klauer
Joined: 10 Mar 08
Posts: 3
Credit: 110,308
RAC: 0
Message 52933 - Posted: 9 May 2008, 8:35:22 UTC

Hi, I switched off my computer yesterday, in the middle (maybe 60%) of a task. When I switched it back on today, I got

Fri 09 May 2008 09:51:30 AM CEST|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 762923; location: (none); project prefs: default
Fri 09 May 2008 09:51:31 AM CEST|rosetta@home|Restarting task fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0 using minirosetta version 119
Fri 09 May 2008 09:52:00 AM CEST|rosetta@home|Computation for task fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0 finished
Fri 09 May 2008 09:52:01 AM CEST|rosetta@home|Starting lambda_repressor_folding_3191_8370_0
Fri 09 May 2008 09:52:01 AM CEST|rosetta@home|Starting task lambda_repressor_folding_3191_8370_0 using rosetta_beta version 596
Fri 09 May 2008 09:52:03 AM CEST|rosetta@home|Started upload of fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0_0
Fri 09 May 2008 09:52:14 AM CEST|rosetta@home|Finished upload of fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0_0

so the task finished virtually immediately after restart.

When I switched my computer on yesterday morning, I also had some task crunching at 0%. Back then I believed an old task had been restarted from the beginning due to some fluke, but now it seems more likely that the same thing happened then as today. To me, it seems too much of a coincidence that a task interrupted in the middle finished immediately after resuming, twice in a row.

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 52937 - Posted: 9 May 2008, 11:54:47 UTC - in response to Message 52910.  

Really

All access violations


https://boinc.bakerlab.org/rosetta/result.php?resultid=161740698

https://boinc.bakerlab.org/rosetta/result.php?resultid=160201341

https://boinc.bakerlab.org/rosetta/result.php?resultid=159794241

https://boinc.bakerlab.org/rosetta/result.php?resultid=160129454

https://boinc.bakerlab.org/rosetta/result.php?resultid=160185394

https://boinc.bakerlab.org/rosetta/result.php?resultid=161332559

https://boinc.bakerlab.org/rosetta/result.php?resultid=159408171

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 17 Sep 05
Posts: 18
Credit: 40,071
RAC: 0
Message 52961 - Posted: 10 May 2008, 4:10:29 UTC - in response to Message 52937.  
Last modified: 10 May 2008, 4:10:59 UTC


All access violations

https://boinc.bakerlab.org/rosetta/result.php?resultid=161740698
https://boinc.bakerlab.org/rosetta/result.php?resultid=160201341
https://boinc.bakerlab.org/rosetta/result.php?resultid=159794241
https://boinc.bakerlab.org/rosetta/result.php?resultid=160129454
https://boinc.bakerlab.org/rosetta/result.php?resultid=160185394
https://boinc.bakerlab.org/rosetta/result.php?resultid=161332559
https://boinc.bakerlab.org/rosetta/result.php?resultid=159408171


All those crashes are a result of an out of memory error.
----- Rom
My Blog

Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 52967 - Posted: 10 May 2008, 6:48:21 UTC
Last modified: 10 May 2008, 6:48:45 UTC

My latest weirdness

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
]]>

resultid=161607307
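
When a result carries a `<message>` element like this instead of a `<stderr_txt>` block, the text can be pulled out for sorting failed results by cause. A minimal Python sketch, assuming the element appears at most once and is laid out as in the sample above; `extract_message` is a hypothetical helper name.

```python
import re

def extract_message(stderr_out):
    """Return the text inside <message>...</message>, or None if absent."""
    m = re.search(r"<message>\s*(.*?)\s*</message>", stderr_out, re.DOTALL)
    return m.group(1) if m else None

SAMPLE = """\
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
]]>
"""
msg = extract_message(SAMPLE)
print(msg)
```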



Quidgydog
Joined: 28 Sep 06
Posts: 3
Credit: 499,462
RAC: 0
Message 52969 - Posted: 10 May 2008, 8:22:42 UTC
Last modified: 10 May 2008, 8:24:56 UTC

Having exactly the same issue as I was having with the v1.15 WU. WU just sits there, CPU time not running, no progress.

Log file......


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C82A714 read attempt to address 0x00D767E5

Engaging BOINC Windows Runtime Debugger...


I'm detaching this computer until this is resolved.

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 52970 - Posted: 10 May 2008, 9:49:08 UTC - in response to Message 52961.  


All access violations

https://boinc.bakerlab.org/rosetta/result.php?resultid=161740698
https://boinc.bakerlab.org/rosetta/result.php?resultid=160201341
https://boinc.bakerlab.org/rosetta/result.php?resultid=159794241
https://boinc.bakerlab.org/rosetta/result.php?resultid=160129454
https://boinc.bakerlab.org/rosetta/result.php?resultid=160185394
https://boinc.bakerlab.org/rosetta/result.php?resultid=161332559
https://boinc.bakerlab.org/rosetta/result.php?resultid=159408171


All those crashes are a result of an out of memory error.



With 4 GB of memory, what do I do to put it right?

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 52972 - Posted: 10 May 2008, 10:09:31 UTC - in response to Message 52970.  
Last modified: 10 May 2008, 10:10:08 UTC

All those crashes are a result of an out of memory error.

With 4 GB of memory, what do I do to put it right?

You can run out of memory even with 64 GB of RAM... (Do you know the saying about 64 KB of RAM?)

How much pagefile do you have available there? Is there any other memory load, such as other projects' applications that are preempted and waiting in memory? Take an occasional look at the Performance tab in Task Manager: what are the Commit Charge values like? If the Total (or Peak) ever reaches the Limit, that's it. You're running at least 7 projects on the host; each Rosetta task can require up to 600-900 MB, CPDN at least some 200-300 MB, other projects need something as well, and it is a quad...
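
The back-of-the-envelope estimate above can be written out as a small Python sketch. The per-task ranges are the rough figures quoted in this post; the task mix (four Rosetta tasks plus one CPDN task held in memory) and the helper name `commit_estimate` are assumptions for illustration only.

```python
# Rough per-task memory figures from the post, in MB: (low, high).
ROSETTA_MB = (600, 900)
CPDN_MB = (200, 300)

def commit_estimate(n_rosetta, n_cpdn, other_mb=0):
    """Return (low, high) estimated memory demand in MB for a given task mix."""
    low = n_rosetta * ROSETTA_MB[0] + n_cpdn * CPDN_MB[0] + other_mb
    high = n_rosetta * ROSETTA_MB[1] + n_cpdn * CPDN_MB[1] + other_mb
    return low, high

# Hypothetical mix on a quad-core host with 4 GB (4096 MB) of RAM.
RAM_MB = 4096
low, high = commit_estimate(n_rosetta=4, n_cpdn=1)
print(low, high, RAM_MB - high)
```

In this assumed worst case the BOINC tasks alone would need about 3900 MB, leaving under 200 MB of physical RAM for the operating system and everything else, so the pagefile would be doing a lot of the work.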

Peter

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 52973 - Posted: 10 May 2008, 10:30:52 UTC - in response to Message 52972.  

All those crashes are a result of an out of memory error.

With 4 GB of memory, what do I do to put it right?

You can run out of memory even with 64 GB of RAM... (Do you know the saying about 64 KB of RAM?)

How much pagefile do you have available there? Is there any other memory load, such as other projects' applications that are preempted and waiting in memory? Take an occasional look at the Performance tab in Task Manager: what are the Commit Charge values like? If the Total (or Peak) ever reaches the Limit, that's it. You're running at least 7 projects on the host; each Rosetta task can require up to 600-900 MB, CPDN at least some 200-300 MB, other projects need something as well, and it is a quad...

Peter



Yes, I understand, but my commit charge is a fraction of my available charge, 10% at the moment. I have increased my page file to 6 GB, with a total memory of 4 GB, on Win XP Pro 64.

It just strikes me that the very knowledgeable Rom is arrogant enough to point to the cause without indicating any sort of a solution.

alpha
Joined: 4 Nov 06
Posts: 27
Credit: 1,550,107
RAC: 0
Message 52974 - Posted: 10 May 2008, 13:58:09 UTC

This work unit finished earlier than expected, but with no errors:

https://boinc.bakerlab.org/rosetta/result.php?resultid=161362748

Claimed 130.48, granted 32.86. :(

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 52977 - Posted: 10 May 2008, 15:04:09 UTC

Fat Loss, I'm guessing that the error is an indication that the task grew to exceed the maximum memory it was configured for, and so was terminated by BOINC. So, regardless of your machine's physical configuration, the % of memory allowed to BOINC, etc., it still would have failed. That would tend to indicate a logic problem in Mini, or perhaps a task that should have been created with a higher memory maximum.

We'll have to wait to see what DK finds.
Rosetta Moderator: Mod.Sense

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 17 Sep 05
Posts: 18
Credit: 40,071
RAC: 0
Message 52979 - Posted: 10 May 2008, 15:34:13 UTC - in response to Message 52973.  


It just strikes me that the very knowledgeable Rom is arrogant enough to point to the cause without indicating any sort of a solution.


In this particular case there isn't anything that any of us can do; I've passed the info on to the MiniRosetta devs. Basically, MiniRosetta is a 32-bit process, and 32-bit processes are generally limited to 2 GB of user-mode memory. MiniRosetta hit that limit, so when it asked for more the OS said NO, leading to the crash.

The sign that this sort of problem has occurred is:
LoadLibraryA( dbghelp.dll ): GetLastError = 8

and
- Virtual Memory Usage -
VirtualSize: 2127511552, PeakVirtualSize: 2127511552
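
The two signs above can be checked together in a few lines of Python. This is a sketch only: it assumes the dump contains the `GetLastError = 8` line (error 8 is ERROR_NOT_ENOUGH_MEMORY on Windows) and a `PeakVirtualSize` value in bytes exactly as quoted, and the 95% threshold and the function name are arbitrary illustrative choices.

```python
import re

# Default user-mode address space for a 32-bit Windows process: 2 GiB.
USER_SPACE_LIMIT = 2 * 1024**3

def address_space_exhausted(dump, threshold=0.95):
    """Return (flag, peak_bytes); flag is True when both signs are present."""
    out_of_memory = "GetLastError = 8" in dump
    m = re.search(r"PeakVirtualSize: (\d+)", dump)
    peak = int(m.group(1)) if m else 0
    near_limit = peak >= threshold * USER_SPACE_LIMIT
    return out_of_memory and near_limit, peak

DUMP = """\
LoadLibraryA( dbghelp.dll ): GetLastError = 8
- Virtual Memory Usage -
VirtualSize: 2127511552, PeakVirtualSize: 2127511552
"""
flag, peak = address_space_exhausted(DUMP)
headroom_mib = (USER_SPACE_LIMIT - peak) / 1024**2
print(flag, round(peak / USER_SPACE_LIMIT, 3), round(headroom_mib, 1))
```

For the values quoted here the process had used about 99% of the 2 GiB range, with roughly 19 MiB left, which fits the picture of an allocation failing just before the crash.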


Sorry for not explaining the situation sooner. I was heading for bed and had started thinking about how I was going to help the devs debug this problem in the wild if they are unable to reproduce the issue in the lab.

At present there isn't anything in the BOINC application framework that'll help them debug this in the wild.



----- Rom
My Blog
©2024 University of Washington
https://www.bakerlab.org