minirosetta v1.19 bug thread

Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,729
RAC: 2,242
Message 53118 - Posted: 18 May 2008, 3:48:05 UTC
Last modified: 18 May 2008, 3:48:26 UTC

huge debug dump on this task: rb_05_16_11639_20372_T0405_IGNORE_THE_REST_08_11_3323_227_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=164159635

it completed most of its computing before hitting a big error: -1073741819 (0xc0000005)

CPU time 8536.469
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005321B4 read attempt to address 0x3FC662A7

Engaging BOINC Windows Runtime Debugger...
ID: 53118 · Rating: 0
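The exit code and the hex value quoted in the dump above are the same 32-bit number: -1073741819 is the signed reading of 0xc0000005, which Windows defines as STATUS_ACCESS_VIOLATION. A minimal sketch of the conversion in both directions; the function names `to_ntstatus` and `to_signed` are hypothetical helpers, not part of BOINC or Rosetta.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def to_signed(status: int) -> int:
    """Reinterpret an unsigned 32-bit status as a signed exit code."""
    return status - (1 << 32) if status & 0x80000000 else status


print(to_ntstatus(-1073741819))  # 0xc0000005
print(to_signed(0xC0000005))     # -1073741819
```

This can help when matching the decimal "Exit status" shown on a result page against the hex code printed in stderr.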
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 53119 - Posted: 18 May 2008, 5:19:03 UTC

rb_05_16_11639_20371_T0405_IGNORE_THE_REST_05_11_3322_57_0



<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
======================================================
DONE :: 1 starting structures 9119.05 cpu seconds
This process generated 4 decoys from 4 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>rb_05_16_11639_20371_T0405_IGNORE_THE_REST_05_11_3322_57_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Validate state Invalid
Claimed credit 23.6733820084193
Granted credit 0
application version 1.19
ID: 53119 · Rating: 0
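In the report above the science run itself finished normally (4 decoys, `called boinc_finish`); the failure is recorded only in the `<file_xfer_error>` element. A minimal sketch for pulling that element out of a pasted stderr fragment with a regular expression, since the fragment is not well-formed XML on its own. The helper name `find_xfer_errors` is hypothetical. As far as I know, -161 corresponds to ERR_NOT_FOUND in BOINC's error number list, but treat that as an assumption to check against the BOINC source.

```python
import re
from typing import List, Tuple

# Matches one <file_xfer_error> element as it appears in the pasted output.
FILE_XFER_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def find_xfer_errors(stderr_text: str) -> List[Tuple[str, int]]:
    """Return (file name, error code) pairs found in a stderr fragment."""
    return [(m["name"], int(m["code"])) for m in FILE_XFER_RE.finditer(stderr_text)]


sample = """<message>
<file_xfer_error>
<file_name>rb_05_16_11639_20371_T0405_IGNORE_THE_REST_05_11_3322_57_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>"""

errors = find_xfer_errors(sample)
for name, code in errors:
    print(f"{name}: transfer error {code}")
```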
Path7
Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 53123 - Posted: 18 May 2008, 9:33:37 UTC

Running Ubuntu 7.10 x86, this task:
1opd__BOINC_ABRELAX_SAVE_ALL_OUT_IGNORE_THE_REST-S25-11-S3-4--1opd_-_3252_12
ended with a validate error for me after 11,727.76 seconds, then ended successfully on the second run after 7,642.63 seconds, running on Windows XP Professional Edition.
I had switched my computer off while this WU was running (that has caused no issues before).

Have a nice day,
Path7.
ID: 53123 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,729
RAC: 2,242
Message 53177 - Posted: 19 May 2008, 20:09:09 UTC

Another long and scary debug thread here.
I think it happened because I was trying to install a USB card reader, which caused the system to go nuts.

It's here if you want to read the post-mortem:

rb_05_16_11639_20372_T0405_IGNORE_THE_REST_06_11_3323_425_0
ID: 53177 · Rating: 0
Scott McInness
Joined: 15 Mar 08
Posts: 1
Credit: 393,032
RAC: 0
Message 53197 - Posted: 20 May 2008, 14:59:01 UTC
Last modified: 20 May 2008, 15:00:35 UTC

I've just updated BOINC on a PC that I haven't used for BOINC for about 12 months (wow, there's an x64 version now!) and every work unit initiated with mini 1.19 x86_64 crashes after less than a second. It also seems to run as a 32-bit process...

165005856 - Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175
165012983 - Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175
165017550 - Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175
165018958 - Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175
165019984 - Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175

There is a Rosetta Beta 5.96 x86_64 task running atm (which is also running as a 32-bit process) just on 13% without problem, and SETI tasks (32-bit only) seem to work too.
ID: 53197 · Rating: 0
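The five results listed above all fail with the same signature, and in each one the faulting instruction address equals the address being read. That pattern usually suggests the process jumped to an address that is not mapped, rather than a bad data read, although the dump alone cannot prove it. A minimal sketch that parses such lines and groups them by signature; the regular expression is written against the wording in this thread, and `parse_violation` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern follows the wording of the BOINC Windows debugger lines quoted above.
AV_RE = re.compile(
    r"Access Violation \((?P<status>0x[0-9a-fA-F]+)\) "
    r"at address (?P<pc>0x[0-9a-fA-F]+) "
    r"(?P<op>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_violation(line):
    """Return the fault address, access type and target address, or None."""
    m = AV_RE.search(line)
    if m is None:
        return None
    return {"pc": int(m["pc"], 16), "op": m["op"], "target": int(m["target"], 16)}


fault = "Access Violation (0xc0000005) at address 0x73010175 read attempt to address 0x73010175"
lines = [f"{result_id} - {fault}" for result_id in
         (165005856, 165012983, 165017550, 165018958, 165019984)]

parsed = [v for v in map(parse_violation, lines) if v]
signatures = Counter((v["pc"], v["op"], v["target"]) for v in parsed)

for (pc, op, target), count in signatures.items():
    note = "fault on the instruction fetch itself" if pc == target else "data access"
    print(f"{count}x at 0x{pc:08X}: {op} of 0x{target:08X} ({note})")
```

Grouping by signature makes it easy to see that these are one repeatable crash rather than five unrelated ones.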
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 53204 - Posted: 20 May 2008, 22:11:57 UTC

This WU was marked "invalid" despite having a completely normal looking stderr.
ID: 53204 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 20
Message 53205 - Posted: 20 May 2008, 22:33:50 UTC

Task ID 164939423
Name rb_05_19_11641_20436_T0407_IGNORE_THE_REST_04_16_3332_224_0 had a Compute error

CPU time 4351.235
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
======================================================
DONE :: 1 starting structures 4351.19 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>rb_05_19_11641_20436_T0407_IGNORE_THE_REST_04_16_3332_224_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Validate state Invalid
Claimed credit 17.070482774889
Granted credit 0
application version 1.19

Is this a ghost WU?
Cheers
Speedy
Have a crunching good day!!
ID: 53205 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 53213 - Posted: 21 May 2008, 0:43:27 UTC - in response to Message 53204.  

This WU was marked "invalid" despite having a completely normal looking stderr.

Bad luck, I suppose it was because of the wrong WU settings:
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 1, 2, 1
errors: Too many error results Cancelled


IMHO you should not have got the task resent after your wingman failed - a task born to be cancelled?

Devs?

Peter
ID: 53213 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 20
Message 53219 - Posted: 21 May 2008, 8:00:30 UTC

This is not a bug. I was wondering: are there any plans to display which model the work unit is up to? Thanks for your hard work on this application, on behalf of all the crunchers.
Cheers
Speedy

Have a crunching good day!!
ID: 53219 · Rating: 0
nouqraz
Joined: 8 Apr 08
Posts: 6
Credit: 328,006
RAC: 52
Message 53225 - Posted: 21 May 2008, 12:05:54 UTC
Last modified: 21 May 2008, 12:13:26 UTC

One of my systems seems to be having issues running minirosetta v1.19 WUs. It is a 4 processor Intel Xeon CPU X3210 (two dual core chips) running Server 2003 R2.

It seems to be crunching through Rosetta Beta 5.96 WUs no problem, but when it goes to start a mini 1.19 WU, it switches the task to "running" but no CPU time is ever used and the task stays at 0%. If I suspend all of the mini 1.19 WUs that are queued up, the system immediately begins crunching on any Rosetta Beta 5.96 WUs without any problem. I have left the system sitting in the "running" @ 0% state on mini units for hours and it hasn't gotten anywhere; my only option seems to be to suspend or abort the work units.

I have two other machines - one an Intel P4, the other a Core 2 Quad 9300, both running XP - that seem to have no problems running mini or beta WUs.

Is it possible to get the client to not receive mini WUs? Or is there some known reason behind these stalled work units that there is a workaround for?

Thanks,
Adam
ID: 53225 · Rating: 0
Jeremy
Joined: 15 May 08
Posts: 13
Credit: 2,636
RAC: 0
Message 53248 - Posted: 21 May 2008, 20:34:42 UTC - in response to Message 53225.  

I have had nothing but Compute errors with the mini version of rosetta. See this page
https://boinc.bakerlab.org/rosetta/results.php?userid=259031

I'd rather only have the normal ones, for two reasons. One, it keeps giving errors, so the CPU time isn't put to use. Two, it doesn't have proper graphics, but I've read that that is not a priority.

I'd like to help debugging this application by sending whatever information you need.

Here is my host sheet.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=812509
ID: 53248 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,729
RAC: 2,242
Message 53294 - Posted: 23 May 2008, 9:37:05 UTC

5croA_BOINC_ABRELAX_SAVE_ALL_OUT_IGNORE_THE_REST-S25-7-S3-6--5croA-_3325_1_0 crashed and burned in a compute error.

Long error dump yet again.

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

ID: 53294 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,729
RAC: 2,242
Message 53351 - Posted: 26 May 2008, 13:44:56 UTC
Last modified: 26 May 2008, 13:47:09 UTC

another one:
h001__BOINC_ABRELAX_IGNORE_THE_REST-S25-11-S3-5--h001_-_3324_45140_0
Client error
Client state Done
Exit status -1073741819 (0xc0000005)
Computer ID 293392
Report deadline 30 May 2008 19:09:52 UTC
CPU time 19774.5
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005C3030 write attempt to address 0x00000004

Engaging BOINC Windows Runtime Debugger...


It did grant me credit, amazingly enough.
ID: 53351 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 53726 - Posted: 16 Jun 2008, 19:38:54 UTC - in response to Message 52916.  

It should not. Which client, 5.10.45?

yes, 5.10.45

The "crash on project detach" bug should be fixed in the next 6.2 release (changeset 15407).

Peter
ID: 53726 · Rating: 0