Problems with version 5.96

Message boards : Number crunching : Problems with version 5.96

lrs5

Send message
Joined: 2 Mar 08
Posts: 1
Credit: 1,304,464
RAC: 0
Message 52836 - Posted: 2 May 2008, 13:41:39 UTC

Came into work this morning and saw that Rosetta had crashed. Running Vista SP1.

The traceback was:

rosetta_beta_5.96_windows_intelx86.exe!00c50364()
[Frames below may be incorrect and/or missing, no symbols loaded for rosetta_beta_5.96_windows_intelx86.exe]
kernel32.dll!_HeapFree@12() + 0x14 bytes
rosetta_beta_5.96_windows_intelx86.exe!00c41a59()
rosetta_beta_5.96_windows_intelx86.exe!00402192()
rosetta_beta_5.96_windows_intelx86.exe!00b985f8()
kernel32.dll!_GetThreadContext@8()
ntdll.dll!@RtlpLowFragHeapAllocFromContext@8() + 0x125 bytes
ntdll.dll!_RtlAllocateHeap@12() + 0xaf bytes
528b5850()

The error was an addressing exception, trying to reference location 0x80000000.

The registers were:

EAX = 80000000 EBX = 00DE6619 ECX = 7FFFFFFE EDX = 774F9A73 ESI = 00000000 EDI = 80000000 EIP = 00C50364 ESP = 022FF718 EBP = 022FF7A4 EFL = 00010202

The code around the error was:

00C50336 8B 4D DC mov ecx,dword ptr [ebp-24h]
00C50339 C6 01 30 mov byte ptr [ecx],30h
00C5033C 40 inc eax
00C5033D EB 32 jmp 00C50371
00C5033F 49 dec ecx
00C50340 66 39 30 cmp word ptr [eax],si
00C50343 74 06 je 00C5034B
00C50345 40 inc eax
00C50346 40 inc eax
00C50347 3B CE cmp ecx,esi
00C50349 75 F4 jne 00C5033F
00C5034B 2B 45 DC sub eax,dword ptr [ebp-24h]
00C5034E D1 F8 sar eax,1
00C50350 EB 1F jmp 00C50371
00C50352 3B FE cmp edi,esi
00C50354 75 08 jne 00C5035E
00C50356 A1 A8 4E F6 00 mov eax,dword ptr ds:[00F64EA8h]
00C5035B 89 45 DC mov dword ptr [ebp-24h],eax
00C5035E 8B 45 DC mov eax,dword ptr [ebp-24h]
00C50361 EB 07 jmp 00C5036A
00C50363 49 dec ecx
>>> 00C50364 80 38 00 cmp byte ptr [eax],0
00C50367 74 05 je 00C5036E
00C50369 40 inc eax
00C5036A 3B CE cmp ecx,esi
00C5036C 75 F5 jne 00C50363
00C5036E 2B 45 DC sub eax,dword ptr [ebp-24h]
00C50371 89 45 D8 mov dword ptr [ebp-28h],eax
00C50374 83 7D B0 00 cmp dword ptr [ebp-50h],0
00C50378 0F 85 FB 00 00 00 jne 00C50479
00C5037E 8B 45 E8 mov eax,dword ptr [ebp-18h]
00C50381 A8 40 test al,40h
00C50383 74 25 je 00C503AA
00C50385 66 A9 00 01 test ax,100h
00C50389 74 06 je 00C50391
00C5038B C6 45 C8 2D mov byte ptr [ebp-38h],2Dh
00C5038F EB 12 jmp 00C503A3
00C50391 A8 01 test al,1
00C50393 74 06 je 00C5039B
00C50395 C6 45 C8 2B mov byte ptr [ebp-38h],2Bh
00C50399 EB 08 jmp 00C503A3
00C5039B A8 02 test al,2
00C5039D 74 0B je 00C503AA
00C5039F C6 45 C8 20 mov byte ptr [ebp-38h],20h
00C503A3 C7 45 C4 01 00 00 00 mov dword ptr [ebp-3Ch],1
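
Editorial note: the instructions around the faulting address (00C50363 to 00C5036E) read like a bounded scan for a terminating NUL byte. ECX appears to be a remaining-length counter, EAX walks the string, and the length is obtained by subtracting the start pointer saved at [ebp-24h]. The Python sketch below models that loop under those assumptions. The names `bounded_len`, `read_byte`, `memory` and `AccessViolation` are hypothetical and do not come from the Rosetta source, and treating every address at or above 0x80000000 as unreadable assumes the default 2 GB user-mode address space of 32-bit Windows.

```python
class AccessViolation(Exception):
    """Raised when the model reads an address it treats as unmapped."""


# Assumption: default 2 GB user/kernel split on 32-bit Windows.
USER_SPACE_LIMIT = 0x80000000


def read_byte(memory, addr):
    """Read one byte from a dict that maps addresses to byte values."""
    if addr >= USER_SPACE_LIMIT or addr not in memory:
        raise AccessViolation(f"cannot read 0x{addr:08X}")
    return memory[addr]


def bounded_len(memory, start, max_count):
    """Model of the loop at 00C50363..00C5036E (hypothetical reconstruction)."""
    eax, ecx = start, max_count
    while ecx != 0:                      # cmp ecx,esi / jne (ESI holds 0)
        ecx -= 1                         # dec ecx
        if read_byte(memory, eax) == 0:  # cmp byte ptr [eax],0 / je
            break
        eax += 1                         # inc eax
    return eax - start                   # sub eax,dword ptr [ebp-24h]


# A valid, NUL-terminated string at a made-up address.
memory = {0x1000 + i: b for i, b in enumerate(b"abc\0")}
print(bounded_len(memory, 0x1000, 0x7FFFFFFF))
```

If the counter started at 0x7FFFFFFF and the start pointer was 0x80000000, the very first read would fail after a single decrement, which would be consistent with ECX = 7FFFFFFE and EAX = 80000000 in the register dump.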


The data around the registers was:

*EBX =
0x00DE6619 4b 65 72 6e 65 6c 20 54 69 6d 65 3a 20 25 66 2c 20 55 73 65 72 20 54 69 6d 65 3a 20 25 66 2c 20 Kernel Time: %f, User Time: %f,
0x00DE6639 57 61 69 74 20 54 69 6d 65 3a 20 25 66 0a 0a 00 00 00 00 50 72 69 6f 72 69 74 79 3a 20 00 00 42 Wait Time: %f......Priority: ..B
0x00DE6659 61 73 65 20 50 72 69 6f 72 69 74 79 3a 20 00 57 61 69 74 20 52 65 61 73 6f 6e 3a 20 00 00 00 4f ase Priority: .Wait Reason: ...O
0x00DE6679 70 65 6e 54 68 72 65 61 64 00 00 54 68 72 65 61 64 33 32 4e 65 78 74 00 00 00 00 54 68 72 65 61 penThread..Thread32Next....Threa
0x00DE6699 64 33 32 46 69 72 73 74 00 00 00 43 72 65 61 74 65 54 6f 6f 6c 68 65 6c 70 33 32 53 6e 61 70 73 d32First...CreateToolhelp32Snaps


*EDX =
0x774F9A73 e5 5d c3 8b ff 89 44 24 04 89 5c 24 08 e9 e9 49 fe ff 8d a4 24 00 00 00 00 8d 64 24 00 8b d4 0f å]Ã.ÿ.D$..$.ééIþÿ.¤$.....d$..Ô.
0x774F9A93 34 c3 8d a4 24 00 00 00 00 8d 64 24 00 8d 54 24 08 cd 2e c3 90 55 8b ec 8d a4 24 30 fd ff ff 54 4Ã.¤$.....d$..T$.Í.Ã.U.ì.¤$0ýÿÿT
0x774F9AB3 e8 53 01 00 00 8b 55 04 8b 45 08 83 84 24 c4 00 00 00 04 89 50 0c c7 04 24 07 00 01 00 8b cc 6a èS....U..E.ƒ.$Ä.....P.Ç.$.....Ìj
0x774F9AD3 01 51 ff 75 08 e8 6b f1 ff ff 50 e8 02 00 00 00 cc 90 55 8b ec 8d a4 24 e0 fc ff ff 54 e8 16 01 .Qÿu.èkñÿÿPè....Ì.U.ì.¤$àüÿÿTè..
0x774F9AF3 00 00 83 84 24 c4 00 00 00 04 8d 8c 24 d0 02 00 00 8b 45 04 c7 04 24 07 00 01 00 89 41 0c 83 61 ..ƒ.$Ä.....Œ$Ð....E.Ç.$.....A.ƒa

*(ESP - 0x40)
0x022FF6D8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................................
0x022FF6F8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................................
0x022FF718 00 00 00 00 a0 47 f6 00 00 00 00 00 01 00 00 00 54 fa 2f 02 00 00 00 00 07 00 00 00 01 00 00 00 .... Gö.........Tú/.............
0x022FF738 d8 f7 2f 02 00 00 00 00 70 fb 58 02 90 17 38 02 70 cd 4b 02 01 00 00 00 00 00 00 00 00 00 00 00 Ø÷/.....pûX...8.pÍK.............
0x022FF758 00 00 00 00 19 66 de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 66 00 00 00 a0 47 f6 00 .....fÞ.................f... Gö.

*(EBP - 0x40)
0x022FF764 00 00 00 00 00 00 00 00 00 00 00 00 66 00 00 00 a0 47 f6 00 04 fa 2f 02 1a 00 00 00 00 00 00 80 ............f... Gö..ú/........€
0x022FF784 ff ff ff ff 00 00 00 73 00 00 00 00 0f 61 de 00 00 00 00 00 00 00 00 00 00 00 00 00 79 00 00 00 ÿÿÿÿ...s.....aÞ.............y...
0x022FF7A4 92 01 00 00 a0 47 f6 00 54 fa 2f 02 07 00 00 00 bd f9 2f 02 f9 ff ff ff 40 00 00 00 40 00 00 00 ’... Gö.Tú/......ù/.ùÿÿÿ@...@...
0x022FF7C4 00 00 00 00 00 00 00 00 00 00 00 00 29 00 00 00 a0 47 f6 00 54 fa 2f 02 00 00 00 00 0b 00 09 03 ............)... Gö.Tú/.........
0x022FF7E4 00 00 00 00 08 f8 2f 00 00 00 00 00 c0 01 00 00 e0 86 e1 06 01 00 00 00 40 9c e1 06 14 f8 2f 02 .....ø/.....À...à.á.....@œá..ø/.
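
Editorial note: the disassembly loads the scanned pointer from [ebp-24h], and that stack slot is visible in the dump row that starts at 0x022FF764. The sketch below shows one way to read it, assuming little-endian dwords (as on x86) and the EBP value from the register dump. `read_dword` is a hypothetical helper written for this note.

```python
import struct

# Dump row copied from the *(EBP - 0x40) section of the post.
ROW_BASE = 0x022FF764
ROW_HEX = (
    "00 00 00 00 00 00 00 00 00 00 00 00 66 00 00 00 "
    "a0 47 f6 00 04 fa 2f 02 1a 00 00 00 00 00 00 80"
)


def read_dword(row_base, row_hex, addr):
    """Return the little-endian 32-bit value stored at `addr` in one dump row."""
    data = bytes.fromhex(row_hex)
    offset = addr - row_base
    if not 0 <= offset <= len(data) - 4:
        raise ValueError("address outside this dump row")
    return struct.unpack_from("<I", data, offset)[0]


EBP = 0x022FF7A4  # from the register dump
value = read_dword(ROW_BASE, ROW_HEX, EBP - 0x24)
print(f"[ebp-24h] = 0x{value:08X}")
```

Under these assumptions the slot holds 0x80000000, the same value as EAX and EDI. That suggests the pointer handed to the scan was already invalid, rather than the scan running off the end of a valid string.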

Hope that helps to track down the problem
ID: 52836
jasong

Send message
Joined: 7 Apr 08
Posts: 1
Credit: 96,122
RAC: 0
Message 52838 - Posted: 2 May 2008, 19:34:42 UTC

I'm running BOINC 6.1.16 on 32-bit Vista, and the rosetta_beta version 5.96 graphics won't close. If I close the graphics window, it closes and then automatically pops back up and starts initializing again.
ID: 52838
ramostol

Send message
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 52846 - Posted: 3 May 2008, 12:17:36 UTC - in response to Message 52831.  



This f1_atpase_beta_relax_31904_9833_1 is still alive and kicking after 45 1/2 hours of computing. The graphics are folding, the steps are increasing (presently at about model 1 step 2435), but it is enjoying the experience too much to complete. I confess my curiosity is roused, but now I have to evaluate the situation...



For those interested in statistics: The first model of this task needed 45 h 52 min to complete. With no checkpointing. I admit my laptop is small and elderly, but I believe that this is no task for unprepared crunchers.
ID: 52846
Profile (_KoDAk_)

Send message
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 52848 - Posted: 3 May 2008, 17:25:38 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=158175460
Validate state Invalid
ID: 52848
Profile Blacksun
Avatar

Send message
Joined: 2 May 07
Posts: 2
Credit: 1,284,699
RAC: 0
Message 52850 - Posted: 3 May 2008, 18:05:59 UTC

Task ID 160279211: the upload is stuck at 29.29%. All other WUs are uploading!
ID: 52850
Thomas Leibold

Send message
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 52874 - Posted: 5 May 2008, 19:58:28 UTC

Workunit 1wit__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1wit_-vf__2589_100944_0
crashed with Segmentation Violation and has been stuck in this state for the last week. I'm going to abort it now.
Team Helix
ID: 52874
Profile (_KoDAk_)

Send message
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 53056 - Posted: 14 May 2008, 17:04:42 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=162134741
https://boinc.bakerlab.org/rosetta/result.php?resultid=161404131
https://boinc.bakerlab.org/rosetta/result.php?resultid=161739079

ID: 53056
Profile David Emigh
Avatar

Send message
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 53325 - Posted: 25 May 2008, 6:21:41 UTC
Last modified: 25 May 2008, 6:22:09 UTC

OUCH!

Yes boys and girls, this is a 24 hour validate error.

resultid=165956131

I hate it when this happens...
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!
ID: 53325
AMD_is_logical

Send message
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 53339 - Posted: 26 May 2008, 3:43:32 UTC

I notice that some WUs are creating stdout.txt files over 10MB. For example:
t397__CASP8_JUMPAB_SAMPLE2_res1to79_SAVE_ALL_OUT_BARCODE_hom001__3500_3605
t397__CASP8_JUMPAB_SAMPLE1_res1to79_SAVE_ALL_OUT_BARCODE_hom001__3499_1937

The stdout.txt file contained a lot of lines like:

...
res 11 and var 1 at position 1 is not a proper Nterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
...

The WU t397__CASP8_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_1to69_hom001__3343_21413
had lines like the above, but also had lines like:

...
res 13and var 1 at position 69 is not a proper Cterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
...

and sometimes it interleaved them:

...
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
...

The WUs seem to have finished normally, as far as I can tell.
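
Editorial note: to gauge how much of a large stdout.txt is taken up by repeated messages like these, one can count the distinct lines. This is a minimal sketch; `summarize_log` is a hypothetical helper, and the sample lines are copied from the excerpts above.

```python
from collections import Counter


def summarize_log(lines, top=5):
    """Return the `top` most frequent non-empty lines with their counts."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(top)


sample = [
    "res 11 and var 1 at position 1 is not a proper Nterm variant\n",
    "res 13and var 1 at position 69 is not a proper Cterm variant\n",
    "res 11 and var 1 at position 1 is not a proper Nterm variant\n",
]

for text, count in summarize_log(sample):
    print(f"{count:8d}  {text}")
```

The function accepts any iterable of lines, so an open file object for stdout.txt can be passed in directly and the file is streamed rather than loaded into memory at once.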
ID: 53339
Profile hedera
Avatar

Send message
Joined: 15 Jul 06
Posts: 76
Credit: 5,263,150
RAC: 87
Message 53341 - Posted: 26 May 2008, 4:01:16 UTC

Back from May 19 (I've been busy) I found today that I have 3 little tasks that never finished:

05/19/08 task 164834608 WU 150659671 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it01_3335_370_0

05/19/08 task 164834608 WU 150659671 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it19_3335_672

05/19/08 task 164837987 WU 150662996 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it13_3335_525

They're all sitting in my task list labeled "In progress." They aren't running on my computer; I have no idea what happened to them. They all had a reporting deadline of 5/29. Any idea what's going on?

--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 53341
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53350 - Posted: 26 May 2008, 13:37:07 UTC

25 May: crash and burn with a compute error:

Task ID 165078648
Name 1louA_BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1louA-vf__2568_110106_0
Workunit 150889023
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
13521.31
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
trouble finding Rama_smooth_dyn.dat_ss_6.4
ERROR:: Exit from: .read_paths.cc line: 360

</stderr_txt>
]]>
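
Editorial note: when comparing many failed results, it can help to pull the individual fields out of a "stderr out" block like the one above. This is a rough sketch that uses regular expressions rather than an XML parser, because the pasted fragment is not a well-formed XML document on its own. `extract_fields` is a hypothetical helper, and the sample text is copied from this post.

```python
import re


def extract_fields(stderr_out):
    """Extract the tagged sections from a pasted BOINC 'stderr out' block."""
    fields = {}
    for tag in ("core_client_version", "message", "stderr_txt"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", stderr_out, re.DOTALL)
        if match:
            fields[tag] = match.group(1).strip()
    return fields


sample = """<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
trouble finding Rama_smooth_dyn.dat_ss_6.4
ERROR:: Exit from: .read_paths.cc line: 360

</stderr_txt>
]]>
"""

fields = extract_fields(sample)
for name, text in fields.items():
    print(f"{name}: {text!r}")
```

Sections that are missing from a block (for example, a result with a message but no stderr_txt) are simply left out of the returned dictionary.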



ID: 53350
Profile caesar1987
Avatar

Send message
Joined: 28 Nov 06
Posts: 13
Credit: 22,268
RAC: 0
Message 53418 - Posted: 28 May 2008, 18:25:49 UTC

error

https://boinc.bakerlab.org/rosetta/result.php?resultid=167013698
ID: 53418
GT82 [HWU]

Send message
Joined: 26 Aug 07
Posts: 15
Credit: 154,103
RAC: 0
Message 53452 - Posted: 30 May 2008, 8:47:50 UTC

I can't crunch any WUs, as you can see here: https://boinc.bakerlab.org/rosetta/results.php?userid=201085

Where is the problem?? :(


ID: 53452
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 53458 - Posted: 30 May 2008, 12:36:18 UTC - in response to Message 53452.  

It's a CASP8 file, and according to your information it generates this error:
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Can't link input file
</message>

This also happened on the other computer that tried to run this file.
Something is wrong with that batch of files.

Just keep on crunching, it's not your machine.




ID: 53458
DJStarfox

Send message
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 53525 - Posted: 2 Jun 2008, 22:22:55 UTC

I have reason to believe Rosetta is crashing my Fedora 7 box. After looking at the very brief logs, it appears my system spontaneously rebooted immediately after completing and then starting a Rosetta task. Where should I start getting to the bottom of this mystery?
ID: 53525
DevPao

Send message
Joined: 31 May 08
Posts: 1
Credit: 400
RAC: 0
Message 53527 - Posted: 3 Jun 2008, 16:14:56 UTC

I often get a Visual Studio 2005 popup offering to debug an "unhandled exception" in Rosetta 5.96.
Given that the PC runs other BOINC work smoothly, it's time for the authors to debug some of the "compute error" WUs. Most of these events happen after a few minutes of running, so they should be fairly easy to reproduce.
ID: 53527
DJStarfox

Send message
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 53529 - Posted: 3 Jun 2008, 22:32:01 UTC

Here is an example WU: Mine actually finished OK, so it says. But it CRASHED my Linux machine. This is bad.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=153288323
ID: 53529
Profile alpha

Send message
Joined: 4 Nov 06
Posts: 27
Credit: 1,550,107
RAC: 0
Message 53544 - Posted: 5 Jun 2008, 12:23:24 UTC

A strange validate error here:

https://boinc.bakerlab.org/rosetta/result.php?resultid=168411774

"This process generated 930 decoys from 930 attempts"

Yet, it ran the full length of time and got no credit. Any clues as to what happened? I even popped up the graphical interface for this work unit and saw it crunching happily along.
ID: 53544
AMD_is_logical

Send message
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 53546 - Posted: 5 Jun 2008, 13:50:55 UTC

Here's a couple of FRA_t401_CASP8_2PRV_2ICG_1_IGNORE_THE_RESTt401_1_aaT0401_2ICGA_13_0001_3601_* WUs that had validate errors:

https://boinc.bakerlab.org/rosetta/result.php?resultid=168453002
https://boinc.bakerlab.org/rosetta/result.php?resultid=168377188

They ran for the usual length of time and the stderr looks perfectly normal, yet they were marked "invalid". These WUs ran on two different Linux computers.
ID: 53546
BitSpit
Avatar

Send message
Joined: 5 Nov 05
Posts: 33
Credit: 4,147,344
RAC: 0
Message 53561 - Posted: 5 Jun 2008, 22:41:32 UTC
Last modified: 5 Jun 2008, 22:43:10 UTC

ID: 53561



©2024 University of Washington
https://www.bakerlab.org