Message boards : Number crunching : Problems with version 5.96
lrs5 · Joined: 2 Mar 08 · Posts: 1 · Credit: 1,304,464 · RAC: 0
Came into work this morning and saw that Rosetta had crashed. Running Vista SP1. The traceback was:

rosetta_beta_5.96_windows_intelx86.exe!00c50364()
[Frames below may be incorrect and/or missing, no symbols loaded for rosetta_beta_5.96_windows_intelx86.exe]
kernel32.dll!_HeapFree@12() + 0x14 bytes
rosetta_beta_5.96_windows_intelx86.exe!00c41a59()
rosetta_beta_5.96_windows_intelx86.exe!00402192()
rosetta_beta_5.96_windows_intelx86.exe!00b985f8()
kernel32.dll!_GetThreadContext@8()
ntdll.dll!@RtlpLowFragHeapAllocFromContext@8() + 0x125 bytes
ntdll.dll!_RtlAllocateHeap@12() + 0xaf bytes
528b5850()

The error was an addressing exception, trying to reference location 0x80000000.

The registers were:
EAX = 80000000 EBX = 00DE6619 ECX = 7FFFFFFE EDX = 774F9A73 ESI = 00000000
EDI = 80000000 EIP = 00C50364 ESP = 022FF718 EBP = 022FF7A4 EFL = 00010202

The code around the error was:
00C50336 8B 4D DC mov ecx,dword ptr [ebp-24h]
00C50339 C6 01 30 mov byte ptr [ecx],30h
00C5033C 40 inc eax
00C5033D EB 32 jmp 00C50371
00C5033F 49 dec ecx
00C50340 66 39 30 cmp word ptr [eax],si
00C50343 74 06 je 00C5034B
00C50345 40 inc eax
00C50346 40 inc eax
00C50347 3B CE cmp ecx,esi
00C50349 75 F4 jne 00C5033F
00C5034B 2B 45 DC sub eax,dword ptr [ebp-24h]
00C5034E D1 F8 sar eax,1
00C50350 EB 1F jmp 00C50371
00C50352 3B FE cmp edi,esi
00C50354 75 08 jne 00C5035E
00C50356 A1 A8 4E F6 00 mov eax,dword ptr ds:[00F64EA8h]
00C5035B 89 45 DC mov dword ptr [ebp-24h],eax
00C5035E 8B 45 DC mov eax,dword ptr [ebp-24h]
00C50361 EB 07 jmp 00C5036A
00C50363 49 dec ecx
>>> 00C50364 80 38 00 cmp byte ptr [eax],0
00C50367 74 05 je 00C5036E
00C50369 40 inc eax
00C5036A 3B CE cmp ecx,esi
00C5036C 75 F5 jne 00C50363
00C5036E 2B 45 DC sub eax,dword ptr [ebp-24h]
00C50371 89 45 D8 mov dword ptr [ebp-28h],eax
00C50374 83 7D B0 00 cmp dword ptr [ebp-50h],0
00C50378 0F 85 FB 00 00 00 jne 00C50479
00C5037E 8B 45 E8 mov eax,dword ptr [ebp-18h]
00C50381 A8 40 test al,40h
00C50383 74 25 je 00C503AA
00C50385 66 A9 00 01 test ax,100h
00C50389 74 06 je 00C50391
00C5038B C6 45 C8 2D mov byte ptr [ebp-38h],2Dh
00C5038F EB 12 jmp 00C503A3
00C50391 A8 01 test al,1
00C50393 74 06 je 00C5039B
00C50395 C6 45 C8 2B mov byte ptr [ebp-38h],2Bh
00C50399 EB 08 jmp 00C503A3
00C5039B A8 02 test al,2
00C5039D 74 0B je 00C503AA
00C5039F C6 45 C8 20 mov byte ptr [ebp-38h],20h
00C503A3 C7 45 C4 01 00 00 00 mov dword ptr [ebp-3Ch],1

The data around the registers was:

*EBX =
0x00DE6619 4b 65 72 6e 65 6c 20 54 69 6d 65 3a 20 25 66 2c 20 55 73 65 72 20 54 69 6d 65 3a 20 25 66 2c 20 Kernel Time: %f, User Time: %f,
0x00DE6639 57 61 69 74 20 54 69 6d 65 3a 20 25 66 0a 0a 00 00 00 00 50 72 69 6f 72 69 74 79 3a 20 00 00 42 Wait Time: %f......Priority: ..B
0x00DE6659 61 73 65 20 50 72 69 6f 72 69 74 79 3a 20 00 57 61 69 74 20 52 65 61 73 6f 6e 3a 20 00 00 00 4f ase Priority: .Wait Reason: ...O
0x00DE6679 70 65 6e 54 68 72 65 61 64 00 00 54 68 72 65 61 64 33 32 4e 65 78 74 00 00 00 00 54 68 72 65 61 penThread..Thread32Next....Threa
0x00DE6699 64 33 32 46 69 72 73 74 00 00 00 43 72 65 61 74 65 54 6f 6f 6c 68 65 6c 70 33 32 53 6e 61 70 73 d32First...CreateToolhelp32Snaps

*EDX =
0x774F9A73 e5 5d c3 8b ff 89 44 24 04 89 5c 24 08 e9 e9 49 fe ff 8d a4 24 00 00 00 00 8d 64 24 00 8b d4 0f å]Ã.ÿ.D$..\$.ééIþÿ.¤$.....d$..Ô.
0x774F9A93 34 c3 8d a4 24 00 00 00 00 8d 64 24 00 8d 54 24 08 cd 2e c3 90 55 8b ec 8d a4 24 30 fd ff ff 54 4Ã.¤$.....d$..T$.Í.Ã.U.ì.¤$0ýÿÿT
0x774F9AB3 e8 53 01 00 00 8b 55 04 8b 45 08 83 84 24 c4 00 00 00 04 89 50 0c c7 04 24 07 00 01 00 8b cc 6a èS....U..E.ƒ.$Ä.....P.Ç.$.....Ìj
0x774F9AD3 01 51 ff 75 08 e8 6b f1 ff ff 50 e8 02 00 00 00 cc 90 55 8b ec 8d a4 24 e0 fc ff ff 54 e8 16 01 .Qÿu.èkñÿÿPè....Ì.U.ì.¤$àüÿÿTè..
0x774F9AF3 00 00 83 84 24 c4 00 00 00 04 8d 8c 24 d0 02 00 00 8b 45 04 c7 04 24 07 00 01 00 89 41 0c 83 61 ..ƒ.$Ä.....Œ$Ð....E.Ç.$.....A.ƒa

*(ESP - 0x40)
0x022FF6D8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................................
0x022FF6F8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................................
0x022FF718 00 00 00 00 a0 47 f6 00 00 00 00 00 01 00 00 00 54 fa 2f 02 00 00 00 00 07 00 00 00 01 00 00 00 .... Gö.........Tú/.............
0x022FF738 d8 f7 2f 02 00 00 00 00 70 fb 58 02 90 17 38 02 70 cd 4b 02 01 00 00 00 00 00 00 00 00 00 00 00 Ø÷/.....pûX...8.pÍK.............
0x022FF758 00 00 00 00 19 66 de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 66 00 00 00 a0 47 f6 00 .....fÞ.................f... Gö.

*(EBP - 0x40)
0x022FF764 00 00 00 00 00 00 00 00 00 00 00 00 66 00 00 00 a0 47 f6 00 04 fa 2f 02 1a 00 00 00 00 00 00 80 ............f... Gö..ú/........€
0x022FF784 ff ff ff ff 00 00 00 73 00 00 00 00 0f 61 de 00 00 00 00 00 00 00 00 00 00 00 00 00 79 00 00 00 ÿÿÿÿ...s.....aÞ.............y...
0x022FF7A4 92 01 00 00 a0 47 f6 00 54 fa 2f 02 07 00 00 00 bd f9 2f 02 f9 ff ff ff 40 00 00 00 40 00 00 00 ’... Gö.Tú/......ù/.ùÿÿÿ@...@...
0x022FF7C4 00 00 00 00 00 00 00 00 00 00 00 00 29 00 00 00 a0 47 f6 00 54 fa 2f 02 00 00 00 00 0b 00 09 03 ............)... Gö.Tú/.........
0x022FF7E4 00 00 00 00 08 f8 2f 00 00 00 00 00 c0 01 00 00 e0 86 e1 06 01 00 00 00 40 9c e1 06 14 f8 2f 02 .....ø/.....À...à.á.....@œá..ø/.

Hope that helps track down the problem.
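For readers trying to interpret the dump: the faulting instruction at 00C50364 (cmp byte ptr [eax],0) appears to sit inside a bounded string-length loop, where EAX walks a string, ECX counts down a length limit, and ESI holds 0. The Python sketch below mirrors that loop over a toy memory map, and also decodes the first *EBX row, which holds the text "Kernel Time: %f, User Time: %f, ". This is an outside reading of the machine code, not Rosetta's source; AccessViolation, read_byte, bounded_strlen and hex_to_ascii are hypothetical names. If the loop started with a limit of 0x7FFFFFFF (an assumption), the first decrement leaves 0x7FFFFFFE, which matches ECX in the dump, and the very first read at 0x80000000 faults.

```python
"""Hypothetical model of the loop at 00C50363..00C5036E in the crash dump."""


class AccessViolation(Exception):
    """Raised when the model reads an address that is not mapped."""


def read_byte(memory: dict[int, int], addr: int) -> int:
    # Unmapped addresses behave like the 0x80000000 access in the dump.
    if addr not in memory:
        raise AccessViolation(f"cannot read location 0x{addr:08X}")
    return memory[addr]


def bounded_strlen(memory: dict[int, int], start: int, limit: int) -> int:
    """Count bytes before a NUL terminator, reading at most `limit` bytes."""
    p = start           # EAX = dword ptr [ebp-24h]
    count = limit       # ECX holds the remaining count; ESI is 0
    while count != 0:   # cmp ecx,esi / jne 00C50363
        count -= 1      # dec ecx
        if read_byte(memory, p) == 0:  # cmp byte ptr [eax],0 / je 00C5036E
            break
        p += 1          # inc eax
    return p - start    # sub eax,dword ptr [ebp-24h]


def hex_to_ascii(hex_bytes: str) -> str:
    """Render printable ASCII bytes as characters and everything else as '.'."""
    data = bytes.fromhex(hex_bytes)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


if __name__ == "__main__":
    ebx_row = ("4b 65 72 6e 65 6c 20 54 69 6d 65 3a 20 25 66 2c "
               "20 55 73 65 72 20 54 69 6d 65 3a 20 25 66 2c 20")
    print(repr(hex_to_ascii(ebx_row)))  # 'Kernel Time: %f, User Time: %f, '

    text = {0x1000 + i: b for i, b in enumerate(b"Priority: \x00")}
    print(bounded_strlen(text, 0x1000, 0x7FFFFFFF))  # 10

    try:
        bounded_strlen(text, 0x80000000, 0x7FFFFFFF)
    except AccessViolation as exc:
        print(exc)  # cannot read location 0x80000000
```

Unlike the debugger's own rendering, hex_to_ascii prints only printable ASCII and shows every other byte as a dot, so high bytes such as 0xe5 appear as "." rather than "å".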
jasong · Joined: 7 Apr 08 · Posts: 1 · Credit: 96,122 · RAC: 0
I'm running BOINC 6.1.16 on 32-bit Vista, and the rosetta_beta 5.96 graphics won't close. When I close the graphics window, it closes, then automatically pops back up and starts initializing again.
ramostol · Joined: 6 Feb 07 · Posts: 64 · Credit: 584,052 · RAC: 0
For those interested in statistics: the first model of this task took 45 h 52 min to complete, with no checkpointing. I admit my laptop is small and elderly, but I don't think this is a task for unprepared crunchers.
(_KoDAk_) · Joined: 18 Jul 06 · Posts: 109 · Credit: 1,859,263 · RAC: 0
https://boinc.bakerlab.org/rosetta/result.php?resultid=158175460
Validate state: Invalid
Blacksun · Joined: 2 May 07 · Posts: 2 · Credit: 1,284,699 · RAC: 0
Task ID 160279211: the upload is stuck at 29.29%, while all my other WUs are uploading!
Thomas Leibold · Joined: 30 Jul 06 · Posts: 55 · Credit: 19,627,164 · RAC: 0
Workunit 1wit__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1wit_-vf__2589_100944_0 crashed with a segmentation violation and has been stuck in that state for the last week. I'm going to abort it now.

Team Helix
(_KoDAk_) · Joined: 18 Jul 06 · Posts: 109 · Credit: 1,859,263 · RAC: 0
https://boinc.bakerlab.org/rosetta/result.php?resultid=162134741
https://boinc.bakerlab.org/rosetta/result.php?resultid=161404131
https://boinc.bakerlab.org/rosetta/result.php?resultid=161739079
David Emigh · Joined: 13 Mar 06 · Posts: 158 · Credit: 417,178 · RAC: 0
OUCH! Yes, boys and girls, this is a 24-hour validate error: resultid=165956131. I hate it when this happens...

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!
AMD_is_logical · Joined: 20 Dec 05 · Posts: 299 · Credit: 31,460,681 · RAC: 0
I notice that some WUs are creating stdout.txt files over 10 MB. For example:

t397__CASP8_JUMPAB_SAMPLE2_res1to79_SAVE_ALL_OUT_BARCODE_hom001__3500_3605
t397__CASP8_JUMPAB_SAMPLE1_res1to79_SAVE_ALL_OUT_BARCODE_hom001__3499_1937

The stdout.txt file contained a lot of lines like:

...
res 11 and var 1 at position 1 is not a proper Nterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
...

The WU t397__CASP8_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_1to69_hom001__3343_21413 had lines like the above, but also had lines like:

...
res 13and var 1 at position 69 is not a proper Cterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
...

and sometimes it interleaved them:

...
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
res 11 and var 1 at position 1 is not a proper Nterm variant
res 13and var 1 at position 69 is not a proper Cterm variant
...

The WUs seem to have finished normally, as far as I can tell.
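A quick way to see which messages dominate a multi-megabyte stdout.txt like the ones above is to count identical lines. This is a minimal Python sketch assuming a plain-text file; summarize_repeats is a hypothetical helper, and the sample lines are copied from the excerpts in the post.

```python
from collections import Counter
from collections.abc import Iterable


def summarize_repeats(lines: Iterable[str], top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` most frequent non-blank lines with their counts.

    Trailing whitespace is stripped so that the same message with different
    line endings is counted as one entry. Ties keep first-seen order.
    """
    counts = Counter(line.rstrip() for line in lines if line.strip())
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        "res 11 and var 1 at position 1 is not a proper Nterm variant\n",
        "res 13and var 1 at position 69 is not a proper Cterm variant\n",
        "res 11 and var 1 at position 1 is not a proper Nterm variant\n",
        "res 13and var 1 at position 69 is not a proper Cterm variant\n",
        "res 11 and var 1 at position 1 is not a proper Nterm variant\n",
    ]
    for line, n in summarize_repeats(sample):
        print(f"{n:>6}  {line}")
```

To run it on a real file, pass the open file object, for example: with open("stdout.txt", errors="replace") as f: print(summarize_repeats(f)). Iterating the file line by line keeps memory use proportional to the number of distinct lines rather than the file size.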
hedera · Joined: 15 Jul 06 · Posts: 76 · Credit: 5,246,877 · RAC: 991
Back from May 19 (I've been busy), I found today that I have 3 little tasks that never finished:

05/19/08 task 164834608 WU 150659671 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it01_3335_370_0
05/19/08 task 164834608 WU 150659671 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it19_3335_672
05/19/08 task 164837987 WU 150662996 Rosetta Beta 5.96 BAK1b0o_loop_model_biased_it13_3335_525

They're all sitting in my task list labeled "In progress." They aren't running on my computer, and I have no idea what happened to them. They all had a reporting deadline of 5/29. Any idea what's going on?

--hedera
Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0
25 May: crash and burn with a compute error:

Task ID: 165078648
Name: 1louA_BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1louA-vf__2568_110106_0
Workunit: 150889023
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
13521.31
stderr out:
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
trouble finding Rama_smooth_dyn.dat_ss_6.4
ERROR:: Exit from: .read_paths.cc line: 360
</stderr_txt>
]]>
caesar1987 · Joined: 28 Nov 06 · Posts: 13 · Credit: 22,268 · RAC: 0
Error: https://boinc.bakerlab.org/rosetta/result.php?resultid=167013698
GT82 [HWU] · Joined: 26 Aug 07 · Posts: 15 · Credit: 154,103 · RAC: 0
I can't crunch any WUs, as you can see here: https://boinc.bakerlab.org/rosetta/results.php?userid=201085

Where is the problem?? :(
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0
GT82 [HWU] wrote: "I can't crunch any WUs, like you see here --> https://boinc.bakerlab.org/rosetta/results.php?userid=201085"

That's a CASP8 file, and according to your information it generates this error:

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Can't link input file
</message>

The same thing happened on the other computer that tried to run this file. Something is wrong with that batch of files. Just keep on crunching; it's not your machine.
DJStarfox · Joined: 19 Jul 07 · Posts: 145 · Credit: 1,250,162 · RAC: 0
I have reason to believe Rosetta is crashing my Fedora 7 box. After looking at the very brief logs, it appears my system spontaneously rebooted immediately after finishing one Rosetta task and starting the next. Where should I start to get to the bottom of this mystery?
DevPao · Joined: 31 May 08 · Posts: 1 · Credit: 400 · RAC: 0
I often get a Visual Studio 2005 popup offering to debug an "unhandled exception" in Rosetta 5.96. Given that the PC runs other BOINC work smoothly, it's time for the authors to debug some of the "compute error" WUs. Most of these events happen after only a few minutes of running, so they should be fairly easy to reproduce.
DJStarfox · Joined: 19 Jul 07 · Posts: 145 · Credit: 1,250,162 · RAC: 0
Here is an example WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=153288323

Mine actually finished OK, or so it says, but it CRASHED my Linux machine. This is bad.
alpha · Joined: 4 Nov 06 · Posts: 27 · Credit: 1,550,107 · RAC: 0
A strange validate error here: https://boinc.bakerlab.org/rosetta/result.php?resultid=168411774

"This process generated 930 decoys from 930 attempts"

Yet it ran for the full length of time and got no credit. Any clues as to what happened? I even popped up the graphical interface for this work unit and saw it crunching happily along.
AMD_is_logical · Joined: 20 Dec 05 · Posts: 299 · Credit: 31,460,681 · RAC: 0
Here are a couple of FRA_t401_CASP8_2PRV_2ICG_1_IGNORE_THE_RESTt401_1_aaT0401_2ICGA_13_0001_3601_* WUs that had validate errors:

https://boinc.bakerlab.org/rosetta/result.php?resultid=168453002
https://boinc.bakerlab.org/rosetta/result.php?resultid=168377188

They ran for the usual length of time and the stderr looks perfectly normal, yet they were marked "invalid". These WUs ran on two different Linux computers.
BitSpit · Joined: 5 Nov 05 · Posts: 33 · Credit: 4,147,344 · RAC: 0
And just to confirm that there's a problem with either the jobs or the validator, here is my list of invalidated FRA_t401_CASP8_2PRV_2ICG_1_IGNORE_THE_RESTt401_1_aaT0401_2ICGA_13_0001_3601 jobs so far:

https://boinc.bakerlab.org/rosetta/result.php?resultid=168445437
https://boinc.bakerlab.org/rosetta/result.php?resultid=168445425
https://boinc.bakerlab.org/rosetta/result.php?resultid=168460452
https://boinc.bakerlab.org/rosetta/result.php?resultid=168390070
https://boinc.bakerlab.org/rosetta/result.php?resultid=168420107
https://boinc.bakerlab.org/rosetta/result.php?resultid=168419121 (invalid for the second person also)
https://boinc.bakerlab.org/rosetta/result.php?resultid=168363001
https://boinc.bakerlab.org/rosetta/result.php?resultid=168478613
https://boinc.bakerlab.org/rosetta/result.php?resultid=168500618
https://boinc.bakerlab.org/rosetta/result.php?resultid=168496136
https://boinc.bakerlab.org/rosetta/result.php?resultid=168445210 (invalid for the second person also)
https://boinc.bakerlab.org/rosetta/result.php?resultid=168412886
https://boinc.bakerlab.org/rosetta/result.php?resultid=168490942
https://boinc.bakerlab.org/rosetta/result.php?resultid=168338736
https://boinc.bakerlab.org/rosetta/result.php?resultid=167854925 (unique result: my invalid run produced 686 decoys; the reissued run produced 126 and was valid)

Looking at that last result, I think you have a problem with the validator not handling high decoy counts.
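One way to check the high-decoy-count hypothesis is to tabulate decoy counts against validate state for the results listed in this thread. A rough Python sketch, assuming the stderr text of each result has been copied by hand (it fetches nothing); the regular expression matches the "This process generated N decoys from M attempts" line quoted earlier in the thread, and result_id and decoy_count are hypothetical helpers. The only counts reported so far are 930 and 686 decoys on invalid runs versus 126 on a valid one, which is far too few to settle the question.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches the stderr summary line quoted in this thread.
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def result_id(url: str) -> int:
    """Extract the resultid query parameter from a result.php URL."""
    return int(parse_qs(urlparse(url).query)["resultid"][0])


def decoy_count(stderr_text: str) -> Optional[int]:
    """Return the number of decoys reported in stderr, or None if absent."""
    match = DECOY_RE.search(stderr_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hand-collected records: (URL, stderr excerpt, outcome), taken from this thread.
    records = [
        ("https://boinc.bakerlab.org/rosetta/result.php?resultid=168411774",
         "This process generated 930 decoys from 930 attempts",
         "validate error"),
    ]
    for url, stderr_text, outcome in records:
        print(result_id(url), decoy_count(stderr_text), outcome)
```

Sorting the collected rows by decoy count and looking for a threshold above which every result is invalid would support the hypothesis; a mix of valid and invalid results at similar counts would argue against it.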
©2024 University of Washington
https://www.bakerlab.org