Message boards : Number crunching : Minirosetta v1.45 bug thread
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2168 Credit: 41,629,484 RAC: 5,494 |
Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away. Thanks. Just about to report 3 of them. Aborted another in advance. Another cs_vanilla WU went down though: 212913623 <core_client_version>6.2.19</core_client_version> |
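The truncation failure quoted above can be reproduced in miniature. The following is only a sketch under stated assumptions: the length limit, the `-example:flag` option, and both helper functions are hypothetical and are not BOINC's or Rosetta's actual code. Only the `-run:jran` option name and the error text come from the thread.

```python
# Hypothetical illustration of a command line cut off by a length limit.
# Neither function reflects real BOINC or Rosetta code.

def truncate(cmdline, limit):
    """Keep only the first `limit` characters, as a length cap would."""
    return cmdline[:limit]


def parse_int_option(cmdline, name):
    """Return the integer that follows `name`, or None if `name` is absent."""
    tokens = cmdline.split()
    if name not in tokens:
        return None
    idx = tokens.index(name)
    # If the line was cut right after the option name, no value is left.
    value = tokens[idx + 1] if idx + 1 < len(tokens) else ""
    try:
        return int(value)
    except ValueError:
        raise ValueError(
            f"ERROR: Illegal value for integer option {name} specified: {value}"
        )


# "-example:flag" is a made-up option; "-run:jran" is the one named in the thread.
full = "-example:flag value -run:jran 1234567"
cut = truncate(full, full.index("-run:jran") + len("-run:jran"))

print(parse_int_option(full, "-run:jran"))
try:
    parse_int_option(cut, "-run:jran")
except ValueError as err:
    print(err)
```

With the value cut off, the parser sees an empty string where the seed should be, which would explain why the message ends with nothing after "specified:" and why such tasks stop with zero CPU time.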
Jim_Clark Send message Joined: 11 Sep 07 Posts: 7 Credit: 38,439 RAC: 0 |
I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC, or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU. I run many programs on my computer, even new programs that may need debugging, and Minirosetta is the only one that crashes or hangs my computer. Here are my stats for the last 100 Rosetta WUs: 2 new Rosetta Beta WUs (anticipate success); 2 done (OK) Rosetta Beta WUs; 8 failed Rosetta Mini WUs, most 1.40, some 1.45; 80 aborted Rosetta Mini WUs, most 1.40, some 1.45. So I waste my time with 88 Rosetta Mini WUs for 4 Rosetta Beta WUs that are good (22 to 1). Another computer in my house, an XP HE with 1/5th the horsepower, can sometimes compute Minirosetta WUs. Its stats are: 5 done (OK) Rosetta Mini WUs, most 1.40; 3 failed Rosetta Mini WUs, most 1.40. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 292 |
I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC, or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU. How much RAM does each of those computers have, and how many CPU cores does each of them have? minirosetta is now memory-hungry enough that the answers make a big difference. Also, have you tried a 1.45 workunit with a name which doesn't start with cs_vanilla? Those workunits have been especially troublesome lately. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708. Failed in a different area this time. If it's a BOINC version issue as suggested previously, this is bad news for me, as the latest Mac version (which I'm running) is only 6.2.18. Thread 0 Crashed: 0 ...etta_1.45_i686-apple-darwin 0x00081e9c __ZNK4core4pose4Pose11is_fullatomEv + 154 1 ...etta_1.45_i686-apple-darwin 0x001afdf8 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9604 2 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881 3 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941 4 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216 5 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41 etc. |
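The frames in the Mac crash report above use mangled C++ names. As a reading aid, here is a minimal sketch of a demangler that handles only the simple case seen in this trace: nested names with an optional `const` qualifier and no parameters, following the Itanium C++ ABI encoding plus the extra leading underscore that macOS adds. The function `demangle_nested` is hypothetical; a real tool such as `c++filt` covers the full grammar.

```python
def demangle_nested(symbol):
    """Demangle a simple nested Itanium-ABI name, e.g. _ZN3foo3barEv -> foo::bar()."""
    s = symbol
    # macOS prepends one extra underscore to every symbol name.
    if s.startswith("__Z"):
        s = s[1:]
    if not s.startswith("_ZN"):
        raise ValueError("only simple nested names are handled")
    i = 3
    is_const = False
    if i < len(s) and s[i] == "K":  # 'K' marks a const member function
        is_const = True
        i += 1
    parts = []
    while i < len(s) and s[i] != "E":  # 'E' ends the nested name
        j = i
        while j < len(s) and s[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length-prefixed identifier")
        length = int(s[i:j])
        parts.append(s[j:j + length])
        i = j + length
    if s[i + 1:] != "v":  # 'v' means an empty parameter list
        raise ValueError("only functions without parameters are handled")
    name = "::".join(parts) + "()"
    return name + " const" if is_const else name


for frame in (
    "__ZNK4core4pose4Pose11is_fullatomEv",
    "__ZN9protocols8abinitio18AbrelaxApplication4foldEv",
    "__ZN9protocols8abinitio18AbrelaxApplication3runEv",
):
    print(demangle_nested(frame))
```

Read this way, the crash sits in `core::pose::Pose::is_fullatom() const`, called from `protocols::abinitio::AbrelaxApplication::fold()`.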
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 292 |
Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708. What's the total amount of RAM on your machine? I just looked over many of the cs_vanilla type of workunits with enough information posted to this thread to find the workunits, and found the following: Most of that type of workunit that ran under BOINC 6.2.19 on a machine with at least 4 GB of memory succeeded. Perhaps half of those under BOINC 6.2.19 and 3 GB succeeded. Most of those with BOINC 6.2.14 or older failed. Most of those with 2 GB or less failed. I didn't find enough under BOINC 6.2.18 to be sure, but perhaps half of those I saw succeeded. Most of the workunits with _ZN_ABRELAX_ in the workunit name have problems; see the earlier message about them. I saw a lot fewer failures for workunits with different types of names. Naturally, what I was able to find was probably biased by the fact that people aren't likely to post enough information about workunits that don't fail on at least one machine for me to be able to find them. I suspect that we need a new system requirements evaluation specific to workunits that use the same features as the cs_vanilla workunits. |
Jim_Clark Send message Joined: 11 Sep 07 Posts: 7 Credit: 38,439 RAC: 0 |
How much RAM does each of those computers have, and how many CPU cores does each of them have? minirosetta is now memory-hungry enough that the answers make a big difference. The first one I spoke of has two cores, and - Memory: 1.94 GB physical, 3.78 GB virtual. The second one has one core, and - Memory: 1.02 GB physical, 3.91 GB virtual. I'll try a non-vanilla workunit. Do they have chocolate? If memory is a critical issue, why can't Minirosetta check available memory at the start, and quit right away if memory is too scarce? Also, it seems that the Scheduler ought to observe when a client aborts more than 20 (or some other number) Minirosetta workunits, and then stop sending any to that client. Or, better, let clients select the programs that they will accept, as other projects allow. |
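The scheduler rule suggested above (stop sending an application to a client that has aborted more than some number of its workunits) could be sketched roughly as follows. Everything here is hypothetical: the function name, the `(application, outcome)` record format, and the application labels are made up for illustration and do not describe how the BOINC scheduler actually works.

```python
from collections import Counter

ABORT_LIMIT = 20  # the example threshold from the post; any other number would do


def apps_to_withhold(results, abort_limit=ABORT_LIMIT):
    """Return the applications a host has aborted more than `abort_limit` times.

    `results` is an iterable of (application, outcome) pairs, a made-up
    record format used only for this sketch.
    """
    aborted = Counter(app for app, outcome in results if outcome == "aborted")
    return {app for app, count in aborted.items() if count > abort_limit}


# Counts taken from the stats Jim_Clark posted for his XP Pro machine.
history = (
    [("minirosetta", "aborted")] * 80
    + [("minirosetta", "failed")] * 8
    + [("rosetta_beta", "ok")] * 2
)
print(apps_to_withhold(history))
```

A rule like this only counts explicit aborts, so a host whose tasks fail on their own would still receive work; whether failures should count too is a separate design decision.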
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
In those cases where your wingman completed the workunit without a problem, it was on an Intel Xeon CPU with a lot more memory. This leads me to believe that cs_vanilla workunits need a lot more memory than your computer has, and suggests that your rather old version of BOINC (5.2.13) may not be sending the information needed to choose only workunits that will work with your memory size. I find my BOINC version quite reliable, and it certainly sends the memory size information properly. The computers' links at Rosetta show the proper memory size, and in the past my 512MB machines have been refused work when none was available for their memory size. Your suggestion about cs_vanilla WUs needing more memory may be right, though. I have two quads (8 cores total) with lots of memory, and neither has had any of the cs_vanilla errors. Maybe those WUs eventually hit a model where they are using a lot more memory than they are supposed to. |
netwraith Send message Joined: 3 Sep 06 Posts: 80 Credit: 13,483,227 RAC: 0 |
I don't know about these units passing on Intel Core CPUs with more memory... I am having dozens of these cs_vanilla units bomb out on machines with dual Core2 quad Xeons and 16GB of RAM... I think these machines are big enough to handle anything out there. And I was running 14 of them up to a day or two ago... Most are still crunching, but are starting to wind down so that they can be part of a compute farm... So, I think there is something wrong in these units or in v1.45 of mini.... Looking for a team ??? Join BoincSynergy!! |
Tony Send message Joined: 12 Dec 05 Posts: 7 Credit: 6,724,341 RAC: 0 |
Error with debug info. <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 21600 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 6.5.0 Dump Timestamp : 12/08/08 06:59:34 Install Directory : Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots2;C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://boinc.berkeley.edu/symstore ModLoad: 00400000 006e5000 C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettaminirosetta_1.45_windows_x86_64.exe (-nosymbols- Symbols Loaded) Linked PDB Filename : D:boinc_buildminirosetta_1.45miniVisual StudioBoincReleaseminirosetta_1.45_windows_intelx86.pdb ModLoad: 77140000 00160000 C:WindowsSysWOW64ntdll.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wntdll.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System 
Product Version : 6.0.6001.18000 ModLoad: 76c90000 00110000 C:Windowssyswow64kernel32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wkernel32.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75520000 000d0000 C:Windowssyswow64USER32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wuser32.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 760e0000 00090000 C:Windowssyswow64GDI32.dll (6.0.6001.18023) (-exported- Symbols Loaded) Linked PDB Filename : wgdi32.pdb File Version : 6.0.6001.18023 (vistasp1_gdr.080221-1537) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18023 ModLoad: 759d0000 000c6000 C:Windowssyswow64ADVAPI32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75640000 000f0000 C:Windowssyswow64RPCRT4.dll (6.0.6001.18051) (-exported- Symbols Loaded) Linked PDB Filename : wrpcrt4.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 752f0000 00060000 C:Windowssyswow64Secur32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wsecur32.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75f10000 00060000 C:Windowssystem32IMM32.DLL (6.0.6001.18000) 
(-exported- Symbols Loaded) Linked PDB Filename : wimm32.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75e00000 000c8000 C:Windowssyswow64MSCTF.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : msctf.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 75f70000 000aa000 C:Windowssyswow64msvcrt.dll (7.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.6001.18000 ModLoad: 76170000 00009000 C:Windowssyswow64LPK.DLL (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wlpk.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75aa0000 0007d000 C:Windowssyswow64USP10.dll (1.626.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : usp10.pdb File Version : 1.0626.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft(R) Uniscribe Unicode script processor Product Version : 1.0626.6001.18000 ModLoad: 73b00000 00021000 C:Windowssystem32NTMARTA.DLL (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 755f0000 0004a000 C:Windowssyswow64WLDAP32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : wldap32.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) 
Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 75ed0000 0002d000 C:Windowssyswow64WS2_32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 75f00000 00006000 C:Windowssyswow64NSI.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : nsi.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75930000 00007000 C:Windowssyswow64PSAPI.DLL (6.0.6000.16386) (-exported- Symbols Loaded) Linked PDB Filename : psapi.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 73ae0000 00011000 C:Windowssystem32SAMLIB.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : samlib.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 75b20000 00144000 C:Windowssyswow64ole32.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : ole32.pdb File Version : 6.0.6000.16386 (vista_rtm.061101-2205) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6000.16386 ModLoad: 6e470000 000dc000 C:Windowssystem32dbghelp.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 ModLoad: 74fe0000 
00008000 C:Windowssystem32version.dll (6.0.6001.18000) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 6.0.6001.18000 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 6819, Write: 0, Other 2098 - I/O Transfers Counters - Read: 0, Write: 20696, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 73000, QuotaPeakPagedPoolUsage: 73064 QuotaNonPagedPoolUsage: 10288, QuotaPeakNonPagedPoolUsage: 11520 - Virtual Memory Usage - VirtualSize: 257019904, PeakVirtualSize: 274767872 - Pagefile Usage - PagefileUsage: 184328192, PeakPagefileUsage: 207704064 - Working Set Size - WorkingSetSize: 190152704, PeakWorkingSetSize: 213340160, PageFaultCount: 544576 *** Dump of thread ID 2228 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 34632224.000000, User Time: 43788701696.000000, Wait Time: 3728048.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000 - Registers - eax=00000001 ebx=00000000 ecx=00000000 edx=0017c19c esi=0017c8cc edi=00000000 eip=004ead47 esp=0017c180 ebp=0017c588 cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246 - Callstack - ChildEBP RetAddr Args to Child 0017c588 bfcfa595 029c34d0 029c34d0 029c3704 00956b08 minirosetta_1.45_windows_x86_64!+0x0 00957020 009a48a8 00477310 009a49a8 00476130 009a4aa8 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'bfcfa595' 00957024 00477310 009a49a8 00476130 009a4aa8 00476750 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '009a48a8' 009a48a8 00000000 00000000 00a18e58 
009a48bc 00000000 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '00477310' *** Dump of thread ID 764 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3728048.000000 - Registers - eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=01c8ff48 edi=00000000 eip=7716081d esp=01c8ff08 ebp=01c8ff6c cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 01c8ff6c 76ca0c88 00000064 00000000 01c8ff94 0041c61b ntdll!NtDelayExecution+0x0 01c8ff7c 0041c61b 00000064 00000000 76d1e3f3 00000000 kernel32!Sleep+0x0 01c8ff94 771bcfed 00000000 75d9cf1b 00000000 00000000 minirosetta_1.45_windows_x86_64!+0x0 01c8ffd4 771bd1ff 0041c610 00000000 00000000 00000000 ntdll!RtlCreateUserProcess+0x0 01c8ffec 00000000 0041c610 00000000 00000000 00000000 ntdll!RtlCreateProcessParameters+0x0 *** Dump of thread ID 5428 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3727960.000000 - Registers - eax=00000000 ebx=0281c900 ecx=00000000 edx=00000000 esi=01d8fdfc edi=00000000 eip=7716081d esp=01d8fdbc ebp=01d8fe20 cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 01d8fe20 76ca0c88 000007d0 00000000 76ca0c79 0079eb74 ntdll!NtDelayExecution+0x0 01d8fe30 0079eb74 000007d0 fc734f4d 00000000 0281c950 kernel32!Sleep+0x0 76ca0c79 ff006aec a6e80875 5d000006 900004c2 90909090 minirosetta_1.45_windows_x86_64!+0x0 76ca0c7d a6e80875 5d000006 900004c2 90909090 8b55ff8b minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff006aec' 76ca0c81 5d000006 900004c2 90909090 8b55ff8b 14458dec 
minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'a6e80875' 76ca0c85 900004c2 90909090 8b55ff8b 14458dec 1475ff50 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d000006' 76ca0c89 90909090 8b55ff8b 14458dec 1475ff50 ff1075ff minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '900004c2' 76ca0c8d 8b55ff8b 14458dec 1475ff50 ff1075ff 75ff0c75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '90909090' 76ca0c91 14458dec 1475ff50 ff1075ff 75ff0c75 c415ff08 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b55ff8b' 76ca0c95 1475ff50 ff1075ff 75ff0c75 c415ff08 8b76ca06 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14458dec' 76ca0c99 ff1075ff 75ff0c75 c415ff08 8b76ca06 c985184d minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1475ff50' 76ca0c9d 75ff0c75 c415ff08 8b76ca06 c985184d c0850b75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff1075ff' 76ca0ca1 c415ff08 8b76ca06 c985184d c0850b75 c0330e7c msvcrt!mktime+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' 
Address = '75ff0c75' 76ca0ca5 8b76ca06 c985184d c0850b75 c0330e7c 14c25d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c415ff08' 76ca0ca9 c985184d c0850b75 c0330e7c 14c25d40 14558b00 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b76ca06' 76ca0cad c0850b75 c0330e7c 14c25d40 14558b00 eeeb1189 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c985184d' 76ca0cb1 c0330e7c 14c25d40 14558b00 eeeb1189 0e95e850 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0850b75' 76ca0cb5 14c25d40 14558b00 eeeb1189 0e95e850 c0330000 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330e7c' 76ca0cb9 14558b00 eeeb1189 0e95e850 c0330000 9090ebeb msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14c25d40' 76ca0cbd eeeb1189 0e95e850 c0330000 9090ebeb 8b909090 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14558b00' 76ca0cc1 0e95e850 c0330000 9090ebeb 8b909090 ec8b55ff msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'eeeb1189' 76ca0cc5 c0330000 9090ebeb 8b909090 ec8b55ff 458b5151 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0e95e850' 76ca0cc9 9090ebeb 8b909090 ec8b55ff 458b5151 5d8b530c msvcrt!+0x0 SymFromAddr(): GetLastError = '126' 
SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330000' 76ca0ccd 8b909090 ec8b55ff 458b5151 5d8b530c 358b5614 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '9090ebeb' 76ca0cd1 ec8b55ff 458b5151 5d8b530c 358b5614 76ca06c8 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b909090' 76ca0cd5 458b5151 5d8b530c 358b5614 76ca06c8 087d8b57 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ec8b55ff' 76ca0cd9 5d8b530c 358b5614 76ca06c8 087d8b57 8df84589 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '458b5151' 76ca0cdd 358b5614 76ca06c8 087d8b57 8df84589 6a501445 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d8b530c' 76ca0ce1 76ca06c8 087d8b57 8df84589 6a501445 fc458d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '358b5614' 76ca0ce5 087d8b57 8df84589 6a501445 fc458d40 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '76ca06c8' 76ca0ce9 8df84589 6a501445 fc458d40 f8458d50 5d895750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '087d8b57' 76ca0ced 6a501445 fc458d40 f8458d50 5d895750 85d6fffc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8df84589' 76ca0cf1 fc458d40 
f8458d50 5d895750 85d6fffc 8d197dc0 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445' 76ca0cf5 f8458d50 5d895750 85d6fffc 8d197dc0 6a501445 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d40' 76ca0cf9 5d895750 85d6fffc 8d197dc0 6a501445 fc458d04 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50' 76ca0cfd 85d6fffc 8d197dc0 6a501445 fc458d04 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d895750' 76ca0d01 8d197dc0 6a501445 fc458d04 f8458d50 d6ff5750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '85d6fffc' 76ca0d05 6a501445 fc458d04 f8458d50 d6ff5750 8c0fc085 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8d197dc0' 76ca0d09 fc458d04 f8458d50 d6ff5750 8c0fc085 0000009f kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445' 76ca0d0d f8458d50 d6ff5750 8c0fc085 0000009f a814458b kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d04' 76ca0d11 d6ff5750 8c0fc085 0000009f a814458b a85675cc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50' 76ca0d15 8c0fc085 0000009f a814458b a85675cc 8d407503 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): 
GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'd6ff5750' 76ca0d19 00000000 a814458b a85675cc 8d407503 53500845 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8c0fc085' *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> |
Wee Todd Didd Send message Joined: 6 Jan 08 Posts: 1 Credit: 839,210 RAC: 0 |
Same here. cs_vanilla* errors. Client 1.45. Using an Athlon 64 with 2 GB RAM. Most do pass though. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194285701 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194171642 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194109205 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=193908266 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=193528547 |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Well, I've now had a cs_vanilla error 193 on a quad with 4GB: https://boinc.bakerlab.org/rosetta/result.php?resultid=212454329 And this quad also had error 193 with two loopbuild_reference_hombench WUs: https://boinc.bakerlab.org/rosetta/result.php?resultid=212758774 https://boinc.bakerlab.org/rosetta/result.php?resultid=212602760 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_14875_0 died at 11355.2 seconds with -1073741819 (0xc0000005) error. I had 8 tasks that completed ok before this. |
Path7 Send message Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
This is the very first time I have found a WU that ended with an Unhandled Exception Detected...: cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_37184_0 ran for 11603.2 seconds before it errored out with exit status -1073741819 (0xc0000005). Windows XP Home, BOINC 5.10.45. Path7. |
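The two numbers reported for this error, -1073741819 and 0xc0000005, are the same 32-bit value read as a signed and as an unsigned integer. A small sketch of the conversion (the helper names are just for illustration):

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(to_signed32(0xC0000005))          # -1073741819
```

0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation ... read attempt to address 0x00000000" lines in the dumps posted in this thread.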
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Now I've seen some cs_vanilla errors on my biggest-memory computer, an 8GB quad. This can't be due to running out of memory (unless they're hitting the limit of a 32-bit process's address space). https://boinc.bakerlab.org/rosetta/result.php?resultid=213025220 https://boinc.bakerlab.org/rosetta/result.php?resultid=212933676 https://boinc.bakerlab.org/rosetta/result.php?resultid=212854713 |
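For a sense of scale on the 32-bit address space question, here is a back-of-the-envelope sketch. It assumes the task runs as a 32-bit process and that the user-mode share of the address space is 2 GiB, the usual default for 32-bit Windows processes; neither is confirmed for these particular tasks. The peak figure is the PeakVirtualSize from the Windows dump posted earlier in this thread, which came from a different machine.

```python
GIB = 2**30

ADDRESS_SPACE_32BIT = 2**32   # bytes addressable with 32-bit pointers (4 GiB)
ASSUMED_USER_SPACE = 2 * GIB  # assumption: default user-mode share on 32-bit Windows


def fraction_of_user_space(peak_virtual_bytes, user_space=ASSUMED_USER_SPACE):
    """Fraction of the assumed per-process limit that a task's peak usage reached."""
    return peak_virtual_bytes / user_space


peak = 274767872  # PeakVirtualSize reported in the dump earlier in this thread
print(f"address space: {ADDRESS_SPACE_32BIT / GIB:.0f} GiB")
print(f"peak usage:    {peak / GIB:.2f} GiB")
print(f"share of assumed user space: {fraction_of_user_space(peak):.0%}")
```

The limit is per process and does not depend on installed RAM, so an 8GB machine gives a 32-bit task no more address space than a 2GB one. In that one dump, though, peak usage was well below the assumed limit, which would point away from address-space exhaustion for that particular crash.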
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Thanks for all the error reports! I think we've found the issue here. This was damn tricky to find since, for some reason, it doesn't appear to occur on Linux platforms nearly as frequently as on Mac and Windows. I ran the equivalent of several thousand WUs on our local cluster and didn't have a single job crash. But I think we've found at least one issue by testing on our limited Windows/Mac resources, and a bug fix is going out to Ralph tonight or tomorrow morning, depending on how much caffeine I can get hold of. Our aim is to get mini in line with old Rosetta in terms of error rate as soon as we can! Thanks for all the feedback, it totally helps in finding these bugs! Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2168 Credit: 41,629,484 RAC: 5,494 |
Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away. In fact, the same workunits returned with a very good success rate on the testing server. We are investigating why the alpha testing server did not catch such errors in the first place. This is another unfortunate incident which will be a new lesson for us. Sorry for any inconvenience this has brought to you. Is this still the case, or have they been corrected and re-issued? I ask because more are coming through and I just had one crash out on me. I aborted the rest, just in case, to avoid wasting processing time. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2168 Credit: 41,629,484 RAC: 5,494 |
If your computer used any processing time at all on the unit, it must have been another error, because in this particular case the tasks fail to process at all. Of course, thanks. I forgot. Looks like they were corrected then - crashed out after 25 minutes. I'll let them run. |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
Failed tasks among tasks received 20081207: Errored out on 2 computers: cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_25492 cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_30102_1 One computer failures: cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_32109_0 loopbuild_reference_hombench_loopbuild_t363__IGNORE_THE_REST_2CU3A_7_5461_31_0 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Task 212871743 cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_35341_0 compute error, died at 12807.86 seconds with the usual (0xc0000005) error. 1co4A_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1co4A-_5476_95_1 died, CPU time 0, stderr out <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: Illegal value for integer option -run:jran specified: </stderr_txt> Task 212923776 cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_35553_1 died at 3373.781 seconds with the usual (0xc0000005) error. 1dsvA_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1dsvA-_5476_655_1 CPU time 0, stderr out <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: Illegal value for integer option -run:jran specified: </stderr_txt> ]]> Task 212986916 cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_mth1598_olange_5388_38521_1 died at 1881.563 seconds with the usual (0xc0000005) error. Task 213039289 cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_42252_0 died at 5740.828 seconds with the usual (0xc0000005) error. Aborted the remaining vanilla tasks; too many compute errors. |
Erwin Schlonz Send message Joined: 20 May 07 Posts: 5 Credit: 203,397 RAC: 0 |
<core_client_version>6.2.19</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> # cpu_run_time_pref: 86400 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000 https://boinc.bakerlab.org/rosetta/result.php?resultid=213272398 https://boinc.bakerlab.org/rosetta/result.php?resultid=213233743 https://boinc.bakerlab.org/rosetta/result.php?resultid=213221227 https://boinc.bakerlab.org/rosetta/result.php?resultid=212971112 https://boinc.bakerlab.org/rosetta/result.php?resultid=213385263 The last one was after 18 (!) hours of computation. 18 hours of wasted electric power. That's it. I will suspend my participation until the application runs more stably. To that end, I will set all my computers not to download any further workunits on Monday. Maybe I will come back next spring to see if things are working again. Auf Wiedersehen, Erwin Schlonz |
©2025 University of Washington
https://www.bakerlab.org