Minirosetta v1.45 bug thread

Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 57723 - Posted: 9 Dec 2008, 0:49:01 UTC - in response to Message 57714.  

Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away.

Thanks. Just about to report 3 of them. Aborted another in advance.

Another cs_vanilla WU went down though:
212913623
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Jim_Clark
Joined: 11 Sep 07
Posts: 7
Credit: 38,439
RAC: 0
Message 57725 - Posted: 9 Dec 2008, 1:50:05 UTC
Last modified: 9 Dec 2008, 2:03:31 UTC

I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC, or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU.

I run many programs on my computer, even new programs that may need debugging; and Minirosetta is the only one that crashes or hangs my computer.

Here are my stats for the last 100 Rosetta WUs:
2 new Rosetta Beta WUs (anticipate success)
2 done (OK) Rosetta Beta WUs
8 failed Rosetta Mini WUs, most 1.40, some 1.45
80 aborted Rosetta Mini WUs, most 1.40, some 1.45

So I waste my time with 88 Rosetta Mini WUs, for 4 Rosetta Beta WUs that are good. (22 to 1)

Another computer in my house, an XP HE with 1/5th the horsepower, can sometimes compute Minirosetta WUs. Its stats are:

5 done (OK) Rosetta Mini WUs, most 1.40
3 failed Rosetta Mini WUs, most 1.40

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 57726 - Posted: 9 Dec 2008, 2:02:21 UTC - in response to Message 57725.  
Last modified: 9 Dec 2008, 2:06:51 UTC

I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC, or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU.

Here are my stats for the last 100 Rosetta WUs:
2 new Rosetta Beta WUs (anticipate success)
2 done (OK) Rosetta Beta WUs
8 failed Rosetta Mini WUs, most 1.40, some 1.45
80 aborted Rosetta Mini WUs, most 1.40, some 1.45

So I waste my time with 88 Rosetta Mini WUs, for 4 Rosetta Beta WUs that are good. (22 to 1)

Another computer in my house, an XP HE with 1/5th the horsepower, can sometimes compute Minirosetta WUs. Its stats are:

5 done (OK) Rosetta Mini WUs, most 1.40
3 failed Rosetta Mini WUs, most 1.40




How much RAM does each of those computers have, and how many CPU cores does each have? Minirosetta is now memory-hungry enough that the answers make a big difference.

Also, have you tried a 1.45 workunit with a name which doesn't start with cs_vanilla? Those workunits have been especially troublesome lately.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 57727 - Posted: 9 Dec 2008, 2:05:00 UTC

Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708.

Failed in a different area this time.

If it's a BOINC version issue as suggested previously, this is bad news for me, as the latest Mac version (which I'm running) is only 6.2.18.


Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00081e9c __ZNK4core4pose4Pose11is_fullatomEv + 154
1 ...etta_1.45_i686-apple-darwin 0x001afdf8 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9604
2 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
3 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
4 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
5 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

etc.
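
For readers unfamiliar with the mangled names in the trace above, here is a minimal sketch of how such symbols can be read. It assumes the common Itanium C++ ABI scheme (nested names as length-prefixed parts between `N` and `E`, `K` for a const member function, `v` for an empty argument list) and handles only that narrow case. The function name `demangle_nested` is made up for this illustration; a real tool such as `c++filt` covers far more.

```python
import re


def demangle_nested(symbol: str) -> str:
    """Demangle a simple nested C++ name like _ZNK4core4pose4Pose11is_fullatomEv.

    Illustrative only: supports N...E nested names, an optional K (const)
    qualifier, and a 'v' (no arguments) parameter list.
    """
    # macOS adds one extra leading underscore to every symbol.
    if symbol.startswith("__Z"):
        symbol = symbol[1:]
    if not symbol.startswith("_ZN"):
        raise ValueError(f"not a nested mangled name: {symbol!r}")

    pos = 3
    is_const = False
    if pos < len(symbol) and symbol[pos] == "K":
        is_const = True
        pos += 1

    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"expected a length prefix at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length

    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator")
    pos += 1  # skip the 'E' that closes the nested name

    if symbol[pos:] != "v":
        raise ValueError("only functions without arguments are supported")

    name = "::".join(parts) + "()"
    return name + " const" if is_const else name


for frame in (
    "__ZNK4core4pose4Pose11is_fullatomEv",
    "__ZN9protocols8abinitio18AbrelaxApplication4foldEv",
    "__ZN9protocols8abinitio18AbrelaxApplication3runEv",
):
    print(demangle_nested(frame))
```

Read this way, the top frame is `core::pose::Pose::is_fullatom() const`, called from `protocols::abinitio::AbrelaxApplication::fold()`.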

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 57729 - Posted: 9 Dec 2008, 2:43:07 UTC - in response to Message 57727.  

Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708.

Failed in a different area this time.

If it's a BOINC version issue as suggested previously, this is bad news for me, as the latest Mac version (which I'm running) is only 6.2.18.


Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00081e9c __ZNK4core4pose4Pose11is_fullatomEv + 154
1 ...etta_1.45_i686-apple-darwin 0x001afdf8 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9604
2 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
3 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
4 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
5 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

etc.


What's the total amount of RAM on your machine? I just looked over many of the cs_vanilla workunits for which enough information was posted in this thread to find them, and found the following:

Most of that type of workunit that ran under BOINC 6.2.19 on a machine with at least 4 GB of memory succeeded.

Perhaps half of those under BOINC 6.2.19 and 3 GB succeeded.

Most of those with BOINC 6.2.14 or older failed.

Most of those with 2 GB or less failed.

I didn't find enough under BOINC 6.2.18 to be sure, but perhaps half of those I saw succeeded.

Most of the workunits with _ZN_ABRELAX_ in the workunit name have problems; see the earlier message about them.

I saw a lot fewer failures for workunits with different types of names.

Naturally, what I was able to find was probably biased by the fact that people aren't likely to post enough information about workunits that don't fail on at least one machine for me to be able to find them.

I suspect that we need a new system requirements evaluation specific to workunits that use the same features as the cs_vanilla workunits.

Jim_Clark
Joined: 11 Sep 07
Posts: 7
Credit: 38,439
RAC: 0
Message 57730 - Posted: 9 Dec 2008, 2:49:05 UTC - in response to Message 57726.  

How much RAM does each of those computers have, and how many CPU cores does each have? Minirosetta is now memory-hungry enough that the answers make a big difference.

Also, have you tried a 1.45 workunit with a name which doesn't start with cs_vanilla? Those workunits have been especially troublesome lately.


The first one I spoke of has two cores, and -
Memory: 1.94 GB physical, 3.78 GB virtual

The second one has one core, and -
Memory: 1.02 GB physical, 3.91 GB virtual

I'll try a non-vanilla workunit. Do they have chocolate?

If memory is a critical issue, why can't Minirosetta check available memory at the start, and quit right away if memory is too scarce?
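
A rough sketch of the kind of start-up check meant here, written in Python purely for illustration. Nothing in it is taken from Minirosetta or BOINC: the function name, the one-task-per-core assumption, and above all the 1 GiB per-task requirement are made up, so the result depends entirely on that assumed threshold.

```python
GIB = 1024 ** 3


def has_enough_memory(physical_bytes: int, tasks_running: int,
                      required_per_task_bytes: int) -> bool:
    """Return True if an even split of physical RAM covers each task's needs."""
    if tasks_running < 1:
        raise ValueError("tasks_running must be at least 1")
    return physical_bytes // tasks_running >= required_per_task_bytes


# Hypothetical requirement, chosen only to make the example concrete.
ASSUMED_REQUIREMENT = 1 * GIB

# The two machines described in this post, assuming one task per core.
two_core_box = has_enough_memory(int(1.94 * GIB), 2, ASSUMED_REQUIREMENT)
one_core_box = has_enough_memory(int(1.02 * GIB), 1, ASSUMED_REQUIREMENT)

print(two_core_box, one_core_box)
```

A task that fails this test could exit immediately with a clear message instead of crashing hours into the run.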

Also, it seems that the scheduler ought to notice when a client aborts more than 20 (or some other number of) Minirosetta workunits, and then stop sending them to that client. Or, better, let clients select the programs that they will accept, as other projects allow.


AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 57731 - Posted: 9 Dec 2008, 3:11:47 UTC - in response to Message 57721.  

In those cases where your wingman completed the workunit without a problem, it was on an Intel Xeon CPU with a lot more memory. This leads me to believe that cs_vanilla workunits need a lot more memory than your computer has, and suggests that your rather old version of BOINC (5.2.13) may not be sending the information needed to choose only workunits that will work with your memory size.

I find my BOINC version quite reliable, and it certainly sends the memory size information properly. The computers' links at Rosetta show the proper memory size, and in the past my 512MB machines have been refused work when none was available for their memory size.

Your suggestion about cs_vanilla WUs needing more memory may be right, though. I have two quads (8 cores total) with lots of memory, and neither has had any of the cs_vanilla errors.

Maybe those WU eventually hit a model where they are using a lot more memory than they are supposed to.

netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 57732 - Posted: 9 Dec 2008, 3:21:45 UTC



I don't know about these units passing on Intel Core CPUs with more memory... I am having dozens of these cs_vanilla units bomb out on machines with dual Core 2 quad Xeons and 16 GB of RAM... I think these machines are big enough to handle anything out there. And I was running 14 of them up to a day or two ago... Most are still crunching, but are starting to wind down so that they can be part of a compute farm...

So, I think there is something wrong in these units or in v1.45 of mini....
Looking for a team ??? Join BoincSynergy!!


Tony
Joined: 12 Dec 05
Posts: 7
Credit: 6,724,341
RAC: 0
Message 57733 - Posted: 9 Dec 2008, 3:48:49 UTC

Error with debug info.
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.5.0


Dump Timestamp : 12/08/08 06:59:34
Install Directory :
Data Directory : C:\ProgramData\BOINC
Project Symstore : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\2;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://boinc.berkeley.edu/symstore


ModLoad: 00400000 006e5000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_1.45_windows_x86_64.exe (-nosymbols- Symbols Loaded)
Linked PDB Filename : D:\boinc_build\minirosetta_1.45\mini\Visual Studio\BoincRelease\minirosetta_1.45_windows_intelx86.pdb

ModLoad: 77140000 00160000 C:\Windows\SysWOW64\ntdll.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wntdll.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 76c90000 00110000 C:\Windows\syswow64\kernel32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wkernel32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75520000 000d0000 C:\Windows\syswow64\USER32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wuser32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 760e0000 00090000 C:\Windows\syswow64\GDI32.dll (6.0.6001.18023) (-exported- Symbols Loaded)
Linked PDB Filename : wgdi32.pdb
File Version : 6.0.6001.18023 (vistasp1_gdr.080221-1537)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18023

ModLoad: 759d0000 000c6000 C:\Windows\syswow64\ADVAPI32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75640000 000f0000 C:\Windows\syswow64\RPCRT4.dll (6.0.6001.18051) (-exported- Symbols Loaded)
Linked PDB Filename : wrpcrt4.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 752f0000 00060000 C:\Windows\syswow64\Secur32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wsecur32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75f10000 00060000 C:\Windows\system32\IMM32.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wimm32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75e00000 000c8000 C:\Windows\syswow64\MSCTF.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : msctf.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75f70000 000aa000 C:\Windows\syswow64\msvcrt.dll (7.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.6001.18000

ModLoad: 76170000 00009000 C:\Windows\syswow64\LPK.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wlpk.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75aa0000 0007d000 C:\Windows\syswow64\USP10.dll (1.626.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : usp10.pdb
File Version : 1.0626.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0626.6001.18000

ModLoad: 73b00000 00021000 C:\Windows\system32\NTMARTA.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ntmarta.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 755f0000 0004a000 C:\Windows\syswow64\WLDAP32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wldap32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75ed0000 0002d000 C:\Windows\syswow64\WS2_32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ws2_32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75f00000 00006000 C:\Windows\syswow64\NSI.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : nsi.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75930000 00007000 C:\Windows\syswow64\PSAPI.DLL (6.0.6000.16386) (-exported- Symbols Loaded)
Linked PDB Filename : psapi.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 73ae0000 00011000 C:\Windows\system32\SAMLIB.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : samlib.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75b20000 00144000 C:\Windows\syswow64\ole32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ole32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 6e470000 000dc000 C:\Windows\system32\dbghelp.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 74fe0000 00008000 C:\Windows\system32\version.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 6819, Write: 0, Other 2098

- I/O Transfers Counters -
Read: 0, Write: 20696, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 73000, QuotaPeakPagedPoolUsage: 73064
QuotaNonPagedPoolUsage: 10288, QuotaPeakNonPagedPoolUsage: 11520

- Virtual Memory Usage -
VirtualSize: 257019904, PeakVirtualSize: 274767872

- Pagefile Usage -
PagefileUsage: 184328192, PeakPagefileUsage: 207704064

- Working Set Size -
WorkingSetSize: 190152704, PeakWorkingSetSize: 213340160, PageFaultCount: 544576

*** Dump of thread ID 2228 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 34632224.000000, User Time: 43788701696.000000, Wait Time: 3728048.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

- Registers -
eax=00000001 ebx=00000000 ecx=00000000 edx=0017c19c esi=0017c8cc edi=00000000
eip=004ead47 esp=0017c180 ebp=0017c588
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246

- Callstack -
ChildEBP RetAddr Args to Child
0017c588 bfcfa595 029c34d0 029c34d0 029c3704 00956b08 minirosetta_1.45_windows_x86_64!+0x0
00957020 009a48a8 00477310 009a49a8 00476130 009a4aa8 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'bfcfa595'
00957024 00477310 009a49a8 00476130 009a4aa8 00476750 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '009a48a8'
009a48a8 00000000 00000000 00a18e58 009a48bc 00000000 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '00477310'

*** Dump of thread ID 764 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3728048.000000

- Registers -
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=01c8ff48 edi=00000000
eip=7716081d esp=01c8ff08 ebp=01c8ff6c
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
01c8ff6c 76ca0c88 00000064 00000000 01c8ff94 0041c61b ntdll!NtDelayExecution+0x0
01c8ff7c 0041c61b 00000064 00000000 76d1e3f3 00000000 kernel32!Sleep+0x0
01c8ff94 771bcfed 00000000 75d9cf1b 00000000 00000000 minirosetta_1.45_windows_x86_64!+0x0
01c8ffd4 771bd1ff 0041c610 00000000 00000000 00000000 ntdll!RtlCreateUserProcess+0x0
01c8ffec 00000000 0041c610 00000000 00000000 00000000 ntdll!RtlCreateProcessParameters+0x0

*** Dump of thread ID 5428 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3727960.000000

- Registers -
eax=00000000 ebx=0281c900 ecx=00000000 edx=00000000 esi=01d8fdfc edi=00000000
eip=7716081d esp=01d8fdbc ebp=01d8fe20
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
01d8fe20 76ca0c88 000007d0 00000000 76ca0c79 0079eb74 ntdll!NtDelayExecution+0x0
01d8fe30 0079eb74 000007d0 fc734f4d 00000000 0281c950 kernel32!Sleep+0x0
76ca0c79 ff006aec a6e80875 5d000006 900004c2 90909090 minirosetta_1.45_windows_x86_64!+0x0
76ca0c7d a6e80875 5d000006 900004c2 90909090 8b55ff8b minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff006aec'
76ca0c81 5d000006 900004c2 90909090 8b55ff8b 14458dec minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'a6e80875'
76ca0c85 900004c2 90909090 8b55ff8b 14458dec 1475ff50 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d000006'
76ca0c89 90909090 8b55ff8b 14458dec 1475ff50 ff1075ff minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '900004c2'
76ca0c8d 8b55ff8b 14458dec 1475ff50 ff1075ff 75ff0c75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '90909090'
76ca0c91 14458dec 1475ff50 ff1075ff 75ff0c75 c415ff08 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b55ff8b'
76ca0c95 1475ff50 ff1075ff 75ff0c75 c415ff08 8b76ca06 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14458dec'
76ca0c99 ff1075ff 75ff0c75 c415ff08 8b76ca06 c985184d minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1475ff50'
76ca0c9d 75ff0c75 c415ff08 8b76ca06 c985184d c0850b75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff1075ff'
76ca0ca1 c415ff08 8b76ca06 c985184d c0850b75 c0330e7c msvcrt!mktime+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '75ff0c75'
76ca0ca5 8b76ca06 c985184d c0850b75 c0330e7c 14c25d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c415ff08'
76ca0ca9 c985184d c0850b75 c0330e7c 14c25d40 14558b00 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b76ca06'
76ca0cad c0850b75 c0330e7c 14c25d40 14558b00 eeeb1189 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c985184d'
76ca0cb1 c0330e7c 14c25d40 14558b00 eeeb1189 0e95e850 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0850b75'
76ca0cb5 14c25d40 14558b00 eeeb1189 0e95e850 c0330000 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330e7c'
76ca0cb9 14558b00 eeeb1189 0e95e850 c0330000 9090ebeb msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14c25d40'
76ca0cbd eeeb1189 0e95e850 c0330000 9090ebeb 8b909090 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14558b00'
76ca0cc1 0e95e850 c0330000 9090ebeb 8b909090 ec8b55ff msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'eeeb1189'
76ca0cc5 c0330000 9090ebeb 8b909090 ec8b55ff 458b5151 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0e95e850'
76ca0cc9 9090ebeb 8b909090 ec8b55ff 458b5151 5d8b530c msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330000'
76ca0ccd 8b909090 ec8b55ff 458b5151 5d8b530c 358b5614 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '9090ebeb'
76ca0cd1 ec8b55ff 458b5151 5d8b530c 358b5614 76ca06c8 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b909090'
76ca0cd5 458b5151 5d8b530c 358b5614 76ca06c8 087d8b57 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ec8b55ff'
76ca0cd9 5d8b530c 358b5614 76ca06c8 087d8b57 8df84589 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '458b5151'
76ca0cdd 358b5614 76ca06c8 087d8b57 8df84589 6a501445 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d8b530c'
76ca0ce1 76ca06c8 087d8b57 8df84589 6a501445 fc458d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '358b5614'
76ca0ce5 087d8b57 8df84589 6a501445 fc458d40 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '76ca06c8'
76ca0ce9 8df84589 6a501445 fc458d40 f8458d50 5d895750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '087d8b57'
76ca0ced 6a501445 fc458d40 f8458d50 5d895750 85d6fffc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8df84589'
76ca0cf1 fc458d40 f8458d50 5d895750 85d6fffc 8d197dc0 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445'
76ca0cf5 f8458d50 5d895750 85d6fffc 8d197dc0 6a501445 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d40'
76ca0cf9 5d895750 85d6fffc 8d197dc0 6a501445 fc458d04 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50'
76ca0cfd 85d6fffc 8d197dc0 6a501445 fc458d04 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d895750'
76ca0d01 8d197dc0 6a501445 fc458d04 f8458d50 d6ff5750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '85d6fffc'
76ca0d05 6a501445 fc458d04 f8458d50 d6ff5750 8c0fc085 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8d197dc0'
76ca0d09 fc458d04 f8458d50 d6ff5750 8c0fc085 0000009f kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445'
76ca0d0d f8458d50 d6ff5750 8c0fc085 0000009f a814458b kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d04'
76ca0d11 d6ff5750 8c0fc085 0000009f a814458b a85675cc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50'
76ca0d15 8c0fc085 0000009f a814458b a85675cc 8d407503 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'd6ff5750'
76ca0d19 00000000 a814458b a85675cc 8d407503 53500845 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8c0fc085'


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>


Wee Todd Didd
Joined: 6 Jan 08
Posts: 1
Credit: 839,210
RAC: 0
Message 57737 - Posted: 9 Dec 2008, 11:46:18 UTC

Same here: cs_vanilla* errors.
Client 1.45.

Using an Athlon 64 with 2 GB of RAM.

Most do pass, though.




https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194285701
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194171642
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194109205
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=193908266
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=193528547

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 57743 - Posted: 9 Dec 2008, 13:50:57 UTC

Well, I've now had a cs_vanilla error 193 on a quad with 4GB:

https://boinc.bakerlab.org/rosetta/result.php?resultid=212454329

And this quad also had error 193 with two loopbuild_reference_hombench WUs:

https://boinc.bakerlab.org/rosetta/result.php?resultid=212758774
https://boinc.bakerlab.org/rosetta/result.php?resultid=212602760

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 57750 - Posted: 9 Dec 2008, 17:59:38 UTC

cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_14875_0 died at 11355.2 seconds with -1073741819 (0xc0000005) error. I had 8 tasks that completed ok before this.
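
The two numbers in that error are the same 32-bit value written two ways: the decimal form treats it as a signed integer, the hex form as unsigned. A small sketch of the conversion (the helper names are made up for this example):

```python
def as_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def format_exit_code(code: int) -> str:
    """Render an exit code the way the BOINC result pages show it."""
    return f"{code} (0x{as_unsigned32(code):08x})"


print(format_exit_code(-1073741819))  # -1073741819 (0xc0000005)
```

0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation" reason shown in the stderr dumps posted in this thread.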
ID: 57750

Path7
Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 57754 - Posted: 9 Dec 2008, 21:03:33 UTC

This is the first time I have had a WU end with "Unhandled Exception Detected...":
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_37184_0
It ran for 11603.2 seconds before erroring out with exit status -1073741819 (0xc0000005).

Windows XP Home, BOINC 5.10.45

Path7.
ID: 57754

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 57760 - Posted: 10 Dec 2008, 2:50:11 UTC

Now I've seen some cs_vanilla errors on my computer with the most memory, an 8 GB quad. This can't be due to the machine running out of memory (unless the tasks are hitting the limit of a 32-bit process's address space).

https://boinc.bakerlab.org/rosetta/result.php?resultid=213025220
https://boinc.bakerlab.org/rosetta/result.php?resultid=212933676
https://boinc.bakerlab.org/rosetta/result.php?resultid=212854713
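
To put a rough number on the 32-bit address space remark, here is a small back-of-the-envelope calculation in Python. It assumes the default 32-bit Windows layout, where half of the address space is reserved for the kernel. Whether the cs_vanilla tasks actually get anywhere near this limit is not known from the reports here.

```python
GIB = 2**30  # bytes in one gibibyte


def address_space_bytes(pointer_bits: int) -> int:
    """Total number of bytes addressable with pointers of the given width."""
    return 2**pointer_bits


total = address_space_bytes(32)
# Assumption: default 32-bit Windows split, half of the space for user mode.
user_space = total // 2

print(f"total: {total // GIB} GiB, user mode: {user_space // GIB} GiB")
```

So a 32-bit process could in principle run out of address space well below the 8 GB of physical RAM installed in the machine.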
ID: 57760

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57761 - Posted: 10 Dec 2008, 2:52:48 UTC

Thanks for all the error reports! I think we've found the issue here. This was damn tricky to find since, for some reason, it doesn't appear to occur on Linux platforms nearly as frequently as on Mac and Windows. I ran the equivalent of several thousand WUs on our local cluster and didn't have a single job crash.

But I think we've found at least one issue by testing on our limited Windows/Mac resources, and a bug fix is going out to RALPH tonight or tomorrow morning, depending on how much caffeine I can get hold of.

Our aim is to get mini in line with old Rosetta in terms of error rate as soon as we can!

Thanks for all the feedback, it really helps in finding these bugs!

Mike

http://beautifulproteins.blogspot.com/
http://www.miketyka.com/
ID: 57761

Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 57795 - Posted: 11 Dec 2008, 5:07:52 UTC - in response to Message 57714.  

Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away. In fact, the same workunits returned with very good success rate on the testing server. We are investigating why the alpha testing server did not catch such errors in the first place. This is another unfortunate incident which will be a new lesson for us. Sorry for any inconvenience this has brought to you.

Is this still the case or have they been corrected and re-issued?

I ask this because more are coming through and I just had one crash out on me. I aborted the rest, just in case, to save wasting processing time.
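
As an illustration of how a truncated command line could produce exactly that message, with nothing after "specified:", here is a minimal Python sketch. It is not the real BOINC or Rosetta option parser. The command line, the cut-off point and both helper functions are hypothetical, and only the option name -run:jran and the error wording come from the reports in this thread.

```python
def truncate(cmd: str, limit: int) -> str:
    """Cut a command line to at most `limit` characters (hypothetical limit)."""
    return cmd[:limit]


def parse_int_option(cmd: str, name: str) -> int:
    """Return the integer following `name`, or raise an error like the one reported."""
    tokens = cmd.split()
    idx = tokens.index(name)  # raises ValueError if the option itself is missing
    value = tokens[idx + 1] if idx + 1 < len(tokens) else ""
    try:
        return int(value)
    except ValueError:
        raise ValueError(
            f"Illegal value for integer option {name} specified: {value}"
        ) from None


# Hypothetical command line; only -run:jran is taken from the error message.
full = "app -placeholder:option some_value -run:jran 1234567"
# Pretend the length limit falls right after the option name.
cut = truncate(full, full.index("-run:jran") + len("-run:jran"))

print(parse_int_option(full, "-run:jran"))
try:
    parse_int_option(cut, "-run:jran")
except ValueError as err:
    print(f"ERROR: {err}")
```

If the cut landed in the middle of the number instead, a parser like this would silently accept a shorter, wrong value, so a hard failure at startup is arguably the friendlier outcome.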
ID: 57795

Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 57798 - Posted: 11 Dec 2008, 10:38:46 UTC - in response to Message 57796.  

If your computer used any processing time at all on the unit it must have been another error. Because in this particular case the tasks fail to process at all

Of course, thanks. I forgot. Looks like they were corrected then - crashed out after 25 minutes. I'll let them run.
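
The rule of thumb quoted above can be written down as a small Python check: a task that dies with zero CPU time and exit code 1 matches the truncated command line problem, while one that runs for a while and then exits with 0xc0000005 is the separate crash. This is only a sketch of that reasoning. The function name, the labels and the thresholds are made up for illustration and do not come from BOINC.

```python
ACCESS_VIOLATION = 0xC0000005  # Windows status code seen in the crash dumps


def classify_failure(cpu_time_seconds: float, exit_code: int) -> str:
    """Guess which of the two failure modes discussed in this thread applies."""
    if cpu_time_seconds == 0 and exit_code == 1:
        return "truncated command line (fails at startup)"
    if exit_code & 0xFFFFFFFF == ACCESS_VIOLATION:
        return "access violation (crash during the run)"
    return "other"


# Example values taken from results reported in this thread.
print(classify_failure(0, 1))
print(classify_failure(11355.2, -1073741819))
```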
ID: 57798

ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 57799 - Posted: 11 Dec 2008, 10:48:44 UTC

ID: 57799

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 57807 - Posted: 11 Dec 2008, 18:10:40 UTC

Task 212871743
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_35341_0
compute error
died at 12807.86 seconds with the usual (0xc0000005) error

1co4A_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1co4A-_5476_95_1 died
CPU time 0
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Illegal value for integer option -run:jran specified:

</stderr_txt>


task 212923776
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_35553_1
died at 3373.781 seconds with the usual (0xc0000005) error


1dsvA_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1dsvA-_5476_655_1
CPU time 0
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Illegal value for integer option -run:jran specified:

</stderr_txt>
]]>


Task 212986916
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_mth1598_olange_5388_38521_1
died at 1881.563 seconds with the usual (0xc0000005) error

Task 213039289
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_42252_0
died at 5740.828 seconds with the usual (0xc0000005) error

Aborted the remaining vanilla task; too many compute errors.
ID: 57807

Erwin Schlonz
Joined: 20 May 07
Posts: 5
Credit: 203,397
RAC: 0
Message 57814 - Posted: 12 Dec 2008, 12:18:20 UTC

<core_client_version>6.2.19</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
# cpu_run_time_pref: 86400

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000


https://boinc.bakerlab.org/rosetta/result.php?resultid=213272398
https://boinc.bakerlab.org/rosetta/result.php?resultid=213233743
https://boinc.bakerlab.org/rosetta/result.php?resultid=213221227
https://boinc.bakerlab.org/rosetta/result.php?resultid=212971112
https://boinc.bakerlab.org/rosetta/result.php?resultid=213385263

The last one failed after 18 (!) hours of computation. That is 18 hours of wasted electric power.

That's it. I will suspend my participation until the application runs more stably. To that end, I will set all my computers not to download any further workunits on Monday. Maybe I will come back next spring to see if things are working again.

Goodbye
Erwin Schlonz

ID: 57814

©2024 University of Washington
https://www.bakerlab.org