Problems with Minirosetta v1.54

trick@planet3dnow

Joined: 21 Feb 09
Posts: 8
Credit: 53,370
RAC: 0
Message 60260 - Posted: 22 Mar 2009, 2:57:01 UTC
Last modified: 22 Mar 2009, 3:01:21 UTC

hi!
as already posted here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4771 on my pc (this one here): https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1012657
i get lots of validate errors and several client errors (too many to link each of them here). the usual symptom is greatly increased processing time: the work units should run 3 hours, but they run for 7 hours.

when i notice that a work unit takes much too long, should i abort it? or let it run until it fails to validate after 7 hours?
Alberthuang

Joined: 5 Dec 05
Posts: 6
Credit: 171,257
RAC: 0
Message 60261 - Posted: 22 Mar 2009, 3:12:53 UTC

My computer runs Windows XP SP3 with BOINC Manager version 5.10.45. It ran two workunits (1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76 and lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652) with minirosetta version 1.54, and both ended with a compute error. Both results were marked invalid.

The first one (workunit 1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76) used more than 4.5 hours of CPU time on my computer. When it crashed, Windows displayed a C++ Runtime error message. I was using Mozilla Firefox 3.0 at the time, and the browser also closed unexpectedly at almost the same moment. The task details follow:
Task ID 234173364
Name 1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76_0
Workunit 213483545
Created 9 Mar 2009 7:21:46 UTC
Sent 9 Mar 2009 7:23:00 UTC
Received 17 Mar 2009 8:07:24 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 224205
Report deadline 19 Mar 2009 7:23:00 UTC
CPU time 17563.45
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 3-16 14:16:21:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _MOO18U9X9X_00001
# cpu_run_time_pref: 21600
Starting work on structure: _MOO18U9X9X_00002
Starting work on structure: _MOO18U9X9X_00003
Starting work on structure: _MOO18U9X9X_00004
BOINC:: Initializing ... ok.
[2009- 3-17 11:23:26:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 21600
Starting work on structure: _MOO18U9X9X_00004
Continuing computation from checkpoint: chk_S_MOO18U9X9X_00000004_ClassicAbinitio__stage_1 ... success!
Continuing computation from checkpoint: chk_S_MOO18U9X9X_00000004_ClassicAbinitio__stage_2 ... success!
Starting work on structure: _MOO18U9X9X_00005
Starting work on structure: _MOO18U9X9X_00006
Starting work on structure: _MOO18U9X9X_00007
Starting work on structure: _MOO18U9X9X_00008
Starting work on structure: _MOO18U9X9X_00009
Starting work on structure: _MOO18U9X9X_00010


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.5.0


Dump Timestamp : 03/17/09 16:01:02
Install Directory : C:\Program Files\BOINC
Data Directory : C:\Program Files\BOINC
Project Symstore :
Loaded Library : C:\Program Files\BOINC\dbghelp.dll
Loaded Library : C:\Program Files\BOINC\symsrv.dll
Loaded Library : C:\Program Files\BOINC\srcsrv.dll
LoadLibraryA( C:\Program Files\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\Program Files\BOINC\slots\1;C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://boinc.berkeley.edu/symstore


ModLoad: 00400000 00724000 C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_1.54_windows_intelx86.exe (-nosymbols- Symbols Loaded)
Linked PDB Filename : D:boinc_buildminirosetta_windowsminiVisual StudioBoincReleaseminirosetta_1.54_windows_intelx86.pdb

ModLoad: 7c920000 00094000 C:\WINDOWS\system32\ntdll.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : ntdll.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2111)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 7c800000 0011f000 C:\WINDOWS\system32\kernel32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : kernel32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2111)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 77d10000 0008f000 C:\WINDOWS\system32\USER32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : user32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 77ef0000 00049000 C:\WINDOWS\system32\GDI32.dll (5.1.2600.5698) (PDB Symbols Loaded)
Linked PDB Filename : gdi32.pdb
File Version : 5.1.2600.5698 (xpsp_sp3_gdr.081022-1932)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5698

ModLoad: 77da0000 000a7000 C:\WINDOWS\system32\ADVAPI32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 77e50000 00092000 C:\WINDOWS\system32\RPCRT4.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : rpcrt4.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2108)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512

ModLoad: 77fc0000 00011000 C:\WINDOWS\system32\Secur32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : secur32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512

ModLoad: 76300000 0001d000 C:\WINDOWS\system32\IMM32.DLL (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : imm32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512

ModLoad: 621f0000 00009000 C:\WINDOWS\system32\LPK.DLL (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : lpk.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512

ModLoad: 73fa0000 0006b000 C:\WINDOWS\system32\USP10.dll (1.420.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : usp10.pdb
File Version : 1.0420.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0420.2600.5512

ModLoad: 76cb0000 00020000 C:\WINDOWS\system32\NTMARTA.DLL (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : ntmarta.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 77be0000 00058000 C:\WINDOWS\system32\msvcrt.dll (7.0.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.2600.5512 (xpsp.080413-2111)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.2600.5512

ModLoad: 76990000 0013d000 C:\WINDOWS\system32\ole32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : ole32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2108)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 71b70000 00013000 C:\WINDOWS\system32\SAMLIB.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : samlib.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512

ModLoad: 76f30000 0002c000 C:\WINDOWS\system32\WLDAP32.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : wldap32.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Windows(R) Operating System
Product Version : 5.1.2600.5512

ModLoad: 0b610000 00115000 C:\Program Files\BOINC\dbghelp.dll (6.6.7.5) (PDB Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.6.0007.5

ModLoad: 0b830000 00083000 C:\Program Files\BOINC\symsrv.dll (6.6.7.5) (PDB Symbols Loaded)
Linked PDB Filename : symsrv.pdb
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.6.0007.5

ModLoad: 0b8c0000 0003a000 C:\Program Files\BOINC\srcsrv.dll (6.6.7.5) (PDB Symbols Loaded)
Linked PDB Filename : srcsrv.pdb
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.6.0007.5

ModLoad: 77bd0000 00008000 C:\WINDOWS\system32\version.dll (5.1.2600.5512) (PDB Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 5.1.2600.5512



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 4199, Write: 0, Other 4119

- I/O Transfers Counters -
Read: 0, Write: 283156, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 29464, QuotaPeakPagedPoolUsage: 29484
QuotaNonPagedPoolUsage: 3856, QuotaPeakNonPagedPoolUsage: 5104

- Virtual Memory Usage -
VirtualSize: 288505856, PeakVirtualSize: 294109184

- Pagefile Usage -
PagefileUsage: 177410048, PeakPagefileUsage: 180875264

- Working Set Size -
WorkingSetSize: 44548096, PeakWorkingSetSize: 142151680, PageFaultCount: 4153040

*** Dump of thread ID 1256 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 929636736.000000, User Time: 118402555904.000000, Wait Time: 1696694.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024

- Registers -
eax=097646c8 ebx=097646cc ecx=038ffe20 edx=038ffe20 esi=097646a0 edi=00000000
eip=0055b8c1 esp=0012c02c ebp=0ab9f938
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202

- Callstack -
ChildEBP RetAddr Args to Child
0012c048 0061138e 00000000 fa25f9aa 0b235e50 097646a0 minirosetta_1.54_windows_intelx!+0x0
0012c068 006113fe 0b235e50 fa25f98a 00000001 097646a0 minirosetta_1.54_windows_intelx!+0x0
00000000 00000000 00000000 00000000 00000000 00000000 minirosetta_1.54_windows_intelx!+0x0

*** Dump of thread ID 672 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 1902736.000000, User Time: 6909936.000000, Wait Time: 1696719.000000

- Registers -
eax=0164fb44 ebx=00000000 ecx=fa3739f2 edx=00000000 esi=00000000 edi=0164ff70
eip=7c92e4f4 esp=0164ff40 ebp=0164ff98
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0164ff3c 7c92d1fc 7c8023f1 00000000 0164ff70 00000000 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
0164ff40 7c8023f1 00000000 0164ff70 00000000 7c802446 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0]
0164ff98 7c802455 00000064 00000000 0164ffec 00411a7b kernel32!_SleepEx@8+0x0
0164ffa8 00411a7b 00000064 00000000 7c80b713 00000000 kernel32!_Sleep@4+0x0
0164ffec 00000000 00411a70 00000000 00000000 2f73fcd8 minirosetta_1.54_windows_intelx!+0x0

*** Dump of thread ID 1808 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 100144.000000, User Time: 0.000000, Wait Time: 1696642.000000

- Registers -
eax=0272fe28 ebx=021c4a01 ecx=0272e734 edx=00001f9a esi=00000000 edi=0272fdf8
eip=7c92e4f4 esp=0272fdc8 ebp=0272fe20
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0272fdc4 7c92d1fc 7c8023f1 00000000 0272fdf8 00000122 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
0272fdc8 7c8023f1 00000000 0272fdf8 00000122 09778748 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0]
0272fe20 7c802455 000007d0 00000000 7c802446 0079aa61 kernel32!_SleepEx@8+0x0
0272fe30 0079aa61 000007d0 f845c7b2 0012bfe0 021c4a38 kernel32!_Sleep@4+0x0
0272fe38 f845c7b2 0012bfe0 021c4a38 0272ff6c 021c4a38 minirosetta_1.54_windows_intelx!+0x0
0272fe3c 0012bfe0 021c4a38 0272ff6c 021c4a38 00000001 minirosetta_1.54_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f845c7b2'
0272ff3c 7c937de9 7c937ea0 7c800000 0272ff7c 00000000 minirosetta_1.54_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0012bfe0'
0272ffe0 7c80b71f 00000000 00000000 00000000 0041eb46 ntdll!_LdrpGetProcedureAddress@20+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c937de9'
0272ffe4 00000000 00000000 00000000 0041eb46 021c4a38 kernel32!_BaseThreadStart@8+0x0 FPO: [0,0,0] SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c80b71f'


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 32.9406239634204
Granted credit 0
application version 1.54
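
A note on the two forms of the exit status in the task details: the status is printed as a signed 32-bit integer followed by the same bits in hex. Here is a minimal Python sketch of that conversion. The helper names are made up for illustration and are not part of BOINC.

```python
def to_unsigned32(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return status & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit status."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # Exit statuses quoted in the task details in this thread.
    for status in (-1073741819, -185):
        print(f"{status} -> 0x{to_unsigned32(status):08x}")
```

0xc0000005 is the Windows status code for an access violation, which matches the "Access Violation" line in the debugger output.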

The other one (workunit lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652) ran for less than half an hour on my computer, and no error message appeared when it crashed. I was also using Mozilla Firefox 3.0 then, but this time the browser did not close unexpectedly. The task details follow:
Task ID 236172160
Name lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652_1
Workunit 215347031
Created 17 Mar 2009 8:05:59 UTC
Sent 17 Mar 2009 8:07:24 UTC
Received 20 Mar 2009 17:36:16 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 224205
Report deadline 27 Mar 2009 8:07:24 UTC
CPU time 1436.896
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 3-21 1: 5:10:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/mtyka_lr5_D_score12.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/mtyka_lr5_D_score12.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_2hsb.out.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/lr5_2hsb.out.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Setting database description ...
Setting up checkpointing ...
Initializing score function:
Initializing relax mover:
Starting protocol...
Silent Output Mode
Jobdist startup..
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: S_shuffle_00001 <--- S_00002_0000216_0_test_6.0.out
Fullatom mode ..
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.5.0


Dump Timestamp : 03/21/09 01:34:14
Install Directory : C:\Program Files\BOINC
Data Directory : C:\Program Files\BOINC
Project Symstore :
LoadLibraryA( C:\Program Files\BOINC\dbghelp.dll ): GetLastError = 1455
LoadLibraryA( dbghelp.dll ): GetLastError = 1455
*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 10715, Write: 0, Other 3493

- I/O Transfers Counters -
Read: 0, Write: 200794, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 29464, QuotaPeakPagedPoolUsage: 29464
QuotaNonPagedPoolUsage: 4416, QuotaPeakNonPagedPoolUsage: 5664

- Virtual Memory Usage -
VirtualSize: 288079872, PeakVirtualSize: 296271872

- Pagefile Usage -
PagefileUsage: 192016384, PeakPagefileUsage: 208936960

- Working Set Size -
WorkingSetSize: 136130560, PeakWorkingSetSize: 213221376, PageFaultCount: 366777

*** Dump of thread ID 1164 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 93334208.000000, User Time: 14287143936.000000, Wait Time: 2525130.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024


*** Dump of thread ID 3344 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 300432.000000, User Time: 300432.000000, Wait Time: 2525124.000000


*** Dump of thread ID 2416 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 100144.000000, Wait Time: 2524973.000000



*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.76822531352161
Granted credit 2.76822531352161
application version 1.54
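
The GetLastError values in the two debugger dumps are standard Win32 error codes. The small Python sketch below pulls them out of a log and looks them up in a hand-written table. The table covers only the two codes seen in this thread, and the function name is an assumption for illustration.

```python
import re
from typing import List

# Win32 error codes that appear in the debugger output in this thread.
WIN32_ERRORS = {
    126: "ERROR_MOD_NOT_FOUND: The specified module could not be found.",
    1455: "ERROR_COMMITMENT_LIMIT: The paging file is too small "
          "for this operation to complete.",
}


def explain_last_errors(log_text: str) -> List[str]:
    """Return one description per distinct GetLastError code, in first-seen order."""
    seen = []
    # The logs print the code both bare (= 126) and quoted (= '126').
    for match in re.finditer(r"GetLastError = '?(\d+)", log_text):
        code = int(match.group(1))
        if code not in seen:
            seen.append(code)
    return [f"{code}: {WIN32_ERRORS.get(code, 'not in this table')}" for code in seen]


if __name__ == "__main__":
    sample = (
        "LoadLibraryA( dbghelp.dll ): GetLastError = 1455\n"
        "SymFromAddr(): GetLastError = '126'\n"
    )
    for line in explain_last_errors(sample):
        print(line)
```

In the second dump the debugger could not load dbghelp.dll (code 1455), which would explain why that dump has no module list or call stacks. It suggests the system was short of committed memory at that moment, though the log alone does not prove it.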

Another computer that ran this workunit also reported a compute error. Its task details follow:
Task ID 236168980
Name lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652_0
Workunit 215347031
Created 17 Mar 2009 7:49:09 UTC
Sent 17 Mar 2009 7:50:56 UTC
Received 17 Mar 2009 8:05:56 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -185 (0xffffff47)
Computer ID 868926
Report deadline 27 Mar 2009 7:50:56 UTC
CPU time 0
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Input file minirosetta_1.54_windows_intelx86.exe missing or invalid: -163
</message>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version 1.54


Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60265 - Posted: 22 Mar 2009, 4:00:06 UTC - in response to Message 60260.  
Last modified: 22 Mar 2009, 4:13:49 UTC

hi!
as already posted here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4771 on my pc (this one here): https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1012657
i get lots of validate errors and several client errors (too many to link each of them here). the usual symptom is greatly increased processing time: the work units should run 3 hours, but they run for 7 hours.

when i notice that a work unit takes much too long, should i abort it? or let it run until it fails to validate after 7 hours?


I can only tell you that the v1.54 mini version now includes code both to end such tasks sooner and to report information that helps determine why those models are running so long. Prior to these enhancements, the watchdog would wait until the task had run for 3 or 4 times longer than the runtime preference, and the results from such a watchdog-ended task were not as useful for studying what occurred.
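
To make the old rule concrete, here is a small Python sketch. The function name and the `factor` parameter are assumptions for illustration. The actual watchdog code and the new v1.54 threshold are not shown in this thread.

```python
def watchdog_would_end_task(cpu_seconds: float,
                            run_time_pref_seconds: float,
                            factor: float = 3.0) -> bool:
    """True once CPU time exceeds `factor` times the runtime preference.

    `factor` stands in for the "3 or 4 times" described above; it is an
    illustrative parameter, not a documented minirosetta setting.
    """
    return cpu_seconds > factor * run_time_pref_seconds


if __name__ == "__main__":
    pref = 3 * 3600      # 3-hour runtime preference, in seconds
    elapsed = 7 * 3600   # a task that has already run 7 hours
    for factor in (3.0, 4.0):
        print(factor, watchdog_would_end_task(elapsed, pref, factor))
```

With a 3-hour preference, a task at 7 hours is still under a 3x limit (9 hours), so under the old rule the watchdog would not yet have stepped in.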

I've been asking why such tasks are not receiving credit from the nightly credit granting script, but have not yet received any word.
Rosetta Moderator: Mod.Sense
robertmiles

Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 60268 - Posted: 22 Mar 2009, 7:38:05 UTC

I just tried resetting the Rosetta@home project and got these error messages (with no Rosetta@home workunit running, none downloaded but not run, and the last one already reported):

3/22/2009 2:03:10 AM|rosetta@home|Resetting project
3/22/2009 2:03:16 AM|rosetta@home|[error] Couldn't delete file projects/boinc.bakerlab.org_rosetta/minirosetta_1.54_windows_intelx86.exe

Attempts to delete the file manually also failed, with error messages about being unable to move it to the deleted items folder.

I currently have Rosetta@home on no new tasks, to keep it this way until you can give me some usable advice about how to finish the reset.

I run BOINC 6.2.28 under Vista SP1.
Greg_BE

Joined: 30 May 06
Posts: 5664
Credit: 5,711,084
RAC: 1,976
Message 60275 - Posted: 22 Mar 2009, 18:43:38 UTC
Last modified: 22 Mar 2009, 18:45:12 UTC

This task, 5croA_BOINC_ABINITIO_IGNORE_THE_REST-MOO56-S25-11-S3-13--5croA-_7876_63, crashed on two computers, and a third computer never reported back.

I got a validate error, another person got a compute error, and the third never returned either an error or a completed result.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60279 - Posted: 23 Mar 2009, 0:48:30 UTC

robertmiles
Sounds like a reboot is in order to clear all of the locks. I've never heard of that happening before. Perhaps something like anti-virus software has taken a lock on the file to perform a scan?

Curious, why were you resetting the project?
Rosetta Moderator: Mod.Sense
robertmiles

Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 60285 - Posted: 23 Mar 2009, 14:43:41 UTC - in response to Message 60279.  

robertmiles
Sounds like a reboot is in order to clear all of the locks. I've never heard of that happening before. Perhaps something like anti-virus software has taken a lock on the file to perform a scan?

Curious, why were you resetting the project?


A reboot may have helped - it was part of the procedure I described trying over on Ralph@home, and was able to remove the lockfiles for a while.

I was resetting the project because that's what the error messages from the lockfile problem suggest I may need to do. However, it doesn't seem to have helped enough, since the first Rosetta@home workunit my machine completed since the reset had the lockfile problem again:

https://boinc.bakerlab.org/rosetta/result.php?resultid=237629070

Two more Rosetta@home workunits that started later aren't finished, but at least don't seem to have run into the lockfile problem yet.

My antivirus program, and also my three antispyware programs, are able to finish scanning a file in much less time than it takes for Rosetta@home and Ralph@home workunits to fail due to too many restarts from a lockfile problem, so I'd expect a lock from any of them to cause lockfile error messages for only a short time, followed by a successful minirosetta restart.

A suggestion - modify minirosetta to check for the lockfile as it starts up (preferably before any effort to create one), report the results of this check if it can, and if this first check for the lockfile finds one, don't waste as much time restarting over and over before declaring the workunit failed.

Another suggestion - modify minirosetta to report which slot it ran in, since the problem looks like it may be specific to workunits assigned to specific slots, due to what looks like its inability to remove lockfiles left by previous workunits assigned to the same slot but already completed since the last reboot.
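
The two suggestions above could look roughly like the Python sketch below. This is not how minirosetta or the BOINC API actually works. `inspect_slot` and `describe` are hypothetical names, and only the file name boinc_lockfile comes from the slot directories described in this thread.

```python
from pathlib import Path

LOCKFILE_NAME = "boinc_lockfile"  # file name as reported in this thread


def inspect_slot(slot_dir) -> dict:
    """Look for a lockfile in a slot directory before trying to create one."""
    slot_dir = Path(slot_dir)
    lockfile = slot_dir / LOCKFILE_NAME
    present = lockfile.is_file()
    return {
        "slot": slot_dir.name,  # last path component, e.g. "1" for slots/1
        "lockfile_present": present,
        "lockfile_bytes": lockfile.stat().st_size if present else None,
    }


def describe(info: dict) -> str:
    """Build a one-line report suitable for writing to stderr at startup."""
    if info["lockfile_present"]:
        return (f"slot {info['slot']}: found existing {LOCKFILE_NAME} "
                f"({info['lockfile_bytes']} bytes) before startup")
    return f"slot {info['slot']}: no {LOCKFILE_NAME} found before startup"


if __name__ == "__main__":
    # "slots/1" is a placeholder path; point it at a real slot directory.
    print(describe(inspect_slot("slots/1")))
```

A leftover file does not by itself prove that another process still holds the lock, so a report like this would only narrow down which slots to look at.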

I leave BOINC running nearly 24 hours a day, often days between reboots, which may have something to do with why I'm seeing the lockfile problem as often as I do.

I'm still using BOINC 6.2.28 under 32-bit Vista SP1.
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,816,664
RAC: 863
Message 60294 - Posted: 24 Mar 2009, 6:00:33 UTC - in response to Message 60285.  

robertmiles
Sounds like a reboot is in order to clear all of the locks. I've never heard of that happening before. Perhaps something like anti-virus software has taken a lock on the file to perform a scan?

Curious, why were you resetting the project?


A reboot may have helped - it was part of the procedure I described trying over on Ralph@home, and was able to remove the lockfiles for a while.

I was resetting the project because that's what the error messages from the lockfile problem suggest I may need to do. However, it doesn't seem to have helped enough, since the first Rosetta@home workunit my machine completed since the reset had the lockfile problem again:

https://boinc.bakerlab.org/rosetta/result.php?resultid=237629070

Two more Rosetta@home workunits that started later aren't finished, but at least don't seem to have run into the lockfile problem yet.

My antivirus program, and also my three antispyware programs, are able to finish scanning a file in much less time than it takes for Rosetta@home and Ralph@home workunits to fail due to too many restarts from a lockfile problem, so I'd expect a lock from any of them to cause lockfile error messages for only a short time, followed by a successful minirosetta restart.

A suggestion - modify minirosetta to check for the lockfile as it starts up (preferably before any effort to create one), report the results of this check if it can, and if this first check for the lockfile finds one, don't waste as much time restarting over and over before declaring the workunit failed.

Another suggestion - modify minirosetta to report which slot it ran in, since the problem looks like it may be specific to workunits assigned to specific slots, due to what looks like its inability to remove lockfiles left by previous workunits assigned to the same slot but already completed since the last reboot.

I leave BOINC running nearly 24 hours a day, often days between reboots, which may have something to do with why I'm seeing the lockfile problem as often as I do.

I'm still using BOINC 6.2.28 under 32-bit Vista SP1.


You might be interested in this announcement by Bernd over at Einstein@home. He has made an Einstein Windows app specifically to collect more info on the "too many exits" and "can't acquire lockfile" errors that show up with CPU throttling. Hopefully his discoveries will prove useful here on Rosetta@home as well.

Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60300 - Posted: 24 Mar 2009, 14:56:45 UTC
Last modified: 24 Mar 2009, 14:58:33 UTC

This task is currently using 496MB on my machine. Max was 536MB. It is called 2P09A_BOINC_MPZN_vanilla_abrelax_9106_6681_0

What is the status now that the minimum recommended memory is 512MB? Are WUs still created that will only go to systems with more? My machine has 2GB, but I was wondering whether this task is using more than planned.

That task seems to be running normally otherwise. It is 22 hours into my 24-hour preference.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60319 - Posted: 25 Mar 2009, 15:28:05 UTC

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338

Task ID:237330352
robertmiles

Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 60337 - Posted: 27 Mar 2009, 3:16:16 UTC

A workunit that ran for a while, then ran into the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=238431267

Two of the five subdirectories under the slots directory contain a large number of files, and appear to be for the two workunits now in progress. Two are empty.

The other subdirectory contains only 3 files, and appears to be left over from this failed workunit.

File boinc_lockfile appears to be empty, since its size is zero. It's marked as still in use, though, so I can't check this.

The contents of stderr.txt start with this:

BOINC:: Initializing ... ok.
[2009- 3-25 22:55: 2:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip
<unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./>
Firstarg=true; pp=-d./
firstarg: <-d./>
End of unzipping.
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _U9X3X_00001
# cpu_run_time_pref: 43200
Starting work on structure: _U9X3X_00002
Starting work on structure: _U9X3X_00003
Starting work on structure: _U9X3X_00004
Starting work on structure: _U9X3X_00005
Starting work on structure: _U9X3X_00006
Starting work on structure: _U9X3X_00007
Starting work on structure: _U9X3X_00008
Starting work on structure: _U9X3X_00009
Starting work on structure: _U9X3X_00010
Starting work on structure: _U9X3X_00011
Starting work on structure: _U9X3X_00012
Starting work on structure: _U9X3X_00013
Starting work on structure: _U9X3X_00014
Starting work on structure: _U9X3X_00015
Starting work on structure: _U9X3X_00016
Starting work on structure: _U9X3X_00017
Starting work on structure: _U9X3X_00018
Starting work on structure: _U9X3X_00019
Starting work on structure: _U9X3X_00020
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting


The contents of stdout.txt are:

Created shared memory segment
Created semaphore


Do these results mean that Rosetta@home never tries to clear up these three files for failed workunits? Should it? They appear to prevent any workunits from Rosetta@home or Ralph@home from being able to run in this slot until the next reboot - often meaning a few days for me. I haven't seen them have a similar effect on workunits from other BOINC projects, though.
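For anyone who wants to check their own machine for the same leftover state, here is a minimal sketch, not an official BOINC or Rosetta tool. It assumes only what is described above: a slots directory with one subdirectory per slot, and a leftover slot holding boinc_lockfile plus very few other files. The function name `find_suspect_slots` and the three-file threshold are my own choices for illustration. It only lists directories and never deletes anything.

```python
import os

LOCKFILE_NAME = "boinc_lockfile"  # file name as reported in the post above


def find_suspect_slots(slots_dir, max_files=3):
    """Return slot subdirectories that hold a lockfile but few files overall.

    max_files=3 is an assumption based on the one leftover slot described
    above; a slot with a task in progress holds a large number of files.
    """
    suspects = []
    for entry in sorted(os.listdir(slots_dir)):
        slot_path = os.path.join(slots_dir, entry)
        if not os.path.isdir(slot_path):
            continue  # ignore stray files next to the slot directories
        names = os.listdir(slot_path)
        if LOCKFILE_NAME in names and len(names) <= max_files:
            suspects.append(slot_path)
    return suspects
```

Call it with the path to your own slots directory, which depends on where BOINC keeps its data on your system. An empty slot is not reported, because it contains no lockfile.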
ID: 60337 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Hammeh

Send message
Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 60341 - Posted: 27 Mar 2009, 19:50:11 UTC
Last modified: 27 Mar 2009, 19:50:29 UTC

Can anyone shed some light on this WU? I just started crunching for Rosetta, and it didn't report any client-side errors.

217630163

Thanks
ID: 60341 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Chilean

Send message
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 60343 - Posted: 27 Mar 2009, 20:08:33 UTC

Is your CPU overclocked?
ID: 60343 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Hammeh

Send message
Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 60344 - Posted: 27 Mar 2009, 20:32:06 UTC

Nope, here is some system info:
AMD Phenom X4 9600 (not overclocked)
3GB RAM
Windows Vista Home Premium 32-bit
BOINC version 6.4.7
ID: 60344 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
svincent

Send message
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 60388 - Posted: 30 Mar 2009, 16:54:44 UTC

Validate error on this workunit 218443282 on Mac.

cc_natcst_1_8_nocstinrelax_hb_t327__IGNORE_THE_REST_2FSWA_7_9505_20_1

An unlikely 99 decoys from 99 attempts: a wingman had the same problem.

Starting work on structure: _2FSWA_7_00098
Starting work on structure: _2FSWA_7_00099
======================================================
DONE :: 1 starting structures 145.451 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>


ID: 60388 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60418 - Posted: 31 Mar 2009, 17:11:41 UTC

ID: 60418 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
LizzieBarry

Send message
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 60430 - Posted: 1 Apr 2009, 13:39:06 UTC
Last modified: 1 Apr 2009, 13:45:56 UTC

frb_1_8_bestfrag_hb_t313___IGNORE_THE_REST_1F9TA_5_9696_15_0

7 hours running (3hr default), no decoys, Validate Error.

I've been noticing these "frb" WUs are singularly unsuccessful. What are the stats on their successful completion? I'd say they were minimal.
ID: 60430 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 1990
Credit: 38,536,805
RAC: 15,887
Message 60432 - Posted: 1 Apr 2009, 18:18:53 UTC - in response to Message 60430.  

frb_1_8_bestfrag_hb_t313___IGNORE_THE_REST_1F9TA_5_9696_15_0

7 hours running (3hr default), no decoys, Validate Error.

I've been noticing these "frb" WUs are singularly unsuccessful. What are the stats on their successful completion? I'd say they were minimal.

Oh, I don't know...

frb_1_8_ecut_hb_t322___IGNORE_THE_REST_1VPMA_12_9712_12_0

# cpu_run_time_pref: 14400
CPU time 14099.2

Claimed credit 69.0173659142213
Granted credit 229.296476006251

No complaints here!!! :)
ID: 60432 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Path7

Send message
Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 60460 - Posted: 2 Apr 2009, 17:46:57 UTC
Last modified: 2 Apr 2009, 17:50:13 UTC

Success & Error on the same WU

Hello all,

This WU:
frb_0_8_el_chosen_hb_t312___IGNORE_THE_REST_1XV2A_15_9667_54_0

has officially been reported as: Outcome = Success.
However, the WU ran for only 4309.559 seconds (cpu_run_time_pref: 21600) and
ended with an error:
Starting work on structure: _1XV2A_15_00008
interpolate rotamers bin out of range: GLN -107.207 180 -7e-005 -6.1e-005 -5.1e-005
34 36 8 9 37 2 0.2793 0
ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrccore/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 593
called boinc_finish

Have a nice day,
Path7.
ID: 60460 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Murasaki

Send message
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 60461 - Posted: 2 Apr 2009, 19:25:31 UTC

Another WU with 99 successful decoys

ala_2he4_p40-1.ala.ppk_dock_random.xml_RANDOM12_BOUND_DOCK_9895_843_0

# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 6841.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

My preferred run time is 6 hours, but this one completed in less than 2. Either this is an extremely quick model or something odd occurred.
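To spot tasks like this without reading every log by hand, here is a rough sketch that pulls the relevant numbers out of a task's stderr text. It relies only on the three line formats quoted in this thread (`cpu_run_time_pref`, the `DONE ::` line, and the decoys line). The function names and the 50% cutoff are my own assumptions, not anything the project defines.

```python
import re


def summarize_stderr(text):
    """Extract run-time preference, CPU seconds, and decoy counts from stderr text."""
    pref = re.search(r"cpu_run_time_pref:\s*(\d+)", text)
    done = re.search(r"DONE ::\s*\d+ starting structures\s+([\d.]+) cpu seconds", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    return {
        "pref_seconds": int(pref.group(1)) if pref else None,
        "cpu_seconds": float(done.group(1)) if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
    }


def ended_early(summary, fraction=0.5):
    """True when the task used less than `fraction` of the preferred run time.

    The 0.5 default is an arbitrary cutoff chosen for this sketch.
    """
    if summary["pref_seconds"] is None or summary["cpu_seconds"] is None:
        return False  # not enough information in the log to decide
    return summary["cpu_seconds"] < fraction * summary["pref_seconds"]


sample = """# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 6841.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
"""
summary = summarize_stderr(sample)
print(summary, ended_early(summary))
```

With the log lines from this task, the summary shows 6841.62 CPU seconds against a 21600 second preference, so it is flagged as having ended early.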
ID: 60461 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
©2024 University of Washington
https://www.bakerlab.org