Computation errors

David703
Joined: 17 Jul 17
Posts: 5
Credit: 64,485
RAC: 0
Message 90595 - Posted: 30 Mar 2019, 18:06:19 UTC

Hi, since I've come back to this project I've been seeing some strange errors in some of my WUs, especially in the ones that study big proteins. Here are a few examples:
-https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314770
-https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314768
-https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662

How can I keep these errors from happening?

rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 21,341,779
RAC: 15,357
Message 90599 - Posted: 31 Mar 2019, 16:29:03 UTC - in response to Message 90595.  

Rosetta developers were quite sloppy in their allocation and use of memory.

Task 1065460662 ran out of memory.
https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662

The other two error out with "Funzione non corretta" ("Incorrect function").

When one WU runs out of memory, other WUs may get strange error messages from function calls, as developers don't always check the return results of all system calls.

The WUs you are running are 64-bit and sometimes take large amounts of memory, frequently over 1 GB each.

8 GB should be enough to run 4 Rosetta 64-bit WUs, so I would examine how memory is being used and change the workload.
Buy more memory if practical.
Lower the number of Rosetta WUs running simultaneously, either with app_config.xml or via BOINC -> OPTIONS -> COMPUTING PREFERENCES -> USAGE LIMITS.
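
The sizing advice above comes down to a division followed by a small config file. Here is a minimal Python sketch of that, under stated assumptions: 1.5 GB per task and 2 GB reserved for the operating system are illustrative figures, and the app names `rosetta` and `minirosetta` are guesses that should be checked against the names your own client reports. The `max_concurrent` element inside `app_config.xml` is the BOINC client setting for capping simultaneous tasks of one app.

```python
import math
import xml.etree.ElementTree as ET


def max_tasks(total_ram_gb, per_task_gb, reserved_gb=2.0):
    """Number of tasks that fit after setting RAM aside for the OS (reserve is an assumption)."""
    usable = max(total_ram_gb - reserved_gb, 0.0)
    return max(int(math.floor(usable / per_task_gb)), 1)


def build_app_config(limit, app_names=("rosetta", "minirosetta")):
    """Return app_config.xml text; app_names are assumed, not confirmed by this thread."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    return ET.tostring(root, encoding="unicode")


limit = max_tasks(total_ram_gb=8, per_task_gb=1.5)
config_text = build_app_config(limit)
print(limit)
print(config_text)
```

The generated text would go in the project's folder under the BOINC data directory as `app_config.xml`, after which the client needs to re-read its config files.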

David703
Joined: 17 Jul 17
Posts: 5
Credit: 64,485
RAC: 0
Message 90600 - Posted: 31 Mar 2019, 19:25:31 UTC - in response to Message 90599.  

Ok, thank you!

shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 90934 - Posted: 24 Jul 2019, 9:15:55 UTC
Last modified: 24 Jul 2019, 9:16:45 UTC

Seems unlikely they've ever addressed this problem, eh? I see them pretty often. Especially annoying when they have run up 8 hours of effort before crashing, presumably with no points earned. And no, at this point I don't care enough to do the searching to try to figure out if the points were granted. I don't even care enough to read the rest of the thread beyond the Subject: and glancing at a couple of the posts.

Latest example:

Application: Rosetta Mini 3.78
Name: start_close_HHH_rd4_0056.min_rise1.83_whole_pass_aagb.bp_20190406150644_0001_0001_0001_0003_0001_0001_fragments_fold_SAVE_ALL_OUT_833066_1053
State: Computation error
Received: 22 Jul 2019, 08:13:16
Report deadline: 30 Jul 2019, 08:13:11
Estimated computation size: 80,000 GFLOPs
CPU time: 07:49:11
Elapsed time: 07:59:03
Executable: minirosetta_3.78_x86_64-pc-linux-gnu
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1859
Credit: 8,144,596
RAC: 7,834
Message 90937 - Posted: 24 Jul 2019, 13:58:09 UTC - in response to Message 90934.  

Rosetta Mini 3.78 was released in October 2017.
Since then, there have been a lot of errors and problems.
No debugging, no new version. Nothing.

mmonnin
Joined: 2 Jun 16
Posts: 54
Credit: 20,058,207
RAC: 4,375
Message 90943 - Posted: 26 Jul 2019, 0:25:06 UTC

I'd rather have the Rosetta Mini tasks than the Rosetta version that runs for 5 hours and then errors out when the set run time is 1 hour.

blyons123
Joined: 8 Apr 14
Posts: 4
Credit: 118,348
RAC: 0
Message 91113 - Posted: 12 Sep 2019, 14:42:48 UTC

Happened again after resetting project!?

9/12/2019 10:20:17 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:20:17 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:22:08 PM | Rosetta@home | Task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 exited with zero status but no 'finished' file
9/12/2019 10:22:08 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:23:49 PM | Rosetta@home | Task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 exited with zero status but no 'finished' file
9/12/2019 10:23:49 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:24:33 PM | Rosetta@home | Task bc96_4h_hb1_1620_fold_SAVE_ALL_OUT_857813_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:24:33 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:25:54 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:25:54 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:30:08 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:30:08 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:34:11 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:34:11 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:36:06 PM | Rosetta@home | work fetch suspended by user
9/12/2019 10:36:15 PM | Rosetta@home | Task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 exited with zero status but no 'finished' file
9/12/2019 10:36:15 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:36:32 PM | Rosetta@home | task bc96_4h_hb1_1620_fold_SAVE_ALL_OUT_857813_1040_0 suspended by user
9/12/2019 10:36:35 PM | Rosetta@home | Starting task foldit_2007855_0007_fold_and_dock_SAVE_ALL_OUT_849408_1557_0
9/12/2019 10:36:35 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 suspended by user
9/12/2019 10:36:35 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 resumed by user
9/12/2019 10:36:37 PM | Rosetta@home | task Longxing_ems_4hM_2152.10077_fold_SAVE_ALL_OUT_861867_13_0 suspended by user
9/12/2019 10:36:39 PM | Rosetta@home | Starting task Longxing_ems_ferrM_3025.11863_fold_SAVE_ALL_OUT_863659_13_0
9/12/2019 10:36:40 PM | Rosetta@home | task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 suspended by user
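
To see which tasks are stuck in the restart loop, it helps to count the "no 'finished' file" exits per task. The Python sketch below does that with a regular expression over event log text; the four sample lines are copied from the log above, and the function name is just for illustration. Run over the full log, the same count shows the bc96_EHEE task exiting this way four times.

```python
import re
from collections import Counter

# A few lines copied from the event log above.
LOG = """\
9/12/2019 10:20:17 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
9/12/2019 10:20:17 PM | Rosetta@home | If this happens repeatedly you may need to reset the project.
9/12/2019 10:22:08 PM | Rosetta@home | Task Longxing_ems_ferrM_2260.11745_fold_SAVE_ALL_OUT_863531_13_0 exited with zero status but no 'finished' file
9/12/2019 10:25:54 PM | Rosetta@home | Task bc96_EHEE_hb1_2413_fold_SAVE_ALL_OUT_857815_1040_0 exited with zero status but no 'finished' file
"""

# Capture the task name from each matching event log line.
PATTERN = re.compile(r"\| Task (\S+) exited with zero status but no 'finished' file")


def count_restarts(log_text):
    """Count 'no finished file' exits per task name."""
    return Counter(match.group(1) for match in PATTERN.finditer(log_text))


counts = count_restarts(LOG)
for task, n in counts.most_common():
    print(n, task)
```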

blyons123
Joined: 8 Apr 14
Posts: 4
Credit: 118,348
RAC: 0
Message 91153 - Posted: 23 Sep 2019, 10:51:08 UTC

Every Mini task gives me this error:
9/23/2019 6:22:43 PM | Rosetta@home | Task Longxing_ems_ferrM_5178.12181_fold_SAVE_ALL_OUT_863970_24_0 exited with zero status but no 'finished' file

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 91157 - Posted: 24 Sep 2019, 16:00:33 UTC

I've also had two work units crash out today, one with this...

Exit status 1 (0x00000001) Unknown error code

... the other with this...

Exit status -529697949 (0xE06D7363) Unknown error code

No new tasks set for now.
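
The second exit status is less mysterious than "Unknown error code" suggests. The negative number is the signed 32-bit reading of 0xE06D7363, which is the code the Microsoft Visual C++ runtime uses when it raises a C++ exception: the byte 0xE0 followed by the ASCII letters "msc". It only says that an uncaught C++ exception ended the process, not which one. A short Python sketch of the conversion (function names are illustrative):

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return code & 0xFFFFFFFF


def describe(code):
    """Return the hex form and the ASCII tag held in the low three bytes."""
    unsigned = to_unsigned32(code)
    tag = unsigned.to_bytes(4, "big")[1:].decode("ascii", errors="replace")
    return f"0x{unsigned:08X}", tag


hex_code, tag = describe(-529697949)
print(hex_code, tag)
```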
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 91158 - Posted: 24 Sep 2019, 16:39:04 UTC

rb_09_19_8636_8623_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_05_867741_37 failed with invalid chi angle on Windows

File: C:\cygwin64\home\boinc\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)

</stderr_txt>

Task 1094602971 https://boinc.bakerlab.org/rosetta/result.php?resultid=109460297
Workunit 985919585 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=985919585
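
The value in that message, `-nan(ind)`, is how the Microsoft C runtime prints an indefinite NaN, so the angle was most likely never a real number to begin with: some earlier calculation produced an undefined result and the range check is just where it got caught. The Python sketch below is not Rosetta's code; it only illustrates why a NaN fails a check like "between -180 and 180", since every comparison involving NaN is false.

```python
import math


def chi_in_range(chi):
    """True only for a real number in [-180, 180]; NaN fails both comparisons."""
    return -180.0 <= chi <= 180.0


def checked_chi(chi):
    """Hypothetical validator that reports NaN separately from an out-of-range angle."""
    if math.isnan(chi):
        raise ValueError("chi angle is NaN: an earlier calculation produced an undefined value")
    if not chi_in_range(chi):
        raise ValueError(f"chi angle must be between -180 and 180: {chi}")
    return chi


print(chi_in_range(float("nan")))
print(chi_in_range(-179.5))
```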

wolfman1360
Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 91159 - Posted: 24 Sep 2019, 20:43:54 UTC

Some errored out tasks, 10 in total. That's a lot of computing time gone. I'm not sure which file is in use or if it was Rosetta or Boinc. This machine has been up solid. Suspending for now.
https://boinc.bakerlab.org/rosetta/result.php?resultid=1095086805
https://boinc.bakerlab.org/rosetta/result.php?resultid=1095124487
https://boinc.bakerlab.org/rosetta/result.php?resultid=1095064719

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 91330 - Posted: 3 Nov 2019, 11:32:47 UTC

I have had four units crash out in recent days. One ended with "Aborted by Server", so I discount that one. The other three ended with "Out of Memory". I think this is because I was sent "Rosetta v4.07 windows_intelx86" to run the job, and not "Rosetta v4.07 windows_intelx86_64". Of the wingmen on the failing jobs, the others have crashed with the same error, except one, who completed the unit but was running Rosetta v4.07 windows_intelx86_64. Obviously, a 64-bit system can access a much greater memory range than a 32-bit one. The question that arises, though, is why I was sent x86 and not x86_64. My system runs 64-bit Windows and has more memory installed and available to BOINC than the chap who completed the job without error.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 91443 - Posted: 7 Dec 2019, 14:40:10 UTC

I've got another couple of weird ones now. One is 0.586% done but has 16:08:53 elapsed and 114d 02:32:50 remaining, increasing quite rapidly; the other is at 0.259% after 06:15:56 elapsed with 12:42:49 remaining, the last digit flipping 48 - 49 - 48 - 49.
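
For what it's worth, the first figure is roughly what a plain linear extrapolation from the fraction done would give, which suggests (but does not prove) that the client was estimating the remaining time from reported progress alone. The second task's 12:42:49 does not fit that pattern, so something else is driving it. A minimal Python sketch of the extrapolation, using the numbers above (the helper names are illustrative, and this is not BOINC's actual estimator):

```python
def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def linear_remaining_seconds(elapsed_s, percent_done):
    """Remaining time if progress continued at the average rate seen so far."""
    fraction = percent_done / 100.0
    return elapsed_s * (1.0 - fraction) / fraction


first_days = linear_remaining_seconds(hms_to_seconds("16:08:53"), 0.586) / 86400
second_days = linear_remaining_seconds(hms_to_seconds("06:15:56"), 0.259) / 86400
print(round(first_days, 1), round(second_days, 1))
```

The first task works out to about 114 days, in line with what the manager showed; the second would be around 100 days on the same basis.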
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

wolfman1360
Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 91570 - Posted: 17 Jan 2020, 18:13:51 UTC

I have two errored-out tasks over here. Both ran for a day before showing up as invalid, which is amazingly frustrating. The message appears to be "finish file too long" for both. I just attached more machines (granted, with more memory, able to handle the occasional usage of up to 1 GB per core). I just hope they don't receive the same errors.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1116966347
https://boinc.bakerlab.org/rosetta/result.php?resultid=1116965297

kaancanbaz
Joined: 28 Apr 06
Posts: 23
Credit: 2,994,358
RAC: 418
Message 91571 - Posted: 17 Jan 2020, 23:15:03 UTC

Maybe they are only sending previously failed WUs. There are not many new work units available. I hope that they are about to release a new version and that's why they slowed down the WUs.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1481
Credit: 14,575,835
RAC: 14,294
Message 92719 - Posted: 31 Mar 2020, 9:07:35 UTC
Last modified: 31 Mar 2020, 9:20:00 UTC

I got a few "Out of memory" errors when I first started Rosetta, so I doubled my system RAM. I've now got 32 GB, 95% available to BOINC, using all 6c/12t on the CPU.

Even so, I just had another WU end in error with an "Out of memory" Unhandled Exception error.

Edit:
So far, all computation errors have occurred with the Rosetta v4.07 windows_intelx86 application.



<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -529697949 (0xe06d7363)
</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe -run:protocol jd2_scripting -parser:protocol jhr_boinc_v2.xml @flags -in:file:silent 1uq3vg3u_Junior_HalfRoid_design2_COVID-19.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 1uq3vg3u_Junior_HalfRoid_design2_COVID-19.zip @1uq3vg3u_Junior_HalfRoid_design2_COVID-19.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2037194
Starting watchdog...
Watchdog active.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x76484192

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 03/31/20 17:11:54
Install Directory : 
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\12;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore


ModLoad: 0000000000200000 0000000003413000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.07_windows_intelx86.exe (-exported- Symbols Loaded)
    Linked PDB Filename   : C:cygwinhomeboincRosetta_4.07mainsourceideVisualStudioBoincReleaserosetta_4.07_windows_intelx86.pdb

ModLoad: 0000000077030000 000000000019a000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.18362.719) (-exported- Symbols Loaded)
    Linked PDB Filename   : wntdll.pdb
    File Version          : 10.0.18362.329 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.329

ModLoad: 0000000075b60000 00000000000e0000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.18362.329) (-exported- Symbols Loaded)
    Linked PDB Filename   : wkernel32.pdb
    File Version          : 10.0.18362.329 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.329

ModLoad: 0000000076370000 00000000001fe000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.18362.719) (-exported- Symbols Loaded)
    Linked PDB Filename   : wkernelbase.pdb
    File Version          : 10.0.18362.329 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.329

ModLoad: 0000000076880000 000000000005e000 C:\WINDOWS\System32\WS2_32.dll (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : ws2_32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000076a50000 00000000000bb000 C:\WINDOWS\System32\RPCRT4.dll (6.2.18362.628) (-exported- Symbols Loaded)
    Linked PDB Filename   : wrpcrt4.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000074800000 0000000000020000 C:\WINDOWS\System32\SspiCli.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : wsspicli.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000747f0000 000000000000a000 C:\WINDOWS\System32\CRYPTBASE.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : cryptbase.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000076b10000 000000000005f000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.18362.295) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcryptprimitives.pdb
    File Version          : 10.0.18362.295 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.295

ModLoad: 0000000075ea0000 0000000000076000 C:\WINDOWS\System32\sechost.dll (6.2.18362.693) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000074eb0000 0000000000197000 C:\WINDOWS\System32\USER32.dll (6.2.18362.719) (-exported- Symbols Loaded)
    Linked PDB Filename   : wuser32.pdb
    File Version          : 10.0.17134.343 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.343

ModLoad: 0000000075180000 0000000000017000 C:\WINDOWS\System32\win32u.dll (6.2.18362.719) (-exported- Symbols Loaded)
    Linked PDB Filename   : wwin32u.pdb
    File Version          : 10.0.18362.719 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.719

ModLoad: 0000000075310000 0000000000021000 C:\WINDOWS\System32\GDI32.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : wgdi32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000751b0000 000000000015a000 C:\WINDOWS\System32\gdi32full.dll (6.2.18362.719) (-exported- Symbols Loaded)
    Linked PDB Filename   : wgdi32full.pdb
    File Version          : 10.0.18362.719 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.719

ModLoad: 0000000075050000 000000000007c000 C:\WINDOWS\System32\msvcp_win.dll (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.18362.387 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.387

ModLoad: 0000000075340000 000000000011f000 C:\WINDOWS\System32\ucrtbase.dll (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.18362.387 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.387

ModLoad: 00000000767c0000 0000000000079000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.18362.329) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000074820000 00000000000bf000 C:\WINDOWS\System32\msvcrt.dll (7.0.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.18362.1

ModLoad: 0000000076c70000 0000000000025000 C:\WINDOWS\System32\IMM32.DLL (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : wimm32.pdb
    File Version          : 10.0.18362.387 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.387

ModLoad: 0000000075da0000 000000000000f000 C:\WINDOWS\System32\kernel.appcore.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000747c0000 0000000000029000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntmarta.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000744c0000 000000000018f000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000744b0000 0000000000008000 C:\WINDOWS\SYSTEM32\version.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 47711, Write: 0, Other 13591

- I/O Transfers Counters -
Read: 0, Write: 207023, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 247216, QuotaPeakPagedPoolUsage: 247520
QuotaNonPagedPoolUsage: 32152, QuotaPeakNonPagedPoolUsage: 38544

- Virtual Memory Usage -
VirtualSize: 2110238720, PeakVirtualSize: 2137595904

- Pagefile Usage -
PagefileUsage: 445714432, PeakPagefileUsage: 1451794432

- Working Set Size -
WorkingSetSize: 454291456, PeakWorkingSetSize: 1457319936, PageFaultCount: 20195291

*** Dump of thread ID 1932 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 367187488.000000, User Time: 173045153792.000000, Wait Time: 6001909.000000

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x76484192

- Registers -
eax=0995d9a8 ebx=0995da54 ecx=00000003 edx=00000000 esi=02971c60 edi=02da4c54
eip=76484192 esp=0995d9a8 ebp=0995da00
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212

- Callstack -
ChildEBP RetAddr  Args to Child
0995da00 004dac4b e06d7363 00000001 00000003 0995da38 KERNELBASE!RaiseException+0x0 
0995da44 004e2854 0995da54 02da4c54 029729a0 029729a8 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 
0995da60 004e1b6e 0995da7c 00b5df10 0006b9b0 3dab83e0 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 
0995da68 00b5df10 0006b9b0 3dab83e0 0995dc30 0995daa8 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 
0995da7c 00b5de44 0995dc30 7d27702a 3dab83e0 0b100380 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995daa8 02496c54 0995dc30 7d27715e 3dab83e0 3dab83d8 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995dbdc 0083db19 0b1017d0 0b100380 1fdac898 4843b188 rosetta_4.07_windows_intelx86!cppdb::mutex::mutex+0x0 
0995dc08 00b14595 0b1017d0 0b100380 1fdac898 0995dc30 rosetta_4.07_windows_intelx86!cppdb::backend::statement::cache+0x0 
0995dd34 00b115a7 1fdac898 4843b188 5e90a170 6f203200 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995dda8 00b2e198 1fdac898 4843b188 5e90a170 6f203200 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995ddec 00b2de34 0995de20 717bbdf0 5c2c2d70 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995de50 00b0b7f1 0995de98 717bbdf0 5c2c2d70 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995def8 00c6d009 1fdac898 4843b188 717bbdf0 6f826738 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995df78 00c6ba06 1fdac898 7d274a0a 46118a30 1fdac898 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995e088 00c8a95b 1fdac898 7d275946 5330f1b0 46118a30 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f3c4 00c3979a 1fdac898 7d275eee 5330f1b0 616751ac rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f46c 00c38ce7 1fdac898 616751ac 7d275faa 09fe5c80 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f528 00bfaf25 1fdac898 7d275c4e 5e82afe7 09fe5c80 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f6cc 00bf86df 0995f7b4 5e82afe7 00000000 0995f76c rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f7ac 00c3fa05 5330f1b0 484a9d50 7d275d5a 00004e21 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f7d8 00c33dcf 09f83870 0a012f98 7d27529a 09c94298 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995f818 004d94aa 09f83870 0a012f98 7d275656 032a14a4 rosetta_4.07_windows_intelx86!cppdb::backend::static_driver::in_use+0x0 
0995fcd4 004e2267 00000027 09aa39f0 09a9af20 7d27579e rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 
0995fd1c 75b76359 039e4000 75b76340 0995fd88 77097b74 rosetta_4.07_windows_intelx86!xmlParserInputRead+0x0 
0995fd2c 77097b74 039e4000 1537adee 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
0995fd88 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 
0995fd98 00000000 004e22dd 039e4000 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 

*** Dump of thread ID 7456 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 156250.000000, User Time: 156250.000000, Wait Time: 6001909.000000

- Registers -
eax=00000000 ebx=0000000a ecx=00000000 edx=00000000 esi=00000000 edi=2ef2fc94
eip=770a20bc esp=2ef2fc54 ebp=2ef2fcb8
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202

- Callstack -
ChildEBP RetAddr  Args to Child
2ef2fcb8 7647f32f 00000064 00000000 2ef2fef4 016ed11b ntdll!ZwDelayExecution+0x0 
2ef2fcc8 016ed11b 00000064 016ed0f0 016ed0f0 00000000 KERNELBASE!Sleep+0x0 
2ef2fef4 75b76359 00000000 75b76340 2ef2ff60 77097b74 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 
2ef2ff04 77097b74 00000000 3250af06 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
2ef2ff60 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 
2ef2ff70 00000000 016ed0f0 00000000 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 

*** Dump of thread ID 4484 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 6001798.000000

- Registers -
eax=00000000 ebx=0a919501 ecx=00000000 edx=00000000 esi=00000000 edi=366afd74
eip=770a20bc esp=366afd34 ebp=366afd98
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202

- Callstack -
ChildEBP RetAddr  Args to Child
366afd98 7647f32f 000007d0 00000000 366afe90 00f68d91 ntdll!ZwDelayExecution+0x0 
366afda8 00f68d91 000007d0 42d85412 0a919570 00f68f70 KERNELBASE!Sleep+0x0 
366afe90 00f68f77 00000000 016da2f5 00000000 42d85456 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 
366afed4 75b76359 0a919570 75b76340 366aff40 77097b74 rosetta_4.07_windows_intelx86!xmlMutexLock+0x0 
366afee4 77097b74 0a919570 2ac8af26 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
366aff40 77097b44 ffffffff 770b8f3c 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 
366aff50 00000000 016da29e 0a919570 00000000 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
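
One thing stands out in the dump: the peak virtual size is just under 2 GiB. A 32-bit process on Windows normally gets only 2 GiB of user address space no matter how much RAM is installed, so the figures are consistent with the x86 build hitting that ceiling rather than the machine running short of memory. The Python sketch below just does the arithmetic on values copied from the dump; the 2 GiB limit assumes the executable is not built as large-address-aware, which the dump does not tell us.

```python
GIB = 2 ** 30
MIB = 2 ** 20

# Values copied from the "Virtual Memory Usage" and "Working Set Size" lines of the dump.
peak_virtual = 2137595904
peak_working_set = 1457319936

# Default user-mode address space of a 32-bit Windows process
# (assumption: the executable is not large-address-aware).
limit_32bit = 2 * GIB


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / GIB


headroom_mib = (limit_32bit - peak_virtual) / MIB
print(f"peak virtual size : {gib(peak_virtual):.2f} GiB")
print(f"peak working set  : {gib(peak_working_set):.2f} GiB")
print(f"headroom to 2 GiB : {headroom_mib:.1f} MiB")
```

If that reading is right, adding RAM would not help the 32-bit application; only the x86_64 build could use the extra memory.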

Grant
Darwin NT

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 92725 - Posted: 31 Mar 2020, 10:57:05 UTC
Last modified: 31 Mar 2020, 10:59:53 UTC

I've had 5 crashes in the last couple of days: a 1 hour, a 2 hour, a 3 hour, a 6 hour and a 12 hour. All zero credit, so, going by another thread, I assume this means none of them actually did anything at all. All were Rosetta tasks, not Mini.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Keith Myers
Joined: 29 Mar 20
Posts: 95
Credit: 289,903
RAC: 0
Message 92838 - Posted: 1 Apr 2020, 0:15:29 UTC

I have had 13 errors in the past two days. Only one was "cancelled by server". The others were compute errors (exit status 139), with nothing of much use in the stderr.txt output other than a:
Starting watchdog...
Watchdog active.
statement.

So what does a "watchdog" do at this project?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92848 - Posted: 1 Apr 2020, 1:23:47 UTC

"Watchdog" is the name given to the mechanism that watches over the work unit and detects if it runs for more than 4 hours longer than the runtime preference.

Unfortunately, the message "watchdog ending" is NOT an indication that the watchdog took action and ended the task. It simply reports that, as the task ends, the watchdog is ending as well. So it can be hard to tell whether the watchdog stepped in or not.
Rosetta Moderator: Mod.Sense

Keith Myers
Joined: 29 Mar 20
Posts: 95
Credit: 289,903
RAC: 0
Message 92864 - Posted: 1 Apr 2020, 7:19:36 UTC - in response to Message 92848.  
Last modified: 1 Apr 2020, 7:30:48 UTC

Very strange then. NONE of my errors ran longer than the specified 8 hours.
For example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1136159909
24,686 seconds, or 6.86 hours. Well within the stock 8-hour target CPU time.
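
Going by the description of the watchdog in this thread (it steps in once a task has run more than 4 hours past the preferred runtime), the check can be sketched in a few lines of Python. The function names and the default values are illustrative, not the project's code; the 8-hour preference is the stock target mentioned above.

```python
def hours(cpu_seconds):
    """Convert CPU seconds to hours."""
    return cpu_seconds / 3600.0


def watchdog_would_trigger(cpu_seconds, preferred_hours=8.0, grace_hours=4.0):
    """True if the run exceeded the preferred runtime by more than the grace period."""
    return hours(cpu_seconds) > preferred_hours + grace_hours


run = 24686  # CPU seconds reported for the failed task linked above
print(round(hours(run), 2), watchdog_would_trigger(run))
```

On that basis the task never got near the point where the watchdog would act, so the error has to come from somewhere else.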

©2024 University of Washington
https://www.bakerlab.org