Rosetta 4.1+ and 4.2+

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98141 - Posted: 16 Jul 2020, 20:51:42 UTC - in response to Message 98139.  

PrimeGrid tracks the workunits to a degree


What do they track?

but I don't know any project that does what you are suggesting.


I was suggesting that the Boinc client should automatically add "abort" and "project update" when you click "detach".
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,771,116
RAC: 4,876
Message 98149 - Posted: 16 Jul 2020, 23:43:36 UTC - in response to Message 98141.  

PrimeGrid tracks the workunits to a degree


What do they track?

They track whether you are actually working on a workunit, and will automatically extend the deadline if your PC is taking extra time to finish a task, as long as it is actively crunching it.

but I don't know any project that does what you are suggesting.


I was suggesting that the Boinc client should automatically add "abort" and "project update" when you click "detach".


I don't disagree, but we both know that won't ever happen as long as the current group of developers is in charge.
James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 98322 - Posted: 25 Jul 2020, 8:33:15 UTC

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0
Application: Rosetta v4.20 windows_x86_64
Device: 3710630
Task: 1226495372. WU: 1100137198
Status: Error while computing.
Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Stderr output:
(unknown error) - exit code -1073741819 (0xc0000005)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000002

Engaging BOINC Windows Runtime Debugger...
Waiting to see if wingperson has same error.

Also the same exit status for WU 1099799338 with my task 1226216078. I was wingman on this WU, and we both errored out.
Errors: Too many errors (may have bug) Too many total results
Stderr output:
(unknown error) - exit code -1073741819 (0xc0000005)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000000013FEB8316 read attempt to address 0xFFFFFFFF

Engaging BOINC Windows Runtime Debugger...

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1500
Credit: 14,762,579
RAC: 17,406
Message 98324 - Posted: 25 Jul 2020, 10:07:16 UTC - in response to Message 98322.  

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0
Application: Rosetta v4.20 windows_x86_64
Device: 3710630
Task: 1226495372. WU: 1100137198
Status: Error while computing.
Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Waiting to see if wingperson has same error.

Also the same exit status for WU 1099799338 with my task 1226216078. I was wingman on this WU, and we both errored out.
Errors: Too many errors (may have bug) Too many total results
I'm finding most are processing OK, but the odd one here & there is erroring out.
Grant
Darwin NT
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98325 - Posted: 25 Jul 2020, 11:07:37 UTC - in response to Message 98322.  

Waiting to see if wingperson has same error.


This made me laugh out loud. Do you also say "personhole cover"?
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,082,476
RAC: 6,027
Message 98328 - Posted: 25 Jul 2020, 12:54:59 UTC - in response to Message 98322.  

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0

I have seen a few of these running and taking 2.2GB (on a 4 GB Pi 4). Check whether your system has enough memory to run a couple of these and other tasks at the same time. Each time I saw these, they were in “suspended waiting for memory” status.
BOINC blog
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98329 - Posted: 25 Jul 2020, 13:35:55 UTC - in response to Message 98328.  

I have seen a few of these running and taking 2.2GB
I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2³¹ bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?)
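To make the hypothesis concrete, here is a small Python sketch (illustration only, not Rosetta code) of what happens when a byte count ends up in a signed 32-bit integer. It assumes two's-complement wrap-around, which is what typical x86-64 builds produce in practice when a larger value is truncated to 32 bits; `to_int32` is a hypothetical helper that imitates that truncation.

```python
# Illustration only: simulate storing a byte count in a signed 32-bit integer.
INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes: about 2.1 GB, i.e. 2 GiB minus one byte
MIB = 1024 * 1024


def to_int32(n: int) -> int:
    """Keep the low 32 bits of n and read them as a two's-complement signed value."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n


for mib in (2000, 2047, 2048, 2200):
    size = mib * MIB
    print(f"{mib} MiB = {size:,} bytes -> stored as {to_int32(size):,}")
```

Everything from 2048 MiB (2³¹ bytes) upwards comes out negative; whether Rosetta actually stores any size this way is exactly the open question.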
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98331 - Posted: 25 Jul 2020, 15:35:17 UTC - in response to Message 98329.  

I have seen a few of these running and taking 2.2GB
I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2³¹ bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?)

It looks like that is the case for me too. All eight of mine over 2.1 GB (on three Ryzen 3000 machines running Ubuntu 18.04.4) have failed, and none succeeded.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98332 - Posted: 25 Jul 2020, 18:53:08 UTC - in response to Message 98331.  
Last modified: 25 Jul 2020, 18:57:24 UTC

It looks like that is the case for me too. All eight of mine over 2.1 GB (on three Ryzen 3000 machines running Ubuntu 18.04.4) have failed, and none succeeded.


I don't know if it's Rosetta causing it, but that's the project running on almost all the cores of my two dual Xeon X5650 machines. A couple of times over the last couple of days, a CPU has suddenly got a lot hotter and invoked throttling. One of the CPUs dropped to 35% usage at 75°C, where it normally runs at 100% usage at 75°C. At first I thought it was a hardware fault, until the other machine did it too.

Perhaps a recent batch of Rosetta tasks uses a different instruction set which makes them run hotter? Someone did say this was possible - SETI programs that were optimised could do the same.

Or is it what you're encountering, and invalid memory accesses cause a lot of heat in old Xeons? I do have "sub optimal memory configurations" according to the BIOSes; they complain every time I boot. The RAM chips are not all the same size or the same geometry (e.g. an 8x4 and a 16x2). So perhaps it works the CPU's memory controller really hard when programs go wrong? Being dual CPU, they can also access each other's RAM when they need it, to complicate matters further.
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98333 - Posted: 25 Jul 2020, 19:51:52 UTC - in response to Message 98332.  

invalid memory accesses cause a lot of heat in old Xeons?
Unlikely: as soon as the invalid access occurs, the BOINC runtime built into the Rosetta application detects it, logs it and ends the task.

Not sure what could be causing your heat spikes. Can you correlate the times of the spikes with events in BOINC’s log?
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98334 - Posted: 25 Jul 2020, 20:24:23 UTC - in response to Message 98333.  

Not sure what could be causing your heat spikes. Can you correlate the times of the spikes with events in BOINC’s log?


They're not just quick spikes; they continue for about 4 hours. It has only happened twice on one machine and once on the other. It would be rather difficult to tell which task was to blame; it's probably a combination of many tasks. Add to that, I don't have a temperature log running, but I'll see if I can use SpeedFan to do so from now on. What was odd is that on the machine with the strongest spikes, which has done it twice, it was always one CPU that did it and not the other.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98335 - Posted: 25 Jul 2020, 20:33:00 UTC
Last modified: 25 Jul 2020, 20:38:09 UTC

Cancel that, I can't log it. Silly me: the temperature doesn't spike, because TThrottle lowers the BOINC processing speed to maintain 75°C, and TThrottle can't log its throttling level.

If you look at the worse of the two computers (https://boinc.bakerlab.org/rosetta/results.php?hostid=4360598&offset=0&show_names=0&state=4&appid= ), you will see when the tasks were taking longer wall time. But sifting through that data to work out which ones were running while it was slower sounds too much like hard work. Feel free...
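For anyone who does want to sift, the core of the job is an interval-overlap test: which tasks were running at some point during the slow window. A Python sketch, assuming each task has already been turned into a (start, end) pair, for example the end from the reported time and the start by subtracting the run time (only an approximation, since a finished task may wait a while before reporting). The task names and times below are made-up placeholders, not data from that host.

```python
from datetime import datetime

# Placeholder records (name, start, end); NOT real data from the host above.
tasks = [
    ("task_a", datetime(2020, 7, 24, 9, 0), datetime(2020, 7, 24, 17, 0)),
    ("task_b", datetime(2020, 7, 24, 16, 0), datetime(2020, 7, 25, 0, 0)),
    ("task_c", datetime(2020, 7, 25, 1, 0), datetime(2020, 7, 25, 9, 0)),
]


def overlapping(tasks, slow_start, slow_end):
    """Return names of tasks whose [start, end) interval overlaps [slow_start, slow_end)."""
    return [name for name, start, end in tasks
            if start < slow_end and end > slow_start]


# A hypothetical 4-hour throttling window
print(overlapping(tasks, datetime(2020, 7, 24, 15, 0), datetime(2020, 7, 24, 19, 0)))
```

With several throttling episodes, the tasks that show up in every window would be the first suspects.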
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98336 - Posted: 25 Jul 2020, 21:08:57 UTC - in response to Message 98335.  

Could it be something non-BOINC? With Windows 10, anything could be happening…
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,762,632
RAC: 6,840
Message 98337 - Posted: 25 Jul 2020, 21:15:38 UTC - in response to Message 98336.  

Could it be something non-BOINC? With Windows 10, anything could be happening…


Nothing else has changed, even the room temperature. And I doubt Windows 10 could cause a CPU to overheat.

Since it's intermittent, it must be either a loose connection or a dodgy transistor in the CPU power supply, etc. [1], or something that changes a lot, e.g. which program is running.

[1] Which it can't be, since both machines started doing it within a day of each other.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1500
Credit: 14,762,579
RAC: 17,406
Message 98339 - Posted: 25 Jul 2020, 21:56:38 UTC - in response to Message 98329.  
Last modified: 25 Jul 2020, 21:57:49 UTC

I have seen a few of these running and taking 2.2GB
I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2³¹ bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?)
Could be the case, although I've had Tasks using over 3 GB of RAM process with no problems.
System RAM isn't an issue for me: 32 GB with only 6 cores/12 threads. So it would take a lot of Tasks all needing over 3 GB for system RAM limits to be an issue here.

And after most Tasks processing OK and only a few failing, now it's most failing and only a few processing OK. Here are the peak memory figures for my present batch of errors, with a full dump for the last Task.

Peak working set size 2,267.54 MB
       Peak swap size 2,256.07 MB
      Peak disk usage 5.07 MB


Peak working set size 2,256.12 MB
       Peak swap size 2,244.26 MB
      Peak disk usage 6.14 MB


Peak working set size 2,282.17 MB
       Peak swap size 2,270.30 MB
      Peak disk usage 4.07 MB


Peak working set size 2,259.89 MB
       Peak swap size 2,248.08 MB
      Peak disk usage 3.78 MB


Peak working set size 2,284.63 MB
       Peak swap size 2,273.26 MB
      Peak disk usage 4.50 MB


Peak working set size 2,306.46 MB
       Peak swap size 2,295.61 MB
      Peak disk usage 4.32 MB


miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f_1002874_3_1
Peak working set size 2,258.46 MB
       Peak swap size 2,245.98 MB
      Peak disk usage 3.96 MB
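A quick check of the figures above against the 2³¹-byte boundary from Brian's hypothesis. BOINC's "MB" could mean either 2²⁰ or 10⁶ bytes, so this Python sketch tests against the larger of the two possible limits (about 2,147.48). Every failed task listed is above it for both working set and swap. That is consistent with the hypothesis but doesn't prove it, especially given the 3 GB tasks that finished fine.

```python
# Peak working set and peak swap sizes (MB) for the failed tasks listed above.
peaks_mb = [
    (2267.54, 2256.07),
    (2256.12, 2244.26),
    (2282.17, 2270.30),
    (2259.89, 2248.08),
    (2284.63, 2273.26),
    (2306.46, 2295.61),
    (2258.46, 2245.98),
]

LIMIT_IF_MIB = 2**31 / 2**20  # 2048.0 if "MB" means 2**20 bytes
LIMIT_IF_MB = 2**31 / 10**6   # about 2147.48 if "MB" means 10**6 bytes
limit = max(LIMIT_IF_MIB, LIMIT_IF_MB)

for working_set, swap in peaks_mb:
    over = working_set > limit and swap > limit
    print(f"working set {working_set:9,.2f}  swap {swap:9,.2f}  both over 2^31 bytes: {over}")
```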

Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol fr_cart_fast.xml @fr_flags_bcov2 -in:file:silent miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.zip @miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1231891
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000004 

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 07/26/20 03:44:27
Install Directory : C:\Program Files\BOINC
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore


ModLoad: 0000000091800000 00000000057ef000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded)
    Linked PDB Filename   : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb

ModLoad: 0000000012000000 00000000001f0000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.18362.815) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 10.0.18362.329 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.329

ModLoad: 0000000011d50000 00000000000b2000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.18362.900) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 10.0.18362.900 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.900

ModLoad: 000000000f3a0000 00000000002a4000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.18362.815) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 10.0.18362.900 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.900

ModLoad: 0000000011960000 000000000006f000 C:\WINDOWS\System32\WS2_32.dll (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : ws2_32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000115b0000 0000000000120000 C:\WINDOWS\System32\RPCRT4.dll (6.2.18362.628) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 00000000100b0000 0000000000194000 C:\WINDOWS\System32\USER32.dll (6.2.18362.836) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 10.0.17763.802 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17763.802

ModLoad: 000000000ef90000 0000000000021000 C:\WINDOWS\System32\win32u.dll (6.2.18362.900) (-exported- Symbols Loaded)
    Linked PDB Filename   : win32u.pdb
    File Version          : 10.0.18362.900 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.900

ModLoad: 0000000010a20000 0000000000026000 C:\WINDOWS\System32\GDI32.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 000000000f0f0000 0000000000195000 C:\WINDOWS\System32\gdi32full.dll (6.2.18362.900) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32full.pdb
    File Version          : 10.0.18362.900 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.900

ModLoad: 000000000f2e0000 000000000009e000 C:\WINDOWS\System32\msvcp_win.dll (6.2.18362.815) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.18362.815 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.815

ModLoad: 000000000fe60000 00000000000fa000 C:\WINDOWS\System32\ucrtbase.dll (6.2.18362.815) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.18362.815 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.815

ModLoad: 0000000010810000 00000000000a3000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.18362.752) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000010770000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.18362.1

ModLoad: 0000000011c90000 0000000000097000 C:\WINDOWS\System32\sechost.dll (6.2.18362.693) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 0000000010600000 000000000002e000 C:\WINDOWS\System32\IMM32.DLL (6.2.18362.387) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 10.0.18362.387 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.387

ModLoad: 000000000eed0000 0000000000011000 C:\WINDOWS\System32\kernel.appcore.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 000000000df20000 0000000000031000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntmarta.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 000000000e8a0000 000000000000c000 C:\WINDOWS\SYSTEM32\CRYPTBASE.DLL (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : cryptbase.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 000000000efc0000 0000000000080000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.18362.836) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcryptprimitives.pdb
    File Version          : 10.0.18362.836 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.836

ModLoad: 000000000a680000 00000000001f4000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1

ModLoad: 000000000b170000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.18362.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.18362.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.18362.1



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 10657, Write: 1686, Other 18154

- I/O Transfers Counters -
Read: 37024714, Write: 5529203, Other 12608

- Paged Pool Usage -
QuotaPagedPoolUsage: 318584, QuotaPeakPagedPoolUsage: 318760
QuotaNonPagedPoolUsage: 31816, QuotaPeakNonPagedPoolUsage: 34128

- Virtual Memory Usage -
VirtualSize: 721858560, PeakVirtualSize: -1365209088

- Pagefile Usage -
PagefileUsage: 721858560, PeakPagefileUsage: -1935519744

- Working Set Size -
WorkingSetSize: 728240128, PeakWorkingSetSize: -1922527232, PageFaultCount: 1394802

*** Dump of thread ID 9592 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000004 

- Registers -
rax=000000000000003a rbx=0000000053525220 rcx=0000000053f04ae0 rdx=0000000053fe4c18 rsi=000000000000000b rdi=0000000053f04ae0
r8=000000000000003a r9=0000000000000421 r10=00000000953a6e80 r11=00000000399466c0 r12=0000000091800000 r13=000000003995fa30
r14=0000000039946e00 r15=000000000048b215 rip=0000000000000004 rsp=0000000039946738 rbp=0000000000000000
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202

- Callstack -
ChildEBP RetAddr  Args to Child
39946730 91cd831c 00000002 953a6d60 953a6e80 9538be78 !+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '00000004'
39946760 91c9935d 53525220 39946800 39946f80 91c8355d rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91cd831c'
39946790 94e07f10 95cf0150 3995fa30 00000000 91c83265 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91c9935d'
399467c0 91c839e8 39947470 39946ac0 39946dc8 39946e50 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '94e07f10'
39946830 120a11cf 00000000 39946db0 39947470 39947470 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91c839e8'
39946860 1206a209 00000001 91800000 00000000 96d9a32c ntdll!__chkstk+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '120a11cf'
39946f70 1209fe3e 399473d0 ffffffff 399477c0 91ed2be9 ntdll!RtlRaiseException+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '1206a209'
39947720 91ea2938 00000000 00000001 399477c0 91c83615 ntdll!KiUserExceptionDispatcher+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '1209fe3e'
39947760 9270c536 00000001 39947870 9545b498 00000000 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91ea2938'
39947960 944fb619 39947b10 39948f10 03564c00 39947800 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '9270c536'
39947c40 944f5c06 39948400 53cbb5e0 53cbb5e0 39947d50 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '944fb619'
399489e0 944fd5f3 6daf68e0 39948ae8 00000000 00000000 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '944f5c06'
39949150 926dff1c fffffffe 00000000 697f38f8 bc664fee rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '944fd5f3'
39949220 926dbb43 6ac2f440 69766163 39949289 00000006 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926dff1c'
399492e0 926da41f 5426e980 ffffffff 697f3c58 ffffffff rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926dbb43'
399495a0 92682dc0 96c76ec0 53af9b60 72eaa550 53af9b60 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926da41f'
39949b40 926800f8 53af9b60 53af9b60 39949c50 39949c38 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '92682dc0'
39949d10 926ea4d1 39949d98 53af9b60 39949d98 39949f20 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926800f8'
39949d60 926d006d 5394e3b0 39949d98 5394e3b0 00000000 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926ea4d1'
39949df0 91c80dfd 53ec75c0 39949f20 00000000 53526801 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '926d006d'
3995fa20 91c8b215 00000000 00000000 96baccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91c80dfd'
3995fa60 11d67bd4 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '91c8b215'
3995fa90 1206ce51 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '11d67bd4'
3995fb10 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 SymFromAddr(): GetLastError = '126'  SymGetModuleInfo(): GetLastError = '126' Address = '1206ce51'

*** Dump of thread ID 32763 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 1569998848.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 

*** Dump of thread ID 30827173 (state: Unknown): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474875392.000000, Wait Time: 0.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

Grant
Darwin NT
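A side note on the dump above: the debugger's own process statistics print three peak counters (PeakVirtualSize, PeakPagefileUsage, PeakWorkingSetSize) as negative numbers. Reading their low 32 bits back as unsigned recovers plausible sizes, which suggests the dump formats these counters as signed 32-bit values. This only concerns how the debugger prints its report; it doesn't by itself show where Rosetta's crash comes from. A Python sketch:

```python
# Negative peak counters copied from the "Dump of the Process Statistics" above.
dump = {
    "PeakVirtualSize": -1365209088,
    "PeakPagefileUsage": -1935519744,
    "PeakWorkingSetSize": -1922527232,
}


def as_uint32(n: int) -> int:
    """Undo a signed 32-bit wrap: keep the low 32 bits as an unsigned value."""
    return n & 0xFFFFFFFF


for name, value in dump.items():
    actual = as_uint32(value)
    print(f"{name}: {actual:,} bytes = {actual / 2**20:,.2f} MiB")
```

This gives about 2,794, 2,250 and 2,263 MiB respectively. The last two are close to, though not identical with, the peak swap size (2,245.98 MB) and peak working set (2,258.46 MB) that BOINC listed for the same task.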
James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 98342 - Posted: 26 Jul 2020, 6:29:52 UTC - in response to Message 98328.  
Last modified: 26 Jul 2020, 6:32:44 UTC

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0

I have seen a few of these running and taking 2.2GB (on a 4 GB Pi 4). Check whether your system has enough memory to run a couple of these and other tasks at the same time. Each time I saw these, they were in “suspended waiting for memory” status.

I took a look at the memory usage on my 2 failed tasks of the above type. One was on my Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz [Family 6 Model 58 Stepping 9] with 2 hyper-threaded cores (4 logical processors) and 4 GB RAM. This PC runs 3 Rosetta tasks at a time.

The other error was on my Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz [Family 6 Model 58 Stepping 9] with 4 cores and 8 GB RAM. This PC runs 4 Rosetta tasks at a time. Each of the failed tasks used approximately 2.25 GB of memory. On both machines, tasks that validated used approximately 0.7 GB of memory.
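Following MarkJ's suggestion, the arithmetic is easy to sketch. The Python below uses the rough figures from this post (about 2.25 GB for a large task, about 0.7 GB for a normal one) and ignores the OS and other programs, so treat the totals as lower bounds. Three large tasks at once would not fit in 4 GB, but in that situation BOINC would normally hold tasks in "waiting for memory" (as MarkJ saw) rather than let them crash, so a shortage alone probably doesn't explain an access violation. `concurrent_need_gb` is a hypothetical helper for illustration.

```python
def concurrent_need_gb(large: int, normal: int,
                       large_gb: float = 2.25, normal_gb: float = 0.7) -> float:
    """Rough RAM needed if `large` big tasks and `normal` ordinary tasks run together."""
    return large * large_gb + normal * normal_gb


# (installed RAM in GB, Rosetta tasks run at once), taken from the post above
hosts = {"i3-3220": (4, 3), "i5-3470": (8, 4)}

for name, (ram_gb, slots) in hosts.items():
    one_large = concurrent_need_gb(1, slots - 1)
    all_large = concurrent_need_gb(slots, 0)
    print(f"{name}: one large task ~{one_large:.2f} GB, "
          f"all large ~{all_large:.2f} GB, of {ram_gb} GB installed")
```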
ID: 98342 · Rating: 0
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 98363 - Posted: 28 Jul 2020, 5:30:18 UTC - in response to Message 98322.  

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0
Application: Rosetta v4.20 windows_x86_64
Device: 3710630
Task: 1226495372. WU: 1100137198
Status: Error while computing.
Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
**Snip**

Engaging BOINC Windows Runtime Debugger...
Waiting to see if wingperson has same error.
Well, the wingperson DID end up getting the same error, as expected. The task did use 2.2 GB.
ID: 98363 · Rating: 0
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 98390 - Posted: 31 Jul 2020, 6:07:18 UTC

Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9eq1sg1v_1002874_7
Application: Rosetta v4.20 windows_x86_64
Device: 3710630
Task: 1229563309. WU: 1102587563
Status: Error while computing.
Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Errors: Too many errors (may have bug) Too many total results
Stderr output:
(unknown error) - exit code -1073741819 (0xc0000005)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000000

Engaging BOINC Windows Runtime Debugger...
I was wingman on this WU and we both ended with the same error. Again, it was a task using over 2 GB of RAM, though each host used a slightly different amount of memory. One question: which of these RAM usage figures is most critical?
Peak working set size: 1,851.05 MB
Peak swap size: 2,270.34 MB
FYI and thanks.
ID: 98390 · Rating: 0
Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1500
Credit: 14,762,579
RAC: 17,406
Message 98433 - Posted: 7 Aug 2020, 8:13:10 UTC
Last modified: 7 Aug 2020, 8:20:30 UTC

Looks like another bunch of dodgy Work Units.

A bunch of foldit tasks are crashing and burning within 5 seconds. At this stage it's about 50/50 between those that complete and validate and those that die early.

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)
</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @foldit0_2008492_0009_global_dock_flags -in:file:boinc_wu_zip asym_dock_foldit0_2008492_0009_data.zip -patchdock foldit0_2008492_0009_patchdock.patchdock -patchdock_random_entry 1 6136 -in:file:s foldit0_2008492_0009_patchdock.pdb -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1898631

[ ERROR ]: Caught exception:


File: ..\..\..\src\utility\options\OptionCollection.cc:1398
Option matching -boinc:score_cut_smart_throttle not found in command line top-level context


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>



And then there's one that failed validation after 35 seconds.
             Name foldit1_2001860_s005_relax_dock_SAVE_ALL_OUT_1005736_14_0
         Workunit 1107625607
 Outcome Validate error
     Client state Done
   Validate state Invalid


Stderr output
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @foldit1_2001860_s005_local_dock_flags -in:file:boinc_wu_zip asym_dock_foldit1_2001860_s005_data.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1350995
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: ..\..\..\src\protocols\rosetta_scripts\RosettaScriptsParser.cc:1313
Input rosetta scripts XML file "asym_dock_local.xml" failed to validate against the rosetta scripts schema. Use the option -parser::output_schema <output filename> to output the schema to a file to see all valid options.
Your XML has failed validation.  The error message below will tell you where in your XML file the error occurred.  Here's how to fix it:

1) If the validation fails on something obvious, like an illegal attribute due to a spelling error (perhaps you used scorefnction instead of scorefunction), then you need to fix your XML file.
2) If you haven't run the XML rewriter script and this might be pre-2017 Rosetta XML, run the rewriter script (tools/xsd_xrw/rewrite_rosetta_script.py) on your input XML first.  The attribute values not being in quotes (scorefunction=talaris2014 instead of scorefunction="talaris2014") is a good indicator that this is your problem.
3) If you are a developer and neither 1 nor 2 worked - email the developer's mailing list or try Slack.
4) If you are an academic or commercial user - try the Rosetta Forums https://www.rosettacommons.org/forum


Error messages were:
Error: AttValue: " or ' expected

1:  <dock_design>
2: 	<SCOREFXNS>
3: 	   <fullatom weights=talaris2013 symmetric=0>
4: 	   </fullatom>
5: 	</SCOREFXNS>
6: 
7: 	<FILTERS>
8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
Error: attributes construct error

1:  <dock_design>
2: 	<SCOREFXNS>
3: 	   <fullatom weights=talaris2013 symmetric=0>
4: 	   </fullatom>
5: 	</SCOREFXNS>
6: 
7: 	<FILTERS>
8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
Error: Couldn't find end of Start Tag fullatom line 3

1:  <dock_design>
2: 	<SCOREFXNS>
3: 	   <fullatom weights=talaris2013 symmetric=0>
4: 	   </fullatom>
5: 	</SCOREFXNS>
6: 
7: 	<FILTERS>
8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
Error: Opening and ending tag mismatch: SCOREFXNS line 2 and fullatom

1:  <dock_design>
2: 	<SCOREFXNS>
3: 	   <fullatom weights=talaris2013 symmetric=0>
4: 	   </fullatom>
5: 	</SCOREFXNS>
6: 
7: 	<FILTERS>
8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
9: 		<Sasa name=sasa confidence=0/>
Error: Opening and ending tag mismatch: dock_design line 1 and SCOREFXNS

 1:  <dock_design>
 2: 	<SCOREFXNS>
 3: 	   <fullatom weights=talaris2013 symmetric=0>
 4: 	   </fullatom>
 5: 	</SCOREFXNS>
 6: 
 7: 	<FILTERS>
 8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
 9: 		<Sasa name=sasa confidence=0/>
10: 		<ShapeComplementarity name=shape verbose=1  confidence=0 jump=1/>
Error: Extra content at the end of the document

 2: 	<SCOREFXNS>
 3: 	   <fullatom weights=talaris2013 symmetric=0>
 4: 	   </fullatom>
 5: 	</SCOREFXNS>
 6: 
 7: 	<FILTERS>
 8: 		<Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/>
 9: 		<Sasa name=sasa confidence=0/>
10: 		<ShapeComplementarity name=shape verbose=1  confidence=0 jump=1/>
11: 	</FILTERS>
12: 
------------------------------------------------------------
Warning messages were:
------------------------------------------------------------

 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.


DummyMover::apply() should never have been called! (JobDistributor/Parser should have replaced DummyMover.)

ERROR: Function not implemented.
ERROR:: Exit from: ..\..\..\src\apps\public\boinc\minirosetta.cc line: 101
14:28:43 (1848): called boinc_finish(0)

</stderr_txt>
]]>
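The parser output above points to unquoted attribute values (weights=talaris2013, symmetric=0) in asym_dock_local.xml, which matches hint 2 in the log. The sketch below shows that a generic XML parser rejects the same fragment until the values are quoted. It uses Python's standard library parser, not Rosetta's own schema validation, so it only illustrates the well-formedness part of the failure.

```python
import xml.etree.ElementTree as ET

# Pre-2017 style: attribute values without quotes, as in the failing work unit's XML.
legacy = "<SCOREFXNS><fullatom weights=talaris2013 symmetric=0></fullatom></SCOREFXNS>"
# The same element with quoted attribute values, as the log's hint suggests.
quoted = '<SCOREFXNS><fullatom weights="talaris2013" symmetric="0"></fullatom></SCOREFXNS>'


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(legacy))  # False
print(parses(quoted))  # True
```

Because the file ships inside the work unit's data zip, this is something only the project can fix. Crunchers can only abort or wait for the bad batch to clear.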

Grant
Darwin NT
ID: 98433 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98435 - Posted: 7 Aug 2020, 13:07:20 UTC - in response to Message 98433.  

Looks like another bunch of dodgy Work Units.

Yes, I have a bunch of them: almost 10%. But the project has started to abort the latest ones, so I think they have caught the problem.
ID: 98435 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org