Message boards : Number crunching : Rosetta 4.1+ and 4.2+
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
PrimeGrid tracks the workunits to a degree What do they track? but I don't know any project that does what you are suggesting. I was suggesting that the Boinc client should automatically add "abort" and "project update" when you click "detach". |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
PrimeGrid tracks the workunits to a degree I don't disagree but we both know that won't ever happen as long as the current group of Developers are in charge. |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0 Application: Rosetta v4.20 windows_x86_64 Device: 3710630 Task: 1226495372. WU: 1100137198 Status: Error while computing. Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION Stderr output: (unknown error) - exit code -1073741819 (0xc0000005) Waiting to see if wingperson has same error. Also same exit status for WU 1099799338 with my task 1226216078. I was wingman on this WU, and we both errored out. Errors: Too many errors (may have bug) Too many total results Stderr output: (unknown error) - exit code -1073741819 (0xc0000005) |
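For anyone wondering how the two forms of the exit status above relate: the negative decimal and the hex value are the same 32 bits, read once as a signed number and once as an unsigned one. A minimal Python sketch of the conversion (the helper name `exit_status_to_hex` is made up for illustration and is not part of BOINC):

```python
def exit_status_to_hex(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns a negative
    # two's-complement value into its unsigned equivalent.
    return f"0x{status & 0xFFFFFFFF:08X}"


print(exit_status_to_hex(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

The same masking works for any other negative exit status that BOINC reports in decimal.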
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0 I'm finding most are processing OK, but the odd one here & there is erroring out. Grant Darwin NT |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Waiting to see if wingperson has same error. This made me laugh out loud. Do you also say "personhole cover"? |
MarkJ Send message Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0 I have seen a few of these running and taking 2.2GB (on a Pi4 4GB). Check if your system has enough memory to run a couple of these and other tasks at the same time. Each time I saw these they were in “suspended waiting for memory” status. BOINC blog |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
I have seen a few of these running and taking 2.2GB I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2^31 bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?) |
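To make the overflow idea in the post above concrete, here is a small Python sketch of what happens to a byte count once it passes 2^31 and is stored in a signed 32-bit integer. This only illustrates the hypothesis; nothing in the thread confirms that Rosetta actually does this, and the helper `as_int32` is a hypothetical name.

```python
import ctypes

INT32_LIMIT = 2**31  # 2,147,483,648 bytes, roughly 2.1 GB


def as_int32(n: int) -> int:
    """Return n as it would read after being stored in a signed 32-bit integer."""
    # ctypes integer types do no overflow checking, so the value simply wraps.
    return ctypes.c_int32(n).value


below = 2000 * 1024**2  # 2,000 MB, under the limit
above = 2200 * 1024**2  # 2,200 MB, over the limit

print(as_int32(below))  # 2097152000, unchanged
print(as_int32(above))  # -1988100096, wrapped to a negative number
```

A negative size or offset used to index memory would be one plausible route to an access violation, but that part remains a guess.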
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I have seen a few of these running and taking 2.2GB I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2^31 bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?) It looks like that is the case for me too. All eight of mine over 2.1 GB (on three Ryzen 3000 machines running Ubuntu 18.04.4) have failed, and none succeeded. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
It looks like that is the case for me too. All eight of mine over 2.1 GB (on three Ryzen 3000 machines running Ubuntu 18.04.4) have failed, and none succeeded. I don't know if it's Rosetta causing it, but that's the project running on almost all the cores of my two dual Xeon X5650 machines. A couple of times over the last couple of days, a CPU has suddenly got a lot hotter, and invoked throttling. One of the CPUs went to 35% CPU usage at 75C, where it normally gets 100% CPU usage at 75C. I at first thought it was a hardware fault, until the other machine did it too. Perhaps a recent batch of Rosetta tasks uses a different instruction set which makes them run hotter? Someone did say this was possible - SETI programs that were optimised could do the same. Or is it what you're encountering, and invalid memory accesses cause a lot of heat in old Xeons? I do have "sub optimal memory configurations" according to the BIOSes, they complain every time I boot. The RAM chips are not all the same size and not the same geometry (e.g. an 8x4 and a 16x2). So perhaps it works the CPU's memory controller really hard when programs go wrong? Being dual CPU, they can also access each other's RAM when they need it, to complicate matters further. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
invalid memory accesses cause a lot of heat in old Xeons? Unlikely: as soon as the invalid access occurs the BOINC wrapper around the Rosetta application detects it, logs it and ends the task. Not sure what could be causing your heat spikes. Can you correlate the times of the spikes with events in BOINC’s log? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Not sure what could be causing your heat spikes. Can you correlate the times of the spikes with events in BOINC’s log? They're not just quick spikes, they continue for about 4 hours. Only happened twice for one machine and once for the other. It would be rather difficult to tell which task was to blame. It's probably a combination of many tasks. Add to that I don't have a temperature log running, but I'll see if I can use Speedfan to do so from now on. What was weird is that on the machine with the strongest spikes, which has done it twice, it was always one CPU that did it and not the other. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Cancel that, I can't log it. Silly me, the temperature doesn't spike, as Tthrottle lowers the Boinc processing speed to maintain 75C. Tthrottle doesn't have the ability to log its throttling level. If you look at the worst of the two computers here, https://boinc.bakerlab.org/rosetta/results.php?hostid=4360598&offset=0&show_names=0&state=4&appid= you will see when the tasks were taking longer wall time. But sifting through that data to work out which ones were running at the time it was slower sounds too much like hard work. Feel free.... |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Could it be something non-BOINC? With Windows 10, anything could be happening… |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Could it be something non-BOINC? With Windows 10, anything could be happening… Nothing else has changed, even the room temperature. And I doubt Windows 10 could cause a CPU to overheat. Since it's intermittent, it must be either a loose connection or a dodgy transistor in the CPU power supply etc [1], or things that change a lot - eg. what program is running. [1] Which it can't be since two machines started it within a day of each other. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
I have seen a few of these running and taking 2.2GB I wonder whether that’s the common theme: the ones that are failing are going over 2.1 GB (= 2^31 bytes)? (Calculation involving a signed 32-bit integer overflowing, yielding a negative number instead of a large positive one, resulting in an invalid memory access?) Could be the case, although I've had Tasks using over 3GB of RAM process with no problems. System RAM isn't an issue for me: 32GB with only 6c/12t. So it would take a lot of Tasks all needing over 3GB for system RAM limits to be an issue here. And after having most Tasks process OK and only a few fail, now most are failing and only a few are processing OK. Here are the peak RAM figures for my present batch of errors, with a full dump for the last Task. Peak working set size 2,267.54 MB Peak swap size 2,256.07 MB Peak disk usage 5.07 MB Peak working set size 2,256.12 MB Peak swap size 2,244.26 MB Peak disk usage 6.14 MB Peak working set size 2,282.17 MB Peak swap size 2,270.30 MB Peak disk usage 4.07 MB Peak working set size 2,259.89 MB Peak swap size 2,248.08 MB Peak disk usage 3.78 MB Peak working set size 2,284.63 MB Peak swap size 2,273.26 MB Peak disk usage 4.50 MB Peak working set size 2,306.46 MB Peak swap size 2,295.61 MB Peak disk usage 4.32 MB miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f_1002874_3_1 Peak working set size 2,258.46 MB Peak swap size 2,245.98 MB Peak disk usage 3.96 MB Stderr output <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741819 (0xc0000005) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol fr_cart_fast.xml @fr_flags_bcov2 -in:file:silent miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out
-in:file:boinc_wu_zip miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.zip @miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9of5bj9f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1231891 Using database: database_357d5d93529_n_methylminirosetta_database Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0000000000000004 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 07/26/20 03:44:27 Install Directory : C:Program FilesBOINC Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots ;C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore ModLoad: 0000000091800000 00000000057ef000 C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettarosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded) Linked PDB Filename : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb ModLoad: 0000000012000000 00000000001f0000 C:WINDOWSSYSTEM32ntdll.dll (6.2.18362.815) 
(-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 0000000011d50000 00000000000b2000 C:WINDOWSSystem32KERNEL32.DLL (6.2.18362.900) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 10.0.18362.900 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.900 ModLoad: 000000000f3a0000 00000000002a4000 C:WINDOWSSystem32KERNELBASE.dll (6.2.18362.815) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 10.0.18362.900 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.900 ModLoad: 0000000011960000 000000000006f000 C:WINDOWSSystem32WS2_32.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000115b0000 0000000000120000 C:WINDOWSSystem32RPCRT4.dll (6.2.18362.628) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000100b0000 0000000000194000 C:WINDOWSSystem32USER32.dll (6.2.18362.836) (-exported- Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 10.0.17763.802 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.17763.802 ModLoad: 000000000ef90000 0000000000021000 C:WINDOWSSystem32win32u.dll (6.2.18362.900) (-exported- Symbols Loaded) Linked PDB Filename : 
win32u.pdb File Version : 10.0.18362.900 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.900 ModLoad: 0000000010a20000 0000000000026000 C:WINDOWSSystem32GDI32.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 000000000f0f0000 0000000000195000 C:WINDOWSSystem32gdi32full.dll (6.2.18362.900) (-exported- Symbols Loaded) Linked PDB Filename : gdi32full.pdb File Version : 10.0.18362.900 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.900 ModLoad: 000000000f2e0000 000000000009e000 C:WINDOWSSystem32msvcp_win.dll (6.2.18362.815) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.18362.815 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.815 ModLoad: 000000000fe60000 00000000000fa000 C:WINDOWSSystem32ucrtbase.dll (6.2.18362.815) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.18362.815 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.815 ModLoad: 0000000010810000 00000000000a3000 C:WINDOWSSystem32ADVAPI32.dll (6.2.18362.752) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000010770000 000000000009e000 C:WINDOWSSystem32msvcrt.dll (7.0.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.18362.1 
(WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.18362.1 ModLoad: 0000000011c90000 0000000000097000 C:WINDOWSSystem32sechost.dll (6.2.18362.693) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 0000000010600000 000000000002e000 C:WINDOWSSystem32IMM32.DLL (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 10.0.18362.387 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 000000000eed0000 0000000000011000 C:WINDOWSSystem32kernel.appcore.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 000000000df20000 0000000000031000 C:WINDOWSSYSTEM32ntmarta.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 000000000e8a0000 000000000000c000 C:WINDOWSSYSTEM32CRYPTBASE.DLL (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : cryptbase.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 000000000efc0000 0000000000080000 C:WINDOWSSystem32bcryptPrimitives.dll (6.2.18362.836) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.18362.836 (WinBuild.160101.0800) Company Name : 
Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.836 ModLoad: 000000000a680000 00000000001f4000 C:WINDOWSSYSTEM32dbghelp.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 000000000b170000 000000000000a000 C:WINDOWSSYSTEM32version.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 10657, Write: 1686, Other 18154 - I/O Transfers Counters - Read: 37024714, Write: 5529203, Other 12608 - Paged Pool Usage - QuotaPagedPoolUsage: 318584, QuotaPeakPagedPoolUsage: 318760 QuotaNonPagedPoolUsage: 31816, QuotaPeakNonPagedPoolUsage: 34128 - Virtual Memory Usage - VirtualSize: 721858560, PeakVirtualSize: -1365209088 - Pagefile Usage - PagefileUsage: 721858560, PeakPagefileUsage: -1935519744 - Working Set Size - WorkingSetSize: 728240128, PeakWorkingSetSize: -1922527232, PageFaultCount: 1394802 *** Dump of thread ID 9592 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0000000000000004 - Registers - rax=000000000000003a rbx=0000000053525220 rcx=0000000053f04ae0 rdx=0000000053fe4c18 rsi=000000000000000b rdi=0000000053f04ae0 r8=000000000000003a r9=0000000000000421 r10=00000000953a6e80 r11=00000000399466c0 r12=0000000091800000 r13=000000003995fa30 r14=0000000039946e00 r15=000000000048b215 rip=0000000000000004 rsp=0000000039946738 rbp=0000000000000000 cs=0033 ss=002b 
ds=002b es=002b fs=0053 gs=002b efl=00010202 - Callstack - ChildEBP RetAddr Args to Child 39946730 91cd831c 00000002 953a6d60 953a6e80 9538be78 !+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '00000004' 39946760 91c9935d 53525220 39946800 39946f80 91c8355d rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91cd831c' 39946790 94e07f10 95cf0150 3995fa30 00000000 91c83265 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91c9935d' 399467c0 91c839e8 39947470 39946ac0 39946dc8 39946e50 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '94e07f10' 39946830 120a11cf 00000000 39946db0 39947470 39947470 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91c839e8' 39946860 1206a209 00000001 91800000 00000000 96d9a32c ntdll!__chkstk+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '120a11cf' 39946f70 1209fe3e 399473d0 ffffffff 399477c0 91ed2be9 ntdll!RtlRaiseException+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1206a209' 39947720 91ea2938 00000000 00000001 399477c0 91c83615 ntdll!KiUserExceptionDispatcher+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1209fe3e' 39947760 9270c536 00000001 39947870 9545b498 00000000 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91ea2938' 39947960 944fb619 39947b10 39948f10 03564c00 39947800 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' 
Address = '9270c536' 39947c40 944f5c06 39948400 53cbb5e0 53cbb5e0 39947d50 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '944fb619' 399489e0 944fd5f3 6daf68e0 39948ae8 00000000 00000000 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '944f5c06' 39949150 926dff1c fffffffe 00000000 697f38f8 bc664fee rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '944fd5f3' 39949220 926dbb43 6ac2f440 69766163 39949289 00000006 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '926dff1c' 399492e0 926da41f 5426e980 ffffffff 697f3c58 ffffffff rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '926dbb43' 399495a0 92682dc0 96c76ec0 53af9b60 72eaa550 53af9b60 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '926da41f' 39949b40 926800f8 53af9b60 53af9b60 39949c50 39949c38 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '92682dc0' 39949d10 926ea4d1 39949d98 53af9b60 39949d98 39949f20 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '926800f8' 39949d60 926d006d 5394e3b0 39949d98 5394e3b0 00000000 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '926ea4d1' 39949df0 91c80dfd 53ec75c0 39949f20 00000000 53526801 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0 SymFromAddr(): GetLastError = '126' 
SymGetModuleInfo(): GetLastError = '126' Address = '926d006d' 3995fa20 91c8b215 00000000 00000000 96baccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91c80dfd' 3995fa60 11d67bd4 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '91c8b215' 3995fa90 1206ce51 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '11d67bd4' 3995fb10 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 SymFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1206ce51' *** Dump of thread ID 32763 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 1569998848.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Dump of thread ID 30827173 (state: Unknown): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474875392.000000, Wait Time: 0.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Grant Darwin NT |
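A side note on the dump above: the Process Statistics section lists PeakVirtualSize, PeakPagefileUsage and PeakWorkingSetSize as negative numbers. One reading, and it is only an assumption, is that the debugger printed those counters through a signed 32-bit field, so anything past 2^31 bytes wrapped. Under that assumption the real values can be recovered by adding 2^32, as in this Python sketch (`unwrap_int32` is a made-up helper name):

```python
def unwrap_int32(value: int) -> int:
    """Undo a signed 32-bit wrap, assuming the true value was below 2**32."""
    return value + 2**32 if value < 0 else value


# Raw figures copied from the Process Statistics section of the dump.
peaks = {
    "PeakVirtualSize": -1365209088,
    "PeakPagefileUsage": -1935519744,
    "PeakWorkingSetSize": -1922527232,
}

for name, raw in peaks.items():
    fixed = unwrap_int32(raw)
    print(f"{name}: {fixed} bytes = {fixed / 2**20:,.2f} MB")
```

The recovered working set and pagefile peaks land within a few MB of the 2,258.46 MB and 2,245.98 MB that BOINC reported for the same Task, which fits the wrap reading. It says nothing by itself about what caused the access violation, since this is only how the numbers were printed.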
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0 I took a look at the memory usage on my 2 failed tasks of the above type. One was on my Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz [Family 6 Model 58 Stepping 9] with 2 cores hyper-threaded (acting as 4 processors), with 4 GB RAM. This PC runs 3 Rosetta tasks at a time. The other error was on my Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz [Family 6 Model 58 Stepping 9] with 4 cores, with 8 GB RAM. This PC is running 4 Rosetta tasks at a time. Each of the failed tasks was using approximately 2.25 GB of memory. On both machines, tasks that validated used approximately 0.7 GB of memory. |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_1su3dx8n_1002874_10_0 Waiting to see if wingperson has same error. Well, the wingperson DID end up getting the same error, as expected. Task did use 2.2 GB. |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Name: miniprotein_relax5_SAVE_ALL_OUT_IGNORE_THE_REST_9eq1sg1v_1002874_7 Application: Rosetta v4.20 windows_x86_64 Device: 3710630 Task: 1229563309. WU: 1102587563 Status: Error while computing. Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION Errors: Too many errors (may have bug) Too many total results Stderr output: (unknown error) - exit code -1073741819 (0xc0000005) I was wingman on this WU and we both ended with the same error. Again, it was a task using over 2 GB of RAM, though each host used slightly different amounts of memory. One question - which one of these RAM usage figures is most critical? Peak working set size: 1,851.05 MB FYI and thanks. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Looks like another bunch of dodgy Work Units. A bunch of foldit's crashing and burning within 5 seconds (it's about 50/50 at this stage between those that complete & Validate & those that die early). <core_client_version>7.6.33</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @foldit0_2008492_0009_global_dock_flags -in:file:boinc_wu_zip asym_dock_foldit0_2008492_0009_data.zip -patchdock foldit0_2008492_0009_patchdock.patchdock -patchdock_random_entry 1 6136 -in:file:s foldit0_2008492_0009_patchdock.pdb -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1898631 [ ERROR ]: Caught exception: File: ......srcutilityoptionsOptionCollection.cc:1398 Option matching -boinc:score_cut_smart_throttle not found in command line top-level context AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> And then there is one that failed to Validate after 35 sec. 
Name foldit1_2001860_s005_relax_dock_SAVE_ALL_OUT_1005736_14_0 Workunit 1107625607 Outcome Validate error Client state Done Validate state Invalid Stderr output <core_client_version>7.6.33</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @foldit1_2001860_s005_local_dock_flags -in:file:boinc_wu_zip asym_dock_foldit1_2001860_s005_data.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1350995 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: ......srcprotocolsrosetta_scriptsRosettaScriptsParser.cc:1313 Input rosetta scripts XML file "asym_dock_local.xml" failed to validate against the rosetta scripts schema. Use the option -parser::output_schema <output filename> to output the schema to a file to see all valid options. Your XML has failed validation. The error message below will tell you where in your XML file the error occurred. Here's how to fix it: 1) If the validation fails on something obvious, like an illegal attribute due to a spelling error (perhaps you used scorefnction instead of scorefunction), then you need to fix your XML file. 2) If you haven’t run the XML rewriter script and this might be pre-2017 Rosetta XML, run the rewriter script (tools/xsd_xrw/rewrite_rosetta_script.py) on your input XML first. The attribute values not being in quotes (scorefunction=talaris2014 instead of scorefunction="talaris2014") is a good indicator that this is your problem. 3) If you are a developer and neither 1 nor 2 worked - email the developer’s mailing list or try Slack. 
4) If you are an academic or commercial user - try the Rosetta Forums https://www.rosettacommons.org/forum Error messages were: Error: AttValue: " or ' expected 1: <dock_design> 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> Error: attributes construct error 1: <dock_design> 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> Error: Couldn't find end of Start Tag fullatom line 3 1: <dock_design> 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> Error: Opening and ending tag mismatch: SCOREFXNS line 2 and fullatom 1: <dock_design> 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> 9: <Sasa name=sasa confidence=0/> Error: Opening and ending tag mismatch: dock_design line 1 and SCOREFXNS 1: <dock_design> 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> 9: <Sasa name=sasa confidence=0/> 10: <ShapeComplementarity name=shape verbose=1 confidence=0 jump=1/> Error: Extra content at the end of the document 2: <SCOREFXNS> 3: <fullatom weights=talaris2013 symmetric=0> 4: </fullatom> 5: </SCOREFXNS> 6: 7: <FILTERS> 8: <Ddg name=Isc scorefxn=fullatom threshold=0 jump=1 repeats=1 repack=0 confidence=1/> 9: <Sasa name=sasa confidence=0/> 10: <ShapeComplementarity name=shape verbose=1 confidence=0 jump=1/> 11: </FILTERS> 12: 
------------------------------------------------------------ Warning messages were: ------------------------------------------------------------ ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. DummyMover::apply() should never have been called! (JobDistributor/Parser should have replaced DummyMover.) ERROR: Function not implemented. ERROR:: Exit from: ......srcappspublicboincminirosetta.cc line: 101 14:28:43 (1848): called boinc_finish(0) </stderr_txt> ]]> Grant Darwin NT |
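The validation errors in the log above all trace back to unquoted attribute values (`weights=talaris2013` rather than `weights="talaris2013"`), which is not well-formed XML. A quick way to see the difference is to feed both forms to a parser. This sketch uses Python's standard library parser (expat) purely as a stand-in; the parser inside Rosetta is a different one, so the exact error text will differ, and `is_well_formed` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Fragment in the style of the failing asym_dock_local.xml: values not quoted.
unquoted = "<SCOREFXNS><fullatom weights=talaris2013 symmetric=0></fullatom></SCOREFXNS>"

# The same fragment with every attribute value in quotes.
quoted = '<SCOREFXNS><fullatom weights="talaris2013" symmetric="0"></fullatom></SCOREFXNS>'


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(unquoted))  # False
print(is_well_formed(quoted))    # True
```

This matches point 2 of the error text, which names unquoted attribute values as a sign of pre-2017 Rosetta XML that needs the rewriter script.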
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Looks like another bunch of dodgy Work Units. Yes, I have a bunch of them - almost 10%. But they have started to abort the latest ones, so I think they have caught the problem. |
©2024 University of Washington
https://www.bakerlab.org