Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1729 Credit: 18,490,957 RAC: 20,862 |
"In the last couple days I've noticed a number of failed tasks, they all start with 3cl in the name, here is an example." Try limiting the number of cores you use to process work (or add more system RAM); it's a memory issue. I used to get the same errors that you are getting when I was running all 6c/12t with only 16GB of RAM; since I upgraded my system to 32GB of RAM I've had no such errors. You generally need to allow for 1.3GB of RAM per running Task. Many Tasks use a lot less, and quite a few Tasks use a hell of a lot more. If a Task requires more RAM, it should gracefully suspend until there's enough RAM to continue, but that isn't always the case. Edit: having said that, I just had one of those WUs do the same thing on my system, yet it was processed OK on another system, even though I've processed several others of the same type with no problems. 3cl_7aa_6lu7_modified_AVLstub_relaxed_renumbered_0074_110_extract_B_SAVE_ALL_OUT_927956_74_0 [pre]<core_client_version>7.6.33</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741819 (0xc0000005) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @3cl_7aa_6lu7_modified_AVLstub_relaxed_renumbered_0074_110_extract_B.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2443625 Using database: database_357d5d93529_n_methylminirosetta_database Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00007FF63B7D1D48 Engaging BOINC Windows Runtime Debugger... 
******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 05/12/20 22:11:33 Install Directory : C:Program FilesBOINC Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots5;C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore ModLoad: 0000000037c20000 00000000057ef000 C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettarosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded) Linked PDB Filename : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb ModLoad: 00000000f3140000 00000000001f0000 C:WINDOWSSYSTEM32ntdll.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 00000000f1500000 00000000000b2000 C:WINDOWSSystem32KERNEL32.DLL (6.2.18362.329) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 00000000f06e0000 00000000002a3000 C:WINDOWSSystem32KERNELBASE.dll 
(6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 10.0.18362.329 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.329 ModLoad: 00000000f18a0000 000000000006f000 C:WINDOWSSystem32WS2_32.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f1710000 0000000000120000 C:WINDOWSSystem32RPCRT4.dll (6.2.18362.628) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f1b20000 0000000000194000 C:WINDOWSSystem32USER32.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 10.0.17134.343 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.17134.343 ModLoad: 00000000f00d0000 0000000000021000 C:WINDOWSSystem32win32u.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : win32u.pdb File Version : 10.0.18362.719 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.719 ModLoad: 00000000f28e0000 0000000000026000 C:WINDOWSSystem32GDI32.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f0100000 0000000000194000 C:WINDOWSSystem32gdi32full.dll (6.2.18362.719) (-exported- Symbols Loaded) Linked PDB Filename : 
gdi32full.pdb File Version : 10.0.18362.719 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.719 ModLoad: 00000000f04f0000 000000000009e000 C:WINDOWSSystem32msvcp_win.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.18362.387 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 00000000f05e0000 00000000000fa000 C:WINDOWSSystem32ucrtbase.dll (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.18362.387 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 00000000f15c0000 00000000000a3000 C:WINDOWSSystem32ADVAPI32.dll (6.2.18362.329) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f1250000 000000000009e000 C:WINDOWSSystem32msvcrt.dll (7.0.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.18362.1 ModLoad: 00000000f1670000 0000000000097000 C:WINDOWSSystem32sechost.dll (6.2.18362.693) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f2440000 000000000002e000 C:WINDOWSSystem32IMM32.DLL (6.2.18362.387) (-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 10.0.18362.387 
(WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.387 ModLoad: 00000000f00b0000 0000000000011000 C:WINDOWSSystem32kernel.appcore.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000ef060000 0000000000031000 C:WINDOWSSYSTEM32ntmarta.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000e1640000 00000000001f4000 C:WINDOWSSYSTEM32dbghelp.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 ModLoad: 00000000f1110000 0000000000080000 C:WINDOWSSystem32bcryptPrimitives.dll (6.2.18362.295) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.18362.295 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.295 ModLoad: 00000000eafb0000 000000000000a000 C:WINDOWSSYSTEM32version.dll (6.2.18362.1) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.18362.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.18362.1 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 5000, Write: 662, Other 13721 - I/O Transfers Counters - Read: 14491204, Write: 9881, Other 6728 - Paged Pool Usage - QuotaPagedPoolUsage: 317448, 
QuotaPeakPagedPoolUsage: 317752 QuotaNonPagedPoolUsage: 6792, QuotaPeakNonPagedPoolUsage: 7352 - Virtual Memory Usage - VirtualSize: 83140608, PeakVirtualSize: 895655936 - Pagefile Usage - PagefileUsage: 83140608, PeakPagefileUsage: 83140608 - Working Set Size - WorkingSetSize: 103694336, PeakWorkingSetSize: 103698432, PageFaultCount: 25722 *** Dump of thread ID 9108 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00007FF63B7D1D48 - Registers - rax=000000000000003a rbx=0000000060d95750 rcx=00000000617a2ac0 rdx=0000000061882bf8 rsi=000000000000000b rdi=00000000617a2ac0 r8=000000000000003a r9=0000000000000421 r10=000000003b7c6e80 r11=00000000b6545480 r12=0000000037c20000 r13=00000000b655fba0 r14=00000000b6545bc0 r15=000000000048b215 rip=000000003b7d1d48 rsp=00000000b65454f8 rbp=0000000000000000 cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202 - Callstack - ChildEBP RetAddr Args to Child b65454f0 380f831c 00000000 3b7c6d60 3b7c6e80 3b7abe78 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 b6545520 380b935d 60d95750 b65455c0 b6545d40 380a355d rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 b6545550 3b227f10 3c110150 b655fba0 00000000 380a3265 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 b6545580 380a39e8 b6546230 e0000000 b6545b88 b6545c10 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 b65455f0 f31e121f 00000000 b6545b70 b6546230 b6546230 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 b6545620 f31aa289 00000001 37c20000 00000000 3d1ba32c ntdll!__chkstk+0x0 b6545d30 f31dfe8e 00000030 b6545e09 3b89a450 f317ba17 ntdll!RtlRaiseException+0x0 b65464c0 38363e2b fffffffe 65103d58 ffffffff 383718c5 ntdll!KiUserExceptionDispatcher+0x0 b6546510 38373690 3b89a3a0 65103ab0 3b89a3a0 b6546609 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 b6546640 
38489ee8 64a9d698 65479d00 65103ab0 65479d00 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 b65471f0 38424b6c 6556ac10 f317ba17 60cd0000 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b65473f0 3842488e b65474d8 00000000 b65476c0 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6547550 38383da1 b65476c8 00000000 60d95410 b6547790 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6547910 38389f08 b6547c60 b6547c60 b6547c60 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6547f60 383884db 613cac90 b6547fc0 613bb580 613bb580 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b65480c0 382f1fb7 00000000 b65481d0 613bb580 b65483d0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6548230 382f57a6 00000005 38095190 612dfb10 612dfb10 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 b65482a0 382f56cc b65485a8 b6548419 b65485a8 613bb580 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 b6548350 383bb6f5 b65485a8 b6548841 00000000 380b75e8 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 b6548470 383ba592 00000005 b65485a8 b6548780 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6548540 383bad06 00000000 00000000 b6548e60 60cd0000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b65486e0 388171a3 b6548780 b6548e60 ffffff01 380a3e73 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b65489d0 38819d09 00000000 00000001 b6548ae0 b6548e60 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6548d60 38812f8a b6548da0 b6548e60 6459df80 613720c0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6548dc0 38a2cc70 b6548e60 b6549588 612dfb10 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6549550 38a2c6e4 654196d0 654755c0 3d095cc0 380975a6 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b65495b0 38a3603e 
b65496a0 65419400 b65496c0 b6549e10 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6549d30 38a356d4 911ecceb 911ecbfb 3d007f70 38a56cb4 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6549dc0 38a3578e 00000005 b654a368 613720c0 00000001 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b6549f60 380a081d 61673820 61673820 613720c0 60d96d01 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 b655fb90 380ab215 00000000 00000000 3cfcccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 b655fbd0 f1517bd4 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 b655fc00 f31aced1 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 b655fc80 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 *** Dump of thread ID 32764 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 2744734720.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Dump of thread ID 30812250 (state: Unknown): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474836480.000000, Wait Time: 0.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]>[pre] Grant Darwin NT |
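The log above shows the same exit code twice, once as a negative decimal (-1073741819) and once as hex (0xc0000005). As a minimal sketch of how the two forms relate, the Python below reinterprets a signed 32-bit exit code as unsigned hex and back. The function names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned_hex(exit_code: int, bits: int = 32) -> str:
    """Reinterpret a signed exit code as an unsigned value and format it as hex."""
    mask = (1 << bits) - 1          # 0xFFFFFFFF for 32 bits
    return f"0x{exit_code & mask:08X}"


def to_signed(value: int, bits: int = 32) -> int:
    """Reinterpret an unsigned value as a signed two's-complement integer."""
    if value >= 1 << (bits - 1):    # top bit set means the value is negative
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    # The Windows access-violation code from the log above.
    print(to_unsigned_hex(-1073741819))   # 0xC0000005
    print(to_signed(0xC0000005))          # -1073741819
```

Looking up the hex form is usually easier than searching for the negative decimal, since 0xC0000005 is the value the log itself labels "Access Violation".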
Sven Send message Joined: 7 Feb 16 Posts: 8 Credit: 222,005 RAC: 0 |
Hi all, here is one of the failed tasks with its log file. I don't know where to get the minirosetta database from if BOINC doesn't download it itself. <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> Unzulässige Funktion. [Incorrect function.] (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.21_windows_intelx86.exe -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_groove_design_boinc_v1_mod.xml @flags_covid_groove2 -in:file:silent Mini_Protein_binds_COVID-19_groove_design1_8_SAVE_ALL_OUT_IGNORE_THE_REST_4gv6ne9i.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip Mini_Protein_binds_COVID-19_groove_design1_8_SAVE_ALL_OUT_IGNORE_THE_REST_4gv6ne9i.zip @Mini_Protein_binds_COVID-19_groove_design1_8_SAVE_ALL_OUT_IGNORE_THE_REST_4gv6ne9i.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3074316 Extracting in slot directory: minirosetta_database.zip unzip: cannot find either ../../projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl.zip or ../../projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl.zip.zip. Using database: minirosetta_database Cannot find database: minirosetta_database </stderr_txt> ]]> |
Sven Send message Joined: 7 Feb 16 Posts: 8 Credit: 222,005 RAC: 0 |
I guess it would be worth trying to install a Linux distribution alongside Windows on this machine. Adjusting the antivirus settings didn't help at all, and running on Windows XP seems to be generally problematic. My lack of memory would still be unsolved, though. |
MeeeK Send message Joined: 7 Feb 16 Posts: 31 Credit: 19,737,304 RAC: 0 |
I also have lots of failed WUs: 53 just yesterday and today! Lots of them wasted CPU time. What the hell is wrong again? All of them have the error code "139 (0x0000008B) Unknown error code", and all of them failed on my system with a Ryzen 5 3600X and 32GB of RAM @ 3200MHz (standard). The other system (R5 3600, without the X, with 32GB @ 3000MHz (standard)) is running fine. Neither system's CPU is overclocked, and both have XMP turned on. No Gameboost or anything else, fans at 100%, temperatures are fine. Both are watercooled in one loop with two 280mm radiators. Some examples: https://boinc.bakerlab.org/rosetta/result.php?resultid=1178373908 https://boinc.bakerlab.org/rosetta/result.php?resultid=1178374184 https://boinc.bakerlab.org/rosetta/result.php?resultid=1178329397 https://boinc.bakerlab.org/rosetta/result.php?resultid=1178275970 https://boinc.bakerlab.org/rosetta/result.php?resultid=1178221426 https://boinc.bakerlab.org/rosetta/result.php?resultid=1178221294 Does anybody have any ideas? It's frustrating to see the wasted time with no result and no points. |
funkydude Send message Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0 |
You're not the only person experiencing signal 11. Please post in the other thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13941 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
"I also have lots of failed WUs." Each of your examples ran under Linux. If you scroll down to the stderr output of each, you will see that each of them gave signal 11. Under Linux, signal 11 (SIGSEGV) means a segmentation fault; in other words, the program tried to access memory it was not allowed to access. Error code 139 came from a higher level: it matches 128 + 11, the usual convention for reporting that a process was terminated by signal 11. This most likely means that there's an error somewhere in the Linux version of Rosetta 4.20. You can't fix that. You can only wait for tasks that either use a corrected version of the program, or have the input files adjusted so that they don't try to use the part of the program that triggers the error. |
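The link between exit code 139 and signal 11 follows the common Unix convention of reporting 128 plus the signal number when a process is killed by a signal. The Python sketch below decodes such a code with the standard `signal` module. The helper name `describe_exit_code` is made up for illustration, and the sketch assumes the code really was produced under that convention.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code, assuming the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No signal with this number is defined on the current platform.
            name = f"unknown signal {signum}"
        return f"exit code {code} = 128 + {signum} ({name})"
    return f"exit code {code} (not a signal-style code)"


if __name__ == "__main__":
    print(describe_exit_code(139))
```

On Linux this reports SIGSEGV for 139, which is the segmentation fault seen in the stderr output of the failed tasks.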
MeeeK Send message Joined: 7 Feb 16 Posts: 31 Credit: 19,737,304 RAC: 0 |
Thank you. Any ideas for solving this problem? Maybe reinstall Ubuntu or something? Yesterday I did the update to 20.04; I think that's the problem. But both systems are identical (software): both have the same version, and I did the same update on both. The first system is running into this problem, the second is working fine. |
Rob R Send message Joined: 20 May 14 Posts: 2 Credit: 2,785,677 RAC: 970 |
"In the last couple days I've noticed a number of failed tasks, they all start with 3cl in the name, here is an example." "Try limiting the number of cores you use to process work (or add more system RAM) - it's a memory issue." Yeah, my settings were set to a max of 50% of RAM. I upped it to 75%, but I don't think that's the problem. At the time it failed there was only a total of 6.5GB of RAM in use, and the 50% limit would have allowed 8GB. The absolute limit is set to 10GB. Several of the 3cl* tasks have completed just fine; it seems to be just a select few that have some kind of problem. |
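The figures in this exchange can be turned into a rough budget. The Python sketch below takes Grant's rule of thumb of about 1.3GB of RAM per running task and the RAM limits Rob R mentions, and estimates how many tasks fit. It is only back-of-the-envelope arithmetic: the function names are made up, it is not how BOINC itself schedules work, and individual tasks can use far less or far more than 1.3GB.

```python
import math

# Rule of thumb quoted in this thread; real tasks vary widely.
PER_TASK_GB = 1.3


def max_concurrent_tasks(ram_limit_gb: float, per_task_gb: float = PER_TASK_GB) -> int:
    """Estimate how many tasks fit under a RAM limit at the assumed per-task size."""
    return math.floor(ram_limit_gb / per_task_gb)


def ram_needed_gb(tasks: int, per_task_gb: float = PER_TASK_GB) -> float:
    """Estimate the RAM needed to run the given number of tasks at once."""
    return tasks * per_task_gb


if __name__ == "__main__":
    for limit in (8.0, 10.0, 12.0):
        print(f"{limit:4.1f} GB limit -> about {max_concurrent_tasks(limit)} tasks")
    print(f"12 tasks need about {ram_needed_gb(12):.1f} GB")
```

Under these assumptions an 8GB limit leaves room for about 6 average tasks, and running 12 tasks at once needs about 15.6GB, which is consistent with Grant seeing errors with 12 threads on 16GB.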
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
[snip]
I doubt if you can fix it by doing anything on that computer. You probably have to wait for the project people to do it for you. |
yoerik Send message Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
This WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=1177441830 has only progressed 2 minutes in the last 12 hours. My phone is running other WUs just fine - having reported 4 valid WUs today for Rosetta, and more for smaller projects like WCG. It just isn't running. I'll provide any additional information as requested, as soon as possible. Its deadline is tomorrow, and it has been stuck at 16% (4 hrs 10 minutes, now up to 12 minutes) for ages. What should I do? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
This WU: For tasks like that, I'd let it try to run for about twice the time limit set for your Rosetta@home tasks, then abort it. Waiting that long covers the case where the problem is only in reporting the progress; aborting covers the case where it is now waiting for something that will never happen. In the meantime, you might try making more memory available for it, by blocking any other tasks from starting but allowing any tasks that have started to continue until they finish. |
yoerik Send message Joined: 24 Mar 20 Posts: 128 Credit: 169,525 RAC: 0 |
This WU: Hmm - I paused one or two other WUs - and it started working again. Rip. I'll have to babysit Rosetta WUs on that device to make sure no more than 3 of them are running at once. At least it's a phone - so it won't be as much of a bother. Thanks so much, Robert. |
slowbook Send message Joined: 10 Mar 20 Posts: 4 Credit: 20,089 RAC: 1 |
I posted a question over in the Q&A Android forum about a large number of error 11 returns on my Android 5.1.1 phone (Galaxy J3 2016). The phone does return some valid work units, and the work units seem to complete when they are sent to someone else's computer, so I suspect my phone is acting up, but I don't really have any good leads. Does anyone have any thoughts? Thanks! |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
I posted a question over in the Q&A Android forum about a large number of error 11 returns on my Android 5.1.1 phone (Galaxy J3 2016). The phone does return some valid work units, and the work units seem to complete when they are sent to someone else's computer, so I suspect my phone is acting up, but I don't really have any good leads. Does anyone have any thoughts? See my reply to MeeeK further down this thread. Are you sure that the tasks that were successful when sent to someone else used exactly the same version as on your Galaxy? For example, also using the Android Linux version of 4.20, or some other variety of 4.20 instead? The only failed task I could find for you has another task sent to someone else, but not finished yet. |
slowbook Send message Joined: 10 Mar 20 Posts: 4 Credit: 20,089 RAC: 1 |
I posted a question over in the Q&A Android forum about a large number of error 11 returns on my Android 5.1.1 phone (Galaxy J3 2016). The phone does return some valid work units, and the work units seem to complete when they are sent to someone else's computer, so I suspect my phone is acting up, but I don't really have any good leads. Does anyone have any thoughts? Dear Robert, They've fallen out of my history already (they were earlier this week; my phone doesn't get consistent WU's from Rosetta even though I have it overbalanced towards Rosetta). I do think most of the ones completed successfully by other hosts were on 4.20. Some of them were Windows or Linux, but I do recall seeing at least one successful one done by ARM on 4.20. Would it make sense to dial up the BOINC log level towards 5 and monitor for more WU's? Thanks! |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
[snip] Dear Robert, I was talking about the several different versions of 4.20 for different operating systems. As for the BOINC log level, I've never used that, so I don't know whether it will be useful or not. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2145 Credit: 41,555,266 RAC: 8,961 |
"its frustrating to see the wasted time without result and no points" If you check, you'll find they are being awarded points retrospectively according to their runtime. A clean-up job periodically runs and awards credit, as it's not your fault. |
JP Send message Joined: 20 Mar 20 Posts: 2 Credit: 102,246 RAC: 6 |
Hi, today all my R@H jobs terminated with an error, "RSA key check failed", directly after BOINC started. 3 tasks had been running yesterday; 4 had not been started yet. The job names were different: "hgfp3_xx", "rep212_xx", "rep1153_xx" and "Junior_HalfRoid_xx". The error code and error message are always the same: -185 (0xFFFFFF47) ERR_RESULT_START. Here is the complete message from one example job that had already been running before. https://boinc.bakerlab.org/rosetta/result.php?resultid=1186603301 <core_client_version>7.16.5</core_client_version> <![CDATA[ <message> app_version download error: couldn't get input files: <file_xfer_error> <file_name>database_357d5d93529_n_methyl.zip</file_name> <error_code>-120 (RSA key check failed for file)</error_code> <error_message>signature verification failed</error_message> </file_xfer_error> </message> ]]> Today new jobs were downloaded and ran smoothly, so I think there is no general problem. Any idea what went wrong? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
Hi, It looks like all of the successful tasks extracted part of the input from one database zip file, and all of the failed tasks tried (and failed) to extract something similar from a different database zip file. This probably means that the database zip file with the longer name was downloaded with an error, which caused all of the tasks trying to use it to fail. If someone can tell you how to download a replacement for the file with the error, this should fix the problem. I'm sorry that I can't do that. The cause MIGHT be an overly aggressive antivirus program, somewhere along the path from the download server to you, chopping off the end of the file, but it's hard to be sure unless you can compare the copy with the error to a correct version. |
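One way to make the suggested comparison concrete is sketched below in Python, using only the standard library: check whether a zip file opens and passes its internal CRC checks, and compare two copies by SHA-256 hash. The function names are made up for illustration. This does not reproduce BOINC's own RSA signature check; it only helps spot a truncated or altered download.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def zip_is_intact(path: Path) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError, EOFError):
        # A file with its end chopped off usually fails here.
        return False


def same_file(first: Path, second: Path) -> bool:
    """Return True if both files have identical contents (by SHA-256)."""
    return sha256_of(first) == sha256_of(second)
```

For example, `zip_is_intact` could be run on the suspect database zip in the BOINC project directory, and `same_file` could compare it against a freshly downloaded copy; where those files live on a given machine is not something this sketch assumes.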
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The databases are all in the root of the downloads directory https://boinc.bakerlab.org/rosetta/download/database_357d5d93529_n_methyl.zip Rosetta Moderator: Mod.Sense |
©2024 University of Washington
https://www.bakerlab.org