Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
> Is this the so called "credit new"?
That would probably be part of it. Rosetta has its own Credit mechanism, but there are times when it uses the Credit New mechanism as well. I seriously dislike so-called 'Credit New' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> I seriously dislike so-called 'Credit New' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame.
?
If the Credit system worked as intended, then regardless of the Task, regardless of the project, and regardless of how long it takes to process a Task, a given machine would get the same amount of Credit per hour for processing work. That way the only reason for choosing a particular project over another would be the project itself. A more powerful system will get more Credit because it's doing more work. Whether a Task of a given type takes 5 min or 5 weeks, the amount of Credit awarded per hour should be the same.
Of course, if there is an application that is more efficient, then that will result in more credit due to more work being done within a given time frame, but the amount of work actually being done per Task is the same, so the Credit for that Task should remain unchanged. You just get the benefit of doing more work per hour, so you get a higher RAC thanks to the more optimised application.
Unfortunately, by its very design Credit New varies the amount of Credit a Task will get for all sorts of reasons. And then you get projects such as Collatz that completely ignore the definition of a Cobblestone and award ridiculously, excessively inflated amounts for a Task.
As it is, my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
Grant Darwin NT |
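To illustrate the "same Credit per hour" point, here is a minimal sketch built only on the textbook Cobblestone definition (one Cobblestone is 1/200 of a day of CPU time on a reference machine rated at 1,000 MFLOPS on the Whetstone benchmark). It assumes the work done equals benchmark speed × CPU time and ignores both CreditNew's scaling and Rosetta's own mechanism; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
COBBLESTONES_PER_REFERENCE_DAY = 200  # from the Cobblestone definition
REFERENCE_MFLOPS = 1_000              # 1 GFLOPS Whetstone reference machine


def cobblestone_credit(whetstone_mflops: float, cpu_seconds: float) -> float:
    """Credit for cpu_seconds of work on a host with the given Whetstone score.

    Simplifying assumption: work done = benchmark speed x CPU time.
    """
    reference_days = (whetstone_mflops / REFERENCE_MFLOPS) * (cpu_seconds / SECONDS_PER_DAY)
    return COBBLESTONES_PER_REFERENCE_DAY * reference_days


def credit_per_hour(whetstone_mflops: float, cpu_seconds: float) -> float:
    """Credit rate for a task; under this model it does not depend on task length."""
    return cobblestone_credit(whetstone_mflops, cpu_seconds) / (cpu_seconds / 3600)


if __name__ == "__main__":
    five_minutes = 5 * 60
    five_weeks = 35 * SECONDS_PER_DAY
    # Same 5 GFLOPS host, wildly different task lengths, same hourly rate.
    print(round(credit_per_hour(5000, five_minutes), 2))  # 41.67
    print(round(credit_per_hour(5000, five_weeks), 2))    # 41.67
```

Only the host's speed sets the hourly rate here; a 5-minute Task and a 5-week Task both earn 200/24 ≈ 8.33 credits per hour per GFLOPS.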
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
> As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
And THAT is a very good thing; next step, climbing again!!! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
I spoke too soon.
> As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
It's gone back to falling again.
Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
> As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
UGH!!! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
And to add to the continually falling RAC, a new batch of work has a roughly 50% failure rate. The 11mers_FF2__cyclo_11mer_ Tasks are the culprits. They run for 30 seconds or less and then die:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @11mers_FF2__cyclo_11mer_LVStub2818_000054_extract_B.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2375971
Using database: database_357d5d93529_n_methyl\minirosetta_database
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 7.9.0
Dump Timestamp : 06/10/21 12:57:52
Install Directory : C:\Program Files\BOINC
Data Directory : C:\ProgramData\BOINC
Project Symstore : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\4;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore
ModLoad: 000000006ded0000 00000000057ef000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded)
Linked PDB Filename : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb
ModLoad: 00000000d9bb0000 00000000001f5000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.19041.928) (-exported- Symbols Loaded)
Linked PDB Filename : ntdll.pdb
File Version : 10.0.19041.804 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.804
ModLoad: 00000000d8010000 00000000000bd000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.19041.928) (-exported- Symbols Loaded)
Linked PDB Filename : kernel32.pdb
File Version : 10.0.19041.804 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.804
ModLoad: 00000000d76f0000 00000000002c8000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.19041.906) (-exported- Symbols Loaded)
Linked PDB Filename : kernelbase.pdb
File Version : 10.0.19041.804 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.804
ModLoad: 00000000d81c0000 000000000006b000 C:\WINDOWS\System32\WS2_32.dll (6.2.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : ws2_32.pdb
File Version : 10.0.19041.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.1
ModLoad: 00000000d83e0000 000000000012b000 C:\WINDOWS\System32\RPCRT4.dll (6.2.19041.928) (-exported- Symbols Loaded)
Linked PDB Filename : rpcrt4.pdb
File Version : 10.0.19041.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.1
ModLoad: 00000000d94b0000 00000000001a0000 C:\WINDOWS\System32\USER32.dll (6.2.19041.906) (-exported- Symbols Loaded)
Linked PDB Filename : user32.pdb
File Version : 10.0.19038.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19038.1
ModLoad: 00000000d7500000 0000000000022000 C:\WINDOWS\System32\win32u.dll (6.2.19041.906) (-exported- Symbols Loaded)
Linked PDB Filename : win32u.pdb
File Version : 10.0.19041.906 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.906
ModLoad: 00000000d8640000 000000000002a000 C:\WINDOWS\System32\GDI32.dll (6.2.19041.746) (-exported- Symbols Loaded)
Linked PDB Filename : gdi32.pdb
File Version : 10.0.19041.746 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.746
ModLoad: 00000000d72f0000 000000000010b000 C:\WINDOWS\System32\gdi32full.dll (6.2.19041.928) (-exported- Symbols Loaded)
Linked PDB Filename : gdi32full.pdb
File Version : 10.0.19041.928 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.928
ModLoad: 00000000d7b70000 000000000009d000 C:\WINDOWS\System32\msvcp_win.dll (6.2.19041.789) (-exported- Symbols Loaded)
Linked PDB Filename : msvcp_win.pdb
File Version : 10.0.19041.789 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.789
ModLoad: 00000000d7400000 0000000000100000 C:\WINDOWS\System32\ucrtbase.dll (6.2.19041.789) (-exported- Symbols Loaded)
Linked PDB Filename : ucrtbase.pdb
File Version : 10.0.19041.789 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.789
ModLoad: 00000000d8bd0000 00000000000ac000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.19041.610) (-exported- Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 10.0.19041.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.1
ModLoad: 00000000d8c80000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.19041.546 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.19041.546
ModLoad: 00000000d8290000 000000000009b000 C:\WINDOWS\System32\sechost.dll (6.2.19041.906) (-exported- Symbols Loaded)
Linked PDB Filename : sechost.pdb
File Version : 10.0.19041.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.1
ModLoad: 00000000d8670000 0000000000030000 C:\WINDOWS\System32\IMM32.DLL (6.2.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : imm32.pdb
File Version : 10.0.19041.546 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.546
ModLoad: 00000000d5250000 0000000000012000 C:\WINDOWS\SYSTEM32\kernel.appcore.dll (6.2.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : Kernel.Appcore.pdb
File Version : 10.0.19041.546 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.546
ModLoad: 00000000d6030000 0000000000033000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : ntmarta.pdb
File Version : 10.0.19041.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.1
ModLoad: 00000000d1530000 00000000001e4000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.19041.867) (-exported- Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 10.0.19041.867 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.867
ModLoad: 00000000d2270000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.19041.546) (-exported- Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 10.0.19041.546 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.546
ModLoad: 00000000d7590000 0000000000080000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.19041.662) (-exported- Symbols Loaded)
Linked PDB Filename : bcryptprimitives.pdb
File Version : 10.0.19041.662 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.19041.662
*** Dump of the Process Statistics: ***
- I/O Operations Counters -
Read: 5698, Write: 646, Other 13797
- I/O Transfers Counters -
Read: 17336636, Write: 14175, Other 6664
- Paged Pool Usage -
QuotaPagedPoolUsage: 317096, QuotaPeakPagedPoolUsage: 317376
QuotaNonPagedPoolUsage: 7200, QuotaPeakNonPagedPoolUsage: 7352
- Virtual Memory Usage -
VirtualSize: 83505152, PeakVirtualSize: 895533056
- Pagefile Usage -
PagefileUsage: 83505152, PeakPagefileUsage: 83505152
- Working Set Size -
WorkingSetSize: 119312384, PeakWorkingSetSize: 119316480, PageFaultCount: 29541
*** Dump of thread ID 7328 (state: Initialized): ***
- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578
- Registers -
rax=000000000000003a rbx=0000000002c36ae0 rcx=0000000003642ac0 rdx=0000000003722bf8 rsi=000000000000000b rdi=0000000003642ac0
r8=000000000000003a r9=0000000000000421 r10=0000000071a76e80 r11=0000000040745240 r12=000000006ded0000 r13=000000004075f960
r14=0000000040745980 r15=000000000048b215 rip=0000000071b2d578 rsp=00000000407452b8 rbp=0000000000000000
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
- Callstack -
ChildEBP RetAddr Args to Child
407452b0 6e3a831c 00000000 71a76d60 71a76e80 40745298 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0
407452e0 6e36935d 02c36ae0 40745380 6e35b215 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
40745310 714d7f10 723c0150 4075f960 00000000 00000001 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
40745340 6e3539e8 7346a32c 6ded0000 40745430 d9be0e7b rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0
407453b0 d9c5207f 00000000 40745930 40745ff0 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
407453e0 d9c01454 00000000 40745930 40745ff0 00000000 ntdll!__chkstk+0x0
40745af0 d9c50bae 02c20000 40745bc9 71b4a450 d9bdb3c7 ntdll!RtlRaiseException+0x0
40746280 6e613e2b fffffffe 06eb4dc8 ffffffff 6e6218c5 ntdll!KiUserExceptionDispatcher+0x0
407462d0 6e623690 71b4a3a0 06eb4b20 71b4a3a0 407463c9 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
40746400 6e739ee8 06923ea8 07290720 06eb4b20 07290720 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
40746fb0 6e6d4b6c 07445680 d9bdb3c7 08b80000 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
407471b0 6e6d488e 40747298 00000000 40747480 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40747310 6e633da1 40747488 00000000 02c34920 40747550 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
407476d0 6e639f08 40747a20 40747a20 40747a20 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40747d20 6e6384db 03310330 40747d80 031d9b90 031d9b90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40747e80 6e5a1fb7 00000000 40747f90 031d9b90 40748190 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40747ff0 6e5a57a6 00000005 6e345190 031f9b40 031f9b40 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
40748060 6e5a56cc 40748368 407481d9 40748368 031d9b90 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
40748110 6e66b6f5 40748368 40748641 00000000 6e3675e8 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
40748230 6e66a592 00000005 40748368 40748540 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40748300 6e66ad06 00000000 00000000 40748c20 08b80000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
407484a0 6eac71a3 40748540 40748c20 ffffff01 6e353e73 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40748790 6eac9d09 00000000 00000001 407488a0 40748c20 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40748b20 6eac2f8a 40748b60 40748c20 06730c40 0346fb90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40748b80 6ecdcc70 40748c20 40749348 031f9b40 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40749310 6ecdc6e4 0716e7d0 07289100 73345cc0 6e3475a6 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40749370 6ece603e 40749460 0716e500 40749480 40749bd0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40749af0 6ece56d4 2305e878 2305eb68 732b7f70 6ed06cb4 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40749b80 6ece578e 00000005 4074a128 0346fb90 00000001 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
40749d20 6e35081d 034ff820 034ff820 0346fb90 02c37e01 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0
4075f950 6e35b215 00000000 00000000 7327ccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
4075f990 d8027034 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
4075f9c0 d9c02651 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
4075fa40 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0
*** Dump of thread ID 32761 (state: Initialized): ***
- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 2428590080.000000
- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
- Callstack -
ChildEBP RetAddr Args to Child
(-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0
*** Dump of thread ID 30891432 (state: Unknown): ***
- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474836480.000000, Wait Time: 0.000000
- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
- Callstack -
ChildEBP RetAddr Args to Child
(-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0
*** Debug Message Dump ****
*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0
Exiting...
</stderr_txt>
]]>
Grant Darwin NT |
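The exit code BOINC reports in that log, 3221225477, is just the unsigned decimal form of the Windows status 0xC0000005 (STATUS_ACCESS_VIOLATION), which is why it matches the "Reason: Access Violation (0xc0000005)" line in the dump. A minimal sketch for turning such codes into hex; the lookup table is a hypothetical, deliberately tiny subset, not an official list.

```python
# Hypothetical, deliberately tiny lookup table; extend it as needed.
KNOWN_WINDOWS_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x00000001: 'ERROR_INVALID_FUNCTION ("Incorrect function.")',
}


def describe_exit_code(code: int) -> str:
    """Render a process exit code as decimal plus 32-bit hex, with a name if known."""
    # Windows exit codes are 32-bit values; masking maps a signed reading such as
    # -1073741819 onto the same unsigned value BOINC prints (3221225477).
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_WINDOWS_CODES.get(unsigned, "unknown code")
    return f"{unsigned} (0x{unsigned:08x}): {name}"


if __name__ == "__main__":
    print(describe_exit_code(3221225477))   # the value in the log above
    print(describe_exit_code(-1073741819))  # same code read as a signed int
```

Both calls print `3221225477 (0xc0000005): STATUS_ACCESS_VIOLATION`.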
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
Looks like we've got a server issue. The Server Status page shows all green and running, but there is no work available to go out. Plenty of queued-up jobs, but no Tasks ready to send. All requests for work result in none. As a result, Work in progress is taking a dive. Things fell over around 21:30 Monday, server time. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
> Looks like we've got a server issue.
Correct... :(
The work unit generator was being worked on yesterday to add support for VM applications, and it was crashing. The good news is it was fixed a couple of hours ago; tasks are coming down here. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
It lives! Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
> Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now.
Not sure either. The mix of tasks doesn't suggest anything different recently.
I put in that request to reduce the disk demand of pre_helical_bundles tasks, but it hasn't happened. That might have improved things, but since nothing has changed, it wouldn't explain any change in the In Progress numbers you're seeing.
No idea. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_1mq3ho7q_1391020_4
https://boinc.bakerlab.org/rosetta/result.php?resultid=1400578710
I had to abort this task after it had run for nearly 2 days of elapsed time. It stalled out at around 80%: no movement in the graphics, no increase or decrease in the CPU usage rate, no counting up on completion, nothing.
I shut my system down at night after running up a huge electric bill last year running 24/7, but my shutdown sequence is to suspend the tasks, shut down the client and then exit. "Leave non-GPU tasks in memory while suspended" is checked.
I have no idea what could cause it to stall. I had this problem with another pre_helical task in the past. Any ideas how to prevent this or what is causing it? How long does this kind of task take to complete? I run 16-17 hours a day, and change tasks is set at 6 hours.
Oh, and the CPU run time was 7 hours in 1.5 days. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> Any ideas how to prevent this or what is causing it?
I suspect that things aren't actually stalling as such; it's just that your system is so ridiculously overcommitted trying to do work that some Tasks take forever to complete, so it looks like they have stalled. Even those that do complete take way more time than they should, e.g. this Task of yours:
rb_06_04_79633_77510_ab_t000__h002_robetta_IGNORE_THE_REST_04_10_1395231_248_0
Run time 22 hours 21 min 14 sec
CPU time 7 hours 59 min 47 sec
And one of your Moo wrapper CPU Tasks is even worse:
dnetc_r72_1624557342_9_9_0
Run time 7 hours 2 min 3 sec
CPU time 23 min 18 sec
22 hours to do 8 hours of work is bad enough, but 7 hours to do 23 minutes of work is beyond ridiculous. Compare that to one of my Rosetta Tasks:
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_3rl4pd6v_1391037_4_1
Run time 7 hours 59 min 37 sec
CPU time 7 hours 57 min 5 sec
Check Task Manager or use Process Explorer to see what else is running on the system apart from BOINC projects. If there's nothing else and it's just BOINC projects, then you need to reserve a CPU core to support each GPU Task that is running. Running multiple GPU Tasks while also processing CPU work on the very cores that are supposed to be supporting the GPU work is the most likely cause of all your issues.
Reserve a CPU core for each GPU Task being run for each project doing GPU work, make sure that there are no other processes sucking up CPU time, and your system's output for all BOINC projects will improve hugely, with no more CPU Tasks taking 7 hours to do 23 minutes of work!
Ah! I thought this sounded familiar, and now I remember we have had this conversation before; back then you mentioned you run Folding@home as well. I told you then what needed to be done. It appears you haven't done it, hence you are still having the same issues.
Grant Darwin NT |
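The comparison above is really a ratio of CPU time to Run time. A minimal sketch that computes it for the three Tasks quoted; the 0.9 cut-off is a hypothetical value chosen for illustration, not a BOINC or Rosetta setting.

```python
from dataclasses import dataclass


def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an 'X hours Y min Z sec' reading from a task page into seconds."""
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class TaskTimes:
    label: str
    run_seconds: int  # wall-clock "Run time"
    cpu_seconds: int  # "CPU time" the task actually received

    @property
    def cpu_efficiency(self) -> float:
        """Fraction of the wall-clock time during which the task had a CPU."""
        return self.cpu_seconds / self.run_seconds


# Hypothetical cut-off, for illustration only.
OVERCOMMIT_THRESHOLD = 0.9

tasks = [
    TaskTimes("Greg_BE, robetta task", to_seconds(22, 21, 14), to_seconds(7, 59, 47)),
    TaskTimes("Greg_BE, Moo wrapper task", to_seconds(7, 2, 3), to_seconds(0, 23, 18)),
    TaskTimes("Grant, pre_helical_bundles task", to_seconds(7, 59, 37), to_seconds(7, 57, 5)),
]

for task in tasks:
    verdict = "likely overcommitted" if task.cpu_efficiency < OVERCOMMIT_THRESHOLD else "ok"
    print(f"{task.label}: {task.cpu_efficiency:.1%} CPU efficiency ({verdict})")
```

This prints roughly 35.8%, 5.5% and 99.5%. A low ratio is a symptom rather than a diagnosis: anything else competing for the CPU shows up the same way, which is why checking Task Manager or Process Explorer is still the next step.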
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Ah ha! FAH is the culprit. OK, taking that off the CPU then. I have an interest in so many things, I guess I will have to eliminate a few. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I don't know how FAH got ahold of my CPU, maybe after an update or something. Anyway it's eliminated from CPU now. Thanks for the reminder. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> I don't know how FAH got ahold of my CPU, maybe after an update or something.
If you are still doing FAH on the GPU, you'll need to check how much CPU time is required to support the GPU Tasks and then reduce the number of CPU cores/threads BOINC can use, to stop overcommitting the CPU.
Bring up Task Manager/Process Explorer and see how much CPU time is being used by the Folding@home GPU application. For example, on a 16-thread CPU: if it needs 1 CPU core/thread per Task and you're running 4 GPU Tasks, then in your Account, Computing preferences, "Use at most xxx % of the CPUs" should be set to 75%. If you're running only 2 GPU Tasks, and they need only 0.5 CPU cores/threads each to support them, that's 1 thread (about 7%) to reserve, so set "Use at most xxx % of the CPUs" to about 94%. 2 GPU Tasks needing 1 CPU core/thread each means reserving 2 threads (about 13%), so set it to about 88%, etc.
Save those changes; the next time BOINC contacts the Scheduler for the project you made the changes in (or you select it and hit Update in the BOINC Manager), the changes will take effect.
While Folding@home doing CPU work is probably having the greatest effect, reserving a CPU core/thread for the Moo wrapper (and any other GPU projects you do) will also be necessary to stop the CPU from being overcommitted, so that CPU time and Run time become almost identical.
Grant Darwin NT |
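The reservation arithmetic in the post above can be sketched as follows: threads to reserve = GPU Tasks × CPU needed per Task, rounded up, and the setting is the share of threads left over. This is a minimal sketch assuming BOINC rounds total threads × percent / 100 down to a whole number of threads (an assumption, not checked against the BOINC client source); the helper names are hypothetical and not part of BOINC.

```python
import math
from fractions import Fraction


def use_at_most_percent(total_threads: int, gpu_tasks: int, cpu_per_gpu_task: float) -> int:
    """Smallest whole 'Use at most xxx % of the CPUs' value that still leaves
    enough threads free for the GPU Tasks (assumes round-down, < 100 threads)."""
    # Fraction(str(...)) avoids float surprises such as 3 * 0.1 != 0.3.
    reserved = math.ceil(Fraction(str(cpu_per_gpu_task)) * gpu_tasks)
    usable = total_threads - reserved
    if usable < 1:
        raise ValueError("GPU Tasks would need every CPU thread")
    # Ceiling division: smallest integer p with total_threads * p / 100 >= usable.
    return -(-100 * usable // total_threads)


def threads_used(total_threads: int, percent: int) -> int:
    """Threads BOINC would use at a given percent, under the round-down assumption."""
    return max(1, total_threads * percent // 100)


if __name__ == "__main__":
    # 16-thread CPU, as in the examples in the post.
    for gpu_tasks, cpu_each in [(4, 1.0), (2, 0.5), (2, 1.0)]:
        pct = use_at_most_percent(16, gpu_tasks, cpu_each)
        print(f"{gpu_tasks} GPU Tasks x {cpu_each} CPU each -> {pct}% "
              f"({threads_used(16, pct)} of 16 threads for CPU work)")
```

On 16 threads this gives 75% (12 threads), 94% (15 threads) and 88% (14 threads) for the three cases. Rounding the reservation up errs on the side of leaving a little CPU idle rather than overcommitting it.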
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
> I don't know how FAH got ahold of my CPU, maybe after an update or something.
I usually use just the GPU on Folding, and delete the CPU slot. But it takes more than one shot. After the first reboot, it comes back. If I delete it two or three times, it usually gets the message and stays deleted. But occasionally it comes back from the dead even after that, maybe due to an update. It does its own thing. |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computation error.
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m_1389982_5_1
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_arm-android-linux-gnu -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3400220
Extracting in project directory: database_357d5d93529_n_methyl.zip
Using database: database_357d5d93529_n_methyl/minirosetta_database
ERROR: [ERROR] Unable to open constraints file: f9b5372889da3e02c19de86d067b31e6_0001.MSAcst
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 457
called boinc_finish(0)
</stderr_txt>
]]>
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f_1389895_5_1
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1087567
Using database: database_357d5d93529_n_methyl\minirosetta_database
ERROR: [ERROR] Unable to open constraints file: f5d52c1749a40719598f1d8b37e13c45_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
10:42:25 (29468): called boinc_finish(1)
</stderr_txt>
]]>
Funny to note that my Snapdragon 888 is beating my Ryzen 5 3600 and gets much more consistent credits per task. My phone is getting around 394 credits for every 8-hour task it gets; my Ryzen 5 3600 gets a measly 346 credits per 8-hour task. Either something is up with the credit calculation, or there is something about these pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks that makes them particularly good on ARM. I'm leaning towards the former explanation because:
1) the benchmarks for my phone are 5887.99 million ops/sec floating point speed and 29296.86 million ops/sec integer speed, whilst my 3600 gets 5198.63 million ops/sec floating point speed and 19515.05 million ops/sec integer speed.
2) My 3600 seems to be consistently generating more "decoys" than my phone despite the credit deficit.
Probably a BOINC issue that's been beaten to death already. |
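As a rough check on the benchmark explanation, the textbook Cobblestone definition (200 credits for one day of CPU time on a host whose Whetstone floating-point benchmark is 1,000 MFLOPS) can be applied to the two floating-point scores quoted, assuming each task received its full 8 hours (28,800 s, the `-cpu_run_time` value in the logs) of CPU time. This is back-of-the-envelope arithmetic only, not a description of how Rosetta actually computes credit.

$$
C \approx 200 \times \frac{B_{\mathrm{fp}}}{1000\ \mathrm{MFLOPS}} \times \frac{t_{\mathrm{CPU}}}{86\,400\ \mathrm{s}}
$$

$$
C_{\text{Snapdragon 888}} \approx 200 \times \frac{5887.99}{1000} \times \frac{28\,800}{86\,400} \approx 392.5,
\qquad
C_{\text{Ryzen 5 3600}} \approx 200 \times \frac{5198.63}{1000} \times \frac{28\,800}{86\,400} \approx 346.6
$$

Both results land within a couple of credits of the reported ~394 and 346, which is consistent with the credit for these Tasks tracking the floating-point benchmark multiplied by CPU time, rather than the number of decoys actually produced.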
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computation error.
Those errors have been occurring with some pre_helical_bundles_ Tasks ever since they were released months ago.
Grant Darwin NT |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Understood, thanks! |
©2024 University of Washington
https://www.bakerlab.org