Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
> My PCs have 8, 8, 8, 16, 36, 36, and 64GB. |
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
> I disagree. I'm sure you can become a brilliant biologist without knowing the first thing about coding.
> What? It's a valid point. Why would you think a biologist knows about the code?
You reframed the argument from talking about "any" biologist to all of them. I've noticed that you use that move when you start losing an argument (that is to say, all of them), and it's annoying. Also, your new version of the argument is kind of a "Captain Obvious" eye-roller. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> I've noticed that some of the latest Tasks aren't checkpointing properly, so if you interrupt them they will revert to the last successful checkpoint.
> It sometimes helps to shut down BOINC, then restart BOINC and then the task.
> OK, thank you. I just tried that (rebooted in between as well), and it reset to ~16% complete... Maybe I just kill it if it's going to run forever?
Next time, just let it run. The default run time is 8 hours, and there is a 10 hour watchdog timer in case it's not done within 8 hours. If it's still going after 20 hours, then you might want to kill it off.
Grant
Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> I've had about a 50% failure rate for the miniprotein_relax8_ Tasks so far. 4 completed & Validated, 5 errored out.
> Over the course of this afternoon I've had 6 segv errors, all on files starting with miniprotein.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol fr_cart_fast.xml @fr_flags_bcov2 -in:file:silent miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_4sp2hc8l.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_4sp2hc8l.zip @miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_4sp2hc8l.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3662000
Using database: database_357d5d93529_n_methylminirosetta_database
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF620BF8316 read attempt to address 0xFFFFFFFF
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 7.9.0
Dump Timestamp : 04/11/21 06:44:57
Install Directory : C:\Program Files\BOINC
Data Directory : C:\ProgramData\BOINC
Project Symstore : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\8;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore
ModLoad: 0000000020720000 00000000057ef000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded) Linked PDB Filename : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb
ModLoad: 0000000028070000 00000000001f5000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.19041.844) (-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804
ModLoad: 0000000027c10000 00000000000bd000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.19041.804) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804
ModLoad: 0000000025c50000 00000000002c9000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.19041.804) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804
ModLoad: 0000000027de0000 000000000006b000 C:\WINDOWS\System32\WS2_32.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1
ModLoad: 0000000027a80000 000000000012b000 C:\WINDOWS\System32\RPCRT4.dll (6.2.19041.746) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1
ModLoad: 0000000027250000 00000000001a0000 C:\WINDOWS\System32\USER32.dll (6.2.19041.746) (-exported- Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 10.0.19038.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19038.1
ModLoad: 00000000257b0000 0000000000022000 C:\WINDOWS\System32\win32u.dll (6.2.19041.867) (-exported- Symbols Loaded) Linked PDB Filename : win32u.pdb File Version : 10.0.19041.867 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.867
ModLoad: 0000000027520000 000000000002a000 C:\WINDOWS\System32\GDI32.dll (6.2.19041.746) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 10.0.19041.746 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.746
ModLoad: 0000000025f20000 000000000010b000 C:\WINDOWS\System32\gdi32full.dll (6.2.19041.746) (-exported- Symbols Loaded) Linked PDB Filename : gdi32full.pdb File Version : 10.0.19041.746 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.746
ModLoad: 0000000026030000 000000000009d000 C:\WINDOWS\System32\msvcp_win.dll (6.2.19041.789) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.19041.789 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.789
ModLoad: 0000000025910000 0000000000100000 C:\WINDOWS\System32\ucrtbase.dll (6.2.19041.789) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.19041.789 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.789
ModLoad: 0000000026ef0000 00000000000ac000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.19041.610) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1
ModLoad: 00000000260d0000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.19041.546
ModLoad: 0000000027680000 000000000009c000 C:\WINDOWS\System32\sechost.dll (6.2.19041.789) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1
ModLoad: 0000000026700000 0000000000030000 C:\WINDOWS\System32\IMM32.DLL (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546
ModLoad: 0000000023730000 0000000000012000 C:\WINDOWS\SYSTEM32\kernel.appcore.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546
ModLoad: 0000000024500000 0000000000033000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1
ModLoad: 0000000025170000 000000000000c000 C:\WINDOWS\SYSTEM32\CRYPTBASE.DLL (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : cryptbase.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546
ModLoad: 0000000025a10000 0000000000080000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.19041.662) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.19041.662 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.662
ModLoad: 0000000020110000 00000000001e4000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.19041.867) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.19041.867 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.867
ModLoad: 00000000206e0000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546
*** Dump of the Process Statistics: ***
- I/O Operations Counters -
Read: 11302, Write: 7110, Other 38617
- I/O Transfers Counters -
Read: 37800841, Write: 25327698, Other 29684
- Paged Pool Usage -
QuotaPagedPoolUsage: 318232, QuotaPeakPagedPoolUsage: 318408
QuotaNonPagedPoolUsage: 26784, QuotaPeakNonPagedPoolUsage: 28280
- Virtual Memory Usage -
VirtualSize: 689004544, PeakVirtualSize: -1985282048
- Pagefile Usage -
PagefileUsage: 689004544, PeakPagefileUsage: 1726263296
- Working Set Size -
WorkingSetSize: 707022848, PeakWorkingSetSize: 1742659584, PageFaultCount: 3912026
*** Dump of thread ID 6932 (state: Initialized): ***
- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF620BF8316 read attempt to address 0xFFFFFFFF
- Registers -
rax=000000000000003a rbx=000000000eaa4760 rcx=000000000f4a4ac0 rdx=000000000f584bf8 rsi=000000000000000b rdi=000000000f4a4ac0
r8=000000000000003a r9=0000000000000421 r10=00000000242c6e80 r11=0000000020746600 r12=0000000020720000 r13=000000002075f960
r14=0000000020746d40 r15=000000000048b215 rip=0000000020bf8316 rsp=0000000020746680 rbp=0000000000000000
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
- Callstack -
ChildEBP RetAddr Args to Child
207466a0 20bb935d 0eaa4760 20746740 20bab215 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
207466d0 23d27f10 24c10150 2075f960 00000000 00000001 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
20746700 20ba39e8 25cba32c 20720000 207467f0 280a0e7b rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0
20746770 28111f6f 00000000 20746cf0 207473b0 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
207467a0 280c1454 00000000 20746cf0 207473b0 00000000 ntdll!__chkstk+0x0
20746eb0 28110a9e 20746fd0 20df2be9 207476f0 207472f0 ntdll!RtlRaiseException+0x0
20747650 20dc2938 00000000 00000001 207476f0 20ba3615 ntdll!KiUserExceptionDispatcher+0x0
20747690 2162c536 00000001 207477a0 2437b498 00000000 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0
20747890 2341b619 20747a40 20748e40 2595fad0 20747730 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20747b70 23415c06 20748330 0f375760 0f375760 20747c80 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0
20748910 2341d5f3 37c78970 20748a18 00000000 00000000 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0
20749080 215fff1c fffffffe 00000000 39fdca78 971b56f6 rosetta_4.20_windows_x86_64!cppdb::mutex::~mutex+0x0
20749150 215fbb43 25dfbc90 69766163 207491b9 00000006 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20749210 215fa41f 0f7acd90 ffffffff 39fdcdd8 ffffffff rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
207494d0 215a2dc0 25b96ec0 0f03af40 25b3e9f0 0f03af40 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20749a70 215a00f8 0f03af40 0f03af40 20749b80 20749b68 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20749c40 2160a4d1 20749cc8 0f03af40 20749cc8 20749e50 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20749c90 215f006d 0eec8600 20749cc8 0eec8600 0eaa5d01 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
20749d20 20ba0dfd 0f4675a0 20749e50 00000000 0eaa5d01 rosetta_4.20_windows_x86_64!xmlCheckHTTPInput+0x0
2075f950 20bab215 00000000 00000000 25acccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
2075f990 27c27034 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0
2075f9c0 280c2651 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
2075fa40 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0
*** Dump of thread ID 32762 (state: Initialized): ***
- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 775459008.000000
- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
- Callstack -
ChildEBP RetAddr Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0
*** Dump of thread ID 30879279 (state: Unknown): ***
- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474963456.000000, Wait Time: 0.000000
- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
- Callstack -
ChildEBP RetAddr Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0
*** Debug Message Dump ****
*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0
Exiting...
</stderr_txt>
]]>
Grant
Darwin NT |
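A few numbers in the stderr dump above are easier to read once decoded. The Python below is a minimal sketch, not a BOINC or Rosetta tool: it renders the exit code as the 32-bit hex value printed next to it (0xc0000005 is the Windows access-violation status), converts the logged timing flags to hours on the assumption that they are given in seconds, and reinterprets the negative PeakVirtualSize as an unsigned value, which assumes the debugger printed a 32-bit counter that wrapped around.

```python
# Decode a few values copied from the stderr dump above.
# Assumptions: the timing flags are in seconds, and the negative
# PeakVirtualSize is a 32-bit counter that wrapped around.

EXIT_CODE = 3221225477
TIMING_FLAGS = {
    "-cpu_run_time": 28800,
    "-boinc::cpu_run_timeout": 36000,
    "-checkpoint_interval": 120,
}
PEAK_VIRTUAL_SIZE = -1985282048


def exit_code_hex(code: int) -> str:
    """Format an exit code as a zero-padded 32-bit hex value, as in the log."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def seconds_to_hours(seconds: int) -> float:
    return seconds / 3600


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print("exit code:", exit_code_hex(EXIT_CODE))  # 0xc0000005
    for flag, seconds in TIMING_FLAGS.items():
        print(f"{flag} {seconds} s = {seconds_to_hours(seconds):.2f} h")
    peak = as_unsigned32(PEAK_VIRTUAL_SIZE)
    print(f"PeakVirtualSize as unsigned: {peak} bytes ({peak / 2**30:.2f} GiB)")
```

Run as a script, it reports 8.00 h for -cpu_run_time and 10.00 h for -boinc::cpu_run_timeout, which lines up with the 8 hour default and 10 hour watchdog figures Grant mentions earlier in the thread.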
Jim Martin Joined: 9 Oct 05 Posts: 23 Credit: 1,443,682 RAC: 585 |
Brian -- The first Rosetta@home work units have arrived, and they appear to be working well. They are:
TMWFYIU pre-helical-bundles-round1 (I did not use the underscore, because a red line resulted.)
miniprotein-relax8
They are in the mid-40% range of completion. Here's hoping...
jm |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
> If it's too big, I run it on a bigger machine. My PCs have 8, 8, 8, 16, 36, 36, and 64GB.
> I allocate 28GB from 32GB total
> Once you've started them, I wonder if BOINC adjusts to what they're actually using, or leaves the requested 6.5GB there just in case? You could see what happens if you try to run only Rosettas.
I wasn't clear. My main PC has 28GB allocated out of 32GB. My recent problems have been on my laptop, with 6.7GB free out of 8GB.
As it turned out, the problem task eventually ran OK. I unchecked NNT (No New Tasks), downloaded fresh Rosetta tasks, and they started up straight away on all cores. I'm not sure whether that confirms the initial RAM requirement is only large before the task starts running and then drops to what it actually uses once started, but it kind of seems that way. |
Jim Martin Joined: 9 Oct 05 Posts: 23 Credit: 1,443,682 RAC: 585 |
A reply here to Robert, Brian, if I may.
Robert -- I, too, wondered if that wasn't an indication of a coding glitch.
jm |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
> I just suspended a task that has been running for 16:19:35 (stuck at 00:10:15 remaining). Any ideas how to get this one to finish and get credit for it?
I saw an Android task run for exactly 24 hours (3 times the normal 8). Maybe they're meant to, since Androids are slower than PCs? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
It can be. But I just had an LHC Atlas task running for 24 hours instead of about 3, and the CPU usage was zero. Mind you, those use VirtualBox, which is a pain in the ass.
> I just suspended a task that has been running for 16:19:35 (stuck at 00:10:15 remaining). Any ideas how to get this one to finish and get credit for it?
> It sometimes helps to shut down BOINC, then restart BOINC and then the task. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
What an absurd analogy. So you must think everyone who owns a decent car has a small appendage?
> My PCs have 8, 8, 8, 16, 36, 36, and 64GB. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
No idea what you think I've changed; we're all talking about biologists who don't have to be programmers. Just like hairdressers don't have to be racing drivers.
> I disagree. I'm sure you can become a brilliant biologist without knowing the first thing about coding.
> What? It's a valid point. Why would you think a biologist knows about the code? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I understood you perfectly. If your laptop can't handle the big tasks, get Rosetta to run only on the big machine. Or just don't get so upset when something queues up.
> If it's too big, I run it on a bigger machine. My PCs have 8, 8, 8, 16, 36, 36, and 64GB.
> I wasn't clear. My main PC has 28GB allocated out of 32GB. |
DizzyD Joined: 23 Nov 20 Posts: 6 Credit: 1,438,330 RAC: 0 |
> I've noticed that some of the latest Tasks aren't checkpointing properly, so if you interrupt them they will revert to the last successful checkpoint.
Grant, thank you for your reply. I don't quite understand your "20 hours" comment. I let the task run for 16 hours. If there is a watchdog timer at 10 hours, what is the difference between a task that hasn't completed after 11, 16 or 20 hours? Isn't it just stuck at that point? |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
> I've noticed that some of the latest Tasks aren't checkpointing properly, so if you interrupt them they will revert to the last successful checkpoint.
The watchdog isn't at 10 hours. It's 10 hours AFTER whatever the CPU runtime setting is. So if you are running with the default setting, which is 8 CPU hours, the watchdog will only kick in at 18 hours. What Grant meant is that, since the watchdog should kick in at 18 hours, if the task is still running at 20 hours you might want to abort it. |
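Falconet's rule of thumb can be written down as a tiny helper. This is a hypothetical Python sketch, not anything the Rosetta application exposes: `WATCHDOG_GRACE_HOURS = 10` encodes the "10 hours after the CPU runtime setting" description in the post above, and the 2 hour `margin_hours` default is an assumption chosen to reproduce Grant's 20 hour figure for the 8 hour default. It also treats the elapsed time shown in BOINC Manager as if it were CPU time, which is only roughly true on a busy machine.

```python
# Hypothetical helper based on the watchdog description in this thread:
# the watchdog fires WATCHDOG_GRACE_HOURS after the target CPU time.
WATCHDOG_GRACE_HOURS = 10.0


def watchdog_hours(target_cpu_hours: float = 8.0) -> float:
    """Hours of run time after which the watchdog should end the task."""
    return target_cpu_hours + WATCHDOG_GRACE_HOURS


def parse_hms(text: str) -> float:
    """Convert an 'HH:MM:SS' string, as shown in BOINC Manager, to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def worth_aborting(elapsed_hours: float, target_cpu_hours: float = 8.0,
                   margin_hours: float = 2.0) -> bool:
    """True only once the task has run past the watchdog point plus a margin."""
    return elapsed_hours > watchdog_hours(target_cpu_hours) + margin_hours


if __name__ == "__main__":
    elapsed = parse_hms("16:19:35")  # the task described earlier in the thread
    print(f"{elapsed:.2f} h elapsed, watchdog at {watchdog_hours():.0f} h,"
          f" abort? {worth_aborting(elapsed)}")
```

For the task stuck at 16:19:35, this reports about 16.33 h elapsed against an 18 h watchdog, so by this reasoning it had not yet run long enough to be worth aborting.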
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
> No idea what you think I've changed
I know. It's that damned Dunning-Kruger thingy. |
jsm Joined: 4 Apr 20 Posts: 3 Credit: 77,825,233 RAC: 32,838 |
Running at 22 hours has substantially reduced the bandwidth hog, but detailed checking has turned up a question. All the computers are asking the scheduler every minute or so for new tasks, only to be told 'no can do, you have plenty' (I paraphrase). This is clearly putting an unnecessary load on the scheduler and contributing to my bandwidth loss. Is there a way to set the preferences so that they only seek additional work every so often, e.g. once an hour?
capt |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
> Running at 22 hours has substantially reduced the bandwidth hog, but detailed checking has turned up a question. All the computers are asking the scheduler every minute or so for new tasks, only to be told 'no can do, you have plenty' (I paraphrase). This is clearly putting an unnecessary load on the scheduler and contributing to my bandwidth loss. Is there a way to set the preferences so that they only seek additional work every so often, e.g. once an hour?
How often it asks for work depends on the number of cores/threads you have, the amount of time the system is actually able to process work, and most importantly on your cache settings.
The fact that many of your Tasks time out before you even return them, due to missed deadlines, indicates your cache setting is way, way, way, way too large.
The estimated completion time for all Tasks is 8 hours, regardless of what your Target CPU time is set to. So having a multi-day cache, combined with a longer than the default 8 hour Target CPU time, is going to result in endless requests for work and huge numbers of Tasks missing their deadlines.
In your computing preferences, under Other, set:
Store at least 0.01 days of work
Store up to an additional 0.01 days of work
and they will stop trashing Work Units due to missed deadlines, and stop continually asking for more work.
If you go back to the default 8 hours in the future, you could then bump up "Store at least 0.01 days of work" to something like 0.2 to maintain a reasonable buffer that won't result in missed deadlines when things change.
Grant
Darwin NT |
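Grant's explanation comes down to simple arithmetic: work requests are sized using the 8 hour estimate, but with a 22 hour target each task actually runs for 22 hours. The Python below is a deliberately crude model of that, not BOINC's real work-fetch algorithm. It assumes the cache in days is the sum of the two "Store ... days of work" settings, that every fetched task is estimated at 8 hours, and that each core runs its share of tasks back to back. The 8-core host and the 3 day `deadline_days` value in the example are hypothetical figures, not the project's actual deadline.

```python
import math

# Crude model of the cache/deadline problem described above; not BOINC's
# real work-fetch logic. Every task is *estimated* at 8 hours (per Grant),
# but actually runs for the user's target CPU time.
ESTIMATED_TASK_HOURS = 8.0


def tasks_fetched(cores: int, cache_days: float) -> int:
    """Tasks needed to fill cache_days of work on every core, at 8 h each."""
    return math.ceil(cores * cache_days * 24 / ESTIMATED_TASK_HOURS)


def hours_to_finish(tasks: int, cores: int, target_cpu_hours: float) -> float:
    """Wall-clock hours to clear the queue if each core runs tasks back to back."""
    return math.ceil(tasks / cores) * target_cpu_hours


def misses_deadline(cores: int, cache_days: float, target_cpu_hours: float,
                    deadline_days: float) -> bool:
    tasks = tasks_fetched(cores, cache_days)
    return hours_to_finish(tasks, cores, target_cpu_hours) > deadline_days * 24


if __name__ == "__main__":
    # Hypothetical 8-core host, 22 h target CPU time, 3-day deadline.
    for cache_days in (2.0, 0.02):
        tasks = tasks_fetched(8, cache_days)
        print(f"cache {cache_days} days: {tasks} tasks,"
              f" {hours_to_finish(tasks, 8, 22):.0f} h per core,"
              f" misses deadline: {misses_deadline(8, cache_days, 22, 3)}")
```

With a 2 day cache the model queues 48 tasks, or 132 hours of work per core, well past a 3 day window; with 0.01 + 0.01 days it queues a single task. The model ignores tasks already running and the client's own deadline handling, so treat the numbers as illustrative only.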
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
No context, no conversation.
> No idea what you think I've changed
> I know. It's that damned Dunning-Kruger thingy. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
The 6.5GB problem goes away on an 8GB machine if you set BOINC to use 100% of memory. It never actually uses 100%, since everything overestimates. I just changed my old BOINC-only machines [1], and Rosettas downloaded and ran.
[1] Who has 8GB on a machine they actually interact with? You could maybe load Windows 10 and one application. But dare to play a game, or use email and a photo editor at once, and it'll grind to a halt. Another example of modern shoddy, lazy, bloated programming. I can boot Linux off a 1GB flash drive, yet Windows is 20 times bigger. |
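The memory side of this can be sketched the same way. The Python below assumes a simplified rule, which is not BOINC's exact logic: a task may start only if its estimated RAM, plus whatever other running BOINC tasks already use, fits within the memory percentage set in the computing preferences. The 6.5 GB estimate and the 8 GB machine come from the posts above; the function names are hypothetical.

```python
# Simplified, hypothetical model of a BOINC-style memory limit check.
# The real client's check is more involved (for example, it has separate
# limits for when the computer is in use and when it is not).


def allowed_gb(total_gb: float, max_pct: float) -> float:
    """RAM BOINC may use under a 'use at most max_pct % of memory' setting."""
    return total_gb * max_pct / 100


def can_start(task_gb: float, total_gb: float, max_pct: float,
              running_gb: float = 0.0) -> bool:
    """True if the task's estimate fits alongside memory already used by tasks."""
    return task_gb + running_gb <= allowed_gb(total_gb, max_pct)


def min_pct_needed(task_gb: float, total_gb: float) -> float:
    """Smallest memory percentage that lets a lone task of task_gb start."""
    return 100 * task_gb / total_gb


if __name__ == "__main__":
    print(f"6.5 GB task on 8 GB needs at least {min_pct_needed(6.5, 8):.2f}%")
    for pct in (50, 90, 100):
        print(pct, can_start(6.5, 8, pct))
```

Under this model a lone 6.5 GB task on an 8 GB machine needs a limit of at least 81.25%, so 90% or 100% lets it start while 50% does not, which is consistent with the 100% setting making the problem go away.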
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
> No context, no conversation.
> No idea what you think I've changed
> I know. It's that damned Dunning-Kruger thingy.
Unless you are a relative -- which you are not -- it's not my duty to compensate for your inability to keep up with a conversation due to age-related infirmities. I counsel making use of Google. |
©2024 University of Washington
https://www.bakerlab.org