Problems and Technical Issues with Rosetta@home

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 102028 - Posted: 8 Jun 2021, 3:08:40 UTC - in response to Message 102026.  

Is this the so called "credit new"?
That would probably be part of it. Rosetta has their own Credit mechanism, but there are times where it uses the Credit New mechanism as well.


I seriously dislike so called 'credit new' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task a if they want lots of credit as opposed to task b which has a longer time frame.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102029 - Posted: 8 Jun 2021, 7:13:47 UTC - in response to Message 102028.  

I seriously dislike so called 'credit new' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task a if they want lots of credit as opposed to task b which has a longer time frame.
?
If the Credit system worked as intended, regardless of the Task and regardless of the project, and regardless of how long it takes to process a Task, a given machine would get the same amount of Credit per hour for processing work.
That way the only reason for choosing a particular project over another would be the project itself. A more powerful system will get more Credit because it's doing more work. If a Task of a given type takes 5min or takes 5 weeks, the amount of Credit awarded per hour should be the same.
Of course if there is an application that is more efficient then that will result in more credit due to more work being done within a given time frame- but the amount of work actually being done is the same, so the Credit for that Task should remain unchanged. You just get the benefit of doing more work per hour, so you get a higher RAC thanks to the more optimised application.

Unfortunately, by its very design Credit New varies the amount of Credit a Task will get for all sorts of reasons. And then you get projects such as Collatz that completely ignore the definition of a Cobblestone and award ridiculously, excessively inflated amounts for a Task.



As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
Grant
Darwin NT
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 102034 - Posted: 8 Jun 2021, 22:38:03 UTC - in response to Message 102029.  

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102040 - Posted: 9 Jun 2021, 8:23:31 UTC - in response to Message 102034.  
Last modified: 9 Jun 2021, 8:27:31 UTC

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!
I spoke too soon.
It's gone back to falling again.
Grant
Darwin NT
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 102047 - Posted: 9 Jun 2021, 23:16:43 UTC - in response to Message 102040.  

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!


I spoke too soon.
It's gone back to falling again.


UGH!!!
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102050 - Posted: 10 Jun 2021, 8:44:17 UTC

And to add to the continually falling RAC, a new batch of work has a roughly 50% failure rate.
11mers_FF2__cyclo_11mer_ are culprits.


Runs for 30sec or less and then dies-
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @11mers_FF2__cyclo_11mer_LVStub2818_000054_extract_B.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2375971
Using database: database_357d5d93529_n_methyl\minirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 06/10/21 12:57:52
Install Directory : C:\Program Files\BOINC
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\4;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore


ModLoad: 000000006ded0000 00000000057ef000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded)
    Linked PDB Filename   : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb

ModLoad: 00000000d9bb0000 00000000001f5000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d8010000 00000000000bd000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d76f0000 00000000002c8000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d81c0000 000000000006b000 C:\WINDOWS\System32\WS2_32.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : ws2_32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d83e0000 000000000012b000 C:\WINDOWS\System32\RPCRT4.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d94b0000 00000000001a0000 C:\WINDOWS\System32\USER32.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 10.0.19038.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19038.1

ModLoad: 00000000d7500000 0000000000022000 C:\WINDOWS\System32\win32u.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : win32u.pdb
    File Version          : 10.0.19041.906 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.906

ModLoad: 00000000d8640000 000000000002a000 C:\WINDOWS\System32\GDI32.dll (6.2.19041.746) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 10.0.19041.746 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.746

ModLoad: 00000000d72f0000 000000000010b000 C:\WINDOWS\System32\gdi32full.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32full.pdb
    File Version          : 10.0.19041.928 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.928

ModLoad: 00000000d7b70000 000000000009d000 C:\WINDOWS\System32\msvcp_win.dll (6.2.19041.789) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.19041.789 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.789

ModLoad: 00000000d7400000 0000000000100000 C:\WINDOWS\System32\ucrtbase.dll (6.2.19041.789) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.19041.789 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.789

ModLoad: 00000000d8bd0000 00000000000ac000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.19041.610) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d8c80000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.19041.546

ModLoad: 00000000d8290000 000000000009b000 C:\WINDOWS\System32\sechost.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d8670000 0000000000030000 C:\WINDOWS\System32\IMM32.DLL (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d5250000 0000000000012000 C:\WINDOWS\SYSTEM32\kernel.appcore.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d6030000 0000000000033000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntmarta.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d1530000 00000000001e4000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.19041.867) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.19041.867 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.867

ModLoad: 00000000d2270000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d7590000 0000000000080000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.19041.662) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcryptprimitives.pdb
    File Version          : 10.0.19041.662 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.662



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 5698, Write: 646, Other 13797

- I/O Transfers Counters -
Read: 17336636, Write: 14175, Other 6664

- Paged Pool Usage -
QuotaPagedPoolUsage: 317096, QuotaPeakPagedPoolUsage: 317376
QuotaNonPagedPoolUsage: 7200, QuotaPeakNonPagedPoolUsage: 7352

- Virtual Memory Usage -
VirtualSize: 83505152, PeakVirtualSize: 895533056

- Pagefile Usage -
PagefileUsage: 83505152, PeakPagefileUsage: 83505152

- Working Set Size -
WorkingSetSize: 119312384, PeakWorkingSetSize: 119316480, PageFaultCount: 29541

*** Dump of thread ID 7328 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 

- Registers -
rax=000000000000003a rbx=0000000002c36ae0 rcx=0000000003642ac0 rdx=0000000003722bf8 rsi=000000000000000b rdi=0000000003642ac0
r8=000000000000003a r9=0000000000000421 r10=0000000071a76e80 r11=0000000040745240 r12=000000006ded0000 r13=000000004075f960
r14=0000000040745980 r15=000000000048b215 rip=0000000071b2d578 rsp=00000000407452b8 rbp=0000000000000000
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

- Callstack -
ChildEBP RetAddr  Args to Child
407452b0 6e3a831c 00000000 71a76d60 71a76e80 40745298 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 
407452e0 6e36935d 02c36ae0 40745380 6e35b215 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
40745310 714d7f10 723c0150 4075f960 00000000 00000001 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
40745340 6e3539e8 7346a32c 6ded0000 40745430 d9be0e7b rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 
407453b0 d9c5207f 00000000 40745930 40745ff0 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
407453e0 d9c01454 00000000 40745930 40745ff0 00000000 ntdll!__chkstk+0x0 
40745af0 d9c50bae 02c20000 40745bc9 71b4a450 d9bdb3c7 ntdll!RtlRaiseException+0x0 
40746280 6e613e2b fffffffe 06eb4dc8 ffffffff 6e6218c5 ntdll!KiUserExceptionDispatcher+0x0 
407462d0 6e623690 71b4a3a0 06eb4b20 71b4a3a0 407463c9 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40746400 6e739ee8 06923ea8 07290720 06eb4b20 07290720 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40746fb0 6e6d4b6c 07445680 d9bdb3c7 08b80000 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407471b0 6e6d488e 40747298 00000000 40747480 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747310 6e633da1 40747488 00000000 02c34920 40747550 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407476d0 6e639f08 40747a20 40747a20 40747a20 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747d20 6e6384db 03310330 40747d80 031d9b90 031d9b90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747e80 6e5a1fb7 00000000 40747f90 031d9b90 40748190 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747ff0 6e5a57a6 00000005 6e345190 031f9b40 031f9b40 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748060 6e5a56cc 40748368 407481d9 40748368 031d9b90 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748110 6e66b6f5 40748368 40748641 00000000 6e3675e8 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748230 6e66a592 00000005 40748368 40748540 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748300 6e66ad06 00000000 00000000 40748c20 08b80000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407484a0 6eac71a3 40748540 40748c20 ffffff01 6e353e73 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748790 6eac9d09 00000000 00000001 407488a0 40748c20 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748b20 6eac2f8a 40748b60 40748c20 06730c40 0346fb90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748b80 6ecdcc70 40748c20 40749348 031f9b40 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749310 6ecdc6e4 0716e7d0 07289100 73345cc0 6e3475a6 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749370 6ece603e 40749460 0716e500 40749480 40749bd0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749af0 6ece56d4 2305e878 2305eb68 732b7f70 6ed06cb4 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749b80 6ece578e 00000005 4074a128 0346fb90 00000001 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749d20 6e35081d 034ff820 034ff820 0346fb90 02c37e01 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
4075f950 6e35b215 00000000 00000000 7327ccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
4075f990 d8027034 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
4075f9c0 d9c02651 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
4075fa40 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 

*** Dump of thread ID 32761 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 2428590080.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 

*** Dump of thread ID 30891432 (state: Unknown): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474836480.000000, Wait Time: 0.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
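
For anyone reading the error above: the decimal exit code and the hexadecimal value in the message are the same number, and the debugger output labels 0xc0000005 as an Access Violation. A minimal Python sketch of the conversion follows. The helper name `exit_code_to_hex` is made up for illustration and is not part of BOINC or Rosetta.

```python
def exit_code_to_hex(exit_code: int) -> str:
    """Format a Windows process exit code as an unsigned 32-bit hex string.

    Masking with 0xFFFFFFFF also handles tools that report the same
    code as a negative signed 32-bit integer.
    """
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# The exit code reported for the failing 11mers_FF2__cyclo_11mer_ task.
print(exit_code_to_hex(3221225477))
```

Searching for the hex form is usually more productive than searching for the decimal one, since that is how the debugger output itself reports the exception.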

Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102075 - Posted: 15 Jun 2021, 6:16:13 UTC

Looks like we've got a server issue.

Server Status page shows all green and running, but there is no work available to go out. Plenty of queued-up jobs, but no Tasks ready to send; all requests for work result in none.
As a result, Work in progress is taking a dive.
Around 21:30 Monday server time was when things fell over.


Grant
Darwin NT
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 102078 - Posted: 15 Jun 2021, 17:11:11 UTC - in response to Message 102075.  

Looks like we've got a server issue.

Server Status page shows all green and running, but there is no work available to go out. Plenty of queued-up jobs, but no Tasks ready to send; all requests for work result in none.
As a result, Work in progress is taking a dive.
Around 21:30 Monday server time was when things fell over.


Correct... :(

The work unit generator was being worked on yesterday to add support for VM applications and it was crashing.

The good news is it was fixed a couple of hours ago - tasks coming down here
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102080 - Posted: 16 Jun 2021, 5:32:26 UTC

It lives!
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102102 - Posted: 21 Jun 2021, 5:05:23 UTC

Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now.


Grant
Darwin NT
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 102104 - Posted: 21 Jun 2021, 9:21:44 UTC - in response to Message 102102.  

Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now.

Not sure either. The mix of tasks doesn't imply anything different recently.
I put in that request to reduce the disk-demand for pre_helical_bundles tasks, but it hasn't happened.
That might've improved things, but leaving it unchanged wouldn't cause the drop in In Progress tasks you're seeing.
No idea
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102160 - Posted: 3 Jul 2021, 7:54:14 UTC

pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_1mq3ho7q_1391020_4
https://boinc.bakerlab.org/rosetta/result.php?resultid=1400578710

I had to abort this task after running for nearly 2 days elapsed time.
It stalled out around 80%. No movement in the graphics, no increase/decrease in the CPU usage rate, no counting up on completion, nothing.

I shut my system down at night after running up a huge electric bill last year running 24/7, but my shutdown sequence is to suspend the tasks, shut down the client and then exit.
The "Leave non-GPU tasks in memory while suspended" option is checked.

I have no idea what could cause it to stall.
I had this problem with another pre helical task in the past.

Any ideas how to prevent this or what is causing it?
How long does this kind of task take to complete?
I run 16-17 hours a day, change tasks is set at 6 hours.
oh...and cpu run time was 7 hours in 1.5 days.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102161 - Posted: 3 Jul 2021, 9:31:28 UTC - in response to Message 102160.  
Last modified: 3 Jul 2021, 10:00:46 UTC

Any ideas how to prevent this or what is causing it?
I suspect that things aren't actually stalling as such; it's just that your system is so ridiculously overcommitted trying to do work that some Tasks take forever to complete, so it looks like they have stalled. Even those that do complete take way more time than they should.
eg This Task of yours.
rb_06_04_79633_77510_ab_t000__h002_robetta_IGNORE_THE_REST_04_10_1395231_248_0
Run time 22 hours 21 min 14 sec
CPU time  7 hours 59 min 47 sec

And one of your Moo wrapper CPU Tasks is even worse
dnetc_r72_1624557342_9_9_0

Run time 7 hours 2 min  3 sec
CPU time        23 min 18 sec

22 hours to do 8 hours of work is bad enough, but 7 hours to do 23 minutes of work is beyond ridiculous.

Compared to one of my Rosetta Tasks.
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_3rl4pd6v_1391037_4_1
Run time 7 hours 59 min 37 sec
CPU time 7 hours 57 min  5 sec
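
To make the comparison above concrete, here is a small Python sketch that turns each Run time / CPU time pair into a CPU efficiency figure (CPU time divided by run time). The function names and the short task labels are made up for illustration; the times are the ones quoted above.

```python
def seconds(hours: int, minutes: int, secs: int) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


def cpu_efficiency(run_time_s: int, cpu_time_s: int) -> float:
    """Fraction of wall-clock run time the task actually spent on the CPU."""
    return cpu_time_s / run_time_s


# (run time, CPU time) for the three Tasks quoted above.
tasks = {
    "rb_06_04 (Rosetta, overcommitted host)": (seconds(22, 21, 14), seconds(7, 59, 47)),
    "dnetc_r72 (Moo wrapper, same host)": (seconds(7, 2, 3), seconds(0, 23, 18)),
    "pre_helical_bundles (Rosetta, comparison host)": (seconds(7, 59, 37), seconds(7, 57, 5)),
}

for name, (run_s, cpu_s) in tasks.items():
    print(f"{name}: {cpu_efficiency(run_s, cpu_s):.0%}")
```

On these numbers the overcommitted host comes out at roughly 36% and 6%, against roughly 99% for the comparison Task. An efficiency well below 100% on a CPU-only Task is the sign that something else is competing for the cores.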

Check Task Manager or use Process Explorer to see what else is running on the system apart from BOINC projects. If there's nothing else and it's just BOINC projects, then you need to reserve a CPU core to support each GPU Task that is running.
Running multiple GPU tasks at the same time as trying to process CPU work on CPU cores that are trying to support the GPU work is the most likely cause of all your issues.

Reserve a CPU core for each GPU Task being run, for each project doing GPU work, and make sure that there are no other processes sucking up CPU time. Your system output for all BOINC projects will then improve hugely, since CPU Tasks will no longer take 7 hours to do 23 minutes of work!


Ah! I thought this sounded familiar, and now I remember we have had this conversation before; back then you mentioned you run Folding at home as well.
I told you then what needed to be done, it appears you haven't done it, hence you are still having the same issues.
Grant
Darwin NT
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102163 - Posted: 3 Jul 2021, 12:42:03 UTC - in response to Message 102161.  

ah ha!
FAH is the culprit.
OK....taking that off CPU then.
I have an interest in so many things, I guess I will have to eliminate a few.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102165 - Posted: 3 Jul 2021, 15:47:34 UTC - in response to Message 102163.  

I don't know how FAH got ahold of my CPU, maybe after an update or something.
Anyway it's eliminated from CPU now.
Thanks for the reminder.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102166 - Posted: 3 Jul 2021, 21:16:09 UTC - in response to Message 102165.  
Last modified: 3 Jul 2021, 21:19:18 UTC

I don't know how FAH got ahold of my CPU, maybe after an update or something.
Anyway it's eliminated from CPU now.
Thanks for the reminder.
If you are still doing FAH on the GPU, you'll need to check how much CPU time is required to support the GPU Tasks and then reduce the number of CPU cores/threads BOINC can use to stop overcommitting the CPU. Bring up Task Manager/Process Explorer and see how much CPU time is being used by the Folding@home GPU application.

If it needs 1 CPU core/thread per Task, and you're running 4 GPU Tasks, then in your Account, Computing preferences, "Use at most xxx % of the CPUs" should be set to 75%. If you're running only 2 GPU Tasks, and they only need 0.5 CPU cores/threads each to support them, then set "Use at most xxx % of the CPUs" to 93%. With 2 GPU Tasks and 1 CPU core/thread each to support them, set it to 87%, etc. (These figures are for a 16-thread CPU, which is what the 75% figure implies.)
Save those changes; they will take effect the next time BOINC contacts the Scheduler for the project you made the changes in (or when you select it and hit Update in the BOINC Manager).
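
The arithmetic behind that setting can be sketched in a few lines of Python. This is an illustration only: the function name `boinc_cpu_percent` is made up, and the 16-thread CPU is an assumption inferred from the 75% figure for 4 reserved threads. Substitute your own thread count.

```python
import math


def boinc_cpu_percent(total_threads: int, gpu_tasks: int, threads_per_gpu_task: float) -> int:
    """Percentage of CPU threads left for CPU Tasks after reserving
    threads to support the running GPU Tasks.

    The result is rounded down so the setting never hands out more
    than the intended share. How the BOINC client itself converts the
    percentage into a whole number of CPUs is not modelled here.
    """
    reserved = gpu_tasks * threads_per_gpu_task
    usable = max(total_threads - reserved, 0)
    return math.floor(100 * usable / total_threads)


# Assumed 16-thread CPU.
print(boinc_cpu_percent(16, 4, 1.0))   # 4 GPU Tasks, 1 thread each
print(boinc_cpu_percent(16, 2, 0.5))   # 2 GPU Tasks, 0.5 thread each
print(boinc_cpu_percent(16, 2, 1.0))   # 2 GPU Tasks, 1 thread each
```

For the three cases this gives 75, 93 and 87, i.e. a reduction of 25%, 7% and 13% from the full 100%.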


While Folding@home doing CPU work is probably having the greatest effect, reserving a CPU core/thread for the Moo wrapper (and any other GPU projects you do) will also be necessary to stop the CPU from being overcommitted, and will result in CPU time and Run time becoming almost identical.
Grant
Darwin NT
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 102167 - Posted: 3 Jul 2021, 23:15:12 UTC - in response to Message 102165.  

I don't know how FAH got ahold of my CPU, maybe after an update or something.
Anyway it's eliminated from CPU now.

I usually use just the GPU on Folding, and delete the CPU slot. But it takes more than one shot.
After the first reboot, it comes back. If I delete it two or three times, it usually gets the message and stays deleted.

But occasionally it comes back from the dead even after that; maybe due to an update. It does its own thing.
Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 102289 - Posted: 28 Jul 2021, 8:51:06 UTC - in response to Message 102167.  
Last modified: 28 Jul 2021, 9:08:55 UTC

I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computational error.

pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m_1389982_5_1
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_arm-android-linux-gnu -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3400220
Extracting in project directory: database_357d5d93529_n_methyl.zip
Using database: database_357d5d93529_n_methyl/minirosetta_database

ERROR: [ERROR] Unable to open constraints file: f9b5372889da3e02c19de86d067b31e6_0001.MSAcst
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 457
 called boinc_finish(0)

</stderr_txt>
]]>


pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f_1389895_5_1
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1087567
Using database: database_357d5d93529_n_methyl\minirosetta_database

ERROR: [ERROR] Unable to open constraints file: f5d52c1749a40719598f1d8b37e13c45_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
10:42:25 (29468): called boinc_finish(1)

</stderr_txt>
]]>
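
For anyone who wants to compare their own failed tasks against these two, here is a minimal sketch that pulls the missing constraints file and the code passed to boinc_finish out of a stderr dump. The function name summarize_stderr is just something made up for this illustration, and the two sample strings are trimmed copies of the excerpts above, nothing more.

```python
import re

# Patterns based only on the two stderr excerpts quoted above.
CST_ERROR = re.compile(r"Unable to open constraints file: (\S+)")
FINISH = re.compile(r"called boinc_finish\((\d+)\)")


def summarize_stderr(text):
    """Return (missing_constraints_file, boinc_finish_code); None where absent.

    Hypothetical helper for illustration, not part of BOINC or Rosetta.
    """
    cst = CST_ERROR.search(text)
    fin = FINISH.search(text)
    return (
        cst.group(1) if cst else None,
        int(fin.group(1)) if fin else None,
    )


# Trimmed copies of the Android and Windows excerpts above.
android = """ERROR: [ERROR] Unable to open constraints file: f9b5372889da3e02c19de86d067b31e6_0001.MSAcst
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 457
 called boinc_finish(0)
"""
windows = """ERROR: [ERROR] Unable to open constraints file: f5d52c1749a40719598f1d8b37e13c45_0001.MSAcst
BOINC:: Error reading and gzipping output datafile: default.out
10:42:25 (29468): called boinc_finish(1)
"""

for label, text in (("android", android), ("windows", windows)):
    print(label, summarize_stderr(text))
```

Both excerpts report the same missing .MSAcst file error, but the Android run calls boinc_finish(0) and the Windows run calls boinc_finish(1). That difference might be related to why some of these validate and others end as errors, but that is only a guess from two samples.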


Funny to note that my Snapdragon 888 is beating my Ryzen 5 3600 and gets much more consistent credit per task. My phone gets around 394 credits for every 8-hour task, while my Ryzen 5 3600 gets a measly 346 credits per 8-hour task. Either something is up with the credit calculation, or there is something about these pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks that makes them particularly good on ARM. I'm leaning towards the former explanation because: 1) the benchmarks for my phone are 5887.99 million ops/sec floating point and 29296.86 million ops/sec integer, whilst my 3600 gets 5198.63 million ops/sec floating point and 19515.05 million ops/sec integer; 2) my 3600 seems to consistently generate more "decoys" than my phone despite the credit deficit. Probably a BOINC issue that's been beaten to death already.
ID: 102289
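
To put rough numbers on that comparison, here is a small sketch using only the figures quoted in this post. The dictionary layout and the helper names are made up for the illustration, and the 8 hours is the nominal task length rather than a measured run time.

```python
# Figures as quoted in the post; "hours" is the nominal 8-hour target run time.
hosts = {
    "Snapdragon 888": {"credit": 394, "hours": 8, "fp_mops": 5887.99, "int_mops": 29296.86},
    "Ryzen 5 3600": {"credit": 346, "hours": 8, "fp_mops": 5198.63, "int_mops": 19515.05},
}
PHONE, DESKTOP = "Snapdragon 888", "Ryzen 5 3600"


def credit_per_hour(host):
    """Credit granted per hour of nominal run time."""
    return host["credit"] / host["hours"]


def phone_to_desktop(field):
    """Ratio of the phone's value to the desktop's value for one field."""
    return hosts[PHONE][field] / hosts[DESKTOP][field]


for name, host in hosts.items():
    print(f"{name}: {credit_per_hour(host):.2f} credits/hour")

for field in ("credit", "fp_mops", "int_mops"):
    print(f"{field} ratio (phone / desktop): {phone_to_desktop(field):.3f}")
```

On these numbers the credit ratio comes out at roughly 1.14, close to the floating point benchmark ratio of roughly 1.13 and well below the integer ratio of roughly 1.50. That would fit credit following the benchmark figures rather than decoy counts, though one typical task per device is far too small a sample to settle it.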
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 102291 - Posted: 28 Jul 2021, 9:06:25 UTC - in response to Message 102289.  
Last modified: 28 Jul 2021, 9:06:40 UTC

I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computation error.
Those errors have been occurring with some pre_helical_bundles_ Tasks ever since they were released months ago.
Grant
Darwin NT
ID: 102291
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 102292 - Posted: 28 Jul 2021, 9:09:41 UTC - in response to Message 102291.  

Understood, thanks!
ID: 102292

©2024 University of Washington
https://www.bakerlab.org