Problems and Technical Issues with Rosetta@home

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 101993 - Posted: 1 Jun 2021, 20:58:02 UTC - in response to Message 101990.  

I'm also having problems with Rosetta.

My computer just marked 45 tasks as Errors While Computing, all within a few seconds of starting.

Here is an example, one of the 45:
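Task 1389100052 · Work unit 1241048633 · Computer 3551508
Sent: 1 Jun 2021, 16:01:45 UTC
Reported: 1 Jun 2021, 16:02:57 UTC
Status: Error while computing
Run time: 3.54 sec · CPU time: 0.00 sec · Credit: ---
Application: Rosetta v4.20 windows_x86_64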

Here's the computer configuration:
7.16.11 AuthenticAMD
AMD A6-6400K APU with Radeon(tm) HD Graphics [Family 21 Model 19 Stepping 1]
(2 processors) AMD AMD Radeon HD 7400/7500/8300/8400 series (Scrapper) (768MB) driver: 1.4.1848 OpenCL: 1.2 Microsoft Windows 7
Home Premium x64 Edition, (06.01.7600.00)

Any ideas?

S.Gaber


Yes, one: you only have 8GB of memory in that machine, so you should only try running ONE Rosetta task at a time and NOTHING else at all on the CPU.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1481
Credit: 14,603,537
RAC: 15,265
Message 101996 - Posted: 2 Jun 2021, 6:09:07 UTC - in response to Message 101993.  

I'm also having problems with Rosetta.

My computer just marked 45 tasks as Errors While Computing, all within a few seconds of starting.

Here is an example, one of the 45:
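Task 1389100052 · Work unit 1241048633 · Computer 3551508
Sent: 1 Jun 2021, 16:01:45 UTC
Reported: 1 Jun 2021, 16:02:57 UTC
Status: Error while computing
Run time: 3.54 sec · CPU time: 0.00 sec · Credit: ---
Application: Rosetta v4.20 windows_x86_64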

Here's the computer configuration:
7.16.11 AuthenticAMD
AMD A6-6400K APU with Radeon(tm) HD Graphics [Family 21 Model 19 Stepping 1]
(2 processors) AMD AMD Radeon HD 7400/7500/8300/8400 series (Scrapper) (768MB) driver: 1.4.1848 OpenCL: 1.2 Microsoft Windows 7
Home Premium x64 Edition, (06.01.7600.00)

Any ideas?

S.Gaber


Yes, one: you only have 8GB of memory in that machine, so you should only try running ONE Rosetta task at a time and NOTHING else at all on the CPU.
Not true. As long as you have 1.3GB of RAM per core/thread you won't run into lack-of-memory problems. 8GB of RAM on a 2-core system (even with onboard graphics) is plenty, even for the Tasks with the largest RAM requirements.
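
As a rough illustration of the 1.3GB-per-thread rule of thumb, here is a minimal Python sketch. The function name and the optional reserved_gb parameter (memory set aside for the OS or onboard graphics) are made up for this example; only the 1.3GB figure comes from the post.

```python
def max_concurrent_tasks(ram_gb, threads, gb_per_task=1.3, reserved_gb=0.0):
    """Estimate how many tasks fit, limited by threads and by memory.

    gb_per_task is the rule-of-thumb figure quoted in the thread;
    reserved_gb is a hypothetical allowance for everything else.
    """
    by_memory = int((ram_gb - reserved_gb) // gb_per_task)
    return max(0, min(threads, by_memory))


# The machine discussed above: 8GB of RAM, 2 threads.
print(max_concurrent_tasks(8, 2))
```

On these assumptions the 2-thread, 8GB machine is limited by its thread count, not by memory, which matches the point being made.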

That system is showing signs of a major hardware or driver issue, with hardware being the most likely (unfortunately the error messages aren't of much help).
I'd check the temperature of the CPU, make sure the power supply rails are all OK, etc. Give Memtest86 a run if the PSU & CPU temperatures are OK.
If they check out OK, it could be the result of corruption of some of the Rosetta files. Resetting the project will dump all existing work, delete all the executable & database files, and re-download them from scratch. Even if that does fix it, the question is still "Why did they become corrupt?"
Grant
Darwin NT
Grant (SSSF)
Message 102020 - Posted: 5 Jun 2021, 21:10:03 UTC
Last modified: 5 Jun 2021, 21:16:20 UTC

Not sure what's been going on, but apart from a one-day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling.
On my systems the amount of Credit per task has gone from around 350 to barely above 300 (there's the odd one giving 400, but more odd ones giving only 170 or so), yet the number of Invalids and Errors has dropped away considerably.
Grant
Darwin NT
mikey
Message 102021 - Posted: 6 Jun 2021, 1:16:43 UTC - in response to Message 102020.  

Not sure what's been going on, but apart from a one-day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling.
On my systems the amount of Credit per task has gone from around 350 to barely above 300 (there's the odd one giving 400, but more odd ones giving only 170 or so), yet the number of Invalids and Errors has dropped away considerably.


Could be the new tasks are shorter and therefore give fewer credits
Grant (SSSF)
Message 102022 - Posted: 6 Jun 2021, 2:03:07 UTC - in response to Message 102021.  
Last modified: 6 Jun 2021, 2:07:30 UTC

Not sure what's been going on, but apart from a one-day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling.
On my systems the amount of Credit per task has gone from around 350 to barely above 300 (there's the odd one giving 400, but more odd ones giving only 170 or so), yet the number of Invalids and Errors has dropped away considerably.


Could be the new tasks are shorter and therefore give fewer credits
Nope.
The Credit is based on the amount of work they do, during the time they run. The work being done & Runtime is unchanged, and many of the Tasks over the last couple of weeks are of types that have been released before, and were paying more Credit.

A good example is the pre_helical_bundles_ Tasks we've been getting for months now. They're the ones that were paying around 350; now it's barely 300. The odd one will give 400, but there are many more odd ones paying only 170 or so.
Just another example of the weird Credit mechanism behaviour of BOINC.
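
For a sense of how a per-Task drop like that feeds through to RAC, here is a simplified Python sketch. It assumes RAC behaves as an exponentially weighted average with a one-week half-life, updated once a day with a constant daily credit; the starting RAC of 15,000 and the once-a-day update are illustrative assumptions, not figures from any real host log.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumed RAC half-life of one week


def step_rac(rac, credit_granted, days=1.0):
    """Decay the old average, then blend in the newly granted credit."""
    weight = math.exp(-days * math.log(2) / HALF_LIFE_DAYS)
    return rac * weight + (1.0 - weight) * (credit_granted / days)


def simulate(start_rac, daily_credit, n_days):
    """Apply n_days of identical daily credit to a starting RAC."""
    rac = start_rac
    for _ in range(n_days):
        rac = step_rac(rac, daily_credit)
    return rac


# Hypothetical host: RAC of 15,000 while Tasks paid ~350, then the same
# number of Tasks per day paying ~300 instead.
new_daily = 15000 * 300 / 350
print(round(simulate(15000, new_daily, 14)))
```

Under these assumptions the RAC keeps sliding for weeks after the per-Task credit changes, closing half of the remaining gap every seven days, so a steady two-week decline is what a single step change in credit per Task would look like.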
Grant
Darwin NT
mikey
Message 102025 - Posted: 6 Jun 2021, 20:33:27 UTC - in response to Message 102022.  

Not sure what's been going on, but apart from a one-day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling.
On my systems the amount of Credit per task has gone from around 350 to barely above 300 (there's the odd one giving 400, but more odd ones giving only 170 or so), yet the number of Invalids and Errors has dropped away considerably.


Could be the new tasks are shorter and therefore give fewer credits
Nope.
The Credit is based on the amount of work they do, during the time they run. The work being done & Runtime is unchanged, and many of the Tasks over the last couple of weeks are of types that have been released before, and were paying more Credit.

A good example is the pre_helical_bundles_ Tasks we've been getting for months now. They're the ones that were paying around 350; now it's barely 300. The odd one will give 400, but there are many more odd ones paying only 170 or so.
Just another example of the weird Credit mechanism behaviour of BOINC.


Is this the so-called "credit new"?
Grant (SSSF)
Message 102026 - Posted: 7 Jun 2021, 5:46:49 UTC - in response to Message 102025.  

Is this the so-called "credit new"?
That would probably be part of it. Rosetta has its own Credit mechanism, but there are times when it uses the Credit New mechanism as well.
Grant
Darwin NT
mikey
Message 102028 - Posted: 8 Jun 2021, 3:08:40 UTC - in response to Message 102026.  

Is this the so-called "credit new"?
That would probably be part of it. Rosetta has its own Credit mechanism, but there are times when it uses the Credit New mechanism as well.


I seriously dislike so-called 'credit new', as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame.
Grant (SSSF)
Message 102029 - Posted: 8 Jun 2021, 7:13:47 UTC - in response to Message 102028.  

I seriously dislike so-called 'credit new', as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame.
?
If the Credit system worked as intended, regardless of the Task and regardless of the project, and regardless of how long it takes to process a Task, a given machine would get the same amount of Credit per hour for processing work.
That way the only reason for choosing a particular project over another would be the project itself. A more powerful system will get more Credit because it's doing more work. If a Task of a given type takes 5min or takes 5 weeks, the amount of Credit awarded per hour should be the same.
Of course, if there is an application that is more efficient, then that will result in more Credit due to more work being done within a given time frame. But the amount of work actually being done per Task is the same, so the Credit for that Task should remain unchanged. You just get the benefit of doing more work per hour, so you get a higher RAC thanks to the more optimised application.

Unfortunately, by its very design Credit New varies the amount of Credit a Task will get for all sorts of reasons. And then you get projects such as Collatz that completely ignore the definition of a Cobblestone and award ridiculously, excessively inflated amounts for a Task.
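
To make the "same Credit per hour regardless of Task length" point concrete, here is a small Python sketch. It assumes the usual Cobblestone definition (200 Cobblestones for one day of work on a reference machine sustaining 1 GFLOPS) and a hypothetical host that sustains 4 GFLOPS on every Task; the function names are made up for the example.

```python
COBBLESTONES_PER_GFLOPS_DAY = 200.0  # assumed Cobblestone definition
SECONDS_PER_DAY = 86400.0


def cobblestones(gflops_sustained, seconds):
    """Credit for running at a sustained GFLOPS rate for some seconds."""
    days = seconds / SECONDS_PER_DAY
    return gflops_sustained * days * COBBLESTONES_PER_GFLOPS_DAY


def credit_per_hour(gflops_sustained, task_seconds):
    """Credit earned per hour while crunching a Task of a given length."""
    hours = task_seconds / 3600.0
    return cobblestones(gflops_sustained, task_seconds) / hours


five_minutes = 5 * 60
five_weeks = 5 * 7 * 24 * 3600
print(credit_per_hour(4.0, five_minutes))
print(credit_per_hour(4.0, five_weeks))
```

Both calls give the same hourly figure, because under this definition the Task length cancels out and only the sustained rate of the machine matters.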



As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.
Grant
Darwin NT
mikey
Message 102034 - Posted: 8 Jun 2021, 22:38:03 UTC - in response to Message 102029.  

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!
Grant (SSSF)
Message 102040 - Posted: 9 Jun 2021, 8:23:31 UTC - in response to Message 102034.  
Last modified: 9 Jun 2021, 8:27:31 UTC

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!
I spoke too soon.
It's gone back to falling again.
Grant
Darwin NT
mikey
Message 102047 - Posted: 9 Jun 2021, 23:16:43 UTC - in response to Message 102040.  

As it is my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling.


And THAT is a very good thing, next step climbing again!!!


I spoke too soon.
It's gone back to falling again.


UGH!!!
Grant (SSSF)
Message 102050 - Posted: 10 Jun 2021, 8:44:17 UTC

And to add to the continually falling RAC, a new batch of work has a roughly 50% failure rate.
The 11mers_FF2__cyclo_11mer_ Tasks are the culprits.

They run for 30 seconds or less and then die:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @11mers_FF2__cyclo_11mer_LVStub2818_000054_extract_B.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2375971
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 06/10/21 12:57:52
Install Directory : C:\Program Files\BOINC
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\4;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore


ModLoad: 000000006ded0000 00000000057ef000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded)
    Linked PDB Filename   : C:\cygwin64\home\boinc\4.17\Rosetta\main\source\ide\VisualStudio\x64\BoincRelease\rosetta_4.20_windows_x86_64.pdb

ModLoad: 00000000d9bb0000 00000000001f5000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d8010000 00000000000bd000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d76f0000 00000000002c8000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 10.0.19041.804 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.804

ModLoad: 00000000d81c0000 000000000006b000 C:\WINDOWS\System32\WS2_32.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : ws2_32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d83e0000 000000000012b000 C:\WINDOWS\System32\RPCRT4.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d94b0000 00000000001a0000 C:\WINDOWS\System32\USER32.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 10.0.19038.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19038.1

ModLoad: 00000000d7500000 0000000000022000 C:\WINDOWS\System32\win32u.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : win32u.pdb
    File Version          : 10.0.19041.906 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.906

ModLoad: 00000000d8640000 000000000002a000 C:\WINDOWS\System32\GDI32.dll (6.2.19041.746) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 10.0.19041.746 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.746

ModLoad: 00000000d72f0000 000000000010b000 C:\WINDOWS\System32\gdi32full.dll (6.2.19041.928) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32full.pdb
    File Version          : 10.0.19041.928 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.928

ModLoad: 00000000d7b70000 000000000009d000 C:\WINDOWS\System32\msvcp_win.dll (6.2.19041.789) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.19041.789 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.789

ModLoad: 00000000d7400000 0000000000100000 C:\WINDOWS\System32\ucrtbase.dll (6.2.19041.789) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.19041.789 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.789

ModLoad: 00000000d8bd0000 00000000000ac000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.19041.610) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d8c80000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.19041.546

ModLoad: 00000000d8290000 000000000009b000 C:\WINDOWS\System32\sechost.dll (6.2.19041.906) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d8670000 0000000000030000 C:\WINDOWS\System32\IMM32.DLL (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d5250000 0000000000012000 C:\WINDOWS\SYSTEM32\kernel.appcore.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d6030000 0000000000033000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntmarta.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d1530000 00000000001e4000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.19041.867) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.19041.867 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.867

ModLoad: 00000000d2270000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.19041.546) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.19041.546 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.546

ModLoad: 00000000d7590000 0000000000080000 C:\WINDOWS\System32\bcryptPrimitives.dll (6.2.19041.662) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcryptprimitives.pdb
    File Version          : 10.0.19041.662 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.662



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 5698, Write: 646, Other 13797

- I/O Transfers Counters -
Read: 17336636, Write: 14175, Other 6664

- Paged Pool Usage -
QuotaPagedPoolUsage: 317096, QuotaPeakPagedPoolUsage: 317376
QuotaNonPagedPoolUsage: 7200, QuotaPeakNonPagedPoolUsage: 7352

- Virtual Memory Usage -
VirtualSize: 83505152, PeakVirtualSize: 895533056

- Pagefile Usage -
PagefileUsage: 83505152, PeakPagefileUsage: 83505152

- Working Set Size -
WorkingSetSize: 119312384, PeakWorkingSetSize: 119316480, PageFaultCount: 29541

*** Dump of thread ID 7328 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 

- Registers -
rax=000000000000003a rbx=0000000002c36ae0 rcx=0000000003642ac0 rdx=0000000003722bf8 rsi=000000000000000b rdi=0000000003642ac0
r8=000000000000003a r9=0000000000000421 r10=0000000071a76e80 r11=0000000040745240 r12=000000006ded0000 r13=000000004075f960
r14=0000000040745980 r15=000000000048b215 rip=0000000071b2d578 rsp=00000000407452b8 rbp=0000000000000000
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

- Callstack -
ChildEBP RetAddr  Args to Child
407452b0 6e3a831c 00000000 71a76d60 71a76e80 40745298 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 
407452e0 6e36935d 02c36ae0 40745380 6e35b215 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
40745310 714d7f10 723c0150 4075f960 00000000 00000001 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
40745340 6e3539e8 7346a32c 6ded0000 40745430 d9be0e7b rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 
407453b0 d9c5207f 00000000 40745930 40745ff0 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
407453e0 d9c01454 00000000 40745930 40745ff0 00000000 ntdll!__chkstk+0x0 
40745af0 d9c50bae 02c20000 40745bc9 71b4a450 d9bdb3c7 ntdll!RtlRaiseException+0x0 
40746280 6e613e2b fffffffe 06eb4dc8 ffffffff 6e6218c5 ntdll!KiUserExceptionDispatcher+0x0 
407462d0 6e623690 71b4a3a0 06eb4b20 71b4a3a0 407463c9 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40746400 6e739ee8 06923ea8 07290720 06eb4b20 07290720 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40746fb0 6e6d4b6c 07445680 d9bdb3c7 08b80000 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407471b0 6e6d488e 40747298 00000000 40747480 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747310 6e633da1 40747488 00000000 02c34920 40747550 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407476d0 6e639f08 40747a20 40747a20 40747a20 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747d20 6e6384db 03310330 40747d80 031d9b90 031d9b90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747e80 6e5a1fb7 00000000 40747f90 031d9b90 40748190 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40747ff0 6e5a57a6 00000005 6e345190 031f9b40 031f9b40 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748060 6e5a56cc 40748368 407481d9 40748368 031d9b90 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748110 6e66b6f5 40748368 40748641 00000000 6e3675e8 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 
40748230 6e66a592 00000005 40748368 40748540 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748300 6e66ad06 00000000 00000000 40748c20 08b80000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
407484a0 6eac71a3 40748540 40748c20 ffffff01 6e353e73 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748790 6eac9d09 00000000 00000001 407488a0 40748c20 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748b20 6eac2f8a 40748b60 40748c20 06730c40 0346fb90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40748b80 6ecdcc70 40748c20 40749348 031f9b40 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749310 6ecdc6e4 0716e7d0 07289100 73345cc0 6e3475a6 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749370 6ece603e 40749460 0716e500 40749480 40749bd0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749af0 6ece56d4 2305e878 2305eb68 732b7f70 6ed06cb4 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749b80 6ece578e 00000005 4074a128 0346fb90 00000001 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
40749d20 6e35081d 034ff820 034ff820 0346fb90 02c37e01 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 
4075f950 6e35b215 00000000 00000000 7327ccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
4075f990 d8027034 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 
4075f9c0 d9c02651 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
4075fa40 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 

*** Dump of thread ID 32761 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, User Time: 0.000000, Wait Time: 2428590080.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 

*** Dump of thread ID 30891432 (state: Unknown): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474836480.000000, Wait Time: 0.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

- Callstack -
ChildEBP RetAddr  Args to Child
(-nosymbols- PC == 0)
00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
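
The decimal exit code and the hex code in the log above are the same value. A minimal Python sketch of the conversion follows; the lookup table is a small hand-picked subset of Windows NTSTATUS values and the function name is made up for this example.

```python
# A few common Windows NTSTATUS crash codes (not an exhaustive list).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(exit_code):
    """Show an exit code as 32-bit hex, with a name when it is known."""
    code = exit_code & 0xFFFFFFFF  # also handles signed (negative) values
    name = NTSTATUS_NAMES.get(code, "unknown")
    return f"0x{code:08x} ({name})"


print(describe_exit_code(3221225477))
```

This prints 0xc0000005 (STATUS_ACCESS_VIOLATION), matching the "Access Violation (0xc0000005)" reason in the exception record.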

Grant
Darwin NT
Grant (SSSF)
Message 102075 - Posted: 15 Jun 2021, 6:16:13 UTC

Looks like we've got a server issue.

Server Status page shows all green and running, but there is no work available to go out. Plenty of queued up jobs, but no Tasks ready to send. All requests for work result in none.
As a result, Work in progress is taking a dive.
Around 21:30 Monday server time was when things fell over.


Grant
Darwin NT
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,448,817
RAC: 14,577
Message 102078 - Posted: 15 Jun 2021, 17:11:11 UTC - in response to Message 102075.  

Looks like we've got a server issue.

Server Status page shows all green and running, but there is no work available to go out. Plenty of queued up jobs, but no Tasks ready to send. All requests for work result in none.
As a result, Work in progress is taking a dive.
Around 21:30 Monday server time was when things fell over.


Correct... :(

The work unit generator was being worked on yesterday to add support for VM applications and it was crashing.

The good news is it was fixed a couple of hours ago - tasks coming down here
Grant (SSSF)
Message 102080 - Posted: 16 Jun 2021, 5:32:26 UTC

It lives!
Grant
Darwin NT
Grant (SSSF)
Message 102102 - Posted: 21 Jun 2021, 5:05:23 UTC

Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now.


Grant
Darwin NT
Sid Celery
Message 102104 - Posted: 21 Jun 2021, 9:21:44 UTC - in response to Message 102102.  

Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now.

Not sure either. The mix of tasks doesn't imply anything different recently.
I put in that request to reduce the disk-demand for pre_helical_bundles tasks, but it hasn't happened.
That might've improved things, but the fact that it hasn't been changed wouldn't cause the drop in In Progress tasks you're seeing.
No idea
Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 102160 - Posted: 3 Jul 2021, 7:54:14 UTC

pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_1mq3ho7q_1391020_4
https://boinc.bakerlab.org/rosetta/result.php?resultid=1400578710

I had to abort this task after running for nearly 2 days elapsed time.
It stalled out at around 80%. No movement in the graphics, no increase/decrease in the CPU usage rate, no counting up on completion, nothing.

I shut my system down at night after running up a huge electric bill last year running 24/7, but my shutdown sequence is to suspend the tasks, shut down the client and then exit.
The "Leave non-GPU tasks in memory while suspended" option is checked.

I have no idea what could cause it to stall.
I had this problem with another pre helical task in the past.

Any ideas how to prevent this or what is causing it?
How long does this kind of task take to complete?
I run 16-17 hours a day, and the switch-between-tasks interval is set at 6 hours.
Oh, and CPU run time was 7 hours in 1.5 days.
Grant (SSSF)
Message 102161 - Posted: 3 Jul 2021, 9:31:28 UTC - in response to Message 102160.  
Last modified: 3 Jul 2021, 10:00:46 UTC

Any ideas how to prevent this or what is causing it?
I suspect that things aren't actually stalling as such; it's just that your system is so ridiculously overcommitted trying to do work that some Tasks take forever to complete, so it looks like they have stalled. Even those that do complete take way more time than they should.
E.g. this Task of yours:
rb_06_04_79633_77510_ab_t000__h002_robetta_IGNORE_THE_REST_04_10_1395231_248_0
Run time 22 hours 21 min 14 sec
CPU time  7 hours 59 min 47 sec

And one of your Moo wrapper CPU Tasks is even worse
dnetc_r72_1624557342_9_9_0

Run time 7 hours 2 min  3 sec
CPU time        23 min 18 sec

22 hours to do 8 hours of work is bad enough, but 7 hours to do 23 minutes of work is beyond ridiculous.

Compared to one of my Rosetta Tasks.
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_3rl4pd6v_1391037_4_1
Run time 7 hours 59 min 37 sec
CPU time 7 hours 57 min  5 sec
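
The comparison above boils down to CPU time divided by Run time. A minimal Python sketch using the three sets of figures quoted in this post follows; the helper names and the shortened task labels are made up for the example.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds reading to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(run_time_s, cpu_time_s):
    """Fraction of wall-clock Run time the task actually spent on a CPU."""
    return cpu_time_s / run_time_s


# (Run time, CPU time) pairs copied from the figures quoted above.
tasks = {
    "rb_06_04 Rosetta Task": (to_seconds(22, 21, 14), to_seconds(7, 59, 47)),
    "dnetc_r72 Moo wrapper Task": (to_seconds(7, 2, 3), to_seconds(0, 23, 18)),
    "pre_helical_bundles Task (Grant)": (to_seconds(7, 59, 37), to_seconds(7, 57, 5)),
}

for name, (run_s, cpu_s) in tasks.items():
    print(f"{name}: {cpu_efficiency(run_s, cpu_s):.1%}")
```

The two Tasks from the overcommitted system come out at roughly 36% and under 6%, while the comparison Task is above 99%. A healthy CPU Task should sit close to 100%.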

Check Task Manager or use Process Explorer to see what else is running on the system apart from BOINC projects. If there's nothing else and it's just BOINC projects, then you need to reserve a CPU core to support each GPU Task that is running.
Running multiple GPU tasks at the same time as trying to process CPU work on CPU cores that are trying to support the GPU work is the most likely cause of all your issues.

Reserve a CPU core for each GPU Task being run for each project doing GPU work, make sure that there are no other processes sucking up CPU time and your system output for all BOINC projects will improve hugely with all the CPU Tasks that no longer take 7 hours to do 23 minutes of work!
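
One way to work out how much CPU to leave for BOINC is sketched below in Python. It assumes the cores are reserved through the "Use at most N % of the CPUs" computing preference; the thread counts, GPU Task counts and function name are hypothetical, since the actual hardware isn't given in this thread.

```python
def boinc_cpu_percent(logical_cpus, gpu_tasks, other_reserved=0):
    """Whole-number CPU percentage that leaves one thread per GPU task.

    other_reserved is a hypothetical allowance for non-BOINC work,
    such as another distributed computing client.
    """
    usable = max(1, logical_cpus - gpu_tasks - other_reserved)
    # Round up with integer arithmetic so that any truncation the
    # client applies still leaves 'usable' threads available.
    return -(-100 * usable // logical_cpus)


# Hypothetical 8-thread system running 2 GPU Tasks at once.
print(boinc_cpu_percent(8, 2))
```

For the hypothetical 8-thread, 2-GPU-Task system this gives 75, i.e. six threads for CPU work and two left free to feed the GPU Tasks.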


Ah! I thought this sounded familiar & now I remember we have had this conversation before, and back then you mentioned you run Folding@home as well.
I told you then what needed to be done, it appears you haven't done it, hence you are still having the same issues.
Grant
Darwin NT



©2024 University of Washington
https://www.bakerlab.org