Computation Errors on 7.14.4

Questions and Answers : Macintosh : Computation Errors on 7.14.4

John Masone
Joined: 22 Mar 09
Posts: 1
Credit: 2,368,274
RAC: 4,952
Message 93186 - Posted: 3 Apr 2020, 8:49:01 UTC

I updated my client to 7.14.4 the other day, from 7.14.3. Since doing so, I've been getting tons of computation errors on Rosetta work units. Is this just a coincidence or is there some bug/incompatibility with this version?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 750
Credit: 5,131,455
RAC: 21,419
Message 93195 - Posted: 3 Apr 2020, 9:27:37 UTC - in response to Message 93186.  

> I updated my client to 7.14.4 the other day, from 7.14.3. Since doing so, I've been getting tons of computation errors on Rosetta work units. Is this just a coincidence or is there some bug/incompatibility with this version?

There are new applications out, just released, which don't appear to be working on Macs.
Grant
Darwin NT

Tom Coradeschi
Joined: 11 Mar 20
Posts: 10
Credit: 105,271
RAC: 0
Message 93213 - Posted: 3 Apr 2020, 11:47:29 UTC - in response to Message 93195.  

I am seeing them on an older iMac with both 7.14.3 & 7.14.4, so it seems to be the new Rosetta client which is the problem. It's running fine on a newer machine. Not sure if it's related to the hardware or the macOS version.

There's a thread at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=12554&postid=93114 as well.

Michael Cetrulo
Joined: 12 Mar 20
Posts: 1
Credit: 206,303
RAC: 0
Message 93227 - Posted: 3 Apr 2020, 14:13:39 UTC

I've been having these errors recently. They started before upgrading: I was at v7.14.2 and now I'm at v7.14.4, but it hasn't improved. All my tasks report a computation error.

andrzej
Joined: 13 Mar 20
Posts: 4
Credit: 21,560
RAC: 0
Message 93247 - Posted: 3 Apr 2020, 16:56:30 UTC
Last modified: 3 Apr 2020, 16:57:19 UTC

Same here on an older iMac with a Core 2 Duo CPU.
I've been getting a computation error on every task since 7.14.3, and still on 7.14.4 after updating today.


Application: Rosetta 4.12
Name: hugh2020_HHH_rd4_0628_E18W_fragments_abinitio_SAVE_ALL_OUT_905068_626
State: Computation error
Received: Friday, 03 April 2020 at 18:55:40
Report deadline: Monday, 06 April 2020 at 18:55:40
Estimated computation size: 80,000 GFLOPs
CPU time: ---
Elapsed time: 00:00:02
Executable: rosetta_4.12_x86_64-apple-darwin

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4016
Credit: 0
RAC: 0
Message 93517 - Posted: 5 Apr 2020, 17:07:37 UTC

The Project Admin requests that Mac users attach to Ralph (the beta project for Rosetta@home improvements) to test a program update that addresses the problems seen in v4.12.
Rosetta Moderator: Mod.Sense

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93572 - Posted: 5 Apr 2020, 23:16:44 UTC

Hi,
I'm running on a 2012 MacBook Pro with an Intel Core i5.
I get "Process got error 4" on every Rosetta@home WU but not on any Einstein@home WU.

I have two other computers (Linux) running with no problems.

I just tried updating the client to 7.16.6: no change whatsoever.
Same errors as on 7.14.4.

Any suggestions?

(I'll join ralph@home)

Best regards,

Axel

Grant (SSSF)
Joined: 28 Mar 20
Posts: 750
Credit: 5,131,455
RAC: 21,419
Message 93579 - Posted: 5 Apr 2020, 23:54:22 UTC - in response to Message 93572.  

> Just tried updating client to 7.16.6, no change whatsoever.
> Same errors on 7.14.4

Those are BOINC Manager versions, not the client/Application.
The Applications download automatically whenever a new one is released.

> Any suggestions?
>
> (I'll join ralph@home)

Yep, that would have been the suggestion: help try out the new candidate.
Grant
Darwin NT

Tom Coradeschi
Joined: 11 Mar 20
Posts: 10
Credit: 105,271
RAC: 0
Message 93599 - Posted: 6 Apr 2020, 4:42:15 UTC - in response to Message 93517.  

Done. Waiting for WUs.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 750
Credit: 5,131,455
RAC: 21,419
Message 93600 - Posted: 6 Apr 2020, 4:49:02 UTC - in response to Message 93599.  

> Done. Waiting for WUs.

Maybe too late.
Here at Rosetta the Application version for Macs is now 4.15. Looks like they've released an updated version, and I think someone has posted in another thread that it's looking good so far.
Grant
Darwin NT

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93605 - Posted: 6 Apr 2020, 6:43:01 UTC - in response to Message 93579.  

Thanks! I thought the client followed the Manager in the OS X version, as the "Connected to.." text changes when I change the Manager version.

Will see what happens; so far no tasks at ralph@home.

Best regards,

Axel

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93606 - Posted: 6 Apr 2020, 6:52:08 UTC

Well, this is quite mysterious...

I just downgraded the Manager to 7.14.2 and can now run WUs with application v4.15 with NO problems at all.

It seems that the new Mac Manager (7.14.4) and the beta 7.16.6 are not working perfectly with some Macs.

Best regards,

Axel

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4016
Credit: 0
RAC: 0
Message 93632 - Posted: 6 Apr 2020, 13:17:07 UTC - in response to Message 93606.  

Actually, doesn't this indicate that Rosetta v4.15 is working much better, and that the BOINC version doesn't seem to matter?
Rosetta Moderator: Mod.Sense

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93672 - Posted: 6 Apr 2020, 19:49:33 UTC - in response to Message 93632.  

Hi,
It could very well be the reason! I've done a bit more analyzing and I've just updated to 7.14.4.
So far no failures.
Last time it failed after less than 1 second of calculations.

Also, I had errors at about 30% in two of the last runs (with Manager 7.14.2); I count them as WU failures.

I guess it's now working as expected. I will report any unexplained errors :-D

Best regards,

Axel

Grant (SSSF)
Joined: 28 Mar 20
Posts: 750
Credit: 5,131,455
RAC: 21,419
Message 93695 - Posted: 6 Apr 2020, 23:05:39 UTC - in response to Message 93672.  
Last modified: 6 Apr 2020, 23:12:03 UTC

> Hi,
> I could very well be the reason! I've done a bit more analyzing and I've just updated to 7.14.4.
> So far no failures.
> Last time it failed after less than 1 second of calculations.
>
> Also I had errors at abt. 30% in two of the last runs (with manager 7.14.2), I count them as WU fails...

All the Manager does is allow work & new applications to download, return results, and juggle scheduling between projects.
It doesn't process any work. That's what the Applications do. You need to check which application is being run.
It was the newly released application that was causing problems, and the even newer application that fixed them, not the Manager.


Having said that, the Manager is responsible for any "finish file present too long" errors, as it is clobbering the Application before it's finished tidying things up.
And that error is sometimes an indication of a system that is struggling, which appears to be the case with yours (a 30 min difference between Run time and CPU time is an indication of that), or you have "Use at most xx% of CPU time" set to less than 100%.
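
For a rough feel of how much gap a CPU-time limit alone could explain, here is a minimal Python sketch. It assumes the throttle simply lets a task compute for the given percentage of wall-clock time and suspends it for the rest, with no other overhead; that is a simplification for illustration, not a description of how BOINC schedules. The function name and the sample figures are made up.

```python
def throttled_run_time(cpu_time_s: float, cpu_time_limit_pct: float) -> float:
    """Estimate elapsed (run) time for a task under a CPU-time throttle.

    Assumption: the task only computes for cpu_time_limit_pct percent of
    wall-clock time and is suspended otherwise, with no other overhead.
    """
    if not 0 < cpu_time_limit_pct <= 100:
        raise ValueError("cpu_time_limit_pct must be in (0, 100]")
    return cpu_time_s * 100.0 / cpu_time_limit_pct


# Hypothetical task that needs 4.5 hours of CPU time, throttled to 90%.
cpu_time_s = 4.5 * 3600
run_time_s = throttled_run_time(cpu_time_s, 90)
gap_min = (run_time_s - cpu_time_s) / 60
print(f"run time {run_time_s / 3600:.2f} h, gap {gap_min:.0f} min")
```

Under that assumption, a task needing 4.5 hours of CPU time at a 90% limit would take 5 hours of run time, so a gap of around 30 minutes does not by itself prove the machine is overloaded.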
Grant
Darwin NT

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93731 - Posted: 7 Apr 2020, 12:41:27 UTC - in response to Message 93695.

> "finish file present too long" errors as it is clobbering the Application before it's finished tidying things up.

It was not at the finish; it failed at approx. 30% completion, two work units only, and no errors since then. I don't think the system is resource-starved in any way: 16 GB RAM and no specific tasks running 99% of the time. It's a laptop parked at the charger. When I use it seriously it's almost always on battery, and then BOINC is disabled.

On the other hand, you are right: I have set a max of 90% CPU use, all cores.

Are you saying it's problematic running less than 100% on each core? As far as I see, it uses 100% but in "chunks" relative to %-setting.

I'm using similar settings on my workstation and server and have had no errors so far (Linux clients).

Best regards,

Axel

Grant (SSSF)
Joined: 28 Mar 20
Posts: 750
Credit: 5,131,455
RAC: 21,419
Message 93799 - Posted: 8 Apr 2020, 1:09:36 UTC - in response to Message 93731.  
Last modified: 8 Apr 2020, 1:12:50 UTC

> "finish file present too long" errors as it is clobbering the Application before it's finished tidying things up.
> It was not finish, it failed at approx. 30% finish, two workunits only and no errors since then.

I was quoting the error reported in the result for that particular Task; the rest were due to the application problem.


> Are you saying it's problematic running less than 100% on each core? As far as I see, it uses 100% but in "chunks" relative to %-setting.

It's not so good for the CPU. Heating, cooling, heating, cooling, heating, cooling etc. Expanding, contracting, expanding, contracting, expanding, contracting etc.
When possible it's best to reduce thermal stress.

Generally, to keep heat down if your system's cooling isn't up to it, it's better to use fewer threads but keep the remaining ones operating at 100%.
So
Use at most 95% of the CPUs
Use at most 100% of CPU time
would be a better option.
And with multi core/thread CPUs with 4 or more threads you will often produce more work, even though you have lost the output of 1 thread, as your Run time & CPU time would be pretty much equal. (Run time is the time it takes to complete the Task; CPU time is the time the CPU actually spends processing it. Ideally they will be equal. On a lightly used system there will be a few minutes' difference. On a heavily used system, or one with a low "Use at most xx% of CPU time" setting, the difference could be double, or more.)
The smaller the difference, the better. The larger the difference, the greater the improvement in output from limiting thread usage & allowing 100% use of the remaining threads.
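
To put a number on the Run time v CPU time comparison, a minimal Python sketch follows. The function names and the sample times are hypothetical and not part of BOINC; the real figures would come from the task's properties, and the hh:mm:ss format is assumed from the "Elapsed time" field shown earlier in the thread.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string (as shown in task properties) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(run_time_s: float, cpu_time_s: float) -> float:
    """Return CPU time as a fraction of run (elapsed) time; 1.0 is ideal."""
    if run_time_s <= 0:
        raise ValueError("run_time_s must be positive")
    return cpu_time_s / run_time_s


# Hypothetical task: 8 h 30 min of run time, 8 h of CPU time.
run_s = hms_to_seconds("08:30:00")
cpu_s = hms_to_seconds("08:00:00")
print(f"gap: {(run_s - cpu_s) // 60} min, "
      f"efficiency: {cpu_efficiency(run_s, cpu_s):.1%}")
```

A ratio close to 1.0 means the task had the core almost to itself; a noticeably lower ratio is the situation where dropping a thread and running the rest at 100% may pay off.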


> I'm doing alike settings at my workstation and server, have had no errors so far (Linux clients).

I'm using all cores and threads 100% of the time on my desktop systems, with no problems since I increased the RAM & HDD space available to Rosetta.


You can also make use of the multiple Location settings in your Account.
For example, use School for the laptop's settings, then use Work for desktop systems with better cooling.
Grant
Darwin NT

Axel Nielsen
Joined: 19 Mar 20
Posts: 6
Credit: 179,774
RAC: 0
Message 93823 - Posted: 8 Apr 2020, 6:44:32 UTC - in response to Message 93799.  

Hi Grant,

Thank you. I get the point about thermal stress.
I've increased to 100/100% as cooling is not a problem. Seems OK, even on the server: other tasks run as usual without lag, with plenty of RAM available.

I tend to use local configurations, as I sometimes manually release e.g. 80% of the cores for some dedicated task instead of having those tasks stop BOINC completely. As it's only for shorter durations, I haven't had scheduler problems (yet).

Best regards,

Axel



©2020 University of Washington
https://www.bakerlab.org