Computation errors

David703

Joined: 17 Jul 17
Posts: 5
Credit: 38,314
RAC: 0
Message 90595 - Posted: 30 Mar 2019, 18:06:19 UTC

Hi, since I came back to this project I've been seeing some strange errors in some of my WUs, especially the ones that study big proteins. Here are a few examples:
- https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314770
- https://boinc.bakerlab.org/rosetta/result.php?resultid=1065314768
- https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662

How can I keep these errors from happening?
ID: 90595 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 250
Credit: 8,037,564
RAC: 6
Message 90599 - Posted: 31 Mar 2019, 16:29:03 UTC - in response to Message 90595.  

Rosetta developers were quite sloppy in their allocation and use of memory.

Task 1065460662 ran out of memory.
https://boinc.bakerlab.org/rosetta/result.php?resultid=1065460662

The other two error out with "Funzione non corretta" (Italian for "Incorrect function").

When one WU runs out of memory, other WUs may get strange error messages from function calls, because the developers don't always check the return values of all system calls.

The WUs you are running are 64-bit and sometimes take large amounts of memory, frequently over 1 GB each.

8 GB should be enough to run 4 Rosetta 64-bit WUs, so I would examine how memory is being used and change the workload:
- Buy more memory if practical.
- Lower the number of Rosetta WUs running simultaneously with app_config.xml or BOINC -> OPTIONS -> COMPUTING PREFERENCES -> USAGE LIMITS.
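
For anyone who wants to work out a limit before editing app_config.xml, here is a rough Python sketch of the arithmetic. It is only an illustration: the function names are made up for this post, and the 2 GB reserved for the OS and 1.5 GB per task are assumed figures, not measurements, so check your own task manager for real numbers. It prints a minimal app_config.xml using the `<project_max_concurrent>` element, which as far as I know caps the number of tasks from one project that run at once. The file goes in the project's folder inside the BOINC data directory.

```python
def max_concurrent_tasks(total_ram_gb, reserved_gb, per_task_gb):
    """Estimate how many tasks fit in RAM after setting some aside for the OS.

    All three inputs are assumptions supplied by the user. The result is
    never below 1, so the generated limit always allows one task to run.
    """
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0 or per_task_gb <= 0:
        return 1
    return max(1, int(usable_gb // per_task_gb))


def app_config_xml(limit):
    """Build a minimal app_config.xml that caps concurrent tasks for the project."""
    return (
        "<app_config>\n"
        f"  <project_max_concurrent>{limit}</project_max_concurrent>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Assumed figures: 8 GB installed, 2 GB kept free for the OS and other
    # programs, 1.5 GB per Rosetta task.
    limit = max_concurrent_tasks(total_ram_gb=8, reserved_gb=2, per_task_gb=1.5)
    print(app_config_xml(limit))
```

With those assumed figures the limit comes out to 4, which matches the estimate above. If the big-protein tasks use more memory than that on your machine, raise per_task_gb and the limit drops accordingly.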
ID: 90599 · Rating: 0
David703

Joined: 17 Jul 17
Posts: 5
Credit: 38,314
RAC: 0
Message 90600 - Posted: 31 Mar 2019, 19:25:31 UTC - in response to Message 90599.  

Ok, thank you!
ID: 90600 · Rating: 0
shanen

Joined: 16 Apr 14
Posts: 187
Credit: 10,662,577
RAC: 7,100
Message 90934 - Posted: 24 Jul 2019, 9:15:55 UTC
Last modified: 24 Jul 2019, 9:16:45 UTC

Seems unlikely they've ever addressed this problem, eh? I see them pretty often. Especially annoying when they have run up 8 hours of effort before crashing, presumably with no points earned. And no, at this point I don't care enough to do the searching to try to figure out if the points were granted. I don't even care enough to read the rest of the thread beyond the Subject: and glancing at a couple of the posts.

Latest example:

Application: Rosetta Mini 3.78
Name: start_close_HHH_rd4_0056.min_rise1.83_whole_pass_aagb.bp_20190406150644_0001_0001_0001_0003_0001_0001_fragments_fold_SAVE_ALL_OUT_833066_1053
State: Computation error
Received: 22 Jul 2019, 08:13:16
Report deadline: 30 Jul 2019, 08:13:11
Estimated computation size: 80,000 GFLOPs
CPU time: 07:49:11
Elapsed time: 07:59:03
Executable: minirosetta_3.78_x86_64-pc-linux-gnu
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
ID: 90934 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 883
Credit: 3,249,973
RAC: 3,484
Message 90937 - Posted: 24 Jul 2019, 13:58:09 UTC - in response to Message 90934.  

Rosetta Mini 3.78 was released in October 2017.
Since then, there have been a lot of errors and problems.
No debugging, no new version. Nothing.
ID: 90937 · Rating: 0
mmonnin

Joined: 2 Jun 16
Posts: 40
Credit: 4,962,036
RAC: 18,038
Message 90943 - Posted: 26 Jul 2019, 0:25:06 UTC

I'd rather have the Rosetta Mini tasks than the Rosetta version that runs for 5 hours and then errors out when the set run time is 1 hour.
ID: 90943 · Rating: 0




©2019 University of Washington
http://www.bakerlab.org