Rosetta 4.0+

Message boards : Number crunching : Rosetta 4.0+



James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 92081 - Posted: 19 Mar 2020, 5:24:12 UTC

Application version: Rosetta v4.07 windows_x86_64
Device: 3710630, Task: 1129189352, and WU: 1017085044.
Name: 6mm7mv4g_3h3_design_COVID-19_SAVE_ALL_OUT_902608_1
Status: Error while computing
Exit status: 1 (0x00000001) Unknown error code
Incorrect function. (0x1) - exit code 1 (0x1)

Could this type of error be caused or contributed to by insufficient host memory (RAM)?
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 92152 - Posted: 23 Mar 2020, 8:52:43 UTC
Last modified: 23 Mar 2020, 8:56:40 UTC

Application version: Rosetta v4.07 windows_intelx86
Device: 1759960, Task: 1130640917, and WU: 1018413849.
Name: 9eq5wp3x_3h3_design3_COVID-19_SAVE_ALL_OUT_902888_1_0
Status: Completed and validated
Exit status: 0 (0x00000000)

Though the task was valid, it did end prematurely because of the following errors:
ERROR: Assertion `copy_pose.size() == native.size()` failed. MSG: The reference pose must be the same size as the working pose
ERROR:: Exit from: ..\..\..\src\protocols\protein_interface_design\filters\RmsdFilter.cc line: 323

Good to see I got credit for what was done, however. Better than throwing out all the crunching and starting from the beginning.
Maybe this particular task type/code will need to be reviewed if this type of error continues.
James W

Send message
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 92252 - Posted: 25 Mar 2020, 5:47:31 UTC - in response to Message 92152.  

Name: 9ly9pu7b_3h3_design3_COVID-19_SAVE_ALL_OUT_902893_1_0
Same error as above, except host 3710630.
Task: 1130645238 and WU: 1018417803.
rsNeutrino

Joined: 22 Mar 20
Posts: 10
Credit: 3,729,289
RAC: 6,293
Message 92254 - Posted: 25 Mar 2020, 6:07:05 UTC - in response to Message 92252.  
Last modified: 25 Mar 2020, 6:34:47 UTC

GLadi

Joined: 21 Jan 07
Posts: 3
Credit: 303,172
RAC: 0
Message 92286 - Posted: 25 Mar 2020, 17:46:47 UTC

Is there a safe way to pause WUs and resume them later? I'm asking because I'm getting the same error:
ERROR: Assertion `copy_pose.size() == native.size()` failed. MSG:the reference pose must be the same size as the working pose
ERROR:: Exit from: ..\..\..\src\protocols\protein_interface_design\filters\RmsdFilter.cc line: 323

This happens after restarting the system. Some WUs end with errors (no points granted), some WUs end at the percentage they had reached before the restart (points partially granted), and some WUs continue to process as they should.
torma99

Joined: 16 Feb 20
Posts: 14
Credit: 288,937
RAC: 0
Message 92288 - Posted: 25 Mar 2020, 18:16:45 UTC - in response to Message 88533.  

I just found Rosetta 4.07 used 2,111,242,240 bytes (1.97 GIGAbytes) before my system crashed (i7-4770K, 8GB). This seems to be just a bit more than expected, so please take a look and fix the problem.

I run SETI, EINSTEIN, and LHC in addition to Rosetta, so Rosetta can't have the whole machine!


On my rig (16 GB of RAM, running on 4 cores) it consumes almost the same, 1.9 to 2.2 GB, and does not cause problems. 8 GB can be somewhat small if you use your browser with some open tabs alongside Rosetta.
rsNeutrino

Joined: 22 Mar 20
Posts: 10
Credit: 3,729,289
RAC: 6,293
Message 92290 - Posted: 25 Mar 2020, 19:40:18 UTC - in response to Message 92288.  
Last modified: 25 Mar 2020, 19:45:49 UTC

In my case BOINC is configured so that it can use 80% of 32 GB of RAM at all times, running 14 Rosetta threads on a Ryzen 1700 with 8 cores and 16 CPU threads available. 15 GB of RAM was sitting empty when the errors occurred. I've changed to 8 Rosetta threads for now...
Peti

Joined: 17 Mar 20
Posts: 5
Credit: 142,053
RAC: 0
Message 92308 - Posted: 26 Mar 2020, 2:47:04 UTC
Last modified: 26 Mar 2020, 2:50:21 UTC

Hi everyone,
I am sometimes seeing very similar error messages.

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: src/core/pose/symmetry/util.cc line: 884

and

ERROR: Assertion `copy_pose.size() == native.size()` failed. MSG:the reference pose must be the same size as the working pose
ERROR:: Exit from: src/protocols/protein_interface_design/filters/RmsdFilter.cc line: 323

and https://boinc.bakerlab.org/rosetta/result.php?resultid=1131649536
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000080e29a8 ***

and https://boinc.bakerlab.org/rosetta/result.php?resultid=1131646232
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': double free or corruption (!prev): 0x00000000060b5a60 ***

(Maybe these last two were due to a bad CPU overclock? Or maybe not?)

I only found this thread after I had already posted my problems here:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13629&postid=92306#92306

The tasks show as "Completed and validated" on the webpage. Why is that, if there was an error?
Whom should I tell that my PC might have made mistakes that go unnoticed?
I don't want to mix bad results into good data...
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92316 - Posted: 26 Mar 2020, 4:20:12 UTC - in response to Message 92286.  

The BOINC Manager should be able to take care of it.

One approach that preserves everything is simply to put the machine to sleep. If you want to use the machine, the BOINC Manager does have an option to pause. I'd suggest you suspend the R@h project as a whole; otherwise it starts the next work unit in the queue when you suspend the one that is running.
Rosetta Moderator: Mod.Sense
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92317 - Posted: 26 Mar 2020, 4:21:55 UTC - in response to Message 92308.  

No need to worry about it. The Project Team can identify any bad results.
Rosetta Moderator: Mod.Sense
Peti

Joined: 17 Mar 20
Posts: 5
Credit: 142,053
RAC: 0
Message 92319 - Posted: 26 Mar 2020, 7:53:46 UTC - in response to Message 92317.  

Thank you, then I don't worry.
Kipni

Joined: 24 Mar 20
Posts: 5
Credit: 323,369
RAC: 0
Message 92320 - Posted: 26 Mar 2020, 8:49:49 UTC
Last modified: 26 Mar 2020, 8:50:50 UTC

Hello,

I started with R@H a few days ago, and I seem to be getting a lot of computation errors on 2 of my rigs.
Is this normal? See the screenshot below.
adrianxw

Joined: 18 Sep 05
Posts: 650
Credit: 11,631,438
RAC: 1,000
Message 92339 - Posted: 26 Mar 2020, 14:18:34 UTC

I have had 4 WUs fail recently, 3 after about 40 minutes and the other after about 3 hours. I have had more trouble with Rosetta this year than at any time since the project started. I don't know whether they have rushed new work through for the coronavirus, but they might have.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,918,841
RAC: 10,651
Message 92341 - Posted: 26 Mar 2020, 14:36:42 UTC - in response to Message 92320.  

Looks like memory problems. How much RAM do you have?
Kipni

Joined: 24 Mar 20
Posts: 5
Credit: 323,369
RAC: 0
Message 92345 - Posted: 26 Mar 2020, 15:13:59 UTC - in response to Message 92341.  

One machine has 8 GB and the other has 16 GB.
So for the 8 GB machine you may be right, because 90% of its memory is in use. But on the other machine only 50% of memory is used.
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 92347 - Posted: 26 Mar 2020, 15:35:49 UTC - in response to Message 92345.  

How much of this memory have you allowed BOINC to use? That's often more important than the total you have.
Kipni

Joined: 24 Mar 20
Posts: 5
Credit: 323,369
RAC: 0
Message 92353 - Posted: 26 Mar 2020, 19:07:06 UTC - in response to Message 92347.  

These are my memory settings. Or do you mean something else?
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 92360 - Posted: 26 Mar 2020, 19:41:25 UTC - in response to Message 92353.  

Those look like suitable memory settings. However, calculate how much 80% or 90% of 8 GB is to see whether you should expect memory problems on that computer. You may have to reduce the number of cores BOINC can use at once to avoid memory problems.
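A minimal Python sketch of that calculation, assuming roughly 2 GB peak per Rosetta task (the 1.9 to 2.2 GB figure reported earlier in this thread) and that BOINC's "use at most X% of memory" preference caps the combined usage of all running tasks. The function names and the 8-core count are hypothetical, and the estimate ignores memory used by the OS, browsers, and other projects, so treat it as an upper bound.

```python
def boinc_ram_budget_gb(total_ram_gb: float, max_used_pct: float) -> float:
    """RAM (GB) BOINC may use under a 'use at most X% of memory' preference."""
    return total_ram_gb * max_used_pct / 100.0


def max_concurrent_tasks(total_ram_gb: float, max_used_pct: float,
                         per_task_gb: float, cores: int) -> int:
    """Hypothetical helper: tasks that fit within both the RAM budget and the core count."""
    budget = boinc_ram_budget_gb(total_ram_gb, max_used_pct)
    fits_in_ram = int(budget // per_task_gb)  # whole tasks that fit in the budget
    return max(0, min(cores, fits_in_ram))


if __name__ == "__main__":
    # Assumptions: ~2 GB peak per Rosetta task, 8 cores available to BOINC.
    for pct in (80, 90):
        budget = boinc_ram_budget_gb(8, pct)
        tasks = max_concurrent_tasks(8, pct, 2.0, cores=8)
        print(f"8 GB at {pct}%: {budget:.1f} GB budget, room for {tasks} tasks")
```

Under those assumptions an 8 GB machine has room for only about 3 concurrent tasks at either setting, which is why lowering the number of CPUs BOINC may use can help.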
Sid Celery

Joined: 11 Feb 08
Posts: 1960
Credit: 38,076,311
RAC: 6,958
Message 92361 - Posted: 26 Mar 2020, 19:41:57 UTC - in response to Message 92353.  

You allow more memory than I do, so that's not the issue.
It was discovered a long time ago that we need to have the "Leave non-GPU tasks in memory while suspended" box ticked; otherwise weird errors crop up.
I'm not sure anyone explained why, but ticking it makes the problems go away. I don't know why it isn't the default setting.
Also, no-one ever tells you about it.

Try it and see how it goes for a day.
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 92371 - Posted: 27 Mar 2020, 1:35:46 UTC

I just saw a 24.01 KB zip file being downloaded. 24.0 KB appeared to download at normal speed, then it was several seconds before it downloaded the last 0.01 KB.

In other words, the larger zip files aren't fully exempt from the problem; they just aren't affected severely enough to shut down Rosetta@Home new tasks.




©2024 University of Washington
https://www.bakerlab.org