Rosetta 4.1+ and 4.2+

Message boards : Number crunching : Rosetta 4.1+ and 4.2+

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 95083 - Posted: 21 Apr 2020, 23:28:32 UTC - in response to Message 95072.  
Last modified: 21 Apr 2020, 23:32:20 UTC

Is not the first..
1156578590
ERROR: Assertion `copy_pose.size() == native.size()` failed. MSG:the reference pose must be the same size as the working pose
ERROR:: Exit from: ..\..\..\src\protocols\protein_interface_design\filters\RmsdFilter.cc line: 323
BOINC:: Error reading and gzipping output datafile: default.out
22:55:42 (10128): called boinc_finish(1)
Looks like it suffered from the same problem as mine,
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
Looks like they failed after restarting.
A problem with the checkpoint?

From the system that processed the same WU as me,
Starting watchdog...
Watchdog active.

Grant
Darwin NT

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,364,976
RAC: 132
Message 95146 - Posted: 22 Apr 2020, 18:03:18 UTC

I've had a couple of tasks fail on Ubuntu 19.10. They appeared to run for the full 8 hours but the upload failed.

<error_code>-131 (file size too big)</error_code>

https://boinc.bakerlab.org/rosetta/result.php?resultid=1157342448

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_3fg6zc0o_924138_1_0_r441246010_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</

and

https://boinc.bakerlab.org/rosetta/result.php?resultid=1157317275

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_1p9m_2_SAVE_ALL_OUT_IGNORE_THE_REST_2pn8jf2i_924136_1_0_r712855490_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>

Any settings I need to change?

Nick Name
Joined: 12 Aug 09
Posts: 3
Credit: 2,483,150
RAC: 0
Message 95147 - Posted: 22 Apr 2020, 18:25:00 UTC - in response to Message 95146.  

The size has been set incorrectly. I have some with the same thing on Windows 10 that ended with a computation error.

Rosetta@home 4/22/2020 1:51:16 PM Computation for task Mini_Protein_binds_IL6R_COVID-19_1bqu_1_SAVE_ALL_OUT_IGNORE_THE_REST_5wl5hq6z_924127_1_0 finished
Rosetta@home 4/22/2020 1:51:16 PM Output file Mini_Protein_binds_IL6R_COVID-19_1bqu_1_SAVE_ALL_OUT_IGNORE_THE_REST_5wl5hq6z_924127_1_0_r251659771_0 for task Mini_Protein_binds_IL6R_COVID-19_1bqu_1_SAVE_ALL_OUT_IGNORE_THE_REST_5wl5hq6z_924127_1_0 exceeds size limit.
Rosetta@home 4/22/2020 1:51:16 PM File size: 100258133.000000 bytes. Limit: 100000000.000000 bytes

These should be canceled server-side and reissued when fixed.
Team USA page | Team USA forum
Follow us on Twitter

benknobi89
Joined: 24 Mar 20
Posts: 1
Credit: 752,405
RAC: 42
Message 95154 - Posted: 22 Apr 2020, 21:14:53 UTC - in response to Message 95147.  
Last modified: 22 Apr 2020, 21:18:34 UTC

I'm having the same issue:

Wed Apr 22 12:26:16 2020 | Rosetta@home | Output file Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_3fb2sj8a_924138_1_0_r1088028856_0 for task Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_3fb2sj8a_924138_1_0 exceeds size limit.
Wed Apr 22 12:26:16 2020 | Rosetta@home | File size: 108086503.000000 bytes. Limit: 100000000.000000 byte
Wed Apr 22 12:59:20 2020 | Rosetta@home | Computation for task Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_6uj1ab9u_924138_1_0 finished
Wed Apr 22 12:59:20 2020 | Rosetta@home | Output file Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_6uj1ab9u_924138_1_0_r667199254_0 for task Mini_Protein_binds_IL6R_COVID-19_1p9m_3_SAVE_ALL_OUT_IGNORE_THE_REST_6uj1ab9u_924138_1_0 exceeds size limit.
Wed Apr 22 12:59:20 2020 | Rosetta@home | File size: 118056626.000000 bytes. Limit: 100000000.000000 bytes

Although I would point out that 100 MB is a pretty large file for Rosetta, so I wonder if it's indicative of some other sort of error in the analysis itself.
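
For anyone who wants to check how far over the limit their own output files were, here is a minimal Python sketch. It assumes the "File size: ... Limit: ..." line format shown in the event log excerpts above; the function name `overage` is made up for illustration and is not part of BOINC.

```python
import re

# Matches the size/limit line as it appears in the event log excerpts above.
# The pattern ends at "byte" so it accepts both "bytes" and a truncated "byte".
SIZE_LINE = re.compile(r"File size: (\d+)\.\d+ bytes\. Limit: (\d+)\.\d+ byte")


def overage(line):
    """Return (size, limit, bytes over the limit), or None if the line does not match."""
    m = SIZE_LINE.search(line)
    if m is None:
        return None
    size, limit = int(m.group(1)), int(m.group(2))
    return size, limit, size - limit


# Size lines copied from the log excerpts posted in this thread.
log_lines = [
    "File size: 100258133.000000 bytes. Limit: 100000000.000000 bytes",
    "File size: 108086503.000000 bytes. Limit: 100000000.000000 byte",
    "File size: 118056626.000000 bytes. Limit: 100000000.000000 bytes",
]

for line in log_lines:
    size, limit, over = overage(line)
    print(f"{size} bytes is {over} bytes ({over / limit:.1%}) over the {limit}-byte limit")
```

The smallest of the three files is only a fraction of a percent over the limit, while the largest is well past it.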

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 95156 - Posted: 22 Apr 2020, 22:00:16 UTC - in response to Message 95024.  

Name: 04209mer_stapler_stub366_1-6_53_2449252_3mers_0001_0001_SAVE_ALL_OUT_924092_311_0
Application: Rosetta v4.15 windows_x86_64
Device: 3710630
Task: 1157048481. WU: 1040740292
Status: Error while computing.
Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Stderr output:
(unknown error) - exit code -1073741819 (0xc0000005)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000000014188F556 read attempt to address 0xFFFFFFFF
Engaging BOINC Windows Runtime Debugger...
Same as error quoted in Message ID 95024.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 95162 - Posted: 22 Apr 2020, 22:50:40 UTC
Last modified: 22 Apr 2020, 22:51:56 UTC

The Project Team is aware of, and working on, the file size issues.
Rosetta Moderator: Mod.Sense

Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 95169 - Posted: 23 Apr 2020, 2:30:44 UTC

We had to cancel these work units. Unfortunately they slipped past our quality checks. We are working on a fix and will hopefully have these important COVID-19 jobs out ASAP. Sorry for these recent issues. These are relatively new tasks and protocols so it's a learning process for us also. On a positive note, these large sampling runs are producing great results that otherwise were never practically possible to obtain.

Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,763,129
RAC: 927
Message 95185 - Posted: 23 Apr 2020, 7:37:30 UTC
Last modified: 23 Apr 2020, 7:54:40 UTC

Got this error on my Mac, never seen this before. Are these the same as the ones above?
It ran until completion and got reported normally.
https://boinc.bakerlab.org/rosetta/result.php?resultid=1156715070
<core_client_version>7.14.4</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.16_x86_64-apple-darwin -score:weights hugh2020_HHH_rd4_0628_A9V__beta16.nostab-refit.wts -beta_nov16 -corrections:beta_nov16 -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip hugh2020_HHH_rd4_0628_A9V__beta16.nostab-refit_fragments_fold_data.zip -out:file:silent default.out -silent_gz -mute all -in:file:fasta 00001.fasta -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 5000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1818959
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures   129556 cpu seconds
This process generated    962 decoys from     962 attempts
======================================================
BOINC :: WS_max 3.59453e+08

BOINC :: Watchdog shutting down...
22:48:28 (1040): called boinc_finish(0)

</stderr_txt>
<message>
finish file present too long</message>
]]>

Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,763,129
RAC: 927
Message 95186 - Posted: 23 Apr 2020, 7:44:02 UTC
Last modified: 23 Apr 2020, 8:04:52 UTC

Also got these four validation errors on my main rig. Is the only issue with them that the WU got cancelled (presumably after they were reported)?

https://boinc.bakerlab.org/rosetta/result.php?resultid=1158001484
https://boinc.bakerlab.org/rosetta/result.php?resultid=1157903020
https://boinc.bakerlab.org/rosetta/result.php?resultid=1157900936
https://boinc.bakerlab.org/rosetta/result.php?resultid=1156662934

Also, why is there such a comically large number of "decoys" generated?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 95190 - Posted: 23 Apr 2020, 8:11:14 UTC - in response to Message 95185.  

Got this error on my Mac, never seen this before. Are these the same as the ones above?
Nope.



</stderr_txt>
<message>
finish file present too long</message>
]]>
It's a problem that has been around for years: a lot of disk I/O and resource contention results in the cleanup after the Task has finished not happening fast enough for the Manager's liking, so it gets clobbered. Even if the Task finished without error, it still becomes one.
The latest version of BOINC (7.16.5) is meant to have the fix for it.
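
As a rough illustration of that race, here is a minimal Python sketch. It is not BOINC's actual code: the function name, its parameters, and the grace period value are all hypothetical and only model the behaviour described above.

```python
def check_finished_task(finish_file_seen_at, now, process_alive, grace_seconds):
    """Classify a task that has already written its finish file.

    All arguments are hypothetical stand-ins:
      finish_file_seen_at -- time (seconds) the finish file was first noticed
      now                 -- current time (seconds)
      process_alive       -- True if the task's process has not exited yet
      grace_seconds       -- how long the client tolerates cleanup taking
    """
    if not process_alive:
        # Cleanup completed and the process exited: the task is done.
        return "done"
    if now - finish_file_seen_at > grace_seconds:
        # The process is still around after the grace period, so the task is
        # marked as an error even though the computation itself finished.
        return "error: finish file present too long"
    return "waiting"


# A slow disk delays cleanup past the (made-up) 10 second grace period.
print(check_finished_task(finish_file_seen_at=0.0, now=12.0,
                          process_alive=True, grace_seconds=10.0))
```

In this model the same task would be fine with a longer grace period, which is the kind of change a client-side fix could make.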
Grant
Darwin NT

Erich56
Joined: 11 Jan 16
Posts: 35
Credit: 1,437,503
RAC: 681
Message 95192 - Posted: 23 Apr 2020, 8:18:33 UTC

Last night I had several tasks that failed after several hours with:

202 (0x000000CA) EXIT_ABORTED_BY_PROJECT

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007FF93AADC302

For more details, see: https://boinc.bakerlab.org/rosetta/result.php?resultid=1157856714

What's the problem behind it?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 95193 - Posted: 23 Apr 2020, 8:21:14 UTC - in response to Message 95192.  

What's the problem behind it?
A batch of dodgy Work Units.
Grant
Darwin NT

Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,763,129
RAC: 927
Message 95200 - Posted: 23 Apr 2020, 10:33:36 UTC - in response to Message 95190.  

Got this error on my Mac, never seen this before. Are these the same as the ones above?
Nope.



</stderr_txt>
<message>
finish file present too long</message>
]]>
It's a problem that has been around for years: a lot of disk I/O and resource contention results in the cleanup after the Task has finished not happening fast enough for the Manager's liking, so it gets clobbered. Even if the Task finished without error, it still becomes one.
The latest version of BOINC (7.16.5) is meant to have the fix for it.


Welp, I AM using the latest BOINC version for Mac. Good to know it hasn't been fixed for macOS. That's odd, my Mac is not new, but its SSD still offers top-notch performance.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 95201 - Posted: 23 Apr 2020, 10:39:08 UTC - in response to Message 95200.  
Last modified: 23 Apr 2020, 10:40:25 UTC

Welp, I AM using the latest BOINC version for Mac. Good to know it hasn't been fixed for macOS. That's odd, my Mac is not new, but its SSD still offers top-notch performance.
?
From the BOINC download page.
Mac OS X (64-bit Intel) Version 10.6.0+
7.16.6 Recommended version
You've got v7.14.4 according to your system info here.
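
If you want to check this yourself, here is a minimal Python sketch that compares two dotted version strings numerically. The helper names `parse_version` and `is_outdated` are made up for illustration; the version numbers are the ones quoted in this thread.

```python
def parse_version(text):
    """Turn a dotted version such as '7.14.4' or 'v7.14.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_outdated(installed, recommended):
    """True if the installed version is older than the recommended one."""
    return parse_version(installed) < parse_version(recommended)


installed = "7.14.4"    # from <core_client_version> in the stderr output
recommended = "7.16.6"  # from the BOINC download page as quoted in this thread

if is_outdated(installed, recommended):
    print(f"{installed} is older than {recommended}: time to update")
else:
    print(f"{installed} is up to date")
```

Comparing tuples of integers rather than raw strings matters here, because a plain string comparison would rank "7.9.0" above "7.16.6".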
Grant
Darwin NT

Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 95214 - Posted: 23 Apr 2020, 16:50:32 UTC - in response to Message 95186.  

There are a lot of models because of the filtering criteria being used. We are trying to find needles in a haystack and each search step can be short or long depending on the modeling problem at hand. We are looking into ways to decrease the size of these filtered models to take care of this issue but the underlying goal of sampling a very large space is still necessary.

Beyond
Joined: 30 May 06
Posts: 2
Credit: 8,465,106
RAC: 0
Message 95215 - Posted: 23 Apr 2020, 17:00:35 UTC - in response to Message 95169.  

We had to cancel these work units. Unfortunately they slipped past our quality checks. We are working on a fix and will hopefully have these important COVID-19 jobs out ASAP. Sorry for these recent issues. These are relatively new tasks and protocols so it's a learning process for us also. On a positive note, these large sampling runs are producing great results that otherwise were never practically possible to obtain.

Good to hear about the great results. Thanks for all the hard work. Everyone's pedaling about as fast as they can right now. :-)

Dr Who Fan
Joined: 28 May 06
Posts: 60
Credit: 222,375
RAC: 12
Message 95216 - Posted: 23 Apr 2020, 17:16:29 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=1157960758

Name 0yx8rt2q_24HGBAB0HM2HGABB24H_build_COVID-19_binder_build1_SAVE_ALL_OUT_924149_47_0
Application version Rosetta v4.15 arm-android-linux-gnu
Peak working set size 688.89 MB
Peak swap size 972.25 MB
Peak disk usage 1,098.00 MB

Stderr output

<core_client_version>7.16.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.15_arm-android-linux-gnu -run:protocol jd2_scripting -parser:protocol design_2019CoV_boinc.xml @bp_cov_flags_24HGBAB0HM2HGABB24H -in:file:silent 0yx8rt2q_24HGBAB0HM2HGABB24H_build_COVID-19_binder_test1.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 0yx8rt2q_24HGBAB0HM2HGABB24H_build_COVID-19_binder_test1.zip @0yx8rt2q_24HGBAB0HM2HGABB24H_build_COVID-19_binder_test1.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 5000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3029927
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 462 starting structures 13803.8 cpu seconds
This process generated 462 decoys from 462 attempts
======================================================
BOINC :: WS_max 0
called boinc_finish(0)

</stderr_txt>
]]>


CIA
Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 95220 - Posted: 23 Apr 2020, 17:59:09 UTC - in response to Message 95200.  
Last modified: 23 Apr 2020, 18:02:40 UTC


Welp, I AM using the latest BOINC version for Mac. Good to know it hasn't been fixed for macOS. That's odd, my Mac is not new, but its SSD still offers top-notch performance.


According to your user page you are still on BOINC 7.14.4 for your one Mac computer. The latest OS X version of BOINC is 7.16.6. Head over to the main BOINC page and download/install the latest version.

/edit Really everyone should take a few moments regardless of OS and check that you are running the most recent version for your rigs.

Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,763,129
RAC: 927
Message 95232 - Posted: 23 Apr 2020, 20:15:50 UTC - in response to Message 95201.  

Welp, I AM using the latest BOINC version for Mac. Good to know it hasn't been fixed for macOS. That's odd, my Mac is not new, but its SSD still offers top-notch performance.
?
From the BOINC download page.
Mac OS X (64-bit Intel) Version 10.6.0+
7.16.6 Recommended version
You've got v7.14.4 according to your system info here.


Oops...
I remember having just updated to 7.16.x a few days prior due to "suspend when CPU is above XX" not working properly on a previous version.
I guess I forgot to actually install the thing.

Plomos
Joined: 4 Mar 11
Posts: 11
Credit: 423,212
RAC: 8
Message 95356 - Posted: 25 Apr 2020, 16:43:39 UTC
Last modified: 25 Apr 2020, 16:45:28 UTC

I'm noticing that tasks marked as Mini_Protein_binds_IL6R_COVID-19_1p9m_v2_3_SAVE_ALL_OUT_IGNORE_THE_REST_6vq3dw3t_924614_2 do end up creating decoys, but according to the Windows graphics output they spend a lot of time at "Stage: unknown" and then eventually run 2 or 3 steps before going to the next model. Is this normal behavior? I have seen this with quite a few of the Mini_Protein_binds units. Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1159847678



©2022 University of Washington
https://www.bakerlab.org