Posts by biodoc

1) Message boards : Number crunching : New kind of app on Ralph (Message 102011)
Posted 3 Jun 2021 by biodoc
Post:
I wonder if the "rosetta python projects" app is an implementation of PyRosetta for BOINC.
2) Message boards : Number crunching : New kind of app on Ralph (Message 101977)
Posted 31 May 2021 by biodoc
Post:
Apparently, when I get one and another one is waiting, the CPU crunches one of the Python ones and nothing else.

Same here. I have a 12-core AMD Ryzen, and even if I stop all other WUs, the system crunches only 1 Python task.


When I have one of these Python jobs I can force Rosetta to crunch classical WUs by cancelling all the Python tasks in the queue. Anyway, it doesn't finish them all. I get an "unmanageable VM" message after several hours of my electricity wasted, so I killed the tasks.

Edit: the funny thing is that I don't get (of course) those Python VM jobs on another machine which has no VirtualBox installed. So a radical solution to this problem could be to uninstall VirtualBox. I won't. I use it.


An alternative to uninstalling VirtualBox is to add this line to the "options" section of your cc_config.xml file and then restart the BOINC client.

<dont_use_vbox>1</dont_use_vbox>
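
For anyone who would rather script the change than edit the file by hand, here is a minimal Python sketch. It assumes cc_config.xml has a `<cc_config>` root element and works on the XML text only; the helper name `set_dont_use_vbox` and the sample input are made up for illustration, and reading and writing the real file is left out.

```python
import xml.etree.ElementTree as ET


def set_dont_use_vbox(xml_text: str) -> str:
    """Return the config text with <dont_use_vbox>1</dont_use_vbox> inside <options>."""
    root = ET.fromstring(xml_text)  # assumed root element: <cc_config>
    options = root.find("options")
    if options is None:
        # Create the <options> section if the file does not have one yet.
        options = ET.SubElement(root, "options")
    flag = options.find("dont_use_vbox")
    if flag is None:
        flag = ET.SubElement(options, "dont_use_vbox")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal config used only to exercise the helper.
example = "<cc_config><options></options></cc_config>"
updated = set_dont_use_vbox(example)
print(updated)
```

The client still has to be restarted afterwards, as described above.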
3) Message boards : Number crunching : "Rosetta v4.12 i686-pc-linux-gnu" : fixed 20 h CPU time, fixed 20 credits (Message 93890)
Posted 8 Apr 2020 by biodoc
Post:
@xii5ku, thank you for tracking this problem down. My caches are full of the linux i686 tasks. I would think it would be a good idea to stop the server from sending this work until the bug in the app is fixed.
4) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91793)
Posted 27 Feb 2020 by biodoc
Post:
I've had ~20 of these tasks fail after 8 hours of computation time: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_900260_6_0.

Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1124284684

I've aborted the others in my queue.

linux
3900x processor
64 GB RAM
5) Message boards : Number crunching : Validators not running (Message 90700)
Posted 21 Apr 2019 by biodoc
Post:
Both mini-R and R validators need a kick start. Also the mini-R assimilator is not running.
6) Message boards : News : Another publication in Nature describing the first de novo designed proteins with anti-cancer activity (Message 90406)
Posted 22 Feb 2019 by biodoc
Post:
Congratulations! Unfortunately I'd have to pay for the article to read it but based on the abstract, it sounds very cool.
7) Message boards : Number crunching : Rosetta 4.0+ (Message 89188)
Posted 29 Jun 2018 by biodoc
Post:
I'm seeing quite a number of identical computation errors on 2 of my machines. They seem to occur in "bunches". For example, the Ryzen had no errors for a couple of days and then I got about 180 of them within a few minutes. My other 2 machines are error free, but they are running older kernels (Mint 7.3 and 18.3).

Computer 1

https://boinc.bakerlab.org/rosetta/results.php?hostid=3420461&offset=0&show_names=0&state=6&appid=

CPU type: 2 x Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz [Family 6 Model 62 Stepping 4]
Number of processors: 40
Coprocessors: NVIDIA Quadro FX 4800 (1535MB) driver: 340.10 OpenCL: 1.0
Operating System: Linux Mint 19 Tara [4.15.0-23-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
BOINC version: 7.9.3
Memory: 96602.15 MB

Error:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_06_28_403_572__t000__0_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_06_28_403_572__t000__0_C1_robetta.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3482852
rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
SIGABRT: abort called
Stack trace (17 frames):
[0x5efead0]
[0x5ffe380]
[0x607e517]
[0x60083a8]
[0x6002794]
[0x60027ee]
[0x6000f73]
[0x6001996]
[0x60007df]
[0x600020e]
[0x5f1d10e]
[0x5f1d73e]
[0x5f1707a]
[0x5f17202]
[0x412631]
[0x5fff8cc]
[0x610b97]

Exiting...

</stderr_txt>
]]>


Computer 2

CPU type: AMD Ryzen 7 2700X Eight-Core Processor [Family 23 Model 8 Stepping 2]
Number of processors: 16
Coprocessors: NVIDIA GeForce GTX 780 Ti (3015MB) driver: 390.67 OpenCL: 1.2
Operating System: Antergos Linux [4.17.2-1-ARCH]
BOINC version: 7.8.6
Memory: 16037.01 MB

<core_client_version>7.8.6</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -relax::minimize_bond_lengths 1 -out:file:silent_struct_type binary -ignore_unrecognized_res 1 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -abinitio::rg_reweight 0.5 -relax::default_repeats 2 -beta 1 -abinitio::increase_cycles 10 -ex2aro 1 -frag9 00001.200.9mers -abinitio::fastrelax 1 -abinitio::detect_disulfide_before_relax 1 -ex1 1 -relax::minimize_bond_angles 1 -beta_cart 1 -in:file:native 00001.pdb -relax::dualspace 1 -optimization::default_max_cycles 200 -abinitio::rsd_wt_helix 0.5 -frag3 00001.200.3mers -in:file:boinc_wu_zip DRH_curve_X_h21_l4_h29_l2_09658_4_loop_11_0001_one_capped_0001_fragments_data.zip -out:file:silent default.out -silent_gz 1 -mute all -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1195778
rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
SIGABRT: abort called
Stack trace (17 frames):
[0x5efead0]
[0x5ffe380]
[0x607e517]
[0x60083a8]
[0x6002794]
[0x60027ee]
[0x6000f73]
[0x6001996]
[0x60007df]
[0x600020e]
[0x5f1d10e]
[0x5f1d73e]
[0x5f1707a]
[0x5f17202]
[0x412631]
[0x5fff8cc]
[0x610b97]

Exiting...

</stderr_txt>
]]>
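
A note on reading the "process exited with code 193 (0xc1, -63)" line in both reports: the three numbers are consistent with one value written three ways, namely decimal, hexadecimal, and the decimal value minus 256. That is only a reading of the quoted message, not something taken from the BOINC source. A small Python sketch of the arithmetic (the function name is made up):

```python
def describe_exit_code(code: int) -> str:
    """Format an exit code as decimal, hex, and (code - 256)."""
    # Assumption: the third number in the quoted message is code - 256.
    return f"{code} (0x{code:x}, {code - 256})"


print(describe_exit_code(193))  # 193 (0xc1, -63)
```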
8) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 77196)
Posted 1 Aug 2014 by biodoc
Post:
Perhaps the Baker lab's use of University Network bandwidth is affecting student access to Facebook, YouTube and Netflix?

Sad, science no longer rules these days.
9) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 77103)
Posted 29 Jul 2014 by biodoc
Post:
Similar problems with my 4 computers:

7/29/2014 4:56:43 PM | rosetta@home | Started upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0
7/29/2014 4:57:01 PM | rosetta@home | Temporarily failed upload of hc_centroids_1tsf_234_0.5_06-01-14_SAVE_ALL_OUT_168123_4458_0_0: transient HTTP error
7/29/2014 4:57:01 PM | rosetta@home | Backing off 00:06:23 on upload of hc_centroids_1tsf_234_0.5_06-01-14_SAVE_ALL_OUT_168123_4458_0_0
7/29/2014 4:57:04 PM | rosetta@home | Temporarily failed upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0: connect() failed
7/29/2014 4:57:04 PM | rosetta@home | Backing off 00:05:20 on upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0
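
When there are many such lines, a short script can summarize which uploads failed and how long each one is backing off. The sketch below is Python and assumes the event log format seen in the five lines above (timestamp, project, and message separated by " | "); the regular expressions and the `summarize` helper are illustrative, not part of BOINC.

```python
import re
from datetime import timedelta

# The five event log lines quoted above.
LOG_LINES = [
    "7/29/2014 4:56:43 PM | rosetta@home | Started upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0",
    "7/29/2014 4:57:01 PM | rosetta@home | Temporarily failed upload of hc_centroids_1tsf_234_0.5_06-01-14_SAVE_ALL_OUT_168123_4458_0_0: transient HTTP error",
    "7/29/2014 4:57:01 PM | rosetta@home | Backing off 00:06:23 on upload of hc_centroids_1tsf_234_0.5_06-01-14_SAVE_ALL_OUT_168123_4458_0_0",
    "7/29/2014 4:57:04 PM | rosetta@home | Temporarily failed upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0: connect() failed",
    "7/29/2014 4:57:04 PM | rosetta@home | Backing off 00:05:20 on upload of tube9_26_A_tube9_26_B_patchdock_split_03_140727_SAVE_ALL_OUT__179886_89_0_0",
]

FAILED = re.compile(r"Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$")
BACKOFF = re.compile(
    r"Backing off (?P<h>\d+):(?P<m>\d+):(?P<s>\d+) on upload of (?P<file>\S+)$"
)


def summarize(lines):
    """Return (failures, backoffs) dictionaries keyed by file name."""
    failures, backoffs = {}, {}
    for line in lines:
        # Split into timestamp, project, message; only the message is parsed.
        _timestamp, _project, message = [p.strip() for p in line.split("|", 2)]
        match = FAILED.search(message)
        if match:
            failures[match["file"]] = match["reason"]
            continue
        match = BACKOFF.search(message)
        if match:
            backoffs[match["file"]] = timedelta(
                hours=int(match["h"]),
                minutes=int(match["m"]),
                seconds=int(match["s"]),
            )
    return failures, backoffs


failures, backoffs = summarize(LOG_LINES)
for name, reason in failures.items():
    print(f"{name}: {reason}, retry in {backoffs[name].total_seconds():.0f} s")
```

For these lines it reports two failed uploads, one with "transient HTTP error" and one with "connect() failed".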
10) Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III (Message 14528)
Posted 24 Apr 2006 by biodoc
Post:
I just aborted a WU (details below). It had been running 8+ hours at 2% complete, and I have a 2 hr runtime pref. set. The behavior of this WU was similar to the FACONTACT & HBLR_1.0 WUs that I've seen posted here & experienced myself as "long runners". They seem to run normally through the "model 1" process (# of steps in the 6 figure range) & instead of moving on to "model 2, step 1", they start over as model 1, step 1. Perhaps it could be described as a "model 1 loop" bug? Anyone else seen this?

I'm only running Rosetta & I have "leave in memory" checked as a pref.

Could it be that the Accepted RMSD & Accepted energy parameters or "goals" are not met during the model 1 calculation, so the WU does not move on to a model 2 calculation & just starts the model 1 calculation over again?


Result ID 18037907
Name FARELAX_NOFILTERS_1scjB_417_302_3
Workunit 13208951
11) Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III (Message 14310)
Posted 21 Apr 2006 by biodoc
Post:
I just aborted 2 "long runners" (17+ hours; about 2% complete).

Result ID 17758781
Name FACONTACTS_RECENTER_NOFILTERS_1scjB_448_993_1

Result ID 17757908
Name FACONTACTS_RECENTER_NOFILTERS_1tit__448_560_1

12) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13485)
Posted 11 Apr 2006 by biodoc
Post:
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode.
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=13936782

Please advise; should I terminate?


I've aborted this work unit after six hours.
http://boinc.bakerlab.org/rosetta/result.php?resultid=17002591
13) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13478)
Posted 11 Apr 2006 by biodoc
Post:
I've got a stuck work unit at 1.042% complete (4h 50min) w/ 2 hr runtime preference. No activity in graphics mode.
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=13936782

Please advise; should I terminate?
14) Message boards : Number crunching : Miscellaneous Work Unit Errors (Message 13231)
Posted 8 Apr 2006 by biodoc
Post:
Since the upgrade to version 4.97, I've had most of the WU's failing with client errors on my 4 windoze boxes (42 failures & counting!). My linux box is OK. FYI, I had absolutely no errors with 4.83 since it was released. These are the errors I'm seeing:

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06C2FFF0
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06C2FC8C
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x007022EA read attempt to address 0x06AAFF34
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06CBFA98
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x007022EA read attempt to address 0x0EB4FF90
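
The five reports above involve only two distinct faulting instruction addresses, which is easy to miss by eye. A small Python sketch that tallies them, assuming every report follows the "at address X read attempt to address Y" wording seen here (the pattern and variable names are illustrative):

```python
import re
from collections import Counter

# The five "Reason:" lines quoted above.
REPORTS = [
    "Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06C2FFF0",
    "Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06C2FC8C",
    "Reason: Access Violation (0xc0000005) at address 0x007022EA read attempt to address 0x06AAFF34",
    "Reason: Access Violation (0xc0000005) at address 0x00599FF4 read attempt to address 0x06CBFA98",
    "Reason: Access Violation (0xc0000005) at address 0x007022EA read attempt to address 0x0EB4FF90",
]

PATTERN = re.compile(
    r"at address (0x[0-9A-Fa-f]+) read attempt to address (0x[0-9A-Fa-f]+)"
)

# Count how often each faulting instruction address appears.
faulting = Counter(PATTERN.search(report).group(1) for report in REPORTS)
print(faulting.most_common())
# [('0x00599FF4', 3), ('0x007022EA', 2)]
```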
