Message boards : Number crunching : Rosetta 4.0+
Author | Message |
---|---|
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
|
matsu_pl Send message Joined: 28 Nov 17 Posts: 1 Credit: 1,543,544 RAC: 0 |
Arch Linux on i7-4770, 16 GB RAM. Yesterday I updated Arch. The update included (among others): [2018-04-28 17:33] [ALPM] upgraded glibc (2.26-11 -> 2.27-2) [2018-04-28 17:33] [ALPM] upgraded boinc (7.8.4-1 -> 7.8.4-3) [2018-04-28 17:34] [ALPM] upgraded linux (4.15.15-1 -> 4.16.4-1) After the update, all Rosetta 4.07 x86_64 tasks are failing. For example: https://boinc.bakerlab.org/result.php?resultid=993256201 rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed. SIGABRT: abort called My /etc/locale.conf contains only: LANG=en_US.UTF-8 LC_TIME is not set, and never was. The systemd journal shows no errors, no warnings related to Boinc or rosetta. Can the statically linked Rosetta executable binaries work with the new glibc 2.27? The glibc 2.27 release notes: https://sourceware.org/glibc/wiki/Release/2.27 The following paragraph is discussed at the very end: "Statically compiled applications attempting to load locales compiled for the GNU C Library version 2.27 will fail and fall back to the builtin C/POSIX locale. The reason for this is that the addition of the new "%OB" and "%Ob", support for two grammatical forms of the month names, also extends the locale data binary format. Static applications needing locale support must be recompiled to match the runtime and data they are deployed with. In some distributions there is an upgrade window where dynamically linked applications may use a new library but the old locale data and also fall back to the builtin C/POSIX locales; restarting the application process is sufficient to fix this. " |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta v4.07 windows_intelx86. Device 1759960, task 993666529, WU 895307298. Status: Error while computing. Outcome: Computation error. Unhandled Exception Detected... OS is Windows 7 Pro, 64-bit; the app is for x86. Other posters have noted problems with this combination. |
[AF>Libristes]Maeda Send message Joined: 18 Aug 11 Posts: 4 Credit: 4,526,367 RAC: 0 |
Arch Linux on i7-4770, 16 GB RAM. Hello, same with my computer: 4.16.4-1-ARCH, updated on 4/27/2018. Rosetta Mini v3.78 seems to work, but I can't select only that app to avoid the errors. Please recompile Rosetta against the latest glibc version. command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -ignore_unrecognized_res 1 -abinitio::fastrelax 1 -ex2aro 1 -abinitio::use_filters false -out:file:silent default.out -abinitio::rsd_wt_loop 0.5 -beta 1 -abinitio::detect_disulfide_before_relax 1 -relax::minimize_bond_angles 1 -in:file:native 00001.pdb -silent_gz 1 -abinitio::rsd_wt_helix 0.5 -relax::default_repeats 15 -beta_cart 1 -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::rg_reweight 0.5 -abinitio::increase_cycles 10 -out:file:silent_struct_type binary -ex1 1 -optimization::default_max_cycles 200 -relax::dualspace 1 -in:file:boinc_wu_zip NTF2chip_4741_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2874706 rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed. SIGABRT: abort called |
[AF>Libristes]Maeda Send message Joined: 18 Aug 11 Posts: 4 Credit: 4,526,367 RAC: 0 |
In fact, only the Rosetta 4.07 x86_64 tasks are crashing. |
mmonnin Send message Joined: 2 Jun 16 Posts: 61 Credit: 25,390,629 RAC: 15,884 |
Just had over 200 of these fail, so many in such quick succession that my BoincTasks client lost connection while they were all hitting computation errors. It finally came back with 4 running. Rosetta only; Mini seems OK. 1950X, 32 GB RAM, 500 GB M.2, running 18.04. A 2700X, also on 18.04, is running both apps just fine. |
Conan Send message Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 421 |
Could be this is your problem (and mine...) but don't hold your breath for a fix. Any progress on the 32 bit Rosetta 4.07 errors? Rosetta Mini runs fine, so possibly a compiling issue or made for 64 bit instead of 32 bit? All Rosetta 4.07 work units fail on my 32 bit XP Windows machine, this is on both Ralph and Rosetta. Linux runs fine as well. Conan |
gemini8 Send message Joined: 25 Feb 12 Posts: 5 Credit: 3,097,384 RAC: 517 |
I know it's not an officially supported OS anymore, being older than macOS 10.11 'El Capitan', but while RosettaMini is working fine on Mac OS 10.6.8 'Snow Leopard', Rosetta crashes on it. This might be related to Rosetta being compiled differently from RosettaMini. It would be nice if I could switch Rosetta off on the server side, but after reading here about these many problems which aren't solved after all that time I think it's easier to fiddle with an app_config.xml . Thanks. - - - - - - - - - - - - - - - Greetings, Jens |
akkin4 Send message Joined: 17 Apr 18 Posts: 4 Credit: 2,664,458 RAC: 0 |
Is there anything I can do about this message: "If this happens repeatedly you may have to restart the project"? I would like to avoid starting from the beginning when the PC has already worked many hours on those tasks :( Regards, akkin4. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I picked up a few Rosetta 4.07, and they are now running fine after 2 to 3 hours on my Ubuntu 18.04 machine (i7-8700). They are apparently not the x64, but it seems that corrective action has been taken. |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
From my own experience, as well as from advice from those more experienced than I, resetting a project --Rosetta-- doesn't fix this issue. As an example, here's what I just found in my Event Log:
Certain WUs just seem to get this "no 'finished' file" error message often. As you say, resetting the project will delete the entire cache for the project and you'll get new WUs, losing credit for all work done on jobs in progress. This also will delay processing of all other WUs which were in your cache. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 3,372 |
From my own experience, as well as from advice from those more experienced than I, resetting a project --Rosetta-- doesn't fix this issue. As an example, here's what I just found in my Event Log: I have seen the "no finished file" message before on several projects. I think it is BOINC Windows and SHARE related. I think RESET does not clear it up. It is only a problem if it happens on the last time the finished file is written. https://boinc.berkeley.edu/dev/forum_thread.php?id=9870 |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
998588769 998588769
|
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
They are still sending out the 4.07 x64 versions (as of this morning), and they are still failing (Ubuntu 18.04, i7-8700). <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 09:23:38 (8952): wrapper (7.7.26016): starting 09:23:38 (8952): wrapper (7.7.26016): starting 09:23:38 (8952): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda) Python 3.6.3 :: Anaconda, Inc. 09:23:47 (8952): miniconda-installer exited; CPU time 7.696161 09:23:47 (8952): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py) Traceback (most recent call last): File "/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml/lib/python3.6/site-packages/psi4/__init__.py", line 55, in <module> from . import core ImportError: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml/lib/python3.6/site-packages/psi4/core.so: undefined symbol: __svml_sin4 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "run-qmml.py", line 12, in <module> import psi4 File "/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml/lib/python3.6/site-packages/psi4/__init__.py", line 60, in <module> raise ImportError("{0}".format(err)) ImportError: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml/lib/python3.6/site-packages/psi4/core.so: undefined symbol: __svml_sin4 Traceback (most recent call last): File "pre_script.py", line 31, in <module> raise Exception("Error running qmml") Exception: Error running qmml 09:26:42 (8952): $PROJECT_DIR/miniconda/bin/python exited; CPU time 101.619166 09:26:42 (8952): app exit status: 0x1 09:26:42 (8952): called boinc_finish(195) https://boinc.bakerlab.org/workunit.php?wuid=899858496 |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
They are still sending out the 4.07 x64 versions (as of this morning), and they are still failing (Ubuntu 18.04, i7-8700). Gpugrid.net?? |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 3,372 |
They are still sending out the 4.07 x64 versions (as of this morning), and they are still failing (Ubuntu 18.04, i7-8700). If you look at his COMPUTERS, Linux is now including the GLIBC version in the Linux name and Ubuntu 18.04 comes with GLIBC 2.27. Rosetta is statically linked and there is an incompatibility with the LOCALE between the GLIBC they used and GLIBC 2.27. Linux Ubuntu Ubuntu 18.04 LTS [4.15.0-20-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)] Any Linux computer with GLIBC 2.27 will have problems running STATICALLY linked Rosetta 4.07. Fedora 28 will also come with glibc 2.27 and will fail with Rosetta 4.07 (I guess I will hold off upgrading). ANY next version of ANY Linux distribution will be using glibc 2.27 and have problems with Rosetta 4.07. I saw earlier that someone had made a request for Rosetta to make the object files available so they could RELINK against GLIBC 2.27. The Rosetta project is REQUIRED by the GNU Free Software license they are using to provide the linkable object files to anyone who has a problem due to linking. Rosetta is probably in "technical violation" of the GNU license until they fix 4.07 or make the object files available. |
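A minimal sketch of how a cruncher might check whether a host falls into the affected group described above. It assumes the cutoff is glibc 2.27 or later (the thread only confirms 2.27 itself), and the helper names `parse_glibc_version`, `host_glibc_version`, and `likely_affected` are made up for illustration. `os.confstr("CS_GNU_LIBC_VERSION")` is only available on glibc-based systems, so the lookup falls back to `None` elsewhere.

```python
import os
import re

# Assumption taken from the thread: the statically linked 4.07 x86_64 binary
# fails where the host's locale data comes from glibc 2.27 (or, presumably, later).
FIRST_AFFECTED = (2, 27)


def parse_glibc_version(text):
    """Return (major, minor) from a string such as 'glibc 2.27', or None."""
    match = re.search(r"(\d+)\.(\d+)", text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def host_glibc_version():
    """Ask the running system for its glibc version; None if unavailable."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, ValueError, OSError):
        return None  # not glibc, or not a POSIX system


def likely_affected(version):
    """True when the version is at or past the first affected release."""
    return version is not None and version >= FIRST_AFFECTED


if __name__ == "__main__":
    version = host_glibc_version()
    print("glibc version:", version, "likely affected:", likely_affected(version))
```

A host that reports `(2, 26)` would be treated as safe and one that reports `(2, 27)` as likely affected; the check says nothing about the i686 or Mini applications, which the thread reports as working.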
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Gpugrid.net?? Did I copy the wrong one? This is the one associated with that link for Workunit 899858496. <core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63)</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_05_15_193_289__t000__0_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_05_15_193_289__t000__0_C1_robetta.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3704428 rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed. SIGABRT: abort called Stack trace (17 frames): [0x5efead0] [0x5ffe380] [0x607e517] [0x60083a8] [0x6002794] [0x60027ee] [0x6000f73] [0x6001996] [0x60007df] [0x600020e] [0x5f1d10e] [0x5f1d73e] [0x5f1707a] [0x5f17202] [0x412631] [0x5fff8cc] [0x610b97] Exiting... </stderr_txt> Thanks for catching it, but until they figure it out, it may not matter much. We can error on the errors. |
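Since a log from the wrong project was pasted earlier in the thread, here is a rough sketch of how stderr output could be sorted before posting. The signature string is copied from the Rosetta output quoted above; the function names and labels (`classify_stderr`, `project_dir`, `"glibc-locale-assert"`) are hypothetical, and the path pattern simply assumes the `projects/<directory>/` layout visible in the quoted logs.

```python
import re

# Signature copied from the stderr output quoted in this thread.
LOCALE_ASSERT = re.compile(r"loadlocale\.c:\d+: _nl_intern_locale_data: Assertion")

# The quoted logs reference binaries as .../projects/<project directory>/<file>.
PROJECT_DIR = re.compile(r"projects/([^/\s]+)/")


def classify_stderr(stderr_text):
    """Return a short label describing why a task failed."""
    if LOCALE_ASSERT.search(stderr_text):
        return "glibc-locale-assert"
    if "SIGABRT" in stderr_text:
        return "other-abort"
    return "unknown"


def project_dir(stderr_text):
    """Return the first project directory named in the log, or None."""
    match = PROJECT_DIR.search(stderr_text)
    return match.group(1) if match else None
```

Run against the two logs in this thread, `project_dir` would return `www.gpugrid.net` for the first and `boinc.bakerlab.org_rosetta` for the second, which is enough to catch a copy-and-paste mix-up.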
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 3,372 |
Anyone experiencing the locale error on a Linux machine with glibc 2.27 will probably see those errors on all 4.07 x86_64 WUs sent to them. It appears 3.78 and 4.07 i686 are OK. Nobody can do anything about it other than a Rosetta developer OR the person managing job creation. Until they act, by fixing the locale problem OR by no longer sending 64-bit Linux jobs, crunchers will be burdened with the extra system overhead. It is also costing Rosetta by sending the bogus WUs and then tracking the failures. Gpugrid.net?? |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
It is also costing Rosetta by sending the bogus WU and then tracking the failures. Another annoyance is that because the faulty work units run for only 2 seconds, they skew the BOINC time estimates. My Rosettas (24-hour version) are now showing only 2 hours. So that means BOINC downloads too many. I have to adjust by shortening the work buffer. All Rosetta developers should be required to spend six months actually running their work on BOINC before they are let loose in the wild. |
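The over-fetching described above can be illustrated with simple arithmetic. This is a deliberately simplified model, not BOINC's actual scheduler: it assumes the client asks for just enough tasks to keep every thread busy for the length of the work buffer, and the 12-thread host and 1-day buffer are hypothetical values.

```python
import math


def tasks_to_fill_buffer(buffer_days, threads, estimated_hours):
    """Tasks needed to keep `threads` busy for `buffer_days`, per the estimate."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return math.ceil(buffer_days * 24 * threads / estimated_hours)


# Hypothetical host: 12 threads, 1-day work buffer.
correct = tasks_to_fill_buffer(1, 12, 24)  # estimate matches the 24-hour target
skewed = tasks_to_fill_buffer(1, 12, 2)    # estimate dragged down to 2 hours
print(correct, skewed)  # prints "12 144"
```

Under these assumptions a 12x error in the estimate becomes a 12x larger download, which is why shortening the work buffer compensates.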
[DPC]Remmus Send message Joined: 9 Nov 05 Posts: 6 Credit: 2,466,696 RAC: 0 |
Is there an issue tracker where I can report this and see any progress? Rosetta is now only sending 4.07 packages, so my system is only cranking out 'Error while computing'. I am running a Fedora 28 x64 system. |
©2024 University of Washington
https://www.bakerlab.org