1)
Message boards :
Number crunching :
Accout lost??
(Message 110289)
Posted 17 Dec 2024 by rsNeutrino
Post: I can confirm a massive reappearance of profiles in the per country overview:
https://boinc.bakerlab.org/rosetta/user_profile/profile_country.html

Snapshot February 2024, before the purge:
http://web.archive.org/web/20240227133833/https://boinc.bakerlab.org/rosetta/user_profile/profile_country.html
∑ 10,317
2024-12-14, after the purge: ∑ 1,363 | ∆ -8,954
2024-12-17, today, after restore: ∑ 9,189 | ∆ +7,826

I have a copy of the project user XML from today and from the 14th, so it would be possible to do a delta, if necessary.
https://boinc.bakerlab.org/rosetta/stats/user.gz
2024-12-14, after the purge: 1,380,963 users
2024-12-17, today, after restore: 1,389,007 users | ∆ +8,044

Back as well:
Peak7100 https://boinc.bakerlab.org/rosetta/show_user.php?userid=151100
WalkerSister https://boinc.bakerlab.org/rosetta/show_user.php?userid=163718
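A delta between two copies of the user export could look roughly like the sketch below. It assumes the export is a `<users>` document with one `<user>` element per account and an `<id>` child, which I have not verified against the actual file. The two XML strings are tiny made-up samples that only reuse the two user IDs mentioned above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def user_ids(xml_text):
    """Collect the user ids from a user export (layout is an assumption)."""
    root = ET.fromstring(xml_text)
    return {int(user.findtext("id")) for user in root.iter("user")}


def delta(before_xml, after_xml):
    """Return (ids only in the newer export, ids only in the older export)."""
    before = user_ids(before_xml)
    after = user_ids(after_xml)
    return sorted(after - before), sorted(before - after)


# Made-up miniature exports, not real data from user.gz.
after_purge = "<users><user><id>1</id></user></users>"
after_restore = (
    "<users>"
    "<user><id>1</id></user>"
    "<user><id>151100</id></user>"
    "<user><id>163718</id></user>"
    "</users>"
)

added, removed = delta(after_purge, after_restore)
print("restored:", added)
print("still missing:", removed)

# The totals quoted in the post can be checked the same way.
profiles_restored = 9189 - 1363
users_restored = 1389007 - 1380963
print(profiles_restored, users_restored)
```

For the real files, the gzip'd export would have to be decompressed first (for example with the `gzip` module) and, given roughly 1.4 million users, probably parsed incrementally rather than loaded in one go.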
2)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 110232)
Posted 14 Dec 2024 by rsNeutrino
Post: This list, sorted by rising credits per day and starting in the deep negative, should contain at least a subset of the missing accounts (~5050 at this time):
https://www.boincstats.com/stats/-1/user/list/13/0/0

I suspect there was a massive purge, intentional or unintentional, and it's either not fully synchronized with external sites, or masked by other projects adding more credits daily than have been lost in total by Rosetta.

Using old Google results I found something else interesting: the alphabetical user list still contains old pages. Example:

- R -
User Profiles - Names beginning with R: Page 1 of 8
Last updated 14 Dec 2024, 12:50:08 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_R_1.html
User Profiles - Names beginning with R: Page 8 of 8
Last updated 14 Dec 2024, 12:50:08 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_R_8.html
User Profiles - Names beginning with R: Page 9 of 56
Last updated 13 Dec 2024, 12:50:39 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_R_9.html
User Profiles - Names beginning with R: Page 56 of 56
Last updated 13 Dec 2024, 12:50:39 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_R_56.html
-> ~ 86% of all users starting with "R" missing

And you can do that for every character:

- A -
User Profiles - Names beginning with A: Page 9 of 9
Last updated 14 Dec 2024, 12:50:08 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_A_9.html
User Profiles - Names beginning with A: Page 10 of 68
Last updated 13 Dec 2024, 12:50:40 UTC
https://boinc.bakerlab.org/rosetta/user_profile/profile_A_10.html
-> ~ 87% of all users starting with "A" missing

Edit: A more complete comparison of the magnitude of this purge - Profiles by country:
Snapshot February 2024:
http://web.archive.org/web/20240227133833/https://boinc.bakerlab.org/rosetta/user_profile/profile_country.html
∑ 10,317
Today:
https://boinc.bakerlab.org/rosetta/user_profile/profile_country.html
∑ 1,363

Edit 2: Deltas
Profiles per Country (sum)
2024-02-26 10317
2024-12-14 1363
∆: -8954
Users with credit
2024-12-13 1381403
2024-12-14 1373594
∆: -7809
Users with recent credit
2024-12-13 13749
2024-12-14 13628
∆: -121
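The percentages above follow from the page counts, assuming every page of the alphabetical list holds about the same number of profiles (an assumption, since the last page is usually only partly filled). A small sketch of that estimate, with a hypothetical helper name:

```python
def missing_share(remaining, before):
    """Fraction that disappeared when `before` items shrank to `remaining`."""
    return 1 - remaining / before


# Page counts taken from the profile index pages quoted above.
share_r = missing_share(8, 56)
share_a = missing_share(9, 68)

# Same estimate from the per-country profile sums, for comparison.
share_profiles = missing_share(1363, 10317)

print(f"R: {share_r:.1%}")
print(f"A: {share_a:.1%}")
print(f"profiles by country: {share_profiles:.1%}")
```

All three estimates land in the same range of roughly 86 to 87%, which suggests the purge hit profiles fairly evenly across the alphabet.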
3)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 110230)
Posted 14 Dec 2024 by rsNeutrino Post: Concerning the missing accounts, boincstats shows corresponding data: https://www.boincstats.com/stats/14/user/list/0/1000/0 The "last day" rank up numbers show all users at the 100th place rising 5 positions, corresponding to 5 users missing in the top 100, so 5% of user accounts went missing there. Similar scaling at the 1000th place, rising 65 positions corresponding to 65 users missing within the top 1000 = 6.5%. At the same time the credits of those accounts vanished from the total as well: https://www.boincstats.com/stats/14/project/detail/credit https://www.boincstats.com/stats/14/project/detail/lastDays Yesterday, the total credit balance of the project was 157,333,935,910. Today it's 151,216,469,526. So 6,117,466,384 credits (6.1 BILLION credits, 3.9% of total) vanished, as shown by the massive column going negative in the credits per day, today. |
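For anyone who wants to re-check the arithmetic, here is a minimal sketch using only the numbers quoted in this post. The variable names are mine.

```python
# Total project credit as reported by boincstats on the two days.
credit_yesterday = 157_333_935_910
credit_today = 151_216_469_526

lost = credit_yesterday - credit_today
lost_share = lost / credit_yesterday

# Rank jumps: users at place 100 rose 5 positions, users at place 1000 rose 65.
top_100_missing = 5 / 100
top_1000_missing = 65 / 1000

print(f"{lost:,} credits lost ({lost_share:.1%} of total)")
print(f"top 100: {top_100_missing:.1%}, top 1000: {top_1000_missing:.1%}")
```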
4)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 107906)
Posted 29 Dec 2022 by rsNeutrino
Post: Screenshots for comparison of the chains, no differences between platforms:
URL: boinc.bakerlab.org
[two screenshots, images not preserved]
URL: boinc-files.bakerlab.org
[one screenshot, image not preserved]
5)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 107905)
Posted 29 Dec 2022 by rsNeutrino
Post: I, too, was not able to download any WU files during the last batch around 15.-21.12.2022.
BOINC version 7.20.2 on Ubuntu 22.04.1, fully updated. update-ca-certificates did not help.

I did some analysis:

I noticed two different URLs used by Rosetta:
Root URL for general communication:
https://boinc.bakerlab.org/
Old download URL, index with folder and file structure visible:
https://boinc.bakerlab.org/rosetta/download/
It seems some time ago Rosetta switched to a new URL for downloads.
New download URL, index hidden, shown as offline on the status page:
https://boinc-files.bakerlab.org/

Both URLs seem to target the same underlying file system. As an example, the following two URLs lead to the same file:
https://boinc.bakerlab.org/rosetta/download/0/3stub_cyc_target_1cwa_00081_2_extract_B.zip
https://boinc-files.bakerlab.org/rosetta/download/0/3stub_cyc_target_1cwa_00081_2_extract_B.zip

Tests on Debian with curl:

old URL:
curl https://boinc.bakerlab.org/rosetta/download/0/3stub_cyc_target_1cwa_00081_2_extract_B.zip -vI
* Trying 128.95.160.156:443...
* Connected to boinc.bakerlab.org (128.95.160.156) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=Washington; O=University of Washington; CN=*.bakerlab.org
* start date: Dec 14 00:00:00 2022 GMT
* expire date: Dec 14 23:59:59 2023 GMT
* subjectAltName: host "boinc.bakerlab.org" matched cert's "*.bakerlab.org"
* issuer: C=US; ST=MI; L=Ann Arbor; O=Internet2; OU=InCommon; CN=InCommon RSA Server CA
* SSL certificate verify ok.

new URL:
curl https://boinc-files.bakerlab.org/rosetta/download/0/3stub_cyc_target_1cwa_00081_2_extract_B.zip -vI
* Trying 128.95.160.134:443...
* Connected to boinc-files.bakerlab.org (128.95.160.134) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not establish a secure connection to it. To learn more about this situation and how to fix it, please visit the web page mentioned above.

I could reproduce the same error on multiple independent Raspbian/Debian installations.
Firefox on Windows and Ubuntu did not complain and shows the certificate as verified by Internet2.
On Windows, OpenSSL fails on both the new and the old URL, probably because it's out of date (2021). wget and curl in PowerShell did not complain.
Edge on Windows downloads the zip without warning, but warns about a missing certificate when opening https://boinc-files.bakerlab.org and clicking on the lock symbol.
OpenSSL tests on Debian, each with the same results:
Raspbian (buster): OpenSSL 1.1.1n 15 Mar 2022
Ubuntu (22.04.1 LTS): OpenSSL 3.0.2 15 Mar 2022

new URL:
openssl s_client -connect boinc-files.bakerlab.org:443
CONNECTED(00000003)
depth=0 C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
verify return:1
---
Certificate chain
 0 s:C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
   i:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Dec 14 00:00:00 2022 GMT; NotAfter: Dec 14 23:59:59 2023 GMT
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIGpDCCBYygAwIBAgIQLX34hscHGmFRGzZSrcfHojANBgkqhkiG9w0BAQsFADB2
---snip---
Start Time: 1672334317
Timeout : 7200 (sec)
Verify return code: 21 (unable to verify the first certificate)
Extended master secret: yes

old URL:
openssl s_client -connect boinc.bakerlab.org:443
CONNECTED(00000003)
depth=2 C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
verify return:1
depth=1 C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
verify return:1
depth=0 C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
verify return:1
---
Certificate chain
 0 s:C = US, ST = Washington, O = University of Washington, CN = *.bakerlab.org
   i:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Dec 14 00:00:00 2022 GMT; NotAfter: Dec 14 23:59:59 2023 GMT
 1 s:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
   i:C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA512
   v:NotBefore: Sep 19 00:00:00 2014 GMT; NotAfter: Sep 18 23:59:59 2024 GMT
 2 s:C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
   i:C = GB, ST = Greater Manchester, L = Salford, O = Comodo CA Limited, CN = AAA Certificate Services
   a:PKEY: rsaEncryption, 4096 (bit); sigalg: RSA-SHA384
   v:NotBefore: Mar 12 00:00:00 2019 GMT; NotAfter: Dec 31 23:59:59 2028 GMT
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIGpDCCBYygAwIBAgIQLX34hscHGmFRGzZSrcfHojANBgkqhkiG9w0BAQsFADB2
--snip--
Start Time: 1672334384
Timeout : 7200 (sec)
Verify return code: 0 (ok)
Extended master secret: no

Here is a comparison with an external third-party online analysis from SSLLABS:
https://www.ssllabs.com/ssltest/analyze.html?d=boinc-files.bakerlab.org ("This server's certificate chain is incomplete. Grade capped to B.")
https://www.ssllabs.com/ssltest/analyze.html?d=boinc.bakerlab.org (Rating: A)
(Additional test: https://www.digicert.com/help/)

As you can see, even they show issues with the certificate chain for the new URL.

If I understand the results correctly, the old URL sends both the server certificate (1) and the intermediate certificate (2) (= issuer certificate) to the client (SSLLABS: Certification Paths: "1 - Sent by server, 2 - Sent by server"), completing the chain of trust with the root certificate (3) on the client.
The new URL only sends the server certificate, but not the intermediate certificate (SSLLABS: "2 - Extra download", from InCommon RSA Server CA / Internet2).
If the client does not already have this intermediate certificate in its trust store, which seems to be the case often but not always (comparing Firefox vs Windows vs Debian), the chain is broken and any connection to boinc-files.bakerlab.org fails. Maybe there are also some automated tricks and workarounds going on, like caching the intermediate after connecting once to boinc.bakerlab.org, so that the client can puzzle the chain together anyway.

As others already wrote, it is visible in Rosetta's statistics that something widespread isn't working. Compared with earlier batches, the last batch is noticeably slower in crunching and returning WUs:
https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_rosetta.html
https://www.boincstats.com/stats/14/project/detail/credit

So, I think what needs to be done is to recreate the configuration for / copy the intermediate certificate from boinc.bakerlab.org to boinc-files.bakerlab.org, so that it gets sent to clients as well.
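To make the difference between the two servers easier to spot, the "Certificate chain" section of the `openssl s_client` output can be counted programmatically. The following is only a rough sketch: the function names are hypothetical, and the two sample strings are shortened versions of the outputs in this post (subject and issuer reduced to the CN), not fresh captures.

```python
import re


def chain_indices(s_client_output):
    """Return the certificate indices listed under 'Certificate chain'."""
    indices = []
    in_chain = False
    for line in s_client_output.splitlines():
        stripped = line.strip()
        if stripped == "Certificate chain":
            in_chain = True
            continue
        if in_chain and stripped == "---":
            break  # end of the chain section
        match = re.match(r"(\d+) s:", stripped)
        if in_chain and match:
            indices.append(int(match.group(1)))
    return indices


def sends_intermediate(s_client_output):
    """True if the server sent more than just its own (leaf) certificate."""
    return len(chain_indices(s_client_output)) > 1


# Abbreviated samples based on the outputs quoted above.
NEW_URL = """---
Certificate chain
 0 s:CN = *.bakerlab.org
   i:CN = InCommon RSA Server CA
---
Server certificate
"""

OLD_URL = """---
Certificate chain
 0 s:CN = *.bakerlab.org
   i:CN = InCommon RSA Server CA
 1 s:CN = InCommon RSA Server CA
   i:CN = USERTrust RSA Certification Authority
 2 s:CN = USERTrust RSA Certification Authority
   i:CN = AAA Certificate Services
---
Server certificate
"""

print("new URL sends intermediate:", sends_intermediate(NEW_URL))
print("old URL sends intermediate:", sends_intermediate(OLD_URL))
```

With the real command, the captured text of `openssl s_client -connect host:443` would be passed in instead of the sample strings. A chain that lists only index 0 matches the "unable to get local issuer certificate" error seen on Debian.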
6)
Message boards :
Number crunching :
Large proteins
(Message 95071)
Posted 21 Apr 2020 by rsNeutrino Post: Here I have two tasks that run just shy of the watchdog timeout of 18 hours, without the BOINC:: CPU time message: http://boinc.bakerlab.org/rosetta/result.php?resultid=1156071420 http://boinc.bakerlab.org/rosetta/result.php?resultid=1156071425 I think that is too close for comfort. When systems are a little bit slower, all that work might still be wasted because of a too short timeout if it fails to generate some useful output... |
7)
Message boards :
Number crunching :
Rosetta 4.1+ and 4.2+
(Message 94798)
Posted 18 Apr 2020 by rsNeutrino
Post: There very well could be jobs that have long run times per model. I increased the watchdog to 10 hours. This will only take effect on new jobs.

After 12h 8min CPU time this one finished successfully with 1 decoy: 1153161145
Did the watchdog end it?
BOINC:: CPU time: 43719.1s, 14400s + 28800s[2020- 4-18 16: 7:48:] :: BOINC
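A quick way to read that log line is sketched below. It assumes that 28800s is the target run time (it matches `-cpu_run_time 28800` and an 8 hour target) and that 14400s is the extra allowance before the watchdog steps in. That reading is my interpretation of the line, not something documented here, and the function name is hypothetical.

```python
import re

LOG_LINE = "BOINC:: CPU time: 43719.1s, 14400s + 28800s"


def parse_cpu_line(line):
    """Extract (used, allowance, target) in seconds from the log line."""
    match = re.search(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s", line)
    if match is None:
        raise ValueError("not a BOINC CPU time line")
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


used, allowance, target = parse_cpu_line(LOG_LINE)
limit = allowance + target  # assumed watchdog limit in seconds

hours, remainder = divmod(int(used), 3600)
minutes = remainder // 60
over_limit = used >= limit

print(f"used {hours}h {minutes}min, limit {limit // 3600}h, over limit: {over_limit}")
```

Under that assumption the task ran a few minutes past the 12 hour mark, which would fit the watchdog ending it at the next opportunity.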
8)
Message boards :
Number crunching :
Rosetta 4.1+ and 4.2+
(Message 94741)
Posted 18 Apr 2020 by rsNeutrino
Post: From the couple i've had, it looks like the Watchdog will let it run for up to an extra 4 hours, and if it doesn't finish in that time then the next time it goes to Checkpoint, the Watchdog then ends it and it is considered finished

My target time is 8 hours. The task reached 12 hours, so it did run for 4 extra hours.
I had my eye on that task before it ended, and BOINC told me in the task properties that "CPU time since checkpoint" was equal to the "CPU time" of that task. That means not even one checkpoint was saved in the 12h since the start of that task.
The second task shows the same symptoms at the moment: CPU time 04:40:xx, CPU time since checkpoint 04:40:xx, Elapsed time 04:45:xx.

My understanding is that the watchdog is there to kill the task at target time + 4h, regardless of whether there are any results:
18.04.2020 06:26:41 | Rosetta@home | Output file rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_918009_202_0_r1479539452_0 for task rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_918009_202_0 absent

Also, the watchdog seems to look at "cpu seconds", i.e. "CPU time", not the slightly longer Elapsed time.

The point is, it seems to me that there are some models that are either buggy or need much more time to produce even a single result, and the watchdog doesn't like it. If the model can't be changed to fit in an 8h timeslot, raising the watchdog timeout could be a necessary option, which MAY already have happened in your and James' cases, but not in mine.
9)
Message boards :
Number crunching :
Rosetta 4.1+ and 4.2+
(Message 94735)
Posted 18 Apr 2020 by rsNeutrino
Post: Looks like there was a file transfer problem there as well.

Maybe because it didn't have any checkpoint or result file to upload, since it wasn't finished with its first decoy. That's probably also the reason for the long runtime: it HAS to finish one before shutdown, else it keeps going until the watchdog kills it.
10)
Message boards :
Number crunching :
Rosetta 4.1+ and 4.2+
(Message 94733)
Posted 18 Apr 2020 by rsNeutrino
Post: Task 1152764941 also ran into the 12 hour timeout, reaching 98% around 10 minutes before that.

<core_client_version>7.16.5</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.15_windows_x86_64.exe @rb_04_16_21806_21365_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 2 2 2 1 1 1 1 2 1 1 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_04_16_21806_21365_ab_t000__robetta.zip -frag3 rb_04_16_21806_21365_ab_t000__robetta.200.3mers.index.gz -fragA rb_04_16_21806_21365_ab_t000__robetta.200.4mers.index.gz -fragB rb_04_16_21806_21365_ab_t000__robetta.200.7mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 5000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1285868
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43301.9s, 14400s + 28800s[2020- 4-18 6:26:34:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 43301.9 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
06:26:34 (10032): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_04_918009_202_0_r1479539452_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>

Task 1153161145 is probably going to end up the same, 34% at 4h 10min.
11)
Message boards :
Number crunching :
Rosetta 4.0+
(Message 92290)
Posted 25 Mar 2020 by rsNeutrino
Post: For my rig (16 GB of ram) running on 4 cores. It consumes almost the same. 1,9-2,2 GB and does not cause problems, 8 gig can be somewhat small, if you use your browser with some open tabs next to Rosetta.

In my case BOINC is configured so that it can use 80% of 32 GB RAM at all times, running with 14 Rosetta threads on a Ryzen 1700 with 8 cores and 16 CPU threads available. 15 GB of RAM was sitting unused when the errors occurred.
Changed to 8 Rosetta threads for now...
12)
Message boards :
Number crunching :
Rosetta 4.0+
(Message 92254)
Posted 25 Mar 2020 by rsNeutrino
Post: I seem to get these errors en masse, particularly shortly after resuming work after a restart of my machine / BOINC, with and without having hit "pause for 1h" before shutdown.

ERROR: Assertion `copy_pose.size() == native.size()` failed.
MSG:the reference pose must be the same size as the working pose
ERROR:: Exit from: ......srcprotocolsprotein_interface_designfiltersRmsdFilter.cc line: 323
06:28:45 (14172): called boinc_finish(0)

Tasks:
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131628501
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131588448
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131588449
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131317068
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131002632
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131002634
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131002636
http://boinc.bakerlab.org/rosetta/result.php?resultid=1131002638

Edit: 3 more:
http://boinc.bakerlab.org/rosetta/result.php?resultid=1132223866
http://boinc.bakerlab.org/rosetta/result.php?resultid=1132223881
http://boinc.bakerlab.org/rosetta/result.php?resultid=1132223998
13)
Message boards :
Number crunching :
Stalled downloads
(Message 92253)
Posted 25 Mar 2020 by rsNeutrino Post: Same problem with the following file, stuck at 2.61/3.00 KB. 25.03.2020 06:24:39 | Rosetta@home | Temporarily failed download of 9v1nm_gb_c1007_9mer_gb_000847.zip: transient HTTP error |
©2025 University of Washington
https://www.bakerlab.org