1)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 110042)
Posted 11 days ago by Jean-David Beyer Post:
... they're sitting there (headless for the most part) doing nothing but running boinc.

My Linux machine runs lots of processes. It has 16 cores and 128 GBytes of RAM. As far as Boinc is concerned, the main process is the Boinc client. It uses very little RAM and very little CPU time. From time to time, the Boinc client sends a message to a Boinc server and asks for work. The server sends a reply, either complaining that it cannot find any work or describing, in a bunch of messages, the files the client should download. In the latter case, the client downloads the files to the proper places. Then, if the client has spare cores, it selects one and forks off a process to run the task. So let us say there are no Boinc tasks running and the client has just received a task from the Rosetta server. The client then forks off the Rosetta task.

top - 19:12:56 up 16 days, 8:42, 2 users, load average: 13.38, 13.32, 13.29
Tasks: 483 total, 14 running, 469 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.3 sy, 80.6 ni, 18.0 id, 0.0 wa, 0.2 hi, 0.1 si, 0.0 st
MiB Mem : 128086.0 total, 5047.0 free, 7395.4 used, 115643.6 buff/cache
MiB Swap: 15992.0 total, 15687.0 free, 305.0 used. 116733.0 avail Mem

    PID  PPID USER  PR NI S    RES %MEM %CPU  P     TIME+ COMMAND
3176351  2043 boinc 39 19 R 596760  0.5 99.0 13  10:12.79 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
3161135  2043 boinc 39 19 R 581420  0.4 99.3  2 121:33.16 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
3111703  2043 boinc 39 19 R 541240  0.4 99.1  9 455:40.07 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
3163687  2043 boinc 39 19 R 481148  0.4 99.2 10 103:13.41 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
3144411  2043 boinc 39 19 R 443480  0.3 99.1  6 233:56.51 ../../projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_x86_64-pc-linux-+
   2043     1 boinc 30 10 S  54708  0.0  0.1  8 300278:26 /usr/bin/boinc
3171024  2043 boinc 39 19 R  39676  0.0 99.3  4  48:38.05 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3166711  2043 boinc 39 19 R  39668  0.0 99.3 11  80:07.82 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3171561  2043 boinc 39 19 R  39584  0.0 99.2  0  44:34.46 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3167425  2043 boinc 39 19 R  39520  0.0 99.3  7  75:58.11 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3176944  2043 boinc 39 19 R  39172  0.0 99.4 15   5:33.72 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3172039  2043 boinc 39 19 R  39116  0.0 99.3  3  41:39.57 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc+
3176627  2043 boinc 39 19 R  36824  0.0 99.4  1   8:20.14 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_i686-pc-l+
3141011  2043 boinc 39 19 R  29944  0.0 99.3  5 258:04.99 ../../projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_x86_64-pc-linux-+

PID is the process ID; PPID is the PID of the process's parent. PID 1 is the first process started at boot, and every other process descends from it. One of the processes it starts is PID 2043, which is my Boinc client, /usr/bin/boinc. This client starts all the others.
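The parent/child relationship described in this post can be read straight out of the PID and PPID columns. The following is only a minimal sketch: the three rows are copied from the top listing in the post, while the function name `children_by_parent` and the assumption that the command is the twelfth whitespace-separated field are mine.

```python
from collections import defaultdict

# Three rows copied from the post's top listing.
# Assumed column order: PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND
SAMPLE = """\
3176351 2043 boinc 39 19 R 596760 0.5 99.0 13 10:12.79 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
3144411 2043 boinc 39 19 R 443480 0.3 99.1 6 233:56.51 ../../projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_x86_64-pc-linux-+
2043 1 boinc 30 10 S 54708 0.0 0.1 8 300278:26 /usr/bin/boinc
"""


def children_by_parent(listing):
    """Map each PPID to the list of (PID, command) pairs it started."""
    tree = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split(None, 11)  # at most 12 fields; the last is COMMAND
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        pid, ppid, command = int(fields[0]), int(fields[1]), fields[11]
        tree[ppid].append((pid, command))
    return dict(tree)


tree = children_by_parent(SAMPLE)
for pid, command in tree[2043]:
    print(pid, command)  # every science task lists the client (2043) as parent
```

Looking up `tree[1]` in the same dictionary returns the client itself, which matches the chain in the post: PID 1 starts the Boinc client, and the client starts the tasks.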
2)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109853)
Posted 13 Oct 2024 by Jean-David Beyer Post:
There is a new batch of Beta work out..

My latest of these are taking 2.3G to 2.5G each on my Linux machine. I allow 4 Rosetta tasks to run at a time. IIRC, they take 8 to 9 hours of wall clock time to run.

Computer 5910575
Computer information
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Coprocessors: ---
Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.22.1.el8_10.x86_64|libc 2.28]
BOINC version: 7.20.2
Memory: 128085.97 MB
Cache: 16896 KB
Swap space: 15992 MB
Total disk space: 488.04 GB
Free Disk Space: 479.37 GB
3)
Message boards :
Number crunching :
Shorter WU deadlines
(Message 109817)
Posted 7 Oct 2024 by Jean-David Beyer Post:
your deadlines are so short that my computer often gets about 90% of the work done then times out

Not my experience. Mine usually are set to take 8 hours, and they complete on time.

My main machine:
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Coprocessors: ---
Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.22.1.el8_10.x86_64|libc 2.28]
BOINC version: 7.20.2
Memory: 128085.97 MB
Cache: 16896 KB
Swap space: 15992 MB
Total disk space: 488.04 GB
Free Disk Space: 479.37 GB

Here are two recent ones:
1584293409  1409564820  5 Oct 2024, 20:24:18 UTC  7 Oct 2024, 9:35:43 UTC  Completed and validated  28,281.62  27,959.69  546.71  Rosetta v4.20 x86_64-pc-linux-gnu
1584293410  1409564822  5 Oct 2024, 20:24:18 UTC  7 Oct 2024, 6:04:54 UTC  Completed and validated  28,544.62  28,078.41  428.56  Rosetta v4.20 x86_64-pc-linux-gnu
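As a rough check on the "8 hours" figure, the two results quoted in this post can be converted to hours. This is a sketch under stated assumptions: the post gives no column headers, so I am assuming the first large number after "Completed and validated" is the run time in seconds and that the two timestamps are the sent and reported times; the helper name `summarize` is illustrative only.

```python
from datetime import datetime

FMT = "%d %b %Y, %H:%M:%S"  # timestamp layout used in the post (UTC suffix dropped)

# (sent, reported, assumed run time in seconds), copied from the two results
results = [
    ("5 Oct 2024, 20:24:18", "7 Oct 2024, 9:35:43", "28,281.62"),
    ("5 Oct 2024, 20:24:18", "7 Oct 2024, 6:04:54", "28,544.62"),
]


def summarize(sent, reported, run_time):
    """Return (run time in hours, sent-to-reported turnaround in hours)."""
    run_hours = float(run_time.replace(",", "")) / 3600
    turnaround = datetime.strptime(reported, FMT) - datetime.strptime(sent, FMT)
    return run_hours, turnaround.total_seconds() / 3600


for sent, reported, run_time in results:
    run_hours, turnaround_hours = summarize(sent, reported, run_time)
    print(f"run {run_hours:.2f} h, turnaround {turnaround_hours:.2f} h")
```

Under those assumptions both tasks ran for a little under 8 hours and were reported within two days of being sent.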
4)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109797)
Posted 1 Oct 2024 by Jean-David Beyer Post:
Ouch! At first glance this beta does not seem to play well with my processors.

The first three I got ran just fine. My machine is running Red Hat Enterprise Linux release 8.10 (Ootpa) using kernel 4.18.0-553.22.1.el8_10.x86_64

1583987342  1409332410  1 Oct 2024, 10:50:55 UTC  1 Oct 2024, 18:23:18 UTC  Completed and validated  26,366.61  25,872.01  369.98  Rosetta Beta v6.06 x86_64-pc-linux-gnu
1583987355  1409332393  1 Oct 2024, 10:50:55 UTC  1 Oct 2024, 18:26:58 UTC  Completed and validated  27,345.06  26,815.76  383.71  Rosetta Beta v6.06 x86_64-pc-linux-gnu
1583987363  1409332409  1 Oct 2024, 10:50:55 UTC  1 Oct 2024, 18:23:18 UTC  Completed and validated  26,759.98  26,251.39  375.50  Rosetta Beta v6.06 x86_64-pc-linux-gnu
5)
Message boards :
Number crunching :
Rosetta Beta 6.00
(Message 109586)
Posted 17 Aug 2024 by Jean-David Beyer Post:
Same problem on my Windows 11 machine.
6)
Message boards :
Number crunching :
Rosetta Beta 6.00
(Message 109584)
Posted 16 Aug 2024 by Jean-David Beyer Post:
Me too. It downloads 17 tasks at a time to me, and they all error out almost immediately. This is on my Linux machine. I have just set it to get no more tasks.

While not identical, they all fail in a similar way. Here is one of them:

Task: 1581485824
Name: hal_8a_q_hal_8aa_3jp3179_d157_2_0001_1_SAVE_ALL_OUT_2979144_1_1
Workunit: 1407312834
Created: 16 Aug 2024, 15:02:43 UTC
Sent: 16 Aug 2024, 15:08:33 UTC
Report deadline: 19 Aug 2024, 15:08:33 UTC
Received: 16 Aug 2024, 21:53:02 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x00000001) Unknown error code
Computer ID: 5910575
Run time: 19 sec
CPU time: 1 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 6.06 GFLOPS
Application version: Rosetta Beta v6.06 x86_64-pc-linux-gnu
Peak working set size: 120.74 MB
Peak swap size: 237.00 MB
Peak disk usage: 23.86 MB

Stderr output
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.06_x86_64-pc-linux-gnu @hal_8a_q_hal_8aa_3jp3179_d157_2_0001_1.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1949704
Using database: database_f5ae1de8e1/database

ERROR: Unable to find desired residue 'LEU' with variant 'SIDECHAIN_CONJUGATION'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'LEU' and base ResidueType. Was attempting to add new variant type 'SIDECHAIN_CONJUGATION'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out
11:11:55 (1424400): called boinc_finish(1)

</stderr_txt>
]]>
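The command line in the stderr output carries several numeric settings that are easier to read once pulled apart. The sketch below copies the flags from the post verbatim; the helper name `numeric_flags` is mine, and reading `-cpu_run_time` and `-boinc::cpu_run_timeout` as seconds is an assumption based only on the flag names and on the roughly 8-hour run times the author reports elsewhere.

```python
import shlex

# Flags copied from the "command:" line in the stderr output (executable path omitted).
COMMAND = (
    "@hal_8a_q_hal_8aa_3jp3179_d157_2_0001_1.flags -nstruct 10000 "
    "-cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 "
    "-mute all -database minirosetta_database "
    "-in::file::zip minirosetta_database.zip -boinc::watchdog "
    "-boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1949704"
)


def numeric_flags(command):
    """Collect every '-flag N' pair whose value is a plain integer."""
    tokens = shlex.split(command)
    flags = {}
    for name, value in zip(tokens, tokens[1:]):
        if name.startswith("-") and value.isdigit():
            flags[name] = int(value)
    return flags


flags = numeric_flags(COMMAND)
# Assumption: both values are in seconds.
print(flags["-cpu_run_time"] / 3600, "hours target run time")
print(flags["-boinc::cpu_run_timeout"] / 3600, "hours before the timeout")
```

If the seconds interpretation holds, the task was asked to run for 8 hours, which makes the reported 19-second run time a clear sign that it failed during start-up rather than partway through the computation.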
7)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109521)
Posted 3 Aug 2024 by Jean-David Beyer Post:
And I see one of our favourites, Universe@Home has also been down for a long time.

I dropped MilkyWay and Universe. I forget which was which. Both of them awarded way too much credit for the small amount of work done. And that was embarrassing. But that is not why I dropped them. One of them does only GPU work now, and I refuse to run Boinc on my GPU. I think the sponsor of the other died, or something like that, and they no longer send out work.
8)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109499)
Posted 30 Jul 2024 by Jean-David Beyer Post:
Just switch to WCG while you wait. (or any other BOINC project)

Denis has no work. CPDN has no work for Linux machines. WCG has work from time to time only for MCM1, but not the other four projects they pretend to support.
9)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109301)
Posted 27 May 2024 by Jean-David Beyer Post:
Jean-David Beyer wrote:

I could do it a little more efficiently than that. It ran with 4 modules for many months. The problem occurred when I added 4 new modules. So I took out all 4 new modules and the problem went away. I put in two of the new modules and still no problems. I moved those two new modules to the other two memory slots (was it a slot problem or a module problem?) and it still worked. So I put another new module in and it still worked, so it was probably the last new module. And so it proved.
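The elimination procedure in this post amounts to halving the set of suspect modules with each test. Here is a minimal sketch of that idea, assuming exactly one faulty module and a hypothetical `is_stable` check standing in for "install these modules and see whether the machine misbehaves". The module labels are invented, and the slot-versus-module check the author also performed is not modelled.

```python
def find_faulty(modules, is_stable):
    """Return the single faulty module by repeatedly testing half of the suspects."""
    suspects = list(modules)
    while len(suspects) > 1:
        middle = len(suspects) // 2
        first_half = suspects[:middle]
        if is_stable(first_half):
            suspects = suspects[middle:]  # fault must be in the untested half
        else:
            suspects = first_half  # fault is among the modules just tested
    return suspects[0]


# Hypothetical labels for the four new modules; "new-4" plays the bad one.
new_modules = ["new-1", "new-2", "new-3", "new-4"]
bad_module = "new-4"


def is_stable(installed):
    """Stand-in for a real stability test: stable unless the bad module is present."""
    return bad_module not in installed


print(find_faulty(new_modules, is_stable))
```

With four modules this takes two tests, which mirrors the sequence in the post: two new modules ran cleanly, a third ran cleanly, so the fourth was the culprit.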
10)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109293)
Posted 26 May 2024 by Jean-David Beyer Post:
Run Memtest on the system to see if there is an issue with the memory, most likely it's a lack of memory on the system as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM: 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use, will just make things worse.

I have not run memtest in years. Back when I had 8 GBytes of RAM and dual Intel Xeon processors, it took almost a day to run memtest. Now that this machine has 128 GBytes of RAM, it would probably take over a week to run it. This machine has 8 memory modules, and when I raised it from 64 GBytes to 128 GBytes it was a little flaky, but it was pretty easy to find which module it was and the RAM supplier replaced it free of charge.

As far as RosettaVS tasks are concerned, I have only two of them waiting to start out of 22 tasks on the machine. At times, half of the tasks on my machine have been RosettaVS, and sometimes two of them have run at the same time. Right now, I have one Rosetta 4.20 Task waiting to run.

The biggest tasks I have run have been CPDN like this one:

Task: 22317868
Name: oifs_43r3_bl_a4ck_2016092300_15_991_12212423_2
Workunit: 12212423
Created: 15 Apr 2023, 5:23:15 UTC
Sent: 15 Apr 2023, 5:24:02 UTC
Report deadline: 14 Jun 2023, 5:24:02 UTC
Received: 15 Apr 2023, 12:23:18 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 1511241
Run time: 6 hours 18 min 49 sec
CPU time: 6 hours 13 min 2 sec
Validate state: Valid
Credit: 1,813.14
Device peak FLOPS: 6.06 GFLOPS
Application version: OpenIFS 43r3 Baroclinic Lifecycle v1.11 x86_64-pc-linux-gnu
Peak working set size: 5,592.19 MB
Peak swap size: 5,930.79 MB
Peak disk usage: 1,277.90 MB
11)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109224)
Posted 3 May 2024 by Jean-David Beyer Post:
Server is still dead.

It seems mostly up for me.

top - 20:51:09 up 2 days, 12:17, 2 users, load average: 13.33, 13.65, 13.72
Tasks: 474 total, 14 running, 460 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.2 sy, 80.3 ni, 18.4 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
MiB Mem : 128074.1 total, 33544.1 free, 6219.7 used, 88310.2 buff/cache
MiB Swap: 15992.0 total, 15992.0 free, 0.0 used. 120200.2 avail Mem

    PID  PPID USER  PR NI S    RES %MEM %CPU  P     TIME+ COMMAND
 469545  2039 boinc 39 19 R   1.4g  1.2 98.8 15 287:51.62 ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-li+
 504299  2039 boinc 39 19 R 444456  0.3 98.8  5  26:25.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
 482867  2039 boinc 39 19 R 213072  0.2 98.6 13 208:50.81 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+
 504592  2039 boinc 39 19 R 212384  0.2 99.1  6  24:10.34 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+
   2039     1 boinc 30 10 S  73336  0.1  0.1  6  44900:08 /usr/bin/boinc
12)
Message boards :
Number crunching :
Whis is this new Rosetta Beta work?
(Message 109210)
Posted 29 Apr 2024 by Jean-David Beyer Post:
Is this a new form of the Rosetta program or just a new structure or new way of working on a protein?

This is very confusing. My db looks like this, which is way older than anything you discuss. But Rosetta Beta 6.05 tasks work just fine, both the 7a_hal_... ones and the newer ones. And nowhere on my machine do numbers like 2022.33 or 2024.16 appear:

/var/lib/boinc/projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database]$ ls -l
total 16
drwxr-xr-x.  2 boinc boinc    6 Mar 20  2020 additional_protocol_data
drwxr-xr-x. 13 boinc boinc 4096 Mar 20  2020 chemical
drwxr-xr-x.  2 boinc boinc   35 Mar 20  2020 citations
drwxr-xr-x.  3 boinc boinc   21 Mar 20  2020 external
drwxr-xr-x.  2 boinc boinc  103 Mar 20  2020 gpu
drwxr-xr-x.  3 boinc boinc  106 Mar 20  2020 input_output
drwxr-xr-x.  2 boinc boinc   74 Mar 20  2020 membrane
drwxr-xr-x.  6 boinc boinc   84 Mar 20  2020 protocol_data
drwxr-xr-x.  9 boinc boinc 4096 Mar 20  2020 rotamer
drwxr-xr-x. 10 boinc boinc 4096 Apr  5  2020 sampling
drwxr-xr-x. 15 boinc boinc 4096 Apr  5  2020 scoring
drwxr-xr-x.  8 boinc boinc  172 Mar 20  2020 sequence
drwxr-xr-x.  3 boinc boinc   20 Mar 20  2020 symmetry
drwxr-xr-x.  2 boinc boinc   50 Mar 20  2020 utilities
drwxr-xr-x.  6 boinc boinc  130 Mar 20  2020 virtual_enzymes
13)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109207)
Posted 28 Apr 2024 by Jean-David Beyer Post:
OK. I now have three of the RosettaVS_ Tasks and they are as you say. Since I have 128 GBytes of RAM, I do not expect problems.

Application: Rosetta Beta 6.05
Name: RosettaVS_SAVE_ALL_OUT_NOJRAN_KCa2_homology_fulldb_IGNORE_THE_REST_vF8nFW_8_1999_2977959_2
Estimated computation size: 80,000 GFLOPs
Virtual memory size: 1.19 GB
Working set size: 1.03 GB
Progress rate: 10.440% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu

Mine look like this. (This is one of them.) Is it one of the ones to which you refer? RosettaVS_ Tasks If not, how are the ones to which you refer identified?
14)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109203)
Posted 28 Apr 2024 by Jean-David Beyer Post:
in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks.

Mine look like this. (This is one of them.) Is it one of the ones to which you refer? RosettaVS_ Tasks If not, how are the ones to which you refer identified?

Application: Rosetta Beta 6.05
Name: 7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67
State: Running
Received: Fri 26 Apr 2024 02:37:53 AM EDT
Report deadline: Mon 29 Apr 2024 02:37:53 AM EDT
Estimated computation size: 80,000 GFLOPs
CPU time: 05:15:37
CPU time since checkpoint: 00:17:21
Elapsed time: 05:19:11
Estimated time remaining: 02:44:47
Fraction done: 65.667%
Virtual memory size: 468.18 MB
Working set size: 364.18 MB
Directory: slots/11
Process ID: 2777585
Progress rate: 12.240% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu
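The task properties in this post hang together arithmetically. A minimal sketch, assuming the remaining time can be approximated linearly from "Fraction done" and "Progress rate" (this is not a claim about how the Boinc client computes its own estimate, and the function names are mine):

```python
def hours_remaining(fraction_done_pct, rate_pct_per_hour):
    """Linear estimate: the work still to do divided by the current progress rate."""
    return (100.0 - fraction_done_pct) / rate_pct_per_hour


def as_hms(hours):
    """Format a duration in hours as HH:MM:SS, rounded to the nearest second."""
    total_seconds = round(hours * 3600)
    h, remainder = divmod(total_seconds, 3600)
    m, s = divmod(remainder, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


# Values taken from the task properties in the post.
remaining = hours_remaining(fraction_done_pct=65.667, rate_pct_per_hour=12.240)
print(as_hms(remaining))
```

The linear estimate comes out at roughly 2 hours 48 minutes, within a few minutes of the 02:44:47 shown in the listing. Added to the 05:19:11 already elapsed, it is consistent with the 8-hour run length discussed in the thread.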
15)
Message boards :
Cafe Rosetta :
No Work Available
(Message 109181)
Posted 25 Apr 2024 by Jean-David Beyer Post:
That said, it seems quite hard to believe no-one there had already noticed - especially when the whole site goes down <sigh>

It is now 19:24 EDST and servers all seem to be up.
16)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109157)
Posted 23 Apr 2024 by Jean-David Beyer Post:
I've got 15 tasks returned after deadline and they've all validated and credited.

So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task, and before it returns its result or times out it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy.
17)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 109106)
Posted 12 Apr 2024 by Jean-David Beyer Post:
Anyone else notice the entire website went down again today?

Yes, but it is up right now and I just got a bunch of tasks -- Rosetta Beta 6.05.
18)
Message boards :
Number crunching :
Vote for BOINC!
(Message 109040)
Posted 26 Mar 2024 by Jean-David Beyer Post:
BOINC is a finalist for a notable award, and needs votes (by Sunday):

Done.
19)
Message boards :
Number crunching :
Why no work on my computer since Feb 17, 2024
(Message 109037)
Posted 25 Mar 2024 by Jean-David Beyer Post:
There are however far more projects than those four you mentioned doing real science, so perhaps you should try to find one or two more as basically you have only one reliable project in your list (Einstein). Milkyway, which you have crunched for in the past, is doing pretty well recently if you don't mind multithreaded WUs.

I ran some project that did multithreaded work. IIRC it ran four threads at a time. But it confused my Boinc client and OS (I forget the details). So I stopped taking those. And some project does multithreaded tasks but I have to set up a special environment for it and I am not willing to do that.

Ages ago, I was running Seti@home, ClimatePrediction, WCG, and Malaria. I know Seti@home quit, followed by Malaria. But at the time they provided plenty of work for the computers I was running. One had two hyperthreaded Xeon processors. I added Rosetta at some point to keep the work up. Most recently I added Universe and MilkyWay. These did not interest me all that much. One of them now sends out only GPU work; the other's manager died and it is sending out no more work. If CPDN ever starts sending out Linux work (tasks usually take a week or so to run), and if WCG ever get their act together, I should be fine.
20)
Message boards :
Number crunching :
Why no work on my computer since Feb 17, 2024
(Message 109030)
Posted 25 Mar 2024 by Jean-David Beyer Post:
So if he wants to run 4 or more projects simultaneously, which is in general nothing wrong, I think the best results he will get with 0 additional days (once everything settles down), as that eliminates any piggybacking of requests for new work on scheduler requests to report completed tasks, so BOINC will always make a separate request for work and when doing that, choose exactly the project that should get more work to respect the resource settings.

My machine has 16 cores, but I currently allow only 13 for Boinc work. I have 128 GBytes of RAM, and 450 GBytes of disk allocated to the Boinc client. I am running ClimatePrediction, WCG (all 5 tasks), Denis, Rosetta, and Einstein. Settings are for 1 day's work and 1 day additional. Trouble is that none of them have any work to supply and my machine is running only 4 tasks: 2 Einstein and 2 Rosetta.

I am beginning to think that WCG is completely mismanaged and has been for two years or so. Of their 5 "active" projects, only MCM1 ever delivers work, and I have none. CPDN has supplied me no work since last June. Denis supplies work only about once a week, and I cannot allow a week's worth of work from them because their deadline is only about 3 days.

My guess is that, other than server mismanagement, researchers are just using less and less distributed computing, even though it is free, and running on large local computing systems instead. And as these large local systems get larger, faster, and cheaper, the whole Boinc system will just wither away.
©2024 University of Washington
https://www.bakerlab.org