Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Falconet Joined: 9 Mar 09 Posts: 350 Credit: 1,105,396 RAC: 0 |
Quote: "Now my PCs have 'got 0 new tasks' of Python WUs, but in the queue there are over 5000 WUs..." I received 5 tasks just now. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
They have a mix of the old (8 GB) and new (3 GB) pythons. The old ones are "boinc_cages-Il", and the new ones are "aaxx-xxx". So maybe we will finish off the big ones at some point. |
trevG Joined: 5 Nov 13 Posts: 9 Credit: 687,475 RAC: 0 |
I've been trying to run these Pythons starting with 'aa..' and made some progress, after finding that Windows Update had changed my VM's 'manual start-up' setting in Services. No warning or anything useful. I had previously struggled with LHC because of this. Checking the operation showed that BoincMgr was freezing, and the same with the eFMer version, which I prefer for visibility and function but which lately has been less good at accessing BOINC because of password issues. I looked into the VirtualBox settings and saw that the RAM allocation was warning that the maximum setting would cause system lag. Problem: I couldn't change the setting for a while, until it suddenly went live. It seemed to reset back easily, though. While trying to sort this out, I lost half a dozen WUs after completing two OK, and since restarting I no longer get units, even though I had increased my disk allocation a lot before the two successful runs, as others had advised. I waited 24 hours after the failures to see if finishing other work would help the downloads, but no change. I wonder if the aborted work has led to blacklisting? Annoying, as I lost 4 days of GPUGrid work in the process and spent hours sorting out the VM, which is pretty tricky to use. Any thoughts, Maestros? I never had issues over the years with the old RAH clients.
World Community Grid 03-12-2021 16:03 02:20:28 (01:44:22) 03-12-2021 16:04 MCM1_0185708_4424_2 74.30 Reported: OK + 7.61 Mapping Cancer Markers DESKTOP-
Rosetta@home 03-12-2021 14:50 00:00:00 (00:00:00) 03-12-2021 14:51 aaam-SAR_pp-mPRO_pp-PIP-AMACBEN2_pp_12_2570219_1_0 0.00 Aborted (203) 1.03 rosetta python projects (vbox64) DESKTOP ***
GPUGRID 03-12-2021 14:39 *RUN TIME 02d,01:17:02 (02d,01:55:03) 03-12-2021 14:41 e7s224_e1s376p0f362-ADRIA_BanditGPCR_APJ_b0-0-1-RND6911_1 0.956C + 1NV 100.00 Aborted (203) 2.19 New version of ACEMD (cuda1121) DESKTOP-
Rosetta@home 03-12-2021 14:30 00:00:27 (00:00:00) 03-12-2021 14:31 aagb-PRO_pp-SAR-ACPenC13T-mB3PHG_pp_11_2697622_1_0 0.00 **Reported: Computation error (1,) 1.03 rosetta python projects (vbox64) DESKTOP-
World Community Grid 03-12-2021 12:59 02:37:38 (02:27:14) 03-12-2021 13:01
World Community Grid 03-12-2021 11:07 03:03:09 (02:36:18) 03-12-2021 11:09 OPN1_0095285_00301_0 85.34 Reported: OK + 7.21 OpenPandemics - COVID 19 DESKTOP-
Rosetta@home 03-12-2021 10:54 00:59:04 (00:54:55) 03-12-2021 10:56 aaam-mNMVAL_pp-FPR-mPHE-AMACBEN2_pp_4_2496370_1_0 92.97 ** Reported: Computation error (0,) 1.03 rosetta python projects (vbox64) DESKTOP-
Rosetta@home 03-12-2021 09:52 03:53:41 (03:47:29) 03-12-2021 09:56 aaas-SAR-VAL_pp-NMVAL-SUGA_pp_12_2559723_1_0 97.35 Reported: OK * 1.03 rosetta python projects (vbox64) DESKTOP-
Rosetta@home 03-12-2021 05:57 07:03:18 (06:52:27) 03-12-2021 05:57 aaap-PIP_pp-mNMPHE_pp-TIC-AMACBEN3_pp_0_2502770_1_0 97.44 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-
Rosetta@home 02-12-2021 22:06 00:00:00 (00:00:00) 02-12-2021 22:08 aagb-mNMVAL-mPHE-GPN-B3PHG_pp_12_2632874_1_0 0.00 **Aborted (203) 1.03 rosetta python projects (vbox64) DESKTOP-I
Rosetta@home 02-12-2021 21:15 06:44:38 (06:22:05) 02-12-2021 21:17 aaas-PHE_pp-mTIC_pp-NMVAL-mSUGA_1_2517870_1_0 94.43 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 14 |
Well, I use only <project_max_concurrent>, not <max_concurrent>. Jean - I thought you might be on to something, but it was a fluke. I put <name> in app_config and set project_max_concurrent to 2 and then to 1, but that is being ignored. Still running 3. I guess RAH will do what it wants no matter what commands you give it, short of cutting the resource share, which looks like the only way to get it down to 2 tasks, and maybe at 25% to get it to 1. That's because they still want to use/reserve 7,629 MB per task, which times 3 is 22,887 MB, or 90% of my memory. That is with the boinc_cages_IL tasks. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1978 Credit: 9,194,012 RAC: 3,787 |
Now my PCs have "got 0 new tasks" of Python WUs, but in the queue there are over 5000 WUs... Uh, I cannot understand. In the PC profiles the Python WUs are "disabled" (skip), but I didn't change this option. |
Jonathan Joined: 4 Oct 17 Posts: 43 Credit: 1,337,472 RAC: 0 |
You got blacklisted. |
Jonathan Joined: 4 Oct 17 Posts: 43 Credit: 1,337,472 RAC: 0 |
trevG, if your computer has only 4 GB of RAM, you don't have enough to run the VM tasks. |
trevG Joined: 5 Nov 13 Posts: 9 Credit: 687,475 RAC: 0 |
No wonder I struggled, but it did 2 WUs. But for my fault-finding glitches, it could have completed them all. The VirtualBox controls were far from intuitive. A warning would have helped, anyway. Maybe I should stop all RAH work now, as this seems to be the default? By the by, what is the minimum RAM needed for this work? I probably won't upgrade, as other work seems OK, apart from LHC. 17+ hours of completed work has been missed from validation, 3 completed units:
Rosetta@home 03-12-2021 09:52 03:53:41 (03:47:29) 03-12-2021 09:56 aaas-SAR-VAL_pp-NMVAL-SUGA_pp_12_2559723_1_0 97.35 Reported: OK * 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
Rosetta@home 03-12-2021 05:57 07:03:18 (06:52:27) 03-12-2021 05:57 aaap-PIP_pp-mNMPHE_pp-TIC-AMACBEN3_pp_0_2502770_1_0 97.44 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
(203) 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
Rosetta@home 02-12-2021 21:15 06:44:38 (06:22:05) 02-12-2021 21:17 aaas-PHE_pp-mTIC_pp-NMVAL-mSUGA_1_2517870_1_0 94.43 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB |
Jonathan Joined: 4 Oct 17 Posts: 43 Credit: 1,337,472 RAC: 0 |
The current VM work is using about 3 GB for the work units starting with 'aa'. The 'boinc_cages_IL' ones are running about 6 GB. You should be able to run the camb_boinc2docker VMs from Cosmology@home, as those only use 2 GB. You just need to set your preferences to 1 or 2 for Max # CPUs so it only assigns one or two cores to each work unit and VM. It's probably best if you just stick to conventional BOINC work units, as they behave better and share resources, since they run at a lower priority. VirtualBox tasks run at normal priority. |
trevG Joined: 5 Nov 13 Posts: 9 Credit: 687,475 RAC: 0 |
Yes, I did some CAH units OK recently, my first time using VM. I am annoyed about what happened on Rosetta, though, and I can't be the first. Also, a message telling people to stop trying would be a good idea, rather than letting them waste effort, as even the apparently good units were ignored in the end! |
Jean-David Beyer Joined: 2 Nov 05 Posts: 185 Credit: 6,137,747 RAC: 2,401 |
Quote: "Jean - I thought you might be on to something. But it was a fluke." Why does it work for me and not for you? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1631 Credit: 16,688,632 RAC: 8,774 |
Quote: "Its windblows 7, opteron16 that has gone funky" I still haven't seen any mention of what your BOINC disk settings actually are: Use no more than ? GB / Leave at least ? GB free / Use no more than ? % of total. Quote: "11 at once and it is getting the disk space moan again, except even after all that clear out, its got worse !!!??" 11 Python Tasks will require roughly 88 GB of disk space. Grant Darwin NT |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 14 |
Quote: "Jean - I thought you might be on to something. But it was a fluke." That is the ultimate question. Talk me through it again... you had all that directory stuff in the text, but I don't have that. What's the plain-text version of all that? I have the project name (BOINC agrees) and project_max_concurrent (no disagreement there), but it ignores those. I just aborted a stuck task and BOINC took 2 Pythons to start. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 14 |
Quote: "The current VM work is using about 3Gb for the work units starting with 'aa'. The 'boinc_cages_IL' is running about 6Gb. You should be able to run camb_boinc2docker VMs from Cosmology at home as those only use 2Gb. You just need to set your preferences to 1 or 2 for Max # CPUs so it only assigns one or two cores to each work unit and VM." Correction: Cages is 7,629.39 MB (I've been getting those quite a bit lately). |
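The memory figures quoted above (about 3 GB per 'aa' task, 7,629 MB reserved per boinc_cages_IL task, about 2 GB per Cosmology@home camb_boinc2docker VM) can be turned into a rough check of how many VM tasks a host can hold at once. This is a minimal sketch under stated assumptions: it models BOINC's limit simply as total RAM times the "use at most X%" memory preference, ignores the OS, other projects and VirtualBox overhead, and treats MB and MiB loosely. The function and dictionary names are hypothetical, not part of BOINC.

```python
# Rough sketch: how many VirtualBox tasks fit in a RAM budget.
# Assumption: usable budget = total RAM x BOINC's "use at most X%" preference;
# the OS, other tasks and VirtualBox overhead are ignored.

# Per-task figures as quoted in this thread (MB); keys are hypothetical labels.
TASK_MEMORY_MB = {
    "aa": 3 * 1024,                 # new 'aa...' Python tasks, about 3 GB
    "boinc_cages_IL": 7629.39,      # reservation Greg_BE reports per cages task
    "camb_boinc2docker": 2 * 1024,  # Cosmology@home VMs, about 2 GB
}


def reserved_mb(n_tasks: int, task_mb: float) -> float:
    """Memory reserved by n_tasks concurrent tasks of the same kind."""
    return n_tasks * task_mb


def max_concurrent(total_ram_mb: float, task_mb: float,
                   usable_fraction: float = 1.0) -> int:
    """Largest number of whole tasks whose reservations fit in the usable RAM."""
    if task_mb <= 0:
        raise ValueError("task_mb must be positive")
    usable_mb = total_ram_mb * usable_fraction
    return int(usable_mb // task_mb)


if __name__ == "__main__":
    # Greg_BE's case: three cages tasks at 7,629 MB each.
    print(reserved_mb(3, 7629))  # 22887
    # A 4 GB host letting BOINC use 50% of RAM (example value).
    print(max_concurrent(4096, TASK_MEMORY_MB["aa"], 0.5))                 # 0
    print(max_concurrent(4096, TASK_MEMORY_MB["camb_boinc2docker"], 0.5))  # 1
    # Jean-David Beyer's 63,902 MiB host with his 80% "in use" limit.
    print(max_concurrent(63902.2, TASK_MEMORY_MB["boinc_cages_IL"], 0.8))  # 6
```

Floor division is used because a task either has room to start or it does not; a partial fit counts for nothing.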
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Quote: "Its windblows 7, opteron16 that has gone funky" / "I still haven't seen any mention of what your BOINC disk settings actually are." It was in Message 103670, posted 2 Dec 2021, 21:51:18 UTC. I had gone as far as unticking all the disk space boxes to give it unlimited use of the disk. At the moment I have 89 GB free on drive C, and Rosetta is using 98 GB {greedy hog}. And this afternoon I reduced my work unit cache down to 0.1 + 0.1 just to see what happens [it was 1 + 0.5]. In the long run it works, so whatever it is moaning about, I can live with it. |
Jean-David Beyer Joined: 2 Nov 05 Posts: 185 Credit: 6,137,747 RAC: 2,401 |
Are you putting them in the right app_config.xml file? I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory?
[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>
I hope this is what you want.
cat /etc/redhat-release: Red Hat Enterprise Linux release 8.5 (Ootpa)
uname -r: 4.18.0-348.2.1.el8_5.x86_64
rpm -q boinc-client: boinc-client-7.16.11-3.el8.x86_64
Two terabyte hard drive [part of /etc/fstab]:
UUID=90309ec8-b1d3-4438-b983-f7ab121421a8 /D3P1 ext4 defaults 1 2
UUID=9bea9d6e-2f0d-4636-ac83-7fb9c0b2e108 /D3P2 ext4 defaults 1 2
UUID=8d57a006-8363-4dd0-abe4-d8f77fc15182 /var/lib/boinc ext4 defaults 0 0 <---<<<
UUID=840e6522-89ff-4b81-9efa-33d97df3fb1e /home/guest xfs defaults 0 0
UUID=04a403c2-6199-4936-9ab7-1fe2ed25377e /D3P6 xfs defaults 0 0
UUID=c667796a-a283-4db1-9bb9-7f6ef0de9982 /D3P7 xfs defaults 0 0
Disk: BOINC will use the most restrictive of these settings:
Use no more than 110 GBytes
Leave at least 0.5 GBytes free
Use no more than 85% of total
Memory [I have about 64 GBytes RAM]:
When computer is in use, use at most 80%
When computer is not in use, use at most 90%
Leave non-GPU tasks in memory when tasks are suspended
Page/swap file: use at most 50% [100% = 16 Gigabytes] [1.5 Megabytes used]
[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# ls -l
total 739232
82 Jul 1 15:29 app_config.xml <---<<<
4096 May 11 2020 database_357d5d93529_n_methyl
507570722 Nov 14 2020 database_357d5d93529_n_methyl.zip
0 Nov 14 2020 database_357d5d93529_n_methyl.zip.is_bad
352308 Nov 14 2020 LiberationSans-Regular.ttf
125232600 Nov 14 2020 rosetta_4.20_x86_64-pc-linux-gnu
123794008 Nov 14 2020 rosetta_graphics_4.20_x86_64-pc-linux-gnu
[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>
top - 23:33:34 up 23:55, 1 user, load average: 8.53, 8.55, 8.62
Tasks: 453 total, 10 running, 442 sleeping, 1 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.3 sy, 49.5 ni, 49.7 id, 0.0 wa, 0.1 hi, 0.1 si, 0.0 st
MiB Mem : 63902.2 total, 1888.6 free, 9855.8 used, 52157.7 buff/cache
MiB Swap: 15992.0 total, 15990.5 free, 1.5 used. 53269.4 avail Mem
BOINC processes running; n.b.: I have no Rosetta tasks at the moment:
PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND
11368 11311 boinc 39 19 T 1.4g 2.2 0.0 2 1260:36 /var/lib/boinc/projects/climateprediction.net/hadam+
11370 11310 boinc 39 19 R 1.3g 2.1 99.3 6 1334:37 /var/lib/boinc/projects/climateprediction.net/hadam+
11378 11376 boinc 39 19 R 1.3g 2.1 99.2 7 1343:57 /var/lib/boinc/projects/climateprediction.net/hadam+
11374 11309 boinc 39 19 R 1.3g 2.1 99.3 2 1268:12 /var/lib/boinc/projects/climateprediction.net/hadam+
89853 2604 boinc 39 19 R 759984 1.2 99.3 4 68:29.20 ../../projects/www.worldcommunitygrid.org/wcgrid_ar+
72940 2604 boinc 39 19 R 758500 1.2 99.0 1 350:46.26 ../../projects/www.worldcommunitygrid.org/wcgrid_ar+
90073 2604 boinc 39 19 R 153240 0.2 99.3 5 64:58.59 ../../projects/www.worldcommunitygrid.org/wcgrid_op+
93694 2604 boinc 39 19 R 113052 0.2 98.9 0 11:48.64 ../../projects/www.worldcommunitygrid.org/wcgrid_op+
91188 2604 boinc 39 19 R 73000 0.1 99.2 11 50:54.74 ../../projects/www.worldcommunitygrid.org/wcgrid_mc+
2604 1 boinc 30 10 S 37876 0.1 0.7 10 5521:29 /usr/bin/boinc
11309 2604 boinc 39 19 S 18564 0.0 0.0 10 1:17.60 ../../projects/climateprediction.net/hadam4_8.52_i6+
11310 2604 boinc 39 19 S 18468 0.0 0.0 10 1:19.52 ../../projects/climateprediction.net/hadam4_8.52_i6+
11376 2604 boinc 39 19 S 17536 0.0 0.1 14 0:43.15 ../../projects/climateprediction.net/hadam4_8.52_i6+
11311 2604 boinc 39 19 S 17084 0.0 0.0 10 0:27.48 ../../projects/climateprediction.net/hadam4_8.52_i6+ |
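For reference, BOINC's app_config.xml takes <project_max_concurrent> directly inside <app_config>, while per-application limits go in <app> elements, each holding a <name> and a <max_concurrent>. The <name> must be the application's short name as recorded in client_state.xml, not the display name shown in the Manager, and a bare <name> outside an <app> element is not part of the format. The sketch below writes such a file with Python's standard-library ElementTree (Python 3.9+ for `ET.indent`). The application name in the example is a hypothetical placeholder, since the real short name of the Rosetta Python app is not given in this thread.

```python
# Sketch: build a BOINC app_config.xml with the standard library (Python 3.9+).
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def build_app_config(project_max: Optional[int] = None,
                     app_limits: Optional[Dict[str, int]] = None) -> str:
    """Return app_config.xml text.

    project_max -> <project_max_concurrent>, a limit for the whole project.
    app_limits  -> one <app><name/><max_concurrent/></app> per application;
                   keys must be application short names from client_state.xml.
    """
    root = ET.Element("app_config")
    for app_name, limit in (app_limits or {}).items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    if project_max is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    ET.indent(root)  # pretty-print only; BOINC does not need the whitespace
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Project-wide limit of 3, as in Jean-David Beyer's file.
    print(build_app_config(project_max=3))
    # Hypothetical per-app limit: replace "example_python_app" with the real short name.
    print(build_app_config(app_limits={"example_python_app": 1}))
```

The output would be saved as app_config.xml in the project directory (projects/boinc.bakerlab.org_rosetta), after which the client can be told to pick it up via Options > Read config files in the Manager.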
Grant (SSSF) Joined: 28 Mar 20 Posts: 1631 Credit: 16,688,632 RAC: 8,774 |
Quote: "I had gone as far to untick all the disk space boxes to give it unlimited use of the disk" The boxes aren't tickable; they require values. And whichever value is most restrictive, in any one of the options, overrides the values in the other two when it comes to how much disk space is actually available. While people are now able to get and do more Python Tasks, I can see many more people leaving the project anyway. Rosetta 4.20 Tasks weren't exactly high payers, at roughly 340 Credits for 8 hours of work (depending on the system). For the same time frame, Python Tasks only pay out around 130, roughly 2.5 times less. Grant Darwin NT |
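The "most restrictive setting wins" rule that Jean-David Beyer and Grant describe can be made concrete: each of the three disk preferences yields a ceiling, and the smallest one applies. The sketch below is a simplified model under assumptions: all sizes are in GB, a "use no more than X GB" of 0 or None is treated as no limit, and the "leave at least X GB free" ceiling is taken as BOINC's current usage plus free space minus the reserve. The 8 GB per task is the figure behind Grant's estimate that 11 Python tasks need roughly 88 GB; the disk sizes in the example are hypothetical, and the function names are illustrative only.

```python
# Simplified model of BOINC's "most restrictive disk setting wins" rule.
# Assumptions: sizes in GB; max_used_gb of 0/None means "no limit".
from typing import Optional


def effective_disk_limit_gb(total_gb: float, free_gb: float, boinc_used_gb: float,
                            max_used_gb: Optional[float] = None,
                            min_free_gb: Optional[float] = None,
                            max_used_pct: Optional[float] = None) -> float:
    """Largest amount of disk BOINC may occupy under the three preferences."""
    # Physical ceiling: what BOINC already uses plus the remaining free space.
    limits = [boinc_used_gb + free_gb]
    if max_used_gb:                       # "Use no more than X GB"
        limits.append(max_used_gb)
    if min_free_gb is not None:           # "Leave at least X GB free"
        limits.append(boinc_used_gb + free_gb - min_free_gb)
    if max_used_pct is not None:          # "Use no more than X% of total"
        limits.append(total_gb * max_used_pct / 100)
    return max(0.0, min(limits))


def extra_tasks_that_fit(limit_gb: float, boinc_used_gb: float,
                         per_task_gb: float = 8.0) -> int:
    """How many more tasks of per_task_gb fit under the limit."""
    return max(0, int((limit_gb - boinc_used_gb) // per_task_gb))


if __name__ == "__main__":
    # Hypothetical 500 GB disk, 300 GB free, BOINC already using 50 GB,
    # with Jean-David Beyer's settings (110 GB / 0.5 GB free / 85%).
    limit = effective_disk_limit_gb(500, 300, 50, max_used_gb=110,
                                    min_free_gb=0.5, max_used_pct=85)
    print(limit)                            # 110
    print(extra_tasks_that_fit(limit, 50))  # 7
    print(11 * 8.0)                         # 88.0, Grant's estimate for 11 tasks
```

Because the "leave free" ceiling depends on current free space, it shrinks as other programs fill the drive, even when the BOINC preferences themselves are unchanged.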
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 14 |
Quote: "Are you putting them in the right app_config.xml file? I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory? /var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml <app_config> <project_max_concurrent>3</project_max_concurrent> </app_config>"
Yep.. for me it's in boinc data/projects/boinc.bakerlab.org_rosetta:
<name>rosetta python projects</name>
<app_config>
<project_max_concurrent>1</project_max_concurrent> (it could be 2 but not 3)
</app_config>
And it's back to 3 at a time again. Resource share is at 100%. I'm thinking I will have to bring it back to 50 again. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 14 |
Oh man... how dumb can I be. I forgot to change the file type from text to XML!!!! SMH. Now that it is an XML file, it works. Geez. That's what happens when you're in a rush in the middle of the night. |
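Greg_BE's fix points to a common pitfall: the file must be named exactly app_config.xml and contain well-formed XML with <app_config> as its single root element. On Windows, an editor saving as a text file can silently produce app_config.xml.txt, especially when extensions for known file types are hidden. A minimal checker, assuming it is pointed at the project directory (the function name is hypothetical), might look like the sketch below. The layout in Greg's earlier post, with a <name> element in front of <app_config>, would also be flagged, because an XML document may have only one root element.

```python
# Sketch: sanity-check a BOINC project directory's app_config.xml.
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Union


def check_app_config(project_dir: Union[str, Path]) -> List[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    project_dir = Path(project_dir)
    target = project_dir / "app_config.xml"
    problems: List[str] = []
    if not target.is_file():
        # Typical cause: the editor saved it as 'app_config.xml.txt'.
        lookalikes = sorted(p.name for p in project_dir.glob("app_config*"))
        if lookalikes:
            problems.append("app_config.xml not found; found "
                            + ", ".join(lookalikes) + " instead")
        else:
            problems.append("app_config.xml not found")
        return problems
    try:
        root = ET.parse(target).getroot()
    except ET.ParseError as exc:
        # e.g. "junk after document element" when two top-level elements exist
        problems.append(f"not well-formed XML: {exc}")
        return problems
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    return problems
```

Usage would be along the lines of `print(check_app_config("projects/boinc.bakerlab.org_rosetta"))`, run from the BOINC data directory.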
Jean-David Beyer Joined: 2 Nov 05 Posts: 185 Credit: 6,137,747 RAC: 2,401 |
Quote: "Oh man....how dumb can I be. I forgot to change the type from text to xml!!!!" That is not dumb. That is just part of being human, and you do not even need to be forgiven for that. There are too many people who seem to have lost their humanity. It seems to me that they end up in government and upper management of large corporations. I do not know your age, but I am more than three score and ten, and I can tell you it gets worse with age. So do not insult yourself. Forgive yourself if you must. |
©2024 University of Washington
https://www.bakerlab.org