Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Instead of being an idiot, be informative and post a link or a quote. I gave you the answer, straight into your hand. And you still weren't able to figure it out. (I agree that information is hard to find sometimes - that is why you should look at the topics first. If you don't have time, I don't think you can expect someone else to do it for you.) |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What you said was that it has "something" to do with "December 25". Well, what exactly is "something"? And what does December 25 have to do with anything when this problem goes back to the 18th or even earlier? If you have time, go find a specific post and put the link here. I do that for people when I have time and know the specific answer from another post. Generalization does nothing for me, and that's what you offered: "something". Here is the specific, detailed info that I see on my single-project graph. All other projects have a 100% equal share with Rosetta (the default setting). Rosetta is losing credits and should be trying to make them up, but does not. Credit stays normal until the 17th, dips for a day, then corrects itself until the 21st, and after that it is all downhill. So how does this fit with your "something on the 25th"? The 17th and 21st predate the 25th. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Amazing!! 1/15/2019 7:24:26 PM | Rosetta@home | Sending scheduler request: To fetch work. 1/15/2019 7:24:26 PM | Rosetta@home | Requesting new tasks for CPU 1/15/2019 7:24:29 PM | Rosetta@home | Scheduler request completed: got 29 new tasks |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them. Examples can be seen in WUs 1062692421 and 1062687362, but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1. Any suggestions would be gratefully received. |
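A note on reading that status code: exit status 139 is consistent with the common POSIX shell convention of reporting 128 plus the signal number when a process is killed by a signal, so 139 lines up with the signal 11 quoted above (a segmentation fault on Linux). Below is a minimal Python sketch of that decoding, assuming the host follows this convention. `decode_exit_status` is a hypothetical helper for illustration, not part of BOINC or Rosetta.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean "killed by signal (status - 128)",
    which is the usual POSIX shell convention.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    # The status reported for the failing work units in this thread.
    print(decode_exit_status(139))
```

This only explains what the number means. It does not say why the task crashed, which is what the replies below try to narrow down.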
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each. |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each. Ouch, I know my free memory sometimes goes down to 2 or 3% but I hadn’t thought of it going negative. Thanks for the suggestion, I’ll look at getting another 8gb and maybe some more for the FX rig as well, that only has 4gb for the 4 cores. Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement? Yes, all the WCG ones that I know of have a pretty small memory requirement. The biggest is MIP, which is around 300 MB. But you probably aren't always running an equal proportion of Rosetta and WCG. The BOINC scheduler does strange things, and may give you all Rosetta once in a while. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,334,484 RAC: 3,524 |
I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them. You might check whether decreasing the time that work units can run on your computers to ten hours has any effect on this. |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them. The time was set to the default of 8 hours, so I don’t know why these were taking 12 hours anyway, but I reset it to 6 hours yesterday to try to reduce the loss when a WU errors out in this way. |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement? I monitor them fairly closely and I’m fairly sure the Ryzen was running 6 and 6 at the time. The FX had just come out of a period where it was running all Rosetta for a few days to catch up after running all WCG for a while, but the Ryzen completed 46 WCG WUs that day, which is about normal, and the split was equal every time I looked. |
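The memory arithmetic discussed in the posts above can be sketched in a few lines of Python. The per-task figures are the rough ones quoted in this thread (up to about 1 GB per Rosetta task, around 300 MB for the largest WCG task), and the task counts follow the "6 and 6" split mentioned for the Ryzen. They are estimates, not measured limits, and `peak_demand_mb` and `headroom_mb` are hypothetical helpers for illustration.

```python
def peak_demand_mb(rosetta_tasks: int, wcg_tasks: int,
                   rosetta_mb: int = 1024, wcg_mb: int = 300) -> int:
    """Rough worst-case memory demand of the running tasks, in MB.

    The defaults are the approximate per-task figures quoted in the thread.
    """
    return rosetta_tasks * rosetta_mb + wcg_tasks * wcg_mb


def headroom_mb(installed_gb: int, rosetta_tasks: int, wcg_tasks: int) -> int:
    """Memory left for the OS and everything else; negative means a shortfall."""
    return installed_gb * 1024 - peak_demand_mb(rosetta_tasks, wcg_tasks)


if __name__ == "__main__":
    # Ryzen with 8 GB, running 6 Rosetta and 6 WCG tasks.
    print("8 GB, 6 + 6 split:   ", headroom_mb(8, 6, 6), "MB spare")
    # Same machine if the scheduler hands out only Rosetta work.
    print("8 GB, 12 Rosetta:    ", headroom_mb(8, 12, 0), "MB spare")
    # After an upgrade to 16 GB.
    print("16 GB, 12 Rosetta:   ", headroom_mb(16, 12, 0), "MB spare")
    # FX with 4 GB for 4 cores, all Rosetta.
    print("4 GB, 4 Rosetta:     ", headroom_mb(4, 4, 0), "MB spare")
```

Under these assumptions even the even split leaves only a couple of hundred MB for the operating system on 8 GB, and an all-Rosetta batch does not fit at all, which is in line with the suggestion to add memory.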
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
OK, extra memory ordered for both machines so we’ll see if that sorts it. |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
Whilst I’m here, a silly question if I may. Is there any way of changing the View Tasks page from sorting by date sent to sorting by date returned? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Whilst I’m here, a silly question if I may. I take it you are asking about the website? ...and not the BOINC Manager tasks page. I am not aware of a way to define how to present the web page. Rosetta Moderator: Mod.Sense |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
Ok, thanks. It was worth asking :-) |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,221,036 RAC: 12,268 |
OK, extra memory ordered for both machines so we’ll see if that sorts it. It seems like Rosetta gets into a state where it consumes 1 GB+ per WU. I am running 35 WUs and there are always a couple taking over a GB. I watch the difference between CPU and run times and the swap used. As long as the swap used is very low, you are probably not running into memory problems. I tend to buy more GB of memory than threads. I originally got my 36-thread machine with 32GB and that was not enough. You can see that 19gb of my swap space has been used even though the machine has 64gb installed for the 36 threads. 19gb swap space used is concerning. Based on over a thousand jobs each, the credit difference between the 64-bit Rosetta WU and the Minirosetta 32-bit WU is negligible: 44.0 credits/CPU hr for Rosetta 4.08 and 45.7 credits/CPU hr for Minirosetta. top ic .... sorted by memory use.
top - 10:55:55 up 1 day, 18:24, 0 users, load average: 40.22, 36.72, 36.27
Tasks: 524 total, 37 running, 487 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 1.4 sy, 96.6 ni, 1.1 id, 0.0 wa, 0.4 hi, 0.1 si, 0.0 st
MiB Mem : 64090.7 total, 1051.2 free, 16283.5 used, 46756.0 buff/cache
MiB Swap: 32112.0 total, 32093.0 free, 19.0 used. 45874.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24590 boinc 39 19 1722808 1.5g 75400 R 98.3 2.5 219:01.25 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
25349 boinc 39 19 1384300 1.2g 75400 R 99.3 1.9 198:57.60 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
22988 boinc 39 19 838204 723668 75400 R 97.7 1.1 259:32.35 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_14_1536_1929__t000__0_C1+
24878 boinc 39 19 706928 592640 75784 R 99.3 0.9 211:30.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1805_1950__t000__0_C1+
15222 boinc 39 19 605140 491200 76104 R 99.0 0.7 459:54.12 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
20625 boinc 39 19 605492 491108 75400 R 99.3 0.7 319:46.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1808_1947__t000__0_C1+
16439 boinc 39 19 583112 468876 75784 R 97.4 0.7 428:23.20 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
24082 boinc 39 19 583664 465920 68044 R 99.3 0.7 231:28.63 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1805_1950__t000__ab_robetta -in:file:boinc_wu_zip+
17334 boinc 39 19 575680 457680 68620 R 99.3 0.7 404:59.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22280 boinc 39 19 543464 425512 68556 R 99.7 0.6 276:44.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
19901 boinc 39 19 533536 415428 68556 R 99.7 0.6 338:55.09 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22209 boinc 39 19 530860 413260 68236 R 99.3 0.6 278:15.90 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1808_1947__t000__ab_robetta -in:file:boinc_wu_zip+
25711 boinc 39 19 523612 408668 70668 R 99.3 0.6 190:02.19 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0001_fold_and_dock_flags -silent_gz -mute all -ou+
21481 boinc 39 19 521072 406132 70604 R 99.3 0.6 297:12.91 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0005_fold_and_dock_flags -silent_gz -mute all -ou+
17873 boinc 39 19 516024 398184 68620 R 99.3 0.6 391:55.17 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
15374 boinc 39 19 511956 394116 68556 R 99.3 0.6 455:50.04 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
30825 boinc 39 19 509260 391232 68620 R 99.3 0.6 78:42.82 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
14998 boinc 39 19 508228 390160 68620 R 98.0 0.6 465:27.01 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
18209 boinc 39 19 503324 385500 68620 R 99.0 0.6 383:28.39 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
31538 boinc 39 19 500516 382744 68620 R 99.3 0.6 60:22.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+ |
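To check the "a couple taking over a GB" observation against a listing like the one above, the RES column can be converted to a single unit. The sketch below is a minimal Python example, assuming `top`'s usual task-area convention that a bare number is KiB and a trailing `m`, `g` or `t` means MiB, GiB or TiB. `res_to_mib` is a hypothetical helper, and the sample values are the first four RES entries from the listing.

```python
def res_to_mib(field: str) -> float:
    """Convert one RES field from top's task list to MiB.

    Assumption: bare numbers are KiB; 'm', 'g' and 't' suffixes are
    MiB, GiB and TiB respectively.
    """
    units = {"m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024.0


if __name__ == "__main__":
    # First four RES values from the listing in the post above.
    sample = ["1.5g", "1.2g", "723668", "592640"]
    sizes = [res_to_mib(value) for value in sample]
    over_1_gib = sum(1 for size in sizes if size > 1024.0)
    print(f"tasks over 1 GiB: {over_1_gib} of {len(sizes)}")
    print(f"total resident:   {sum(sizes):.1f} MiB")
```

On this sample, two tasks exceed 1 GiB and the rest sit well below it, which matches the description in the post.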
Juha Joined: 28 Mar 16 Posts: 13 Credit: 705,034 RAC: 0 |
19gb swap space used is concerning. 19 GB would indeed be a lot of swap in use but haven't you got the unit wrong? It looks like 19 MB to me. |
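The unit question can be settled by reading the prefix on the swap summary line itself. Here is a minimal Python sketch, assuming the summary line has the shape shown in the `top` output quoted in this thread (a `KiB`, `MiB` or `GiB` prefix, then "Swap:", then the "used" figure). `swap_used_mib` is a hypothetical helper written for this illustration.

```python
import re


def swap_used_mib(summary_line: str) -> float:
    """Return the swap in use, in MiB, from a top swap summary line.

    Assumption: the line looks like
    'MiB Swap: 32112.0 total, 32093.0 free, 19.0 used.'
    with the unit given by the prefix before 'Swap:'.
    """
    match = re.search(r"(KiB|MiB|GiB) Swap:.*?([\d.]+) used", summary_line)
    if match is None:
        raise ValueError("not a top swap summary line")
    unit, used = match.group(1), float(match.group(2))
    factor = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}[unit]
    return used * factor


if __name__ == "__main__":
    line = "MiB Swap: 32112.0 total, 32093.0 free, 19.0 used."
    print(f"swap in use: {swap_used_mib(line)} MiB")
```

With the line from the post, the result is 19 MiB out of roughly 32 GiB of swap, so the machine is barely touching swap.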
Sid Celery Joined: 11 Feb 08 Posts: 2132 Credit: 41,490,422 RAC: 17,755 |
You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each. Sorry to be a bit late on this, but I did notice that around 13th March I had a task consuming 2.4 GB, with 14 GB of my 16 GB (total) RAM in use to run 8 tasks. I can't recall the tasks involved. Right now I'm back to my more usual level of 7.74 GB in use. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,221,036 RAC: 12,268 |
19gb swap space used is concerning. DOH! You are obviously correct. I got units of GB dancing in my head. |
Bryn Mawr Joined: 26 Dec 18 Posts: 397 Credit: 12,285,463 RAC: 11,195 |
As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday. I'll monitor going forward and report back. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday. Yes, we are back to 8086 tasks ready to send according to the server status page, which actually means 0 tasks ready to send. Maybe the admins should investigate what those 8086 tasks are and whether they are what eventually causes the issues. |
©2024 University of Washington
https://www.bakerlab.org