Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jean-David Beyer Joined: 2 Nov 05 Posts: 188 Credit: 6,431,332 RAC: 5,665 |
There is a new batch of Beta work out. My latest of these are taking 2.3 GB to 2.5 GB each on my Linux machine. I allow 4 Rosetta tasks to run at a time. IIRC, they take 8 to 9 hours of wall clock time to run. Computer 5910575. Computer information: CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16; Coprocessors: ---; Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.22.1.el8_10.x86_64|libc 2.28]; BOINC version: 7.20.2; Memory: 128085.97 MB; Cache: 16896 KB; Swap space: 15992 MB; Total disk space: 488.04 GB; Free disk space: 479.37 GB |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
"You made me look. After about half an hour mine end up down around 700-800MB. But when they first start, after 5 min or so, they're still up around 2GB+ before dropping down again." When I wrote that I was on my 6-core machine. More odd is that now I've returned to my 8C/16T machine, I found it was running only 8 tasks plus 2 waiting for memory and only 1 WCG running, with 7 cores idle. Checking Task Manager, all Rosetta tasks were again only using 300-400MB each and I had a lot of RAM spare too. What happened to the other cores, I don't know. I suspended all Rosetta tasks, letting SiDock get its turn (16 tasks, very low RAM), then when they finished, priority went back to Rosetta and all 16 threads started running tasks again. Some funny business going on somewhere... At least I created a little space to download more of the remaining few Rosetta tasks available. Not many left now. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
"You made me look. After about half an hour mine end up down around 700-800MB. But when they first start, after 5 min or so, they're still up around 2GB+ before dropping down again." Got up to find only 6 Rosetta tasks running, plus 4 waiting for memory and 6 threads idle, while RAM is at 65% used and 5.5GB free. 5 of the tasks are using 310-440MB, only one using 2.122GB. This is very odd. Edit: This is getting even weirder. Having set NNT for all projects, I suspended all the <non-running> Rosetta tasks first and immediately the 6 idle threads started running 6 SiDock tasks. Those Rosetta tasks waiting for memory were still waiting for memory. Then I suspended all the running and waiting-for-memory Rosetta tasks, and all 16 threads are now running SiDock tasks. My intention, as before, was to free up my non-Rosetta offline cache to allow more Rosetta into the cache, then return to Rosetta tasks, restarting them one at a time, which seemed to allow all 16 threads to run Rosetta the last time. Why suspending unstarted tasks allowed the idle threads to be utilised, I have no idea. Never seen that before. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
"Got up to find only 6 Rosetta tasks running, plus 4 waiting for memory and 6 cores idle, while RAM is at 65% used and 5.5GB free" Very, very odd. Most of my Tasks are now using around 2GB of RAM, even after running for a few hours. I'd suggest checking your "When and how BOINC uses your computer" preferences. These are mine; the most likely to be causing issues are the Memory preferences. Is "Leave non-GPU tasks in memory while suspended" selected? And low "Use at most" preferences would also cause issues. Computing. Usage limits: Use at most 100% of the CPUs; Use at most 100% of CPU time. When to suspend: Suspend when computer is on battery: No; Suspend when computer is in use: No; Suspend GPU computing when computer is in use: No; 'In use' means mouse/keyboard input in last 3 minutes; Suspend when no mouse/keyboard input in last --- minutes; Suspend when non-BOINC CPU usage is above --- %; Compute only between: ---. Other: Store at least 0.35 days of work; Store up to an additional 0.01 days of work; Switch between tasks every 60 minutes; Request tasks to checkpoint at most every 60 seconds. Disk: Use no more than 30 GB; Leave at least 2 GB free; Use no more than 60% of total. Memory: When computer is in use, use at most 95%; When computer is not in use, use at most 98%; Leave non-GPU tasks in memory while suspended: No; Page/swap file: use at most 75%. Grant Darwin NT |
Klimax Joined: 27 Apr 07 Posts: 44 Credit: 2,800,788 RAC: 736 |
"Also only three days of deadline?" "Which has been the case for years now, and which is why you don't return a large amount of the work you download- you miss the deadline almost 50% of the time." While on the surface or at first pass you'd be correct, things are a bit more complex. (Side note: my question about deadlines shows how long ago I last paid full attention to the project.) First, variable memory consumption. Often several tasks are waiting for others to finish or to drop currently allocated memory. (It seems that, unlike others, I got tasks that keep all 2GB allocated even much later.) Second, BOINC or Rosetta has very weird accounting of remaining time to completion. Lots of tasks are now around the 50% mark, yet estimated time to completion is still 12 hours, while waiting tasks show 8 hours. (And yes, my target runtime is 12 hours.) All those cancelled were in any case from the beginning of the month, when the computer was configured to use all 20 cores and ran out of memory very fast, leaving lots of tasks waiting for memory and thus lots of cancellations. I have since changed the configuration to use only 10 cores, so the same situation shouldn't occur. BTW: My cache is configured for 1+1. As long as the computer is running 24h there should be a day of reserve. ETA: Looks like some tasks finally dropped in memory usage to 1GB and one to 700MB. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
"BTW: My cache is configured for 1+1. As long as the computer is running 24h there should be a day of reserve." Setting it that way may not give you what you might expect it to. If you want 2 days' worth, then set it to 2 + 0.01. Those additional days are just that: additional days. They will only be added on when the cache gets low enough to reach the "Store at least" value and it needs to be topped up. Then it will also top up the additional day, which will then run down again until the "Store at least" value is reached again. With it set to 1+1 you will get one day's worth, plus another day's worth, but then the cache will run down to just under 1 day's worth, then it will refill the 1 day and then refill the second additional day. With it set to 2 + 0.01, as it returns a Task it will download another to keep the cache at the 2 days level. Grant Darwin NT |
tgbauer Joined: 5 Jan 06 Posts: 10 Credit: 101,605,136 RAC: 74,291 |
Looks like Application "Rosetta Beta 6.06" tasks are using 2.5GB of RAM each! That becomes a bit inefficient when you have 128 cores in a computer and 128GB RAM (only 46/128 cores used). The ones before that and "Rosetta 4.20" consume less than 0.5GB (and all 128 cores are used). Is it possible to limit the RAM usage per task, so I can use all cores again? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
"Is it possible to limit the RAM usage per task, so can consume all cores again?" No. As mentioned in the previous posts, the high RAM usage is generally only for the first 30 min or so. After that, it drops down to 1GB or less (although there can be some Tasks where it goes up to 2GB per Task for a while later on, before dropping down again; my current Tasks after 4 hours are using around 800MB each). Grant Darwin NT |
Bill Swisher Joined: 10 Jun 13 Posts: 36 Credit: 33,183,499 RAC: 43,338 |
"No. As mentioned in the previous posts, the high RAM usage is generally only for the first 30 min or so. After that, it drops down to 1GB or less (although there can be some Tasks where it goes up to 2GB per Task for a while later on, before dropping down again; my current Tasks after 4 hours are using around 800MB each)." Then I have no choice but to NOT run Rosetta beta, and I don't see an option to turn the beta work off. If I could limit the number of them running (say, to 4) it would be possible. So I guess I'll leave Rosetta turned off. As I said, 16c/32t and 32GB of memory (I pretty much build all the computers nowadays with 32GB of memory), and I'm physically away from the computers for 5 months out of the year. Well, it was fun while it lasted. |
PMH_UK Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
You can limit the number running using an app_config.xml file in Rosetta's project directory. Create or amend it, then from the menu select Read config files. Paul. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
Well, at least it's been a while since the last time. boinc-process host is down again, so no Validation until it lives again. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
"Well, at least it's been a while since the last time." And now the download server has died as well. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
"Got up to find only 6 Rosetta tasks running, plus 4 waiting for memory and 6 cores idle, while RAM is at 65% used and 5.5GB free" "Very, very odd." My only settings that are more restrictive are Disk 50%, Memory in use 85% and Memory not in use 95%. More likely it's that I had a faulty RAM stick the other month, so I'm only running with 16GB RAM rather than 32GB. |
Klimax Joined: 27 Apr 07 Posts: 44 Credit: 2,800,788 RAC: 736 |
"BTW: My cache is configured for 1+1. As long as the computer is running 24h there should be a day of reserve." "Setting it that way may not give you what you might expect it to." After a quick verification on another project... damn, you are correct. I have been misunderstanding that option for the past 17 years or so. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
"BTW: My cache is configured for 1+1. As long as the computer is running 24h there should be a day of reserve." "With it set to 1+1 you will get one day's worth, plus another day's worth, but then the cache will run down to just under 1 day's worth, then it will refill the 1 day and then refill the second additional day." I also misunderstood it for a decade or more, but in the end I decided I <did> actually want somewhere between the minimum and maximum number of days and didn't really care where I was as long as I was in that general area. My target actually hovered between 1 and 1.5 days total back then, but more recently I've found it more appropriate to halve that, so no one project runs away with itself too far when Rosetta tasks sometimes become available. As they have in the last hour or so. The trouble is, as well as the boinc-process server being down, so is the download server, boinc-files.bakerlab.org, so the necessary file downloads are failing atm. Fingers crossed someone notices. |
tgbauer Joined: 5 Jan 06 Posts: 10 Credit: 101,605,136 RAC: 74,291 |
"high RAM usage is generally only for the first 30min or so. After that, it drops down to 1GB or less" This is not my experience. I have beta 6.06 tasks that are currently near 50% complete, and RAM usage is between 2.26GB and 2.50GB each (1.7GB to 2.2GB compressed). Sounds like limiting the Rosetta count is the only recourse, because the RAM to CPU ratio is so far off, I can't prioritize the more RAM-efficient tasks, and swapping causes tasks to take 10x longer. |
mmonnin Joined: 2 Jun 16 Posts: 59 Credit: 24,222,307 RAC: 83,030 |
"high RAM usage is generally only for the first 30min or so. After that, it drops down to 1GB or less" I agree, I have high RAM usage the entire time in Linux. A Win10 system had lower RAM usage than Linux, and I could run 100% R@H with 2GB RAM per thread while still using it as my primary desktop. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
"The trouble is, as well as the boinc-process server being down, so is the download server, boinc-files.bakerlab.org so the necessary files are failing atm." Looks like it was fixed 3 or 4 hours later. By the time I got back from work to do something about it, all the few remaining tasks had been snapped up again <sigh> |
Jonathan Joined: 31 Jul 24 Posts: 2 Credit: 109,443 RAC: 1,603 |
Hi, I had to abort a couple of Rosette beta workunits from my arm64 linux (RPi5) machine as they made the machines unresponsive. Possibly they were memory constrained with 4 cores and 4Gb of memory, but whatever the reason the machine became unresponsive to ssh 1584608775 1409829153 6297726 13 Oct 2024, 9:04:25 UTC 17 Oct 2024, 9:03:56 UTC Aborted 44.46 36.32 --- Rosetta Beta v6.06 aarch64-unknown-linux-gnu 1584607687 1409833590 6297726 13 Oct 2024, 9:04:25 UTC 17 Oct 2024, 9:03:56 UTC Aborted 4.78 0.00 --- Rosetta Beta v6.06 aarch64-unknown-linux-gnu Stderr output <core_client_version>7.20.5</core_client_version> <![CDATA[ <message> aborted by user</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.06_aarch64-unknown-linux-gnu @SETDB1_8UWP_boinc_fulldb_6hkEP2_0_3936.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 Using database: database_f5ae1de8e1/database Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... 
Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. </stderr_txt> ]]> |
Jonathan Joined: 31 Jul 24 Posts: 2 Credit: 109,443 RAC: 1,603 |
apologies duplicate |
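
As a rough sketch of the app_config.xml approach suggested in the thread: the Python below estimates how many tasks fit in the RAM BOINC is allowed to use, then writes that number into a `max_concurrent` entry. The app name `rosetta_beta`, the 0.9 usable-memory fraction and the helper names `tasks_that_fit` and `build_app_config` are assumptions for illustration, not values confirmed by anyone here. Check the `<app>` entries in your own client_state.xml for the real app name, and your own memory preferences for the fraction.

```python
import math
from xml.sax.saxutils import escape


def tasks_that_fit(ram_gb, usable_fraction, per_task_gb, threads):
    """Tasks that fit in the RAM BOINC may use, capped at the thread count.

    usable_fraction is an assumed stand-in for the "use at most N %" memory
    preference; per_task_gb is the worst-case footprint you want to plan for.
    """
    by_memory = math.floor(ram_gb * usable_fraction / per_task_gb)
    return max(1, min(threads, by_memory))


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting one app to max_concurrent tasks."""
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{escape(app_name)}</name>\n"
        f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Figures taken from the posts above; 2.5 GB is the reported peak per task.
    hosts = {
        "128 threads / 128 GB": (128, 128),
        "32 threads / 32 GB": (32, 32),
        "4 cores / 4 GB": (4, 4),
    }
    for label, (threads, ram_gb) in hosts.items():
        limit = tasks_that_fit(ram_gb, 0.9, 2.5, threads)
        print(f"{label}: run at most {limit} beta tasks")

    # "rosetta_beta" is a hypothetical app name; verify it before use.
    print(build_app_config("rosetta_beta", 4))
```

With 128GB, a 0.9 fraction and 2.5GB per task this gives 46, which lines up with the 46 of 128 cores reported above, though that may be coincidence. The generated text would be saved as app_config.xml in the Rosetta project directory and picked up with Read config files.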
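
The "Store at least" / "additional days" behaviour described in the thread can be illustrated with a toy model. This is only a sketch of that description (refill to minimum plus additional once the cache drops below the minimum), not BOINC's actual work-fetch logic; the function name `simulate_cache` and the fixed 0.01-day time step are assumptions made for the example.

```python
def simulate_cache(store_at_least, additional, days=10):
    """Toy model of the work cache, in hundredths of a day to avoid float drift.

    The cache drains steadily; whenever it falls below store_at_least it is
    refilled to store_at_least + additional. Returns (lowest, highest, fetches)
    with the two levels expressed in days.
    """
    low = round(store_at_least * 100)
    top = low + round(additional * 100)
    level = top
    lowest = highest = level
    fetches = 0
    for _ in range(round(days * 100)):
        level -= 1  # 0.01 days of work crunched
        lowest = min(lowest, level)
        if level < low:
            level = top  # top up both the minimum and the additional amount
            fetches += 1
        highest = max(highest, level)
    return lowest / 100, highest / 100, fetches


if __name__ == "__main__":
    for setting in [(1, 1), (2, 0.01)]:
        lowest, highest, fetches = simulate_cache(*setting)
        print(f"{setting[0]} + {setting[1]}: cache between {lowest} and "
              f"{highest} days, {fetches} refills in 10 days")
```

In this model a 1 + 1 setting swings between just under 1 day and 2 days with only a handful of large refills, while 2 + 0.01 stays within about 0.01 days of the 2-day level and tops up almost continuously, which is the difference the posts above describe.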
©2024 University of Washington
https://www.bakerlab.org