Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Kaddaman Send message Joined: 22 Mar 20 Posts: 4 Credit: 1,087,271 RAC: 0 |
[snip] Yes, the charger is connected and the screen is off (although there are options to change that in the settings). I have now limited CPU usage to 4 of the 8 cores; this removed the error message and BOINC keeps working more consistently, although not perfectly. WUs still get stopped in the middle. This is especially bad when a WU is at maybe 50% after a few hours and I start using my phone: those hours of work get suspended. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
[snip]
BOINC also has settings for "Suspend when computer is on battery", "Suspend when computer is in use", and "Suspend when non-BOINC CPU usage is above". Have you turned all of these off? Do you disconnect the charger when you are using the phone? BOINC is usually able to restart properly after it is suspended. Does the phone shut down BOINC completely, instead of just suspending it? You might try limiting CPU use by BOINC to 3 cores, in order to leave more memory for phone use. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Having other applications get sluggish is part of why the default BOINC settings do not run at maximums. Memory contention and swap-file use can still slow things down, even though BOINC tasks run at low priority. By reducing the resources available to BOINC, you leave some for the active user. Rosetta Moderator: Mod.Sense |
spRocket Send message Joined: 23 Mar 20 Posts: 22 Credit: 3,008,018 RAC: 0 |
Seems to me, at least when you're dealing with 8+ cores and SMT, that leaving at least one thread open for overhead is a good idea, particularly if you're planning to use the machine as a normal desktop. I have a Ryzen 7 1700 cranking away (8 cores/16 threads) running Linux, and I find that it's best to use 14 threads for Rosetta, one thread and the GPU for GPUGRID, and one thread for overhead. Otherwise, the system bogs down noticeably. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Quote: Seems to me, at least when you're dealing with 8+ cores and SMT, that leaving at least one thread open for overhead is a good idea, particularly if you're planning to use the machine as a normal desktop. I have a Ryzen 7 1700 cranking away (8 cores/16 threads) running Linux, and I find that it's best to use 14 threads for Rosetta, one thread and the GPU for GPUGRID, and one thread for overhead. Otherwise, the system bogs down noticeably.
You only mentioned one of the possible bottlenecks limiting how many cores you can use; otherwise, correct. You may also need to watch the total amount of memory, the shared path from the CPU to main memory, and possibly others. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1736 Credit: 18,532,940 RAC: 14,716 |
Quote: Thanks - have tried both Memtest86 and Windows built in memtest on that host, both came back clean after several passes.
Your Ryzen 9 and i7-3630QM don't have enough RAM to run Tasks on most of your cores & threads. You need 1.3 GB of RAM per running Task (present Tasks are using much less, but others do use that much). Even if you have that much RAM, you then need to make it available for BOINC to use. In your account, under Computing preferences, Memory:
When computer is in use, use at most: 95%
When computer is not in use, use at most: 95%
Leave non-GPU tasks in memory while suspended: (not selected)
I have no issues running all cores & threads on my systems. The Rosetta applications run at idle priority, and they don't work the CPU nearly as hard as other applications do when they are running. Grant Darwin NT |
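The memory arithmetic in the post above can be sketched in a few lines of Python. This is only a rough estimate: it takes the 1.3 GB per Task figure from the post as a worst case and treats the "use at most" percentage as a simple cap on total RAM. The function name and the 16 GB example host are illustrative assumptions, not anyone's actual machine.

```python
import math


def max_tasks_by_memory(ram_gb: float, use_at_most_pct: float,
                        gb_per_task: float = 1.3) -> int:
    """Estimate how many Tasks fit in the RAM BOINC is allowed to use.

    gb_per_task defaults to the worst-case figure quoted in the thread.
    """
    usable_gb = ram_gb * use_at_most_pct / 100.0
    return math.floor(usable_gb / gb_per_task)


# Hypothetical 16 GB host at two different "use at most" settings.
print(max_tasks_by_memory(16, 95))  # cap raised to 95%
print(max_tasks_by_memory(16, 50))  # a more conservative 50% cap
```

If the result is lower than the number of threads you let BOINC use, some Tasks will likely wait for memory or the system will start paging.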
Tom M Send message Joined: 20 Jun 17 Posts: 97 Credit: 16,726,096 RAC: 36,642 |
Quote: Where do we post initialization and login questions, please?
https://boinc.bakerlab.org/rosetta/forum_help_desk.php
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Tom M Send message Joined: 20 Jun 17 Posts: 97 Credit: 16,726,096 RAC: 36,642 |
Quote: slightly OT: Windows10 seems to LAG when BOINC runs on ALL cores. why?
Sure. You need to leave at least one core/thread free on most operating systems, or the system will slow down its processing of all tasks. Often it is useful to have 2 idle threads. On a CPU with a high core/thread count you want to have 4 free. One rule of thumb is to set the percentage of CPU cores/threads that your projects use to 90%. On systems with fewer cores/threads I sometimes have to set it as low as 75% to get a thread free. Tom M
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
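The rule of thumb above is simple arithmetic, sketched here in Python. The function name is made up for illustration, and the thread does not say how BOINC rounds a fractional result, so treat the output as an upper bound and go slightly lower if a thread does not actually come free.

```python
def cpu_percent_for_free_threads(total_threads: int, free_threads: int) -> float:
    """Return the 'use at most N% of the CPUs' value that leaves
    free_threads threads idle out of total_threads."""
    if not 0 <= free_threads < total_threads:
        raise ValueError("free_threads must be between 0 and total_threads - 1")
    return 100.0 * (total_threads - free_threads) / total_threads


# (total threads, threads to keep free)
for total, free in [(4, 1), (8, 1), (16, 2), (32, 4)]:
    pct = cpu_percent_for_free_threads(total, free)
    print(f"{total} threads, {free} free: {pct:.1f}%")
```

On a 4-thread machine this gives 75%, which matches the "as low as 75%" remark for small systems; on 16 threads with 2 free it gives 87.5%.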
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1736 Credit: 18,532,940 RAC: 14,716 |
Quote: Sure. You need to allow at least one free core/thread on most operating systems or the system will slow down its processing of all tasks.
Whereas I'm running on all 6 cores & 12 threads with no impact on system performance or responsiveness and next to no detriment to processing times: only 2 min difference between Run time & CPU time over 8 hours of processing (and using all 6 cores on the system with HyperThreading turned off. No problems).
Quote: So far so good - as i let my machine run 24/7, i would like to use ALL EIGHT CORES for rosetta.
In your account, under Computing preferences, try:
Memory
When computer is in use, use at most: 95%
When computer is not in use, use at most: 95%
Leave non-GPU tasks in memory while suspended: (unselected)
Page/swap file: use at most 75%
You might have enough memory, but if you don't let BOINC use it then it will have problems. Grant Darwin NT |
MarkJ Send message Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
Quote: slightly OT: Windows10 seems to LAG when BOINC runs on ALL cores. why?
It's the HDD. You have tasks writing to disk and you are trying to copy a file at the same time. The read/write heads can't be in two places at once. It's not as noticeable with an SSD, but it still happens when tasks first start, as they unzip data into the slot directory. Another thing you could adjust is the checkpoint interval: setting it to 300 seconds reduces some of the regular writes to the disk. Given you have 16 threads on an i7 9900k and Rosetta can use up to 1.3 GB of memory per task, you could also be paging. 16 GB isn't enough to keep everything in memory, so something has to get paged out to disk. You should be able to see this in Task Manager. That might also explain why it doesn't happen when you set CPU usage to 88% in BOINC. If it is paging, you can use an app_config file to limit how many tasks run at once, leave BOINC set at 88%, or add more memory. BOINC blog |
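As a sketch of the app_config option mentioned above: to my understanding, BOINC's `app_config.xml` accepts a `<project_max_concurrent>` element that caps how many tasks from one project run at once. The Python below first redoes the paging arithmetic from the post and then builds such a file as a string. The limit of 10 is an arbitrary illustration, and where the file must be saved (the Rosetta project folder inside the BOINC data directory) should be checked against the BOINC documentation.

```python
import xml.etree.ElementTree as ET

THREADS = 16          # from the post
RAM_GB = 16           # from the post
GB_PER_TASK = 1.3     # worst-case figure quoted in the thread

# Worst case: every thread runs a 1.3 GB task at the same time.
worst_case_gb = THREADS * GB_PER_TASK
print(f"worst case {worst_case_gb:.1f} GB vs {RAM_GB} GB installed")
print("paging likely" if worst_case_gb > RAM_GB else "fits in RAM")


def build_app_config(max_concurrent: int) -> str:
    """Build a minimal app_config.xml limiting concurrent tasks for one project."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Illustrative limit only; pick a value that leaves RAM for the OS.
print(build_app_config(10))
```

Limiting concurrency this way leaves the CPU percentage setting alone, so other projects can still use the remaining threads.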
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
[snip] I run BOINC on Android Oreo (HTC 10) and Pie (HTC U12+) just fine. You can try out HTC Power To Give. It's based on a somewhat older version of BOINC and has had some classic BOINC features removed: it does not cache work (instead, it behaves more like FAH), it does not allow you to crunch on battery power, it only allows you to attach projects from its limited predefined list, and it does not support "no new tasks" or suspending individual tasks or projects, which baffles me. But it works well with Android (it even has a setting to prevent your battery from overheating, which is vital for longevity), so I live with the cut-down-ness. One issue with HTC Power To Give on Android Pie and later is that you still need to open Power To Give before you turn off the screen, or else it won't run. Oh, and you need to enable the persistent notification to prevent it from being killed. |
PorkyPies Send message Joined: 6 Apr 20 Posts: 45 Credit: 1,650,779 RAC: 0 |
Problem: Rosetta for Portable Devices queue has run dry. Question: Are the tasks the same between the 3 queues (Rosetta, Rosetta Mini and Rosetta for Portable Devices) or are they smaller for the Portable Devices? MarksRpiCluster |
EHM-1 Send message Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Anyone else seeing this behavior? (follow link above to message 92354) After resuming apparently normal function for 8-10 days, the screensaver has been displaying this blank template for the past day and a half. Eric |
WBT112 Send message Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
I can't upload this Task: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1029331860 . It has been stuck for two days now. It always uploads to 100% but does not disappear from the Transfers tab. I have restarted BOINC Manager etc. without success. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Quote: I can't upload this Task https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1029331860 . It is stuck for two days now. Always uploading to 100% but not disappearing from the transfer tab.
Your computer(s) are hidden, so I can't tell if I could help. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1736 Credit: 18,532,940 RAC: 14,716 |
Quote: I can't upload this Task https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1029331860 . It is stuck for two days now. Always uploading to 100% but not disappearing from the transfer tab.
While the file is uploading (not waiting, but with the elapsed time counting upward), select Activity, "Suspend network activity." Give it a second or two, then set it back to "Network activity based on preferences." This can often get a stuck transfer unstuck. Grant Darwin NT |
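For anyone without the Manager GUI at hand, the same suspend-then-resume toggle can probably be done with `boinccmd`, whose `--set_network_mode` option takes `always`, `auto`, or `never`. The Python wrapper below is a sketch under assumptions: `boinccmd` is on the PATH and can reach the local client (it may need the GUI RPC password), and the helper names are made up. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time
from typing import List


def toggle_network_commands(boinccmd: str = "boinccmd") -> List[List[str]]:
    """Commands equivalent to 'Suspend network activity' followed by
    'Network activity based on preferences'."""
    return [
        [boinccmd, "--set_network_mode", "never"],  # suspend
        [boinccmd, "--set_network_mode", "auto"],   # back to preferences
    ]


def unstick_transfer(pause_seconds: float = 2.0, dry_run: bool = True) -> None:
    """Suspend networking, wait briefly, then resume it."""
    suspend, resume = toggle_network_commands()
    for step, cmd in enumerate((suspend, resume)):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            if step == 0:
                time.sleep(pause_seconds)  # "give it a second or two"


unstick_transfer()  # dry run: prints the two commands without executing them
```

Call `unstick_transfer(dry_run=False)` while the upload is actively in progress, as described in the post above.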
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 404 Credit: 12,294,748 RAC: 2,092 |
Quote: I can't upload this Task https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1029331860 . It is stuck for two days now. Always uploading to 100% but not disappearing from the transfer tab.
Quote: While the file is uploading (not waiting, but with the elapsed time counting upward), select Activity, "Suspend network activity." Give it a second or 2, then set it back to "Network activity based on preferences." This can often get a stuck transfer unstuck.
Useful to know, thank you :-) |
Sven Send message Joined: 7 Feb 16 Posts: 8 Credit: 222,005 RAC: 0 |
Hi, I'm facing well-known problems with crunching Rosetta tasks. My BOINC client is freshly installed on a new computer and the Rosetta project was added shortly afterwards. Now I repeatedly receive error messages that tasks exited with zero status and no 'finished' file, see below. I had the same problem on several other computers in the past, so there seems to be a general problem with Rosetta and not with one particular computer. By the way: resetting the project does not help. As it stands, it makes no sense to continue crunching; it would be a waste of time and electric power. Thanks for your reply. And it would be great if the project itself could be fixed so that such issues can't happen anymore. Sven
****
10.04.2020 13:23:40 | | cc_config.xml not found - using defaults
10.04.2020 13:23:40 | | Starting BOINC client version 7.16.5 for windows_x86_64
10.04.2020 13:23:40 | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2s zlib/1.2.8
10.04.2020 13:23:40 | | Data directory: C:\ProgramData\BOINC
10.04.2020 13:23:40 | | Running under account rothsven
10.04.2020 13:23:42 | | CUDA: NVIDIA GPU 0: Quadro M2200 (driver version 378.98, CUDA version 8.0, compute capability 5.2, 4096MB, 3416MB available, 2122 GFLOPS peak)
10.04.2020 13:23:42 | | OpenCL: NVIDIA GPU 0: Quadro M2200 (driver version 378.98, device version OpenCL 1.2 CUDA, 4096MB, 3416MB available, 2122 GFLOPS peak)
10.04.2020 13:23:42 | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 630 (driver version 21.20.16.4574, device version OpenCL 2.1, 13037MB, 13037MB available, 211 GFLOPS peak)
10.04.2020 13:23:42 | | OpenCL CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 6.8.0.2, device version OpenCL 2.1 (Build 2))
10.04.2020 13:23:42 | | Windows processor group 0: 8 processors
10.04.2020 13:23:42 | | Host name: EAMODLES4PRLNQ2
10.04.2020 13:23:42 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz [Family 6 Model 158 Stepping 9]
10.04.2020 13:23:42 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
10.04.2020 13:23:42 | | OS: Microsoft Windows 10: Enterprise x64 Edition, (10.00.16299.00)
10.04.2020 13:23:42 | | Memory: 31.85 GB physical, 36.60 GB virtual
10.04.2020 13:23:42 | | Disk: 476.03 GB total, 356.30 GB free
10.04.2020 13:23:42 | | Local time is UTC +2 hours
10.04.2020 13:23:42 | | No WSL found.
10.04.2020 13:23:42 | | General prefs: from http://lhcathomeclassic.cern.ch/sixtrack/ (last modified 06-Mar-2015 20:46:54)
10.04.2020 13:23:42 | | Host location: none
10.04.2020 13:23:42 | | General prefs: using your defaults
10.04.2020 13:23:42 | | Reading preferences override file
10.04.2020 13:23:42 | | Preferences:
10.04.2020 13:23:42 | | max memory usage when active: 16305.90 MB
10.04.2020 13:23:42 | | max memory usage when idle: 29350.62 MB
10.04.2020 13:23:43 | | max disk usage: 100.00 GB
10.04.2020 13:23:43 | | max CPUs used: 2
10.04.2020 13:23:43 | | don't compute while active
10.04.2020 13:23:43 | | don't use GPU while active
10.04.2020 13:23:43 | | suspend work if non-BOINC CPU load exceeds 15%
10.04.2020 13:23:43 | | (to change preferences, visit a project web site or select Preferences in the Manager)
10.04.2020 13:23:43 | | Setting up project and slot directories
10.04.2020 13:23:43 | | Checking active tasks
10.04.2020 13:23:43 | climateprediction.net | URL https://climateprediction.net/; Computer ID 1502231; resource share 10
10.04.2020 13:23:43 | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 4092002; resource share 20
10.04.2020 13:23:43 | | Setting up GUI RPC socket
10.04.2020 13:23:43 | | Checking presence of 14 project files
10.04.2020 13:23:43 | | Suspending network activity - computer is in use
10.04.2020 13:27:04 | | Resuming network activity
10.04.2020 14:22:05 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
10.04.2020 14:22:05 | Rosetta@home | If this happens repeatedly you may need to reset the project.
10.04.2020 14:50:20 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
10.04.2020 14:50:20 | Rosetta@home | If this happens repeatedly you may need to reset the project.
10.04.2020 15:21:01 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
10.04.2020 15:21:01 | Rosetta@home | If this happens repeatedly you may need to reset the project.
10.04.2020 16:35:03 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
10.04.2020 16:35:03 | Rosetta@home | If this happens repeatedly you may need to reset the project.
10.04.2020 17:31:41 | Rosetta@home | Project requested delay of 7 seconds
10.04.2020 19:23:06 | | Suspending network activity - computer is in use
10.04.2020 19:26:47 | | Resuming network activity
10.04.2020 23:04:58 | | Suspending network activity - computer is in use
10.04.2020 23:09:18 | | Resuming network activity
11.04.2020 02:25:29 | climateprediction.net | No tasks sent
11.04.2020 02:25:29 | climateprediction.net | Project requested delay of 3636 seconds
11.04.2020 02:25:36 | Rosetta@home | Project requested delay of 7 seconds
11.04.2020 02:31:24 | Rosetta@home | No tasks sent
11.04.2020 02:31:24 | Rosetta@home | Project requested delay of 7 seconds
11.04.2020 02:31:36 | Rosetta@home | Project requested delay of 7 seconds
11.04.2020 02:44:52 | Rosetta@home | Task hgfp_dimer_5x_221_fold_SAVE_ALL_OUT_906873_888_0 exited with zero status but no 'finished' file
11.04.2020 02:44:52 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 03:24:33 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
11.04.2020 03:24:33 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 04:14:30 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
11.04.2020 04:14:30 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 06:42:26 | Rosetta@home | Task hgfp_dimer_5x_221_fold_SAVE_ALL_OUT_906873_888_0 exited with zero status but no 'finished' file
11.04.2020 06:42:26 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 06:45:33 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
11.04.2020 06:45:33 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 08:24:54 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
11.04.2020 08:24:54 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 09:53:15 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
11.04.2020 09:53:15 | Rosetta@home | If this happens repeatedly you may need to reset the project.
11.04.2020 11:33:00 | | Suspending network activity - computer is in use
******* |
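To see at a glance which tasks keep hitting this message, a log like the one above can be tallied with a short Python script. This is a sketch, not a diagnostic tool: the function name is made up, the sample holds only four entries copied from the log, and the pattern is deliberately not line-anchored so it also works on a log pasted as one long line.

```python
import re
from collections import Counter

# Matches the BOINC event-log message quoted in the post.
FAILURE = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def count_restarts(log_text: str) -> Counter:
    """Count 'exited with zero status' events per task name."""
    return Counter(FAILURE.findall(log_text))


sample = """\
10.04.2020 14:22:05 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
10.04.2020 14:22:05 | Rosetta@home | If this happens repeatedly you may need to reset the project.
10.04.2020 14:50:20 | Rosetta@home | Task hgfp_dimer_5x_373_fold_SAVE_ALL_OUT_907154_43_0 exited with zero status but no 'finished' file
11.04.2020 03:24:33 | Rosetta@home | Task hgfp_monomer_54_fold_SAVE_ALL_OUT_906079_885_0 exited with zero status but no 'finished' file
"""

for task, events in count_restarts(sample).most_common():
    print(events, task)
```

Counting by hand, the full log above contains 11 such events spread over three tasks (5, 4, and 2), so the same few tasks are being restarted repeatedly rather than many different tasks failing once.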
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: Now I receive again and again error messages that tasks are exited with zero status and no finish file, see below.
I don't see it on any of my machines (nine Ubuntu and one Windows 7 64-bit). It could be your anti-virus interfering with creating or accessing the file. |
WBT112 Send message Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
Quote: I can't upload this Task https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1029331860 . It is stuck for two days now. Always uploading to 100% but not disappearing from the transfer tab.
Quote: While the file is uploading (not waiting, but with the elapsed time counting upward), select Activity, "Suspend network activity." Give it a second or 2, then set it back to "Network activity based on preferences." This can often get a stuck transfer unstuck.
I tried this, but without success. BOINC gives me an error:
11.04.2020 13:00:13 | Rosetta@home | Started upload of conducting_fiber_fold_21_fold_SAVE_ALL_OUT_905803_166_0_r421462194_0
11.04.2020 13:00:41 | Rosetta@home | Temporarily failed upload of conducting_fiber_fold_21_fold_SAVE_ALL_OUT_905803_166_0_r421462194_0: transient HTTP error
11.04.2020 13:00:41 | Rosetta@home | Backing off 04:04:44 on upload of conducting_fiber_fold_21_fold_SAVE_ALL_OUT_905803_166_0_r421462194_0
11.04.2020 13:00:42 | | Project communication failed: attempting access to reference site
11.04.2020 13:00:43 | | Internet access OK - project servers may be temporarily down. |
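The "Backing off 04:04:44" entry says when the client will next retry on its own. A small Python sketch can turn that into a clock time. It assumes the dd.mm.yyyy timestamp format seen in this poster's log (the format depends on locale) and expects a single log entry as input; the function name is illustrative.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of the entry, then the HH:MM:SS back-off duration.
BACKOFF = re.compile(
    r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \|.*"
    r"Backing off (\d+):(\d{2}):(\d{2}) on upload"
)


def next_retry(entry: str) -> datetime:
    """Return the time at which the backed-off upload is due to be retried."""
    match = BACKOFF.search(entry)
    if match is None:
        raise ValueError("not a 'Backing off ... on upload' entry")
    logged_at = datetime.strptime(match.group(1), "%d.%m.%Y %H:%M:%S")
    hours, minutes, seconds = (int(match.group(i)) for i in (2, 3, 4))
    return logged_at + timedelta(hours=hours, minutes=minutes, seconds=seconds)


entry = ("11.04.2020 13:00:41 | Rosetta@home | Backing off 04:04:44 on upload of "
         "conducting_fiber_fold_21_fold_SAVE_ALL_OUT_905803_166_0_r421462194_0")
print(next_retry(entry))  # 13:00:41 plus 4 h 4 min 44 s
```

For the entry above, the automatic retry falls at 17:05:25 on the same day, so the transfer may look stuck for hours even if the server recovers sooner.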
©2025 University of Washington
https://www.bakerlab.org