Message boards : Number crunching : Low Credits RAC for 8-Core PC?
Author | Message |
---|---|
Corhal Send message Joined: 22 Nov 09 Posts: 8 Credit: 11,272 RAC: 0 |
Thanks transient, that clears it up for me. :) The AMD only has a cache of 512 KB while the Intel has 1024 KB! |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So this is a case where the raw computing speed of the CPU is stellar (hence the high credit claim), but the actual results produced when running Rosetta are less than stellar, because (apparently) the contention for L2-cache is preventing full use of the CPUs. The CPUs find themselves waiting for things to arrive in L2 before they can proceed with execution. The BOINC benchmarks (which is what the claim is based on) are not memory intensive at all, and so they do not factor in this vital resource. This sort of dichotomy between a measurement and reality is exactly why so many DIFFERENT benchmarks exist for measuring system capabilities. I also just wanted to point out that it is extremely difficult to draw concrete inferences about one machine by comparing it to another. This is because there are always so many different types of work in progress on Rosetta@home. And so even if you found WUs from the same protein, doing the same type of search protocol, you will still find variation from one model to the next. Since the tasks are always unique sets of models, the work the two machines were performing is never truly identical. One model may have 10% (or more, 100% in some isolated cases) variation in runtime compared to other models of the same task. Rosetta Moderator: Mod.Sense |
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
robertmiles, I have no way to address your question about a memory leak that Windows Task Manager doesn't show. In fact, since no Rosetta task runs for weeks, I don't even follow the logic of how you conclude that Rosetta has the magic power to override the operating system and BOINC core client that it runs within to result in memory consumption beyond the established limits. I mean how do you conclude that one of your OTHER applications on that machine isn't performing the same magic? A memory leak in minirosetta was discussed by others a few weeks ago. When I add up all the memory usage that the Windows Task Manager shows as belonging to specific tasks, I often get significantly less than it shows as the total physical memory in use. I DON'T have any good way to tell exactly what program is responsible, but I often guess it's the one with the already reported memory leak. So, I believe what may have happened is that your machine ran happily with many tasks for many days and then landed a few with high memory requirements and the machine's performance for the rest of your applications began to suffer. This does not mean there is a memory leak, simply that those tasks require more memory than the ones you had run previously. Any particular reason why its performance STAYS poor even after the workunits it's currently running finish, and even after a few reboots? Any particular reason why the low priority tasks under BOINC seem to get higher priority for memory usage than the higher priority tasks the user starts? As for idle time, I let BOINC control most of the CPU usage nearly 24/7, but often want to run Windows Mail or Internet Explorer in addition. Often, BOINC seems to seize and keep enough memory that even the BOINC manager is very slow to respond, even after I set BOINC not to use more than 40% of the total physical memory. 
Any particular reason why suspended BOINC workunits seem to keep the same control of their memory as workunits still running, even if BOINC is set not to keep all suspended workunits in memory? |
Send message Joined: 3 Nov 05 Posts: 1834 Credit: 124,260,318 RAC: 9 |
robertmiles, I have no way to address your question about a memory leak that Windows task manager doesn't show. In fact, since no Rosetta task runs for weeks, I don't even follow the logic of how you conclude that Rosetta has the magic power to override the operating system and BOINC core client that it runs within to result in memory consumption beyond the established limits. I mean how do you conclude that one of your OTHER applications on that machine isn't performing the same magic? I've never seen anything better than Process Explorer. HTH Danny |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
robertmiles, what you are describing is basically the erratic behavior that a memory constrained system is going to exhibit. The user application gets done with what it had to do, nothing of any higher priority needs the CPU, the CPU goes to the low priority BOINC tasks, in order for them to run page faults occur, thus kicking out some of the pages needed by the user application, then later when the user demands response from the application again, some of its pages are swapped out and must be faulted back in for it to respond to the request. There is no priority with regard to memory. Priority is for CPU time, and the task that currently got the CPU time will generate the page faults necessary in order for it to run. Think of this as the operating system getting what the task needs in order for it to run. If your period of time here without Rosetta running has improved things on your machine, then that would tend to confirm that memory was over committed. And the question then becomes whether Rosetta is using any more memory than it should be. The only mention of a leak that I recall was yours, perhaps you could provide a link and refresh our memories? If there was another, I don't believe it provided enough information to even conclude whether the author understood the difference between a leak, and a task starting out using very little memory, and then watching memory usage increase as the task gets started. Just because a task is using 100MB after 30 seconds and 150MB after 60 seconds and 200MB after 90 seconds is no indication of a leak. It's just an indication of a task that requires 200MB to run, getting started when running under low priority. Rosetta Moderator: Mod.Sense |
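The distinction drawn above between a startup ramp and a leak can be sketched in a few lines. This is only an illustration: the function name `has_plateaued`, the tolerance, and the sample values beyond the 100/150/200 MB figures from the post are assumptions, and a real diagnosis would need many more samples taken over a much longer period.

```python
def has_plateaued(samples_mb, tolerance_mb=5.0, plateau_len=3):
    """Return True if the most recent samples are flat within a tolerance.

    A task that ramps up to its working size and then stays there will
    plateau. A task whose usage keeps climbing for as long as it runs
    will not. Too few samples means no conclusion can be drawn.
    """
    if len(samples_mb) < plateau_len:
        return False  # not enough data to judge either way
    tail = samples_mb[-plateau_len:]
    return max(tail) - min(tail) <= tolerance_mb


# Hypothetical readings taken every 30 seconds.
startup_ramp = [100, 150, 200, 200, 201, 200]   # levels off near 200 MB
still_growing = [100, 150, 200, 250, 300, 350]  # no plateau yet

for name, samples in [("ramp", startup_ramp), ("growing", still_growing)]:
    print(name, "plateaued:", has_plateaued(samples))
```

Note that the first three readings alone (100, 150, 200) do not plateau either, which matches the point made in the post: rising usage in the first 90 seconds says nothing about a leak.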
Corhal Send message Joined: 22 Nov 09 Posts: 8 Credit: 11,272 RAC: 0 |
So this is a case where the raw computing speed of the CPU is stellar (hence the high credit claim), but the actual results produced when running Rosetta are less than stellar, because (apparently) the contention for L2-cache is preventing full use of the CPUs. The CPUs find themselves waiting for things to arrive in L2 before they can proceed with execution. Hmm, that's rather interesting. It's a shame the CPU usage is 100% and yet they can't crunch to their full ability. Makes me wish I had bought Intel 2xQuad Cores with the 1024KB cache! Never thought I'd need it that much. The Work Units (per processor) take about the same time as on my 1024KB Intel Dual Core and yet only yield 5-15C (per WU) compared to the 35-50C the Intel gets. I always assumed a completed Work Unit is the same no matter what PC it's done on. Does this imply that a WU completed on the machine with the bigger cache is done more thoroughly, or produces more or better results, or something like that? (Hence the more credits per WU) And I realize that it's difficult to compare two PCs when crunching due to the many, many factors playing a role. But there's still something you can tell from the comparison if there's a range of 5-15C and a range of 35-50C per Work Unit over a longer period of time :) Is there any chance that Rosetta might work better on machines with a smaller cache size some time in the future? Or is that simply not possible with the kind of work Rosetta is doing? |
mikey Send message Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
As for idle time, I let BOINC control most of the CPU usage nearly 24/7, but often want to run Windows Mail or Internet Explorer in addition. Often, BOINC seems to seize and keep enough memory that even the BOINC manager is very slow to respond, even after I set BOINC not to use more than 40% of the total physical memory. I see the same kind of thing on machines where I am using my GPU. When I set BOINC to snooze, the machine responds like it does when the GPU is not being used. I have no less than 2 GB of RAM in all my machines, so in my case it isn't the RAM, it is my GPU. And yes, I have BOINC set to crunch using the GPU even when I am using the machine. I tend to do a lot on one of my machines, and flipping the setting would mean it wouldn't finish a unit in a reasonable amount of time. So I choose to snooze BOINC when I need to do something I think is more important, and then if I forget to un-snooze it, it comes back on after a while by itself. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Corhal, that is where your comments about the credit system have been a bit off the mark. With other projects such as SETI, a work unit represents a pre-defined portion of the search space. You complete the search of that space and you've done one unit of work, and regardless of how fast your machine is, or how long it took you (assuming all complete prior to deadlines) it should be worth a given amount of credit. On Rosetta@home, the search space for each protein is... well, closer to infinity than zero :) ...take 3 to the 100th power some time to get an idea (more for larger proteins). Anyway, the workunits are tasked with searching from a given starting point. The unit of work from that start is a model. When a model (or "decoy") is completed then a tiny portion of that vast search space has been completed. But the task itself can run up to 99 models on the same task. Each is a unique starting point. This is how the project supports the user defined runtime preference (see the website area for participants, and then click the Rosetta@home preferences). If your preferred runtime has not yet been reached, and the average time per model so far on this task would predict the next model will complete within your preference, then another model is begun. If the prediction is that it would run past your runtime preference, then the task is completed and its results sent back. So the project servers tabulate the credit claims of all of the machines working on the "same" task (see my prior comments about how each protein and each type of search is unique) and a rolling average of credit claimed per MODEL (not task) is maintained. As work is reported back, the number of models found in the result (click on your task details sometime) is used to determine the GRANTED CREDIT for the result. And so R@h has no preconceived notion about how fast a machine is, or how long it should take to run 10 models... it is very adaptable and flexible. 
This is why we say the system grants credit for the amount of work actually completed. A task that reports back 20 models has twice as much valuable science data as another (of same protein and search type) that only has 10 models completed, and it is granted twice as much credit. No given user (nor team) can manipulate the system by falsifying their benchmarks, because their own credit is based on the average prior to their reporting (unless they are the very first one to report, but they have no control of that either), and any inflated benchmark actually results in the people reporting in AFTER this claim getting slightly more credit per model. DK has posted graphs before showing the credit per model rolling average becomes very well established very quickly and does not vary much after that. Rosetta Moderator: Mod.Sense |
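Two ideas from this post can be sketched together: the rule for deciding whether a task starts another model, and the per-model credit average used to grant credit. This is a rough sketch and not the project's actual server or client code. The names `should_start_another_model` and `CreditPool` are hypothetical, and two behaviours are assumptions made for illustration: that a task always attempts at least one model, and that the very first reporter is simply granted its own claim.

```python
MAX_MODELS_PER_TASK = 99  # cap mentioned in the post


def should_start_another_model(elapsed_s, models_done, preferred_runtime_s):
    """Decide whether a task should begin one more model."""
    if models_done >= MAX_MODELS_PER_TASK:
        return False
    if models_done == 0:
        return True  # assumption: a task always attempts at least one model
    if elapsed_s >= preferred_runtime_s:
        return False
    average_s = elapsed_s / models_done
    # Start another model only if the predicted finish fits the preference.
    return elapsed_s + average_s <= preferred_runtime_s


class CreditPool:
    """Model-weighted average of claimed credit for one protein/search type."""

    def __init__(self):
        self.total_claimed = 0.0
        self.total_models = 0

    def report(self, claimed_credit, models):
        """Grant credit from the average BEFORE this report, then add the claim."""
        if models <= 0:
            raise ValueError("a result must contain at least one model")
        if self.total_models == 0:
            granted = claimed_credit  # assumption: first reporter gets its own claim
        else:
            granted = models * self.total_claimed / self.total_models
        self.total_claimed += claimed_credit
        self.total_models += models
        return granted


pool = CreditPool()
print(pool.report(claimed_credit=40.0, models=10))   # first report sets the average
print(pool.report(claimed_credit=100.0, models=10))  # inflated claim, same models
print(pool.report(claimed_credit=20.0, models=20))   # low claim, twice the models
```

In this sketch the machine with the inflated claim is still granted credit at the earlier average, and its claim only raises the average applied to later reporters, which is the behaviour the post describes. The third machine is granted credit in proportion to its 20 models regardless of its own low claim.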
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"rolling average" was not the proper term, because nothing is rolling out of the average. "Weighted average", weighted on number of models would be the correct term. Rosetta Moderator: Mod.Sense |
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
robertmiles, what you are describing is basically the erratic behavior that a memory constrained system is going to exhibit. The user application gets done with what it had to do, nothing of any higher priority needs the CPU, the CPU goes to the low priority BOINC tasks, in order for them to run page faults occur, thus kicking out some of the pages needed by the user application, then later when the user demands response from the application again, some of its pages are swapped out and must be faulted back in for it to respond to the request. There is no priority with regard to memory. Priority is for CPU time, and the task that currently got the CPU time will generate the page faults necessary in order for it to run. Think of this as the operating system getting what the task needs in order for it to run. Maybe you didn't look far enough back. See: Number crunching: Problems with web site, message 62750; Number crunching: Chaos in Rosetta@Home??, messages 62683 and 62640, both over 120 days ago. I found nothing saying whether the memory leak problem reported then was fixed, and if so, when. Also, I'm seeing slowdowns whenever the total memory usage gets much over 50% and BOINC workunits are the main memory users, so maybe some program is using it in a scattered fashion, which prevents Vista from giving large contiguous blocks to programs that need them. Also, why isn't the memory used by suspended workunits moved to the swapfile faster, when the setting to keep it in memory is NOT used? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
...why isn't the memory used by suspended workunits moved to the swapfile faster, when the setting to keep it in memory is NOT used? In short, because it could be very inefficient to do so. The Operating System most likely uses a strategy called "least recently used (LRU)" to determine what to keep in RAM and then the least recently used pages of memory are the ones that go to the swap space on the hard drive. One example: the machine is idle, BOINC is running, using just about all of the memory you are allowing it to, and then you sit down and fire up a browser. The period of time from you clicking the icon until you have your homepage up requires a lot of disk activity. If Windows tried to push the BOINC pages out of memory to the disk at this time, it would only be directly competing with the disk IO required to fire up your browser. So instead, it waits to see how much memory your browser is going to really need, and only pushes out the very oldest pages of memory. When BOINC gets CPU time again (which is actually probably occurring WHILE your browser is waiting for disk IO) it may still be able to run for a considerable time with the pages that remain. You mentioned scattered (i.e. fragmented) use and contiguous space. All memory is divided into pages. And all use is "scattered". I might be running BOINC with 100 pages of memory one minute, and then fire up that browser and have some of my pages swapped out. But I will still be able to run with the 60 pages that are left, it just increases the odds that I have to go to disk to bring back a few of the 40 that were swapped out. And this is what makes things sluggish and response times inconsistent. It just depends which memory pages the application requires next and whether they are still in memory or not. This is part of why BOINC has gone to great lengths to try and provide controls to you on how your machine is used. Rosetta Moderator: Mod.Sense |
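The LRU idea described above can be illustrated with a small page-fault counter. This is a toy model only: real operating systems use approximations of LRU and many other heuristics, and the function names, the access trace, and the frame counts here are all made up for the example.

```python
from collections import OrderedDict


def count_page_faults(accesses, frames):
    """Count page faults for an access trace under strict LRU replacement."""
    resident = OrderedDict()  # page -> True, ordered oldest use first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict least recently used page
            resident[page] = True
    return faults


def hot_cold_trace(rounds=10, hot_pages=50):
    """Hypothetical workload: a hot set touched every round plus one new page."""
    trace = []
    for r in range(rounds):
        trace.extend(range(hot_pages))  # pages the task keeps coming back to
        trace.append(hot_pages + r)     # one page it touches only once
    return trace


trace = hot_cold_trace()
for frames in (60, 40):
    print(frames, "frames ->", count_page_faults(trace, frames), "faults")
```

When the hot set fits in the available frames, only first-time accesses fault. When it does not fit, nearly every access has to go back to disk, which is the sluggish, inconsistent behaviour described in the post.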
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
Is there any particular reason why, in such a case, the browser doesn't just run in the 60% of the physical memory I have told BOINC not to use, and therefore start up faster? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"the browser"?? You mean the BOINC Manager? I am not certain how it enforces its configured memory preferences. But the idea is simply that if the total amount of memory a task is allowed to allocate is capped, then that would also cap the amount of actual RAM the process could ever use. Its actual RAM use could be smaller, with the rest swapped out as I described earlier, but the task doesn't know what's swapped and what is not, only the total it is using. ...perhaps in your case, the limit you set was still actually too high for your conditions, and thus the tasks ended up doing significant swapping as they ran, causing performance degradation. What I'm attempting to say is that memory is rather difficult to observe and control. But it is possible BOINC was living within the limits you imposed and the other requirements of the operating system and the applications you were running still resulted in swapping. Another possibility is that BOINC is not properly enforcing the limit. For example, it is possible that they measured all dynamic memory allocations and neglected to account for the actual application size, which would be significant in the case of R@h, more so than for other projects with smaller application sizes. Rosetta Moderator: Mod.Sense |
Send message Joined: 3 Nov 05 Posts: 1834 Credit: 124,260,318 RAC: 9 |
"the browser"?? Internet browser! robert - I believe the 40% of memory will remain free from BOINC, but the OS and other running programs have to sit in that memory, and if you've not been using your browser then some part of the OS will have to be swapped out - it could be anything from the print spooler to the anti-virus... My understanding is that an OS will use as much RAM as is available because otherwise it would have to page out an unknown amount of stuff in memory in order to be ready for an unknown program's memory requirements. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Oh, yah, sorry, I misplaced a word or two in robert's reply. My example was a browser and he was talking about the memory that should NOT be used by BOINC. Yes, since BOINC should only be using 40%, and would presumably be the most recently used areas of memory (assuming BOINC was just running) then by default, the new application (such as the browser) would end up in the remaining 60%... now, is that 60% sitting empty or is there stuff in there that must be first written to disk? It just depends. You had mentioned adding up the memory of all tasks and that the total always comes shy of the totals shown by Windows. This can be due to areas of the operating system that utilize memory, but are not shown as tasks. Or memory used by applications that just pop in for such a brief time that they do not appear in the task list. You might do the same comparison of total and detail at a time when BOINC has not run since a reboot, just to lend confidence to the thought that this variation exists with or without BOINC. ...although it would seem less likely to appear, or to be a significant variation just after a reboot. Rosetta Moderator: Mod.Sense |
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
"the browser"?? You mean the BOINC Manager? I am not certain how it enforces its configured memory preferences. But the idea is simply that if the total amount of memory a task is allowed to allocate is capped, then that would also cap the amount of actual RAM the process could ever use. Its actual RAM use could be smaller, with the rest swapped out as I described earlier, but the task doesn't know what's swapped and what is not, only the total it is using. ...perhaps in your case, the limit you set was still actually too high for your conditions, and thus the tasks ended up doing significant swapping as they ran, causing performance degradation. By the browser, I mean Internet Explorer 8. I've seen little sign that the BOINC manager program uses very much memory; it is just the BOINC user interface. boinc.exe, on the other hand, appears to dole out memory to the workunit programs. What I'm attempting to say is that memory is rather difficult to observe and control. But it is possible BOINC was living within the limits you imposed and the other requirements of the operating system and the applications you were running still resulted in swapping. When the slowdown occurs, going to Windows Task Manager -> Performance -> Physical Memory often shows just 1 MB of free memory, so SOMETHING is using more memory than expected. However, in such a case, the graph at Windows Task Manager -> Performance usually shows not much more than 50% of the memory in use. Another possibility is that BOINC is not properly enforcing the limit. For example, it is possible that they measured all dynamic memory allocations and neglected to account for the actual application size, which would be significant in the case of R@h, more so than for other projects with smaller application sizes. In the past, I have seen at least one BOINC version that didn't enforce the memory limits on high priority workunits when two of them were running at once. 
I don't remember any high priority tasks needing enough memory that I could check whether this also applied when just one was running. I haven't seen any recent cases of two high priority workunits running at once with memory requirements high enough that I can check if the problem applies to the 6.10.18 BOINC version I'm using now. I've seen signs that Windows Task Manager just tries to report the portion of the task that's currently in memory, not the portion in the swap file. |
David Ball Send message Joined: 25 Nov 05 Posts: 25 Credit: 1,439,333 RAC: 0 |
Instead of just letting unused physical memory sit idle, the Operating System dynamically uses extra physical memory for disk cache. The free memory number that you're looking at represents what is left over. It's normal for this number to be rather small, often in the lower double digits. When the OS needs more memory, it just discards some of the disk cache and uses that memory. The performance graph only indicates how much memory is in use by programs. It excludes the use of idle memory as disk cache. That's why it's showing a lower number. What you're seeing sounds like memory thrashing. Something suddenly wants a lot of memory and the OS is discarding disk cache to give it to programs. Memory allocation is something of an art and can represent a choke point in the system. Not only is the OS having to track all of that memory, but it has to coordinate the operation between CPUs/cores, and virtual memory tables are being updated, which means some CPU caches are being flushed as well. Also, program startup is when some anti-virus programs scan the program being loaded and scan the files being opened. This can significantly slow program startup. Depending on what version of Windows you're using, there are some tunable parameters. On Vista SP2, you might want to go into Control Panel and check the following: Go to Control Panel -> System and select the item on the left titled "Advanced System Settings". You'll get a popup window titled "System Properties". Select the "Advanced" tab. In the section on performance, press the "Settings" button. This will give you a popup window titled "Performance Options". Select the "Advanced" tab on this window. At the top is a section titled "Processor scheduling". In this section, you have a choice of 2 options, "Programs" or "Background Services". You want to select "Programs". 
If you're on an earlier version of Windows, beneath the "Processor scheduling" section, there's another section titled "Memory Usage" which lets you adjust for the best performance of "Programs" or "System Cache". You'd want to select "Programs" in this section if it's present. When you're done with your selection(s), click the Apply button at the bottom. Then hit the various OK buttons to get back out of the nested windows. A reboot may be required. Use at your own risk! I don't know what options are present in Win7. Microsoft seems to be making this less and less tunable from the user perspective. If you're a programmer and very daring, there are some additional ways to tune this. See: Blog: Too Much Cache? : http://blogs.msdn.com/ntdebugging/archive/2007/11/27/too-much-cache.aspx GetSystemFileCacheSize Function: http://msdn.microsoft.com/en-us/library/aa965224%28VS.85%29.aspx SetSystemFileCacheSize Function: http://msdn.microsoft.com/en-us/library/aa965240%28VS.85%29.aspx That's probably more than you ever wanted to know about the system cache and I've barely scratched the surface. Have you read a good Science Fiction book lately? |
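The accounting described in this post (free memory is what is left after programs and disk cache) can be written out as simple arithmetic. This is a simplified sketch: the function name and the 2048 MB total are hypothetical, and real Windows memory counters have more categories than the three used here.

```python
def estimate_disk_cache_mb(total_mb, in_use_by_programs_mb, free_mb):
    """Estimate memory held as disk cache: neither program memory nor free."""
    cache_mb = total_mb - in_use_by_programs_mb - free_mb
    if cache_mb < 0:
        raise ValueError("inputs are inconsistent: parts exceed the total")
    return cache_mb


# Hypothetical machine: 2048 MB of RAM, the performance graph showing about
# 50% in use by programs, and the "free" counter reading 1 MB.
cache = estimate_disk_cache_mb(total_mb=2048, in_use_by_programs_mb=1024, free_mb=1)
print(f"Estimated disk cache: {cache} MB")
```

Under these assumed numbers, almost half of the RAM would be serving as disk cache, which is how a reading of 1 MB free and a graph near 50% can both be true at the same time.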
©2025 University of Washington
https://www.bakerlab.org