Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
[snip] There must be some kinda CRC check in the programming. Astrophysics projects use at least two machines, as the answer can be incorrect. It depends. If the project is searching a very large set of starting points that should all give answers converging to the best possible answer, and the server can quickly evaluate the quality of what was returned, then a few wrong answers aren't important enough to justify reducing the number of starting points that are evaluated. On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This meant that the fake results were only caught after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again. |
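As a rough illustration of the check that eventually exposed the fake application, here is a minimal sketch that flags results whose reported CPU time is far below what is typical for the batch. The function name, the host names, the sample numbers, and the 1% cut-off are all made up for the example; this is not BOINC's validator code.

```python
from statistics import median

def flag_suspicious(cpu_seconds_by_host, min_fraction=0.01):
    """Return hosts whose CPU time is below min_fraction of the batch median."""
    typical = median(cpu_seconds_by_host.values())
    return sorted(
        host
        for host, seconds in cpu_seconds_by_host.items()
        if seconds < min_fraction * typical
    )

# Hypothetical CPU times (in seconds) reported for similar workunits.
reported = {
    "host_a": 14200.0,
    "host_b": 13950.0,
    "host_c": 95.0,      # suspiciously fast
    "host_d": 14800.0,
}

print(flag_suspicious(reported))
```

A check like this only works while the run times are still on file, which is why the deleted run times made the cleanup so costly in the case described above.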
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 6,010 |
On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This meant that the fake results were only caught after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again. I shake my head in disgust at who would do such a thing; it's not even as if you can make money out of getting more credits. |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Where did you get the data from? I was just looking at the Measured floating point speed and Measured integer speed values on each Computer Details page, which come from the Whetstone and Dhrystone benchmarks that BOINC runs. Those numbers are anything but accurate. My hilariously thermally constrained MacBook from 2015 has a measured floating point speed of 5.65 GFLOPS (it sometimes goes above 6.10 GFLOPS, which is way higher than a well-cooled i9-9900K). That is faster than my Ryzen 3600 and many current-gen high-end desktop CPUs from Intel. There is no way that can be true. Integer performance seems to match expectations, though. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1729 Credit: 18,491,225 RAC: 20,847 |
Ok that answers one of my two questions, but.... how did he finish it so quickly? My understanding is that for a given Work Unit, each Task actually starts with a different random seed. So while the data for 2 (or more) Tasks from a given Work Unit is the same, the starting seed values are different, and so the calculation actually done can be significantly different, even though the data being processed is the same. That is why no comparison of results is involved in Validation of work done. I could be wrong of course. Grant Darwin NT |
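To make the seed idea concrete, here is a toy sketch in which two "tasks" share the same input but start from different random seeds. The energy function, step size, and step count are invented for the illustration and have nothing to do with Rosetta's actual code.

```python
import random

def toy_energy(x):
    # Hypothetical stand-in for an energy function; its minimum is at x = 2.
    return (x - 2.0) ** 2

def run_task(seed, steps=200):
    """Random search from a seed-dependent start; returns (position, energy)."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)   # the seed decides where the search begins
    best = toy_energy(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)
        energy = toy_energy(candidate)
        if energy < best:          # keep only moves that lower the energy
            x, best = candidate, energy
    return x, best

for seed in (1, 2):
    print(seed, run_task(seed))
```

The same seed always reproduces the same run, while different seeds follow different paths and do different amounts of useful work, so two tasks from one Work Unit are not expected to return matching results.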
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1729 Credit: 18,491,225 RAC: 20,847 |
On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This meant that the fake results were only caught after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again. I shake my head in disgust at who would do such a thing; it's not even as if you can make money out of getting more credits. Cheating by some people in the original SETI project was the reason BOINC was developed: Credits instead of just counting the number of Work Units processed, and a method for comparing results to see if a returned result is actually Valid or not. Grant Darwin NT |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 403 Credit: 12,294,748 RAC: 3,791 |
Where did you get the data from? I usually compare using http://cpuboss.com/compare-cpus but that has not heard of his CPU. I tried searching for a few more comparison sites, but the ones that list his don't have benchmarks, they just list all the specs side by side. Try :- https://www.cpubenchmark.net/cpu_list.php |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So far as validating results, lost results, etc.: each protein study fires off thousands of tasks. Some 5% or less of those results will look to be the best. If a task ran astray out in the wild, and mistakenly reports a terrible result, that's not ideal, but there should still be a similar result in those top 5%. If the task ran astray and mistakenly reports a fantastic result, that single model is rerun in the lab and confirmed. If the lab system has the same flaw, it should get the same fantastic result. But there is also human review of the results. Sometimes you can tell, just by the shape of the result, that it doesn't look like a protein found in nature. If a protein-protein interaction were being studied, it might be more difficult to tell that something is off just by the shape. Eventually results may be sent to the "wet lab" where they produce the two proteins and see if they actually interact as predicted by the model. If the protein structure has already been determined, the models are compared to the known structure and the degree of their similarity is measured as RMSD (root-mean-square deviation). Sometimes the human review of the top 5% of the results concludes that we still have not found the best model. Perhaps there is a high variability in appearance across the top scoring models. In such cases, variations of those top 5% of the results are sent out as a new round of work. It is for the same protein, and again will do thousands of models, but these will start with some assumptions or rules that cause you to begin with something much closer to one of those previous best results, and search around that same area for a better (lower energy) result. I made up the 5% number. 1% or less is probably more realistic. Maybe I should have said something like "...the top 10 or 20 models". Anyway, I hope that makes it clearer why R@h does not require a wingman to rerun the same models to confirm results. When you get down to those top 10 results, they should all look pretty similar. 
Each arrived at that model from a different start, but, in the end, the top results should all be similar to the actual protein's structure in nature. So, they should all be very similar. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb. Rosetta Moderator: Mod.Sense |
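For readers unfamiliar with the similarity measure mentioned above, here is a minimal sketch of an RMSD calculation between a model and a known structure. It assumes both structures list the same atoms in the same order and are already superimposed; real structure-comparison tools find the best superposition first, which this sketch skips. The coordinates are invented.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom structures: the model differs in one atom only.
known = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

print(rmsd(known, model))
```

A lower value means the model sits closer to the known structure, which is why top results that agree with each other, and with nature, make a stray erroneous result easy to spot.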
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 6,010 |
So far as validating results, lost results etc. Each protein study fires off thousands of tasks. Excellent description, thanks. It's nice to know how the system we're running operates. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2145 Credit: 41,555,266 RAC: 8,961 |
if it says "36000s + 14400s" that indicates the watchdog has now been set back to 4hrs rather than 10hrs. The 4 hours I took to be the run time preference (per this post); the 10 hours the watchdog (per this post). Got it. It's been so long since I needed to look at task overruns that I must've completely forgotten the syntax. Made worse by the task runtime being 4+10 rather than 10+4. If it was 8+watchdog I wouldn't have confused myself so easily (I hope). |
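Since the confusion here comes down to reading two figures given in seconds, a small sketch of the conversion may help. It only does the arithmetic; the function name is invented, and which figure is the run time preference and which is the watchdog follows the quoted post's reading (second figure is the watchdog), not any documented format.

```python
import re

def limits_in_hours(text):
    """Convert a string like '36000s + 14400s' into a pair of hour values."""
    first, second = (int(value) for value in re.findall(r"(\d+)s", text))
    return first / 3600, second / 3600

print(limits_in_hours("36000s + 14400s"))
```

So 36000 s is 10 hours and 14400 s is 4 hours.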
Jord Send message Joined: 16 Sep 05 Posts: 41 Credit: 204,120 RAC: 0 |
When you made the 4.20 app for Windows, did you add the code (via the BOINC API) that checks every 10 seconds if the client has died and will then auto-exit the app? While testing something with BOINC/BOINC Manager, I find that when I kill BOINC Manager about 15 seconds after it starts up, while Rosetta tasks are still loading into memory, both BOINC and BOINC Manager exit normally but the Rosetta tasks that started stay in memory. Even after a handful of minutes these apps still run. I have to manually kill them. Restarting BOINC Manager only causes the tasks that had already started to stay in memory, and in BOINC Manager these show as "waiting to acquire slot directory lock. Another instance may be running." |
Corgi Send message Joined: 19 Jun 19 Posts: 5 Credit: 2,453,241 RAC: 2,154 |
Perhaps you can help me adjust my settings - I've been getting Rosetta tasks with deadlines that would require me to walk away from my computer and not use it for anything else to ensure completion - for example, I just recontacted the project to clear two sadly-unfinished tasks with more than a day yet to run that were due two days ago. A lot of what else I do is resource-intensive, so I have to pause BOINC and F@H while they're running. I hate seeing these tasks I can't complete! Suggestions, please? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 6,010 |
Perhaps you can help me adjust my settings - I've been getting Rosetta tasks with deadlines that would require me to walk away from my computer and not use it for anything else to ensure completion - for example, I just recontacted the project to clear two sadly-unfinished tasks with more than a day yet to run that were due two days ago. A lot of what else I do is resource-intensive, so I have to pause BOINC and F@H while they're running. How many cores do you have? Can you run your intensive tasks and a smaller number of Rosettas at once, by limiting Boinc to use fewer cores? Or leave the computer on more when you're not using it? |
Ray Murray Send message Joined: 22 Apr 20 Posts: 17 Credit: 270,864 RAC: 0 |
Hi Corgi, Running Boinc and Folding together can cause resource conflicts. Boinc can't see that Folding is using 1 (light), 3 (medium) or all 4 (full) cores, so Boinc will itself try to use those cores as well, causing Folding, Boinc and anything else you're trying to do to slow down. You could set Folding to light, to use only 1 core, and Boinc to use 3 of 4 (75%), or set Folding to medium, 3 cores, and limit Boinc to 1 of 4 (25%). Or maybe set Folding to light and Boinc to 50% or 25%, leaving 1 or 2 cores free. I've noticed with Folding that if it is set to medium, 3 cores, before a task starts, you can turn it down to light, 1 core, and back up to medium later, but if a task starts in light, 1 core, turning it up to medium has no effect and it will run as 1 core to the end of that task. Hope that helps. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
The way I got BOINC and Folding to run fairly well together involved several steps: 1. While BOINC is shut down, tell Folding to create a GPU slot. Note that you MUST find and click inside the rather faint circle to the left of the GPU information for this to work, and must have a suitable GPU. 2. Run Folding long enough for any current CPU task to finish. Now, before the next CPU task starts, delete Folding's CPU slot. 3. Restart BOINC, set it to not use the GPU, and not to try to get tasks from any GPU-only BOINC project. 4. Decrease the number of CPUs BOINC is allowed to use until BOINC and Folding do not interfere with each other or with programs you run from the console. This lets Folding use your GPU for COVID-19 work, while having BOINC projects using the CPU and also doing COVID-19 work. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Running Boinc and Folding together can cause resource conflicts. Boinc can't see that Folding is using 1 (light), 3 (medium) or all 4 (full) cores so Boinc will, itself, try to use those cores as well causing Folding, Boinc and anything else you're trying to do, to slow down. You could set Folding to light, to use only 1 core and Boinc to use 3 of 4, 75%, or medium, 3 core and limit Boinc to 1 of 4, 25%. Or maybe set Folding to light, Boinc to 50% or 25%, leaving 1, or 2 cores free. You don't try to run Folding CPU work at the same time as BOINC CPU work. But you can easily run Folding GPU work while running BOINC CPU work. You just reserve a core in BOINC for use by the GPU, typically by setting the "Use at most xxx% of the processors" to the appropriate value. For an 8 core machine, setting it to 90% leaves one core free for use by Folding. For a 16 core machine, 95% works. And for a 32 core machine, 97% will do. You can of course reserve more than one core if you want. That is the same thing you would ordinarily do (or at least I would do) for reserving a core if running a BOINC GPU project, so it is really no different for a Folding GPU project. |
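The percentages above can be checked with a little arithmetic. The sketch below assumes the client turns the percentage into a whole number of cores by rounding down and never goes below one core; that rounding rule is an assumption made for the example, and both function names are invented.

```python
def usable_cores(total_cores, percent):
    """Cores BOINC would use, assuming it rounds down and never drops below 1."""
    return max(1, total_cores * percent // 100)

def min_percent_leaving_free(total_cores, free_cores):
    """Smallest whole percentage that still lets BOINC use total - free cores."""
    wanted = total_cores - free_cores
    if wanted < 1:
        raise ValueError("must leave at least one core for BOINC")
    return -(-wanted * 100 // total_cores)  # ceiling division on integers

# The three settings mentioned in the post above.
for cores, percent in [(8, 90), (16, 95), (32, 97)]:
    print(cores, "cores at", percent, "% ->", usable_cores(cores, percent), "used")

print(min_percent_leaving_free(8, 2))
```

Under that assumption each of the three settings leaves exactly one core free, and reserving two cores on an 8 core machine works out to 75%.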
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 6,010 |
Running Boinc and Folding together can cause resource conflicts. Boinc can't see that Folding is using 1 (light), 3 (medium) or all 4 (full) cores so Boinc will, itself, try to use those cores as well causing Folding, Boinc and anything else you're trying to do, to slow down. You could set Folding to light, to use only 1 core and Boinc to use 3 of 4, 75%, or medium, 3 core and limit Boinc to 1 of 4, 25%. Or maybe set Folding to light, Boinc to 50% or 25%, leaving 1, or 2 cores free. Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has. That reminded me to check back with Folding. They have had such a heavy server load recently (for the CPU work) that they haven't been able to handle it. It looks like they can now. I can do an all-Folding machine again. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 6,010 |
Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has. Heavy load for CPU? Surely GPU does a lot more? |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Heavy load for CPU? Surely GPU does a lot more? It is the server load I am talking about. They have different ones for different purposes. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has. Folding is planning to offer a BOINC version. Probably not until after they finish their COVID-19 work, though. |
©2024 University of Washington
https://www.bakerlab.org