Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Profile robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98236 - Posted: 19 Jul 2020, 22:43:36 UTC - in response to Message 98233.  
Last modified: 19 Jul 2020, 22:44:35 UTC

[snip]

There must be some kind of CRC check in the programming. Astrophysics projects use at least two machines, as the answer can be incorrect.

It depends. If the project is searching a very large set of starting points that should all give answers converging to the best possible answer, and the server can quickly evaluate the quality of what was returned, then a few wrong answers aren't important enough to justify reducing the number of starting points that are evaluated (which is what sending every workunit to a second computer would do).

On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this and wrote a fake application that always reported that nothing was found, without even checking whether there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This meant that the fake results were only noticed after someone saw that the fake application used less than 1% of the CPU time used by the real one, and by then so many fake results had been declared valid and deleted that a large number of workunits had to be recreated and run again.
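A minimal sketch of the kind of check that eventually exposed the fake application: flag any returned result whose CPU time is under 1% of the typical (median) CPU time for the same application. The function name, the 1% default and the sample numbers are illustrative assumptions, not any project's actual validator.

```python
from statistics import median

def flag_suspicious(cpu_times, ratio=0.01):
    """Return indices of results whose CPU time is below `ratio` times the median.

    cpu_times: CPU seconds reported for each returned result (hypothetical data).
    ratio: fraction of the median below which a result is treated as suspect.
    """
    typical = median(cpu_times)
    return [i for i, t in enumerate(cpu_times) if t < ratio * typical]

# Hypothetical CPU times in seconds: most hosts ran the real application,
# two results came back far too quickly to have done the real work.
times = [14200.0, 13950.0, 15010.0, 95.0, 14630.0, 110.0, 14480.0]
print(flag_suspicious(times))  # [3, 5]
```

Using the median rather than the mean keeps a few fast outliers from dragging the baseline down, but the check stops working once fake results are the majority, which is one reason a project with one result per workunit can take so long to notice.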
ID: 98236 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98237 - Posted: 19 Jul 2020, 22:58:32 UTC - in response to Message 98236.  

On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found.
[snip]
I shake my head in disgust at whoever would do such a thing; it's not as if you can make money out of getting more credits.
ID: 98237 · Rating: 0
Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 98238 - Posted: 20 Jul 2020, 4:19:40 UTC - in response to Message 98234.  
Last modified: 20 Jul 2020, 4:21:04 UTC

Where did you get the data from?
I was just looking at the Measured floating point speed and Measured integer speed values on each Computer Details page, which come from the Whetstone and Dhrystone benchmarks that BOINC runs.


Those numbers are anything but accurate.
My hilariously thermally constrained MacBook from 2015 has a measured floating point speed of 5.65 GFLOPS (it can go above 6.10 GFLOPS sometimes, which is way higher than a well-cooled i9-9900K). That is faster than my Ryzen 3600 and many current-generation high-end desktop CPUs from Intel.
There is no way that can be true. Integer performance does seem to match expectations, though.
ID: 98238 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,491,225
RAC: 20,847
Message 98240 - Posted: 20 Jul 2020, 6:04:19 UTC - in response to Message 98231.  

Ok that answers one of my two questions, but.... how did he finish it so quickly?
My understanding is that for a given Work Unit, each Task actually starts with a different random seed. So while the data for two (or more) Tasks from a given Work Unit is the same, the starting seed values are different, and so the calculations actually performed can be significantly different, even though the data being processed is the same.
Hence there is no comparison of results involved in validating the work done.

I could be wrong of course.
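A toy illustration, assuming the description above is right: two tasks share the same input but not the seed, so their outputs differ even when both are perfectly correct, and a bit-for-bit comparison between them would reject good work. The energy function and the search loop are made up for the example; Rosetta's actual protocols are far more involved.

```python
import random

def toy_energy(x):
    # Made-up 1-D "energy" landscape whose minimum is at x = 3.
    return (x - 3.0) ** 2

def run_task(seed, steps=200):
    """One hypothetical task: identical input (toy_energy), task-specific seed."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)  # the seed decides where the search starts
    best = toy_energy(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)  # random trial move
        energy = toy_energy(candidate)
        if energy < best:  # greedy: keep only downhill moves
            x, best = candidate, energy
    return x, best

same_a = run_task(seed=1)
same_b = run_task(seed=1)
other = run_task(seed=2)
print(same_a == same_b)  # True: same seed, same trajectory
print(same_a == other)   # False: different seed, different (still legitimate) result
```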
Grant
Darwin NT
ID: 98240 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,491,225
RAC: 20,847
Message 98241 - Posted: 20 Jul 2020, 6:07:59 UTC - in response to Message 98237.  

On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found.
[snip]
I shake my head in disgust at whoever would do such a thing; it's not as if you can make money out of getting more credits.
Cheating by some people in the original Seti project was the reason BOINC was developed: Credits instead of just counting the number of Work Units processed, and a method for comparing results to see whether a returned result is actually Valid or not.
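For contrast with how R@h works, a minimal sketch of the replication idea: the same work unit goes to several hosts, and a result is accepted only once enough of them agree. This is a simplified stand-in, not BOINC's validator code. The `min_quorum` name borrows BOINC's work-unit setting of that name, but `validate_quorum`, the tolerance and the list-of-numbers result format are assumptions; real validators are project-specific.

```python
import math

def results_agree(a, b, rel_tol=1e-6):
    """Two results agree if they have the same length and every value is close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def validate_quorum(results, min_quorum=2, rel_tol=1e-6):
    """Return a canonical result once at least min_quorum results agree, else None.

    results: outputs returned by different hosts for the same work unit,
    each a list of floats (a simplifying assumption).
    """
    for candidate in results:
        matches = sum(results_agree(candidate, other, rel_tol) for other in results)
        if matches >= min_quorum:  # the candidate counts itself as one vote
            return candidate
    return None

print(validate_quorum([[1.0, 2.0], [1.0, 2.0000000001]]))  # [1.0, 2.0]
print(validate_quorum([[1.0, 2.0], [1.0, 7.5]]))           # None: hosts disagree
```

A fake "nothing found" application would pass this check if two copies of it happened to be paired, but against one honest host it fails; the cost is running every work unit at least twice.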
Grant
Darwin NT
ID: 98241 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 403
Credit: 12,294,748
RAC: 3,791
Message 98248 - Posted: 20 Jul 2020, 17:14:47 UTC - in response to Message 98233.  

Where did you get the data from? I usually compare using http://cpuboss.com/compare-cpus but that has not heard of his CPU. I tried searching for a few more comparison sites, but the ones that list his don't have benchmarks, they just list all the specs side by side.

Try :-

https://www.cpubenchmark.net/cpu_list.php
ID: 98248 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 98249 - Posted: 20 Jul 2020, 19:22:23 UTC

As far as validating results, lost results, etc. go: each protein study fires off thousands of tasks. Some 5% or less of those results will look to be the best. If a task runs astray out in the wild and mistakenly reports a terrible result, that's not ideal, but there should still be a similar result in those top 5%. If a task runs astray and mistakenly reports a fantastic result, that single model is rerun in the lab and confirmed. If the lab system has the same flaw, it should get the same fantastic result. But there is also human review of the results: sometimes you can tell, just by the shape of the result, that it doesn't look like a protein found in nature.

If a protein-protein interaction were being studied, it might be more difficult to tell that something is off just by the shape. Eventually results may be sent to the "wet lab", where they produce the two proteins and see whether they actually interact as predicted by the model.

If the protein structure has already been determined, the models are compared to the known structure and their similarity is measured as RMSD (root-mean-square deviation).
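RMSD is the square root of the average squared distance between corresponding atoms of a model and the known structure, so lower means more similar (it is usually reported in ångströms). A minimal sketch, assuming each structure is a list of matching (x, y, z) points that have already been superimposed; real comparisons first find the optimal superposition, which this code does not do.

```python
import math

def rmsd(model, reference):
    """Root-mean-square deviation between matching (x, y, z) points.

    Assumes both structures are already superimposed; a real comparison
    normally finds the optimal superposition first.
    """
    if not model or len(model) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model))

reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
shifted = [(1.0, 2.0, 2.0), (2.0, 3.0, 3.0)]  # every point moved by (1, 2, 2)
print(rmsd(reference, reference))  # 0.0
print(rmsd(shifted, reference))    # 3.0 (no superposition was done)
```

The shifted example has exactly the same shape as the reference, yet scores 3.0, which is why the superposition step matters in practice.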

Sometimes the human review of the top 5% of the results concludes that we still have not found the best model. Perhaps there is high variability in appearance across the top-scoring models. In such cases, variations of those top 5% of the results are sent out as a new round of work. It is for the same protein and again produces thousands of models, but these start from assumptions or rules that place them much closer to one of the previous best results, and then search around that area for a better (lower-energy) result.
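A rough sketch of that refinement step, under loud assumptions: each model is reduced to a short list of numbers, "best" simply means lowest energy, and the new starting points are the best models plus small Gaussian noise. `next_round_starts` and its parameters are hypothetical; Rosetta's real protocols constrain the new search in much richer ways.

```python
import random

def next_round_starts(models, energies, n_best=3, n_per_model=4, sigma=0.1, seed=0):
    """Hypothetical refinement step: new starting points near the best previous models.

    models: parameter vectors (lists of floats) from the previous round.
    energies: matching energies, lower is better.
    Returns n_best * n_per_model perturbed copies of the lowest-energy models.
    """
    rng = random.Random(seed)
    ranked = sorted(range(len(models)), key=lambda i: energies[i])[:n_best]
    starts = []
    for i in ranked:
        for _ in range(n_per_model):
            # Small random nudge so the new round searches around this model.
            starts.append([v + rng.gauss(0.0, sigma) for v in models[i]])
    return starts

previous = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
energies = [-5.0, -9.0, -7.0]
starts = next_round_starts(previous, energies, n_best=2, n_per_model=3, sigma=0.1)
print(len(starts))  # 6
```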

I made up the 5% number. 1% or less is probably more realistic. Maybe I should have said something like "...the top 10 or 20 models".

Anyway, I hope that makes it clearer why R@h does not require a wingman to rerun the same models to confirm results. When you get down to those top 10 results, they should all look pretty similar. Each arrived at its model from a different start, but in the end the top results should all resemble the actual protein's structure in nature, and therefore each other. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb.
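The "sore thumb" argument can be made concrete: a correct top model should sit close to most of the other top models, so its median RMSD to them is small, while an erroneous one is far from all of them. A sketch under stated assumptions: models are pre-aligned lists of (x, y, z) points, and `odd_ones_out` and the cutoff are hypothetical. This is not a description of how the results are actually reviewed, which the post says involves people looking at them.

```python
import math
from statistics import median

def rmsd(a, b):
    # Plain RMSD over matching (x, y, z) points; assumes pre-aligned structures.
    return math.sqrt(sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb)) / len(a))

def odd_ones_out(top_models, cutoff):
    """Indices of models whose median RMSD to the other top models exceeds cutoff."""
    flagged = []
    for i, model in enumerate(top_models):
        dists = [rmsd(model, other) for j, other in enumerate(top_models) if j != i]
        if median(dists) > cutoff:
            flagged.append(i)
    return flagged

top = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (1, 0.1, 0)],
    [(0, 0.1, 0), (1, 0, 0)],
    [(0, 0, 0), (5, 5, 5)],  # the radically different one
]
print(odd_ones_out(top, cutoff=2.0))  # [3]
```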
Rosetta Moderator: Mod.Sense
ID: 98249 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98250 - Posted: 20 Jul 2020, 19:41:52 UTC - in response to Message 98249.  

So far as validating results, lost results etc. Each protein study fires off thousands of tasks
[snip]
So, they should all be very similar. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb.
Excellent description, thanks, it's nice to know how the system operates that we're running.
ID: 98250 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,555,266
RAC: 8,961
Message 98252 - Posted: 21 Jul 2020, 0:12:27 UTC - in response to Message 98225.  

if it says "36000s + 14400s" that indicates the watchdog has now been set back to 4hrs rather than 10hrs
The 4 hours I took to be the run time preference (per this post); the 10 hours the watchdog (per this post).

Got it.
It's been so long since I needed to look at task overruns that I must've completely forgotten the syntax.
Made worse by the task runtime being 4+10 rather than 10+4. If it had been 8+watchdog, I wouldn't have confused myself so easily (I hope).
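The arithmetic behind the "36000s + 14400s" figures, as a quick check. The labels follow the quoted post (4 hours is the run time preference, 10 hours the watchdog); adding them together is plain arithmetic, not a claim about exactly when the watchdog acts.

```python
SECONDS_PER_HOUR = 3600

def hours(seconds):
    return seconds / SECONDS_PER_HOUR

watchdog_s = 36000       # 10 h: the watchdog figure, per the quoted post
run_time_pref_s = 14400  # 4 h: the run time preference, per the quoted post

print(hours(watchdog_s))                    # 10.0
print(hours(run_time_pref_s))               # 4.0
print(hours(watchdog_s + run_time_pref_s))  # 14.0
print(8 * SECONDS_PER_HOUR)                 # 28800: how an 8-hour preference would read
```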
ID: 98252 · Rating: 0
Jord
Joined: 16 Sep 05
Posts: 41
Credit: 204,120
RAC: 0
Message 98265 - Posted: 22 Jul 2020, 10:02:23 UTC

When you made the 4.20 app for Windows, did you add the code (via the BOINC API) that checks every 10 seconds if the client has died and will then auto-exit the app?
While testing something with BOINC/BOINC Manager, I found that if I kill BOINC Manager about 15 seconds after it starts up, while Rosetta tasks are still loading into memory, both BOINC and BOINC Manager exit normally but the Rosetta tasks that had started stay in memory. Even after a handful of minutes these apps are still running, and I have to kill them manually.
Restarting BOINC Manager does not clear them: the tasks that had already started stay in memory, and BOINC Manager shows them as "waiting to acquire slot directory lock. Another instance may be running."
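To illustrate the behaviour being asked about, here is a sketch of an app that keeps checking whether its controlling client is still alive and exits when it goes quiet, instead of lingering as an orphan. Everything here (`HeartbeatMonitor`, `compute`, the timeout) is hypothetical and is not the BOINC API; it only shows the idea.

```python
import time

class HeartbeatMonitor:
    """Hypothetical liveness check: has the controlling client gone quiet?

    Not the BOINC API; it only illustrates an app that keeps checking
    whether the client that launched it is still alive.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_beat = clock()

    def beat(self):
        # Call this whenever a heartbeat message arrives from the client.
        self.last_beat = self.clock()

    def client_alive(self):
        return self.clock() - self.last_beat <= self.timeout_s


def compute(monitor, total_steps, do_step):
    """Run up to total_steps units of work, quitting early if the client dies.

    Returns how many steps were completed.
    """
    for i in range(total_steps):
        if not monitor.client_alive():
            return i  # exit instead of staying in memory as an orphan
        do_step(i)
    return total_steps
```

In a real app the check would run on a timer (the post mentions every 10 seconds) rather than once per work step, and the heartbeat would arrive over whatever channel the client actually uses.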
ID: 98265 · Rating: 0
Corgi
Joined: 19 Jun 19
Posts: 5
Credit: 2,453,241
RAC: 2,154
Message 98466 - Posted: 10 Aug 2020, 19:07:03 UTC

Perhaps you can help me adjust my settings - I've been getting Rosetta tasks with deadlines that would require me to walk away from my computer and not use it for anything else to ensure completion - for example, I just recontacted the project to clear two sadly-unfinished tasks with more than a day yet to run that were due two days ago. A lot of what else I do is resource-intensive, so I have to pause BOINC and F@H while they're running.

I hate seeing these tasks I can't complete! Suggestions, please?
ID: 98466 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98467 - Posted: 10 Aug 2020, 19:27:52 UTC - in response to Message 98466.  

Perhaps you can help me adjust my settings
[snip]
I hate seeing these tasks I can't complete! Suggestions, please?


How many cores do you have? Can you run your intensive tasks alongside a smaller number of Rosetta tasks, by limiting Boinc to fewer cores? Or leave the computer on more when you're not using it?
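A back-of-the-envelope way to see whether either suggestion helps: compare the CPU hours the queued tasks need with the core-hours BOINC actually gets before the deadline. This is a rough sketch that assumes each task occupies one core and only runs while BOINC is allowed to; the function name and all numbers are hypothetical.

```python
def fits_before_deadline(cpu_hours_per_task, tasks, boinc_cores,
                         boinc_hours_per_day, days_left):
    """Rough check of whether queued tasks can finish before their deadline.

    Assumes each task occupies one core and only runs while BOINC is allowed
    to run; ignores other projects sharing the same cores.
    """
    needed = cpu_hours_per_task * tasks
    available = boinc_cores * boinc_hours_per_day * days_left
    return needed <= available

# Hypothetical numbers: eight 8-hour tasks, three days until the deadline.
print(fits_before_deadline(8, 8, boinc_cores=2, boinc_hours_per_day=4, days_left=3))   # False
print(fits_before_deadline(8, 8, boinc_cores=2, boinc_hours_per_day=12, days_left=3))  # True
# Fewer cores, but never paused for the intensive work:
print(fits_before_deadline(8, 8, boinc_cores=1, boinc_hours_per_day=24, days_left=3))  # True
```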
ID: 98467 · Rating: 0
Profile Ray Murray
Joined: 22 Apr 20
Posts: 17
Credit: 270,864
RAC: 0
Message 98468 - Posted: 10 Aug 2020, 19:57:35 UTC - in response to Message 98466.  

Hi Corgi,
Running Boinc and Folding together can cause resource conflicts. Boinc can't see that Folding is using 1 (light), 3 (medium) or all 4 (full) cores, so Boinc will try to use those cores as well, causing Folding, Boinc and anything else you're trying to do to slow down. You could set Folding to light (1 core) and Boinc to 3 of 4 cores (75%), or Folding to medium (3 cores) and Boinc to 1 of 4 (25%). Or set Folding to light and Boinc to 50% or 25%, leaving 1 or 2 cores free.
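The core budget for those combinations on a 4-core machine can be tallied as below. The Folding core counts are taken from the post; that BOINC rounds its percentage down and always keeps at least one core is an assumption, labelled in the code. A negative "free" value would mean the two programs are fighting over cores.

```python
import math

# Core counts per Folding power level as described in the post (4-core CPU).
FOLDING_CORES = {"light": 1, "medium": 3, "full": 4}

def boinc_cores(total_cores, pct):
    # Assumption: BOINC rounds the percentage down but always keeps at least one core.
    return max(1, math.floor(total_cores * pct / 100))

def plan(folding_level, boinc_pct, total_cores=4):
    """Return (cores in use, cores left free) for a Folding level plus a BOINC percentage."""
    used = FOLDING_CORES[folding_level] + boinc_cores(total_cores, boinc_pct)
    return used, total_cores - used

for level, pct in [("light", 75), ("medium", 25), ("light", 50), ("light", 25), ("full", 25)]:
    print(level, pct, plan(level, pct))
```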

I've noticed that with Folding, if it is set to medium (3 cores) before a task starts, you can turn it down to light (1 core) and back up to medium later; but if a task starts in light (1 core), turning it up to medium has no effect and it runs on 1 core until that task ends.

Hope that helps.
ID: 98468 · Rating: 0
Profile robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98469 - Posted: 10 Aug 2020, 21:38:19 UTC
Last modified: 10 Aug 2020, 21:42:40 UTC

The way I got BOINC and Folding to run fairly well together involved several steps:

1. While BOINC is shut down, tell Folding to create a GPU slot. Note that you MUST find and click inside the rather faint circle to the left of the GPU information for this to work, and must have a suitable GPU.

2. Run Folding long enough for any current CPU task to finish. Now before the next CPU task starts, delete Folding's CPU slot.

3. Restart BOINC, set it to not use the GPU, and not to try to get tasks from any GPU-only BOINC project.

4. Decrease the number of CPUs BOINC is allowed to use until BOINC and Folding do not interfere with each other or with programs you run from the console.

This lets Folding use your GPU for COVID-19 work, while BOINC projects use the CPU and also do COVID-19 work.
ID: 98469 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98470 - Posted: 10 Aug 2020, 21:48:13 UTC - in response to Message 98468.  

Running Boinc and Folding together can cause resource conflicts.
[snip]

You don't try to run Folding CPU work at the same time as BOINC CPU work.

But you can easily run Folding GPU work while running BOINC CPU work.
You just reserve a core in BOINC for use by the GPU, typically by setting the "Use at most xxx% of the processors" to the appropriate value.
For an 8 core machine, setting it to 90% leaves one core free for use by Folding.
For a 16 core machine, 95% works.
And for a 32 core machine, 97% will do.
You can of course reserve more than one core if you want.

That is the same thing you would ordinarily do (or at least I would do) for reserving a core if running a BOINC GPU project, so it is really no different for a Folding GPU project.
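The percentages above can be checked with a little arithmetic, assuming BOINC turns "Use at most xxx% of the processors" into a core count by rounding down (with a minimum of one). Under that assumption, the sketch below also finds the smallest whole percentage that leaves a given number of cores free; the function names are hypothetical.

```python
import math

def usable_cpus(ncpus, pct):
    # Assumption: BOINC rounds ncpus * pct / 100 down, with a minimum of one CPU.
    return max(1, math.floor(ncpus * pct / 100))

def pct_to_free(ncpus, free=1):
    """Smallest whole percentage that leaves `free` CPUs unused (for up to 100 CPUs)."""
    target = ncpus - free
    return math.ceil(target * 100 / ncpus)

for ncpus, pct in [(8, 90), (16, 95), (32, 97)]:
    print(ncpus, pct, usable_cpus(ncpus, pct), pct_to_free(ncpus))
```

On this reading, 90% and 95% have some slack (88% and 94% would also work), while 97% is already the minimum for a 32-core machine, so rounding it down to 96% would leave two cores idle.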
ID: 98470 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98473 - Posted: 10 Aug 2020, 22:15:04 UTC - in response to Message 98470.  

Running Boinc and Folding together can cause resource conflicts.
[snip]
You don't try to run Folding CPU work at the same time as BOINC CPU work.

But you can easily run Folding GPU work while running BOINC CPU work.
[snip]


Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has.
ID: 98473 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98474 - Posted: 10 Aug 2020, 22:40:54 UTC - in response to Message 98473.  

Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has.

That reminded me to check back with Folding. They have had such a heavy server load recently (for the CPU work) that they haven't been able to handle it.
It looks like they can now. I can do an all-Folding machine again.
ID: 98474 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98475 - Posted: 10 Aug 2020, 23:10:16 UTC - in response to Message 98474.  

Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has.

That reminded me to check back with Folding. They have had such a heavy server load recently (for the CPU work) that they haven't been able to handle it.
It looks like they can now. I can do an all-Folding machine again.


Heavy load for CPU? Surely GPU does a lot more?
ID: 98475 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98476 - Posted: 11 Aug 2020, 0:25:44 UTC - in response to Message 98475.  

Heavy load for CPU? Surely GPU does a lot more?

It is the server load I am talking about. They have different ones for different purposes.
ID: 98476 · Rating: 0
Profile robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98477 - Posted: 11 Aug 2020, 0:27:11 UTC - in response to Message 98474.  

Simpler way, don't use folding. If they want me to use my computers on their project, they need to make it work in Boinc like everyone else has.

That reminded me to check back with Folding. They have had such a heavy server load recently (for the CPU work) that they haven't been able to handle it.
It looks like they can now. I can do an all-Folding machine again.

Folding is planning to offer a BOINC version. Probably not until after they finish their COVID-19 work, though.
ID: 98477 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org