Posts by Greg_BE

41) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107516)
Posted 20 Oct 2022 by Profile Greg_BE
Post:
It's ok now since it started a new task. So now it's 2+2 on MOO, 4 LHC (like normal) and 9 single tasks (15 in total), and the 16th core is for Windows and other stuff.
I've never seen a task (including Moo) use two GPUs. I don't have any machines with two identical cards to test it right now, but I'm sure I have in the past and I just got 2 Moo tasks, one on each card. Maybe only Nvidias do it? (I only have AMDs) Are your GPUs identical?





1080 and a 1060TI
42) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107511)
Posted 20 Oct 2022 by Profile Greg_BE
Post:
I use that as well...more for boosting the GPU temp range to run it a bit harder.
I find the manufacturer's range lets it go up to 90C and crash. I reduce it to 50C no fan, 70C half fan, 80C full fan. I also reduce clocks on old cards that are tired and crash a lot, but I never increase clocks, as that just causes unreliability and busted cards. I also boosted the max power consumption on my Fury from normal to +50% (!) to stop it reducing the clock when thinking hard, and it seems happy with it, apart from the power connector melting, so I soldered the wires directly on and it's working now. I was going to boost it even higher, but if I select the option to exceed the manufacturer's overclocking specs, it crashes the other card in that machine, even though I'm not overpowering that one.

But something isn't adding up:
10/20/2022 2:32:52 PM | Moo! Wrapper | Found app_config.xml
10/20/2022 2:32:52 PM | | Config: use all coprocessors
I see nothing wrong there.

The percentages look the same and the core usage configuration looks the same still.
It should have noticed your Moo task needs 2 CPU cores, and should have reduced the number of CPU tasks running. The CPU will presumably still be maxed out, but doing the correct number of things, so the CPU tasks should drift up to nearly 100% each.

Note: it will still display the same 0.2C+2NV until you download new Moo tasks, then it will show 2C+2NV. It's doing what you asked, just not updating the display on the tasks you already have. Yet another Boinc bug.


It's ok now since it started a new task. So now it's 2+2 on MOO, 4 LHC (like normal) and 9 single tasks (15 in total), and the 16th core is for Windows and other stuff.
43) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107509)
Posted 20 Oct 2022 by Profile Greg_BE
Post:
Then all is well. I use MSI Afterburner, which gives lovely graphs. I can see temperature and usage of CPU and GPU all at once and adjust Boinc until everything is running most efficiently.


I use that as well...more for boosting the GPU temp range to run it a bit harder.
But I see the usage graph now as well. I'm pretty much maxed out. 98% on both cards.

I forgot it looks at CPU as well.
Based on that info, the system is working just under maximum. 98% seems to be a common number.
CPU temp with a radiator is also just under max.

That's what I built this system to do.
Gaming board, hardcore but not pricey CPU.
The GPUs are second-hand. I got a good deal on the 1080 a long time ago; that group sold off all their cards really fast. I'm ok with a 1060 and a 1080.
I've spent all the money I am going to on this system and I will just keep it this way until it dies.

I have used app_config in the past to limit LHC ATLAS. But then they changed their coding and the app_config was no longer needed.

But something isn't adding up:
10/20/2022 2:32:52 PM | Moo! Wrapper | Found app_config.xml
10/20/2022 2:32:52 PM | | Config: use all coprocessors


The percentages look the same and the core usage configuration looks the same still.


It's good to see they are looking at that idea for more specific control of idle and non-idle systems.
44) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107507)
Posted 20 Oct 2022 by Profile Greg_BE
Post:
I also noticed that BOINC Tasks says MOO is using between 180% and 190% CPU. Maybe TN, together with Rosetta and FAH, is chewing up too much capacity?

A couple of TN tasks are 40 minutes behind in CPU time vs. run time.
Makes me wonder if TN needs more power.
Rosetta is fine, running about 15 minutes behind.
Maybe drop TN then. For now, setting it to no new tasks to see what happens after it has completed all the stuff in queue.
Your computer's running fine. Your GPU task is getting the 2 CPU cores it needs. The GPUs are most important since they're much more powerful and should always be kept busy. Your remaining CPU cores are shared between the CPU tasks.

Either leave it as it is, since presumably (you haven't told me) your GPU is flat out and so is your CPU.

Or use app_config in Moo to tell it that the task requires 2 CPU cores, so Boinc will run fewer CPU tasks at the same time.

Or decrease the number of CPU cores Boinc is allowed to use in preferences, but this won't be very automatic, since if you change your GPU to do something other than Moo, it won't be set correctly.


HWiNFO says the GPUs are at 98% capacity. 15 of 16 cores are used by BOINC; 1 core is left free for general non-BOINC stuff. If there were a way to use all 16 when I am not on the computer and then release one when the computer is in use, that would be a more efficient use of my system. So I could configure MOO like you said for now. I will have to look up the command to do that.
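
For reference, the app_config option mentioned above is a small XML file rather than a command. The fragment below is only a sketch of what it might look like: the app name `dnetc` is an assumption (the real short name should be checked in the Moo `<app>` entries of client_state.xml), and `gpu_usage` is set to 2 only because the tasks in this thread show 2NV. The only value taken from the advice above is the 2 CPU cores.

```xml
<app_config>
  <app>
    <!-- Assumed app name; confirm it against client_state.xml before use -->
    <name>dnetc</name>
    <gpu_versions>
      <!-- GPUs budgeted per task; 2 is an assumption matching the 2NV label -->
      <gpu_usage>2</gpu_usage>
      <!-- CPU cores budgeted per GPU task, as suggested in the quoted advice -->
      <cpu_usage>2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file would go in the Moo! Wrapper project folder inside the BOINC data directory as app_config.xml, followed by Options > Read config files in BOINC Manager. Tasks already downloaded may keep showing the old 0.2C+2NV label until new ones arrive.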
45) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107504)
Posted 19 Oct 2022 by Profile Greg_BE
Post:
I also noticed that BOINC Tasks says MOO is using between 180% and 190% CPU. Maybe TN, together with Rosetta and FAH, is chewing up too much capacity?

A couple of TN tasks are 40 minutes behind in CPU time vs. run time.
Makes me wonder if TN needs more power.
Rosetta is fine, running about 15 minutes behind.
Maybe drop TN then. For now, setting it to no new tasks to see what happens after it has completed all the stuff in queue.
46) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107503)
Posted 19 Oct 2022 by Profile Greg_BE
Post:
FAH: both GPUs
BOINC: 1 GPU at the moment, 15 cores in total allocated to BOINC
TN got more work, so more cores, as it's a new project.
MOO is also "new"
Must have been a learning thing. CPU usage is in the 90s now on TN.
Either that or you're running more things than cores. You have 16 cores, but you're using 17 cores (Moo is using two cores, as indicated by task manager saying 12%). The CPU doesn't care, but you should watch your GPU usage, because if the CPU is overloaded, it might not feed the GPU fast enough (depends on the project).



Yeah, I keep forgetting that the GPU uses CPU, but then again I figured BOINC or the system would take care of managing that.
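
The core arithmetic in the quoted reply can be tallied in a few lines. This is only a sketch using the numbers from this thread (16 cores, 15 BOINC CPU tasks, a Moo task really using about 2 cores); the variable and function names are made up for illustration.

```python
# Numbers taken from the thread; names are illustrative only.
TOTAL_CORES = 16        # cores in the machine
BOINC_CPU_TASKS = 15    # single-core CPU tasks BOINC is running
MOO_CPU_CORES = 2       # cores the Moo GPU task actually uses


def core_demand(cpu_tasks: int, gpu_task_cores: int) -> int:
    """Cores wanted by the CPU tasks plus the GPU task's CPU side."""
    return cpu_tasks + gpu_task_cores


demand = core_demand(BOINC_CPU_TASKS, MOO_CPU_CORES)
oversubscribed_by = max(0, demand - TOTAL_CORES)

# Two cores out of 16, which Task Manager rounds to about 12%.
moo_share_percent = 100 * MOO_CPU_CORES / TOTAL_CORES

print(f"Demand: {demand} cores on a {TOTAL_CORES}-core CPU")
print(f"Oversubscribed by: {oversubscribed_by}")
print(f"Moo share of the CPU: {moo_share_percent}%")
```

So the 17 versus 16 in the reply comes from BOINC budgeting Moo at 0.2 cores while it really uses about 2.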
47) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107500)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
FAH: both GPUs
BOINC: 1 GPU at the moment, 15 cores in total allocated to BOINC
TN got more work, so more cores, as it's a new project.
MOO is also "new"



Must have been a learning thing. CPU usage is in the 90s now on TN.
48) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107499)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
FAH: both GPUs
BOINC: 1 GPU at the moment, 15 cores in total allocated to BOINC
TN got more work, so more cores, as it's a new project.
MOO is also "new"
49) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107498)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
50) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107497)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
51) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107491)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
TN is only using 29.51%.
Do you mean a TNGrid task only uses 29.51% of a CPU core? - Correct. Now it's at 46.82%.
52) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107490)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
So WCG is screwed up, Cosmology as well. What's going on?
I went back to MOO. I used to do them long ago, but looks like my account was deleted at some point in time.
53) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107489)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
cd /d e:\Program Files\BOINC

:loop

boinccmd --network_available

TIMEOUT /T 300 /nobreak
goto loop
Which I placed in a batch file and told Windows to run at startup and minimised.



What do you call this file?
54) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107485)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
TN is only using 29.51%. GPU grid is done processing. Now just a lucky WCG and a PrimeGrid are running.
So what's up with that?

Oh and since I was having trouble with WCG GPU, I opened it up to CPU, but still only get GPU if they succeed in downloading.
55) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107484)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
QuChemPedia - interesting chemistry research - Can't run this, does not work on my system. Might try again.
I don't remember having problems, but they're out of work just now so I can't test, and I haven't run it for so long my tasks aren't in Boinctasks history or server history.

TN-Grid - genetics - interesting, will have a look ***Is this CPU or GPU or both?***
CPU only. - bah..there is enough CPU stuff out there and I have plenty of work for CPU. I'm looking for GPU now. If Denis comes back something has to go.

WCG - loads of work now, and I've never had a single error you speak of

Only getting OPN because I am trying to fill my GPU queue. CPU has enough
10/18/2022 8:50:47 AM | World Community Grid | Temporarily failed download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt: transient HTTP error
10/18/2022 8:50:47 AM | World Community Grid | Backing off 00:10:09 on download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt
(and repeat 15 times) - aborted. They didn't do a blinking thing overnight, which is normal when they are stalled out with transient errors. I think I abort more stalled downloads than I do work. Seems to be a GPU task thing. CPU works just fine. I almost think it's time to stuff WCG for a month and come back. Another group that is in over their heads and can't fix stuff.
So your only problem is you have to retry downloads? You need to run "boinccmd --network_available" from a batch file and/or task scheduler. This will retry all downloads. You can set it to run as often as you like.


I tried looking that up. I don't really understand or know what to do to make downloads retry every 5 minutes or so.
56) Message boards : Cafe Rosetta : Off topic stuff from technical discussion (Message 107480)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
Poop? Really? Must be a crowd of PC people.
Shit is so common in American English that it's ridiculous.
Move that shit, see that shit over there, dumb shit, I got to take a shit, etc.
I guess the stage industry and the other places I have worked are rough and tough.
I don't know anyone that says poop.

Well, bullfighting is animal cruelty in today's society because the bull is injured and is forced to continue until he's dead, whereas with meat, you shoot them in the head and they are gone in an instant. But I didn't think it was bull meat; I thought it was more the cows that aren't producing calves anymore.
Then you have to watch out for downer cows that they shove through the system anyway, the cows with brain eating stuff in them, the contaminated foot and mouth cows. Though all this seems to have disappeared.


And yeah, you country Brits swear more than a drunk American.
I noticed Ramsy manages to keep it at PG level, but then when you watch the police interception shows, wow! Some really hardcore mouths there. An American cop would be putting the cuffs on and stuffing them in the car for that kind of behavior.
57) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107479)
Posted 18 Oct 2022 by Profile Greg_BE
Post:
Cosmology

Denis - he's not creating any work right now and has gone silent.
Tasks ready to send 0
Tasks in progress 0

Einstein - new radio waves research by a PhD student - already got that one

LHC - constant supply of single core and multicore VB work. - Running ATLAS

Milkyway - Got this

QuChemPedia - interesting chemistry research - Can't run this, does not work on my system. Might try again.

Rosetta - duh

Sidock - got this

TN-Grid - genetics - interesting, will have a look ***Is this CPU or GPU or both?***

Universe - used to run that.

WCG - loads of work now, and I've never had a single error you speak of

Only getting OPN because I am trying to fill my GPU queue. CPU has enough
10/18/2022 8:50:47 AM | World Community Grid | Temporarily failed download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt: transient HTTP error
10/18/2022 8:50:47 AM | World Community Grid | Backing off 00:10:09 on download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt
(and repeat 15 times) - aborted. They didn't do a blinking thing overnight, which is normal when they are stalled out with transient errors. I think I abort more stalled downloads than I do work. Seems to be a GPU task thing. CPU works just fine. I almost think it's time to stuff WCG for a month and come back. Another group that is in over their heads and can't fix stuff.
58) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107472)
Posted 17 Oct 2022 by Profile Greg_BE
Post:
For those of you interested in other BOINC projects:

collatz has shut down, and expects a few months before it is ready to restart.

SETI@home has shut down, and plans to restart only when they get a new source of data.

cosmology@home appears to have a server problem - they've been unreachable for about a day.

RNA World stopped creating workunits over a year ago, but is still trying to get their last 20 or so workunits finished. The remaining workunits are expected to run for months each.



So what's left out there for science? Denis is down with an unexpected model issue or something along those lines. WCG is barely getting by; not all of its projects are online, and it seems to kick up transient HTTP errors by the dozen. SiDock is the only really new project kicking butt.
I run a few other projects as well along with GPU Grid.
59) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107471)
Posted 17 Oct 2022 by Profile Greg_BE
Post:
Cosmology, I used their fine print to get to the University's Astronomy Department and emailed them asking if they could find the right person for the BOINC side.
No response so far. Doubt there ever will be.
60) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107470)
Posted 17 Oct 2022 by Profile Greg_BE
Post:
Some BOINC projects are now using a very high initial estimate for the run time, possibly in order to get an especially high priority for their tasks. GPUGRID is one of them. You need to watch how the estimated time to completion drops in order to see if that happened.
If that was true, it would only work once. Boinc then adjusts using the duration correction factor. So, no point in a project doing so.

What I'm seeing with GPUGRID work does not agree with you. The tasks start out with an expected runtime of hundreds of days (well past the deadline) and actually finish in less than two days, over and over.

One possibility I've thought of - the duration correction factor works for CPU tasks, but does not work for GPU tasks. I've only seen this odd behavior on GPU tasks, GPUGRID seldom offers any CPU tasks.



I run GPU grid as well. I think what screws up BOINC time remaining is that although it is a GPU task it is running a ton of CPU processes and this backwards running is what BOINC does not understand.
I am 80.80% done with the current task in 2 days and 8 hours and 42 minutes, remaining time to completion is 11 days 11 hours and 16 minutes. Somehow that does make sense.
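
As a rough cross-check of those numbers, here is a sketch that extrapolates the remaining time from the elapsed time and percent done. It assumes progress is linear in elapsed time, which nothing in this thread establishes, and it is not how BOINC itself computes its estimate; the function name is made up for illustration.

```python
from datetime import timedelta


def linear_remaining(elapsed: timedelta, fraction_done: float) -> timedelta:
    """Remaining time if progress were linear in elapsed time (an assumption)."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed / fraction_done   # projected total run time
    return total - elapsed


# Figures quoted in the post above.
elapsed = timedelta(days=2, hours=8, minutes=42)
reported = timedelta(days=11, hours=11, minutes=16)

remaining = linear_remaining(elapsed, 0.8080)

print("Linear extrapolation:", remaining)
print("BOINC's reported remaining time:", reported)
```

Under that assumption the task would have roughly 13.5 hours left, so the reported 11 days is more than ten times the simple extrapolation.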





©2024 University of Washington
https://www.bakerlab.org