Posts by strongboes

1) Message boards : Number crunching : other covid 19 project (Message 97522)
Posted 23 Jun 2020 by strongboes
Post:
I have a PC running on TN-Grid; you just need the code on the website to set up an account. Really interesting project, well worth a look.
2) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 95467)
Posted 28 Apr 2020 by strongboes
Post:
Yes, pretty clearly using those hosts to somehow feed work to other CPUs. Very clever, but clearly not actually the top CPU.

Out of interest, does anyone know how he can do that?
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 95445)
Posted 27 Apr 2020 by strongboes
Post:
In Folding@home you can set the number of cores in the CPU slot; change it from -1 to the value you want.

I have found there is no optimal setting for running both simultaneously.
4) Message boards : Number crunching : The most efficient cruncher rig possible (Message 95435)
Posted 27 Apr 2020 by strongboes
Post:
Looking at the Raspberry Pi 4 (4 GB model only), I'm seeing the following from my experience.

Credit:
4 threads on a Raspberry Pi will land around 800-900 average credit.
A 12-year-old Core 2 Quad generates about 2,500 average credit.
A Ryzen 3600 generates around 9,000 average credit (from what I've seen looking at other accounts).


An AMD Ryzen Threadripper 3990X will do 115k-120k PPD RAC. That's drawing 280 W total for the CPU, although the PPD doesn't drop much when throttled to 200 W. So if you could get a Pi to run at 900 PPD, you would need say 133 of them to equal a 3990X.

By your figures, taking the mid price, the Pis would cost 133 × $88.50 ≈ $11,770.

They would draw 480 watts in total.

This is just for comparison of course, but the 3990X is cheaper, more efficient, and you also have an awesome PC.
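
For anyone who wants to redo the comparison with their own numbers, here is a rough sketch of the arithmetic above; the price, PPD and wattage figures are the estimates quoted in this thread, not measured benchmarks.

```python
# Rough cost/power comparison of a fleet of Raspberry Pi 4s vs one 3990X.
# All figures are the estimates quoted above, not measurements.
PI_PPD = 900        # credit per day for a 4-thread Pi 4 (upper estimate)
PI_PRICE = 88.5     # USD, mid price per Pi
PI_WATTS = 3.6      # approx. draw per Pi under load (480 W / 133 units)

TR_PPD = 115_000    # 3990X RAC, lower estimate
TR_WATTS = 280      # CPU package power

pis_needed = TR_PPD / PI_PPD
print(f"Pis needed to match one 3990X: {pis_needed:.0f}")   # ~128-133 depending on estimates
print(f"Pi fleet cost:  ${pis_needed * PI_PRICE:,.0f}")
print(f"Pi fleet power: {pis_needed * PI_WATTS:,.0f} W vs {TR_WATTS} W")
print(f"Credit per watt: Pi {PI_PPD / PI_WATTS:.0f}, 3990X {TR_PPD / TR_WATTS:.0f}")
```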
5) Message boards : Number crunching : 8 cores but only one Rosetta file? (Message 94991)
Posted 20 Apr 2020 by strongboes
Post:
My system is air-cooled only: three front intakes, one top and one back exhaust, and push/pull Cooler Master fans on the CPU. I won't use a water-cooled system, as people told me it is likely to leak; I have pets, so I can't have that happen.

The GPU has three fans on it; it's the long one.

I have contacted ASUS to tell them we need a firmware update or a new program that allows 100% user fan control. Not sure how long it will take them to reply or if they will allow it.

I have noticed that Rosetta tasks run hotter, significantly hotter. I am now running only WCG tasks at 80-90% capacity and the temps are below 80; they run about 10 °C cooler. I wonder if Rosetta tasks are just not optimized correctly for user PCs. I don't know. Might be something to look at.

I still have to leave my AC on, and I don't want to do that. Like you said, I want my system to run no hotter than 70 °C even in a 35 °C ambient environment. Something just isn't right.


I suggest you underclock your CPU a little; it will take a chunk of heat out, and you get a bigger reduction in power usage than in performance. The reason your CPU runs hotter at only 25% load is counter-intuitive, but in my experience it's because the CPU will run at a higher clock speed; to do that it needs more voltage, and with more voltage you get more heat. So your best solution is to lower your clock speed and/or voltage.
6) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 94974)
Posted 20 Apr 2020 by strongboes
Post:
In my opinion, the credit system is not broken at all; it is working as intended. Please read the posts that explain how credits are awarded in Rosetta@home.
* http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669&postid=10377
* https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2194&postid=24612#24612

Basically, from what I understand, what really bothers you right now is WU-to-WU variability. Note that this will all average out over the long term, even after a few days, and definitely after 2 weeks (RAC).

Thus your aggregate credit count and RAC is still as good as anything for the purpose of keeping up the friendly competition with your peers, getting feedback about your contribution, fine tuning the performance of your hardware, checking whether your boxes are producing as expected for the given kind of hardware, etc.

Hence what you are asking for (precise per-WU FLOPS estimates) would require lots of development and maintenance time to be devoted to something that isn't that important, and would take time away from research.



I'm aware of how credits are awarded; there is huge variation.

I doubt what I've suggested is much of a change: instead of a WU being told to finish after a time period, it would finish after a decoy count set by the researcher. Before a batch is sent out, it is run on known hardware to determine the credit awarded per decoy. It's very simple.
7) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 94972)
Posted 20 Apr 2020 by strongboes
Post:
The credit system is broken as far as I'm concerned anyway: two virtually identically named units can finish within a minute of each other and have wildly different credits. It makes absolutely no sense at all.

IMO the time-based system is one issue. Instead there should be no time limit; the units should simply be set up so that they complete x decoys, and the researchers can set each run of work so that x decoys take a desired time.

Example: rb1, 1 decoy takes approximately 1 hour on reference hardware, credit awarded per decoy 100, unit size 4 decoys, approximate runtime 4 hours. The best hardware takes 3 hours, the worst 5 hours; faster hardware clearly gets more reward per hour.

rb2: 1 decoy takes approximately 15 minutes on reference hardware, credit awarded per decoy 25, unit size 12 decoys. Etc.

Obviously size can be set to whatever the researchers want. Perhaps instead of a runtime preference, crunchers could simply have a WU size preference of small, medium, or large, which would be a preference only and wouldn't preclude you getting a WU of any size.

Perhaps also a further option, as in WCG, where you can set how many of these proposed larger WUs may run on your machine; the default could be one per 16 GB of machine memory, to allow other tasks to run.

I imagine credit compensation should be on the basis that if one of these larger units requires you to suspend a core to allow it to run, then double the credit should be awarded.
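
To make the idea concrete, here is a rough sketch of how per-decoy credit could be calculated; the function names and the calibration constant are purely illustrative, not anything the project actually implements.

```python
# Sketch of the per-decoy credit proposal above; the calibration constant and
# function names are made up for illustration, not the project's real scheme.

def calibrate_batch(ref_hours_per_decoy, target_wu_hours, credit_per_ref_hour=100.0):
    """Run once per batch on reference hardware before the batch is sent out."""
    credit_per_decoy = ref_hours_per_decoy * credit_per_ref_hour
    decoys_per_wu = max(1, round(target_wu_hours / ref_hours_per_decoy))
    return credit_per_decoy, decoys_per_wu

def credit_for_result(decoys_completed, credit_per_decoy):
    """Credit depends only on decoys returned, never on wall-clock runtime."""
    return decoys_completed * credit_per_decoy

# "rb1": ~1 h per decoy on reference hardware, aiming for ~4 h work units.
cpd, n = calibrate_batch(1.0, 4.0)          # -> 100 credits/decoy, 4 decoys/WU
print(cpd, n, credit_for_result(n, cpd))    # 100.0 4 400.0

# "rb2": ~15 min per decoy, aiming for ~3 h work units.
cpd, n = calibrate_batch(0.25, 3.0)         # -> 25 credits/decoy, 12 decoys/WU
print(cpd, n, credit_for_result(n, cpd))    # 25.0 12 300.0
```

The point of the calibration step is that faster hosts simply return more decoys per hour and earn proportionally more, without any runtime-based estimation.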


Enough for this post, but is there no way to allow a work unit to use more than one core? This would reduce memory issues considerably and allow larger work to be run.
8) Message boards : Number crunching : What to do with tasks that are predicted to run longer than their available deadline? (Message 93641)
Posted 6 Apr 2020 by strongboes
Post:
A better way would be to reduce your runtime preference on the website to 2 hours, run an update, and restart BOINC; any WU over that time will roll back to its last completed decoy/checkpoint, upload, and report. There's no need to abort and lose the credit, and it saves someone else re-running the work.
9) Message boards : Number crunching : How might memory effect R@h processing? (Message 93472)
Posted 5 Apr 2020 by strongboes
Post:
I'm basing that opinion on a number of factors, but a hard figure is the average processing rate, found on the computer details / application details page. On 4.07 it was regularly hitting 30 on the COVID design tasks; on 4.12, 11.5 is the limit.

On 4.07 I was averaging 1 credit per core per 11.5 seconds of compute time on average. 4.12 is running at something like 40-50 seconds on average for 1 credit; the best I've noted was around 34 seconds.
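
Converting those figures into credit per core-hour, a trivial sketch using the numbers quoted above:

```python
# Convert the seconds-per-credit figures above into credit per core-hour.
def credits_per_hour(seconds_per_credit):
    return 3600.0 / seconds_per_credit

for label, secs in [("4.07", 11.5), ("4.12 typical", 45.0), ("4.12 best seen", 34.0)]:
    print(f"{label:>14}: {credits_per_hour(secs):5.0f} credits per core-hour")
# 4.07 ~313/hour vs 4.12 ~80/hour typical, i.e. roughly a 4x drop in credit rate.
```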

I don't know if it's an accurate number to go on, because you can run 100 tasks with the same name, and 30 of them will run 10 decoys (for example) and finish within 0.5% of each other's time, yet the credit awarded can vary by 300 points: some will get 50, others, rarely, nearly 400.

I'm assuming most people don't pay as much attention to these numbers as I have done and so haven't noticed.
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93468)
Posted 5 Apr 2020 by strongboes
Post:
[snip]

Yes, SMT/hyper-threading is off. The point is, though, that on my Intel laptops the speed has remained the same from 4.07 to 4.12. On my TR it has dropped over 60% regardless of what I try; I can't put it much simpler than that.

Note that all of your Intel laptops have a much lower number of cores, and will therefore have much less of a problem with too many cores trying to share the limited speed path to main memory. I can't put it much simpler than that, either.



4.07 is the reference, yes. 4.12 is 60% slower than that; I don't know what is so difficult to understand, it has nothing to do with memory or anything else. You don't get a 64-core/128-thread chip just to run it on 5 cores to keep the same productivity because of a software change. Is there a mod/developer who can comment on this issue?

Resource contention, nothing less, nothing more. Nothing exotic or unknowable. The Zen architecture is strong on memory bandwidth but weak on memory latency. Even if you have a relatively large cache, there will still be lots of memory accesses that have to be fulfilled from main memory. Because of that, too many cores doing accesses will overload the memory bus and their performance will drop off sharply (especially when the latency of memory access is already high).

I suggest you experiment with the number of concurrent tasks for R@h to find an equilibrium. Alternatively, you could see if you can get the memory to a higher frequency without worsening its timings. Also ensure that there is no swapping. (IIRC each 4.12 task needs about 2 GB of RAM, which amounts to about 128 GB of RAM in use besides all the other processes already running.)
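
One way to cap the number of concurrent Rosetta tasks without touching BOINC's overall core limit is an app_config.xml with project_max_concurrent in the project directory. A minimal sketch, assuming a default Linux BOINC data directory (adjust the path for your install):

```python
# Sketch: cap concurrent Rosetta tasks with BOINC's app_config.xml.
# The data-directory path below assumes a default Linux install; adjust as needed.
from pathlib import Path

MAX_CONCURRENT = 28   # experiment with this value to find the throughput knee
project_dir = Path("/var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta")

app_config = f"""<app_config>
  <project_max_concurrent>{MAX_CONCURRENT}</project_max_concurrent>
</app_config>
"""

(project_dir / "app_config.xml").write_text(app_config)
print("Wrote app_config.xml; re-read config files in BOINC Manager or restart the client.")
```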


For the last 12 hours it has been using only 28 cores. No difference in performance, unfortunately. I'll probably switch to Folding@home with it, as it costs the same to run regardless of how many cores are working. The Folding software scales performance with core count, so no issues there.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93381)
Posted 4 Apr 2020 by strongboes
Post:
[snip]

Yes, SMT/hyper-threading is off. The point is, though, that on my Intel laptops the speed has remained the same from 4.07 to 4.12. On my TR it has dropped over 60% regardless of what I try; I can't put it much simpler than that.

Note that all of your Intel laptops have a much lower number of cores, and will therefore have much less of a problem with too many cores trying to share the limited speed path to main memory. I can't put it much simpler than that, either.



4.07 is the reference, yes. 4.12 is 60% slower than that; I don't know what is so difficult to understand, it has nothing to do with memory or anything else. You don't get a 64-core/128-thread chip just to run it on 5 cores to keep the same productivity because of a software change. Is there a mod/developer who can comment on this issue?
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93373)
Posted 4 Apr 2020 by strongboes
Post:
4.12 is running very poorly on AMD.
[snip]

On CPUs with very high numbers of cores, the speed of the path from the CPU to the memory is usually inadequate for effective use of all the cores at once, especially for programs with heavy use of the L3 cache in the CPU. The 4.12 program is significantly larger than the 4.07 program, and therefore likely to have heavier L3 use.

You may need to experiment with various numbers of cores in use, and draw a graph of total throughput per day versus number of cores in use, so you can get the maximum total throughput in a day.

Whether the CPU was made by AMD or by Intel is not the major factor in this problem.
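
For anyone logging that experiment, here is a small sketch of tabulating throughput against core count; the sample figures are placeholders to show the shape of the analysis, not measurements from any real host.

```python
# Sketch: tabulate total throughput against core count to find the sweet spot.
# The (cores, decoys_per_day) pairs are placeholders; substitute your own logs.
observations = {
    16: 160,
    32: 250,
    48: 270,
    64: 260,   # more cores, lower total throughput => likely memory-bound
}

for cores in sorted(observations):
    decoys = observations[cores]
    print(f"{cores:3d} cores: {decoys:4d} decoys/day ({decoys / cores:.1f} per core)")

best = max(observations, key=observations.get)
print(f"Peak total throughput at around {best} cores.")
```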


Well, each core has 4.2 MB of L3, more than the Intel chips. I've tried with 8 MB of L3 per core; no difference.

The i5 processor has only 3 MB in total, 1.5 MB per core. It clearly isn't down to L3 cache levels.

I'm only running it on the 64 cores, so it has 2 GB per core, although I'm not running the full 64. I've tried on just 32 cores; it makes no difference. Like I said, on the 4.07 tasks it was running at 3 times the speed of the Intel laptops, which are slower and have considerably less cache; now, regardless, it's running marginally slower than them. That is an enormous drop.

SSD drive.

There are other posts from people running AMD chips noticing that their temps and power usage are reduced, all indicating less work being done, for whatever reason.

My clock speed is also up between 10% and 30% depending on the WU, which again suggests that the chip is running faster but doing less per cycle.

I misread your total amount of main memory.

All of the cores must share the same path to main memory. With today's memory speeds, that means that each core in use spends most of its cycles waiting for access to the main memory, rather than doing any useful work, if many cores are in use. This reduces the power used, and therefore the amount of heat generated.

If you are using less than half of the total number of processors on the CPU chip, turning off hyperthreading, or AMD's equivalent, often helps somewhat.

I'm not saying that which company made the CPU has no effect; I just expect it to matter much less than many cores sharing the same path to main memory. Recompiling 4.12 to take advantage of the features the AMD CPU has but Intel CPUs don't would offer some help, but that's not something I can do.



Yes, SMT/hyper-threading is off. The point is, though, that on my Intel laptops the speed has remained the same from 4.07 to 4.12. On my TR it has dropped over 60% regardless of what I try; I can't put it much simpler than that.
13) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93370)
Posted 4 Apr 2020 by strongboes
Post:
4.12 is running very poorly on AMD.

Average processing rate is a reasonable measure to go on. On 4.07 my 3990X was regularly hitting over 30 GFLOPS on x86_64. If you look now it's been left at 19, but that's an anomaly.

On 4.12 I can't get it over 11.5 no matter how many cores I run. Yet 2 of the 3 old laptops I have running this, with vastly slower chips, are hitting over 12 on the same work units.

Something is clearly wrong; I noticed it immediately but have had no response to my other posts.

On CPUs with very high numbers of cores, the speed of the path from the CPU to the memory is usually inadequate for effective use of all the cores at once, especially for programs with heavy use of the L3 cache in the CPU. The 4.12 program is significantly larger than the 4.07 program, and therefore likely to have heavier L3 use.

You may need to experiment with various numbers of cores in use, and draw a graph of total throughput per day versus number of cores in use, so you can get the maximum total throughput in a day.

Whether the CPU was made by AMD or by Intel is not the major factor in this problem.


Well, each core has 4.2 MB of L3, more than the Intel chips. I've tried with 8 MB of L3 per core; no difference.

The i5 processor has only 3 MB in total, 1.5 MB per core. It clearly isn't down to L3 cache levels.

I'm only running it on the 64 cores, so it has 2 GB per core, although I'm not running the full 64. I've tried on just 32 cores; it makes no difference. Like I said, on the 4.07 tasks it was running at 3 times the speed of the Intel laptops, which are slower and have considerably less cache; now, regardless, it's running marginally slower than them. That is an enormous drop.

SSD drive.

There are other posts from people running AMD chips noticing that their temps and power usage are reduced, all indicating less work being done, for whatever reason.

My clock speed is also up between 10% and 30% depending on the WU, which again suggests that the chip is running faster but doing less per cycle.
14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93358)
Posted 4 Apr 2020 by strongboes
Post:
4.12 is running very poorly on AMD.

Average processing rate is a reasonable measure to go on. On 4.07 my 3990X was regularly hitting over 30 GFLOPS on x86_64. If you look now it's been left at 19, but that's an anomaly.

On 4.12 I can't get it over 11.5 no matter how many cores I run. Yet 2 of the 3 old laptops I have running this, with vastly slower chips, are hitting over 12 on the same work units.

Something is clearly wrong; I noticed it immediately but have had no response to my other posts.
15) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93071)
Posted 2 Apr 2020 by strongboes
Post:
One thing to watch for when using CPUs with especially high numbers of cores: the bandwidth from the CPU to the memory may not be adequate to run all of the cores very well. This could leave each core in use waiting for access to memory most of the time.

If so, it can be useful to reduce the number of cores BOINC is allowed to use and see if that speeds up the work enough to more than compensate for fewer cores in use.


If you read my previous posts you will see that I'm not hyper-threading and have a large L3 cache and plenty of RAM; I also tried running just 10 cores. It isn't that; tasks that start with rb run roughly 4 times slower than on 4.07. It will be obvious soon enough.
16) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93054)
Posted 2 Apr 2020 by strongboes
Post:
See below: there are no 4.07 tasks left showing (there were 9,000 yesterday, only 400 today; the mini was taking around an hour, but it gives an idea). The 4.07 tasks were averaging a 40-minute runtime, with a rate of 1 credit per 11.5 seconds of runtime on average: 3600 / 11.5 = 313 credits per hour.

The last 4.12 task is running at 1 credit per 59.95 seconds of runtime, 4.7× slower.

https://boinc.bakerlab.org/rosetta/results.php?hostid=3800945&offset=340&show_names=0&state=4&appid=
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93048)
Posted 2 Apr 2020 by strongboes
Post:
This is the 3990X, so 64 cores and 128 threads. I've turned off SMT so I'm only running the 64, so as to give more L3 cache per core, which allows the tasks to progress very rapidly. It's a fantastic chip.
18) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93046)
Posted 2 Apr 2020 by strongboes
Post:
The first of the new tasks has just finished; it took 4 hours to run the 1 decoy for me, and these were definitely running in under an hour previously. If you have your runtime set to 4 hours you won't really notice the difference in time, but I'm more concerned with the actual work being done by the program. If points are an accurate indication, then with 4.07 I was running at an average of 300 points per hour per core; this just-finished task has returned 300 points in 4 hours, which ties in with my thinking that they are not running efficiently.

Is there a mod reading who can make a comment?

Edit: there are 60 of these now finishing so plenty to look at: https://boinc.bakerlab.org/rosetta/result.php?resultid=1138591491
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93041)
Posted 2 Apr 2020 by strongboes
Post:
I can give a little further info as well. My CPU is currently 99% utilised: BOINC is running 60 cores, 2 are running GPUs for Folding, and 2 are spare for overhead. Normally when BOINC is running on all cores, the clock speed is approximately 3.2 GHz and it will pull as many watts as I let it (doubling the power with an overclock only gets me to 3.55 GHz). At the moment it's pulling 15% less power and the clock speed is up at 4.2 GHz on all cores. If each core were being worked hard it would be impossible for it to run at this speed; this is the speed it normally runs at with, say, 3 or 4 cores loaded.

IMO 4.12 is not making proper use of the CPU; it's taking 4-6 times longer to complete a decoy, which ties in with the fact that my CPU is running at a very high clock speed, indicating the cores are doing very little work.
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93039)
Posted 2 Apr 2020 by strongboes
Post:
I've run a few 4.12 tasks overnight (around 80). First impressions are that each decoy takes approximately 4 times as long. My preference is set to an hour, and my Threadripper is quite quick, but it's taking 3-4 hours to run one decoy if the task starts with rb; under 4.07 I would finish 1 decoy in under an hour 99% of the time. The design task has run at the same speed but only given 2 points of credit per decoy.

4.12 is not looking productive from my end as a user. Going by results so far, it would take the slower processors in my laptops etc. nearly 8 hours or more for 1 decoy.

I've just been sent a further 60 tasks with the rb prefix, so I'll see how they run. These are a different batch.

You've just run 80 tasks overnight and received 60 more.
You have 2,500 tasks available to run on your Threadripper, which has 64 cores and 132 GB of RAM, running 1-hour tasks.

What is it about 4.12 that isn't looking productive from your POV?
Because after several days with very few tasks available at all, I'd kill for any of the problems you're currently having.
Quite astonishing.



The tasks-in-progress count is incorrect; I reset the project twice this week due to multiple downloads failing, so they aren't really there, as discussed in a different thread.

I'm saying it doesn't look productive because the decoys are taking approximately 4 to 6 times longer to process. If you watch the graphics, it gets to a certain number of steps and then almost stops, taking 30-60 minutes for each additional step.

Last night before I went to bed, half of them stopped at step 24600, then took 30 minutes to do step 24601, and so on.

So that's what I mean: it appears to be taking 4-6 times longer to process the same work.

The latest batch, which are rb 04 01 20235 19963 ab t000 robetta cstwt..., are currently at 2 hours 49 minutes, 56% through the first decoy. It looks like a 5-hour run. 4.07 was running very similar tasks in under an hour.

