Posts by Keith Myers

1) Message boards : News : Outage notice (Message 99695)
Posted 16 Nov 2020 by Keith Myers
Post:
I wondered why my RPi3B+ with 1GB of memory was frozen for a day and only showing 3% CPU utilization. Found out it was trying to crunch a horns task. Aborted it as soon as I saw the name of the task. Much better now.
2) Message boards : Number crunching : If You Don't Know Where to Put it, Post it here. (Message 99660)
Posted 13 Nov 2020 by Keith Myers
Post:
So you are saying I can upgrade my Threadripper 1920X to a Zen 2 or 3 CPU? I was told I was stuck with what I have. No, I'm NOT unhappy with it, but a new CPU is FAR cheaper than a whole new mb etc. No, I don't remember the mb model off the top of my head.
Sid covered it- that motherboard (possibly with a BIOS upgrade) will allow you to upgrade to a 2900 series Threadripper (Zen+). Although you would have to check with the manufacturer as to whether it is capable of supporting the 2970WX & 2990WX CPUs, as they require way more power than the lower end CPUs (250W v 180W).

The 3900 series CPUs (Zen 2) require the sTRX4 socket, and although nothing has been announced yet, it's highly likely the yet to be released Zen 3 series Threadrippers will also be usable in those existing motherboards.
Not as good support as for their mainstream CPUs across multiple chipsets, but still way better than Intel to date.


Wait a bit longer for the Zen 3 Threadrippers to arrive. Maybe by the end of the year as that is the same timeframe the Epyc Milan Zen 3 server chips are supposed to arrive.

They should use the same sTRX4 socket (TRX40 chipset) as the earlier Zen 2 Threadrippers but will require a new BIOS to support the new CPUs.
3) Message boards : Number crunching : If You Don't Know Where to Put it, Post it here. (Message 99659)
Posted 13 Nov 2020 by Keith Myers
Post:
Don't be too hopeful of upgrading from one version of Zen to another with the same motherboard; they too often require a new motherboard with each new version of CPU they come out with.


Rubbish.
Intel, yes. But AMD? Not true.


So you are saying I can upgrade my Threadripper 1920X to a Zen 2 or 3 CPU? I was told I was stuck with what I have. No, I'm NOT unhappy with it, but a new CPU is FAR cheaper than a whole new mb etc. No, I don't remember the mb model off the top of my head.

Upgrade your 1920X to a 2920X. You missed out on the great prices late last year and earlier this year. I bought mine at launch for the MSRP of $650. Unfortunately the current prices have rebounded back to almost MSRP.



Same socket and memory needed. Just remove the 1920X and drop in the 2920X. There ARE some generational improvements, mainly in higher memory clock capabilities from a better IMC.
I run mine all-core at 4150MHz @1.30V and the 32GB of memory at 3466MHz CL14.

Still no match for my Ryzen 9 3950X with 32GB of CL14 3600MHz memory running at 4250MHz all-core at 1.25V. But you would need a new motherboard platform for Ryzen Zen 2 or Zen 3.

There really aren't any current bargains on the 32 thread 2950X either, which your motherboard supports. Same TDP of 180W.

[Edit] Actually, right now Amazon has the 2950X for $590 which is a good deal.

https://www.amazon.com/dp/B07GFN6CVF?tag=pcpapi-20&linkCode=ogi&th=1&psc=1
4) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 99071)
Posted 21 Sep 2020 by Keith Myers
Post:
The problem with BOINC is all the legacy code that is still in the source. Too many patches upon patches. Half the code does nothing now as the hardware and project types have moved on from what BOINC was originally written for. BOINC needs a complete fresh slate rewrite.

And BOINC is still trying to be all things for every platform and every user. Will never be optimized for any platform. Should have separate clients for low power devices like phones, cpu only hosts, gpu only hosts etc. etc.

And if you are a capable developer, you can make changes to the source to improve the client and just compile it for your own specific needs. Just don't bother submitting it to the official BOINC developers because it will never get out of committee and will languish in the proposed tree forever.
5) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98906)
Posted 9 Sep 2020 by Keith Myers
Post:
And I am guilty of not running the latest BIOS. Normally, unless there is some absolutely compelling reason to update to a new BIOS, I believe in "if it ain't broke, leave it alone". From the OCN forums, the latest BIOS only had marginal, if any, improvements over the one I am on. And there was nowhere near consensus on just what the supposed improvements were. I don't push to the extremes that most of the forum members do, so I just like to stay in the middle ground.
6) Message boards : Number crunching : Rosetta Just swamp my computer (Message 98901)
Posted 8 Sep 2020 by Keith Myers
Post:
In your BOINC settings check what you have in your memory settings for "in use" and "when not in use". Default values are 60% and 90%. I wouldn't suggest going over 90%.

I use 94% of the CPUs and 90% of the memory when not in use, 85% when in use. 32 cores and 32GB of memory.
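
To put those percentages in absolute terms, here is a rough sketch of the arithmetic for a 32GB host. The function name is made up for illustration and is not part of BOINC; it only multiplies total RAM by the configured limit.

```python
def boinc_memory_budget_gb(total_gb: float, percent: float) -> float:
    """RAM in GB that a given percentage limit would allow BOINC to use."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_gb * percent / 100.0


if __name__ == "__main__":
    total_gb = 32  # host memory mentioned in the post
    # 85% "in use" and 90% "when not in use" are the settings quoted above.
    for label, percent in (("in use", 85), ("when not in use", 90)):
        budget = boinc_memory_budget_gb(total_gb, percent)
        print(f"{label}: {budget:.1f} GB for BOINC, {total_gb - budget:.1f} GB left over")
```

Whatever is left over has to cover the OS and everything else, so a single task that suddenly wants all 32GB will still push the host into swap.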

Since everyone is getting all these buggy tasks, I simply will not run Rosetta until it all gets sorted out. Then might come back or not. Plenty of work available for my other projects. TN-Grid and GPUGrid are doing Covid-19 or precursor work equivalent to what Rosetta does.
7) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98900)
Posted 8 Sep 2020 by Keith Myers
Post:
I don't know. It could be some corner case happening. I regularly find those when I test things for developers. Einstein has a setting in Project preferences for setting a LIBC215 flag for getting the correct applications.
But probably just some iffy setting in my environment that Rosetta does not like. Or some weakness in the hardware that only Rosetta uncovers.
8) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98869)
Posted 8 Sep 2020 by Keith Myers
Post:
but the same applications on similar CPUs were OK then it would be a problem with that CPU type (needing a micro-code fix, or a specific fix for that CPU in the Application). If the errors were produced by a particular application on a particular CPU, but other applications on that same CPU work OK, then it'd be a problem with the application.

But we still do not know that. As far as I can tell in my research in various threads here and at Seti and Einstein, if an application is written expecting the deprecated VSYSCALL function to be available, the application will segfault. Only applies to Linux systems. Not applicable in Windows.
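
One way to see whether the legacy vsyscall page is available on a given Linux host is to look for a [vsyscall] entry in a process's memory map. This is only a sketch: it assumes the kernel lists that mapping in /proc/self/maps when vsyscall is enabled, the function names are my own, and it says nothing about whether the Rosetta application actually depends on it.

```python
from pathlib import Path


def has_vsyscall_mapping(maps_text: str) -> bool:
    """Return True if any line of a /proc/<pid>/maps dump names [vsyscall]."""
    return any(
        line.rstrip().endswith("[vsyscall]") for line in maps_text.splitlines()
    )


def current_process_has_vsyscall(maps_path: str = "/proc/self/maps") -> bool:
    """Check the running process; returns False if the file cannot be read."""
    try:
        return has_vsyscall_mapping(Path(maps_path).read_text())
    except OSError:
        # Not Linux, or /proc is unavailable.
        return False


if __name__ == "__main__":
    print("vsyscall page mapped:", current_process_has_vsyscall())
```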
9) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98868)
Posted 8 Sep 2020 by Keith Myers
Post:
What do you mean a "poor tester for Linux"? It tests the physical RAM, and it doesn't matter what OS you run on the machine afterwards.

It is the opinion of the memory testers at OCN that Memtest is a particularly poor tester. Does not test very thoroughly and also not very consistently. Since those experts have much more experience than I, I trust their opinions.
10) Message boards : Number crunching : Rosetta Just swamp my computer (Message 98852)
Posted 8 Sep 2020 by Keith Myers
Post:
Well I just decided to stop Rosetta. During these wild memory excursions that stole all my memory and froze my system for a minute, I discovered that the GPUGrid tasks that were running at the same time errored out. And the error message matched what I had been seeing on the very rare occasion when a GPUGrid task failed. So cause and effect indicates that the Rosetta tasks running were the culprit. So have set NNT on Rosetta and will run without them for a while and see if I stop getting those random errors.
11) Message boards : Number crunching : Rosetta Just swamp my computer (Message 98849)
Posted 8 Sep 2020 by Keith Myers
Post:
Rosetta just swamped the memory and swap of my computer, causing it to freeze up. I'm running an AMD64 Ryzen 7 1800X with 32GB of RAM. Anyone else run into this?


===
hangint3n

Rosetta has dodgy tasks quite often. I just ran into one that grabbed all available memory and swap every ten minutes, freezing the computer for a minute until the task released the memory.
Well it was this Rosetta task kp8RjDVk_fold_and_dock_SAVE_ALL_OUT_1009390_201. It is grabbing all the memory and the swap file every five minutes or so.


So if I do nothing it will release the memory? How long does this take? I'm using a Linux OS not Windows.

It released the memory back to my static level after about a minute. Then ten minutes later it would repeat. Someone replied to a similar post saying they had seen the same issue. So I decided the task was bad and aborted it.
Using Ubuntu 20.04.1 OS.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=12554&postid=98814
12) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98847)
Posted 8 Sep 2020 by Keith Myers
Post:
I thought the Boinc default was quite small, although since I immediately change it on installation I may have misremembered.

Are you telling me Rosetta actually sends 10 days of work with a 3 day deadline? Boinc can't be that useless.

Ha ha LOL. My very first connection to Rosetta upon joining sent me 246 tasks in a single download after congratulating me for joining. With a 3 day deadline. Had to abort all but ten after setting NNT before the next scheduler connection or it would have kept sending work.
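A back-of-the-envelope sketch of why 246 tasks with a 3 day deadline is far too many. It assumes 8 hours per task (the -cpu_run_time 28800 seen in a Rosetta command line elsewhere on this page, which may not apply to those particular tasks) and cores that run nothing but Rosetta around the clock. The function names are only for illustration.

```python
import math


def tasks_per_core(hours_per_task: float, deadline_days: float) -> int:
    """How many whole tasks one core can finish before the deadline."""
    return math.floor(deadline_days * 24 / hours_per_task)


def cores_needed(task_count: int, hours_per_task: float, deadline_days: float) -> int:
    """Cores required to finish task_count tasks in time, running nonstop."""
    per_core = tasks_per_core(hours_per_task, deadline_days)
    if per_core == 0:
        raise ValueError("a single task cannot finish before the deadline")
    return math.ceil(task_count / per_core)


if __name__ == "__main__":
    # 246 tasks were sent; ten were kept after aborting the rest.
    for count in (246, 10):
        print(count, "tasks need", cores_needed(count, 8, 3), "dedicated cores")
```

Under those assumptions the full download would have needed 28 cores doing nothing else for three days, while the ten tasks that were kept fit on two.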
13) Message boards : Number crunching : Rosetta Just swamp my computer (Message 98844)
Posted 8 Sep 2020 by Keith Myers
Post:
Rosetta just swamped the memory and swap of my computer, causing it to freeze up. I'm running an AMD64 Ryzen 7 1800X with 32GB of RAM. Anyone else run into this?


===
hangint3n

Rosetta has dodgy tasks quite often. I just ran into one that grabbed all available memory and swap every ten minutes, freezing the computer for a minute until the task released the memory.
Well it was this Rosetta task kp8RjDVk_fold_and_dock_SAVE_ALL_OUT_1009390_201. It is grabbing all the memory and the swap file every five minutes or so.
14) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98833)
Posted 7 Sep 2020 by Keith Myers
Post:
If I don't run Rosetta, I don't get any errors on any of my other projects.
The thing is: other people don’t get any errors on the Rosetta tasks that fail on your machine. Rosetta seems to be uncovering a fault that those synthetic stress testers fail to detect.

Well, neither Prime95 nor y-cruncher is a synthetic application. They are real compute loads like Rosetta. And yet they uncover no memory issues and cause no sigsegv errors.
15) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98832)
Posted 7 Sep 2020 by Keith Myers
Post:
If I don't run Rosetta, I don't get any errors on any of my other projects.
Yet you're the only one that is having signal 11 issues with WUs that others can process with no problems at all- even with the same application.

Signal 11 indicates a memory problem. The problem only occurs with Rosetta Tasks- which in general use way more RAM than other projects' Tasks. And you've been getting these errors since before the faulty memory pig WUs came out.
Everything points towards a hardware memory issue- be it too much/too little voltage, too much overclock, or just a dodgy address(es).
*shrug*

Not arguing with you. As I previously stated, I guess Rosetta tasks work the memory harder than any other project. The Einstein GW tasks are supposedly very hard on memory yet I have no issues. The TN-Grid tasks which are also molecular modeling like Rosetta have no issues.
And I never got any response to my question about whether VSYSCALL=emulate is needed for Rosetta apps. Maybe that is the problem. I can either continue to run tasks here and have errors or give up completely. No skin off my nose as far as I am concerned. Only using 2 of 30 cores so not losing too much compute time.
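For anyone unsure what "signal 11" refers to, here is a small sketch that decodes a signal number with Python's standard signal module. The helper name is my own. The signal only says the process touched memory it should not have; it cannot tell a software bug from faulty RAM.

```python
import signal


def describe_signal(number: int) -> str:
    """Map a signal number to its symbolic name on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


if __name__ == "__main__":
    # On Linux, signal 11 is the segmentation fault signal.
    print(describe_signal(11))
```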
16) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98828)
Posted 7 Sep 2020 by Keith Myers
Post:
Well I have no clue on how to troubleshoot the issue. As I stated, no issues with any other tasks. I have 32GB of memory and memory usage is only 25% of max. I see less than 1GB of memory usage on the Rosetta tasks.


I swear by "Memtest 86" (or "Memtest 86+"), whichever works on your system - one doesn't work on older machines and one doesn't work on newer ones, I can't remember which. You download it for free, it makes a bootable OS-independent CD, and you run it for about an hour or so until it says "pass complete". If even one single RAM error is reported, you need to replace the RAM. You can easily find out which chip is faulty by testing one at a time.

I swear by stressapptest. My systems pass 24 hours of memory testing using all available memory and all available cores with no errors.


I'd never heard of that, I assume it does the same thing as Memtest. Does it run within the OS? If so I'd not trust it, as the OS can't let it test memory in use by the kernel.

Well I first used Memtest from a USB stick. But the memory testers on OCN state that it is a very poor tester for Linux. They recommend the Google stressapptest. That is the one Google developed to test the servers they deploy in their server farms before putting them into service. It is a standard application in the repositories.

I then follow up the memory stress testing with several hours of Prime95 and y-cruncher to put the system under actual compute loads to make sure it is stable before starting up BOINC with my actual loads. Closest I can come to actual BOINC loads. But BOINC is the final arbiter of stability. If I don't run Rosetta, I don't get any errors on any of my other projects.
[Edit]
Here are some links about it.
https://www.ghacks.net/2009/10/19/google-stress-app-test/
https://rog.asus.com/forum/showthread.php?73665-Our-preferred-memory-stress-test
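A minimal sketch of launching a run like that from Python, assuming stressapptest is installed from the repositories. The wrapper functions are hypothetical; -s (seconds), -M (megabytes to test) and -m (memory copy threads) are the options it relies on. The numbers in the example are placeholders to tune for your own host, leaving some memory free for the OS.

```python
import shutil
import subprocess


def build_stressapptest_command(seconds: int, megabytes: int, threads: int) -> list:
    """Assemble the argument list for one stressapptest run."""
    return [
        "stressapptest",
        "-s", str(seconds),    # how long to run
        "-M", str(megabytes),  # how much RAM to test
        "-m", str(threads),    # number of memory copy threads
    ]


def run_stressapptest(seconds: int, megabytes: int, threads: int) -> int:
    """Run the test and return its exit code; raises if the tool is missing."""
    if shutil.which("stressapptest") is None:
        raise FileNotFoundError("stressapptest is not installed or not on PATH")
    command = build_stressapptest_command(seconds, megabytes, threads)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Placeholder values: 24 hours, about 28GB, 32 threads.
    print(" ".join(build_stressapptest_command(24 * 3600, 28000, 32)))
```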
17) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98823)
Posted 7 Sep 2020 by Keith Myers
Post:
[Edit 2] Well it was this Rosetta task kp8RjDVk_fold_and_dock_SAVE_ALL_OUT_1009390_201. It is grabbing all the memory and the swap file every five minutes or so.
It's a resend, this is what the first system got with it.

            Outcome Computation error
       Client state Compute error
        Exit status 1 (0x00000001) Unknown error code
        Computer ID 5159178
           Run time 19 min 44 sec
           CPU time 18 min 38 sec
     Validate state Invalid
             Credit 0.00
  Device peak FLOPS 3.28 GFLOPS
Application version Rosetta v4.20 windows_x86_64



Stderr output
<core_client_version>7.0.80</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @kp8RjDVk_fold_and_dock_flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip fold_and_dock_kp8RjDVk_data.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3873245
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: Error in core::kinematics::FoldTree::get_jump_that_builds_residue(): This residue is not the child of (built by) a jump!
ERROR:: Exit from: ..\..\..\src\core\kinematics\FoldTree.cc line: 436
BOINC:: Error reading and gzipping output datafile: default.out
16:00:04 (3796): called boinc_finish(1)

</stderr_txt>
]]>

Thanks for the reply. That report is exactly what I am seeing on this task. Every ten minutes or so, the task's memory usage climbs from 1GB until all memory and swap are in use, then falls back to normal. Looking at it in htop was what allowed me to figure out the culprit.
So I assume a faulty work unit and I will just abort it now.
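If htop is not handy, roughly the same check can be scripted. This sketch reads the Name and VmRSS fields from /proc/<pid>/status on Linux and lists the largest resident processes; the function names are my own and it is not a replacement for watching the task live.

```python
from pathlib import Path


def parse_status(status_text: str):
    """Extract (process name, resident memory in kB) from a status dump."""
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip() or "?"
        elif key == "VmRSS":
            fields = value.split()
            if fields:
                rss_kb = int(fields[0])
    return name, rss_kb


def top_memory_processes(limit: int = 5, proc_root: str = "/proc"):
    """Return the `limit` largest processes as (rss_kb, pid, name) tuples."""
    usage = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        name, rss_kb = parse_status(status)
        usage.append((rss_kb, int(entry.name), name))
    return sorted(usage, reverse=True)[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_memory_processes():
        print(f"{rss_kb / 1024:10.1f} MB  pid {pid:>7}  {name}")
```

Running it in a loop every few seconds would catch a short spike that is easy to miss by eye.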
18) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98821)
Posted 7 Sep 2020 by Keith Myers
Post:
Well I have no clue on how to troubleshoot the issue. As I stated, no issues with any other tasks. I have 32GB of memory and memory usage is only 25% of max. I see less than 1GB of memory usage on the Rosetta tasks.


I swear by "Memtest 86" (or "Memtest 86+"), whichever works on your system - one doesn't work on older machines and one doesn't work on newer ones, I can't remember which. You download it for free, it makes a bootable OS-independent CD, and you run it for about an hour or so until it says "pass complete". If even one single RAM error is reported, you need to replace the RAM. You can easily find out which chip is faulty by testing one at a time.

I swear by stressapptest. My systems pass 24 hours of memory testing using all available memory and all available cores with no errors.
19) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98819)
Posted 7 Sep 2020 by Keith Myers
Post:
I only run two Rosetta tasks at a time at most. The one task that I mentioned uses all available memory (32GB) plus all of the 6GB swap file every ten minutes or so. Must be writing out to a scratch file or something. The most single task memory usage I ever saw before on any Rosetta task was around 4GB. That is what prompted me to bump to 32GB in the first place.
So this task species is most definitely an extreme outlier.

As far as changing settings, since no other tasks from any other projects have any issues, the solution is just to quit crunching Rosetta.
20) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 98810)
Posted 7 Sep 2020 by Keith Myers
Post:
And I do successfully crunch half of the Rosetta tasks that get sent to me.
Which shows it is a system issue.
Other systems running the same Linux application are processing WUs that fail on yours, and theirs run to their Target CPU time & Validate.
And they don't have computation errors in their Tasks list. Maybe late to Validate, or Cancelled by server- but no computation errors.

Well I have no clue on how to troubleshoot the issue. As I stated, no issues with any other tasks. I have 32GB of memory and memory usage is only 25% of max. I see less than 1GB of memory usage on the Rosetta tasks.
[Edit] While typing this response something just grabbed all my memory and used the 6GB swap file for some reason for the first time. Couldn't catch which application was the memory hog. It was only there for about 5 seconds.
[Edit 2] Well it was this Rosetta task kp8RjDVk_fold_and_dock_SAVE_ALL_OUT_1009390_201. It is grabbing all the memory and the swap file every five minutes or so.





©2024 University of Washington
https://www.bakerlab.org