Posts by QuantumEthos

1) Message boards : Number crunching : Hack4Change - robo wars - using robotics to enhance the web & cyber security - & do boinc HPC projects! (Message 89056)
Posted 4 Jun 2018 by QuantumEthos
Post:
Hack4Change - robo wars - using robotics to enhance the web & cyber security - & do boinc HPC projects!

https://t.co/MEmrbLrCnj

Network,HPC - data science - configuration and recommendations (c)RS - for NGO - HPC - Universities & Schools.

http://bit.ly/CF-Priorities-HPC-RS
http://bit.ly/SiteFetchOptimaHPC-RS
http://bit.ly/HPCCloudOperationsPhotoset - es.int

https://science.n-helix.com
https://gpugrid.n-helix.com
https://rosetta.n-helix.com/
https://boinc.n-helix.com/

https://trackjs.com/how/

https://www.isc.org/downloads/bind/

http://science.n-helix.com/2017/04/boinc.html
2) Message boards : Cafe Rosetta : The Scientist - a bit of fun and inspiration for us all (Message 87690)
Posted 14 Nov 2017 by QuantumEthos
Post:
https://www.youtube.com/watch?v=QRiA39VrUoE

Slim shady - "the superman"

the slim shady has got to be one figure of a supremacy...
the beauty and his holy majesty,

Brother have you seen like me; Been on MTV just like him!
gods got one in the band... got the other in remand!
detention is ecstasy ... the moment you live in memory!

So don't let that moment go to waste,
Here's the hero who will make you paste...

no don't let that money waste..
you are far too good to be gone in haste!

big ups Eminem and the D12 crew...
and the science you knew!

into the memory that flew...
oh heavens watch what you chew!
spit bar and knew.

RS
3) Message boards : Number crunching : working for humanity - Computer Optimization - CPU , GPU & RAM - PC, Mac & ARM development and programming guide with SDK's (Message 87668)
Posted 11 Nov 2017 by QuantumEthos
Post:
https://github.com/ctuning/ck - data & program - testing and tuning
4) Message boards : Number crunching : Readable Ideas that may change the speed of boinc and Research - add yours - firstly the Berkeley labs 100 award for science list (Message 87660)
Posted 10 Nov 2017 by QuantumEthos
Post:
T-cell function controllers explored

https://www.scientificamerican.com/custom-media/exploring-the-tumor-microenvironment/

have you explored the MS link to the immune-system inhibitor ?

(c)RS
5) Message boards : Number crunching : Readable Ideas that may change the speed of boinc and Research - add yours - firstly the Berkeley labs 100 award for science list (Message 87659)
Posted 10 Nov 2017 by QuantumEthos
Post:
of interest = genetic & cancer

topic = work

for = developers + propagation

The CANDLE project is deeply meaningful to cancer research and genetics: please learn more about their theorems and programming development.

https://cbiit.cancer.gov/ncip/hpc/candle


The following is a long-winded article, mostly going on about how much work they did watching the screen as the exascale HPC did the work of debugging code ;P but interesting nonetheless

https://www.hpcwire.com/2017/11/07/sc17-ai-machine-learning-central-computational-attack-cancer/

RS
6) Message boards : Cafe Rosetta : Unveiled: Earth’s Viral Diversity DOE - relevance to Rosetta 100% (Message 87653)
Posted 9 Nov 2017 by QuantumEthos
Post:
https://jgi.doe.gov/unveiled-earth-viral-diversity/

Unveiled: Earth’s Viral Diversity

Environmental datasets help researchers double the number of microbial phyla known to be infected by viruses.

DOE JGI researchers utilized the largest collection of assembled metagenomic datasets from around the world to uncover over 125,000 partial and complete viral genomes, the majority of them infecting microbes. (Graphic by Zosia Rostomian, Berkeley Lab)
The number of microbes in, on, and around the planet – on the order of a nonillion, or 10^30 – is estimated to outnumber the stars in the Milky Way. Microbes are known to play crucial roles in regulating carbon fixation, as well as maintaining global cycles involving nitrogen, sulfur, and phosphorus and other nutrients, but the majority of them remain uncultured and unknown. The U.S. Department of Energy (DOE) is targeting this “microbial dark matter” to better understand the planet’s microbial diversity and glean from nature lessons that can be applied toward energy and environmental challenges.

Plumbing the Earth’s microbial diversity, though, requires learning more about the poorly-studied relationships between microbes and the viruses that infect them, viruses that impact the microbes’ abilities to regulate global cycles. Although the number of viruses is estimated to be at least two orders of magnitude more than the microbial cells on the planet, there are currently less than 2,200 sequenced DNA virus genomes, compared to the approximately 50,000 bacterial genomes, in sequence databases. In a study published online August 17, 2016 in Nature, researchers at the DOE Joint Genome Institute (JGI), a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, utilized the largest collection of assembled metagenomic datasets from around the world to uncover over 125,000 partial and complete viral genomes, the majority of them infecting microbes. This single effort increases the number of known viral genes by a factor of 16, and provides researchers with a unique resource of viral sequence information.

“It is the first time that someone has looked systematically across all habitats and across such a large compendium of data,” said study senior author and DOE JGI Prokaryote Super Program head Nikos Kyrpides. “A key to uncover all these novel viruses was the sensitive computational approach we have developed along this work.”

“A key to uncover novel viruses”
That approach, explained first author and postdoctoral fellow David Paez-Espino, involved using a non-targeted metagenomic approach, referencing both isolate viruses and manually curated viral protein models, and what he described as “the largest and most diverse dataset to date.” The team analyzed over 5 trillion bases (Terabases or Tb) of sequence available in the DOE JGI’s Integrated Microbial Genomes with Microbiome Samples (IMG/M) system collected from 3,042 samples around the world from 10 different habitat types. Their efforts to sift through the veritable haystack of datasets yielded over 125,000 viral sequences containing 2.79 million proteins.

The team matched viral sequences against multiple samples in multiple habitats. For example, one viral group they identified was found in 95 percent of all samples in the ocean’s twilight zone – a region located between 200 and 1,000 meters below the ocean surface where insufficient sunlight penetrates for microorganisms to perform photosynthesis.

By analyzing a CRISPR-Cas system – an immune mechanism in bacteria that confers resistance to foreign genetic elements by incorporating short sequences from infecting viruses and phages – the team was able to generate a database of 3.5 million spacer sequences in IMG. These spacers, fragments of phage genetic sequences retained by the host, can then be used to explore viral and phage metagenomes for where the fragments may have originally come from. Also, using mainly this approach, the team computationally identified the host for nearly 10,000 viruses. “The majority of these connections were previously unknown, and include the identification of organisms serving as viral hosts from 16 prokaryotic phyla for which no viruses have previously been identified,” they reported.

Beacons for CRISPR-Cas proteins
Jan-Fang Cheng, head of the DOE JGI’s Functional Genomics group, said the work being done by Kyrpides’ group in identifying new viral sequences will help the Synthetic Biology group develop novel promoters that can work in many bacterial hosts. “We are constantly searching for regulatory DNA parts that will work across many different phyla, and that would allow us to build genes and pathways that can express in many different hosts.”

Cheng also anticipated that the expanded viral sequence space generated by Kyrpides’ team will allow researchers to look for other genetic sequences known as proto-spacer adjacent motifs (PAMs). These sequences lie next to spacer sequences in phages and are used as beacons by CRISPR-Cas proteins, triggering actions such as editing or regulating a gene. “People are looking for new PAM sequences and new Cas9s, and with this new information, if you can map the spacer sequence back to the same phage and align them and see what’s in common in neighboring sequences, then you could ID new PAM sequences.”

“We believe that the finding of many large phages including the longest phage genome reported thus far points to the limitations of conventional virome enrichment and sequencing strategies which may bias the studies against the highly novel viruses with unusual properties”, said Natalia Ivanova, group lead in the Super Program and co-author of this study.

“One of the most important aspects of this study is that we did not focus on a single habitat type. Instead, we explored the global virome and examined the flow of viruses across all ecosystems,” said Kyrpides. “We have increased the number of viral sequences by 50x, and 99 percent of the virus families identified are not closely related to any previously sequenced virus. This provides an enormous amount of new data that would be studied in more detail in the years to come. We have more than doubled the number of microbial phyla that serve as hosts to viruses, and have created the first global viral distribution map. The amount of analysis and discoveries that we anticipate will follow this dataset cannot be overstated.”

The work also used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory.
7) Message boards : Cafe Rosetta : On behalf of IBM - World community grid is offering the service of, The HPC Research Cloud for science proposals in Environments and key health issues.. Contact IBM & World Community grid for details. (Message 87652)
Posted 9 Nov 2017 by QuantumEthos
Post:
On behalf of IBM - World community grid is offering the service of,
The HPC Research Cloud for science proposals in Environments and key health issues..
Contact IBM & World Community grid for details.

https://www.worldcommunitygrid.org/research/viewSubmitAProposal.do

http://www.research.ibm.com/university/awards/shared_univ_research.shtml

http://research.ibm.com/energy-and-environment/
8) Message boards : Number crunching : Readable Ideas that may change the speed of boinc and Research - add yours - firstly the Berkeley labs 100 award for science list (Message 87650)
Posted 9 Nov 2017 by QuantumEthos
Post:
thank you for the meaningful reply, much appreciated RS
9) Message boards : Number crunching : Readable Ideas that may change the speed of boinc and Research - add yours - firstly the Berkeley labs 100 award for science list (Message 87638)
Posted 8 Nov 2017 by QuantumEthos
Post:
Readable Ideas that may change the speed of boinc and Research - add yours - firstly the Berkeley labs 100 award for science list

https://science.energy.gov/about/honors-and-awards/rd-100-awards/2017-RD-100-Award-Finalists

Office of Science Laboratory: Lawrence Berkeley National Laboratory

Other Partners: N/A

Name of Project: Double Barcoded Shotgun Expression Library Sequencing (Dub-seq)

Double Barcoded Shotgun Expression Library Sequencing (Dub-seq) is a technology for discovering the functions of genes in microbes under different environmental conditions. Because Dub-seq can process large amounts of genetic information at once, it is faster, cheaper, more flexible, and requires less work than previous genetic analysis technologies. Scientists can adapt it to a variety of biotechnologies, such as discovering new enzymes, finding new cancer drugs, gaining insight into resistance to viruses, and understanding how antibiotics act on microbes that cause disease.

****

Office of Science Laboratory: Lawrence Berkeley National Laboratory

Other Partners: University of Illinois-Champaign

Name of Project: CrunchFlow

CrunchFlow is a software package that simulates how chemical reactions occur and change as fluids travel underground. CrunchFlow includes a number of chemical and physical processes that similar products do not, such as changes in how easily water can move through rocks. All of these features are available in a single package that users with a variety of expertise can run on a desktop computer. With CrunchFlow's computational efficiency, scientists can achieve high spatial resolution while extending simulations far back in geologic time. By improving the accuracy of a range of Earth and environmental sciences applications, CrunchFlow helps scientists better understand current and past ecological systems below the Earth's surface.

****

Office of Science Laboratory: Oak Ridge National Laboratory

Other Partners: SepQuant, Inc.

Name of Project: dropletProbe Surface Sampling System for Mass Spectrometry

The dropletProbe system, developed with support from Oak Ridge National Laboratory, is a completely new means of surface sampling for mass spectrometry, a major scientific technique for measuring the masses of chemicals in a sample. The dropletProbe system provides rapid, simple chemical extraction and analysis for a host of scientific applications. It is a low cost, low-maintenance, and nondestructive method for sampling complex analytical surfaces, such as biological tissue samples. It provides scientists with a high degree of precision for targeting specific areas on the sample. By reducing cost and improving accuracy, this tool should help increase the pace of scientific discovery.

****

Office of Science Laboratories: Pacific Northwest National Laboratory; Lawrence Berkeley National Laboratory

Other Partners: National Energy Technology Laboratory; Los Alamos National Laboratory; Lawrence Livermore National Laboratory

Name of Project: National Risk Assessment Partnership (NRAP) Toolset

Deep underground geologic formations offer promising places to safely and effectively store large volumes of carbon dioxide (CO2) generated from burning coal, oil, and natural gas. The National Risk Assessment Partnership (NRAP) Toolset is the first complete suite of computer software that models possible environmental risks from potential storage sites, such as fluid leakage and earthquakes. The Toolset draws on the expertise of five DOE national laboratories and is being used by more than 250 stakeholders from academia, regulatory agencies, and industry.
10) Message boards : Number crunching : working for humanity - Computer Optimization - CPU , GPU & RAM - PC, Mac & ARM development and programming guide with SDK's (Message 87632)
Posted 8 Nov 2017 by QuantumEthos
Post:
boinc - enhancing research workloads for the benefit of mankind & humanity - Computer Optimisation - CPU , GPU & RAM - PC, Mac & ARM development

HPC - High Performance Computation for beneficial goals and obvious worth.

(Guide, experimentation, developer kit's and manuals)


by (c) Rupert Summerskill

http://esa-space.blogspot.com/

HPC Computing work load Photos http://bit.ly/HPCImpact

http://bit.ly/HPC-Dev

http://bit.ly/tRNG-Dev

The links are easier to read than this hard-to-manage forum post. Of particular note is the T/RNG post, because Rosetta uses a lot of random entropy, and we have drivers and specialized sources of randomness that are honest and true to the needs of the BOINC and HPC projects.

thank you kindly for your understanding.
RS


Observing the workloads of many beneficial projects, we find that the workload data set is commonly small.
In addition to the memory set being smaller or larger than a machine can compute optimally, we find that feature sets such as fae and AVX have commonly not been implemented.

Some projects, like Asteroids@home and the SETI project, use enhanced computation instruction sets such as AVX, and memory loads that benefit from the 4 GB or more of RAM available on decent gaming and home laptops.

Not all modern machines have loads of RAM; however, research and university establishments use sufficiently powerful machines that can glow on the BOINC record in full glory with a 256 MB to 768 MB workload.

In addition, the machines are commonly Opteron or Xeon, and servers may have SPARC- or PowerPC-specific hardware and instruction sets.

To take examples: below we can see that workloads include small data arrays, in the 40 MB to 79 MB range.

In line with servers and gaming rigs, we have 1 GB of RAM per core; of course not all problems require a larger array in the workload, and some machines have 256 MB per core!

However much RAM you allocate to the projected workload, small memory loads can and will be sufficient for data swapping and/or paging (like DNA replicators).
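As a rough illustration of the RAM-per-core point above, here is a minimal sketch that works out how many tasks of a given workload size fit on a machine. The function names and the `reserve_mb` parameter are hypothetical, not part of BOINC; the figures are the ones quoted in this post.

```python
def ram_per_core_mb(total_ram_mb, cores):
    """Average RAM available to each core, in MB."""
    return total_ram_mb / cores


def tasks_that_fit(total_ram_mb, cores, workload_mb, reserve_mb=0):
    """How many tasks can run at once without exceeding RAM or core count.

    reserve_mb is an assumed allowance for the OS and other programs.
    """
    usable_mb = max(total_ram_mb - reserve_mb, 0)
    return min(cores, usable_mb // workload_mb)


if __name__ == "__main__":
    # 8 cores with 8 GB gives the "1 GB per core" case mentioned above.
    print(ram_per_core_mb(8192, 8))
    # A 768 MB workload on a 4 GB, 8-core machine is limited by RAM, not cores.
    print(tasks_that_fit(4096, 8, 768))
    # A 79 MB workload on the same class of machine is limited by cores.
    print(tasks_that_fit(8192, 8, 79))
```

A project could use a calculation of this kind to offer larger work units only to hosts that can actually hold them in memory.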

Some tasks can benefit significantly from larger thread and data models; to my mind DNA and mapping data are fine examples of specific workloads where memory counts.

In addition, thread count can be 4 or other numbers, and I suggest that a single task can use more than one core and instruction set (NEON for example, or symmetric threading FPU, SMT).

Workload-specific optimisation, or rather generic optimisation with SSE, AVX, FPU threading and precision tuning, would be very cool while we deal with the app that runs the workload.

In particular the Ryzen multi-core is a new and exciting product,

So take care to read the guides in the lower half of the document, AVX2, RDSEED, ADX and additional encryption formats are some of the most exciting changes to the AMD Ryzen Arch.

AVX similarities to GPU cores: the function of AVX can be thought of as a CPU extension with the same usage as a GPU!
In short, combined with the FPU, it is very much in the same performance category as GPU cores, and of much worth to scientific research and to the development of game dynamics, sound, video and spaces in N-dimensional space.

CPU extensions can prepare vector space for GPU to enhance the speed and optimize vector tables before GPU rendering and sound space in 3D for surround sound...
Interpolate texture, sound and other data with bit swapping.. In SIMD instructions.

RND Function can be used to explore additional data spaces.

Encryption function to enhance unpredictable behavior or to save space.
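On the random-number point, a minimal sketch of one way a work unit could take its seed from the operating system's entropy source and still stay repeatable. This uses only the Python standard library and is an assumption for illustration, not how Rosetta or BOINC seed their tasks; `new_seed` and `make_generator` are hypothetical names.

```python
import random
import secrets


def new_seed(bits=64):
    """Draw a seed from the operating system's entropy source."""
    return secrets.randbits(bits)


def make_generator(seed):
    """Build a repeatable pseudo-random generator from a recorded seed."""
    return random.Random(seed)


if __name__ == "__main__":
    seed = new_seed()
    first = make_generator(seed)
    second = make_generator(seed)
    # The same seed replays the same sequence, so a result can be re-checked.
    print([first.random() for _ in range(3)] == [second.random() for _ in range(3)])
```

Recording the seed with the result keeps the exploration unpredictable across work units while letting any single unit be verified later.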

Further thought ... Efficiency :

Add a MHz/Dhrystone/MIPS performance-per-watt figure to each system;
then projects will further optimise workloads to improve energy and environmental efficiency versus work carried out.

Work Hours x Mhz / (efficiency per watt)
-------
Hours / % of projects finished with work completed

Also bear in mind that GPUs need watt efficiency and task management to optimise power used versus work done.

worker priority should always be :

efficiency + merit of the work
--------
time / % necessity

Please examine the issue further.
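
The two ratios above can be read literally as small functions. This is only a sketch of one possible reading: the parameter names are hypothetical, the units are left to whoever supplies the figures, and each "%" is assumed to be a fraction between 0 and 1.

```python
def efficiency_score(work_hours, mhz, efficiency_per_watt, hours, fraction_completed):
    """(Work Hours x MHz / efficiency per watt) / (Hours / fraction completed)."""
    if efficiency_per_watt <= 0 or hours <= 0 or fraction_completed <= 0:
        raise ValueError("efficiency, hours and completed fraction must be positive")
    numerator = work_hours * mhz / efficiency_per_watt
    denominator = hours / fraction_completed
    return numerator / denominator


def worker_priority(efficiency, merit, time, necessity):
    """(efficiency + merit of the work) / (time / fraction necessity)."""
    if time <= 0 or necessity <= 0:
        raise ValueError("time and necessity must be positive")
    return (efficiency + merit) / (time / necessity)


if __name__ == "__main__":
    print(efficiency_score(10, 3000, 50, 10, 0.5))
    print(worker_priority(3, 5, 4, 0.5))
```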


Rupert S

https://www.worldcommunitygrid.org

https://boinc.berkeley.edu/

http://www.charityengine.com/

http://esa-space.blogspot.com/

HPC Computing work load Photos http://bit.ly/HPCImpact

http://bit.ly/HPC-Dev

http://bit.ly/tRNG-Dev

http://esa-space.blogspot.ru/2017/04/rng-and-random-web.html - we need Chaos Seeds : Random seeds for our work

https://www.youtube.com/watch?v=mLQGXlxemlg - Optimizing HPC Service Delivery by a life time super computing tec

https://youtu.be/KbjFGQ9fHvw - Scaling and Optimizing Climate and Weather Forecasting Programs on Sunway TaihuLight - very exciting

https://insidehpc.com/2017/06/video-scaling-climate-weather-forecasting-sunway-taihulight/

HPC Best Practices..

http://www.intertwine-project.eu/best-practice-guides

AMD Platform Optimization - please read for all developers

https://community.amd.com/thread/213045 - particular instruction differences for microcode optimisation

http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Optimizing-For-AMD-Ryzen.pdf - code optimisation a few very important lessons... may seem simple to some but obviously is not to be taken for granted.

http://support.amd.com/TechDocs/24593.pdf - AMD64 Architecture Programmer’s Manual Volume 2: System Programming

CPU Optimisation - utility and function.

http://gpuopen.com/compute-product/codexl/ - CodeXL is a code efficiency analyser, optimiser and debugger for GPU, CPU and system.
https://github.com/GPUOpen-Tools/CodeXL/releases/latest

http://bit.ly/CoXLPhoto - CodeXL in action photos

http://www.guru3d.com/files-details/siv-4-45-download.html SIV system information viewer & setup

http://www.noamross.net/blog/2013/4/25/faster-talk.html - speeding up code a guide - profiling and bench-marking.

http://www.pgroup.com/doc/pgi17ug-x64.pdf - PGI Compiler guide

http://www.agner.org/optimize/ - code optimisation for all programmers on X86,X86-64bit and some others.. this is a terrific resource !

http://www.agner.org

25/06/2017 11:36:51 | | OpenCL: AMD/ATI GPU 0: AMD Radeon R9 200 Series (driver version 2348.4, device version OpenCL 1.2 AMD-APP (2348.4), 3072MB, 3072MB available, 4178 GFLOPS peak)
25/06/2017 11:36:51 | | Host name: NKBlueCube
25/06/2017 11:36:51 | | Processor: 8 AuthenticAMD AMD FX-8320E Eight-Core Processor [Family 21 Model 2 Stepping 0]
25/06/2017 11:36:51 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1


for example : Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1

or for example : Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1

for an improved upon instruction list in the newer boinc application.. (with appropriate configuration)
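
A minimal sketch of how an application could read a "Processor features" line like the ones above and choose a code path. The log line is the FX-8320E one from this post; `parse_features` and `pick_code_path` are hypothetical helpers, not BOINC functions, and the preference order is an assumption.

```python
LOG_LINE = (
    "25/06/2017 11:36:51 | | Processor features: fpu vme de pse tsc msr pae mce "
    "cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni "
    "ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a "
    "osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1"
)


def parse_features(line):
    """Return the set of feature flags found after 'Processor features:'."""
    _, marker, rest = line.partition("Processor features:")
    if not marker:
        return set()
    return set(rest.split())


def pick_code_path(features):
    """Choose the widest instruction set present (assumed preference order)."""
    for name in ("avx2", "avx", "sse4_2", "sse2"):
        if name in features:
            return name
    return "generic"


if __name__ == "__main__":
    flags = parse_features(LOG_LINE)
    print(sorted(flags & {"avx", "avx2", "fma", "fma4", "sse4_2"}))
    print(pick_code_path(flags))
```

On this host the flags include avx and fma but not avx2, so an AVX build would be the one selected.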

11000 Mips & 2700 FPU Mips - per Core

**
An article that took some deep learning... itself ôo, anyway very interesting.
HIP C++ will, we think, be simpler than OpenCL as a higher-level code port,
with CUDA code machine-converted up to 99.6%.

http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized

**
Compilers and Make compliant with SMT and other HPC Standards

https://cmake.org/

http://llvm.org/
http://llvm.org/docs/FAQ.html

https://gcc.gnu.org/

https://cygwin.com/index.html

*not free obviously .. intel*
https://software.intel.com/en-us/articles/intel-advisor-roofline

*compilers with FORTRAN specifics and preferably C/C++ and HPC (compatibility C++/C compatible with FORTRAN preferably)

https://gcc.gnu.org/wiki/HomePage
https://gcc.gnu.org/wiki/GFortranBinaries

https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy/#parallelstudioxe

http://www.pgroup.com/products/pgiworkstationg.htm (limitations: nVidia-compatible GPU CUDA code & no obvious statement of OpenCL support)

http://llvm.org/ - LLVM, it seems, has Fortran compatibility (needs research)
http://llvm.org/docs/FAQ.html

http://www.pathscale.com/ - check it out

Fortran specialists (no C++ etcetera)

https://www.absoft.com/products/windows-fortran-compiler-suite/
http://www.fortran.com/products-page/compilers/fortrantools-for-windows/

https://www.cs.sfu.ca/~fedorova/Teaching/CMPT886/Spring2007/papers/adaptive-execution.pdf

*ibm guidance*
http://www.prace-ri.eu/best-practice-guide-ibm-power-775-html/
https://www.redbooks.ibm.com/redbooks/pdfs/sg248280.pdf
**
PC/Mac/Windows/Linux/Android - high performance computation - the method and the means

https://www.khronos.org/news/events/2016-isc-high-performance

https://www.khronos.org/assets/uploads/developers/library/2008_siggraph_bof_opengl/OpenCL%20and%20OpenGL%20SIGGRAPH%20BOF%20Aug08.pdf HPC Report

http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2017-ISC.pdf - Overview of MPI message characteristics of HPC server proxy applications.

*Interesting statistics, from which one can conclude that 64 to 256 core units is the range within which
the maximum increase in message noise/entropic noise related to inter-process communication is observed.*

https://www.microsoft.com/en-us/download/details.aspx?id=54507 Microsoft HPC Pack 2016 including linux

https://technet.microsoft.com/en-us/library/cc514029(v=ws.11).aspx all HPC Packs 2016,2012 to 2008 info and download

https://msdn.microsoft.com/en-us/library/ff976568.aspx Microsoft High Performance Computing for Developers - info and downloads

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/hpcpack-cluster-active-directory - information and virtualisation

https://www.openfabrics.org/

https://centers.hpc.mil/users/tools.html

https://centers.hpc.mil/users/COSTQuickRef.html

https://centers.hpc.mil/software/

https://openhpc.community/downloads/

http://www.cray.com/blog/getting-new-intel-xeon-scalable-processors-hpc-workloads/ - details about intel arch in HPC workloads.

**
OpenVX for high performance Computing : Multi platform spec
"OpenVX for HPC Neural Nets and processing .... a new way to deliver on research, gaming & processing of data and images"


https://www.khronos.org/news/tags/tag/OpenVX

https://www.khronos.org/news/press/openvx-1.2-specification-cross-platform-acceleration-power-efficient-vision

**
OpenCL "GPU Development" links

https://www.khronos.org/blog/iwocl-where-you-learn-the-latest-on-opencl

https://www.khronos.org/opencl/

https://www.khronos.org/opencl/resources for SDK, learning & optimisation resources.

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/

https://github.com/RadeonOpenCompute - ROCm: Platform for GPU Enabled HPC and UltraScale Computing

http://gpuopen.com/professional-compute/

http://gpuopen.com/compute-product/hcrng/

https://bitbucket.org/multicoreware/hcrng

http://gpuopen.com/compute-product/clrng/

installing the AMD SDK improves compute performance, Optimise your code !

https://streamhpc.com/blog/2017-05-21/amd-open-sourced-rocms-opencl-driver-stack/

https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/amd-master/README.md

http://developer.amd.com/tools-and-sdks/opencl-zone/

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/

http://gpuopen.com/games-cgi/

http://developer.amd.com/tools-and-sdks/graphics-development/

http://hgpu.org information; interesting learning & source

http://dspace.princeton.edu/jspui/bitstream/88435/dsp01wm117r22g/1/Jia_princeton_0181D_11168.pdf Optimisation for parallel computing information.

https://arxiv.org/pdf/1705.05249 - CLBlast: A Tuned OpenCL BLAS Library demonstration.

https://indico.cern.ch/event/506317/contributions/2017945/attachments/1241758/1826458/SixTrackGPU.pdf
https://lhcathome.cern.ch/lhcathome/index.php - coders needed.

https://arxiv.org/pdf/1710.08616
https://arxiv.org/pdf/1710.08616.pdf - FORTRAN for GPU and multiprocessor usage in Scientific research,
Also of interest in the generation of coding Format, style, implementation & Structure.

"The new implementation performs up to 4.9x faster when comparing one GPU to one
multi-core CPU socket. On a full-scale production run with 1581 x 1301 x 58
grid size and 2km resolution, 24 Tesla P100 GPUs are shown to replace more
than 50 18-core Broadwell Xeon sockets."

"GPUs are an attractive target architecture, with a memory bandwidth that is
typically 5 to 7 times higher than Intel Xeon architectures of a similar generation."

"Compared to CPUs, GPUs support a very high number of parallel threads while
having a very low thread switching overhead - however with the cost of small
caches available per thread and a low single-threaded performance."


HIP - HSA - the CUDA Compatible C++ for Heterogeneous Computing

http://developer.amd.com/wordpress/media/2012/09/7637-HIP-Datasheet-V1_4-US-Letter.pdf

http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf - a full guide

http://www.hsafoundation.com/

http://www.hsafoundation.com/hsa-developer-tools/

https://github.com/HSAFoundation/HSA-docs-AMD/wiki#initial-implementation

https://github.com/HSAFoundation/HSAIL-Tools

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver - Driver for kernel

http://www.amd.com/Documents/SDN-Whitepaper.pdf - Smart Software Defined Networks

http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf - Secure Encrypted Virtualization Key Management

http://support.amd.com/TechDocs/Protecting%20VM%20Register%20State%20with%20SEV-ES.pdf - PROTECTING VM REGISTER STATE WITH SEV-ES

http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf - bios and kernel drivers

**
Machine Intelligence code optimization platforms

https://www.tensorflow.org/ - machine intelligence
https://github.com/tensorflow/tensorflow
https://github.com/hughperkins/tf-coriander - openCL Tensor flow

PyTorch - Machine learning with graphs, Tensor philosophy and Python - https://github.com/pytorch/pytorch - http://pytorch.org

Hyperdash python SDK - PyTorch
https://github.com/hyperdashio/hyperdash-sdk-py

Richard Herbert - Real-time Machine Learning with PyTorch and Filestack
https://blog.filestack.com/tutorials/realtime-machine-learning-pytorch/

"Kirill Dubovikov - Knowledge distiller, Data Scientist and Software Architect"
https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b

speed and data comparison
https://medium.com/@yaroslavvb/tensorflow-meets-pytorch-with-eager-mode-714cce161e6c

**
ARM Development software/SDK's & tools - HPC

https://developer.arm.com/products/software-development-tools

https://developer.arm.com/products/software-development-tools/hpc for high performance computing (ideal for Boinc)

https://developer.arm.com/products/software-development-tools/compilers for both HPC and APP development.

https://developer.arm.com/products/system-design/fixed-virtual-platforms

https://www.synopsys.com/verification/virtual-prototyping/vdk/vdk-for-arm.html

https://www.synopsys.com/designware-ip/technical-bulletin/designware-hybrid-ip.html

**
ARM Feature Sets

https://www.arm.com/products/processors/instruction-set-architectures/index.php

https://www.arm.com/products/processors/armv8-architecture.php

**
IOT links - (internet of things)

https://www.infoq.com/articles/thread-protocol-for-home-automation

http://wso2.com/wso2_resources/wso2_whitepaper_a-reference-architecture-for-the-internet-of-things.pdf

**
compiler optimization - process

https://crd.lbl.gov/departments/computer-science/PAR/research/roofline/

https://www.nextplatform.com/2017/05/25/nersc-supercomputing-site-eases-path-optimization-scale/

https://www-ssl.intel.com/content/www/us/en/events/hpcdevcon/parallel-programming-track.html#utilizing

**
Linux arch reference material

https://www.ibm.com/developerworks/library/l-linuxuniversal/

**
Agency GPL

https://code.nasa.gov/

**
Workers :

https://www.upwork.com/hire/driver-development-freelancers/

http://www.wcgsig.com/342585.gif

Update 2:

for a comparison of Gflops/Mips throughput of various Boinc Tasks ..

here we show the relevance of the code or function used ... AVX for example is multi threaded ! and so is the FPU pipeline of the AMD FX & Ryzen processor.....

http://bit.ly/HPCImpact (original non edited photos ...)

and set 2 (newer) http://bit.ly/2HPCImpact ....

Some of our work with the updated graphics http://bit.ly/ReserchPhotos

See the work throughput (GFLOPS) compared to code efficiency per task!

Sometimes entropy is needed to fulfil the task, one would imagine (for example on Android): http://bit.ly/tRNG-Dev

The improvement of the BOINC and World Community Grid projects has been observed and noted and, one feels, built upon.

Further improvement should be implemented as soon as possible to improve work-versus-output efficiency.

Thank you kindly, programmers/workers & scientists, for your perseverance & effort.

RS

**
Update 3 Q & A:

"In reference to the use of VirtualBox, there is a new product from Berkeley Lab called Singularity (http://singularity.lbl.gov/) that provides repeatable-condition containers with low virtualisation overhead.

As to the particle spread, one should possibly consider the multiple-core and threaded-core model specific to the Ryzen and Intel processors...

One could imagine that the multi-threaded nature of ARM server cores, combined with multi-threaded ARM CPUs and GPU run-script environments, is a new and uncompromising land of opportunity and challenge.

Many of the instructions in the FMV4 and vector instruction sets have multi-threaded execution at lower precision..."

http://fife.fnal.gov/singularity-on-the-osg/

RS

----

Eric McIntosh, accredited scientist, CERN
Project administrator
Project developer
Project tester
Project scientist

"Well, we are far from trying to optimise GPU code.

First let me explain that we have a tracking loop over turns
(up to 1,000,000 hoping for 10,000,000 soon) which contains
a large number of inner loops over particles, currently up to 64.

Luckily these loops over particles can be parallelised as each
particle is totally independent. In addition the original author F. Schmidt
pre-calculated everything possible before entering the tracking loop.
Each turn involves some 10,000 steps over a varying number of inner loops,
e.g. straight section, quadrupole, beam-beam interaction, power supply ripple, etc.,
of which there are about 50 different possibilities. A straight section is really just
a multiply and add, whereas beam-beam involves hundreds or more FLOPs.
The first idea would be to use a much larger number of particles to best
utilise the GPU. This however would produce a large amount of I/O and
use a lot of disk space, though that may not be insurmountable.

However, all the code is FORTRAN; the outer loop calls subroutines (which could be inlined) and has many tests/branches.
It would be great if the main loop fitted entirely into the GPU and we would have
rare Host access for I/O or BOINC checkpoint and progress calls or when
one or more particles are lost.

My colleague Ricardo is actively looking at redoing it in C, which would also allow
much more portability and allow it to run in parallel on multi-core systems.
For the moment we just run tasks in parallel, which works rather well (apart
from some current infrastructure problems). I hope to come up with
some numbers next week on GPU testing.

The code itself has been regularly measured and optimised; for example we
re-ordered array indices to optimise memory access and rewrote the Error Function
of a Complex Number to be faster but with adequate precision.

Portability does come at a price but ensures accuracy of results. I shall publish
measurements in an upcoming paper. I am sure we gain much more from being portable
and being able to use almost any IEEE 754 compliant processor.

On the issue of SixTrack and/or experiments this will shortly be under discussion at
CERN I am sure. Currently SixTrack has many more Hosts/volunteers, is simple to install,
and has been around for 13 years. Not everyone loves VMbox. Not a big deal at
present as we rarely have enough SixTrack work to keep all volunteers busy.

I hope to re-address all this in some weeks after current BOINC infrastructure issues
are resolved and we have the new "super" sixtrack with much broader application
e.g. collimation studies, and we support a much wider range of platforms (MacOS, ARM)
and use features such as AVX.

Eric.

____________"
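
To make the loop structure Eric describes a little more concrete, here is a minimal sketch in Python (not the project's FORTRAN or planned C code). It assumes a toy one-dimensional particle with a position and an angle; the names `Particle`, `drift`, `thin_quadrupole`, `track` and the `aperture` loss test are all hypothetical and only illustrate the shape of the computation: an outer loop over turns, a sequence of elements per turn, and an inner loop over independent particles.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Particle:
    """Toy 1-D particle state (an assumption; the real code tracks far more)."""
    x: float           # transverse position
    xp: float          # transverse angle
    lost: bool = False


Element = Callable[[Particle], None]


def drift(length: float) -> Element:
    """Straight section: one multiply and one add per particle."""
    def apply(p: Particle) -> None:
        p.x += length * p.xp
    return apply


def thin_quadrupole(k: float) -> Element:
    """Illustrative thin-lens kick, not SixTrack's actual formula."""
    def apply(p: Particle) -> None:
        p.xp -= k * p.x
    return apply


def track(particles: List[Particle], lattice: List[Element],
          turns: int, aperture: float) -> List[Particle]:
    """Outer loop over turns, then elements, then an inner loop over particles."""
    for _ in range(turns):
        for element in lattice:
            # Each particle is independent of the others, so this inner
            # loop is the part that could be spread over cores or a GPU.
            for p in particles:
                if p.lost:
                    continue
                element(p)
                if abs(p.x) > aperture:
                    p.lost = True  # a lost particle is no longer tracked
    return particles


if __name__ == "__main__":
    beam = [Particle(0.001 * i, 0.0) for i in range(64)]
    ring = [thin_quadrupole(0.1), drift(1.0)]
    survivors = [p for p in track(beam, ring, turns=1000, aperture=1.0) if not p.lost]
    print(len(survivors), "particles survived")
```

Because no particle reads another particle's state, tracking the particles together or one at a time gives the same result, which is why running more particles per task is the obvious first step towards using a GPU well.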

Update 4 : Virtualisation

QEMU is obviously of use on many projects because it provides machine emulation and virtualisation.
It comes in flavours including Windows, Mac and Linux.

http://www.qemu.org/

https://www.vmware.com/try-vmware.html - free products at the bottom
https://www.vmware.com/go/downloadplayer
https://www.vmware.com/go/get-free-esxi

*
Docker Server & Docker CE (Community Edition), which also comes with a server edition! (Note that Docker containers use operating-system-level isolation rather than QEMU-style machine emulation.)

So what do the projects & systems feel about using Docker CE?

Obviously the professional version could be used to support the main project, and the CE or Pro edition for the user.

https://store.docker.com/editions/community/docker-ce-desktop-windows

https://store.docker.com/search?offering=community&q=&type=edition

https://www.ctl.io/developers/blog/post/what-is-docker-and-when-to-use-it/

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-getting-started

https://www.howtoforge.com/tutorial/how-to-use-docker-introduction/

**
How to convert VMs and use Hyper-V and Docker

https://www.virtualbox.org/manual/ch10.html - compatibility

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v

https://www.groovypost.com/howto/migrate-virtual-box-vms-windows-10-hyper-v/

https://hyperv.veeam.com/blog/nested-vitualization-hyperv/

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/nested-virtualization

https://superuser.com/questions/1144405/enable-virtualization-for-windows-10-pro-running-inside-virtualbox

Update 5 : IO Bottlenecks and solutions.

http://blog.scoutapp.com/articles/2011/02/10/understanding-disk-i-o-when-should-you-be-worried

http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/

Drive Cache :

Even 128 MB of cache does wonders for #DataScience #storage;
we use 2 GB.

http://www.romexsoftware.com/en-us/primo-cache/index.html

#Cache to the #Drive 300mb/s

http://bit.ly/BoincStudies - Result Studies

https://browser.geekbench.com/v4/compute/743093 GPU Function
https://browser.geekbench.com/v4/cpu/2831836 CPU Function

http://www.anandtech.com/show/11523/qnap-launches-ts1277-nas-with-amd-ryzen-cpus






©2022 University of Washington
https://www.bakerlab.org