Let’s Learn Scientific Programming
Ahmed Shamsul Arefin Aug 11, 2017
What is scientific programming?
A scientific programmer aims to solve scientific problems with the help of computers, possibly supercomputers or HPC clusters. The goal is to get results quickly and accurately, even on large problem instances. Tools such as Mathematica and MATLAB remain the main commercial software for scientific programming.
The 'Learn Scientific Programming' project promo.
However, open source communities around the world have developed comparably powerful computational tools, including Open MPI, OpenMP, OpenACC, the scientific Python stack (NumPy, SciPy, pandas) and Julia, which researchers can readily use. These tools are robust enough that companies such as Microsoft, Google and Amazon now use them as well.
What do I need to learn?
Image credit: Learn to Use HPC Systems and Supercomputers.
You can start with a little supercomputing history, examples of supercomputers, the difference between supercomputers and HPC clusters, the computers that make up an HPC cluster, and the benefits of cluster computing. Then you learn:
Components of an HPC system: components of a High Performance Computing (HPC) cluster, properties of login node(s), compute node(s), master node(s), storage node(s), HPC networks and so on.
PBS (Portable Batch System): introduction to PBS, basic commands (qsub, qstat, qdel, qalter), job states, PBS variables, interactive jobs, job arrays, and a MATLAB example.
Slurm (workload manager): introduction to Slurm, Slurm commands, a simple Slurm job, distributed MPI and GPU jobs, multi-threaded OpenMP jobs, interactive jobs, array jobs, and job dependencies.
Parallel programming (OpenMP and MPI): OpenMP basics, clauses and worksharing constructs, an OpenMP "Hello, world!", reduction and the parallel for-loop, section parallelization, vector addition, an MPI "Hello, world!", send/receive and ping-pong.
Parallel programming (GPU and CUDA): finally, a concise, beginner-friendly guide to GPUs (graphics processing units), GPU programming with CUDA, a CUDA "Hello, world!" and so on.
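To give a feel for the "reduction" topic in the list above, here is a minimal sketch of the underlying idea: split the data, let each worker compute a partial result, then combine the partial results. It is written in Python with threads purely to show the pattern. It is not OpenMP or MPI code, it is not meant as a performance demo, and the function names `split_into_chunks` and `parallel_sum` are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, n_workers=4):
    """Sum values by handing each worker one chunk, then combining the results."""
    values = list(values)
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker produces one partial sum for its own chunk.
        partial_sums = list(pool.map(sum, chunks))
    # The "reduction" step: combine the partial sums into one value.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(range(1, 101)))  # prints 5050
```

In OpenMP the same split-and-combine step is requested with a reduction clause on a parallel loop, and in MPI with a reduce operation across processes; the course topics above cover those in their native languages.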
Several free materials are available online to introduce you to HPC systems. If you do not have an HPC system at your centre but still want to learn to use one, you can create a couple of virtual machines (for example, in VirtualBox), connect them to simulate an HPC environment, and install PBS or Slurm on them. You can then practise running the batch system’s commands.
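Once a batch system is running, array jobs make a good first experiment: the scheduler starts many copies of the same script and tells each copy its index through an environment variable. The sketch below shows one way a Python script might use that index to pick its share of the work. It assumes the array was submitted with 0-based indices (for example, indices 0 to 3); the file names and the helper names `array_task_index` and `files_for_task` are hypothetical.

```python
import os


def array_task_index(env=None, default=0):
    """Return this job's array index from the scheduler's environment.

    SLURM_ARRAY_TASK_ID is set by Slurm, PBS_ARRAY_INDEX by PBS Professional,
    and PBS_ARRAYID by Torque. Outside an array job, return `default`.
    """
    env = os.environ if env is None else env
    for name in ("SLURM_ARRAY_TASK_ID", "PBS_ARRAY_INDEX", "PBS_ARRAYID"):
        if name in env:
            return int(env[name])
    return default


def files_for_task(all_files, task_index, n_tasks):
    """Give task k every n_tasks-th file, starting at position k (0-based)."""
    return all_files[task_index::n_tasks]


if __name__ == "__main__":
    # Hypothetical input files; a real job would list a data directory instead.
    inputs = [f"sample_{i:02d}.dat" for i in range(10)]
    k = array_task_index()
    # Assumes the array job was submitted with 4 tasks, indexed 0 to 3.
    print(files_for_task(inputs, k, n_tasks=4))
```

Run on a laptop with no scheduler, the script falls back to index 0, so the same file can be tested locally before it is submitted as an array job.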
On an HPC system, programming languages such as MATLAB, Octave, R, Python and Julia are commonly used. You may also want to learn about file systems (such as GPFS and Lustre), because HPC storage is very useful for storing large data sets. Furthermore, you need to learn DMF commands if you want to push data to, or pull data from, tape drives.
Why should I care?
Jobs: With the growing number of HPC systems at research and educational institutes, demand for scientific programmers is increasing.
Typical position requirements for a scientific programmer look like this:
- Bachelor’s and 5+ years; Master’s and 3+ years; Doctorate and 0 years, or equivalent
- Significant programming experience is required; in particular, expertise in C or C++ and the Linux/Unix programming environment.
- Working knowledge of Fortran and Python is desired, as is familiarity with parallel programming using MPI and OpenMP.
Accelerated outcomes: If you are a researcher or big data analyst, scientific programming skills can help you get your results faster.
Who can be a scientific programmer?
Students, researchers and programmers from any discipline.
You need two things beforehand (assumed knowledge): first, familiarity with the Linux/Unix command line, and second, programming skills in any language. I think it’s important to learn parallel and distributed computing (right now!) if you wish to do large-scale data analytics and research work. This course should make your academic profile shine!