High Performance Computing
A cluster consists of a large number of interconnected machines.
Each machine is very much like your own desktop computer, except it’s much more powerful. Like your desktop computer, each machine (often referred to as a node) has a CPU, some memory, and a hard disk. The nodes running your programs are usually called compute nodes.
To turn the nodes into a cluster we need three more ingredients: a network, data storage, and a queue manager. In this section we’ll discuss these components in a bit more detail to give you an understanding of how a cluster works.
Nodes in the cluster connect to a high-performance network. The network allows programs running on different nodes to “talk” to each other.
HPC workloads are data intensive, so we can’t just use a single hard disk as you would in your own computer. Instead, the cluster has several dedicated hard disks and a file system that manages files across these disks. The storage may be divided into partitions and presented to the user at different locations in their path.
In theory, any user would be able to log in to any node and run a program, which may consume the resources of the entire node. This could cause other users’ programs on the same node to crash. Alternatively, one user could start thousands of runs of a program on different nodes, consuming the resources of the entire cluster for an unknown duration of time. In short, it would be impossible to share the HPC without a system to handle requests. To solve this problem we need a queue manager, software that manages requests for resources in a fair way. As at a supermarket or an office, you stand in the queue and wait until it’s your turn. On the cluster, you submit jobs to the queue and, once resources are available, your jobs run on nodes chosen by the queue manager.
The queue manager also allows you to specify certain requirements for your program to run. For example, your program may need to run on a node with a lot of memory. If you specify this when submitting your job, the queue manager will make sure to run the job on a node with at least that amount of memory available.
At IITJ we use a queue manager called Slurm: to run your programs on the HPC, you’ll be interacting a lot with Slurm.
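For example, a memory requirement is expressed when a job is submitted. A hypothetical request for a job that needs 64 GB of memory might look like this (the sbatch command and its flags are explained in detail further below; the script name and memory figure are placeholders):
[myuser@headnode1]$ sbatch --mem 64G my_job.sh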
A few things that you should keep in mind when using the cluster:
- The nodes in a cluster are set up so that they’re all (more or less) identical. Software that is available on one node will also be available on all other nodes, and you can access your files in the same way on every node. When the queue manager runs your job on a node, it more or less corresponds to you logging in to that node and running the program yourself.
- You access the cluster through a single node, often called the frontend or head node (in our case hpclogin). The frontend node is identical to the other nodes, but it is set up to allow access from the Internet. Your day-to-day interaction with the cluster goes through the frontend, but you should not run any computation- or memory-intensive programs on it. All users share the frontend’s resources, so only use it for basic things like looking around the file system, writing scripts, and submitting jobs.
In the coming sections you will learn how to connect to the cluster so that you can start submitting jobs.
When interacting with the cluster you will be using a shell on a Linux/UNIX system. If you are not familiar with these concepts we recommend the Shell novice tutorial from Software Carpentry.
Below you can find a few basic steps to get you set up and ready to connect to EOS.
If you are on a UNIX system (i.e. either Linux or Mac OS), a terminal is already present natively in your operating system. If you are on Windows, we recommend installing PuTTY or MobaXterm; the latter is a very good UNIX emulator and already includes useful things like an X server and a package management system to help you use languages like Python or Perl.
When you receive confirmation that your account has been activated, you can connect to the cluster by opening a terminal and running:
ssh user-name@hpclogin.iitj.ac.in
If you need to display any graphics, you can instead use:
ssh -X user-name@hpclogin.iitj.ac.in
A terminal session will then be opened on the login node of the cluster.
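If you connect often, a short entry in your ~/.ssh/config file saves some typing; this is only a sketch, and user-name is a placeholder for your own account name:
Host hpclogin
    HostName hpclogin.iitj.ac.in
    User user-name
With this in place, ssh hpclogin (or ssh -X hpclogin for graphics) is enough.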
Since IITJ HPC is a shared system, all computations must be carried out through a queue. Users submit jobs to the queue and the jobs then run when it’s their turn. To handle different workloads, jobs can be submitted to one or more partitions, which are essentially queues that have been assigned certain restrictions such as the maximum running time.
The queueing system used at IITJ is Slurm. Users who are familiar with Sun Grid Engine (SGE) or Portable Batch System (PBS) will find Slurm very familiar.
Note:
A node can be shared by multiple users, so you should always take extra care to request the correct amount of resources (nodes, cores, and memory). There is no reason to occupy an entire node if you are only using a single core and a few gigabytes of memory. Always make sure to use the resources on the requested nodes efficiently.
To get an overview of the available partitions:
[myuser@headnode1]$ sinfo
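For illustration, the output has roughly the following shape; the partition names, node counts, time limits, and node ranges below are placeholders rather than the actual IITJ configuration:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
small*       up 1-00:00:00     10   idle cn[001-010]
large        up 7-00:00:00      4  alloc cn[011-014]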
This will list each partition, the compute nodes assigned to it, and the maximum walltime (running time) a job in the partition can have. By typing a slightly different command:
[myuser@headnode1]$ sinfo -N --long
This will list all nodes, the partitions they belong to, and the available resources such as the number of cores and the amount of memory per node. The queueing system allows us to submit batch jobs; the following sections describe how.
Batch jobs are best suited for long-running or resource-demanding analyses. Batch jobs are submitted to the queue like interactive jobs, but they don’t give you a shell to run commands in. Instead, you must write a job script which contains the commands that need to be run.
A job script looks like this:
#!/bin/bash
#SBATCH --partition small
#SBATCH -D /my/working/directory
#SBATCH -c 1
#SBATCH --mem 4G
#SBATCH -t 02:00:00
echo hello world > result.txt
The lines beginning with #SBATCH communicate to Slurm which parameters we want to set for a specific task, or what kind of resources we want to be reserved for that particular analysis. The job script specifies which resources are needed as well as the commands to be run.
Line 2 specifies that this job should be submitted to the small partition.
Line 3 tells the scheduler to change (chdir) to the specified folder, where we want the analysis to be performed, before executing the script.
Line 4 specifies that we want a single core to run on, and line 5 that we want 4G of memory on the allocated node.
Finally, line 6 indicates that we are reserving 2 hours of running time for this script.
Warning
At the moment (issue under investigation) all three of --mem, -t and -D need to be specified in the job script to make sure the job is scheduled correctly and your work is spread as much as possible across all available resources.
See the table below for an overview of commonly used resource flags:
Short Flag | Long Flag | Description
---|---|---
-A | --account | Account to submit the job under.
-p | --partition | One or more comma-separated partitions that the job may run on.
-c | --cpus-per-task | Number of cores allocated for the job. All cores will be on the same node.
-n | --ntasks | Number of cores allocated for the job. Cores may be allocated on different nodes.
-N | --nodes | Number of nodes allocated for the job. Can be combined with -n and -c.
-t | --time | Maximum time the job will be allowed to run.
 | --mem | Memory limit per compute node for the job. Do not use with the --mem-per-cpu flag.
 | --mem-per-cpu | Memory allocated per allocated CPU core. Do not use with the --mem flag.
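As a sketch of how these flags combine, a hypothetical job asking for 4 cores on one node, 8 GB of memory, and a 12-hour limit on the small partition could start like this (the figures are illustrative; request what your own analysis actually needs):
#!/bin/bash
#SBATCH --partition small
#SBATCH -D /my/working/directory
#SBATCH -c 4
#SBATCH --mem 8G
#SBATCH -t 12:00:00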
The rest of the script is a normal Bash script containing the commands that should be executed when the job is started by Slurm.
To submit a job for this script, save it to a file (e.g. example.sh) and run:
[myuser@headnode1]$ sbatch example.sh
Submitted batch job 17129500
[myuser@headnode1]$
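The number printed by sbatch is the job ID, which you use to monitor or cancel the job. Anything the script prints to standard output ends up, by default, in a file named slurm-<jobid>.out in the submission directory; in this example the script also redirects its own output, so once the job has finished you can check the result (the output shown is simply what the example script writes):
[myuser@headnode1]$ cat /my/working/directory/result.txt
hello world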
To check the status of a job:
[myuser@headnode1]$ squeue -j 17129500
To check the status of all of your submitted jobs:
[myuser@headnode1]$ squeue -u USERNAME
You can also omit the username flag to get an overview of all jobs that have been submitted to the queue:
[myuser@headnode1]$ squeue
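For reference, squeue output has roughly the following columns; the job shown here is a placeholder rather than a real job on the cluster. The ST column is the job state, e.g. PD for pending and R for running:
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
17129500     small  example   myuser  R       0:12      1 cn001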
Jobs can be cancelled using the scancel command:
[myuser@headnode1]$ scancel 17129500
Software on the cluster is made available through environment modules. Modules are “configurations” that set, change, or remove environment variables in the current environment. For example, to load Python 3.7:
$ module load python/3.7
Useful module commands
- module avail: display the available packages that can be loaded
- module list: lists the loaded packages
- module load foo: to load the package foo
- module rm foo: to unload the package foo
- module purge: to unload all the loaded packages
For detailed information on the usage of module, check the man pages:
$ man module
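Putting the module commands together with the batch script format from before, a job that runs a Python program might look like the sketch below; my_script.py is a hypothetical placeholder for your own program, and the module versions available on the cluster may differ:
#!/bin/bash
#SBATCH --partition small
#SBATCH -D /my/working/directory
#SBATCH -c 1
#SBATCH --mem 4G
#SBATCH -t 01:00:00
module purge
module load python/3.7
python my_script.py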
A list of the available modules is given below: