Once you have obtained an account on Plato, follow this guide to take your first steps on the cluster.
Receiving important notifications
We strongly recommend that all Plato users subscribe to our mailing list to stay informed about the status of the machine and receive important notifications.
Accessing the system
Plato is accessible through SSH at plato.usask.ca. Your user name is your NSID and your password is the one associated with your NSID (used on paws.usask.ca, for instance). On a UNIX machine (such as Linux or macOS), use the following command in a terminal (replacing abc123 with your NSID):
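A minimal sketch of that command, assuming the standard OpenSSH client (ssh) is installed on your machine; abc123 is the placeholder NSID used in this guide:

```shell
# abc123 is a placeholder: replace it with your own NSID.
NSID="abc123"
PLATO_LOGIN="${NSID}@plato.usask.ca"

# Show the full command; remove the leading "echo" to actually connect.
echo ssh "${PLATO_LOGIN}"
```

When you connect, you will be asked for the password associated with your NSID.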
The $ sign is used throughout this guide to denote the bash prompt. The text that follows is what should be typed in your shell (do not type the $ sign itself).
To transfer files, you can use standard UNIX tools.
To access Plato from outside the campus, you will first need to connect to the University’s virtual private network (VPN).
Plato is also accessible via Globus (GridFTP).
Once you are connected to Plato, you will be presented with a Linux command prompt. If you are not familiar with the text-based Linux command-line environment, we recommend following our Introduction to the Linux command line workshop first. This workshop is given in an instructor-led format at least once per term, and the material is always available online for self-learning.
Finding and using provided software
On Plato, the module command allows you to search for installed software packages and add them to your environment. Since the amount of available software is large, and several versions of the same package are often offered, packages are not available directly by default. Instead, you need to add software to your environment manually. If you are already familiar with module, you should feel at home on Plato. The software stack is the same one used on Compute Canada machines. However, not all versions and all packages are guaranteed to be available on Plato. If you are not familiar with module, here are a few examples to get you started.
The most common operations are loading a particular software package, loading a specific version of a package, unloading a module, and listing the currently loaded modules.
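The four operations above can be sketched as follows. The package name gcc and the version 9.3.0 are used purely as examples (both appear elsewhere in this guide); the check at the top only makes the snippet harmless on machines where the module command does not exist.

```shell
PKG="gcc"          # example package
VERSION="9.3.0"    # example version, assumed to be installed

if command -v module >/dev/null 2>&1; then
    module load "${PKG}"               # load the default version of a package
    module load "${PKG}/${VERSION}"    # load a specific version
    module unload "${PKG}"             # unload the module
    module list                        # list the currently loaded modules
else
    echo "The module command is not available here; run this on Plato."
fi
```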
These are the default modules that are loaded for you when starting a session:
- Gentoo: The base Linux layer that provides a uniform environment on all Compute Canada machines (and Plato)
- StdEnv: The module that defines the standard environment for all Compute Canada machines (and Plato), such as the default compilers, MPI and mathematical libraries
- Intel: The default compilers
- OpenMPI: The default MPI library
- Intel MKL: The default mathematical library
- MII: A smart search engine for module environments
- gcccore: This is a hidden infrastructure module that users can ignore
- libfabric: Dependencies of OpenMPI that users can ignore
You can also list all available modules, search for a specific package using a keyword or partial word, learn more about a package, and get detailed information about a specific version.
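These searches can be sketched as follows, assuming the Lmod implementation of module (the one that provides the spider subcommand); gromacs and its version 2021.4 serve as the example package.

```shell
KEYWORD="gromacs"   # example keyword or package name
VERSION="2021.4"    # example version

if command -v module >/dev/null 2>&1; then
    module avail                            # list the available modules
    module spider "${KEYWORD}"              # search by keyword and describe the matching package
    module spider "${KEYWORD}/${VERSION}"   # detailed information about one version
else
    echo "The module command is not available here; run this on Plato."
fi
```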
The output of module spider above tells you how to load the module. Here, we see that GROMACS 2021.4 is available for two different toolchains: GCC 9.3.0 + OpenMPI 4.0.3, and GCC 9.3.0 + OpenMPI 4.0.3 + CUDA 11.4 (for GPU computing). We can therefore load GROMACS for GPUs using:
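A sketch of that command, assuming the module names are the lowercase package names followed by the versions quoted above; the authoritative line is the one printed by module spider.

```shell
# Assumed module name for the package itself.
GROMACS_MODULE="gromacs/2021.4"

if command -v module >/dev/null 2>&1; then
    # Load the toolchain (compiler, CUDA, MPI) together with the package built on it.
    module load gcc/9.3.0 cuda/11.4 openmpi/4.0.3 "${GROMACS_MODULE}"
else
    echo "The module command is not available here; run this on Plato."
fi
```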
For more details, see the extensive help provided by the module command itself:
You will also find many examples in Compute Canada’s module usage page.
Compiling custom software
If the software you require is not available on Plato, you can compile it yourself. First, connect to a login node to perform the compilation there:
Then, select a compiler and load the appropriate module. If in doubt, we suggest using GCC:
Compilers and other development tools are indicated by the t category in the output of module avail. Available compilers include GCC (gcc) and Intel.
If your program uses MPI, you should also make sure that an MPI library is present in your environment. We recommend OpenMPI:
If your software requires other libraries, you should first check if they are already available on Plato. If they are, there is no need for you to install them again! For example, if your package depends on HDF5, you can check that this software is available on Plato and load it with:
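Putting the module steps of this section together, here is a sketch that assumes the module names gcc, openmpi and hdf5 (lowercase package names; check the output of module spider for the exact names on Plato):

```shell
LIBRARY="hdf5"   # example dependency

if command -v module >/dev/null 2>&1; then
    module load gcc               # compiler
    module load openmpi           # MPI library, only needed for MPI programs
    module spider "${LIBRARY}"    # check that the library is available
    module load "${LIBRARY}"      # add it to the environment
else
    echo "The module command is not available here; run this on Plato."
fi
```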
Once you have loaded the appropriate modules, follow your software package’s instructions. Most packages use a build system, such as Autotools or CMake, which provides a script to execute instead of calling the compiler directly.
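As an illustration only, the typical build steps for the two build systems named above might look like the following. The installation prefix is a hypothetical location in your home directory, the CMake branch assumes CMake 3.15 or newer, and your package's own instructions always take precedence.

```shell
PREFIX="$HOME/software/mypackage"   # hypothetical installation directory

if [ -x ./configure ]; then
    # Autotools: the configure script generates the Makefiles.
    ./configure --prefix="${PREFIX}" && make && make install
elif [ -f CMakeLists.txt ]; then
    # CMake: configure into a separate build directory, then build and install.
    cmake -S . -B build -DCMAKE_INSTALL_PREFIX="${PREFIX}" \
        && cmake --build build \
        && cmake --install build
else
    echo "No configure script or CMakeLists.txt found in $(pwd)."
fi
```

Run it from the top-level directory of the unpacked source code.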
To perform computations on Plato, you must ask the scheduler to allocate resources for you. Your software then runs on one or more compute nodes that have been granted to you. Computationally intensive software should not be run on the login nodes; these are reserved for software compilation, preparing jobs to submit to the scheduler, etc. You should not connect to compute nodes that have not been allocated to you either (e.g. using SSH); the compute nodes are managed by the scheduler.
The scheduler keeps a list of all requests to run computations on Plato and dispatches these requests to the compute nodes. On a large, multi-user system such as Plato, the use of a scheduler is necessary to ensure an efficient use of resources. The scheduler will avoid running your program on nodes that are already busy, wait for a node to become available, and automatically start your job when a node is ready. The scheduler also keeps track of how much computing time has been allocated to each research group to ensure that resources are shared fairly.
Plato uses the SLURM scheduler. There are two main ways to ask SLURM for resources: batch job scripts and interactive sessions. We will introduce both, starting with batch job scripts since they are more common.
Batch job scripts
The sbatch SLURM command allows you to submit a shell script to the scheduler. This shell script will be executed on a compute node when your resources have been allocated. Here is a minimal example script:
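A minimal sketch consistent with the walkthrough in this section, assuming hostname is used as the test program. Here the script is written to test-job.sh with a here-document, but you can equally create the file in a text editor.

```shell
# Write the job script to test-job.sh (the file name used in this guide).
cat > test-job.sh <<'EOF'
#!/bin/bash
# Runs once, on the batch host.
echo "Batch host: $(hostname)"
# Runs once per allocated task.
srun hostname
EOF
```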
Assuming this script is saved to test-job.sh, it can be submitted to the scheduler. In the following example, we request resources for 4 tasks:
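A sketch of the submission, using the --ntasks option of sbatch; the check only guards against running it where sbatch or the script is missing.

```shell
JOB_SCRIPT="test-job.sh"
NTASKS=4

if command -v sbatch >/dev/null 2>&1 && [ -f "${JOB_SCRIPT}" ]; then
    sbatch --ntasks="${NTASKS}" "${JOB_SCRIPT}"
else
    echo "sbatch or ${JOB_SCRIPT} not found; run this on Plato, next to the script."
fi
```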
SLURM will create an output file for the script in the working directory. Once the job has completed, the output will look like the following:
Let us break down the script and its resulting output line by line. First, #!/bin/bash tells SLURM that this is a script for the bash shell (the most common Linux shell). One compute node (the batch host) will execute the script. Here, the output tells us this is node plato344. Then, the srun command runs a program in parallel on all allocated resources. Since we requested resources for 4 tasks, the program was run four times. The output shows that two of these tasks were dispatched to plato344, and the two others to another node.
The srun command is provided by SLURM to run parallel programs. If your program is MPI-enabled, we recommend running it with srun rather than with a separate MPI launcher.
sbatch accepts a vast number of options. A very important one is --time, which allows you to request a specific amount of computational time. For example, sbatch --ntasks=4 --time=2-00:00:00 would request resources for 4 tasks for 2 days. If you omit the time, you are granted 20 minutes, allowing you to perform quick tests but nothing more. If your job does not finish within the requested time frame, it will be stopped by the system; you should therefore always request slightly more time than you require. You can request at most 21 days for any given job.
Using the following syntax, options for sbatch can be given in the script rather than on the command line:
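A sketch of such a script, written to a file with a here-document. The file name exp-job.sh, the program my_program, the requested values and the loaded modules are all hypothetical examples; --mem requests memory per node.

```shell
# Write a job script whose options are embedded as #SBATCH comment lines.
cat > exp-job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=2-00:00:00
#SBATCH --mem=8G

# Add the required software to the environment on the compute nodes.
module load gcc openmpi

# my_program is a hypothetical executable.
srun ./my_program
EOF
```

The script can then be submitted with sbatch exp-job.sh, without repeating the options on the command line.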
The above example also shows how to request a specific amount of total memory per node, and how to load modules in your job script to add the necessary software to the environment on the compute nodes.
To see a list of the jobs submitted to the scheduler, use squeue. Since this list is usually quite long, you can filter it to show only your own jobs:
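A sketch using the -u (user) option of squeue. On Plato your login name is your NSID, so the USER variable normally holds it already; abc123 is only a fallback placeholder.

```shell
NSID="${USER:-abc123}"   # your NSID

if command -v squeue >/dev/null 2>&1; then
    squeue -u "${NSID}"   # show only the jobs that belong to this user
else
    echo "squeue is not available here; run this on Plato."
fi
```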
The output tells us that user abc123, who is part of the hpc_e_gratton SLURM account, has submitted 3 jobs, with names exp001, exp002 and exp003. Each job has a unique ID in SLURM (415174 for exp001). Two are running (status R), and one is pending (status PD). exp001 has a little under 3 days of allocated time remaining, exp002 has over 9 days, and exp003 requested 10 days. All jobs requested 64 tasks, resulting in 64 CPU cores being used on 4 nodes (16 cores/node). None of these jobs requested any special resources (shown as (null)), and they all requested 8G of memory. If a job is not running, the last column gives the reason; here, only the last job is still pending, because (Resources) are not available yet (i.e. nodes are busy).
A job can be cancelled using its SLURM ID:
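A sketch using the scancel command of SLURM, with the job ID quoted above for exp001 as the example; replace it with the ID of your own job.

```shell
JOB_ID=415174   # example ID; use the ID reported by squeue for your job

if command -v scancel >/dev/null 2>&1; then
    scancel "${JOB_ID}"
else
    echo "scancel is not available here; run this on Plato."
fi
```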
It is sometimes useful to run commands manually instead of wrapping them in a batch job script to pass to sbatch, such as when performing quick tests or using interactive mathematical tools. SLURM makes this possible through interactive sessions.
To start an interactive session through SLURM, use the salloc command.
This will allocate one task on a node and open a session on the allocated node. The salloc configuration on Plato matches that of Compute Canada machines: when the allocation is granted, a Bash session is automatically started on the allocated node. This can be overridden by specifying the command for salloc to execute (see the salloc documentation).
From there, you can run commands directly as you would on a login node (but without the restrictions on computationally-demanding tasks).
When you are finished, use exit to close the session on the compute node and relinquish the resource allocation:
By default, salloc allocates resources for a single task. However, you can pass it options, just as with sbatch.
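For example, the --ntasks and --time options introduced earlier for sbatch are also accepted by salloc. The values below are arbitrary examples.

```shell
NTASKS=4
TIME_LIMIT="01:00:00"   # one hour

if command -v salloc >/dev/null 2>&1; then
    # Opens an interactive session once the resources have been granted.
    salloc --ntasks="${NTASKS}" --time="${TIME_LIMIT}"
else
    echo "salloc is not available here; run this on Plato."
fi
```

As with a single-task session, type exit to release the allocation when you are done.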
Getting the most out of the scheduler
Since many research groups use Plato, your jobs are likely to spend some time waiting in queue before they run! You can ask SLURM the estimated start time of your upcoming jobs:
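A sketch using the --start option of squeue, restricted to your own jobs; abc123 is only a fallback placeholder for your NSID.

```shell
NSID="${USER:-abc123}"   # your NSID

if command -v squeue >/dev/null 2>&1; then
    # List pending jobs together with their expected start times.
    squeue --start -u "${NSID}"
else
    echo "squeue is not available here; run this on Plato."
fi
```

The reported times are estimates and can change as other jobs finish early or new jobs are submitted.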
You can also ask SLURM to compute the estimated start of a job without submitting it to the queue:
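A sketch using the --test-only option of sbatch, which validates the request and reports an estimated start time without queueing the job. The script name and the requested resources are the examples used earlier in this guide.

```shell
JOB_SCRIPT="test-job.sh"

if command -v sbatch >/dev/null 2>&1 && [ -f "${JOB_SCRIPT}" ]; then
    # Nothing is submitted: SLURM only reports when the job could start.
    sbatch --test-only --ntasks=4 --time=2-00:00:00 "${JOB_SCRIPT}"
else
    echo "sbatch or ${JOB_SCRIPT} not found; run this on Plato, next to the script."
fi
```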
To minimise the time spent waiting in the queue, make sure to provide a good estimate of the time you require. If you simply request the maximum possible duration (21 days), your wait time will be longer since the scheduler favours short jobs by giving them a higher priority. Short jobs requiring less than 4 hours are fastest (we call these “burst” jobs).
The more computing time you consume, however, the more your overall priority decreases, to ensure that all users can run jobs and that no one can hog the cluster. Your overall priority goes back to normal over a two-week period. This means that the optimal way to work with Plato is to submit jobs regularly over the weeks rather than a large number of jobs once in a while. It also means that you should only request the resources necessary to complete your jobs. Requesting resources for 64 tasks if your program is only marginally slower when using 32, for instance, would penalise you in the long run. You should therefore carefully assess the efficiency of your program before requesting more resources.
Also note that priorities are managed per group rather than per user: all students and staff in a group share the same priority. Again, this ensures that a single research group cannot hog resources to the detriment of others. You can check your cluster usage with SLURM's accounting tools.
Under special circumstances, we can alter job priorities to some extent. However, we expect users who make such requests to have already optimised their workflow and to justify their request.
This Getting started guide only scratches the surface! To help you learn more about Linux, HPC, and the Plato cluster, we offer hands-on workshops every term; see our Training page. In particular, we recommend Introduction to high-performance computing, which is an extended version of the present guide. Even if the workshop you are interested in is not scheduled at this time, all course material is available online for self-learning.
The main Plato documentation page gives an overview of the cluster; it is also the root of the Plato documentation and links all Plato-related topics (subpages). We offer generic SLURM job script examples that will help you fit your program into the Plato scheduler, be it a trivial parallel job or a fine-grained hybrid MPI/OpenMP program. You may also be able to find software-specific documentation if your program is commonly used on Plato. Be sure to browse through the documentation, and happy computing!
If you encounter a problem while working on Plato, or otherwise need help, please read our user support page.