Plato is a Linux-based, heterogeneous, high-performance computing (HPC) cluster at the University of Saskatchewan. It is used for research and for training, and is restricted to USask users and their collaborators. Plato is managed by ICT’s Advanced Research Computing (ARC) team.

Although it is not a Digital Research Alliance of Canada cluster, Plato is configured to be as similar as possible to Alliance clusters. For instance, Plato and Alliance clusters use the same software stack and job scheduler.

Plato cannot provide all the computing power required for USask research projects. Researchers with substantial computing needs should consider Plato a stepping stone: a place to test and refine their projects locally before moving them to Alliance clusters.

Quick Links

Basic information

SSH hostname  | plato.usask.ca (VPN required when off-campus)
System status | ARC main page, and the cluster-info command
Storage usage | quota command
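
For example, connecting and checking these from a terminal might look like this (replace nsid with your own NSID; cluster-info and quota are the commands named above):

    # Connect to Plato (VPN required when off-campus).
    ssh nsid@plato.usask.ca

    # Once logged in on a login node:
    cluster-info   # current status of the cluster
    quota          # your storage and inode usage on /globalhome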

Storage

Globalhome filesystem

/globalhome

  • 75T total
  • Location of home directories
  • Your home directory is: /globalhome/$USER/HPC
    • $USER is your NSID
  • Each home directory has a storage quota
    • Maximum 300G
    • Maximum 2.5M inodes (number of files and directories)
  • No backup
  • For active research data and user programs
  • Inactive research data must be moved to Datastore
  • Globus endpoint: USASK-GLOBALHOME
    • Always use Globus for large data transfers between /globalhome and /datastore

Datastore filesystem

/datastore

  • 1,800T total
  • Also accessible from outside Plato
  • Faculty members are entitled to 3T of free space and can obtain more
  • Backed up
  • For long-term storage and inactive research data
  • Read-only access on the compute nodes, read-write on the login nodes
  • Globus endpoint: USASK-DATASTORE
    • Always use Globus for large data transfers between /globalhome and /datastore
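
If you prefer a command line over the Globus web interface, the Globus CLI can drive such a transfer. This is only a sketch: it assumes the CLI is installed and authenticated (globus login), and the endpoint UUIDs and paths shown are placeholders to look up and replace.

    # Find the endpoint IDs for the two Plato filesystems.
    globus endpoint search "USASK-GLOBALHOME"
    globus endpoint search "USASK-DATASTORE"

    # Move an inactive project from /globalhome to /datastore (UUIDs and paths are placeholders).
    globus transfer --recursive \
        GLOBALHOME_UUID:/globalhome/nsid/HPC/finished_project \
        DATASTORE_UUID:/path/on/datastore/finished_project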

Local filesystem (compute nodes)

/local

  • Temporary storage for jobs
  • Total storage space varies by compute node type (see below)
  • Use $SLURM_TMPDIR to get the location of the temporary directory for your jobs, in the form /local/$USER/$SLURM_JOB_ID (see the sketch after this list)
  • Files are deleted when a job ends
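
A typical pattern is to copy input data to $SLURM_TMPDIR, compute there on the fast node-local disk, and copy the results back before the job ends. The following is a minimal sketch; the resource requests, file names, and my_program are placeholders to adapt to your own workflow.

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=2G

    # Copy input data from /globalhome to the node-local temporary directory.
    cp ~/input.dat "$SLURM_TMPDIR"/

    # Run the computation on the local disk.
    cd "$SLURM_TMPDIR"
    ~/my_program input.dat > results.out    # placeholder program and file names

    # Copy the results back to /globalhome before the job ends,
    # since everything under $SLURM_TMPDIR is deleted at that point.
    cp results.out ~/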

Network

Most Plato compute nodes are interconnected by a 1Gb Ethernet link. Login nodes are connected to compute nodes and to the University network by a 10Gb Ethernet link. Some compute nodes (see below) are interconnected by 10Gb Ethernet or FDR InfiniBand. Both login and compute nodes can access the external network, including the Internet.

Scheduler

Plato uses the SLURM scheduler. Job duration is limited to 21 days, except on the GPU and large-memory nodes, where the limit is 7 days. (These limits do not apply to contributed hardware.) Shorter jobs get increased priority, according to the categories below. The default allocation is 20 minutes for a single task on one CPU with 512M of memory. Job arrays are limited to 10000 tasks. Jobs can only be submitted from the login nodes, not from the compute nodes; it is therefore not possible to submit a job from within another job.

Maximum duration (d-hh:mm) | Priority factor
21-00:00                   | 1
4-00:00                    | 2
12:00                      | 4
04:00                      | 8
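
To request more than the default allocation, specify the limits in the submission script. Here is a minimal sketch; the values and my_program are placeholders, not recommendations.

    #!/bin/bash
    #SBATCH --time=12:00:00        # 12 hours of walltime (priority factor 4)
    #SBATCH --cpus-per-task=4      # 4 CPU cores instead of the default 1
    #SBATCH --mem=8G               # 8G of memory instead of the default 512M
    #SBATCH --job-name=example

    srun ./my_program              # placeholder executable

Submit the script from a login node with sbatch job.sh, and check its state in the queue with squeue -u $USER.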

Architecture

The oldest CPU model in use on Plato is from the Intel Ivy Bridge Xeon product line. These processors support the AVX instruction set, but not any later instruction sets (such as AVX2 or AVX512). When compiling your code on a login node, use GCC with the -march=core-avx-i option to get the best performance while retaining compatibility with all Plato compute nodes. Do not use -march=native, as it can produce code that is incompatible with the Ivy Bridge processors.
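
For instance, a C source file could be compiled on a login node as follows; the file names are placeholders and -O2 is only a common optimization choice.

    # Target the oldest (Ivy Bridge, AVX-only) nodes so the binary runs on every Plato compute node.
    gcc -O2 -march=core-avx-i my_code.c -o my_program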

Plato uses the AVX branch of the Compute Canada software stack. The available software modules can therefore differ slightly from those on Compute Canada clusters, which typically use the AVX2 branch of the stack. We can add missing software and versions on request.
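
To see what the stack provides and load a package, the usual Lmod commands apply; gcc is used here only as an example, and the version number is hypothetical (pick one actually listed).

    module avail gcc        # list the GCC versions provided by the software stack
    module load gcc/9.3.0   # hypothetical version: choose one shown by the previous command
    module list             # confirm which modules are currently loaded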

Compute nodes

Public | Count | Type                   | CPU                                              | Cores / Node | Highest SIMD | Usable Memory / Node | GPU            | /local storage | Interconnect
Yes    | 64    | "Pipit"                | 2 x Intel Xeon E5-2640 v2 @ 2.00GHz "Ivy Bridge" | 16           | AVX          | 31000M (30G)         | None           | 347G           | 1Gb Ethernet
Yes    | 28    | "Penguin"              | 2 x Intel Xeon Gold 6148 @ 2.40GHz "Skylake"     | 40           | AVX512       | 190000M (185G)       | None           | 781G           | 10Gb Ethernet
Yes    | 4     | Large-memory "Penguin" | 2 x Intel Xeon Gold 6148 @ 2.40GHz "Skylake"     | 40           | AVX512       | 384000M (375G)       | None           | 781G           | 10Gb Ethernet
Yes    | 2     | GPU                    | 2 x Intel Xeon E5-2640 v3 @ 2.60GHz "Haswell"    | 16           | AVX2         | 31000M (30G)         | 2 x NVIDIA K40 | 805G           | 1Gb Ethernet
No     | 20    | GWF                    | 2 x Intel Xeon E5-2640 v2 @ 2.00GHz "Ivy Bridge" | 16           | AVX          | 31000M (30G)         | None           | 347G           | 1Gb Ethernet
No     | 2     | Tse group              | 2 x Intel Xeon E5-2683 v4 @ 2.10GHz "Broadwell"  | 32           | AVX2         | 250000M (244G)       | 2 x NVIDIA K80 | 768G           | FDR InfiniBand

Choosing nodes

Plato will choose the appropriate node type for your job according to your resource requirements (cores per node, memory, GPUs), so it is normally not necessary to request a specific node type; doing so may reduce the number of nodes eligible to run your job and increase your wait time. If you nevertheless need a specific node type, request it with the SLURM options --constraint=ivybridge (for Pipit nodes) or --constraint=skylake (for Penguin nodes).
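
The constraint can be given on the command line when submitting an existing script (job.sh is a placeholder name):

    sbatch --constraint=skylake job.sh     # run only on Penguin (Skylake) nodes
    sbatch --constraint=ivybridge job.sh   # run only on Pipit (Ivy Bridge) nodes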

Login nodes

Plato has two login nodes, accessible by SSH at plato.usask.ca. The login nodes should be used to prepare jobs and submit them to the scheduler, to compile programs, and to run short calculations that require little memory and processing power. Intensive processes must never be run on the login nodes; they must be submitted to the scheduler to run on the compute nodes.

File transfers

Files can be transferred between Plato and your local computer over SSH, using standard commands such as scp, rsync, or sftp. There are also Globus endpoints for home directories (USASK-GLOBALHOME) and Datastore (USASK-DATASTORE). Always use Globus for large data transfers between /globalhome and /datastore.
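
For instance, run from your own computer (not from Plato), a transfer could look like the following; the NSID, file names, and paths are placeholders.

    # Copy a local file to your Plato home directory (replace nsid with your NSID).
    scp results.tar.gz nsid@plato.usask.ca:/globalhome/nsid/HPC/

    # Synchronize a whole directory, able to resume if the transfer is interrupted.
    rsync -av --partial my_project/ nsid@plato.usask.ca:/globalhome/nsid/HPC/my_project/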

Educational accounts

If you are granted access to Plato for a class or training workshop, your jobs will be limited to 12 hours of runtime, and you will not have access to the large-memory nodes.

Other topics (subpages)


