PyTorch is an open-source machine learning library based on the Torch library. Developed by Facebook's AI Research lab (FAIR), it is popular for computer vision and natural language processing. Documentation and more examples can be found on PyTorch's website, and documentation for Python can be found on its official website. Currently, version 7 of Torch is available on the clusters; however, see the Slurm job script below for how to build your own environment.

Basics

After loading the module, Python can be launched from the command line by typing python. Python files end in .py and can be run from the command line using python filename.py, where filename is the name of the Python file.

Python Virtual Environment

To use Python through the command line, you must first initialize a Python virtual environment. Virtual environments are isolated environments for projects, so that each project can have its own dependencies and packages installed, regardless of what dependencies every other project has. To create the virtual environment, run the following commands:

        module load python3/anaconda/2020.02
        conda create --name torch-env pytorch torchvision cudatoolkit=10.2 --channel pytorch
        source activate torch-env
    
Once the virtual environment has been created, it does not need to be created again. It can be launched at any time by ensuring that the python3 module is loaded, using the command

module load python3/anaconda/2020.02

and then activating the environment by using

source activate torch-env
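
To confirm that the environment is actually active, a short Python check along the following lines may help. This is a minimal sketch rather than part of the cluster setup: the function name describe_environment is hypothetical, and it assumes conda sets the CONDA_DEFAULT_ENV variable on activation. The PyTorch lookup is guarded so the script still runs where PyTorch is not installed.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is currently running."""
    info = {
        # Path of the running python binary
        "executable": sys.executable,
        # Root directory of the active environment
        "prefix": sys.prefix,
        # Set by conda when an environment is activated; None otherwise
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
    try:
        import torch  # only importable if PyTorch is installed here
    except ImportError:
        info["torch_version"] = None
        info["cuda_available"] = False
    else:
        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With torch-env active, conda_env would be expected to read torch-env and prefix to point inside the environment's directory. On a node without a GPU, cuda_available will be False even when PyTorch is installed correctly.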

Next, you can install any packages you need inside this environment. These packages will only be available within this environment. Python packages and their dependencies can be installed in your virtual environment using either pip or the conda package manager.

Pip

Pip is a program that installs Python packages. It installs the requested package, along with any other Python packages it depends on, with the command:

pip install package-name

where package-name is the name of the package you wish to install. For example, to install SQLAlchemy, a Python SQL database library, you can use the command

pip install SQLAlchemy

Once installed, the SQLAlchemy library can be used in your Python programs within the created virtual environment.

Conda

The conda package manager is similar to pip, but also installs non-Python packages and dependencies. Packages can be installed using the command

conda install package-name

where package-name is the name of the package you wish to install. Many conda packages are used in scientific computing and data analysis. For example, NumPy is a scientific computing package for Python that provides an N-dimensional array object, tools for integrating C++ and Fortran code, and linear algebra and random number capabilities. It can be installed through conda using the command

conda install numpy
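
Whether a package was installed with pip or with conda, one way to confirm that it is visible from the active environment is to ask Python whether it can locate the module. The sketch below uses only the standard library; the function name missing_packages and the list of names are illustrative assumptions. Note that the import name can differ from the install name (for example, SQLAlchemy is imported as sqlalchemy).

```python
import importlib.util


def missing_packages(import_names):
    """Return the names that cannot be found in the current environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names, which may differ in case from the install names.
    wanted = ["torch", "torchvision", "numpy", "sqlalchemy"]
    missing = missing_packages(wanted)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All packages found.")
```

If a package is reported as missing right after installation, the likely cause is that it was installed while a different environment was active.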

To exit the Python virtual environment, use the command

conda deactivate

Running Python through a job script

1. Ensure that you have a virtual environment created, following the steps described above.

2. Create a PyTorch Python script. The linked repository provides a simple script, example.py, which demonstrates some of PyTorch’s basic features, including creating tensors, multiplying and adding tensors, and basic Python syntax.

example.py


import torch

print("Welcome to PyTorch!")
# Define tensor 1: a 1-D tensor of 5 random values in [0, 1)
t1 = torch.rand(5)
# Print tensor size
print(t1.size())
print("\nAddition Example")
# Print tensor 1
print(t1)
print("+")
# Define tensor 2: another 1-D tensor of 5 random values
t2 = torch.rand(5)
# Print tensor 2
print(t2)
print("=")
# Add tensors element-wise
print(torch.add(t1, t2))
print("\nMultiplication Example")
# Begin multiplication
print(t1)
print("*")
print(t2)
print("=")
# Multiply tensors element-wise
print(torch.multiply(t1, t2))
        
3. Prepare the submission script, which is the script that is submitted to the Slurm scheduler as a job in order to run the Python script. The linked repository provides the script job.sh as an example.

job.sh


#!/bin/bash
#SBATCH --job-name=torch_test
#SBATCH -o r_out%j.out
#SBATCH -e r_err%j.err
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH -p defq,defq-48core

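module purge
module load python3/anaconda/2020.02
# Create the PyTorch environment. --yes answers the confirmation prompt,
# which a batch job cannot respond to. Remove this line if torch-env already exists.
conda create --yes --name torch-env pytorch torchvision cudatoolkit=10.2 --channel pytorch
# Activate the environment
source activate torch-env
# Run the script
python example.py
# Exit the environment
conda deactivate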
        

4. Submit the job using: sbatch job.sh

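5. Examine the results. With the job script above, standard output is written to r_out<jobid>.out and standard error to r_err<jobid>.err in the directory the job was submitted from, where <jobid> is the job ID reported by sbatch.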
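
The job script above names its output files r_out%j.out and r_err%j.err, where Slurm replaces %j with the job ID. As a convenience, the files for a given job could be read with a small Python helper such as the one sketched below. The function name read_job_output is hypothetical, and the sketch assumes the files sit in the directory the job was submitted from.

```python
from pathlib import Path


def read_job_output(job_id, directory="."):
    """Return (stdout_text, stderr_text) for a job; None for a missing file."""
    base = Path(directory)
    texts = []
    # File names follow the -o and -e patterns used in job.sh.
    for name in (f"r_out{job_id}.out", f"r_err{job_id}.err"):
        path = base / name
        texts.append(path.read_text() if path.exists() else None)
    return tuple(texts)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        out, err = read_job_output(sys.argv[1])
        print("--- standard output ---")
        print(out if out is not None else "(file not found)")
        print("--- standard error ---")
        print(err if err is not None else "(file not found)")
    else:
        print("usage: python read_job_output.py JOB_ID")
```

A file that is still missing usually means the job has not started yet, so checking the queue before looking for output is a reasonable first step.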