TensorFlow is an open-source software library for machine learning. Originally developed for Google's internal use, it has become a popular framework for training and inference with deep neural networks. Documentation and more examples can be found on TensorFlow's website; documentation for Python can be found on its official website. Currently, version 1.3 of TensorFlow is available on the clusters.

Basics

After loading the module, Python can be launched from the command line by simply typing python. Python files end in .py and can be run from the command line using python filename.py, where filename is the name of the Python file.

Python Virtual Environment

To use Python through the command line, you must first initialize a Python virtual environment. Virtual environments are isolated environments for projects, so that each project can have its own dependencies and packages installed, regardless of what dependencies every other project has. To create the virtual environment and activate it, run the following commands:

        module load python3/anaconda/2020.02
        conda create -n tensorflow_env
        source activate tensorflow_env
    
Once the virtual environment has been created, it can be activated at any time by first ensuring that the python3 module is loaded, using the command

module load python3/anaconda/2020.02

and then activating the environment with

source activate tensorflow_env
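Once the environment is active, it can be useful to confirm which Python interpreter is running; inside an activated conda environment, the interpreter path should point into that environment's directory. This is a general-purpose check, not specific to the cluster:

```python
import sys

# Path of the running interpreter; inside an active conda environment this
# should point into the environment, e.g. .../envs/tensorflow_env/bin/python
print(sys.executable)

# Interpreter version, useful when checking package compatibility
print("Python %d.%d" % sys.version_info[:2])
```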

Next, you can install any packages you need inside this environment. These packages will only be available within this environment. Python packages and dependencies can be installed in your virtual environment by either using pip or the conda package manager.

Pip

Pip is a program that installs Python packages. It can be used to install a package, along with any other Python packages it depends on, using the command:

pip install package-name

where package-name is the name of the package you wish to install. For example, to install SQLAlchemy, a Python SQL database library, you can use the command

pip install SQLAlchemy

Once installed, the SQLAlchemy library will be available to your Python programs whenever this virtual environment is active.
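A quick way to confirm that a pip-installed package is available is to query its version from Python. The helper below is an illustrative sketch (the installed_version name is ours, and importlib.metadata requires Python 3.8 or newer); pip itself is used as the example package since it is always present:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# pip is always installed, so this prints a version string
print(installed_version("pip"))
# A package that was never installed returns None
print(installed_version("not-a-real-package"))
```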

Conda

The conda package manager is similar to pip, but also installs non-Python packages and dependencies. Packages can be installed using the command

conda install package-name

where package-name is the name of the package you wish to install. Many conda packages are used in scientific computing and data analysis. For example, NumPy is a scientific computing package for Python that provides an N-dimensional array object, tools for integrating C/C++ and Fortran code, and linear algebra and random number capabilities. It can be installed through conda using the command

conda install numpy
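The NumPy features mentioned above can be sketched in a few lines; this is a minimal illustration (np.random.default_rng requires NumPy 1.17 or newer), not cluster-specific code:

```python
import numpy as np

# N-dimensional array object: a 3x4 array of the integers 0..11
a = np.arange(12).reshape(3, 4)
print(a.shape)  # (3, 4)

# Linear algebra: solve the 2x2 system M @ x = b
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(M, b)
print(x)  # [2. 3.]

# Random numbers: reproducible draws from a seeded generator
rng = np.random.default_rng(seed=42)
print(rng.integers(0, 10, size=5))
```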

To exit the Python virtual environment, use the command

conda deactivate

Running Python through a job script

1. Ensure that you have a virtual environment created, following the steps described above.

2. Create a TensorFlow Python script. The linked repository provides a simple script, test.py, which demonstrates some of TensorFlow’s basic features.

test.py


import tensorflow as tf

# Create a "hello world" tensor
hello = tf.constant("hello world")
# Access the tensor's value as a NumPy array
print(hello.numpy())

# Create a 2D tensor from a nested Python list
matrix = [[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 10],
          [11, 12, 13, 14, 15],
          [16, 17, 18, 19, 20],
          [21, 22, 23, 24, 25]]
# Define the tensor variable and its element type (32-bit integer)
tensor = tf.Variable(matrix, dtype=tf.int32)
# Print the tensor rank (number of dimensions)
print(tf.rank(tensor))
# Print the tensor shape (rows by columns)
print(tensor.shape)
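TensorFlow's rank and shape follow the same conventions as NumPy's ndim and shape, so the two values printed at the end of test.py can be illustrated without TensorFlow. This sketch is for explanation only and is not part of the example script:

```python
import numpy as np

# The same 5x5 nested list used in test.py
matrix = [[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 10],
          [11, 12, 13, 14, 15],
          [16, 17, 18, 19, 20],
          [21, 22, 23, 24, 25]]

arr = np.array(matrix)
print(arr.ndim)   # rank 2: the tensor has two dimensions
print(arr.shape)  # (5, 5): rows by columns
```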
        
3. Prepare the submission script, which is submitted to the Slurm scheduler to run the Python script as a job. The linked repository provides the script job.sh as an example.

job.sh


#!/bin/bash
#SBATCH --job-name=tf_test
#SBATCH -o r_out%j.out
#SBATCH -e r_err%j.err
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH -p defq,defq-48core

module purge
module load python3/anaconda/2020.02 
source activate tensorflow_env

python test.py
conda deactivate
        

4. Submit the job using: sbatch job.sh

5. Examine the results. Standard output from the job is written to r_out<jobid>.out and any errors to r_err<jobid>.err, where the %j in the submission script is replaced by the Slurm job ID.