DGX is an NVIDIA infrastructure that can be used to deploy applications quickly and that scales across multiple nodes. Documentation for DGX can be found on NVIDIA's official website.
module load python3/anaconda/2020.02
module load cuda/11.1
source activate /work/examples/.conda/dgx
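To confirm the environment activated correctly, you can check which Python interpreter is now on your PATH; it should resolve to the /work/examples/.conda/dgx environment loaded above.
which python     # should point to a Python under /work/examples/.conda/dgx
conda list       # lists the packages available in the active environment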
To exit the virtual environment, use the command
source deactivate
2. Create a job script. This repository provides a simple script, job.sh, which demonstrates running a job on DGX and using NVIDIA's GPU commands.
#!/bin/bash
#SBATCH --job-name=dgx_test
#SBATCH -N 1 # number of nodes
#SBATCH -n 24 ## number of CPU cores
#SBATCH --gres=gpu:1 ## number of GPUs
#SBATCH --output dgx_%j.out #Output file
#SBATCH --error dgx_%j.err #Error output file
#SBATCH -p dgx_aic #DGX group
#Load desired modules
module load python3/anaconda/2020.02
module load cuda/11.1
source activate /work/examples/.conda/dgx
#The following is a sample script
echo " The host name is"
hostname
echo " The current directory is:"
pwd
echo -e " \nConda environment list\n"
conda list
echo -e "\nCUDA Visible Devices "
echo $CUDA_VISIBLE_DEVICES
echo -e "\nNvidia GPU info\n"
nvidia-smi
#Add your script. Example:
# python ./your_script.py
#See our tutorials for how to run other applications on our clusters.
4. Submit the job using: sbatch job.sh
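When sbatch accepts the job it prints the assigned job ID, which you can use to monitor the job; the job ID 123456 below is only illustrative.
sbatch job.sh
# prints: Submitted batch job 123456
squeue -u $USER            # check whether the job is pending or running
scontrol show job 123456   # detailed information about a specific job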
5. Examine the results in the output and error files defined in the job script (dgx_<jobid>.out and dgx_<jobid>.err).
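For example, with the illustrative job ID 123456 from above:
cat dgx_123456.out   # hostname, conda package list, CUDA_VISIBLE_DEVICES, and nvidia-smi output
cat dgx_123456.err   # any error messages from the module loads or your script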