R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Some of R’s main features include:

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
  • Documentation for R can be found on its official website. Currently, versions 3.1.0, 3.3.1, 3.4.1, 3.5.0, 3.5.1, 3.5.2, 3.6.0, 3.6.1, 3.6.2, 4.0.2 of R are available on the cluster.

    Basics

    After loading the R module, R can be launched through the command line by simply typing R.

    R Virtual Environment

    To use R through the command line, you must first initialize a R virtual environment. Virtual environments are isolated environments for projects, so that each project can have its own dependencies and packages installed, regardless of what dependencies every other project has. To create the virtual environment, run the following commands:
    
    module load python3/anaconda/5.2.0
    conda create -n r-environment r-essentials r-base
    source activate r-environment
    export R_LIBS_USER=/home/$USER/.conda/envs/r-environment/lib/R/library/
        
    Once the virtual environment is created, it can be launched at any time by ensuring that the python3 module is loaded, using the command

    module load python3/anaconda/5.2.0

    and then launching the environment by using

    source activate r-environment

    Next, you can install any packages you need inside this environment. These packages will only be available within this environment. Packages can be installed with the install.packages() function in R. For example, to load the ggplot2 package, which is an R system for creating graphics, use the command

    install.packages("ggplot2")

    To exit the R virtual environment, use the command

    source deactivate

    Running R through a job script

    1. Ensure that you have a virtual environment created, following the steps described above.

    2. Create a R script. This repository provides a simple script, test.r, which demonstrates some of R's basic features.

    test.r

    
    print("Starting tests.")
    
    x <- 1:10 # initializes x as values from 1 to 10
    print(x) # print x
    
    y <- sample(1:100, 10, replace=T) # generate 10 random numbers from 1 to 100
    print(y) # print y
    
    mean(x) # find mean of x
    
    print(x*y) # print product of x and y
    
    a <- c(2, 4, 6, 8, 10, 12) # a is a vector
    b <- a[a>4] # b is the vector of all indices in a greater than 4
    print(a)
    print(b)
            
    3. Prepare the submission script, which is the script that is submitted to the Slurm scheduler as a job in order to run the R script. This repository provides the script job.sh as an example.

    job.sh

    
    #!/bin/sh
    #SBATCH --output R_job_%j.out
    #SBATCH --error  R_job_%j.err
    #SBATCH -N 1
    #SBATCH -n 1
    
    #SBATCH -p defq,defq-48core
    
    
    echo -e '\n submitted R job'
    echo 'hostname'
    hostname
    
    # load the module for R
    module load R/gcc/4.0.2
    
    # first test of .....
    R < test.R --no-save
    
            

    4. Submit the job using: sbatch job.sh

    5. Examine the results.