DMTN-089: Notes on Singularity

  • Greg Daues, Hsin-Fang Chiang

Latest Revision: 2018-07-18

Note

This technote is not yet published.

Notes on running the Stack using Singularity

1   Introduction to Singularity

The Singularity container project has commanded attention in the scientific world and the high performance computing (HPC) community. Singularity gives users the freedom to execute their customized applications, bundled with required dependencies, in an isolated context on HPC platforms without impacting the underlying system. Unprivileged users are able to effectively swap out the operating system on the HPC host for one that they select and control.

Singularity is well suited for HPC platforms because it prevents user context escalation within the container. The user inside a Singularity container is the same user as outside the container, and this feature has made it possible to run user-supplied containers on shared infrastructure.
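A quick way to observe this (here using the image file pulled in Section 3) is to run an identity command on the host and again inside the container; both report the same user:

id -un
singularity exec /home/daues/7-stack-lsst_distrib-w_2018_28.img id -un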

2   Singularity Image Files

Singularity utilizes a single file as a container image, providing the complete representation of all the software/files within the container. Because Singularity images are single files, they are easy to distribute, copy and manage. Administering permissions for the container is reduced to standard file system permissions. The Singularity container images are compressed and consume relatively little disk space.
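For example (with hypothetical paths and a hypothetical group name), sharing an image with collaborators reduces to copying a file and setting ordinary permissions:

cp /home/daues/7-stack-lsst_distrib-w_2018_28.img /project/shared/images/
chgrp lsst_users /project/shared/images/7-stack-lsst_distrib-w_2018_28.img
chmod 640 /project/shared/images/7-stack-lsst_distrib-w_2018_28.img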

A local Singularity image file can be created from an LSST Science Pipelines Docker image on Docker Hub using singularity pull. An example of image generation is:

singularity pull --name  "/home/daues/7-stack-lsst_distrib-v16.0.img" docker://lsstsqre/centos:7-stack-lsst_distrib-v16_0
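Once the pull completes, the image can be exercised directly; for example, an interactive shell or a single command can be run in the container (the path matches the pull command above):

singularity shell /home/daues/7-stack-lsst_distrib-v16.0.img
singularity exec /home/daues/7-stack-lsst_distrib-v16.0.img cat /etc/redhat-release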

3   Running processCcd.py with Singularity

Processing with an LSST weekly or daily tag such as lsstsqre/centos:7-stack-lsst_distrib-w_2018_28 on the Verification Cluster can be initiated promptly using Singularity. A local container image can be created in a few minutes via:

singularity pull --name  "/home/daues/7-stack-lsst_distrib-w_2018_28.img" docker://lsstsqre/centos:7-stack-lsst_distrib-w_2018_28

This approach is rather expedient in comparison to building a stack from source on a target platform for each weekly/daily tag or release. With the container image present, a Slurm job can be submitted to the Verification Cluster that runs a singularity exec to launch processing. A simple example running processCcd is:

#!/bin/bash -l
#SBATCH -p debug
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 00:30:00
#SBATCH -J job1

singularity exec -B /datasets/:/datasets/ -B /project/:/project/ -B /scratch/:/scratch/ -B /home/:/home/  /home/daues/7-stack-lsst_distrib-w_2018_28.img /home/daues/run_ProcessCcd.sh

In this example a wrapper script, run_ProcessCcd.sh, is executed within the container; it initializes the stack under /opt/lsst and runs the desired command line:

#!/bin/bash
source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
processCcd.py /datasets/hsc/repo --calib /datasets/hsc/repo/CALIB/ --output /scratch/daues/output1  --id visit=6320 ccd=10
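Assuming the Slurm description above is saved as, e.g., job1.sl (a hypothetical file name), the job is submitted and monitored with the standard Slurm commands:

sbatch job1.sl
squeue -u $USER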

The singularity exec command line uses user-bind path specifiers such as -B /datasets/:/datasets/ to map file spaces into the Singularity container. The format of the specification is -B src:dest, where src and dest are the paths outside and inside the container, respectively.
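The source and destination may differ; for example (hypothetical paths), a host output directory can be exposed at a different location inside the container:

singularity exec -B /scratch/daues/output1:/output /home/daues/7-stack-lsst_distrib-w_2018_28.img ls /output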

4   SingleFrameDriver in SMP mode with Singularity

In this section we show one approach to using Singularity for processing at a more significant scale, sufficient to fully utilize a single multi-core node. This example runs singleFrameDriver, a ctrl_pool-style pipeline driver (DMTN-023), in SMP mode within the context of a singularity exec. We write a Slurm job description to launch the singularity exec on a compute node of the Verification Cluster:

#!/bin/bash -l
#SBATCH -p debug
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 00:30:00
#SBATCH -J job2

singularity exec -B /datasets/:/datasets/ -B /project/:/project/ -B /scratch/:/scratch/ -B /home/:/home/  /home/daues/singularity/images/7-stack-lsst_distrib-w_2018_23.img /home/daues/run_singleFrameDriver_192.sh

A wrapper script run_singleFrameDriver_192.sh is executed within the container, initializing the stack and running the singleFrameDriver command line:

#!/bin/bash
source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
singleFrameDriver.py /datasets/hsc/repo --calib /datasets/hsc/repo/CALIB/ --output /scratch/daues/output2 --batch-type smp --job test --id visit=6320 ccd=10..33 --cores 48  --mpiexec "-launcher ssh"

We note that the option --mpiexec "-launcher ssh" is included on the singleFrameDriver command line. Without this option, the mpiexec invoked internally by singleFrameDriver with --batch-type smp attempts to use the Slurm launcher, which fails because Slurm commands such as srun are not present within the container. We work around this by designating an alternative launcher for mpiexec.

5   Configuring HTCondor to run Singularity Jobs

In this section we consider the context of an HTCondor pool with admin/root-level installations. An HTCondor worker node (i.e., one running a startd daemon) can be configured so that jobs in its individual slots run inside Singularity containers. Sample additions to the HTCondor configuration file (typically located at /etc/condor/condor_config.local) are:

SINGULARITY = /usr/local/bin/singularity
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage

# Maps $_CONDOR_SCRATCH_DIR on the host to /srv inside the image.
SINGULARITY_TARGET_DIR = /srv
SINGULARITY_BIND_EXPR = "/home,/scratch,/data"
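After editing the configuration, the running daemons on the worker node can be told to re-read it (this requires administrative access):

condor_reconfig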

When the HTCondor startd runs with these settings, a HasSingularity attribute appears in the machine ClassAd and is visible from submit nodes:

% condor_status -long slot1@worker01.ncsa.illinois.edu
  ...
  HasSingularity = true
  ...

This ClassAd attribute allows jobs that require Singularity to match against the resource. In addition, this configuration allows Singularity jobs to be submitted but does not force all jobs to run inside containers. If an HTCondor job description has an expression such as:

+SingularityImage = "7-stack-lsst_distrib-w_2018_28.img"

then the job will run inside a Singularity container using the provided image. If no such expression is present, the job runs as a conventional (non-container) job.
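From a submit node, the slots advertising Singularity support can be listed with a constraint on this attribute, for example:

condor_status -constraint 'HasSingularity == true'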

An alternative configuration with greater administrative control could specify:

# Forces _all_ jobs to run inside singularity.
SINGULARITY_JOB = true

# Forces all jobs to use this image.
SINGULARITY_IMAGE_EXPR = "/cvmfs/cernvm-prod.cern.ch/cvm3"

In this case all jobs run inside Singularity containers and use the designated image (i.e., the user does not have the ability to specify the image).

Returning to the original, more flexible configuration, an example HTCondor job description could specify:

universe = vanilla

output = out.$(Cluster).$(Process)
error = err.$(Cluster).$(Process)

executable = setupLSSTStack.sh

log = test.log

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
notification=Error
transfer_executable = True

request_cpus = 1

Requirements = HAS_SINGULARITY == TRUE

+SingularityImage = "7-stack-lsst_distrib-w_2018_28.img"

queue

In this example the job uses a Singularity image, 7-stack-lsst_distrib-w_2018_28.img, generated from an LSST Docker release image via singularity pull. To guarantee that this job runs on a worker node that supports Singularity, we include the Requirements clause on HAS_SINGULARITY. The executable of the job is a wrapper script, setupLSSTStack.sh, which runs commands inside the container:

#!/bin/bash
source  /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
eups list | grep lsst_distrib
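Assuming the job description above is saved as, e.g., singularity_job.sub (a hypothetical file name), the job is submitted and monitored with the usual HTCondor commands:

condor_submit singularity_job.sub
condor_q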