Managing environments

One of the main features of Notebooks Hub is the ability to manage software environments. Environments isolate software dependencies and make results reproducible. At its core, an environment is a set of binaries, libraries, and other dependencies, together with an environment definition that dictates how to load and use them (for example, by updating $PATH). We rely on Lmod and Lua modulefiles to define environments.

There are two ways to install software environments:

  • Shared environments installed by the Administrator. They are available to all users in the group and are immutable, so everyone in the group uses the same dependencies and can reproduce each other's results. Shared environments are mounted at /opt/modules/shared.

  • Custom environments installed by the user into /opt/modules/my. These environments are not shared with other users and can be modified by the user.

All application types in Notebooks Hub support loading environments.

There are multiple ways to load environments:

  • When launching a Server from the Notebooks Hub UI, you can select an environment from the list of available environments in the wizard.

  • In JupyterLab, you can select an environment from the list of available environments in the extension sidebar.

  • In any application that provides a command line, you can load an environment with the module load <environment name> command.

Creating a new user environment

Since Lmod is a very flexible, open-ended system, you can create environments with almost any software or language you need.

The general steps are the same for all environments:

  1. Install binaries and libraries in the environment location (/opt/modules/my/ for personal environments)

  2. Create a Lua modulefile (in /opt/modules/my/modulefiles/ for personal environments) that defines how to load and use the environment. See the Lmod documentation for instructions on writing modulefiles.
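As an illustration of these two steps, here is a minimal POSIX shell sketch that creates the folder layout and a skeleton modulefile. The function name new_env_skeleton and its arguments are hypothetical (not part of Notebooks Hub or Lmod), and the generated modulefile only prepends a bin/ folder to PATH, which you will need to adapt to what you actually install.

```shell
# Hypothetical helper (not part of Notebooks Hub or Lmod): create the folder
# layout for a personal environment and a skeleton modulefile.
# Usage: new_env_skeleton ROOT NAME VERSION [INSTALL_DIR]
#   ROOT         personal modules location, normally /opt/modules/my
#   NAME         module name, e.g. test-env
#   VERSION      module version, e.g. 0.1.0
#   INSTALL_DIR  where binaries/libraries go (default: ROOT/NAME)
new_env_skeleton() {
  local root="$1" name="$2" version="$3"
  if [ -z "$root" ] || [ -z "$name" ] || [ -z "$version" ]; then
    echo "usage: new_env_skeleton ROOT NAME VERSION [INSTALL_DIR]" >&2
    return 1
  fi
  local install_dir="${4:-$root/$name}"
  local modulefile="$root/modulefiles/$name/$version.lua"

  # Step 1: location for binaries and libraries.
  mkdir -p "$install_dir" || return 1

  # Step 2: one folder per environment, one .lua file per version.
  mkdir -p "$root/modulefiles/$name" || return 1
  if [ -e "$modulefile" ]; then
    echo "refusing to overwrite $modulefile" >&2
    return 1
  fi
  cat > "$modulefile" <<EOF
help([[
$name environment
]])

whatis("Version: $version")

-- Edit to match what you actually installed into $install_dir
prepend_path("PATH", "$install_dir/bin")
EOF
  # Print the path of the new modulefile.
  echo "$modulefile"
}
```

For example, new_env_skeleton /opt/modules/my test-env 0.1.0 /opt/modules/my/conda-envs/test-env would prepare the layout used in the Conda example below. The function refuses to overwrite an existing modulefile so that a version already in use is not silently replaced.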

Below are some common ways to create environments:

Conda environment

Conda provides a way to specify and install software dependencies reproducibly. It is language-agnostic and has excellent support for Python and R environments.

Python

  1. Install binaries and libraries

  2. Create environment.yaml

name: gpu-env
channels:
  - pytorch
dependencies:
  - python=3.9
  - pip=22.2.2
  - ipykernel
  - pytorch=1.11.0=py3.9_cuda11.3_cudnn8.2.0_0
  - torchvision=0.12.0=py39_cu113
  - torchaudio=0.11.0=py39_cu113
  - cudatoolkit=11.3.1

  3. Build the Conda environment

conda env create --prefix /opt/modules/my/conda-envs/test-env --file environment.yaml

  4. (Optional) Rename the Jupyter kernel so that it does not clash with the existing Python 3 kernel: rename the folder in /opt/modules/my/conda-envs/test-env/share/jupyter/kernels/ to test-kernel, then change display_name in /opt/modules/my/conda-envs/test-env/share/jupyter/kernels/test-kernel/kernel.json.

  5. Create an environment modulefile at /opt/modules/my/modulefiles/test-env/0.1.0.lua. Each environment needs its own folder, and within that folder, each version of the environment gets its own modulefile. The modulefile can look something like this:

help([[
Test GPU kernel
]])

whatis("Version: 0.1.0")
whatis("Keywords: GPU, PyTorch")

prepend_path("JUPYTER_PATH", "/opt/modules/my/conda-envs/test-env/share/jupyter")
setenv("JUPYTER_KERNEL_NAME", "My Test Environment")
setenv("PYTHON_EXEC_PATH", "/opt/modules/my/conda-envs/test-env/bin/python")

In this example, JUPYTER_PATH and JUPYTER_KERNEL_NAME provide a Jupyter kernel for the environment, and PYTHON_EXEC_PATH provides its Python interpreter.
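To see how JUPYTER_PATH leads to a kernel: Jupyter treats each JUPYTER_PATH entry as a data directory and looks for kernel specs in its kernels/ subfolder. The POSIX shell sketch below, with the hypothetical function name list_jupyter_path_kernels, mimics that lookup. The jupyter kernelspec list command remains the authoritative way to see which kernels Jupyter actually finds.

```shell
# Sketch: print the kernel folder names reachable through JUPYTER_PATH.
# Every JUPYTER_PATH entry is a Jupyter data directory, and kernel specs
# live in <entry>/kernels/<kernel-name>/kernel.json.
list_jupyter_path_kernels() {
  local IFS=':'
  local dir spec
  for dir in ${JUPYTER_PATH:-}; do
    [ -n "$dir" ] || continue
    for spec in "$dir"/kernels/*/kernel.json; do
      # An unmatched glob stays as the literal pattern, so check the file.
      if [ -f "$spec" ]; then
        basename "$(dirname "$spec")"
      fi
    done
  done
}
```

After loading the test-env module, this would be expected to print the kernel folder name under /opt/modules/my/conda-envs/test-env/share/jupyter/kernels/.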

As a result, the new module appears in Notebooks Hub, and you can load it at any time, either for new servers or at runtime.
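The optional kernel-renaming step above can also be scripted. Below is a minimal POSIX shell sketch; the function rename_conda_kernel is hypothetical, and it assumes the environment's kernel folder is named python3 (the name ipykernel normally uses) and that kernel.json contains a line of the form "display_name": "...". It edits the file with sed rather than a JSON parser, so the display name must not contain |, &, \ or double quotes.

```shell
# Hypothetical helper for the optional kernel-renaming step.
# Usage: rename_conda_kernel ENV_PREFIX NEW_NAME "Display Name" [OLD_NAME]
# OLD_NAME defaults to python3 (assumed ipykernel default folder name).
rename_conda_kernel() {
  local prefix="$1" new_name="$2" display_name="$3" old_name="${4:-python3}"
  local kernels="$prefix/share/jupyter/kernels"

  if [ ! -f "$kernels/$old_name/kernel.json" ]; then
    echo "no kernel.json in $kernels/$old_name" >&2
    return 1
  fi
  if [ -e "$kernels/$new_name" ]; then
    echo "$kernels/$new_name already exists" >&2
    return 1
  fi

  # Rename the kernel folder.
  mv "$kernels/$old_name" "$kernels/$new_name" || return 1

  # Rewrite the "display_name" value via a temporary file, because
  # `sed -i` behaves differently on GNU and BSD systems.
  local spec="$kernels/$new_name/kernel.json"
  sed "s|\"display_name\": *\"[^\"]*\"|\"display_name\": \"$display_name\"|" \
    "$spec" > "$spec.tmp" && mv "$spec.tmp" "$spec"
}
```

For the example above, the call would be rename_conda_kernel /opt/modules/my/conda-envs/test-env test-kernel "My Test Environment".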

Using Poetry with Conda

Poetry is a tool for dependency management and packaging in Python. While Conda is a general-purpose package and environment manager with cross-language support, Poetry is designed specifically for Python projects and provides dependency management, packaging, and project metadata features. Poetry can easily be used in conjunction with Conda environments.

  1. Create a new minimal Conda environment with Poetry pre-installed

conda create --prefix /opt/modules/my/conda-envs/poetry-env python=3.11 poetry
conda activate /opt/modules/my/conda-envs/poetry-env

  2. Clone an existing project

git clone <project-url>
cd <project-name>

  3. Initialize Poetry (only needed if the project does not already contain a pyproject.toml)

poetry init

  4. Install the project

poetry install
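Poetry normally reuses an environment that is already activated instead of creating its own virtualenv. To check that poetry install actually targeted the Conda environment, you can compare the output of poetry env info --path with $CONDA_PREFIX. The POSIX shell function below (hypothetical name poetry_uses_conda_env) is a sketch of that check. If the paths differ, poetry config virtualenvs.create false is one commonly used way to stop Poetry from creating a separate virtualenv.

```shell
# Sketch: check whether Poetry targets the active Conda environment.
# Prints the environment and returns 0 if `poetry env info --path`
# matches $CONDA_PREFIX; returns 1 otherwise.
poetry_uses_conda_env() {
  local conda_env="${CONDA_PREFIX:-}" poetry_env
  if [ -z "$conda_env" ]; then
    echo "no Conda environment is active" >&2
    return 1
  fi
  if ! poetry_env="$(poetry env info --path 2>/dev/null)"; then
    echo "poetry could not report an environment" >&2
    return 1
  fi
  if [ "$poetry_env" = "$conda_env" ]; then
    echo "Poetry uses $conda_env"
  else
    echo "Poetry uses $poetry_env instead of $conda_env" >&2
    return 1
  fi
}
```

Run it from the project folder with the poetry-env environment activated. The comparison is a plain string match, so a path reached through a symlink would be reported as different.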


R

When using R, we recommend starting from a minimal Conda environment and adding the remaining dependencies with either Conda or the R package manager.
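When adding R packages through Conda, CRAN packages on conda-forge are usually named with an r- prefix in lower case (for example, r-data.table), while Bioconductor packages on bioconda use a bioconductor- prefix. A small POSIX shell sketch of the CRAN mapping follows; the function name r_to_conda_names is hypothetical, and not every CRAN package is packaged for Conda, so check that each name exists before relying on it.

```shell
# Sketch: convert R package names to conda-forge style names
# ("r-" prefix, lower case). Bioconductor packages use a
# "bioconductor-" prefix on bioconda instead and are not handled here.
r_to_conda_names() {
  local pkg out=""
  for pkg in "$@"; do
    out="$out r-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
  done
  # Remove the leading space added before the first name.
  printf '%s\n' "${out# }"
}
```

Once the environment below exists, a call such as conda install --prefix /opt/modules/my/conda-envs/r-env -c conda-forge $(r_to_conda_names data.table Rcpp) would install those packages; the command substitution is deliberately unquoted so that each name becomes a separate argument.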

  1. Install binaries and libraries

  2. Create environment.yaml

name: r-env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - r-base=4.3.3
  - r-essentials=4.3
  - r-devtools=2.4.5

  3. Build the Conda environment

conda env create --prefix /opt/modules/my/conda-envs/r-env --file environment.yaml

  4. Create an environment modulefile at /opt/modules/my/modulefiles/r-env/0.1.0.lua. Each environment needs its own folder, and within that folder, each version of the environment gets its own modulefile. The modulefile should look something like this:

help([[
Conda environment with R packages
]])

whatis("Version: 0.1.0")
whatis("Keywords: Scientific/Engineering, Software Development, R")

setenv("R", "/opt/modules/my/conda-envs/r-env/bin/R")
setenv("RSTUDIO_WHICH_R", "/opt/modules/my/conda-envs/r-env/bin/R")
setenv("R_LIBS", "/opt/modules/my/conda-envs/r-env/lib")
setenv("R_LIBS_USER", "/opt/modules/my/conda-envs/r-env/lib/R/library")

The last three lines are required for the environment to work with our implementation of the RStudio IDE and R Shiny dashboards.
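Lmod generally does not check that the paths given to setenv, prepend_path or append_path exist, so a mistyped path tends to show up later as a missing interpreter or library. A quick check before loading a new module can help. Below is a POSIX shell sketch with the hypothetical name check_modulefile_paths; it only looks at double-quoted absolute paths, so it is a rough lint, not a modulefile parser.

```shell
# Sketch: report absolute paths quoted in a modulefile that do not exist.
# Only double-quoted strings starting with "/" are checked, which covers the
# setenv, prepend_path and append_path lines used in these examples.
# Returns 0 if all paths exist, 1 if any is missing, 2 if FILE is missing.
check_modulefile_paths() {
  local file="$1" paths path missing=0
  if [ ! -f "$file" ]; then
    echo "no such modulefile: $file" >&2
    return 2
  fi
  # grep -o prints every match on its own line; tr strips the quotes.
  paths="$(grep -o '"/[^"]*"' "$file" | tr -d '"')"
  [ -n "$paths" ] || return 0
  while IFS= read -r path; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done <<EOF
$paths
EOF
  return "$missing"
}
```

For example, check_modulefile_paths /opt/modules/my/modulefiles/r-env/0.1.0.lua lists every referenced path that does not exist yet.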

Debian package

Normally, in Ubuntu and Debian, you install packages with apt-get or apt. However, since user Servers are containerized and don't have root privileges, you can't use apt-get or apt there. Instead, you can download the package, extract it into a modules directory with dpkg, and point to it with the LD_LIBRARY_PATH and PATH environment variables.

  1. Install binaries

  2. Download the package from the official repository. For example, to download the libmariadb-dev package, you can use the following command:

wget http://security.ubuntu.com/ubuntu/pool/universe/m/mariadb-10.6/libmariadb-dev_10.6.16-0ubuntu0.22.04.1_amd64.deb

  3. Extract the package into the modules directory

dpkg -x libmariadb-dev_10.6.16-0ubuntu0.22.04.1_amd64.deb /opt/modules/my/dpkg/libmariadb-dev

  4. Create environment module file at /opt/modules/my/modulefiles/libmariadb-dev/10.6.16.lua.

Open the file in an editor and add the following content:

help([[
  Debian package libmariadb-dev
]])

whatis("Version: 10.6.16")
whatis("Keywords: Database, Development, C")

append_path("INCLUDE_DIR", "/opt/modules/my/dpkg/libmariadb-dev/usr/include")
append_path("LIB_DIR", "/opt/modules/my/dpkg/libmariadb-dev/usr/lib/x86_64-linux-gnu")
append_path("LD_LIBRARY_PATH", "/opt/modules/my/dpkg/libmariadb-dev/usr/lib/x86_64-linux-gnu")
append_path("PATH", "/opt/modules/my/dpkg/libmariadb-dev/usr/bin")
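On multiarch Debian and Ubuntu systems, libraries are often installed under usr/lib/<triplet>/ (for amd64, usr/lib/x86_64-linux-gnu/) rather than directly in usr/lib/, so it is worth checking where the extracted package actually put its .so files before writing the LIB_DIR and LD_LIBRARY_PATH lines. The POSIX shell sketch below (hypothetical function find_lib_dirs) lists such directories. Note also that -dev packages often contain only headers and an unversioned .so symlink, with the real library in a separate runtime package; a symlink whose target is missing suggests you need to download and extract that package as well.

```shell
# Sketch: list directories in an extracted .deb tree that contain shared
# libraries (*.so or versioned *.so.N files, including symlinks), i.e.
# candidates for LD_LIBRARY_PATH.
find_lib_dirs() {
  local root="$1"
  if [ ! -d "$root" ]; then
    echo "no such directory: $root" >&2
    return 1
  fi
  find "$root" \( -name '*.so' -o -name '*.so.*' \) \( -type f -o -type l \) \
    -exec dirname {} \; | sort -u
}
```

For the example above, the call would be find_lib_dirs /opt/modules/my/dpkg/libmariadb-dev.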
