# Condainer - Compressed Conda environments for HPC systems


## TL;DR - Quick start guide

Condainer puts Conda environments into compressed (squashfs) images, which makes
the use of such environments portable and more efficient, in particular on HPC
systems. These Condainer environments are standalone and completely sidestep the
typical integration of a specific `conda` executable into the user's `.bashrc`
file, which often causes issues, for example with the software environment
on HPC systems.

### Build a compressed environment

Starting in an empty directory, use the following commands once to build a
compressed image of your Conda environment that is defined in `environment.yml`:

```bash
cnd init
ls
# edit the example 'environment.yml' file, or copy your own file here, before running
cnd build
ls
```

### Activate a compressed environment

After building successfully, you can activate the environment for your current
shell session, similar to plain Conda or to a Python virtual environment:

```bash
source activate
```

### Alternatively, run an executable from a compressed environment directly

In case you do not want to activate the environment, you can run individual
executables from the environment transparently, e.g.

```bash
cnd exec -- python3
```
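
In batch scripts it can help to fail early when `cnd` is not available on the node. The following is a minimal sketch of such a guard; the function name `run_in_env` is hypothetical and not part of Condainer, and it only relies on the `cnd exec -- ...` form shown above.

```bash
# Hypothetical helper (not part of Condainer): run a command from the
# compressed environment, failing early if `cnd` cannot be found.
run_in_env() {
    if ! command -v cnd >/dev/null 2>&1; then
        echo "cnd not found on PATH" >&2
        return 127
    fi
    # Same call form as in the example above: cnd exec -- <command> [args...]
    cnd exec -- "$@"
}

# Example call (commented out so that sourcing this snippet has no side effects):
# run_in_env python3 myscript.py
```

The helper must be called from within the Condainer project directory, just like `cnd exec` itself in the examples of this README.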

See the sections below for more detailed explanations and more options.

## Background

### Often a problem: Conda environments on HPC file systems

The Conda package manager and the related workflows have become an
adopted standard when it comes to distributing scientific software
for easy installation by end users. It not only handles native
Python packages but also manages dependencies in the form of
binary blobs, such as third-party libraries that are provided as
shared objects. Using `conda`, complex software environments can
be defined by means of simple descriptive `environment.yml` files.
Once installed, large environments can easily amount to several hundred thousand
individual (small) files. On a local desktop file system, this is typically not an issue.
However, in particular on the large shared parallel file systems of HPC systems,
the vast number of small files can cause issues, as these file systems are
optimized for different scenarios. Inode exhaustion and heavy load due to
(millions of) file opens, short reads, and closes happening during the startup
phase of (parallel) Python jobs from the different users on the system are only
two examples.
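
To get a feeling for the scale of the problem, the number of files in an existing, uncompressed environment can be counted with standard tools. This is a minimal sketch; the function name `count_files` and the example path are made up for illustration.

```bash
# Count regular files below a directory (defaults to the current directory).
# Note: file names containing newlines would be counted more than once.
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}

# Example call with a hypothetical path to a plain Conda environment:
# count_files "$HOME/miniforge3/envs/myenv"
```

On file systems that report it, `df -i .` additionally shows how many inodes are in use and how many are still free.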

### Solution: Move Conda environments into compressed image files

Condainer addresses these issues by moving Conda environments into
compressed squashfs images, reducing the number of files
stored directly on the host file system by orders of magnitude.
Condainer images are standalone and portable, i.e., they can be
copied between different systems, improving reproducibility
and reusability of proven-to-work software environments.

Technically, Condainer uses the Python basis from Miniforge
(which is a free alternative similar to Miniconda) and then installs the
software stack defined by the user based on the usual `environment.yml` file.
Package resolution and installation are extremely fast thanks to the
`mamba` package manager (an optimized replacement for `conda`).
As a second step, Condainer creates a compressed squashfs image file
from that installation, before it deletes the latter to save disk
space. The compressed image is then mounted at the very same
directory, providing the complete Conda environment to
the user who can `activate` or `deactivate` it, just as usual. Moreover,
Condainer provides a wrapper to run executables from the
Conda environment directly and transparently, without the need to
explicitly mount and unmount the image.

Please note that the squashfs images used by Condainer are not "containers"
in the strict terminology of Docker, Apptainer, etc. With Condainer,
there is no encapsulation, isolation, or similar; rather, Condainer
is an easy-to-use wrapper around the building, compressing,
mounting, and unmounting of Conda environments on top of compressed
image files.
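
The build, compress, and mount cycle described above can be pictured with the standard squashfs and FUSE command line tools. The sketch below only prints the commands of such a manual round trip instead of running them. It is an illustration under assumptions, not necessarily what Condainer runs internally; the function name, the paths, and the choice of options are made up.

```bash
# Print (do not run) the commands of a manual squashfs round trip.
# Arguments: <environment directory> <image file>
plan_squash_roundtrip() {
    local env_dir=$1
    local image=$2
    echo "mksquashfs $env_dir $image -noappend"   # pack the directory into an image
    echo "rm -rf $env_dir && mkdir -p $env_dir"   # drop the small files, keep an empty mount point
    echo "squashfuse $image $env_dir"             # mount the image at the same path via FUSE
    echo "fusermount -u $env_dir"                 # unmount again when done
}

# Hypothetical paths, for illustration only:
plan_squash_roundtrip /tmp/condainer-example /path/to/environment.squashfs
```

Mounting at the same path that was used for the installation keeps all absolute paths inside the environment valid.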

## Installation

After cloning the repository, Condainer can be installed via `pip`, e.g. using the command

`pip install --user .`

which would place the executable `cnd` into `~/.local/bin` in the user's home directory.
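
If `cnd` is not found after the installation, `~/.local/bin` may be missing from `PATH`. A small check along the following lines can confirm this; the function name `path_contains` is a hypothetical helper.

```bash
# Return success if the given directory is an entry of $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if ! path_contains "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is not on PATH, so 'cnd' may not be found." >&2
fi
```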

## Usage

The Condainer executable is `cnd` and is controlled via subcommands and flags.
See `cnd --help` for full details. The following subcommands are available for `cnd`
and are described briefly below.

### Initialize a project using `cnd init`

Create an empty directory, enter it, and run `cnd init` to create a skeleton for
a Condainer project. Optionally, you may inspect and edit the config file
`condainer.yml`. Importantly, add your `environment.yml` file to the same
directory.
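
For readers who do not have an `environment.yml` at hand, the following sketch writes a small one in the standard Conda format. The environment name, the channel, and the packages are placeholders chosen for illustration, and the helper name `write_example_env` is hypothetical. Note that it overwrites an existing `environment.yml` in the target directory.

```bash
# Write a small example environment.yml into the given project directory.
# All entries are placeholders; replace them with your own software stack.
write_example_env() {
    cat > "${1:-.}/environment.yml" <<'EOF'
name: example
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
EOF
}

# Example call, inside the project directory:
# write_example_env .
```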

### Build and compress an environment using `cnd build`

Build the Conda environment specified in `environment.yml`. In case a file
`requirements.txt` is present, its contents will be installed in addition using
`pip`. Finally, create a compressed squashfs image, and delete the files from
the staging process.

To stage the files for the Conda environment, a uniquely named directory below
the base directory (as specified in `condainer.yml`) is used. By default, the base
directory is `/tmp`. The unique subdirectory name is of the form `condainer-UUID`
where UUID is a type 4 UUID generated and saved during `cnd init`.
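
The naming scheme can be illustrated with a few lines of shell. This is only a sketch of the `condainer-UUID` pattern described above; the function name `staging_dir` and the UUID are made up, and how Condainer stores the UUID in `condainer.yml` is not shown here.

```bash
# Compose the staging directory name from a base directory and a UUID.
staging_dir() {
    # "${1%/}" strips one trailing slash from the base directory, if present.
    printf '%s/condainer-%s\n' "${1%/}" "$2"
}

# On Linux, the kernel can supply a random (type 4) UUID for experimenting:
# uuid=$(cat /proc/sys/kernel/random/uuid)
staging_dir /tmp 123e4567-e89b-42d3-a456-426614174000   # made-up UUID
```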

### Execute a command using `cnd exec`

Using a command of the form `cnd exec -- python3 myscript.py`
it is possible to run executables from the compressed Conda
environment directly, in the present example the Python interpreter
`python3`. Mounting and unmounting of the squashfs image are
handled automatically and invisibly to the user. Note that the '--'
...
In the project directory, run `source activate` to activate the
compressed environment for your current shell session. Similarly,
run `source deactivate` to deactivate it.
Once activated, the compressed environment is available as usual,
but in read-only mode.

### Explicitly mount the squashfs image using `cnd mount`

The command `cnd mount` mounts the squashfs image below the base directory that
is specified in `condainer.yml`. Hints on activating and deactivating the Conda
environment are printed.

Consistent with the `cnd build` step, the mount point is identical to the
directory used during staging and building, such that the absolute paths to the
files are unchanged between build and mount.

### Explicitly unmount the squashfs image using `cnd umount`

Unmount the image, if mounted. Make sure to run `conda deactivate`
in all relevant shell sessions prior to unmounting.
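
Whether the current shell still has the environment active can be checked before unmounting. The sketch below assumes that activation sets `CONDA_PREFIX` to a path at or below the mount point, as plain Conda does; this has not been verified for Condainer, and the function name `env_active_below` is hypothetical.

```bash
# Return success if the active Conda prefix is the given directory
# or lies below it. Assumes activation sets CONDA_PREFIX (as plain Conda does).
env_active_below() {
    case "${CONDA_PREFIX:-}" in
        "$1"|"$1"/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example use with a placeholder mount point:
# if env_active_below /tmp/condainer-UUID; then conda deactivate; fi
# cnd umount
```

The check only covers the shell it runs in; other sessions need to be deactivated separately.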

### Print information using `cnd status`

Show some information and the mount status of the image.

### Check if the necessary tools are available using `cnd prereq`

Check and show if the required software is locally available (see below).

## System requirements

Condainer works on any recent Linux system and expects the following set
of tools available and enabled for non-privileged users:

* fuse
* squashfuse
* squashfs tools

On an Ubuntu (or similar) system, run the command

`sudo apt install squashfs-tools squashfuse`

to install the necessary tools. In addition, `curl` is required to download
the Miniforge installer, in case it is not available locally.
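
Independently of `cnd prereq`, the presence of these tools can be checked with plain shell, for example on a machine where Condainer is not installed yet. This is a minimal sketch; the function name `missing_tools` is hypothetical, and the binary names are the ones commonly shipped by the packages listed above (some systems with FUSE 3 provide `fusermount3` instead of `fusermount`).

```bash
# Print the names of the given commands that are not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Prints one line per missing tool, and nothing if all are available.
missing_tools mksquashfs squashfuse fusermount curl
```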
...
No installer is downloaded in case that variable is defined.

## Features and Limitations

* Any valid `environment.yml` will work with Condainer. There is no lock-in, as you can use the same `environment.yml` with plain Conda elsewhere.
* Condainer environments are read-only and immutable. In case you need to add packages, rebuild the image.
* Within the same project, when experimenting, you can toggle between multiple existing squashfs images by editing the UUID string in `condainer.yml`.