diff --git a/README.md b/README.md index 29334cb45ac8f2983dfab358003eb76e3bed7bea..ea8a7edb9d21edba7d0fe70b544696bbb552fe6a 100644 --- a/README.md +++ b/README.md @@ -3,20 +3,19 @@ ## TL;DR - Quick start guide Condainer puts Conda environments into compressed squashfs images which makes -the use of such environments much more efficient, in particular on HPC systems. -To this end, Condainer implements some lightweight container-like functionality. +the use of such environments portable and more efficient, in particular on HPC systems. -### Build the compressed environment +### Build a compressed environment -Starting in an empty directory, use the following commands once to build a compressed image of your Conda environment defined by 'environment.yml': +Starting in an empty directory, use the following commands once to build a compressed image of your Conda environment, defined by 'environment.yml': ```bash cnd init -# now, edit the provided example 'environment.yml' file, or copy your own file here, before running +# edit the provided example 'environment.yml' file, or copy your own file here, before running cnd build ``` -### Activate the compressed environment +### Activate a compressed environment After building successfully, you can activate the environment for your current shell session, just like with plain conda: @@ -24,7 +23,7 @@ After building successfully, you can activate the environment for your current s source activate ``` -### Alternatively, run an executable from the compressed environment without activating it +### Alternatively, run an executable from a compressed environment without activating it In case you do not want to activate the environment, you can run individual executables from the environment transparently, e.g. @@ -47,38 +46,39 @@ shared objects. Using `conda`, complex software environments can be defined by means of simple descriptive `environment.yml` files. 
Large environments can easily amount to several 100k individual -small files. On a local desktop file system, this is typically not +(small) files. On a local desktop file system, this is typically not an issue. However, in particular on the large shared parallel file systems of HPC systems, the vast amount of small files can cause -severe trouble as these filesystems are optimized for different +severe trouble as these filesystems are optimized for different IO patterns. Inode exhaustion, and heavy load due to (millions of) file opens, short reads, and closes during the startup phase of (parallel) Python jobs from numerous different users on the HPC cluster are only two examples. -### Solution: Containerization of Conda environments into compressed images +### Solution: Put Conda environments into compressed image files Condainer solves these issues by putting conda environments into -compressed squashfs images, reducing the number of files explicitly -stored on the host file system by many orders of magnitude. +compressed squashfs images, reducing the number of files +stored directly on the host file system by orders of magnitude. Condainer images are standalone and portable, i.e., they can be -copied between different systems, adding value to reproducibility +copied between different systems, improving reproducibility and reusability of proven working software environments. Technically, Condainer uses the Python basis from `Miniforge` -(which is a free alternative to Miniconda) and installs arbitrary -software defined by the user via an `environment.yml` on top. +(which is a free alternative similar to Miniconda) and installs the +software stack defined by the user via an `environment.yml` into a nested environment. Package resolution and installation are extremely fast thanks to the `mamba` package manager (an optimized replacement for `conda`). 
-As a second step, Condainer creates a compressed squashfs image +As a second step, Condainer creates a compressed squashfs image file from that installation, before it deletes the latter to save disk space. The compressed image is then mounted at the very same -installation directory, providing the complete conda environment to +directory, providing the complete conda environment to the user who can `activate` or `deactivate` it, just as usual. Moreover, -Condainer provides a wrapper to run executables from the contained -conda environment directly and transparently. +Condainer provides a wrapper to run executables from the +conda environment directly and transparently, without the need to +explicitly mount and unmount the image. -Note that the squashfs images used by Condainer are not "containers" +Please note that the squashfs images used by Condainer are not "containers" in the strict terminology of Docker, Apptainer, etc. With Condainer, there is no encapsulation, isolation, or similar, rather Condainer is an easy-to-use wrapper around the building, compressing, @@ -95,7 +95,7 @@ which would place the executable `cnd` into `~/.local/bin`. ## Usage -The Condainer executable is `cnd` and is controlled via subcommands and flags. See `cnd --help` for details. +The Condainer executable is `cnd` and is controlled via subcommands and flags. See `cnd --help` for full details. The following subcommands are available with Condainer: ### Initialize a project using `cnd init` @@ -109,14 +109,14 @@ to the same directory. Build the conda environment specified in `environment.yml`. In case a file `requirements.txt` is present, its contents will be installed -additionally using `pip`. Finally, create a compressed -squashfs image, and delete the files from staging the environment. +in addition, using `pip`. Finally, create a compressed +squashfs image, and delete the files from the staging environment. 
### Execute a command using `cnd exec` Using a command of the form `cnd exec -- python3 myscript.py` it is possible to run executables from the contained conda -installation directly, in the present example the Python interpreter +environment directly, in the present example the Python interpreter `python3`. Mounting and unmounting of the squashfs image are handled automatically and invisibly to the user. Note that the '--' is a necessary separator to be able to pass arguments and flags to @@ -125,9 +125,11 @@ flags. ### Activate the environment -In the project directory, run `source activate` to activate the +In the project directory, run `source activate` to activate the compressed environment for your current shell session. Similarly, run `source deactivate` to deactivate it. +Once activated, the compressed environment can be used as usual, +except that it is read-only. ### Explicitly mount the squashfs image using `cnd mount` @@ -139,7 +141,7 @@ conda environment are printed. ### Explicitly un-mount the squashfs image using `cnd umount` -Unmount the image, if mounted. Make sure to run `conda deactivate` +Unmount the image, if mounted. Make sure to run `conda deactivate` in all relevant shell sessions prior to unmounting. ### Print information using `cnd status` @@ -156,13 +158,23 @@ below. Condainer should work on any recent Linux system and expects the following set of tools available and enabled for non-privileged users: -* squashfs tools * fuse * squashfuse -* curl +* squashfs tools On an Ubuntu (or similar) system, run (as root) the command `apt install squashfs-tools squashfuse` -to install the necessary tools. +to install the necessary tools. In addition, `curl` is required to download +the Miniforge installer if it is not available locally. + +## Environment variables + +The environment variable `CONDAINER_INSTALLER` can be used to specify the full file +path to a Miniforge installer, e.g. to provide it centrally on a cluster.
+No installer is downloaded if that variable is defined. + +## Contact + +Copyright © 2023 Klaus Reuter <klaus.reuter@mpcdf.mpg.de>, Max Planck Computing and Data Facility diff --git a/condainer/condainer.py b/condainer/condainer.py index 6790a801ee91b0f8d12132a95bc898ebd8cc86a1..caddda48b0784400e12a76a1c0f1476a8dc2482c 100644 --- a/condainer/condainer.py +++ b/condainer/condainer.py @@ -87,8 +87,8 @@ def get_env_directory(cfg): def get_installer_path(cfg): - """Return the path to the Miniconda/Miniforge installer, either the full path including the filename, - or the filename alone, assuming that it has been downloaded to the Condainer project directory. + """Return the path to the Miniforge installer, either the full path including the filename, + or the filename alone, assuming that it has been downloaded to the Condainer project directory already. """ if cfg['installer_url'].startswith('http'): return os.path.basename(cfg['installer_url']) @@ -261,10 +261,20 @@ def get_squashfs_num_threads(): return n_cores -def compress_environment(cfg): +def compress_environment(cfg, read_only_flags=True): """Create squashfs image from base environment.
""" env_directory = get_env_directory(cfg) + # explicitly set read-only flags before compressing + if read_only_flags: + cmd = f"chmod -R a-w {env_directory}".split() + if cfg.get("dryrun"): + print(f"dryrun: {' '.join(cmd)}") + else: + proc = subprocess.Popen(cmd, shell=False) + proc.communicate() + # assert(proc.returncode == 0) + # compress files into image squashfs_image = get_image_filename(cfg) num_threads = get_squashfs_num_threads() cmd = f"mksquashfs {env_directory}/ {squashfs_image} -noappend -processors {num_threads}".split() @@ -274,6 +284,15 @@ def compress_environment(cfg): proc = subprocess.Popen(cmd, shell=False) proc.communicate() assert(proc.returncode == 0) + # restore permissions, allowing to delete the staging directory later + if read_only_flags: + cmd = f"chmod -R u+w {env_directory}".split() + if cfg.get("dryrun"): + print(f"dryrun: {' '.join(cmd)}") + else: + proc = subprocess.Popen(cmd, shell=False) + proc.communicate() + # assert(proc.returncode == 0) def run_cmd(args, cwd): @@ -315,8 +334,8 @@ def init(args): cfg["requirements_txt"] = 'requirements.txt' cfg['installer_url'] = installer_url cfg['conda_exe'] = 'mamba' - # --- advanced: non-conda application, e.g. Matlab, default False --- - cfg['non_conda_application'] = args.non_conda_application + # Advanced: non-conda application, e.g. Matlab, default False --- + # cfg['non_conda_application'] = args.non_conda_application # The following flag can be added later to the config file, e.g. 
when building and compressing via the OBS # For some applications, this would work (Matlab), for others not (Conda): #cfg['multiuser_mountpoint'] = False diff --git a/setup.py b/setup.py index fc93235e02e7ab42f21c78862870eb088205921b..2002e22bd1caedf915f9fd5352f27fb6d8068464 100644 --- a/setup.py +++ b/setup.py @@ -6,7 +6,7 @@ long_description = (base_dir / "README.md").read_text() setup( name='condainer', - version='0.1.7', + version='0.1.8', description='Build, manage, and run compressed squashfs images of Conda environments transparently on HPC or elsewhere.', long_description = long_description, author='Klaus Reuter',
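The read-only handling added to `compress_environment` (`chmod -R a-w`, then `mksquashfs`, then `chmod -R u+w`) can also be written with a `try`/`finally` block, so that write permission is restored even when `mksquashfs` fails. In the diff above, a failing `assert` after `mksquashfs` would skip the restore step and leave a staging directory that is awkward to delete. The following is a minimal sketch under stated assumptions, not Condainer's implementation: the helper names `set_tree_writable` and `compress_read_only` are hypothetical, permissions are changed with `os.chmod` while walking the tree instead of by calling `chmod -R`, symlinks are skipped (GNU `chmod -R` likewise leaves symlinks found during traversal alone), and the `dryrun` handling is omitted.

```python
import os
import stat
import subprocess


def set_tree_writable(root, writable):
    """Hypothetical helper: toggle write bits on root and everything below it.

    writable=False removes user, group and other write bits (like `chmod -R a-w`);
    writable=True adds back only the user write bit (like `chmod -R u+w`).
    Symlinks are skipped, because os.chmod on a link would change its target.
    """
    # Collect all paths first; walking only needs read/execute permission.
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    for path in paths:
        if os.path.islink(path):
            continue
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        if writable:
            mode |= stat.S_IWUSR
        else:
            mode &= ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
        os.chmod(path, mode)


def compress_read_only(env_directory, squashfs_image, num_threads=1):
    """Hypothetical sketch: store read-only permissions in the squashfs image,
    then always restore user write permission on the staging directory."""
    set_tree_writable(env_directory, writable=False)
    try:
        cmd = ["mksquashfs", f"{env_directory}/", squashfs_image,
               "-noappend", "-processors", str(num_threads)]
        # raises CalledProcessError on a non-zero exit code,
        # or FileNotFoundError if mksquashfs is not installed
        subprocess.run(cmd, check=True)
    finally:
        # runs whether or not mksquashfs succeeded
        set_tree_writable(env_directory, writable=True)
```

After `compress_read_only(env_dir, "env.squashfs", num_threads=4)` returns or raises, `env_dir` is user-writable again, so removing the staging directory (e.g. with `shutil.rmtree`) is not blocked by read-only directories. As in the diff, only the user write bit is restored; group write permission set before compression is not brought back.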