DOC: Running containers

Commit 7ffa4be5 authored 1 year ago by Piero Coronica
--- a/README.md
+++ b/README.md
@@ -3,6 +3,10 @@
 Goal: compile definition files of containers for AI use cases.
 Also provide documentation on how to use them on our HPC systems.

+<!---
+[TODO]: ADD example of distributed training with containers
+-->
+
 ## Getting started
 We use [Apptainer](https://apptainer.org/docs/user/main/index.html) to build/run containers on our HPC systems.
 You will need a Linux system to run Apptainer natively on your machine, and it’s easiest to [install](https://apptainer.org/docs/user/main/quick_start.html) if you have root access.
@@ -27,7 +31,10 @@ $ apptainer pull my_apptainer.sif docker://sylabsio/lolcow:latest
 ```

 ### Convert from Docker Daemon or Docker Archive files
-
+<!---
+Piero: I would stress LOCALLY here. Docker is not, and will never be, available on our systems.
+    In any case this is useful. What about adding it to the "Local-to-HPC Workflow" section at the bottom?
+-->
 You can also [convert images/containers](https://apptainer.org/docs/user/latest/docker_and_oci.html#containers-from-docker-hub) running in your Docker Daemon:
 ```shell
 $ apptainer build my_apptainer.sif docker-daemon:sylabsio/lolcow:latest
@@ -45,9 +52,59 @@ $ apptainer build my_apptainer.sif docker-archive:lolcow.tar

 ## Running containers

-**TODO:**
- mention important flags, like `--nv` for example
- how to run the containers on our SLURM cluster
+> **_NOTE:_**  The following code snippets assume that you have loaded an Apptainer image using environment modules (see [TODO](link/to/docs/images/modules)). For example:
+>
+> `module load image_pytorch`
+>
+> Replace `$IMAGE_SIF` with the path to a SIF file or a reference to an OCI registry to use the same commands with your own images.
+
+### Interactive Shell
+
+To run an interactive shell in a container, use:
+```shell
+$ apptainer shell $IMAGE_SIF
+Apptainer>
+```
+The prompt will change from `$` to `Apptainer>`, indicating that you are now running commands inside the container. The shell command is useful for interactively inspecting the content of an image.
+
+### Executing commands
+
+To execute a single command inside a container, use the `exec` subcommand:
+
+```shell
+$ apptainer exec $IMAGE_SIF echo "Hallo Welt!"
+Hallo Welt!
+```
+
+For more details, refer to the [Apptainer documentation](https://apptainer.org/docs/user/latest/quick_start.html#interacting-with-images).
+
+### GPU support
+
+Apptainer natively supports running application containers that use NVIDIA’s CUDA or AMD’s ROCm GPU frameworks. To use these accelerators on a GPU compute node, add the `--nv` flag (NVIDIA) or the `--rocm` flag (AMD) to the apptainer command. For example, with the NVIDIA GPUs on Raven:
+
+```shell
+$ apptainer exec --nv $IMAGE_SIF nvidia-smi -L
+GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX)
+```
+
+For more details, refer to the Apptainer documentation on [GPU support](https://apptainer.org/docs/user/latest/gpu.html).
+<!---
+[TODO]: How to run containers on our SLURM cluster
+Piero: personally I would remove this subsection. Focus always on HPC systems
+-->
+
+## Example: submitting a multi-node distributed training job with PyTorch Lightning
+<!---
+[TODO]: 
+-->
+### Python script
+<!---
+[TODO]: Add the most simple training script (Lightning?) - Or reference to scripts from somewhere else!
+-->
+### Slurm batch script
+<!---
+[TODO]: Add multinode, multi GPU batch script
+-->

 ## Using containers with RVS

@@ -107,7 +164,7 @@ For example, in the kernel spec file above we bind your `ptmp` folder.

 **TODO: The sandbox option does not work 100% correctly for VSCode or PyCharm, use docker images instead! Need to update this guide!**

-A nice workflow to develop a python library locally and deploy it on our HPO systems (sharing exactly the same environment) is to use the [*sandbox* feature](https://apptainer.org/docs/user/main/build_a_container.html#sandbox) of Apptainer.
+A nice workflow to develop a python library locally and deploy it on our HPC systems (sharing exactly the same environment) is to use the [*sandbox* feature](https://apptainer.org/docs/user/main/build_a_container.html#sandbox) of Apptainer.

 We are still investigating if something similar is possible with `Docker` (please let us know if you find a way :) ).
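
As a rough illustration of the sandbox workflow described above, the steps might look like the sketch below. The names `my_sandbox` and `my_apptainer.sif` and the choice of base image are placeholders for this example, not values prescribed by the README. The `run` helper is a hypothetical wrapper that only prints each command unless `EXECUTE=1` is set, so the sketch can be read or dry-run on a machine without Apptainer.

```shell
#!/usr/bin/env bash
# Sketch of a local sandbox workflow. All names below are placeholders (assumptions).
SANDBOX_DIR="${SANDBOX_DIR:-my_sandbox}"
OUTPUT_SIF="${OUTPUT_SIF:-my_apptainer.sif}"
BASE_IMAGE="${BASE_IMAGE:-docker://sylabsio/lolcow:latest}"

# Hypothetical helper: print the command, and run it only when EXECUTE=1.
run() {
  echo "+ $*"
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  fi
}

# 1. Build a writable directory tree (sandbox) instead of a read-only SIF file.
run apptainer build --sandbox "$SANDBOX_DIR" "$BASE_IMAGE"

# 2. Work inside the sandbox; --writable keeps changes made in the container.
run apptainer shell --writable "$SANDBOX_DIR"

# 3. Pack the sandbox into a SIF file that can be copied to the HPC system.
run apptainer build "$OUTPUT_SIF" "$SANDBOX_DIR"
```

Depending on your local setup, writing into the sandbox in step 2 may require root privileges or the `--fakeroot` option.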