Commit 67d89222 authored by Piero Coronica

Add example readme

parent 55f049dc
# PyTorch Example
This directory contains the source code, definition files, and submission scripts needed to train a ResNet50 model on synthetic data in a containerized environment.
By default, the SLURM scripts run a small distributed training workload on 2 nodes. They can easily be modified to run on a different number of nodes or on a single device.
For instructions on setting up the containers and running the example on MPCDF HPC systems, please refer to the [raven](raven/) and [viper](viper/) directories.
\ No newline at end of file
......@@ -4,8 +4,6 @@ Use NGC containers: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorc
## BUILD
> HINT: Build on a compute node, and export `APPTAINER_TMPDIR` and `APPTAINER_CACHEDIR` so that they point to `${JOB_SHMTMPDIR}/`. This **drastically** reduces the time needed to create the SIF file, and therefore the overall build time.
apptainer build nv-pytorch.sif nv-pytorch.def
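
The hint above could be applied roughly as in the sketch below. It is not a definitive recipe: the `apptainer-tmp` and `apptainer-cache` subdirectory names and the `/tmp` fallback (used when `JOB_SHMTMPDIR` is not set, for example outside a batch job) are assumptions made for illustration only.

```shell
# Assumption: JOB_SHMTMPDIR is provided inside a job on the compute node.
# Fall back to /tmp otherwise (hypothetical fallback, adjust as needed).
scratch="${JOB_SHMTMPDIR:-/tmp}"

# Point Apptainer's temporary and cache directories at the fast scratch area.
# The subdirectory names are illustrative, not required by Apptainer.
export APPTAINER_TMPDIR="${scratch}/apptainer-tmp"
export APPTAINER_CACHEDIR="${scratch}/apptainer-cache"
mkdir -p "${APPTAINER_TMPDIR}" "${APPTAINER_CACHEDIR}"

# Run the build only if apptainer and the definition file are available.
if command -v apptainer >/dev/null 2>&1 && [ -f nv-pytorch.def ]; then
    apptainer build nv-pytorch.sif nv-pytorch.def
fi
```

Separate subdirectories keep temporary build files and cached layers apart, which makes it easier to clean up one without touching the other.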
## RUN
......