Frank Berghaus authored
Update Calico networking configuration and set compatible k8s
versions
f51eb688

Kubernetes on the MPCDF HPC Cloud

This guide sets up a production Kubernetes cluster. For a testing/development version, check out the dev branch.

The [Heat orchestration template](https://docs.openstack.org/heat/ussuri/template_guide/hot_spec.html) "Magnum ohne Magnum" (MOM) described below automates the deployment of a production-ready Kubernetes cluster on the MPCDF HPC Cloud, including "out-of-the-box" [support](https://github.com/kubernetes/cloud-provider-openstack) for persistent storage and load balancers.

For an equivalent, non-templatized procedure, see the step-by-step version.

Deployment

Dashboard

  1. Create an application credential with default settings. Record the secret somewhere safe. With the command-line client:
openstack application credential create $APP_CRED_NAME
  2. Launch a new orchestration stack.
    • Select the template mom-template.yaml as a local file or URL.
    • Provide (at least) the application credential ID and secret, as well as the key pair you want to use to log in to the SSH gateway node.
    With the command-line client:
edit mom-env.yaml  # fill in (at least) the required parameters
openstack stack create $STACK_NAME -t mom-template.yaml -e mom-env.yaml
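Since the secret is shown only once at creation time, it can help to capture it directly into a protected file. The following is a minimal sketch, assuming the `openstack` client is installed and authenticated; the function name `save_app_credential` and the output file name are illustrative and not part of the template.

```shell
# Create an application credential and store its id and secret in a file
# readable only by the current user. Uses the client's "shell" output
# format, which prints lines of the form: id="..." and secret="...".
save_app_credential() {
    cred_name=$1   # name of the new application credential
    out_file=$2    # where to store the id and secret (illustrative)
    (
        umask 077  # restrict the new file to the owner
        openstack application credential create "$cred_name" \
            -f shell -c id -c secret > "$out_file"
    )
}

# Example (not executed here):
#   save_app_credential "$APP_CRED_NAME" ./app-cred.env
#   . ./app-cred.env   # defines $id and $secret in the current shell
```

The values can then be copied into mom-env.yaml or pasted into the dashboard form.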

Scaling

The number and/or size of the worker nodes may be changed after the initial deployment, as well as the size of the controller. The command-line client makes this easy, for example:

openstack stack update $STACK_NAME --existing --parameter worker_count=$COUNT

Only the changed parameters need to be mentioned. Changing the worker flavor triggers a rolling reboot of the nodes, one node every 90 seconds. Scaling is also possible via the dashboard through the "Change Stack Template" action. Be sure to provide the exact same version of the template.
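A stack update runs asynchronously, so a script that scales the cluster may want to wait until the update has finished. The sketch below polls the stack status with `openstack stack show`; the function name `wait_for_stack` and the default polling interval are assumptions made for illustration.

```shell
# Poll a Heat stack until it leaves the *_IN_PROGRESS state.
# Prints the final status and returns 0 on *_COMPLETE, 1 otherwise.
wait_for_stack() {
    stack_name=$1
    interval=${2:-30}   # seconds between polls (illustrative default)
    while :; do
        status=$(openstack stack show "$stack_name" -f value -c stack_status) || return 1
        case $status in
            *_IN_PROGRESS) sleep "$interval" ;;
            *_COMPLETE)    echo "$status"; return 0 ;;
            *)             echo "$status" >&2; return 1 ;;
        esac
    done
}

# Example (not executed here):
#   openstack stack update "$STACK_NAME" --existing --parameter worker_count=5
#   wait_for_stack "$STACK_NAME" && kubectl get node -o wide
```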

Administration

You can log in to the gateway via its external IP, found on the dashboard in the "Output" section of the "Overview" tab or with:

openstack stack output show $STACK_NAME gateway_ip -f value -c output_value
ssh $GATEWAY_IP -l root

If you are not in the Garching campus network, you will need to use one of the SSH gateways to reach the gateway machine. For more information, see the connecting documentation.
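The lookup and the login can be combined into one helper. This is a sketch under stated assumptions: it uses OpenSSH's `-J` (ProxyJump) option for the off-campus case, and the jump host argument is a placeholder for whichever SSH gateway you are entitled to use; the function name `gateway_ssh` is hypothetical.

```shell
# Look up the cluster gateway IP from the stack outputs and open a root
# shell on it, optionally tunnelling through an intermediate SSH gateway.
gateway_ssh() {
    stack_name=$1
    jump_host=${2:-}   # e.g. user@<ssh-gateway>; placeholder, not a real host
    gateway_ip=$(openstack stack output show "$stack_name" gateway_ip \
        -f value -c output_value) || return 1
    if [ -n "$jump_host" ]; then
        ssh -J "$jump_host" -l root "$gateway_ip"
    else
        ssh -l root "$gateway_ip"
    fi
}

# Example (not executed here):
#   gateway_ssh "$STACK_NAME"                      # from the campus network
#   gateway_ssh "$STACK_NAME" "user@$JUMP_HOST"    # from elsewhere
```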

The tools kubectl and helm, as well as the administrative credentials for your Kubernetes cluster, are installed on the SSH gateway. Try:

kubectl get node -o wide

The control plane and worker nodes can be reached via the SSH gateway:

ssh -i ~/.ssh/id_rsa root@IP

Remote Clients

  1. Download /root/.kube/config from the gateway to your local machine
  2. Run export KUBECONFIG=config, or add the contents of the config to your existing environment with kubectl config set-cluster, etc.

Tools such as kubectl should now work out-of-the-box, provided the connections originate from the specified API client network. This parameter may be updated as necessary, for example to support off-site administrators. In this case it is recommended to choose the smallest possible range.
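The two steps above can be sketched as a small helper. It assumes root SSH access to the gateway as described under Administration; the function name `fetch_kubeconfig` and the default destination `~/.kube/mom-config` are illustrative choices, not names used by the template.

```shell
# Copy the cluster's admin kubeconfig from the gateway to the local machine
# and print the local path so it can be assigned to KUBECONFIG.
fetch_kubeconfig() {
    gateway_ip=$1
    dest=${2:-$HOME/.kube/mom-config}   # illustrative default location
    mkdir -p "$(dirname "$dest")" || return 1
    scp "root@$gateway_ip:/root/.kube/config" "$dest" || return 1
    chmod 600 "$dest"                   # the file contains admin credentials
    echo "$dest"
}

# Example (not executed here):
#   export KUBECONFIG=$(fetch_kubeconfig "$GATEWAY_IP")
#   kubectl get node -o wide
```

Keeping the file separate from an existing `~/.kube/config` avoids overwriting other cluster entries.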

Example Usage

  • Externally-accessible service

    kubectl apply -f examples/svc-demo.yaml
    kubectl get svc svc-demo  # note external ip
    curl http://$SERVICE_IP
  • Ingress-managed endpoints

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/cloud/deploy.yaml
    kubectl get svc ingress-nginx-controller -n ingress-nginx  # note external ip
    
    kubectl apply -f examples/ingress-demo.yaml
    curl http://$INGRESS_IP/demo/
  • Pod with persistent storage

    kubectl apply -f examples/pvc-demo.yaml
    kubectl exec pvc-demo -- /bin/sh -c "echo Hallo > /data/file.txt"
    kubectl delete pod pvc-demo
    
    kubectl apply -f examples/pvc-demo.yaml
    kubectl exec pvc-demo -- cat /data/file.txt
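The "note external ip" steps above can be scripted instead of copied by hand. A minimal sketch, assuming the load balancer reports an IP address (rather than a hostname) in the service status; the function name `service_ip` is hypothetical.

```shell
# Print the external IP assigned to a LoadBalancer service.
# The output is empty while the load balancer is still being provisioned.
service_ip() {
    svc_name=$1
    svc_namespace=${2:-default}
    kubectl get svc "$svc_name" -n "$svc_namespace" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example (not executed here):
#   SERVICE_IP=$(service_ip svc-demo)
#   INGRESS_IP=$(service_ip ingress-nginx-controller ingress-nginx)
#   curl "http://$INGRESS_IP/demo/"
```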

Limitations

  • The external network, application credential, and key pair cannot be changed after the initial deployment
  • Load balancers are not automatically removed prior to stack deletion, which blocks stack deletion. If possible, delete these resources from Kubernetes beforehand
  • Volumes are also not removed automatically but do not block stack deletion
  • Kubernetes upgrades and certificate renewal must be performed manually
  • containerd is the only supported CRI
  • Calico with VXLAN overlay is the only supported CNI
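To work around the load balancer limitation, the corresponding Kubernetes services can be removed before the stack is deleted. The following is a sketch only, assuming all load balancers were created through services of type LoadBalancer; the function name `delete_loadbalancer_services` is hypothetical, and the function deletes every such service in the cluster, so review the list first.

```shell
# Delete every Service of type LoadBalancer in all namespaces so that the
# cloud provider removes the underlying OpenStack load balancers.
delete_loadbalancer_services() {
    kubectl get svc --all-namespaces \
        -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
    while read -r svc_namespace svc_name; do
        [ -n "$svc_name" ] || continue   # skip blank lines
        kubectl delete svc "$svc_name" -n "$svc_namespace"
    done
}

# Example (not executed here):
#   delete_loadbalancer_services
#   openstack stack delete "$STACK_NAME"
```

This includes the ingress-nginx-controller service from the ingress example, which also holds a load balancer.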