CSI Cinder controller plugin times out retrieving auth tokens
With recent versions of cloud-provider-openstack (v1.26.2 here), the Cinder CSI controller plugin fails to authenticate against OpenStack and ends up in CrashLoopBackOff. The failing container is:
  cinder-csi-plugin:
    Container ID:  containerd://21d42c60c6880dcd5ad1a3113cc282a952127b5a05cda4c217a8160883e904c2
    Image:         docker.io/k8scloudprovider/cinder-csi-plugin:v1.26.2
    Image ID:      docker.io/k8scloudprovider/cinder-csi-plugin@sha256:35ffa1d58fdfb86cb3093b1f6f8972504e13360fe985ebf2033a894a35b25557
    Port:          9808/TCP
    Host Port:     0/TCP
    Args:
      /bin/cinder-csi-plugin
      --endpoint=$(CSI_ENDPOINT)
      --cloud-config=$(CLOUD_CONFIG)
      --cluster=$(CLUSTER_NAME)
      --v=1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 17 Mar 2023 09:58:46 +0000
      Finished:     Fri, 17 Mar 2023 09:59:16 +0000
    Ready:          False
    Restart Count:  6
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:  unix://csi/csi.sock
      CLOUD_CONFIG:  /etc/config/cloud.conf
      CLUSTER_NAME:  kubernetes
    Mounts:
      /csi from socket-dir (rw)
      /etc/config from secret-cinderplugin (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rw65z (ro)
The logs from the failing container are:
I0317 09:58:46.633451 1 driver.go:81] Driver: cinder.csi.openstack.org
I0317 09:58:46.633525 1 driver.go:82] Driver version: 2.0.0@
I0317 09:58:46.633528 1 driver.go:83] CSI Spec version: 1.3.0
I0317 09:58:46.633534 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES
I0317 09:58:46.633538 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
I0317 09:58:46.633542 1 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0317 09:58:46.633545 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0317 09:58:46.633548 1 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
I0317 09:58:46.633554 1 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
I0317 09:58:46.633557 1 driver.go:115] Enabling controller service capability: CLONE_VOLUME
I0317 09:58:46.633559 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0317 09:58:46.633561 1 driver.go:115] Enabling controller service capability: GET_VOLUME
I0317 09:58:46.633564 1 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
I0317 09:58:46.633567 1 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0317 09:58:46.633570 1 driver.go:135] Enabling node service capability: EXPAND_VOLUME
I0317 09:58:46.633572 1 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
I0317 09:58:46.633743 1 openstack.go:90] Block storage opts: {0 false true false}
W0317 09:59:16.637744 1 main.go:105] Failed to GetOpenStackProvider: Post "https://hpccloud.mpcdf.mpg.de:13000/v3/auth/tokens": dial tcp: lookup hpccloud.mpcdf.mpg.de: i/o timeout
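A note on reading the warning above: its tail is `dial tcp: lookup hpccloud.mpcdf.mpg.de: i/o timeout`, so the request appears to have stalled at the name lookup step, before any TCP connection or Keystone exchange took place. The following is a minimal sketch, not part of the plugin, for sorting such Go network error strings into rough categories. The function name `classify_failure` and the category labels are hypothetical, and the patterns for the non-DNS cases are assumptions about how Go's `net` package usually words those errors.

```shell
#!/bin/sh
# classify_failure LINE
# Rough classification of a Go "dial tcp" error string (hypothetical helper).
classify_failure() {
  case "$1" in
    # The resolver answered, but the name does not exist.
    *"dial tcp: lookup "*"no such host"*) echo "dns-no-such-host" ;;
    # The name lookup itself never returned in time.
    *"dial tcp: lookup "*"i/o timeout"*)  echo "dns-timeout" ;;
    # The name resolved (an address follows "dial tcp"), but the connect timed out.
    *"dial tcp "*"i/o timeout"*)          echo "connect-timeout" ;;
    *"connection refused"*)               echo "connection-refused" ;;
    *)                                    echo "unknown" ;;
  esac
}

# The warning from the failing container, copied from the log above.
classify_failure 'Failed to GetOpenStackProvider: Post "https://hpccloud.mpcdf.mpg.de:13000/v3/auth/tokens": dial tcp: lookup hpccloud.mpcdf.mpg.de: i/o timeout'
```

For the line above this prints `dns-timeout`, which would point at name resolution from inside the pod rather than at the credentials in cloud.conf. It does not say why the lookup timed out.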
I cannot reproduce this behavior on my laptop, where the same image starts normally:
fberg:~/ $ podman run -p 9808:9808 -v $PWD/cloud.config:/cloud.config -it docker.io/k8scloudprovider/cinder-csi-plugin:v1.26.2 /bin/sh
# mkdir /csi
# /bin/cinder-csi-plugin --endpoint=unix://csi/csi.sock --cloud-config=/cloud.config --cluster=kubernetes --v=1
I0317 11:08:20.226182 17 driver.go:81] Driver: cinder.csi.openstack.org
I0317 11:08:20.226243 17 driver.go:82] Driver version: 2.0.0@
I0317 11:08:20.226252 17 driver.go:83] CSI Spec version: 1.3.0
I0317 11:08:20.226268 17 driver.go:115] Enabling controller service capability: LIST_VOLUMES
I0317 11:08:20.226279 17 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
I0317 11:08:20.226287 17 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0317 11:08:20.226296 17 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0317 11:08:20.226304 17 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
I0317 11:08:20.226314 17 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
I0317 11:08:20.226322 17 driver.go:115] Enabling controller service capability: CLONE_VOLUME
I0317 11:08:20.226330 17 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0317 11:08:20.226351 17 driver.go:115] Enabling controller service capability: GET_VOLUME
I0317 11:08:20.226361 17 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
I0317 11:08:20.226371 17 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0317 11:08:20.226380 17 driver.go:135] Enabling node service capability: EXPAND_VOLUME
I0317 11:08:20.226389 17 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
I0317 11:08:20.226694 17 openstack.go:90] Block storage opts: {0 false true false}
I0317 11:08:20.915013 17 server.go:106] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
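The two logs are identical up to the `Block storage opts` line and differ only in what follows it and how long that takes. A small sketch for computing the gap between two klog timestamps is below. The helper name `klog_gap` is hypothetical, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# klog_gap START END
# Seconds between two klog timestamps of the form HH:MM:SS.micros
# (hypothetical helper; assumes both timestamps are on the same day).
klog_gap() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    printf "%.3f\n", (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
  }'
}

# Failing pod: "Block storage opts" line to the GetOpenStackProvider warning.
klog_gap 09:58:46.633743 09:59:16.637744
# Laptop run: "Block storage opts" line to "Listening for connections".
klog_gap 11:08:20.226694 11:08:20.915013
```

This prints `30.004` for the pod and `0.688` for the laptop. The 30-second gap matches the Started and Finished times of the terminated container, which suggests the process waits on a timeout and then exits rather than failing immediately.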
I am surprised that it works there. Does anyone have an idea what the issue might be? Next I will try on a cloud node in the same network as the controllers to check whether it is a network problem, but in that case, why would the software version matter?
Cheers, -Frank