cloud issues
https://gitlab.mpcdf.mpg.de/groups/mpcdf/cloud/-/issues
2023-10-17T09:47:42Z
https://gitlab.mpcdf.mpg.de/mpcdf/cloud/kubernetes/-/issues/5
CSI Cinder Controller Plugin Times out
2023-10-17T09:47:42Z
Adam Fekete
On the surface everything looks fine, but some of the core pods are crashing from time to time:
```
$ kubectl get all -A | grep "ago)"
kube-system pod/calico-kube-controllers-57b57c56f-vkt4s 1/1 Running 2 (22h ago) 2d11h
kube-system pod/csi-cinder-controllerplugin-64b5578777-28tg2 6/6 Running 53 (53m ago) 2d11h
kube-system pod/kube-apiserver-nomad-control-plane-0 1/1 Running 2 (22h ago) 2d11h
kube-system pod/kube-apiserver-nomad-control-plane-1 1/1 Running 1 (23h ago) 2d11h
kube-system pod/kube-apiserver-nomad-control-plane-2 1/1 Running 2 (23h ago) 2d11h
kube-system pod/kube-controller-manager-nomad-control-plane-0 1/1 Running 4 (22h ago) 2d11h
kube-system pod/kube-controller-manager-nomad-control-plane-1 1/1 Running 3 (16h ago) 2d11h
kube-system pod/kube-controller-manager-nomad-control-plane-2 1/1 Running 4 (22h ago) 2d11h
kube-system pod/kube-scheduler-nomad-control-plane-0 1/1 Running 5 (23h ago) 2d11h
kube-system pod/kube-scheduler-nomad-control-plane-1 1/1 Running 5 (16h ago) 2d11h
kube-system pod/kube-scheduler-nomad-control-plane-2 1/1 Running 2 (22h ago) 2d11h
kube-system pod/openstack-cloud-controller-manager-6bcrj 1/1 Running 4 (23h ago) 2d11h
kube-system pod/openstack-cloud-controller-manager-7mbc4 1/1 Running 2 (23h ago) 2d11h
kube-system pod/openstack-cloud-controller-manager-r5df4 1/1 Running 6 (16h ago) 2d11h
nomad-system pod/cert-manager-7b5cc56d74-svj4b 1/1 Running 3 (22h ago) 2d9h
nomad-system pod/cert-manager-cainjector-7d948796d5-nx4jq 1/1 Running 2 (22h ago) 2d9h
```
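To make the outlier in the listing above easier to spot, the restart counts can be ranked. The following is a minimal sketch, assuming the text layout printed by `kubectl get all -A` as shown above; `rank_restarts` and its `minimum` parameter are hypothetical names introduced here for illustration, not part of any tool.

```python
import re

# One pod row as kubectl prints it: NAMESPACE NAME READY STATUS RESTARTS AGE.
# RESTARTS is either a bare number or "N (<time> ago)".
# STATUS must start with a letter, which skips deployment and job rows.
ROW = re.compile(
    r"^(?P<namespace>\S+)\s+(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+"
    r"(?P<status>[A-Za-z]\S*)\s+(?P<restarts>\d+)"
    r"(?:\s+\((?P<last>[^)]*) ago\))?\s+(?P<age>\S+)\s*$"
)


def rank_restarts(text, minimum=1):
    """Return (restarts, namespace, name, last_restart) tuples, highest first."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, blank line, or a non-pod resource
        restarts = int(match["restarts"])
        if restarts >= minimum:
            rows.append(
                (restarts, match["namespace"], match["name"], match["last"])
            )
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows
```

Fed with the output above, the `csi-cinder-controllerplugin` pod comes out first with 53 restarts, well ahead of the other pods, which have between 1 and 6.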
Brian Standley
brian.standley@mpcdf.mpg.de
https://gitlab.mpcdf.mpg.de/mpcdf/cloud/kubernetes/-/issues/1
CSI Cinder Controller Plugin Times out retrieving auth tokens
2023-06-28T12:52:59Z
Frank Berghaus
Tag: @brian @mgeier @jkennedy
In recent versions of cloud-provider-openstack, the CSI plugin fails to authenticate. The failing container is:
```
cinder-csi-plugin:
Container ID: containerd://21d42c60c6880dcd5ad1a3113cc282a952127b5a05cda4c217a8160883e904c2
Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.26.2
Image ID: docker.io/k8scloudprovider/cinder-csi-plugin@sha256:35ffa1d58fdfb86cb3093b1f6f8972504e13360fe985ebf2033a894a35b25557
Port: 9808/TCP
Host Port: 0/TCP
Args:
/bin/cinder-csi-plugin
--endpoint=$(CSI_ENDPOINT)
--cloud-config=$(CLOUD_CONFIG)
--cluster=$(CLUSTER_NAME)
--v=1
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 17 Mar 2023 09:58:46 +0000
Finished: Fri, 17 Mar 2023 09:59:16 +0000
Ready: False
Restart Count: 6
Liveness: http-get http://:healthz/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix://csi/csi.sock
CLOUD_CONFIG: /etc/config/cloud.conf
CLUSTER_NAME: kubernetes
Mounts:
/csi from socket-dir (rw)
/etc/config from secret-cinderplugin (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rw65z (ro)
```
The logs from the failing container are:
```
I0317 09:58:46.633451 1 driver.go:81] Driver: cinder.csi.openstack.org
I0317 09:58:46.633525 1 driver.go:82] Driver version: 2.0.0@
I0317 09:58:46.633528 1 driver.go:83] CSI Spec version: 1.3.0
I0317 09:58:46.633534 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES
I0317 09:58:46.633538 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
I0317 09:58:46.633542 1 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0317 09:58:46.633545 1 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0317 09:58:46.633548 1 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
I0317 09:58:46.633554 1 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
I0317 09:58:46.633557 1 driver.go:115] Enabling controller service capability: CLONE_VOLUME
I0317 09:58:46.633559 1 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0317 09:58:46.633561 1 driver.go:115] Enabling controller service capability: GET_VOLUME
I0317 09:58:46.633564 1 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
I0317 09:58:46.633567 1 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0317 09:58:46.633570 1 driver.go:135] Enabling node service capability: EXPAND_VOLUME
I0317 09:58:46.633572 1 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
I0317 09:58:46.633743 1 openstack.go:90] Block storage opts: {0 false true false}
W0317 09:59:16.637744 1 main.go:105] Failed to GetOpenStackProvider: Post "https://hpccloud.mpcdf.mpg.de:13000/v3/auth/tokens": dial tcp: lookup hpccloud.mpcdf.mpg.de: i/o timeout
```
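The last log line reports a timeout in the name lookup (`lookup hpccloud.mpcdf.mpg.de: i/o timeout`), not a rejected token request. A minimal sketch for separating the two steps is given below; `probe` is a hypothetical helper written for this issue, and the host and port to pass in are the ones from the log line (`hpccloud.mpcdf.mpg.de`, `13000`). It only uses the Python standard library and makes no claim about how the plugin itself resolves names.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Resolve `host`, then attempt a TCP connection to each resolved address."""
    report = {"addresses": [], "resolve_seconds": None,
              "resolve_error": None, "connect": {}}

    # Step 1: name resolution, timed separately from the connection attempt.
    # Note: getaddrinfo has no timeout argument; `timeout` applies to step 2 only.
    started = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        report["resolve_seconds"] = time.monotonic() - started
        report["resolve_error"] = str(exc)
        return report
    report["resolve_seconds"] = time.monotonic() - started

    # Step 2: TCP connect to every distinct address the resolver returned.
    for info in infos:
        address = info[4][0]
        if address in report["connect"]:
            continue
        report["addresses"].append(address)
        try:
            with socket.create_connection((address, port), timeout=timeout):
                report["connect"][address] = "ok"
        except OSError as exc:
            report["connect"][address] = f"failed: {exc}"
    return report
```

Calling `probe("hpccloud.mpcdf.mpg.de", 13000)` once from a pod on the cluster and once from the node itself would show whether resolution is slow or failing only inside the pod network, or on the node as well.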
I cannot reproduce this behavior on my laptop:
```
fberg:~/ $ podman run -p 9808:9808 -v $PWD/cloud.config:/cloud.config -it docker.io/k8scloudprovider/cinder-csi-plugin:v1.26.2 /bin/sh
# mkdir /csi
# /bin/cinder-csi-plugin --endpoint=unix://csi/csi.sock --cloud-config=/cloud.config --cluster=kubernetes --v=1
I0317 11:08:20.226182 17 driver.go:81] Driver: cinder.csi.openstack.org
I0317 11:08:20.226243 17 driver.go:82] Driver version: 2.0.0@
I0317 11:08:20.226252 17 driver.go:83] CSI Spec version: 1.3.0
I0317 11:08:20.226268 17 driver.go:115] Enabling controller service capability: LIST_VOLUMES
I0317 11:08:20.226279 17 driver.go:115] Enabling controller service capability: CREATE_DELETE_VOLUME
I0317 11:08:20.226287 17 driver.go:115] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0317 11:08:20.226296 17 driver.go:115] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0317 11:08:20.226304 17 driver.go:115] Enabling controller service capability: LIST_SNAPSHOTS
I0317 11:08:20.226314 17 driver.go:115] Enabling controller service capability: EXPAND_VOLUME
I0317 11:08:20.226322 17 driver.go:115] Enabling controller service capability: CLONE_VOLUME
I0317 11:08:20.226330 17 driver.go:115] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0317 11:08:20.226351 17 driver.go:115] Enabling controller service capability: GET_VOLUME
I0317 11:08:20.226361 17 driver.go:125] Enabling volume access mode: SINGLE_NODE_WRITER
I0317 11:08:20.226371 17 driver.go:135] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0317 11:08:20.226380 17 driver.go:135] Enabling node service capability: EXPAND_VOLUME
I0317 11:08:20.226389 17 driver.go:135] Enabling node service capability: GET_VOLUME_STATS
I0317 11:08:20.226694 17 openstack.go:90] Block storage opts: {0 false true false}
I0317 11:08:20.915013 17 server.go:106] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
```
I am surprised to find it working there. Does anyone have an idea what the issue may be? I am going to try on a cloud node in the same network as the controllers to check whether it is a network problem, but then why would the software version matter?
Cheers,
-Frank
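One difference between the laptop run and the pod that may be worth ruling out is the resolver configuration: a pod usually gets its own `/etc/resolv.conf` (with the default DNS policy it typically points at the cluster DNS service and carries extra search domains), while the podman container inherits the laptop's settings. This is only a hypothesis, not a confirmed cause. The sketch below parses the contents of a `resolv.conf` so the two environments can be compared side by side; `parse_resolv_conf` is a hypothetical helper name.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains, and options from resolv.conf text."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword in ("search", "domain"):
            config["search"] = values  # a later entry overrides an earlier one
        elif keyword == "options":
            for option in values:
                key, _, value = option.partition(":")
                config["options"][key] = value or True
    return config
```

Reading `/etc/resolv.conf` inside the failing container and on the laptop, then comparing the two parsed results, would show whether the name servers, search list, or `ndots` setting differ between the environments.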