diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3285d0a0..3c75e74d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,40 @@
 ## unreleased
 
+## v4.0.1 - 2026.06.04
+
+Fixes a LUKS volume handling bug that could leave a volume stuck attached to a node after pod termination, surfacing later as a `Multi-Attach error`
+when the workload was rescheduled. The node plugin now verifies that an existing LUKS mapping is backed by the expected device before reusing it,
+and rejects a staging path that is already mounted from an unexpected source.
+
+In addition, every gRPC handler now uses a request-scoped logger, so log lines include the volume and request context that triggered them.
+
+### Detecting whether you were affected
+
+The bug surfaces in two places, neither of which is in the csi-cloudscale logs:
+
+- `kube-controller-manager` logs (or the pod's events):
+  `Multi-Attach error: volume is already exclusively attached and can't be attached to another node`
+- `kubelet` logs on the node listed in `attachedTo=[...]`: repeated
+  `GetDeviceMountRefs check failed for volume ... is still mounted by other references` entries, typically every ~2 minutes.
+
+### Recovering a stuck volume
+
+Upgrading to v4.0.1 prevents new occurrences but does not clean up volumes that are already stuck. To recover one:
+
+1. From the `Multi-Attach error` message, note the affected node listed in `attachedTo=[...]`. Confirm that no pod is actively using the
+   volume on that node (`kubectl get pods --all-namespaces -o wide` and check mount refs under `/var/lib/kubelet/pods/`).
+2. Cordon the node so no new workloads land on it during recovery: `kubectl cordon `.
+3. On that node, inspect the stale state. The stale entries are the duplicated `.../globalmount` paths that share a `(major, minor)` device number:
+   - `findmnt | grep globalmount`
+4. `umount` each stale staging path. Once `/proc/self/mountinfo` is clean, kubelet's next `NodeUnstageVolume` retry (within ~2 minutes) succeeds.
+   As part of that unstage, the csi-cloudscale node plugin runs `cryptsetup close` itself, after which the `VolumeAttachment` is deleted
+   and the cloudscale API detaches the volume. The next pod can then attach it elsewhere.
+5. Uncordon the node: `kubectl uncordon `.
+
+If a leftover `/dev/mapper/pvc-` is still present after several minutes, close it manually with `cryptsetup close pvc-` once
+nothing references it. If you cannot identify or clear the stale state safely, rebooting the node is a valid fallback:
+a reboot discards the kernel state, and the volume detaches on the next reconcile.
+
 ## v4.0.0 - 2026.03.30
 
 ⚠️ See the [update instructions](https://github.com/cloudscale-ch/csi-cloudscale#from-csi-cloudscale-v3x-to-v4x).
diff --git a/README.md b/README.md
index f94ff072..e3a008af 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ secret `my-pvc-luks-key`.
 ## Releases
 
 The cloudscale.ch CSI plugin follows [semantic versioning](https://semver.org/).
-The current version is: **`v4.0.0`**.
+The current version is: **`v4.0.1`**.
 
 * Bug fixes will be released as a `PATCH` update.
 * New features (such as CSI spec bumps) will be released as a `MINOR` update.
@@ -92,15 +92,15 @@ We recommend using the latest cloudscale.ch CSI driver compatible with your Kube
 | 1.25 | v3.3.0 | v3.5.6 |
 | 1.26 | v3.3.0 | v3.5.6 |
 | 1.27 | v3.3.0 | v3.5.6 |
-| 1.28 | v3.3.0 | v4.0.0 |
-| 1.29 | v3.3.0 | v4.0.0 |
-| 1.30 | v3.3.0 | v4.0.0 |
-| 1.31 | v3.3.0 | v4.0.0 |
-| 1.32 | v3.3.0 | v4.0.0 |
-| 1.33 | v3.3.0 | v4.0.0 |
-| 1.34 [1] | v3.3.0 | v4.0.0 |
-| 1.35 | v3.4.1 | v4.0.0 |
-| 1.36 | v3.4.1 | v4.0.0 |
+| 1.28 | v3.3.0 | v4.0.1 |
+| 1.29 | v3.3.0 | v4.0.1 |
+| 1.30 | v3.3.0 | v4.0.1 |
+| 1.31 | v3.3.0 | v4.0.1 |
+| 1.32 | v3.3.0 | v4.0.1 |
+| 1.33 | v3.3.0 | v4.0.1 |
+| 1.34 [1] | v3.3.0 | v4.0.1 |
+| 1.35 | v3.4.1 | v4.0.1 |
+| 1.36 | v3.4.1 | v4.0.1 |
 
 [1] Prometheus `kubelet_volume_stats_*` metrics not available in 1.34.0 and 1.34.1 due to a [bug in Kubelet](https://github.com/kubernetes/kubernetes/issues/133847). Fixed in `1.34.2`.
 
@@ -216,10 +216,10 @@ $ helm install -g -n kube-system --set controller.image.tag=dev --set node.image
 Before you continue, be sure to checkout to a [tagged release](https://github.com/cloudscale-ch/csi-cloudscale/releases).
 Always use the [latest stable version](https://github.com/cloudscale-ch/csi-cloudscale/releases/latest)
 
-For example, to use the latest stable version (`v4.0.0`) you can execute the following command:
+For example, to use the latest stable version (`v4.0.1`) you can execute the following command:
 
 ```
-$ kubectl apply -f https://raw.githubusercontent.com/cloudscale-ch/csi-cloudscale/master/deploy/kubernetes/releases/csi-cloudscale-v4.0.0.yaml
+$ kubectl apply -f https://raw.githubusercontent.com/cloudscale-ch/csi-cloudscale/master/deploy/kubernetes/releases/csi-cloudscale-v4.0.1.yaml
 ```
 
 The storage classes `cloudscale-volume-ssd` and `cloudscale-volume-bulk` will be created.
The @@ -446,15 +446,15 @@ $ git push origin After it's merged to master, [create a new Github release](https://github.com/cloudscale-ch/csi-cloudscale/releases/new) from -master with the version `v4.0.0` and then publish a new docker build: +master with the version `v4.0.1` and then publish a new docker build: ``` $ git checkout master $ make publish ``` -This will create a binary with version `v4.0.0` and docker image pushed to -`cloudscalech/cloudscale-csi-plugin:v4.0.0` +This will create a binary with version `v4.0.1` and docker image pushed to +`cloudscalech/cloudscale-csi-plugin:v4.0.1` ### Release a pre-release version diff --git a/VERSION b/VERSION index 857572fc..82f24fdf 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -v4.0.0 +v4.0.1 diff --git a/charts/csi-cloudscale/Chart.yaml b/charts/csi-cloudscale/Chart.yaml index a6183815..52e4b2e1 100644 --- a/charts/csi-cloudscale/Chart.yaml +++ b/charts/csi-cloudscale/Chart.yaml @@ -2,8 +2,8 @@ apiVersion: v2 name: csi-cloudscale description: A Container Storage Interface Driver for cloudscale.ch volumes. 
type: application -version: 1.5.0 -appVersion: "4.0.0" +version: 1.5.1 +appVersion: "4.0.1" home: https://github.com/cloudscale-ch/csi-cloudscale sources: - https://github.com/cloudscale-ch/csi-cloudscale.git diff --git a/charts/csi-cloudscale/values.yaml b/charts/csi-cloudscale/values.yaml index 1d45c3bc..727180f5 100644 --- a/charts/csi-cloudscale/values.yaml +++ b/charts/csi-cloudscale/values.yaml @@ -107,7 +107,7 @@ controller: image: registry: quay.io repository: cloudscalech/cloudscale-csi-plugin - tag: v4.0.0 + tag: v4.0.1 pullPolicy: IfNotPresent serviceAccountName: logLevel: info @@ -123,7 +123,7 @@ node: image: registry: quay.io repository: cloudscalech/cloudscale-csi-plugin - tag: v4.0.0 + tag: v4.0.1 pullPolicy: IfNotPresent nodeSelector: {} tolerations: [] diff --git a/deploy/kubernetes/releases/csi-cloudscale-v4.0.1.yaml b/deploy/kubernetes/releases/csi-cloudscale-v4.0.1.yaml new file mode 100644 index 00000000..5b24e1ca --- /dev/null +++ b/deploy/kubernetes/releases/csi-cloudscale-v4.0.1.yaml @@ -0,0 +1,484 @@ +--- +# Source: csi-cloudscale/templates/serviceaccount.yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: csi-cloudscale-controller-sa + namespace: kube-system +--- +# Source: csi-cloudscale/templates/serviceaccount.yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: csi-cloudscale-node-sa + namespace: kube-system + +--- +# Source: csi-cloudscale/templates/storageclass.yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: cloudscale-volume-ssd + namespace: kube-system + annotations: + storageclass.kubernetes.io/is-default-class: "true" +provisioner: csi.cloudscale.ch +allowVolumeExpansion: true +reclaimPolicy: Delete +volumeBindingMode: Immediate +parameters: + csi.cloudscale.ch/volume-type: ssd +--- +# Source: csi-cloudscale/templates/storageclass.yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: cloudscale-volume-ssd-luks + namespace: kube-system +provisioner: 
csi.cloudscale.ch +allowVolumeExpansion: true +reclaimPolicy: Delete +volumeBindingMode: Immediate +parameters: + csi.cloudscale.ch/volume-type: ssd + csi.cloudscale.ch/luks-encrypted: "true" + csi.cloudscale.ch/luks-cipher: "aes-xts-plain64" + csi.cloudscale.ch/luks-key-size: "512" + csi.storage.k8s.io/node-stage-secret-namespace: ${pvc.namespace} + csi.storage.k8s.io/node-stage-secret-name: ${pvc.name}-luks-key +--- +# Source: csi-cloudscale/templates/storageclass.yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: cloudscale-volume-bulk + namespace: kube-system +provisioner: csi.cloudscale.ch +allowVolumeExpansion: true +reclaimPolicy: Delete +volumeBindingMode: Immediate +parameters: + csi.cloudscale.ch/volume-type: bulk +--- +# Source: csi-cloudscale/templates/storageclass.yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: cloudscale-volume-bulk-luks + namespace: kube-system +provisioner: csi.cloudscale.ch +allowVolumeExpansion: true +reclaimPolicy: Delete +volumeBindingMode: Immediate +parameters: + csi.cloudscale.ch/volume-type: bulk + csi.cloudscale.ch/luks-encrypted: "true" + csi.cloudscale.ch/luks-cipher: "aes-xts-plain64" + csi.cloudscale.ch/luks-key-size: "512" + csi.storage.k8s.io/node-stage-secret-namespace: ${pvc.namespace} + csi.storage.k8s.io/node-stage-secret-name: ${pvc.name}-luks-key + +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-provisioner-role +rules: + - apiGroups: [""] + resources: ["persistentvolumes"] + verbs: ["get", "list", "watch", "create", "patch", "delete"] + - apiGroups: [""] + resources: ["persistentvolumeclaims"] + verbs: ["get", "list", "watch", "update"] + - apiGroups: ["storage.k8s.io"] + resources: ["storageclasses"] + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["events"] + verbs: ["list", "watch", "create", "update", "patch"] + - apiGroups: 
["snapshot.storage.k8s.io"] + resources: ["volumesnapshots"] + verbs: ["get", "list", "watch", "update"] + - apiGroups: ["snapshot.storage.k8s.io"] + resources: ["volumesnapshotcontents"] + verbs: ["get", "list"] + - apiGroups: [ "coordination.k8s.io" ] + resources: [ "leases" ] + verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ] + - apiGroups: [ "storage.k8s.io" ] + resources: [ "csinodes" ] + verbs: [ "get", "list", "watch" ] + - apiGroups: [ "" ] + resources: [ "nodes" ] + verbs: [ "get", "list", "watch" ] + - apiGroups: ["storage.k8s.io"] + resources: ["volumeattachments"] + verbs: ["get", "list", "watch"] +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-attacher-role +rules: + - apiGroups: [""] + resources: ["persistentvolumes"] + verbs: ["get", "list", "watch", "update", "patch"] + - apiGroups: ["storage.k8s.io"] + resources: ["csinodes"] + verbs: ["get", "list", "watch"] + - apiGroups: ["storage.k8s.io"] + resources: ["volumeattachments"] + verbs: ["get", "list", "watch", "update", "patch"] + - apiGroups: ["storage.k8s.io"] + resources: ["volumeattachments/status"] + verbs: ["patch"] +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-snapshotter-role +rules: + - apiGroups: ["snapshot.storage.k8s.io"] + resources: ["volumesnapshots"] + verbs: [ "get", "list", "watch", "update" ] + - apiGroups: ["snapshot.storage.k8s.io"] + resources: ["volumesnapshotcontents"] + verbs: [ "get", "list", "watch", "update", "patch" ] + - apiGroups: [ "snapshot.storage.k8s.io" ] + resources: [ "volumesnapshotcontents/status" ] + verbs: [ "update", "patch" ] + - apiGroups: [ "snapshot.storage.k8s.io" ] + resources: [ "volumesnapshotclasses" ] + verbs: [ "get", "list", "watch" ] + - apiGroups: [""] + resources: ["events"] + verbs: ["list", "watch", "create", "update", 
"patch"] +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-resizer-role +rules: + - apiGroups: [""] + resources: ["persistentvolumes"] + verbs: ["get", "list", "watch", "update", "patch"] + - apiGroups: [""] + resources: ["persistentvolumeclaims"] + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["persistentvolumeclaims/status"] + verbs: ["update", "patch"] + - apiGroups: [""] + resources: ["events"] + verbs: ["list", "watch", "create", "update", "patch"] + - apiGroups: ["storage.k8s.io"] + resources: ["volumeattributesclasses"] + verbs: ["get", "list", "watch"] +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-node-driver-registrar-role + namespace: kube-system +rules: + - apiGroups: [""] + resources: ["events"] + verbs: ["get", "list", "watch", "create", "update", "patch"] +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-provisioner-binding +subjects: + - kind: ServiceAccount + name: csi-cloudscale-controller-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: csi-cloudscale-provisioner-role + apiGroup: rbac.authorization.k8s.io +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-snapshotter-binding +subjects: + - kind: ServiceAccount + name: csi-cloudscale-controller-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: csi-cloudscale-snapshotter-role + apiGroup: rbac.authorization.k8s.io +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-resizer-binding 
+subjects: + - kind: ServiceAccount + name: csi-cloudscale-controller-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: csi-cloudscale-resizer-role + apiGroup: rbac.authorization.k8s.io +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-attacher-binding +subjects: + - kind: ServiceAccount + name: csi-cloudscale-controller-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: csi-cloudscale-attacher-role + apiGroup: rbac.authorization.k8s.io +--- +# Source: csi-cloudscale/templates/rbac.yaml +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: csi-cloudscale-node-driver-registrar-binding +subjects: + - kind: ServiceAccount + name: csi-cloudscale-node-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: csi-cloudscale-node-driver-registrar-role + apiGroup: rbac.authorization.k8s.io + +--- +# Source: csi-cloudscale/templates/daemonset.yaml +kind: DaemonSet +apiVersion: apps/v1 +metadata: + name: csi-cloudscale-node + namespace: kube-system +spec: + selector: + matchLabels: + app: csi-cloudscale-node + template: + metadata: + labels: + app: csi-cloudscale-node + role: csi-cloudscale + spec: + priorityClassName: system-node-critical + serviceAccountName: csi-cloudscale-node-sa + hostNetwork: true + containers: + - name: csi-node-driver-registrar + image: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.15.0" + imagePullPolicy: IfNotPresent + args: + - "--v=5" + - "--csi-address=$(ADDRESS)" + - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)" + lifecycle: + preStop: + exec: + command: ["/bin/sh", "-c", "rm -rf /registration/csi.cloudscale.ch /registration/csi.cloudscale.ch-reg.sock"] + env: + - name: ADDRESS + value: /csi/csi.sock + - name: DRIVER_REG_SOCK_PATH + value: /var/lib/kubelet/plugins/csi.cloudscale.ch/csi.sock + - name: KUBE_NODE_NAME + valueFrom: + fieldRef: + fieldPath: 
spec.nodeName + volumeMounts: + - name: plugin-dir + mountPath: /csi/ + - name: registration-dir + mountPath: /registration/ + - name: csi-cloudscale-plugin + image: "quay.io/cloudscalech/cloudscale-csi-plugin:v4.0.1" + imagePullPolicy: IfNotPresent + args : + - "--endpoint=$(CSI_ENDPOINT)" + - "--url=$(CLOUDSCALE_API_URL)" + - "--log-level=info" + env: + - name: CSI_ENDPOINT + value: unix:///csi/csi.sock + - name: CLOUDSCALE_API_URL + value: https://api.cloudscale.ch/ + - name: CLOUDSCALE_MAX_CSI_VOLUMES_PER_NODE + value: "125" + - name: CLOUDSCALE_ACCESS_TOKEN + valueFrom: + secretKeyRef: + name: cloudscale + key: access-token + securityContext: + privileged: true + capabilities: + add: ["SYS_ADMIN"] + allowPrivilegeEscalation: true + volumeMounts: + - name: plugin-dir + mountPath: /csi + - name: pods-mount-dir + mountPath: /var/lib/kubelet + # needed so that any mounts setup inside this container are + # propagated back to the host machine. + mountPropagation: "Bidirectional" + - name: device-dir + mountPath: /dev + - name: tmpfs + mountPath: /tmp + volumes: + - name: registration-dir + hostPath: + path: /var/lib/kubelet/plugins_registry/ + type: DirectoryOrCreate + - name: plugin-dir + hostPath: + path: /var/lib/kubelet/plugins/csi.cloudscale.ch + type: DirectoryOrCreate + - name: pods-mount-dir + hostPath: + path: /var/lib/kubelet + type: Directory + - name: device-dir + hostPath: + path: /dev + # to make sure temporary stored luks keys never touch a disk + - name: tmpfs + emptyDir: + medium: Memory + +--- +# Source: csi-cloudscale/templates/statefulset.yaml +kind: StatefulSet +apiVersion: apps/v1 +metadata: + name: csi-cloudscale-controller + namespace: kube-system +spec: + serviceName: "csi-cloudscale" + selector: + matchLabels: + app: csi-cloudscale-controller + replicas: 1 + template: + metadata: + labels: + app: csi-cloudscale-controller + role: csi-cloudscale + spec: + hostNetwork: true + priorityClassName: system-cluster-critical + serviceAccount: 
csi-cloudscale-controller-sa + containers: + - name: csi-provisioner + image: "registry.k8s.io/sig-storage/csi-provisioner:v5.3.0" + imagePullPolicy: IfNotPresent + args: + - "--csi-address=$(ADDRESS)" + - "--default-fstype=ext4" + - "--v=5" + - "--feature-gates=Topology=false" + env: + - name: ADDRESS + value: /var/lib/csi/sockets/pluginproxy/csi.sock + volumeMounts: + - name: socket-dir + mountPath: /var/lib/csi/sockets/pluginproxy/ + - name: csi-attacher + image: "registry.k8s.io/sig-storage/csi-attacher:v4.10.0" + imagePullPolicy: IfNotPresent + args: + - "--csi-address=$(ADDRESS)" + - "--v=5" + env: + - name: ADDRESS + value: /var/lib/csi/sockets/pluginproxy/csi.sock + volumeMounts: + - name: socket-dir + mountPath: /var/lib/csi/sockets/pluginproxy/ + - name: csi-resizer + image: "registry.k8s.io/sig-storage/csi-resizer:v2.0.0" + args: + - "--csi-address=$(ADDRESS)" + - "--timeout=30s" + - "--v=5" + - "--handle-volume-inuse-error=false" + env: + - name: ADDRESS + value: /var/lib/csi/sockets/pluginproxy/csi.sock + imagePullPolicy: IfNotPresent + volumeMounts: + - name: socket-dir + mountPath: /var/lib/csi/sockets/pluginproxy/ + - name: csi-snapshotter + image: "registry.k8s.io/sig-storage/csi-snapshotter:v8.4.0" + args: + - "--csi-address=$(CSI_ENDPOINT)" + - "--v=5" + env: + - name: CSI_ENDPOINT + value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock + volumeMounts: + - name: socket-dir + mountPath: /var/lib/csi/sockets/pluginproxy/ + - name: csi-cloudscale-plugin + image: "quay.io/cloudscalech/cloudscale-csi-plugin:v4.0.1" + args : + - "--endpoint=$(CSI_ENDPOINT)" + - "--url=$(CLOUDSCALE_API_URL)" + - "--log-level=info" + env: + - name: CSI_ENDPOINT + value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock + - name: CLOUDSCALE_API_URL + value: https://api.cloudscale.ch/ + - name: CLOUDSCALE_ACCESS_TOKEN + valueFrom: + secretKeyRef: + name: cloudscale + key: access-token + imagePullPolicy: IfNotPresent + volumeMounts: + - name: socket-dir + mountPath: 
/var/lib/csi/sockets/pluginproxy/ + volumes: + - name: socket-dir + emptyDir: {} + +--- +# Source: csi-cloudscale/templates/csi_driver.yaml +apiVersion: storage.k8s.io/v1 +kind: CSIDriver +metadata: + name: csi.cloudscale.ch +spec: + attachRequired: true + podInfoOnMount: true + +--- +# Source: csi-cloudscale/templates/volumesnapshotclass.yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshotClass +metadata: + name: cloudscale-snapshots +driver: csi.cloudscale.ch +deletionPolicy: Delete
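The v4.0.1 changelog entry says the node plugin now verifies that an existing LUKS mapping is backed by the expected device before reusing it. The plugin's actual Go implementation is not part of this diff; as an illustration only, here is one way to express such a check in Python using the standard Linux sysfs layout, where `/sys/block/dm-N/dm/name` holds a device-mapper name and `/sys/block/dm-N/slaves/` lists the devices beneath it. `backing_devices` and `mapping_is_reusable` are hypothetical names, and the sketch assumes the mapping sits directly on a single block device.

```python
"""Check which block device(s) back a device-mapper (e.g. LUKS) mapping.

Hypothetical helper, not csi-cloudscale's implementation. Relies on the
standard Linux sysfs layout:
  /sys/block/dm-N/dm/name   -> mapping name as shown under /dev/mapper
  /sys/block/dm-N/slaves/   -> one entry per underlying device (e.g. "sdb")
"""
import os


def backing_devices(mapping_name, sys_root="/sys"):
    """Return sorted kernel names of the devices under mapping_name, or None."""
    block_dir = os.path.join(sys_root, "block")
    for entry in sorted(os.listdir(block_dir)):
        if not entry.startswith("dm-"):
            continue
        try:
            with open(os.path.join(block_dir, entry, "dm", "name")) as f:
                name = f.read().strip()
        except FileNotFoundError:
            continue
        if name == mapping_name:
            return sorted(os.listdir(os.path.join(block_dir, entry, "slaves")))
    return None  # no device-mapper device with that name exists


def mapping_is_reusable(mapping_name, expected_device, sys_root="/sys"):
    """True only if the mapping exists and sits solely on expected_device.

    expected_device may be a kernel name ("sdb") or a path such as a
    /dev/disk/by-id symlink; symlinks are resolved before comparing.
    """
    expected = os.path.basename(os.path.realpath(expected_device))
    return backing_devices(mapping_name, sys_root) == [expected]
```

Called with a leftover mapping's name during recovery, `backing_devices` also shows which device that mapping still references, which can help when deciding whether it is safe to close it manually.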
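The same changelog entry says the plugin now rejects a staging path that is already mounted from an unexpected source. Because the CSI specification requires `NodeStageVolume` to be idempotent, a staging path that is already mounted from the expected device has to count as success rather than as an error, so the check has three outcomes. The following Python sketch expresses that decision from `/proc/self/mountinfo` contents; `mount_sources` and `check_staging_path` are hypothetical names, and this is not the plugin's Go code.

```python
"""Decide whether a CSI staging path can be (re)used, based on mountinfo.

Hypothetical sketch, not csi-cloudscale's Go implementation.
"""
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    # mountinfo encodes space, tab, newline and backslash as \ooo octal escapes.
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mount_sources(mountinfo_text, mount_point):
    """Return the set of mount sources currently mounted at mount_point."""
    sources = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields end with a lone "-"; fstype and source follow it.
            sep = fields.index("-", 6)
        except ValueError:
            continue
        if len(fields) > sep + 2 and _unescape(fields[4]) == mount_point:
            sources.add(_unescape(fields[sep + 2]))
    return sources


def check_staging_path(mountinfo_text, staging_path, expected_source):
    """Return False if staging_path still needs mounting, True if it is
    already staged from expected_source (idempotent success), and raise
    ValueError if anything else is mounted there."""
    sources = mount_sources(mountinfo_text, staging_path)
    if not sources:
        return False
    unexpected = sources - {expected_source}
    if unexpected:
        raise ValueError(
            f"{staging_path} is already mounted from {sorted(unexpected)}, "
            f"expected {expected_source}"
        )
    return True
```

Comparing the `major:minor` field instead of the source string would be more robust if the same device can appear under different names (for example `/dev/dm-3` versus a `/dev/mapper/...` path); the string comparison keeps the sketch short.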