ci(docker): use prebuilt Rust binaries by default (#1027)
* ci(docker): use prebuilt Rust binaries by default
Flip Docker image builds to consume staged native Rust artifacts, remove in-Docker Rust build stages, and publish per-arch images with a manifest merge.
Add local staging support for prebuilt gateway and sandbox binaries so development image builds continue to work without CI artifacts.
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
* ci(docker): address prebuilt build review feedback
* ci(rust): allow existing vfio complexity
* ci(rust): pin toolchain to 1.95
---------
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
.agents/skills/debug-openshell-cluster/SKILL.md
Lines changed: 2 additions & 2 deletions
@@ -184,7 +184,7 @@ Component images (server, sandbox) can reach kubelet via two paths:
**Local/external pull mode** (default local via `mise run cluster`): Local images are tagged to the configured local registry base (default `127.0.0.1:5000/openshell/*`), pushed to that registry, and pulled by k3s via `registries.yaml` mirror endpoint (typically `host.docker.internal:5000`). The `cluster` task pushes prebuilt local tags (`openshell/*:dev`, falling back to `localhost:5000/openshell/*:dev` or `127.0.0.1:5000/openshell/*:dev`).
- Gateway image builds now stage a partial Rust workspace from `deploy/docker/Dockerfile.images`. If cargo fails with a missing manifest under `/build/crates/...`, or an imported symbol exists locally but is missing in the image build, verify that every current gateway dependency crate (including `openshell-driver-docker`, `openshell-driver-kubernetes`, and `openshell-ocsf`) is copied into the staged workspace there.
+ Gateway and cluster image builds consume Rust binaries staged at `deploy/docker/.build/prebuilt-binaries/<arch>/`. In CI these come from the reusable Rust native build workflow; locally `tasks/scripts/docker-build-image.sh` runs `tasks/scripts/stage-prebuilt-binaries.sh` before invoking Docker unless `PREBUILT_AUTO_STAGE=0` is set.
```bash
# Verify image refs currently used by openshell deployment
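The updated text says gateway and cluster image builds read binaries from `deploy/docker/.build/prebuilt-binaries/<arch>/`, and that `PREBUILT_AUTO_STAGE=0` skips the automatic staging step. With auto-staging off, a quick pre-build check can catch a missing binary before Docker runs. The sketch below is hypothetical: `check_prebuilt_staged` is not part of the repo, and the arch directory name (e.g. `amd64`) and any binary name other than `openshell-sandbox` are assumptions to match against what `stage-prebuilt-binaries.sh` actually writes.

```bash
# Hypothetical pre-build check, not part of the repo's tasks/scripts: verifies
# that prebuilt Rust binaries are staged where the gateway and cluster image
# builds read them. The directory layout comes from SKILL.md; the arch
# directory name and binary names passed in are assumptions.
check_prebuilt_staged() {
  if [[ $# -lt 3 ]]; then
    echo "usage: check_prebuilt_staged <repo-root> <arch> <binary>..." >&2
    return 2
  fi
  local root="$1" arch="$2"
  shift 2
  local dir="${root}/deploy/docker/.build/prebuilt-binaries/${arch}"
  local missing=0 bin
  for bin in "$@"; do
    # Each staged binary must exist and carry the executable bit.
    if [[ ! -x "${dir}/${bin}" ]]; then
      echo "missing or not executable: ${dir}/${bin}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `check_prebuilt_staged "$(git rev-parse --show-toplevel)" amd64 openshell-sandbox` returns non-zero and prints each missing path if the sandbox binary was never staged for that arch.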
@@ -368,7 +368,7 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
|`metrics-server` errors in logs | Normal k3s noise, not the root cause | These errors are benign — look for the actual failing health check component |
| Stale NotReady nodes from previous deploys | Volume reused across container recreations | The deploy flow now auto-cleans stale nodes; if it still fails, manually delete NotReady nodes (see Step 2) or choose "Recreate" when prompted |
| gRPC `UNIMPLEMENTED` for newer RPCs in push mode | Helm values still point at older pulled images instead of the pushed refs | Verify rendered `openshell-helmchart.yaml` uses the expected push refs (`server`, `sandbox`, `pki-job`) and not `:latest`|
- | Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory`| Supervisor binary missing from cluster image | The cluster image was built/published without the `supervisor-builder` target in `deploy/docker/Dockerfile.images`. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
+ | Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory`| Supervisor binary missing from cluster image | The cluster image was built/published without a staged `openshell-sandbox` prebuilt binary. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
|`HEALTHCHECK_MISSING_SUPERVISOR` in health check logs |`/opt/openshell/bin/openshell-sandbox` not found in gateway container | Rebuild cluster image: `mise run docker:build:cluster`, then `openshell gateway destroy <name> && openshell gateway start`|
|`nvidia-ctk cdi list` returns no `k8s.device-plugin.nvidia.com/gpu=` entries | CDI specs not yet generated by device plugin | Device plugin may still be starting; wait and retry, or check pod logs (Step 8) |
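Both the sandbox-crash row and the `HEALTHCHECK_MISSING_SUPERVISOR` row in the table come down to whether `/opt/openshell/bin/openshell-sandbox` exists in the gateway container. A minimal manual check is sketched below, assuming Docker CLI access to the gateway container and that its image ships a `test` utility; the `check_supervisor` helper is hypothetical, and you supply the container name yourself.

```bash
# Hypothetical helper, not part of the OpenShell tooling: checks whether a
# running gateway container has the supervisor binary that sandbox pods need.
# Assumes the Docker CLI is available and the container image ships `test`.
SUPERVISOR_PATH=/opt/openshell/bin/openshell-sandbox

check_supervisor() {
  local container="$1"
  if docker exec "${container}" test -x "${SUPERVISOR_PATH}"; then
    echo "ok: ${SUPERVISOR_PATH} present in ${container}"
    return 0
  fi
  echo "missing: ${SUPERVISOR_PATH} in ${container}" >&2
  # Remediation taken from the troubleshooting table.
  echo "fix: mise run docker:build:cluster, then openshell gateway destroy <name> && openshell gateway start" >&2
  return 1
}
```

A non-zero exit here points at the image build (no staged `openshell-sandbox` binary) rather than at Kubernetes scheduling, which narrows where to look next.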