
test(e2e): add Cilium-CNI e2e suite#10

Closed

kvaps wants to merge 1 commit into feat/cross-mesh-granularity from feat/cilium-e2e-tests

Conversation

Member

@kvaps kvaps commented Apr 28, 2026

Summary

Adds an e2e suite that runs Kilo with --compatibility=cilium on top of
a kind cluster where Cilium is the CNI. The existing e2e suites only
cover the Kilo bridge CNI path, so the --compatibility=cilium mode has
no end-to-end coverage in upstream CI today.

This is a stacked PR: the diff against feat/cross-mesh-granularity
shows only the Cilium-specific changes. The branch is still based on
top of the cross-granularity work that we'll send upstream as squat/kilo
PR squat#490 once it's green.

What's added

  • e2e/kilo-kind-cilium.yaml — Kilo DaemonSet for the Cilium-CNI
    cluster: no kilo CNI ConfigMap, no install-cni init container,
    --cni=false --compatibility=cilium
  • e2e/lib.sh — install_cilium() (Helm, Cilium 1.16.5, VXLAN
    tunnelProtocol, ipam.mode=kubernetes, host firewall off) and
    create_cilium_cluster() mirroring create_cluster()
  • e2e/cilium-setup.sh, cilium-cross-mesh.sh, cilium-teardown.sh
    — three-stage suite mirroring the bridge setup/...mesh/teardown
    pattern
  • Makefile — new e2e-cilium target, kept separate from e2e
    because the Cilium cluster is incompatible with Kilo's bridge CNI
    used by the default suite
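
As an illustration, the Cilium-mode DaemonSet described above might carry args along these lines. This is a hypothetical excerpt, not the actual manifest: only --cni=false and --compatibility=cilium are stated in this PR, the cross granularity value comes from the stacked branch, and the surrounding field names follow ordinary Kilo manifests.

```yaml
# Sketch of the kilo container spec in e2e/kilo-kind-cilium.yaml (assumed layout).
containers:
  - name: kilo
    image: squat/kilo
    args:
      - --kubeconfig=/etc/kubernetes/kubeconfig
      - --hostname=$(NODE_NAME)
      - --cni=false                # Cilium owns pod networking; no install-cni init container
      - --compatibility=cilium     # reconcile routes with Cilium's VXLAN overlay
      - --mesh-granularity=cross   # value assumed from the stacked cross-granularity branch
```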

Scope

Baseline coverage is intentionally narrow:

  • Only cross granularity is exercised; full and location with
    Cilium CNI are reasonable follow-ups in the same harness.
  • kubeProxyReplacement is off, so this checks Kilo's Cilium-overlay
    handling without entangling Cilium's eBPF service LB. KPR coverage
    can be added via a values flag in a follow-up.

Validation note

helm must be available on the runner. The author couldn't run
make e2e-cilium locally (Docker Desktop on macOS), so behaviour will
first be observed in CI; tuning (Cilium version pin, MTU, timeouts)
may be needed once it runs.

Adds a new e2e harness that brings up a kind cluster with Cilium as
the CNI (VXLAN overlay, default kube-proxy) and runs Kilo on top
with --cni=false --compatibility=cilium. This validates the cross
granularity + Cilium combination, which is the configuration shipped
by downstream platforms but not exercised by the existing bridge-CNI
suite.

Files:
- e2e/kilo-kind-cilium.yaml: Kilo DaemonSet for the Cilium-CNI
  cluster (no kilo CNI ConfigMap, no install-cni init container,
  --cni=false, --compatibility=cilium)
- e2e/lib.sh: install_cilium() helper (Helm, Cilium 1.16.5, VXLAN
  tunnelProtocol, IPAM kubernetes, host firewall off) and
  create_cilium_cluster()
- e2e/cilium-setup.sh, cilium-cross-mesh.sh, cilium-teardown.sh:
  three-stage suite mirroring the existing setup/...mesh/teardown
  pattern
- Makefile: new `e2e-cilium` target, kept separate from `e2e`
  because the Cilium cluster is incompatible with the Kilo bridge
  CNI used by the default suite

Kube-proxy replacement is intentionally left at the default (off)
for this baseline; KPR coverage can be added in a follow-up.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Member Author

kvaps commented Apr 28, 2026

Reverting — keeping the upstream PR squat#491 as the canonical one.

@kvaps kvaps closed this Apr 28, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an end-to-end test suite for Kilo's Cilium compatibility mode, including new test scripts, a dedicated Kilo manifest, and helper functions for cluster provisioning. Feedback highlights opportunities to improve test stability by replacing a hardcoded sleep with polling, fixing a namespace typo in the setup logic, and pinning external manifest URLs to specific versions.

Comment thread e2e/cilium-cross-mesh.sh
assert "retry 30 5 '' check_ping" "should be able to ping all Pods over Cilium VXLAN + Kilo cross mesh"
assert "retry 10 5 'the adjacency matrix is not complete yet' check_adjacent 3" "adjacency should return the right number of successful pings"
echo "sleep for 30s (one reconciliation period) and try again..."
sleep 30

medium

Using a hardcoded sleep in end-to-end tests is generally discouraged as it can lead to flaky tests (if the reconciliation takes longer than 30s) or unnecessarily slow tests (if it takes much less). It is better to poll for a specific state or condition that indicates the reconciliation has completed, such as checking for a specific log message or a change in the node/peer status.
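
A minimal polling helper along these lines could replace the fixed sleep (hypothetical sketch; the suite's existing `retry` helper may already serve, and `poll_until` is an invented name):

```shell
# Hypothetical replacement for `sleep 30`: poll for the condition instead of
# unconditionally waiting a full reconciliation period.
# Usage: poll_until MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
poll_until() {
    local attempts=$1 delay=$2 i
    shift 2
    for i in $(seq 1 "$attempts"); do
        # Run the probe; succeed as soon as it does.
        "$@" && return 0
        sleep "$delay"
    done
    return 1
}

# e.g. instead of `sleep 30` followed by a one-shot check:
# poll_until 15 2 check_adjacent 3
```

This bounds the wait at the worst case while letting fast reconciliations finish early.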

Comment thread e2e/lib.sh
install_cilium
block_until_ready kube-system k8s-app=cilium
_kubectl wait nodes --all --for=condition=Ready --timeout=120s
block_until_ready kube_system k8s-app=kube-dns

medium

The namespace kube_system contains an underscore, which is likely a typo for the standard Kubernetes namespace kube-system. This inconsistency with line 177 will cause the block_until_ready call to look in a non-existent namespace, which may lead to a false positive or a timeout depending on how is_ready handles missing namespaces.

Suggested change
block_until_ready kube_system k8s-app=kube-dns
block_until_ready kube-system k8s-app=kube-dns

Comment thread e2e/lib.sh
_kubectl apply -f helper-curl.yaml
block_until_ready_by_name default curl || return 1
_kubectl taint node $KIND_CLUSTER-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
_kubectl apply -f https://raw.githubusercontent.com/kilo-io/adjacency/main/example.yaml

medium

Relying on a raw URL from the main branch of an external repository for test manifests can introduce fragility. If the external file is modified or moved, the e2e suite may break unexpectedly. It is recommended to use a specific commit hash in the URL or to vendor the manifest locally within the repository.
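
One way to pin the manifest, sketched with an invented helper and a placeholder SHA (no real commit is implied):

```shell
# Hypothetical helper that builds a commit-pinned raw URL for the adjacency
# example manifest; the SHA argument is a placeholder to be replaced with a
# real commit from kilo-io/adjacency.
adjacency_url() {
    local sha=$1
    echo "https://raw.githubusercontent.com/kilo-io/adjacency/${sha}/example.yaml"
}

# In lib.sh this would replace the `main`-branch URL, e.g.:
# _kubectl apply -f "$(adjacency_url "$ADJACENCY_SHA")"
```

Vendoring the manifest into e2e/ would remove the network dependency entirely, at the cost of manual updates.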
