docs(networking): Cilium Gateway API — architecture, security, migration#509
✅ Deploy Preview for cozystack ready!
📝 Walkthrough
Adds documentation and platform configuration for a Cilium-based Gateway API: platform opt-in plus per-tenant Gateways, Envoy DaemonSet for TLS termination and passthrough, tenant-scoped load balancer IPs, cert-manager Issuers, validating admission policies, migration notes, and troubleshooting.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant Platform as Platform Helm/Values
    participant Kubernetes
    participant Tenant as Tenant Namespace
    participant CertManager
    participant Envoy as Cilium Envoy DaemonSet
    participant LBPool as CiliumLoadBalancerIPPool
    User->>Platform: enable gateway.enabled
    Platform->>Kubernetes: apply GatewayClass, admission policies, values
    User->>Kubernetes: create tenant with spec.gateway: true
    Kubernetes->>Tenant: render Gateway, HTTPRoute/TLSRoute, Issuer, Certificate
    Tenant->>CertManager: request ACME certificate (prod/stage)
    CertManager-->>Tenant: certificate issued
    Kubernetes->>LBPool: allocate tenant load balancer IP
    Envoy->>Kubernetes: attach/listen (HTTPS + TLS passthrough listeners)
    User->>Envoy: client traffic (terminate or passthrough)
    Envoy->>Tenant: route to services per HTTPRoute/TLSRoute
```
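As a rough illustration of the listener layout in the diagram, the Python sketch below builds a Gateway's `spec.listeners` list as plain dicts. The layout has one TLS-terminating HTTPS listener plus one `Passthrough` listener per TLS-passthrough service, and every listener carries the same namespace whitelist. Only the Gateway API field names (`protocol`, `tls.mode`, `allowedRoutes`) follow the upstream spec. The helper names, the `https` listener name, port 443 on every listener, the hostnames, and the certificate Secret name are hypothetical and are not taken from the Cozystack charts.

```python
"""Sketch: listener list for a per-tenant Gateway (illustrative only).

Hypothetical: helper names, the "https" listener name, port numbers,
hostnames and the certificate Secret name. Field names follow Gateway API.
"""

NAME_LABEL = "kubernetes.io/metadata.name"  # set by kube-apiserver on every Namespace


def namespace_whitelist(tenant_ns, attached_namespaces):
    """Publishing tenant namespace first (always included), then extra entries, deduplicated."""
    result = []
    for ns in [tenant_ns, *attached_namespaces]:
        if ns not in result:
            result.append(ns)
    return result


def allowed_routes(whitelist):
    """allowedRoutes pinned to a matchExpressions whitelist on the namespace-name label."""
    return {
        "namespaces": {
            "from": "Selector",
            "selector": {
                "matchExpressions": [
                    {"key": NAME_LABEL, "operator": "In", "values": list(whitelist)}
                ]
            },
        }
    }


def build_listeners(tenant_ns, attached_namespaces, https_hostname, passthrough):
    """passthrough maps listener name -> SNI hostname, e.g. {"tls-api": "api.example.org"}."""
    whitelist = namespace_whitelist(tenant_ns, attached_namespaces)
    listeners = [
        {
            # TLS is terminated at Envoy; HTTPRoutes attach here.
            "name": "https",
            "protocol": "HTTPS",
            "port": 443,
            "hostname": https_hostname,
            "tls": {
                "mode": "Terminate",
                "certificateRefs": [{"kind": "Secret", "name": "tenant-gateway-tls"}],
            },
            "allowedRoutes": allowed_routes(whitelist),
        }
    ]
    for name, hostname in passthrough.items():
        listeners.append(
            {
                # Envoy forwards the TLS stream untouched; TLSRoutes attach here.
                "name": name,
                "protocol": "TLS",
                "port": 443,
                "hostname": hostname,
                "tls": {"mode": "Passthrough"},
                "allowedRoutes": allowed_routes(whitelist),
            }
        )
    return listeners
```

Consider a call with a tenant namespace, one extra attached namespace, and a single `tls-api` entry. It yields two listeners with identical `allowedRoutes` stanzas, so the whitelist does not depend on which listener a route targets.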
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review
This pull request introduces comprehensive documentation for the Gateway API support in Cozystack, detailing its architecture, security model, and migration path from ingress-nginx. The review feedback identifies opportunities to improve technical accuracy and consistency, specifically by clarifying that namespace whitelisting applies to both HTTPRoute and TLSRoute resources and resolving a naming inconsistency for the Kubernetes API route.
- The exposed-service templates (dashboard, keycloak) stop rendering their `Ingress` and start rendering their `HTTPRoute`.
- TLS-passthrough services (cozystack-api, vm-exportproxy, cdi-uploadproxy) stop rendering their `Ingress` and start rendering a `TLSRoute` attached to a dedicated Passthrough listener.

The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
The documentation mentions that attachedNamespaces restricts HTTPRoute attachments. However, the architecture also utilizes TLSRoute for services like the Kubernetes API and KubeVirt proxies (as shown in the routing tables). It would be more accurate to state that this list applies to both HTTPRoute and TLSRoute (or Gateway API routes in general).
Suggested change:
Current: The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
Suggested: The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s and `TLSRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
Fixed in 1ea0093: the paragraph now says HTTPRoute or TLSRoute. The allowedRoutes whitelist on the listener is route-kind-agnostic, so in practice it restricts every route type that attaches to the Gateway — including the TLSRoutes used for the Kubernetes API, vm-exportproxy, and cdi-uploadproxy.
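The quoted paragraph also says that `attachedNamespaces` is guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry. The real policy is written in CEL, and its match rules and field paths are not shown in this thread. The Python function below is therefore only a sketch of the predicate it enforces. The presumable reason for the rule is that a tenant namespace on the platform-wide whitelist could attach routes to other tenants' Gateways. The function name and message wording are hypothetical.

```python
"""Sketch of the rule "reject any tenant-* entry in attachedNamespaces".

Illustrative only: the real check is a ValidatingAdmissionPolicy written
in CEL; the function name and message wording here are hypothetical.
"""

TENANT_PREFIX = "tenant-"


def attached_namespaces_violations(attached_namespaces):
    """Return one message per offending entry; an empty list means the list would be admitted."""
    return [
        f"attachedNamespaces entry {ns!r} is a tenant namespace and may not be whitelisted"
        for ns in attached_namespaces
        if ns.startswith(TENANT_PREFIX)
    ]
```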
| Service | Namespace | `TLSRoute` name | Backend | Listener |
|---|---|---|---|---|
| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
There is an inconsistency in naming the Kubernetes API route. It is referred to as cozystack-api in the Mermaid diagram (line 27) and the migration section (line 265), but as kubernetes-api in this table. Using a consistent name throughout the document would improve clarity.
Suggested change:
Current: | Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
Suggested: | Kubernetes API | `default` | `cozystack-api` | `kubernetes:443` | `tls-api` |
Fixed in dcb805c by aligning the Mermaid diagram to the real resource name. The TLSRoute is literally named kubernetes-api (see packages/system/cozystack-api/templates/api-tlsroute.yaml), so the table at line 144 is the source of truth. cozystack-api in the diagram referred to the cozystack package that ships this route, which was misleading. The diagram now says kubernetes-api and the migration prose clarifies the relationship (cozystack-api (Kubernetes API)).
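For reference, the Python sketch below assembles a `TLSRoute` matching the table row. The route name, namespace, backend (`kubernetes:443`), and listener (`sectionName: tls-api`) come from the table. The Gateway name and namespace and the hostname are hypothetical placeholders. `apiVersion` is left out because it depends on which Gateway API CRD channel is installed. Because the route lives in `default`, that namespace must be on the listener's namespace whitelist for the attachment to be accepted.

```python
"""Sketch of the Kubernetes API TLSRoute described in the routing table.

From the table: name, namespace, backend kubernetes:443, listener tls-api.
Hypothetical: gateway_name, gateway_namespace and hostname arguments.
"""


def kubernetes_api_tlsroute(gateway_name, gateway_namespace, hostname):
    return {
        "kind": "TLSRoute",
        "metadata": {"name": "kubernetes-api", "namespace": "default"},
        "spec": {
            # sectionName pins the route to the dedicated Passthrough listener.
            "parentRefs": [
                {
                    "name": gateway_name,
                    "namespace": gateway_namespace,
                    "sectionName": "tls-api",
                }
            ],
            # SNI hostname presented by the client; TLS is not terminated at the Gateway.
            "hostnames": [hostname],
            "rules": [{"backendRefs": [{"name": "kubernetes", "port": 443}]}],
        },
    }
```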
Every listener on a tenant Gateway pins `allowedRoutes.namespaces.from: Selector` to a `matchExpressions` whitelist against the built-in `kubernetes.io/metadata.name` label. That label is written by kube-apiserver on every namespace and cannot be spoofed.

The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
Similar to the comment on line 101, this section should clarify that the whitelist applies to both HTTPRoute and TLSRoute, as both are used in the described architecture.
Suggested change:
Current: The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
Suggested: The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` or `TLSRoute` to the Gateway.
Fixed in e15d865: the Layer 1 description now explicitly says HTTPRoute or TLSRoute. Same root cause as the line 101 comment — the listener-level whitelist applies to every route kind attaching to that listener.
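To make the kind-agnostic behaviour concrete, the Python sketch below shows the namespace check a listener applies before accepting a route, covering the `from` values `All`, `Same` (the Gateway API default), and `Selector`. It evaluates `matchLabels` and `matchExpressions` with Kubernetes label-selector semantics. Only the namespace-name label is modelled, because that is the only key the whitelist uses. The function names are hypothetical. The route's kind never enters the decision, so an `HTTPRoute` and a `TLSRoute` from the same namespace get the same answer.

```python
"""Sketch of a listener's allowedRoutes.namespaces check (illustrative only)."""

NAME_LABEL = "kubernetes.io/metadata.name"


def selector_matches(selector, labels):
    """Evaluate a Kubernetes LabelSelector (matchLabels + matchExpressions) against labels."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def namespace_allowed(listener, gateway_namespace, route_namespace):
    """Would a route (of any kind) from route_namespace pass this listener's namespace filter?"""
    namespaces = listener.get("allowedRoutes", {}).get("namespaces", {})
    origin = namespaces.get("from", "Same")  # Gateway API default is Same
    if origin == "All":
        return True
    if origin == "Same":
        return route_namespace == gateway_namespace
    if origin == "Selector":
        # kube-apiserver sets this label to the namespace's own name.
        labels = {NAME_LABEL: route_namespace}
        return selector_matches(namespaces.get("selector", {}), labels)
    raise ValueError(f"unsupported from: {origin}")
```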
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Line 56: The in-page anchor "#tls-passthrough" in the sentence "Plus one extra
listener per TLS-passthrough service (see [TLS passthrough](`#tls-passthrough`)
below)" doesn't match the actual heading ID; locate the "TLS passthrough"
section heading in this document and either rename that heading (or add an
explicit HTML anchor/id) to produce the ID tls-passthrough, or update the link
fragment to the existing heading ID (for example whatever the generated slug
is); ensure the link target and the heading ID for the TLS passthrough section
are identical so the anchor works.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1a003edc-54d7-4122-a90c-e40d9592e1c7
📒 Files selected for processing (1)
content/en/docs/next/networking/gateway-api.md
myasnikovdaniil left a comment
There is no explanation of the `publishing.exposure` flag in the platform package; it needs to be added.
@myasnikovdaniil Added a It covers what the flag does (ingress-nginx Service shape: ClusterIP+externalIPs vs LoadBalancer), why a Gateway API rollout is the natural moment to flip it (so ingress-nginx and the per-tenant Gateway draw from the same Cilium-managed pool), the KEP-5707 deprecation timeline that forces the move before Kubernetes v1.40, and the loadBalancer-mode caveats lifted from the platform values.yaml: non-empty
Platform parameters must also be added to the platform package reference.
…r-tenant ingress Covers the architecture, the two-step opt-in (gateway.enabled at platform level, tenant.spec.gateway per tenant), per-service routing (HTTPRoute for termination, TLSRoute for passthrough), the four independent ValidatingAdmissionPolicies that guard cross-tenant hostname hijacking plus the listener allowedRoutes whitelist, the per-tenant cert-manager Issuer that enables isolated ACME state for child tenants, migration from ingress-nginx, rate-limit considerations, and operational troubleshooting. Weight 15 places the page between 'Architecture' (5) and 'HTTP Cache' (20) in the networking section sidebar. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
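The per-tenant `Issuer` named in this commit is what isolates ACME state. A namespaced cert-manager `Issuer` keeps its ACME account key in a Secret in its own namespace, so each tenant registers its own ACME account. The Python sketch below builds such an Issuer as a dict. The Let's Encrypt production and staging directory URLs and the `gatewayHTTPRoute` HTTP-01 solver field come from upstream cert-manager and Let's Encrypt. The Issuer name, the Secret name, and the assumption that the solver attaches to the tenant's own Gateway are hypothetical and not confirmed by this PR.

```python
"""Sketch of a per-tenant cert-manager ACME Issuer (illustrative only).

Hypothetical: Issuer/Secret names and the choice of a gatewayHTTPRoute
HTTP-01 solver pointing at the tenant's own Gateway.
"""

ACME_SERVERS = {
    "prod": "https://acme-v02.api.letsencrypt.org/directory",
    "stage": "https://acme-staging-v02.api.letsencrypt.org/directory",
}


def tenant_issuer(tenant_ns, gateway_name, server="prod"):
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Issuer",  # namespaced, unlike ClusterIssuer
        "metadata": {"name": "letsencrypt", "namespace": tenant_ns},
        "spec": {
            "acme": {
                "server": ACME_SERVERS[server],
                # Resolved in the Issuer's namespace: one ACME account key per tenant.
                "privateKeySecretRef": {"name": "letsencrypt-account-key"},
                "solvers": [
                    {
                        "http01": {
                            "gatewayHTTPRoute": {
                                "parentRefs": [
                                    {"kind": "Gateway", "name": gateway_name, "namespace": tenant_ns}
                                ]
                            }
                        }
                    }
                ],
            }
        },
    }
```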
…d TLSRoute Address review feedback from gemini-code-assist on content/en/docs/next/networking/gateway-api.md:101: the whitelist guards both HTTPRoute attachments (dashboard, keycloak, harbor, bucket) and TLSRoute attachments (Kubernetes API, vm-exportproxy, cdi-uploadproxy), not only HTTPRoute. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…TLSRoute name kubernetes-api Address review feedback from gemini-code-assist on content/en/docs/next/networking/gateway-api.md:144: the routing table listed the TLSRoute as kubernetes-api (the real resource name in the cozystack-api package, pointing at the kubernetes Service in the default namespace), but the Mermaid diagram labelled it cozystack-api. Update the diagram to match the actual resource name and add a parenthetical clarification in the migration section that the cozystack-api package ships the Kubernetes API TLSRoute. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…te and TLSRoute Address review feedback from gemini-code-assist on content/en/docs/next/networking/gateway-api.md:185: the Security section's Layer 1 description said the listener allowedRoutes whitelist blocks HTTPRoute attachments, but listener.allowedRoutes in Gateway API applies to every route kind attaching to that listener — HTTPRoute on the HTTPS listeners and TLSRoute on the tls-* Passthrough listeners. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…section Address review feedback from coderabbitai on content/en/docs/next/networking/gateway-api.md:56: the link fragment #tls-passthrough did not match the heading ID Hugo generates for 'TLSRoute (TLS passthrough)' (which slugifies to tlsroute-tls-passthrough), so the jump target was broken and markdownlint-cli2 flagged MD051. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
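The broken link comes from how Hugo derives heading IDs. The Python function below approximates Goldmark's default `github` heading-ID style as Hugo applies it. It lowercases the text, drops every character that is not a letter, digit, space, hyphen, or underscore, and then turns each space into a hyphen. It ignores edge cases such as the numeric suffixes Hugo adds to duplicate headings. It reproduces both anchors quoted in this thread. For the second anchor, the sketch assumes the heading text is `publishing.exposure` followed by a spaced em dash and `ingress-nginx Service mode`.

```python
"""Approximation of Hugo/Goldmark "github" auto heading IDs (no duplicate handling)."""

import re


def github_heading_id(heading_text):
    text = heading_text.lower()
    # Keep word characters (letters, digits, underscore), hyphens and spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    # Every space becomes a hyphen; runs are not collapsed.
    return text.replace(" ", "-")
```

A heading that reads exactly `TLS passthrough` would slugify to `tls-passthrough`. The original link therefore only resolves against a heading with that text, not against `TLSRoute (TLS passthrough)`.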
…x Service modes Address review feedback from @myasnikovdaniil: the Migration section referenced `exposure: loadBalancer` in a YAML example without explaining what the flag does. Add a subsection covering both modes (externalIPs vs loadBalancer), the KEP-5707 deprecation timeline that motivates the flip, and the loadBalancer-mode caveats (non-empty externalIPs, externalTrafficPolicy: Local, no built-in Cilium announcement, brief ingress interruption on switch, scope limited to ingress-nginx). Signed-off-by: Aleksei Sviridkin <f@lex.la>
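The Python sketch below models the two `publishing.exposure` modes, using the object shapes described in this thread. `externalIPs` yields a `ClusterIP` Service with `spec.externalIPs`. `loadBalancer` yields a `type: LoadBalancer` Service (with `externalTrafficPolicy: Local`, one of the listed caveats) plus a `CiliumLoadBalancerIPPool` holding the same addresses. An empty address list fails fast. The function name, the dict layout, and the use of one `/32` or `/128` entry per address in the pool's `spec.blocks` are illustrative assumptions, not the chart's actual templates.

```python
"""Sketch of what each publishing.exposure mode renders for ingress-nginx.

Hypothetical: function name and dict layout; the pool uses one host-sized
block per address in spec.blocks (Cilium CiliumLoadBalancerIPPool).
"""

import ipaddress


def render_exposure(exposure, external_ips):
    if exposure == "externalIPs":
        # Relies on Service.spec.externalIPs (deprecated upstream by KEP-5707).
        return [{"kind": "Service",
                 "spec": {"type": "ClusterIP", "externalIPs": list(external_ips)}}]
    if exposure == "loadBalancer":
        if not external_ips:
            # Mirrors the chart's fail-fast check on an empty publishing.externalIPs.
            raise ValueError("publishing.exposure=loadBalancer needs non-empty publishing.externalIPs")
        blocks = []
        for ip in external_ips:
            addr = ipaddress.ip_address(ip)  # validates and handles IPv4/IPv6
            blocks.append({"cidr": f"{addr}/{addr.max_prefixlen}"})
        pool = {"kind": "CiliumLoadBalancerIPPool", "spec": {"blocks": blocks}}
        service = {"kind": "Service",
                   "spec": {"type": "LoadBalancer", "externalTrafficPolicy": "Local"}}
        return [pool, service]
    raise ValueError(f"unknown publishing.exposure mode: {exposure!r}")
```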
Layer 1 of the Security section called the whitelist publishing.gateway.attachedNamespaces. The actual platform values schema (packages/core/platform/values.yaml on chore/gateway-api-crds-v1.5.1) puts attachedNamespaces directly under the root gateway: key, and the helm consumer (packages/core/platform/templates/apps.yaml) reads .Values.gateway.attachedNamespaces. The publishing.gateway path appears in the upstream PR description, the extra/gateway README, and one helm-fail string, but it is not the real config path. Use gateway.attachedNamespaces here to match the schema authors will actually configure. Signed-off-by: Aleksei Sviridkin <f@lex.la>
…meters Address review feedback from @myasnikovdaniil: the platform parameters introduced by the Gateway API rollout (gateway.enabled, gateway.attachedNamespaces) and publishing.exposure were only described in the Gateway API guide. Add them to the Platform Package Reference, which is where operators look up platform values. - publishing.exposure: new row in the Publishing table with both modes, KEP-5707 deprecation pointer, and a cross-reference to the Gateway API page for the full caveat list. - New Gateway section between Authentication and Scheduling, mirroring the schema from packages/core/platform/values.yaml on chore/gateway-api-crds-v1.5.1: gateway.enabled and gateway.attachedNamespaces, with the default whitelist printed verbatim and a forward link to the Gateway API guide. Signed-off-by: Aleksei Sviridkin <f@lex.la>
Force-pushed from 10b1e7c to 6888f84
@myasnikovdaniil Done — added the platform parameters to the Platform Package Reference in 6888f84:
Schema was verified against The branch was rebased onto
🧹 Nitpick comments (2)
content/en/docs/next/operations/configuration/platform-package.md (2)
108-108: Minor: Consider consistent spelling variant. Line 108 uses "Materialising" (British English). While both variants are correct, using consistent spelling throughout the documentation improves polish. Consider "Materializing" if the project prefers American English, or keep the current form if British English is the standard.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@content/en/docs/next/operations/configuration/platform-package.md` at line 108, The documentation uses the British English spelling "Materialising" in the description for `gateway.enabled`; update that word to the project's chosen variant (e.g., change "Materialising" to "Materializing") for consistency with the rest of the docs—edit the text in the `gateway.enabled` description to the preferred spelling.
66-66: Consider breaking up the dense table description for better scannability. The `publishing.exposure` description packs mode definitions, deprecation timeline, validation behavior, and a caveat link into a single paragraph. Users scanning the table may miss the critical deprecation warning or the fail-fast validation note.
♻️ Suggested restructure for improved readability
```diff
-| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707][kep-5707]) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
+| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service.<br/><br/>`externalIPs`: Creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`.<br/><br/>`loadBalancer`: Creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` using the same addresses.<br/><br/>**Deprecation notice:** `Service.spec.externalIPs` is deprecated in Kubernetes v1.36 ([KEP-5707][kep-5707]). Switch to `loadBalancer` before upgrading to v1.40.<br/><br/>**Validation:** The chart returns an error if `loadBalancer` is set with an empty `publishing.externalIPs`.<br/><br/>See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for additional caveats. |
```
This uses `<br/>` tags (permitted by the `unsafe: true` Goldmark config) to create visual breaks within the table cell, making each concept easier to locate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@content/en/docs/next/operations/configuration/platform-package.md` at line 66, The table cell for publishing.exposure is too dense; split its single paragraph into separate sentences or lines (using permitted <br/> tags) that each cover: the two modes and what they do (externalIPs vs loadBalancer and that loadBalancer uses CiliumLoadBalancerIPPool), the deprecation of Service.spec.externalIPs (KEP-5707) with the upgrade advice to switch before v1.40, the validation/fail-fast behavior when loadBalancer is set but publishing.externalIPs is empty, and the link to Gateway API → ingress-nginx Service mode, so readers can scan and find the deprecation and validation notes quickly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@content/en/docs/next/operations/configuration/platform-package.md`:
- Line 108: The documentation uses the British English spelling "Materialising"
in the description for `gateway.enabled`; update that word to the project's
chosen variant (e.g., change "Materialising" to "Materializing") for consistency
with the rest of the docs—edit the text in the `gateway.enabled` description to
the preferred spelling.
- Line 66: The table cell for publishing.exposure is too dense; split its single
paragraph into separate sentences or lines (using permitted <br/> tags) that
each cover: the two modes and what they do (externalIPs vs loadBalancer and that
loadBalancer uses CiliumLoadBalancerIPPool), the deprecation of
Service.spec.externalIPs (KEP-5707) with the upgrade advice to switch before
v1.40, the validation/fail-fast behavior when loadBalancer is set but
publishing.externalIPs is empty, and the link to Gateway API → ingress-nginx
Service mode, so readers can scan and find the deprecation and validation notes
quickly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 144efd7e-3b34-413f-890f-38c874ba5c84
📒 Files selected for processing (2)
content/en/docs/next/networking/gateway-api.md
content/en/docs/next/operations/configuration/platform-package.md
🚧 Files skipped from review as they are similar to previous changes (1)
- content/en/docs/next/networking/gateway-api.md
What this PR does
Adds a new `networking/gateway-api.md` page to the `next/` docs trunk describing the Cilium-backed Gateway API feature that lands in cozystack/cozystack#2470 (and its dependency stack on #2464 / #2468).

The page is intentionally detailed because the feature introduces:
- … `gateway.enabled`) and a new tenant-level toggle (`tenant.spec.gateway`);
- … `Issuer` that gives every tenant an isolated ACME account, so child tenants no longer share HTTP-01 state with the parent;
- … `cozystack-gateway-hostname-policy`, `cozystack-tenant-host-policy`, `cozystack-namespace-host-label-policy`, `cozystack-gateway-attached-namespaces-policy`) plus the listener `allowedRoutes` namespace whitelist;
- … `publishing.gateway.attachedNamespaces` with tenant namespaces.

Sections:
- … `attachedNamespaces` list.
- … `Issuer`, supported ACME servers, Let's Encrypt rate limits and mitigations.
- … `v1alpha2`, `tenant.spec.host` admin responsibility, upstream application gaps.
- … `kubectl` commands for the four most likely "stuck" states.

Target branch
`next/`: the version-agnostic trunk. When cozystack/cozystack#2470 lands in a minor release, this page ships with that version's docs automatically.

Not included
The legacy `v1/networking/gateway-api.md` page on the abandoned `docs/gateway-api` branch (from the Envoy Gateway proposal in cozystack/cozystack#2213) is unrelated to this PR. cozystack/cozystack#2213 proposed a different architecture that has since been superseded; this PR ships fresh docs for the new Cilium-based design.

Release note
Summary by CodeRabbit