docs: add "Route to a Kubernetes service with HA" how-to #810
Open: SunsetDrifter wants to merge 14 commits into main from cc/k8s-ha-routing-peers

14 commits:

- 092dc06 docs: add Highly Available Routing Peers use-case page (Kubernetes op…
- 29283d8 docs: add topology diagrams to HA routing peers page
- fe43dda docs: correct HA scheduling framing; drop single-node diagram
- 3c4beb2 docs: add Friendly DNS names appendix to HA routing peers page
- fa90894 docs: use ScheduleAnyway in spread example; note DoNotSchedule rollou…
- cb7d26e docs: clarify custom-zone records are per-name (no whole-domain shado…
- 182aab2 docs: expand into full 'Route to a Kubernetes service' how-to
- ba2dd4d docs: add dashboard/terminal screenshots to the K8s how-to
- 18e1958 docs: swap in cleaner pods-across-nodes screenshot for Step 5
- 1120be9 docs: make node-spread central to the HA guide
- 87d8b56 docs: clarify the custom zone is created empty (operator fills the re…
- 2d8b12a docs: replace Excalidraw topology with a custom dark-mode SVG
- cce9e07 docs: add CNAME dialog screenshot to the friendly-DNS appendix
- 020d952 docs: drop maxSurge:0 workaround (not configurable via the operator)

- `...integrations/kubernetes/use-cases/route-to-a-kubernetes-service/01-dns-zone.png` (binary file added, +135 KB)
- `...rations/kubernetes/use-cases/route-to-a-kubernetes-service/02-access-policy.png` (binary file added, +186 KB)
- `.../integrations/kubernetes/use-cases/route-to-a-kubernetes-service/03-network.png` (binary file added, +173 KB)
- `...ons/kubernetes/use-cases/route-to-a-kubernetes-service/04-pods-across-nodes.png` (binary file added, +82.8 KB)
- `...tions/kubernetes/use-cases/route-to-a-kubernetes-service/friendly-dns-cname.png` (binary file added, +122 KB)
- `...ge/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/topology.svg` (60 additions, 0 deletions)
- `...ages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx` (223 additions, 0 deletions)

import { Note } from '@/components/mdx'

# Route to a Kubernetes service with high availability

This guide walks through the whole journey: create the NetBird-side pieces the operator doesn't make (a custom DNS zone, groups, and an access policy), deploy a redundant pool of routing peers, expose an in-cluster Service as a NetBird resource, and reach it by name from a NetBird client. Because the routing peers run as a high-availability pool, access keeps working when a routing-peer pod or a node fails.

## What you'll achieve

A NetBird client (for example, your laptop) reaches a private Kubernetes `ClusterIP` Service by a stable DNS name, with traffic flowing through a pool of routing-peer pods spread across your nodes. Lose a pod or a node and clients fail over automatically to a healthy peer.

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/topology.svg" alt="A NetBird client reaching a Kubernetes service through three routing-peer pods, one per node" className="imagewrapper-big"/>
</p>

## Prerequisites

- A Kubernetes cluster (multiple nodes recommended, so routing peers can spread across them).
- The NetBird operator installed; see [Getting Started](/manage/integrations/kubernetes).
- A NetBird account and a [Personal Access Token](/manage/public-api#creating-a-service-user).
- A NetBird client (the device that will reach the service) enrolled in your account.

In this guide, the example objects are named `k8s.company.internal` (DNS zone), `kubernetes-clients` / `kubernetes-services` (groups), `kubernetes` (the network), and `nginx` (the Service). Substitute your own.

## Step 1: Create a custom DNS zone

The operator publishes each exposed Service as a DNS record inside a custom zone, so clients reach it by a stable name instead of by its ClusterIP, which changes if the Service is recreated. The zone must exist **before** you deploy the routing peers.

In the dashboard, go to **DNS > Zones > Add Zone**:

- **Name**: `k8s.company.internal`
- **Distribution Groups**: `kubernetes-clients`. Only peers in these groups can resolve the zone's records. If the group doesn't exist yet, create it first (see [Step 2](#step-2-create-groups-and-an-access-policy)).

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/01-dns-zone.png" alt="The k8s.company.internal custom DNS zone, distributed to the kubernetes-clients group, with the operator-created nginx record" className="imagewrapper"/>
</p>

<Note>
Create only the zone and **leave its records empty**. You don't enter a hostname, IP, or TTL here. When you expose a Service in [Step 4](#step-4-expose-your-service), the operator automatically adds the `A` record, named `<service>.<namespace>.<zone>` (for example, `nginx.default.k8s.company.internal`), pointing at the Service's ClusterIP with a 5-minute TTL. That's the record shown above.
</Note>

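Because the record name is plain string composition, you can work out in advance which name a client should query. A minimal sketch, assuming a POSIX shell; `record_name` is a hypothetical helper for illustration, not something the operator provides:

```shell
# Hypothetical helper (not part of the operator): build the record name
# the operator publishes for a Service, <service>.<namespace>.<zone>.
record_name() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# Example: the nginx Service in the default namespace.
record_name nginx default k8s.company.internal
# prints: nginx.default.k8s.company.internal
```
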
See [Custom Zones](/manage/dns/custom-zones) for details.

## Step 2: Create groups and an access policy

NetBird is deny-by-default: nothing is reachable until a policy allows it, and the operator does **not** create groups or policies for you. Set up two groups and one policy under **Access Control**.

Create the groups via **Access Control > Groups**:

- `kubernetes-clients`: the peers that should reach your services (put your client device in it).
- `kubernetes-services`: the destination group the exposed Services will be placed in.

Then create a policy via **Access Control > Policies > Add policy**:

- **Name**: `kubernetes-access`
- **Source**: `kubernetes-clients`
- **Destination**: `kubernetes-services`
- **Protocol/Ports**: `TCP` `80` (match your Service's port)

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/02-access-policy.png" alt="The kubernetes-access policy allowing kubernetes-clients to reach kubernetes-services over TCP 80" className="imagewrapper"/>
</p>

See [Manage network access](/manage/access-control/manage-network-access).

<Note>
Create both groups **and** the policy before the `NetworkResource` in Step 4. Until they all exist, traffic is denied even though DNS resolves.
</Note>

## Step 3: Deploy the routing peers (HA)

A `NetworkRouter` creates a NetBird network and deploys routing-peer pods. Set `spec.workloadOverride.replicas` to run a redundant pool:

```yaml
apiVersion: netbird.io/v1alpha1
kind: NetworkRouter
metadata:
  name: kubernetes
  namespace: netbird
spec:
  dnsZoneRef:
    name: k8s.company.internal
  workloadOverride:
    replicas: 3
```

```shell
kubectl apply -f networkrouter.yaml
```

The operator registers all replicas in a single routing-peer group with the same metric, so each client connects through its lowest-latency peer and fails over automatically if that peer becomes unreachable (the equal-metric behavior described in [How Routing Peers Work: High availability](/manage/networks/how-routing-peers-work#high-availability)). When `replicas > 1`, it also creates a **PodDisruptionBudget** with `maxUnavailable: 1`, so voluntary evictions such as node drains never take down more than one routing peer at a time. Rolling updates are paced by the Deployment's own update strategy instead, since Kubernetes does not apply PodDisruptionBudgets to a workload's rolling updates.

<Note>
The operator already defaults to **3** replicas. Set the field explicitly to make your intent clear, or raise it for more redundancy. See the [Routing Peer](/manage/integrations/kubernetes/routing-peer) page for the full `NetworkRouter` reference.
</Note>

On a multi-node cluster, the default Kubernetes scheduler already prefers to spread these replicas across nodes, so you usually get node-level high availability without extra configuration: a node failure takes out only one routing peer, and clients fail over to the rest.

To enforce that spread more strictly, or to spread across availability zones, add a constraint through `spec.workloadOverride.podTemplate`:

```yaml
spec:
  workloadOverride:
    replicas: 3
    podTemplate:
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname # or topology.kubernetes.io/zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: kubernetes
```

<Note>
`ScheduleAnyway` is a soft preference: the scheduler favors placements that minimize skew but never leaves a pod unscheduled because of it. For a hard guarantee, use `whenUnsatisfiable: DoNotSchedule`, but keep **more schedulable nodes than replicas**; otherwise a rolling update's surge pod has nowhere to land that satisfies the constraint, and the rollout stalls.
</Note>

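Before switching to `DoNotSchedule`, you can check the node-count condition from the note above. The sketch below uses a hypothetical helper, `check_spread_headroom`, and assumes a POSIX shell and the default `kubectl get nodes` table layout (STATUS in the second column). It counts only nodes whose status is exactly `Ready`, so cordoned (`Ready,SchedulingDisabled`) and `NotReady` nodes are excluded, but it does **not** account for taints, node selectors, or resource pressure:

```shell
# Hypothetical helper: compare schedulable nodes against the replica count.
# Counts nodes whose STATUS is exactly "Ready"; ignores taints, node
# selectors, and resource pressure, so treat the result as a rough check.
check_spread_headroom() {
  replicas=$1
  nodes=$(kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }')
  if [ "$nodes" -gt "$replicas" ]; then
    echo "ok: $nodes schedulable nodes for $replicas replicas"
  else
    echo "warning: only $nodes schedulable nodes for $replicas replicas; a DoNotSchedule rollout may stall" >&2
    return 1
  fi
}

# Usage: check_spread_headroom 3
```
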
## Step 4: Expose your Service

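A `NetworkResource` maps a Kubernetes `ClusterIP` Service to a NetBird resource and creates a DNS record for it in the router's zone. The manifest below deploys a sample `nginx` Deployment and Service, then exposes the Service with a `NetworkResource` placed in the `kubernetes-services` group from Step 2:
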
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels: { app: nginx }
spec:
  replicas: 1
  selector: { matchLabels: { app: nginx } }
  template:
    metadata: { labels: { app: nginx } }
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports: [{ containerPort: 80 }]
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  type: ClusterIP
  selector: { app: nginx }
  ports:
    - { name: http, port: 80, targetPort: 80, protocol: TCP }
---
apiVersion: netbird.io/v1alpha1
kind: NetworkResource
metadata:
  name: nginx
  namespace: default
spec:
  networkRouterRef:
    name: kubernetes
    namespace: netbird
  serviceRef:
    name: nginx
  groups:
    - name: kubernetes-services
```

The Service must be type `ClusterIP`. The operator creates the record `nginx.default.k8s.company.internal` (`<service>.<namespace>.<zone>`) pointing at the Service's ClusterIP. The `kubernetes` network now shows its routing peers and the resource:

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/03-network.png" alt="The kubernetes network with its routing peers and the nginx resource" className="imagewrapper"/>
</p>

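Since the Service must be type `ClusterIP`, it can help to check an existing Service before pointing a `NetworkResource` at it. A minimal sketch, assuming a POSIX shell and `kubectl` access to the cluster; `require_clusterip` is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if the named Service is type ClusterIP.
# Arguments: namespace, service name.
require_clusterip() {
  ns=$1
  svc=$2
  # jsonpath prints just the Service's spec.type field.
  svc_type=$(kubectl -n "$ns" get service "$svc" -o jsonpath='{.spec.type}') || return 1
  if [ "$svc_type" != "ClusterIP" ]; then
    echo "service $ns/$svc is type '$svc_type', expected ClusterIP" >&2
    return 1
  fi
  echo "service $ns/$svc is ClusterIP"
}

# Usage: require_clusterip default nginx
```
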
## Step 5: Verify and test failover

Confirm the routing-peer pods are running, spread across nodes, and protected by a PodDisruptionBudget:

```shell
kubectl -n netbird get pods -l app.kubernetes.io/name=networkrouter -o wide
kubectl -n netbird get pdb
```

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/04-pods-across-nodes.png" alt="kubectl get pods -o wide showing the routing peers on different nodes" className="imagewrapper"/>
</p>

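With more than a handful of replicas, counting pods per node is easier than scanning the `-o wide` output. A minimal sketch, assuming a POSIX shell; `pods_per_node` is a hypothetical helper that reads one node name per line (as produced by `kubectl`'s `custom-columns` output) and flags any node hosting more than one routing peer:

```shell
# Hypothetical helper: read one node name per line on stdin, print
# "<node> <pod count>", and warn if any node hosts more than one pod.
# Pods that are not yet scheduled show up under the node name "<none>".
pods_per_node() {
  sort | uniq -c | awk '
    { print $2, $1; if ($1 > 1) shared = 1 }
    END { if (shared) print "WARNING: at least one node runs more than one routing peer" }'
}

# Usage, with the label selector from the command above:
#   kubectl -n netbird get pods -l app.kubernetes.io/name=networkrouter \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | pods_per_node
```

With `replicas: 3` on three or more nodes and a working spread, each node should report a count of 1.
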
From a NetBird client in `kubernetes-clients`, resolve and reach the service:

```shell
curl http://nginx.default.k8s.company.internal/
```

Then delete one routing-peer pod (or drain its node) while running `curl` against the service in a loop. Requests should keep succeeding as another peer takes over, possibly after a brief pause while the client switches peers, and the Deployment replaces the deleted pod:

```shell
kubectl -n netbird delete pod <routing-peer-pod>
```

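For the loop itself, a minimal sketch, assuming a POSIX shell with `curl` installed on the NetBird client; `probe_service` is a hypothetical helper that prints one timestamped line per request, showing the HTTP status code or `FAIL` when the request doesn't complete:

```shell
# Hypothetical helper: request URL COUNT times, INTERVAL seconds apart,
# printing the time and HTTP status code, or FAIL if curl cannot complete
# the request within 2 seconds.
probe_service() {
  url=$1
  count=${2:-60}
  interval=${3:-1}
  i=0
  while [ "$i" -lt "$count" ]; do
    if code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url"); then
      echo "$(date +%H:%M:%S) $code"
    else
      echo "$(date +%H:%M:%S) FAIL"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}

# Usage: run this on the client, then delete a routing-peer pod from another terminal.
#   probe_service http://nginx.default.k8s.company.internal/ 120 1
```

Any `FAIL` lines likely mark the window in which the client was switching peers. To simulate a node failure instead, `kubectl drain <node> --ignore-daemonsets` evicts its pods (subject to the PodDisruptionBudget), and `kubectl uncordon <node>` makes the node schedulable again afterwards.
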
## Next Steps

- [Routing Peer](/manage/integrations/kubernetes/routing-peer): the `NetworkRouter` / `NetworkResource` reference.
- [How Routing Peers Work](/manage/networks/how-routing-peers-work): failover, metrics, and access control.

## Appendix: Friendly DNS names

Each `NetworkResource` is published at `<service>.<namespace>.<zone>` (here, `nginx.default.k8s.company.internal`). The operator always uses this form, and it can't be customized on the `NetworkResource`.

To expose a service under a cleaner name, add a **CNAME** in a [custom DNS zone](/manage/dns/custom-zones) pointing at the operator's record:

```text
app.k8s.company.internal CNAME nginx.default.k8s.company.internal
```

In the dashboard, open **DNS > Zones** and use the zone's **Add** button to create a `CNAME` record with hostname `app` and the operator's record as the target.

<p>
  <img src="/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/friendly-dns-cname.png" alt="Adding a CNAME record with hostname app targeting nginx.default.k8s.company.internal in the k8s.company.internal zone" className="imagewrapper"/>
</p>

Because it targets the operator-managed record, the alias keeps resolving if the service's ClusterIP changes. A static `A` record straight to the ClusterIP also works, but it goes stale when the ClusterIP changes, so prefer the CNAME.

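To confirm that the alias and the operator's record resolve to the same address, here is a minimal sketch, assuming a Linux client where `getent` is available (it goes through the system resolver, the same path other applications use); `resolves_same` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if both names resolve and their first
# addresses match. Uses getent, so it follows the system resolver config.
resolves_same() {
  a=$(getent hosts "$1" | awk '{ print $1; exit }')
  b=$(getent hosts "$2" | awk '{ print $1; exit }')
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   resolves_same app.k8s.company.internal nginx.default.k8s.company.internal \
#     && echo "alias OK" || echo "alias does not match"
```

The non-empty check matters: without it, two names that both fail to resolve would compare as equal.
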
<Note>
The friendly name is only a DNS alias: traffic still routes through the `NetworkResource`, so keep it in place. NetBird serves only the specific records you add to a zone; other names under the same domain keep resolving through your existing DNS, so reusing a real internal domain is safe. Just avoid a name that already exists in your corporate DNS. These manual records are not managed by the operator, so you maintain them yourself.
</Note>

Topology spread is already set by default, so there's no need to do this.