A simple kube-proxy addon for 1:1 NAT services in Kubernetes using an NFT backend.
This project ensures a one-to-one mapping between a service and a pod in Kubernetes.
At Cozystack, we strive to follow the standard Kubernetes network architecture by separating the pod network, service networks, and external load balancers. However, our platform also runs virtual machines that sometimes require an external IP address.
There are several ways to achieve this:
- Using a separate Kube-OVN subnet and exposing it via BGP with kube-ovn-speaker.
- Adding a secondary interface with Multus.
- Using native Kubernetes services with externalIPs and exposing them via MetalLB.
The last option is the simplest and most flexible, but it has a limitation: Kubernetes services do not forward all traffic, but only traffic on specific ports (see: Kubernetes Issue #23864). Additionally, kube-proxy does not perform SNAT, which causes outgoing traffic from the pod to use the default gateway of the host where it is running.
To address these issues, we have added an additional controller that performs NAT for services carrying the service.kubernetes.io/service-proxy-name: cozy-proxy label.
cozy-proxy is a simple Kubernetes controller that watches for services labeled with service.kubernetes.io/service-proxy-name: cozy-proxy — the standard Kubernetes mechanism for delegating a service to a non-default proxy. kube-proxy skips services carrying this label, so cozy-proxy becomes the sole handler and no rules collide.
When it finds such a service, it creates NFT rules that forward traffic from the service's external IP to the pod's IP and vice versa, performing source-IP preservation for egress traffic.
This controller can be used together with kube-proxy and Cilium in kube-proxy replacement mode.
The label is the only selector. Services that do not carry it are not managed by cozy-proxy, regardless of any annotations they may have.
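For example, an existing Service (hypothetically named `my-service`) can be handed over to cozy-proxy simply by setting the label:

```bash
kubectl label service my-service \
  service.kubernetes.io/service-proxy-name=cozy-proxy
```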
The optional networking.cozystack.io/wholeIP annotation selects the ingress mode for a managed service:
| Value | Behavior |
|---|---|
| `"true"` | Whole-IP passthrough. All TCP/UDP traffic to the LoadBalancer IP is forwarded to the backend pod. |
| `"false"` | Per-port filtering (default). Only TCP/UDP traffic to ports listed in `Service.spec.ports` is forwarded; the rest is dropped. |
| absent | Same as `"false"`: per-port filtering. |
In both modes, egress traffic from the backend pod is SNATed to the LoadBalancer IP for source-IP preservation.
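For instance, a Service left in the default per-port mode needs only the label; the name, selector, and port below are illustrative, not part of the project:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-web   # hypothetical name
  labels:
    service.kubernetes.io/service-proxy-name: cozy-proxy
  # no wholeIP annotation: per-port filtering, only port 443 is forwarded
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: web          # hypothetical selector
  ports:
  - port: 443
    protocol: TCP
```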
The optional networking.cozystack.io/allowICMP: "true" annotation, only
meaningful in port-filter mode (wholeIP: "false"), accepts ICMP traffic
toward the backend pod IP that would otherwise be dropped by the per-port
filter. Without it, all ICMP to a port-filtered pod is dropped — which also
blocks ping, PMTU discovery (ICMP "fragmentation needed"), and ICMP
unreachable signalling. Recommended for any service where path-MTU mismatches
or observability matter.
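Continuing the hypothetical `example-web` Service sketched above, the annotation can be added in place, for example with `kubectl annotate`:

```bash
kubectl annotate service example-web \
  networking.cozystack.io/allowICMP=true
```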
The nftables ruleset, placed in table `ip cozy_proxy`, consists of the following chains (a simplified sketch of the ruleset follows this list):

- Chain `egress_snat` at priority `raw` (-300): rewrites the packet source IP via the `pod_svc` map for outbound traffic from managed pods. Runs before conntrack so the recorded tuple has `saddr = LB_IP`.
- Chain `ingress_dnat` at priority `mangle` (-150): rewrites the packet destination IP via the `svc_pod` map for inbound traffic to a LoadBalancer IP. Runs after conntrack so reply packets of egress flows are matched correctly.
- Chain `port_filter` at priority `filter` (0): for Services in port-filter mode (`wholeIP: "false"`), drops ingress packets whose `(daddr, l4proto, dport)` is not in `allowed_ports`. The chain accepts packets in conntrack states `established` or `related` first, so reply packets of egress flows bypass the filter even when their dport is the VM's ephemeral source port. ICMP is dropped by default; if the `allowICMP: "true"` annotation is set, the pod IP is added to `icmp_allowed_pods` and ICMP toward it is accepted before the drop rule.
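As an illustration, here is a minimal sketch of what such a ruleset could look like. The hook choices, element types, and the `filtered_pods` helper set are assumptions made for readability rather than the controller's actual output; inspect a node with `nft list table ip cozy_proxy` to see the real rules.

```
table ip cozy_proxy {
    # Maps and sets named in the list above; element types are assumed here.
    map pod_svc {
        type ipv4_addr : ipv4_addr                   # pod IP -> LoadBalancer IP
    }
    map svc_pod {
        type ipv4_addr : ipv4_addr                   # LoadBalancer IP -> pod IP
    }
    set allowed_ports {
        type ipv4_addr . inet_proto . inet_service   # pod IP . protocol . port
    }
    set icmp_allowed_pods {
        type ipv4_addr
    }
    set filtered_pods {
        type ipv4_addr                               # hypothetical: pods in port-filter mode
    }

    chain egress_snat {
        # Before conntrack, so the flow is recorded with saddr = LB IP.
        type filter hook prerouting priority raw; policy accept;
        ip saddr set ip saddr map @pod_svc
    }

    chain ingress_dnat {
        # After conntrack, so replies of egress flows are matched correctly.
        type filter hook prerouting priority mangle; policy accept;
        ip daddr set ip daddr map @svc_pod
    }

    chain port_filter {
        type filter hook prerouting priority filter; policy accept;
        ct state established,related accept                        # replies bypass the filter
        ip protocol icmp ip daddr @icmp_allowed_pods accept        # allowICMP: "true"
        ip daddr . meta l4proto . th dport @allowed_ports accept   # ports from Service.spec.ports
        ip daddr @filtered_pods drop
    }
}
```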
Install the controller using the Helm chart:
```bash
helm install cozy-proxy charts/cozy-proxy -n kube-system
```

Create a LoadBalancer service with the `service.kubernetes.io/service-proxy-name: cozy-proxy` label. The default mode is per-port filtering, so to forward every port to the backend pod (whole-IP passthrough), add the `networking.cozystack.io/wholeIP: "true"` annotation:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service
  labels:
    service.kubernetes.io/service-proxy-name: cozy-proxy
  annotations:
    networking.cozystack.io/wholeIP: "true"
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Local
  ports:
  - port: 65535 # any
  selector:
    app: nginx
  type: LoadBalancer
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: docker.io/library/nginx:alpine
```

Check that the service has an external IP:
```bash
kubectl get svc
```

Example output:

```
NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
example-service   LoadBalancer   10.96.195.46   1.2.3.4       65535/TCP   84s
```

Now try to access the service using ICMP and TCP; both should work:
```bash
ping 1.2.3.4
curl 1.2.3.4
```

Check the external IP from inside the pod:
```bash
kubectl exec -ti nginx -- curl icanhazip.com
```

The output should be the same as the service's external IP:

```
1.2.3.4
```

This controller was developed primarily for the Cozystack platform and has been tested in the following environment:
- OS: Talos Linux
- CNI: Kube-OVN with Cilium in chaining mode.
- Kube-proxy: Cilium in kube-proxy replacement mode.
- LoadBalancer: MetalLB in L2 mode with `externalTrafficPolicy: Local`.
If you have tested it in other environments, please let us know.
Credits:

- @kvaps – for the implementation.
- @hexchain – for the Stateless NAT with NFTables snippet.
- @danwinship – for the idea regarding the annotation.