5 changes: 5 additions & 0 deletions Makefile
@@ -35,6 +35,8 @@ endif

LINUX_VERSION ?= $(shell uname -r)
KDIR ?= /lib/modules/$(LINUX_VERSION)/build
#CC = gcc-8
CC = gcc

LINUX_SRC_DIR ?= ../net-next

@@ -106,3 +108,6 @@ printClean-%:
$(MAKE) -C $(KDIR) M=$(shell pwd) $@

endif

# Prevents warnings related to the __init annotation for homa_load.
CFLAGS_homa_plumbing.o += -Wno-missing-attributes
109 changes: 109 additions & 0 deletions UDP_HIJACK.md
@@ -0,0 +1,109 @@
# UDP Hijacking for Homa

## Overview

UDP hijacking is an optional mechanism that encapsulates Homa packets as UDP
datagrams, using `IPPROTO_UDP` instead of `IPPROTO_HOMA` as the IP protocol.
It works alongside the existing TCP hijacking feature, but only one of the two
can be active on a given socket.

### Why UDP hijacking?

TCP hijacking uses `SYN+RST` flag combinations that never occur in real TCP
traffic. However, some firewalls (particularly in virtualized environments)
inspect TCP flags and drop packets with these "impossible" flag combinations.
UDP hijacking avoids this issue entirely since UDP has no flags for firewalls
to inspect.

### Trade-offs vs TCP hijacking

| Feature | TCP hijacking | UDP hijacking |
|---------------------|------------------------|------------------------|
| NIC TSO support | Yes (multi-segment) | No (single-segment) |
| Firewall friendly | No (SYN+RST blocked) | Yes |
| GSO segments/packet | Multiple | 1 (`segs_per_gso = 1`) |
| IP protocol | `IPPROTO_TCP` | `IPPROTO_UDP` |
| sysctl | `hijack_tcp` | `hijack_udp` |

Because NICs do not perform TSO on UDP packets the same way they do for TCP,
UDP hijacking forces `segs_per_gso = 1` (one segment per GSO packet). This
means each Homa data packet is sent individually rather than being batched
into large TSO super-packets.

## Configuration

Enable UDP hijacking at runtime via sysctl:

```bash
# Enable UDP hijacking (disable TCP hijacking first if it was on)
sudo sysctl net.homa.hijack_tcp=0
sudo sysctl net.homa.hijack_udp=1
```

To switch back to TCP hijacking:

```bash
sudo sysctl net.homa.hijack_udp=0
sudo sysctl net.homa.hijack_tcp=1
```

**Note:** If both `hijack_tcp` and `hijack_udp` are set, TCP hijacking takes
priority (sockets opened while both are set will use TCP).
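
The priority rule in the note above can be sketched as a small userspace C function. This is only an illustration under stated assumptions: `sketch_homa_cfg`, `sketch_pick_protocol`, and `SKETCH_IPPROTO_HOMA` (including its value) are hypothetical names introduced here, and the real decision is made in `homa_hijack_sock_init()`, which may be structured differently.

```c
#include <netinet/in.h>  /* IPPROTO_TCP, IPPROTO_UDP */

/* Assumption: Homa's IP protocol number. Defined locally so the sketch
 * compiles without the Homa headers.
 */
#define SKETCH_IPPROTO_HOMA 146

/* Hypothetical stand-in for the two sysctl-backed fields in struct homa. */
struct sketch_homa_cfg {
	int hijack_tcp;
	int hijack_udp;
};

/* Returns the IP protocol that a newly opened socket would use. */
static int sketch_pick_protocol(const struct sketch_homa_cfg *cfg)
{
	if (cfg->hijack_tcp)
		return IPPROTO_TCP;      /* TCP wins when both are set. */
	if (cfg->hijack_udp)
		return IPPROTO_UDP;
	return SKETCH_IPPROTO_HOMA;      /* No hijacking: native Homa. */
}
```

Because the choice is made when the socket is opened, changing either sysctl afterwards would only affect sockets created later, which is why the commands above disable one mode before enabling the other.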

## How It Works

### Sending (outgoing packets)

1. **Socket initialization** (`homa_hijack_sock_init`): When a new Homa socket
   is created and `hijack_udp` is set, the socket's `sk_protocol` is set to
   `IPPROTO_UDP`. The kernel then transmits the socket's packets with UDP as
   the IP protocol.

2. **Header setup** (`homa_udp_hijack_set_hdr`): Before transmission, Homa
   writes UDP-compatible header fields:
   - `flags` is set to `HOMA_HIJACK_FLAGS` (6), a marker value.
   - `urgent` is set to `HOMA_HIJACK_URGENT` (0xb97d), a second marker.
   - Bytes 4-5 of the transport header are overwritten with the UDP length.
   - Bytes 6-7 are set up for proper UDP checksum offload.
   - Because the sequence field (bytes 4-7) is overwritten, the packet offset
     is stored in `seg.offset` instead.

3. **GSO geometry**: With UDP hijacking, `segs_per_gso` is forced to 1 (no
   multi-segment GSO batching).
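
The header setup in step 2 can be sketched in userspace C over a raw byte buffer. This is a rough illustration, not the code in `homa_udp_hijack_set_hdr()`: the function name, the `SKETCH_*` constants, and the byte offsets for `flags` (13) and `urgent` (18-19) are assumptions based on a TCP-style header layout, and checksum offload is not modeled.

```c
#include <stdint.h>

#define SKETCH_HIJACK_FLAGS  6       /* Marker value quoted in this document. */
#define SKETCH_HIJACK_URGENT 0xb97d  /* Marker value quoted in this document. */

/* Byte offsets within the transport header. The first two follow the UDP
 * header layout; the last two are assumptions (TCP flags and urgent pointer
 * positions).
 */
enum {
	SKETCH_OFF_UDP_LEN  = 4,   /* Bytes 4-5: UDP length. */
	SKETCH_OFF_UDP_CSUM = 6,   /* Bytes 6-7: UDP checksum. */
	SKETCH_OFF_FLAGS    = 13,
	SKETCH_OFF_URGENT   = 18,
};

/* Stores a 16-bit value in network (big-endian) byte order. */
static void sketch_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* hdr points at the start of the transport header (at least 20 bytes);
 * udp_len is the transport header plus payload length in bytes.
 */
static void sketch_udp_hijack_set_hdr(uint8_t *hdr, uint16_t udp_len)
{
	hdr[SKETCH_OFF_FLAGS] = SKETCH_HIJACK_FLAGS;
	sketch_put_be16(hdr + SKETCH_OFF_URGENT, SKETCH_HIJACK_URGENT);
	sketch_put_be16(hdr + SKETCH_OFF_UDP_LEN, udp_len);
	/* Placeholder only: the real code prepares this field for checksum
	 * offload rather than zeroing it.
	 */
	sketch_put_be16(hdr + SKETCH_OFF_UDP_CSUM, 0);
}
```

Writing the length and checksum into bytes 4-7 is what clobbers the sequence field, which is why the packet offset has to travel in `seg.offset`.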

### Receiving (incoming packets)

1. **GRO interception** (`homa_udp_hijack_gro_receive`): Homa hooks into the
   UDP GRO pipeline. When a UDP packet arrives, Homa checks that:
   - At least 20 bytes of transport header are available.
   - `flags == HOMA_HIJACK_FLAGS` and `urgent == HOMA_HIJACK_URGENT`.

2. If both checks pass, the packet is treated as a Homa-over-UDP packet: the
   IP protocol is rewritten to `IPPROTO_HOMA` and the packet is handed to
   Homa's normal GRO handler. All other UDP packets are passed through to the
   normal UDP stack.
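
The receive-side test can be sketched the same way. As before, this is a userspace illustration under assumptions rather than the logic of `homa_udp_hijack_gro_receive()`: the function name, the `SKETCH_*` constants, and the offsets of `flags` (byte 13) and `urgent` (bytes 18-19) are hypothetical and follow a TCP-style layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HIJACK_FLAGS  6       /* Marker value quoted in this document. */
#define SKETCH_HIJACK_URGENT 0xb97d  /* Marker value quoted in this document. */
#define SKETCH_MIN_HDR       20      /* Minimum transport header bytes needed. */
#define SKETCH_OFF_FLAGS     13      /* Assumption: TCP flags position. */
#define SKETCH_OFF_URGENT    18      /* Assumption: TCP urgent pointer position. */

/* Reads a 16-bit value stored in network (big-endian) byte order. */
static uint16_t sketch_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns true if the transport header carries both hijack markers.
 * hdr points at the transport header; avail is how many bytes of it are
 * readable.
 */
static bool sketch_is_udp_hijacked(const uint8_t *hdr, size_t avail)
{
	if (avail < SKETCH_MIN_HDR)
		return false;   /* Too short: treat as ordinary UDP. */
	return hdr[SKETCH_OFF_FLAGS] == SKETCH_HIJACK_FLAGS &&
	       sketch_get_be16(hdr + SKETCH_OFF_URGENT) == SKETCH_HIJACK_URGENT;
}
```

A packet that fails either check is left untouched, so ordinary UDP traffic continues through the regular UDP stack.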

### Qdisc support

The `is_homa_pkt()` function in `homa_qdisc.c` recognizes both TCP-hijacked
and UDP-hijacked packets, ensuring they receive proper Homa qdisc treatment.

## Files Modified

| File | Changes |
|-------------------|------------------------------------------------------------|
| `homa_wire.h` | No new defines needed (reuses `HOMA_HIJACK_FLAGS` and `HOMA_HIJACK_URGENT`) |
| `homa_impl.h` | Added `hijack_udp` field to `struct homa` |
| `homa_hijack.h` | Added `homa_udp_hijack_set_hdr()`, `homa_sock_udp_hijacked()`, `homa_skb_udp_hijacked()`; updated `homa_hijack_sock_init()` |
| `homa_hijack.c` | Added `homa_udp_hijack_init()`, `homa_udp_hijack_end()`, `homa_udp_hijack_gro_receive()` |
| `homa_outgoing.c` | Added `segs_per_gso=1` for UDP; added UDP header calls in xmit paths |
| `homa_plumbing.c` | Added `hijack_udp` sysctl; added UDP init/end calls |
| `homa_qdisc.c` | Added `IPPROTO_UDP` check in `is_homa_pkt()` |
| `util/homa_test.cc` | Added `udp_ping()`, `test_udp()`, "udp" test command |
| `util/server.cc` | Added `udp_server()` function |
| `util/cp_node.cc` | Added `udp_server` and `udp_client` classes, "udp" protocol option |

## Key Constants

| Constant | Value | Purpose |
|----------------------|----------|------------------------------------------------------|
| `HOMA_HIJACK_FLAGS` | 6 | Marker in the `flags` field (shared with TCP hijack) |
| `HOMA_HIJACK_URGENT` | 0xb97d | Marker in the `urgent` field (shared with TCP hijack)|
8 changes: 4 additions & 4 deletions homa_devel.c
@@ -1266,6 +1266,8 @@ void homa_validate_rbtree(struct rb_node *node, int depth, char *message)
tt_printk();
BUG_ON(1);
}
#else
return;
#endif /* __UNIT_TEST__ */
}
#endif /* See strip.py */
@@ -1286,11 +1288,9 @@ int homa_tcp_checksum(struct sk_buff *skb)
data_csum = skb_checksum(skb, skb_transport_offset(skb), tcp_len, 0);

if (skb_is_ipv6(skb)) {
-const struct ipv6hdr *ip6h = ipv6_hdr(skb);
-
// Fold the manual sum with the IPv6 pseudo-header
-return csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, tcp_len,
-IPPROTO_TCP, data_csum);
+return csum_ipv6_magic(&ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr,
+tcp_len, IPPROTO_TCP, data_csum);
} else {
const struct iphdr *iph = ip_hdr(skb);

3 changes: 2 additions & 1 deletion homa_grant.c
@@ -74,6 +74,7 @@ static struct ctl_table grant_ctl_table[] = {
.mode = 0644,
.proc_handler = homa_grant_dointvec
},
{}
};
#endif /* See strip.py */

@@ -1166,7 +1167,7 @@ void homa_grant_update_sysctl_deps(struct homa_grant *grant)
*
* Return: 0 for success, nonzero for error.
*/
-int homa_grant_dointvec(const struct ctl_table *table, int write,
+int homa_grant_dointvec(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table table_copy;
2 changes: 1 addition & 1 deletion homa_grant.h
@@ -236,7 +236,7 @@ void homa_grant_cand_check(struct homa_grant_candidates *cand,
struct homa_grant *grant);
void homa_grant_check_fifo(struct homa_grant *grant);
void homa_grant_check_rpc(struct homa_rpc *rpc);
-int homa_grant_dointvec(const struct ctl_table *table, int write,
+int homa_grant_dointvec(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
void homa_grant_end_rpc(struct homa_rpc *rpc);
void homa_grant_find_oldest(struct homa_grant *grant);
25 changes: 19 additions & 6 deletions homa_impl.h
@@ -37,11 +37,10 @@
#include <linux/vmalloc.h>
#include <net/icmp.h>
#include <net/ip.h>
#include <net/ip6_route.h>
#include <net/netns/generic.h>
#include <net/protocol.h>
#include <net/inet_common.h>
#include <net/gro.h>
#include <net/rps.h>

#ifndef __UPSTREAM__ /* See strip.py */
#include "homa.h"
@@ -72,6 +71,11 @@ struct homa_peer;
struct homa_rpc;
struct homa_sock;

/* Features not present in all kernels: */
#ifndef __cond_acquires
#define __cond_acquires(x)
#endif

#ifndef __STRIP__ /* See strip.py */
#include "timetrace.h"
#include "homa_metrics.h"
@@ -354,6 +358,13 @@ struct homa {
*/
int hijack_tcp;

/**
* @hijack_udp: Non-zero means encapsulate outgoing Homa packets
* as UDP packets (i.e. use UDP as the IP protocol). Set externally
* via sysctl.
*/
int hijack_udp;

/**
* @max_gro_skbs: Maximum number of socket buffers that can be
* aggregated by the GRO mechanism. Set externally via sysctl.
@@ -661,7 +672,9 @@ static inline bool is_homa_pkt(struct sk_buff *skb)
ip_hdr(skb)->protocol;
return (protocol == IPPROTO_HOMA ||
(protocol == IPPROTO_TCP &&
-tcp_hdr(skb)->urg_ptr == htons(HOMA_TCP_URGENT)));
+tcp_hdr(skb)->urg_ptr == htons(HOMA_TCP_URGENT)) ||
+(protocol == IPPROTO_UDP &&
+tcp_hdr(skb)->urg_ptr == htons(HOMA_UDP_URGENT)));
return protocol == IPPROTO_HOMA;
}
#endif /* See strip.py */
@@ -736,7 +749,7 @@ int homa_net_start(struct net *net);
__poll_t homa_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait);
int homa_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
-int flags, int *addr_len);
+int flags, int noblock, int *addr_len);
void homa_request_retrans(struct homa_rpc *rpc);
void homa_resend_pkt(struct sk_buff *skb, struct homa_rpc *rpc,
struct homa_sock *hsk);
@@ -768,7 +781,7 @@ void homa_xmit_unknown(struct sk_buff *skb, struct homa_sock *hsk);

#ifndef __STRIP__ /* See strip.py */
void homa_cutoffs_pkt(struct sk_buff *skb, struct homa_sock *hsk);
-int homa_dointvec(const struct ctl_table *table, int write,
+int homa_dointvec(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
void homa_incoming_sysctl_changed(struct homa *homa);
int homa_ioc_abort(struct socket *sock, unsigned long arg);
@@ -777,7 +790,7 @@ int homa_message_in_init(struct homa_rpc *rpc, int length,
void homa_prios_changed(struct homa *homa);
void homa_resend_data(struct homa_rpc *rpc, int start, int end,
int priority);
-int homa_sysctl_softirq_cores(const struct ctl_table *table,
+int homa_sysctl_softirq_cores(struct ctl_table *table,
int write, void *buffer, size_t *lenp,
loff_t *ppos);
int homa_unsched_priority(struct homa *homa, struct homa_peer *peer,
24 changes: 8 additions & 16 deletions homa_incoming.c
@@ -167,13 +167,11 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
struct homa_gap *gap, *dummy, *gap2;
int start = ntohl(h->seg.offset);
int length = homa_data_len(skb);
-enum skb_drop_reason reason;
int end = start + length;

if ((start + length) > rpc->msgin.length) {
tt_record3("Packet extended past message end; id %d, offset %d, length %d",
rpc->id, start, length);
-reason = SKB_DROP_REASON_PKT_TOO_BIG;
goto discard;
}

@@ -189,7 +187,6 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
rpc->msgin.recv_end, start)) {
tt_record2("Couldn't allocate gap for id %d (start %d): no memory",
rpc->id, start);
-reason = SKB_DROP_REASON_NOMEM;
goto discard;
}
rpc->msgin.recv_end = end;
@@ -207,13 +204,11 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
if (start < gap->start) {
tt_record4("Packet overlaps gap start: id %d, start %d, end %d, gap_start %d",
rpc->id, start, end, gap->start);
-reason = SKB_DROP_REASON_DUP_FRAG;
goto discard;
}
if (end > gap->end) {
tt_record4("Packet overlaps gap end: id %d, start %d, end %d, gap_end %d",
rpc->id, start, end, gap->start);
-reason = SKB_DROP_REASON_DUP_FRAG;
goto discard;
}
gap->start = end;
@@ -233,7 +228,6 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
if (end > gap->end) {
tt_record4("Packet overlaps gap end: id %d, start %d, end %d, gap_end %d",
rpc->id, start, end, gap->start);
-reason = SKB_DROP_REASON_DUP_FRAG;
goto discard;
}
gap->end = start;
@@ -245,15 +239,13 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
if (!gap2) {
tt_record2("Couldn't allocate gap for split for id %d (start %d): no memory",
rpc->id, end);
-reason = SKB_DROP_REASON_NOMEM;
goto discard;
}
gap2->time = gap->time;
gap->start = end;
goto keep;
}
/* Packet doesn't overlap any gap, so it is a duplicate. */
-reason = SKB_DROP_REASON_DUP_FRAG;

discard:
#ifndef __STRIP__ /* See strip.py */
@@ -264,7 +256,7 @@ void homa_add_packet(struct homa_rpc *rpc, struct sk_buff *skb)
#endif /* See strip.py */
tt_record4("homa_add_packet discarding packet for id %d, offset %d, length %d, retransmit %d",
rpc->id, start, length, h->retransmit);
-kfree_skb_reason(skb, reason);
+kfree_skb(skb);
return;

keep:
@@ -360,6 +352,7 @@ int homa_copy_to_user(struct homa_rpc *rpc)
int offset = ntohl(h->seg.offset);
int buf_bytes, chunk_size;
struct iov_iter iter;
struct iovec iov;
int copied = 0;
char __user *dst;

@@ -379,13 +372,12 @@ int homa_copy_to_user(struct homa_rpc *rpc)
}
chunk_size = buf_bytes;
}
-error = import_ubuf(READ, dst, chunk_size,
-&iter);
-if (error)
-goto free_skbs;
+iov.iov_base = dst;
+iov.iov_len = chunk_size;
+iov_iter_init(&iter, READ, &iov, 1, chunk_size);
error = skb_copy_datagram_iter(skbs[i],
sizeof(*h) +
-copied, &iter,
+copied, &iter,
chunk_size);
if (error)
goto free_skbs;
@@ -459,8 +451,8 @@ void homa_dispatch_pkts(struct sk_buff *skb)
hsk = homa_sock_find(hnet, dport);
if (!hsk || (!homa_is_client(id) && !hsk->is_server)) {
if (skb_is_ipv6(skb))
-icmp6_send(skb, ICMPV6_DEST_UNREACH,
-ICMPV6_PORT_UNREACH, 0, NULL, IP6CB(skb));
+icmpv6_send(skb, ICMPV6_DEST_UNREACH,
+ICMPV6_PORT_UNREACH, 0);
else
icmp_send(skb, ICMP_DEST_UNREACH,
ICMP_PORT_UNREACH, 0);
8 changes: 4 additions & 4 deletions homa_metrics.c
@@ -10,10 +10,10 @@ DEFINE_PER_CPU(struct homa_metrics, homa_metrics);

/* Describes file operations implemented for /proc/net/homa_metrics. */
static const struct proc_ops homa_metrics_ops = {
-.proc_open = homa_metrics_open,
-.proc_read = homa_metrics_read,
-.proc_lseek = homa_metrics_lseek,
-.proc_release = homa_metrics_release,
+.proc_open = homa_metrics_open,
+.proc_read = homa_metrics_read,
+.proc_lseek = homa_metrics_lseek,
+.proc_release = homa_metrics_release,
};

/* Global information used to export metrics information through a file in