Microservices & Distributed Systems

eBPF Networking: High-Performance Policy Enforcement, Traffic Mirroring, and Load Balancing

MatterAI Agent
18 min read

eBPF enables programmable, kernel-level networking with line-rate performance. This guide covers three core use cases: policy enforcement via XDP and TC, traffic mirroring for observability, and load balancing with direct server return.

Required Headers

All eBPF networking programs require these includes:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

Kernel Version Requirements

Different eBPF helpers have varying minimum kernel version requirements:

Helper                       Minimum Kernel   Notes
bpf_redirect                 4.4              Basic redirect to egress
bpf_clone_redirect           4.2              Packet mirroring (TC only)
bpf_csum_diff                4.6              Incremental checksum updates
bpf_fib_lookup               4.18             Kernel FIB lookup
bpf_xdp_ct_lookup (kfunc)    5.19             Conntrack via kfuncs
bpf_sk_lookup_tcp/udp        4.20             Socket lookup

Network Policy Enforcement

Policy enforcement uses eBPF at two hook points: XDP (driver-level, before skb allocation) for early drops and TC (qdisc-level, after skb allocation) for stateful filtering.

XDP for Ingress Filtering

XDP runs before the kernel allocates an skb, enabling sub-microsecond packet drops. Use it for DDoS mitigation and IP blacklisting.

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 65536);
    __type(key, __be32);
    __type(value, __u8);
} blacklist SEC(".maps");

SEC("xdp")
int xdp_firewall(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    struct iphdr *ip;
    
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    
    __be32 src_ip = ip->saddr;
    if (bpf_map_lookup_elem(&blacklist, &src_ip))
        return XDP_DROP;
    
    return XDP_PASS;
}

XDP for IPv6 Filtering

IPv6 uses a 128-bit address space and different header structure. The same blacklist pattern applies with struct ipv6hdr:

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 65536);
    __type(key, struct in6_addr);
    __type(value, __u8);
} blacklist_v6 SEC(".maps");

SEC("xdp")
int xdp_firewall_v6(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    struct ipv6hdr *ip6;
    
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IPV6))
        return XDP_PASS;
    
    ip6 = (void *)(eth + 1);
    if ((void *)(ip6 + 1) > data_end)
        return XDP_PASS;
    
    if (bpf_map_lookup_elem(&blacklist_v6, &ip6->saddr))
        return XDP_DROP;
    
    return XDP_PASS;
}

TC for Stateless Port Filtering

TC programs run after the stack has allocated the skb, giving access to full skb metadata (and, via kfuncs, conntrack). This example shows a simple stateless port filter:

SEC("tc")
int tc_port_filter(struct __sk_buff *skb) {
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct ethhdr *eth = data;
    struct iphdr *ip;
    
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;
    
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;
    
    // Block port 23 (telnet)
    if (ip->protocol == IPPROTO_TCP) {
        struct tcphdr *tcp = (void *)(ip + 1);
        if ((void *)(tcp + 1) > data_end)
            return TC_ACT_OK;
        if (tcp->dest == bpf_htons(23))
            return TC_ACT_SHOT;
    }
    
    return TC_ACT_OK;
}

TC for Stateful Policy with Conntrack

For stateful filtering, track connection state in a map. The following example implements a simple flow table that allows return traffic for permitted outbound connections:

struct flow_key {
    __be32 saddr;
    __be32 daddr;
    __be16 sport;
    __be16 dport;
    __u8 proto;
};

struct flow_state {
    __u64 last_seen;
    __u8 state;  // 0=NEW, 1=ESTABLISHED
};

#define FLOW_STATE_NEW 0
#define FLOW_STATE_ESTABLISHED 1

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);
    __type(key, struct flow_key);
    __type(value, struct flow_state);
} flow_table SEC(".maps");

// Allowed inbound ports
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 256);
    __type(key, __be16);
    __type(value, __u8);
} allowed_ports SEC(".maps");

SEC("tc")
int tc_stateful_policy(struct __sk_buff *skb) {
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct ethhdr *eth = data;
    struct iphdr *ip;
    struct tcphdr *tcp;
    struct flow_key key = {};
    struct flow_state *state;
    
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;
    
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;
    if (ip->protocol != IPPROTO_TCP)
        return TC_ACT_OK;
    
    tcp = (void *)(ip + 1);
    if ((void *)(tcp + 1) > data_end)
        return TC_ACT_OK;
    
    // Build flow key
    key.saddr = ip->saddr;
    key.daddr = ip->daddr;
    key.sport = tcp->source;
    key.dport = tcp->dest;
    key.proto = ip->protocol;
    
    // Check existing flow state
    state = bpf_map_lookup_elem(&flow_table, &key);
    if (state && state->state == FLOW_STATE_ESTABLISHED) {
        // Update timestamp and allow
        state->last_seen = bpf_ktime_get_ns();
        return TC_ACT_OK;
    }
    
    // New connection - check if destination port is allowed
    __be16 dport = tcp->dest;
    if (bpf_map_lookup_elem(&allowed_ports, &dport)) {
        // Create flow entry
        struct flow_state new_state = {
            .last_seen = bpf_ktime_get_ns(),
            .state = FLOW_STATE_ESTABLISHED
        };
        bpf_map_update_elem(&flow_table, &key, &new_state, BPF_ANY);
        
        // Also add reverse flow for return traffic
        struct flow_key rev_key = {
            .saddr = ip->daddr,
            .daddr = ip->saddr,
            .sport = tcp->dest,
            .dport = tcp->source,
            .proto = ip->protocol,
        };
        bpf_map_update_elem(&flow_table, &rev_key, &new_state, BPF_ANY);
        
        return TC_ACT_OK;
    }
    
    return TC_ACT_SHOT;
}

Note: For integration with the kernel's native conntrack, use the CT kfuncs (bpf_xdp_ct_lookup / bpf_skb_ct_lookup, kernel 5.19+), which require including vmlinux.h and using BTF. The above example provides a self-contained flow table approach that works on older kernels.

XDP vs TC Comparison

Aspect       XDP                    TC
Hook point   Driver level           Qdisc level
Speed        Fastest (pre-skb)      Fast (skb allocated)
Conntrack    No (without kfuncs)    Yes (via kfuncs)
Egress       No                     Yes
Use case     DDoS, blacklist        Policy, QoS

PERCPU Maps for Performance

Under high contention, BPF_MAP_TYPE_PERCPU_HASH performs significantly better by keeping a separate copy of each value per CPU. This eliminates lock contention and cache-line bouncing:

// Standard hash - shared across CPUs, potential contention
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __be32);
    __type(value, __u8);
} shared_map SEC(".maps");

// PERCPU hash - each CPU has its own instance
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 65536);
    __type(key, __be32);
    __type(value, __u8);
} percpu_map SEC(".maps");

Trade-offs: PERCPU maps use more memory (value_size * num_cpus) and require user-space to aggregate per-CPU values for statistics.
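On the read side, a single lookup from user space on a PERCPU map returns one value slot per possible CPU (libbpf_num_possible_cpus() of them), and the agent sums them itself. The aggregation step, shown standalone here as a sketch:

```c
#include <stdint.h>

/* With libbpf, bpf_map_lookup_elem(map_fd, &key, vals) on a PERCPU map
 * fills vals[0..ncpus-1], one counter per possible CPU. Statistics are
 * the sum across those per-CPU copies: */
static uint64_t aggregate_percpu(const uint64_t *per_cpu_vals, int ncpus) {
    uint64_t total = 0;
    for (int i = 0; i < ncpus; i++)
        total += per_cpu_vals[i];
    return total;
}
```

Note that per-CPU values handed to user space are padded to 8 bytes each, so even a __u8 map value occupies a full slot per CPU.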

Traffic Mirroring

Traffic mirroring duplicates packets to a monitoring interface using bpf_clone_redirect. Because this helper operates on an skb, mirroring attaches at the TC hook (XDP has no clone primitive). This enables network observability without impacting production traffic.

Basic Mirroring Implementation

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);  // ifindex of the monitoring interface
} mirror_iface SEC(".maps");

SEC("tc")
int tc_mirror(struct __sk_buff *skb) {
    __u32 key = 0;
    __u32 *ifindex = bpf_map_lookup_elem(&mirror_iface, &key);
    
    if (ifindex && *ifindex) {
        // Flag 0 clones to the egress path of *ifindex;
        // BPF_F_INGRESS would inject into its ingress path instead.
        int ret = bpf_clone_redirect(skb, *ifindex, 0);
        if (ret < 0)
            bpf_printk("mirror redirect failed: %d\n", ret);
    }
    
    return TC_ACT_OK;
}

Selective Mirroring with Filters

Mirror only specific traffic patterns to reduce monitoring overhead.

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);  // ifindex of the monitoring interface
} mirror_target SEC(".maps");

SEC("tc")
int tc_selective_mirror(struct __sk_buff *skb) {
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct ethhdr *eth = data;
    struct iphdr *ip;
    
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;
    
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;
    
    // Mirror only HTTP traffic (port 80)
    if (ip->protocol == IPPROTO_TCP) {
        struct tcphdr *tcp = (void *)(ip + 1);
        if ((void *)(tcp + 1) > data_end)
            return TC_ACT_OK;
        if (tcp->dest == bpf_htons(80)) {
            __u32 key = 0;
            __u32 *ifindex = bpf_map_lookup_elem(&mirror_target, &key);
            if (ifindex) {
                int ret = bpf_clone_redirect(skb, *ifindex, 0);
                if (ret < 0)
                    bpf_printk("selective mirror failed: %d\n", ret);
            }
        }
    }
    
    return TC_ACT_OK;
}

Load Balancing

eBPF load balancers operate at L3/L4, rewriting packet headers and bypassing the kernel stack for backend traffic. This enables Direct Server Return (DSR) where responses bypass the load balancer entirely.

XDP_TX vs XDP_REDIRECT Trade-offs

Action            Description                       Use Case
XDP_TX            Transmit on same interface        Single-homed LB, DSR on same VLAN
XDP_REDIRECT      Transmit on different interface   Multi-homed LB, different egress network
bpf_redirect_map  Redirect via DEVMAP               Batch updates, CPU steering

XDP_TX is faster (no map lookup) but limited to the ingress interface. Use XDP_REDIRECT with bpf_fib_lookup when backends are on different networks or when you need per-packet egress interface selection.

Checksum Helper Functions

Use bpf_csum_diff() for incremental checksum updates (RFC 1624). Note that bpf_csum_fold and bpf_csum_add are not kernel helpers, so the fold step is defined locally; seeding bpf_csum_diff with the inverted old checksum yields the updated sum directly:

// Fold a one's-complement accumulator down to 16 bits and invert.
static __always_inline __u16 csum_fold_helper(__u64 csum) {
#pragma unroll
    for (int i = 0; i < 4; i++) {
        if (csum >> 16)
            csum = (csum & 0xffff) + (csum >> 16);
    }
    return ~csum;
}

// Update IP header checksum using bpf_csum_diff
static __always_inline void ipv4_csum_update(struct iphdr *ip, __be32 old_addr, __be32 new_addr) {
    __u64 csum = bpf_csum_diff(&old_addr, sizeof(old_addr),
                               &new_addr, sizeof(new_addr),
                               ~((__u32)ip->check));
    ip->check = csum_fold_helper(csum);
}

// Update L4 checksum for IP address change (pseudo-header)
static __always_inline void l4_csum_update_ip(void *l4_hdr, __u8 proto,
                                               __be32 old_addr, __be32 new_addr) {
    __u64 csum;
    
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        csum = bpf_csum_diff(&old_addr, sizeof(old_addr),
                             &new_addr, sizeof(new_addr),
                             ~((__u32)tcp->check));
        tcp->check = csum_fold_helper(csum);
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        // A zero UDP checksum means "no checksum"; update only if present
        if (udp->check) {
            csum = bpf_csum_diff(&old_addr, sizeof(old_addr),
                                 &new_addr, sizeof(new_addr),
                                 ~((__u32)udp->check));
            udp->check = csum_fold_helper(csum);
        }
    }
}

// Update L4 checksum for port change. bpf_csum_diff requires sizes that
// are multiples of 4, so widen the 16-bit ports first (the zero padding
// does not change the one's-complement sum).
static __always_inline void l4_csum_update_port(void *l4_hdr, __u8 proto,
                                                 __be16 old_port, __be16 new_port) {
    __be32 old32 = old_port;
    __be32 new32 = new_port;
    __u64 csum;
    
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        csum = bpf_csum_diff(&old32, sizeof(old32), &new32, sizeof(new32),
                             ~((__u32)tcp->check));
        tcp->check = csum_fold_helper(csum);
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        if (udp->check) {
            csum = bpf_csum_diff(&old32, sizeof(old32), &new32, sizeof(new32),
                                 ~((__u32)udp->check));
            udp->check = csum_fold_helper(csum);
        }
    }
}
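The incremental update can be sanity-checked against a full recomputation in plain user space (this is ordinary C, not BPF code; csum_full and csum_update32 are illustrative names implementing RFC 1624 eq. 3, HC' = ~(~HC + ~m + m')):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Full one's-complement checksum over a buffer (checksum field zeroed). */
static uint16_t csum_full(const void *buf, size_t len) {
    const uint16_t *p = buf;
    uint32_t sum = 0;
    while (len > 1) {
        sum += *p++;
        len -= 2;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ~sum;
}

/* Incremental update after replacing one 32-bit field (RFC 1624 eq. 3). */
static uint16_t csum_update32(uint16_t check, uint32_t old_val, uint32_t new_val) {
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~(old_val & 0xffff);
    sum += (uint16_t)~(old_val >> 16);
    sum += new_val & 0xffff;
    sum += new_val >> 16;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ~sum;
}
```

Rewriting daddr in a sample IPv4 header and comparing csum_update32 against csum_full over the modified header gives the same checksum, which is the property the bpf_csum_diff path above relies on.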

// Jenkins hash for consistent backend selection
static __always_inline __u32 hash_5tuple_v4(struct iphdr *ip, __be16 sport, __be16 dport) {
    __u32 a = bpf_ntohl(ip->saddr);
    __u32 b = bpf_ntohl(ip->daddr);
    __u32 c = (bpf_ntohs(sport) << 16) | bpf_ntohs(dport);
    
    // Jenkins mix
    a -= b; a -= c; a ^= (c >> 13);
    b -= c; b -= a; b ^= (a << 8);
    c -= a; c -= b; c ^= (b >> 13);
    a -= b; a -= c; a ^= (c >> 12);
    b -= c; b -= a; b ^= (a << 16);
    c -= a; c -= b; c ^= (b >> 5);
    a -= b; a -= c; a ^= (c >> 3);
    b -= c; b -= a; b ^= (a << 10);
    c -= a; c -= b; c ^= (b >> 15);
    
    return c;
}

Maglev Consistent Hashing

Production load balancers use consistent hashing for backend selection, ensuring minimal redistribution on backend changes.
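The eBPF side only consumes a prebuilt lookup table; user space must populate it. A condensed sketch of the Maglev population loop follows (LUT size 65537, a prime, matching the maps below; the FNV-1a-based offset/skip hashes are illustrative stand-ins, not the paper's exact functions):

```c
#include <stdint.h>
#include <string.h>

#define LUT_SIZE 65537  /* prime, so any skip in [1, LUT_SIZE-1] is coprime */
#define MAX_BACKENDS 64

/* Illustrative FNV-1a hash with a caller-chosen basis. */
static uint32_t fnv1a(const char *s, uint32_t h) {
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Fill lut[0..LUT_SIZE-1] with backend indices, Maglev-style: each
 * backend walks its own permutation of the table and claims the first
 * free slot, round-robin, until the table is full. */
static void maglev_populate(const char **names, int n, uint32_t *lut) {
    uint32_t offset[MAX_BACKENDS], skip[MAX_BACKENDS], next[MAX_BACKENDS] = {0};
    uint32_t filled = 0;

    for (int i = 0; i < n; i++) {
        offset[i] = fnv1a(names[i], 2166136261u) % LUT_SIZE;
        skip[i] = fnv1a(names[i], 2166136261u ^ 0xdeadbeef) % (LUT_SIZE - 1) + 1;
    }
    for (uint32_t j = 0; j < LUT_SIZE; j++)
        lut[j] = UINT32_MAX;

    while (filled < LUT_SIZE) {
        for (int i = 0; i < n && filled < LUT_SIZE; i++) {
            uint32_t c = ((uint64_t)next[i] * skip[i] + offset[i]) % LUT_SIZE;
            while (lut[c] != UINT32_MAX) {
                next[i]++;
                c = ((uint64_t)next[i] * skip[i] + offset[i]) % LUT_SIZE;
            }
            lut[c] = i;
            next[i]++;
            filled++;
        }
    }
}
```

Round-robin claiming guarantees backend shares differ by at most one slot, and on membership changes only the affected slots move, which is the property motivating Maglev.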

struct backend {
    __be32 ip;
    __be16 port;
    __u8 mac[6];
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, struct backend);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} backends SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 65537);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} maglev_lut SEC(".maps");

SEC("xdp")
int xdp_lb(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    struct iphdr *ip;
    void *l4_hdr;
    __be16 sport, dport;
    __u8 proto;
    
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    
    ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    
    proto = ip->protocol;
    l4_hdr = (void *)(ip + 1);
    
    // Handle both TCP and UDP
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        if ((void *)(tcp + 1) > data_end)
            return XDP_PASS;
        sport = tcp->source;
        dport = tcp->dest;
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        if ((void *)(udp + 1) > data_end)
            return XDP_PASS;
        sport = udp->source;
        dport = udp->dest;
    } else {
        return XDP_PASS;
    }
    
    // Hash 5-tuple and lookup backend
    __u32 hash = hash_5tuple_v4(ip, sport, dport);
    __u32 lut_size = 65537;
    __u32 idx = hash % lut_size;
    __u32 *backend_idx = bpf_map_lookup_elem(&maglev_lut, &idx);
    
    if (!backend_idx)
        return XDP_PASS;
    
    struct backend *be = bpf_map_lookup_elem(&backends, backend_idx);
    if (!be)
        return XDP_PASS;
    
    // Store old values for checksum update
    __be32 old_dip = ip->daddr;
    __be16 old_dport = dport;
    
    // Rewrite destination
    ip->daddr = be->ip;
    
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        tcp->dest = be->port;
        l4_csum_update_ip(tcp, proto, old_dip, be->ip);
        l4_csum_update_port(tcp, proto, old_dport, be->port);
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        udp->dest = be->port;
        l4_csum_update_ip(udp, proto, old_dip, be->ip);
        l4_csum_update_port(udp, proto, old_dport, be->port);
    }
    
    // Update IP header checksum
    ipv4_csum_update(ip, old_dip, be->ip);
    
    // Rewrite destination MAC
    __builtin_memcpy(eth->h_dest, be->mac, 6);
    
    // Use FIB lookup for egress interface selection
    struct bpf_fib_lookup fib_params = {};
    fib_params.family = AF_INET;
    fib_params.tos = ip->tos;
    fib_params.l4_protocol = proto;
    fib_params.sport = sport;
    fib_params.dport = be->port;
    fib_params.tot_len = bpf_ntohs(ip->tot_len);
    fib_params.ifindex = ctx->ingress_ifindex;
    fib_params.ipv4_src = ip->saddr;
    fib_params.ipv4_dst = be->ip;
    
    int fib_ret = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), 0);
    
    if (fib_ret == BPF_FIB_LKUP_RET_SUCCESS) {
        // Update source MAC to egress interface MAC
        __builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
        return bpf_redirect(fib_params.ifindex, 0);
    }
    
    // Fallback: transmit on same interface (DSR scenario)
    return XDP_TX;
}

IPv6 Load Balancing

IPv6 load balancing follows the same pattern but uses 128-bit addresses. IPv6 has no header checksum, so only L4 checksums need updating:

struct backend_v6 {
    struct in6_addr ip;
    __be16 port;
    __u8 mac[6];
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, struct backend_v6);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} backends_v6 SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 65537);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} maglev_lut_v6 SEC(".maps");

static __always_inline __u32 hash_5tuple_v6(struct ipv6hdr *ip6, __be16 sport, __be16 dport) {
    __u32 hash = 0;
    // Hash source and destination addresses
    for (int i = 0; i < 4; i++) {
        hash ^= bpf_ntohl(ip6->saddr.in6_u.u6_addr32[i]);
        hash ^= bpf_ntohl(ip6->daddr.in6_u.u6_addr32[i]);
    }
    hash ^= (bpf_ntohs(sport) << 16) | bpf_ntohs(dport);
    
    // Final mix
    hash ^= hash >> 16;
    hash *= 0x85ebca6b;
    hash ^= hash >> 13;
    hash *= 0xc2b2ae35;
    hash ^= hash >> 16;
    
    return hash;
}

// Update L4 checksum for IPv6 address change (128-bit pseudo-header).
// bpf_csum_diff accepts the full 16 bytes in one call (size is a
// multiple of 4); seed with the inverted old checksum, then fold.
static __always_inline void l4_csum_update_ip6(void *l4_hdr, __u8 proto,
                                                struct in6_addr *old_addr,
                                                struct in6_addr *new_addr) {
    __u64 csum;
    
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        csum = bpf_csum_diff(old_addr->in6_u.u6_addr32, 16,
                             new_addr->in6_u.u6_addr32, 16,
                             ~((__u32)tcp->check));
        csum = (csum & 0xffff) + (csum >> 16);
        csum = (csum & 0xffff) + (csum >> 16);
        tcp->check = ~csum;
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        // UDP over IPv6 requires a checksum; guard against zero anyway
        if (udp->check) {
            csum = bpf_csum_diff(old_addr->in6_u.u6_addr32, 16,
                                 new_addr->in6_u.u6_addr32, 16,
                                 ~((__u32)udp->check));
            csum = (csum & 0xffff) + (csum >> 16);
            csum = (csum & 0xffff) + (csum >> 16);
            udp->check = ~csum;
        }
    }
}

SEC("xdp")
int xdp_lb_v6(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    struct ipv6hdr *ip6;
    void *l4_hdr;
    __be16 sport, dport;
    __u8 proto;
    
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IPV6))
        return XDP_PASS;
    
    ip6 = (void *)(eth + 1);
    if ((void *)(ip6 + 1) > data_end)
        return XDP_PASS;
    
    proto = ip6->nexthdr;
    l4_hdr = (void *)(ip6 + 1);
    
    // Handle both TCP and UDP
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        if ((void *)(tcp + 1) > data_end)
            return XDP_PASS;
        sport = tcp->source;
        dport = tcp->dest;
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        if ((void *)(udp + 1) > data_end)
            return XDP_PASS;
        sport = udp->source;
        dport = udp->dest;
    } else {
        return XDP_PASS;
    }
    
    // Hash 5-tuple and lookup backend
    __u32 hash = hash_5tuple_v6(ip6, sport, dport);
    __u32 lut_size = 65537;
    __u32 idx = hash % lut_size;
    __u32 *backend_idx = bpf_map_lookup_elem(&maglev_lut_v6, &idx);
    
    if (!backend_idx)
        return XDP_PASS;
    
    struct backend_v6 *be = bpf_map_lookup_elem(&backends_v6, backend_idx);
    if (!be)
        return XDP_PASS;
    
    // Store old values for checksum update
    struct in6_addr old_dip = ip6->daddr;
    __be16 old_dport = dport;
    
    // Rewrite destination
    ip6->daddr = be->ip;
    
    // Update L4 checksum (IPv6 has no header checksum)
    if (proto == IPPROTO_TCP) {
        struct tcphdr *tcp = l4_hdr;
        tcp->dest = be->port;
        l4_csum_update_ip6(tcp, proto, &old_dip, &be->ip);
        l4_csum_update_port(tcp, proto, old_dport, be->port);
    } else if (proto == IPPROTO_UDP) {
        struct udphdr *udp = l4_hdr;
        udp->dest = be->port;
        l4_csum_update_ip6(udp, proto, &old_dip, &be->ip);
        l4_csum_update_port(udp, proto, old_dport, be->port);
    }
    
    // Rewrite destination MAC
    __builtin_memcpy(eth->h_dest, be->mac, 6);
    
    // Use FIB lookup for egress interface selection
    struct bpf_fib_lookup fib_params = {};
    fib_params.family = AF_INET6;
    fib_params.l4_protocol = proto;
    fib_params.sport = sport;
    fib_params.dport = be->port;
    fib_params.tot_len = bpf_ntohs(ip6->payload_len);
    fib_params.ifindex = ctx->ingress_ifindex;
    __builtin_memcpy(fib_params.ipv6_src, &ip6->saddr, sizeof(struct in6_addr));
    __builtin_memcpy(fib_params.ipv6_dst, &be->ip, sizeof(struct in6_addr));
    
    // Flag 0 so neighbour resolution runs and fib_params.smac is filled
    int fib_ret = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), 0);
    
    if (fib_ret == BPF_FIB_LKUP_RET_SUCCESS) {
        __builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
        return bpf_redirect(fib_params.ifindex, 0);
    }
    
    // Fallback: transmit on same interface
    return XDP_TX;
}

DSR Architecture

In DSR mode, the load balancer rewrites only the destination MAC and optionally encapsulates. Backend responses go directly to clients.

Client Request:   Client → LB (XDP) → Backend
Backend Response: Backend → Client (direct, bypassing LB)

Benefits:

  • LB handles only ingress traffic
  • No bottleneck on response path
  • Scales to millions of connections

Map Pinning for Persistence

Maps with LIBBPF_PIN_BY_NAME are automatically pinned to /sys/fs/bpf/<map_name>. This ensures:

  • State persists across program reloads
  • User-space agents can update backends without disrupting traffic
  • Graceful upgrades without connection loss

To manually pin maps:

bpftool map pin id <MAP_ID> /sys/fs/bpf/my_map

Getting Started

Loading Programs

  1. Install dependencies: clang, llvm, libbpf, bpftool
  2. Compile eBPF program: clang -O2 -g -target bpf -c program.c -o program.o (-g emits the BTF that SEC(".maps") definitions require)
  3. Load XDP program: ip link set dev eth0 xdp obj program.o sec xdp
  4. Load TC program:
    tc qdisc add dev eth0 clsact
    tc filter add dev eth0 ingress bpf obj program.o sec tc
    
  5. Update maps dynamically: bpftool map update id <ID> key <KEY> value <VALUE>
  6. Monitor events: bpftool prog tracelog

Unloading Programs

Unload XDP:

ip link set dev eth0 xdp off

Unload TC:

tc filter del dev eth0 ingress
tc qdisc del dev eth0 clsact

Remove pinned maps:

rm /sys/fs/bpf/<map_name>

For production deployments, use frameworks like Cilium or Katran, which provide orchestration, health checking, and observability integrations.
