How GitHub Uses eBPF to Block Circular Dependencies
GitHub built an eBPF-based firewall that blocks deployment scripts from creating circular dependencies. The system uses cGroups and kernel hooks to intercept DNS queries, correlate blocked requests to specific commands, and enforce architectural constraints automatically.
TL;DR
- GitHub built an eBPF-based firewall that blocks deployment scripts from creating circular dependencies on github.com
- The system uses cGroups and eBPF hooks to intercept DNS queries and network calls, then correlates blocked requests back to specific commands
- After a six-month rollout, the tool now catches problematic dependencies before they cause incidents
- This matters because circular dependencies can turn a MySQL outage into a deployment deadlock
The Big Picture
GitHub hosts its own source code on github.com. That's a circular dependency waiting to explode: if GitHub goes down, they can't access the code needed to fix GitHub. They've solved the obvious version of this problem with mirrors and cached assets. But what about the hidden dependencies?
A deployment script that fetches a binary from GitHub. A servicing tool that checks for updates mid-deploy. An internal API that pulls releases from github.com. Any of these can turn a routine MySQL outage into a deployment deadlock.
The traditional solution? Ask every team to audit their deployment scripts. In practice, most circular dependencies aren't discovered until an incident is already underway. GitHub needed a way to enforce the rule automatically: deployment scripts must work even when github.com is down.
Blocking github.com entirely isn't an option. These hosts serve production traffic during deploys. The solution needed to be surgical: block github.com only for the deployment process, not for the entire machine. That's where eBPF comes in.
How It Works
eBPF lets you load custom programs into the Linux kernel and hook into system primitives like networking. GitHub's implementation uses two key eBPF program types to build a per-process firewall.
First, BPF_PROG_TYPE_CGROUP_SKB hooks network egress from a specific cGroup. A cGroup is a Linux primitive that enforces resource limits and isolation for sets of processes. You can create a cGroup, move only the deployment script into it, and then limit outbound network access for just that script. No Docker required.
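The cgroup step can be sketched in a few lines of Go. This is a minimal illustration, assuming a cgroup v2 hierarchy; the function name `moveToCgroup`, the cgroup name `deploy`, and the temporary stand-in root are hypothetical and not taken from GitHub's implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// moveToCgroup creates the cgroup <root>/<name> and writes pid into its
// cgroup.procs file, which is how cgroup v2 assigns a process to a group.
// It returns the path of the cgroup.procs file it wrote.
func moveToCgroup(root, name string, pid int) (string, error) {
	dir := filepath.Join(root, name)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("creating cgroup: %w", err)
	}
	procs := filepath.Join(dir, "cgroup.procs")
	if err := os.WriteFile(procs, []byte(strconv.Itoa(pid)), 0o644); err != nil {
		return "", fmt.Errorf("adding pid %d: %w", pid, err)
	}
	return procs, nil
}

func main() {
	// Stand-in root so the sketch runs unprivileged. On a real host this
	// would be the cgroup v2 mount, typically /sys/fs/cgroup (needs root).
	root, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	procs, err := moveToCgroup(root, "deploy", os.Getpid())
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote", procs)
}
```

Child processes inherit their parent's cgroup, so moving the deploy script's PID before it starts any work is enough to cover everything it spawns.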
But CGROUP_SKB operates on IP addresses. Maintaining an up-to-date blocklist of GitHub's IPs would be a nightmare. GitHub needed DNS-level blocking.
Enter BPF_PROG_TYPE_CGROUP_SOCK_ADDR. This program type hooks socket-address syscalls such as connect() and sendmsg() and can rewrite the destination address before the kernel acts on it. GitHub uses it to intercept DNS queries from the cGroup and redirect them to a userspace DNS proxy. The proxy evaluates each domain against a blocklist and uses eBPF Maps to communicate with the CGROUP_SKB program, allowing or denying the request.
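The proxy's blocklist check might look something like the sketch below. The source does not say how GitHub matches domains, so the suffix-matching rule here (an entry blocks itself and all of its subdomains) and the name `isBlocked` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlocked reports whether domain equals a blocklist entry or is a
// subdomain of one. Names are compared case-insensitively and a trailing
// root dot (as seen in DNS queries) is ignored.
func isBlocked(domain string, blocklist []string) bool {
	d := strings.ToLower(strings.TrimSuffix(domain, "."))
	for _, entry := range blocklist {
		e := strings.ToLower(strings.TrimSuffix(entry, "."))
		if d == e || strings.HasSuffix(d, "."+e) {
			return true
		}
	}
	return false
}

func main() {
	blocklist := []string{"github.com"}
	queries := []string{"github.com.", "api.github.com", "notgithub.com", "example.org"}
	for _, q := range queries {
		fmt.Printf("%-16s blocked=%v\n", q, isBlocked(q, blocklist))
	}
}
```

Matching on ".github.com" rather than a bare suffix keeps lookalike names such as notgithub.com from being caught by accident.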
The architecture looks like this: deployment script runs in isolated cGroup → DNS query intercepted by eBPF → redirected to userspace proxy → proxy checks blocklist → eBPF Map updated → CGROUP_SKB allows or blocks the connection.
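The handoff between the proxy and the packet filter can be modeled in plain Go. In this sketch a Go map stands in for the eBPF Map, and the type `verdicts` and both method names are hypothetical. It also assumes that destinations the proxy has never resolved are allowed, which the source does not specify.

```go
package main

import "fmt"

// verdicts stands in for the eBPF map shared by the userspace DNS proxy
// (writer) and the CGROUP_SKB program (reader). Keys are IPv4 addresses
// in network byte order; a true value means "deny".
type verdicts map[[4]byte]bool

// recordAnswer is the proxy side: after resolving a name, store the same
// verdict for every address in the DNS answer.
func (v verdicts) recordAnswer(deny bool, addrs ...[4]byte) {
	for _, a := range addrs {
		v[a] = deny
	}
}

// egressAllowed is the packet side: look up the destination and drop the
// packet only if the proxy marked that address as denied.
// Assumption: destinations the proxy has never seen are allowed.
func (v verdicts) egressAllowed(dst [4]byte) bool {
	return !v[dst]
}

func main() {
	v := verdicts{}
	blockedIP := [4]byte{192, 0, 2, 10}   // documentation range, stands in for a blocked domain
	allowedIP := [4]byte{198, 51, 100, 7} // documentation range, stands in for an allowed domain

	v.recordAnswer(true, blockedIP)
	v.recordAnswer(false, allowedIP)

	fmt.Println("blocked domain reachable:", v.egressAllowed(blockedIP))
	fmt.Println("allowed domain reachable:", v.egressAllowed(allowedIP))
}
```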
GitHub built this using the cilium/ebpf library in Go, which massively simplifies authoring and running eBPF programs. A //go:generate directive invokes bpf2go, which compiles the eBPF C code and generates Go bindings for its programs and maps. Once go generate has run, a plain go build is all you need.
The deeper they got, the more they realized they could do. Could they correlate blocked requests back to the specific command that triggered them? Yes. Inside the CGROUP_SKB program, they extract the DNS transaction ID from the packet data exposed through the __sk_buff context and capture the Process ID using bpf_get_current_pid_tgid(). This information goes into an eBPF Map tracking DNS Transaction ID → Process ID.
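To make the two values concrete, here is a userspace sketch of what gets extracted. The transaction ID is the first two bytes of a DNS message in big-endian order, and bpf_get_current_pid_tgid() returns a 64-bit value with the process ID (TGID) in the upper half. The function names and the `txidToPID` type are hypothetical; the real extraction runs in eBPF C inside the kernel.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dnsTransactionID returns the 16-bit ID from the start of a DNS message.
// payload must be the UDP payload (the DNS message itself), whose header
// is 12 bytes long and begins with the big-endian transaction ID.
func dnsTransactionID(payload []byte) (uint16, error) {
	if len(payload) < 12 {
		return 0, errors.New("payload shorter than a DNS header")
	}
	return binary.BigEndian.Uint16(payload[:2]), nil
}

// pidFromPidTgid extracts the process ID as userspace sees it (the thread
// group ID) from the value returned by bpf_get_current_pid_tgid(), which
// packs the TGID into the upper 32 bits and the thread ID into the lower.
func pidFromPidTgid(v uint64) uint32 {
	return uint32(v >> 32)
}

// txidToPID stands in for the eBPF map keyed by DNS transaction ID.
type txidToPID map[uint16]uint32

func main() {
	// A bare 12-byte DNS header: ID 0xBEEF, recursion desired, one question.
	query := []byte{0xBE, 0xEF, 0x01, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0}
	id, err := dnsTransactionID(query)
	if err != nil {
		panic(err)
	}

	m := txidToPID{}
	m[id] = pidFromPidTgid(uint64(4242)<<32 | 4243) // made-up PID and thread ID
	fmt.Printf("txid=0x%04X pid=%d\n", id, m[id])
}
```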
When the userspace DNS proxy sees a request, it looks up the transaction ID in the eBPF Map, finds the PID, reads /proc/{PID}/cmdline, and outputs a log line with the full command that triggered the blocked request. Teams get immediate feedback: "Your deploy script ran curl github.com and it was blocked."
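The /proc lookup is simple enough to sketch. Arguments in /proc/{PID}/cmdline are separated by NUL bytes, so they need to be joined before logging. The function names and the log wording below are illustrative assumptions, not GitHub's actual output.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// formatCmdline converts the raw contents of /proc/<pid>/cmdline, where
// arguments are separated by NUL bytes, into a space-separated string.
func formatCmdline(raw []byte) string {
	raw = bytes.TrimRight(raw, "\x00")
	return string(bytes.ReplaceAll(raw, []byte{0}, []byte{' '}))
}

// commandForPID reads and formats the command line of a running process.
func commandForPID(pid int) (string, error) {
	raw, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
	if err != nil {
		return "", err
	}
	return formatCmdline(raw), nil
}

func main() {
	// Linux only: uses this process's own PID as a stand-in for the PID
	// found in the transaction ID map.
	cmd, err := commandForPID(os.Getpid())
	if err != nil {
		fmt.Println("could not read cmdline:", err)
		return
	}
	fmt.Printf("blocked request came from: %s\n", cmd)
}
```

One caveat with this approach: a short-lived process may have exited by the time the proxy reads /proc, so the lookup has to tolerate a missing entry.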
The system also provides an audit list of all domains contacted during deployment and uses cGroups to enforce CPU and memory limits on deploy scripts, preventing runaway resource usage.
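For the resource limits, cgroup v2 exposes control files such as cpu.max (a quota and a period, both in microseconds) and memory.max (a byte count). The sketch below only computes what would be written to those files; the function name `limitFiles` and the example limits are hypothetical, since the source does not state GitHub's values.

```go
package main

import (
	"fmt"
	"strconv"
)

// limitFiles returns the cgroup v2 control files to write, keyed by file
// name, for a CPU limit expressed in cores and a memory limit in bytes.
func limitFiles(cpus float64, memBytes int64) map[string]string {
	const period = 100000 // scheduling period in microseconds
	quota := int64(cpus * period)
	return map[string]string{
		"cpu.max":    fmt.Sprintf("%d %d", quota, period),
		"memory.max": strconv.FormatInt(memBytes, 10),
	}
}

func main() {
	// Example limits: half a core and 256 MiB.
	files := limitFiles(0.5, 256<<20)
	fmt.Println("cpu.max    <-", files["cpu.max"])
	fmt.Println("memory.max <-", files["memory.max"])
}
```

Each value would be written to the file of the same name inside the deploy script's cgroup directory.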
What This Changes For Developers
Before this system, circular dependencies were discovered during incidents. A MySQL outage happens. The deploy script fails. Engineers scramble to figure out why. Turns out a tool buried three layers deep is trying to fetch a release from GitHub. Mean time to recovery just got longer.
Now, if a team accidentally adds a problematic dependency, or if an existing binary takes a new dependency on github.com, the tooling catches it immediately. The deployment fails with a clear error message pointing to the exact command that caused the problem.
This shifts the problem left. Instead of discovering circular dependencies during an outage, teams discover them during normal deploys. The feedback loop is instant. The fix happens before the incident.
For platform teams, this is a forcing function. You can't ship a deployment script that depends on github.com. The system won't let you. That constraint makes GitHub more resilient by design.
The broader lesson here is about enforcement versus documentation. GitHub could have written a wiki page titled "Don't Create Circular Dependencies in Deploy Scripts." They could have added it to onboarding. They could have sent Slack reminders. None of that would have worked as well as a system that makes the wrong thing impossible.
Try It Yourself
GitHub's early proof of concept is open source. The production implementation has evolved, but the PoC is a solid starting point for understanding the architecture.
Here's the core pattern for attaching an eBPF program to a cGroup using cilium/ebpf:
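package main

import (
	"log"
	"time"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -tags linux bpf cgroup_skb.c -- -I../headers

func main() {
	// Load pre-compiled programs and maps into the kernel.
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()

	// Link the program to the cgroup's egress hook.
	l, err := link.AttachCgroup(link.CgroupOptions{
		Path:    "/sys/fs/cgroup/system.slice",
		Attach:  ebpf.AttachCGroupInetEgress,
		Program: objs.CountEgressPackets,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	// Stay alive and report the counter; closing the link detaches the program.
	log.Println("Counting packets...")
	for range time.Tick(time.Second) {
		var count uint64
		if err := objs.PktCount.Lookup(uint32(0), &count); err != nil {
			log.Fatalf("reading map: %v", err)
		}
		log.Printf("packets: %d", count)
	}
}

The corresponding eBPF C code hooks egress packets and increments a counter: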
//go:build ignore

#include "common.h"

char __license[] SEC("license") = "Dual MIT/GPL";

// Single-slot array map holding the packet counter.
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, u32);
	__type(value, u64);
	__uint(max_entries, 1);
} pkt_count SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress_packets(struct __sk_buff *skb) {
	u32 key = 0;
	u64 init_val = 1;

	u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
	if (!count) {
		bpf_map_update_elem(&pkt_count, &key, &init_val, BPF_ANY);
		return 1; // 1 = let the packet through
	}
	__sync_fetch_and_add(count, 1);

	return 1; // 1 = let the packet through; 0 would drop it
}

If you're not ready to write eBPF programs yet, start with tools built on eBPF. bpftrace gives you deep tracing capabilities. ptcpdump captures tcpdump-style packet traces annotated with process and container metadata. Both are production-ready and will give you a feel for what eBPF enables.
The cilium/ebpf examples are excellent. The docs.ebpf.io site has comprehensive documentation on program types, helper functions, and maps.
The Bottom Line
Use this approach if you have deployment systems that must remain operational even when dependencies are down. The pattern applies beyond GitHub's specific use case. Any critical path that can't afford circular dependencies benefits from this kind of enforcement.
Skip it if your deployment scripts are simple, your dependencies are stable, or you don't have the kernel expertise to debug eBPF issues in production. This is infrastructure-level tooling. It requires cgroup v2, a kernel recent enough to support both program types (BPF_PROG_TYPE_CGROUP_SOCK_ADDR arrived in Linux 4.17), and a team comfortable operating at that layer.
The real opportunity here isn't just blocking github.com during deploys. It's the broader pattern: using eBPF to enforce architectural constraints at the kernel level. You can't accidentally violate a rule that the kernel enforces. That's more powerful than any amount of documentation or code review.
Source: GitHub Blog