eBPF in the Real World: Use Cases

As we explained in a previous article, eBPF allows the development of new solutions in different areas. Some are related to SDN management, DDoS mitigation, and intrusion detection through early packet drop. Others help to improve network performance, load balancing, observability, and more.

Below are some use cases and success stories from real-world projects.

eBPF: an overview

Although BPF (Berkeley Packet Filter) emerged in 1992 as a solution for optimizing packet filters, it had some limitations. To work around them, Alexei Starovoitov initially proposed a rewrite of BPF. He then developed eBPF, or extended Berkeley Packet Filter, with Daniel Borkmann in 2014.

Nowadays, its creators present eBPF as “a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in an operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring to change kernel source code or load kernel modules”. This makes it possible “to run on events other than packets, and do actions other than filtering”, as Brendan Gregg puts it.

eBPF Use Cases

Previously, we listed 5 reasons to use eBPF. More than reasons, these are 5 areas where eBPF can improve and boost your project: programmability, networking, tracing and profiling, observability and monitoring, and security.

Facebook’s load balancer

Facebook’s servers process millions and millions of visits every day. So how does the company optimize traffic and guarantee a reliable, safe, and fast user experience? Its engineers use Katran. It “creates a software-based solution to load balancing with a re-engineered forwarding plane that takes advantage of recent innovations in kernel engineering”. These innovations are eXpress Data Path (XDP) and the eBPF virtual machine, as explained by Nikita Shirokov and Ranjeeth Dasineni.

Facebook uses a network load balancer (also called layer 4 load balancer, or L4LB). It operates on packets rather than serving application-level requests. To do this, a virtual IP address (VIP) is announced “to the internet at each location. Packets destined to the VIP are then seamlessly distributed among the backend servers” by the distribution algorithm. Then, the packets are sent to the globally distributed network of points of presence (PoPs). The PoPs also act as proxies for Facebook’s data centers.
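To make the idea of distributing packets among backends more concrete, here is a minimal sketch of flow-based backend selection. It is not Katran's algorithm: the FlowKey type, the pick_backend function, and the plain hash-and-modulo scheme are assumptions chosen for illustration. The point is only that hashing the connection 5-tuple sends every packet of a flow to the same backend.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a flow (hypothetical type for this sketch)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP


def pick_backend(flow: FlowKey, backends: Sequence[str]) -> str:
    """Map a flow to a backend so all packets of that flow reach the same server."""
    if not backends:
        raise ValueError("no backends available")
    # Serialize the 5-tuple and hash it; the digest is stable across runs.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Assumption: simple modulo placement, not a consistent-hashing scheme.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    flow = FlowKey("203.0.113.7", 51512, "198.51.100.1", 443, 6)
    print(pick_backend(flow, ["backend-a", "backend-b", "backend-c"]))
```

With plain modulo placement, adding or removing a backend remaps many existing flows, which is why production load balancers generally prefer consistent-hashing schemes.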

However, the first generation L4LB, based on the IPVS kernel module, presented some challenges related to backends. “In the second iteration, we leveraged the eXpress Data Path (XDP) framework and the new BPF virtual machine (eBPF) to run the software load balancer together with the backends on a large number of machines”, added the engineers.

Comparing generations, “both are software load balancers running on backend servers. Katran (right) allows us to colocate the load balancer with backend application, thus increasing the load balancer capacity”. Additionally, “Katran is deployed today on backend servers in Facebook’s points of presence (PoPs), and it has helped us improve the performance and scalability of network load balancing and reduce inefficiencies such as busy loops when there are no incoming packets”. 

Facebook’s encryption

Facebook also uses eBPF to enforce encryption policies within its network. Thinking of different options and scenarios to provide transparent enforcement, the team decided to develop and deploy an SSLWall. It’s “a system that cuts off non-SSL connections across various boundaries”, as explained in this blog post. This approach requires work in the kernel context. It’s here where engineers take advantage of eBPF capabilities, such as tc-bpf, kprobes, and maps.

The eBPF programs are managed through a daemon, which also sends logs to Scribe. “This makes management of releases easier to deal with, as we only have one software unit to monitor instead of needing to track a daemon and eBPF release. Additionally, we can modify the schema of our BPF tables, which both user space and kernel space consult, without compatibility concerns between releases.” Proxies are part of the final infrastructure too. 
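As an illustration of what cutting off non-SSL connections can involve, the sketch below checks whether the first bytes of a connection look like a TLS handshake record. This is an assumption about one possible check, written in user-space Python, not a description of how SSLWall is implemented. The function names and the "allow"/"drop" verdict strings are hypothetical.

```python
TLS_HANDSHAKE = 0x16      # TLS record content type for handshake messages
MAX_RECORD_LENGTH = 2**14  # maximum plaintext record length in TLS


def looks_like_tls_handshake(payload: bytes) -> bool:
    """Return True if payload starts with a plausible TLS handshake record header."""
    if len(payload) < 5:
        return False  # a TLS record header is 5 bytes long
    content_type, major, minor = payload[0], payload[1], payload[2]
    record_length = int.from_bytes(payload[3:5], "big")
    return (
        content_type == TLS_HANDSHAKE
        and major == 0x03          # SSL 3.0 and every TLS version use major 3
        and minor <= 0x04          # record-layer minor versions 0x00 to 0x04
        and 0 < record_length <= MAX_RECORD_LENGTH
    )


def verdict(first_payload: bytes) -> str:
    """Hypothetical policy: allow flows that open with TLS, drop the rest."""
    return "allow" if looks_like_tls_handshake(first_payload) else "drop"
```

A real enforcement point would also need to handle payloads split across segments and protocols that upgrade to TLS after a plaintext preamble.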

Cloudflare’s Magic Firewall

Cloudflare is one of the leaders in the cloud computing market. As a provider for companies around the world, it must offer a service that is both flawless and safe enough to protect their assets. To that end, Cloudflare used eBPF to build programmable packet filtering for the product called Magic Firewall.

“Magic Firewall allows custom packet-level rules, enabling customers to deprecate hardware firewall appliances and block malicious traffic at Cloudflare’s network”, according to the company. With cyberattacks becoming more frequent and sophisticated every day, it was necessary to shield Cloudflare’s network and services.

How does eBPF enhance Magic Firewall?

To achieve this goal, the engineering team is using eBPF capabilities. “With eBPF, you can insert packet processing programs that execute in the kernel, giving you the flexibility of familiar programming paradigms with the speed of in-kernel execution (…) We wanted to find a way to use eBPF to extend our use of nftables in Magic Firewall. This means being able to match, using an eBPF program within a table and chain as a rule. By doing this we can have our cake and eat it too, by keeping our existing infrastructure and code, and extending it further”.

Building on iptables and nftables, Cloudflare constructed an eBPF program, loaded it into an existing nftables table and chain, and integrated it into its tooling through Cilium. Now, Magic Firewall is more flexible and powerful. In addition to having an integrated solution, Cloudflare says it can “look deeper into packets and implement more complex matching logic than nftables alone could provide. Since our firewall is running as software on all Cloudflare servers, we can quickly iterate and update features”.
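The following sketch gives a rough idea of what looking deeper into packets means. It parses an IPv4/UDP packet and matches on both the destination port and the first bytes of the payload. It is a user-space Python illustration, not Cloudflare's eBPF program: the function names and the rule parameters (blocked_port, payload_prefix) are hypothetical.

```python
import struct
from typing import Optional, Tuple

IPPROTO_UDP = 17


def parse_ipv4_udp(packet: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (destination port, UDP payload), or None if not a valid IPv4/UDP packet."""
    if len(packet) < 20:
        return None
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return None  # not IPv4
    ip_header_len = (version_ihl & 0x0F) * 4
    if ip_header_len < 20 or len(packet) < ip_header_len + 8:
        return None  # malformed header, or too short to hold a UDP header
    if packet[9] != IPPROTO_UDP:
        return None  # byte 9 of the IPv4 header is the protocol field
    _src_port, dst_port, _length, _checksum = struct.unpack_from(
        "!HHHH", packet, ip_header_len
    )
    return dst_port, packet[ip_header_len + 8:]


def should_drop(packet: bytes, blocked_port: int, payload_prefix: bytes) -> bool:
    """Hypothetical rule: drop UDP packets to blocked_port whose payload starts with payload_prefix."""
    parsed = parse_ipv4_udp(packet)
    if parsed is None:
        return False  # this rule only applies to well-formed IPv4/UDP packets
    dst_port, payload = parsed
    return dst_port == blocked_port and payload.startswith(payload_prefix)
```

Matching on payload bytes in addition to header fields is the kind of logic that is easy to express in a program and harder to express as a static rule.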

Netflix’s Observability

Regarding observability and monitoring tasks, “eBPF enables the collection & in-kernel aggregation of custom metrics and generation of visibility events based on a wide range of possible sources”, according to the official eBPF site. This way, it extends the depth of visibility and generates histograms and data structures that facilitate the analysis. Currently, there are several open-source plugins and applications you can orchestrate with your cloud infrastructure.

In recent years, Netflix has been “using eBPF to understand what software is doing, what the software is blocking in ways we couldn’t see before in production”, says Brendan Gregg, senior performance architect at the company. “We can log whenever machines talk to other machines, and we can use that for capacity planning and security analysis. It’s enabling us to use technologies in Linux that we couldn’t use before, kprobes, uprobes in production”.
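One common form of such aggregation is a power-of-two histogram, where each measurement only increments a bucket counter instead of being exported individually. The sketch below shows the bucketing logic in user-space Python. The function names and the sample latencies are made up for illustration, and a real eBPF program would keep the counters in a map inside the kernel.

```python
from typing import Dict, Iterable


def log2_bucket(value: int) -> int:
    """Return b such that 2**b <= value < 2**(b + 1); zero is folded into bucket 0."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return max(value.bit_length() - 1, 0)


def aggregate(values: Iterable[int]) -> Dict[int, int]:
    """Count how many values fall into each power-of-two bucket."""
    histogram: Dict[int, int] = {}
    for value in values:
        bucket = log2_bucket(value)
        histogram[bucket] = histogram.get(bucket, 0) + 1
    return histogram


if __name__ == "__main__":
    # Hypothetical latencies in microseconds.
    samples = [1, 2, 3, 4, 7, 8, 1000]
    for bucket, count in sorted(aggregate(samples).items()):
        low, high = 2**bucket, 2**(bucket + 1) - 1
        print(f"{low:>6} -> {high:<6} : {'*' * count}")
```

Because only a handful of counters leave the kernel, the cost of observing a busy code path stays small even at high event rates.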

Netflix’s new data flow

Several products, technologies, and services compose Netflix’s cloud infrastructure. This poses some challenges for overall observability. To solve these problems, the company deployed Cloud Network Insight. It’s “a suite of solutions that provides both operational and analytical insight into the cloud network infrastructure to address the identified problems”, as the Netflix team defines it. In Cloud Network Insight, different sources (such as VPC Flow Logs, ELB Access Logs, and eBPF flow logs on the instances) collect the data. In the specific case of eBPF, “the Flow Exporter is a sidecar that uses eBPF tracepoints to capture TCP flows at near real-time on instances that power the Netflix microservices architecture”.
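To picture what a flow log contains, the sketch below folds individual TCP events into one record per flow. It is not the Flow Exporter's code or schema: the FlowEvent and FlowRecord types, their fields, and the summarize_flows function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowEvent(NamedTuple):
    """One observed TCP event (hypothetical fields for this sketch)."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    num_bytes: int


FlowId = Tuple[str, int, str, int]


@dataclass
class FlowRecord:
    """Aggregated view of one flow."""
    first_seen: float
    last_seen: float
    events: int = 0
    total_bytes: int = 0


def summarize_flows(events: Iterable[FlowEvent]) -> Dict[FlowId, FlowRecord]:
    """Group events by (src_ip, src_port, dst_ip, dst_port) and accumulate totals."""
    records: Dict[FlowId, FlowRecord] = {}
    for event in events:
        flow_id = (event.src_ip, event.src_port, event.dst_ip, event.dst_port)
        record = records.get(flow_id)
        if record is None:
            record = records[flow_id] = FlowRecord(event.timestamp, event.timestamp)
        # Events may arrive out of order, so track both ends of the time range.
        record.first_seen = min(record.first_seen, event.timestamp)
        record.last_seen = max(record.last_seen, event.timestamp)
        record.events += 1
        record.total_bytes += event.num_bytes
    return records
```

Emitting one compact record per flow rather than one message per packet is what keeps this kind of data collection manageable at large volumes.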

Flow Collector consumes two data streams, and the data goes through Keystone, which routes it to the data stores. Finally, the data feeds “various use cases within Netflix like network monitoring and network usage forecasting available via Lumen dashboards and machine learning-based network segmentation. The data is also used by security and other partner teams for insight and incident analysis.”

According to the team, this solution has proven scalable: it can manage billions of eBPF flow logs per hour while providing visibility.

Summary

The number of open-source applications and tools based on eBPF is increasing. This growing adoption confirms the technology's usability in real-life projects. Big and small companies are benefiting from it in different ways. It seems like we are going to be hearing and reading a lot more about eBPF in the coming months and years.
