Skip to main content

How do programs work on a computer? (Part 1/2)

Let's take a very simple program for understanding how this system works. The following example is a very basic C code that assigns 2 variables (x and y) and then adds them to generate a value 'z'.  Note: We do not output anything in this program void main() { int x = 3; int y = 4; int z = x + y; } But we have a pressing question. Computers inherently work on electronics and logic, so how does it understand and execute the program? The answer is not very simple and you have to bear with me through a tiny journey. Basic Logic Circuits Let's first begin with a very basic Circuit which implements AND logic and OR logic (skip if you are familiar)  As seen in the circuits above, for the left circuit, the LED lights up only when both switches are on. This is AND logic as "Switch 1 AND Switch 2 need to be on". In the circuit on the right, either of the switch will switch on the LED. Note that in the ...

Dynamically extending the Linux Kernel! An introduction to eBPF

A simple Google search of "eBPF" will tell you that it stands for "extended Berkeley Packet Filter". The words "packet filter" in the name makes me think this has something to do with Computer Networks. But, while it started as a packet filter system, it has extended much beyond that functionality since then. Facebook also had Whatsapp, Instagram, Oculus etc., under its belt, and hence the board decided to rename the company to Meta to symbolise this. In the same way, maybe eBPF should be renamed to represent what it is!




So, what exactly is eBPF? To understand eBPF, let us first understand why it is required.

A short video explaining eBPF can be found here.

Pre-requisites

In this journey, the first thing to learn is the difference between userspace and kernel space. This sub-section of my previous post gives you an overview. As mentioned there, you don't want your user application to have access to any data of any other running application unless specifically requested. Each application is given a virtual memory address where it resides. And a hardware chip stores the mapping of the actual physical memory address (in the RAM/cache) to that virtual location and serves it as required. On the other hand, the kernel has full access to the entire Physical Memory without any virtual addressing. In this post, we will not get into the details of physical vs virtual addressing.

What if you wanted to apply an OS-wide filter for network packets (blocking or accepting requests from a particular source) or log the system calls that an application makes. One way to do this is to develop a regular application using your favourite programming language. But that program would run in userspace, which has its own caveats. 

For example, each network packet you want to filter will still be de-packetised until L4 (Transport layer) in the kernel and then served to userspace. Only then can you filter it. Also, there is no direct way to capture the syscalls of an application without explicitly running the program in association with other tracing programs like 'strace'. Needless to say, all of these consume comparatively more processing power and memory than programs just running in kernel space. And it would be entirely impossible to 'see' a program if it's run and destroyed in a tiny amount of time, as userspace applications would poll the system with programs like 'ps' to see which programs are running at any given time.

Running programs in the kernel is much more helpful for the aforementioned use-cases. For example, you can attach a program to run every time a particular system call is made and then log which process called it. You can also block or re-route a network packet directly from the kernel by attaching a program to run whenever a network packet is received. The possibilities are endless!

Introduction

This brings us to what eBPF is. It's a kernel extension that allows users to run programs inside the kernel by attaching their execution to a particular kernel event. eBPF works by having a extremely tiny Virtual Machine inside the Linux Kernel. Don't worry; this Virtual Machine is not at all RAM hoggy. This VM can be related more to the Java Virtual Machine than the VirtualBox/VMWare/KVM systems. In Java, your code is compiled to an architecture-agnostic bytecode, and this (mostly) has a one-one correspondence to most architecture-specific bytecodes. Similarly, the BPF VM is a target machine for compiling programs to. LLVM has BPF VM as a target.

But there are obvious caveats to running a program in the kernel. The first one that comes to mind is to run an infinite loop every time a system call is made. The Linux kernel's scheduler determines which program to run next on a particular CPU core for any userspace program. And this is designed in such a way that, after a specific quantum of time/some event, the scheduler program runs, pre-empting (removing) any existing program from the core and executing itself. But, this is possible as the other programs are in ring 3, while the scheduler executes in ring 0 and has access to additional CPU features. If you don't know what ring 0 and ring 3 are, read this. Hence, if I run an infinite loop in a program inside the kernel, the kernel will never yield the processor, and your computer would be stuck. This is devastating.

Hence, eBPF is programmed with a subset of C to avoid these caveats. C is a Turing Complete language. A Turning Complete language, in the simplest terms, is a language that can do what any other Turing Complete language can do. i.e. you can do with C, whatever you can with Java, Python, Go, Rust, JS etc. The subset of C that eBPF permits is not Turing Complete. It does not allow features like loops. It also has a hard limit on the number of branches the code can have. Every time you load an eBPF program in the kernel, a verifier first verifies whether the program will always terminate.

The eBPF functionality was first introduced in the Linux kernel v3.18. Since then, eBPF has evolved quite a lot. But one of the most notable changes to eBPF was in Linux kernel v5.2. Before v5.2, eBPF programs had a maximum limit of 4096 instructions per program, which was confusing. In addition, the total number of instructions across all branches were limited to 232, beyond which the verifier would give up and not allow your code to execute. v5.2 got rid of the 4096 instruction limit and extended the verifier limit to 1M instructions, allowing developers to write more involved programs. Link to the changelog

Hello World!

Let us walk through a Hello World program with eBPF. As eBPF is a Linux functionality, the first thing you need is a Linux Machine. Don't worry if you are on Windows/Mac, though. You can use Multipass. Windows Home users need VirtualBox alongside Multipass, but Mac users do not require it. After the installation in either OS, use the following code in "cmd" or "Terminal".

Alternatively, you can use the interactive online system embedded at the end of this section!

multipass launch --name eBPF
multipass exec eBPF -- bash
The first line launches an instance of the latest Ubuntu LTS with 1 core and 1GB RAM. Note that this will take some time as your device will download the Ubuntu image and set up the VM. The second line logs you into the running instance. After logging into the instance, run the following commands:
# Update repositories and packages
sudo apt update && sudo apt upgrade -y

# Install BPF Compiler Collection and other required stuff
sudo apt install bpfcc-tools libbpfcc-dev linux-headers-$(uname -r) python3-bpfcc

# Fetch the hello world code from pastebin (also given below)
wget -O hello_ebpf.py https://pastebin.com/raw/Jj7xB7ZU

# Run the program
sudo python3 hello_ebpf.py
The program is set to run when any application calls the 'getdents64' system call, which is the call to access the list of files/folders in a particular folder. It is the underlying system call for 'ls', a command similar to 'dir' in Windows. To test this, in another terminal/cmd, run:
multipass exec eBPF -- bash
You will notice that the first terminal/cmd now has a list of programs that tried accessing different files from different folders, just to log you in! You can now type 'ls' in the 2nd terminal to check whether a new entry shows up in the first terminal. Spoiler: It will!

Code Walkthrough

from bcc import BPF

BPF_PROGRAM = r"""
int hello(void *ctx) {
  bpf_trace_printk("getdents64 detected: \n");
  return 0;
}
"""

bpf = BPF(text=BPF_PROGRAM)
bpf.attach_kprobe(event=bpf.get_syscall_fnname("getdents64"), fn_name="hello")

while True:
    try:
        (task, pid, cpu, flags, ts, msg) = bpf.trace_fields()
        print(msg.decode('utf8'), task.decode('utf8'), pid, cpu, flags.decode('utf8'), ts)
    except ValueError:
        print('ValueError')
        continue
    except KeyboardInterrupt:
        break

  1. The first line imports BPF into python. This is a wrapper around the bpf() system calls.
  2. BPF_PROGRAM in Line#3 is a multiline string, which is the actual eBPF code. 
  3. Line #5 has the function bpf_trace_printk, which prints to a specific well-known pipe*, but has limitations. The message is padded with extra information like the process name that triggered it and the process's id.
  4. Line #10 generates and gives the context of the eBPF program to python.
  5. Line #11 instructs the kernel to attach that eBPF program to a kernel event (a call to 'getdents64' in this case)
  6. Line #13 is a loop in the userspace Python program and not the eBPF program, and hence is allowed.
  7. Line #15 assigns the result of trace_fields() to a tuple. 

Further Reading Material

  1. BCC Reference Guide
  2. eBPF maps explained (Maps are storage units to store data across multiple calls of the eBPF program)

Comments

Popular posts from this blog

How do programs work on a computer? (Part 1/2)

Let's take a very simple program for understanding how this system works. The following example is a very basic C code that assigns 2 variables (x and y) and then adds them to generate a value 'z'.  Note: We do not output anything in this program void main() { int x = 3; int y = 4; int z = x + y; } But we have a pressing question. Computers inherently work on electronics and logic, so how does it understand and execute the program? The answer is not very simple and you have to bear with me through a tiny journey. Basic Logic Circuits Let's first begin with a very basic Circuit which implements AND logic and OR logic (skip if you are familiar)  As seen in the circuits above, for the left circuit, the LED lights up only when both switches are on. This is AND logic as "Switch 1 AND Switch 2 need to be on". In the circuit on the right, either of the switch will switch on the LED. Note that in the ...

Introductions

Hey! Welcome to "Root Access"! Before we get to the complicated stuff, I want to introduce myself and this blog. Well, I am a 2020 graduate who jumped into the network industry. My first (and current) job is as a Software Engineer in the Office of the CTO (fondly called OCTO) at F5 Networks Innovation Pvt Ltd . The projects that are undertaken here involve a broad range of the latest and next-generation technologies. Jargons like WebAssembly, SmartNIC, DDoS protection, eBPF, DPDK etc., are what you may hear every day! And it's pretty easy to get lost in all of these. Coming to this blog, I chose the name "Root Access" because it's a pun on my name. Also, because this blog (hopefully) provides you with a good enough introduction to techno-babble that you can then have the skill (privilege) to access the more profound knowledge. But mostly because it's a pun. This blog will mostly contain posts that explain the latest or complicated technologies from scra...

Photos to Photons

Heya there, Mike here! I hope you remember me from the previous story of "Photons to Photos". If not, please read that story here . Let's continue from where we left off! So, it turns out that the owners of the DSLR wanted to revisit us. For that, they had removed our permanent place of residence, the SD card, from the DSLR and put it in their laptop. So, now it was another entity who claimed it was from another DMA controller who would take us to another RAM, this time belonging to the laptop. And so we are off again, this time to meet the owners of the DSLR. In the RAM, we saw a lot of pictures already present alongside us. And more of them were coming in. We were all being loaded into some application that would enable the DSLR Owners to see us and even change us! The first contact As soon as the application had loaded us, it sent us to a chamber called "JPEG Decompression" (remember JPEG Compression chamber from the previous story?). We were asked to hand o...