Link Layer (Part 1)
Let’s start off with the first post in a series where we’ll build a TCP/IP stack.
Roadmap
We’re going to tackle each layer of the TCP/IP stack, starting with the link layer:
- Link Layer (Data Link) — Capturing and parsing raw Ethernet frames.
- Network Layer — Dissecting IPv4 headers and routing logic.
- Transport Layer — Implementing TCP/UDP basics.
- Application Layer — Simulating simple protocols (DNS/HTTP).
BPF
Usually, to intercept low-level network traffic, we would resort to TUN/TAP devices. These are virtual network interfaces provided by the OS that act like real NICs (network interface controllers, the hardware that connects a device to a computer network), except that the traffic goes to and from user space instead of the physical network. A TUN device provides L3 (network layer) access and operates on IP packets; a TAP device sits one level below (L2, link layer) and handles Ethernet frames.
We are building everything from the ground up, so we need a way to access the link layer. Unfortunately, macOS does not ship with TAP devices, which leaves us with one option: BPF (Berkeley Packet Filter).
While BPF is traditionally used for packet sniffing and monitoring, it’s good enough for building a TCP/IP stack on macOS because it gives you raw access to Ethernet frames.
This should probably give an idea of what’s going on:
            ┌──────────────────────────────┐
            │          User Space          │
            └──────────────────────────────┘
                       ▲        ▲
                       │        │
       ┌───────────────┘        └───────────────┐
       │                                        │
┌──────────────┐                        ┌────────────────┐
│     BPF      │                        │   TAP device   │
│  /dev/bpfX   │                        │  /dev/net/tun  │
└──────────────┘                        │   (mode TAP)   │
       ▲                                └────────────────┘
       │                                        ▲
       │ Copies traffic to/from                 │ Acts like a real NIC,
       │ a real interface (e.g. en0)            │ userland injects/receives frames
       ▼                                        ▼
┌────────────────┐                     ┌──────────────────┐
│ Real NIC (en0) │                     │ Kernel networking│
│                │                     │ stack            │
└────────────────┘                     └──────────────────┘
BPF’s interface is a file descriptor on /dev/bpfX (by default macOS provides up to 256 BPF devices).
A BPF file descriptor:
- Is exclusive-use (can’t be shared)
- Must be bound to a network interface
- Is configured via ioctl() calls
Understanding ioctl()
Modern operating systems are divided into two layers: userspace (which we have mentioned earlier), where applications run, and the kernel.
Because kernel code deals with sensitive system resources, userspace programs aren’t allowed to interact with hardware directly. Instead, they make system calls, one of which is ioctl(): a system call that lets user applications send custom commands to device drivers.
Each ioctl() call includes:
- A file descriptor (e.g. a handle to /dev/bpf0)
- A command code
- An optional pointer to a data structure for passing arguments
Ok, now that we got this out of the way, we can start building the thing.
Let’s define some of the constants that we are going to use:
const
  BIOCSETIF = 0x8020426c
  BIOCIMMEDIATE = 0x80044270
  IFNAMSIZ = 16
- BIOCSETIF (0x8020426c): tells the kernel which interface (like “en0”) to bind the BPF device to.
- BIOCIMMEDIATE (0x80044270): enables immediate mode, so read() returns as soon as a packet is captured instead of waiting for the kernel buffer to fill up or a timeout to expire.
- IFNAMSIZ (16): the maximum length of a network interface name (e.g., “en0”, “lo0”) in bytes, including the terminating NUL byte.
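These constants come straight from the system header files. For example, <net/bpf.h> defines BIOCSETIF as command number 108 (0x6c, the low byte of the value above) in group 'B':
#define BIOCSETIF _IOW('B', 108, struct ifreq)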
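To see where the hex values come from, here is a minimal sketch of the BSD-style `_IOW()` encoding written in Nim. The helper name `iow` is hypothetical, and the structure sizes passed to it (32 bytes for `struct ifreq`, 4 bytes for `u_int`) are assumptions for 64-bit macOS rather than values read from the headers at compile time.

```nim
import std/strutils

const
  IOC_IN = 0x80000000'u32    # direction bit: userspace passes data in
  IOCPARM_MASK = 0x1fff'u32  # the parameter size occupies 13 bits

func iow(group: char, num, size: uint32): uint32 =
  ## Packs direction, parameter size, group letter and command number
  ## into one 32-bit ioctl request code.
  IOC_IN or ((size and IOCPARM_MASK) shl 16) or
    (uint32(ord(group)) shl 8) or num

# Assumed sizes: sizeof(struct ifreq) == 32, sizeof(u_int) == 4
let setIf = iow('B', 108, 32)
let immediate = iow('B', 112, 4)

echo "BIOCSETIF     = 0x", toHex(setIf)
echo "BIOCIMMEDIATE = 0x", toHex(immediate)
```

Hardcoding the resulting constants, as the post does, is simpler; the sketch only shows that they are not magic numbers.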
We also define a Nim table (from std/tables) that maps each EtherType value to its name:
const ETHERTYPES = {
  0x0800: "IPv4",
  0x0806: "ARP",
  0x0842: "Wake-on-LAN",
  0x22EA: "Stream Reservation Protocol",
  0x22F0: "Audio Video Transport Protocol (AVTP)",
  0x22F3: "IETF TRILL Protocol",
  # ...
}.toTable
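As a quick illustration of how such a table is used, the sketch below looks up a 16-bit EtherType and falls back to "Unknown" for values that are not listed. `KnownTypes` is a two-entry stand-in for the full table, and `describeEtherType` is a hypothetical helper, not part of the series code.

```nim
import std/[strutils, tables]

# Two-entry stand-in for the full ETHERTYPES table
const KnownTypes = {0x0800: "IPv4", 0x0806: "ARP"}.toTable

proc describeEtherType(value: uint16): string =
  ## Formats an EtherType as "0xNNNN (name)".
  ## The table is keyed by int, so the value is widened before the lookup.
  let name = KnownTypes.getOrDefault(int(value), "Unknown")
  "0x" & toHex(value, 4) & " (" & name & ")"

echo describeEtherType(0x0800)
echo describeEtherType(0x86DD)
```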
Nim’s great C interop allows us to seamlessly integrate ioctl() calls into our code by simply declaring the function using Nim’s FFI syntax with the appropriate C header:
proc ioctl(fd: cint, request: culong, argp: pointer): cint {.importc, header: "<sys/ioctl.h>".}
To bind BPF to a network interface, we use a minimal version of struct ifreq, which represents an Interface Request structure in C:
type IfReq {.importc: "struct ifreq", header: "<net/if.h>", bycopy.} = object
  ifr_name: array[IFNAMSIZ, char]
We now define our structure for working with the link layer: BPFLinkLayer. It holds the file descriptor for the opened BPF device (bpf_fd) and the name of the network interface we want to bind to (iface):
type
  BPFLinkLayer* = object
    bpf_fd: int
    iface: string
Although Nim is not an object-oriented language in the classic sense, it supports structured types defined with object, similar to structs in C, and also supports limited inheritance. Additionally, Nim’s uniform function call syntax lets you call regular procs using dot notation, making them look like method calls:
type
  MyType = object
    field: int

proc doSomething(x: MyType) =
  echo x.field

let myObj = MyType(field: 42)
doSomething(myObj)   # regular call
myObj.doSomething()  # the same call, written method-style
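Let’s now bring everything together and write a proc that opens a BPF device and binds it to our interface:
import std/[os, posix, strformat]

proc open_bpf(self: var BPFLinkLayer) =
  # Try /dev/bpf0 → /dev/bpf255. posix.open() signals failure by returning -1
  # (it does not raise), so we check the return value and errno ourselves.
  var lastError = ""
  self.bpf_fd = -1
  for i in 0..255:
    let path = fmt"/dev/bpf{i}"
    let fd = open(cstring(path), O_RDWR)
    if fd != -1:
      self.bpf_fd = fd
      echo fmt"Successfully opened {path} with fd {self.bpf_fd}"
      break
    lastError = fmt"Error opening {path}: {osErrorMsg(osLastError())}"
  if self.bpf_fd == -1:
    raise newException(IOError, fmt"No available /dev/bpfX devices. Error: {lastError}")
  # Bind to interface
  var ifreq: IfReq
  # Copy at most IFNAMSIZ - 1 bytes so the name stays NUL-terminated
  # and we never read past the end of the Nim string
  let nameLen = min(self.iface.len, IFNAMSIZ - 1)
  if nameLen > 0:
    copyMem(addr ifreq.ifr_name[0], addr self.iface[0], nameLen)
  if ioctl(cint(self.bpf_fd), culong(BIOCSETIF), addr ifreq) == -1:
    raise newException(IOError, fmt"BIOCSETIF failed: {osErrorMsg(osLastError())}")
  # BIOCIMMEDIATE expects a pointer to an unsigned int, not the value itself
  var enable: cuint = 1
  if ioctl(cint(self.bpf_fd), culong(BIOCIMMEDIATE), addr enable) == -1:
    raise newException(IOError, fmt"BIOCIMMEDIATE failed: {osErrorMsg(osLastError())}")
  echo "Successfully bound to interface"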
We iterate over the BPF devices, open the first one that is free, and then use ioctl() to bind its file descriptor to our network interface.
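One detail worth isolating is how the interface name ends up in the fixed-size ifr_name field: the name has to fit into IFNAMSIZ bytes including the terminating NUL. A minimal sketch of that step, with the hypothetical names `IfNameSize` and `toIfName`, could look like this:

```nim
const IfNameSize = 16  # mirrors IFNAMSIZ

proc toIfName(name: string): array[IfNameSize, char] =
  ## Copies `name` into a zero-filled fixed-size buffer.
  ## Names that are too long are truncated so the last byte is always NUL.
  let n = min(name.len, IfNameSize - 1)
  for i in 0 ..< n:
    result[i] = name[i]

let en0 = toIfName("en0")
echo "first unused byte is NUL: ", en0[3] == '\0'
```

Copying only `min(name.len, IfNameSize - 1)` bytes avoids both reading past the end of a short string and writing an unterminated name for a long one.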
Parsing the Ethernet frame
An Ethernet frame is the basic unit of data transmitted over Ethernet. It wraps higher-layer protocols like IP or ARP and includes essential information like the source and destination MAC addresses, making sure data gets to the right device on the same network.
To decode raw packet data, we need to understand its layout — and then map that layout into Nim types.
When we read from a BPF device, we don’t get the Ethernet frame directly. The data begins with a BPF-specific header (bpf_hdr) that includes:
- A timestamp (when the packet was received)
- How many bytes were captured
- Where in the buffer the actual packet data starts
BPF header:
type Timeval* {.packed.} = object  # Timestamp with microsecond precision
  tv_sec: int32   # Seconds since Unix epoch
  tv_usec: int32  # Microseconds past the second

type BPFHeader* {.packed.} = object
  timeval: Timeval
  bh_caplen: uint32   # Number of bytes actually captured
  bh_datalen: uint32  # Actual packet length (might be bigger than captured)
  bh_hdrlen: uint16   # Offset to skip to get to packet data
In Nim, the {.packed.} pragma tells the compiler not to insert any padding between fields in an object. This is important when you’re mapping your types directly onto raw binary data, like network packets or syscall buffers. At this level every byte and offset matters.
We could have imported the original C structures using Nim’s {.importc.} pragma as we did earlier, but we defined them manually instead, to clearly illustrate how the {.packed.} pragma ensures the memory layout matches the raw packet format exactly.
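To make the effect of {.packed.} visible, here is a small sketch comparing two otherwise identical objects. The type names `PaddedPair` and `PackedPair` are made up for the illustration, and the padded numbers assume a mainstream ABI where uint32 is aligned to 4 bytes.

```nim
type
  PaddedPair = object
    tag: uint8
    value: uint32   # the compiler inserts 3 padding bytes before this field
  PackedPair {.packed.} = object
    tag: uint8
    value: uint32   # follows `tag` immediately

echo "PaddedPair: size=", sizeof(PaddedPair),
     " value offset=", offsetOf(PaddedPair, value)
echo "PackedPair: size=", sizeof(PackedPair),
     " value offset=", offsetOf(PackedPair, value)
```

Fields that are already naturally aligned, as in the BPF and Ethernet headers here, keep the same offsets either way; the pragma mainly guards against padding sneaking in when a layout changes.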
We now define the EthernetHeader type:
type EthernetHeader {.packed.} = object
  dst: array[6, uint8]
  src: array[6, uint8]
  ethType: uint16
This type mirrors the standard Ethernet frame format:
| Field | Offset | Size (bytes) | Description |
|---|---|---|---|
| dst | 0 | 6 | Destination MAC address |
| src | 6 | 6 | Source MAC address |
| ethType | 12 | 2 | EtherType (e.g. 0x0800 = IPv4) |
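The layout in the table can be exercised without a network interface. The following sketch parses the three fields from a plain byte array; `EthFields` and `parseEthernet` are hypothetical names used only for this example, and the sample bytes are invented.

```nim
type EthFields = object
  dst, src: array[6, uint8]
  ethType: uint16

proc parseEthernet(frame: openArray[uint8]): EthFields =
  ## Extracts the fields at offsets 0, 6 and 12 of an Ethernet header.
  if frame.len < 14:
    raise newException(ValueError, "an Ethernet header needs 14 bytes")
  for i in 0 ..< 6:
    result.dst[i] = frame[i]
    result.src[i] = frame[6 + i]
  # EtherType is big-endian on the wire: the high byte comes first
  result.ethType = (uint16(frame[12]) shl 8) or uint16(frame[13])

# Invented sample: broadcast destination, arbitrary source, EtherType ARP
let sample = [0xff'u8, 0xff, 0xff, 0xff, 0xff, 0xff,
              0xd4, 0x7b, 0x2f, 0x8c, 0xaa, 0x44,
              0x08, 0x06]
let header = parseEthernet(sample)
echo "EtherType is ARP: ", header.ethType == 0x0806
```

Assembling the EtherType from two bytes with shifts sidesteps host byte order entirely, which is an alternative to casting the buffer onto a packed object and swapping afterwards.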
Here is the code:
import std/[endians, posix, sequtils, strformat, strutils, tables, terminal]

proc formatMac(mac: openArray[uint8]): string =
  # Render each byte as two lowercase hex digits, joined by colons
  mac.mapIt(fmt"{it:02x}").join(":")

type
  FrameData = object
    dest_mac: string
    src_mac: string
    eth_type: string
    eth_type_name: string
    payload: seq[uint8]

proc read_frame(self: var BPFLinkLayer): FrameData =
  var buffer: array[4096, uint8]
  let bytesRead = read(cint(self.bpf_fd), addr buffer[0], buffer.len)
  # A failed read, or one shorter than a BPF header, cannot be parsed
  if bytesRead < sizeof(BPFHeader):
    stdout.styledWriteLine(fgRed, "Failed to read\n")
    raise newException(IOError, "Failed to read")
  let bh = cast[ptr BPFHeader](addr buffer[0])[]
  let pktStart = bh.bh_hdrlen.int
  let pktEnd = pktStart + bh.bh_caplen.int
  # Validate against the bytes actually read, not just the buffer capacity
  if pktEnd > bytesRead:
    stdout.styledWriteLine(fgRed, "Packet data exceeds buffer length\n")
    raise newException(ValueError, "Packet data exceeds buffer length")
  let ethFrame = buffer[pktStart ..< pktEnd]
  if ethFrame.len < 14:
    stdout.styledWriteLine(fgRed, "Incomplete Ethernet frame\n")
    raise newException(ValueError, "Incomplete Ethernet frame")
  let dest_mac = ethFrame[0..5]
  let src_mac = ethFrame[6..11]
  # EtherType arrives in network byte order (big-endian)
  var eth_type: uint16
  bigEndian16(addr eth_type, unsafeAddr ethFrame[12])
  let payload = ethFrame[14..<ethFrame.len]
  let dest_mac_str = formatMac(dest_mac)
  let src_mac_str = formatMac(src_mac)
  let eth_type_str = "0x" & eth_type.toHex(4)
  result = FrameData(
    dest_mac: dest_mac_str,
    src_mac: src_mac_str,
    eth_type: eth_type_str,
    eth_type_name: ETHERTYPES.getOrDefault(int(eth_type), "Unknown"),
    payload: payload
  )
Breakdown:
- We allocate a 4 KB buffer and read from the BPF file descriptor. This gives us raw packet bytes.
- If the BPF device didn’t return any bytes (or returned an error), we raise an exception.
- We then parse the BPF header by directly casting it to our defined BPFHeader object.
- After that we compute where the actual Ethernet frame starts and ends. The important part here is to validate the boundaries.
- We parse the Ethernet frame fields. The EtherType field is stored in big-endian order (network byte order), so we deal with it accordingly.
- We format the MAC addresses using a small formatMac() helper, which takes a sequence of 6 bytes (a raw MAC address) and turns it into a human-readable string.
- We return the Ethernet frame as a FrameData object, including the frame’s payload as a raw byte sequence.
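The steps above handle the first record in the buffer. A single read() on a BPF descriptor may return several captured packets back to back, each with its own header and padded to a word boundary (the BPF_WORDALIGN macro in C). The sketch below walks such a buffer and returns the byte range of every frame. It is an illustration under stated assumptions, not the series code: `frameSlices` and `wordAlign` are hypothetical names, the alignment of 4 bytes is what macOS is assumed to use, the field offsets follow the BPFHeader layout shown earlier, and the header fields are read in host byte order.

```nim
const
  BpfAlignment = 4   # assumption: BPF_ALIGNMENT is sizeof(int32_t) on macOS
  CaplenOffset = 8   # bh_caplen follows the 8-byte timestamp
  HdrlenOffset = 16  # bh_hdrlen follows bh_caplen and bh_datalen
  MinHeader = 18     # bytes needed to read all header fields

func wordAlign(n: int): int =
  ## Rounds n up to the next multiple of BpfAlignment.
  (n + BpfAlignment - 1) and not (BpfAlignment - 1)

proc frameSlices(buf: openArray[uint8], bytesRead: int): seq[Slice[int]] =
  ## Returns the index range of each captured frame inside `buf`.
  let limit = min(bytesRead, buf.len)
  var offset = 0
  while offset + MinHeader <= limit:
    var caplen: uint32
    var hdrlen: uint16
    copyMem(addr caplen, unsafeAddr buf[offset + CaplenOffset], sizeof(caplen))
    copyMem(addr hdrlen, unsafeAddr buf[offset + HdrlenOffset], sizeof(hdrlen))
    let start = offset + int(hdrlen)
    let stop = start + int(caplen)
    # Stop on a malformed or truncated record instead of reading past the data
    if hdrlen == 0 or stop > limit:
      break
    result.add(start ..< stop)
    # The next record begins at the following word boundary
    offset += wordAlign(int(hdrlen) + int(caplen))
```

Each returned range can then be parsed the same way as the single frame in read_frame.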
It’s time to finally run our code and see the data:
when isMainModule:
  var bpf = BPFLinkLayer(iface: "en0")
  bpf.open_bpf()
  while true:
    let frame = bpf.read_frame()
    echo fmt"Source MAC: {frame.src_mac}"
    echo fmt"Destination MAC: {frame.dest_mac}"
    echo fmt"EtherType: {frame.eth_type} ({frame.eth_type_name})"
    echo fmt"Payload length: {frame.payload.len} bytes"
    echo "Payload: ", frame.payload.mapIt(fmt"{it:02x}").join("")
    echo "---"
Which gives an output that should look like this:
Source MAC: d4:7b:2f:8c:aa:44
Destination MAC: 9a:bc:3e:17:6d:88
EtherType: 0x0800 (IPv4)
Payload length: 1496 bytes
Payload: 450005d8aabbccdd00117e9501fc7d1ae0c9a806321bb0023dd2e0011a45f59008602ec09a2c78f508801000402001c00000101080a12ab45d2347caf3d160303006f0200006b0303aa8f36a3deafcb1a09ffc1f246543a489294eff28c32aa4b14583d8c9ff210fa875638f7029e64915499bc75cc3e860b5da193bc561ebae3d011000020002f001d001e0020cde9c3fa82d91ae74156a3b5ce0d8492e96fa34b3f73834f1edc1c429a1030002d000203041403030001011703030caa3f5a7e42ce91347a3723e81255af20be718374c46d038e5adf4fcb2df7a3de99f2b8dd7ad8ca3a4903382a299f1a2bdf55de647325e39e84c3a5195a934a7e64cd19b3284742503f229e1d624c1a0a379802e7c2a601c3887d4fa7d4b0872fdc6ba7705bfa16e4ec327057d1a9bcdd1484fcab3438f297d7a57cfaf95d89be...
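As a quick sanity check on the output, the first four payload bytes (45 00 05 d8) can be decoded by hand. Under the standard IPv4 layout, the high nibble of the first byte is the version, the low nibble is the header length in 32-bit words, and bytes 2 and 3 hold the total length in big-endian order. The helper name `hexToBytes` is made up for this sketch.

```nim
import std/strutils

proc hexToBytes(hex: string): seq[uint8] =
  ## Converts a hex string with an even number of digits into bytes.
  for i in countup(0, hex.len - 2, 2):
    result.add uint8(parseHexInt(hex[i .. i + 1]))

# First four payload bytes from the sample output
let head = hexToBytes("450005d8")
let version = head[0] shr 4
let headerLen = int(head[0] and 0x0f) * 4
let totalLen = (int(head[2]) shl 8) or int(head[3])

echo "IP version: ", version
echo "Header length: ", headerLen, " bytes"
echo "Total length: ", totalLen, " bytes"
```

The decoded total length equals the payload length reported for the frame, which suggests the Ethernet header was stripped at the right offset.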
Wrap-up
All of the series code is available on GitHub: https://github.com/tigranl/network_programming.
I’ll update the table of contents as we progress towards our goal of building the TCP/IP stack!