Writing a TCP/IP Stack from Scratch in Nim: Link Layer (Part 1)

Link Layer (Part 1)

This is the first post in a series in which we’ll build a TCP/IP stack from scratch in Nim.

Roadmap

We’re going to tackle each layer of the TCP/IP stack, starting with the link layer:

Link Layer (Data Link) — Capturing and parsing raw Ethernet frames.
Network Layer — Dissecting IPv4 headers and routing logic.
Transport Layer — Implementing TCP/UDP basics.
Application Layer — Simulating simple protocols (DNS/HTTP).

BPF

To intercept low-level network traffic, we would usually resort to TUN/TAP devices. These are virtual network interfaces provided by the OS that act like real NICs (network interface controllers, the hardware that connects a device to a computer network), except that the traffic goes to and from user space instead of the physical network. A TUN device gives access at the network layer (L3) and operates on IP packets; a TAP device sits one level below, at the link layer (L2), and handles Ethernet frames.

We are building everything from the ground up, so we need access to the link layer. Unfortunately, macOS does not ship with TAP devices, which leaves us with one built-in option: BPF (Berkeley Packet Filter).

While BPF is traditionally used for packet sniffing and monitoring, it’s good enough for building a TCP/IP stack on macOS because it gives you raw access to Ethernet frames.

The diagram below compares the two approaches:


                            ┌─────────────────────────────┐
                                      User Space           
                            └─────────────────────────────┘
                                      ▲        ▲
                                      │        │
                      ┌───────────────┘        └───────────────┐
                      │                                        │
                ┌──────────────┐                       ┌────────────────┐
                │     BPF      │                       │   TAP device   │
                │   /dev/bpfX  │                       │  /dev/net/tun  │
                └──────────────┘                       │   (mode TAP)   │
                      ▲                                └────────────────┘
                      │                                        ▲
                      │ Copies traffic to/from                 │ Acts like a real NIC,
                      │ a real interface (e.g. en0)            │ userland injects/receives frames
                      ▼                                        ▼
                ┌────────────────┐                       ┌──────────────────┐
                │ Real NIC (en0) │                       │ Kernel networking│
                │                │                       │ stack            │
                └────────────────┘                       └──────────────────┘

BPF’s interface is a file descriptor opened on /dev/bpfX (by default macOS provides up to 256 BPF devices).

BPF file descriptor:

Is exclusive-use (can’t be shared)
Must be bound to a network interface
Is configured via ioctl() calls

Understanding `ioctl()`

Modern operating systems are divided into two layers: user space (mentioned earlier), where applications run, and the kernel.

Because kernel code deals with sensitive system resources, user-space programs aren’t allowed to interact with hardware directly. Instead, they make system calls. One of them is ioctl(), a system call that lets user applications send device-specific commands to device drivers.

Each ioctl() call includes:

A file descriptor (e.g. a handle to /dev/bpf0)
A command code
An optional pointer to a data structure for passing arguments

Ok, now that we got this out of the way, we can start building the thing.

Let’s define some of the constants that we are going to use:

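# Standard library modules the snippets in this post rely on
import std/[os, posix, strformat, strutils, sequtils, tables, terminal, endians]

const
  BIOCSETIF = 0x8020426c
  BIOCIMMEDIATE = 0x80044270
  IFNAMSIZ = 16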

BIOCSETIF (0x8020426c): tells the kernel which interface (like “en0”) to bind the BPF device to.
BIOCIMMEDIATE (0x80044270): enables immediate mode, so a read returns as soon as a packet is captured instead of waiting for the kernel buffer to fill up.
IFNAMSIZ (16): the maximum length of a network interface name (e.g., “en0”, “lo0”) in bytes, including the terminating NUL.

These constants come from macros in the system header <net/bpf.h>. For example, BIOCSETIF is command number 108 (0x6c, the low byte of 0x8020426c) in group 'B':

#define BIOCSETIF       _IOW('B',108,struct ifreq)
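
To make the encoding less magical, here is a minimal sketch of how the BSD _IOW macro packs its arguments into a 32-bit command code. The bit layout (a direction flag of 0x80000000, a 13-bit argument size, the group character, and the command number) follows <sys/ioccom.h> as I understand it. The helper name iow and the hard-coded sizes (32 bytes for struct ifreq, 4 bytes for an unsigned int) are assumptions made for illustration.

```nim
import std/strutils

const
  IOC_IN = 0x80000000'u32      # direction bit set by _IOW: userland passes data in
  IOCPARM_MASK = 0x1fff'u32    # the argument size is stored in 13 bits

# Hypothetical helper mirroring _IOW(group, num, type); `size` stands in for sizeof(type).
func iow(group: char, num: uint32, size: uint32): uint32 =
  IOC_IN or ((size and IOCPARM_MASK) shl 16) or (uint32(ord(group)) shl 8) or num

let setIf = iow('B', 108, 32)      # assumes sizeof(struct ifreq) == 32
let immediate = iow('B', 112, 4)   # assumes sizeof(unsigned int) == 4

echo "BIOCSETIF     = 0x", setIf.toHex(8)
echo "BIOCIMMEDIATE = 0x", immediate.toHex(8)
```

Computing the values this way is mostly useful as a sanity check against the hard-coded constants; importing them from the C header would avoid the size assumptions altogether.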

We also define a Nim table that maps each EtherType value (the 16-bit protocol identifier in the Ethernet header) to a human-readable name:

const ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x0842: "Wake-on-LAN",
    0x22EA: "Stream Reservation Protocol",
    0x22F0: "Audio Video Transport Protocol (AVTP)",
    0x22F3: "IETF TRILL Protocol",
    ...
}.toTable
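
As a quick illustration of how such a table can be used, the sketch below looks up an EtherType and falls back to "Unknown" for values that are not listed. It uses a two-entry subset of the table above; EtherTypeNames and describeEtherType are hypothetical names, not part of the original code.

```nim
import std/[tables, strutils]

# Two entries copied from the table above, for illustration only
const EtherTypeNames = {
  0x0800: "IPv4",
  0x0806: "ARP"
}.toTable

# Hypothetical helper: formats an EtherType as "0xNNNN (name)"
proc describeEtherType(value: uint16): string =
  let name = EtherTypeNames.getOrDefault(int(value), "Unknown")
  "0x" & value.toHex(4) & " (" & name & ")"

echo describeEtherType(0x0800)
echo describeEtherType(0x1234)
```

getOrDefault keeps the lookup total: an unrecognized EtherType yields a placeholder instead of raising a KeyError.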

Nim’s great C interop allows us to seamlessly integrate ioctl() calls into our code by simply defining the function using Nim’s FFI syntax with the appropriate C header:

proc ioctl(fd: cint, request: culong, argp: pointer): cint {.importc, header: "<sys/ioctl.h>".}

To bind BPF to a network interface, we use a minimal version of the struct ifreq, which represents an Interface Request structure in C:

type IfReq {.importc: "struct ifreq", header: "<net/if.h>", bycopy.} = object
  ifr_name: array[IFNAMSIZ, char]

We now define our structure for working with the link layer - BPFLinkLayer. It holds the file descriptor for the opened BPF device (bpf_fd) and the name of the network interface we want to bind to (iface):

type 
  BPFLinkLayer* = object
    bpf_fd: int
    iface: string

Although Nim is not an object-oriented language in the classic sense, it supports structured types via object (similar to structs in C) as well as limited inheritance. In addition, Nim’s uniform function call syntax lets you call regular procedures using dot notation, which makes them look like method calls:

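type
  MyType = object
    field: int

proc doSomething(x: MyType) =
  echo "field = ", x.field

let myObj = MyType(field: 42)

doSomething(myObj)   # regular call syntax
myObj.doSomething()  # the same call written with dot notation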

Let’s now bring everything together and write a procedure that opens a BPF device and binds it to our network interface:

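proc open_bpf(self: var BPFLinkLayer) =
  # Needs std/os (osLastError, osErrorMsg, raiseOSError), std/posix and std/strformat.
  # Try /dev/bpf0 → /dev/bpf255 and keep the first device that opens.
  # posix.open signals failure by returning -1 (it does not raise),
  # so the reason is read from errno via osLastError().
  var lastError = ""
  self.bpf_fd = -1
  for i in 0..255:
    let path = fmt"/dev/bpf{i}"
    let fd = open(path.cstring, O_RDWR)
    if fd != -1:
      self.bpf_fd = fd
      echo fmt"Successfully opened /dev/bpf{i} with fd {self.bpf_fd}"
      break
    lastError = fmt"Error opening /dev/bpf{i}: {osErrorMsg(osLastError())}"

  if self.bpf_fd == -1:
    raise newException(IOError, fmt"No available /dev/bpfX devices. {lastError}")

  # Bind to interface. Copy at most IFNAMSIZ - 1 bytes so we never read past
  # the end of the Nim string and the name stays NUL-terminated.
  var ifreq: IfReq
  zeroMem(addr ifreq, sizeof(ifreq))
  copyMem(addr ifreq.ifr_name[0], self.iface.cstring, min(self.iface.len, IFNAMSIZ - 1))

  if ioctl(cint(self.bpf_fd), culong(BIOCSETIF), addr ifreq) == -1:
    raiseOSError(osLastError(), fmt"BIOCSETIF failed for {self.iface}")

  # BIOCIMMEDIATE expects a pointer to an unsigned int flag, not the value itself
  var immediate: cuint = 1
  if ioctl(cint(self.bpf_fd), culong(BIOCIMMEDIATE), addr immediate) == -1:
    raiseOSError(osLastError(), "BIOCIMMEDIATE failed")

  echo "Successfully bound to interface"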

We iterate over the BPF devices until one opens successfully, then use ioctl() to bind its file descriptor to our interface and to switch it to immediate mode.

Parsing the Ethernet frame

An Ethernet frame is the basic unit of data transmitted over an Ethernet network. It wraps higher-layer protocols like IP or ARP and carries the source and destination MAC addresses, which make sure the data gets to the right device on the same network.

To decode raw packet data, we need to understand its layout — and then map that layout into Nim types.

When we read from a BPF device, we don’t get the Ethernet frame directly. The data begins with a BPF-specific header (bpf_hdr) that includes:

A timestamp (when the packet was received)
How many bytes were captured
Where in the buffer the actual packet data starts

BPF header:

type Timeval* {.packed.} = object  # Timeval structure represents a timestamp with microsecond precision
  tv_sec: int32  # Seconds since Unix epoch
  tv_usec: int32  # Microseconds past the second

type BPFHeader* {.packed.} = object
  timeval: Timeval
  bh_caplen: uint32     # Number of bytes actually captured
  bh_datalen: uint32    # Actual packet length (might be bigger than captured)
  bh_hdrlen: uint16     # Offset to skip to get to packet data

In Nim, the {.packed.} pragma tells the compiler not to insert any padding between fields in an object. This is important when you’re mapping your types directly onto raw binary data — like network packets or syscall buffers. At this level every byte and offset matters.

We could have imported the original C structures using Nim’s {.importc.} pragma as we did earlier, but we defined them manually instead — to clearly illustrate how the {.packed.} pragma ensures the memory layout matches the raw packet format exactly.
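
To see what {.packed.} changes in practice, the following sketch compares the size of the same field layout with and without the pragma. The type names are hypothetical stand-ins for the BPFHeader above, and the padded size assumes a typical platform where uint32 is 4-byte aligned.

```nim
type
  Timeval32 {.packed.} = object
    tv_sec: int32
    tv_usec: int32

  PackedHdr {.packed.} = object   # same fields as BPFHeader, packed
    stamp: Timeval32
    caplen: uint32
    datalen: uint32
    hdrlen: uint16

  PaddedHdr = object              # same fields, but the compiler may add padding
    stamp: Timeval32
    caplen: uint32
    datalen: uint32
    hdrlen: uint16

echo "packed size: ", sizeof(PackedHdr)
echo "padded size: ", sizeof(PaddedHdr)
echo "offset of hdrlen in the packed layout: ", offsetOf(PackedHdr, hdrlen)
```

The difference is trailing padding after the 2-byte hdrlen field. This is also why the parsing code should skip bh_hdrlen bytes, as reported by the kernel, rather than sizeof(BPFHeader) to reach the packet data.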

We now define the EthernetHeader type:

type EthernetHeader {.packed.} = object
  dst: array[6, uint8]
  src: array[6, uint8]
  ethType: uint16

This type mirrors the standard Ethernet frame format:

Field	Offset	Size (bytes)	Description
`dst`	0	6	Destination MAC address
`src`	6	6	Source MAC address
`ethType`	12	2	EtherType (e.g. 0x0800 = IPv4)
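
Because the packed type matches the wire layout byte for byte, the first 14 bytes of a frame can be copied straight into it. The sketch below shows one way this could look; parseHeader and the sample frame are made up for illustration, and the only subtlety is that ethType arrives in network byte order and has to be converted.

```nim
import std/endians

type EthernetHeader {.packed.} = object
  dst: array[6, uint8]
  src: array[6, uint8]
  ethType: uint16

# Hypothetical helper: overlays the packed header onto the start of a raw frame
proc parseHeader(frame: openArray[uint8]): EthernetHeader =
  if frame.len < sizeof(EthernetHeader):
    raise newException(ValueError, "frame shorter than an Ethernet header")
  copyMem(addr result, unsafeAddr frame[0], sizeof(EthernetHeader))
  # ethType was copied as raw big-endian bytes; convert it to host byte order
  var raw = result.ethType
  bigEndian16(addr result.ethType, addr raw)

# Made-up sample: broadcast destination, arbitrary source, EtherType 0x0806
let sample = @[0xff'u8, 0xff, 0xff, 0xff, 0xff, 0xff,
               0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
               0x08, 0x06]
let hdr = parseHeader(sample)
echo "EtherType as integer: ", hdr.ethType
```

Copying into the struct avoids aliasing the receive buffer, at the cost of a 14-byte copy per frame.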

Here is the code that reads one frame from the BPF device and parses it:

proc formatMac(mac: openArray[uint8]): string =
  mac.mapIt(fmt"{it:02x}").join(":")

type
  FrameData = object
    dest_mac: string
    src_mac: string
    eth_type: string
    eth_type_name: string
    payload: seq[uint8]

proc read_frame(self: var BPFLinkLayer): FrameData =
  var buffer: array[4096, uint8]
  let bytesRead = read(cint(self.bpf_fd), addr buffer[0], buffer.len)

  if bytesRead <= 0:
    stdout.styledWriteLine(fgRed, "Failed to read\n")
    raise newException(Exception, "Failed to read")

  let bh = cast[ptr BPFHeader](unsafeAddr buffer[0])[]

  let pktStart = bh.bh_hdrlen.int
  let pktEnd = pktStart + bh.bh_caplen.int

  # Validate against the bytes actually read, not just the buffer capacity
  if pktEnd > bytesRead:
    stdout.styledWriteLine(fgRed, "Packet data exceeds the bytes read\n")
    raise newException(Exception, "Packet data exceeds the bytes read")

  let ethFrame = buffer[pktStart ..< pktEnd]
  if ethFrame.len < 14:
    stdout.styledWriteLine(fgRed, "Incomplete Ethernet frame\n")
    raise newException(Exception, "Incomplete Ethernet frame")

  let dest_mac = ethFrame[0..5]
  let src_mac = ethFrame[6..11]
  var eth_type: uint16
  bigEndian16(addr eth_type, unsafeAddr ethFrame[12])

  let payload = ethFrame[14..<ethFrame.len]

  let dest_mac_str = formatMac(dest_mac)
  let src_mac_str = formatMac(src_mac)
  let eth_type_str = "0x" & eth_type.toHex(4)

  result = FrameData(
    dest_mac: dest_mac_str,
    src_mac: src_mac_str,
    eth_type: eth_type_str,
    eth_type_name: ETHERTYPES.getOrDefault(int(eth_type), "Unknown"),
    payload: @payload
  )

Breakdown:

We allocate a 4 KB buffer and read from the BPF file descriptor. This gives us raw packet bytes. BPF expects the read buffer to be as large as its own kernel buffer (the size can be queried with the BIOCGBLEN ioctl); 4096 bytes is the usual macOS default.
If the read returned no bytes (or returned an error), we raise an exception.
We then parse the BPF header by casting the start of the buffer to our BPFHeader object.
After that we compute where the actual Ethernet frame starts and ends. The important part here is to validate the boundaries before slicing.
We parse the Ethernet frame fields. The EtherType field is stored in big-endian order (network byte order), so we convert it to host byte order with bigEndian16().
We format the MAC addresses using a small formatMac() helper, which takes the 6 raw bytes of a MAC address and turns them into a human-readable string.
We return the parsed frame as a FrameData object, including the frame’s payload as a raw byte sequence.
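
One detail the walkthrough above leaves out: a single read() from a BPF device may return several captured packets back to back, each with its own bpf_hdr and each padded so that the next record starts on a word boundary (BPF_WORDALIGN in <net/bpf.h>). The following is a rough sketch of walking such a buffer, not the code from this series. It assumes 4-byte alignment (as on macOS, to my knowledge) and the 18-byte header layout defined earlier; bpfWordAlign and packetSpans are hypothetical names.

```nim
const
  BpfAlignment = 4    # assumption: sizeof(int32), as in macOS <net/bpf.h>
  MinHeaderLen = 18   # size of the packed BPFHeader defined earlier

# Rounds x up to the next multiple of BpfAlignment
func bpfWordAlign(x: int): int =
  (x + BpfAlignment - 1) and not (BpfAlignment - 1)

# Returns the [start, stop) byte range of every captured frame in buf
proc packetSpans(buf: openArray[uint8], bytesRead: int): seq[tuple[start, stop: int]] =
  var offset = 0
  while offset + MinHeaderLen <= bytesRead:
    # Offsets follow BPFHeader: bh_caplen at byte 8, bh_hdrlen at byte 16,
    # both stored in host byte order
    var caplen: uint32
    var hdrlen: uint16
    copyMem(addr caplen, unsafeAddr buf[offset + 8], 4)
    copyMem(addr hdrlen, unsafeAddr buf[offset + 16], 2)
    let start = offset + int(hdrlen)
    let stop = start + int(caplen)
    if int(hdrlen) < MinHeaderLen or stop > bytesRead:
      break   # malformed or truncated record: stop walking
    result.add((start: start, stop: stop))
    offset += bpfWordAlign(int(hdrlen) + int(caplen))
```

With this helper, read_frame could be turned into a procedure that returns every frame from one read instead of only the first; each span can be sliced out of the buffer and parsed exactly as shown above.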

It’s time to finally run our code and see the data:

when isMainModule:
  var bpf = BPFLinkLayer(iface: "en0")
  bpf.open_bpf()
  while true:
    let frame = bpf.read_frame()
    echo fmt"Source MAC: {frame.src_mac}"
    echo fmt"Destination MAC: {frame.dest_mac}"
    echo fmt"EtherType: {frame.eth_type} ({frame.eth_type_name})"
    echo fmt"Payload length: {frame.payload.len} bytes"
    echo "Payload: ", frame.payload.mapIt(fmt"{it:02x}").join("")
    echo "---"

Which gives an output that should look like this:

Source MAC: d4:7b:2f:8c:aa:44
Destination MAC: 9a:bc:3e:17:6d:88
EtherType: 0x0800 (IPv4)
Payload length: 1496 bytes
Payload: 450005d8aabbccdd00117e9501fc7d1ae0c9a806321bb0023dd2e0011a45f59008602ec09a2c78f508801000402001c00000101080a12ab45d2347caf3d160303006f0200006b0303aa8f36a3deafcb1a09ffc1f246543a489294eff28c32aa4b14583d8c9ff210fa875638f7029e64915499bc75cc3e860b5da193bc561ebae3d011000020002f001d001e0020cde9c3fa82d91ae74156a3b5ce0d8492e96fa34b3f73834f1edc1c429a1030002d000203041403030001011703030caa3f5a7e42ce91347a3723e81255af20be718374c46d038e5adf4fcb2df7a3de99f2b8dd7ad8ca3a4903382a299f1a2bdf55de647325e39e84c3a5195a934a7e64cd19b3284742503f229e1d624c1a0a379802e7c2a601c3887d4fa7d4b0872fdc6ba7705bfa16e4ec327057d1a9bcdd1484fcab3438f297d7a57cfaf95d89be...

Wrap-up

All of the code for this series is available on GitHub: https://github.com/tigranl/network_programming.
I’ll update the table of contents as we progress towards our goal of building the TCP/IP stack!
