ARP
Welcome to the second post in the series. Last time we built the link layer – capturing raw Ethernet frames over BPF on macOS. This time we’re going one notch up the stack: ARP, the Address Resolution Protocol. It’s how a host figures out which MAC address corresponds to a given IPv4 address on the local network, so Ethernet frames know where to actually go.
But before we get to ARP itself, we need to revisit the link layer. The version we ended Part 1 with was fine for reading and printing frames, but it has a handful of problems that show up as soon as we try to do anything more serious – like sending frames back, processing more than one packet per syscall, or letting upper layers (ARP, IP, TCP) compare ethertypes without parsing strings.
So this post has two halves:
- Fix up the link layer.
- Then build ARP on top of it.
Fixing the link layer
Let’s go through the issues one at a time.
First, the read_frame() implementation itself. Right now it reads exactly one packet per read() call and returns that frame, discarding whatever else the kernel put in the buffer:
┌────────┬─────────────────────┬───┬────────┬─────────────────────┬───┬────────┬─────────────────────┬───┐
│bpf_hdr0│ Packet │ ◊ │bpf_hdr1│ Packet │ ◊ │bpf_hdr2│ Packet │ ◊ │
└────────┴─────────────────────┴───┴────────┴─────────────────────┴───┴────────┴─────────────────────┴───┘
◄─── parsed ──────────────────► ◄────────── thrown away ──────────────────────────────────────────────►
But of course, that was a simplification on our end. BPF’s read API fundamentally hands back N packets per call, so the correct way to handle it is to treat the buffer as a list: walk a cursor, parse a packet, advance to the next one.
-proc read_frame*(self: var BPFLinkLayer): FrameData =
-  var buffer: array[4096, uint8]
-  let bytesRead = read(cint(self.bpf_fd), addr buffer[0], buffer.len)
-  ...
-  let bh = cast[ptr BPFHeader](addr buffer[0])[]
-  let pktStart = bh.bh_hdrlen.int
-  ...
-  result = FrameData(...)
+iterator read_frames*(self: var BPFLinkLayer): FrameData =
+  var buf: array[BPF_BUF_SIZE, uint8]
+  while true:
+    let bytesRead = read(cint(self.bpf_fd), addr buf[0], buf.len)
+    ...
+    var pos = 0
+    while pos < bytesRead:
+      let bh = cast[ptr BPFHeader](addr buf[pos])[]
+      ...
+      yield FrameData(...)
+      let totalLen = bh.bh_hdrlen.int + bh.bh_caplen.int
+      pos += (totalLen + BPF_ALIGNMENT - 1) and not (BPF_ALIGNMENT - 1)
BPF_ALIGNMENT is the boundary the kernel pads each packet to in the buffer (4 bytes on macOS; other BSDs may define it differently), so the next bpf_hdr lands on an aligned offset that userspace can cast and read directly, which makes field access safe.
The math behind that step – (totalLen + BPF_ALIGNMENT - 1) and not (BPF_ALIGNMENT - 1) – is the standard “round up to the next multiple of N” trick, valid when N is a power of two. Adding N - 1 pushes any value that isn’t already a multiple of N past the next boundary, and masking off the low bits drops it back onto that boundary.
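To make the rounding concrete, here is a tiny standalone sketch. The helper name alignUp is ours for illustration, and the 18-byte header length is just an assumed example value, not something read from a real capture.

```nim
# Illustrative helper; `alignUp` is not part of the post's link layer.
proc alignUp(n, align: int): int =
  ## Round n up to the next multiple of align.
  ## Only valid when align is a power of two.
  (n + align - 1) and not (align - 1)

# Assumed example: an 18-byte bpf_hdr plus a 60-byte captured frame.
echo alignUp(18 + 60, 4)   # 80: the next header starts two padding bytes later
echo alignUp(80, 4)        # 80: already aligned, so nothing changes
```

The second call shows why we add align - 1 rather than align: a length that is already on the boundary must not be bumped to the next one.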
So the rewritten version is an iterator that yields each frame as we go, which can later be used like this:
for frame in bpf.read_frames():
  ...
Another thing that we need to change is the way we pass around Ethernet frames:
 type
   FrameData* = object
-    dest_mac*: string
-    src_mac*: string
-    eth_type*: string
-    eth_type_name*: string
+    dest_mac*: MacAddr
+    src_mac*: MacAddr
+    eth_type*: uint16
     payload*: seq[uint8]
Sure, dealing with MAC addresses for debugging and printing was easier when we just used strings, but it pushes a parsing problem onto every upper layer. Better to hold the raw bytes and format only when something is being printed.
Let’s define a MacAddr type. For eth_type a plain 16-bit number is enough, since it’s a 2-byte field:
type
  MacAddr* = array[6, uint8] # six raw octets
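Formatting then becomes a concern of whoever prints the address. A minimal sketch of such a helper follows; the name formatMac is hypothetical and not part of the code from this series.

```nim
import std/strutils

type MacAddr = array[6, uint8]   # same shape as the MacAddr above

# Hypothetical helper: render a MAC as colon-separated lowercase hex,
# only at the point where something is printed.
proc formatMac(mac: MacAddr): string =
  var parts: seq[string] = @[]
  for b in mac:
    parts.add(toHex(b, 2).toLowerAscii)
  parts.join(":")

echo formatMac([0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x01])  # 66:65:74:68:00:01
```

Upper layers can now compare addresses with a plain == on the arrays and never touch a string.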
Also, let’s fix our BIOCIMMEDIATE flag setup and correctly handle any errors:
-discard ioctl(cint(self.bpf_fd), culong(BIOCIMMEDIATE), cint(1))
+var enable: cuint = 1
+if ioctl(cint(self.bpf_fd), culong(BIOCIMMEDIATE), addr enable) != 0:
+  raise newException(OSError, "BIOCIMMEDIATE failed")
What we missed is that the third argument to ioctl for BIOCIMMEDIATE is a pointer to a u_int, not the integer itself. The old call passed the value 1 as if it were a pointer, so the kernel’s copy from that address failed, and because we discarded the return value the failure went unnoticed. We don’t really want that. In general, we should check the return values of all our ioctl() calls:
-discard ioctl(cint(self.bpf_fd), culong(BIOCSETIF), addr ifreq)
+if ioctl(cint(self.bpf_fd), culong(BIOCSETIF), addr ifreq) != 0:
+  raise newException(OSError, fmt"BIOCSETIF failed for interface {self.iface}")
Before we start implementing ARP, let’s quickly add a few building blocks to our link layer that will be used not only by ARP, but by future layers as well.
First things first, we need a way to send frames back through BPF. For that, let’s define a send_frame() procedure:
proc send_frame*(self: var BPFLinkLayer, frame: openArray[uint8]) =
  let n = write(cint(self.bpf_fd), addr frame[0], frame.len)
  if n < 0:
    let err = errno
    raise newException(IOError, fmt"BPF write failed: errno={err} ({$strerror(err)})")
  if n != frame.len:
    raise newException(IOError, fmt"BPF short write: {n}/{frame.len}")
It’s just a regular write() call on the same BPF file descriptor: we pass the address of the frame’s first element and its length. Note that it assumes a non-empty frame, since frame[0] must exist.
But send_frame() only writes raw bytes – we still need a way to produce those bytes. Hand-assembling an Ethernet frame at every call site would get old fast, so let’s add a small helper:
proc build_ethernet_frame*(dest_mac: MacAddr, src_mac: MacAddr, eth_type: uint16, payload: openArray[uint8]): seq[uint8] =
  var frame: seq[uint8] = @[]
  frame.setLen(14 + payload.len)
  copyMem(addr frame[0], addr dest_mac[0], 6)
  copyMem(addr frame[6], addr src_mac[0], 6)
  bigEndian16(addr frame[12], addr eth_type)
  if payload.len > 0:
    copyMem(addr frame[14], addr payload[0], payload.len)
  frame
It’s the mirror image of the parser from Part 1: 6 bytes for the destination MAC, 6 bytes for the source MAC, 2 bytes for the Ether type, then the payload – total length 14 + payload.len.
OK, now that that’s out of the way, let’s add one more missing detail. For the ARP implementation we need to know our own MAC address, so let’s go over the function that gets us there.
On macOS and the BSDs we can use the getifaddrs() function. Basically, it asks the kernel for a list of every address on every network interface on the machine and hands it back as a linked list. We need to traverse that linked list and find the MAC address of the interface we bind BPF to.
Let’s prepare some types that we’ll need:
type
  Sockaddr {.importc: "struct sockaddr", header: "<sys/socket.h>", bycopy.} = object
    sa_len: uint8
    sa_family: uint8
  IfAddrs {.importc: "struct ifaddrs", header: "<ifaddrs.h>", bycopy.} = object
    ifa_next: ptr IfAddrs
    ifa_name: cstring
    ifa_flags: cuint
    ifa_addr: ptr Sockaddr
  SockaddrDl {.importc: "struct sockaddr_dl", header: "<net/if_dl.h>", bycopy.} = object
    sdl_len: uint8
    sdl_family: uint8
    sdl_index: uint16
    sdl_type: uint8
    sdl_nlen: uint8
    sdl_alen: uint8
    sdl_slen: uint8
    sdl_data: array[12, char]
const
  AF_LINK = 18'u8
  IFT_ETHER = 0x06'u8
These mirror the C structs the kernel uses. Each one needs to match the C layout exactly, which is why we use these pragmas.
Let’s quickly recap the pragmas we saw earlier, when dealing with the link layer:
- {.importc: "struct sockaddr", header: "<sys/socket.h>"} – this tells the compiler “when generating C code, use the existing struct sockaddr type from the system headers, don’t generate a new one”; header just points to where it is declared.
- bycopy – it instructs the compiler to pass the type by value. In Nim specifically, the compiler might otherwise decide to pass a parameter by reference if that speeds up execution.
IfAddrs is the linked list node that holds address data. ifa_next is a pointer to the next node in the list. ifa_name is a cstring that holds the interface name, like en0. ifa_addr points to a generic Sockaddr that carries the length of the address and the address family (AF_INET for IPv4, AF_LINK for the link layer, etc.).
The generic Sockaddr lets us identify the address type, but when sa_family is AF_LINK the bytes behind the pointer are actually a SockaddrDl – so we cast to that type to access the MAC address.
Declare the libc functions (both live in <ifaddrs.h>):
proc getifaddrs(ifap: ptr ptr IfAddrs): cint {.importc, header: "<ifaddrs.h>".}
proc freeifaddrs(ifap: ptr IfAddrs) {.importc, header: "<ifaddrs.h>".}
And we’re ready to write our function to get the MAC address:
proc get_my_mac_addr(self: var BPFLinkLayer): MacAddr =
  var ifap: ptr IfAddrs
  if getifaddrs(addr ifap) != 0:
    raise newException(Exception, "Failed to get interface addresses")
  defer: freeifaddrs(ifap)
  var curr: ptr IfAddrs = ifap
  while curr != nil:
    if curr.ifa_addr != nil and curr.ifa_addr.sa_family == AF_LINK and curr.ifa_name == self.iface.cstring:
      let sdl = cast[ptr SockaddrDl](curr.ifa_addr)
      if sdl.sdl_type == IFT_ETHER and sdl.sdl_alen == 6:
        let macPtr = cast[ptr UncheckedArray[uint8]](
          cast[uint](addr sdl.sdl_data) + sdl.sdl_nlen.uint
        )
        var mac: array[6, uint8]
        for i in 0..5:
          mac[i] = macPtr[i]
        return mac
    curr = curr.ifa_next
  raise newException(Exception, "No MAC address found for interface")
Let’s break it down step by step.
We start by declaring a pointer ifap (Nim initializes it to nil) and pass its address into getifaddrs. The function takes a ptr ptr IfAddrs (a pointer to our pointer) so it can write the address of the freshly allocated list back into our variable – this is the standard C idiom for “give me back a new pointer.” On failure it returns non-zero, so we bail out with an exception.
Right after that, we set up cleanup with defer. getifaddrs allocated a linked list for us, and we’re responsible for freeing it. defer runs freeifaddrs(ifap) at the end of the scope no matter how we exit, basically a try/finally.
Now we walk the linked list. Standard traversal: start at the head, follow ifa_next until we hit nil:
var curr: ptr IfAddrs = ifap
while curr != nil:
  ...
  curr = curr.ifa_next
One thing worth noting – getifaddrs returns one node per (interface, address) pair, not one per interface. So en0 will appear multiple times in the list. That’s why we filter by both name and address family:
if curr.ifa_addr != nil and
    curr.ifa_addr.sa_family == AF_LINK and
    curr.ifa_name == self.iface.cstring:
Once we’ve found the right node, we know the bytes behind ifa_addr are actually a sockaddr_dl, so we cast:
let sdl = cast[ptr SockaddrDl](curr.ifa_addr)
Before extracting the MAC, two additional checks:
if sdl.sdl_type == IFT_ETHER and sdl.sdl_alen == 6:
AF_LINK covers more than Ethernet (loopback and tunnel interfaces, for example), so we require the Ethernet type IFT_ETHER. And sdl_alen == 6 confirms the link-level address really is a 6-byte MAC before we copy six bytes out of it.
Now the trickiest part. sockaddr_dl packs the interface name and the MAC into a single sdl_data buffer, with the name first:
◄── sdl_nlen ──►◄── sdl_alen ──►◄── sdl_slen ──►
┌────────────────┬───────────────┬───────────────┐
│ interface name │ link address │ selector │
│ ("en0") │ (the MAC) │ (often empty)│
└────────────────┴───────────────┴───────────────┘
So to find where the MAC starts, we have to skip past the name, which means jumping sdl_nlen bytes past the start of sdl_data:
let macPtr = cast[ptr UncheckedArray[uint8]](
  cast[uint](addr sdl.sdl_data) + sdl.sdl_nlen.uint
)
How did I know this? Claude told me. I mean, the header says char sdl_data[12] with a comment “contains both if name and ll address” and that’s it. A few years BC (Before Claude) you’d find this by reading Stevens’ UNIX Network Programming or grepping through ifconfig.c. I did neither.
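If the pointer arithmetic feels opaque, the same offset logic can be tried on an ordinary seq. This is only a sketch with made-up bytes standing in for sdl_data; nlen and alen play the roles of sdl_nlen and sdl_alen.

```nim
# Simulated sdl_data contents: 3 name bytes ("en0") followed by a 6-byte MAC.
# All values here are invented for illustration.
let sdlData = @[uint8('e'), uint8('n'), uint8('0'),
                0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x01]
let nlen = 3   # stands in for sdl_nlen
let alen = 6   # stands in for sdl_alen

var mac: array[6, uint8]
for i in 0 ..< alen:
  mac[i] = sdlData[nlen + i]   # skip the name, then read alen bytes

doAssert mac == [0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x01]
```

The real code does exactly this, except the base is a raw pointer into the C struct rather than a bounds-checked seq.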
Finally, we copy the six bytes into a fixed-size array we own and return:
var mac: array[6, uint8]
for i in 0..5:
  mac[i] = macPtr[i]
return mac
ARP implementation
Let’s move on to ARP implementation.
So what does an ARP exchange actually look like on the wire?
Picture two machines on the same LAN. Host A (192.168.0.1) wants to send an IP packet to Host B (192.168.0.2). It knows the IP, but the Ethernet layer needs a destination MAC. Host A doesn’t know it yet.
Host A broadcasts an ARP request:
“Who has 192.168.0.2? Tell 192.168.0.1, my MAC is aa:bb:cc:dd:ee:ff.”
Every host on the segment sees it. Host B sees its own IP in the question and replies – this time as a unicast straight back to A:
“192.168.0.2 is at MAC 66:65:74:68:00:01.”
Host A caches the answer and moves on. Subsequent IP packets to .2 use that MAC directly.
That’s the whole protocol. Two opcodes (request and reply), one wire format for both, and an in-memory cache so we don’t broadcast for the same answer over and over.
It’s time to dissect what an ARP packet (ethertype 0x0806) actually looks like:
┌───────┬────────┬──────┬─────────────────────────────────────────────────┐
│ Field │ Offset │ Size │ Meaning │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ htype │ 0 │ 2 B │ Hardware type. 1 = Ethernet. │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ ptype │ 2 │ 2 B │ Protocol type. 0x0800 = IPv4. │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ hlen │ 4 │ 1 B │ Hardware address length. 6 for MAC. │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ plen │ 5 │ 1 B │ Protocol address length. 4 for IPv4. │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ op │ 6 │ 2 B │ Operation. 1 = request, 2 = reply. │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ sha │ 8 │ 6 B │ Sender hardware address (MAC of who's asking). │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ spa │ 14 │ 4 B │ Sender protocol address (IPv4 of who's asking). │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ tha │ 18 │ 6 B │ Target hardware address (zero in a request). │
├───────┼────────┼──────┼─────────────────────────────────────────────────┤
│ tpa │ 24 │ 4 B │ Target protocol address (the IPv4 we're after). │
└───────┴────────┴──────┴─────────────────────────────────────────────────┘
htype, ptype, and op are big-endian on the wire – we’ll use bigEndian16 to read and write them, same as we did for the ethertype.
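For readers who haven’t met byte-order conversion before, this is roughly what such a read and write boil down to. The helpers readU16BE and writeU16BE are hypothetical names for this sketch; the post’s code keeps using bigEndian16 from std/endians.

```nim
# Hypothetical helpers spelling out a big-endian 16-bit read and write.
proc readU16BE(p: openArray[uint8], off: int): uint16 =
  (uint16(p[off]) shl 8) or uint16(p[off + 1])   # high byte comes first

proc writeU16BE(p: var openArray[uint8], off: int, v: uint16) =
  p[off] = uint8(v shr 8)          # high byte
  p[off + 1] = uint8(v and 0xff)   # low byte

var buf = [0x08'u8, 0x06]
doAssert readU16BE(buf, 0) == 0x0806'u16   # the ARP ethertype
writeU16BE(buf, 0, 0x0800)                 # the IPv4 protocol type
doAssert buf == [0x08'u8, 0x00]
```

Unlike a raw memory copy, the shift-based version gives the same result regardless of the host’s native byte order.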
Mirror the layout in Nim and add a couple of ARP-specific constants:
const
  ArpHwEthernet = 1'u16
  ArpProtoIPv4 = 0x0800'u16
  ArpOpRequest = 1'u16
  ArpOpReply = 2'u16
  EthertypeArp* = 0x0806'u16
type
  ArpFrame* = object
    htype*: uint16 # 1 = Ethernet
    ptype*: uint16 # 0x0800 = IPv4
    hlen*: uint8
    plen*: uint8
    op*: uint16 # 1 = request, 2 = reply
    sha*: MacAddr # sender MAC
    spa*: IPv4Addr # sender IPv4
    tha*: MacAddr # target MAC
    tpa*: IPv4Addr # target IPv4
The 16-bit fields are plain uint16, not 2-byte arrays. Same reasoning as eth_type: the wire is bytes, but once parsed we want a number we can compare against constants.
IPv4Addr is array[4, uint8] – same logic as MacAddr.
We also need an in-memory cache to track what we’ve learned, plus some context (the BPF handle and our IP address):
type
  ArpCache* = Table[IPv4Addr, MacAddr]
  ArpCtx* = object
    bpf*: BPFLinkLayer
    cache*: ArpCache
    my_ip*: IPv4Addr
The cache is a plain Nim Table. my_ip is the address our stack claims, so we’ll reply to ARP requests for it, and ignore everything else.
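As a quick illustration of how upper layers might consult that table later, here is a small sketch. The lookup helper is hypothetical (nothing in this post calls it yet), and the types are redeclared only so the snippet stands alone.

```nim
import std/[tables, hashes, options]

type
  IPv4Addr = array[4, uint8]
  MacAddr = array[6, uint8]
  ArpCache = Table[IPv4Addr, MacAddr]

# Hypothetical helper: return the cached MAC for an IP, if we have one.
proc lookup(cache: ArpCache, ip: IPv4Addr): Option[MacAddr] =
  if ip in cache:
    result = some(cache[ip])
  else:
    result = none(MacAddr)

var cache: ArpCache = initTable[IPv4Addr, MacAddr]()
cache[[192'u8, 168, 42, 1]] = [0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x00]

doAssert lookup(cache, [192'u8, 168, 42, 1]).isSome
doAssert lookup(cache, [192'u8, 168, 42, 9]).isNone   # never seen, so no entry
```

Fixed-size arrays work as Table keys out of the box, which is one more reason to keep addresses as raw bytes.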
Each FrameData that read_frames() yields carries the ARP packet in its payload; the first 28 bytes of that payload are our ArpFrame:
proc parse_arp_frame*(frame: link_layer.FrameData): Option[ArpFrame] =
  if frame.payload.len < 28:
    return none(ArpFrame)
  var arp_frame: ArpFrame
  let p = frame.payload
  bigEndian16(addr arp_frame.htype, addr p[0])
  bigEndian16(addr arp_frame.ptype, addr p[2])
  arp_frame.hlen = p[4]
  arp_frame.plen = p[5]
  bigEndian16(addr arp_frame.op, addr p[6])
  copyMem(addr arp_frame.sha[0], addr p[8], 6)
  copyMem(addr arp_frame.spa[0], addr p[14], 4)
  copyMem(addr arp_frame.tha[0], addr p[18], 6)
  copyMem(addr arp_frame.tpa[0], addr p[24], 4)
  if arp_frame.htype != ArpHwEthernet or
      arp_frame.ptype != ArpProtoIPv4 or
      arp_frame.hlen != 6'u8 or
      arp_frame.plen != 4'u8:
    return none(ArpFrame)
  some(arp_frame)
We return Option[ArpFrame] rather than raising. ARP is best-effort, so any malformed packets are just ignored.
We also need to be able to build an ARP frame, so we can send a reply back:
proc build_arp_frame*(arp_frame: ArpFrame): seq[uint8] =
  result.setLen(28)
  bigEndian16(addr result[0], addr arp_frame.htype)
  bigEndian16(addr result[2], addr arp_frame.ptype)
  result[4] = arp_frame.hlen
  result[5] = arp_frame.plen
  bigEndian16(addr result[6], addr arp_frame.op)
  copyMem(addr result[8], addr arp_frame.sha[0], 6)
  copyMem(addr result[14], addr arp_frame.spa[0], 4)
  copyMem(addr result[18], addr arp_frame.tha[0], 6)
  copyMem(addr result[24], addr arp_frame.tpa[0], 4)
Again, just a mirror image of the parser.
Handling incoming ARP
Now the actual process: what do we do when an ARP packet arrives?
proc handle_arp_ingress*(ctx: var ArpCtx, frame: link_layer.FrameData) =
  if frame.eth_type != EthertypeArp:
    return
  let arpOpt = parse_arp_frame(frame)
  if arpOpt.isNone:
    return
  let arp = get(arpOpt)
  if arp.sha == ctx.bpf.my_mac:
    return # ignore our own outbound frames captured via BPF
  ctx.cache[arp.spa] = arp.sha
  case arp.op
  of ArpOpRequest:
    if arp.tpa == ctx.my_ip:
      send_arp_reply(ctx, arp)
  of ArpOpReply:
    discard # cache already updated above
  else:
    discard # unknown opcode (RARP, InARP, etc.) – ignore
First thing we do – drop the frame if it’s not ARP. We’re not going to process a malformed packet either – skip it.
The next check is subtle – BPF captures both directions on the interface. So when we send_frame() an ARP reply, we’ll see our own packet come back through read_frames() a moment later:
if arp.sha == ctx.bpf.my_mac:
  return
One thing worth pointing out – we record the sender’s mapping before even looking at the opcode, which keeps the cache up to date. This is a simplification of RFC 826, which always refreshes an entry that already exists but adds a new one only when the packet is addressed to our IP; we simply cache every sender we see.
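For comparison, here is a sketch of the stricter merge rule from RFC 826’s packet reception algorithm: refresh a sender we already know, but learn a new sender only when it addressed us. The proc name mergeArp and its parameters are illustrative and not part of this post’s code.

```nim
import std/[tables, hashes]

type
  IPv4Addr = array[4, uint8]
  MacAddr = array[6, uint8]

# Sketch of the RFC 826 merge rule; names are illustrative.
proc mergeArp(cache: var Table[IPv4Addr, MacAddr],
              spa: IPv4Addr, sha: MacAddr, weAreTarget: bool) =
  if spa in cache:
    cache[spa] = sha   # known sender: always refresh the mapping
  elif weAreTarget:
    cache[spa] = sha   # unknown sender: learn it only if it addressed us

var cache = initTable[IPv4Addr, MacAddr]()
let peer: IPv4Addr = [192'u8, 168, 42, 1]
let mac: MacAddr = [0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x00]

mergeArp(cache, peer, mac, weAreTarget = false)
doAssert cache.len == 0   # a stranger talking to someone else is not cached
mergeArp(cache, peer, mac, weAreTarget = true)
doAssert cache.len == 1   # now we learn it
```

The stricter rule keeps the table from filling up with every host that broadcasts on a busy segment; on our two-interface test setup the difference doesn’t matter.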
When we get a request for our IP, we build a reply by swapping sender and target around:
proc send_arp_reply*(ctx: var ArpCtx, arp: ArpFrame) =
  var reply: ArpFrame
  reply.htype = ArpHwEthernet
  reply.ptype = ArpProtoIPv4
  reply.hlen = 6
  reply.plen = 4
  reply.op = ArpOpReply
  reply.sha = ctx.bpf.my_mac # we are the sender now
  reply.spa = ctx.my_ip
  reply.tha = arp.sha # the original requester is the target
  reply.tpa = arp.spa
  let arpBytes = build_arp_frame(reply)
  let eth = build_ethernet_frame(arp.sha, ctx.bpf.my_mac,
                                 EthertypeArp, arpBytes)
  send_frame(ctx.bpf, eth)
Everything together:
when isMainModule:
  var ctx = ArpCtx(
    bpf: BPFLinkLayer(iface: "feth1"),
    cache: initTable[IPv4Addr, MacAddr](),
    my_ip: [192'u8, 168, 42, 2],
  )
  ctx.bpf.open_bpf()
  for frame in ctx.bpf.read_frames():
    handle_arp_ingress(ctx, frame)
Open BPF on some interface, claim 192.168.42.2 as our address, then read frames forever and feed each one into the handler.
A small note on the address itself: we’ve hardcoded 192.168.42.2 rather than picking it up from the kernel, and that’s deliberate.
In a real TCP/IP stack you’d get your address from DHCP, link-local autoconfiguration, or static config. We’re not there yet – we’re building just enough to demonstrate ARP.
Testing
Now that everything is wired together, it’s time to test our ARP implementation.
You might have noticed that the code uses the feth1 interface instead of just en0, and you might ask why.
There are two reasons:
- The kernel competes. Our OS already runs a TCP/IP stack on en0, and for any IP it owns there, it answers ARP requests before our userspace stack can.
- The loopback. Even setting that aside, you can’t drive an ARP exchange to your own IP from your own machine. The kernel short-circuits that traffic through the loopback interface.
So, in order to test our ARP implementation and future layers, we need an isolated interface to test against. We’ll create our own pair of virtual interfaces and let the kernel route between them. One interface for the kernel to own, one for us:
sudo ifconfig feth0 create # kernel's end
sudo ifconfig feth1 create # our end
sudo ifconfig feth0 peer feth1 # The call that turns them into a pair
sudo ifconfig feth0 inet 192.168.42.1/24 up
sudo ifconfig feth1 up
- inet 192.168.42.1/24 up – give feth0 an IP and bring it up. This is what makes the rest work – assigning the address triggers the kernel to install a route: 192.168.42.0/24 via feth0. Now anything on the host trying to reach .42.x knows exactly how to get there.
- feth1 up – bring up our end without an IP. Deliberate: no kernel claim on 192.168.42.2, so the loopback shortcut doesn’t apply, and nothing on the host competes with our stack to answer ARP for it.
Now we can go ahead and test it. We flush our ARP cache, just in case we have any stale entries.
sudo arp -d 192.168.42.2 2>/dev/null
Then we can trigger our ARP request:
ping -c 1 -W 1000 192.168.42.2
It should time out and that’s fine. We haven’t built ICMP yet – that’s a future post.
PING 192.168.42.2 (192.168.42.2): 56 data bytes
--- 192.168.42.2 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
But before timing out, the kernel needs a MAC for .2 to send the echo to, so it ARPs first, and that request is what our running program sees:
sudo nim c -r src/arp.nim
Successfully opened /dev/bpf0 with fd 3
Opened bpf device
My MAC address: 66:65:74:68:00:01
Successfully bound to interface
ARP REQ: 192.168.42.1 (66:65:74:68:00:00) -> 192.168.42.2
Sending ARP reply
ARP ingress handled
Wrap-up
That’s ARP done. Along the way we patched a handful of issues in the link layer (multi-packet BPF reads, raw frame types, sending frames, fetching our own MAC) and built a small ARP responder on top. A few things we deliberately left out of our implementation:
- Cache expiration. Entries in our cache live forever. Real ARP caches expire entries after a few minutes, so stale mappings don’t linger when MACs change. We’d add a timestamp per entry and evict old ones on a timer.
- Sending requests. Our stack only answers requests, never sends them. A complete implementation also needs to originate an ARP request when it wants to reach an IP whose MAC it doesn’t know yet.
- Gratuitous ARP. When a host claims an IP, it’s polite to announce the mapping unsolicited so neighbors populate their caches without having to ask. Useful when an interface comes up or a MAC changes.
- IPv6 support. We’re IPv4-only here.
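To give a feel for the “sending requests” item, here is a rough sketch of what the 28-byte request payload could look like, written out byte by byte. The helper buildArpRequest and the BroadcastMac constant are hypothetical names; the resulting payload would still have to be wrapped in an Ethernet frame addressed to the broadcast MAC ff:ff:ff:ff:ff:ff.

```nim
type
  IPv4Addr = array[4, uint8]
  MacAddr = array[6, uint8]

# Destination for the enclosing Ethernet frame (not used by the payload itself).
const BroadcastMac: MacAddr = [0xff'u8, 0xff, 0xff, 0xff, 0xff, 0xff]

# Hypothetical helper: lay out an ARP request payload by hand.
proc buildArpRequest(myMac: MacAddr, myIp, targetIp: IPv4Addr): seq[uint8] =
  result = newSeq[uint8](28)             # zero-filled, so tha stays all zeros
  result[0] = 0x00; result[1] = 0x01     # htype = 1 (Ethernet)
  result[2] = 0x08; result[3] = 0x00     # ptype = 0x0800 (IPv4)
  result[4] = 6                          # hlen
  result[5] = 4                          # plen
  result[6] = 0x00; result[7] = 0x01     # op = 1 (request)
  for i in 0 ..< 6: result[8 + i] = myMac[i]       # sha
  for i in 0 ..< 4: result[14 + i] = myIp[i]       # spa
  for i in 0 ..< 4: result[24 + i] = targetIp[i]   # tpa

let req = buildArpRequest([0x66'u8, 0x65, 0x74, 0x68, 0x00, 0x01],
                          [192'u8, 168, 42, 2], [192'u8, 168, 42, 1])
echo req.len   # 28
```

A full implementation would also need to queue the outgoing IP packet until the reply arrives, which is part of why we left this out for now.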
For now what we have is enough to demonstrate the protocol.
Next post: IPv4 and ICMP, so the ping from this post actually returns a reply instead of timing out.
All the code is on GitHub.