Do Not Fear Systems Programming
Author: Alexander Avery
Posted: Tue | Mar 12, 2024
computer-science linux
First time with low-level networking
In my junior year of college, I took a computer networking course. It was my introduction to C++, glibc, sockets, and other topics, covered in around six assignments. These assignments included making a chat program, and a client and server for a subset of FTP.
At that point, I had written HTTP servers, but had never dealt with raw bytes going over sockets. The beginning was intimidating, but the course and professor primed my intrigue. After the final, I had changed from avoiding systems programming to being distantly interested.
Today, I program on sockets as a hobby.
Most recently, I wrote a MUD that uses only TCP, via the net package from the Go stdlib, for communication.
It’s a decently concurrent program and a real joy to write in Go.
But, it got me wondering: how would I implement this program in a language without CSP-inspired features?
Channels, select statements, and goroutines are fundamental to my understanding of concurrent programs, but they do a lot of work behind the scenes. I wanted to challenge myself to see if I could write a concurrent program without these helpful features.
Writing the ’tick’ server in Go
I’ll begin by demonstrating how easily we can write a server in Go that can orchestrate multiple client connections. This server will send ’tick’ messages to connected clients each second, and allow clients to send messages of their own.
package main

import (
	"bufio"
	"io"
	"log"
	"net"
	"time"
)

// read all messages from client.
func read(rc io.ReadCloser, id int, msg chan<- string, done chan<- int) {
	defer rc.Close()
	scanner := bufio.NewScanner(rc)
	for scanner.Scan() {
		msg <- scanner.Text()
	}
	// we are done once the scanner has reached io.EOF
	done <- id
}

// write 'tick' message to all clients.
func write(conns []net.Conn) {
	for _, c := range conns {
		// some elements in the slice may
		// be nil, representing a disconnected client.
		if c != nil {
			io.WriteString(c, "tick\n")
		}
	}
}

// add a client and read messages if there is room.
func addClient(conns []net.Conn, conn net.Conn, msg chan<- string, done chan<- int) []net.Conn {
	for i, c := range conns {
		// the first available space will
		// receive the incoming client.
		if c == nil {
			go read(conn, i, msg, done)
			conns[i] = conn
			return conns
		}
	}
	// reject the client if there was no available
	// space in the slice.
	conn.Close()
	return conns
}

func main() {
	conns := make([]net.Conn, 15)
	connC := make(chan net.Conn, 0)
	msgC := make(chan string, 0)
	doneC := make(chan int, 0)

	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalf("listening on :8080: %s", err.Error())
	}

	go func() {
		for {
			c, err := listener.Accept()
			if err == nil {
				connC <- c
			}
		}
	}()

	// we will message all clients when the
	// channel on this ticker receives a value.
	ticker := time.NewTicker(time.Second)

	for {
		select {
		// when we receive a connection, we attempt to add
		// the new client
		case conn := <-connC:
			conns = addClient(conns, conn, msgC, doneC)
		// if a client completes, we free up space in the slice.
		case id := <-doneC:
			conns[id] = nil
		// we log any message a client sends us.
		case msg := <-msgC:
			log.Printf("client said: %s", msg)
		// on each tick, we message our connected clients.
		case <-ticker.C:
			write(conns)
		}
	}
}
Assuming you already know Go, the above program should be fairly straightforward to understand.
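If you would like to watch it run, a throwaway client only takes a few lines of Go. This is just a quick sketch for testing, and it assumes the server above is listening on 127.0.0.1:8080.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatalf("dialing server: %s", err)
	}
	defer conn.Close()

	// send one message, then print everything the server
	// sends back, which should be a 'tick' line every second.
	fmt.Fprintln(conn, "hello from a test client")

	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		log.Println(scanner.Text())
	}
}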
Writing the ’tick’ server in Hare
Now comes the challenge of writing the same program in a language that has no CSP features.
Our implementation will rely on the poll(2)
API.
I recommend reading the man page before or after reading the following program.
We’ll introduce the program one function at a time, so it is easier to understand.
Accepting TCP connections in Hare
The first function we can define is one that accepts client connections.
// tryconn attempts to add a client to a pollfd slice.
fn tryconn(fds: []poll::pollfd) void = {
	let socket = fds[0];
	if (socket.revents & poll::event::POLLIN > 0) {
		// try to accept the connection.
		let conn = match(net::accept(socket.fd)) {
		case let err: net::error =>
			log::println("failed to accept connection");
			return;
		case let sock: net::socket =>
			yield sock;
		};

		// the events we want to listen for
		let events = (poll::event::POLLIN | poll::event::POLLPRI);

		for (let i = 1z; i < len(fds); i += 1) {
			// empty slot is indicated by a
			// file descriptor equal to 0
			if (fds[i].fd == 0) {
				fds[i].fd = conn;
				fds[i].events = events;
				return;
			};
		};

		// close the connection if there is no room
		io::close(conn)!;
	};
};
Unlike our Go program, we are dealing with a pollfd slice. A pollfd is a struct that holds a file descriptor, an event mask, and a set of returned events. The returned events, stored in revents, are populated by the kernel if some event has taken place on the file descriptor. In this program, the socket on which we listen for connections is also a pollfd within the slice. We treat the first element in our slice as the listening socket of our server. The presence of POLLIN on the listening socket indicates an incoming connection.
Once we yield our new client connection, we define a variable named events that masks incoming events. The only events we should receive are those present in the mask. This program only needs to listen for POLLIN, but we include POLLPRI to demonstrate masking multiple event types. Later, when we receive these events, they will be set on the revents field of our pollfd.
- POLLIN on revents indicates that there is data to read from the file descriptor.
- POLLPRI on revents indicates that there is an exceptional condition on the file descriptor.
Next, we try to add our connected client to our pollfd slice. If a slot in our slice is available for a new connection, the file descriptor will be equal to zero. If we find an empty slot, we store the net::socket, which is just an io::file, in our empty pollfd. In the case where no empty slot is found, we will simply reject the client by closing the connection.
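If the pollfd bookkeeping is easier to follow in a familiar language, here is a rough Go sketch of the same idea using the poll(2) bindings from golang.org/x/sys/unix. The helper name tryConn and the package name are my own, not part of the original programs; the sketch mirrors the slot layout above, with fds[0] as the listener and a zero file descriptor marking an empty slot.
package tickpoll

import "golang.org/x/sys/unix"

// tryConn mirrors the Hare tryconn: if the listening socket in fds[0]
// is readable, accept the connection and place it in the first empty slot.
func tryConn(fds []unix.PollFd) {
	if fds[0].Revents&unix.POLLIN == 0 {
		return
	}

	nfd, _, err := unix.Accept(int(fds[0].Fd))
	if err != nil {
		return
	}

	for i := 1; i < len(fds); i++ {
		if fds[i].Fd == 0 {
			fds[i].Fd = int32(nfd)
			fds[i].Events = unix.POLLIN | unix.POLLPRI
			return
		}
	}

	// no free slot: reject the client
	unix.Close(nfd)
}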
Reading client messages in Hare
In our Go program, each client had a corresponding goroutine that read incoming data.
In our Hare program, we read incoming data by checking revents on each pollfd in a loop.
// tryread will attempt to read from all connected clients.
fn tryread(fds: []poll::pollfd) void = {
	for (let i = 1z; i < len(fds); i += 1) {
		let revents = fds[i].revents;

		// just close the file descriptor
		// on error or hangup events
		if (revents & (poll::event::POLLERR | poll::event::POLLHUP) > 0) {
			closefd(&fds[i]);
			continue;
		};

		// if there is data to read...
		if (revents & poll::event::POLLIN > 0) {
			let scanner = bufio::newscanner(fds[i].fd, 1024z);
			defer bufio::finish(&scanner);

			match(bufio::scan_line(&scanner)) {
			case let msg: const str =>
				log::printfln("client said: {}", msg);
			case =>
				// io::EOF, io::error, or utf8::invalid
				closefd(&fds[i]);
			};
		};
	};
};
We begin by checking for POLLERR or POLLHUP, which are two events we did not include in our event mask. The poll(2) API indicates those values are ignored on events but may be placed on revents anyway.
- POLLERR indicates an error condition on the file descriptor (for a pipe, it is set on the write end when the read end has been closed).
- POLLHUP indicates that the peer has closed its end of the channel (POLLHUP stands for poll hangup).
After receiving POLLHUP there may still be unread data in the channel, but our program will ignore that. If either of those events is present, we close our file descriptor and continue.
Whenever we encounter a POLLIN event, we have data to read. A simple way to read the line is to make a bufio::scanner with a buffer size of 1024 bytes. The Hare documentation indicates we are responsible for freeing the scanner with bufio::finish, so we defer that function. If we successfully read the message, we log it to the console; otherwise, we close the client connection.
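For comparison, the same revents checks map directly onto the constants in golang.org/x/sys/unix. The sketch below trades the Hare scanner for a plain read into a fixed buffer, and it leans on a closeFd helper that is sketched after the next section; none of these names come from the original post.
package tickpoll

import (
	"log"

	"golang.org/x/sys/unix"
)

// tryRead checks every client slot for errors, hangups, or readable data.
func tryRead(fds []unix.PollFd) {
	for i := 1; i < len(fds); i++ {
		re := fds[i].Revents

		// drop the client on error or hangup events
		if re&(unix.POLLERR|unix.POLLHUP) != 0 {
			closeFd(&fds[i])
			continue
		}

		// if there is data to read, read one chunk and log it
		if re&unix.POLLIN != 0 {
			buf := make([]byte, 1024)
			n, err := unix.Read(int(fds[i].Fd), buf)
			if err != nil || n == 0 {
				// treat read errors and EOF as a disconnect
				closeFd(&fds[i])
				continue
			}
			log.Printf("client said: %s", buf[:n])
		}
	}
}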
Closing a client connection in Hare
We can define the function closefd like this:
// closefd closes a provided file descriptor
// and resets all fields to 0.
fn closefd(fd: *poll::pollfd) void = {
	match(io::close(fd.fd)) {
	case void =>
		void;
	case io::error =>
		log::println("failed to close socket");
	};

	fd.fd = 0;
	fd.events = 0;
	fd.revents = 0;
};
It is not enough to just call io::close on our file descriptor. Closing the underlying file descriptor invalidates the descriptor stored on fd. If we don’t also reset the fd field to zero, we will run into issues when calling poll::poll after ending a client connection.
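The same cleanup translates almost mechanically to the Go sketch from the previous sections. closeFd below is my own helper: it closes the descriptor and zeroes the whole unix.PollFd so the slot reads as empty on the next pass.
package tickpoll

import (
	"log"

	"golang.org/x/sys/unix"
)

// closeFd closes the descriptor and zeroes the pollfd so the
// slot is treated as free and poll(2) stops reporting it.
func closeFd(pfd *unix.PollFd) {
	if err := unix.Close(int(pfd.Fd)); err != nil {
		log.Println("failed to close socket:", err)
	}
	pfd.Fd = 0
	pfd.Events = 0
	pfd.Revents = 0
}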
Writing to client connections in Hare
def tick: [5]u8 = ['t', 'i', 'c', 'k', '\n'];

// write sends a 'tick' message to all connected clients.
fn write(fds: []poll::pollfd) void = {
	for (let i = 1z; i < len(fds); i += 1) {
		if (fds[i].fd > 0) {
			io::writeall(fds[i].fd, tick)!;
		};
	};
};
The message we are sending to clients, ’tick’, is defined as a constant in our program. In our write function, we simply check all file descriptors and write ’tick’ to any that are open.
Writing our poll loop in Hare
Finally, we can write our main function. Our first implementation will have a bug, but we will fix that later.
export fn main() void = {
	let socket = tcp::listen(ip::LOCAL_V4, 8080)!;

	// initialize all pollfds with zero values
	let fds: [15]poll::pollfd = [
		poll::pollfd {
			fd = 0,
			events = 0,
			revents = 0,
		}...
	];

	// overwrite the first pollfd with our socket
	fds[0] = poll::pollfd {
		fd = socket,
		events = (poll::event::POLLIN | poll::event::POLLPRI),
		revents = 0
	};

	for (true) {
		match(poll::poll(fds, time::SECOND)) {
		case let n: uint =>
			if (n > 0) {
				tryconn(fds);
				tryread(fds);
			} else {
				write(fds);
			};
		case let err: poll::error =>
			log::fatal("poll failed");
		};
	};
};
First, we listen for connections on a new socket on port 8080. Next, we define a pollfd array with a size of fifteen. We initialize each slot with an empty pollfd, and subsequently overwrite index zero with our socket.
In our main loop, we repeatedly call poll::poll with a timeout of one second. If we get events back, we accept any incoming connections, or read messages from active clients. Otherwise, if we reach the timeout, we write our ’tick’ message to each client.
You can now build this program, and connect up to fourteen clients on your local machine with telnet 127.0.0.1 8080.
The bug in our Hare program
You may notice that this program doesn’t send ’tick’ messages as it should. If you connect with a new client, or read a message from a client, the one-second timeout restarts. That means repeatedly sending messages from clients can indefinitely prevent the timeout from firing. We’ll have to make some changes if we don’t want our writes to be blocked by our data reads and client connections.
Fixing our timeout code in Hare
We need to keep track of our tick events independently of our returned events. A naive solution would be to set our poll timeout to NONBLOCK and check if we should ’tick’ on each iteration. This will certainly work, but the CPU will work needlessly hard with this approach. A better solution is to compute our timeout duration each time we return from poll::poll.
diff --git a/main.ha b/main.ha
index 48babb8..e9450b5 100644
--- a/main.ha
+++ b/main.ha
@@ -102,6 +102,20 @@ fn closefd(fd: *poll::pollfd) void = {
 	fd.revents = 0;
 };
 
+// shouldtick returns a boolean representing if a tick
+// message should be sent and a time::duration indicating
+// when the next tick should be.
+fn shouldtick(next: time::instant) (bool, time::duration) = {
+	let now = time::now(time::clock::MONOTONIC);
+
+	if (time::compare(now, next) >= 0) {
+		return (true, time::SECOND);
+	};
+
+	let diff = time::diff(now, next);
+	return (false, diff);
+};
+
 export fn main() void = {
 	let socket = tcp::listen(ip::LOCAL_V4, 8080)!;
 
@@ -123,17 +137,28 @@ export fn main() void = {
 		revents = 0
 	};
 
+	let start = time::now(time::clock::MONOTONIC);
+	let nexttick = time::add(start, time::SECOND);
+	let wait = time::SECOND;
+
 	for (true) {
-		match(poll::poll(fds, time::SECOND)) {
+		match(poll::poll(fds, wait)) {
 		case let n: uint =>
 			if (n > 0) {
 				tryconn(fds);
 				tryread(fds);
-			} else {
+			};
+
+			let (shouldtick, d) = shouldtick(nexttick);
+			if (shouldtick) {
+				let now = time::now(time::clock::MONOTONIC);
+				nexttick = time::add(now, d);
 				write(fds);
 			};
+			wait = d;
+
 		case let err: poll::error =>
 			log::fatal("poll failed");
 		};
With these changes, poll::poll will only return when we have events, or have reached an appropriate time to ’tick’. On each iteration we check if we reached an appropriate time to ’tick’, regardless of what caused poll::poll to return. This means we don’t skip any ’tick’ messages, and we keep our CPU usage to a reasonable minimum.
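The same deadline arithmetic reads naturally with Go’s time package. The loop below is only a sketch: it reuses the hypothetical tryConn, tryRead, and closeFd helpers from the earlier snippets, adds a small writeTick of my own, and assumes the listener already sits in fds[0].
package tickpoll

import (
	"time"

	"golang.org/x/sys/unix"
)

// writeTick sends 'tick' to every open client descriptor.
func writeTick(fds []unix.PollFd) {
	for i := 1; i < len(fds); i++ {
		if fds[i].Fd > 0 {
			unix.Write(int(fds[i].Fd), []byte("tick\n"))
		}
	}
}

// pollLoop waits for I/O, but never longer than the next tick deadline.
func pollLoop(fds []unix.PollFd) {
	next := time.Now().Add(time.Second)

	for {
		wait := time.Until(next)
		if wait < 0 {
			wait = 0
		}
		// round up so we don't spin on a zero timeout just
		// before the deadline
		ms := int((wait + time.Millisecond - 1) / time.Millisecond)

		n, err := unix.Poll(fds, ms)
		if err != nil {
			// poll(2) can be interrupted by a signal; just retry
			continue
		}
		if n > 0 {
			tryConn(fds)
			tryRead(fds)
		}

		// tick whenever the deadline has passed, regardless of
		// why poll returned
		if !time.Now().Before(next) {
			writeTick(fds)
			next = time.Now().Add(time.Second)
		}
	}
}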
The full server code for both Go and Hare can be found here.
The poll(2) API is just one of several available on Linux that enable concurrent I/O. From here, I’d like to learn how to handle concurrent operations that don’t involve I/O. For example, what if we wanted to send our clients a hash that took some time to compute, instead of ’tick’ messages? To write such programs in a language like Hare, I might have to read the famous CSP paper by Tony Hoare first.