Simple Software
Author: Alexander Avery
#computer-science
In my previous post, I remarked that the unifying concept behind my development philosophy is simplicity. Of the many considerations in writing software, performance, reliability, and predictable orthogonal features readily come to mind. How might simplicity be a unifying concept to these considerations and dozens of others?
It comes down to the fact that software is produced by and for people. If software is to be well understood, it needs to be simple enough for a person to grasp. Is it possible to make software performant if you don’t understand it? Is it possible to make software reliable if you don’t understand it? Is it possible to add predictable orthogonal features to a program if you don’t understand it?
If we don’t understand a piece of software, we can’t make it achieve what we want. If tools, code, or environments are not easy enough to use and understand thoroughly, it will be difficult to achieve your goals.[1] Therefore, simplicity is the precursor to other desired qualities of software.
Some people object that problems can be so complex there is no option but to write complex software. If that is true, the software should be as complex as necessary to solve the problem, but not a degree more complex than required.
In his 1980 Turing Award Lecture, Tony Hoare said:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
Language and communication
In my compilers course, my professor remarked on a fascinating bit of programming language history. The first version of Fortran was developed from 1954 to 1957 by a team of roughly a dozen IBM developers led by John Backus.[2] By some back-of-the-napkin math, the team spent almost 50 man-years on the project. Surely, some individuals today could write a Fortran compiler in a fraction of the time.
This is not to say John and his team were unskilled. On the contrary, they were on the cutting edge of computer science research, inventing a computer language without most of the theoretical knowledge taught in CS bachelor’s degrees today. It wasn’t until three years after the release of Fortran that John Backus and Peter Naur created BNF to describe language grammars.[3] Only after the creation of BNF did standard parsing strategies emerge, such as recursive descent (a top-down method) and bottom-up parsing. Surely, very few individuals today could write a Fortran compiler without applying those methodologies.
I mention this bit of history to highlight something fundamental in computer programming. Why did John Backus and his team find it so worthwhile to create Fortran when they were never guaranteed success? I don’t think it was about making a system that allowed them to type less. Instead, I argue the value comes from communicating the intent of a computer program between colleagues.
echo (echo, echo, echo…)
Let’s consider an example, the POSIX echo.
In x86 assembly[4], echo can be written in under seventy lines, so why do most implementations use C?
Portability is certainly a reason for this, but we can isolate that variable.
Imagine you need to write echo for one CPU architecture on one operating system.
You only need to write it once, so if it takes 70 lines of assembly, the cost in time is low.
What, then, would you gain from writing it in C?
More or less the same sorts of things gained by writing it in assembly instead of machine code.
The value is not that C is more abstract than assembly; instead, the value is the clear communication of intent.
I’d wager that isolated portions of the x86 echo don’t help you infer what operations come before and after.
Here is a three-line snippet to consider:
movl index(%ebp), %ecx
movl 4(%ebp, %ecx, 4), %eax
movl %eax, address(%ebp)
How do those instructions relate to the process echo must carry out?
What would you expect to come before these instructions?
What would you expect to come after these instructions?
It’s not impossible to answer these questions; it just takes time and thought.
We can juxtapose this with a random snippet from the Plan 9 echo.c:
buf = malloc(len);
if(buf == 0)
    exits("no memory");
It’s still three lines, but the information is denser.
This clearly allocates a buffer with size len, and checks for allocation errors.
We can reasonably guess that just before, the program counted the length of arguments to store in len.
It’s also reasonable to assume that next it will copy those arguments into the buffer.
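To make that guess concrete, here is a minimal sketch of the count, allocate, copy shape in portable C. It is not the Plan 9 source: the helper name join_args and its out_len parameter are hypothetical, and it uses standard C library calls rather than Plan 9's libc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not taken from Plan 9's echo.c): joins
   argv[1..argc-1] with single spaces, appends a newline, and reports
   the number of bytes produced through *out_len.
   Returns NULL if the allocation fails; the caller frees the result. */
char *join_args(int argc, char *argv[], size_t *out_len)
{
    /* Step 1: count the space needed, the part we guessed comes first. */
    size_t len = 1;                       /* room for the trailing newline */
    for (int i = 1; i < argc; i++)
        len += strlen(argv[i]) + 1;       /* argument plus one separator */

    /* Step 2: allocate and check, as in the three-line snippet. */
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    /* Step 3: copy the arguments in, the part we guessed comes next. */
    char *p = buf;
    for (int i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]);
        memcpy(p, argv[i], n);
        p += n;
        if (i < argc - 1)
            *p++ = ' ';
    }
    *p++ = '\n';

    *out_len = (size_t)(p - buf);
    return buf;
}
```

A main function would only need to call join_args(argc, argv, &n), pass the buffer to fwrite(buf, 1, n, stdout), and free it. The point of the sketch is that each step is recognizable on its own, which is the property the three-line snippet hints at.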
This is not an argument that more abstraction is better.
The GNU coreutils version of echo is even harder to understand than the minimal assembly version.
case '4': case '5': case '6': case '7':
  c -= '0';
  if ('0' <= *s && *s <= '7')
    c = c * 8 + (*s++ - '0');
I have absolutely no idea how the above code assists in writing arguments to standard out. I have no clue what ought to come before this, or after, or what this portion itself is even doing. That snippet is within a switch statement, within an if statement, within a while loop, within a while loop, within an if statement. All of which is under the label hilariously titled “just_echo”.
If I had tried to read this version of echo as a student, I would have thought I was too stupid to understand basic programs. Now I know that programs such as this are complex without justification. You will learn better skills by seeking out elegant, useful programs than by trying to grok a bloated, but popular, mess.
Nobody wants to miss the forest for the trees, but it’s equally foolish to forget that a forest is made of trees. The details shouldn’t get in the way of your communication, and the details shouldn’t be impenetrable.
Software is social
Software is social, and that is why it’s so valuable for our programs to effectively communicate to other people.
There is a meme among developers that reviewing code you wrote weeks or months ago results in hours of confusion. Though many people, myself included, have experienced this, it doesn’t have to be this way. If you regularly write code that confuses you some time later, that may be a sign to work on your expression.
Though software and books are decidedly not the same, sometimes it helps to consider the similarities. Let’s say you are reading a novel about a character named Sam. In the first paragraph of the chapter, Sam is walking towards the kitchen. Would you be confused or surprised if, in the second paragraph, Sam is in the kitchen?
It wouldn’t matter if you read the book monthly, daily, or if it’s your first read-through. People are not puzzled by narratives that follow a predictable sequence of cause and effect. When you are writing code, you aren’t writing a novel, but you are writing for people. The expectations and requirements people have for understanding don’t really change when the medium changes.
One of my favorite ways to practice is on CodeWars. There, you work on a focused problem, and once you solve it, you are allowed to see solutions written by other people. Because the problem is one you personally worked on, it is easy to assess the quality of other solutions. Some solutions are verbose and long-winded; others are too “clever”. And thankfully, there are a handful that are elegant, performant, and concisely explain the problem and its solution.
[1] This shouldn’t be controversial, but in case you are unconvinced, apply the same thinking to any other field. Could you construct an interplanetary rocket if you don’t understand a model of physics equal to or better than the model used by SpaceX or NASA engineers? Would it be possible to build a sturdy house if the hammer you had was designed to fit on your foot instead of in your hand?
[4] https://www.dannyadam.com/blog/2017/10/echo-and-printenv-in-x86-assembly/