Languages Blocking Language

Author: Alexander Avery

Posted: Sun | Nov 20, 2022

#computer-science

[Image: A blurred train zooming past the camera at sunset.]

Software developed for businesses is usually composed of many layers. Most developers are familiar with the concept of front end, authentication, back end, and database layers. Front end, back end, and databases allow many choices of language, framework, or hosting. Authentication has its fair share of choices, but as far as I’ve seen, few protocols are promoted.

A standard authentication pattern

The best-supported authentication protocols I’ve come across for intracompany services include:

OAuth2
SAML
Public Key Authentication (as used for SSH)

Options are scarce, but they are all fairly useful from an end user’s perspective (notice that two of the three were designed for the web). Unfortunately, these protocols are rarely native to the databases we use. “No matter”, many might claim, “that’s what APIs are for”. And they are right: we can support any kind of auth protocol in our code. The reflexive solution is to create an API that sits between our front end and database. These days, when a new project is announced, one or more engineers are immediately off to make the API.

The API could use OAuth2, or SAML, and acts as a go-between for users to interact with data. A standard end-to-end architecture could look like this:

Diagram of common software architecture

Mapped out, we only need a handful of services. If we are using OAuth2, the back end uses the auth service to authenticate and authorize a user with the provided access token, typically a JWT. This operation is not slow, and there is usually low latency between the back end and auth server. As you might expect, however, this diagram hides a crucial implementation detail and roadblock.

Personal background

Most projects I work on use different languages for back end and front end development. Typically, these are imperative or declarative languages. My latest work project for example used Go on the back end and Dart on the front end.

No matter the language, we all run into problems caused by the above project model. We can finish the product, but write many bugs, lose capabilities of the underlying database, and waste an insulting amount of time and computer resources because of this approach.

Enough context, let me explicitly state the problem I’m attempting to fix.

What I want

There exists a program that needs to query a database for records stored in a graph. This graph is an excellent choice for modeling the business domain, and the Cypher query language provides a great interface for querying and modifying the graph. I want to use Cypher to query the graph, and use the returned records in my program.

What I have

The graph database doesn’t support any auth protocol supplied by my company’s auth server. To authorize the requests we need an API, a middleman if you will, that speaks OAuth2.
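As a rough illustration of what such a middleman does before it ever touches the database, here is a minimal Go sketch of a bearer-token check. The `Verifier` hook, the `/records` path, and the `good-token` value are all hypothetical; a real service would validate the token against the company's auth server rather than compare strings.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Verifier is a hypothetical hook: a real one would ask the company's auth
// server (or check a JWT signature) instead of comparing strings.
type Verifier func(token string) bool

// requireBearer rejects requests without a valid "Authorization: Bearer ..."
// header and passes everything else to the wrapped handler.
func requireBearer(verify Verifier, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		header := r.Header.Get("Authorization")
		if !strings.HasPrefix(header, "Bearer ") || !verify(strings.TrimPrefix(header, "Bearer ")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one in-process request through the middleware and reports
// the HTTP status, using a fake verifier that accepts a single token.
func statusFor(authHeader string) int {
	verify := func(token string) bool { return token == "good-token" }
	api := requireBearer(verify, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "query results would go here")
	}))
	req := httptest.NewRequest(http.MethodGet, "/records", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	api.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("Bearer good-token"), statusFor("Bearer nope"), statusFor(""))
}
```

Everything the front end wants from the graph now has to pass through handlers wrapped like this one.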

This middleman must query the database directly, so we write the queries in this program… as strings. Strike number one: we have no guarantees about these queries at compile time. To be sane about this, we now must write integration tests to know they are even syntactically correct.
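To make the point concrete, here is a small Go sketch in which a valid and a broken Cypher query are both plain strings, so the compiler accepts both. The Person/Movie schema is a made-up example, and the two checks are deliberately naive stand-ins for a real Cypher parser or an integration test against the database.

```go
package main

import (
	"fmt"
	"strings"
)

// Both constants are ordinary Go strings, so the compiler treats them alike.
// The Person/Movie schema is hypothetical.
const validQuery = `MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title`

// "MATHC" is a typo and the first node pattern never closes its parenthesis.
const brokenQuery = `MATHC (p:Person-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title`

// balancedParens is a naive check, not a Cypher parser: it only verifies that
// every opening parenthesis has a matching close.
func balancedParens(query string) bool {
	depth := 0
	for _, r := range query {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

// startsWithKnownClause compares the first word against a small, incomplete
// list of Cypher clauses.
func startsWithKnownClause(query string) bool {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return false
	}
	switch strings.ToUpper(fields[0]) {
	case "MATCH", "OPTIONAL", "CREATE", "MERGE", "WITH", "UNWIND", "RETURN", "CALL":
		return true
	}
	return false
}

func main() {
	for _, q := range []string{validQuery, brokenQuery} {
		fmt.Printf("parens ok: %v, clause ok: %v\n", balancedParens(q), startsWithKnownClause(q))
	}
}
```

Both mistakes in `brokenQuery` only surface at run time, which is exactly why the tests become mandatory.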

Provided our queries are correct, our API now has the results and puts them into statically typed structs, perfect. Now all we need to do is serialize the structs into some string, like JSON, and lose all our type safety! Wait, what?

Okay, so we return this JSON representation to our front end, and now we have to parse it into structs again. And we also have to write more tests to make sure that works properly.
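The loss of type safety across the JSON boundary can be shown with a short Go sketch. Both struct types are hypothetical, and the "front end" side is written in Go here only to keep the example in one language. The receiving struct misspells one JSON key, and decoding succeeds anyway.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Employee is a hypothetical record type as the API sees it.
type Employee struct {
	Name  string `json:"name"`
	Level int    `json:"level"`
}

// LooseEmployee is what a consumer might declare after misreading the API
// docs: the key for Level is misspelled as "lvl".
type LooseEmployee struct {
	Name  string `json:"name"`
	Level int    `json:"lvl"`
}

// roundTrip marshals on the "API side" and unmarshals on the "consumer side".
// encoding/json ignores unknown keys by default, so the mismatch is silent.
func roundTrip(e Employee) (LooseEmployee, error) {
	payload, err := json.Marshal(e)
	if err != nil {
		return LooseEmployee{}, err
	}
	var out LooseEmployee
	err = json.Unmarshal(payload, &out)
	return out, err
}

func main() {
	out, err := roundTrip(Employee{Name: "Ada", Level: 3})
	fmt.Printf("%+v, err: %v\n", out, err)
}
```

No error is reported and `Level` quietly comes back as zero, so only a test that inspects the decoded value would notice.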

Remember, I want to use Cypher because it’s ergonomic for getting work done with this graph. I am not happy about the fact that I have to unpack, pack, unpack, then pack the data again just to get it into the program that does the actual work. All with minimal compile-time guarantees at each end of I/O in every layer.

The final frustration is that the front end has only a handful of provided operations hiding behind HTTP calls. All of these calls require query parameters, or JSON bodies, and HTTP methods that need to be designed, written, and documented. This completely severs the front end from the language we wanted to use: Cypher.

Let’s say that some time later I need to do a new operation on the graph. I can’t just write Cypher code. I need to:

Write the query and write tests for the syntax.
Unmarshal results into structs.
Marshal structs into JSON.
Come up with a meaningful HTTP endpoint, HTTP method, request payload type, or all three for the front end to use.
Document the API surface for other front end developers, or myself because this is a lot to remember, and no longer looks anything like the graph we started with.
Unmarshal the JSON again on the front end.
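The steps above can be sketched in Go for a single, invented operation ("who works with this person?"). The query, the `/colleagues` endpoint, and the `runQuery` stand-in for a database driver are all assumptions made for illustration, and the front end is simulated in-process.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Colleague is the struct both sides have to agree on.
type Colleague struct {
	Name string `json:"name"`
}

// Step 1: the query lives in the API as a string (hypothetical schema).
const colleaguesQuery = `MATCH (:Person {name: $name})-[:WORKS_WITH]->(c:Person) RETURN c.name AS name`

// runQuery is a hypothetical stand-in for a database driver call; it ignores
// its arguments and returns canned rows.
func runQuery(query string, params map[string]any) []map[string]any {
	return []map[string]any{{"name": "Grace"}, {"name": "Linus"}}
}

// Steps 2 to 4: unpack rows into structs, pack them into JSON, and expose
// the result behind an invented endpoint.
func colleaguesHandler(w http.ResponseWriter, r *http.Request) {
	rows := runQuery(colleaguesQuery, map[string]any{"name": r.URL.Query().Get("name")})
	colleagues := make([]Colleague, 0, len(rows))
	for _, row := range rows {
		if name, ok := row["name"].(string); ok {
			colleagues = append(colleagues, Colleague{Name: name})
		}
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(colleagues); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// Step 6: the front end unpacks the JSON into structs again.
func fetchColleagues(name string) ([]Colleague, error) {
	req := httptest.NewRequest(http.MethodGet, "/colleagues?name="+url.QueryEscape(name), nil)
	rec := httptest.NewRecorder()
	colleaguesHandler(rec, req)
	var out []Colleague
	err := json.NewDecoder(rec.Body).Decode(&out)
	return out, err
}

func main() {
	colleagues, err := fetchColleagues("Ada")
	fmt.Println(colleagues, err)
}
```

The one line of Cypher is the only part that expresses what we wanted. Everything else is plumbing, and step 5, the documentation, still isn't written.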

My main idea here is that I think HTTP, REST, and GraphQL are overused for intracompany programs. An alternative could have the back end accept Cypher queries and determine whether the user is authorized to run them. If they are, it can run the query as provided and return the database’s response unmodified. That would cut out a significant number of meaningless operations, but it still isn’t perfect. Perhaps the solution is to write an auth plugin for the database itself, using an existing protocol.
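One possible shape for that pass-through back end, sketched in Go under heavy assumptions: the `User` type, the `Executor` hook, and the keyword-based write detection are all hypothetical. The keyword scan is deliberately crude and conservative (it would also flag a property named `set`, and it knows nothing about procedures that write), so a real authorizer would need an actual Cypher parser or database-level roles.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// User is hypothetical; a real one would be derived from the verified token.
type User struct {
	Name     string
	CanWrite bool
}

// ErrForbidden is returned when a read-only user submits a write query.
var ErrForbidden = errors.New("user may not run write queries")

// writeKeywords is a naive, incomplete list of Cypher write clauses.
var writeKeywords = []string{"CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP"}

// isWriteQuery upper-cases the query, splits it on anything that is not a
// letter, and reports whether any resulting word is a write keyword.
func isWriteQuery(query string) bool {
	words := strings.FieldsFunc(strings.ToUpper(query), func(r rune) bool {
		return r < 'A' || r > 'Z'
	})
	for _, word := range words {
		for _, keyword := range writeKeywords {
			if word == keyword {
				return true
			}
		}
	}
	return false
}

// Executor is a hypothetical hook for whatever actually talks to the database.
type Executor func(query string) (string, error)

// forward runs the query untouched if the user is allowed to, and returns the
// database's response as-is.
func forward(u User, query string, run Executor) (string, error) {
	if isWriteQuery(query) && !u.CanWrite {
		return "", ErrForbidden
	}
	return run(query)
}

// echoExecutor is a fake database that just reports what it was asked to run.
func echoExecutor(query string) (string, error) {
	return "ran: " + query, nil
}

func main() {
	reader := User{Name: "reader"}
	fmt.Println(forward(reader, "MATCH (n:Person) RETURN n.name", echoExecutor))
	fmt.Println(forward(reader, "MATCH (n:Person) DETACH DELETE n", echoExecutor))
}
```

With this shape the front end writes Cypher directly, and the back end shrinks to an authorization decision plus a forwarder.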

It would be nice if there were a more ubiquitous authentication and authorization protocol, but I REFUSE to “obligatory xkcd” myself.

For the compile-time checks, maybe I want some ML-inspired language like OCaml or Rust, so I can use a Cypher DSL that’s checked at compile time. Alternatively, I could do code generation for languages that don’t support metaprogramming.
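Even without macros, a typed query builder can move a few mistakes to compile time. The Go sketch below is only a rough approximation of the idea, not a real library: every name in it (`Label`, `RelType`, `N`, `Match`, and the Person/Movie schema) is invented for illustration, and property names are still unchecked strings.

```go
package main

import (
	"fmt"
	"strings"
)

// Label and RelType are distinct struct types, so a node label cannot be
// passed where a relationship type is expected, and bare string literals
// cannot stand in for either.
type Label struct{ name string }
type RelType struct{ name string }

// The hypothetical schema is declared once, ideally by a code generator.
var (
	Person  = Label{"Person"}
	Movie   = Label{"Movie"}
	ActedIn = RelType{"ACTED_IN"}
)

// Node is a labeled node pattern bound to a query variable.
type Node struct {
	alias string
	label Label
}

// N builds a node pattern such as (p:Person).
func N(alias string, label Label) Node { return Node{alias: alias, label: label} }

// Prop renders a property access such as p.name. The property name is still
// a plain string, so this part is not checked by the compiler.
func (n Node) Prop(name string) string { return n.alias + "." + name }

// String renders the node in Cypher pattern syntax.
func (n Node) String() string { return "(" + n.alias + ":" + n.label.name + ")" }

// Query holds the MATCH clause built so far.
type Query struct{ pattern string }

// Match builds a single directed relationship pattern.
func Match(from Node, rel RelType, to Node) Query {
	return Query{pattern: fmt.Sprintf("MATCH %s-[:%s]->%s", from, rel.name, to)}
}

// Return finishes the query with a RETURN clause.
func (q Query) Return(props ...string) string {
	return q.pattern + " RETURN " + strings.Join(props, ", ")
}

func main() {
	p, m := N("p", Person), N("m", Movie)
	fmt.Println(Match(p, ActedIn, m).Return(p.Prop("name"), m.Prop("title")))
	// Match(p, Movie, m) would not compile: Movie is a Label, not a RelType.
}
```

This catches structural mix-ups but not much else, which is part of why a language with stronger metaprogramming, or generated code, looks attractive here.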

I realize this post doesn’t have any concrete solutions, but I want to express my thoughts early in this process. Languages, more specifically “application layers”, are getting in the way of my languages. Until next time.

Tags:

computer-science