Clojure Error Messages are Accidental

_I've recently shifted my thinking about Clojure error messages. It is more useful to think of them as non-existent than to think of them as bad. We end with the role Spec can play in improving error messages._

I have the good fortune of helping many people learn Clojure. One of the most common complaints is the bad error messages. I have been looking for different ways to improve the error messages for beginners. I've tried better stack trace printers. I've tried implementing some prototype solutions to catch bad arguments to functions. I've been looking at other languages, particularly Elm, for inspiration. I've also just been exploring how error messages are actually implemented in Clojure. What I've discovered has been kind of surprising. I have a completely new perspective on error messages in Clojure, and I want to share that with you.

Before I go on, I want to make something clear: Clojure error messages are bad. They make it harder for beginners. I don't think anyone really argues with that. I like Clojure and think it's well designed in general. I wanted to figure out what was going on with error messages.

It will be important to distinguish between the error messages and the stack traces. The stack traces are those horribly long printouts you get when an exception is thrown. They are one of those cases where Clojure relies on the host platform: that's just what JVM stack traces look like.

The worst sin of stack traces is that the most important information is printed first, then followed by less and less important information, usually a few screenfuls of it. In essence, the stack traces are printed backwards in the terminal. You have to scroll way back, looking for the beginning of the trace. The default stack traces are ergonomically difficult.

The second worst sin is that lots of details from Clojure's implementation are included in the stack trace, making it longer, noisier, and, frankly, more intimidating. Both sins suck. I lived with them for years. And they are easily solvable. I happen to use and like Pretty, which reverses the stack trace and filters out noise.

But both of these sins combined are not as bad as the Exceptions themselves. Let's dig into those.

There are actually two major types of errors in Clojure. There are compile-time errors and run-time errors. Compile-time errors are things like syntax errors, unresolvable symbols (missing variables), and errors during macro expansion. These errors tend to be decent, on average. The Exception type makes sense and the error message points to the problem.

(f)
;; throws clojure.lang.Compiler$CompilerException:
;;   java.lang.RuntimeException:
;;     Unable to resolve symbol: f in this context, compiling:(*cider-repl pftv*:2780:15)

It's not beautiful, but it's acceptable. Some Clojure macros had incomprehensible error messages, but that seems to have gotten better. It used to be:

(defn foo a)
;; throws IllegalArgumentException:
;;   Don't know how to create ISeq from clojure.lang.Symbol, compiling:(*cider-repl pftv*:2780:15)

It seems like Clojure just assumed the parameter list was a vector and tried to iterate over it. Now (Clojure 1.8 even) it's much nicer:

(defn foo a)
;; throws java.lang.IllegalArgumentException:
;;   Parameter declaration "a" should be a vector

Runtime errors, on the other hand, are not as good. There has never seemed to be any consistency to them. The Exception type is sometimes correct, but sometimes it seems to have nothing to do with what I am doing. The error messages rarely describe the actual problem. Here's an example where the error message is just wrong:

(def my-val 1)
@my-val
;; throws java.lang.ClassCastException:
;;   java.lang.Long cannot be cast to java.util.concurrent.Future

Who said anything about a Future? Why is Clojure bringing that up? You don't know how many times I've looked for Futures in my code, only to realize I had a stray @ somewhere.

And what's more, very often, no error is thrown at all. Maybe nil is returned, or some other unexpected value. Here's an example:

(keyword 5)   ;=> nil
(keyword nil) ;=> nil

That's not very helpful. Try passing different types to keyword to see what happens. Wouldn't you expect an error?

Before I started exploring, these behaviors seemed lazy and neglectful. However, they have been something I've learned to live with. Sometimes the error is something I've seen before. But more frequently, I don't even read the error message. I work in small increments. Any error must be somewhere in the code I just wrote. I didn't pay much attention to error message design until I started exploring it more.

Through that exploration, I've realized that it's not so much that Clojure's errors are bad. It's more that they're accidental. Clojure's core functions are, for the most part, implemented without checks on the arguments. They code only the "happy path": they assume that the arguments are of the correct type and shape, and proceed without caution. When they do explicitly throw an Exception, it's deep inside a conditional where there clearly isn't a way to proceed.

What happens if the arguments are not correct? That's up to chance. Sometimes you get an Exception from something that does implicitly check its arguments. For example, trying to call any method on nil will throw an Exception. And numeric functions will die on non-numbers. But sometimes, after trying all of the tests in a conditional, the conditional falls through to nil. That's what happens with keyword. It appears that it worked, when nothing was done at all, and nil is returned.
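To make the fall-through concrete, here is a minimal sketch of a function written in that happy-path style. The name happy-path-keyword is hypothetical and the body is an illustration of the pattern, not the actual source of clojure.core/keyword.

```clojure
;; Hypothetical function, written only to illustrate the pattern.
;; It is NOT the source of clojure.core/keyword.
(defn happy-path-keyword
  "Converts a keyword, symbol, or string to a keyword.
  No branch handles any other type, and nothing ever throws."
  [x]
  (cond
    (keyword? x) x
    (symbol? x)  (keyword (name x))
    (string? x)  (keyword x)))
;; There is no :else clause, so any other input falls through
;; every test and cond returns nil.

(happy-path-keyword "a") ;=> :a
(happy-path-keyword 'a)  ;=> :a
(happy-path-keyword 5)   ;=> nil
```

Nothing in the body says "5 is invalid". The nil is simply what is left over when none of the tests match.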

I've come to believe that Clojure's errors aren't bad by design. No, it's something totally different. Clojure's errors are actually missing.

We're used to languages doing runtime argument checks. One robust way to implement checks is to implement them in the most central core of the language. Then you build new functions on top of that core. Those new functions can choose to check their own arguments and throw meaningful errors, or rely on the underlying core's errors if they're sufficient (or you're lazy). It's a systematic way to ensure that bad runtime behavior throws an error. Even a language like JavaScript will eventually bottom out with "undefined is not a function" (meaning method not found) or a null pointer exception. Even if the errors are bad like JavaScript's, at least they exist. That's one way to do it, but that's not what Clojure does.

What has Clojure done?

Clojure, in typical "de-complecting" style, has separated out the implementation of a function from enforcing the preconditions of that function. We have the implementations, which are the "happy paths". What is missing, and has always been missing, are the preconditions. It's almost as if Clojure's functions' implementations assumed some kind of external check on their arguments would happen.

Enter Spec. Spec is that external check. Spec is not there to make the error messages better, as is widely believed. Spec is there to have error messages at all. Since the runtime error messages you see are accidental, any consistently applied error checks will be beneficial. Once we have error messages, we can begin the work of making them better.
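As a rough sketch of what that separation can look like, the example below keeps a happy-path implementation untouched and attaches its preconditions from the outside. It assumes Clojure 1.9 or later, where Spec lives in clojure.spec.alpha and clojure.spec.test.alpha. The function initials and its spec are hypothetical, made up for illustration.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical happy-path function: it assumes a collection of strings
;; and checks nothing itself.
(defn initials [names]
  (apply str (map first names)))

(initials ["Ada" "Lovelace"]) ;=> "AL"
(initials nil)                ;=> ""  (no error, just an accidental result)

;; The precondition lives outside the implementation.
(s/fdef initials
  :args (s/cat :names (s/coll-of string?))
  :ret string?)

;; Instrumentation checks the :args spec on every call to the var.
(stest/instrument `initials)

;; Returns the :clojure.spec.alpha/failure entry of the thrown ex-info,
;; or the normal result when the call conforms.
(defn initials-or-failure [x]
  (try
    (initials x)
    (catch clojure.lang.ExceptionInfo e
      (:clojure.spec.alpha/failure (ex-data e)))))

(initials-or-failure ["Ada" "Lovelace"]) ;=> "AL"
(initials-or-failure nil)                ;=> :instrument
(initials-or-failure "Ada")              ;=> :instrument
```

The body of initials never changed. Before instrumentation, bad input produced whatever the implementation happened to do with it. After instrumentation, bad input is rejected at the call, with the same kind of error every time.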

Speculation

There's a concern that Spec may make error messages worse, and those concerned point to some macros' error messages where it actually did get worse once the macros were Specced. This is a valid concern. However, I believe that it is far worse to have accidental error messages (like we have now) than consistent error messages, even if they're bad. Spec's messages may be bad (I don't like them that much; they remind me a lot of Haskell's type error messages), but they will cover the core functions of the language. Functions that call those will also get error messages. I used to be skeptical of the benefit of Spec, but now I'm looking forward to its release.

I think Spec is going to surprise us. Once Clojure's core functions are specced, we will be surprised by how much of our code violates the assumptions of the functions we use, but somehow worked anyway by accident. For instance, I can imagine nils flowing through functions like get that seem to tolerate them and returning nil themselves. Since nil is a valid return from get, we handle it and it may work out. But is nil really a valid argument to get? Probably not, and that would be reflected in the Spec. We've probably got a lot of code like that. When we turn on Spec instrumentation and run our tests, we'll have to face all of these violated assumptions that happened to work. There will be many errors in our existing code.
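A small illustration of nils flowing through get, using a made-up user map whose :address key is missing:

```clojure
;; Hypothetical data: the :address key is missing.
(def user {:name "Ada"})

(get user :address)                 ;=> nil
(get (get user :address) :city)     ;=> nil  (the inner nil is passed to get)
(get nil :city)                     ;=> nil
(get-in user [:address :city :zip]) ;=> nil
```

Every call succeeds, so nothing marks the point where the data stopped being what the code assumed.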

Even those of us who have worked in Clojure for a long time will have to internalize those assumptions. We're not used to precondition checks. We're used to thinking of Clojure's functions as loose and dynamically typed. Before we internalize the logic of Spec, we will write code that doesn't pass the spec the first time. We will be like beginners. And some may not like the language that it will have become because it will be picky in unfamiliar ways. Lucky for them, they can completely turn it off. But we will be better off with instrumentation on.

I think we will be equally surprised by how simple the types are. For example, it's easy to see clojure.set/union as a really complex function. It takes two collections and returns a new collection with all of the elements of both collections. However, sometimes the type it returns is from the first argument, and sometimes from the second argument, depending on which one is bigger. That's so complicated.

(clojure.set/union #{0} [1 2 3 4])   ;=> [1 2 3 4 0]
(clojure.set/union #{0 1 2 3} [4])   ;=> #{0 1 4 3 2}
(clojure.set/union '(0 1 2 3 4) [5]) ;=> (5 0 1 2 3 4)

When we first use union, we are trying to apply rules that make sense in other areas of Clojure. Shouldn't the arguments be coerced to Sets, like all collections are coerced to seqs in the sequence functions? Or shouldn't it really always use the type of the first argument, like in protocol methods? And there can't be something wrong with the types, otherwise it would have thrown an error, right?

But these rules don't apply. All of the union calls above are wrong. The fact that they do anything at all is an accident. The type of clojure.set/union is so simple. It simply assumes all of the arguments are Sets. If that's true, it will always return a Set. In Haskell, you'd say Set -> Set -> Set. And that type does work with Clojure's union. We can predict that Spec will choose exactly that type. Most experienced Clojure programmers will agree to that. So what was once complex behavior will be replaced by simple behavior or an error message.
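Here is a sketch of what such a spec could look like. It is a guess at a strict spec, not the official one, and it assumes Clojure 1.9 or later with clojure.spec.alpha. The helper union-or-failure is hypothetical and exists only so the example can show the failure as a value.

```clojure
(require '[clojure.set :as set]
         '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; A guess at a strict spec, NOT the official one:
;; every argument is a set, and the result is a set.
(s/fdef clojure.set/union
  :args (s/* set?)
  :ret set?)

(stest/instrument 'clojure.set/union)

;; Hypothetical helper: returns the union, or the
;; :clojure.spec.alpha/failure entry of the ex-info thrown
;; when the arguments do not conform.
(defn union-or-failure [& args]
  (try
    (apply set/union args)
    (catch clojure.lang.ExceptionInfo e
      (:clojure.spec.alpha/failure (ex-data e)))))

(union-or-failure #{0} #{1 2})         ;=> #{0 1 2}
(union-or-failure #{0} [1 2 3 4])      ;=> :instrument
(union-or-failure '(0 1 2 3 4) [5])    ;=> :instrument
```

With this spec instrumented, the calls that used to return a vector or a list are rejected, and only the Set -> Set -> Set case is left. Calling (stest/unstrument 'clojure.set/union) turns the check back off.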

But I think there will be some cases where we don't agree, or at least we will have to rebuild our understanding.

Finally, there will be times when speccing a function will inform the implementation. Currently, the type for clojure.core/nth is quite complicated. There are eight different types that it can take. And there's no nice abstraction that covers them all. I mentioned this on Twitter and Alex Miller pointed out that we might need a new notion here, perhaps called nthable?, which would simplify the spec.

The least obvious consequence of my new perspective (that we are adding error messages where none really existed) is that we have an opportunity to correct some mistakes in the implementation. For example, should you really be able to get out of anything?

(get (java.io.File. "hello.txt") :foo) ;=> nil

Does that make any sense? Is that the expected behavior? Maybe we should stop wondering about the expected behavior when you pass in garbage input, and talk instead about the expected input. Spec will let us talk about that.

Maybe Rich has learned a lot since a lot of these functions were defined ten years ago and he wants to use the release of Spec as a way to correct those mistakes without breaking backwards compatibility. Or maybe the core specs will be extremely conservative. It wouldn't surprise me either way. But if the official spec doesn't restrict get, Spec will let people redefine the spec for get for themselves. They can choose whatever subset of the type they want. I can imagine many companies adopting strict specs that catch bugs they find in their code, like some companies release their linter configs today.

In fact, you can do this right now in your code. If get is a source of bugs for you, define a spec for it, instrument, and run your tests.
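A minimal sketch of that workflow follows, with one caveat stated as an assumption. As far as I can tell, clojure.core/get carries :inline metadata, so a direct call like (get m k) may be compiled straight to clojure.lang.RT and never reach an instrumented var. To sidestep that, the sketch puts the spec on a hypothetical wrapper, strict-get, that your own code would call instead. The choice of maps, sets, and vectors as the allowed types is also just one possible subset.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical wrapper; the name strict-get is made up for illustration.
(defn strict-get
  ([m k] (get m k))
  ([m k not-found] (get m k not-found)))

;; One possible "subset of the type": only maps, sets, and vectors may be
;; looked into. nil, strings, and arbitrary objects are rejected.
(s/fdef strict-get
  :args (s/cat :coll (s/or :map map? :set set? :vector vector?)
               :key any?
               :not-found (s/? any?)))

(stest/instrument `strict-get)

;; Returns the lookup result, or the :clojure.spec.alpha/failure entry
;; of the ex-info thrown when the arguments do not conform.
(defn strict-get-or-failure [& args]
  (try
    (apply strict-get args)
    (catch clojure.lang.ExceptionInfo e
      (:clojure.spec.alpha/failure (ex-data e)))))

(strict-get-or-failure {:foo 1} :foo)                    ;=> 1
(strict-get-or-failure {:foo 1} :bar :none)              ;=> :none
(strict-get-or-failure (java.io.File. "hello.txt") :foo) ;=> :instrument
(strict-get-or-failure nil :foo)                         ;=> :instrument
```

Running a test suite with this instrumented would surface every place where something other than a map, set, or vector reaches the lookup.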

Finally, this new perspective has helped me understand Cognitect's stance on the error message issue. People have been complaining about error messages for years, and the response from Cognitect has been alienating. Perhaps I'm dense, but it wasn't until I made this mental shift that I understood some of what they're getting at.

Conclusions

I'm finding it useful to see Clojure error messages as missing. In practice, they're still bad, but this perspective helps me understand why and gives Spec more meaning. Clojure's core specs, where every function has a spec, will finally give us error messages. Whatever they turn out to be, they are better than what we have now. Will core specs also uncover lots of problems in our existing code? Will core specs change our understanding of the difference between runtime behavior and valid input? Who knows. But I'm looking forward to a release of clojure.core.specs. It will contain lots of insights into Rich's design and about how to best understand Clojure. And once we have error messages, we can finally begin the work of making them better.