5 Differences between clojure.spec and Schema

Summary: Schema and clojure.spec aim to solve similar problems. There are significant differences, though, that might not be obvious at first.

Schema came out in 2013 and I started using it right away. At the company I was working at, we had a few API endpoints and we were having the classic problem of having to write custom checkers for our data. Schema seemed to solve the problem of describing the shape of the data, along with expected types at the leaves. Because it was mostly just data, it composed well. For instance, you could def an Address schema and reuse it wherever you needed an address. We also experimented with the coercion facilities of Schema to convert data from the JSON endpoint into better Clojure equivalents. For instance, we converted date strings to java.util.Date objects.

That was three years ago and Schema has since been used quite widely. It's used in many talks at Clojure conferences. And in general, it felt like it solved the problem pretty well, across Clojure and ClojureScript. Now, out of the blue, the Clojure team announced clojure.spec. I know when Rich Hickey writes a blog post, it's something important and insightful. So I take it seriously and try to parse it. And let me say, I had some trouble. It's apparent that Rich went deeper than I have on this problem.

To understand clojure.spec a little better, it helped me to compare it to Schema, which I already understood. Here are the main similarities and differences:

1. clojure.spec is not a "Data DSL".

Schema focuses foremost on describing a data shape by using data in that shape. It is a "Data DSL", where a map means "expect a map" and a vector means "expect a vector". That means that the schema looks similar to the data it specifies.

clojure.spec takes a different approach. It's not a Data DSL. Specs do not aim to look like the data they are describing. The library is a collection of small tools that do different jobs and can be used together. There is a tool for maps (called keys) that checks for the presence of required and optional keys and checks that each value conforms to the spec registered under that key's name. There is a tool for sequences that uses regular expression operators. And, at bottom, conformance is checked by predicate functions.

;; Schema (s is an alias for schema.core; the string schema is s/Str)
(def Person {:first-name s/Str
             :last-name s/Str
             :email #"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,63}"
             (s/optional-key :phone) s/Str})
;; clojure.spec
(s/def :com.lispcast.person/first-name string?)
(s/def :com.lispcast.person/last-name string?)
(s/def :com.lispcast.person/email (s/and string? #(re-matches #"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,63}" %)))
(s/def :com.lispcast.person/phone string?)

(s/def :com.lispcast/person
  (s/keys :req [:com.lispcast.person/first-name
                :com.lispcast.person/last-name
                :com.lispcast.person/email]
          :opt [:com.lispcast.person/phone]))

This example is borrowed and modified from the spec Guide.

In this simple example, I think I prefer Schema. Its intention is much clearer. However, once a schema meets the real world, it turns out that you can throw the "your Schema looks like the data" pipe dream out the window. For instance, what if we need either an email or a phone or both? In Schema, that means they're both optional, but you need an extra check afterward, which kind of ruins the elegance of the DSL. You're trying to specify that the phone and email have a relationship to each other. The presence of one key depends on the other. There are several ways to do it in Schema. And I don't like any of them.

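In clojure.spec, the and combinator (clojure.spec/and) can operate on values that have already been parsed, so a later predicate can check how those values relate to each other. Carin Meier has a great example of constraining different values to be in relationship in One Fish Spec Fish.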
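Here is a minimal sketch of the "email or phone or both" requirement. It assumes the clojure.spec.alpha namespace that the library ships under today, aliased as s; the :com.lispcast.contact/... names are hypothetical and exist only for this illustration. It shows two ways I believe this can be written: and with a predicate over the checked map, and an or inside :req.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical attribute specs for this sketch.
(s/def :com.lispcast.contact/email string?)
(s/def :com.lispcast.contact/phone string?)

;; Option 1: both keys are optional, then a predicate runs on the
;; map that s/keys has already checked.
(s/def :com.lispcast/contact
  (s/and (s/keys :opt [:com.lispcast.contact/email
                       :com.lispcast.contact/phone])
         #(or (contains? % :com.lispcast.contact/email)
              (contains? % :com.lispcast.contact/phone))))

;; Option 2: s/keys accepts `or` inside :req, meaning
;; "at least one of these keys must be present".
(s/def :com.lispcast/contact-alt
  (s/keys :req [(or :com.lispcast.contact/email
                    :com.lispcast.contact/phone)]))

(s/valid? :com.lispcast/contact {:com.lispcast.contact/phone "555-0100"}) ;; => true
(s/valid? :com.lispcast/contact {})                                       ;; => false
(s/valid? :com.lispcast/contact-alt {})                                   ;; => false
```

The second form keeps the relationship inside the keys spec itself; the first is more general, since the predicate can express any relationship between values.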

Takeaway: I'm interested to see what uses these smaller pieces can be put to. I don't understand them well enough yet. I look forward to experimenting with them.

2. clojure.spec prefers namespaced keywords.

While both clojure.spec and Schema allow namespaced and un-namespaced keywords, clojure.spec clearly encourages a global semantic for a unique keyword. The keys function takes a vector of required keywords under :req, and those must be namespaced. Those keywords play double-duty. They check for the presence of required keys and they name the spec that the value must conform to. (The :req-un and :opt-un variants accept un-namespaced keys in the map, but the specs themselves are still named by namespaced keywords.) Schema is more relaxed and does not show that preference for namespaced keys.
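To make the double-duty concrete, here is a small sketch. It assumes the clojure.spec.alpha namespace (aliased as s); the spec names :com.lispcast/person-qualified and :com.lispcast/person-unqualified are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :com.lispcast.person/first-name string?)

;; :req expects the namespaced key to appear in the map itself.
(s/def :com.lispcast/person-qualified
  (s/keys :req [:com.lispcast.person/first-name]))

;; :req-un expects the bare key :first-name in the map, but the value
;; is still checked against the namespaced spec.
(s/def :com.lispcast/person-unqualified
  (s/keys :req-un [:com.lispcast.person/first-name]))

(s/valid? :com.lispcast/person-qualified
          {:com.lispcast.person/first-name "Luke"}) ;; => true
(s/valid? :com.lispcast/person-qualified
          {:com.lispcast.person/first-name 42})     ;; => false (value is not a string)
(s/valid? :com.lispcast/person-unqualified
          {:first-name "Luke"})                     ;; => true
(s/valid? :com.lispcast/person-unqualified
          {:first-name 42})                         ;; => false
```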

Takeaway: Rich Hickey clearly stated that we should be naming specs for global consumption in the Cognicast interview. I'm not sure what my position is on this, but I trust he's thought about it more than I have. I will definitely have to play with it before I come to an opinion.

3. clojure.spec has powerful sequence validation.

clojure.spec has a full suite of regular expression operators for describing data in a sequence. While in general, vectors tend to be either homogeneous (e.g., a vector of Strings) or used as tuples (e.g., [:person "Luke" "Skywalker"]), clojure.spec does not forget that code is data, too. And code means complex lists. Look at the usage string from clojure.core/defn:

(defn name doc-string? attr-map? ([params*] prepost-map? body) + attr-map?)

It is clearly expressed with regex operations in mind. It uses ?, *, and +, which are classical symbols for regex operators. clojure.spec makes writing a checker to validate defn forms a straightforward translation of this documentation.

Schema does have some useful operators for describing heterogeneous vectors. But they are nowhere near as powerful as regular expressions.

;; Schema with a heterogeneous vector
(def FancySeq
  "A sequence that starts with a String, followed by an optional Keyword,
   followed by any number of Numbers."
  [(s/one s/Str "s")
   (s/optional s/Keyword "k")
   s/Num])

This example comes from the Schema readme.
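For comparison, here is how I would sketch the same sequence with clojure.spec's regex operators. It assumes the clojure.spec.alpha namespace (aliased as s), and the name :com.lispcast/fancy-seq is hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

;; A String, followed by an optional Keyword,
;; followed by any number of Numbers.
(s/def :com.lispcast/fancy-seq
  (s/cat :s    string?
         :k    (s/? keyword?)
         :nums (s/* number?)))

(s/valid? :com.lispcast/fancy-seq ["a" :b 1 2 3]) ;; => true
(s/valid? :com.lispcast/fancy-seq ["a" 1 2 3])    ;; => true
(s/valid? :com.lispcast/fancy-seq [:b "a"])       ;; => false
```

Unlike the Schema version, the optional and repeated parts can appear anywhere in the expression and can nest, which is what makes forms like defn describable.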

Takeaway: Because it will be so easy to describe the expected arguments to a macro, we should expect better error messages in macros in the core library and beyond. Jonathan Claggett and Chris Houser demonstrated something similar with Sequence Expressions. And Colin Fleming uses full recursive grammars to parse macros in Cursive. Another bonus is that specs can be attached to functions and macros without modifying code using clojure.spec/fdef.

4. clojure.spec combines checking with parsing.

So often, when writing a macro, I need to parse out the pieces of the arguments that I need for each section of logic. clojure.spec requires that you name each piece of the regular expression. clojure.spec/conform uses those names to create a map of all of the pieces. So you're checking that the arguments conform as well as parsing them into parts. And since it's a regular expression, it's pretty powerful. Schema doesn't really check sequences like that. Check out David Nolen's comments on clojure.spec for an example of parsing.
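Here is a small sketch of checking and parsing in one step. It assumes the clojure.spec.alpha namespace (aliased as s) and Clojure 1.9 or later for any?. The spec :com.lispcast/simple-defn-args is a hypothetical, heavily simplified stand-in for real defn arguments, not the spec the core library uses.

```clojure
(require '[clojure.spec.alpha :as s])

;; Simplified: a name, an optional docstring, one params vector, a body.
(s/def :com.lispcast/simple-defn-args
  (s/cat :name   symbol?
         :doc    (s/? string?)
         :params vector?
         :body   (s/* any?)))

;; conform checks the input and returns the named pieces as a map.
(s/conform :com.lispcast/simple-defn-args
           '(greet "Says hi" [x] (println x)))
;; => {:name greet, :doc "Says hi", :params [x], :body [(println x)]}

;; Input that does not match yields the invalid marker.
(s/invalid? (s/conform :com.lispcast/simple-defn-args '(42 [x])))
;; => true
```

A macro could destructure that map directly instead of hand-rolling the "is the second argument a docstring?" logic.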

Takeaway: The parsing feature is going to be really important. Regular expressions are great for defining a set of inputs far larger than the expression itself. Branching and backtracking are built in. I'm really excited for what this means for macros. They'll be easier to make and have better error messages.

5. clojure.spec has tight test.check integration.

test.check is Clojure's implementation of generative testing. I really like generative testing. It covers a large number of cases with higher-order properties. clojure.spec specs can automatically be turned into test.check generators. If you define specs for the arguments and return value of a function, the function can be tested automatically.
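As a rough sketch of what that looks like, assuming the current clojure.spec.alpha, clojure.spec.gen.alpha, and clojure.spec.test.alpha namespaces and org.clojure/test.check on the classpath (the function full-name is hypothetical):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen]
         '[clojure.spec.test.alpha :as stest])

;; A hypothetical function to test.
(defn full-name [first-name last-name]
  (str first-name " " last-name))

;; Describe its arguments and return value.
(s/fdef full-name
  :args (s/cat :first-name string? :last-name string?)
  :ret  string?)

;; Any spec built from known predicates can produce a generator.
(gen/sample (s/gen string?) 3) ;; three random strings

;; Generate arguments from :args, call the function, check :ret.
(stest/summarize-results (stest/check `full-name))
```

No hand-written generators or properties are needed here; both are derived from the specs.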

Takeaway: I will be more confident in my code when I use clojure.spec. I think it's going to make generative testing more accessible, as well. It's not that generative testing is hard, but the learning curve on spec is easier. clojure.spec/fdef and clojure.spec/fspec describe a function's arguments and return value, and generative tests can then be driven from those specs.

Conclusions

I'll confess: when I first saw clojure.spec, I was neither impressed nor excited. I was more baffled than anything. Was this what the Clojure team was working on? Weren't there more pressing matters? But when I read what the core team had produced, worked through the API docs, and listened to the Rich Hickey interview, I started to see some exciting possibilities. I'm really happy this is getting attention as a language feature. It shows that the team is listening to the community.