Summary: With a few functions from the standard library, Clojure lets you do most of what you want with regular expressions with no muss.
Other than the semantics of the regexes themselves, the API is standardized across all platforms in the core library. And the syntax is convenient because you don’t need to double escape your special characters.
Regexes can be constructed in Clojure using a literal syntax. Strings with a hash in front are interpreted as regexes.
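Here is a small sketch of what the literal syntax buys you. The names digits-literal and digits-from-string are made up for illustration; re-pattern and re-find are from clojure.core.

```clojure
;; Regex literal: each backslash is written once.
(def digits-literal #"\d+")

;; The same pattern built from an ordinary string: backslashes must be doubled.
(def digits-from-string (re-pattern "\\d+"))

(re-find digits-literal "abc123")      ;; => "123"
(re-find digits-from-string "abc123")  ;; => "123"

;; Both regexes carry the same pattern text.
(= (str digits-literal) (str digits-from-string)) ;; => true
```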
Matching (with groups)
There is a nice function that matches the whole string. It is called re-matches. The return is a little complex. If the whole string does not match, it returns nil, which is nice because nil is falsey.
=> (re-matches #"abc" "zzzabcxxx") nil
If the string does match, and there are no groups (parens) in the regex, then it returns the matched string.
=> (re-matches #"abc" "abc") "abc"
If it matches but there are groups, then it returns a vector. The first element in the vector is the entire match. The remaining elements are the group matches.
=> (re-matches #"abc(.*)" "abcxyz") ["abcxyz" "xyz"]
The three different return types can get tricky, but in general I do have groups, so it’s either a vector or nil, which is easy to handle. You can even destructure it before you test it.
(let [[_ fn ln] (re-matches #"(\w+)\s(\w+)" full-name)]
  (if fn ;; successful match
    (println fn ln)
    (println "Unparsable name")))
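If you would rather test and destructure in one step, if-let works too. This is just a sketch: parse-name and the :unparsable keyword are hypothetical names, not anything from the standard library.

```clojure
;; if-let tests the whole return value of re-matches (vector or nil)
;; and only destructures it in the "then" branch.
(defn parse-name [full-name]
  (if-let [[_ first-name last-name] (re-matches #"(\w+)\s(\w+)" full-name)]
    {:first first-name :last last-name}
    :unparsable))

(parse-name "Ada Lovelace") ;; => {:first "Ada", :last "Lovelace"}
(parse-name "Cher")         ;; => :unparsable
```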
re-matches matches the whole string. But often, we want to find a match within a string. re-find returns the first match within the string. The return values are similar to those of re-matches.
No match returns nil
=> (re-find #"sss" "Loch Ness") nil
Match without groups returns matched string
=> (re-find #"s+" "dress") "ss"
Match with groups returns a vector
=> (re-find #"s+(.*)(s+)" "success") ["success" "ucces" "s"]
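To make the difference between the two functions concrete, here is the same kind of regex run through both. The input strings are made up for illustration.

```clojure
;; re-matches needs the whole string to match.
(re-matches #"\d+" "order 42") ;; => nil

;; re-find is happy with a match anywhere in the string.
(re-find #"\d+" "order 42")    ;; => "42"

;; With groups, re-find returns a vector for the first match only.
(re-find #"(\d+)-(\d+)" "pages 10-20 and 30-40") ;; => ["10-20" "10" "20"]
```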
Finding all substrings that match
The last function from clojure.core I use a lot is re-seq, which returns a lazy seq of all of the matches, not just the first. The elements of the seq are whatever type re-find would have returned.
=> (re-seq #"s+" "mississippi") ("ss" "ss")
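When the regex has groups, each element of the seq is a vector, so you can destructure every match. A small sketch with a made-up key=value string:

```clojure
;; Each element is [whole-match group-1 group-2].
(re-seq #"(\w+)=(\d+)" "a=1, b=22")
;; => (["a=1" "a" "1"] ["b=22" "b" "22"])

;; Destructure each match to build a map of key to value (both still strings).
(into {}
      (for [[_ k v] (re-seq #"(\w+)=(\d+)" "a=1, b=22")]
        [k v]))
;; => {"a" "1", "b" "22"}
```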
Replacing regex matches within a string
Well, matching strings is cool, but often you’d like to replace a substring that matches with some other string.
clojure.string/replace will replace all substring matches with a new string. Let’s take a look:
=> (clojure.string/replace "mississippi" #"i.." "obb") "mobbobbobbi"
This function is actually quite versatile. You can refer directly to the groups in the replacement string:
=> (clojure.string/replace "mississippi" #"(i)" "$1$1") "miissiissiippii"
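Group references are handy for reordering pieces of a match. Here is a quick sketch that swaps a "Last, First" name around; the input string is made up.

```clojure
(require '[clojure.string :as string])

;; $1 is the first group, $2 the second.
(string/replace "Lovelace, Ada" #"(\w+), (\w+)" "$2 $1")
;; => "Ada Lovelace"
```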
You can also replace with the value of a function applied to the match:
=> (clojure.string/replace "mississippi" #"(.)i(.)"
     (fn [[_ b a]]
       (str (clojure.string/upper-case b)
            "--"
            (clojure.string/upper-case a))))
"M--SS--SS--Ppi"
You can replace just the first occurrence with clojure.string/replace-first.
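As a sketch, here is clojure.string/replace-first on the same word as before. It takes the same kinds of replacement (string or function) as clojure.string/replace.

```clojure
(require '[clojure.string :as string])

;; Only the first "i.." match is replaced.
(string/replace-first "mississippi" #"i.." "obb")
;; => "mobbissippi"

;; A function replacement works too. With no groups, it receives the matched string.
(string/replace-first "mississippi" #"s+" string/upper-case)
;; => "miSSissippi"
```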
Splitting a string on a regex
Let’s say you want to split a string on some character pattern, like one or more whitespace characters. You can use clojure.string/split:
=> (clojure.string/split "This is a string that I am splitting." #"\s+") ["This" "is" "a" "string" "that" "I" "am" "splitting."]
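clojure.string/split also accepts an optional limit on the number of pieces, which helps when only the first separator matters. A minimal sketch with made-up input:

```clojure
(require '[clojure.string :as string])

;; Split on runs of digits.
(string/split "a1b22c333d" #"\d+")
;; => ["a" "b" "c" "d"]

;; With a limit of 2, everything after the first separator stays together.
(string/split "name = Ada = Countess" #"\s*=\s*" 2)
;; => ["name" "Ada = Countess"]
```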
Those are all of the functions I use routinely. There are some more, which are useful when you need them.
re-pattern constructs a regex from a string.
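Building a regex from a string is useful when part of the pattern is only known at runtime. In this sketch, word-regex is a hypothetical helper, and it assumes the word contains no regex special characters. Note that inside an ordinary string the backslashes do need doubling.

```clojure
;; Hypothetical helper: a regex that matches `word` as a whole word.
;; Assumes `word` has no characters that are special in a regex.
(defn word-regex [word]
  (re-pattern (str "\\b" word "\\b")))

(re-find (word-regex "cat") "a cat sat")   ;; => "cat"
(re-find (word-regex "cat") "concatenate") ;; => nil
```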
re-matcher is not available in ClojureScript. On the JVM, it creates a java.util.regex.Matcher, which is used for iterating over subsequent matches. This is not so useful since re-seq already gives you all of the matches.
If you find yourself with a Matcher, you can call re-find on it to get the next match (instead of the first). You can also call re-groups to get the groups from the most recent match. Unless you need a Matcher for some Java API, just stick to re-seq.
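For completeness, here is a JVM-only sketch of stepping through matches with a Matcher. The input string is made up, and the Matcher is mutable, so the order of the calls matters.

```clojure
;; JVM only: re-matcher returns a stateful java.util.regex.Matcher.
(let [m (re-matcher #"(\w)(\d)" "a1 b2 c3")]
  [(re-find m)     ;; advances to the first match
   (re-groups m)   ;; groups of the most recent match, no advance
   (re-find m)])   ;; advances to the next match
;; => [["a1" "a" "1"] ["a1" "a" "1"] ["b2" "b" "2"]]
```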
Well, that’s regexes as I use them. They’re super useful and easy to use in Clojure once you get the hang of them.