By Renzo Borgatti

This article has been excerpted from Clojure Standard Library.

Get 37% off with code fccborgatti.

 

 

Knowing your tools

Software development is often compared to a craft, though it’s predominantly an intellectual activity. Software development is abstract in nature, but there are many craft-oriented aspects to it:

  • The keyboard requires time and dedication to operate correctly. There are endless discussions about the best keyboard layout for programmers and about how to speed up typing, for example with the Dvorak keyboard layout (http://lifehacker.com/should-i-use-an-alternative-keyboard-layout-like-dvorak-1447772004 ).
  • The development environment is a key aspect of a programmer’s productivity and another source of debate (almost reaching a religious connotation). Mastering a development environment often translates into learning useful key combinations and ways to customize the most common operations.
  • Libraries, tools, and idioms surrounding the language. Almost everything above the pure syntax rules.
  • Proficiency in several programming languages is a plus in the job marketplace, and the way to achieve it is by practicing them on a regular basis, including getting familiar with the APIs and libraries the language offers.
  • Many other aspects require specific skills depending on the area of application: teaching, presenting, or leadership.

 

The focus on mastering programming skills is important enough that it became one of the key objectives of the Software Craftsmanship Movement (http://manifesto.softwarecraftsmanship.org/). Software Craftsmanship advocates learning through practice and promotes an apprenticeship, like those found in some other professions.

The standard library is one of the most important tools for mastering a language. One aspect that characterizes the standard library is that it’s already packaged with the language when you first experiment with it. Interestingly, it doesn’t get the amount of attention you’d expect for such an easy-to-reach tool. This article will show you how much wisdom and potential is hidden inside the Clojure standard library.


Why should you care about the standard library?

The expressiveness of a language is often described as the speed at which ideas can be translated into working software. Part of the expressiveness comes from the language itself, in terms of syntax, but another fundamental part comes from the standard library, which is provided out of the box. A good standard library liberates the programmer from the most mundane tasks, like connecting to data sources, parsing XML, dealing with numbers, and a lot more. When the standard library does a good job, developers are free to concentrate on the core business aspects of an application, boosting productivity and return on investment.

Consider also that a deep knowledge of the standard library is often what distinguishes an average developer from an expert. The expert can solve problems more elegantly and faster because, apart from having solved the same problem before, they can compose a complex solution by pulling small pieces together from the standard library.

Finally, the standard library contains solutions to common programming problems that have been battle-tested over generations of previous applications. It’s certainly the case for Clojure. The robustness and reliability that come from that kind of stress-testing are difficult to achieve otherwise. There are only a handful of cases where something in the standard library won’t fit your needs and must be re-implemented.


The Clojure Standard Library

The Clojure standard library is quite comprehensive and can be divided roughly into three parts:

  1. What’s commonly referred to as “core”, the content of the single namespace clojure.core. Core contains the functions that have evolved to be the main public API for the language, including basic math operators, functions to create and manipulate other functions, and conditionals. Core currently contains around 700 definitions between functions and macros. Functions in core are always available without an explicit reference from a namespace.
  2. Namespaces other than core that are shipped as part of the Clojure installation. These are usually prefixed with clojure followed by a descriptive name, like clojure.test, clojure.zip, or clojure.string. Functions in these namespaces are sometimes available by prefixing their namespace (like clojure.string/upper-case), but in other cases they need to be imported into the current namespace using [require].
  3. Finally, the content of the Java SDK, which is available as part of Clojure Java interoperability features.
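As a rough illustration of the three parts listed above, the following sketch uses one function from each. The var names (doubled, shouted, root) and the string alias are made up for this example.

```clojure
;; Part 1: clojure.core is referred automatically, no require needed.
(def doubled (map #(* 2 %) [1 2 3]))

;; Part 2: other namespaces shipped with Clojure are loaded with require,
;; here under an alias chosen for this example.
(require '[clojure.string :as string])
(def shouted (string/upper-case "standard library"))

;; Part 3: the Java SDK is reachable through Java interop.
(def root (Math/sqrt 16.0))
```

Requiring the namespace explicitly, rather than relying on it having been loaded by something else, keeps the code working the same way inside and outside the REPL.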

In this article we’ll refer to the Clojure standard library as the first two parts described above – everything that you get by downloading the basic Clojure package, without downloading additional libraries. In general, items in the standard library are marked as public, although some functions are marked as “alpha” in the Clojure documentation string and are subject to change.

The standard library content can be roughly categorized by looking at the major features Clojure introduces, and by the most common programming tasks. There are big groups of functions dedicated to Software Transactional Memory (intro to STM: https://en.wikipedia.org/wiki/Software_transactional_memory), concurrency, and persistent collections. Clojure also adds the necessary support for common tasks like IO, sequence processing, math operations, XML, strings, and many others. Missing from the Clojure standard library are solutions already provided by the Java SDK, for example cryptography, low-level networking, HTTP, 2D graphics, and so on. For all practical purposes those features aren’t missing: they are usable from Java without the need to re-write them in Clojure. Java interoperability is one of the big strengths of Clojure, making it easy to use the Java SDK (Software Development Kit) from a Clojure program.
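To make the point about “missing” features concrete, here is a minimal sketch that borrows cryptography from the Java SDK through interop. The function name sha-256-hex is an assumption made for this example; java.security.MessageDigest is the standard Java class.

```clojure
(import 'java.security.MessageDigest)

(defn sha-256-hex
  "Returns the SHA-256 digest of the string s as a lowercase hex string.
  Hypothetical helper, written only to illustrate Java interop."
  [s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    ;; Each byte is rendered as two hex digits.
    (apply str (map #(format "%02x" %) digest))))

(def empty-digest (sha-256-hex ""))
```

Nothing here had to be re-implemented in Clojure: the hashing is done entirely by the Java class, and Clojure only supplies the glue.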

The additional (non-core) namespaces of Clojure are displayed in the diagram below (Figure 1) – let’s have a quick look at what they do.


Figure 1. All other non-core namespaces.


  • Core support namespaces integrate core with additional functionality on top of what is already present. clojure.string is possibly the best example. Core already contains [str], but other useful string functions have been moved out into the clojure.string namespace. clojure.template contains a few helpers for macro creation. clojure.set is about the “set” data structure. clojure.pprint contains formatters for almost all Clojure data types, so that you can print them in a nice, human-readable form. Finally, clojure.stacktrace contains functions for manipulating and formatting Java exceptions.
  • REPL namespaces contain functionality dedicated to the REPL, the read-eval-print loop Clojure offers. clojure.main handles the main entry point into the Clojure executable and includes part of the REPL functionality that was later split out into clojure.repl. The latest addition, clojure.core.server, implements the socket server functionality.
  • General support is about additional APIs beyond what core offers. The namespaces present here enrich Clojure with new functionality. clojure.walk and clojure.zip, for example, are two ways to “walk” and manipulate tree-like data structures. clojure.xml offers XML parsing capabilities. clojure.test is the unit test framework included with Clojure. clojure.java.shell contains functions to “shell out” commands to the operating system. clojure.core.reducers offers a model of parallel computation.
  • Java namespaces are dedicated to Java interop beyond what core already offers. clojure.java.browse and clojure.java.javadoc offer the possibility to open a native browser to display generic web pages or javadoc documentation respectively. clojure.reflect wraps the Java reflection APIs, offering an idiomatic Clojure layer on top of them. clojure.java.io offers a sane approach to java.io, removing the idiosyncrasies that made Java IO confusing, like knowing the correct combination of constructors to transform a Stream into a Reader and vice versa. Finally, clojure.inspector offers a UI to navigate data structures.
  • Data Serialization is about ways in which Clojure data can be encoded as strings in an exchange format. clojure.edn is the main entry point into EDN (https://github.com/edn-format/edn) format serialization. clojure.data contains only one user-facing function, [clojure.data/diff], used to compute differences between data structures. clojure.instant defines the encoding of time-related types.
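A quick, non-exhaustive taste of a few of the namespaces above, one call each. The aliases and var names are chosen only for this sketch.

```clojure
(require '[clojure.set :as set]
         '[clojure.walk :as walk]
         '[clojure.data :as data]
         '[clojure.edn :as edn])

;; clojure.set: operations on the set data structure.
(def merged (set/union #{1 2} #{2 3}))

;; clojure.walk: transform a nested structure in one pass.
(def keywordized (walk/keywordize-keys {"a" {"b" 1}}))

;; clojure.data/diff: [only-in-first only-in-second in-both].
(def difference (data/diff {:a 1 :b 2} {:a 1 :b 3}))

;; clojure.edn: read data from its EDN string form without evaluating code.
(def parsed (edn/read-string "{:id 42 :tags #{:x :y}}"))
```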

The above is a nice overview of what’s available beyond core functions. If you’re reading this article, the assumption is that you care about where a function lives (if only to [require] it at the top of the namespace before using it), but you care even more about knowing that the function exists when you have a particular problem to solve.

Although the majority of items in the standard library are either functions or macros, there are also dynamic variables. Dynamic variables are a special kind of reference type that can be re-bound on a thread-local basis (see the great description of dynamic variables from “Joy of Clojure” for a detailed explanation – www.manning.com/books/the-joy-of-clojure-second-edition ). Dynamic variables are important because they’re often the way functions in the standard library are configured.
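As a small sketch of how a dynamic variable configures a standard library function, the example below re-binds *print-length*, the core dynamic variable that limits how many items of a collection are printed. The var names are made up for this example, and it assumes *print-length* has its default value of nil outside the binding form.

```clojure
(def numbers (range 10))

;; With the default *print-length* of nil, every item is printed.
(def full (pr-str numbers))

;; Inside binding, the new value is visible to the current thread only
;; and is restored as soon as the form exits.
(def truncated
  (binding [*print-length* 3]
    (pr-str numbers)))

;; full      => "(0 1 2 3 4 5 6 7 8 9)"
;; truncated => "(0 1 2 ...)"
```

Note that pr-str was never passed an option: its behavior changed because of the binding in effect around the call.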


Making your development life easier

The standard library isn’t there only to solve the usual recurring programming problems, but also to offer elegant solutions to new development challenges. “Elegant” in this context translates to solutions that are easy to read and maintain. Let’s look at the following example.

Suppose you’re given the task of creating a report to display information on screen in a form readable by humans. Information is coming from an external system and a library exists to deal with communication. All you know is that the input arrives structured as the following XML (saved here as a local balance var definition):

(def balance
  "<balance>
    <accountId>3764882</accountId>
    <lastAccess>20120121</lastAccess>
    <currentBalance>80.12389</currentBalance>
  </balance>")

The balance needs to be displayed in a user-friendly way by:

  1. removing any unwanted symbols other than letters (like the colon at the beginning of each key);
  2. separating the words (using uppercase letters as delimiters); and
  3. formatting the balance as a currency with 2 decimal digits.

You might be tempted to solve the problem like this:


(require '[clojure.java.io :as io])
(require '[clojure.xml :as xml])

(defn- to-double [k m]
  (update-in m [k] #(Double/valueOf %)))

(defn parse [xml]                                         ; ❶
  (let [xml-in (java.io.ByteArrayInputStream. (.getBytes xml))
        results (to-double
                  :currentBalance
                  (apply merge
                         (map #(hash-map (:tag %) (first (:content %)))
                              (:content (xml/parse xml-in)))))]
    (.close xml-in)
    results))

(defn clean-key [k]                                       ; ❷
  (let [kstr (str k)]
    (if (= \: (first kstr))
      (apply str (rest kstr))
      kstr)))

(defn- up-first [[head & others]]
  (apply str (conj others (.toUpperCase (str head)))))

(defn separate-words [k]                                  ; ❸
  (let [letters (map str k)]
    (up-first (reduce #(str %1 (if (= %2 (.toLowerCase %2)) %2 (str " " %2))) "" letters))))

(defn format-decimals [v]                                 ; ❹
  (if (float? v)
    (let [[_ nat dec] (re-find #"(\d+)\.(\d+)" (str v))]
      (cond
        (= (count dec) 1) (str v "0")
        (> (count dec) 2) (apply str nat "." (take 2 dec))
        :default (str v)))
    v))

(defn print-balance [xml]                                 ; ❺
  (let [balance (parse xml)]
    (letfn [(transform [acc item]
              (assoc acc
                     (separate-words (clean-key item))
                     (format-decimals (item balance))))]
      (reduce transform {} (keys balance)))))

(print-balance balance)
;; {"Account Id" "3764882", "Last Access" "20120121", "Current Balance" "80.12"}

❶  parse takes the XML input string and parses it into a [hash-map] containing just the necessary keys. parse also converts :currentBalance into a double.

❷  clean-key solves the problem of removing the “:” at the beginning of each attribute name. It checks the beginning of the attribute before removing potentially unwanted characters.

❸  separate-words takes care of searching for upper-case letters and prepending a space. reduce is used here to accumulate the changes so far while we read the original string as the input. up-first was extracted as a handy helper to upper-case the first letter.

❹  format-decimals handles the formatting of floating-point numbers. It searches for digits with re-find and then either appends to (padding with zeros) or truncates the decimal digits.

❺  Finally print-balance puts all the transformations together. Again reduce is used to create a new map with the transformations while we read the original one. The reducing function was big enough to suggest a named local function in a letfn form. The core of the function is to [assoc] the newly formatted attribute with the formatted value into the new map to display.


The code is relatively easy to read (the three formatting rules are more or less separated into functions), but the example shows minimal use of what the standard library offers. It contains [map], [reduce], [apply] and a few others, including XML parsing, which are important functions (and usually learned first by beginners). Other functions in the standard library can make the code more concise and readable.

Let’s have a second look at the requirements to see if we can do a better job. The source of complexity in the code above can be tracked down to the following:

  • String processing: strings need to be analyzed and de-composed. The clojure.string namespace comes to mind and possibly [subs].
  • Hash-map related computations: both keys and values need specific processing. [reduce] is used here because we want to gradually mutate both the key and the value at the same time, but [zipmap] is a viable alternative worth exploring.
  • Formatting rules of the final output: things like string padding of numerals or rounding of decimals. There’s an interesting [clojure.pprint/cl-format] function that might come in handy.
  • Other details like nested forms and IO side effects. In the first case, threading macros can be used to improve readability. Finally, macros like [with-open] remove the need for developers to remember to initialize the correct Java IO type and close it at the end.

By reasoning on the aspects of the problem we need to solve, we listed a few functions or macros that might be helpful. The next step is to verify our assumptions and rewrite the example:


(require '[clojure.java.io :as io])
(require '[clojure.xml :as xml])
(require 'clojure.string 'clojure.pprint) ; loaded explicitly so the code also works outside the REPL

(defn- to-double [k m]
  (update-in m [k] #(Double/valueOf %)))

(defn parse [xml]                                         ; ❶
  (with-open [xml-in (io/input-stream (.getBytes xml))]
    (->> (xml/parse xml-in)
         :content
         (map #(hash-map (:tag %) (first (:content %))))
         (apply merge)
         (to-double :currentBalance))))

(defn separate-words [s]
  (-> (str (.toUpperCase (subs s 0 1)) (subs s 1))        ; ❷
      (clojure.string/replace #"([A-Z][a-z]*)" "$1 ")     ; ❸
      clojure.string/trim))

(defn format-decimals [v]
  (if (float? v)
    (clojure.pprint/cl-format nil "~$" v)                 ; ❹
    v))

(defn print-balance [xml]
  (let [balance (parse xml)
        ks (map (comp separate-words name) (keys balance))
        vs (map format-decimals (vals balance))]
    (zipmap ks vs)))                                      ; ❺

(print-balance balance)
;; {"Account Id" "3764882", "Last Access" "20120121", "Current Balance" "80.12"}

❶  parse now avoids the let block, along with the annoying side effect of having to close the input stream, by making use of the [with-open] macro. The ->> threading macro has been used to give a linear flow to the previously nested XML processing.

❷  subs, along with clojure.string functions such as split and split-lines, makes it really easy to process sub-strings. We don’t need an additional function anymore, because turning the first letter to upper-case is now a short one-liner.

❸  The key function in the new separate-words version is clojure.string/replace. The regex finds groups of 1 upper-case letter followed by lower-case letters. The last argument conveniently offers the possibility to refer to matching groups. We just need to append a space.

❹  format-decimals delegates almost completely to clojure.pprint/cl-format, which does all the work of formatting decimals.

❺  [zipmap] brings in another dramatic change in the way we process the map. We can isolate changes to the keys (composing word separation and removal of the unwanted “:”) and changes to the values into two separate map operations. [zipmap] conveniently combines them back into a new map without the need for reduce or [assoc].
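The helpers used in the rewrite can also be tried in isolation at the REPL, which is a cheap way to check what each one does before composing them. The following is a small sketch; the var names are made up for this example.

```clojure
(require '[clojure.pprint :refer [cl-format]]
         '[clojure.string :as string])

;; cl-format with the "~$" directive prints two decimal digits,
;; truncating or padding as needed.
(def truncated (cl-format nil "~$" 80.12389))   ; "80.12"
(def padded    (cl-format nil "~$" 7.5))        ; "7.50"

;; name drops the leading colon of a keyword.
(def plain-key (name :currentBalance))          ; "currentBalance"

;; "$1" in the replacement refers to the first matching group.
(def spaced (string/replace "AccountId" #"([A-Z][a-z]*)" "$1 ")) ; "Account Id "

;; zipmap pairs a sequence of keys with a sequence of values.
(def paired (zipmap ["Account Id" "Last Access"] ["3764882" "20120121"]))
```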


The second example shows an important fact about “knowing your tools” (in this case the Clojure standard library): the use of a different set of functions not only cuts the number of lines from 45 to 30, but also opens up the design to completely different decisions. Apart from the cases where we delegated entire sub-tasks to other functions (like cl-format for decimals, or [name] to clean a key), the main algorithmic logic took a different approach that doesn’t use [reduce] or [assoc]. The result is a shorter solution that is more expressive and easier to evolve and maintain.

The problem of fragmented information

In 2010, Chas Emerick started asking the Clojure community a few questions in the form of a yearly survey to collect feedback about Clojure adoption in the industry. Cognitect, the company that actively sponsors the development of Clojure, is continuing the tradition with the last available results for 2015 published on their website (http://blog.cognitect.com/blog/2016/1/28/state-of-clojure-2015-survey-results ). From the beginning of the survey, one of the major concerns that people reported has been about the quantity and quality of the Clojure documentation.

The Clojure community (mainly under the guidance of Alex Miller and others from the core team) has made tremendous progress in enhancing the Clojure guides and tutorials, culminating with the open source release of the Clojure documentation website, giving people an easy way to contribute (https://clojure.org/news/2016/01/14/clojure-org-live). The documentation that comes with Clojure itself is terse and to the point. This is good for quickly remembering how something is supposed to work, but it isn’t exhaustive. If you type (doc interleave) at the REPL, for example, you’re welcomed with:

user=> (doc interleave)
-------------------------
clojure.core/interleave
([] [c1] [c1 c2] [c1 c2 & colls])
  Returns a lazy seq of the first item in each coll, then the second etc.
nil

“Returns a lazy seq of the first item in each coll, then the second etc.” is precise, but makes some assumptions. It assumes you understand what a “lazy seq” is and leaves out details like what happens with unevenly-sized collections. You could further explore [interleave] by typing examples at the REPL or, lacking ideas about what exactly to type, search for snippets on the Internet. Some of the background concepts are documented on the Clojure website under the “reference” section (http://clojure.org/reference). That documentation has been there from the beginning, but it follows the same terse style. If you’re a seasoned programmer with functional experience, you’ll be comfortable with that, but that’s not always the case for Clojure beginners. The recently introduced Clojure-Doc website at http://clojure-doc.org is the beginning of that community-contributed effort, more directed at “getting started”.
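The details the docstring leaves out can be checked with a few REPL experiments, sketched here. The var names are made up for this example.

```clojure
;; Unevenly-sized collections: interleave stops at the shortest one.
(def uneven (interleave [1 2 3] [:a :b]))

;; Laziness: infinite inputs are fine as long as only part of the
;; result is consumed.
(def lazy-prefix (take 6 (interleave (range) (repeat :x))))

;; The zero- and one-argument arities listed in the docstring.
(def no-colls (interleave))
(def one-coll (interleave [1 2]))

;; uneven      => (1 :a 2 :b)
;; lazy-prefix => (0 :x 1 :x 2 :x)
;; no-colls    => ()
;; one-coll    => (1 2)
```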

Although http://clojure-doc.org  is now here, multiple efforts were started over the years to fill the gaps left by the original documentation. The following is a summary of the other resources available at the time of writing:

  • http://clojuredocs.org is a community-powered documentation engine. It offers examples and notes on top of the standard library documentation, including cross-links. The quality of the documentation for a function varies from nothing to many examples and comments.
  • https://groups.google.com/forum/#!forum/clojure  is the main Clojure mailing list. Great threads are recorded in there, including topics discussing the overall Clojure vision and design by Rich Hickey and the rest of the core team.
  • http://clojure-log.n01se.net hosts the IRC Clojure channel logs. Like the mailing list, they contain important discussions shaping the design of future Clojure releases.
  • http://stackoverflow.com/search?q=clojure lists Clojure-related questions, which are an amazing source of great information. Almost any conceivable problem, philosophical or practical, has been answered there.
  • Blogs: too many good blogs to enumerate. Google is your entry point for those, but a couple of always useful ones are “Jay Fields’ Thoughts on Clojure” at http://blog.jayfields.com/  and “Aphyr’s Clojure From the Ground Up” series at https://aphyr.com/posts/301-clojure-from-the-ground-up-welcome.

As you can see, documentation exists in many forms and is valuable overall, but it’s fragmented. Jumping between all the different sources is time consuming, and when searching, the right place to look isn’t always obvious.

The well-kept secret of the Clojure Ninja

Learning about the functions in the standard library usually starts at the beginning. It happens when you first approach some tutorial or book, and, for example, the author shows a beautiful one-liner that solves an apparently big problem.

Usually developers don’t pay explicit attention to the functions in the standard library, assuming that their knowledge will increase by studying the features of the language. This approach can work up to a certain point, but it’s unlikely to scale. If you’re serious about learning the language, consider allocating explicit time to understanding the different nuances of similar functions or the content of some obscure namespace. The proof that this is time well spent can be found by reading about other people’s experiences; the web contains many articles describing the process of learning Clojure or documenting discoveries (see the aforementioned Jay Fields’ blog).

The following is a trick that works wonders to become a true Clojure Master. Along with learning tools like tutorials, books or exercises like the Clojure Koans (https://github.com/functional-koans/clojure-koans), consider adding the following to your learning regimen:

  • Select a Clojure function every day. It could be during lunch or commuting time for example.
  • Study the details of the selected function. Look at the official docs first, try out examples at the REPL, and search the web or www.github.com for Clojure projects that use it.
  • Try to find where the function breaks, or other special corner cases. Pass nil or unexpected types as arguments and see what happens.
  • Rinse and repeat the next day.
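As a sketch of the “find where the function breaks” step, the snippet below probes a few core functions with nil and unexpected types. The helper try-call is hypothetical, written only for this example: it returns the class of the exception instead of letting it propagate.

```clojure
(defn try-call
  "Hypothetical helper: calls f with args and returns either the result
  or the class of the exception that was thrown."
  [f & args]
  (try
    (apply f args)
    (catch Exception e (class e))))

;; Many sequence and collection functions treat nil as an empty collection.
(def nil-friendly
  [(first nil) (count nil) (conj nil 1) (get nil :k)])
;; => [nil 0 (1) nil]

;; Others fail, each in its own way.
(def failures
  [(try-call nth [] 0)     ; index out of bounds
   (try-call + 1 nil)      ; nil is not a number
   (try-call inc "1")])    ; a string is not a number
;; => [java.lang.IndexOutOfBoundsException
;;     java.lang.NullPointerException
;;     java.lang.ClassCastException]
```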

Don’t forget to open the sources for the function, particularly if it belongs to the “core” Clojure namespace. By looking at the Clojure sources, you have the unique opportunity to learn from the work of Rich Hickey and the core team. You’ll be surprised to see how much design and thinking goes into a function in the standard library. Only by expanding your knowledge of the content of the standard library will you be able to fully appreciate the power of Clojure.
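One way to open the sources without leaving the REPL is the clojure.repl namespace, sketched below. The var names are made up for this example, and the sketch assumes the Clojure .clj source files are on the classpath, as they are with the standard Clojure jar; otherwise source-fn returns nil.

```clojure
(require '[clojure.repl :as repl])

;; source-fn returns the source of a var as a string
;; (the source macro prints it instead).
(def interleave-source (repl/source-fn 'clojure.core/interleave))

;; The var metadata says where the definition lives.
(def interleave-location
  (select-keys (meta #'clojure.core/interleave) [:file :line]))

;; doc prints the docstring, captured here as a string.
(def interleave-doc (with-out-str (repl/doc interleave)))
```

Calling (println interleave-source) at the REPL then shows the implementation next to the docstring you just read.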

If this article has whetted your appetite for Clojure and its standard library, go download the free first chapter of Clojure Standard Library, see this Slideshare Presentation, and save 37% with code fccborgatti.