I want to talk about a comonad that came up at work the other day. Actually, two of them, as the data structure in question is a comonad in at least two ways, and the issue that came up is related to the difference between those two comonads.
This post is sort of a continuation of the Comonad Tutorial, and we can call this “part 3”. I’m going to assume the reader has a basic familiarity with comonads.
At work, we develop and use a Scala library called Quiver for working with graphs. In this library, a graph is a recursively defined immutable data structure. A graph, with node IDs of type V, node labels N, and edge labels E, is constructed in one of two ways. It can be empty:
```scala
def empty[V,N,E]: Graph[V,N,E]
```
Or it can be of the form c & g, where c is the context of one node of the graph and g is the rest of the graph with that node removed:
```scala
type Adj[V,E] = Vector[(E, V)]

case class Context[V,N,E](
  inEdges: Adj[V,E],
  vertex: V,
  label: N,
  outEdges: Adj[V,E]
)
```
By the same token, we can decompose a graph on a particular node:
```scala
/** Decompose this graph into the context for the node `v`
  * (if it exists) and the rest of the graph. */
def decomp(v: V): Option[GDecomp[V,N,E]]
```
Where a GDecomp is a Context for the node v (if it exists in the graph) together with the rest of the graph:
```scala
case class GDecomp[V,N,E](ctx: Context[V,N,E], rest: Graph[V,N,E])
```
Let’s say we start with a graph g, like this:

I’m using an undirected graph here for simplicity. An undirected graph is one in which the edges don’t have a direction. In Quiver, this is represented as a graph where the “in” edges of each node are the same as its “out” edges.
If we decompose on the node a, we get a view of the graph from the perspective of a. That is, we’ll have a Context letting us look at the label, vertex ID, and edges to and from a, and we’ll also have the remainder of the graph, with the node a “broken off”:

Quiver can arbitrarily choose a node for us, so we can look at the context of some “first” node:
```scala
/** Decompose this graph on an arbitrarily chosen node. */
def decompAny: Option[GDecomp[V,N,E]]
```
We can keep decomposing the remainder recursively, to perform an arbitrary calculation over the entire graph:
```scala
/** Fold over this graph, one decomposition at a time. */
def fold[C](u: C)(f: (Context[V,N,E], C) => C): C
```
The implementation of fold will be something like:
```scala
def fold[C](u: C)(f: (Context[V,N,E], C) => C): C =
  decompAny.map { case GDecomp(c, g) => f(c, g.fold(u)(f)) }
           .getOrElse(u)
```
For instance, if we wanted to count the edges in the graph g, we could do:
```scala
def countEdges[V,N,E](g: Graph[V,N,E]): Int =
  g.fold(0)((c, n) => c.inEdges.size + c.outEdges.size + n)
```
The recursive decomposition will guarantee that our function doesn’t see any given edge more than once. For the graph g above, (g fold b)(f) would look something like this:

Let’s now say that we wanted to find the maximum degree of a graph. That is, find the highest number of edges to or from any node.
A first stab might be:
```scala
def maxDegree[V,N,E](g: Graph[V,N,E]): Int =
  g.fold(0) { (c, z) =>
    (c.inEdges.size + c.outEdges.size) max z
  }
```
But that would get the incorrect result. In our graph g above, the nodes b, d, and f have a degree of 3, but this fold would find the highest degree to be 2. The reason is that once our function gets to look at b, its edge to a has already been removed, and once it sees f, it has no edges left to look at.
This was the issue that came up at work. This behaviour of fold is both correct and useful, but it can be surprising. What we might expect is that instead of receiving successive decompositions, our function sees “all rotations” of the graph through the decomp operator:

That is, we often want to consider each node in the context of the entire graph we started with. In order to express that with fold, we have to decompose the original graph at each step:
```scala
def maxDegree[V,N,E](g: Graph[V,N,E]): Int =
  g.fold(0) { (c, z) =>
    g.decomp(c.vertex)
     .map(d => d.ctx.inEdges.size + d.ctx.outEdges.size)
     .getOrElse(0) max z
  }
```
But what if we could have a combinator that labels each node with its context?
```scala
def contextGraph[V,N,E](g: Graph[V,N,E]): Graph[V, Context[V,N,E], E]
```
Visually, that looks something like this:

If we now fold over contextGraph(g) rather than g, we get to see the whole graph from the perspective of each node in turn. We can then write the maxDegree function like this:
```scala
def maxDegree[V,N,E](g: Graph[V,N,E]): Int =
  contextGraph(g).fold(0) { (c, z) =>
    (c.label.inEdges.size + c.label.outEdges.size) max z
  }
```
This all sounds suspiciously like a comonad! Of course, Graph itself is not a comonad, but GDecomp definitely is. The counit just gets the label of the node that’s been decomped out:
```scala
def gdecompComonad[V,E]: Comonad[λ[α => GDecomp[V,α,E]]] =
  new Comonad[λ[α => GDecomp[V,α,E]]] {
    def counit[A](d: GDecomp[V,A,E]): A =
      d.ctx.label

    def cobind[A,B](d: GDecomp[V,A,E])(f: GDecomp[V,A,E] => B): GDecomp[V,B,E] = ??? // see below
  }
```
The cobind can be implemented in one of two ways. There’s the “successive decompositions” version:
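Quiver’s actual implementation differs in its details, but a sketch of the idea, assuming the GDecomp, decompAny, &, and empty operations from above, might look like this:

```scala
// Label the focused node with `f` of the current decomposition, then
// recursively do the same for every decomposition of the rest.
def cobind[V,A,E,B](d: GDecomp[V,A,E])(f: GDecomp[V,A,E] => B): GDecomp[V,B,E] =
  GDecomp(
    d.ctx.copy(label = f(d)),
    d.rest.decompAny.map { d2 =>
      val GDecomp(c, g) = cobind(d2)(f)
      c & g
    }.getOrElse(empty[V,B,E])
  )
```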
Visually, it looks like this:

It exposes the substructure of the graph by storing it in the labels of the nodes. It’s very much like the familiar NonEmptyList comonad, which replaces each element in the list with the whole sublist from that element on.
So this is the comonad of recursive folds over a graph. Really its action is the same as just fold. It takes a computation on one decomposition of the graph, and extends it to all sub-decompositions.
But there’s another comonad that’s much more useful as a comonad. That’s the comonad that works like contextGraph from before, except instead of copying the context of a node into its label, we copy the whole decomposition: both the context and the remainder of the graph.
That one looks visually more like this:

Its cobind takes a computation focused on one node of the graph (that is, on a GDecomp), repeats that for every other decomposition of the original graph in turn, and stores the results in the respective node labels:
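Again as a sketch rather than Quiver’s exact code (the .gets are safe because every vertex we fold over is, by construction, present in the graph we re-decompose):

```scala
// Re-decompose the *original* graph on every vertex, storing `f` of
// each decomposition in that vertex's label.
def cobind[V,A,E,B](d: GDecomp[V,A,E])(f: GDecomp[V,A,E] => B): GDecomp[V,B,E] = {
  val whole = d.ctx & d.rest
  val relabelled = whole.fold(empty[V,B,E]) { (c, acc) =>
    c.copy(label = f(whole.decomp(c.vertex).get)) & acc
  }
  relabelled.decomp(d.ctx.vertex).get
}
```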
This is useful for algorithms where we want to label every node with some information computed from its neighborhood. For example, some clustering algorithms start by assigning each node its own cluster, then repeatedly joining nodes to the most popular cluster in their immediate neighborhood, until a fixed point is reached.
As a simpler example, we could take the average value for the labels of neighboring nodes, to apply something like a low-pass filter to the whole graph:
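For instance, a sketch with Double-labelled nodes, using the “zipper” cobind above (lowPass is our name for it, not Quiver’s):

```scala
// Replace each node's label with the mean of its own label and the
// labels of its immediate neighbours.
def lowPass[V,E](d: GDecomp[V,Double,E]): GDecomp[V,Double,E] =
  cobind(d) { d2 =>
    val whole = d2.ctx & d2.rest
    val neighbourLabels = for {
      (_, v) <- (d2.ctx.inEdges ++ d2.ctx.outEdges).toList
      l      <- whole.decomp(v).map(_.ctx.label).toList
    } yield l
    (d2.ctx.label :: neighbourLabels).sum / (1 + neighbourLabels.size)
  }
```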
The difference between these two comonad instances is essentially the same as the difference between NonEmptyList and the nonempty list Zipper.
It’s this latter “decomp zipper” comonad that I decided to ultimately include as the Comonad instance for quiver.GDecomp.
I’ve been having fun exploring adjunctions lately and thinking about how we can take a monad apart and compose it the other way to get a comonad, and vice versa. Often I’ll find that a comonad counterpart of a given monad gives an interesting perspective on that monad, and ditto for a monad cousin to a given comonad.
Let’s take an example. There is a category of monoids, Mon, with monoids as objects and monoid homomorphisms as arrows between them. Then there is a functor from Set to Mon that takes any ordinary type A to the free monoid generated by A. This is just the List[A] type together with concatenation as the multiplication and the empty list as the identity element.
This free functor has a right adjoint that takes any monoid M in Mon to its underlying set M. That is, this right adjoint “forgets” that M is a monoid, leaving us with just an ordinary type.
If we compose these two functors, we get a monad. If we start with a type A, get its free monoid (the List[A] monoid), and then go from there to the underlying type of the free monoid, we end up with the type List[A]. The unit of our adjunction is then a function from any given type A to the type List[A]:
```scala
def unit[A](a: A): List[A] = List(a)
```
But then what is the counit? Remember that for any adjunction, we can compose the functors one way to get a monad, and compose them the other way to get a comonad.
In that case we have to start with a monoid M, then “forget”, giving us the plain type M. Then we take the free monoid of that to end up with the List[M] monoid.
But notice that we are now in the monoid category. In that category, List is a comonad. And since we’re in the category of monoids, the counit has to be a monoid homomorphism. It goes from the free monoid List[A] to the monoid A:
```scala
def counit[M:Monoid](xs: List[M]): M =
  xs.foldLeft(Monoid[M].zero)(_ |+| _)
```
If we apply the counit for this comonad to the free monoid, we get the join for our monad:
```scala
def join[A](xs: List[List[A]]): List[A] = counit(xs)
```
And to get the duplicate or extend operation in the comonad, we just turn the crank on the adjunction:
```scala
def duplicate[A](xs: List[A]): List[List[A]] =
  xs map (List(_))
```
The duplicate just puts each element into its own sublist. With regard to extend, this just means that given any catamorphism on List, we can turn that into a homomorphism on free monoids:
```scala
def extend[A,B](xs: List[A])(f: List[A] => B): List[B] =
  duplicate(xs) map f
```
All the interesting parts of List are the parts that make it a monoid, and our comonad here is already in a category full of monoids. Therefore the coKleisli composition in this comonad is kind of uninteresting. All it’s saying is that if we can fold a List[A] to a B, and a List[B] to a C, then we can fold a List[A] to a C, by considering each element as a singleton list.
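To make that concrete, here’s a small example in terms of the duplicate and extend defined above (sum and show are just illustrative names):

```scala
val sum: List[Int] => Int = _.sum
val show: List[Int] => String = _.mkString("+")

// extend folds each element as a singleton list:
// extend(List(1, 2, 3))(sum) == List(1, 2, 3)
// The coKleisli composite folds a List[Int] to a String in one go:
val both: List[Int] => String = xs => show(extend(xs)(sum))
// both(List(1, 2, 3)) == "1+2+3"
```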
Let’s now consider another category, call it End(Set), which is the category of endofunctors in Set.
The arrows in this category are natural transformations:
```scala
trait ~>[F[_], G[_]] {
  def apply[A](f: F[A]): G[A]
}
```
There’s another category, Com, which is the category of comonads on Set. The arrows here are comonad homomorphisms. A comonad homomorphism from F to G is a natural transformation f: F ~> G satisfying the homomorphism law:
```scala
f(w.duplicate) map (f(_)) == f(w).duplicate
```
There is a forgetful functor Forget: Com -> End(Set) that takes a comonad to its underlying endofunctor (forgetting that it’s a comonad). And this functor has a right adjoint Cofree: End(Set) -> Com which generates a cofree comonad on a given endofunctor F. This is the following data type:
```scala
case class Cofree[F[_],A](counit: A, sub: F[Cofree[F,A]])
```
Note that not only is the endofunctor Cofree[F,?] a comonad (in Set) for any functor F, but the higher-order type constructor Cofree is itself a comonad in the endofunctor category. It’s this latter comonad that is induced by the Forget ⊣ Cofree adjunction. That is, we start at an endofunctor F, then go to comonads via Cofree[F,?], then back to endofunctors via Forget.
The unit for this adjunction is then a comonad homomorphism. Remember, this is the unit for a monad in the category Com of comonads:
```scala
def unit[F[_],A](x: F[A])(implicit F: Comonad[F]): Cofree[F,A] =
  Cofree(F.counit(x), F.map(F.duplicate(x))(unit(_)))
```
This will start with a value of type F[A] in the comonad F, and then unfold an F-branching stream from it. Note that the first level of this will have the same structure as x.
If we take unit across to the End(Set) category, we get the duplicate for our comonad:
```scala
def duplicate[F[_]:Functor,A](c: Cofree[F,A]): Cofree[Cofree[F,?],A] =
  unit[Cofree[F,?],A](c) // using the Comonad instance for Cofree[F,?]
```
Note that this is not the duplicate for the Cofree[F,?] comonad. It’s the duplicate for Cofree itself, which is a comonad in an endofunctor category.
Turning the crank on the adjunction, the counit for this comonad now has to be the inverse of our unit. It takes the heads of all the branches of the given F-branching stream:
```scala
def counit[F[_],A](c: Cofree[F,A])(implicit F: Functor[F]): F[A] =
  F.map(c.sub)(_.counit)
```
Sending that over to the comonad category, we get the join for our monad:
```scala
def join[F[_],A](c: Cofree[Cofree[F,?],A])(implicit F: Functor[F]): Cofree[F,A] =
  Cofree(c.counit, F.map(counit(c.sub))(join(_)))
```
In the previous post, we looked at the Reader/Writer monads and comonads, and discussed in general what comonads are and how they relate to monads. This time around, we’re going to look at some more comonads, delve briefly into adjunctions, and try to get some further insight into what it all means.
Since a comonad has to have a counit, it must be “pointed” or nonempty in some sense. That is, given a value of type W[A] for some comonad W, we must be able to get a value of type A out.
The identity comonad is a simple example of this. We can always get a value of type A out of Id[A]. A slightly more interesting example is that of non-empty lists:
```scala
case class NEL[A](head: A, tail: Option[NEL[A]])
```
So a nonempty list is a value of type A together with either another list or None to mark that the list has terminated. Unlike the traditional List data structure, we can always safely get the head.
But what is the comonadic duplicate operation here? That should allow us to go from NEL[A] to NEL[NEL[A]] in such a way that the comonad laws hold. For nonempty lists, an implementation that satisfies those laws turns out to be:
```scala
case class NEL[A](head: A, tail: Option[NEL[A]]) {
  def tails: NEL[NEL[A]] =
    NEL(this, tail map (_.tails))

  def duplicate: NEL[NEL[A]] = tails
}
```
The tails operation returns a list of all the suffixes of the given list. This list of lists is always nonempty, because the first suffix is the list itself. For example, if we have the nonempty list [1,2,3] (to use a more succinct notation), the tails of that will be [[1,2,3], [2,3], [3]].
To get an idea of what this means in the context of a comonadic program, think of this in terms of coKleisli composition, or extend in the comonad:
```scala
def extend[B](f: NEL[A] => B): NEL[B] =
  tails map f
```
When we map over tails, the function f is going to receive each suffix of the list in turn. We apply f to each of those suffixes and collect the results in a (nonempty) list. So [1,2,3].extend(f) will be [f([1,2,3]), f([2,3]), f([3])].
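As a quick illustration, using the NEL defined above (sum here is our own example function):

```scala
def sum(xs: NEL[Int]): Int =
  xs.head + xs.tail.map(sum).getOrElse(0)

val xs = NEL(1, Some(NEL(2, Some(NEL(3, None)))))

// Each suffix is folded to its sum:
// xs.extend(sum) == NEL(6, Some(NEL(5, Some(NEL(3, None)))))
```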
The name extend refers to the fact that it takes a “local” computation (here a computation that operates on a list) and extends that to a “global” computation (here over all suffixes of the list).
Or consider this class of nonempty trees (often called Rose Trees):
```scala
case class Tree[A](tip: A, sub: List[Tree[A]])
```
A tree of this sort has a value of type A at the tip, and a (possibly empty) list of subtrees underneath. One obvious use case is something like a directory structure, where each tip is a directory and the corresponding sub is its subdirectories.
This is also a comonad. The counit is obvious: we just get the tip. And here’s a duplicate for this structure:
```scala
def duplicate: Tree[Tree[A]] =
  Tree(this, sub map (_.duplicate))
```
Now, this obviously gives us a tree of trees, but what is the structure of that tree? It will be a tree of all the subtrees. The tip will be this tree, and the tip of each proper subtree under it will be the entire subtree at the corresponding point in the original tree.
That is, when we say t.duplicate.map(f) (or equivalently t extend f), our f will receive each subtree of t in turn and perform some calculation over that entire subtree. The result of the whole expression t extend f will be a tree mirroring the structure of t, except each node will contain f applied to the corresponding subtree of t.
To carry on with our directory example, we can imagine wanting a detailed space usage summary of a directory structure, with the size of the whole tree at the tip and the size of each subdirectory underneath as tips of the subtrees, and so on. Then d extend size creates the tree of sizes of recursive subdirectories of d.
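Here’s a sketch of that, with sizes stored as Long labels (size and extendTree are our names; extendTree is just duplicate followed by map):

```scala
def extendTree[A,B](t: Tree[A])(f: Tree[A] => B): Tree[B] =
  Tree(f(t), t.sub.map(extendTree(_)(f)))

def size(d: Tree[Long]): Long =
  d.tip + d.sub.map(size).sum

// extendTree(d)(size) annotates every directory with the total size
// of the subtree rooted there.
```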
You may have noticed that the implementations of duplicate for rose trees and tails for nonempty lists were basically identical. The only difference is that one is mapping over a List and the other is mapping over an Option. We can actually abstract that out and get a comonad for any functor F:
```scala
case class Cofree[F[_],A](counit: A, sub: F[Cofree[F,A]])(implicit F: Functor[F]) {
  def duplicate: Cofree[F,Cofree[F,A]] =
    Cofree(this, F.map(sub)(_.duplicate))
}
```
A really common kind of structure is something like the type Cofree[Map[K,?],A] of trees where the counit is some kind of summary and each key of type K in the Map of subtrees corresponds to some drilldown for more detail. This kind of thing appears in portfolio management applications, for example.
Compare this structure with the free monad:
```scala
sealed trait Free[F[_],A]
case class Return[F[_],A](a: A) extends Free[F,A]
case class Suspend[F[_],A](s: F[Free[F,A]]) extends Free[F,A]
```
While the free monad is either an A or a recursive step suspended in an F, the cofree comonad is both an A and a recursive step suspended in an F. They really are duals of each other in the sense that the monad is a coproduct and the comonad is a product.
Given this difference, we can make some statements about what it means:
- Free[F,A] is a type of “leafy tree” that branches according to F, with values of type A at the leaves, while Cofree[F,A] is a type of “node-valued tree” that branches according to F, with values of type A at the nodes.
- If Exp defines the structure of some expression language, then Free[Exp,A] is the type of abstract syntax trees for that language, with free variables of type A, and monadic bind literally binds expressions to those variables. Dually, Cofree[Exp,A] is the type of closed expressions whose subexpressions are annotated with values of type A, and comonadic extend reannotates the tree. For example, if you have a type inferencer infer, then e extend infer will annotate each subexpression of e with its inferred type.

This comparison of Free and Cofree actually says something about monads and comonads in general:
- In a monad M, if f: A => M[B], then xs map f allows us to take the values at the leaves (a: A) of a monadic structure xs and substitute an entire structure (f(a)) for each value. A subsequent join then renormalizes the structure, eliminating the “seams” around our newly added substructures.
- In a comonad W, xs.duplicate denormalizes, or exposes the substructure of xs: W[A] to yield W[W[A]]. Then we can map a function f: W[A] => B over that to get a B for each part of the substructure and redecorate the original structure with those values. (See Uustalu and Vene’s excellent paper The Dual of Substitution is Redecoration for more on this connection.)

If we look at a Kleisli arrow in the Reader[R,?] monad, it looks like A => Reader[R,B], or expanded out: A => R => B. If we uncurry that, we get (A, R) => B, and we can go back to the original by currying again. But notice that a value of type (A, R) => B is a coKleisli arrow in the Coreader comonad! Remember that Coreader[R,A] is really a pair (A, R).
So the answer to the question of how Reader and Coreader are related is that there is a one-to-one correspondence between a Kleisli arrow in the Reader monad and a coKleisli arrow in the Coreader comonad. More precisely, the Kleisli category for Reader[R,?] is isomorphic to the coKleisli category for Coreader[R,?]. This isomorphism is witnessed by currying and uncurrying.
In general, if we have an isomorphism between arrows like this, we have what’s called an adjunction:
```scala
trait Adjunction[F[_], G[_]] {
  def left[A,B](f: F[A] => B): A => G[B]
  def right[A,B](f: A => G[B]): F[A] => B
}
```
In an Adjunction[F,G], we say that F is left adjoint to G, often expressed with the notation F ⊣ G.
We can clearly make an Adjunction for Coreader[R,?] and Reader[R,?] by using curry and uncurry:
```scala
def homSetAdj[R] = new Adjunction[(?, R), R => ?] {
  def left[A,B](f: ((A, R)) => B): A => R => B =
    Function.untupled(f).curried
  def right[A,B](f: A => R => B): ((A, R)) => B =
    Function.uncurried(f).tupled
}
```
The additional tupled and untupled come from the unfortunate fact that I’ve chosen Scala notation here, and Scala differentiates between functions of two arguments and functions of one argument that happens to be a pair.
So a more succinct description of this relationship is that Coreader is left adjoint to Reader.
Generally the left adjoint functor adds structure, or is some kind of “producer”, while the right adjoint functor removes (or “forgets”) structure, or is some kind of “consumer”.
An interesting thing about adjunctions is that if you have an adjoint pair of functors F ⊣ G, then F[G[?]] always forms a comonad, and G[F[?]] always forms a monad, in a completely canonical and amazing way:
```scala
def unit[F[_],G[_],A](adj: Adjunction[F,G])(a: A): G[F[A]] =
  adj.left(identity[F[A]])(a)

def join[F[_],G[_],A](adj: Adjunction[F,G])(g: G[F[G[F[A]]]])(
    implicit G: Functor[G]): G[F[A]] =
  G.map(g)(adj.right(identity[G[F[A]]]))

def counit[F[_],G[_],A](adj: Adjunction[F,G])(f: F[G[A]]): A =
  adj.right(identity[G[A]])(f)

def duplicate[F[_],G[_],A](adj: Adjunction[F,G])(f: F[G[A]])(
    implicit F: Functor[F]): F[G[F[G[A]]]] =
  F.map(f)(adj.left(identity[F[G[A]]]))
```
Note that this says something about monads and comonads. Since the left adjoint F is a producer and the right adjoint G is a consumer, a monad always consumes and then produces, while a comonad always produces and then consumes.
Now, if we compose Reader and Coreader, which monad do we get?
```scala
type M[S,A] = Reader[S, Coreader[S,A]]
// expands to: S => (A, S)
```
That’s the State[S,?] monad!
Now if we compose it the other way, we should get a comonad:
```scala
type W[S,A] = Coreader[S, Reader[S,A]]
// expands to: (S => A, S)
```
What is that? It’s the Store[S,?] comonad:
```scala
case class Store[S,A](peek: S => A, cursor: S) {
  def extract: A = peek(cursor)
  def duplicate: Store[S, Store[S,A]] =
    Store(s => Store(peek, s), cursor)
  def map[B](f: A => B): Store[S,B] =
    Store(peek andThen f, cursor)
  def extend[B](f: Store[S,A] => B): Store[S,B] =
    duplicate map f
  def seek(s: S): Store[S,A] =
    duplicate.peek(s)
}
```
This models a “store” of values of type A indexed by the type S. We have the ability to directly access the A value under a given S using peek, and there is a distinguished cursor or current position. The comonadic extract just reads the value under the cursor, and duplicate gives us a whole store full of stores, such that if we peek at any one of them, we get a Store whose cursor is set to the given s. We’re defining a seek(s) operation that moves the cursor to a given position s by taking advantage of duplicate.
A use case for this kind of structure might be something like image processing or cellular automata, where S might be coordinates into some kind of space (like a two-dimensional image). Then extend takes a local computation at the cursor and extends it to every point in the space. For example, if we have an operation average that peeks at the cursor’s immediate neighbors and averages them, then we can apply a low-pass filter to the whole image with image.extend(average).
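As a sketch, assuming a greyscale image represented as a Store indexed by (x, y) coordinates (the names Image and average here are ours):

```scala
type Image = Store[(Int, Int), Double]

// Average the eight pixels surrounding the cursor.
def average(img: Image): Double = {
  val (x, y) = img.cursor
  val neighbours = for {
    dx <- List(-1, 0, 1)
    dy <- List(-1, 0, 1)
    if dx != 0 || dy != 0
  } yield img.peek((x + dx, y + dy))
  neighbours.sum / neighbours.size
}

// image.extend(average) applies the filter at every pixel.
```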
The type A => Store[S,B] is also one possible representation of a Lens. I might talk about lenses and zippers in a future post.
In writing up part 2 of my Scala Comonad Tutorial, and coming up with my talk for Scala World, I idly pondered this question:
If all monads are given by composing adjoint pairs of functors, what adjoint pair of functors forms the `Reader` monad? And if we compose those functors the other way, which comonad do we get?
Shachaf Ben-Kiki pointed out on IRC that there are at least two ways of doing this. One is via the Kleisli construction and the other is via the Eilenberg-Moore construction. Dr Eugenia Cheng has a fantastic set of videos explaining these constructions. She talks about how for any monad T there is a whole category Adj(T) of adjunctions that give rise to T (with categories as objects and adjoint pairs of functors as the arrows), and the Kleisli category is the initial object in this category while the Eilenberg-Moore category is the terminal object.
So then, searching around for an answer to what exactly the Eilenberg-Moore category for the R => ? monad looks like (I think it’s just values of type R and functions between them), I came across this Mathematics Stack Exchange question, whose answer more or less directly addresses my original question above. The adjunction is a little more difficult to see than the initial/terminal ones, but it’s somewhat interesting, and what follows is an outline of how I convinced myself that it works.
Let’s consider the reader monad R => ?, which allows us to read a context of type R.
The first category involved is Set (or Hask, or Scala). This is just the familiar category where the objects are types (A, B, C, etc.) and the arrows are functions.
The other category is Set/R, which is the slice category of Set over the type R. This is a category whose objects are functions to R. So an object x in this category is given by a type A together with a function of type A => R. An arrow from x: A => R to y: B => R is given by a function f: A => B such that y(f(a)) = x(a) for all a: A.
The left adjoint is R*, a functor from Set to Set/R. This functor sends each type A to the function (p: (R,A)) => p._1, having type (R,A) => R:
```scala
def rStar[R,A]: ((R,A)) => R = _._1
```
The right adjoint is Π_R, a functor from Set/R to Set. This functor sends each object q: A => R in Set/R to the set of functions R => A for which q is an inverse. This is actually a dependent type inhabited by functions p: R => A which satisfy the identity q(p(r)) = r for all r: R.
The monad is not exactly easy to see, but if everything has gone right, we should get the R => ? reader monad by composing Π_R with R*.
We start with a type A. Then we do R*, which gives us the object rStar[A] in the slice category, which you will recall is just _._1 of type (R,A) => R. Then we go back to types via Π_R(rStar[A]), which gives us a dependent type P inhabited by functions p: R => (R,A). Now, this looks a lot like an action in the State monad. But it’s not. These p must satisfy the property that _1 is their inverse, which means that the R they return must be exactly the R they were given. So it’s like a State action that is read only. We can therefore simplify this to the ordinary (non-dependent) type R => A. And now we have our Reader monad.
But what about the other way around? What is the comonad constructed by composing R* with Π_R? Well, since we end up in the slice category, our comonad is actually in that category rather than in Set.
We start with an object q: A => R in the slice category. Then we go to types by doing Π_R(q). This gives us a dependent type P_A which is inhabited by all p: R => A such that q is their inverse. Then we take rStar[Π_R(q)] to go back to the slice category, and we find ourselves at an object f: (R, Π_R(q)) => R, which you’ll recall is implemented as _._1. As an endofunctor in Set/R, λq. rStar[Π_R(q)] takes all q: A => R to p: (R, R => A) => R = _._1 such that p is only defined on R => A arguments whose inverse is q.
That is, the counit for this comonad on elements y: A => R must be a function counit: (R, Π_R(y)) => A such that for _._1: (R, Π_R(y)) => R, the property y compose counit = _._1 holds. Note that this means that the R returned by _._1 and the R returned by y must be the same. Recall that _._1 always returns the first element of its argument, and also recall that the functions in Π_R(y) must have y as their inverse, so they’re only defined at the first element of the argument to _._1. That is, p._2(x) is only defined when x = p._1.
If we try to encode that in Scala (ignoring all the “such that”), we get something like:
```scala
def counit[R,A](p: (R, R => A)): A =
  p._2(p._1)
```
This looks a lot like a counit for the Store comonad! Except what we constructed is not that. Because of the additional requirements imposed by our functors and by the slice category, the second element of p can only take an argument that is exactly the first element of p. So we can simplify that to (R, () => A) or just (R, A). And we now have the familiar Coreader comonad.
In chapter 11 of our book, we talk about monads in Scala. This finally names a pattern that the reader has seen throughout the book and gives it a formal structure. We also give some intuition for what it means for something to be a monad. Once you have this concept, you start recognizing it everywhere in the daily business of programming.
Today I want to talk about comonads, which are the dual of monads. The utility of comonads in everyday life is not quite as immediately obvious as that of monads, but they definitely come in handy sometimes, particularly in applications like image processing and scientific computation.
Let’s remind ourselves of what a monad is. A monad is a functor, which just means it has a map method:
```scala
trait Functor[F[_]] {
  def map[A,B](x: F[A])(f: A => B): F[B]
}
```
This has to satisfy the law that map(x)(a => a) == x, i.e. that mapping the identity function over our functor is a no-op.
A monad is a functor M equipped with two additional polymorphic functions: one from A to M[A] and one from M[M[A]] to M[A].
```scala
trait Monad[M[_]] extends Functor[M] {
  def unit[A](a: A): M[A]
  def join[A](m: M[M[A]]): M[A]
}
```
Recall that join has to satisfy associativity, and unit has to be an identity for join.
In Scala a monad is often stated in terms of flatMap, which is map followed by join. But I find this formulation easier to explain.
Every monad has the above operations, the so-called proper morphisms of a monad, and may also bring to the table some nonproper morphisms, which give the specific monad some additional capabilities.
For example, the Reader monad brings the ability to ask for a value:
```scala
case class Reader[R,A](run: R => A)

def ask[R]: Reader[R,R] = Reader(r => r)
```
The meaning of join in the reader monad is to pass the same context of type R to both the outer scope and the inner scope:
```scala
def join[R,A](r: Reader[R,Reader[R,A]]): Reader[R,A] =
  Reader(c => r.run(c).run(c))
```
The Writer monad has the ability to write a value on the side:
```scala
case class Writer[W,A](value: A, log: W)

def tell[W](w: W): Writer[W,Unit] = Writer((), w)
```
The meaning of join in the writer monad is to concatenate the “log” of written values using the monoid for W (this is using the Monoid class from Scalaz):
```scala
def join[W:Monoid,A](x: Writer[W,Writer[W,A]]): Writer[W,A] =
  Writer(x.value.value, x.log |+| x.value.log)
```
And the meaning of unit is to write the “empty” log:
```scala
def unit[W:Monoid,A](a: A): Writer[W,A] = Writer(a, Monoid[W].zero)
```
The State monad can both get and set the state:
```scala
case class State[S,A](run: S => (A, S))

def get[S]: State[S,S] = State(s => (s, s))
def put[S](s: S): State[S,Unit] = State(_ => ((), s))
```
The meaning of join in the state monad is to give the outer action an opportunity to get and put the state, then do the same for the inner action, making sure any subsequent actions see the changes made by previous ones:
```scala
def join[S,A](v1: State[S,State[S,A]]): State[S,A] =
  State(s1 => {
    val (v2, s2) = v1.run(s1)
    v2.run(s2)
  })
```
The Option monad can terminate without an answer:
```scala
def none[A]: Option[A] = None
```
That’s enough examples of monads. Let’s now turn to comonads.
A comonad is the same thing as a monad, only backwards:
```scala
trait Comonad[W[_]] extends Functor[W] {
  def counit[A](w: W[A]): A
  def duplicate[A](w: W[A]): W[W[A]]
}
```
Note that counit is pronounced “co-unit”, not “cow-knit”. It’s also sometimes called extract because it allows you to get a value of type A out of a W[A]. While with monads you can generally only put values in and not get them out, with comonads you can generally only get them out and not put them in.
And instead of being able to join two levels of a monad into one, we can duplicate one level of a comonad into two.
Kind of weird, right? This also has to obey some laws. We’ll get to those later on, but let’s first look at some actual comonads.
A simple and obvious comonad is the dumb wrapper (the identity comonad):
```scala
case class Id[A](a: A) {
  def map[B](f: A => B): Id[B] = Id(f(a))
  def counit: A = a
  def duplicate: Id[Id[A]] = Id(this)
}
```
This one is also the identity monad. Id doesn’t have any functionality other than the proper morphisms of the (co)monad and is therefore not terribly interesting. We can get the value out with our counit, and we can vacuously duplicate by decorating our existing Id with another layer.
There’s a comonad with the same capabilities as the reader monad, namely that it allows us to ask for a value:
```scala
case class Coreader[R,A](extract: A, ask: R) {
  def map[B](f: A => B): Coreader[R,B] =
    Coreader(f(extract), ask)
  def duplicate: Coreader[R,Coreader[R,A]] =
    Coreader(this, ask)
}
```
It should be obvious how we can give a Comonad instance for this (I’m using the Kind Projector compiler plugin to make the syntax look a little nicer than vanilla Scala):
```scala
def coreaderComonad[R]: Comonad[Coreader[R,?]] =
  new Comonad[Coreader[R,?]] {
    def map[A,B](c: Coreader[R,A])(f: A => B) = c map f
    def counit[A](c: Coreader[R,A]) = c.extract
    def duplicate[A](c: Coreader[R,A]) = c.duplicate
  }
```
Arguably, this is much more straightforward in Scala than the reader monad. In the reader monad, the ask function is the identity function. That’s saying “once the R value is available, return it to me”, making it available to subsequent map and flatMap operations. But in Coreader, we don’t have to pretend to have an R value. It’s just right there and we can look at it.
So Coreader just wraps up some value of type A together with some additional context of type R. Why is it important that this is a comonad? What is the meaning of duplicate here?
To see the meaning of duplicate, notice that it puts the whole Coreader in the value slot (in the extract portion). So any subsequent extract or map operation will be able to observe both the value of type A and the context of type R. We can think of this as passing the context along to those subsequent operations, which is analogous to what the reader monad does.
In fact, just like map followed by join is usually expressed as flatMap, by the same token duplicate followed by map is usually expressed as a single operation, extend:
```scala
def extend[B](f: Coreader[R,A] => B): Coreader[R,B] =
  duplicate map f
```
Notice that the type signature of extend looks like flatMap with the direction of f reversed. And just like we can chain operations in a monad using flatMap, we can chain operations in a comonad using extend. In Coreader, extend is making sure that f can use the context of type R to produce its B.
Chaining operations this way using flatMap in a monad is sometimes called Kleisli composition, and chaining operations using extend in a comonad is called coKleisli composition (or just Kleisli composition in a comonad).
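Spelled out generically against the Comonad trait above (the name coKleisli is ours):

```scala
// Compose two comonadic computations: run `f` over every duplicated
// substructure, then run `g` on the result.
def coKleisli[W[_], A, B, C](W: Comonad[W])(
    f: W[A] => B, g: W[B] => C): W[A] => C =
  wa => g(W.map(W.duplicate(wa))(f))
```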
The name extend refers to the fact that it takes a “local” computation that operates on some structure and “extends” that to a “global” computation that operates on all substructures of the larger structure.
Just like the writer monad, the writer comonad can append to a log or running tally using a monoid. But instead of keeping the log always available to be appended to, it uses the same trick as the reader monad by building up an operation that gets executed once a log becomes available:
```scala
case class Cowriter[W:Monoid,A](tell: W => A) {
  def map[B](f: A => B): Cowriter[W,B] =
    Cowriter(tell andThen f)
  def extract: A = tell(Monoid[W].zero)
  def duplicate: Cowriter[W,Cowriter[W,A]] =
    Cowriter(w1 => Cowriter(w2 => tell(w1 |+| w2)))
  def extend[B](f: Cowriter[W,A] => B): Cowriter[W,B] =
    duplicate map f
}
```
Note that duplicate returns a whole Cowriter from its constructed tell function, so the meaning is that subsequent operations (composed via map or extend) have access to exactly one tell function, which appends to the existing log or tally. For example, foo.extend(_.tell("hi")) will append "hi" to the log of foo.
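For example, with String logs (assuming Scalaz’s String monoid is in scope via import scalaz.std.string._):

```scala
val foo = Cowriter[String, Int](log => log.length)
val bar = foo.extend(_.tell("hi"))

// bar.extract runs the underlying tell on "" |+| "hi" == "hi",
// so bar.extract == 2.
```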
The comonad laws are analogous to the monad laws:
1. wa.duplicate.extract == wa
2. wa.extend(extract) == wa
3. wa.duplicate.duplicate == wa.extend(duplicate)

It can be hard to get an intuition for what these laws mean, but in short they mean that (co)Kleisli composition in a comonad should be associative and that extract (a.k.a. counit) should be an identity for it.
Very informally, both the monad and comonad laws mean that we should be able to compose our programs top-down or bottom-up, or any combination thereof, and have that mean the same thing regardless.
In part 2 we’ll look at some more examples of comonads and follow some of the deeper connections. Like what’s the relationship between the reader monad and the reader comonad, or the writer monad and the writer comonad? They’re not identical, but they seem to do all the same things. Are they equivalent? Isomorphic? Something else?
I’ve found that if I’m using scala.concurrent.Future in my code, I can get some really easy performance gains by just switching to scalaz.concurrent.Task instead, particularly if I’m chaining them with map or flatMap calls, or with for comprehensions.
Every Future is basically some work that needs to be submitted to a thread pool. When you call futureA.flatMap(a => futureB), both Future[A] and Future[B] need to be submitted to the thread pool, even though they are not running concurrently and could theoretically run on the same thread. This context switching takes a bit of time.
With scalaz.concurrent.Task you have a bit more control over when you submit work to a thread pool and when you actually want to continue on the thread that is already executing a Task. When you say taskA.flatMap(a => taskB), the taskB will by default just continue running on the same thread that was already executing taskA. If you explicitly want to dip into the thread pool, you have to say so with Task.fork.
This works since a Task is not a concurrently running computation. It’s a description of a computation—a sequential list of instructions that may include instructions to submit some of the work to thread pools. The work is actually executed by a tight loop in Task’s run method. This loop is called a trampoline, since every step in the Task (that is, every subtask) returns control to this loop.
Jumping on a trampoline is a lot faster than jumping into a thread pool, so whenever we’re composing Futures with map and flatMap, we can just switch to Task and make our code faster.
But sometimes we know that we want to continue on the same thread and we don’t want to spend the time jumping on a trampoline at every step. To demonstrate this, I’ll use the Ackermann function. This is not necessarily a good use case for Future, but it shows the difference well.
```scala
def ackermann(m: Int, n: Int): Int = (m, n) match {
  case (0, _) => n + 1
  case (m, 0) => ackermann(m - 1, 1)
  case (m, n) => ackermann(m - 1, ackermann(m, n - 1))
}
```
This function is supposed to terminate for all positive m and n, but if they are modestly large, this recursive definition overflows the stack. We could use futures to alleviate this, jumping into a thread pool instead of making a stack frame at each step:
```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def ackermannF(m: Int, n: Int): Future[Int] =
  (m, n) match {
    case (0, _) => Future(n + 1)
    case (m, 0) => Future(ackermannF(m - 1, 1)).flatMap(identity)
    case (m, n) =>
      for {
        x <- Future(ackermannF(m, n - 1)).flatMap(identity)
        r <- Future(ackermannF(m - 1, x)).flatMap(identity)
      } yield r
  }
```
Since there’s no actual concurrency going on here, we can make this instantly faster by switching to Task instead, using a trampoline instead of a thread pool:
```scala
import scalaz.concurrent.Task

def ackermannT(m: Int, n: Int): Task[Int] =
  (m, n) match {
    case (0, _) => Task.now(n + 1)
    case (m, 0) => Task.suspend(ackermannT(m - 1, 1))
    case (m, n) =>
      for {
        x <- Task.suspend(ackermannT(m, n - 1))
        r <- Task.suspend(ackermannT(m - 1, x))
      } yield r
  }
```
But even here, we’re making too many jumps back to the trampoline with suspend. We don’t actually need to suspend and return control to the trampoline at each step. We only need to do it enough times to avoid overflowing the stack. Let’s say we know how large our stack can grow:
```scala
val maxStack = 512
```
We can then keep track of how many recursive calls we’ve made, and jump on the trampoline only when we need to:
```scala
def ackermannO(m: Int, n: Int): Task[Int] = {
  def step(m: Int, n: Int, stack: Int): Task[Int] =
    if (stack >= maxStack)
      Task.suspend(ackermannO(m, n))
    else go(m, n, stack + 1)
  def go(m: Int, n: Int, stack: Int): Task[Int] =
    (m, n) match {
      case (0, _) => Task.now(n + 1)
      case (m, 0) => step(m - 1, 1, stack)
      case (m, n) =>
        for {
          x <- step(m, n - 1, stack)
          r <- step(m - 1, x, stack)
        } yield r
    }
  go(m, n, 0)
}
```
I did some comparisons using Caliper and made this pretty graph for you:

The horizontal axis is the number of steps, and the vertical axis is the mean time that number of steps took over a few thousand runs.
This graph shows that Task is slightly faster than Future for submitting to thread pools (blue and yellow lines marked Future and Task, respectively) only for very small tasks: up to around 50 steps, where (on my MacBook) both futures and tasks cross the 30 μs threshold. This difference is probably due to the fact that a Future is a running computation while a Task is partially constructed up front and explicitly run later. So with the Future the threads might just be waiting for more work. The overhead of Task.run seems to catch up with us at around 50 steps.
But honestly the difference between these two lines is not something I would care about in a real application, because if we jump on the trampoline instead of submitting to a thread pool (green line marked Trampoline), things are between one and two orders of magnitude faster.
If we only jump on the trampoline when we really need it (red line marked Optimized), we can gain another order of magnitude. Compared to the original naïve version that always goes to the thread pool, this is now the difference between running your program on a 10 MHz machine and running it on a 1 GHz machine.
If we measure without using any Task/Future at all, the line tracks the Optimized red line pretty closely, then shoots to infinity around 1000 (or however many frames fit in your stack space) because the program crashes at that point.
In summary, if we’re smart about trampolines vs thread pools, Future vs Task, and optimize for our stack size, we can go from milliseconds to microseconds with not very much effort. Or seconds to milliseconds, or weeks to hours, as the case may be.
After giving it a lot of thought, I have come to the conclusion that I won’t be involved in “Functional Programming in Java”. There are many reasons, including that I just don’t think I can spend the time to make this a good book. Looking at all the things I have scheduled for the rest of the year, I can’t find the time to work on it.
More depressingly, the thought of spending a year or more writing another book makes me anxious. I know from experience that making a book (at least a good one) is really hard and takes up a lot of mental energy. Maybe one day there will be a book that I will want to forego a year of evenings and weekends for, but today is not that day.
Originally, the content of FPiJ was going to be based on “Functional Programming in Scala”, but after some discussion with the publisher I think we were all beginning to see that this book deserved its own original content specifically on an FP style in Java.
I really do think such a thing deserves its own original book. Since Java is strictly less suitable for functional programming than Scala is, a book on FP in Java will have to lay a lot of groundwork that we didn’t have to do with FPiS, and it will have to forego a lot of the more advanced topics.
I wish the author of that book, and the publisher, all the best and I hope they do well. I’m sorry to let you all down, but I’m sure this is for the best.
Our book, Functional Programming in Scala, relies heavily on exercises. Hints and answers for those exercises are not actually in the book, but are freely available on GitHub under a permissive MIT license. Likewise, we have written chapter notes that we reference throughout the book and made them available as a community-editable wiki.
Naturally, readers get the most out of this book by downloading the source code from GitHub and doing the exercises as they read. But a number of readers have made the comment that they wish they could have the hints and answers with them when they read the book on the train to and from work, on a long flight, or wherever there is no internet connection or it’s not convenient to use a computer.
It is of course entirely possible to print out the chapter notes, hints, and exercises, and take them with you either as a hardcopy or as a PDF to use on a phone or tablet. Well, I’ve taken the liberty of doing that work for you. I wrote a little script to concatenate all the chapter notes, errata, hints, and answers into Markdown files and then just printed them all to a single document, tweaking a few things here and there. I’m calling this A companion booklet to “Functional Programming in Scala”. It is released under the same MIT license as the content it aggregates. This means you’re free to copy it, distribute or sell it, or basically do whatever you want with it. The Markdown source of the manuscript is available on my GitHub.
I have made an electronic version of this booklet available on Leanpub as a PDF, ePub, and Kindle file on a pay-what-you-want basis (minimum of $0.99). It has full color syntax highlighting throughout and a few little tweaks to make it format nicely. The paper size is standard US Letter, which makes it easy to print on most color printers. If you choose to buy the booklet from Leanpub, they get a small fee, a small portion of the proceeds goes to support Liberty in North Korea, and the rest goes to yours truly. You’ll also get updates when those inevitably happen.
If you don’t care about any of that, you can grab the PDF from here with my compliments.
The booklet is also available from CreateSpace or Amazon as a full color printed paperback. This comes in a nicely bound glossy cover for just a little more than the price of printing (they print it on demand for you). I’ve ordered one and I’m really happy with the quality of this print:



The print version is of course under the same permissive license, so you can make copies of it, make derivative works, or do whatever you want. It’s important to note that with this booklet I’ve not done anything other than design a little cover and then literally print out this freely available content and upload it to Amazon, which anybody could have done (and you still can if you want).
I hope this makes Functional Programming in Scala more useful and more enjoyable for more people.
Like a lot of people, I keep a list of books I want to read. And because there are a great many more books that interest me than I can possibly read in my lifetime, this list has become quite long.
In the olden days of brick-and-mortar bookstores and libraries, I would discover books to read by browsing shelves and picking up what looked interesting at the time. I might even find something that I knew was on my list. “Oh, I’ve been meaning to read that!”
The Internet changes this dynamic dramatically. It makes it much easier for me to discover books that interest me, and also to access any book that I might want to read, instantly, anywhere. At any given time, I have a couple of books that I’m “currently reading”, and when I finish one I can start another immediately. I use Goodreads to manage my to-read list, and it’s easy for me to scroll through the list and pick out my next book.
But again, this list is very long. So I wanted a good way to filter out books I will really never read, and sort it such that the most “important” books in some sense show up first. Then every time I need a new book I could take the first one from the list and make a binary decision: either “I will read this right now”, or “I am never reading this”. In the latter case, if a book interests me enough at a later time, I’m sure it will find its way back onto my list.
The problem then is to find a good metric by which to rank books. Goodreads lets users rank books with a star-rating from 1 to 5, and presents you with an average rating by which you can sort the list. The problem is that a lot of books that interest me have only one rating and it’s 5 stars, giving the book an “average” of 5.0. So if I go with that method I will be perpetually reading obscure books that one other person has read and loved. This is not necessarily a bad thing, but I do want to branch out a bit.
Another possibility is to use the number of ratings to calculate a confidence interval for the average rating. For example, using the Wilson score I could find upper and lower bounds s1 and s2 (higher and lower than the average rating, respectively) that would let me say “I am 95% sure that any random sample of readers of an equal size would give an average rating between s1 and s2.” I could then sort the list by the lower bound s2.
But this method is dissatisfactory for a number of reasons. First, it’s not clear how to fit star ratings to such a measure. If we do the naive thing and count a 1-star rating as 1/5 and a 5 star rating as 5/5, that counts a 1-star rating as a “partial success” in some sense. We could discard 1-stars as 0, and count 2, 3, 4, and 5 stars as 25%, 50%, 75%, and 100%, respectively.
But even if we did make it fit somehow, it turns out that if you take any moderately popular book on Goodreads at random, it will have an average rating somewhere close to 4. I could manufacture a prior based on this knowledge and use that instead of the normal distribution or the Jeffreys prior in the confidence interval, but that would still not be a very good ranking, because reader review metascores are meaningless.
In the article “Reader review metascores are meaningless”, Stephanie Shun suggests using the percentage of 5-star ratings as the relevant metric rather than the average rating. This is a good suggestion, since even a single 5-star rating carries a lot of actionable information, whereas an average rating close to 4.0 carries very little.
I can then use the Wilson score directly, counting a 5-star rating as a successful trial and any other rating as a failed one. I can then just use the normal distribution instead of working with an artisanally curated prior.
Mathematica makes it easy to generate the Wilson score. Here, pos is the number of positive trials (number of 5-star ratings), n is the number of total ratings, and confidence is the desired confidence percentage. I’m taking the lower bound of the confidence interval to get my score.
```mathematica
WilsonScore[pos_, n_, confidence_] :=
  Module[{z, p},
    z = Quantile[NormalDistribution[], 1 - (1 - confidence)/2];
    p = pos/n;
    N[(p + z^2/(2 n) -
        z Sqrt[(p (1 - p) + z^2/(4 n))/n])/(1 + z^2/n)]
  ]
```
Now I just need to get the book data from Goodreads. Fortunately, it has a pretty rich API. I just need a developer key, which anyone can get for free.
For example, to get the ratings for a given book id, we can use their XML API for books and pattern match on the result to get the ratings by score:
```mathematica
Ratings[id_] := Ratings[id] = Module[{xml, dist},
  Pause[1];
  xml = Import["https://www.goodreads.com/book/show/" <>
    ToString[id] <> ".xml?key=" <> key, "XML"];
  dist = First @ Cases[xml,
    XMLElement["rating_dist", _, {d_}] :> d, Infinity];
  Association[
    (#[[1]] -> ToExpression[#[[2]]]) & /@
      (StringSplit[#, ":"] & /@ StringSplit[dist, "|"])]
]
```
Here, key is my Goodreads developer API key, defined elsewhere. I put a Pause[1] in the call since Goodreads throttles API calls so you can’t make more than one call per second to each API endpoint. I’m also memoizing the result, by assigning to Ratings[id] in the global environment.
Ratings will give us an association list with the number of ratings for each score from 1 to 5, together with the total. For example, for the first book in their catalogue, Harry Potter and the Half-Blood Prince, here are the scores:
```mathematica
Ratings[1]
(* <|"5" -> …, "4" -> …, "3" -> …,
     "2" -> …, "1" -> …, "total" -> …|> *)
```
Sweet. Let’s see how Harry Potter #6 would score with our rating:
```mathematica
WilsonScore[Ratings[1]["5"], Ratings[1]["total"], 0.95]
(* 0.61572 *)
```
So Wilson is 95% confident that in any random sample of about 1.2 million Harry Potter readers, at least 61.572% of them would give The Half-Blood Prince a 5-star rating. That turns out to be a pretty high score, so if this book were on my list (which it isn’t), it would feature pretty close to the very top.
But now the score for a relatively obscure title is too low. For example, the lower bound of the 95% confidence interval for a single-rating 5-star book will be 0.206549, which will be towards the bottom of any list. This means I would never get to any of the obscure books on my reading list, since they would be edged out by moderately popular books with an average rating close to 4.0.
See, if I’ve picked a book that I want to read, I’d consider five ratings that are all five stars a much stronger signal than the fact that people who like Harry Potter enough to read 5 previous books loved the 6th one. Currently the 5*5 book will score 57%, a bit weaker than the Potter book’s 62%.
I can fix this by lowering the confidence level. Because honestly, I don’t need a high confidence in the ranking. I’d rather err on the side of picking up a deservedly obscure book than to miss out on a rare gem. Experimenting with this a bit, I find that a confidence around 80% raises the obscure books enough to give me an interesting mix. For example, a 5*5 book gets a 75% rank, while the Harry Potter one stays at 62%.
I’m going to call that the Rúnar rank of a given book. The Rúnar rank is defined as the lower bound of the 1 - 1/q Wilson confidence interval for scoring in the qth q-quantile. In the special case of Goodreads ratings, it’s the 80% confidence for a 5-star rating.
```mathematica
RunarRank[id_] := With[{r = Ratings[id]},
  WilsonScore[r["5"], r["total"], 0.8]]
```
Unfortunately, there’s no way to get the rank of all the books in my reading list in one call to the Goodreads API. And when I asked them about it they basically said “you can’t do that”, so I’m assuming that feature will not be added any time soon. So I’ll have to get the reading list first, then call RunarRank for each book’s id. In Goodreads, books are managed by “shelves”, and the API allows getting the contents of a given shelf, 200 books at a time:
```mathematica
GetShelf[user_, shelf_] := Module[{xml, books},
  Pause[1];
  xml = Import["https://www.goodreads.com/review/list/" <>
    ToString[user] <> ".xml?key=" <> key <>
    "&shelf=" <> shelf <>
    "&sort=avg_rating&order=d&per_page=200", "XML"];
  books = Cases[xml, XMLElement["book", _, _], Infinity];
  Function[b, Association[
    "id" -> First @ Cases[b,
      XMLElement["id", _, {i_}] :> ToExpression[i], Infinity],
    "title" -> First @ Cases[b,
      XMLElement["title", _, {t_}] :> t, Infinity],
    "average_rating" -> First @ Cases[b,
      XMLElement["average_rating", _, {r_}] :> ToExpression[r], Infinity],
    "author" -> First @ Cases[b,
      XMLElement["name", _, {a_}] :> a, Infinity]
  ]] /@ books
]
```
I’m doing a bunch of XML pattern matching here to get the id, title, average_rating, and first author of each book. Then I put that in an association list. I’m getting only the top 200 books on the list by average rating (which currently is about half my list).
With that in hand, I can get the contents of my “to-read” shelf with GetShelf[runar, "to-read"], where runar is my Goodreads user id. And given that, I can call RunarRank on each book on the shelf, then sort the result by that rank:
```mathematica
RankShelf[shelf_] := SortBy[
  Append[#, "rank" -> RunarRank[#["id"]]] & /@ shelf,
  -#["rank"] &]
```
To get the ranked reading list of any user:
```mathematica
RankedReadingList[user_] := RankShelf[GetShelf[user, "to-read"]]
```
And to print it out nicely:
```mathematica
PrintBook[b_] := Grid[{
    {b["id"], b["title"],
     ToString[N[100 b["rank"]]] <> "%"},
    {"", b["author"], b["average_rating"]}
  }, Alignment -> Left]
```
Now I can get, say, the first 10 books on my improved reading list:
```mathematica
PrintBook /@ Take[RankedReadingList[runar], 10]
```
| ID | Title | Author | Avg rating | Rúnar rank |
|---|---|---|---|---|
| 9934419 | Kvæðasafn | Snorri Hjartarson | 5.00 | 75.2743% |
| 17278 | The Feynman Lectures on Physics Vol 1 | Richard P. Feynman | 4.58 | 67.2231% |
| 640909 | The Knowing Animal: A Philosophical Inquiry Into Knowledge and Truth | Raymond Tallis | 5.00 | 64.6221% |
| 640913 | The Hand: A Philosophical Inquiry Into Human Being | Raymond Tallis | 5.00 | 64.6221% |
| 4050770 | Volition As Cognitive Self Regulation | Harry Binswanger | 4.86 | 62.231% |
| 8664353 | Unbroken: A World War II Story of Survival, Resilience, and Redemption | Laura Hillenbrand | 4.45 | 60.9849% |
| 13413455 | Software Foundations | Benjamin C. Pierce | 4.80 | 60.1596% |
| 77523 | Harry Potter and the Sorcerer’s Stone (Harry Potter #1) | J.K. Rowling | 4.39 | 59.1459% |
| 13539024 | Free Market Revolution: How Ayn Rand’s Ideas Can End Big Government | Yaron Brook | 4.48 | 59.1102% |
| 1609224 | The Law | Frédéric Bastiat | 4.40 | 58.767% |
I’m quite happy with that. Some very popular and well-loved books interspersed with obscure ones with exclusively (or almost exclusively) positive reviews. The most satisfying thing is that the rating carries a real meaning. It’s basically the relative likelihood that I will enjoy the book enough to rate it five stars.
I can test this ranking against books I’ve already read. Here’s the top of my “read” shelf, according to their Rúnar Rank:
| ID | Title | Author | Avg rating | Rúnar rank |
|---|---|---|---|---|
| 17930467 | The Fourth Phase of Water | Gerald H. Pollack | 4.85 | 68.0406% |
| 7687279 | Nothing Less Than Victory: Decisive Wars and the Lessons of History | John David Lewis | 4.67 | 64.9297% |
| 43713 | Structure and Interpretation of Computer Programs | Harold Abelson | 4.47 | 62.0211% |
| 7543507 | Capitalism Unbound: The Incontestable Moral Case for Individual Rights | Andrew Bernstein | 4.67 | 57.6085% |
| 13542387 | The DIM Hypothesis: Why the Lights of the West Are Going Out | Leonard Peikoff | 4.37 | 55.3296% |
| 5932 | Twenty Love Poems and a Song of Despair | Pablo Neruda | 4.36 | 54.7205% |
| 18007564 | The Martian | Andy Weir | 4.36 | 53.9136% |
| 24113 | Gödel, Escher, Bach: An Eternal Golden Braid | Douglas R. Hofstadter | 4.29 | 53.5588% |
| 19312 | The Brothers Lionheart | Astrid Lindgren | 4.33 | 53.0952% |
| 13541678 | Functional Programming in Scala | Rúnar Bjarnason | 4.54 | 52.6902% |
That’s perfect. Those are definitely books I thoroughly enjoyed and would heartily recommend. Especially that last one.
I’ve published this function as a Wolfram Cloud API, and you can call it at https://www.wolframcloud.com/app/objects/4f4a7b3c-38a5-4bf3-81b6-7ca8e05ea100. It takes two URL query parameters, key and user, which are your Goodreads API key and the Goodreads user ID whose reading list you want to generate, respectively. Enjoy!
It’s well known that there is a trade-off in language and systems design between expressiveness and analyzability. That is, the more expressive a language or system is, the less we can reason about it, and vice versa. The more capable the system, the less comprehensible it is.
This principle is very widely applicable, and it’s a useful thing to keep in mind when designing languages and libraries. A practical implication of being aware of this principle is that we should always make components exactly as expressive as necessary, but no more. This maximizes the ability of any downstream systems to reason about our components. And dually, for things that we receive or consume, we should require exactly as much analytic power as necessary, and no more. That maximizes the expressive freedom of the upstream components.
I find myself thinking about this principle a lot lately, and seeing it more or less everywhere I look. So I’m seeking a more general statement of it, if such a thing is possible. It seems that more generally than issues of expressivity/analyzability, a restriction at one semantic level translates to freedom and power at another semantic level.
What I want to do here is give a whole bunch of examples. Then we’ll see if we can come up with an integration for them all. This is all written as an exercise in thinking out loud and is not to be taken very seriously.
In formal language theory, context-free grammars are more expressive than regular grammars. The former can describe strictly more sets of strings than the latter. On the other hand, it’s harder to reason about context-free grammars than regular ones. For example, we can decide whether two regular expressions are equal (they describe the same set of strings), but this is undecidable in general for context-free grammars.
If we know that an applicative functor is a monad, we gain some expressive power that we don’t get with just an applicative functor. Namely, a monad is an applicative functor with an additional capability: monadic join (or “bind”, or “flatMap”). That is, context-sensitivity, or the ability to bind variables in monadic expressions.
This power comes at a cost. Whereas we can always compose any two applicatives to form a composite applicative, two monads do not in general compose to form a monad. It may be the case that a given monad composes with any other monad, but we need some additional information about it in order to be able to conclude that it does.
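For example, in Scalaz any two Applicatives compose mechanically, while there is no analogous general composition for Monad (this snippet is a sketch and assumes Scalaz 7):

```scala
import scalaz.Applicative
import scalaz.std.list._
import scalaz.std.option._

// Always works, for any two applicatives:
val composed = Applicative[List].compose[Option]

val r = composed.apply2(List(Some(1), None), List(Some(2)))(_ + _)
// r == List(Some(3), None)

// There is no general Monad counterpart of `compose`; monads need
// extra structure (e.g. a transformer such as OptionT) to stack.
```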
Futures have an algebraic theory, so we can reason about them algebraically. Namely, they form an applicative functor, which means that two futures x and y make a composite future that does x and y in parallel. They also compose sequentially since they form a monad.
Actors on the other hand have no algebraic theory and afford no algebraic reasoning of this sort. They are “fire and forget”, so they could potentially do anything at all. This means that actor systems can do strictly more things in more ways than systems composed of futures, but our ability to reason about such systems is drastically diminished.
When we have an untyped function, it could receive any type of argument and produce any type of output. The implementation is totally unrestricted, so that gives us a great deal of expressive freedom. Such a function can potentially participate in a lot of different expressions that use the function in different ways.
A function of type Bool -> Bool, however, is highly restricted. Its argument can only be one of two things, and the result can only be one of two things as well. So there are 4 different implementations such a function could possibly have. Therefore this restriction gives us a great deal of analyzability.
For example, since the argument is of type Bool and not Any, the implementation mostly writes itself. We need to consider only two possibilities. Bool (a type of size 2) is fundamentally easier to reason about than Any (a type of potentially infinite size). Similarly, any usage of the function is easy to reason about. A caller can be sure not to call it with arguments other than True or False, and enlist the help of a type system to guarantee that expressions involving the function are meaningful.
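Concretely, these are the only four such functions:

```scala
// The four total functions of type Boolean => Boolean:
val same:       Boolean => Boolean = b => b
val complement: Boolean => Boolean = b => !b
val constTrue:  Boolean => Boolean = _ => true
val constFalse: Boolean => Boolean = _ => false
```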
Programming in non-total languages affords us the power of general recursion and “fast and loose reasoning” where we can transition between valid states through potentially invalid ones. The cost is, of course, the halting problem. But more than that, we can no longer be certain that our programs are meaningful, and we lose some algebraic reasoning. For example, consider the following:
```scala
xs.map(_ + n).map(_ - n) == xs
```
This states that adding n to every number in a list and then subtracting n again should be the identity. But what if n actually throws an exception or never halts? In a non-total language, we need some additional information. Namely, we need to know that n is total.
The example above also serves to illustrate the trade-off between purely functional and impure programming. If n could have arbitrary side effects, algebraic reasoning of this sort involving n is totally annihilated. But if we know that n is referentially transparent, algebraic reasoning is preserved. The power of side effects comes at the cost of algebraic reasoning. This price includes loss of compositionality, modularity, parallelizability, and parametricity. Our programs can do strictly more things, but we can conclude strictly fewer things about our programs.
There is a principle in computer security called The Principle of Least Privilege. It says that a user or program should have exactly as much authority as necessary but no more. This constrains the power of the entity, but greatly enhances the power of others to predict and reason about what the entity is going to do, resulting in a number of benefits.
Some might notice an analogy between the Principle of Least Privilege and the idea of a constitutionally limited government. An absolute dictatorship or pure democracy will have absolute power to enact whatever whim strikes the ruler or majority at the moment. But the overall stability, security, and freedom of the people is greatly enhanced by the presence of legal limits on the power of the government. A limited constitutional republic also makes for a better neighbor to other states.
More generally, a ban on the initiation of physical force by one citizen against another, or by the government against citizens, or against other states, makes for a peaceful and prosperous society. The “cost” of such a system is the inability of one person (or even a great number of people) to impose their preferences on others by force.
The framework of two-dimensional Euclidean geometry is simply an empty page on which we can construct lines and curves using tools like a compass and straightedge. When we go from that framework to a Cartesian one, we constrain ourselves to reasoning on a grid of pairs of numbers. This is a tradeoff between expressivity and analyzability. When we move from Euclidean to Cartesian geometry, we lose the ability to assume isotropy of space, intersection of curves, and compatibility between dimensions. But we gain much more powerful things through the restriction: the ability to precisely define geometric objects, to do arithmetic with them, to generalize to higher dimensions, and to reason with higher abstractions like linear algebra and category theory.
Roads constrain the routes we can take when we drive or walk. We give up moving in a straight line to wherever we want to go. But the benefit is huge. Roads let us get to where we’re going much faster and more safely than we would otherwise.
Let’s say you make a decision to have only one kind of outfit that you wear on a daily basis. You just go out and buy multiple identical outfits. Whereas you have lost the ability to express yourself by the things you wear, you have gained a certain ability to reason about your clothing. The system is also fault-tolerant and compositional!
What is this principle? Here are some ways of saying it:
What do you think? Can you think of a way to integrate these examples into a general principle? Do you have other favorite examples of this principle in action? Is this something everyone already knows about and I’m just late to the party?
