## What are monads and why are they used?

### eben.cowley42@gmail.com

Introduction

Introduction

I'd like to preface this article by explaining why I think many people find monads confusing. In the context of programming, a monad is not fundamentally unlike an interface; it's just that the functions that a monad must implement (and perhaps the properties that the functions must satisfy) seem alien at first. Many examples of a certain monad in action can be given, but they all seem unrelated, which can be confusing. It's ultimately the existence of these many examples that makes monads such a critical idea - it turns out that many useful notions of computation (to use Moggi's terminology) can be encoded in a pure functional language as monads.

In other words: I think you think monads are confusing because you are looking for a motivation for the definition, but the usefulness of the definition comes after we start applying it.

A pure functional program is simply a composition of functions. That is, a sequence of functions whose domains and codomains match up so that input can be passed through consecutive functions. A pure imperative program is a sequence of commands for the retrieval and manipulation of data. It's difficult to see how commands like this could be composed in the way pure functions can be. A monad is a special kind of operator on types that allows us to compose imperative actions in the same way that we do functions. This is why monads are important - they allow us to do things like achieve side effects, monitor state, and throw exceptions in a pure functional language.

It's best to know a little bit about Haskell syntax before reading this article. If you know any other functional language you can probably catch on pretty easily. The only Haskell features that pop up are lambdas, pattern matching, and type signatures. If you're totally new to Haskell, I would recommend checking out Real World Haskell or Miran Lipovača's Learn You a Haskell for Great Good!

Monads answer the question of how to get imperative behavior in a pure functional language. What distinguishes functional behavior from imperative behavior? In a pure functional language the only thing you can actually do is compose and apply functions, where functions are always look-up tables that don't change and do nothing but return a value whenever they are passed an argument of the correct type. Just like in all forms of programming, it is common to solve a problem by composing many functions, each of which solves part of the problem and is more understandable to the programmer. Most compilers for imperative languages aren't optimized for this sort of behavior in its purest form, which always uses recursion instead of loops. Imperative programs are sometimes written as a sequence of commands; each one without any input or output. A crude example:

 void main() { getInput(); processInput(); printOutput(); } 

You could of course make processInput() a pure function, bind the input to a variable, pass that variable to your new pure function, and then print out the result. However, the act of getting input and printing output are still side effects. They can't be composed like ordinary functions. This doesn't mean there isn't any way to compose them at all. Let's take a look at the defintion of a monad in Haskell:

 class Monad m where return :: a -> m a (>>=):: m a -> (a -> m b) -> m b (>>):: m a -> m b -> m b 

What the first line of this declaration is saying is that there exists a class called Monad whose instances, m, act as wrappers for an arbitrary type a. Below the declaration is a number of functions which this wrapper must implement. So it's kind of like an interface in object oriented programming.

The first function that our monad, m, must implement is called return, not at all to be confused with the return statement in imperative programming. The type signature for this function says that it goes from the arbitrary type a into the monadic value of that type, m a, i.e. the monad wrapping that type. This is like the generic constructor that classes/structs/objects have in object-oriented programming. That is to say, every monad must have at least one type constructor. We'll get into what exactly is special about return when I talk about the constraints that these functions must satisfy.

The next weird looking function is called the bind operator. Looking at it's type signature, we see that it takes in m a, followed by a function from a to m b, and then outputs something in m b. What could that be useful for? Well, let's return to the example of the imperative program with side effects, there is a monad called IO, which deals with input/output actions, there is a value in IO String called getLine, which secretly holds the value of the string that a user enters at the console. There is also a function putStrLn with type String -> IO () that simply takes a string as input and outputs a monad action that prints it to the console. Recall that () is the type of tuples that wrap no values - i.e. the type with only one element. Then I could write a Haskell program that looks like this:

 main = getLine >>= putStrLn . processInput 

In this case, processInput must be a function that takes any string to a string. The "." represents the composition of processInput and putStrLn. So this program executes the action of getting the input from the console, the bind operator then unwraps it and sends it to a function that processes it and then prints the output to the console. I like to think of the  >>= symbol as squeezing the value out of the monadic value and then injecting it into the function on the right. Regardless of metaphor, what we've done is compose imperative actions even though they don't output values! This may look very alien and non-imperative, but Haskell has something called "do notation" that can be used to make particularly long sequences of monad actions more readable.

Finally, >> gives us a way to compose arbitrarily many monad actions of different types (the monad must remain the same but the type that it wraps can be different for each of the actions being composed), without having to explicitly unwrap any data. This means that >> can literally be read as "and then do" in code. For example:

 main = (getLine >>= putStrLn . processInput) >> main 

Is the exact same program that we constructed before except that it repeats itself every time after printing. I like to think of the >> symbol as an arrow telling the program where to go next.

You can't just make any old parametrized type that implements functions with the above type signatures and call it a monad. You could in Haskell, GHC doesn't check that an instance of Monad obeys these rules; that would be extrmely difficult, so its up to the programmer. If you wrote an instance of monad that didn't satisfy the properties described below then it wouldn't be a monad by the mathematical definition and it probably wouldn't be as useful as an actual monad. For a monad to be a monad it's functions have to satisfy the following "monad laws:"

Left Identity: return a >>= f = f a
Right Identity: m >>= return = m
Associativity: (m >>= f) >>= g = m >>= (\x -> f x >>= g)

These requirements are a lot simpler than they appear. The first one says that if we wrap up a value using return, then we want >>= to be able to unwrap that same value and feed it to the next function. The second condition says something similar it just looks more strange because there aren't any non-monadic values in the definition. It says that if >>= unwraps a value and feeds it to return, then we'll get the same wrapped value back. Hopefully it makes sense that both of these things would be something desirable for a default type constructor, which is basically what return is.

As for the associativity condition, since function composition is a associative, i.e. (f . g) . h = f . (g . h), we desire the same thing to be true when we compose imperative actions. The left side of the equation represents feeding the value inside m into f and then feeding the result into g; the right side of the equation represents first constructing a function (the lambda) that takes x then feeds it into f then into g and then feeding the value inside m into this lambda. The associativity condition ensures that these two ways of feeding the value inside m through the functions are the same.

Examples

The example that I've already brought up is the IO monad. IO stands for Input/Output, so it's role is pretty self-exaplantory - monad actions get compiled to input/output actions.

Another common example of a monad that isn't often referred to as a monad is [a], the type of lists with elements from a. Here's what the monad functions look like for lists:

 return x = [x] xs >>= f = concat (map f xs) xs >> ys = ys 

return simply takes a value and makes a list that contains only that value. The bind operator maps the function onto the list that is passed to it, which makes a list of lists, and then concat flattens that list. The then operator simply returns the second list. This may seem useless, but its not quite useless. In functional programming the great thing about the type of lists is that it is an instance of Functor. This is the typeclass which gives us the map function for lists, which takes a list and a function and applies the function to every element of the list. It turns out that the monad structure of the list type isn't completely essential for applications, but map can be written in terms of >>= and return (how?).

Maybe a is a type that is probably less familiar to object-oriented programmers but it's not at all difficult to understand:

 Maybe a = Just a | Nothing 

This means that Maybe a, for some type a, is either a box called Just with a value from a inside it or it's a box called Nothing with nothing inside it. This wrapper for types is excellent for representing computations that could fail. Here's how it works as a monad:

 return x = Just x Nothing >>= f = Nothing Just x >>= f = f x Nothing >> _ = Nothing Just _ >> m = m 

In practice, the use of this type's monad functionality is extremely useful. Imagine this, you have an important function, foo :: String -> Maybe String, and it's broken into steps:

foo = stepThree . stepTwo . stepOne.

But the failure that causes the output to be Nothing could happen at any step, i.e.:

 stepOne :: String -> Maybe MyType stepTwo :: MyType -> Maybe [Int] stepThree :: [Int] -> Maybe String 

So since the domains and ranges don't match up, foo = stepThree . stepTwo . stepOne would actually cause a type error. What we can do, however, is say

foo = \x -> stepOne x >>= stepTwo >>= stepThree

And huzzah! We have a way of composing computations that could fail without having to manually unwrap the value for each step!

I want to conclude this section by mentioning that if the metaphor of >>= unwrapping the value in its first argument and injecting that value into the function that is its second argument, that's okay. The Control.Monad library provides a few variations of the bind operator:

 (=<<) :: (a -> m b) -> m a -> m b (>=>) (a -> m b) -> (b -> m c) -> a -> m c 

The first operator can be thought of as if its "lifting" a function a -> m b to a function m a -> m b. And the second operator could be thought of as a literal composition of functions, so we can rewrite our important functions like so:

foo = stepOne >=> stepTwo >=> stepThree

Category Theory

Everything that I've covered so far should allow you to effectively use monads as a programmer. But there is more to the definition of a monad. I'm assuming you're familiar with basic categry theoretic terminology from this point on. If you aren't, I'd recommend Bartosz Milewski's online book Category Theory for Programmers, which has lots of C++ examples, or his YouTube series, which makes more use of Haskell.

The first thing we should discuss is monoids. Monoids are usually formulated in terms of sets. Specifically, a monoid is a set $S$ with an operation $(\cdot) : S \times S \rightarrow S$ and $1\in S$ such that $$\forall s\in S \quad 1\cdot s = s \cdot 1 = s$$ $$\forall a,b,c\in S \quad (a\cdot b)\cdot c = a \cdot (b\cdot c)$$ One familiar example of a monoid is the type of lists of any data structure. That is, the identity, $1$, is the empty list and the binary operation is concatenation. Another example of interest is the set of all functions from a given set to itself; the identity is the identity function and the operation is composition of functions.

In category theory however, we don't like talking about sets. We want to construct a monoid based on the morphisms of some category. The first thing we have to do is define a monoidal category. A monoidal category $\mathbb{C}$ is one with a bifunctor, called the tensor product, $\otimes : \mathbb{C}\rightarrow \mathbb{C} \rightarrow \mathbb{C}$ and an "identity object," $I$, such that $\forall A,\, B, \, C\in Ob(\mathbb{C})$ there is an isomorphism $\alpha_{A,B,C} : (A \otimes B)\otimes C \rightarrow A\otimes (B\otimes C)$ (associativity) as well as isomorphisms $\lambda_A : I\otimes A \rightarrow A$ (left identity), and $\, \rho_A : A\otimes I \rightarrow A$ (right identity).

We can generalize the idea of a monoid categorically by defining a monoid in a monoidal category. A monoid in the monoidal category $\mathbb{C}$ is an object $M$ in $\mathbb{C}$ with a morphism $\mu: M\otimes M \rightarrow M$ and another morphism $\eta: I \rightarrow M$. These morphisms must satisfy the following "coherence conditions:" $$\mu \circ (id_M\otimes \mu) \circ \alpha_{M, M, M} = \mu \circ (\mu \otimes id_M)$$ $$\mu \circ (id_M \otimes \eta) = \lambda_M$$ $$\mu \circ (\eta \otimes id_M) = \rho_M$$ Recall that $\alpha$ is an isomorphism that "reassociates" the tensor product, so the first condition is analogous to associativity. The second and third conditions are left identity and right identity, respectively. The meaning of these conditions is made much more clear with diagrams, something I do not have but that Wikipedia does. If one assumes that the underlying category is the category of sets, with $\otimes$ being the Cartesian product and $I$ being a set with one element, then the original algebraic defintion of a monoid is recovered.

But again, sets are boring. What's another example of a monoidal category? The category of endofunctors on some category $\mathbb{C}$. $\otimes$ is identified as the composition of endofunctors (which, like the composition of all morphisms, is associative) and the identity object is $id_{\mathbb{C}}$ - the identity endofunctor. So what would a monoid in this category look like? Well, it would be an endofunctor $T$ with a natural tranformation $\mu:T\circ T \rightarrow T$ and another natural transformation $\eta : id_{\mathbb{C}} \rightarrow T$ that satisfy the conditions for a monoid; that's what a monad is!

For a monad, the coherence conditions are as follows: $$\mu \circ T \eta = \mu \circ \eta T = id_T$$ $$\mu \circ T \mu = \mu \circ \mu T$$ In this case, the left and right identity morphisms, $\lambda_T$ and $\rho_T$, are the identity on $T$. $\, (T\eta)_x=T(\eta_x)$ and $(\eta T)_x = \eta_{T\, x}$. Likewise for $T\mu$ and $\mu T$.

But wait how could any of that possibly relate to all the programming? To see, let's make a Haskell type class:

 class MyMonad M where fun :: (a -> b) -> (M a -> M b) mu:: M (M a) -> M a eta :: a -> M a 
fun is the endofunctor itself. The endofunctor is acting on the category of all Haskell types. A functor acts on both objects (types) and morphisms (functions). When the functor acts on a morphism $f$ it yields a morphism whose domain is the functor acting on the domain of $f$, likewise for the codoamin. This explains the type signature of the first function in our class. The others shouldn't be too difficult to understand as they are equivalent to $\mu:M\circ M \rightarrow M$ and $\eta : id_{\mathbb{C}} \rightarrow M$. The difference is that in the category-theoretic notation we consider the endofunctor as acting on the entire category but in Haskell notation we are considering it as acting on a particular object.

We also demand that these functions satisfy the above conditions for a monad. In Haskell, these can be written like so:

 eta . f = fun f . eta nu . fun (fun f) = fun f . nu 

Now that we have that set up, let's show that it's equivalent to the definition of monad in Haskell. Haskell monads are actually called Kleisli triples in Category theory, so let's call our next class that:

 class KleisliTriple T where ret :: a -> T a >>== :: T a -> (a -> T b) -> T b 

Notice that the then operator,  >> is not defined in a Kleisli triple. That's because it actually has the following default implementation:

 m1 >> m2 = m1 >>= (\_ -> m2) 

The KleisliTriple class must satisfy the "monad laws" discussed earlier in the article. Now all that's left to do is the following:

 instance KleisliTriple t => MyMonad t where fun f m = m >>== (ret . f) eta = ret mu m = m >>== id instance MyMonad t => KleisliTriple t where ret = eta m >>== f = mu (fun f m) 

One can verify that if the conditions for either instance are assumed, the conditions for the other follow. I won't show that here since I think its a useful exercise for the reader. Once that's been done, however, you've proven that Haskell monads are monoids on the category of endofunctors on the underlying category of Haskell types!