In a lazy functional language where variables are immutable and effects are explicit, new powerful tools for understanding what programs mean become available - substitution and equational reasoning.

In Haskell, when we write the equals sign (=) what we mean is that the two sides in this context are the same, and we can substitute one for another keeping the same behaviour of the program.

So that in:

twentyOne =
    let
        seven = 7
    in
        seven + seven + 7

We can substitute seven and 7 with one another and still get the same program.

The program above is identical (in behaviour) to:

twentyOne =
    let
        seven = 7
    in
        7 + 7 + seven

and

twentyOne = 7 + 7 + 7

This is true for bigger and more complicated expressions as well, including ones that use IO. You can always replace one side with the other and get the same behaviour.

This is useful when you want to understand the behaviour of an expression, and is often called 'applying equational reasoning'.

Example: factorial 3

For example, what does factorial 3 means given this definition?

factorial :: Integer -> Integer
factorial =
  \n ->
      if n == 1
          then
              1
          else
              n * factorial (n - 1)

(Note that I'm using a lambda instead of regular function definition because it's easier to substitute visually, but it's the same as factorial n =)

We can use substitution to find out:

(1) We start with factorial 3 and replace factorial with the body of the function:

  ( \n ->
      if n == 1
          then
              1
          else
              n * factorial (n - 1)
  ) 3

(2) Application of lambda function means that we can replace n with 3 in the body of the lambda expression:

if 3 == 1
    then
        1
    else
        3 * factorial (3 - 1)

(3) Next, we can substitute == with it's definition, only that for ints this is a primitive operation (meaning it's not written in haskell but in the haskell runtime), but we know that the implementation means True if the two sides are the same and False if they are not, so we'll do it for them:

if False
    then
        1
    else
        3 * factorial (3 - 1)

(4) Next, again, if is a primitive operation, so we'll just do what we know it expects: the first branch if the test is True, or the second branch if the test is False. So we know that this is the second branch here.

3 * factorial (3 - 1)

(5) Now what? we can again replace *, but it's primitive which means multiply the two sides. Nothing to substitute for the left side, but we can work on the other side:

3 * ( \n ->
      if n == 1
          then
              1
          else
              n * factorial (n - 1)
    ) (3 - 1)

Note: Why did I decide to first substitute factorial and not (3 - 1)? Because in Haskell we first try to evaluate/substitute the function being called before we do that to its arguments.

(6) Now, we can again substitute n with (3 - 1) in the body of the lambda expression:

3 * ( if (3 - 1) == 1
          then
              1
          else
              (3 - 1) * factorial ((3 - 1) - 1)
    )

(7) And then substitute the (3 - 1) in the if with it's implementation (the result here is 2):

3 * ( if 2 == 1
          then
              1
          else
              2 * factorial (2 - 1)
    )

(Hold up: did you notice that all instances of (3 - 1) were substituted to 2? As we discussed, they mean the same thing so you don't have to do that when you are trying to understand the code, but the Haskell runtime is smart and knows in this case that these are the same thing (they are all the n from before), so it calculates it once and applies it to other instances as well to save us time. Thanks Haskell!)

(8) Again, here 2 == 1 means False and then the if False means substitute for the second branch:

3 * ( 2 * factorial (2 - 1) )

(9) Substituting factorial with its implementation again:

3 * ( 2 * ( \n ->
            if n == 1
                then
                    1
                else
                    n * factorial (n - 1)
          ) (2 - 1)
    )

(10) And substitute the n with (2 - 1) in the lambda body:

3 * ( 2 * ( if (2 - 1) == 1
                then
                    1
                else
                    (2 - 1) * factorial ((2 - 1) - 1)
          )
    )

(11) And we substitute (2 - 1) with its implementation:

3 * ( 2 * ( if 1 == 1
                then
                    1
                else
                    1 * factorial (1 - 1)
          )
    )

(12) And finally, 1 == 1 is actually True, so we substitute with the first branch this time:

3 * ( 2 * ( 1 ) )

And voila, we now know what the result is: 3 * 2 * 1 = 6!

What did we learn from substituting this?

  1. We learned that we get the expected result when running factorial 3
  2. We learned that this implementation has to remember the list of numbers it needs to multiply in memory until the end, so it's not very space-efficient.

But what about factorial 0? Does it give the expected result of 1? Try to substitute for yourself and see!

Also, what about the following implementation of factorial, does it has the same space issue as our previous implementation? Evaluate factorial 3 using substitution and find out:

factorial :: Integer -> Integer
factorial = impl 1

impl :: Integer -> Integer -> Integer
impl =
  \acc n ->                 -- acc stands for accumulator
      if n == 0
          then
              acc
          else
              impl (n * acc) (n - 1)

(Is the answer still yes? What can we do about it? (See BangPatterns).)

Conclusion and Further Reading

I hope this was helpful to you in some way, I think substitution and equational reasoning are key tools in a Haskeller's belt, and having a good grasp on the subject will make writing and debugging Haskell code that much easier.

If you'd like to learn more about this and related topics, check out: