Giml's type inference engine uses unification-based constraint solving.

It has 6 important parts:

The general process is as follows: we sort and group definitions by their dependencies, we elaborate the program with type variables and types that we know and collect constraints on those variables. We then solve the constraints for each group using unification, instatiating polymorphic types we run into, and create substitutions, which are mapping from type variables to types, and merge them together. Then we substitute the type variables we generated in the elaboration stage with their mapped types from the substitution in the program, and then generalize the types of top-level definitions and close over their free variables.

Lets unpack all of that, step by step.

## Group and order definitions

Before we start actually doing type inference, we figure out the dependencies between the various definitions. This is because we want to figure out the type of a definition before its use, and because we want definitions that depend on each other (mutual recursion) to be solved together.

So for example if we have this file as input:

``````main = ffi("console.log", odd 17)

even x =
if int_equals x 0
then True
else if int_equals x 1
then False
else odd (sub x 1)

odd x =
if int_equals x 0
then False
else if int_equals x 1
then True
else even (sub x 1)``````

The output of the reordering and grouping algorithm will be:

``[["even", "odd"], ["main"]]``

This is because `even` and `odd` are mutually recursive, so we want to typecheck them together, and because main depends on `odd` we want to typecheck it after we finished typechecking `odd`.

The module implementing this algorithm is Language.Giml.Rewrites.PreInfer.GroupDefsByDeps.

## Elaboration

Now the real work begins. We start by going over the program's AST and annotate it with the types we know. When we don't know what the type of something is we "invent" a new fresh type variable as a placeholder and mark that type with a constraint according to its usage.

### Types

Language.Giml.Types.Types contains a datatype that can be used to represent Giml types. For example:

• `TypeCon "Int"` represents the type `Int`
• `TypeVar "t1"` represents a polymorphic type variable
• `TypeApp (TypeCon "Option") (TypeCon "Int")` represents the type `Option Int`
• `TypeRec [("x", TypeCon "Int")]` represents a record with the type `{ x : Int }`

### Constraints

A Constraint between two types describes their relationship.

Currently we only have one type of constraint: `Equality`. Which means that the two types are interchangeable and should unify to the same type. For example from the constraint `Equality (TypeVar t1) (TypeCon "Int")` we learn that `t1` is the same as `Int`.

One instance of an `Equality` constraints stands out from the rest - when the second argument is a `TypeScheme`. In that case we mean to say that the first type of the `Equality` has an `Equality` relationship with the instantiation of the second type.

An instantiation is a monomorphic instance of a polymorphic type, for example `forall a b. (a -> b) -> List a -> List b` is polymorphic, and can be instantiated, among other examples, to `(Int -> String) -> List Int -> List String` by replacing the polymorphic type variables `a` and `b` which are declared in the `forall` clause with monomorphic types such as `Int` and `String`.

We will talk about solving these constraints a bit later.

### Implementation

We also use `State`, `Reader` and `Except` in our algorithm:

• The State contains:
• A monotonically increasing integer for generating type variables
• A list of constraints we generated through the elaboration phase, which we will return as output at the end
• The currently scoped variables in the environment and their type
• The type of builtin functions and expressions (such as `int_equals` and `pure`)
• An environment of data definitions, some are built-in and others are user defined and added as part of the elaboration stage
• The Except part can throw a few errors such as "unbound variable" error.

Most of the action happens in elaborateExpr. I'm not going to cover every usecase, but lets look at a few examples:

#### Elaborating literals

Elaborating literals is simple - we already know what the types are - if we see a string literal we annotate the AST node with the type `String`, if we see an integer literal we annotate it with `Int`, and we don't need to generate new constraints because we know exactly what the type is.

#### Elaborating lambda functions

Elaborating lambda expressions is interesting, because we don't know what type each argument is going to be: it might be inferred by the usage like in this case: `\n -> add 1 n` or not like in this case `\x y -> x`. What we do here is:

1. Generate a type variable for each function argument
2. Elaborate the body of the lambda with the extended environment containing the mapping between the function arguments and their newly generated types
3. Annotate the lambda AST node with the type of a function from the argument types to the type of the body, which we get from the elaborated node after (2)

And we will trust the constraint solving algorithm to unify all of the types.

#### Elaborating function application

To elaborate a function application we:

1. Elaborate the arguments
2. Elaborate the applied expression
3. Generate a fresh type variable to represent the return type
4. Constrain the type of the applied expression as a function from the arguments to the return type
5. Annotate the function application AST node with the return type (which we just generated)

Remember that the type of the applied expression may just be a generated type variable, so we can't examine its type to see what the return type is. For example if we had this expression `\f x -> f x`, as we saw before we generated type variables for `f` and `x`! So we have to make sure the type inference engine knows what their type should be (by generating a constraint) according to the usage and trust that it'll do its job.

#### Elaborating variables

For variables, we expect an earlier stage (such as elaborating lambdas, lets or definitions) to have already elaborated their type and added them to the environment, so we look them up. We first try to look them up in the scoped environment and if that fails we try the built-ins environment. If that fails as well, we throw an error that the variable is unbound.

If we do find a variable in the scoped environment, we generate a fresh variable and add constrain the type of the fresh variable to be equals to the type we found in the env.

Generating a fresh variable is important, because if the type we found in the environment is polymorphic (i.e. it is a `TypeScheme`), the constraint solving phase will instantiate the polymorphic type. We'll talk more about what this means a bit later.

## Constraint solving

The heart of the type inference engine. Here we visit each the constraints generated from each group of substitutions separately and generate a substitution (mapping from type variables to types) as output.

the solveConstraints function takes an accumulated substitution and a list of constraints as input and tries to solve the first constraint on the list. Solving a constraint might generate more constraints to solve, and may return a substitution as well. We then apply this substitution (substitute all occurences of the keys with the values) to the rest of the constraints and the accumulated substitution itself, merge the new substitution with the accumulated substitution, and keep going until there are no more constraints left to solve.

(solveConstraint) uses a process called unification to solve a constraint. The unification process tries to match two types and produce additional clues to what concrete types they represent or fail when the types do not match.

Here are a few important cases that it handles:

### `Equality t1 t2 (where t1 == t2)`

When the two types are the same (for example `Int` and `Int`, or `t1` and `t1`), the unification succeeds without extra work needed, we don't need to create new constraints or create a substitution.

### `Equality (TypeVar tv) t2`

When one of the types is a type variable, we substitute (replace) it with `t2` in all places we know of (accumulated substitution and rest of the constraints). So the rest of the constraints that contain it will refer to `t2` instead, and when substituting it with another type it'll be `t2`.

### `Equality (TypeApp f1 a1) (TypeApp f2 a2)`

We generate two new constraints: `Equality f1 f2` and `Equality a1 a2`, reducing a constraint of complex types a step further until eventually we get something we already know how to handle: a constraint on simple types.

### `Equality t1 (TypeScheme vars t2)`

When the second argument of `Equality` is a `TypeScheme`, we instantiate the second type `t2` and produce an `Equality` constraint between `t1` and the instantiation of `t2`.

Instantiating is basically "copying" the type signature with freshly generated type variables, and producing the `Equality` constraint with `t1` to make the type inference engine find the right type for each of the instantiated type variables.

There are a few more cases, but this is the gist of it.

When we fail to unify we throw an error of type mismatch.

## Instantiation

When we run into a type scheme (a type with a `forall`) during the constraint solving stage, we instantiate the type by looking up all of the type variables in the type and generate a fresh new type variable for each one (though all instances of a type variable should be mapped to the same new type variable).

So for example in the type `forall a b. (a -> b) -> List a -> List b`, if `t1` and `t2` are two new type variables we haven't seen before, we can replace `a` with `t1` and `b` with `t2` everywhere in the type and get `(t1 -> t2) -> List t1 -> List t2`

After that, we let the `Equality` constraints and the constraint solving algorithm find the right monomorphic types for `t1` and `t2`.

## Substitution

As part of the constraint solving process, we generate substitutions when we unify type variables. We need to use the generated substitution immediately by applying it over our accumulated substitution and remaining constraints.

When we apply a substitution to a constraint or another substitution, we replace all occurences of a type variable `tv` with `t` if `t := tv` is in the substitution. But, we also need to check that `tv` does not contain `t` inside of it! (this is called an occurs check)

if `tv` does contain `t`, it means that the we ran into an infinite type and we don't support those here, so we will throw an error.

A simple way to generate such case is this:

``x y = x``

The constraint generated from this definition is: `Equality top0 (t1 -> top0)`, which will fail the occurs check.

If you try to desugar this definition into a lambda expression, you'll quickly discover that you've been had: anytime you'll go to replace `x` with `\y -> x` you'll just add another level of lambda.

## Generalization

Once we applied the substitution to the elaborated types for a group of definitions, we have figured out the types of all definitions in a group. What's left is to declare the polymorphic definitions as polymorphic in a way that when we reference these definitions in subsequent groups we'll know that we need to instantiate them.

Basically, we need to convert the types of top-level definitions to `TypeScheme`s (or `forall`s) where the type variables are the leftover type variables. For example, if we have the type `t1 -> Int -> t2 -> t1`, we find all of the free variables (`t1` and `t2`), replace them in the type signature with nicer types (`t1` => `a`, `t2` => `b`), and write them down at the `forall` clause of the type (`forall a b. a -> Int -> b -> a`).

If there are no free variables in the top-level type signature we will generate an empty `forall` list, so we just omit the `forall` clause for convenience.

## Example time!

Let's see an example for this (very) short program:

``````one = id 1

id = \x -> x``````

### Topological ordering

We start with topological ordering. We can see that `one` depends on `id`, but not the other way around, so we separate them to different groups and `id` comes first.

``[["id"], ["one"]]``

### First group

The first group is:

``id = \x -> x``

#### Elaboration

After the elaboration stage we generate the following elaborated AST:

``````      +---t1
|
|    +---- t2
|    |
---  ---
id = \x -> x
---  -------
|      |
|      +----- t1 -> t2
|
|
+--- top0``````

And constraints:

``````[ Equality t2 t1
, Equality top0 (t1 -> t2)
]``````
##### How did we get these constraints?
1. We look at the definition of `id` and generate the type variable `top0` for it. We continue to infer the expression to the right with the environment where `id := top0`.
2. The lambda expression case generates the type `t1` for the argument `x`.
3. The lambda expression infers the type of the lambda body, in the environment where `id := top0` and `x := t1`. We look for `x` in the environment and then generate a new type variable `t2` for it as well, and generate an equality constraint: `Equality t2 t1`.
4. The lambda expression infers its type as a function from the input type `t1` to the output type `t2`: `t1 -> t2`.
5. After infering the type of the body of `id`, we constraint the type variable generated for the name `id` with the inferred type of the expression: `Equality top0 (t1 -> t2)`.

#### Constraint solving

We go over the constraints in order and unify them, generating a substitution when needed and applying it to both the remaining constraints and the accumulating substitution.

``````- Constraints:
[ Equality t2 t1
, Equality top0 (t1 -> t2)
]
- Accumulated Substitution:
[
]

1. Equality t2 t1
=> t2 := t1
-- The first type is a type variable, so we add the
-- above to the accumulated substitution and apply it

- Constraints:
[ Equality top0 (t1 -> t1)
]
- Accumulated Substitution:
[ t2 := t1
]

2. Equality top0 (t1 -> t1)
=> top0 := t1 -> t1
-- Same as before

- Constraints:
[
]
- Accumulated Substitution:
[ t2 := t1
, top0 := t1 -> t1
]``````

When we run out of constraints, we are done. Lets apply the accumulated substitution to our elaborated AST:

``````      +---t1
|
|    +---- t1
|    |
---  ---
id = \x -> x
---  -------
|      |
|      +----- t1 -> t1
|
|
+--- t1 -> t1``````

#### Generalization

We take a look of the type of `id`, find all of the type variables (`t1`), generate a type variable for each one (`a`) and substitute the original type variable with the new one, then we wrap the resulting type with a `forall` and list the new type variables:

``id :: forall a. a -> a``

### Second group

``one = id 1``

#### Elaboration

Note: we use the types of the top-level definitions of previous groups when elaborating a group! In our case our environment only contains: `id :: forall a. a -> a`

We get this elaborated AST:

`````` +--- top0
|
|      +--- t2
|      |
---   ----
one = id 1
-- -
|  |
|  +--- Int
|
+------ t1``````

And constraints:

``````[ Equality t1 (forall a. a -> a)
, Equality t1 (Int -> t2)
, Equality top0 t2
]``````
##### How did we get these constraints?
1. For the definition `one` we generate the type variable `top0`.
2. We look at the function application in three steps, the first: looking at the argument, in this case `1` which is a int literal. We annotate it as such.
3. We then lookup `id` in our environment, generate the type variable `t1`, and produce a constraint between the two: `Equality t1 (forall a. a -> a)`
4. We generate a type variable `t2` to represent the return value, and express our expectation that `id`'s is equals to a function from the type of the argument `Int` to the return type `t2`: `Equality t1 (Int -> t2)`
5. Finally, we constrain the type of `one` with the type of the expression: `Equality top0 t2`

#### Constraint solving

Let's go over them in order:

``````- Constraints:
[ Equality t1 (forall a. a -> a)
, Equality t1 (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution:
[
]

1. Equality t1 (forall a. a -> a)
=> [ Equality t1 (ti_1 -> ti_1) ]
-- A forall! Instantiate and generate a new constraint.

- Constraints:
[ Equality t1 (ti_1 -> ti_1)
, Equality t1 (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution: unchanged

2. Equality t1 (ti_1 -> ti_1)
=> t1 := ti_1 -> ti_1

- Constraints:
[ Equality (ti_1 -> ti_1) (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution:
[ t1 := ti_1 -> ti_1
]

3. Equality (ti_1 -> ti_1) (Int -> t2)
=> [ Equality (->) (->), Equality ti_1 Int, Equality ti_1 t2]

- Constraints:
[ Equality (->) (->)
, Equality ti_1 Int
, Equality ti_1 t2
[ Equality top0 t2
]
- Accumulated Substitution: unchanged

4. Equality (->) (->)
=> []
-- Two type constructors are equal - safe to skip

- Constraints:
[ Equality ti_1 Int
, Equality ti_1 t2
[ Equality top0 t2
]
- Accumulated Substitution: unchanged

5. Equality ti_1 Int
=> ti_1 := Int

- Constraints:
[ Equality Int t2
[ Equality top0 t2
]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
]

5. Equality Int t2
=> t2 := Int

- Constraints:
[ Equality top0 Int
]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
, t2 := Int
]

5. Equality top0 Int
=> top0 := Int

- Constraints:
[]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
, t2 := Int
, top0 := Int
]``````

#### Substitution

Now we apply the substitution elaborated program AST:

`````` +--- Int
|
|      +--- Int
|      |
---   ----
one = id 1
-- -
|  |
|  +--- Int
|
+------ Int -> Int``````

#### Generalization

Nothing to do in this case! We get:

``one :: Int``

### Result

This concludes our type inference process, and this is the result:

``````      +---a
|
|    +---- a
|    |
---  ---
id = \x -> x
---  -------
|      |
|      +----- a -> a
|
|
+--- forall a. a -> a

+--- Int
|
|      +--- Int
|      |
---   ----
one = id 1
-- -
|  |
|  +--- Int
|
+------ Int -> Int``````

# Summary

This is the general overview of Giml's type inference engine. If you are interested in learning more about it, check out the source code and documentation under the Language.Giml.Types hierarchy.

In the next blog posts I will cover the type inference of extensible records and polymorphic variants.

If this helped you, there's something you want me to clarify, you think there's something I missed, there's something you want to know more about, or there's something I got totally wrong, do contact me on mastodon or via Email.