Giml's type inference engine uses unification-based constraint solving.

It has 5 important parts:

  1. Topologically order definitions and group those that depend on one another
  2. Elaboration and constraint generation
  3. Constraint solving
  4. Instantiation
  5. Substitution

In summary, we sort and group definitions by their dependencies, we elaborate the program with type variables and types that we know and collect constraints on those variables, we then solve the constraints for each group using unification and create substitutions which are mapping from type variables to types and merge them together, and we then substitute the type variables we generated in the elaboration stage with their mapped types from the substitution in the program.

Instantiation occurs during the elaboration stage and the constraint solving stage, when we want to use a polymorphic type somewhere.

Group and order definitions

Before we start actually doing type inference, we figure out the dependencies between the various definitions. This is because we want to figure out the type of a definition before its use, and because we want definitions that depend on each other (mutual recursion) to be solved together.

So for example if we have this file as input:

main = ffi("console.log", odd 17)

even x =
    if int_equals x 0
        then True
        else if int_equals x 1
            then False
            else odd (sub x 1)

odd x =
    if int_equals x 0
        then False
        else if int_equals x 1
            then True
            else even (sub x 1)

The output of the reordering and grouping algorithm will be:

[["even", "odd"], ["main"]]

This is because even and odd are mutually recursive, so we want to typecheck them together, and because main depends on odd we want to typecheck it after we finished typechecking odd.

The module implementing this algorithm is Language.Giml.Rewrites.PreInfer.GroupDefsByDeps.

Elaboration

Now the real work begins. We start by going over the program's AST and annotate it with the types we know. When we don't know what the type of something is we "invent" a new fresh type variable as a placeholder and mark that type with a constraint according to its usage.

Types

Language.Giml.Types.Types contains a datatype that can be used to represent Giml types. For example:

  • TypeCon "Int" represents the type Int
  • TypeVar "t1" represents a polymorphic type variable
  • TypeApp (TypeCon "Option") (TypeCon "Int") represents the type Option Int
  • TypeRec [("x", TypeCon "Int")] represents a record with the type { x : Int }

Constraints

A Constraint between two types describes their relationship.

The most common constraint is Equality. Which means that the two types are interchangeable and should unify to the same type. For example from the constraint Equality (TypeVar t1) (TypeCon "Int") we learn that t1 is the same as Int.

The other, less common constraint, is InstanceOf. Which means that the first type has an Equality relationship with the instantiation of the second type. an instantiation is a monomorphic instance of a polymorphic type, for example (a -> b) -> List a -> List b is polymorphic, and can be instantiated, among other examples, to (Int -> String) -> List Int -> List String by replacing the polymorphic type variables a and b with monomorphic types such as Int and String.

We will talk about solving these constraints a bit later.

Implementation

We also use State, Reader and Except in our algorithm:

  • The State contains:
    • A monotonically increasing integer for generating type variables
    • A list of constraints we generated through the elaboration phase, which we will return as output at the end
    • An environment of data definitions, some are built-in and others are user defined and added as part of the elaboration stage (maybe I should've put this in the Reader part...)
  • The Reader contains:
    • The currently scoped variables in the environment and their type (which might be polymorphic!)
    • The type of builtin functions and expressions (such as int_equals and pure, which might also be polymorphic)
  • The Except part can throw a few errors such as "unbound variable" error.

Most of the action happens in elaborateExpr and I'm not going to cover every usecase, but lets look at a few examples:

Elaborating literals

Elaborating literals is simple - we already know what the types are - if we see a string literal we annotate the AST node with the type String, if we see an integer literal we annotate it with Int, and we don't need to generate new constraints because we know exactly what the type is.

Elaborating lambda functions

Elaborating lambda expressions is interesting, because we don't know what the type for the arguments is going to be: it might be inferred by the usage like in this case: \n -> add 1 n or not like in this case \x y -> x. What we do here is:

  1. Generate a type variable for each function argument
  2. Elaborate the body of the lambda with the extended environment containing the mapping between the function arguments and their newly generated types
  3. Annotate the lambda AST node with the type of a function from the argument types to the type of the body, which we get from the elaborated node after (2)

And we will trust the constraint solving algorithm to unify all of the types.

Elaborating function application

To elaborate a function application we:

  1. Elaborate the arguments
  2. Elaborate the applied expression
  3. Generate a fresh type variable to represent the return type
  4. Constrain the type of the applied expression as a function from the arguments to the return type
  5. Annotate the function application AST node with the return type (which we just generated)

Remember that the type of the applied expression may just be a generated type variable, so we can't examine its type to see what the return type is. For example if we had this expression \f x -> f x, as we saw before we generated type variables for f and x! So we have to make sure the type inference engine knows what their type should be (by generating a constraint) according to the usage and trust that it'll do its job.

Elaborating variables

For variables, we expect an earlier stage (such as elaborating lambdas, lets or definitions) to have already elaborated their type and added them to the environment, so we look them up. We first try in the scoped environment and if that fails we try the built-ins environment. If that fails as well, we throw an error that the variable is unbound.

If we do find a variable in the scoped environment, there are two scenarios we need to take care of:

  1. The variable was bound in a place that expect it to be monomorphic, such as a lambda function argument
  2. The variable was bound in a place that expect it to be polymorphic, such as a term definition

Monomorphic types can be used as is, but polymorphic types represent an instantiation of a type. When we find that the variable we are elaborating is polymorphic, we will generate a new type variable for it and constrain it as an InstanceOf the type we found in the environment.

Later in the constraints solving stage we will instantiate the polymorphic type we found in the environment and say that the type variable we generate is equals to the instantiation. We delay this because we want the polymorphic type to be solved first before we instantiate it.

If we do not find the variable in the scoped environment, we check the builtins environment. If we find the type there, we instantiate it immediately and annotate the AST node with this type. Here we instantiate the type immediately because it is already solved, so we don't need to wait.

Constraints solving

The heart of the type inference engine. Here we visit each the constraints generated from each group of substitutions separately and generate a substitution (mapping from type variables to types) as output.

the solveConstraints function takes an accumulated substitution and a list of constraints as input and tries to solve the first constraint on the list. Solving a constraint might generate more constraints to solve, and may return a substitution as well. We then apply this substitution (substitute all occurences of the keys with the values) to the rest of the constraints and the accumulated substitution itself, merge the new substitution with the accumulated substitution, and keep going until there are no more constraints left to solve.

(solveConstraint) uses unification to solve a constraint. Here are a few important cases that it handles:

Equality t1 t2 (where t1 == t2)

When the two types are the same (for example Int and Int, or t1 and t1), the unification succeeds without extra work needed, we don't need to create new constraints or create a substitution.

Equality (TypeVar tv) t2

When one of the types is a type variable, we substitute (replace) it with t2 in all places we know of (accumulated substitution and rest of the constraints). So the rest of the constraints that contain it will use t2 instead, and when substituting it with another type it'll be t2.

Equality (TypeApp f1 a1) (TypeApp f2 a2)

We generate two new constraints: Equality f1 f2 and Equality a1 a2, reducing a constraint of complex types a step further until eventually we get something we already know how to handle: a constraint on simple types.

InstanceOf t1 t2

When we run into an InstanceOf constraint, we instantiate the second type t2 and produce an Equality constraint between t1 and the instantiation of t2.

Instantiating is basically "copying" the type signature with freshly generated type variables, and producing the Equality constraint with t1 to make the type inference engine find the right type for the instantiated type variables.

We use InstanceOf instead of Equality when the type is polymorphic, for example when we want to use id that we defined as id x = x. If we were to use an Equality constraint instead, every use of id would constrain it to have the same type. For example, replicate (id 1) (id 'x') would fail to type check because we wouldn't know if id should have the type Int -> Int or Char -> Char. With InstanceOf, each use of id generates a separate type, each with the same shape but with different type variables.

There are a few more cases, but this is the gist of it.

When we fail to unify we throw an error of type mismatch.

Example

Let's see an example for this (very) short program:

one = (\x -> x) 1

After the elaboration stage we generate the following AST and constraints:

          +-- t1 -> t1
          |
      --------- 
one = (\x -> x) 1
 |           |  |
 +-- t2      |  +------ Int
             |
             +---- t1
[ Equality top0 t2
, Equality (t1 -> t1) (Int -> t2)
]

How did we get these constraints?

  1. The lambda expression case generated the type t1 for x, so it's type is t1 -> t1
  2. The literal case annotated 1 as Int
  3. The function application generated the return type t2 and generated the constraint Equality (t1 -> t1) (Int -> t2) which is "the type of the applied expression is equals to a function from the argument to the return type I just generated"
  4. The definition of one generated the type variable top0 for it so it can be used in other definitions polymorphically, but for this function it should be used monomorphically, so it's constrainted as equal to the return type and generated the constraint Equality top0 t2.

How do we unify them?

Let's go over them in order. We begin with an empty substitution:

- Constraints: [Equality top0 t2, Equality (t1 -> t1) (Int -> t2)]
- Substitution: []

1. Equality top0 t2
    -- ==> The first is a type variable, so add `top0 := t2` to the substitution

- Constraints: [Equality (t1 -> t1) (Int -> t2)]
- Substitution: [top0 := t2]

2. Equality (t1 -> t1) (Int -> t2)
    -- ==> replace with two new constraints:
    --  1. Equality (-> t1) (-> Int)
    --  2. Equality t1 t2

- Constraints: [Equality (-> t1) (-> Int), Equality t1 t2]
- Substitution: [top0 := t2]

3. Equality (-> t1) (-> Int)
    -- ==> replace again with two new constraints:
    --  1. Equality (->) (->)
    --  2. Equality t1 Int

- Constraints: [Equality (->) (->), Equality t1 Int, Equality t1 t2]
- Substitution: [top0 := t2]

4. Equality (->) (->)
    -- ==> these two types are identical, so we can continue

- Constraints: [Equality t1 Int, Equality t1 t2]
- Substitution: [top0 := t2]

5. Equality t1 Int
    -- ==> The first is a type variable, so add `t1 := Int` to the substitution

- Constraints: [Equality Int t2]
- Substitution: [top0 := t2, t1 := Int]

6. Equality Int t2
    -- ==> The second is a type variable, so add `t2 := Int` to the substitution

- Constraints: []
- Substitution: [top0 := Int, t1 := Int, t2 := Int]

And we're done! The output is the substitution [top0 := Int, t1 := Int, t2 := Int], we can now apply it to our elaborated program and get the fully type checked program:

          +-- Int -> Int
          |
      --------- 
one = (\x -> x) 1
 |           |  |
 +-- Int     |  +------ Int
             |
             +---- Int

Two more things we talk to talk about before we wrap this up:

Instantiation

We instantiate a type by looking up all of the type variables in the type and generate a fresh new type variable for each one (though all instances of a type variable should be mapped to the same new type variable).

So for example in the type (a -> b) -> List a -> List b, if t1 and t2 are two new type variables we haven't seen before, we can replace a with t1 and b with t2 everywhere in the type and get (t1 -> t2) -> List t1 -> List t2

After that, we let the Equality constraints and the constraint solving algorithm find the right monomorphic type for t1 and t2.

Substitution

When we apply a substitution to a constraint or another substitution, we replace all occurences of a type variable tv with t if t := tv is in the substitution. But, we also need to check that tv does not contain t inside of it! (this is called an occurs check)

if tv does contain t, it means that the we ran into an infinite type and we don't support those here, so we will throw an error.

A simple way to generate such case is this:

x y = x

The constraint generated from this definition is: Equality top0 (t1 -> top0), which will fail the occurs check.

If you try to desugar this definition into a lambda expression, you'll quickly discover that you've been had: anytime you'll go to replace x with \y -> x you'll just add another level of lambda.

Summary

This is the general overview of Giml's type inference engine. If you are interested in learning more about it, check out the source code.

In the next blog posts I will cover the type inference of extensible records and polymorphic variants.

If this helped you, there's something you want me to clarify, you think there's something I missed, there's something you want to know more about, or there's something I got totally wrong, do contact me on Twitter or via Email.