Giml's type inference engine uses unificationbased constraint solving.
It has 5 important parts:
 Topologically order definitions and group those that depend on one another
 Elaboration and constraint generation
 Constraint solving
 Instantiation
 Substitution
In summary, we sort and group definitions by their dependencies, we elaborate the program with type variables and types that we know and collect constraints on those variables, we then solve the constraints for each group using unification and create substitutions which are mapping from type variables to types and merge them together, and we then substitute the type variables we generated in the elaboration stage with their mapped types from the substitution in the program.
Instantiation occurs during the elaboration stage and the constraint solving stage, when we want to use a polymorphic type somewhere.
Group and order definitions
Before we start actually doing type inference, we figure out the dependencies between the various definitions. This is because we want to figure out the type of a definition before it's use, and because we want definitions that depend on each other (mutual recursion) to be solved together.
So for example if we have this file as input:
main = ffi("console.log", odd 17)
even x =
if int_equals x 0
then True
else if int_equals x 1
then False
else odd (sub x 1)
odd x =
if int_equals x 0
then False
else if int_equals x 1
then True
else even (sub x 1)
The output of the reordering and grouping algorithm will be:
[["even", "odd"], ["main"]]
This is because even
and odd
are mutually recursive, so we want to typecheck them together,
and because main depends on odd
we want to typecheck it after we finished typechecking odd
.
The module implementing this algorithm is Language.Giml.Rewrites.PreInfer.GroupDefsByDeps.
Elaboration
Now the real work begins. We start by going over the program's AST and annotate it with the types we know. When we don't know what the type of something is we "invent" a new fresh type variable as a placeholder and mark that type with a constraint according to its usage.
Types
Language.Giml.Types.Types contains a datatype that can be used to represent Giml types. For example:
TypeCon "Int"
represents the typeInt
TypeVar "t1"
represents a polymorphic type variableTypeApp (TypeCon "Option") (TypeCon "Int")
represents the typeOption Int
TypeRec [("x", TypeCon "Int")]
represents a record with the type{ x : Int }
Constraints
A Constraint between two types describes their relationship.
The most common constraint is Equality
. Which means that the two types are
interchangeable and should unify to the same type. For example from the constraint Equality (TypeVar t1) (TypeCon "Int")
we learn that t1
is the same as Int
.
The other, less common constraint, is InstanceOf
. Which means that the first type has
an Equality
relationship with the instantiation of the second type.
an instantiation is a monomorphic instance of a polymorphic type,
for example (a > b) > List a > List b
is polymorphic, and can be instantiated, among other examples, to (Int > String) > List Int > List String
by replacing the polymorphic type variables a
and b
with monomorphic types such as Int
and String
.
We will talk about solving these constraints a bit later.
Implementation
We also use State
, Reader
and Except
in our algorithm:
 The State contains:
 A monotonically increasing integer for generating type variables
 A list of constraints we generated through the elaboration phase, which we will return as output at the end
 An environment of data definitions, some are builtin and others are user defined and added as part of the elaboration stage (maybe I should've put this in the Reader part...)
 The Reader contains:
 The currently scoped variables in the environment and their type (which might be polymorphic!)
 The type of builtin functions and expressions (such as
int_equals
andpure
, which might also be polymorphic)
 The Except part can throw a few errors such as "unbound variable" error.
Most of the action happens in elaborateExpr and I'm not going to cover every usecase, but lets look at a few examples:
Elaborating literals
Elaborating literals is simple  we already know what the types are  if we see a string literal
we annotate the AST node with the type String
, if we see an integer literal we annotate it
with Int
, and we don't need to generate new constraints because we know exactly what the type is.
Elaborating lambda functions
Elaborating lambda expressions is interesting, because we don't know what the type for the arguments
is going to be: it might be inferred by the usage like in this case: \n > add 1 n
or not
like in this case \x y > x
. What we do here is:
 Generate a type variable for each function argument
 Elaborate the body of the lambda with the extended environment containing the mapping between the function arguments and their newly generated types
 Annotate the lambda AST node with the type of a function from the argument types to the type of the body, which we get from the elaborated node after (2)
And we will trust the constraint solving algorithm to unify all of the types.
Elaborating function application
To elaborate a function application we:
 Elaborate the arguments
 Elaborate the applied expression
 Generate a fresh type variable to represent the return type
 Constrain the type of the applied expression as a function from the arguments to the return type
 Annotate the function application AST node with the return type (which we just generated)
Remember that the type of the applied expression may just be a generated type variable, so we
can't examine its type to see what the return type is. For example if we had this expression
\f x > f x
, as we saw before we generated type variables for f
and x
!
So we have to make sure the type inference engine knows what their type should be
(by generating a constraint) according to the usage and trust that it'll do its job.
Elaborating variables
For variables, we expect an earlier stage (such as elaborating lambdas, lets or definitions) to have already elaborated their type and added them to the environment, so we look them up. We first try in the scoped environment and if that fails we try the builtins environment. If that fails as well, we throw an error that the variable is unbound.
If we do find a variable in the scoped environment, there are two scenarios we need to take care of:
 The variable was bound in a place that expect it to be monomorphic, such as a lambda function argument
 The variable was bound in a place that expect it to be polymorphic, such as a term definition
Monomorphic types can be used as is, but polymorphic types represent an instantiation of a type.
When we find that the variable we are elaborating is polymorphic, we will generate a new type
variable for it and constrain it as an InstanceOf
the type we found in the environment.
Later in the constraints solving stage we will instantiate the polymorphic type we found in the environment and say that the type variable we generate is equals to the instantiation. We delay this because we want the polymorphic type to be solved first before we instantiate it.
If we do not find the variable in the scoped environment, we check the builtins environment. If we find the type there, we instantiate it immediately and annotate the AST node with this type. Here we instantiate the type immediately because it is already solved, so we don't need to wait.
Constraints solving
The heart of the type inference engine. Here we visit each the constraints generated from each group of substitutions separately and generate a substitution (mapping from type variables to types) as output.
the solveConstraints function takes an accumulated substitution and a list of constraints as input and tries to solve the first constraint on the list. Solving a constraint might generate more constraints to solve, and may return a substitution as well. We then apply this substitution (substitute all occurences of the keys with the values) to the rest of the constraints and the accumulated substitution itself, merge the new substitution with the accumulated substitution, and keep going until there are no more constraints left to solve.
(solveConstraint) uses unification to solve a constraint. Here are a few important cases that it handles:
Equality t1 t2 (where t1 == t2)
When the two types are the same (for example Int
and Int
, or t1
and t1
),
the unification succeeds without extra work needed,
we don't need to create new constraints or create a substitution.
Equality (TypeVar tv) t2
When one of the types is a type variable, we substitute (replace) it with t2
in all places we know of
(accumulated substitution and rest of the constraints). So the rest of the constraints that
contain it will use t2
instead, and when substituting it with another type it'll be t2
.
Equality (TypeApp f1 a1) (TypeApp f2 a2)
We generate two new constraints: Equality f1 a2
and Equality f2 a2
, reducing a constraint
of complex types into something we already know how to handle: a constraint on simple types.
InstanceOf t1 t2
When we run into an InstanceOf
constraint, we instantiate the second type t2
and produce an Equality
constraint between t1
and the instantiation of t2
.
Instantiating is basically "copying" the type signature, and producing the Equality
constraint with t1
makes the type inference engine find the right type for the instantiated
type variables.
We use InstanceOf
instead of Equality
when the type is polymorphic,
for example when we want to use id
that we defined as id x = x
.
If we were to use an Equality
constraint instead, every use of id
would constrain
it to have a particular type (so id 1
would make the type of id Int > Int
instead of only where we use it).
There are a few more cases, but this is the gist of it.
When we fail to unify we throw an error of type mismatch.
Example
Let's see an example for this (very) short program:
one = (\x > x) 1
After the elaboration stage we generate the following AST and constraints:
+ t1 > t1


one = (\x > x) 1
  
+ t2  + Int

+ t1
[ Equality top0 t2
, Equality (t1 > t1) (Int > t2)
]
How did we get these constraints?
 The lambda expression case generated the type
t1
forx
, so it's type ist1 > t1
 The literal case annotated
1
asInt
 The function application generated the return type
t2
and generated the constraintEquality (t1 > t1) (Int > t2)
which is "the type of the applied expression is equals to a function from the argument to the return type I just generated"  The definition of
one
generated the type variabletop0
for it so it can be used in other definitions polymorphically, but for this function it should be used monomorphically, so it's constrainted as equal to the return type and generated the constraintEquality top0 t2
.
How do we unify them?
Let's go over them in order. We begin with an empty substitution:
 Constraints: [Equality top0 t2, Equality (t1 > t1) (Int > t2)]
 Substitution: []
1. Equality top0 t2
 ==> The first is a type variable, so add `top0 := t2` to the substitution
 Constraints: [Equality (t1 > t1) (Int > t2)]
 Substitution: [top0 := t2]
2. Equality (t1 > t1) (Int > t2)
 ==> replace with two new constraints:
 1. Equality t1 Int
 2. Equality t1 t2
 Constraints: [Equality (> t1) (> Int), Equality t1 t2]
 Substitution: [top0 := t2]
3. Equality (> t1) (> Int)
 ==> replace again with two new constraints:
 1. Equality (>) (>)
 2. Equality t1 Int
 Constraints: [Equality (>) (>), Equality t1 Int, Equality t1 t2]
 Substitution: [top0 := t2]
4. Equality (>) (>)
 ==> these two types are identical, so we can continue
 Constraints: [Equality t1 Int, Equality t1 t2]
 Substitution: [top0 := t2]
5. Equality t1 Int
 ==> The first is a type variable, so add `t1 := Int` to the substitution
 Constraints: [Equality Int t2]
 Substitution: [top0 := t2, t1 := Int]
6. Equality Int t2
 ==> The second is a type variable, so add `t2 := Int` to the substitution
 Constraints: []
 Substitution: [top0 := Int, t1 := Int, t2 := Int]
And we're done! The output is the substitution [top0 := Int, t1 := Int, t2 := Int]
,
we can now apply it to our elaborated program and get the fully type checked program:
+ Int > Int


one = (\x > x) 1
  
+ Int  + Int

+ Int
Two more things we talk to talk about before we wrap this up:
Instantiation
We instantiate a type by looking up all of the type variables in the type and generate a fresh new type variable for each one (though all instances of a type variable should be mapped to the same new type variable).
So for example in the type (a > b) > List a > List b
, if t1
and t2
are two
new type variables we haven't seen before, we can replace a
with t1
and b
with t2
everywhere in the type and get (t1 > t2) > List t1 > List t2
After that, we let the Equality
constraints and the constraint solving algorithm
find the right monomorphic type for t1
and t2
.
Substitution
When we apply a substitution to a constraint or another substitution, we replace all occurences
of a type variable tv
with t
if t := tv
is in the substitution. But,
we also need to check that tv
does not contain t
inside of it! (this is called an occurs check)
if tv
does contain t
, it means that the we ran into an infinite type and we don't support those here,
so we will throw an error.
A simple way to generate such case is this:
x y = x
The constraint generated from this definition is: Equality top0 (t1 > top0)
, which will fail
the occurs check.
If you try to desugar this definition into a lambda expression, you'll quickly discover that
you've been had: anytime you'll go to replace x
with \y > x
you'll just add another level
of lambda.
Summary
This is the general overview of Giml's type inference engine. If you are interested in learning more about it, check out the source code.
In the next blog posts I will cover the type inference of extensible records and polymorphic variants.
If this helped you, there's something you want me to clarify, you think there's something I missed, there's something you want to know more about, or there's something I got totally wrong, do contact me on Twitter or via Email.