Note: this article was last updated on 2022-06-07
Giml's type inference engine uses unification-based constraint solving.
It has 6 important parts:
- Topologically order definitions and group those that depend on one another
- Elaboration and constraint generation
- Constraint solving
- Instantiation
- Substitution
- Generalization
The general process is as follows: we sort and group definitions by their dependencies, we elaborate the program with type variables and types that we know and collect constraints on those variables. We then solve the constraints for each group using unification, instatiating polymorphic types we run into, and create substitutions, which are mapping from type variables to types, and merge them together. Then we substitute the type variables we generated in the elaboration stage with their mapped types from the substitution in the program, and then generalize the types of top-level definitions and close over their free variables.
Lets unpack all of that, step by step.
Group and order definitions
Before we start actually doing type inference, we figure out the dependencies between the various definitions. This is because we want to figure out the type of a definition before its use, and because we want definitions that depend on each other (mutual recursion) to be solved together.
So for example if we have this file as input:
main = ffi("console.log", odd 17)
even x =
if int_equals x 0
then True
else if int_equals x 1
then False
else odd (sub x 1)
odd x =
if int_equals x 0
then False
else if int_equals x 1
then True
else even (sub x 1)
The output of the reordering and grouping algorithm will be:
[["even", "odd"], ["main"]]
This is because even
and odd
are mutually recursive, so we want to typecheck them together,
and because main depends on odd
we want to typecheck it after we finished typechecking odd
.
The module implementing this algorithm is Language.Giml.Rewrites.PreInfer.GroupDefsByDeps.
Elaboration
Now the real work begins. We start by going over the program's AST and annotate it with the types we know. When we don't know what the type of something is we "invent" a new fresh type variable as a placeholder and mark that type with a constraint according to its usage.
Types
Language.Giml.Types.Types contains a datatype that can be used to represent Giml types. For example:
TypeCon "Int"
represents the typeInt
TypeVar "t1"
represents a polymorphic type variableTypeApp (TypeCon "Option") (TypeCon "Int")
represents the typeOption Int
TypeRec [("x", TypeCon "Int")]
represents a record with the type{ x : Int }
Constraints
A Constraint between two types describes their relationship.
Currently we only have one type of constraint: Equality
.
Which means that the two types are interchangeable and should unify to the same type.
For example from the constraint Equality (TypeVar t1) (TypeCon "Int")
we learn
that t1
is the same as Int
.
One instance of an Equality
constraints stands out from the rest - when the second argument
is a TypeScheme
. In that case we mean to say that the first type of the Equality
has an Equality
relationship with the instantiation of the second type.
An instantiation is a monomorphic instance of a polymorphic type,
for example forall a b. (a -> b) -> List a -> List b
is polymorphic,
and can be instantiated, among other examples, to
(Int -> String) -> List Int -> List String
by replacing the polymorphic
type variables a
and b
which are declared in the forall
clause
with monomorphic types such as Int
and String
.
We will talk about solving these constraints a bit later.
Implementation
We also use State
, Reader
and Except
in our algorithm:
- The State contains:
- A monotonically increasing integer for generating type variables
- A list of constraints we generated through the elaboration phase, which we will return as output at the end
- The Reader contains:
- The currently scoped variables in the environment and their type
- The type of builtin functions and expressions (such as
int_equals
andpure
) - An environment of data definitions, some are built-in and others are user defined and added as part of the elaboration stage
- The Except part can throw a few errors such as "unbound variable" error.
Most of the action happens in elaborateExpr. I'm not going to cover every usecase, but lets look at a few examples:
Elaborating literals
Elaborating literals is simple - we already know what the types are - if we see a string literal
we annotate the AST node with the type String
, if we see an integer literal we annotate it
with Int
, and we don't need to generate new constraints because we know exactly what the type is.
Elaborating lambda functions
Elaborating lambda expressions is interesting, because we don't know what type each argument
is going to be: it might be inferred by the usage like in this case: \n -> add 1 n
or not
like in this case \x y -> x
. What we do here is:
- Generate a type variable for each function argument
- Elaborate the body of the lambda with the extended environment containing the mapping between the function arguments and their newly generated types
- Annotate the lambda AST node with the type of a function from the argument types to the type of the body, which we get from the elaborated node after (2)
And we will trust the constraint solving algorithm to unify all of the types.
Elaborating function application
To elaborate a function application we:
- Elaborate the arguments
- Elaborate the applied expression
- Generate a fresh type variable to represent the return type
- Constrain the type of the applied expression as a function from the arguments to the return type
- Annotate the function application AST node with the return type (which we just generated)
Remember that the type of the applied expression may just be a generated type variable, so we
can't examine its type to see what the return type is. For example if we had this expression
\f x -> f x
, as we saw before we generated type variables for f
and x
!
So we have to make sure the type inference engine knows what their type should be
(by generating a constraint) according to the usage and trust that it'll do its job.
Elaborating variables
For variables, we expect an earlier stage (such as elaborating lambdas, lets or definitions) to have already elaborated their type and added them to the environment, so we look them up. We first try to look them up in the scoped environment and if that fails we try the built-ins environment. If that fails as well, we throw an error that the variable is unbound.
If we do find a variable in the scoped environment, we generate a fresh variable and add constrain the type of the fresh variable to be equals to the type we found in the env.
Generating a fresh variable is important, because if the type we found in the environment
is polymorphic (i.e. it is a TypeScheme
), the constraint solving phase will instantiate
the polymorphic type. We'll talk more about what this means a bit later.
Constraint solving
The heart of the type inference engine. Here we visit each the constraints generated from each group of substitutions separately and generate a substitution (mapping from type variables to types) as output.
the solveConstraints function takes an accumulated substitution and a list of constraints as input and tries to solve the first constraint on the list. Solving a constraint might generate more constraints to solve, and may return a substitution as well. We then apply this substitution (substitute all occurences of the keys with the values) to the rest of the constraints and the accumulated substitution itself, merge the new substitution with the accumulated substitution, and keep going until there are no more constraints left to solve.
(solveConstraint) uses a process called unification to solve a constraint. The unification process tries to match two types and produce additional clues to what concrete types they represent or fail when the types do not match.
Here are a few important cases that it handles:
Equality t1 t2 (where t1 == t2)
When the two types are the same (for example Int
and Int
, or t1
and t1
),
the unification succeeds without extra work needed,
we don't need to create new constraints or create a substitution.
Equality (TypeVar tv) t2
When one of the types is a type variable, we substitute (replace) it with t2
in all places we know of
(accumulated substitution and rest of the constraints). So the rest of the constraints that
contain it will refer to t2
instead, and when substituting it with another type it'll be t2
.
Equality (TypeApp f1 a1) (TypeApp f2 a2)
We generate two new constraints: Equality f1 f2
and Equality a1 a2
, reducing a constraint
of complex types a step further until eventually we get something we already know how to handle:
a constraint on simple types.
Equality t1 (TypeScheme vars t2)
When the second argument of Equality
is a TypeScheme
, we instantiate the second type t2
and produce an Equality
constraint between t1
and the instantiation of t2
.
Instantiating is basically "copying" the type signature with freshly generated type variables,
and producing the Equality
constraint with t1
to make the type inference engine find the
right type for each of the instantiated type variables.
There are a few more cases, but this is the gist of it.
When we fail to unify we throw an error of type mismatch.
Instantiation
When we run into a type scheme (a type with a forall
) during the constraint solving stage,
we instantiate the type by looking up all of the type variables in the type and generate
a fresh new type variable for each one
(though all instances of a type variable should be mapped to the same new type variable).
So for example in the type forall a b. (a -> b) -> List a -> List b
,
if t1
and t2
are two new type variables we haven't seen before, we can replace a
with t1
and b
with t2
everywhere in the type and get (t1 -> t2) -> List t1 -> List t2
After that, we let the Equality
constraints and the constraint solving algorithm
find the right monomorphic types for t1
and t2
.
Substitution
As part of the constraint solving process, we generate substitutions when we unify type variables. We need to use the generated substitution immediately by applying it over our accumulated substitution and remaining constraints.
When we apply a substitution to a constraint or another substitution,
we replace all occurences
of a type variable tv
with t
if t := tv
is in the substitution. But,
we also need to check that tv
does not contain t
inside of it!
(this is called an occurs check)
if tv
does contain t
,
it means that the we ran into an infinite type and we don't support those here,
so we will throw an error.
A simple way to generate such case is this:
x y = x
The constraint generated from this definition is: Equality top0 (t1 -> top0)
, which will fail
the occurs check.
If you try to desugar this definition into a lambda expression, you'll quickly discover that
you've been had: anytime you'll go to replace x
with \y -> x
you'll just add another level
of lambda.
Generalization
Once we applied the substitution to the elaborated types for a group of definitions, we have figured out the types of all definitions in a group. What's left is to declare the polymorphic definitions as polymorphic in a way that when we reference these definitions in subsequent groups we'll know that we need to instantiate them.
Basically, we need to convert the types of top-level definitions to
TypeScheme
s (or forall
s) where the type variables are the leftover type variables.
For example, if we have the type t1 -> Int -> t2 -> t1
,
we find all of the free variables (t1
and t2
), replace them in the type signature
with nicer types (t1
=> a
, t2
=> b
), and write them down at the forall
clause
of the type (forall a b. a -> Int -> b -> a
).
If there are no free variables in the top-level type signature we will generate
an empty forall
list, so we just omit the forall
clause for convenience.
Example time!
Let's see an example for this (very) short program:
one = id 1
id = \x -> x
Topological ordering
We start with topological ordering.
We can see that one
depends on id
, but not the other
way around, so we separate them to different groups and id
comes first.
[["id"], ["one"]]
First group
The first group is:
id = \x -> x
Elaboration
After the elaboration stage we generate the following elaborated AST:
+---t1
|
| +---- t2
| |
--- ---
id = \x -> x
--- -------
| |
| +----- t1 -> t2
|
|
+--- top0
And constraints:
[ Equality t2 t1
, Equality top0 (t1 -> t2)
]
How did we get these constraints?
- We look at the definition of
id
and generate the type variabletop0
for it. We continue to infer the expression to the right with the environment whereid := top0
. - The lambda expression case generates the type
t1
for the argumentx
. - The lambda expression infers the type of the lambda body, in the environment
where
id := top0
andx := t1
. We look forx
in the environment and then generate a new type variablet2
for it as well, and generate an equality constraint:Equality t2 t1
. - The lambda expression infers its type as a function from the input type
t1
to the output typet2
:t1 -> t2
. - After infering the type of the body of
id
, we constraint the type variable generated for the nameid
with the inferred type of the expression:Equality top0 (t1 -> t2)
.
Constraint solving
We go over the constraints in order and unify them, generating a substitution when needed and applying it to both the remaining constraints and the accumulating substitution.
- Constraints:
[ Equality t2 t1
, Equality top0 (t1 -> t2)
]
- Accumulated Substitution:
[
]
1. Equality t2 t1
=> t2 := t1
-- The first type is a type variable, so we add the
-- above to the accumulated substitution and apply it
- Constraints:
[ Equality top0 (t1 -> t1)
]
- Accumulated Substitution:
[ t2 := t1
]
2. Equality top0 (t1 -> t1)
=> top0 := t1 -> t1
-- Same as before
- Constraints:
[
]
- Accumulated Substitution:
[ t2 := t1
, top0 := t1 -> t1
]
When we run out of constraints, we are done. Lets apply the accumulated substitution to our elaborated AST:
+---t1
|
| +---- t1
| |
--- ---
id = \x -> x
--- -------
| |
| +----- t1 -> t1
|
|
+--- t1 -> t1
Generalization
We take a look of the type of id
, find all of the type variables (t1
),
generate a type variable for each one (a
) and substitute the original
type variable with the new one, then we wrap the resulting type with a forall
and list the new type variables:
id :: forall a. a -> a
Second group
one = id 1
Elaboration
Note: we use the types of the top-level definitions of previous groups when
elaborating a group! In our case our environment only contains:
id :: forall a. a -> a
We get this elaborated AST:
+--- top0
|
| +--- t2
| |
--- ----
one = id 1
-- -
| |
| +--- Int
|
+------ t1
And constraints:
[ Equality t1 (forall a. a -> a)
, Equality t1 (Int -> t2)
, Equality top0 t2
]
How did we get these constraints?
- For the definition
one
we generate the type variabletop0
. - We look at the function application in three steps, the first:
looking at the argument, in this case
1
which is a int literal. We annotate it as such. - We then lookup
id
in our environment, generate the type variablet1
, and produce a constraint between the two:Equality t1 (forall a. a -> a)
- We generate a type variable
t2
to represent the return value, and express our expectation thatid
's is equals to a function from the type of the argumentInt
to the return typet2
:Equality t1 (Int -> t2)
- Finally, we constrain the type of
one
with the type of the expression:Equality top0 t2
Constraint solving
Let's go over them in order:
- Constraints:
[ Equality t1 (forall a. a -> a)
, Equality t1 (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution:
[
]
1. Equality t1 (forall a. a -> a)
=> [ Equality t1 (ti_1 -> ti_1) ]
-- A forall! Instantiate and generate a new constraint.
- Constraints:
[ Equality t1 (ti_1 -> ti_1)
, Equality t1 (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution: unchanged
2. Equality t1 (ti_1 -> ti_1)
=> t1 := ti_1 -> ti_1
- Constraints:
[ Equality (ti_1 -> ti_1) (Int -> t2)
, Equality top0 t2
]
- Accumulated Substitution:
[ t1 := ti_1 -> ti_1
]
3. Equality (ti_1 -> ti_1) (Int -> t2)
=> [ Equality (->) (->), Equality ti_1 Int, Equality ti_1 t2]
- Constraints:
[ Equality (->) (->)
, Equality ti_1 Int
, Equality ti_1 t2
[ Equality top0 t2
]
- Accumulated Substitution: unchanged
4. Equality (->) (->)
=> []
-- Two type constructors are equal - safe to skip
- Constraints:
[ Equality ti_1 Int
, Equality ti_1 t2
[ Equality top0 t2
]
- Accumulated Substitution: unchanged
5. Equality ti_1 Int
=> ti_1 := Int
- Constraints:
[ Equality Int t2
[ Equality top0 t2
]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
]
5. Equality Int t2
=> t2 := Int
- Constraints:
[ Equality top0 Int
]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
, t2 := Int
]
5. Equality top0 Int
=> top0 := Int
- Constraints:
[]
- Accumulated Substitution:
[ t1 := Int -> Int
, ti_1 := Int
, t2 := Int
, top0 := Int
]
Substitution
Now we apply the substitution elaborated program AST:
+--- Int
|
| +--- Int
| |
--- ----
one = id 1
-- -
| |
| +--- Int
|
+------ Int -> Int
Generalization
Nothing to do in this case! We get:
one :: Int
Result
This concludes our type inference process, and this is the result:
+---a
|
| +---- a
| |
--- ---
id = \x -> x
--- -------
| |
| +----- a -> a
|
|
+--- forall a. a -> a
+--- Int
|
| +--- Int
| |
--- ----
one = id 1
-- -
| |
| +--- Int
|
+------ Int -> Int
Summary
This is the general overview of Giml's type inference engine. If you are interested in learning more about it, check out the source code and documentation under the Language.Giml.Types hierarchy.
In the next blog posts I will cover the type inference of extensible records and polymorphic variants.
If this helped you, there's something you want me to clarify, you think there's something I missed, there's something you want to know more about, or there's something I got totally wrong, do contact me on mastodon or via Email.