In the last blog post we covered the general structure and algorithms of Giml's type inference engine.
In this blog post we'll take a closer look at extensible records in Giml and how to infer their types.
Records
A record is a collection of values, each associated with a name (also called a label).
For example, { name = "Giml", age = 0 }
.
The type of records is very similar to their structure for example the type of the record above is:
{ age : Int, name : String }
.
Records are used to build compound data aggregating multiple values into one value that
contains all of the information. The different values (also called fields) inside the record
can be accessed (or selected) by their labels, like this: <record>.<label>
.
Records can also be extended with additional fields:
let
lang = { name = "Giml", age = 0 }
in
{ website = "https://giml-lang.org" | lang }
The result of the expression is a new record that has all of the fields lang
has and
has one extra field website
as well.
Note that in Giml, we can't have multiple values with the same label. When we add a new
field with an existing label, that previous one is replaced with the new value.
There are other reasonable behaviours here such as disallowing such operation or having a scope of labels, but this is the behaviour I chose for Giml :)
Records are first class and can be used anywhere an Int
can be expected.
They can also be pattern matched:
case { name = "Giml" } of
| { name = "Giml" } ->
"Yeah I heard about it, funny name."
| { name = "Haskell" } ->
"Haskell is pretty cool!"
| other ->
concat (concat "I'm sure " other.name) " is pretty good too!"
end
So far we've seen record literals and three operations on records: field selection, extension and pattern matching.
When we approach to type inference of features we need to take into consideration a few things:
- What the types of the feature look like
- What to do for each operation (elaboration and constraints generation)
- How to unify the types (constraint solving)
- How to instantiate
- How to substitute
We are going to first try a naive approach: Just represent the types of record as TypeRec [(Label, Type)]
and see how it's just not enough to make this feature useful without type annotations.
Let see how we elaborate each operation of Record literals, field selection, record extension, pattern matching:
Record literals: simple, we have an expression that looks like this: { label1 = expr1, label2 = expr2, ... }
, so we elaborate each expression and annotate the node with the type TypeRec [(label1, t1), ...]
.
Record selection: Also simple, we have something that looks like expr.label
, so we elaborate the type of the expression and constrain it as equals to TypeRec [(label, t)]
.
But hold up, if we try to work with this scheme we'll see that it is not good enough.
Trying to typecheck this expression { a = 1, b = 1 }.a
will create this constraint:
Equality { a : Int, b : Int } { a : t }
, and while they both have a field a
,
one of them is missing a field! So they are definitely not equal.
So we need to go at this differently and there are several approaches, one is to create a new type of constraint that describe the relationship of "subtyping", another is to describe the types differently.
For Giml I chose the second approach, we add extra information to the types to encode that "this record type may have more fields". This approach is called row polymorphism.
Row Polymorphism
Row polymorphism provides us with the ability to represent extra information in an additional type variable (called a row type variable) in the type.
So in addition to the variant we created TypeRec [(Label, Type)]
, we add another one:
TypeRecExt [(Label, Type)] TypeVar
. Syntactically the type will look like this:
{ <field1> : <type1>, ... | <extension-type-variable> }
What this extra type variable does is represent additional fields that may be part of the type that we don't know yet.
Let's use this for record field selection, instead of saying "this type is equals to a record with one field of type t", we can now say "this type is equals to a record that has this field with this type, and other fields represented by this type variable".
With this scheme we can move forward. Let's try again:
Elaboration and constraint generation
Record literals
same as before
Record selection
We have something that looks like expr.label
, so we:
- Generate a type variable representing the return type (which is the field type), let's call this
t
- Elaborate the type of the expression
- Generate a type variable representing the rest of the fields, let's call this
ext
- Constrain the type of the expression as equals to
{ label : t | ext }
.
Now when we try to typecheck the expression { a = 1, b = 1 }.a
, which generates this constraint:
Equality { a : Int, b : Int } { a : t | ext }
, we can match the fields and types that we know
(a : Int
with a : t
), and also match the fields that we don't ({ b : Int }
with ext
) - more on that later.
And when we find out what the real type of ext
is, we'll substitute it (we'll talk about that later as well).
Record extension
We have something that looks like { <field1> = <expr1>, ... | <expression> }
, so we:
- Elaborate the type of each field
- Elaborate the type of the expression
- Generate a type variable for the expression which we will call
ext
- Constrain the type of the expression to be equal to
{ | ext }
- Annotate the node with the type
{ <field1> : <type1>, ... | ext }
Pattern matching
Pattern matching is similar to record literals case, but instead of matching the expression in the case
with a rigid TypeRec
, we generate a type variable and match with TypeRecExt
.
That way we can select less fields in the pattern than might be available in the expression type.
Constraint solving
We need to handle 3 new cases - equality between:
- Two
TypeRec
- A
TypeRec
and aTypeRecExt
- Two
TypeRecExt
Two TypeRec
The two records should have the same labels and for each label we generate a new constraint of Equality
between the types for the label on each side.
So for example { a : Int, b : String }
and { a : t1, b : t2 }
, we generate the two constraint:
Equality Int t1
and Equality String t2
.
If one TypeRec
has a field that the other do not have, we throw an error of a missing field.
A TypeRec and a TypeRecExt
Each field in the TypeRecExt
should match with the matching field in TypeRec
.
If there's a field in TypeRecExt
that does not exist in TypeRec
we throw an error.
The other side is different, all the missing field in TypeRecExt
that exist in TypeRec
will
be matched with the row type variable of TypeRecExt
. Remember - we said that with row polymorphism
we use a type variable as a representative of fields we don't know of yet! So we treat the type variable
in TypeRecExt
as if it is a TypeVar
that matches the fields that exist in the TypeRec
but not in the TypeRecExt
.
So for example in Equality { a : Int, b : String } { a : t1 | ext }
we generate 2 new constraints: Equality Int t1
and Equality { b : String } ext
.
Two TypeRecExt
This scenario is slightly trickier, of course - the specified fields in both sides should be matched just like in previous cases, but what do we do with the missing fields from each side?
Let's check a concrete example, we'll make it simpler by not including matching fields.
Equality { a : Int | t1 } { b : Int | t2 }
What this constraint mean, is that t2
is equals to { a : Int | t1 }
without { b : Int }
, and t1
is equals to { b : Int | t2 }
without { a : Int }
.
Since we don't have any notion of subtracting a type from a type (some other type system do support this), we can try and represent this differently, we could represent this subset as a new type variable, and translate the Equality above to these two constraints:
Equality { a : Int | t' } t2
Equality { b : Int | t' } t1
Now t'
represents t2
without { a : Int }
and t'
also represents t1
without { b : Int }
.
So more generally, when we try to unify two TypeRecExt
, we match the matching fields and also
create a new row type variable to represents all of the unspecified fields in both types, and match each row type variable
with the fields specified on the other side.
Let's describe this one more time in psuedo code, adding new constraints:
Equality { <fields-only-found-on-the-left-side> | new-type-var } extRight
Equality { <fields-only-found-on-the-right-side> | new-type-var } extLeft
<matched-fields>
Instantiation
Instantiation occurs as usual, the row type variable in TypeRecExt
should be instantiated as well.
Substitution
What do we do if the row type variable in TypeRecExt
appears in the substitution?
One, if the row type variable is mapped to a different type variable, we just replace it.
Two, if it's mapped to a TypeRec
, we merge it with the existing fields and return a TypeRec
with all fields,
but it's important to note that some fields in the TypeRec
from the substitution may be the same as ones from our TypeRecExt
.
There are a few approaches one could take here, one is to keep a scope of labels, another is to try and unify the types of the field.
In Giml we take the type of the left-most instance, discarding the right one.
So semantically this expression is legal: { a = "Hi!" | { a = 1} }
and its type is { a : String }
.
Three, if the row type variable is mapped to a TypeRecExt
, we do the same thing as in Two, but return a TypeRecExt
instad of a TypeRec
with the row type variable from the mapped value as our new row type variable.
This is basically what we need to do to infer the type of extensible records.
Example
Let's see how to infer the type of a simple record field access example. This process is a bit much, but I've included it for those who want a live demonstration of how this works. Feel free to skip it!
giml = { name = "Giml", age = 0 }
getName record = record.name
gimlName = getName giml
The first stage of inference is to group dependencies and order them topologically. Since we don't have any mutual dependencies, each definition is standalone and the order of dependencies is kept (the last definition uses the previous two).
Also note that the names of type variables do not matter to the algorithm. They are a bit different so I have better time knowing where they are introduced.
First group
giml = { name = "Giml", age = 0 }
Elaboration
Through elaboration, we end up with the following AST and constraints:
-- Ast:
+-- { age : Int, name : String }
| |
| |
| __________|_______________
giml = { name = "Giml", age = 0 }
-- Constraints:
[ Equality top0 {age : Int, name : String}
]
Constraint solving
We go over the first group and solve the constraints:
-- Constraints: [Equality top0 {age : Int, name : String}]
-- Substitution: []
1. Equality top0 {age : Int, name : String}
-- => add `top0 := {age : Int, name : String}` to the substitution
-- Constraints: []
-- Substitution: [top0 := {age : Int, name : String}]
From this process we learn that the definition giml
has the type
{ name = "Giml", age = 0 }
. We add this to the type environment
(mapping from definition names to types) and use it from the next groups.
Second group
getName record = record.name
Elaboration
Through elaboration, we end up with the following AST and constraints:
-- Ast:
+--- targ4 -> t5
|
| +--- targ4
| |
| __|___
getName record = record.name
-----------
|
+------- t5
-- Constraints:
[ Equality targ4 {name : t5 | t6}
, Equality tfun3 (targ4 -> t5)
, Equality top1 (targ4 -> t5)
]
Constraint solving
-- Constraints:
[ Equality targ4 {name : t5 | t6}
, Equality tfun3 (targ4 -> t5)
, Equality top1 (targ4 -> t5)
]
-- Substitution:
[]
1. Equality targ4 {name : t5 | t6}
-- => add `targ4 := {name : t5 | t6}` to the substitution and apply
-- Constraints:
[Equality tfun3 ({name : t5 | t6} -> t5), Equality top1 ({name : t5 | t6} -> t5)]
-- Substitution:
[targ4 := {name : t5 | t6}]
2. Equality tfun3 ({name : t5 | t6} -> t5)
-- => add `tfun3 := ({name : t5 | t6} -> t5)` to the substitution and apply
-- Constraints:
[Equality top1 ({name : t5 | t6} -> t5)]
-- Substitution:
[ targ4 := {name : t5 | t6}
, tfun3 := ({name : t5 | t6} -> t5)
]
3. Equality top1 ({name : t5 | t6} -> t5)
-- => add `top1 := ({name : t5 | t6} -> t5)` and apply
-- Constraints:
[]
-- Substitution:
[ targ4 := {name : t5 | t6}
, tfun3 := ({name : t5 | t6} -> t5)
, top1 := ({name : t5 | t6} -> t5)
]
Substitute
We apply the substitution to our elaborated AST:
+--- {name : t5 | t6} -> t5
|
| +--- {name : t5 | t6}
| |
| __|___
getName record = record.name
-----------
|
+------- t5
Generalize
We then generalize the type by finding all of the free variables, giving them nicer names,
and wrap the type of the top-level definition in a forall
:
We find t5
which we will name a
, and t6
which we will name b
:
+--- forall a b. {name : a | b} -> a
|
| +--- {name : a | b}
| |
| __|___
getName record = record.name
-----------
|
+------- a
We add this definition to the type environment.
Third group
gimlName = getName giml
Elaboration
Through elaboration, we end up with the following AST and constraints:
-- Ast:
+--- t9
|
|
gimlName = getName giml
| |
| |
| +---- top0_i8
|
+---- top1_i7
-- Constraints:
[ Equality top1_i7 (top0_i8 -> t9)
, Equality top2 t9
, Equality top0_i8 {name = "Giml", age = 0}
, Equality top1_i7 (forall a b. {name : a | b} -> a)
]
Constraint solving
-- Constraints:
[ Equality top1_i7 (top0_i8 -> t9)
, Equality top2 t9
, Equality top0_i8 {name = "Giml", age = 0}
, Equality top1_i7 (forall a b. {name : a | b} -> a)
]
-- Substitution:
[]
1. Equality top1_i7 (top0_i8 -> t9)
-- => add `top1_i7 := (top0_i8 -> t9)`
-- Constraints:
[ Equality top2 t9
, Equality top0_i8 {name = "Giml", age = 0}
, Equality top1_i7 (forall a b. {name : a | b} -> a)
]
-- Substitution:
[ top1_i7 := (top0_i8 -> t9)
]
2. Equality top2 t9
-- add `top2 := t9` and apply
-- Constraints:
[ Equality top0_i8 {name = "Giml", age = 0}
, Equality top1_i7 (forall a b. {name : a | b} -> a)
]
-- Substitution:
[ top1_i7 := (top0_i8 -> t9)
, top2 := t9
]
3. Equality top0_i8 {name = "Giml", age = 0}
-- => we add `top0_i8 := {age : Int, name : String}`
-- Constraints:
[ Equality ({age : Int, name : String} -> t9) (forall a b. {name : a | b} -> a)
]
-- Substitution:
[ top1_i7 := ({age : Int, name : String} -> t9)
, top2 := t9
, top0_i8 := {age : Int, name : String}
]
4. Equality ({age : Int, name : String} -> t9) (forall a b. {name : a | b} -> a)
-- => new constraint:
-- `Equality ({age : Int, name : String} -> t9) ({name : ti10 | ti11} -> ti10)`
-- Constraints:
[ Equality ({age : Int, name : String} -> t9) ({name : ti10 | ti11} -> ti10)
]
-- Substitution:
unchanged
5. Equality ({age : Int, name : String} -> t9) ({name : ti10 | ti11} -> ti10)
-- => add constraints `Equality {age : Int, name : String} {name : ti10 | ti11}`
-- and `Equality t9 ti10`
-- Constraints:
[ Equality {age : Int, name : String} {name : ti10 | ti11}
, Equality t9 ti10
]
-- Substitution:
unchanged
6. Equality {age : Int, name : String} {name : ti10 | ti11}
-- => This is a TypeRec & TypeRecExt scenario. we match the shared fields
-- and match the row type variable with the missing fields from the typeRec
-- adding `Equality String ti10` and `Equality ti11 { age : Int }`
-- Constraints:
[ Equality String ti10
, Equality ti11 { age : Int }
, Equality t9 ti10
]
-- Substitution:
[ top1_i7 := ({age : Int, name : String} -> t9)
, top2 := t9
, top0_i8 := {age : Int, name : String}
]
7. Equality String ti10
-- => ti10 := String
-- Constraints:
[ Equality ti11 { age : Int }
, Equality t9 String
]
-- Substitution:
[ top1_i7 := ({age : Int, name : String} -> t9)
, top2 := t9
, top0_i8 := {age : Int, name : String}
, ti10 := String
]
7. Equality ti11 { age : Int }
-- => ti11 := { age : Int }
-- Constraints:
[ Equality t9 String
]
-- Substitution:
[ top1_i7 := ({age : Int, name : String} -> t9)
, top2 := t9
, top0_i8 := {age : Int, name : String}
, ti10 := String
, ti11 := { age : Int }
]
8. Equality t9 String
-- => t9 := String
-- Constraints: []
-- Substitution:
[ top1_i7 := ({age : Int, name : String} -> String)
, top2 := String
, top0_i8 := {age : Int, name : String}
, ti10 := String
, ti11 := { age : Int }
, t9 := String
]
Phew, we made it! Let's apply the substitution back to our AST and see the result!
+--- String
|
|
gimlName = getName giml
| |
| |
| +---- {age : Int, name : String}
|
+---- {age : Int, name : String} -> String
This look correct! (and yey me who did the constraint solving by hand)
Summary
I found implementing extensible records using row polymorphism to be relatively straightforward. I hope this article makes it more approachable as well.
In the next blog post we'll discuss polymorphic variants and how to infer their types.