
Executing Code in the Rust Type System (Part 2)
To briefly recap what happened in the previous part: I have defined the ground rules for this project and what is expected of the solution. This was followed by a few type and trait definitions to illustrate the goal. However, as already mentioned in the previous part, this could not be further from the code execution we are looking for. This part will therefore dive into building up the foundational data types that are required to achieve the goal of this project.
Data Types
Now that we have some thin abstractions for the coarse code elements, we need to tackle the next actual
problem: data types. It might not be surprising to hear that the Rust type system does not actually have
any usual data types you would expect from a programming language (there are const generics, but
generic_const_exprs
is still unstable so we cannot yet use const generics the way we need to here in stable Rust).
The solution to numbers is easy - I present to you the typenum crate. Some readers may already be familiar with it, but in short, it provides type representations for numbers with support for full integer arithmetic (the implementation of this crate already very interesting on its own).
So, numbers are solved… what’s next? We want to have some program output (remember the Function::Io
and Program::Execute
?). For this we need strings. But how can we represent arbitrary strings using only
types?
If we look at the implementation of String
from the Rust standard library, we see the following:
pub struct String {
vec: Vec<u8>,
}
Pretty simple, right? To the surprise of nobody, a String
is just a list of characters. Specifically in Rust,
it is actually always a list of UTF-8 bytes. However, for this project we will only work with ASCII. So it seems
like having a list representation in the type system is the first step. Types in Rust do not support sequential
data like lists (no varargs generics) and the only way to actually get what we want is to go recursive.
use std::marker::PhantomData;
pub trait TypeList {
/// The first element in the list.
type Head;
/// Everything except for the head.
type Tail;
/// The last element in the list.
type Last;
}
pub struct TypeListElem<T>(PhantomData<T>);
We now have a trait to represent a list in the type system. Additionally, there is the TypeListElem
, which
is just a convenience wrapper around the entries in the list to give us more freedom with the trait bounds.
Let’s implement this trait for some types so that we can actually use it, starting with the empty list as the
simplest case:
impl TypeList for () {
type Head = ();
type Tail = ();
type Last = ();
}
Next, we move on to the one-element list, built using a one-element tuple:
impl<T> TypeList for (TypeListElem<T>,) {
type Head = TypeListElem<T>;
type Tail = ();
type Last = TypeListElem<T>;
}
Both of these cases are still trivial. For the one-element list, the head and last are equal to the only element in the list, and the tail is the empty list. Based on those two definitions, we can now inductively define lists of arbitrary length. This is in my opinion one of the great strengths of the Rust type system compared to many other (non-proof) languages.
impl<Tail: TypeList, T> TypeList for (Tail, TypeListElem<T>) {
type Head = Tail::Head;
type Tail = Tail;
type Last = TypeListElem<T>;
}
This is still fairly trivial, and the only big difference is the recursion in Head
through Tail::Head
. We have
defined the trait bound Tail: TypeList
to guarantee that every type Tail
this trait is implemented for also
implements TypeList
. Because of this requirement, we can recursively access the associated type Head
from it.
With this simple (read-only) list implementation, we can already start defining strings.
#[macro_export]
macro_rules! type_str {
(@parse
[ $char:tt $($rest:tt)* ]
[ $($res:tt)* ]
) => {
$crate::type_str!(@parse [ $($rest)* ] [ ($($res)*, TypeListElem<$crate::type_char!($char)>) ])
};
(@parse [] [ $($res:tt)* ]) => {
$($res)*
};
($char:tt $($rest:tt)*) => {
$crate::type_str!(@parse [ $($rest)* ] [ (TypeListElem<$crate::type_char!($char)>,) ])
};
}
#[macro_export]
macro_rules! type_char {
// ...
(__) => { typenum::P32 }; // ASCII space
(!) => { typenum::P33 }; // ASCII exclamation mark
// ...
(A) => { typenum::P65 };
// ...
(~) => { typenum::P126 };
};
WHOA, that’s some macro spaghetti! But don’t worry, I will explain this mess. Let’s start with the simpler
of the two: type_char
. This macro has a lot of rules, one for each printable ASCII character. Some ASCII
characters like the space cannot be matched in a Rust macro, so they are replaced by special sequences that
can be matched. All the rules simply expand to the decimal number representing each ASCII character, encoded
using the typenum
crate.
The other macro, type_str
, is a little bit more tricky. It is built around an
Incremental TT Muncher, which sounds
very complex, so let’s walk through an example expansion of the macro:
// Macro invocation
type_str!(F o o)
// Matches the third rule with char="F" and rest="o o". Expands to:
type_str!(@parse [ o o ] [ (TypeListElem<type_char!(F)>,) ])
type_str!(@parse [ o o ] [ (TypeListElem<P70>,) ])
// Matches the first rule with char="o" and rest="o". Expands to:
type_str!(@parse [ o ] [ ((TypeListElem<P70>,), TypeListElem<type_char!(o)>) ])
type_str!(@parse [ o ] [ ((TypeListElem<P70>,), TypeListElem<P111>) ])
// Matches the first rule with char="o" and rest="". Expands to:
type_str!(@parse [ ] [ (((TypeListElem<P70>,), TypeListElem<P111>), TypeListElem<type_char!(o)>) ])
type_str!(@parse [ ] [ (((TypeListElem<P70>,), TypeListElem<P111>), TypeListElem<P111>) ])
// Finally, matches the second rule and outputs everything in the pair of square brackets:
(((TypeListElem<P70>,), TypeListElem<P111>), TypeListElem<P111>)
And there we have our string “Foo” represented as a list of ASCII byte values in the Rust type system.
With this, we can already implement (in a hacky way) our main
function example from the start. Recall that
we wanted to implement this in the type system:
fn main() {
println!("Hello, World!");
}
Using the type abstractions we defined, it is now possible to translate this very simple program into the type system:
impl Program for MyProject {
type Execute = type_str!(H e l l o , __ W o r l d !);
}
This is a start, but it is obviously very hacky. If we wanted two separate println
statements, the current
solution would simply not work. Additionally, we cannot even execute any code yet.
The next step will be to make the list implementation more flexible to allow for list mutation:
pub trait TypeList {
// ...
type PushFront<T>: TypeList;
type PushBack<T>: TypeList;
type PopFront;
type PopBack;
}
This extension to the TypeList
allows us to push and pop elements on both sides. It makes use of the great
power of GATs (Generic Associated Types) to provide the element to be added to the list to the associated types.
Let’s start again by implementing this new functionality for the trivial cases:
impl TypeList for () {
// ...
type PushFront<T> = (TypeListElem<T>,),
type PushBack<T> = (TypeListElem<T>,),
type PopFront = ();
type PopBack = ();
}
impl<T> TypeList for (TypeListElem<T>,) {
// ...
type PushFront<T1> = ((TypeListElem<T1>,), TypeListElem<T>);
type PushBack<T1> = ((TypeListElem<T>,), TypeListElem<T1>);
type PopFront = ();
type PopBack = ();
}
These cases should be trivial enough to not require any further explanation, so let’s head straight to the more interesting inductive definition:
impl<Tail: TypeList, T> TypeList for (Tail, TypeListElem<T>) {
// ...
type PushFront<T1> = (Tail::PushFront<T1>, TypeListElem<T>);
type PushBack<T1> = ((Tail, TypeListElem<T>), TypeListElem<T1>);
type PopFront = /* ??? */;
type PopBack = Tail;
}
We again use recursion in Tail::PushFront<T1>
to operate on the list. The rest of the implementation is
still fairly simple, but we run into an issue for PopFront
. We need to stop not at the one-element list, but
instead at the two-element list, because popping from a one-element list would yield an empty list, which we
do not want to have inside a normal list. For this, we need to introduce a helper type:
pub trait TypeList {
// ...
type PopFrontHelper<Next>;
}
impl TypeList for () {
// ...
type PopFrontHelper<Next> = ();
}
impl<T> TypeList for (TypeListElem<T>,) {
// ...
type PopFrontHelper<Next> = Next;
}
The way the helper type is implemented, the last element (the one-element list at the end of the recusion) is nautrally excluded properly instead of just being replaced by an empty list. We can now abuse this helper type in the inductive list implementation:
pub trait TypePair<A, B> {
type ValA;
type ValB;
}
impl<A, B> TypePair<A, B> for (A, B) {
type ValA = A;
type ValB = B;
}
impl<Tail: TypeList, T> TypeList for (Tail, TypeListElem<T>) {
// ...
type PopFront = Tail::PopFrontHelper<T>;
type PopFrontHelper<Next> = (Tail::PopFrontHelper<T>, Next);
}
This way, the implementation recurses until it hits the one-element list, which skips itself using the Next
parameter. On the way up, it builds the full list again, but this time without the first element.
Finally, let’s add some convenience types for manipulating the lists:
pub trait TypeList {
// ...
/// Appends `L` onto this list in forward order by first
/// reversing `L` and then performing a reverse append.
type Append<L>: TypeList
where L: TypeList;
/// Appends this list onto `L` in the forward order by
/// performing the same steps as [`TypeList::Append`].
type AppendTo<L>: TypeList
where L: TypeList;
/// Appends this list onto `L` in the default reverse order.
type RevAppendTo<L>: TypeList
where L: TypeList;
/// Reverses the list.
type Reverse: TypeList;
}
I will skip all the trivial implementations (if you are interested in seeing them, you can find them in the GitHub repository that contains this entire project) and so here is the complex case:
impl<Tail: TypeList, T> TypeList for (Tail, TypeListElem<T>) {
// ...
type Append<L> = <L::Reverse as TypeList>::RevAppendTo<Self>
where L: TypeList;
type AppendTo<L> = L::Append<Self>
where L: TypeList;
type RevAppendTo<L> = Tail::RevAppendTo<L::PushBack<T>>
where L: TypeList;
type Reverse = Self::RevAppendTo<()>;
}
To summarize, we now have the following list implementation purely in the type system:
pub trait TypeList {
type Head;
type Tail: TypeList;
type Last;
type PushFront<T>: TypeList;
type PushBack<T>: TypeList;
type PopFront;
type PopBack;
/// Appends `L` onto this list in forward order by first
/// reversing `L` and then performing a reverse append.
type Append<L>: TypeList
where L: TypeList;
/// Appends this list onto `L` in the forward order by
/// performing the same steps as [`TypeList::Append`].
type AppendTo<L>: TypeList
where L: TypeList;
/// Appends this list onto `L` in the default reverse order.
type RevAppendTo<L>: TypeList
where L: TypeList;
/// Reverses the list.
type Reverse: TypeList;
type PopFrontInner<Next>;
}
What does the TypeList
look like in practice? Let’s take a look at this simple example:
pub type EmptyList = ();
// (u32,)
pub type ListA = <EmptyList as TypeList>::PushBack<u32>;
// ((u32,), f32)
pub type ListB = ((f32,), u8);
// (((u32,), f32), u8)
pub type ListC = <ListA as TypeList>::Append<ListB>;
As you can see, we can define an empty list, push an element onto it, and get the expected result. We can also define a non-empty list with two elements, append it to the previous list and get the expected list with three elements.
Functions
Now we have all the abstractions required to properly implement functions in the type system. First, let’s revisit the function definition we had before:
pub trait Function<Args> {
type Result;
type Io;
}
We can now introduce a helper trait for IO state management using the list abstractions we just created:
pub trait Function<Args> {
// ...
type Io: IoState;
}
pub trait IoState: TypeList {
type Print<S>: IoState
where S: IoState;
}
impl<L: TypeList> IoState for L {
type Print<S> = Self::Append<S>
where S: IoState;
}
Let’s take a short break and look at what we have accomplished so far. We now have numbers, lists and strings purely in the type system. Additionally, we can easily modify those lists just like we could in a normal language (mostly). This is a good foundation for our next steps.
The function trait introduced above can now represent functions with an output and an IO side-effect. We can use this to implement our first function:
pub struct Print;
impl<Str: TypeList> Function<(Str,)> for Print {
type Result = ();
type Io = Str;
}
The Print
function takes a single argument, which is the string (represented by our TypeList
) it should
print to the output. To print the string, it simply defines it as the value of the Function::Io
associated type.
The return value of Print
is the unit type ()
.
It is already possible to rewrite the original implementation of our type system program using this new Print
implementation, but it does not add much and we still cannot chain function calls. This is the next step:
pub struct Then<A, B>(PhantomData<(A, B)>);
impl<A, B, Args1, Args2> Function<()> for Then<A, B>
where
A: Function<Args1>,
B: Function<Args2>,
{
type Result = B::Result;
type Io = <A::Io as IoState>::Print<B::Io>;
}
And there we have it. Now we can chain an arbitrary number of functions while still correctly capturing their
IO output. It is important to note that the functions under a Then
are executed in isolation and the return
value of the second function becomes the return value of the entire Then
.
Before we move on, notice that the implementation of Then
actually does not care about the arguments of the
functions. Specifically, we only care about the function output. Let’s introduce an additional trait to reduce
the syntax clutter a bit:
pub trait FunctionCall {
type Result;
type Io: IoState;
}
impl<A, B> Function<()> for Then<A, B>
where
A: FunctionCall,
B: FunctionCall,
{
type Result = B::Result;
type Io = <A::Io as IoState>::Print<B::Io>;
}
So now we have a FunctionCall
trait which only represents the output of calling a function. Now we also need
another type to actually get us from a function to a function call:
pub struct Call<F, Args>(PhantomData<(F, Args)>);
impl<F: Function<Args>, Args> FunctionCall for Call<F, Args> {
type Result = F::Result;
type Io = F::Io;
}
// Just for completeness, because Call does
// not have any function parameters
impl<A, B> FunctionCall for Then<A, B>
where
A: FunctionCall,
B: FunctionCall,
{
type Result = B::Result;
type Io = <A::Io as IoState>::Print<B::Io>;
}
The new Call
type binds the function and its call arguments together into one type, so that we can then
implement the FunctionCall
trait on it. Finally, I will introduce another macro here to simplify the syntax
a bit and do most of the heavy lifting for the weird type syntax:
#[macro_export]
macro_rules! chain {
(io:
$($rest:tt)*
) => {
<$crate::chain!(@inner $($rest)*) as Function<()>>::Io
};
(@inner $func:ident ( $($($arg:tt)*),* $(,)? );) => {
Call<$func, ( $($($arg)*,)* )>
};
(@inner
$func:ident ( $($($arg:tt)*),* $(,)? );
$($rest:tt)*
) => {
Then<Call<$func, ( $($($arg)*,)* )>, $crate::chain!(@inner $($rest)*)>
};
}
For those interested, this is another incremental TT muncher macro (it could have been implemented in a simpler way, but I wanted more flexibility here). The expansion of the macro should be relatively simple, but the exact way it does its magic is not important here. Much more important is the result that we can now achieve when putting everything we have worked on so far together:
struct MyProject;
impl Program for MyProject {
type Execute = chain! { io:
Print(type_str!(H e l l o));
Print(type_str!(, __));
Print(type_str!(W o r l d !));
};
}
Isn’t this crazy? Thanks to the macro magic, we can write what looks like normal code… but in a type position inside a trait.
This is all well and good, but can we actually do something with the output currently? Let’s check:
type Output = <MyProject as Program>::Execute;
fn main() {
println!("{}", std::any::type_name::<Output>());
}
(((((((((((((type_code::list::TypeListElem<type_code::string::chars::H>,), type_code::list::TypeListElem<type_code::string::chars::e>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::o>), type_code::list::TypeListElem<type_code::string::chars::Comma>), type_code::list::TypeListElem<type_code::string::chars::Space>), type_code::list::TypeListElem<type_code::string::chars::W>), type_code::list::TypeListElem<type_code::string::chars::o>), type_code::list::TypeListElem<type_code::string::chars::r>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::d>), type_code::list::TypeListElem<type_code::string::chars::ExclamationMark>)
I don’t think anyone would complain if I were to call this output a completely unreadable mess. If you look carefully, you can see that the output is actually correct. However, nobody wants to read that. Ideally we want to have the list representing a string be converted to an actual Rust string that can be read normally.
But how would we go about converting this type into a string? This and more is what we will dive into in Part 3.