Executing Code in the Rust Type System (Part 3)


This is part 3 of the series.


You can find the code for this project in the GitHub Repository.



Recap

The first part created some types and traits to outline the goals of this project. In order to eventually achieve those goals, part two showed the implementation of numbers, lists and simple functions. As a result, it is already possible to execute simple print statements in the type system:

struct MyProject;

impl Program for MyProject {
    type Execute = chain! { io:
        Print(type_str!(H e l l o));
        Print(type_str!(, __));
        Print(type_str!(W o r l d !));
    };
}

However, as observed in the previous part, the output of this is suboptimal at best:

type Output = <MyProject as Program>::Execute;

fn main() {
    println!("{}", std::any::type_name::<Output>());
}
(((((((((((((type_code::list::TypeListElem<type_code::string::chars::H>,), type_code::list::TypeListElem<type_code::string::chars::e>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::o>), type_code::list::TypeListElem<type_code::string::chars::Comma>), type_code::list::TypeListElem<type_code::string::chars::Space>), type_code::list::TypeListElem<type_code::string::chars::W>), type_code::list::TypeListElem<type_code::string::chars::o>), type_code::list::TypeListElem<type_code::string::chars::r>), type_code::list::TypeListElem<type_code::string::chars::l>), type_code::list::TypeListElem<type_code::string::chars::d>), type_code::list::TypeListElem<type_code::string::chars::ExclamationMark>)

Getting Readable Output

I promised to turn this output into a readable string. But how would we go about doing this? Here is the first observation one could make:

use std::mem::offset_of;

#[repr(C)]
struct Foo((((u8,), u8), u8));

println!("{}", offset_of!(Foo, 0.0.0)); // Prints: 0
println!("{}", offset_of!(Foo, 0.0.1)); // Prints: 1
println!("{}", offset_of!(Foo, 0.1)); // Prints: 2

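The C memory layout guarantees that the fields of a type occur in memory in the same order they were declared in. For tuples, the declaration order goes from left to right. Judging by the offsets printed above, the nested tuples used for the TypeList are laid out in the order of the list. More precisely, a nested tuple like the one above containing 3 u8 values has the same memory layout as [u8; 3]. One caveat: #[repr(C)] only applies to the struct it annotates (here Foo), while the tuples inside it use the default Rust representation, whose field order the language does not guarantee. The approach in this part therefore relies on how the compiler lays out these tuples in practice.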
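The same observation can be sketched with a layout that the language does guarantee, by swapping the nested tuples for a small #[repr(C)] struct. Pair, Three and as_bytes are hypothetical names that are not part of the project; this is only an illustration of the idea, not how the series implements it.

```rust
use std::mem::size_of;

/// Hypothetical replacement for a two-element tuple; `#[repr(C)]` fixes the
/// field order and offsets, which plain tuples do not promise.
#[allow(dead_code)]
#[repr(C)]
struct Pair<A, B>(A, B);

/// Three bytes nested the same way as the type list: ((a, b), c).
type Three = Pair<Pair<u8, u8>, u8>;

/// Views the nested value as a flat byte slice.
fn as_bytes(value: &Three) -> &[u8] {
    let ptr = (value as *const Three).cast::<u8>();
    // SAFETY: `Three` is built only from `u8` fields inside `#[repr(C)]`
    // structs, so it is 3 bytes long, has alignment 1 and no padding.
    unsafe { std::slice::from_raw_parts(ptr, size_of::<Three>()) }
}

fn main() {
    let value: Three = Pair(Pair(b'a', b'b'), b'c');
    assert_eq!(size_of::<Three>(), 3);
    assert_eq!(as_bytes(&value), b"abc");
    println!("{}", std::str::from_utf8(as_bytes(&value)).unwrap()); // abc
}
```

The trade-off is that such a struct cannot reuse the tuple-based TypeList directly, which is presumably why the article sticks with tuples.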

This is a very significant discovery, which allows us to do some const-magic to get the strings we want. Let’s start by defining more helper traits, this time for the string construction:

pub trait BuildStringChars<T: 'static> {
    const RESULT: T;
    const LEN: usize;
}

impl<Tail, C, T> BuildStringChars<(T, u8)> for (Tail, C)
where
    Tail: BuildStringChars<T>,
    C: Integer,
    T: 'static,
{
    const RESULT: (T, u8) = (Tail::RESULT, <C::Number>::I8 as u8);
    const LEN: usize = Tail::LEN + 1;
}

impl<C: Integer> BuildStringChars<u8> for (C,) {
    const RESULT: u8 = <C::Number>::I8 as u8;
    const LEN: usize = 1;
}

pub trait StringCharsTy {
    type Result: 'static;
}

impl<Prev, C> StringCharsTy for (Prev, C)
where
    Prev: StringCharsTy,
    C: Character,
{
    type Result = (Prev::Result, u8);
}

impl<C: Character> StringCharsTy for (C,) {
    type Result = u8;
}

This is quite the wall of code! There is some boilerplate here, but essentially we have defined the trait BuildStringChars, which has the associated constants RESULT and LEN. The former contains a nested tuple of byte values (like ((1, 2), 3)) and the latter contains the number of elements in this tuple (as if it were a list). Note that the innermost one-element tuple (C,) maps to a plain u8 rather than to (u8,). The second trait, StringCharsTy, is a helper trait that provides the type used as the parameter of BuildStringChars (in the example it would be ((u8, u8), u8)).
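To see the two traits at work in isolation, here is a minimal self-contained sketch. It assumes a simplified, hypothetical Character trait with a BYTE constant and three marker types H, E and Y in place of the project's own Integer/Character machinery and TypeListElem wrapper; the trait implementations otherwise follow the ones above.

```rust
/// Hypothetical stand-in for the project's character types:
/// each marker type carries its byte value.
trait Character {
    const BYTE: u8;
}

#[allow(dead_code)]
struct H;
#[allow(dead_code)]
struct E;
#[allow(dead_code)]
struct Y;

impl Character for H { const BYTE: u8 = b'H'; }
impl Character for E { const BYTE: u8 = b'e'; }
impl Character for Y { const BYTE: u8 = b'y'; }

trait BuildStringChars<T: 'static> {
    const RESULT: T;
    const LEN: usize;
}

// Base case: a one-element list becomes a single byte.
impl<C: Character> BuildStringChars<u8> for (C,) {
    const RESULT: u8 = C::BYTE;
    const LEN: usize = 1;
}

// Recursive case: build the tail first, then append this character's byte.
impl<Tail, C, T> BuildStringChars<(T, u8)> for (Tail, C)
where
    Tail: BuildStringChars<T>,
    C: Character,
    T: 'static,
{
    const RESULT: (T, u8) = (Tail::RESULT, C::BYTE);
    const LEN: usize = Tail::LEN + 1;
}

trait StringCharsTy {
    type Result: 'static;
}

impl<C: Character> StringCharsTy for (C,) {
    type Result = u8;
}

impl<Prev: StringCharsTy, C: Character> StringCharsTy for (Prev, C) {
    type Result = (Prev::Result, u8);
}

/// The type-level list "Hey" and the value type computed for it.
type Hey = (((H,), E), Y);
type HeyChars = <Hey as StringCharsTy>::Result; // ((u8, u8), u8)

fn main() {
    let chars: HeyChars = <Hey as BuildStringChars<HeyChars>>::RESULT;
    assert_eq!(chars, ((b'H', b'e'), b'y'));
    assert_eq!(<Hey as BuildStringChars<HeyChars>>::LEN, 3);
}
```

StringCharsTy is what lets the compiler name the value type, so HeyChars never has to be written out by hand.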

Those two traits might not seem that important now, but they are actually crucial for the next step, which is defining the string type:

#[repr(C)]
pub struct TString<T> {
    val: T,
    len: usize,
}

This is now our type string. It contains the value representing the string and the length of the string. Next, let’s look at how we construct such a TString:

impl<T: 'static> TString<T> {
    pub const fn new<B: BuildStringChars<T>>() -> Self {
        Self {
            val: B::RESULT,
            len: B::LEN,
        }
    }
}

As you can see, we simply take a type that implements BuildStringChars<T> (which type strings already do, thanks to the implementations above) and extract the result from it. Now we can get to the bit of magic that gives us the string we want:

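impl<T: 'static> TString<T> {
    pub const fn get(&self) -> &str {
        // View the nested tuple as a plain sequence of bytes.
        let ptr = (&raw const self.val).cast::<u8>();
        // SAFETY: relies on `val` holding `len` contiguous `u8` values,
        // which is how `TString::new` constructs it.
        let data = unsafe {
            std::slice::from_raw_parts(ptr, self.len)
        };
        // Fall back to an empty string if the bytes are not valid UTF-8.
        match std::str::from_utf8(data) {
            Ok(x) => x,
            Err(_) => "",
        }
    }
}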

And that’s it. But wait, what are we doing here? Let’s briefly dive into the details. The function starts by getting a raw pointer to self.val and casting it to a *const u8. Next, it takes this pointer and the known length of the underlying data and converts them into a slice &[u8]. This is possible because of our discovery that the nested tuples used by TypeList have the same layout as an equivalent array. Lastly, the function converts this slice to a string. If the slice does not contain valid UTF-8, an empty string is returned; better error handling could be added later, but this is good enough for now. We can verify that this works:

struct MyProject;

impl Program for MyProject {
    type Execute = chain! { io:
        Print(type_str!(H e l l o));
        Print(type_str!(, __));
        Print(type_str!(W o r l d !));
    };
}

type Output = <MyProject as Program>::Execute;
const OUTPUT: TString<<Output as StringCharsTy>::Result>
    = TString::new::<Output>();

fn main() {
    println!("{}", OUTPUT.get()); // Prints: Hello, World!
}
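The description of get leaves better error handling for later. One possible direction, sketched here purely as an assumption, is a variant that returns the UTF-8 error instead of an empty string. try_get and Pair are hypothetical names that do not exist in the project, and Pair is a #[repr(C)] stand-in for the nested tuples so that the byte layout in this sketch is guaranteed.

```rust
use std::mem::size_of;
use std::str::Utf8Error;

/// Hypothetical `#[repr(C)]` stand-in for a two-element tuple.
#[allow(dead_code)]
#[repr(C)]
struct Pair<A, B>(A, B);

#[repr(C)]
struct TString<T> {
    val: T,
    len: usize,
}

impl<T> TString<T> {
    /// Hypothetical variant of `get` that reports invalid UTF-8
    /// instead of silently returning an empty string.
    fn try_get(&self) -> Result<&str, Utf8Error> {
        // Guard against reading past the end of `val`.
        assert!(self.len <= size_of::<T>());
        let ptr = (&self.val as *const T).cast::<u8>();
        // SAFETY: assumes `val` consists only of initialized `u8` fields
        // without padding; the length was checked against its size above.
        let data = unsafe { std::slice::from_raw_parts(ptr, self.len) };
        std::str::from_utf8(data)
    }
}

fn main() {
    let ok = TString { val: Pair(Pair(b'H', b'i'), b'!'), len: 3 };
    assert_eq!(ok.try_get().unwrap(), "Hi!");

    // 0xFF can never appear in valid UTF-8, so this reports an error.
    let bad = TString { val: Pair(0xFF_u8, b'a'), len: 2 };
    assert!(bad.try_get().is_err());
}
```

Returning a Result moves the decision about invalid output to the caller, at the cost of a slightly noisier call site than the plain get shown above.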

And that’s it for this part. While it was shorter than usual, we now have a very good foundation for upcoming features and improvements. The next part will be about introducing more language features like support for custom functions, math expressions, flow control, and more!