loading...
Cover image for Procedural macro in Rust 101

Procedural macro in Rust 101

naufraghi profile image Matteo Bertini Updated on ・8 min read

Cover Image by Photo by Ricardo Gomez Angel on Unsplash

Procedural macro in Rust 101

How to pick a function and make it a macro with added superpowers

One upon a time I had the need to take a list of items and return all the combinations of all couples.

This task in Python is easily accomplished using the built-in itertools.combinations:

itertools.combinations(iterable, r)
Return r length subsequences of elements from the input iterable.
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

That offers a baseline implementation in Python (the real one is in C and based on iterators).

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

What we have here is a functions that takes a list and an integer r and returns a tuple of length r... hmm, not (yet) possible in plain Rust, but let's focus on my problem, I currently only need couples!

Here is a basic porting in Rust (generic enough to be modified for r > 2):

fn combinations2<T>(pool: &Vec<T>) -> Vec<[&T; 2]> {
    let r = 2; // <----- is the length of this ^
    let mut res = vec![];
    let n = pool.len();
    if r > n {
        return res;
    }
    let mut indices: Vec<_> = (0..r).collect();
    // we cannot `yield` (yet) in Rust, so we need an accumulator
    res.push([&pool[indices[0]], &pool[indices[1]]]);
    loop {
        // since this is not a generator like in Python, we cannot `return`, so the code
        // is slightly changed, `find` replaces the first `for` in Python.
        match (0..r).rev().find(|&i| indices[i] != (i + n - r)) {
            Some(i) => {
                indices[i] += 1;
                for j in (i + 1)..r {
                    indices[j] = indices[j - 1] + 1;
                }
                res.push([&pool[indices[0]], &pool[indices[1]]]);
            }
            None => break,
        }
    }
    res
}

Months later, I tried to achieve something using basic macros by example1 in Rust, but with no luck, it's not easy to take a single integer and expand2 it into a list of something, the code burden was going too high, and I'd have to chose upfront some maximum value for the r number 3.

So I gave up till the more powerful procedural macro reached the almost stable point!

But here be dragons! syn, proc_macro, proc_macro2, quote!, WTF?

Parsing and quoting

Luckily the syn crate repository contains some very useful examples, in particular: examples/lazy-static is An example of parsing a custom syntax within a functionlike!(...) procedural macro.4

But lets start from the syn crate, that quoting from the README is:

Syn is a parsing library for parsing a stream of Rust tokens into a syntax tree of Rust source code.

So, given some Rust-like code, Syn will give us a syntax tree, ready to be used through all the basic types of the syntax tree, like Expr, Ident, Token, Type, and many others.
As noted in the README, Syn offers proc_macro2 a Proc macro shim , that allows Syn to support back to Rust 1.15.

The little quirk of proc_macro2 is that we are not allowed to use everything from a single place, because:

In general all of your code should be written against proc-macro2 rather than proc-macro. The one exception is in the signatures of procedural macro entry points, which are required by the language to use proc_macro::TokenStream.

Once we have parsed our input, we need to generate back some Rust code, and here kicks in the quote crate, that:

... provides the quote! macro for turning Rust syntax tree data structures into tokens of source code.

Our workflow will be:

  1. define some syntax for our procedural macro5, for example combinations!(comb3, 3) that will declare a function comb3(values: &[T]) -> Vec<[T; 3]>,
  2. define a struct to store the parsed parts from the macro call: comb3 and 3,
  3. use those values to parametrize the specialized code of combinations2,
  4. return the abstract new function and build back the Rust code for the compiler,
  5. call the new macro-generated function!

Putting the pieces together

The first piece of the puzzle is a struct that will contain the user values in the macro definition, in this case we have only a name and a number:

struct  Combinations {
    name: syn::Ident,
    n: syn::LitInt,
}

And now we need some way to parse something into this struct, and here kicks in the Parse trait a Parsing interface implemented by all types that can be parsed in a default way from a token stream:

impl Parse for Combinations {
    fn parse(input: ParseStream) -> Result<Self> {
        let name = input.parse()?;
        input.parse::<syn::Token![,]>()?;
        let n = input.parse()?;
        Ok(Combinations { name, n })
    }
}

Thanks to the Rust type inference super powers, calling input.parse()? let us parse the values into the right type or return and error if the parsing fails.
The strange line input.parse::<syn::Token![,]>()?; is a special syntax to parse simple tokens.
Now that we have parsed our input, we can write our super minimal macro, to test that everything worked as expected:

#[proc_macro]
pub fn minimal(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    quote!{
        fn #name() -> i32 {
            #n
        }
    }
}

Ops, compilation error :/

   Compiling comb v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git/comb)
error[E0308]: mismatched types
  --> comb/src/lib.rs:27:5
   |
25 |   pub fn minimal(input: TokenStream) -> TokenStream {
   |                                         ----------- expected `proc_macro::TokenStream` because of return type
26 |       let Combinations { name, n } = parse_macro_input!(input as Combinations);
27 | /     quote!{
28 | |         fn #name() -> i32 {
29 | |             #n
30 | |         }
31 | |     }
   | |_____^ expected struct `proc_macro::TokenStream`, found struct `proc_macro2::TokenStream`
   |
   = note: expected type `proc_macro::TokenStream`
              found type `proc_macro2::TokenStream`

Remember, syn offers a drop-in replacement for everything, except TokenStream, that is supposed to be the original from the compiler interfaces. Luckily the conversion from one into() the other is simple.

#[proc_macro]
pub fn minimal(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    (quote!{
        fn #name() -> i32 {
            #n
        }
    }).into()
}

Without the extra parentheses you will have another strange compilation error, sadly is not even possible to expand the macro to see exactly what's happening, ...

   Compiling comb v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git/comb)
error: expected expression, found `.`
  --> comb/src/lib.rs:31:6
   |
31 |     }.into()
   |      ^ expected expression
error[E0308]: mismatched types
  --> comb/src/lib.rs:27:5
   |
27 | /     quote!{
28 | |         fn #name() -> i32 {
29 | |             #n
30 | |         }
31 | |     }.into()
   | |_____^ expected (), found struct `proc_macro2::TokenStream`
   |
   = note: expected type `()`
              found type `proc_macro2::TokenStream`

Did I said expand? Yes, we can look the macro expanded code with the cargo expand command, and this is the result:

$ cargo expand --bin comb
    Checking circle v0.1.0 (/home/naufraghi/Documents/src/circle-rs.git)
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::v1::*;
#[macro_use]
extern crate std as std;
fn mini3() -> i32 { 3 }    // <----- here is our expansion!
// ...

Summing all up, we have this pipeline:

TokenStream -> parse_macro_input!(...) -> quote!{...} -> TokenStream

And inside quote!{} the variables are accessed with the #name syntax.

Extra powers

Here we are, we have written a lot more code to reach a point we already know how to reach with the good old macros by example, but we had a problem, how to expand a number into a list, because this is the monster of this level:

res.push([&pool[indices[0]], &pool[indices[1], ...]])

We need some way to accumulate a little piece of syntax, &pool[indices[#i]], and at the end join the parts into a comma separated list. Digging a bit into syn, what we need is syn::punctuated:

A punctuated sequence of syntax tree nodes separated by punctuation.
Lots of things in Rust are punctuated sequences.

  • The fields of a struct are Punctuated<Field, Token![,]>. ...

What we can do is:

#[proc_macro]
pub fn punctuated(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as i32)
        .map(|i| quote! { #i })  // <-- add the quoted variable, `i` as a token
        .collect();
    (quote! {
      fn #name() -> Vec<i32> {
        vec![#indices]  // put the inner part of the list into something
      }
    })
    .into()
}

We can finally call the macro punctuated!(punct4, 4); that expands into:

fn punct4() -> Vec<i32> { <[_]>::into_vec(box [0i32, 1i32, 2i32, 3i32]) }

We now have all the pieces to make our combinations function generic on the size r of the returned array:

#[proc_macro]
/// `combinations!(comb, N)` macro will define a function `comb` that takes a `&[T]` and returns
/// a vector of arrays of length _N_, `Vec<[&T; N]>`, covering all the combinations of _N_ elements
/// from the source vector.
pub fn combinations(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let pool_indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as usize)
        .map(|i| quote! { &pool[indices[#i]] })
        .collect();
    let comb = quote! {
        fn #name<T>(pool: &[T]) -> Vec<[&T; #n]> {
            let r = #n;
            let mut res = vec![];
            let n = pool.len();
            if r > n {
                return res;
            }
            let mut indices: Vec<_> = (0..r).collect();
            res.push([#pool_indices]);
            loop {
                match (0..r).rev().find(|&i| indices[i] != (i + n - r)) {
                    Some(i) => {
                        indices[i] += 1;
                        for j in (i + 1)..r {
                            indices[j] = indices[j - 1] + 1;
                        }
                        res.push([#pool_indices]);
                    },
                    None => break,
                }
            }
            res
        }
    };
    comb.into()
}

Here we are, we can now define with a simple macro call a function with the wanted size, in the future we may also create and call a macro in a single shot, but not yet 5.

A final note, the macro can be used only from another crate, the suggested layout is highlighted in the lazy-static example.

Thanks to @dodomorandi for the review of the first draft and for the suggestions about a more idiomatic combinations2, and for his Iterator based implementation, and thanks to @lu-zero for the review and for the suggestion to avoid syn and search for a simpler method, task I left as an exercise for the reader.

Iterator based implementation

#[proc_macro]
pub fn iter_combinations(input: TokenStream) -> TokenStream {
    let Combinations { name, n } = parse_macro_input!(input as Combinations);
    let pool_indices: syn::punctuated::Punctuated<_, syn::Token![,]> = (0..n.value() as usize)
        .map(|i| quote! { &self.pool[self.indices[#i]] })
        .collect();

    let initial_indices: syn::punctuated::Punctuated<_, syn::Token![,]> =
        (0..n.value() as usize).collect();

    let iter_name = proc_macro2::Ident::new(
        &format!("CombIndicesIter{}", n.value()),
        proc_macro2::Span::call_site(),
    );

    let comb = quote! {
        pub struct #iter_name<'a, T> {
            pool: &'a [T],
            indices: [usize; #n],
            started: bool,
        }

        impl<'a, T> Iterator for #iter_name<'a, T> {
            type Item = [&'a T; #n];

            fn next(&mut self) -> Option<Self::Item> {
                if !self.started {
                    self.started = true;
                    Some([#pool_indices])
                } else {
                    let n = self.pool.len();
                    (0..#n).rev().find(|&i| self.indices[i] != i + n - #n)
                        .map(|i| {
                                self.indices[i] += 1;
                                for j in (i + 1)..#n {
                                    self.indices[j] = self.indices[j - 1] + 1;
                                }
                                [#pool_indices]
                        })
                }
            }
        }

        fn #name<T>(pool: &[T]) -> impl Iterator<Item = [&T; #n]> {
            #iter_name {
                pool,
                indices: [#initial_indices],
                started: false,
            }
        }
    };
    comb.into()
}

  1. Super in-depth intro by DanielKeep@github: A Practical Intro to Macros in Rust 1.0 

  2. I'd have to expand 2 -> 0, 1, and then map it to &pool[indices[0]], &pool[indices[1]] 

  3. Again by DanielKeep, The Little Book of Rust Macros: Counting 

  4. One may try to avoid syn and use a combination of macro by example and procedural macros, but this is left as an exercise for the reader. 

  5. Sadly the usage of a macro in expression position, like let res = combinations!(&data, 3), is feature gated behind #![feature(proc_macro_hygiene)] and limited to the nightly edition, so we use another approach. 

Posted on Mar 17 '19 by:

naufraghi profile

Matteo Bertini

@naufraghi

I'm a developer since the ages of gorilla.bas, I program in Python 🐍 and I'm searching good reasons to use more Rust

Discussion

markdown guide
 

I wish to read more articles like this 😁