I have been working on a side project in Rust recently and thought I would write up some of the interesting things I've encountered while building something substantial and learning Rust at the same time.

Early in the project, I had a need to some strings into structured types. Coming from Haskell, my first thought was to use parser combinators; however, I was open to other options if there was something better or more conventional in the Rust ecosystem. It is probably worth a separate post to explore the parsing ecosystem, but long story short: none of the solutions really stood out to me. In the realm of parser combinators, the most prominent solution seems to be nom, but it really did not click for me and seemed to have poor composition tools, which is a problem for a parser combinator library. I ended up using combine, which is more like what I would expect from a parser combinator library.

Consider this definition of types for a language:

pub enum Type {
    PrimInteger,
    PrimBoolean,
    PrimString,
    List(Box<Type>),
}

The first three constructors are primitive types, while the last type is a recursive type that can contain other types. For example, List<List<PrimInteger>> is allowed. I'll walk through my journey trying to parse this simple type. First, start with some imports for the combine library:

use combine::parser::char::string;
use combine::{between, choice, parser};
use combine::{ParseError, Parser, Stream};

Next, this combinator parses the basic types:

fn basic_type_parser<Input>() -> impl Parser<Input, Output = Type>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    choice!(
        string("int").map(|_| Type::PrimInteger),
        string("boolean").map(|_| Type::PrimBoolean),
        string("string").map(|_| Type::PrimString)
    )
}

I want to make a few observations before continuing. The basic combinator implementation with the choice! macro and simple string parsing combinator are exactly what I expect from a parser combinator library. I miss the clean Applicative interface from Haskell's parser combinator libraries, but the map interface in combine is fine. I like types, but the type signature is unwieldy and bordering on unreasonable; the required definitions in the where clause are especially egregious.

Next, the following two parser combinators build on the primitive combinator above to parse the rest of the type language:

// A parser for the list type (note that List<List<X>> is allowed, so this can
// be recursive)
fn aggregate_type_parser<Input>() -> impl Parser<Input, Output = Type>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    choice!(
        (
            string("List"),
            between(string("<"), string(">"), type_parser())
        ).map(|(_, ty)| Type::List(Box::new(ty)))
    )
}

// A top-level parser to parse any type
fn type_parser<Input>() -> impl Parser<Input, Output = Type>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    basic_type_parser().or(aggregate_type_parser())
}

Unfortunately, this does not compile. The compiler gives the following error:

error[E0720]: cannot resolve opaque type
   --> src/main.rs:27:38
    |
27  |   fn aggregate_type_parser<Input>() -> impl Parser<Input, Output = Type>
    |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ recursive opaque type
...
33  | /         (
34  | |             string("List"),
35  | |             between(string("<"), string(">"), type_parser())
36  | |         )
37  | |             .map(|(_, ty)| Type::List(Box::new(ty)))
    | |                                                    -
    | |____________________________________________________|
    | |____________________________________________________returning here with type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`
    | |____________________________________________________returning here with type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`
    | |____________________________________________________returning here with type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`
    |                                                      returning here with type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`
...
41  | | fn type_parser<Input>() -> impl Parser<Input, Output = Type>
    | |                            --------------------------------- returning this opaque type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`
    |
   ::: /home/tristan/.cargo/registry/src/github.com-1ecc6299db9ec823/combine-4.6.6/src/parser/char.rs:260:46
    |
260 |   pub fn string<'a, Input>(s: &'static str) -> impl Parser<Input, Output = &'a str>
    |                                                ------------------------------------ returning this opaque type `combine::parser::combinator::Map<(impl Parser<Input, Output = &str>, Between<Input, impl Parser<Input, Output = &str>, impl Parser<Input, Output = &str>, impl Parser<Input, Output = Type>>), [closure@src/main.rs:37:18: 37:27]>`

error[E0720]: cannot resolve opaque type
  --> src/main.rs:41:28
   |
13 | fn basic_type_parser<Input>() -> impl Parser<Input, Output = Type>
   |                                  --------------------------------- returning this opaque type `Or<impl Parser<Input, Output = Type>, impl Parser<Input, Output = Type>>`
...
27 | fn aggregate_type_parser<Input>() -> impl Parser<Input, Output = Type>
   |                                      --------------------------------- returning this opaque type `Or<impl Parser<Input, Output = Type>, impl Parser<Input, Output = Type>>`
...
41 | fn type_parser<Input>() -> impl Parser<Input, Output = Type>
   |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ recursive opaque type
...
46 |     basic_type_parser().or(aggregate_type_parser())
   |     -----------------------------------------------
   |     |
   |     returning here with type `Or<impl Parser<Input, Output = Type>, impl Parser<Input, Output = Type>>`
   |     returning here with type `Or<impl Parser<Input, Output = Type>, impl Parser<Input, Output = Type>>`

This set of errors is overwhelming, but has a single cause. The first error message indicates that impl Parser<Input, Output = Type> (the return type of the aggregate type parser) is a "recursive opaque type". It sure is. The impl Trait feature is a way to tell the compiler that the function will return some type that implements the named trait and ask the compiler to infer the concrete type. My understanding is that the Rust compiler uses a constraint solver to do so, but will not attempt to solve impl Trait declarations for functions that occur in recursive cycles. That is a problem for parser combinators, where recursion is essential.

It seems like it should be possible to break the recursive cycle by writing the type by hand to save the compiler from having to solve it. It is not clear what that type should really be; I suspect that it is challenging to write. It turns out that the right answer is to rewrite the functions as follows:

fn type_parser<Input>() -> impl Parser<Input, Output = Type>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    type_parser_()
}

parser! {
    fn type_parser_[Input]()(Input) -> Type
    where [ Input: Stream<Token = char> ]
    {
        basic_type_parser().or(aggregate_type_parser())
    }
}

The parser! macro provided by the combine library makes the type explicit, accomplishing that goal1 1 I was not able to find this explicitly called out in the documentation. I found it by reading an example in the GitHub repository. . While this works, it is unfortunate that it has to be this way. There are at least two major factors at play here:

  1. The design of the combine library makes this problem inevitable
  2. The implementation of the Rust language feels incomplete

The combine library made the choice for one style of ergonomic type signature, which relies on the impl Trait pattern. I do not know what all of the alternatives look like, so it is difficult to evaluate that choice right now.

The Rust language chooses not to require or support inference of impl Trait types in the presence of recursion. That does not seem like an inherent limitation to the model, as general constraint solvers have no trouble with recursion. It seems like this is omitted because the implementation cannot reliably handle it. I have not looked for sources on that, but I would be shocked if it was computationally infeasible to support.

For now, the parser! macro is an acceptable workaround. I hope the Rust language evolves to be more general and reduce the number of sharp edges in the type system.