Braden++

One char at a time

Last time I had a brief explanation of the library I want to design. I want to create parser combinators using expression templates, where all information is in the parser types themselves, with no data members.

Writing this library has been a hugely enjoyable experience. I’m sure I’ll find flaws and issues to be addressed, after writing these articles. So let’s start simple and work up to the more complex cases. How simple can we get? How about a type that parses exactly 1 character.


Exactly 1 character

I want a type with a function that takes a string as an input, and returns whether the input starts with a given char. Because I want all parser data available as part of the type itself, the char to match will be a template parameter, and the parsing function will be static.

template <char c>
struct OneChar
{
    static constexpr bool parse(std::string_view input)
    {
        return not input.empty() && input.front() == c;
    }
};

This does exactly what it says on the tin. The following code will print “Success” to cout;

OneChar<'a'> parser;
if (parser.parse("abc"))
    std::cout << "Success";
else
    std::cout << "Failure";


Exactly 1 character part 2 - from a set

Instead of what I wrote above, I found it much more useful to actually take a StaticString as a non-type template parameter to OneChar, instead of just a single char. I wrote the first article for a reason eh? (<- to show I’m Canadian)

This way, we can match one character from a set of characters. By “set” I actually do mean a sorted and unique list of characters. A binary search is less efficient than a linear search in this case, because of the small list size (this is anecdotal for my code; please benchmark your own code). However, even with that being true, keeping the list unique will help. There’s no need to check the same character twice, and there won’t be multiple equivalent-but-separate template instantiations with the same characters in different orders and quantities.

I’m renaming it AnyOf, instead of OneChar.

template <StaticString str>
requires SortedAndUniqued<str>
struct AnyOf
{
    static constexpr bool parse(std::string_view input)
    {
        return not input.empty() && str.contains(input.front());
    }
};

This looks basically the same as before. But let’s add a few things to our StaticString so it compiles.


Adding a few things to StaticString

These functions aren’t strictly necessary. They could obviously be implemented manually at the call site, but I plan on reusing them in other places. I think size(), view(), and contains() all make sense as member functions, since they fit a large variety of purposes. On the other hand, functions like is_sorted_and_uniqued() (the implementation of the concept SortedAndUniqued) make more sense as free functions. There’s an infinite set of “querying” functions we could write, and we don’t want to make our class infinitely large. As they say, “Classes are made of Velcro”.

template <std::size_t N>
struct StaticString
{
    // ...
    constexpr std::string_view view() const
    {
        return std::string_view(data.data(), N);
    }

    constexpr bool contains(char c) const
    {
        return view().find(c) != std::string_view::npos;
    }

    constexpr std::size_t size() const
    {
        return N;
    }
    // ...
}

constexpr bool is_sorted_and_uniqued(const char* arr, std::size_t N)
{
    if (N <= 1)
        return true;

    for (std::size_t i = 0; i != N - 1; ++i)
        if (arr[i + 1] <= arr[i])
            return false;
    return true;
}

template <StaticString str>
concept SortedAndUniqued = is_sorted_and_uniqued(str.data.data(), str.size());

I would like to use std::views::slide for is_sorted_and_uniqued(), but that’s in C++23 and I want to support C++20. Maybe this will be motivation to remove std::ranges from this class entirely, but that’ll be a topic for another time. For now, I’m writing is_sorted_and_uniqued() as a non-template function, to be used in the concept SortedAndUniqued.

I also think it’s important to add a constructor and a deduction guide from char to a StaticString<1>, so that this AnyOf<'a'> works instead of requiring AnyOf<"a">. It’s a minor detail, but I prefer it. Did you catch the usage of OneChar<'a'> above?

template <std::size_t N>
struct StaticString
{
    // ...
    constexpr StaticString(char c) requires (N == 1)
	{
		data[0] = c;
	}
    // ...
}

StaticString(char) -> StaticString<1>;


AnyOf in practice, feat. compile time testing

For now, this is what using an AnyOf parser will look like.

AnyOf<"ab"> parser;
if (parser.parse("abc"))
    std::cout << "Success - abc\n";
if (parser.parse("bca"))
    std::cout << "Success - bca\n";
if (not parser.parse("cab"))
    std::cout << "Failure - cab\n";

All of these lines will be printed. Even better, because we made the parse() function constexpr, then we can check the correctness at compile time! Like so:

AnyOf<"ab"> parser;
static_assert(parser.parse("abc"));
static_assert(parser.parse("bca"));
static_assert(not parser.parse("cab"));

I’m a huge fan of compile time testing. If my code compiles, then it’s correct. How amazing is that?!

Also notice that parser is a run time variable, and yet we can use it inside of a static_assert. This is because parse() is a static function, so it’s irrelevant whether the object itself is constexpr or not. In terms of the semantics, it’s more precise to instead write decltype(parser)::parse(), but it’s not as nice to read. And it’s exactly equivalent either way.


A problem lurking

Back in December, I was working on Advent of Code, and I encountered an MSVC compiler bug that made my AnyOf parser give different results at compile time and run time. But I’ll leave that story for next time.

If you have been, thanks for reading! Here is the code written in this article.