Iterators++, Part 2

Disclaimer: This is a long, boring post about minutiae. For serious library wonks only.

This is the third in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. In the first post I explained what proxy iterators are (an iterator like vector<bool>’s that, when dereferenced, returns a proxy object rather than a real reference) and three specific difficulties they cause in today’s STL:

  1. What, if anything, can we say in general about the relationship between an iterator’s value type and its reference type?
  2. How do we constrain higher-order algorithms like for_each and find_if that take functions that operate on a sequence’s elements?
  3. How do we implement algorithms that must swap and move elements around, like sort and reverse?

In the second post, I zoomed in on problem (3) and showed how the existing std::iter_swap API could be pressed into service, along with a new API that I propose: std::iter_move. Together, these APIs give an iterator a channel through which to communicate to the algorithms how its elements should be swapped and moved. With the addition of the iter_move API, iterators pick up a new associated type: rvalue_reference, which can live in std::iterator_traits alongside the existing value_type and reference associated types.

In this post, I’ll dig into the first problem: how we define in code what an iterator is.

Values and References

As in the first two articles, I’ll use the zip view to motivate the discussion, because it’s easy to grok and yet totally bedeviling for the STL algorithms. Recall that zip lazily adapts two sequences by making them look like one sequence of pairs, as demonstrated below:

std::vector<int> x{1,2,3,4};
std::vector<int> y{9,8,7,6};

using namespace ranges;
auto zipped = view::zip(x, y);

assert(*zipped.begin() == std::make_pair(1,9));
assert(&(*zipped.begin()).first == &x[0]);

As the two assertions above show, dereferencing a zip iterator yields a pair, and that pair is actually a pair of references pointing into the underlying sequences. The zip range above has the following associated types:

Associated type…     … for the zip view
value_type           pair<int, int>
reference            pair<int &, int &>
rvalue_reference     pair<int &&, int &&>
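For readers without range-v3 at hand, the difference between the zip view's reference and value_type can be sketched with plain std::pair. The helper zip_at below is hypothetical (it is not part of range-v3); it only mimics what dereferencing a zip iterator at position i would hand back.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: mimics dereferencing a zip iterator at position i.
// The result plays the role of the zip view's `reference` type.
inline std::pair<int&, int&> zip_at(std::vector<int>& x,
                                    std::vector<int>& y,
                                    std::size_t i) {
    return std::pair<int&, int&>(x[i], y[i]);
}

// Writes through the proxy and reports whether the underlying vector changed.
inline bool write_through_proxy(std::vector<int>& x, std::vector<int>& y) {
    std::pair<int&, int&> ref = zip_at(x, y, 0);  // "reference": aliases x[0], y[0]
    std::pair<int, int> val = ref;                // "value_type": an independent copy
    ref.second = 0;                               // modifies y[0]
    return y[0] == 0 && val.second != 0;
}
```

Writing through the pair of references changes the underlying vectors, while the pair<int, int> copy taken beforehand is unaffected.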

With Concepts coming to C++, we’re going to need to say in code what an iterator is. The Palo Alto TR, published in 2012, takes a stab at it: an InputIterator is Readable and Incrementable, where Readable is defined as follows:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };

This says that a Readable type has an associated ValueType. It also says that *i is a valid expression, and that the result of *i must be convertible to const ValueType<I> &. This is fine when *i returns something simple like a real reference. But when it returns a proxy reference, like the zip view does, it causes problems.

Substituting a zip iterator into the requires clause above results in something like this:

const pair<int,int>& x = *i;

This tries to initialize x with a pair<int&, int&>. This actually works in a sense; the temporary pair<int &, int &> object is implicitly converted into a temporary pair<int, int> by copying the underlying integers, and that new pair is bound to the const & because temporaries can bind to const references.

But copying values is not what we want or expect. If instead of ints, we had pairs of some move-only type like unique_ptr, this wouldn’t have worked at all.
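Both observations can be checked with plain std::pair, as a small sketch. The function name binding_makes_a_copy and the alias ptr are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Returns true when binding a pair of references to `pair<int,int> const&`
// produced an independent copy instead of a view of the original ints.
inline bool binding_makes_a_copy() {
    int a = 1, b = 9;
    std::pair<int&, int&> proxy(a, b);
    std::pair<int, int> const& bound = proxy;  // a temporary pair<int,int> is created
    a = 42;                                    // change the underlying element
    return bound.first == 1 && proxy.first == 42;
}

using ptr = std::unique_ptr<int>;  // a move-only element type

// With copyable elements the conversion exists (and copies).
static_assert(std::is_convertible<std::pair<int&, int&>,
                                  std::pair<int, int> const&>::value,
              "pair<int&,int&> converts to pair<int,int> by copying");

// With move-only elements there is nothing to copy, so no conversion exists.
static_assert(!std::is_convertible<std::pair<ptr&, ptr&>,
                                   std::pair<ptr, ptr> const&>::value,
              "pair<ptr&,ptr&> cannot be converted to pair<ptr,ptr>");
```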

So the Readable concept needs to be tweaked to handle proxy references. What can we do?

One simple way to make the zip iterator model the Readable concept is to simply remove the requirement that *i be convertible to const ValueType<I>&. This is unsatisfying. Surely there is something we can say about the relationship between an iterator’s reference type and its value type. I think there is, and there’s a hint in the way the Palo Alto TR defines the EqualityComparable constraint.

Common Type Constraints

What do you think about code like this?

vector<string> strs{"three", "blind", "mice"};
auto it = find(strs.begin(), strs.end(), "mice");

Seems reasonable, right? This searches a range of strings for a char const*. This should work, even though it’s looking for an orange in a bucket of apples, because the orange is sufficiently apple-like and because we know how to compare apples and oranges; i.e., there is an operator== that compares strings with char const*. But what does “sufficiently apple-like” mean? If we are ever to constrain the find algorithm with Concepts, we need to be able to say in code what “apple-like” means for any apple and any orange.

The Palo Alto TR doesn’t think that the mere existence of an operator== is enough. Instead, it defines the cross-type EqualityComparable concept as follows:

template< typename T1, typename T2 >
concept bool EqualityComparable =
    EqualityComparable<T1> &&
    EqualityComparable<T2> &&
    Common<T1, T2> &&
    EqualityComparable< std::common_type_t<T1, T2> > &&
    requires(T1 a, T2 b) {
        { a == b } -> bool;
        { b == a } -> bool;
        { a != b } -> bool;
        { b != a } -> bool;
        /* axioms:
            using C = std::common_type_t<T1, T2>;
            a == b <=> C{a} == C{b};
            a != b <=> C{a} != C{b};
            b == a <=> C{b} == C{a};
            b != a <=> C{b} != C{a};
        */
    };

In words, what this says is that for two different types to be EqualityComparable, they each individually must be EqualityComparable (i.e., with themselves), they must be comparable with each other, and (the key bit) they must share a common type which is also EqualityComparable, with identical semantics.

The question then becomes: do std::string and char const * share a common type, to which they can both be converted, and which compares with the same semantics? In this case, the answer is trivial: std::string is the common type.
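The claim about std::string and char const* can be spot-checked with std::common_type_t. This is a minimal sketch; the helper names agrees_with_common_type and contains are invented here, and the axiom is only tested for the values passed in, not proved in general.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// std::string is the common type of std::string and char const*.
static_assert(std::is_same<std::common_type_t<std::string, char const*>,
                           std::string>::value,
              "std::string should be the common type");

// Checks the axiom `a == b <=> C{a} == C{b}` for one pair of values.
inline bool agrees_with_common_type(std::string const& a, char const* b) {
    using C = std::common_type_t<std::string, char const*>;
    return (a == b) == (C(a) == C(b));
}

// The search from the example: a char const* needle in a range of strings.
inline bool contains(std::vector<std::string> const& strs, char const* needle) {
    return std::find(strs.begin(), strs.end(), needle) != strs.end();
}
```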

Aside: why does the Palo Alto TR place this extra CommonType requirement on the argument to find when surely that will break some code that works and is “correct” today? It’s an interesting question. The justification is mathematical and somewhat philosophical: when you compare things for equality, you are asking if they have the same value. Just because someone provides an operator== to compare, say, an Employee with a SocialSecurityNumber doesn’t make an employee a social security number, or vice versa. If we want to be able to reason mathematically about our code (and we do), we have to be able to substitute like for like. Being able to apply equational reasoning to our programs is a boon, but we have to play by its rules.

Readable and Common

You may be wondering what any of this has to do with the Readable concept. Let’s look again at the concept as the Palo Alto TR defines it:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };

To my mind, what this is trying to say is that there is some substitutability, some mathematical equivalence, between an iterator’s reference type and its value type. EqualityComparable uses Common to enforce that substitutability. What if we tried to fix Readable in a similar way?

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires Common< ValueType<I>, decltype(*i) >;
    };

Here we’re saying that for Readable types, the reference type and the value type must share a common type. The common type is computed using something like std::common_type_t, which basically uses the ternary conditional operator (?:). (I say “something like” since std::common_type_t isn’t actually up to the task. See lwg2408 and lwg2465.)

Sadly, this doesn’t quite solve the problem. If you try to do common_type_t<unique_ptr<int>, unique_ptr<int>&> you’ll see why. It doesn’t work, despite the fact that the answer seems obvious. The trouble is that common_type always strips top-level const and reference qualifiers before testing for the common type with the conditional operator. For move-only types, that causes the conditional operator to barf.

I’ve always found it a bit odd that common_type decays its arguments before testing them. Sometimes that’s what you want, but sometimes (like here) it’s not. Instead, what we need is a different type trait that tests for the common type, but preserves reference and cv qualifications. I call it common_reference. It’s a bit of a misnomer though, since it doesn’t always return a reference type, although it might.

The common reference of two types is the minimally qualified type to which objects of both types can bind. common_reference will try to return a reference type if it can, but fall back to a value type if it must. Here are some examples to give you a flavor:

Common reference…                            … result
common_reference_t<int &, int const &>       int const &
common_reference_t<int &&, int &&>           int &&
common_reference_t<int &&, int &>            int const &
common_reference_t<int &, int>               int
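C++20's standard library ships a trait of the same name, std::common_reference_t, and it gives the same answers for these four rows. The sketch below assumes a C++20 compiler and is guarded so that it is simply skipped in older language modes; the helper yields is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

#if __cplusplus >= 202002L  // std::common_reference_t needs C++20

// Hypothetical helper: does common_reference_t<A, B> name exactly Expected?
template <class A, class B, class Expected>
constexpr bool yields =
    std::is_same_v<std::common_reference_t<A, B>, Expected>;

static_assert(yields<int&,  int const&, int const&>);
static_assert(yields<int&&, int&&,      int&&>);
static_assert(yields<int&&, int&,       int const&>);
static_assert(yields<int&,  int,        int>);

#endif
```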

With a common_reference type trait, we could define a CommonReference concept and specify Readable in terms of it, as follows:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;
    };

The above concept requires that there is some common reference type to which both *i and a mutable object of the iterator’s value type can bind.

This, I think, is sufficiently general to type check all the iterators that are valid today, as well as iterators that return proxy references (though it takes some work to see that). We can further generalize this to accommodate the iter_move API I described in my previous post:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;          // (1)
        requires CommonReference<
            decltype(iter_move(i)) &&,
            decltype(*i) && >;          // (2)
        requires CommonReference<
            ValueType<I> const &,
            decltype(iter_move(i)) &&>; // (3)
    };
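In released C++20 syntax (as opposed to the Concepts TS syntax used above), the three constraints can be sketched with standard-library pieces: std::common_reference_with plays the role of CommonReference, and std::iter_value_t, std::iter_reference_t and std::iter_rvalue_reference_t stand in for ValueType<I>, decltype(*i) and decltype(iter_move(i)). The name ReadableSketch is made up, and the block is guarded so it is skipped before C++20.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

#if __cplusplus >= 202002L  // concepts and <concepts> need C++20
#include <concepts>

// A sketch of the re-specified Readable, not a definitive definition.
template <class I>
concept ReadableSketch =
    std::semiregular<I> &&
    std::common_reference_with<std::iter_value_t<I>&,
                               std::iter_reference_t<I>&&> &&          // (1)
    std::common_reference_with<std::iter_rvalue_reference_t<I>&&,
                               std::iter_reference_t<I>&&> &&          // (2)
    std::common_reference_with<std::iter_value_t<I> const&,
                               std::iter_rvalue_reference_t<I>&&>;     // (3)

// Iterators that return real references satisfy all three constraints.
static_assert(ReadableSketch<int*>);
static_assert(ReadableSketch<std::vector<int>::iterator>);

#endif
```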

OK, let’s see how this works in practice.

Iterators and CommonReference

First, let’s take the easy case of an iterator that returns a real reference like int&. The requirements are that its value type, reference type, and rvalue reference type satisfy the three CommonReference constraints above: (1) requires a common reference between int& and int&, (2) between int&& and int&, and (3) between int const& and int&&. These are all demonstrably true, so this iterator is Readable.

But what about the zip iterator? Things here are much trickier.

The three common reference constraints for the zip iterator amount to this:

Common reference…                                                     … result
common_reference_t<pair<int,int> &, pair<int&,int&> &&>               ???
common_reference_t<pair<int&&,int&&> &&, pair<int&,int&> &&>          ???
common_reference_t<pair<int,int> const &, pair<int&&,int&&> &&>       ???

Yikes. How is the common_reference trait supposed to evaluate this? The ternary conditional operator is just not up to the task.

OK, let’s first imagine what we would like the answers to be. Taking the last one first, consider the following code:

void foo( pair< X, Y > p );

pair<int,int> const & a = /*...*/;
pair<int &&,int &&> b {/*...*/};

foo( a );
foo( move(b) );

If there are types that we can pick for X and Y that make this compile, then we can make pair<X,Y> the “common reference” for pair<int&&,int&&>&& and pair<int,int> const &. Indeed there are: X and Y should both be int const &.

In fact, for each of the CommonReference constraints, we could make the answer pair<int const&,int const&> and be safe. So in principle, our zip iterator can model the Readable concept. W00t.
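That pair<int const&,int const&> accepts every one of these argument kinds can be confirmed with plain std::pair. In this sketch, sum and sum_all_kinds are invented names.

```cpp
#include <cassert>
#include <utility>

// Candidate "common reference" for the zip iterator's associated types.
using const_ref_pair = std::pair<int const&, int const&>;

inline int sum(const_ref_pair p) { return p.first + p.second; }

// Passes each kind of pair from the three constraints to sum().
inline int sum_all_kinds() {
    int x = 1, y = 9;
    std::pair<int, int> value(2, 8);
    std::pair<int, int> const& a = value;                   // value_type const&
    std::pair<int&&, int&&> b(std::move(x), std::move(y));  // rvalue_reference
    std::pair<int&, int&> c(x, y);                          // reference
    return sum(value)          // pair<int,int>&
         + sum(a)              // pair<int,int> const&
         + sum(std::move(b))   // pair<int&&,int&&>&&
         + sum(std::move(c));  // pair<int&,int&>&&
}
```

Every call binds the two int const& members directly to existing ints, so nothing is copied and nothing dangles.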

But look again at this one:

common_reference_t<pair<int,int> &, pair<int&,int&> &&>

If this coughs up pair<int const&,int const&> then we’ve lost something in the translation: the ability to mutate the elements of the pair. In an ideal world, the answer would be pair<int&,int&> because a conversion from both pair<int,int>& and pair<int&,int&>&& would be safe and would meet the “minimally qualified” spirit of the common_reference trait. But this code doesn’t compile:

void foo( pair< int&,int& > p );

pair<int,int> a;
pair<int&,int&> b {/*...*/};

foo( a );       // ERROR here
foo( move(b) );

Unfortunately, pair doesn’t provide this conversion, even though it would be safe in theory. Is that a defect? Perhaps. But it’s something we need to work with.

Long story short, the solution I went with for range-v3 is to define my own pair-like type with the needed conversions. I call it common_pair and it inherits from std::pair so that things behave as you would expect. With common_pair and a few crafty specializations of common_reference, the Readable constraints are satisfied for the zip iterator as follows:

Common reference…                                                                   … result
common_reference_t<pair<int,int> &, common_pair<int&,int&> &&>                      common_pair<int&,int&>
common_reference_t<common_pair<int&&,int&&> &&, common_pair<int&,int&> &&>          common_pair<int const&,int const&>
common_reference_t<pair<int,int> const &, common_pair<int&&,int&&> &&>              common_pair<int const&,int const&>

Computing these types is not as tricky as it may appear at first. For types like pair<int,int>& and common_pair<int&,int&>&&, it goes like this:

  1. Distribute any top-level ref and cv qualifiers to the members of the pair. pair<int,int>& becomes pair<int&,int&>, and common_pair<int&,int&>&& becomes common_pair<int&,int&>.
  2. Compute the element-wise common reference, and bundle the result into a new common_pair, resulting in common_pair<int&,int&>.
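The two steps above can be sketched as a pair of alias templates. This is not range-v3's implementation: it assumes C++20's std::common_reference_t for the element-wise step, uses std::pair as a stand-in for common_pair, and only computes the result type (it says nothing about whether the needed conversions exist). The names element_ref_t and pairwise_common_reference_t are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

#if __cplusplus >= 202002L  // std::common_reference_t needs C++20

// Step 1: push P's top-level ref/cv qualifiers down onto element N.
// std::get on a pair already does this: pair<A,B>& -> A&, pair<A,B>&& -> A&&
// (with reference collapsing when A is itself a reference).
template <std::size_t N, class P>
using element_ref_t = decltype(std::get<N>(std::declval<P>()));

// Step 2: take the element-wise common reference and bundle the results.
template <class P, class Q>
using pairwise_common_reference_t = std::pair<
    std::common_reference_t<element_ref_t<0, P>, element_ref_t<0, Q>>,
    std::common_reference_t<element_ref_t<1, P>, element_ref_t<1, Q>>>;

static_assert(std::is_same_v<
    pairwise_common_reference_t<std::pair<int, int>&, std::pair<int&, int&>&&>,
    std::pair<int&, int&>>);
static_assert(std::is_same_v<
    pairwise_common_reference_t<std::pair<int&&, int&&>&&, std::pair<int&, int&>&&>,
    std::pair<int const&, int const&>>);
static_assert(std::is_same_v<
    pairwise_common_reference_t<std::pair<int, int> const&, std::pair<int&&, int&&>&&>,
    std::pair<int const&, int const&>>);

#endif
```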

Generalizing

Our zip iterator, with enough ugly hackery, can model our re-specified Readable concept. That’s good, but what about other proxy reference types, like vector<bool>’s? If vector<bool>’s reference type is bool_ref, then we would need to specialize common_reference such that the Readable constraints are satisfied. This will necessarily involve defining a type such that it can be initialized with either a bool_ref or with a bool&. That would be a decidedly weird type, but it’s not impossible. (Imagine a variant<bool&,bool_ref> if you’re having trouble visualizing it.)

Getting vector<bool>’s iterators to fit the mold is an ugly exercise in hackery, and actually using its common reference (the variant type) would incur a performance hit for every read and write. But the STL doesn’t actually need to use it. It just needs to exist.
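To make that weird type a little more concrete, here is a minimal sketch. Both classes are hypothetical: bool_ref is a hand-rolled stand-in for vector<bool>'s proxy (one bit inside a byte), and bool_common_ref is one possible shape for a type that can be initialized from either a bool& or a bool_ref. The branch in every read and write is the performance hit mentioned above.

```cpp
#include <cassert>

// Hypothetical stand-in for vector<bool>'s proxy reference: one bit in a byte.
struct bool_ref {
    unsigned char* byte;
    unsigned char mask;
    operator bool() const { return (*byte & mask) != 0; }
    const bool_ref& operator=(bool v) const {
        *byte = static_cast<unsigned char>(v ? (*byte | mask) : (*byte & ~mask));
        return *this;
    }
};

// Hypothetical common reference: binds to either a real bool& or a bool_ref.
class bool_common_ref {
    bool* real_;      // non-null when bound to a real bool
    bool_ref proxy_;  // used when real_ is null
public:
    bool_common_ref(bool& b) : real_(&b), proxy_{nullptr, 0} {}
    bool_common_ref(bool_ref r) : real_(nullptr), proxy_(r) {}

    // Every read has to branch on which kind of reference is held.
    operator bool() const { return real_ ? *real_ : static_cast<bool>(proxy_); }

    // Every write has to branch too.
    const bool_common_ref& operator=(bool v) const {
        if (real_) *real_ = v; else proxy_ = v;
        return *this;
    }
};
```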

What is the point of jumping through these hoops to implement an inefficient type that in all likelihood will never actually be used? This is going to be unsatisfying for many, but the answer is for the sake of mathematical rigour. There must be some substitutability relationship between an iterator’s reference type and its value type that is enforceable. Requiring that they share a common reference is the best I’ve come up with so far. And as it turns out, this “useless” type actually does have some uses, as we’ll see in the next installment.

Summary

So here we are. There is a way to define the Readable concept — and hence the InputIterator concept — in a way that is general enough to permit proxy iterators while also saying something meaningful and useful about an iterator’s associated types. Actually defining a proxy iterator such that it models this concept is no small feat and requires extensive amounts of hack work. BUT IT’S POSSIBLE.

One could even imagine defining a Universal Proxy Reference type that takes a getter and setter function and does all the hoop jumping to satisfy the Iterator concepts — one proxy reference to rule them all, if you will. That’s left as an exercise for the reader.
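As a starting point for that exercise, the getter/setter part might look like the sketch below. Everything here is hypothetical (accessor_ref and make_accessor_ref are invented names), and it deliberately stops short of the hard part: it provides none of the common_reference specializations needed to satisfy the Readable constraints.

```cpp
#include <cassert>
#include <utility>

// Hypothetical proxy reference built from a getter and a setter.
// Reads go through get(); writes go through set(). Both callables must be
// invocable on a const object, which holds for ordinary lambdas.
template <class T, class Get, class Set>
struct accessor_ref {
    Get get;
    Set set;
    operator T() const { return get(); }
    const accessor_ref& operator=(T v) const {
        set(std::move(v));
        return *this;
    }
};

// Deduces the callable types; T is given explicitly.
template <class T, class Get, class Set>
accessor_ref<T, Get, Set> make_accessor_ref(Get g, Set s) {
    return {std::move(g), std::move(s)};
}
```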

If you made it this far, congratulations. You could be forgiven for feeling a little let down; this solution is far from ideal. Perhaps it’s just awful enough to spur a real discussion about how we could change the language to improve the situation.

In the next installment, I’ll describe the final piece of the puzzle: how do we write the algorithm constraints such that they permit proxy iterators? Stay tuned.

As always, you can find all code described here in my range-v3 repo on github.

