Iterators++, Part 1

In the last post, I described the so-called proxy iterator problem: the fact that iterators that return proxy references instead of real references don’t sit comfortably within the STL’s framework. Real, interesting, and useful iterators fall foul of this line, iterators like vector<bool>’s or like the iterator of the zip view I presented. In this post, I investigate what we could do to bring proxy iterators into the fold, and what that means for both the iterator concepts and the algorithms. Since I’m a library guy, I restrict myself to talking about pure library changes.

Recap

As in the last post, we’ll use the zip view to motivate the discussion. Given two sequences like:

vector<int> x{1,2,3,4};
vector<int> y{9,8,7,6};

…we can create a view by “zipping” the two into one, where each element of the view is a pair of corresponding elements from x and y:

using namespace ranges;
auto rng = view::zip(x, y);

assert(*rng.begin() == make_pair(1,9));

The type of the expression “*rng.begin()” — the range’s reference type — is pair<int&,int&>, and the range’s value type is pair<int,int>. The reference type is an example of a proxy: an object that stands in for another object, or in this case two other objects.

Since both x and y are random access, the resulting zip view should be random access, too. But here we run afoul of the STL’s “real reference” requirement: for iterators other than input iterators, the expression *it must return a real reference. Why? Good question! The requirement was added sometime while the STL was being standardized. I can only guess it was because the committee didn’t know what it meant to, say, sort or reverse elements that aren’t themselves persistent in memory, and they didn’t know how to communicate to the algorithms that a certain temporary object (the proxy) is a stand-in for a persistent object. (Maybe someone who was around then can confirm or deny.)

The real-reference requirement is quite restrictive. Not only does it mean the zip view can’t be a random access sequence, it also means that you can’t sort or reverse elements through a zip view. It’s also the reason why vector<bool> is not a real container.

But simply dropping the real-reference requirement isn’t enough. We also need to say what it means to sort and reverse sequences that don’t yield real references. In the last post, I described three specific problems relating to constraining and implementing algorithms in the presence of proxy references.

  1. What, if anything, can we say about the relationship between an iterator’s value type and its reference type?
  2. How do we constrain higher-order algorithms like for_each and find_if that take functions that operate on a sequence’s elements?
  3. How do we implement algorithms that must swap and move elements around, like sort?

Let’s take the last one first.

Swapping and Moving Elements

If somebody asked you in a job interview to implement std::reverse, you might write something like this:

template< class BidiIter >
void reverse( BidiIter begin, BidiIter end )
{
    using std::swap;
    for(; begin != end && begin != --end; ++begin)
        swap(*begin, *end);
}

Congratulations, you’re hired. Now, if the interviewer asked you whether this algorithm works on the zip view I just described, what would you say? The answer, as you may have guessed, is no. There is no overload of swap that accepts pair rvalues. Even if there were, we’re on thin ice here with the zip view’s proxy reference type. The default swap implementation looks like this:

template< class T >
void swap( T & t, T & u )
{
    T tmp = move(u);
    u = move(t);
    t = move(tmp);
}

Imagine what happens when T is pair<int&,int&>. The first line doesn’t move any values; tmp just aliases the values referred to by u. The next line stomps the values in u, which mutates tmp because it’s an alias. Then we copy those stomped values back to t. Rather than swapping values, this makes them both equal to t. Oops.
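To make the failure concrete, here is a minimal sketch that runs the three-move swap over two pair<int&,int&> proxies. The names naive_swap, four_ints, and naive_swap_on_proxies are made up for this illustration; the swap body is the one shown above.

```cpp
#include <cassert>
#include <utility>

namespace sketch {

// The generic three-move swap from the text, under a hypothetical name.
template <class T>
void naive_swap(T& t, T& u)
{
    T tmp = std::move(u);  // for pair<int&,int&>, tmp refers to the same ints as u
    u = std::move(t);      // assigns t's values through u, and therefore through tmp
    t = std::move(tmp);    // copies those same values back into t
}

struct four_ints { int x0, y0, x1, y1; };

// Builds two proxies over local ints, "swaps" them, and reports the ints.
inline four_ints naive_swap_on_proxies()
{
    int x0 = 1, y0 = 9;
    int x1 = 2, y1 = 8;
    std::pair<int&, int&> p{x0, y0};
    std::pair<int&, int&> q{x1, y1};
    naive_swap(p, q);
    return {x0, y0, x1, y1};
}

} // namespace sketch
```

After the call, both proxied elements hold the values that p started with (1 and 9); the values 2 and 8 are lost.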

If at this point you’re smugly saying to yourself that pair has its own swap overload that (almost) does the right thing, you’re very smart. Shut up. But if you’re saying that the above is not a standard-conforming reverse implementation because, unlike all the other algorithms, reverse is required to use iter_swap, then very good! That’s the clue to unraveling this whole mess.

iter_swap

iter_swap is a thin wrapper around swap that takes iterators instead of values and swaps the elements they refer to. It’s an exceedingly useless function, since iter_swap(a,b) is pretty much required to just call swap(*a,*b). But what if we allowed it to be a bit smarter? What if iter_swap were a full-fledged customization point that allowed proxied sequences to communicate to the algorithms how their elements should be swapped?

Imagine the zip view’s iterators provided an iter_swap that knew how to truly swap the elements in the underlying sequences. It might look like this:

template< class It1, class It2 >
struct zip_iterator
{
    It1 it1;
    It2 it2;

    /* ... iterator interface here... */

    friend void iter_swap(zip_iterator a, zip_iterator b)
    {
        using std::iter_swap;
        iter_swap(a.it1, b.it1);
        iter_swap(a.it2, b.it2);
    }
};

Now we would implement reverse like this:

template< class BidiIter >
void reverse( BidiIter begin, BidiIter end )
{
    using std::iter_swap;
    for(; begin != end && begin != --end; ++begin)
        iter_swap(begin, end);
}

Voilà! Now reverse works with zip views. That was easy. All that is required is (a) to advertise iter_swap as a customization point, and (b) use iter_swap consistently throughout the standard library, not just in reverse.
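Putting the two pieces together, the following is a rough sketch, assuming a stripped-down zip_iterator that supports only increment, decrement, and comparison. The names reverse_via_iter_swap and zip_reverse_demo are hypothetical, and the existing std::iter_swap stands in for the default.

```cpp
#include <algorithm>  // std::iter_swap
#include <cassert>
#include <vector>

namespace sketch {

// Just enough of a zip iterator to drive the reverse loop.
template <class It1, class It2>
struct zip_iterator
{
    It1 it1;
    It2 it2;

    zip_iterator& operator++() { ++it1; ++it2; return *this; }
    zip_iterator& operator--() { --it1; --it2; return *this; }

    friend bool operator==(zip_iterator const& a, zip_iterator const& b)
    { return a.it1 == b.it1; }
    friend bool operator!=(zip_iterator const& a, zip_iterator const& b)
    { return !(a == b); }

    // Customization: swap the elements of both underlying sequences.
    friend void iter_swap(zip_iterator a, zip_iterator b)
    {
        using std::iter_swap;
        iter_swap(a.it1, b.it1);
        iter_swap(a.it2, b.it2);
    }
};

// reverse written against iter_swap; the unqualified call lets
// argument-dependent lookup find the friend above.
template <class BidiIter>
void reverse_via_iter_swap(BidiIter begin, BidiIter end)
{
    using std::iter_swap;
    for (; begin != end && begin != --end; ++begin)
        iter_swap(begin, end);
}

// Reverses x and y in lockstep through the zip iterators.
inline bool zip_reverse_demo()
{
    std::vector<int> x{1, 2, 3, 4};
    std::vector<int> y{9, 8, 7, 6};
    using Z = zip_iterator<std::vector<int>::iterator,
                           std::vector<int>::iterator>;
    reverse_via_iter_swap(Z{x.begin(), y.begin()}, Z{x.end(), y.end()});
    return x == std::vector<int>{4, 3, 2, 1}
        && y == std::vector<int>{6, 7, 8, 9};
}

} // namespace sketch
```

For ordinary iterators such as vector<int>::iterator, the same reverse_via_iter_swap falls back to std::iter_swap, so nothing changes for existing sequences.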

iter_move

We haven’t fixed the problem yet. Some algorithms don’t just swap elements; they move them. For instance stable_sort might allocate a temporary buffer and move elements into it while it works. You can’t use iter_swap to move an element into raw storage. But we can use a play from the iter_swap playbook to solve this problem. Let’s make an iter_move customization point that gives iterators a way to communicate how to move values out of the sequence.

iter_move’s default implementation is almost trivial:

template< class I,
    class R = typename iterator_traits< I >::reference >
conditional_t<
    is_reference< R >::value,
    remove_reference_t< R > &&,
    R >
iter_move( I it )
{
    return move(*it);
}

The only tricky bit is the declaration of the return type. If *it returns a temporary, we just want to return it by value. Otherwise, we want to return it by rvalue reference. If you pass a vector<string>::iterator to iter_move, you get back a string && as you might expect.
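The return-type rule can be checked at compile time. This sketch places the default iter_move from above in a hypothetical namespace sketch and calls it qualified. It assumes, as is the case in the common standard library implementations, that vector<bool>'s iterator reports a class type (not a reference) as its reference type. The helper move_out_first is made up for the example.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Default iter_move as given in the text.
template <class I, class R = typename std::iterator_traits<I>::reference>
std::conditional_t<
    std::is_reference<R>::value,
    std::remove_reference_t<R>&&,
    R>
iter_move(I it)
{
    return std::move(*it);
}

using string_iter = std::vector<std::string>::iterator;
using bool_iter   = std::vector<bool>::iterator;

// Real reference: the result is an rvalue reference to the element.
static_assert(
    std::is_same<decltype(sketch::iter_move(std::declval<string_iter>())),
                 std::string&&>::value,
    "string iterator should yield string&&");

// Proxy reference: the proxy is returned by value.
static_assert(
    !std::is_reference<
        decltype(sketch::iter_move(std::declval<bool_iter>()))>::value,
    "vector<bool> iterator should yield its proxy by value");

// Moves the first string out of v; v keeps its size, and v[0] is left
// in a valid but unspecified state.
inline std::string move_out_first(std::vector<std::string>& v)
{
    return sketch::iter_move(v.begin());
}

} // namespace sketch
```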

How does the zip view implement iter_move? It’s not hard at all:

template< class It1, class It2 >
struct zip_iterator
{
    It1 it1;
    It2 it2;

    /* ... iterator interface here... */

    friend auto iter_move(zip_iterator a)
    {
        using std::iter_move;
        using RRef1 = decltype(iter_move(a.it1));
        using RRef2 = decltype(iter_move(a.it2));
        return pair<RRef1, RRef2>{
            iter_move(a.it1),
            iter_move(a.it2)
        };
    }
};

The algorithms can use iter_move as follows:

// Move an element out of the sequence and into a temporary
using V = typename iterator_traits< I >::value_type;
V tmp = iter_move( it );
// Move the value back into the sequence
*it = move( tmp );

As an aside, this suggests a more general default implementation of iter_swap:

template< class I >
void iter_swap( I a, I b )
{
    using V = typename iterator_traits< I >::value_type;
    V tmp = iter_move( a );
    *a = iter_move( b );
    *b = move( tmp );
}

Now proxy sequences like zip only have to define iter_move and they get a semantically correct iter_swap for free. It’s analogous to how the default std::swap is defined in terms of std::move. (Doing it this way doesn’t pick up user-defined overloads of swap. That’s bad. There’s a work-around, but it’s beyond the scope of this post.)

For a zip view that has value type pair<T,U> and reference type pair<T&,U&>, the return type of iter_move is pair<T&&,U&&>. Makes perfect sense. Take another look at the default implementation of iter_swap above and satisfy yourself that it correctly swaps zipped elements, even if the underlying sequences have move-only value types.
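One way to check this is to run the move-based swap over a zip of move-only and ordinary elements. The sketch below makes several assumptions: everything lives in a hypothetical namespace sketch, the generic swap is renamed iter_swap_via_move so it cannot collide with std::iter_swap, and the zip iterator calls the default iter_move qualified, which keeps the example small but would not support nested zips.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Default iter_move, as in the text.
template <class I, class R = typename std::iterator_traits<I>::reference>
std::conditional_t<std::is_reference<R>::value,
                   std::remove_reference_t<R>&&, R>
iter_move(I it)
{
    return std::move(*it);
}

// Minimal zip iterator: dereference plus a custom iter_move.
template <class It1, class It2>
struct zip_iterator
{
    using T = typename std::iterator_traits<It1>::value_type;
    using U = typename std::iterator_traits<It2>::value_type;

    using value_type        = std::pair<T, U>;
    using reference         = std::pair<T&, U&>;  // proxy reference
    using pointer           = void;
    using difference_type   = std::ptrdiff_t;
    using iterator_category = std::input_iterator_tag;

    It1 it1;
    It2 it2;

    reference operator*() const { return reference{*it1, *it2}; }

    // Returns pair<T&&, U&&> when the underlying references are real.
    friend auto iter_move(zip_iterator a)
    {
        using RRef1 = decltype(sketch::iter_move(a.it1));
        using RRef2 = decltype(sketch::iter_move(a.it2));
        return std::pair<RRef1, RRef2>{sketch::iter_move(a.it1),
                                       sketch::iter_move(a.it2)};
    }
};

// The move-based default swap from the text, under a hypothetical name.
template <class I>
void iter_swap_via_move(I a, I b)
{
    using V = typename std::iterator_traits<I>::value_type;
    V tmp = iter_move(a);   // unqualified: ADL finds a custom iter_move
    *a = iter_move(b);
    *b = std::move(tmp);
}

// Swaps two zipped elements whose first component is move-only.
inline bool swap_zipped_move_only()
{
    std::vector<std::unique_ptr<int>> x;
    x.push_back(std::make_unique<int>(1));
    x.push_back(std::make_unique<int>(2));
    std::vector<std::string> y{"a", "b"};

    using Z = zip_iterator<std::vector<std::unique_ptr<int>>::iterator,
                           std::vector<std::string>::iterator>;
    static_assert(
        std::is_same<decltype(iter_move(std::declval<Z>())),
                     std::pair<std::unique_ptr<int>&&, std::string&&>>::value,
        "zip iter_move should yield a pair of rvalue references");

    iter_swap_via_move(Z{x.begin(), y.begin()},
                       Z{x.begin() + 1, y.begin() + 1});
    return *x[0] == 2 && *x[1] == 1 && y[0] == "b" && y[1] == "a";
}

} // namespace sketch
```

The three statements in iter_swap_via_move rely on pair's converting constructor and converting assignment operators, which forward each member; that is what lets the unique_ptr elements be moved rather than copied.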

One final note about iter_move: the implication is that to support proxied sequences, iterators need an extra associated type: the return type of iter_move. We can call it rvalue_reference and put it in iterator_traits alongside value_type and reference.

Alternate Design

I find the above design clean and intuitive. But it raises an interesting question: is it OK that iter_swap(a,b) and swap(*a,*b) might mean different things? Personally I think that’s OK, but let’s imagine for a moment that it’s not. What else could we do?

An obvious alternate design is to overload swap for proxy references to swap the objects they refer to. Let’s imagine we add the following overload to namespace std:

template< class T, class U >
void swap( pair< T&, U& > && a, pair< T&, U& > && b )
{
    swap(a.first, b.first);
    swap(a.second, b.second);
}

With enough SFINAE magic we could further generalize this to support swapping pairs of proxy references, but let’s stick with this. I could live with it.

But as before, this isn’t enough; we would also need to overload move to take a pair<T&,U&> and return a pair<T&&,U&&>. And this is where I start getting uncomfortable, because move is used everywhere and it’s currently not a customization point. How much code is out there that assumes the type of a move expression is <some-type>&&? What breaks when that’s no longer true?

Purely as a matter of library evolution, overloading move that way for pairs of references is a non-starter because it would be changing the meaning of existing code. We could avoid the problem by changing zip’s reference type from pair<T&,U&> to magic_proxy_pair< T&, U& > and overloading swap and move on that. magic_proxy_pair would inherit from pair, so most code would be none the wiser. Totally valid design.
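As a rough illustration of the swap half of that alternative, here is one possible shape for such a type. Everything here is hypothetical: the text only names magic_proxy_pair, and this sketch parameterizes it on the referent types T and U rather than on T& and U&. The move overload is omitted.

```cpp
#include <cassert>
#include <utility>

namespace sketch {

// Hypothetical proxy reference type that inherits from pair<T&, U&>.
template <class T, class U>
struct magic_proxy_pair : std::pair<T&, U&>
{
    magic_proxy_pair(T& t, U& u) : std::pair<T&, U&>(t, u) {}

    // Accepts the rvalue proxies that *it would yield and swaps the
    // objects they refer to.
    friend void swap(magic_proxy_pair&& a, magic_proxy_pair&& b)
    {
        using std::swap;
        swap(a.first, b.first);
        swap(a.second, b.second);
    }
};

// Swaps two "elements" through temporaries of the proxy type.
inline bool swap_through_proxies()
{
    int  x0 = 1,   x1 = 2;
    char y0 = 'a', y1 = 'b';
    swap(magic_proxy_pair<int, char>(x0, y0),
         magic_proxy_pair<int, char>(x1, y1));
    return x0 == 2 && x1 == 1 && y0 == 'b' && y1 == 'a';
}

} // namespace sketch
```

With a type like this, an algorithm written as swap(*a, *b) would swap the underlying elements, at the cost of zip exposing a new reference type.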

Summary, For Now

I’ve run long at the mouth, and I still have two more issues to deal with, so I’ll save them for another post. We’ve covered a lot of ground. With the design suggested above, algorithms can permute elements in proxied sequences with the help of iter_swap and iter_move, and iterators get a brand new associated type called rvalue_reference.

Whether you prefer this design or the other depends on which you find more distasteful:

  1. iter_swap(a,b) can be semantically different from swap(*a,*b), or
  2. move is a customization point that is allowed to return some proxy rvalue reference type.

In the next installment, I’ll describe what we can say about the relationship between an iterator’s value type and its reference type (and now its rvalue reference type), and how we can constrain higher-order algorithms like for_each and find_if.

As always, you can find all code described here in my range-v3 repo on GitHub.

