Disclaimer: This is a long, boring post about minutiae. For serious library wonks only.
This is the third in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. In the first post I explained what proxy iterators are (an iterator like vector<bool>’s that, when dereferenced, returns a proxy object rather than a real reference) and three specific difficulties they cause in today’s STL:
1. What, if anything, can we say in general about the relationship between an iterator’s value type and its reference type?
2. How do we constrain higher-order algorithms like for_each and find_if that take functions that operate on a sequence’s elements?
3. How do we implement algorithms that must swap and move elements around, like sort and reverse?
In the second post, I zoomed in on problem (3) and showed how the existing std::iter_swap API could be pressed into service, along with a new API that I propose: std::iter_move. Together, these APIs give an iterator a channel through which to communicate to the algorithms how its elements should be swapped and moved. With the addition of the iter_move API, iterators pick up a new associated type: rvalue_reference, which can live in std::iterator_traits alongside the existing value_type and reference associated types.
In this post, I’ll dig into the first problem: how we define in code what an iterator is.
Values and References
As in the first two articles, I’ll use the zip view to motivate the discussion, because it’s easy to grok and yet totally bedeviling for the STL algorithms. Recall that zip lazily adapts two sequences by making them look like one sequence of pairs, as demonstrated below:

    std::vector<int> x{1,2,3,4};
    std::vector<int> y{9,8,7,6};

    using namespace ranges;
    auto zipped = view::zip(x, y);

    assert(*zipped.begin() == std::make_pair(1,9));
    assert(&(*zipped.begin()).first == &x[0]);
As the two assertions above show, dereferencing a zip iterator yields a pair, and that pair is actually a pair of references pointing into the underlying sequences. The zip range above has the following associated types:
Associated type… | … for the zip view |
---|---|
value_type | pair<int, int> |
reference | pair<int &, int &> |
rvalue_reference | pair<int &&, int &&> |
With Concepts coming to C++, we’re going to need to say in code what an iterator is. The Palo Alto TR, published in 2012, takes a stab at it: an InputIterator is Readable and Incrementable, where Readable is defined as follows:

    template< typename I >
    concept bool Readable =
        Semiregular<I> &&
        requires(I i) {
            typename ValueType<I>;
            { *i } -> const ValueType<I> &;
        };
This says that a Readable type has an associated ValueType. It also says that *i is a valid expression, and that the result of *i must be convertible to const ValueType<I> &. This is fine when *i returns something simple like a real reference. But when it returns a proxy reference, like the zip view does, it causes problems.
Substituting a zip iterator into the requires clause above results in something like this:
const pair<int,int>& x = *i;
This tries to initialize x with a pair<int&, int&>. This actually works in a sense; the temporary pair<int &, int &> object is implicitly converted into a temporary pair<int, int> by copying the underlying integers, and that new pair is bound to the const & because temporaries can bind to const references.
But copying values is not what we want or expect. If instead of ints, we had pairs of some move-only type like unique_ptr, this wouldn’t have worked at all.
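To see the copy concretely, here is a minimal sketch that uses only std::pair, with no zip iterator involved. The helper name binding_makes_a_copy is made up for illustration; the assertions only restate how std::pair’s converting constructors behave.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// A pair of references converts to a pair of values by copying the ints.
static_assert(std::is_convertible<std::pair<int&, int&>, std::pair<int, int>>::value,
              "pair<int&,int&> converts to pair<int,int> by copying");

// With a move-only element type there is nothing to copy from.
using Ptr = std::unique_ptr<int>;
static_assert(!std::is_convertible<std::pair<Ptr&, Ptr&>, std::pair<Ptr, Ptr>>::value,
              "pair<Ptr&,Ptr&> cannot be converted to pair<Ptr,Ptr>");

// Hypothetical helper: the const reference binds to a temporary copy.
inline bool binding_makes_a_copy() {
    int a = 1, b = 9;
    std::pair<int&, int&> proxy(a, b);     // stands in for what *i returns
    const std::pair<int, int>& x = proxy;  // converts, then binds to the temporary
    a = 42;                                // change the underlying element
    return x.first == 1 && x.second == 9;  // x still holds the old values
}
```

The reference x never observes the write to a, which is exactly the silent copy the paragraph above warns about.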
So the Readable concept needs to be tweaked to handle proxy references. What can we do?
One simple way to make the zip iterator model the Readable concept is to simply remove the requirement that *i be convertible to const ValueType<I>&. This is unsatisfying. Surely there is something we can say about the relationship between an iterator’s reference type and its value type. I think there is, and there’s a hint in the way the Palo Alto TR defines the EqualityComparable constraint.
Common Type Constraints
What do you think about code like this?

    vector<string> strs{"three", "blind", "mice"};
    auto it = find(strs.begin(), strs.end(), "mice");

Seems reasonable, right? This searches a range of strings for a char const*. This should work, even though it’s looking for an orange in a bucket of apples, because the orange is sufficiently apple-like and because we know how to compare apples and oranges; i.e., there is an operator== that compares strings with char const*. But what does “sufficiently apple-like” mean? If we are ever to constrain the find algorithm with Concepts, we need to be able to say in code what “apple-like” means for any apple and any orange.
The Palo Alto TR doesn’t think that the mere existence of an operator== is enough. Instead, it defines the cross-type EqualityComparable concept as follows:

    template< typename T1, typename T2 >
    concept bool EqualityComparable =
        EqualityComparable<T1> &&
        EqualityComparable<T2> &&
        Common<T1, T2> &&
        EqualityComparable< std::common_type_t<T1, T2> > &&
        requires(T1 a, T2 b) {
            { a == b } -> bool;
            { b == a } -> bool;
            { a != b } -> bool;
            { b != a } -> bool;
            /* axioms:
                using C = std::common_type_t<T1, T2>;
                a == b <=> C{a} == C{b};
                a != b <=> C{a} != C{b};
                b == a <=> C{b} == C{a};
                b != a <=> C{b} != C{a};
            */
        };

In words, what this says is that for two different types to be EqualityComparable, they each individually must be EqualityComparable (i.e., with themselves), they must be comparable with each other, and (the key bit) they must share a common type which is also EqualityComparable, with identical semantics.
The question then becomes: do std::string and char const * share a common type, to which they can both be converted, and which compares with the same semantics? In this case, the answer is trivial: std::string is the common type.
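That claim is easy to check with the standard std::common_type_t trait. The sketch below also spot-checks one of the axioms from the concept; the helper names finds_mice and axiom_holds are invented for this example.

```cpp
#include <algorithm>
#include <string>
#include <type_traits>
#include <vector>

// std::string and char const* share std::string as their common type.
static_assert(std::is_same<std::common_type_t<std::string, char const*>,
                           std::string>::value,
              "the common type of string and char const* is string");

// Hypothetical helper mirroring the find call above.
inline bool finds_mice() {
    std::vector<std::string> strs{"three", "blind", "mice"};
    auto it = std::find(strs.begin(), strs.end(), "mice");
    return it != strs.end() && (it - strs.begin()) == 2;
}

// Hypothetical spot check of the axiom  a == b  <=>  C{a} == C{b}.
inline bool axiom_holds(std::string const& a, char const* b) {
    using C = std::common_type_t<std::string, char const*>;
    return (a == b) == (C{a} == C{b});
}
```

A runtime check like axiom_holds can only sample the axiom for particular values; the concept itself asks that it hold for all of them.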
Aside: why does the Palo Alto TR place this extra CommonType requirement on the argument to find when surely that will break some code that works and is “correct” today? It’s an interesting question. The justification is mathematical and somewhat philosophical: when you compare things for equality, you are asking if they have the same value. Just because someone provides an operator== to compare, say, an Employee with a SocialSecurityNumber doesn’t make an employee a social security number, or vice versa. If we want to be able to reason mathematically about our code (and we do), we have to be able to substitute like for like. Being able to apply equational reasoning to our programs is a boon, but we have to play by its rules.
Readable and Common
You may be wondering what any of this has to do with the Readable concept. Let’s look again at the concept as the Palo Alto TR defines it:

    template< typename I >
    concept bool Readable =
        Semiregular<I> &&
        requires(I i) {
            typename ValueType<I>;
            { *i } -> const ValueType<I> &;
        };

To my mind, what this is trying to say is that there is some substitutability, some mathematical equivalence, between an iterator’s reference type and its value type. EqualityComparable uses Common to enforce that substitutability. What if we tried to fix Readable in a similar way?

    template< typename I >
    concept bool Readable =
        Semiregular<I> &&
        requires(I i) {
            typename ValueType<I>;
            requires Common< ValueType<I>, decltype(*i) >;
        };
Here we’re saying that for Readable types, the reference type and the value type must share a common type. The common type is computed using something like std::common_type_t, which basically uses the ternary conditional operator (?:). (I say “something like” since std::common_type_t isn’t actually up to the task. See lwg2408 and lwg2465.)
Sadly, this doesn’t quite solve the problem. If you try to do common_type_t<unique_ptr<int>, unique_ptr<int>&> you’ll see why. It doesn’t work, despite the fact that the answer seems obvious. The trouble is that common_type always strips top-level const and reference qualifiers before testing for the common type with the conditional operator. For move-only types, that causes the conditional operator to barf.
I’ve always found it a bit odd that common_type decays its arguments before testing them. Sometimes that’s what you want, but sometimes (like here) it’s not. Instead, what we need is a different type trait that tests for the common type, but preserves reference and cv qualifications. I call it common_reference. It’s a bit of a misnomer though, since it doesn’t always return a reference type, although it might.
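The decay is easy to observe with plain ints. This small check uses only the standard std::common_type_t and shows the reference and const qualifiers disappearing from the result:

```cpp
#include <type_traits>

// std::common_type decays: the reference and const in the inputs do not survive.
static_assert(std::is_same<std::common_type_t<int&, int const&>, int>::value,
              "common_type_t<int&, int const&> is plain int");
static_assert(std::is_same<std::common_type_t<int const&, int const&>, int>::value,
              "even identical reference types decay to int");
```

For an iterator whose reference type is int&, that result says nothing about whether anything can bind to the element itself, which is the information a qualifier-preserving trait would keep.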
The common reference of two types is the minimally qualified type to which objects of both types can bind. common_reference will try to return a reference type if it can, but fall back to a value type if it must. Here are some examples to give you a flavor:
Common reference… | … result |
---|---|
common_reference_t<int &, int const &> | int const & |
common_reference_t<int &&, int &&> | int && |
common_reference_t<int &&, int &> | int const & |
common_reference_t<int &, int> | int |
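C++20 later added a std::common_reference_t trait to <type_traits>. Assuming a C++20 compiler, the rows of the table can be spot-checked against it as sketched below. This exercises the standard trait, which is not necessarily identical to the range-v3 implementation discussed in this post; the preprocessor guard is only there so the file still compiles in older language modes.

```cpp
#include <type_traits>

#if __cplusplus >= 202002L  // std::common_reference_t is a C++20 facility
static_assert(std::is_same_v<std::common_reference_t<int&, int const&>, int const&>);
static_assert(std::is_same_v<std::common_reference_t<int&&, int&&>, int&&>);
static_assert(std::is_same_v<std::common_reference_t<int&&, int&>, int const&>);
static_assert(std::is_same_v<std::common_reference_t<int&, int>, int>);
#endif
```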
With a common_reference type trait, we could define a CommonReference concept and specify Readable in terms of it, as follows:

    template< typename I >
    concept bool Readable =
        Semiregular<I> &&
        requires(I i) {
            typename ValueType<I>;
            requires CommonReference<
                ValueType<I> &,
                decltype(*i) && >;
        };

The above concept requires that there is some common reference type to which both *i and a mutable object of the iterator’s value type can bind.
This, I think, is sufficiently general to type check all the iterators that are valid today, as well as iterators that return proxy references (though it takes some work to see that). We can further generalize this to accommodate the iter_move API I described in my previous post:

    template< typename I >
    concept bool Readable =
        Semiregular<I> &&
        requires(I i) {
            typename ValueType<I>;
            requires CommonReference<
                ValueType<I> &,
                decltype(*i) && >;             // (1)
            requires CommonReference<
                decltype(iter_move(i)) &&,
                decltype(*i) && >;             // (2)
            requires CommonReference<
                ValueType<I> const &,
                decltype(iter_move(i)) &&>;    // (3)
        };
OK, let’s see how this works in practice.
Iterators and CommonReference
First, let’s take the easy case of an iterator that returns a real reference like int&. The requirements are that its value type, reference type, and rvalue reference type satisfy the three CommonReference constraints above: (1) requires a common reference between int& and int&, (2) between int&& and int&, and (3) between int const& and int&&. These are all demonstrably true, so this iterator is Readable.
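For readers with a C++20 compiler, the easy case can be approximated with standard facilities. The sketch below is an assumption-laden translation, not the concept as proposed here: readable_sketch is a made-up name, and ValueType<I>, decltype(*i) and decltype(iter_move(i)) are spelled with the C++20 aliases std::iter_value_t, std::iter_reference_t and std::iter_rvalue_reference_t.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

#if __cplusplus >= 202002L  // concepts and std::common_reference_with need C++20
#include <concepts>

// Sketch only: the three CommonReference constraints, in C++20 spelling.
template <class I>
concept readable_sketch =
    std::semiregular<I> &&
    std::common_reference_with<std::iter_value_t<I>&,
                               std::iter_reference_t<I>&&> &&           // (1)
    std::common_reference_with<std::iter_rvalue_reference_t<I>&&,
                               std::iter_reference_t<I>&&> &&           // (2)
    std::common_reference_with<std::iter_value_t<I> const&,
                               std::iter_rvalue_reference_t<I>&&>;      // (3)

static_assert(readable_sketch<int*>);
static_assert(readable_sketch<std::vector<int>::iterator>);

// The three common references for int*, matching (1), (2) and (3).
static_assert(std::is_same_v<std::common_reference_t<int&, int&>, int&>);
static_assert(std::is_same_v<std::common_reference_t<int&&, int&>, int const&>);
static_assert(std::is_same_v<std::common_reference_t<int const&, int&&>, int const&>);
#endif
```

Only (1) yields a mutable reference for a plain pointer; (2) and (3) settle on int const&, which is enough for the constraints to hold.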
But what about the zip iterator? Things here are much trickier.
The three common reference constraints for the zip iterator amount to this:
Common reference… | … result |
---|---|
common_reference_t< pair<int,int> &, pair<int&,int&> &&> | ??? |
common_reference_t< pair<int&&,int&&> &&, pair<int&,int&> &&> | ??? |
common_reference_t< pair<int,int> const &, pair<int&&,int&&> &&> | ??? |
Yikes. How is the common_reference trait supposed to evaluate this? The ternary conditional operator is just not up to the task.
OK, let’s first imagine what we would like the answers to be. Taking the last one first, consider the following code:

    void foo( pair< X, Y > p );

    pair<int,int> const & a = /*...*/;
    pair<int &&,int &&> b {/*...*/};

    foo( a );
    foo( move(b) );
If there are types that we can pick for X and Y that make this compile, then we can make pair<X,Y> the “common reference” for pair<int&&,int&&>&& and pair<int,int> const &. Indeed there are: X and Y should both be int const &.
In fact, for each of the CommonReference constraints, we could make the answer pair<int const&,int const&> and be safe. So in principle, our zip iterator can model the Readable concept. W00t.
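A quick way to convince yourself is to ask std::is_convertible about each type that appears in the three constraints. In this sketch, sum is a hypothetical stand-in for foo with X and Y both set to int const&, and both_calls_work is a made-up helper that exercises the two calls from the snippet above.

```cpp
#include <type_traits>
#include <utility>

using CRefPair = std::pair<int const&, int const&>;

// Every type from the three constraints converts to pair<int const&, int const&>.
static_assert(std::is_convertible<std::pair<int, int>&, CRefPair>::value,
              "value_type &");
static_assert(std::is_convertible<std::pair<int, int> const&, CRefPair>::value,
              "value_type const &");
static_assert(std::is_convertible<std::pair<int&, int&>&&, CRefPair>::value,
              "reference");
static_assert(std::is_convertible<std::pair<int&&, int&&>&&, CRefPair>::value,
              "rvalue_reference");

// Hypothetical stand-in for foo(pair<X, Y>) with X = Y = int const&.
inline int sum(CRefPair p) { return p.first + p.second; }

inline bool both_calls_work() {
    std::pair<int, int> const a{1, 9};
    int x = 2, y = 8;
    std::pair<int&&, int&&> b{std::move(x), std::move(y)};
    return sum(a) == 10 && sum(std::move(b)) == 10;
}
```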
But look again at this one:
common_reference_t<pair<int,int> &, pair<int&,int&> &&>
If this coughs up pair<int const&,int const&> then we’ve lost something in the translation: the ability to mutate the elements of the pair. In an ideal world, the answer would be pair<int&,int&> because a conversion from both pair<int,int>& and pair<int&,int&>&& would be safe and would meet the “minimally qualified” spirit of the common_reference trait. But this code doesn’t compile:

    void foo( pair< int&,int& > p );

    pair<int,int> a;
    pair<int&,int&> b {/*...*/};

    foo( a ); // ERROR here
    foo( move(b) );

Unfortunately, pair doesn’t provide this conversion, even though it would be safe in theory. Is that a defect? Perhaps. But it’s something we need to work with.
Long story short, the solution I went with for range-v3 is to define my own pair-like type with the needed conversions. I call it common_pair and it inherits from std::pair so that things behave as you would expect. With common_pair and a few crafty specializations of common_reference, the Readable constraints are satisfied for the zip iterator as follows:
Common reference… | … result |
---|---|
common_reference_t< pair<int,int> &, common_pair<int&,int&> &&> | common_pair<int&,int&> |
common_reference_t< common_pair<int&&,int&&> &&, common_pair<int&,int&> &&> | common_pair<int const&,int const&> |
common_reference_t< pair<int,int> const &, common_pair<int&&,int&&> &&> | common_pair<int const&,int const&> |
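To make the missing conversion concrete, here is a stripped-down sketch of a pair-like type that can be built from a mutable lvalue std::pair. It is a hypothetical illustration, not range-v3’s actual common_pair: the names common_pair_sketch, set_both and writes_reach_the_source are invented here, and only the one conversion needed for the first row of the table is provided.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical sketch, NOT range-v3's common_pair: a pair that can also be built
// from a mutable lvalue std::pair, binding its reference members to that pair.
template <class F, class S>
struct common_pair_sketch : std::pair<F, S> {
    common_pair_sketch(F f, S s)
        : std::pair<F, S>(std::forward<F>(f), std::forward<S>(s)) {}

    template <class F2, class S2,
              class = std::enable_if_t<std::is_constructible<F, F2&>::value &&
                                       std::is_constructible<S, S2&>::value>>
    common_pair_sketch(std::pair<F2, S2>& other)
        : std::pair<F, S>(other.first, other.second) {}
};

static_assert(std::is_convertible<std::pair<int, int>&,
                                  common_pair_sketch<int&, int&>>::value,
              "an lvalue pair<int,int> converts to the reference pair");

// Hypothetical stand-in for foo(): writes through whatever the pair refers to.
inline void set_both(common_pair_sketch<int&, int&> p, int v) {
    p.first = v;
    p.second = v;
}

inline bool writes_reach_the_source() {
    std::pair<int, int> a{1, 2};
    int x = 3, y = 4;
    common_pair_sketch<int&, int&> b{x, y};
    set_both(a, 7);             // binds to a.first and a.second
    set_both(std::move(b), 8);  // refers to x and y
    return a.first == 7 && a.second == 7 && x == 8 && y == 8;
}
```

Both calls now compile, and the writes land in the original objects, so mutability survives the trip through the common reference.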
Computing these types is not as tricky as it may appear at first. For types like pair<int,int>& and common_pair<int&,int&>&&, it goes like this:
- Distribute any top-level ref and cv qualifiers to the members of the pair. pair<int,int>& becomes pair<int&,int&>, and common_pair<int&,int&>&& becomes common_pair<int&,int&>.
- Compute the element-wise common reference, and bundle the result into a new common_pair, resulting in common_pair<int&,int&>.
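The two steps above can be sketched as a pair of small metafunctions. Everything here is an assumption made for illustration: distribute_sketch, elementwise_sketch and pair_common_reference_sketch are invented names, plain std::pair stands in for common_pair, and the element-wise step leans on C++20’s std::common_reference_t (hence the guard) rather than on the range-v3 trait.

```cpp
#include <type_traits>
#include <utility>

// Step 1 (sketch): push the top-level qualifiers of a pair onto its members.
// Reference collapsing takes care of members that are already references.
template <class P> struct distribute_sketch;
template <class A, class B>
struct distribute_sketch<std::pair<A, B>&> { using type = std::pair<A&, B&>; };
template <class A, class B>
struct distribute_sketch<std::pair<A, B> const&> {
    using type = std::pair<A const&, B const&>;
};
template <class A, class B>
struct distribute_sketch<std::pair<A, B>&&> { using type = std::pair<A&&, B&&>; };

static_assert(std::is_same<distribute_sketch<std::pair<int, int>&>::type,
                           std::pair<int&, int&>>::value, "");
static_assert(std::is_same<distribute_sketch<std::pair<int&, int&>&&>::type,
                           std::pair<int&, int&>>::value, "");

#if __cplusplus >= 202002L  // step 2 leans on C++20's std::common_reference_t
// Step 2 (sketch): take the common reference member by member.
template <class P, class Q> struct elementwise_sketch;
template <class A1, class B1, class A2, class B2>
struct elementwise_sketch<std::pair<A1, B1>, std::pair<A2, B2>> {
    using type = std::pair<std::common_reference_t<A1, A2>,
                           std::common_reference_t<B1, B2>>;
};

template <class P, class Q>
using pair_common_reference_sketch = typename elementwise_sketch<
    typename distribute_sketch<P>::type,
    typename distribute_sketch<Q>::type>::type;

// The three rows of the table, with std::pair in place of common_pair.
static_assert(std::is_same_v<
    pair_common_reference_sketch<std::pair<int, int>&, std::pair<int&, int&>&&>,
    std::pair<int&, int&>>);
static_assert(std::is_same_v<
    pair_common_reference_sketch<std::pair<int&&, int&&>&&, std::pair<int&, int&>&&>,
    std::pair<int const&, int const&>>);
static_assert(std::is_same_v<
    pair_common_reference_sketch<std::pair<int, int> const&, std::pair<int&&, int&&>&&>,
    std::pair<int const&, int const&>>);
#endif
```

The sketch only computes the type; making objects actually convertible to it is the job of the extra constructors on the pair-like type.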
Generalizing
Our zip iterator, with enough ugly hackery, can model our re-specified Readable concept. That’s good, but what about other proxy reference types, like vector<bool>’s? If vector<bool>’s reference type is bool_ref, then we would need to specialize common_reference such that the Readable constraints are satisfied. This will necessarily involve defining a type such that it can be initialized with either a bool_ref or with a bool&. That would be a decidedly weird type, but it’s not impossible. (Imagine a variant<bool&,bool_ref> if you’re having trouble visualizing it.)
Getting vector<bool>’s iterators to fit the mold is an ugly exercise in hackery, and actually using its common reference (the variant type) would incur a performance hit for every read and write. But the STL doesn’t actually need to use it. It just needs to exist.
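For the curious, such a type might look roughly like the sketch below. It is purely illustrative: bool_common_ref_sketch is a made-up name, std::vector<bool>::reference plays the role of bool_ref, and a pointer plus a std::optional (C++17) stand in for the imagined variant. The branch in every read and write is the performance hit mentioned above.

```cpp
#include <optional>
#include <type_traits>
#include <vector>

using bool_ref = std::vector<bool>::reference;  // plays the role of bool_ref

static_assert(!std::is_same<bool_ref, bool&>::value, "vector<bool> yields a proxy");
static_assert(std::is_convertible<bool_ref, bool>::value, "the proxy reads as bool");

// Hypothetical sketch: holds either a real bool& or a bool_ref.
class bool_common_ref_sketch {
    bool* plain_ = nullptr;          // set when built from a bool&
    std::optional<bool_ref> proxy_;  // set when built from a bool_ref
public:
    bool_common_ref_sketch(bool& b) : plain_(&b) {}
    bool_common_ref_sketch(bool_ref r) : proxy_(r) {}

    // Every read branches on which kind of reference is held.
    operator bool() const { return plain_ ? *plain_ : static_cast<bool>(*proxy_); }

    // Every write branches too.
    bool_common_ref_sketch& operator=(bool v) {
        if (plain_) *plain_ = v; else *proxy_ = v;
        return *this;
    }
};

inline bool writes_reach_both_kinds() {
    bool b = false;
    std::vector<bool> v{false};
    bool_common_ref_sketch from_bool(b);
    bool_common_ref_sketch from_proxy(v[0]);
    from_bool = true;
    from_proxy = true;
    return b && v[0] && from_bool && from_proxy;
}
```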
What is the point of jumping through these hoops to implement an inefficient type that in all likelihood will never actually be used? This is going to be unsatisfying for many, but the answer is for the sake of mathematical rigour. There must be some substitutability relationship between an iterator’s reference type and its value type that is enforceable. Requiring that they share a common reference is the best I’ve come up with so far. And as it turns out, this “useless” type actually does have some uses, as we’ll see in the next installment.
Summary
So here we are. There is a way to define the Readable concept, and hence the InputIterator concept, that is general enough to permit proxy iterators while also saying something meaningful and useful about an iterator’s associated types. Actually defining a proxy iterator such that it models this concept is no small feat and requires extensive amounts of hack work. BUT IT’S POSSIBLE.
One could even imagine defining a Universal Proxy Reference type that takes a getter and setter function and does all the hoop jumping to satisfy the Iterator concepts — one proxy reference to rule them all, if you will. That’s left as an exercise for the reader.
If you made it this far, congratulations. You could be forgiven for feeling a little let down; this solution is far from ideal. Perhaps it’s just awful enough to spur a real discussion about how we could change the language to improve the situation.
In the next installment, I’ll describe the final piece of the puzzle: how do we write the algorithm constraints such that they permit proxy iterators? Stay tuned.
As always, you can find all code described here in my range-v3 repo on github.