CS253 Iterators
Foundation
- Before we talk about iterators, let’s review just what a class can do.
- We all know a class can provide public methods & data.
- It can also provide types.
class Foo {
public:
string bar() { return "Yow!\n"; }
int x = 42;
using cash = float;
};
Foo::cash salary = 2.43;
Foo f;
cout << salary << ' ' << f.x << ' ' << f.bar();
2.43 42 Yow!
Array Traversal
You have an array, and you want to traverse (walk through) it.
int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int i=0; i!=8; ++i)
cout << a[i] << ' ';
2 3 5 7 11 13 17 19
- a[i] is the same as *(a+i).
- It requires addition, scaling, and indirection.
- Though they’re often combined in a single machine instruction.
- I usually say i<8, rather than i!=8.
- Similarly, I usually say i++ rather than ++i.
- Patience.
Problems with indexing
- We’re really used to traversing an array like that.
- We’re so used to it that it seems the right & natural way.
- After all, mathematicians do that with a_i, a_{i+1}, a_{i+2}, …
- Trouble is, it doesn’t extend naturally. We can speak of
the nth element of a linked list or a binary tree,
but we don’t compute it that way.
- Instead, we think in terms of pointers.
- We naturally use pointers to walk a linked list, or to traverse a binary tree.
Array Traversal via Pointers
int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[8]; ++p)
cout << *p << ' ';
2 3 5 7 11 13 17 19
- This uses pointers.
- Instead of int i, we use int *p. It points to each element, one-by-one.
- p initially points to &a[0], the address of the first element.
- a contains eight elements, a[0]…a[7]. Therefore, when we get to the address &a[8], it’s time to stop.
- This is a bit faster: instead of a[i] (addition+scaling+indirection) we have just indirection.
- Not really a radical improvement, however.
You can traverse a vector in the same way:
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[0]+8; ++p) // a[8] itself would be out of range for a vector
cout << *p << ' ';
2 3 5 7 11 13 17 19
or, avoiding the magic number 8:
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[0]+a.size(); ++p) // a[a.size()] itself would be out of range
cout << *p << ' ';
2 3 5 7 11 13 17 19
Methods
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (vector<int>::iterator it = a.begin(); it != a.end(); ++it)
cout << *it << ' ';
2 3 5 7 11 13 17 19
- iterator is a type provided by the templated vector class.
- You can treat it like a pointer.
- You don’t care what type it really is. You know that:
- * and ++ work on it
- you can assign .begin() to it
- you can compare it to .end()
- What type is it?
- might be int *, might not (it’s unspecified!)
auto is your friend
This is prettier:
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto it = a.begin(); it != a.end(); ++it)
cout << *it << ' ';
2 3 5 7 11 13 17 19
All we did was use auto to tell the compiler to make it the same type that a.begin() returns, namely, vector<int>::iterator.
This is (nearly) exactly the same, and gorgeous:
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto v : a)
cout << v << ' ';
2 3 5 7 11 13 17 19
The for loop is defined to use .begin() and .end(), just like the previous code.
It’s simple
Really, a for loop like that (sometimes called a for…each loop) is just a mechanical source-level transformation. The compiler rewrites this:
set<string> s = {"my", "dog", "has", "fleas"};
for (auto v : s)
cout << v << '\n';
dog
fleas
has
my
as this:
set<string> s = {"my", "dog", "has", "fleas"};
for (auto it = s.begin(); it != s.end(); ++it) {
auto v = *it;
cout << v << '\n';
}
dog
fleas
has
my
Really
See this error message:
double zulu;
for (auto v : zulu)
cout << v;
c.cc:2: error: 'begin' was not declared in this scope
“begin”? Why is it complaining about “begin”? Because it rewrote that for loop to use .begin() and .end().
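Because the rewrite is mechanical, any class that supplies begin() and end() can be used in such a loop. Here is a minimal sketch; Range and collect are hypothetical names invented for this illustration, not part of the lecture or of the standard library.

```cpp
#include <vector>

// Hypothetical class: represents the integers lo, lo+1, ..., hi-1.
class Range {
  public:
    // Just enough of an iterator for the for loop: *, ++, and !=.
    class iterator {
      public:
        explicit iterator(int v) : value(v) {}
        int operator*() const { return value; }
        iterator &operator++() { ++value; return *this; }
        bool operator!=(const iterator &rhs) const { return value != rhs.value; }
      private:
        int value;
    };
    Range(int lo, int hi) : lo_(lo), hi_(hi) {}
    iterator begin() const { return iterator(lo_); }
    iterator end() const { return iterator(hi_); }
  private:
    int lo_, hi_;
};

// Gathers the values into a vector so that they are easy to inspect.
std::vector<int> collect(const Range &r) {
    std::vector<int> out;
    for (auto v : r)          // the compiler rewrites this using r.begin() and r.end()
        out.push_back(v);
    return out;
}
```

Remove begin() from Range and the compiler should complain about “begin”, much as it did for the double.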
Variations
Which is best? Why?
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int v : a)
cout << v << ' ';
2 3 5 7 11 13 17 19
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto v : a)
cout << v << ' ';
2 3 5 7 11 13 17 19
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto &v : a)
cout << v << ' ';
2 3 5 7 11 13 17 19
vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto &v : a)
cout << v << ' ';
2 3 5 7 11 13 17 19
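All four variations print the same thing, so the output can’t settle the question. The difference shows up when the loop body modifies v, or when the elements are expensive to copy. A small sketch, with hypothetical function names:

```cpp
#include <vector>

// auto & makes v an alias for each element, so the vector really changes.
void double_all(std::vector<int> &a) {
    for (auto &v : a)
        v *= 2;
}

// Plain auto makes v a copy of each element, so the vector is untouched.
void double_copies(std::vector<int> &a) {
    for (auto v : a)
        v *= 2;
}

// const auto & neither copies the element nor allows changing it:
// a reasonable default for read-only traversal.
long total(const std::vector<int> &a) {
    long sum = 0;
    for (const auto &v : a)
        sum += v;
    return sum;
}
```

For an int, the copy is cheap, so the choice hardly matters. For a vector of long strings, copying each element is real work.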
Other containers
The same iterator code works for all STL containers.
forward_list<char> l = {'a', 'c', 'k', 'J'};
for (auto it = l.begin(); it != l.end(); ++it)
cout << *it << ' ';
a c k J
unordered_set<string> u = {"a", "c", "k", "J"};
for (auto it = u.begin(); it != u.end(); ++it)
cout << *it << ' ';
J k c a
set<char> s = {'a', 'c', 'k', 'J'};
for (auto it = s.begin(); it != s.end(); ++it)
cout << *it << ' ';
J a c k
Of the top 100 ♀ + top 100 ♂ U.S. names, 1918–2017, the ones in ASCII
order: Amy, Ann, Betty, Billy, Gary, Harry, Henry, Jack, Jerry, Kelly,
Larry, Mary, Roy, Scott, and Terry.
iterator type
- What type is set<int>::iterator?
- It’s not an int *, that’s for sure!
- It’s a user-defined type, containing an internal node pointer, with operator* and operator++ defined.
- Similarly for list<int>::iterator and unordered_set<int>::iterator.
- Most iterator types cannot be a native pointer.
- What happens when you increment a plain pointer?
- It goes to the next memory location.
- Is that what you want for a linked list, a tree, or a hash? No!
- It must be that operator++ is overloaded.
- Can’t do that for a plain pointer.
- C++ is extensible, not mutable.
- Therefore, most iterators cannot be plain pointers.
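To make that concrete, here is a minimal sketch of an iterator for a singly-linked list. Node, NodeIterator, and sum_list are hypothetical names for this illustration. Real library iterators are more elaborate, but the idea is the same: operator++ follows a link instead of moving to the next memory location.

```cpp
// Hypothetical singly-linked node, for illustration only.
struct Node {
    int value;
    Node *next;
};

// A user-defined type wrapping a node pointer.
class NodeIterator {
  public:
    explicit NodeIterator(Node *p) : current(p) {}
    int &operator*() const { return current->value; }
    NodeIterator &operator++() {      // follow the link, not the next address
        current = current->next;
        return *this;
    }
    bool operator!=(const NodeIterator &rhs) const { return current != rhs.current; }
  private:
    Node *current;
};

// Adds up a chain of nodes; a null pointer plays the role of "one past the end".
int sum_list(Node *head) {
    int total = 0;
    for (NodeIterator it(head); it != NodeIterator(nullptr); ++it)
        total += *it;
    return total;
}
```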
iterator vs. pointer
- All pointers are iterators.
- Not all iterators are pointers.
- Some iterators are pointers.
Iterator classifications
- BidirectionalIterator (list, set)
- RandomAccessIterator (std::array, vector, string):
  ++, --, += int, -= int, + int, - int, iter-iter, ==, !=, <, <=, >, >=, *, ->, =
- ForwardIterator (forward_list, unordered_set)
These are not C++ types. These are descriptions, used to discuss
the behavior of a given iterator.
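The category names themselves are descriptions, but the standard library does attach a matching tag type to every iterator, available through std::iterator_traits. A short sketch that checks the classifications listed above at compile time; has_category is a hypothetical helper name:

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// True if the iterator type It carries the category tag Tag.
template <typename It, typename Tag>
constexpr bool has_category() {
    return std::is_same<typename std::iterator_traits<It>::iterator_category,
                        Tag>::value;
}

static_assert(has_category<std::vector<int>::iterator,
                           std::random_access_iterator_tag>(), "vector");
static_assert(has_category<std::list<int>::iterator,
                           std::bidirectional_iterator_tag>(), "list");
static_assert(has_category<std::forward_list<int>::iterator,
                           std::forward_iterator_tag>(), "forward_list");
static_assert(has_category<int *,
                           std::random_access_iterator_tag>(), "plain pointer");
```

If any of these classifications were wrong, the file would not compile.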
begin() & end()
- .begin() & .end() return iterators. Not necessarily pointers.
- .begin() is, conceptually, a pointer to the first element of the container.
- .end() is, conceptually, a pointer one past the end of the container.
- It’s a half-open interval!
string s = "bonehead";
cout << "First char: " << *s.begin() << '\n';
cout << "Badness: " << *s.end() << '\n';
First char: b
Badness: ␀
string s = "genius";
cout << "First char: " << *s.begin() << '\n';
cout << "Last char: " << *(s.end()-1) << '\n';
First char: g
Last char: s
Intervals
- Let’s get the concept of a half-open interval clear,
because it's important.
- In mathematics, [1.0, 3.0] means the infinitude of real numbers,
starting & including 1.0, and ending & including 3.0.
- (4.0, 4.5) means the infinitude of real numbers,
starting at but excluding 4.0, and ending at but excluding 4.5.
- [6.2, 9.0) means the infinitude of real numbers,
starting & including 6.2, and ending at but excluding 9.0.
Therefore, 8.99999 is included, but 9.0 is not.
- C++ positively loves the last one.
Intervals
It’s all about [square brackets] vs. (round parens).
- [a,b] includes a and b.
- (a,b) doesn’t include a, and doesn’t include b.
- [a,b) includes a, but doesn’t include b.
- This is a half-open interval.
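A few properties make [a,b) convenient in code: the number of elements is b minus a, an empty range is simply a equal to b, and adjacent ranges join without overlapping or leaving a gap. A small sketch, where slice is a hypothetical helper:

```cpp
#include <string>

// Returns the characters of s in the half-open index range [first, last).
// The result always has last - first characters.
std::string slice(const std::string &s, int first, int last) {
    return std::string(s.begin() + first, s.begin() + last);
}
```

For example, slice(s, 0, k) followed by slice(s, k, n) covers every character of an n-character string exactly once, with no +1 or -1 adjustments.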
Half-open intervals
string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string::iterator c = alpha.begin()+2, f = c+3;
string foo(c, f);
cout << *c << '\n';
cout << *f << '\n';
cout << foo << '\n';
C
F
CDE
-or-
string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto c = &alpha[2], f = c+3;
cout << *c << '\n';
cout << *f << '\n';
cout << string(c,f) << '\n';
C
F
CDE
- foo is not CDEF, because it’s a half-open interval.
- Are both f variables the same type? The first is string::iterator, the second is char *. Are those the same?
Half-open Intervals
int a[] = {00, 11, 22, 33, 44, 55, 66, 77};
vector<int> v(a+2, a+5);
for (auto n : v)
cout << n << ' ';
22 33 44
| 00 | 11 |  22   | 33 | 44 | 55  | 66 | 77 |
|    |    |   ↑   |    |    |  ↑  |    |    |
|    |    | start |    |    | end |    |    |
The end location is not included.
.front() & .back()
Some containers have .front() and .back(), which return references to the first and last elements.
list<double> c = {2.3, 5.7, 11.13, 17.19, 23.29};
cout << "First: " << c.front() << '\n';
cout << "Last: " << c.back() << '\n';
First: 2.3
Last: 23.29
- These are not iterators; they are references to the actual data.
- The return type of list<double>::front() is double &, not list<double>::iterator.
- Note the difference between the types double and double &.
string name = "Jack";
name.front() = 'H';
cout << name;
Hack
Comparisons, part one
This won’t work:
list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it < l.end(); ++it)
cout << *it << ' ';
c.cc:2: error: no match for 'operator<' in 'it <
l.std::__cxx11::list<std::__cxx11::basic_string<char> >::end()' (operand
types are 'std::_List_iterator<std::__cxx11::basic_string<char> >' and
'std::__cxx11::list<std::__cxx11::basic_string<char> >::iterator' {aka
'std::_List_iterator<std::__cxx11::basic_string<char> >'})
Read the message. It says that < isn’t defined for those iterators.
Comparisons, part two
This will work:
list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it != l.end(); ++it)
cout << *it << ' ';
kappa alpha gamma
list<>::iterator is a BidirectionalIterator, not a RandomAccessIterator, and so < isn’t defined. What would it compare? The addresses of the linked list nodes? That’s not useful.
Constructors
Nearly all containers accept a pair of iterators as ctor arguments.
These do not have to be iterators for the same type of container.
char date[] = __DATE__; // E.g., Nov 22 2024
string now(date, date+6);
cout << "month & day: " << now << '\n';
string day_of_month(now.begin()+4, now.begin()+6);
cout << "day: " << day_of_month << '\n';
multiset<int> ms(now.begin(), now.end());
for (auto n : ms)
cout << n << ' ';
cout << '\n';
month & day: Nov 22
day: 22
32 50 50 78 111 118
begin() and end() functions
Let’s copy from a C array to a C++ string:
int fido[] = {68, 79, 71};
string current(fido, fido+2);
cout << current << '\n';
DO
Oh dear, I counted wrong. Why am I counting!? Am I a computer?
int fido[] = {68, 79, 71};
string current(fido.begin(), fido.end());
cout << current << '\n';
c.cc:2: error: request for member 'begin' in 'fido', which is of non-class type
'int [3]'
Alas, fido has no methods.
begin() and end() functions
There are also free functions begin() and end(), which work on arrays (not pointers) and all standard containers:
int fido[] = {68, 79, 71};
string current(begin(fido), end(fido));
cout << current << '\n';
DOG
Try that again
It was crazy to use integer ASCII values. Let’s just
use a C string:
char fido[] = "DOG";
string current(begin(fido), end(fido));
cout << current << '\n';
DOG␀
What fresh hell is this?
What is sizeof(fido)?
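The array holds four chars, not three: 'D', 'O', 'G', and the terminating '\0'. Since end(fido) is one past the whole array, the terminator gets copied too. A sketch of the problem and one possible fix; the function names are hypothetical:

```cpp
#include <cstring>
#include <iterator>
#include <string>

// end(fido) is one past the entire array, terminator included.
std::string with_terminator() {
    char fido[] = "DOG";                       // sizeof(fido) is 4
    return std::string(std::begin(fido), std::end(fido));
}

// Stopping at begin + strlen leaves the terminator out.
std::string without_terminator() {
    char fido[] = "DOG";
    return std::string(std::begin(fido), std::begin(fido) + std::strlen(fido));
}
```

Of course, for a C string, plain string current(fido) would also do the job, since that constructor stops at the terminator.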
Invalidation
Consider this poor code:
int *p = new int(42);
cout << "Before: " << *p << '\n';
delete p;
cout << "After: " << *p << '\n';
Before: 42
After: 3732
- Nothing mysterious here. p was a valid pointer, then it became invalid.
- Just because a pointer exists, doesn’t mean that it points
to something that you may access.
- I know the address of the White House, but they won’t let me move in.
More Invalidation
Another way to invalidate a pointer:
string *p;
{
string name = "John Jacob Jingleheimer Schmidt";
p = &name;
cout << p->size() << ' ' << *p << '\n';
}
cout << p->size() << ' ' << *p << '\n';
There’s some undefined behavior!
Yet More Invalidation
Yet another way to invalidate a value:
vector<int> &foo() {
vector<int> v = {11,22,33,44,55,66,77};
auto &r = v;
return r;
}
int main() {
for (auto val : foo())
cout << val << '\n';
}
6794
0
1238082857
519613739
55
66
77
More undefined behavior.
Iterator invalidation
vector<long> v = {253};
vector<long>::iterator it = v.begin();
cout << "Before: " << *it << '\n';
for (long i=1; i<1000; i++)
v.push_back(i);
cout << "After: " << *it << '\n';
Before: 253
After: 5561
- Sure, it “pointed” to v[0] just fine, at first.
- As we added all those elements to v, unavoidably, v’s data got reallocated to make more room.
- The poor iterator referring to v[0] continued to point to the old, obsolete, location.
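One way to watch the reallocation happen is to compare the address of the vector’s storage before and after the pushes. And if a position must survive growth, an index is safer than an iterator, because v[index] is looked up afresh each time. A sketch with hypothetical function names:

```cpp
#include <vector>

// Pushes n more elements, and reports whether the storage moved.
bool storage_moved(std::vector<long> &v, int n) {
    const long *before = v.data();
    for (int i = 0; i < n; i++)
        v.push_back(i);
    return v.data() != before;
}

// Remembers a position as an index rather than as an iterator.
// The index remains valid no matter how often the vector reallocates.
long first_after_growth(std::vector<long> v, int n) {
    std::vector<long>::size_type index = 0;
    for (int i = 0; i < n; i++)
        v.push_back(i);
    return v[index];
}
```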
Lipstick on a pig
Using auto makes the code prettier, but no better:
vector<long> v = {253};
auto it = v.begin();
cout << "Before: " << *it << '\n';
for (long i=1; i<1000; i++)
v.push_back(i);
cout << "After: " << *it << '\n';
Before: 253
After: 5303
Reservation
Using .reserve() pre-allocates memory:
vector<long> v = {253};
v.reserve(1005);
auto it = v.begin();
cout << "Before: " << *it << '\n';
for (long i=1; i<1000; i++)
v.push_back(i);
cout << "After: " << *it << '\n';
Before: 253
After: 253
How often?
How often does re-allocation happen? We can find out,
for any particular implementation:
vector<int> v;
for (int i=1; i<=1000; i++) {
auto before = v.capacity();
v.push_back(i);
auto after = v.capacity();
if (before != after)
cout << i << ' ' << after << '\n';
}
1 1
2 2
3 4
5 8
9 16
17 32
33 64
65 128
129 256
257 512
513 1024
.capacity(): how much memory is allocated
.size(): how much memory is used
How often?
Similarly, because a std::string is not much more than a vector<char>:
string s;
cout << s.size() << ' ' << s.capacity() << '\n';
for (int i=1; i<10000; i++) {
auto before = s.capacity();
s += 'x';
auto after = s.capacity();
if (before != after)
cout << s.size() << ' ' << after << '\n';
}
0 15
16 30
31 60
61 120
121 240
241 480
481 960
961 1920
1921 3840
3841 7680
7681 15360
What happened to our nice powers of two?
Curious String Behavior
for (string s; s.size()<60; s+="abcde")
cout << s.size() << ' ' << (void *) s.data() << '\n';
0 0x7fffa22aae80
5 0x7fffa22aae80
10 0x7fffa22aae80
15 0x7fffa22aae80
20 0x8c52c0
25 0x8c52c0
30 0x8c52c0
35 0x8c52f0
40 0x8c52f0
45 0x8c52f0
50 0x8c52f0
55 0x8c52f0
Casting? Why can’t we use << s.data() or << &s[0] to get the address of the string data?
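The reason is that << has a special overload for char pointers: it prints the characters pointed to, up to the terminating '\0', rather than the address. Casting to a void pointer selects the overload that prints the address instead. A small sketch, with hypothetical function names, that captures the output in a string:

```cpp
#include <sstream>
#include <string>

// Streaming a const char * prints the text it points to.
std::string as_text(const std::string &s) {
    std::ostringstream out;
    out << s.data();
    return out.str();
}

// Streaming a const void * prints the address (in an implementation-defined format).
std::string as_address(const std::string &s) {
    std::ostringstream out;
    out << static_cast<const void *>(s.data());
    return out.str();
}
```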
Small String Optimization
- This is the popular small string optimization.
- Big strings (>15 bytes, for g++ 8) are stored in the heap,
as usual.
- Small strings are kept right in the string object, in the same space as the pointer and lengths.
- Small strings are much faster:
everything’s on the stack, in cache, no pointers!
Small String Optimization
Possible small string implementation:
class string {
private:
size_t size; // actual string length
union {
struct {
size_t alloc_size; // amount of heap data
char *heap; // ptr to heap data
};
char local[16]; // or, put it here
};
public:
char &operator[](size_t i) {
// small strings live in local; bigger ones live in the heap
return (size < sizeof(local)) ? local[i] : heap[i];
}
// … and many more methods …
};
Order Calculation
- Still, the small string optimization only applies to,
well, small strings.
- Bigger data still undergoes reallocation and copying.
- All that reallocation looks expensive.
- However, science is better than first impressions.
- What is the big-O time cost for adding n items to a vector?
Big-O
- Let’s push 0…77 onto a vector. Reallocation will occur at power-of-two sizes:
- Push 0: grow to capacity=1, copy nothing
- Push 1: grow to capacity=2, copy 1 int
- Push 2: grow to capacity=4, copy 2 ints
- Push 4: grow to capacity=8, copy 4 ints
- Push 8: grow to capacity=16, copy 8 ints
- Push 16: grow to capacity=32, copy 16 ints
- Push 32: grow to capacity=64, copy 32 ints
- Push 64: grow to capacity=128, copy 64 ints
- That’s 1+2+4+8+16+32+64 copies for 78 ints, 127÷78≈1.6 copies/int. That’s O(1) per push, so O(n) for n items!
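The same tally can be automated. This sketch simulates a container that starts empty and doubles its capacity whenever it is full; that doubling policy is an assumption matching the output seen earlier, not something the standard requires. The function name copies_for is hypothetical.

```cpp
// Counts element copies caused by reallocation when pushing n items,
// assuming capacity goes 0, 1, 2, 4, 8, ... (doubling when full).
long copies_for(long n) {
    long capacity = 0, size = 0, copies = 0;
    for (long i = 0; i < n; i++) {
        if (size == capacity) {                  // full: must reallocate
            copies += size;                      // existing elements get copied
            capacity = (capacity == 0) ? 1 : 2 * capacity;
        }
        ++size;                                  // the push itself
    }
    return copies;
}
```

Here copies_for(78) gives the 127 tallied by hand. The copies are powers of two that are each less than n, so their sum stays below 2n however large n gets.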
Does it scale?
vector<char> v;
int copies = 0, iterations = 1000000;
for (int i=0; i<iterations; i++) {
auto before = v.capacity();
v.push_back(i);
if (before != v.capacity())
copies += before;
}
cout << double(copies)/iterations;
1.04858
Pre-allocation helps
Again, if we know how many items we’re going to add, we can .reserve() the space:
vector<int> v;
v.reserve(900);
cout << v.size() << ' ' << v.capacity() << '\n';
for (int i=1; i<1000; i++) {
auto before = v.capacity();
v.push_back(i);
auto after = v.capacity();
if (before != after)
cout << v.size() << ' ' << after << '\n';
}
0 900
901 1800
Half-open interval
- .begin() returns an iterator that “points” to the first element
- .end() returns an iterator that “points” one past the last element
- They form a half-open interval.
- Remember, iterators are often not pointers.
.begin() and .end()
vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
or:
vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.begin(); it != v.end(); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
Why does this fail?
class Foo {
int sum() const {
int total = 0;
for (vector<int>::iterator it = data.begin(); it != data.end(); ++it)
total += *it;
return total;
}
vector<int> data;
};
c.cc:4: error: conversion from '__normal_iterator<const int*,[...]>' to
non-scalar type '__normal_iterator<int*,[...]>' requested
- .sum() is a const method.
- Therefore, inside .sum(), data is a const vector<int>.
- Let that sink in for a moment.
- How is a const method implemented, anyway?
- It’s not that the compiler checks for changes to member data inside a const method.
- Instead, the compiler simply regards all data members as const when compiling a const method.
- Therefore, data.begin() and data.end() return iterators of type const_iterator, not type iterator.
- They’re overloaded methods!
- You can’t assign const_iterator to iterator.
Restate the problem
This is the same problem:
const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)
cout << *it << ' ';
c.cc:2: error: conversion from '__normal_iterator<const int*,[...]>' to
non-scalar type '__normal_iterator<int*,[...]>' requested
Solution #1
One solution—use the correct type:
const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
Solution #2
A better solution is to let auto figure it out:
const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.begin(); it != v.end(); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
Solution #3
The best solution—avoid all of this:
const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto val : v)
cout << val << ' ';
1 1 2 3 5 8 13 21 34
.cbegin() and .cend()
- .begin() is an overloaded method. It returns a const_iterator if the object is const, iterator otherwise.
- .end() is an overloaded method. It returns a const_iterator if the object is const, iterator otherwise.
- .cbegin(): like .begin(), but always returns const_iterator
- .cend(): like .end(), but always returns const_iterator
vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.cbegin(); it != v.cend(); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
.rbegin() and .rend()
- .rbegin(): like .begin(), but goes the other way
- .rend(): like .end(), but goes the other way
- .crbegin(): like .cbegin(), but goes the other way
- .crend(): like .cend(), but goes the other way
array<int, 9> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.rbegin(); it != v.rend(); ++it)
cout << *it << ' ';
34 21 13 8 5 3 2 1 1
list<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.crbegin(); it != v.crend(); ++it)
cout << *it << ' ';
34 21 13 8 5 3 2 1 1
Don’t get too excited
- Not all containers have the reversed versions.
- For example, forward_list, as a singly-linked list, doesn’t have reverse iterators.
- For some containers, iterator and const_iterator are both read-only (and may even be the same type).
- A set has inherently-constant iterators, because it’s in order. If you could change the values via iterators, then it wouldn’t be in order any more.
- An unordered_set, a hash, has its own special order, and changing the values in place would be bad.
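This can be checked at compile time: dereferencing even a plain set<int>::iterator yields a reference to a const int. The sketch below also shows the usual workaround for “changing” a value, which is to erase the old one and insert the new one. SetRef and replace_value are hypothetical names.

```cpp
#include <set>
#include <type_traits>
#include <utility>

// The type produced by *it, where it is a (non-const) set<int>::iterator.
using SetRef = decltype(*std::declval<std::set<int>::iterator>());

// It refers to a const int, so "*it = 5" would not compile.
static_assert(std::is_same<SetRef, const int &>::value,
              "set elements are read-only through iterators");

// Replaces old_value with new_value, letting the set put it where it belongs.
void replace_value(std::set<int> &s, int old_value, int new_value) {
    if (s.erase(old_value))       // erase returns the number of elements removed
        s.insert(new_value);
}
```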
Plain old data types:
How about old C-style data?
int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = a.begin(); it != a.end(); ++it)
cout << *it << ' ';
c.cc:2: error: request for member 'begin' in 'a', which is of non-class type
'int [9]'
That failed miserably. Perhaps it’s because arrays are not objects.
Free functions
Fortunately, there exist begin() and end() functions, which work for C arrays or STL containers.
begin(con) is defined as, conceptually:
if con is a C-style array
then con
else con.begin()
Similarly, end(con) is:
if con is a C-style array
then con+sizeof(con)/sizeof(con[0])
else con.end()
Free functions
int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(a); it != end(a); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
These work for objects, as well:
deque<int> s = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(s); it != end(s); ++it)
cout << *it << ' ';
1 1 2 3 5 8 13 21 34
What use is that? Generality for the sake of generality?
for loop
Consider any for-loop:
forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (double v : fl)
cout << v << ' ';
1 1 2 3 5 8 13 21 34
The compiler turns this into (approximately):
forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(fl); it != end(fl); ++it) {
double v = *it;
cout << v << ' ';
}
1 1 2 3 5 8 13 21 34
Which works for any container type, even C-style arrays.
Efficiency
Foo operator++(int) {
const auto save = *this;
++*this;
return save;
}
- Why do our examples use preincrement (++it) instead of postincrement (it++)?
- Preincrement’s generally faster. Postincrement copies & preincrements.
- Even if efficiency isn’t a consideration, there’s no clear readability
difference, so we may as well be fast.
- A smart compiler might make them equivalent.
- For actual pointers (int *) or ints, do what you like.
Some programmers always use preincrement, as a good habit.
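To see the extra copy, here is a sketch of an iterator-like class that counts how many times it gets copied. Counter is a hypothetical class for this illustration; its postincrement follows the pattern shown at the top of this slide.

```cpp
// Hypothetical iterator-like class that counts its own copies.
struct Counter {
    static int copies;                 // total copy constructions so far
    int value = 0;

    Counter() = default;
    Counter(const Counter &rhs) : value(rhs.value) { ++copies; }

    Counter &operator++() {            // preincrement: no copy at all
        ++value;
        return *this;
    }
    Counter operator++(int) {          // postincrement: save, bump, return the copy
        const auto save = *this;
        ++*this;
        return save;
    }
};
int Counter::copies = 0;
```

After ++c, Counter::copies is unchanged. After c++, it has gone up by at least one, and perhaps two if the compiler doesn’t elide the copy made by the return.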