CS253: Software Development with C++

Spring 2021

Iterators

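Foundation

A class can contain methods, data, and even types. A type declared inside a class, such as cash below, is named from outside the class with ::, as in Foo::cash: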

class Foo {
  public:
    string bar() { return "Yow!\n"; }
    int x = 42;
    using cash = float;
};

Foo::cash salary = 2.43;
Foo f;
cout << salary << ' ' << f.x << ' ' << f.bar();
2.43 42 Yow!

Array Traversal

You have an array, and you want to traverse (walk through) it.

int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int i=0; i!=8; ++i)
    cout << a[i] << ' ';
2 3 5 7 11 13 17 19 

Problems with indexing

Array Traversal via Pointers

int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[8]; ++p)
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

vector traversal

You can traverse a vector in the same way, since the vector elements are contiguous:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[0] + 8; ++p)   // a[8] would be out of range for a vector
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

or, avoiding the magic number 8:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[0] + a.size(); ++p)
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

Types and Methods

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (vector<int>::iterator it = a.begin(); it != a.end(); ++it)
    cout << *it << ' ';
2 3 5 7 11 13 17 19 

auto is your friend

This is prettier:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto it = a.begin(); it != a.end(); ++it)
    cout << *it << ' ';
2 3 5 7 11 13 17 19 

All we did was use auto to tell the compiler to give it the same type that a.begin() returns, namely vector<int>::iterator.

for loop

This is (nearly) exactly the same, and gorgeous:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

The range-based for loop is defined in terms of .begin() and .end(), just like the previous code.

It’s simple

Really, a for loop like that (sometimes called a range-based for loop, or a for…each loop) is just a mechanical source-level transformation. It rewrites this:

set<string> s = {"my", "dog", "has", "fleas"};
for (auto v : s)
    cout << v << '\n';
dog
fleas
has
my

as this:

set<string> s = {"my", "dog", "has", "fleas"};
for (auto it = s.begin(); it != s.end(); ++it) {
    auto v = *it;
    cout << v << '\n';
}
dog
fleas
has
my

Really

See this error message:

double zulu;
for (auto v : zulu)
    cout << v;
c.cc:2: error: 'begin' was not declared in this scope

“begin”? Why is it complaining about “begin”? Because the compiler rewrote that for loop in terms of begin and end, and a double has neither.

Variations

Which is best? Why?

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto &v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto &v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

Other containers

The same iterator code works for all STL containers.

forward_list<char> l = {'a', 'c', 'k', 'J'};
for (auto it = l.begin(); it != l.end(); ++it)
    cout << *it << ' ';
a c k J 

unordered_set<string> u = {"a", "c", "k", "J"};
for (auto it = u.begin(); it != u.end(); ++it)
    cout << *it << ' ';
J k c a 

set<char> s = {'a', 'c', 'k', 'J'};
for (auto it = s.begin(); it != s.end(); ++it)
    cout << *it << ' ';
J a c k 

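The set prints its elements in sorted order, and in ASCII, uppercase 'J' (74) comes before lowercase 'a' (97), so “Jack” is ASCII-sorted. The order of an unordered_set is unspecified and may differ between implementations.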

Of the top 100 ♀ + top 100 ♂ U.S. names, 1918–2017, the ones whose letters are in ASCII order: Amy, Ann, Betty, Billy, Gary, Harry, Henry, Jack, Jerry, Kelly, Larry, Mary, Roy, Scott, and Terry.

iterator type

iterator vs. pointer

Iterator classifications

Standard iterator classifications, least to most powerful:

These are not C++ types. These are descriptions, used to discuss the behavior of a given iterator.

advance()

++ works on all iterators, but + and += only work on RandomAccessIterators:

list<int> li{11,22,33,44,55,66,77,88,99};
auto b = li.begin();
b += 5;
cout << *b;
c.cc:3: error: no match for 'operator+=' in 'b += 5' (operand types are 
   'std::_List_iterator<int>' and 'int')

But, sometimes, you want to do that, even if it’s inefficient:

list<int> li{00,11,22,33,44,55,66};
auto b = li.begin();
advance(b, 5);
cout << *b;
55

advance() just works, no matter what the type of the iterator. It does simple addition for a RandomAccessIterator, and loops otherwise.

next()

next() is like advance(), but it returns the modified iterator:

list<int> li{00,11,22,33,44,55,66};
auto b = li.begin();
cout << *(next(b, 5));
55

next() does not modify its iterator argument. Think of advance() as +=, whereas next() is more like +.

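begin() & end()

.begin() refers to the first element, but .end() refers to the position just past the last element. Dereferencing .end() is undefined behavior; here it happened to produce a null character: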

string s = "bonehead";
cout << "First char: " << *s.begin() << '\n';
cout << "Badness: " << *s.end() << '\n';
First char: b
Badness: ␀

string s = "genius";
cout << "First char: " << *s.begin() << '\n';
cout << "Last char:  " << *(s.end()-1) << '\n';
First char: g
Last char:  s

Intervals

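It’s all about [square brackets] vs. (round parens). A square bracket includes its endpoint; a round paren excludes it. So [2, 5] means 2, 3, 4, 5, while [2, 5) means 2, 3, 4.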

Half-open intervals

string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string::iterator c = alpha.begin()+2, f = c+3;
string foo(c, f);
cout << *c << ' ' << *f << ' ' << foo << '\n';
C F CDE

-or-

string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto c = &alpha[2], f = c+3;
cout << *c << ' ' << *f << ' ' << string(c,f) << '\n';
C F CDE
Are both f variables the same type?

The first is string::iterator, the second is char *. Are those the same?

Half-open Intervals

int a[] = {00, 11, 22, 33, 44, 55, 66, 77};
vector<int> v(a+2, a+5);
for (auto n : v)
    cout << n << ' ';
22 33 44 
00 11 22 33 44 55 66 77
      ^        ^
    start     end

The end location is not included.

.front() & .back()

Some containers have .front() and .back(), which return references to the first and last elements.

list<double> c = {2.3, 5.7, 11.13, 17.19, 23.29};
cout << "First: " << c.front() << '\n';
cout << "Last:  " << c.back() << '\n';
First: 2.3
Last:  23.29

string action = "Gripping";
action.front() = 'T';
cout << action;
Tripping

Comparisons, part one

This won’t work:

list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it < l.end(); ++it)
    cout << *it << ' ';
c.cc:2: error: no match for 'operator<' in 'it < 
   l.std::__cxx11::list<std::__cxx11::basic_string<char> >::end()' (operand 
   types are 'std::_List_iterator<std::__cxx11::basic_string<char> >' and 
   'std::__cxx11::list<std::__cxx11::basic_string<char> >::iterator' {aka 
   'std::_List_iterator<std::__cxx11::basic_string<char> >'})

Read the message. It says that < isn’t defined for those iterators.

Comparisons, part two

This will work:

list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it != l.end(); ++it)
    cout << *it << ' ';
kappa alpha gamma 

list<>::iterator is a BidirectionalIterator, not a RandomAccessIterator, and so < isn’t defined. What would it compare? The addresses of the linked list nodes? That’s not useful.

Constructors

Nearly all containers accept a pair of iterators as ctor arguments. These do not have to be iterators for the same type of container.

char date[] = __DATE__;    // E.g., Nov 21 2024
string now(date, date+6);
cout << "month & day: " << now << '\n';

string day_of_month(now.begin()+4, now.begin()+6);
cout << "day: " << day_of_month << '\n';

multiset<int> ms(now.begin(), now.end());
for (auto n : ms)
    cout << n << ' ';
cout << '\n';
month & day: Nov 21
day: 21
32 49 50 78 111 118 

begin() and end() functions

Let’s copy from a C array to a C++ string:

int fido[] = {'d','o','g'};
string current(fido, fido+2);
cout << current << '\n';
do

Oh dear, I counted wrong. Why am I counting!? Am I a computer?

int fido[] = {'d','o','g'};
string current(fido.begin(), fido.end());
cout << current << '\n';
c.cc:2: error: request for member 'begin' in 'fido', which is of non-class type 
   'int [3]'

Alas, fido has no methods.

begin() and end() functions

There are also free functions begin() and end(), which work on arrays (not pointers) and all standard containers:

int fido[] = {'d','o','g'};
string current(begin(fido), end(fido));
cout << current << '\n';
dog

Try that again

It was crazy to separate char values. Let’s just use a C string:

char fido[] = "DOG";
string current(begin(fido), end(fido));
cout << current << '\n';
DOG␀
␀? What fresh hell is this?

␀ is how these web pages display '\0', the null character.

What is sizeof(fido)? Did you count the null character ('\0') at the end? We’re not asking for the length of the C-string, which is clearly 3. Instead, we’re asking how many bytes are used to store fido, which is 4.

Invalidation

Consider this poor code:

int *p = new int(42);
cout << "Before: " << *p << '\n';
delete p;
cout << "After:  " << *p << '\n';
Before: 42
After:  2244

More Invalidation

Another way to invalidate a pointer:

string *p;
{
    string name = "John Jacob Jingleheimer Schmidt";
    p = &name;
    cout << p->size() << ' ' << *p << endl;
}
cout << p->size() << ' ' << *p << '\n';

There’s some undefined behavior!

Yet More Invalidation

Yet another way to invalidate a value:

vector<int> &foo() {
    vector<int> v = {11,22,33,44,55,66,77};
    auto &r = v;
    return r;
}

int main() {
    for (auto val : foo())
        cout << val << '\n';
}
2522
0
-1569583197
-1468884368
55
66
77

More undefined behavior.

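Iterator invalidation

When push_back() needs more room than the vector’s capacity, it allocates new storage and moves (or copies) the elements there, which invalidates every existing iterator into the vector: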

vector<long> v = {253};
vector<long>::iterator it = v.begin();

cout << "Before: " << *it << '\n';

for (long i=1; i<1000; i++)
    v.push_back(i);

cout << "After:  " << *it << '\n';
Before: 253
After:  2364

Lipstick on a pig

Using auto makes the code prettier, but no better:

vector<long> v = {253};
auto it = v.begin();

cout << "Before: " << *it << '\n';

for (long i=1; i<1000; i++)
    v.push_back(i);

cout << "After: " << *it << '\n';
Before: 253
After: 5732

Reservation

Using .reserve() pre-allocates memory:

vector<long> v = {253};
v.reserve(1005);
auto it = v.begin();

cout << "Before: " << *it << '\n';

for (long i=1; i<1000; i++)
    v.push_back(i);

cout << "After:  " << *it << '\n';
Before: 253
After:  253

How often?

How often does re-allocation happen? We can find out, for any particular implementation:

vector<int> v;

for (int i=1; i<=1000; i++) {
    auto before = v.capacity();
    v.push_back(i);
    auto after = v.capacity();
    if (before != after)
        cout << i << ' ' << after << '\n';
}
1 1
2 2
3 4
5 8
9 16
17 32
33 64
65 128
129 256
257 512
513 1024

.capacity(): how many elements fit in the memory already allocated
.size(): how many elements are actually in use

How often?

Similarly, because a std::string is not much more than a vector<char>:

string s;

cout << s.size() << ' ' << s.capacity() << '\n';
for (int i=1; i<10'000; i++) {
    auto before = s.capacity();
    s += 'x';
    auto after = s.capacity();
    if (before != after)
        cout << s.size() << ' ' << after << '\n';
}
0 15
16 30
31 60
61 120
121 240
241 480
481 960
961 1920
1921 3840
3841 7680
7681 15360

What happened to our nice powers of two?

Curious String Behavior

for (string s; s.size()<60; s+="abcde")
    cout << s.size() << ' ' << (void *) s.data() << '\n';
0 0x7fffbf760f70
5 0x7fffbf760f70
10 0x7fffbf760f70
15 0x7fffbf760f70
20 0x25d82c0
25 0x25d82c0
30 0x25d82c0
35 0x25d82f0
40 0x25d82f0
45 0x25d82f0
50 0x25d82f0
55 0x25d82f0

Casting? Why can’t we use << s.data() or << &s[0] to get the address of the string data?

Small String Optimization

Possible small string implementation:

class string {
  private:
    size_t size;              // actual string length
    union {
      struct {
        size_t alloc_size;    // amount of heap data
        char *heap;           // ptr to heap data
      } big;                  // used when the string is long
      char local[16];         // or, for short strings, put it here
    };
  public:
    char &operator[](size_t i) {
        // Short strings (plus their '\0') fit in local; longer ones live on the heap.
        return (size < sizeof(local)) ? local[i] : big.heap[i];
    }
    // … and many more methods …
};

Order Calculation

Big-O

Does it scale?

vector<char> v;
int copies = 0, iterations = 1'000'000;

for (int i=0; i<iterations; i++) {
    auto before = v.capacity();
    v.push_back(i);
    if (before != v.capacity())
        copies += before;
}

cout << double(copies)/iterations;
1.04858
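Why is the answer so close to 1? A rough accounting, assuming the capacity doubles starting from 1 as in the earlier output (another implementation may grow differently): each reallocation copies the before elements, and for a million push_backs those counts are 0, 1, 2, 4, …, 2^19. More generally, if the last reallocation copies 2^m < N elements, the total stays below 2N, so each push_back costs fewer than two copies on average, i.e. amortized O(1).

```latex
\text{copies} = 0 + \sum_{k=0}^{19} 2^k = 2^{20} - 1 = 1\,048\,575,
\qquad \frac{1\,048\,575}{1\,000\,000} \approx 1.04858

\sum_{k=0}^{m} 2^k = 2^{m+1} - 1 < 2N \quad \text{when } 2^m < N
```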

Pre-allocation helps

Again, if we know how many items we’re going to add, we can .reserve() the space:

vector<int> v;
v.reserve(900);

cout << v.size() << ' ' << v.capacity() << '\n';
for (int i=1; i<1000; i++) {
    auto before = v.capacity();
    v.push_back(i);
    auto after = v.capacity();
    if (before != after)
        cout << v.size() << ' ' << after << '\n';
}
0 900
901 1800

Half-open interval

.begin() and .end()

vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

or:

vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

const

Why does this fail?

class Foo {
    int sum() const {
        int total = 0;
        for (vector<int>::iterator it = data.begin(); it != data.end(); ++it)
            total += *it;
        return total;
    }
    vector<int> data;
};
c.cc:4: error: conversion from '__normal_iterator<const int*,[...]>' to 
   non-scalar type '__normal_iterator<int*,[...]>' requested

const

Restate the problem

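Inside a const method, data is const, so data.begin() returns a vector<int>::const_iterator, which cannot be converted to a plain vector<int>::iterator. This is the same problem: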

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
c.cc:2: error: conversion from '__normal_iterator<const int*,[...]>' to 
   non-scalar type '__normal_iterator<int*,[...]>' requested

Solution #1

One solution—use the correct type:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

Solution #2

A better solution—let auto figure it out:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

Solution #3

The best solution—avoid all of this:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto val : v)
    cout << val << ' ';
1 1 2 3 5 8 13 21 34 

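.cbegin() and .cend()

.cbegin() and .cend() return const_iterators, even for a non-const container, so the loop can read the elements but not modify them: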

vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.cbegin(); it != v.cend(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

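.rbegin() and .rend()

.rbegin() and .rend() return reverse iterators, so ++ moves from the last element toward the first. .crbegin() and .crend() are the const versions:

array<int, 9> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.rbegin(); it != v.rend(); ++it)
    cout << *it << ' ';
34 21 13 8 5 3 2 1 1 

list<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};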
for (auto it = v.crbegin(); it != v.crend(); ++it)
    cout << *it << ' ';
34 21 13 8 5 3 2 1 1 

Don’t get too excited

Plain old data types:

How about old C-style data?

int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = a.begin(); it != a.end(); ++it)
    cout << *it << ' ';
c.cc:2: error: request for member 'begin' in 'a', which is of non-class type 
   'int [9]'

That failed miserably. As the message says, an array is not of class type, so it has no methods.

Free functions

Fortunately, begin(), end(), and size() free functions work for containers and C arrays.

begin(thing) is defined as, conceptually (through the magic of templates):

    if thing is a C-style array
    then
        address of the start of the array
    else
        thing.begin()

Similarly, end(thing) is:

    if thing is a C-style array
    then
        address just after the end of the array
    else
        thing.end()

Free functions

int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(a); it != end(a); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

These work for objects, as well:

deque<int> s = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(s); it != end(s); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

What use is that? Generality for the sake of generality? No, it’s so that for loops will work for arrays as well as objects (even user-defined classes) with .begin() and .end() methods.

for loop

Consider any for loop:

forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (double v : fl)
    cout << v << ' ';
1 1 2 3 5 8 13 21 34 

The compiler turns this into (approximately):

forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto iter = begin(fl), e = end(fl); iter != e; ++iter) {
    double v = *iter;
    cout << v << ' ';
}
1 1 2 3 5 8 13 21 34 

This works for any container type, even C-style arrays, and end() is called only once.

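Efficiency

Postfix ++ is usually written in terms of prefix ++, and it must save a copy of the old value in order to return it: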

Foo operator++(int) {
    const auto save = *this;
    ++*this;
    return save;
}