CS253 Algorithms
First published computer algorithm, by Ada Lovelace
Inclusion
Algorithms are defined in:
#include <algorithm>
A few numeric algorithms are defined in:
#include <numeric>
Definition
- In Computer Science, algorithm means “how to do something”.
- In C++, algorithm refers to templatized functions
from the <algorithm> and <numeric> header files.
- There are many algorithms available. We will focus on a few.
Arguments
- Algorithms generally take their input from half-open
iterator ranges, which always (😟) come first.
- For output, algorithms take a single iterator, which says
where the output starts.
- A second iterator indicating the end of the output is not required,
since the length of the output is determined by the size of the input,
possibly filtered in some way, as in copy_if().
- Additional arguments may specify a value to look for,
a predicate to select items, etc.
vector v = {1, 1,2, 1,2,3, 1,2,3,4};
cout << count(v.begin(), v.end(), 1) << '\n'
<< count(v.begin(), v.begin()+v.size()/2, 2.0) << '\n'
<< count(&v[0], 1+&v.back(), true) << '\n';
4
2
4
- count() counts how many times a thing is found. 🤯
- The first two arguments form a half-open interval, which is exactly
what .begin() and .end() give, since .end() “points” one
past the last element.
- Each element in the interval is compared to the third argument,
which does not have to be the same type as the items
in the range.
- The interval can be two iterators into any sort of container. As
long as the first iterator can be incremented, and compared to the
second iterator, and assuming that the first iterator will
eventually become equal to the second, it’s ok.
- Pointers are iterators, so pointers into C arrays, C strings, or
the heap are ok.
bool small(int n) {
    return n < 5;
}
int main() {
    multiset ms = {3,1,4,1,5,9,2,6,5,3,5,8,9,7,9};
    cout << count_if(ms.begin(), ms.end(), small) << '\n'
         << count_if(ms.begin(), ms.end(),
                     [](int n){return n>7;}) << '\n';
}
6
4
count_if() is like count(), except it takes a predicate
(a function that returns a bool) instead of a target value.
- The find() algorithm searches a half-open range for a value.
- If it finds the value, it returns:
- not an index to the value found ✘
- not a pointer to the value found ✘
- an iterator that “points” to the value found. ✔️
- What type of iterator? The same type that you gave it to
indicate the range.
- If it can’t find the value, it returns:
- not a 0 or −1 ✘
- not a null pointer ✘
- not a pointer ✘
- the second iterator given; the end of the half-open interval. ✔️
- OK, technically, if you give find() raw pointers, then it does
return the same type, namely, a pointer.
vector primes{ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
auto it = find(primes.begin(), primes.end(), 13);
if (it == primes.end())
    cout << "Not found\n";
else
    cout << "Found " << *it << " at " << it-primes.begin() << '\n';
Found 13 at 5
An algorithm; not a method! Some containers have a .find()
method, which is preferred, if it exists. All that the poor find()
algorithm can do is to search linearly, from front to back, but
set::find() can take advantage of a set’s binary tree structure to
perform the search in O(log n) time, and unordered_set::find()
simply uses magic (hashing), which takes O(1) time on average.
vector primes{ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
int *start = &primes[3], *finish = &primes[20];
cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';
auto it = find(start, finish, 13);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found " << *it << " at " << it-start << '\n';
Search the interval [7,73)
It includes 7, but not 73
Found 13 at 2
vector primes{ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
int *start = &primes[3], *finish = &primes[20];
cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';
auto it = find(start, finish, 14);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found " << *it << " at " << it-start << '\n';
Search the interval [7,73)
It includes 7, but not 73
Not found, *it=73
bool pred(int n) {
    return n > 50;      // Should find 53
}
int main() {
    set primes{ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
    if (auto it = find_if(primes.begin(), primes.end(), pred); it == primes.end())
        cout << "Failure\n";
    else
        cout << "Found " << *it << '\n';
}
Found 53
- Note the C++17 if statement with if (init; condition)
- find_if() is like find(), but it takes a predicate,
not a target value.
- find() and find_if() stop at the first success.
Can’t return all the matches!
string str = "bonehead";
set alpha = {'P', 'D', 'Q'};
copy(alpha.begin(), alpha.end(), str.begin());
cout << str << '\n';
DPQehead
string alpha = "abcdefghijklmnopqrstuvwxyz";
string initials = "JRA";
copy(initials.begin(), initials.begin()+2, alpha.begin()+20);
cout << alpha << '\n';
abcdefghijklmnopqrstJRwxyz
The iterator arguments don’t just have to be .begin() and .end().
- copy_if() is like copy(), except that it doesn’t copy everything.
- Instead it takes a predicate that determines whether or not
to copy a given element.
- A predicate is a function that returns bool.
- copy_if() takes three iterators and a predicate:
copy_if(source-begin, source-end, dest-begin, predicate)
Why isn’t there a dest-end ?
It isn’t needed. source-begin and source-end
say how much to copy. Anyway, we might not even copy all of that!
First attempt
Let’s ensure that we know how to use copy()
before moving on to copy_if():
string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
copy(foo.begin(), foo.end(), bar.begin()); // 🦡
cout << bar << "\n";
SIGSEGV: Segmentation fault
Of course, plain bar = foo; would have worked nicely.
Why did the example fail?
There is no space allocated in bar.
You can’t allocate space by pretending that it exists.
Second attempt
string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
copy(foo.begin(), foo.end(), bar.begin());
cout << bar << "\n";
I have to ration my Diet Mountain Dew!
I have to ration my Diet Mountain Dew!
Third attempt
string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
// Don’t copy vowels:
copy_if(foo.begin(), foo.end(), bar.begin(),
[](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );
cout << bar << "\n";
I have to ration my Diet Mountain Dew!
hv t rtn my Dt Mntn Dw!␀␀␀␀␀␀␀␀␀␀␀␀␀␀
Hooray, copy_if() worked!
Hey, what’s with those ␀ characters?
.resize() filled the string with '\0' characters, which display as ␀
here. Your terminal may simply ignore them and so not display them.
copy_if() only wrote over the front of bar, so bar.size() is unchanged.
Fourth attempt
string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar(foo.size(), 'X');
// Don’t copy vowels:
auto it = copy_if(foo.begin(), foo.end(), bar.begin(),
[](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );
// Make bar the correct size:
bar.resize(it-bar.begin());
cout << bar << "\n";
I have to ration my Diet Mountain Dew!
hv t rtn my Dt Mntn Dw!
We resized bar to the correct size.
In-place
string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
auto it = copy_if(foo.begin(), foo.end(), foo.begin(),
[](char c){return "aeiouyAEIOUY"s.find(c)==string::npos;} );
// Make foo the correct size:
foo.resize(it-foo.begin());
cout << foo << "\n";
I have to ration my Diet Mountain Dew!
hv t rtn m Dt Mntn Dw!
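We can copy from & to the same location, at least in practice: the
destination never gets ahead of the source, so common implementations
give the expected result. Strictly speaking, though, the standard requires
that copy_if()’s source and destination ranges not overlap; remove_if() is
the algorithm intended for in-place filtering.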
string fact = "Warren Harding’s middle name was Gamaliel.";
replace(fact.begin(), fact.end(), ' ', '_');
cout << fact << '\n';
Warren_Harding’s_middle_name_was_Gamaliel.
string fact = "Warren Harding’s middle name was Gamaliel.";
replace_if(fact.begin(), fact.end(),
[](char c) { return c=='o' || c=='a';}, '*');
cout << fact << '\n';
W*rren H*rding’s middle n*me w*s G*m*liel.
string name = "Joseph Robinette Biden Jr.";
string out;
transform(name.begin(), name.end(), out.begin(),
[](char c) { return c ^ 040; }); // 🦡
cout << out << '\n';
SIGSEGV: Segmentation fault
Oops! Didn’t allocate any memory in out!
string name = "Joseph Robinette Biden Jr.";
string out(name.size(), 'X'); // fill it
transform(name.begin(), name.end(), out.begin(),
[](char c) { return c ^ 040; });
cout << out << '\n';
jOSEPH␀rOBINETTE␀bIDEN␀jR␎
The sort() algorithm (from the header file <algorithm>) has two forms:
sort(begin, end);
sort(begin, end, comparison-object-or-function);
Or, you can think of the third argument as optional, defaulting to
less<whatever>(), where whatever is the type of the
things that the iterators point to.
- Only a single half-open interval is given.
How do I sort() from container1 to container2?
copy() to container2, sort() the data there. The extra O(n) copy doesn’t change the overall cost: it’s still O(n log n).
Containers
- Of course, some containers are intrinsically sorted.
- You might specify a comparison functor for
those containers.
- You wouldn’t use the sort() algorithm on those containers.
- However, you might want to apply the sort() algorithm to
an unsorted container, such as a std::array, vector, string,
or even a C array.
- list has a sort() method.
Default comparison
string s = "Kokopelli";
sort(s.begin(), s.end());
cout << s << '\n';
Keiklloop
Duplicates and both upper-case K and lower-case k.
Why aren’t K and k together?
In ASCII, and hence Unicode, A…Z all come before
a…z. The order is
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, not
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
Explicit comparison
string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>); // 🦡
cout << s << '\n';
c.cc:2: error: expected primary-expression before ‘)’ token
Why doesn’t that work?
What does sort() want for the third argument?
It wants something that the function call operator () works on:
a function or functor object.
What kind of thing is less<char>?
It’s a type. Not an object—a type.
However, less<char>() is an object of that type.
Can you pass a type as a function argument?
No. An object of that type, sure, but not a type.
Explicit comparison
string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>());
cout << s << '\n';
Keiklloop
less<char> is a type.
less<char>() is a temporary object of that type.
- The () invoke the ctor, not operator().
Reverse sort
string s = "Kokopelli";
sort(s.begin(), s.end(), greater<char>());
cout << s << '\n';
poollkieK
Comparison function
bool lt(char a, char b) {
    return a < b;
}
int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}
Keiklloop
λ-function
string s = "Kokopelli";
sort(s.begin(), s.end(),
[](char a, char b){return a<b;});
cout << s << '\n';
sort(s.begin(), s.end(),
[](char a, char b){return a>b;});
cout << s << '\n';
Keiklloop
poollkieK
Case folding
bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}
int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}
eiKklloop
Unique
bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}
int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end());
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKklop
If you want to avoid duplicates, then use unique(), which requires that
its input is in order already. That way, it can run in O(n)
time, as opposed to O(n²) time.
Unique
bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}
bool eq(char a, char b) {
    return toupper(a) == toupper(b);
}
int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKlop
Alas, case-independent uniqueness doesn’t come free:
we’ve duplicated the calls to toupper().
Unique and DRY
bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}
bool eq(char a, char b) {
    return !lt(a,b) && !lt(b,a);    // a=b ⇔ a≮b ∧ b≮a
}
int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKlop
Duplication of code is a bad thing, but avoiding it might cost: two
lt() calls, hence four toupper() calls. However, there’s no
cost. Our smart compiler generated exactly the same code for this
eq() as the previous version.
Generality
It’s not just about strings:
int a[] = {333, 22, 4444, 1};
sort(begin(a), end(a));
for (auto val : a)
cout << val << '\n';
1
22
333
4444
vector<double> v = {1.2, 0.1, 6.7, 4.555};
sort(v.begin(), v.end(), greater<double>());
for (auto val : v)
cout << val << '\n';
6.7
4.555
1.2
0.1
Why didn’t I say a.begin()?
- Because a is a C array. It’s not an object, so it has no methods!
However, the free functions begin() and end() work on C arrays.
- Plain a works as well as begin(a),
but begin(a),end(a) is nicely symmetric.
Attitude
- These algorithms may strike you as simplistic. “I could write that!”.
- You could write a for loop as a while loop, but that would just
confuse everybody. A for loop has semantic value.
- Sure, you could write your own code. But would it be correct?
Even the corner cases, like searching an empty range?
- Using standard algorithms conveys meaning. Educated C++
programmers recognize the standard algorithms, just as we
all know that “brother” means “male sibling”.
- Compilers might recognize algorithms and replace them with special
machine code. copy for chars might get replaced with ultra-fast
looping instructions to copy memory 64 bytes at a time.