CS253 Threads
pthreads
- pthreads
- A POSIX standard, not a C++ standard.
- They’re great.
- They still work, and will continue to do so forever.
- If your existing code uses pthreads, leave it alone.
- C++ threads
- I prefer them for new work.
- They became standard with C++ 2011, so they’re fairly well
established, now.
- I’ll bet that they’re implemented using pthreads.
Simple task
Count the numbers under a billion that are divisible by seventeen:
int count = 0;
for (int i=1; i<=1e9; i++)
    count += (i % 17 == 0);
cout << count;
58823529
Real time: 717 ms
Of course it’s a stupid program. Here’s a better version:
cout << 1000000000/17;
58823529
Real time: 6.78 ms
Threading
- This task lends itself to parallel processing, with each sub-task
(or thread) executing on a separate CPU.
- One thread could scan 1–100M,
the next thread would scan 100M–200M, the next 200M–300M, etc.
- After all threads are done, add up the results.
cout << "This computer has "
<< thread::hardware_concurrency() << " CPUs.\n";
This computer has 12 CPUs.
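To make the splitting idea concrete, here is a minimal sketch of carving 1..limit into one block per CPU. The helper name make_blocks is hypothetical, and the sketch assumes parts is at least 1 and no larger than limit. The one subtle point is that limit rarely divides evenly, so the last block has to absorb the remainder.

```cpp
#include <utility>
#include <vector>

// Split 1..limit into `parts` contiguous (start, count) blocks.
// make_blocks is a made-up helper for illustration only.
// Assumes 1 <= parts <= limit.
std::vector<std::pair<int,int>> make_blocks(int limit, int parts) {
    std::vector<std::pair<int,int>> blocks;
    const int size = limit / parts;          // integer division, rounds down
    for (int p = 0; p < parts; p++) {
        const int start = 1 + p * size;
        // The last block takes whatever is left over, so the blocks
        // cover 1..limit exactly once and never run past limit.
        const int count = (p == parts - 1) ? limit - start + 1 : size;
        blocks.emplace_back(start, count);
    }
    return blocks;
}
```

With a billion numbers and 12 CPUs, the first eleven blocks hold 83333333 numbers each and the last one holds 83333337.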
How it Usually Works
- With traditional threading facilities
(e.g., pthreads), you have to:
- create a new thread for each sub-task
- wait for the threads to finish
- obtain the results for each thread
- You can still operate that way with C++ threads, but you can
also do it a lot more easily using the future<type> templated class.
- It contains a value that will be computed asynchronously.
- When you ask for the value, it’ll wait for the value to be computed.
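As a rough sketch of the contrast, here are both styles applied to one tiny job. The names square, traditional_way, and future_way are invented for this illustration, and passing launch::async explicitly is a choice made here to force a separate thread; it is not something the slides require.

```cpp
#include <future>
#include <thread>

// Hypothetical sub-task, used only for this illustration.
int square(int n) { return n * n; }

// Traditional style: create the thread, wait for it, then fetch the
// result from wherever the thread left it.
int traditional_way(int n) {
    int result = 0;                                       // the thread writes its answer here
    std::thread t([&result, n] { result = square(n); });  // create
    t.join();                                             // wait
    return result;                                        // obtain
}

// future style: async() starts the work, and get() both waits for the
// value and returns it.
int future_way(int n) {
    std::future<int> f = std::async(std::launch::async, square, n);
    return f.get();
}
```

Both functions return the same value. The second has no shared variable to manage and no join() to forget.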
Threaded
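int count_them(int start, int count) {
    int found = 0;
    for (int i=start; i<start+count; i++)
        found += (i % 17 == 0);
    return found;
}
int main() {
    vector<future<int>> counts;
    const int limit = 1'000'000'000;
    // hardware_concurrency() may return 0 if it can’t tell; use at least 1.
    const int blocks = max(1u, thread::hardware_concurrency());
    const int block = limit / blocks;
    for (int n=0; n<blocks; n++) {
        const int start = 1 + n*block;
        // The last block also covers the remainder and stops at limit.
        const int size = (n == blocks-1) ? limit-start+1 : block;
        counts.emplace_back(async(count_them, start, size));
    }
    int total = 0;
    for (auto &w : counts)
        total += w.get();
    cout << total;
}
58823529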
Real time: 175 ms
Race Conditions
Here’s some code that increments an evil global variable many times:
int counter(0);
void foo() {
    for (int i=0; i<100'000'000; i++)
        counter++;
}
int main() {
    foo();
    cout << counter << '\n';
}
100000000
Real time: 194 ms
It works well.
Race Conditions
Let’s have two threads that do the same thing:
int counter(0);
void foo() {
    for (int i=0; i<50'000'000; i++)
        counter++;
}
int main() {
    future<void> v1(async(foo)), v2(async(foo));
    v1.wait();
    v2.wait();
    cout << counter << '\n';
}
50231310
Real time: 216 ms
That’s no good. Why did it fail?
Operations on counter are not atomic. counter++ might become
r1=counter; r1++; counter=r1; and another thread might update counter
in the middle of that.
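To see how an update gets lost, this sketch replays one unlucky interleaving by hand. It is deliberately single-threaded, so the outcome is the same on every run. r1 and r2 stand in for the registers of two threads, and the function name lost_update_demo is made up.

```cpp
// Replay one bad interleaving of two threads that each run counter++.
// No real threads are involved: the steps are simply written out in
// the order an unlucky scheduler might run them.
int lost_update_demo() {
    int counter = 0;
    int r1 = counter;   // thread A: load 0
    int r2 = counter;   // thread B: load 0, before A has stored
    r1++;               // thread A: increment its private copy
    r2++;               // thread B: increment its private copy
    counter = r1;       // thread A: store 1
    counter = r2;       // thread B: store 1, overwriting A's update
    return counter;     // 1, although two increments ran
}
```

Two increments ran, but counter ends up at 1. Repeat that often enough and you get a total well short of the expected one.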
Solution
atomic<int> counter(0);
void foo() {
    for (int i=0; i<50'000'000; i++)
        counter++;
}
int main() {
    future<void> v1(async(foo)), v2(async(foo));
    v1.wait();
    v2.wait();
    cout << counter << '\n';
}
100000000
Real time: 1.62 seconds
An atomic<int> counter has all-atomic operations.
Slower, probably, but correct. It’s poor code that writes
to memory that often, anyway. Ever hear of local variables‽
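One possible way to act on that advice, as a sketch: each thread counts in a local variable and touches the shared atomic only once, at the end. The names shared_counter, count_locally, and run_two are invented here, and launch::async is passed explicitly to force two real threads. No timing is claimed for this version.

```cpp
#include <atomic>
#include <future>

std::atomic<int> shared_counter(0);

// Count in a local variable, then publish the result with a single
// atomic addition instead of 50 million of them.
void count_locally() {
    int local = 0;
    for (int i = 0; i < 50'000'000; i++)
        local++;
    shared_counter += local;   // one atomic read-modify-write per thread
}

// Run two workers and return the combined total.
int run_two() {
    shared_counter = 0;
    auto v1 = std::async(std::launch::async, count_locally);
    auto v2 = std::async(std::launch::async, count_locally);
    v1.wait();
    v2.wait();
    return shared_counter;
}
```

The result is still correct, because the only write to shared memory is atomic, and each thread performs it once.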
Details
Remember to:
Other
We’ve just scratched the surface. Other cool stuff:
- <mutex>: mutexes, for mutual exclusion, which guard critical sections of
code that mustn’t run in several threads simultaneously
- <condition_variable>: blocking and resuming threads until it’s ok for
them to run
- <thread>: general thread control: creation, status, killing, joining, etc.
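For a small taste of those three headers, here is a hedged sketch. guarded_total uses a mutex to protect a plain int, and wait_for_value uses a condition variable to block until another thread has produced a value. Both function names and the value 42 are invented for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// <mutex> and <thread>: several threads increment a plain int, but only
// one at a time may be inside the critical section.
int guarded_total(int nthreads, int per_thread) {
    int total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; t++)
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; i++) {
                std::lock_guard<std::mutex> lock(m);  // locks here
                total++;
            }                                         // unlocks at each closing brace
        });
    for (auto &w : workers)
        w.join();
    return total;
}

// <condition_variable>: the caller blocks until the producer says the
// value is ready.
int wait_for_value() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int value = 0;
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            value = 42;
            ready = true;
        }
        cv.notify_one();                        // wake the waiting thread
    });
    int seen;
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return ready; });   // sleeps until ready is true
        seen = value;
    }
    producer.join();
    return seen;
}
```

The predicate passed to cv.wait() matters: it handles both spurious wakeups and the case where the producer finishes before the caller starts waiting.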