CS253 Threads

pthreads

pthreads
- A POSIX standard, not a C++ standard.
- They’re great.
- They still work, and will continue to do so forever.
- If your existing code uses pthreads, leave it alone.
C++ threads
- I prefer them for new work.
- They became standard in C++11, so they’re fairly well established by now.
- I’ll bet that they’re implemented using pthreads.

Simple task

Count the numbers from 1 to a billion that are divisible by seventeen:

int count = 0;
for (int i=1; i<=1e9; i++)
    count += (i % 17 == 0);
cout << count;

58823529

Real time: 717 ms

Of course it’s a stupid program. Here’s a better version:

cout << 1000000000/17;

58823529

Real time: 6.78 ms
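Why the one-liner gives the same answer: the multiples of 17 between 1 and N are 17, 34, and so on up to 17m, where m is the largest integer with 17m ≤ N. So the count is ⌊N/17⌋, which is what C++ integer division computes for positive operands:

```latex
\#\{\, i : 1 \le i \le N,\ 17 \mid i \,\} = \left\lfloor \frac{N}{17} \right\rfloor,
\qquad
\left\lfloor \frac{10^9}{17} \right\rfloor = 58823529,
\quad\text{since } 17 \cdot 58823529 = 999999993 \le 10^9 < 17 \cdot 58823530 = 1000000010.
```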

Threading

This task lends itself to parallel processing, with each sub-task (or thread) executing on a separate CPU.
One thread could scan 1–100M, the next 100M–200M, the next 200M–300M, and so on.
After all threads are done, add up the results.

cout << "This computer has "
     << thread::hardware_concurrency() << " CPUs.\n";

This computer has 12 CPUs.

How it Usually Works

With traditional threading facilities (e.g., pthreads), you have to:
- create a new thread for each sub-task
- wait for the threads to finish
- obtain the results for each thread
You can still work that way with C++ threads, but it’s much easier to use the future<type> class template:
- It holds a value that will be computed asynchronously (for example, by async).
- When you ask for the value with get(), it waits until the value has been computed.

Threaded

int count_them(int start, int count) {
    int found = 0;
    for (int i=start; i<start+count; i++)
        found += (i % 17 == 0);
    return found;
}

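int main() {
    const int N = 1'000'000'000;
    unsigned cpus = thread::hardware_concurrency();
    if (cpus == 0)                          // 0 means "unknown"
        cpus = 1;
    // Use an integer block size, rounded up. A double (1e9/12 = 83333333.3)
    // gets truncated, yielding an extra 13th block that scans past N.
    const int block = (N + cpus - 1) / cpus;
    vector<future<int>> counts;
    for (int b=1; b<=N; b+=block) {
        const int n = (N - b + 1 < block) ? N - b + 1 : block;  // last block may be short
        counts.emplace_back(async(launch::async, count_them, b, n));
    }
    int total = 0;
    for (auto &w : counts)
        total += w.get();
    cout << total;
}

58823529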

Real time: 175 ms

Race Conditions

Here’s some code that increments an evil global variable many times:

int counter(0);

void foo() {
    for (int i=0; i<100'000'000; i++)
        counter++;
}

int main() {
    foo();
    cout << counter << '\n';
}

100000000

Real time: 194 ms

It works well.

Race Conditions

Let’s have two threads that do the same thing:

int counter(0);

void foo() {
    for (int i=0; i<50'000'000; i++)
        counter++;
}

int main() {
    future<void> v1(async(foo)), v2(async(foo));
    v1.wait();
    v2.wait();
    cout << counter << '\n';
}

50231310

Real time: 216 ms

That’s no good. Why did it fail?

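Operations on counter are not atomic. counter++ might become three steps: r1=counter; r1++; counter=r1;. If another thread updates counter between that load and that store, its update is overwritten and lost. (Formally, two threads modifying the same non-atomic variable without synchronization is a data race, which is undefined behavior in C++.)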

Solution

atomic<int> counter(0);

void foo() {
    for (int i=0; i<50'000'000; i++)
        counter++;
}

int main() {
    future<void> v1(async(foo)), v2(async(foo));
    v1.wait();
    v2.wait();
    cout << counter << '\n';
}

100000000

Real time: 1.62 seconds

Every operation on an atomic<int> is atomic, so no increments are lost. That’s slower (1.62 seconds here, versus 216 ms for the broken version), but correct. Besides, it’s poor code that writes to shared memory that often. Ever hear of local variables‽

Details

Remember to:

#include <future> for future and async.
#include <atomic> for atomic.
#include <thread> for thread::hardware_concurrency().
Compile & link with g++ -O3 -pthread to get optimization and the thread library.

Other

We’ve just scratched the surface. Other cool stuff:

<mutex>: mutexes, for mutual exclusion, which guard critical sections of code that mustn’t run in several threads simultaneously
<condition_variable>: blocking threads, and resuming them when it’s OK for them to run
<thread>: general thread control: creation, status, joining, detaching, etc.

CS253: Software Development with C++

Spring 2020

Threads
