Map Reduce
Cpp
#include <atomic>
#include <iostream>
#include <random>
#include <string>
#include <thread>
#include <vector>
using namespace std;
using vec_iter = std::vector<int>::iterator;
int main() {
  random_device rd;
  mt19937 gen(rd());
  // match Go's rand.Intn(100): values in [0, 99]
  uniform_int_distribution<> dis(0, 99);
  auto nums = vector<int>(4096);
  for (auto i = 0u; i < nums.size(); i++) {
    nums[i] = dis(gen);
  }
  // zero-initialize: a default-constructed atomic<int>
  // holds an indeterminate value prior to C++20
  atomic<int> totalSum{0};
  vector<thread> threads;
  // divide into blocks of 1024 elements
  size_t from = 0;
  size_t to = 1024;
  while (to <= nums.size()) {
    threads.push_back(
      thread([&totalSum,
              start = nums.begin() + from,
              end = nums.begin() + to]() -> void {
        int sum = 0;
        for (vec_iter cur = start; cur != end; cur++) {
          sum += *cur;
        }
        totalSum.fetch_add(sum);
      })
    );
    // advance the window after spawning the thread,
    // so the first block [0, 1024) is not skipped
    from += 1024;
    to += 1024;
  }
  for (auto &t : threads) {
    t.join();
  }
  cout << "Total Sum: " << totalSum.load() << "\n";
}
Go
package main
import (
    "fmt"
    "math/rand"
    "time"
)
func main() {
    rand.Seed(time.Now().UnixNano())
    nums := make([]int, 4096)
    for i := range nums {
        nums[i] = rand.Intn(100)
    }
    results := make(chan int, 4)
    // divide into blocks of 1024 elements
    from := 0
    to := 1024
    jobs := 0
    for to <= len(nums) {
        // pass the bounds as arguments so each goroutine gets its
        // own copy rather than racing on the shared loop variables
        go func(from, to int) {
            results <- sum(nums[from:to])
        }(from, to)
        from += 1024
        to += 1024
        jobs += 1
    }
    // collect all of the results
    totalSum := 0
    for jobs > 0 {
        totalSum += <-results
        jobs -= 1
    }
    fmt.Printf("Total Sum: %d\n", totalSum)
}
func sum(nums []int) int {
    sum := 0
    for _, n := range nums {
        sum += n
    }
    return sum
}
What This Code Does
This code splits a task (summing numbers in a sequence) amongst multiple units of concurrency (threads/goroutines) and then collects the results using idiomatic approaches.
What's the Same
Both implementations use a random number generator to populate the initial input data. Both also use a slice or view when handing portions of the input data to the concurrent subroutines. That is, both implementations avoid explicit copies when passing data.
Lastly, both implementations wait for the concurrent subroutines to finish before displaying the final result. Go does this by reading from a channel N times, where N is the number of goroutines created. C++ does this by calling join() on each thread object, which waits for that thread to exit before continuing.
What's Different
The first noticeable difference, although not the main point of this comparison, is the random number generation; see the later section Digging Deeper: Random Number Generation if that interests you most. The main semantic differences lie in how we collect and combine (reduce) the results from the concurrent operations.
Again, it is idiomatic in Go to pass results via channels. As such, our "reduction" of results happens within the main goroutine, which sums all of the individual results. C++, on the other hand, uses an atomic<int>, which allows each thread to participate in the "reduction" by atomically adding its result to the totalSum object. Atomics in C++ rely on the compiler to implement them efficiently for the target platform you're compiling to. That is, some CPUs support atomic operations at the instruction level; for CPUs that do not, the compiler may resort to other methods, such as locks, to implement atomic operations.
The second difference is how we represent "slices" of arrays; that is, a view of an array that doesn't involve performing copies. Go uses the type []T to represent slices, and slices can themselves be "sliced" using the [M:N] notation. I would consider this "first class" support for the concept of slices. C++, on the other hand, achieves this through the use of iterators.
Digging Deeper: C++ Slices (AKA Iterators)
An iterator in C++ is a cursor object that points to a particular position in a data structure. In our case, it points to a position in a std::vector. In many respects the cursor acts like a pointer: we advance the position by incrementing it, as in cur++; we get the value by dereferencing it, *cur; and we ensure that we do not advance the iterator beyond the end of the data structure by comparing it to a stop position, end. Since we are working with a fixed-size vector, this is easy for us to ensure. If we were iterating over a dynamically sized vector, our code would look like:
for (auto curr = my_vec.begin(); curr != my_vec.end(); curr++) {
// process items
}
However, if we were going to iterate the entire array and perform the same operation for each loop, we could also write this code as:
for (auto &val: my_vec) {
// process items
}
While the two abstractions look similar, the second method is called a "ranged for loop" and was introduced in C++11. This form is the most convenient, the most limiting, and the one most often seen when operating over all elements in a collection. The first approach, using iterators, is the more flexible option for when a simple for loop is not all that you need. With an iterator you could:
- Pass the iterator to a function as a slice (like our example above)
- Increment the iterator only under certain conditions, possibly re-processing an item multiple times within a loop
- Process every other element within a loop (by incrementing the cursor by 2: cur += 2)
- Process a collection backwards using the reverse iterators (rbegin() and rend())
Digging Deeper: Random Number Generation
The Go implementation offers a random number generator similar to many high-level languages. C++, however, in true C++ fashion, provides lower-level abstractions and gives you more control over how your random numbers are generated.
random_device - Generates a random numeric value from hardware entropy. The exact method of generation is implementation dependent (e.g. Clang uses /dev/urandom on Linux, while GCC uses Intel's rdrand instruction). It is also possible that your platform doesn't support a way to generate random numbers from hardware entropy, in which case this type would not be constructible. A value from a random_device is often used as a seed for any of the stdlib random number generators.
mt19937 - A random number generator based on the Mersenne Twister algorithm. The C++ standard library provides a decently large selection of random number generators to use, and depending on your application you might care deeply about which one is used. The list of random-number algorithms is:
minstd_rand0
minstd_rand
mt19937
mt19937_64
ranlux24_base
ranlux48_base
ranlux24
ranlux48
knuth_b
default_random_engine
For more information on the differences, you can view the reference page for the <random> header here.
uniform_int_distribution - Allows us to specify a range in which all numbers are equally likely to be returned. While not responsible for generating random numbers itself, it lets us request a range of values that differs from the range the underlying generator produces, without losing the "randomness" of the values. Alternatively, other distributions shift the probabilities of getting certain numbers; an Exponential Distribution, for example, biases generation toward one end of the range.
At the time of writing, there are 20 distributions defined in the <random> header. You can see the full list of standard-library distributions here.