C++ vs Ruby - Lambdas

Lambdas

Cpp

#include <algorithm>
#include <functional>
#include <vector>

int main() {
    std::function<int(int)> addOne = [](int x) -> int {
        return x + 1;
    };

    addOne(5);
    // returns 6

    // use lambda as value, do in-place modification
    std::vector<int> v = {1, 2, 3, 4, 5};
    std::transform(v.begin(), v.end(), v.begin(),
        addOne);
    // new value of 'v' is [2, 3, 4, 5, 6]
}

Ruby

addOne = -> (x) { x + 1 }

addOne.call(5)
# returns 6

# do in-place modification
v = [1, 2, 3, 4, 5]
v.map!(&addOne)
# new value of 'v' is [2, 3, 4, 5, 6]

What This Code Does

The above code creates a simple lambda that adds 1 to the value provided. It is called directly and also passed as a value into functions that operate over lists of data.

What's the Same

Both the Ruby and the C++ versions define a lambda that is callable and can be passed as a first-class object into other functions. Both versions also define "pure" lambdas that do not capture any local state or mutate any state (i.e. calling the lambda on its own does not affect the rest of our program).

What's Different

The Ruby version uses the .call method on the lambda to invoke it, whereas the C++ lambda is directly callable (no different than a regular function). Additionally, in Ruby, lambdas (which are a kind of Proc) cannot always be passed directly to methods. Many methods, such as map!, expect a 'block' instead. The & is a unary operator that automatically converts a lambda or proc into a block. In this respect, the C++ version is more straightforward.

Beyond this difference, there are syntax differences in how the lambda is declared. Ruby declares a lambda very simply, as -> followed by a parameter list and a code block, and doesn't specify types. C++, being statically typed, spells out the parameter types, and its lambda syntax consists of 4 distinct parts: the capture, the parameters, the return type, and the body.

[](int param1, int param2) -> int {
    return param1 + param2;
}

The [] part is a list of variables we'd like to "capture" in our lambda (more on this in the next section). The parameters are everything between ( and ), and the return type is what comes after the -> but before the {. Lastly, what's in the curly braces is the function body (it's worth pointing out that C++ lambdas require a return statement if they produce a value).
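As a small illustration of the return-type part, the trailing return type can be left out when the compiler can deduce it from the return statement. This is a minimal sketch; the names `addExplicit` and `addDeduced` are hypothetical and used only for comparison:

```cpp
#include <type_traits>

// All four parts written out: capture, parameters, return type, body.
auto addExplicit = [](int param1, int param2) -> int {
    return param1 + param2;
};

// Same lambda without "-> int"; the return type is deduced from the return statement.
auto addDeduced = [](int param1, int param2) {
    return param1 + param2;
};

// Compile-time checks that both forms return int.
static_assert(std::is_same<decltype(addExplicit(1, 2)), int>::value,
              "explicit return type is int");
static_assert(std::is_same<decltype(addDeduced(1, 2)), int>::value,
              "deduced return type is int");
```

Both lambdas are called the same way, for example `addDeduced(2, 3)`.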

Another difference is how we type the variable we are assigning the lambda to.

std::function<ReturnType(Arg1Type, Arg2Type, Arg3Type)>

It is worth pointing out that for complex types such as the one above, the auto keyword is often used instead. It instructs the compiler to deduce the type of the variable rather than having us specify it by hand. So our original lambda assignment could be rewritten as:

auto addOne = [](int x) -> int { return x + 1; };
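One subtlety with this rewrite: `auto` does not produce a `std::function`. It deduces the lambda's own compiler-generated type, while `std::function` is a separate wrapper that can hold any callable with a matching signature. A minimal sketch of the difference (the variable names `addOneAuto` and `addOneWrapped` are hypothetical):

```cpp
#include <functional>
#include <type_traits>

// 'auto' deduces the unnamed closure type the compiler generates for the lambda.
auto addOneAuto = [](int x) -> int { return x + 1; };

// std::function stores a copy of the same lambda behind a type-erased wrapper.
std::function<int(int)> addOneWrapped = addOneAuto;

// The two variables have different types even though both are called the same way.
static_assert(!std::is_same<decltype(addOneAuto), std::function<int(int)>>::value,
              "a lambda's type is not std::function");
```

Either variable can be called as `addOneAuto(5)` or `addOneWrapped(5)`. `std::function` is mainly useful when one variable or parameter type has to accept different callables.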

Capturing Local State

Cpp

#include <functional>

template<typename T>
std::function<T(T)> addX(T x) {
    return [x](T n) -> T {
        return n + x;
    };
}

int main() {
    auto addFive = addX<int>(5);

    addFive(10);
    // returns 15
}

Ruby

def addX(x)
  -> (n) { x + n }
end

addFive = addX(5)

addFive.call(10) # returns 15

What This Code Does

addX is a function that returns another function. The function it returns adds x to whatever value is provided. x is initially given to the addX function but is captured by the lambda that is returned.

What's Different

The first notable difference is the use of template <typename T> written before the addX function. C++ must make use of "generics" in order to account for different numeric types, such as int, float, double, etc. This is done with C++ templates. Templates are instantiated at compile time into concrete implementations, one for each set of types actually used. So, to be fair, this is arguably less generic programming and more akin to sophisticated macros.

// When used in code, a version is compiled to match the usage.
template<typename T>
T sum(T a, T b) {
    return a + b;
}

int main() {
    sum<int>(1, 2);
    sum<double>(1.0, 2.0);
    // this causes 2 specialized versions to be compiled
}

In Ruby, all type checking is done at runtime. That is to say, if we were to add or multiply two incompatible types (e.g. 4 * {some: "object"}), we would not know until the program ran. Many would point to this as a drawback for writing scalable, maintainable code, a view reflected in the popularity of projects such as TypeScript, which adds types to JavaScript, and in the addition of optional type hints to Python 3. With C++, we find out at compile time, when the templates are instantiated into concrete functions, whether the types can be added, multiplied, etc. Of course, most modern IDEs will give you some advance notice as well. :-)
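To make the compile-time checking concrete, the sketch below reuses the addX template from this section with several types. The `NotAddable` struct and the small wrapper functions are hypothetical, added only for illustration. The commented-out line is expected to be rejected by the compiler, because `n + x` has no meaning for a type without an addition operator:

```cpp
#include <functional>
#include <string>

template<typename T>
std::function<T(T)> addX(T x) {
    return [x](T n) -> T {
        return n + x;
    };
}

// A hypothetical type that does not define operator+.
struct NotAddable {};

// Each distinct T produces its own instantiation at compile time.
int addFiveToTen() { return addX<int>(5)(10); }
double addHalfToOne() { return addX<double>(0.5)(1.0); }
std::string appendBang() { return addX<std::string>("!")("hi"); }

// Uncommenting the next line should fail to compile, since
// 'n + x' is not defined for NotAddable:
// NotAddable broken() { return addX<NotAddable>(NotAddable{})(NotAddable{}); }
```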

The second notable difference is that the capture portion of our lambda ([]) contains the variable x. This value is copied into the lambda. Since C++ is not a garbage-collected language, we have to be specific about which variables we want to capture and how we'd like to capture them. For things we may not want to copy, we can capture by reference.

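#include <iostream>
#include <string>

int main() {
    // std::string (rather than auto, which would deduce a plain pointer)
    // so that a by-value capture really would copy the text
    std::string bigString = "I'm a really long string, don't copy me ...";
    auto printFn = [&bigString]() -> void {
        std::cout << bigString;
    };
    printFn();
}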

Note the & before the variable name in [&bigString]. This captures a reference to the variable instead of copying its value into a new variable. The caveat is that a reference merely refers to a location in memory and does not keep the variable alive. If the variable is destroyed before the lambda is called, the reference dangles and using it is undefined behavior.
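The difference between the two capture modes can be observed directly. In this minimal sketch (the `CaptureResult` struct and `compareCaptures` function are hypothetical names), the original variable is changed after both lambdas have been created:

```cpp
#include <functional>

// Holds what each lambda reports after the original variable changes.
struct CaptureResult {
    int seenByValue;
    int seenByReference;
};

CaptureResult compareCaptures() {
    int counter = 1;

    auto byValue = [counter]() -> int { return counter; };      // copies counter now
    auto byReference = [&counter]() -> int { return counter; }; // refers to counter

    counter = 42; // modify the original after both lambdas exist

    return CaptureResult{ byValue(), byReference() };
}

// Unsafe pattern, shown only as a comment: returning a lambda that captures
// a local by reference leaves it holding a dangling reference.
// std::function<int()> dangling() {
//     int local = 7;
//     return [&local]() -> int { return local; }; // 'local' is gone after return
// }
```

The by-value lambda still reports the value it copied, while the by-reference lambda reports the updated one.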

Digging Deeper

Everything in C++ must have some sort of type and representation in memory, and that is just as true for lambdas as it is for any other value. Take the following:

#include <iostream>
#include <string>

int main() {
    auto age = 65;
    std::string name = "Sir Robert Christianson Manyard Sr";

    auto printPerson = [age, &name](bool includeAge) -> void {
        std::cout << name;
        if (includeAge) {
            std::cout << ", age: " << age;
        }
        std::cout << "\n";
    };
}

The specification for C++ says that a lambda is an object of an anonymous type, created on the stack when it is declared as a local variable. You can imagine a rough equivalent looking like this:

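#include <iostream>
#include <string>

struct LambdaPrintPerson {
    int age;            // captured by value (copied)
    std::string *name;  // captured by reference (modeled here as a pointer)

    // The call operator is what makes the object callable like a function.
    void operator()(bool includeAge) const {
        std::cout << *name; // dereference to print the string, not the address
        if (includeAge) {
            std::cout << ", age: " << age;
        }
        std::cout << "\n";
    }
};

int main() {
    auto age = 65;
    std::string name = "Sir Robert Christianson Manyard Sr";

    auto printPerson = LambdaPrintPerson { age, &name };
}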

This is only something to imagine, rather than an exact definition, because the precise layout (padding, alignment, etc.) is compiler dependent according to the specification. However, it should make more sense now how values are "captured" by the lambda and what the difference is between capturing a reference (no copy) and capturing a value (copy).

Note that it is possible to allocate a lambda on the heap, even though the default is to allocate on the stack. Since a lambda is just an object, it can be heap allocated with the new keyword, such as:

auto printPerson = new auto ([age, &name](bool includeAge) -> void {
    // function body
});

Note the only additional bit of syntax is that we must wrap the lambda in parens ().
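A heap-allocated lambda is then used through a pointer and, like any object created with new, has to be released with delete. A minimal sketch, where the function name `describeOnHeap` is hypothetical and the lambda returns a string instead of printing, purely so the result is easy to inspect:

```cpp
#include <string>

std::string describeOnHeap() {
    auto age = 65;
    std::string name = "Sir Robert Christianson Manyard Sr";

    // 'new auto (...)' yields a pointer to the lambda's closure type.
    auto printPerson = new auto ([age, &name](bool includeAge) -> std::string {
        std::string result = name;
        if (includeAge) {
            result += ", age: " + std::to_string(age);
        }
        return result;
    });

    // Dereference the pointer to call the lambda.
    std::string text = (*printPerson)(true);

    // The closure object was created with new, so it must be deleted.
    delete printPerson;
    return text;
}
```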

Consuming

Cpp

#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>

void sendEmail(std::string to, std::string from, std::string subject,
               std::string body,
               std::function<void(std::string)> success_cb,
               std::function<void(std::string)> failure_cb) {
    // std::rand() returns an int in [0, RAND_MAX], so test parity for a coin flip
    if (std::rand() % 2 == 0) {
        success_cb(to);
    } else {
        failure_cb(to);
    }
}

int main() {
    sendEmail("you@your_domain.com", "me@my_domain.com",
        "Very Important Email",
        "TODO: remember to write email body. :-D",
        [](std::string to) -> void {
            std::cout << "Successful email sent to: " << to << "\n";
        },
        [](std::string to) -> void {
            std::cout << "OH NO! Very important email not sent to "
                      << to
                      << "\n";
        });
}

Ruby

def sendEmail(to, from, subject, body, success_cb, failure_cb)
  if rand > 0.5
    success_cb.call(to)
  else
    failure_cb.call(to)
  end
end

sendEmail('you@your_domain.com', 'me@my_domain.com',
          'Very Important Email',
          'TODO: remember to write email body. :-D',
          ->(to) { puts "Successful email sent to #{to}" },
          ->(to) { puts "OH NO! Very important email not sent to #{to}" })

What This Code Does

sendEmail is a user-defined function that pretends to send an email and then calls one of two callbacks provided by the user. This shows how to receive (consume) lambdas in your own code.

What's Different

Consuming lambdas in user-defined functions is very straightforward. The main difference here, yet again, is that the C++ version has to do slightly more work in order to declare all of the types. Beyond the type declarations, the two versions are very similar.
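As an alternative sketch, a C++ function can also accept callbacks through template parameters instead of std::function, letting the compiler deduce each lambda's type. The names `notify` and `notifyDemo`, and the boolean `succeeded` parameter, are hypothetical stand-ins for sendEmail and its random check:

```cpp
#include <string>

// Calls one of two callbacks depending on 'succeeded'.
// OnSuccess and OnFailure can be any callable that accepts a std::string.
template<typename OnSuccess, typename OnFailure>
void notify(const std::string& to, bool succeeded,
            OnSuccess success_cb, OnFailure failure_cb) {
    if (succeeded) {
        success_cb(to);
    } else {
        failure_cb(to);
    }
}

// Example use: each lambda records which callback ran.
std::string notifyDemo(bool succeeded) {
    std::string outcome;
    notify("you@your_domain.com", succeeded,
        [&outcome](const std::string& to) { outcome = "sent to " + to; },
        [&outcome](const std::string& to) { outcome = "not sent to " + to; });
    return outcome;
}
```

The trade-off is that the function body must be visible wherever it is called, whereas the std::function version keeps one fixed signature.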