C++ Async Development (not only for) for C# Developers Part I: Lambda Functions

And now for something completely different…

I was recently tasked with implementing the SignalR C++ client. The beginnings were (are?) harder than I had imagined. I did some C++ development in late nineties and early 2000s (yes I am that old) but either I have forgotten almost everything since then or I had never really understood some of the C++ concepts (not sure which is worse). In any case the C++ landscape has changed dramatically in the last 15 or so years. Writing tests is a common practice, standard library became really “standard” and C++ 11 is now available. There is also a great variety of third party libraries making the life of a C++ developer much easier. One of such libraries is cpprestsdk codename Casablanca. Casablanca is a native, cross platform and open source library for asynchronous client-server communication. It contains a complete HTTP stack (client/server) with support for WebSockets and OAuth, support for JSon and provides APIs for scheduling and running tasks asynchronously – similar to what is offered by C# async. As such cpprestsdk is a perfect fit for a native SignalR client where the client communicates over HTTP using JSon and takes the full advantage of asynchrony.
I am currently a couple of months into my quest and I would like to share the knowledge about C++ asynchronous development with the cpprestsdk I have gained so far. My background is mostly C# and it took me some time to wrap my head around native async programming even though C# async programming is somewhat similar to C++ async programming (with cpprestsdk). Two things I wish I had had when I started were to know how to translate patterns used in C# async programming to C++ and what the most common pitfalls are. In this blog series I would like to close this gap. Even though some of the posts will talk about C# quite a lot I hope the posts will be useful not only for C# programmers but also for anyone interested in async C++ programming with cpprestsdk.

Preamble done – let’s get started.

Cpprestsdk relies heavily on C++ 11 – especially on newly introduced lambda functions. Therefore before doing any actual asynchronous programming we need to take a look at and understand C++ lambda functions especially that they are a bit different (and in some ways more powerful) than .NET lambda expressions.
The simplest C++ lambda function looks as follows:

auto l = [](){};

It does not take any parameter, does not return any value and does not have any body so doesn’t do anything. A C# equivalent would look like this:

Action l = () => {};

The different kinds of brackets one next to each other in the C++ lambda function may seem a bit peculiar (I saw some pretty heated comments from long time C++ users how they feel about this syntax) but they do make sense. Let’s take a look what they are and how they are used.

[]
Square brackets are used to define what variables should be captured in the closure and how they should be captured. You can capture zero (as in the example above) or more variables. If you want to capture more than one variable you provide a comma separated list of variables. Note that each variable can be specified only once. In general you can capture any variable you can access in the scope and – except for the this pointer – they can be captured either by reference or by value. When a variable is captured by reference you access the actual variable from the lambda function body. If a variable is captured by value the compiler creates a copy of the variable for you and this is what you access in your lambda. The this pointer is always captured by value. To capture a variable by value you just provide the variable name inside the square brackets. To capture a variable by reference you prepend the variable name with &. You can also capture variables implicitly – i.e. without enumerating variables you want to capture – letting the compiler figure what variables to capture by inspecting what variables are being used in the lambda function. You still need to tell how you want to capture the variables though. To capture variables implicitly by value you use =. To capture variables implicitly by reference you use &. What you specify will be the default way of capturing variables which you can refine if you need to capture a particular variable differently. You do that just by adding the variable to the capture list. One small wrinkle when specifying variables explicitly in the capture list is that if the variable is not being used in the lambda you will not get any notification or warning. I assume that the compiler just get rids of this variable (especially in optimized builds) but still it would be nice to have an indication if this happens. Here are some examples of capture lists with short explanations:

[] // no variables captured
[x, &y] // x captured by value, y captured by reference
[=] // variables in scope captured implicitly by value
[&] // variables in scope captured implicitly by reference
[=, &x] // variables in scope captured implicitly by value except for x which is captured by reference
[&, x] // variables in scope captured implicitly by reference except for x which is captured by value

[=, &] // wrong – variables can be captured either by value or by reference but not both
[=, x] // wrong – x is already implicitly captured by value
[&, &x] // wrong – x is already implicitly captured by reference
[x, &x] // wrong – x cannot be captured by reference and by value at the same time

How does this compare to C# lambda expressions? In C# all variables are implicitly captured by reference (so it is an equivalent of [&] used in C++ lambda functions) and you have no way of changing it. You can work around it by assigning the variable you want to capture to a local variable and capture the local variable. As long as you don’t modify the local variable (e.g. you could create an artificial scope just for declaring the variable and defining the lambda expression – since the variable would not be visible out of scope it could not be modified) you would get something similar to capturing by mutable value. This was actually a way to work around a usability issue where sometimes if you closed over a loop variable you would get unexpected results (if you are using Re# you have probably seen the warning reading “Access to foreach variable in closure. May have different behaviour when compiled with different versions of compiler.” – this is it). You can read more about this here (btw. the behavior was changed in C# 5.0 and the results are now what most developers actually expect).

()
Parenthesis are used to define a formal parameter list. There is nothing particularly fancy – you need to follow the same rules as when you define parameters for regular functions.

Return type
The simplest possible lambda I showed above did not have this but sometimes you may need to declare the return type of the lambda function. In simple cases you don’t have to and the compiler should be able to deduce the return type for you based on what your function returns. However, in more complicated cases – e.g. your lambda function has multiple return statements – you will need to specify the return type explicitly. A lambda with an explicitly defined return type of int looks like this:

auto l = []()->int { return 42;};

mutable
Again something that was not needed for the “simplest possible C++ lambda” example but I think you will encounter it relatively quickly when you start capturing variables by value. By default all C++ lambda functions are const. This means that you can access variables captured by value but you cannot modify them. You also cannot call non-const functions on variables captured by value. Prepending lambda body with the mutable keyword makes the above possible. You need to be careful though because this can get tricky. For instance what you think the following function prints?

void lambda_by_value()
{
    auto x = 42;
    auto l = [x]()
    mutable {
        x++;
        std::cout << x << std::endl;
    };

    std::cout << x << std::endl;
    l();
    l();
    std::cout << x << std::endl;
}

If you guessed:

42
43
44
42

– congratulations – you were right and you can skip the next paragraph; otherwise read on.
Imagine your lambda was compiled to the following class (disclaimer: this is my mental model and things probably work a little bit differently and are far more complicated than this but I believe that this is what actually happens at the high level):

class lambda
{
public:
    lambda(int x) : x(x)
    {}

    void operator()()
    {
        x++;
        std::cout << x << std::endl;
    }

private:
    int x;
};

Since we capture x by value the ctor creates a copy of the passed parameter and stores it in the class variable called x. Overloading the function call operator (i.e. operator()) makes the object callable making it similar to a function but the object still can maintain the state which is perfect in our case because we need to be able to access the x variable. Note that the body of the operator() function is the same as the body of the lambda function above. Now let’s create a counterpart of the lambda_by_value() function:

void lambda_by_value_counterpart()
{
    auto x = 42;
    auto l = lambda(x);
    std::cout << x << std::endl;
    l();
    l();
    std::cout << x << std::endl;
}

As you can see the lambda_by_value_counterpart() function not only looks almost the same as the original lambda_by_value() (except that instead of defining the lambda inline we create the lambda class) but it also prints the same result. This shows that when you define a lambda the compiler creates a structure for you that maintains the state and is used across lambda calls. Now also the mutable keyword should make more sense. If, in our lambda class the operator() function was actually const we would have not been able to modify the class variable x. So the function must not be const or the class variable x must be defined as (surprise!)… mutable int x;. (In the example I decided to make the function operator() non-const since it feels more correct).

auto vs. var
Since C++ lambda definitions contain full type specifications for parameters and the return type you can use auto and leave it to the compiler to deduce the type of the lambda variable. In fact I don’t even know how to write the type of the lambda variable explicitly (when I hover over the auto in Visual Studio it shows something like class lambda []void () mutable->void but this cannot be used as the type of the variable). In C# you cannot assign a lambda expression to var for several reasons. You can read up why here.

That’s more or less what I have learned about C++ lambda functions in the past few weeks apart from… common pitfalls and mistakes I am thinking about writing about soon.

2 thoughts on “C++ Async Development (not only for) for C# Developers Part I: Lambda Functions

Leave a comment