std::optional? Proceed with caution!

The std::optional type is a great addition to the standard C++ library in C++ 17. It allows to end the practice of using some special (sentinel) value, e.g., -1, to indicate that an operation didn’t produce a meaningful result. There is one caveat though – std::optional<bool> may in some situation behave counterintuitively and can lead to subtle bugs.

Let’s take a look at the following code:

  bool isMorning = false;
  if (isMorning) {
    std::cout << "Good Morning!" << std::endl;
  } else {
    std::cout << "Good Afternoon" << std::endl;
  }

Running this code prints:

Good Afternoon!

This is not a surprise. But let’s see what happens if we change the bool type to std::optional<bool> like this:

  std::optional<bool> isMorning = false;
  if (isMorning) {
    std::cout << "Good Morning!" << std::endl;
  } else {
    std::cout << "Good Afternoon!" << std::endl;
  }

This time the output is:

Good Morning!

Whoa? Why? What’s going on here?

While this is likely not what we expected, it’s not a bug. The std::optional type defines an explicit conversion to bool that returns true if the object contains a value and false if it doesn’t (exactly the same as the has_value() method). In some contexts – most notably the if, while, for expressions, logical operators, the conditional (ternary) operator (a complete list can be found in the Contextual conversions section on cppreference) – C++ is allowed to use it to perform an implicit cast. In our case it led to a behavior that at the first sight seems incorrect. Thinking about this a bit more, the seemingly intuitive behavior should not be expected. An std::optional<bool> variable can have one of three possible values:

true
false
std::nullopt (i.e., not set)

and there is no interpretation under which the behavior of expressions like if (std::nullopt) is universally meaningful. Having said that, I have seen multiple engineers (myself included) fall into this trap.

The problem is that spotting the bug can hard as there are no compiler warnings or any other indications of the issue. This is especially problematic when changing an existing variable from bool to std::optional<bool> in large codebases because it is easy to miss some usages and introduce regressions.

The problem can also sneak easily to your tests. Here is an example of a test that happily passes but only due to a bug:

TEST(stdOptionalBoolTest, IncorrectTest) {
  ASSERT_TRUE(std::optional<bool>{false});
}

How to deal with `std::optional<bool>`?

Before we discuss the ways to handle std::optional<bool> type in code, it could be useful to mention a few strategies that can prevent bugs caused by std::optional<bool>:

raise awareness of the unintuitive behavior of std::optional<bool> in some contexts
when a new std::optional<bool> variable or function is introduced make sure all places where it is used are reviewed and amended if needed
have a good test unit coverage that can detect bugs caused by introducing std::optional<bool> to your codebase
if feasible, create a lint rule that flags suspicious usages of std::optional<bool>

In terms of code there are few ways to handle the std::optional<bool> type:

Compare the optional value explicitly using the `==` operator

If your scenario allows treating std::nullopt as true or false you can use the == operator like this:

std::optional<bool> isMorning = std::nullopt;
if (isMorning == false) {
  std::cout << "It's not morning anymore..." << std::endl;
} else {
  std::cout << "Good Morning!" << std::endl;
}

This works because the std::nullopt value is never equal to an initialized variable of the corresponding optional type. One big disadvantage of this approach is that someone will inevitably want to simplify the expression by removing the unnecessary == false and, as a result, introducing a regression.

Unwrap the optional value with the `value()` method

If you know that the value in the given codepath is always set you can unwrap the value by calling the value() method like in the example below:

    std::optional<bool> isMorning = false;
    if (isMorning.value()) {
      std::cout << "Good Morning!" << std::endl;
    } else {
      std::cout << "Good Afternoon!" << std::endl;
    }

Note that it won’t work if the value might not be set – invoking the .value() method if the value was not set will throw the std::bad_optional_access exception

Dereference the optional value with the * operator

This is very similar to the previous option. If you know that the value on the given code path is always set you can use the * operator to dereference it like this:

std::optional<bool> isMorning = false;
if (*isMorning) {
  std::cout << "Good Morning!" << std::endl;
} else {
  std::cout << "Good Afternoon!" << std::endl;
}

One big difference from using the value() method is that the behavior is undefined if you dereference an optional whose value was not set. Personally, I never go with this solution.

Use `value_or()` to provide the default value for cases when the value is not set

std::optional has the value_or() method that allows providing the default value that will be returned if the value is not set. Here is an example:

std::optional<bool> isMorning = std::nullopt;
if (!isMorning.value_or(false)) {
  std::cout << "It's not morning anymore..." << std::endl;
} else {
  std::cout << "Good Morning!" << std::endl;
}

If your scenario allows treating std::nullopt as true or false using value_or() could be a good choice.

Handle `std::nullopt` explicitly

There must have been a specific reason you decided to use std::optional<bool> – you wanted to enable the scenario where the value is not set. Now you need to handle this case. Here is how:

    std::optional<bool> isMorning = std::nullopt;    
    if (isMorning.has_value()) {
      if (isMorning.value()) {
        std::cout << "Good Morning!" << std::endl;
      } else {
        std::cout << "Good Afternoon!" << std::endl;
      }
    } else {
      std::cout << "I am lost in time..." << std::endl;
    }

Fixing tests

If your tests use ASSERT_TRUE or ASSERT_FALSE assertions with std::optional<bool> variables they might be passing even if they shouldn’t as they suffer from the very same issue as your code. As an example, the following assertion will happily pass:

ASSERT_TRUE(std::optional{false});

You can fix this by using ASSERT_EQ to explicitly compare with the expected value or by using some of the techniques discussed above. Here are a couple of examples:

ASSERT_EQ(std::optional{false}, true);
ASSERT_TRUE(std::optional{false}.value());

Other `std::optional` type parameters

We spent a lot of time discussing the std::optional<bool> case. How about other types? Do they suffer from the same issue? The std::optional type is a template so its behavior is the same for any type parameter. Here is an example with std::optional<int>:

    std::optional<int> n = 0;
    if (n) {
      std::cout << "n is not 0" << std::endl;
    }

which generates the following output:

n is not 0

The problem with std::optional<bool> is just more pronounced due to the typical usages of bool. For non-bool types it is fortunately no longer a common practice to rely on the implicit cast to bool. These days it is much common to write the condition above explicitly as: if (n != 0) which will compare with the value of as long as it is populated.

Enums and Exhaustive switch statements in C++

Exhaustive switch statements are switch statements that do not have the default case because all possible values of the type in question have been covered by one of the switch cases. Exhaustive switch statements are a perfect match when working with scoped enum types. This can be illustrated by the following code:

	#include <iostream>
	#include <string>

	enum class Color {
	Red,
	Green,
	Blue,
	};

	std::string getColorName(Color c) {
	switch(c) {
	case Color::Red: return "red";
	case Color::Green: return "green";
	case Color::Blue: return "blue";
	}
	}

	int main() {
	std::cout << "Color: " << getColorName(Color::Green) << std::endl;
	return 0;
	}

view raw example.cpp hosted with ❤ by GitHub

Now, if a new enum member is added:

	enum class Color {
	Red,
	Green,
	Blue,
	Yellow,
	};

view raw updated_enum.cpp hosted with ❤ by GitHub

the compiler will show warnings pointing to all the places that need to be fixed:

	enum.cpp:12:12: warning: enumeration value 'Yellow' not handled in switch [-Wswitch]
	switch(c) {
	^
	enum.cpp:12:12: note: add missing switch cases
	switch(c) {
	^
	enum.cpp:17:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]

view raw compilation_warnings hosted with ❤ by GitHub

If you configure warnings as errors, which you should do if you can, you will have to address all these errors to successfully compile your program.

Without the exhaustive switch, the compiler would happily compile the program after adding the new enum member because it would be handled by the default case. Even if the default case was coded to throw an exception to help detect unhandled enum members, this exception would only be thrown at runtime which could be too late to prevent failures. In a bigger system or application there could be many switch statements like this and without the help from the compiler it can be hard to find and test all of them. To sum up – exhaustive switch statements help quickly find usages that need to be looked at and fixed before the code can ship.

So far so good, but there is a problem – C++ allows this:

Color color = static_cast<Color>(42);

view raw cast.cpp hosted with ❤ by GitHub

Believe it or not, this is valid C++ as long as the value being cast is within the range of the underlying enum type. If you flow an enum value created this way through the exhaustive switch it won’t match any of the switch cases and since there is no default case the behavior is undefined.

The right thing to do is to always use enums instead of mixing integers and enums which ultimately is the reason to cast. Unfortunately, this isn’t always possible. If you receive values from users or external systems, they would typically be integer numbers that you may need to convert to an enum in your system or library and the way to do this in C++ is static casting.

Because you can never trust values received from external systems, you need to convert them in a safe way. This is where the exhaustive switch statement can be extremely useful as it allows to write a conversion function like this:

	Color convertToColor(int c) {
	auto color = static_cast<Color>(c);
	switch(color) {
	case Color::Red:
	case Color::Green:
	case Color::Blue:
	return color;
	}
	throw std::runtime_error("Invalid color");
	}

view raw convert.cpp hosted with ❤ by GitHub

If a new enum member is added, the compiler will fail to compile the convertToColor function (or at least will show warnings), so you know you need to update it. For enum values outside of the defined set of members the convertToColor throws an exception. If you use a safe conversion like this immediately after receiving the value, you will prevent unexpected values from entering your system. You will also have a single place where you can detect invalid values and reject and log incorrect invocations.

Using native libraries in ASP.NET 5

In ASP.NET 5 RC1 we are adding a built-in support for native libraries. This may seem a little surprising – all in all ASP.NET 5 is all about managed code running on different platforms. There are, however, scenarios where this managed code needs to talk to unmanaged code. You can find such examples in ASP.NET 5 itself – the Kestrel Http server is build on top of the libuv library and Entity Framework ships a SQLite provider that uses the SQLite library. Before adding support for native libraries developers had to load native libraries on their own and then manually find and expose exported functions. Given the differences between platforms (Windows vs. non-Windows), architectures (x86 vs. x64 vs. ARM) and, possibly, runtimes (desktop Clr vs. CoreClr vs. Mono) the code to achieve the task was non-trivial. With the new feature the idea is that native libraries are shipped in a NuGet package, are part of a referenced project or live in a location where they can be automatically loaded from (e.g. /usr/lib) and the user just uses the DllImport attribute to get access to functions exported by the native library like this:

[DllImport(“mylib”)]
public static extern int get_number();

This is the simplest way of using the DllImport attribute. The name of the native library is passed as the parameter to the DllImport attribute while the exported function will be matched by convention using the name of the method attributed with the DllImport attribute. One important thing to note is that the name of the native library does not contain any filename extension. Both CoreClr and Mono will try appending different filename extensions if they can’t find a library with the exact name provided by the user and they will eventually use the extension specific for the given OS (Mono may also prepend the name with “lib”). Mono on OS X requires using __Internal as the name of the library – see below for more details.

Now, that you know how to consume native libraries in ASP.NET 5 let’s take a look at how to create a NuGet package that contains native libraries that can be consumed using the DllImport attribute. The most important thing is the package structure. Native libraries need to located in the $/runtimes/{runtime-id}/native folder where $ is the root of the package and the {runtime-id} describes the target platform the library is supposed to run on. Runtime Ids is a vast topic and I won’t go into a lot of details here – especially that, currently, they are not fully baked in when using native libraries and project references. For our purposes we can assume that a runtime id consists of the operating system name, version and architecture. For working with native libraries effectively it is enough to know the following runtime ids:

win7-x64
win7-x86
win7-arm
osx.10.9-x64
osx.10.10-x64
osx.10.11-x64

You can find a few oddities in this list. First a runtime id for Linux is missing. This is because you can’t really ship an .so that would work on any distribution of Linux in a package. While there are runtime ids that target selected distros (e.g. ubuntu.14.04-x64) or even just vanilla Linux (linux-x64) the recommended way of using native libraries on Linux is to have the user install the .so in a location from where it can be loaded automatically (e.g. install into /usr/lib or set the LD_LIBRARY_PATH environment variable accordingly). The second oddity is win7-arm. This seems odd because there was never an ARM version of Windows 7. The super-correct runtime to use here is actually win10-arm (ASP.NET 5 is not supported on Windows RT which would be win8-arm or win81-arm). However, if you used win10-arm you would need to remember to explicitly specify the runtime id each time you restore the package or publish your project. If you did not, you wouldn’t get your native library since without providing the runtime id packages will be restored for win7-{arch} (win7-arm in this case). So, because there aren’t any ASP.NET 5 runtimes that can run on pre-Windows 10 version of Windows it is safe to use win7-arm instead of win10-arm and it is much less troublesome. The last oddity is multiple runtime ids for osx. Currently, if you want to support multiple versions of OS X (officially CoreClr supports only OS X 10.10) you need to specify each version separately. Even if you use the same .dylib across all the versions it needs to be copied for each version of OS X you want to support.
For project references you need to use the same folder structure as for packages except that your root is now your project folder (i.e. the folder where the project.json lives).
The dnx repo contains sample projects for testing packages and project with native libraries. This is a good reference point if you ever get stuck.

What’s been covered so far should be enough to get started with native libraries in ASP.NET 5. You now know how to bind to functions exported from native libraries and how to build NuGet packages or structure projects containing native libraries. If you would like to understand how things are implemented under the cover (or need to know how to troubleshoot things) read on.
Loading native libraries works differently on different operating systems. In addition, different runtimes (CoreClr, desktop Clr, Mono) handle native libraries differently as well. CoreClr is consistent across all the platforms – whenever a native library needs to be loaded CoreClr will call ASP.NET 5 first and ask to provide the library. If the library CoreClr is asking for is in one of the referenced packages or projects ASP.NET 5 runtime will load it using a function exposed by CoreClr that allows to load a native library from a path. If you turn on tracing you will see the following trace entries when a native library is loaded:

Information: [LoaderContainer]: Load unmanaged library name=nativelib
Information: [PackageAssemblyLoader]: Loaded unmanaged library=nativelib in 1ms

On non-CoreClr runtimes things are a bit messy because non-CoreClr runtimes keep loading of native libraries to themselves. As a result it takes some tricks to help them load libraries from the right place. On Windows the LoadLibrary WINAPI function searches various locations when looking for a library. The search locations include folders specified in the %PATH% environment variable. So, when running on desktop Clr ASP.NET 5 will prepend the %PATH% environment variable with paths to native libraries. Note that one of disadvantages of this approach it is possible to hit the path length limit if a project references many packages with native libraries and this will most likely cause issues down the road. In traces this looks like this:

Information: [PackageDependencyProvider]: Enabling loading native libraries from packages by extendig %PATH% with: ;C:\Users\moozzyk\.dnx\packages\NativeLib\1.0.0\runtimes\win7-x86\native

On non-Windows systems the trick with the path does not work. While there is an environment variable that can be set to point the runtime linker to load a library from a non-default location (LD_LIBRARY_PATH on Linux and DYLD_LIBRARY_PATH on OS X) it has to be set before the process starts. The runtime linker reads and caches the value of the *_LIBRARY_PATH environment variable when the process starts and any further modifications have no effect. To work around this ASP.NET 5 on Mono on OS X will pre-load native libraries. This works but has one drawback – the name passed to the DllImport attribute can no longer be the library name but has to be __Internal to tell mono to look among already loaded libraries. This is ugly because it requires maintaining two sets of DllImport attributes – one that is specific to Mono on OS X with __Internal as the name and one that has the dll name which works everywhere else. If you want to see this ugliness at work you can take a look at Kestrel or the project for testing the DllImport attribute. In this case the traces will look as follows:

Information: [PackageDependencyProvider]: Attempting to preload: /Users/moozzyk/.dnx/packages/NativeLib/1.0.0/runtimes/osx.10.11-x64/native/nativelib.dylib
Information: [PackageDependencyProvider]: Preloading: /Users/moozzyk/.dnx/packages/NativeLib/1.0.0/runtimes/osx.10.11-x64/native/nativelib.dylib succeeded

The pre-loading trick does not work on Mono on Linux. In this scenario the user must make sure that the library can be loaded by the runtime linker. I don’t think this is a big problem – as explained above distributing .so’s in NuGet packages is not that a great idea anyways.

If you are trying to use a native library and ASP.NET 5 cannot find/load the library there are a few things you can try. First turn on tracing and check if you see traces similar to shown above. If you don’t see them and you are using a package reference most likely the native library was not restored correctly. To check this open the project.lock.json file and find a corresponding target section that contains the both the runtime and the runtime id (i.e. you are looking for "DNXCore,Version=v5.0/osx.10.11-x64" not "DNXCore,Version=v5.0"). Inside find the package description for your package with native libraries – e.g.:

"NativeLib/1.0.0": {
  "type": "package",
  "dependencies": {
    "System.Runtime": "4.0.21-beta-23506"
  },
  "compile": {
    "lib/dnxcore50/NativeLib.dll": {}
  },
  "runtime": {
    "lib/dnxcore50/NativeLib.dll": {}
  },
  "native": {
    "runtimes/osx.10.11-x64/native/nativelib.dylib": {}
  }
},

If the "native" does not exist, is empty or does not contain a correct path to your library this is the problem. In most cases the issue is with the runtime id – either the packages were not restored using the correct/expected runtime id or the package containing the native library did not have library for the given runtime id.
If you are seeing problems with loading native libraries on Mono you can try using the MONO_LOG_LEVEL environment variable. Setting it to debug will print a lot of traces which can be helpful troubleshooting issues. You can also filter out some categories. More details here.
Another helpful environment variable useful for debugging problems with loading libraries on Linux is LD_DEBUG. Again, if you turn on all tracing you will get a lot of stuff. You can however filter out things. This post contains a nice summary of available options.
Looks like on Mac you should be able to use DYLD_PRINT_LIBRARIES (I did not have to so far).
On Windows you can use Process Monitor to see what files are being accessed.
Also, remember that loading a dynamic library will fail if it depends on a library that cannot be found. You can check dependencies with ldd on Linux, otool -L on Mac OS X and dumpbin on Windows.

So, this more or less how native libraries can be used in ASP.NET 5 and how you troubleshoot issues.

C++ Async Development (not only for) for C# Developers Part V: Cancellation

In the previous part of the C++ Async Development (not only for) for C# Developers we looked at exception handling in the C++ Rest SDK/Casablanca. Today we will take look at cancellation. However, because in C++ Rest SDK cancellation and exceptions have a lot in common I recommend reading the post about exception handling (if you have not already read it) before looking at cancellation.

C++ Rest SDK allows cancelling running or scheduled tasks. Running tasks cannot, however, be forcefully cancelled from outside. Rather, an external code may request cancellation and the task may (but is not required to) cancel an ongoing operation. (This is actually very similar to how cancellation of tasks works in C# async.)

So, cancellation in its simplest form is about performing a check inside the task to determine if cancellation was requested and – if it was – giving up the current activity and returning from the task. While conceptually simple, implementing a reusable mechanism that allows to “perform a check to determine if cancellation was requested” is not an easy task due to threading and intricacies around capturing variables in lambda expressions. Fortunately, we don’t have to implement such a mechanism ourselves since C++ Rest SDK already offers one. It is based on two classes – pplx::cancellation_token and pplx::cancellation_token_source. The idea is that you create a pplx::cancellation_token_source instance and capture it in the lambda expression representing a task. Then inside the task you use this pplx::cancellation_token_source instance to obtain a pplx::cancellation_token whose is_cancelled method tells you if cancellation was requested. To request a cancellation you just invoke the cancel method on the same pplx::cancellation_token_source instance you passed to your task. Typically you will call the cancel method from a different thread. Let’s take a look at this example:

void simple_cancellation()
{
    pplx::cancellation_token_source cts;

    auto t = pplx::task<void>([cts]()
    {
        while (!cts.get_token().is_canceled())
        {
            std::cout << "task is running" << std::endl;
            pplx::wait(90);
        }

        std::cout << "task cancelled" << std::endl;
    });

    pplx::wait(500);

    std::cout << "requesting task cancellation" << std::endl;
    cts.cancel();
    t.get();
}

Here is the output:

task is running
task is running
task is running
task is running
task is running
task is running
requesting task cancellation
task cancelled
Press any key to continue . . .

One noteworthy thing in the code above is that the pplx::cancellation_token_source variable is captured by value. This is possible because pplx::cancellation_token_source behaves like a smart pointer. It also guarantees thread safety. All of this makes it really easy to pass it around or capture it in lambda functions in this multi-threaded environment.

We now know how to cancel a single task. However, in real world it is very common to have continuations scheduled to run after a task completes. In most scenarios when cancelling a task you also want to cancel its continuations. You could capture the pplx::cancellation_token_source instance and do a check in each of the continuations but fortunately there is a better way. The ppxl::task::then() function has an overload that takes pplx::cancellation_token as a parameter. When using this overload the task will not run if the token passed to this function is cancelled. The following sample shows how this works:

void cancelling_continuations()
{
    pplx::cancellation_token_source cts;

    auto t = pplx::task<void>([cts]()
    {
        std::cout << "cancelling continuations" << std::endl;

        cts.cancel();
    })
    .then([]()
    {
        std::cout << "this will not run" << std::endl;
    }, cts.get_token());

    try
    {
        if (t.wait() == pplx::task_status::canceled)
        {
            std::cout << "task has been cancelled" << std::endl;
        }
        else
        {
            t.get(); // if task was not cancelled - get the result
        }
    }
    catch (const std::exception& e)
    {
        std::cout << "excption :" << e.what() << std::endl;
    }
}

Here is the output:

cancelling continuations
task has been cancelled
Press any key to continue . . .

In the code above the first task cancels the token. This results in canceling its continuation because we pass the cancellation token when scheduling the continuation. This is pretty straightforward. The important thing, however, is to also look at what happens on the main thread. In the previous example we called pplx::task::get(). We did this merely to block the main thread from exiting since it would kill all the outstanding threads including the one our task is running on. Here we call pplx::task::wait(). It blocks the main thread until the task reaches the terminal state and returns the status telling whether the task ran to completion or was cancelled. Note that if the task threw an exception this exception would be re-thrown from pplx::task::wait(). If this happens we need to handle this exception or the application will crash. Finally, for the completeness, we also have code that gets the result of the task (t.get()). It will never be executed in our case since we always cancel the task but in reality more often than not the task won’t be cancelled in which case you would want to get the result of the task.

There is a couple of questions I would like to drill into here:
– Why do we even need to use pplx::task::wait() – couldn’t we just use pplx::task::get()? The difference between pplx::task::wait() and pplx::task::get() is that pplx::task::get() will throw a pplx::task_canceled exception if a task has been cancelled whereas pplx::task::wait() does not throw for cancelled tasks. Therefore by using pplx::task::wait() you can avoid exceptions for cancelled tasks. This also may simplify your code when you handle exceptions in a different place than you would want to handle cancellation
– What happens if we don’t call pplx::task::wait()/pplx::task::get() and the task is cancelled? Sometimes we want to start an activity in the “fire-and-forget” manner and we don’t really care about the result so calling pplx::task::wait()/pplx::task::get() does not seem to be necessary. Unfortunately this won’t fly – if you call neither pplx::task::wait() nor pplx::task::get() on a task that has been cancelled a pplx::task_canceled exception will be eventually thrown which will crash your app – the same way as any other unobserved exception would do.

In the previous example we blocked the main thread to be able to handle the cancellation. This is not good – we use asynchronous programming to avoid blocking so it would be nice if we figured something out to not block the main thread. An alternative to handling a cancellation in the main thread is to handle cancellations in continuations. We can use the same pattern we used before to handle exceptions i.e. we can create a task based continuation in which we will call pplx::task:wait() to get the task status or pplx::task::get() and get a potential pplx::task_canceled exception. The important thing is that the continuation must not be scheduled using the overload of the pplx::then() function that takes the cancellation token. Otherwise the continuation wouldn’t run if the task was cancelled and the cancellation wouldn’t be handled resulting in an unobserved pplx::task_canceled exception that would crash your app. Here is an example of handling cancellation in a task based continuation:

void handling_cancellation_in_continuation()
{
    pplx::cancellation_token_source cts;

    auto t = pplx::task<void>([cts]()
    {
        std::cout << "cancelling continuations" << std::endl;

        cts.cancel();
    })
    .then([]()
    {
        std::cout << "this will not run" << std::endl;
    }, cts.get_token())
    .then([](pplx::task<void>previous_task)
    {
        try
        {
            if (previous_task.wait() == pplx::task_status::canceled)
            {
                std::cout << "task has been cancelled" << std::endl;
            }
            else
            {
                previous_task.get();
            }
        }
        catch (const std::exception& e)
        {
            std::cout << "exception " << e.what() << std::endl;
        }
    });

    t.get();
}

and the output:

cancelling continuations
task has been cancelled
Press any key to continue . . .

This is mostly it. There is one more topic related to cancellation – linked cancellation token sources but it is beyond the scope of this post. I believe that information included in this post should make it easy to understand how linked cancellation token sources work and how to use them if needed.

SignalR C++ Client

For the past several months I have been working on the SignalR C++ Client. The first, alpha 1 version has just shipped on NuGet and because there isn’t any real documentation for it at the moment I decided to write a blog post showing how to get started with it.
The SignalR C++ Client NuGet package contains Win32 and x64 bits for native desktop applications and is meant to be used with Visual Studio 2013.
Since the SignalR C++ Client ships on NuGet adding it to a project is easy. After creating a C++ project (e.g. a console application) right click on the project node in the solution explorer and select the “Manage NuGet Packages” option. In the Manage NuGet Package window make sure to include prelease packages in your search by selecting “Include Prerelease” in the dropdown and enter “SignalR C++” in the search window. Finally click the “Install” button next to the Microsoft ASP.Net SignalR C++ Client package which will install the SignalR C++ Client (and its dependency – C++ Rest SDK) into your project.

You can also install the package from the Package Manager Console – just open the package manager console (Tools → NuGet Package Manager → Package Manager Console) and type:
Install-Package Microsoft.AspNet.SignalR.Client.Cpp.v120.WinDesktop –Pre
The SignalR C++ Client relies heavily on asynchronous facilities provided by the C++ Rest SDK (codename Casablanca) which in turn extensively uses lambda functions introduced in C++ 11. Understanding both – asynchronous programming and lambda functions is crucial to being able to use the SignalR C++ Client effectively. I started a blog mini-series on asynchronous programming in C++ which I would recommend to read if you are not familiar with these concepts.
The SignalR C++ Client supports both programming models available in SignalR – Persistent Connections and Hubs. Persistent Connections is just a simple way of exchanging data between the server and the client while Hubs enable RPC-like programming where it is possible to invoke a method on the server from the client and vice versa. In this post I will show how to use the SignalR C++ Client to handle both – Persistent Connections and Hubs.
Before we can move to the client code we need to set up a server. Our server will have two endpoints – one for persistent connections and one for hubs. To make it easy we will use the chat server from the SignalR tutorial as a starting point and we will add a persistent connection endpoint to it. Just follow the steps in the tutorial to create the server. (Alternatively you can get the code from ths SignalR C++ Client GitHub repo – just clone the repo and open the samples_VS2013.sln file with Visual Studio 2013). Note that if you follow the steps you will end up installing the latest stable available NuGet packages into your project instead of the version used in the tutorial and therefore you need to update the index.html file to use the version of the jquery.signalR-x.x.x.min.js that was installed into your project instead of the one used in the tutorial – i.e. if you installed version 2.2.0 of SignalR you will need to change
<script src="Scripts/jquery.signalR-2.0.3.min.js"></script>
to
<script src="Scripts/jquery.signalR-2.2.0.min.js"></script>
You should also set up index.html as the start page by right-clicking on this file in the Solution Explorer and selecting “Set As Start Page”. After you complete all these steps you should be able to run the server (just press Ctrl+F5) and send and receive messages from the browser window that opens.
Now we need to add a Persistent Connection endpoint to our server. It’s quite easy – we just need to add the following class to the project:

using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

namespace SignalRServer
{
    public class EchoConnection : PersistentConnection
    {
        protected override Task OnConnected(IRequest request,
            string connectionId)
        {
            return Connection.Send(connectionId, "Welcome!");
        }

        protected override Task OnReceived(IRequest request,
            string connectionId, string data)
        {
            return Connection.Broadcast(data);
        }
    }
}

and configure the server to treat requests sent to the /echo path as SignalR persistent connection requests which should be handled by the EchoConnection class. Adding the following line to the Configuration method in the Startup class will do the trick:

app.MapSignalR<EchoConnection>("/echo");

Our server should be now ready to use so we can now start playing with the client.

Using Persistent Connections

Our EchoConnection sends the "Welcome!" string to the client when it connects and then broadcasts messages it receives from the client to all connected clients. On the client side, after the client connects successfully, we will wait for the user to enter a string which will be sent to the server. We also print any message the client receives from the server. To be notified about messages we need to register a callback which will be invoked for whenever a message is received. Finally, if the user types “:q” (this is the command you want to remember when you try to git commit on a new box but forgot to configure the text editor git should use) the client will close the connection and exit. The code that does all of it is shown below (again, you can get the code from the SignalR-Client-Cpp repo – it is in the PersistentConnectionSample.cpp file)

void send_message(signalr::connection &connection,
                  const utility::string_t& message)
{
    connection.send(message)
        // fire and forget but we need to observe exceptions
        .then([](pplx::task<void> send_task)
    {
        try
        {
            send_task.get();
        }
        catch (const std::exception &e)
        {
            ucout << U("Error while sending data: ") << e.what();
        }
    });
}

int main()
{
    signalr::connection connection{ U("http://localhost:34281/echo") };
    connection.set_message_received([](const utility::string_t& m)
    {
        ucout << U("Message received:") << m 
              << std::endl << U("Enter message: ");
    });

    connection.start()
        // fine to capture by reference - we are blocking 
        // so it is guaranteed to be valid
        .then([&connection]()
        {
            for (;;)
            {
                utility::string_t message;
                std::getline(ucin, message);

                if (message == U(":q"))
                {
                    break;
                }

                send_message(connection, message);
            }

            return connection.stop();
        })
        .then([](pplx::task<void> stop_task)
        {
            try
            {
                stop_task.get();
                ucout << U("connection stopped successfully") << std::endl;
            }
            catch (const std::exception &e)
            {
                ucout << U("exception when starting or closing connection: ") 
                      << e.what() << std::endl;
            }
        }).get();

    return 0;
}

You may want to try more than one instance of the client to see that all clients receive messages broadcast by the server.

The code is intuitively simple but there are some interesting nuances so let’s take a closer look at it. In the main function, as explained above, we create a connection and we use the set_message_received function to set a handler that will be called whenever we receive a message from the server. Then we start the connection using the connection.start() function. If the connection started successfully we run a loop that reads messages from the console. Note that we know that connection started successfully because if connection.start() threw an exception this continuation would not run at all because it is a value based continuation (you can find more on how exceptions in C++ async work in my blog post on this very subject). Whenever a user enters a message we send the message to the server using the connection.send() function. Sending messages happens in the fire-and-forget manner but we still need to handle exceptions to prevent from crashes caused by unobserved exceptions. When the user enters “:q” we break the loop and move on to the next continuation which stops the connection. This continuation is interesting because it actually can be invoked in one more case. Note that this is a task based continuation so it will be invoked always – even if a previous task threw. As a result this continuation is also an exception handler for the task that starts the connection. Moving back to stopping the connection – stopping a connection can potentially throw so again we need to handle the exception to prevent from crashes.
There are two more important things. One is that tasks are executed asynchronously so we have to block the main thread to prevent the program from exiting and terminating all the threads (see another blog post of mine for more details). In our case we can just use the task::get() function – it is sufficient, simple and works. The second important thing is related to how we capture the connection variable in one of the continuations. We capture it by reference. We can do that because we block the main thread and therefore we ensure that the reference we captured will be always valid. In general case however capturing local variables by reference will lead to undefined behavior and crashes if the function that started a task exited before the task completes (or is even started) since the variable will go out of scope and the captured reference will no longer be valid. Blocking a thread to wait for the task works but usually is not the best way to solve the problem. If you cannot ensure that the reference will be valid when a task runs you should consider capturing variables by value or, if it is not possible (like in the case of the connection and hub_connection instances) capture a std::shared_ptr (or std::weak_ptr). Regardless of how you capture your variables (maybe except for primitive values captured by value) you need to make sure you work with them in a thread safe way because you never know what thread a task is going to run on.

Using Hubs

The sample for hub connections shows how to connect and communicate with the SignalR sample chat server. The code looks like this (you can also find it on GitHub in the HubConnectionSample.cpp):

void send_message(signalr::hub_proxy proxy, const utility::string_t& name,
                  const utility::string_t& message)
{
    web::json::value args{};
    args[0] = web::json::value::string(name);
    args[1] = web::json::value(message);

    proxy.invoke<void>(U("send"), args)
        // fire and forget but we need to observe exceptions
        .then([](pplx::task<void> invoke_task)
    {
        try
        {
            invoke_task.get();
        }
        catch (const std::exception &e)
        {
            ucout << U("Error while sending data: ") << e.what();
        }
    });
}

void chat(const utility::string_t& name)
{
    signalr::hub_connection connection{U("http://localhost:34281")};
    auto proxy = connection.create_hub_proxy(U("ChatHub"));
    proxy.on(U("broadcastMessage"), [](const web::json::value& m)
    {
        ucout << std::endl << m.at(0).as_string() << U(" wrote:") 
              << m.at(1).as_string() << std::endl << U("Enter your message: ");
    });

    connection.start()
        .then([proxy, name]()
        {
            ucout << U("Enter your message:");
            for (;;)
            {
                utility::string_t message;
                std::getline(ucin, message);

                if (message == U(":q"))
                {
                    break;
                }

                send_message(proxy, name, message);
            }
        })
        // fine to capture by reference - we are blocking
        // so it is guaranteed to be valid
        .then([&connection]()
        {
            return connection.stop();
        })
        .then([](pplx::task<void> stop_task)
        {
            try
            {
                stop_task.get();
                ucout << U("connection stopped successfully") << std::endl;
            }
            catch (const std::exception &e)
            {
                ucout << U("exception when starting or stopping connection: ")
                      << e.what() << std::endl;
            }
        }).get();
}

Let’s quickly go through the code. First we create a hub_connection instance which we then use to create a ChatHub proxy. Then we use the on function to set up a handler which will be invoked each time the server invokes the broadcastMessage client method. Note that the callback takes a json::value as a parameter which is an array containing parameters for the client method. If you ever used the SignalR .NET client then you probably notice it is different from what you are used to. The SignalR .NET Client does not expose JSON directly but uses reflection to convert the JSON array to typed values which are then passed to the On method as parameters. Since there isn’t reach reflection in C++ it is the responsibility of the user to interpret the parameters the SignalR C++ Client passes to the lambda in the on function. Once the hub proxy is set up we can start the connection and wait for the user to enter a message which we will then send to the server by invoking the server side send hub method. Server side hub methods are invoked with the hub_proxy::invoke function. This function has two flavors. One for methods that don’t return values: hub_proxy::invoke<void>(...), and one used to invoke non-void hub methods: hub_proxy::invoke<json::value>(...). If a server side hub method returns a value it will be returned to the user as a json:value and, again, it is up to the user to make sense out of it. There are also convenience overloads of hub_proxy::invoke function you can use to invoke a parameterless hub method which don’t take the arguments parameter. They will save you a couple lines of code required to create an empty JSON array.
Long running server side methods can notify the client about their progress. If the client wants to receive these notifications it can provide a callback that should be invoked each time a progress message is received. The callback is passed as the last parameter of the hub_proxy::invoke functions and defaults to a lambda expression with an empty body. The sample chat server does not have any method sending progress messages but if you are interested you can check SignalR end to end tests that tests this scenario.
That’s mostly it. The remaining code is just closing the connection and handling exceptions is done the same way as for Persistent Connection.
One important thing worth noting is how we capture the hub_proxy instance in the lambda – we do it by value. hub_proxy type has semantics of std::shared_ptr where all copies point to a single implementation instance which won’t be deleted as long as there is at least one instance that has a reference to it. As a result if you capture a hub_proxy instance by value in a lambda it will be valid even if the original variable is not around anymore. (Not that this matters in this particular example since we are blocking the thread anyways so even if we captured the hub_proxy instance by reference everything would work since the original variable does not go out of scope until the connection is closed).

Doing the right things

There are a few things you need to be aware of when working with the SignalR C++ client.

Prepare the connection before starting

SignalR C++ Client uses callbacks to communicate with the user’s code. However you can only set these callbacks when the connection is in the disconnected state. Otherwise an exception will be thrown. Similarly, when using hubs you have to create hub proxies before you start the connection.

Process messages fast or asynchronously

The callbacks invoked when a message is received (connection::set_message_received, hub_proxy::on) are invoked synchronously from the thread that receives messages. This is to ensure that the callbacks are invoked in the same order as the order the messages were received. The drawback is that the new messages won’t be received until the callback completes processing the current message. Therefore you need to process messages as quickly as possible or, if you don’t care about order, process messages asynchronously.

Handle exceptions

Any kind of network communication is susceptible to errors. Intermittent connection losses and timeouts may and will happen and they will result in exceptions. These exceptions have to be handled otherwise your app will crash due to an unobserved exception. To make exception handling easier the SignalR C++ Client follows a pattern where functions returning pplx::task<T> will not throw exceptions in case of errors but will instead return a faulted task. This saves the user from having to have an exception handler in addition to handling exceptions in a task based continuation. (You can read more about handling exceptions when using tasks here.

Capture connection and `hub_proxy` instances correctly

Copy constructors and copy assignment operators of the connection and hub_connection classes are intentionally deleted. This was done to prevent from capturing connection and hub_connection instances by value in lambda expressions (I also don’t think there is a clear answer as to what the operation of copying a connection should do). The issue with capturing connection and hub_connection instances by value is that because these classes use a std::shared_ptr pointing to the actual implementation it is possible to create a cycle which would prevent from destroying the connection instance (i.e. the destructor wouldn’t run). The cycle would result not only in a memory leak but would also keep the connection running after the variable went out of scope if the connection was not explicitly stopped – the destructor is responsible for stopping connections that were not stopped explicitly. The cycle would be created if a connection/hub_connection instance was captured by value in callbacks that are passed back to and stored in the connection or hub_connection instances i.e. callbacks passed to connection::set_message_received, set_reconnecting, set_reconnected, set_disconnected functions on both connection and hub_connection.
While deleting copy constructors and copy assignment operators on connection and hub_connection classes makes it more difficult to create a cycle it does not make it impossible – you could still inadvertently create a cycle if you capture a std::shared_ptr<connection> or std::shared_ptr<hub_connection> by value. So, what to do? The safest way is to create a weak pointer (std::weak_ptr) to the connection and capture this pointer. Then in the callback you will need call std::weak_ptr::lock() function to obtain a shared pointer to the connection. You need to check the return value of the std::weak_ptr::lock() function – it will return the nullptr if the instance it points to was destroyed (in which case you probably will want just to exit the callback). You could also try capturing the connection instance by reference by you will have to ensure that the reference is always valid when the callback is executed which sometimes might be hard to do.
Note that capturing hub_proxy instances by value is fine. Hub_proxy is linked back to the connection using a weak pointer so capturing hub_proxy instances by value won’t create cycles. If you, however, try invoking a function on a hub_proxy instance that outlived its connection an exception will be thrown.

Stop connection explicitly

While the connection is being stopped when the instance goes out of scope it is recommended to explicitly stop connections. Stopping the connection explicitly has a few advantages:

Throwing an exception from the destructor in C++ results in undefined behavior. As a result the connection class destructor (or to be more accurate the connection_impl dtor) catches and swallows all the exceptions. When stopping connections explicitly all the exceptions are passed back to the user (in form of a faulted task) giving the user a chance to handle them the way they see it fit
Stop is an asynchronous operation but the destructor isn’t. Therefore when the connection is running when the destructor is called the destructor blocks and waits until the stop operation completes. Since the destructor runs in the current thread you might experience “unexpected” delays – “unexpected” because the delay will happen even though you didn’t invoke any function explicitly. Rather the destructor is just called automatically for you when the variable leaves the scope (or when you call delete on dynamically allocated instances). These delays could be especially annoying if the current thread the destructor is running in happens to be the UI thread
Relying on the destructor stopping the connection may result in the connection not being stopped at all. Internally the connection is using std::shared_ptr pointing to the actual implementation which will be destroyed only when no one is referencing it. In case of a bug where the connection implementation instance stores a callback that captures its connection instance a cycle is created and the reference count will never reach 0. In this situation the destructor will never be called which means that not only would memory be leaked but also that the connection would never be stopped

Logging

The SignalR C++ Client is able to log its activities. You can control what activities are being logged and how they are logged. To control what’s being logged pass a trace_level to the connection or hub_connection constructor. The default setting is to log all activities. To control how the activities are logged you need to create a class derived from the log_writer class and pass a std::shared_ptr pointing to your writer in the when creating a connection/hub_connection instance. Your implementation has to ensure that logging is thread safe as the SignalR C++ Client does not synchronize logging in any way. If you don’t provide you own log_writer the SignalR C++ Client will use the default implementation that uses the OutputDebugString function to log entries. This is especially useful when debugging the client with Visual Studio since entries logged this way will appear in the Visual Studio Output window.

Limitations

While the SignalR C++ Client is fully functional it contains a couple of limitations. Currently the client supports only the webSockets transport. It also does not support detecting stale connections using the heartbeat mechanism. (The way heartbeat works in other clients is that the server sends keep alive messages every few seconds and if the client misses a few of these keep alive messages it will consider the connection to be stale/dead and will try restarting the connection. The SignalR C++ Client currently ignores keep alive messages). Finally, the SignalR C++ Client does not support sending or receiving State information.

Future

This is only an alpha 1 release. I expect there will be some bugs that have not been caught so far and fixing them should be a priority. Another interesting exercise is to try to make the SignalR C++ Client work on other platforms – specifically on Linux and Mac OS. It should be possible because the SignalR C++ Client is built on top of C++ REST SDK which is cross platform. Finally – depending on the feedback – adding new transports (at least the longPolling transport) may be something worth looking at.