SignalR on the Wire – an informal description of the SignalR protocol

I have seen the question asking about a description of the SignalR protocol come up quite a lot. Heck, when I started looking at SignalR I too was looking for something like this. Now, almost a year later, after I architecturally redesigned the SignalR C# client and wrote from scratch the SignalR C++ Client I think I can describe the protocol quite accurately. So, here we go.
In my view the protocol used by SignalR consists of two parts. The first part is related to connection management, i.e. how the connection is started, stopped, reconnected etc. This part contains some quite complicated bits (especially around starting the connection) and it is mostly interesting to people who want to write their own client (which, I believe, is a minority). The second part, which I think the vast majority of users are actually interested in, explains all these “H”s, “A”s, “I”s etc. that SignalR is putting on the wire and writing to logs. I will start with the first part and then describe the second.
Disclaimer: In some cases I will be talking about differences among the clients. I have only worked with the SignalR .NET client, the SignalR C++ Client and the SignalR JavaScript Client (“worked” in this case is an overstatement – I just fixed a few bugs and looked at the code several times). I am aware of other SignalR clients like the Java or Objective-C one but I have not tried them nor looked at the code and I don’t know what they do, how they do it and how much they conform to the description below.

Connection Management
SignalR manages the connection by using the HTTP(S) protocol. Actions are initiated by the client, which sends HTTP requests that contain the requested action and a subset of common parameters. The requests can be sent using the GET or (when using protocol version 1.5) POST method. Not all the requests require all the parameters. Here are the parameters used in SignalR requests with their descriptions:

transport – the name of the transport being used. Valid values: webSockets, longPolling, serverSentEvents, foreverFrame
clientProtocol – the version of protocol used by the client. The most recent version is 1.5 however it is only used by the JavaScript client since the change that mandated bumping the version of the protocol to 1.5 is only relevant for this client. The .NET and C++ clients currently use version 1.4. Note that the server is designed to support down-level clients (i.e. clients using previous versions of the protocol) and the current (2.2.0) version supports protocol versions from 1.2 to 1.5
connectionToken – a string that identifies the sender. It is returned in the response to the negotiate request. See this document for more details on connection token.
connectionData – a url-encoded JSON array containing a list of hubs the client is subscribing to. For instance, if the client is subscribing to two hubs – “my_hub” and “your_hub” – the array to be sent looks like this: [{"Name":"my_hub"},{"Name":"your_hub"}] and after url-encoding it becomes:
```
%5B%7B%22Name%22:%22my_hub%22%7D,%7B%22Name%22:%22your_hub%22%7D%5D
```
messageId – the id of the last received message. Used for reconnecting and – when using the longPolling transport – in poll requests
groupsToken – a token describing what groups the connection belongs to. Used for reconnecting
queryString – an arbitrary query string provided by the user; appended to all requests
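The connectionData encoding described in the list above can be sketched in a few lines of C++. This is only an illustration: url_encode and build_connection_data are hypothetical helper names, not functions from any SignalR client, and hub names are assumed to need no JSON escaping. Note that this sketch percent-encodes every character outside the RFC 3986 unreserved set, so ':' and ',' come out as %3A and %2C, which is equivalent to the sample above where they are left as-is.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Percent-encodes every character except the RFC 3986 unreserved ones.
std::string url_encode(const std::string& input)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char c : input)
    {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
        {
            encoded += static_cast<char>(c);
        }
        else
        {
            encoded += '%';
            encoded += hex[c >> 4];
            encoded += hex[c & 0x0F];
        }
    }
    return encoded;
}

// Builds the JSON array of hubs. Assumption: hub names contain no characters
// that would have to be escaped inside a JSON string.
std::string build_connection_data(const std::vector<std::string>& hub_names)
{
    std::string json = "[";
    for (std::size_t i = 0; i < hub_names.size(); ++i)
    {
        if (i > 0)
        {
            json += ",";
        }
        json += "{\"Name\":\"" + hub_names[i] + "\"}";
    }
    return json + "]";
}
```

Calling url_encode(build_connection_data({"my_hub", "your_hub"})) yields the value that would be passed as the connectionData parameter.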

Starting the Connection
Starting the connection is the most complicated task related to connection management performed by a SignalR client. It requires sending three requests to the server – negotiate, connect and start. The whole sequence looks as follows:

the client sends the negotiate request. The response to the negotiate request contains a number of client configuration settings
the client starts the transport by sending the connect request. The connect request has to complete within the timeout returned by the server in the response to the negotiate request. The response to the connect request (a.k.a. init message) is sent on the newly started transport (i.e. if you use the webSockets transport it will be sent on the newly opened websocket, if you use serverSentEvents it will be sent on the newly opened event stream, and if you use longPolling it will be sent as a response to the connect/poll request)
once the init message has been received the client sends the start request. The server confirms it received the start request by responding with the {"Response":"started"} payload

You can also find some details about the start sequence here.
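One way to keep the ordering of the start sequence straight in a client is a small state machine. The sketch below is an assumption-laden illustration, not how any of the existing clients is implemented: the start_state and start_event names are made up, and real clients also have to deal with timeouts, errors and the user stopping the connection halfway through.

```cpp
#include <stdexcept>

// Hypothetical names; they are not taken from any SignalR client.
enum class start_state { disconnected, negotiating, connecting, starting, connected };
enum class start_event { start_requested, negotiate_response, init_message, start_response };

// Returns the next state, or throws if the event is not expected in the current state.
start_state advance(start_state state, start_event event)
{
    switch (state)
    {
    case start_state::disconnected:
        // The client sends the negotiate request.
        if (event == start_event::start_requested) return start_state::negotiating;
        break;
    case start_state::negotiating:
        // The client sends the connect request to start the transport.
        if (event == start_event::negotiate_response) return start_state::connecting;
        break;
    case start_state::connecting:
        // The init message arrived on the transport; the client sends the start request.
        if (event == start_event::init_message) return start_state::starting;
        break;
    case start_state::starting:
        // The server confirmed the start request.
        if (event == start_event::start_response) return start_state::connected;
        break;
    case start_state::connected:
        break;
    }
    throw std::logic_error("unexpected event for the current start state");
}
```

Feeding the four events in order takes the state from disconnected to connected; any other order throws, which makes out-of-order responses easy to spot while debugging.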

Connection Management Requests
Here is a list of requests the client sends to start, stop and reconnect the connection.

» negotiate – negotiate connection parameters
Required parameters: clientProtocol, connectionData (when using hubs)
Optional parameters: queryString
Sample request:

http://host/signalr/negotiate?clientProtocol=1.5&connectionData=%5B%7B%22name%22%3A%22chat%22%7D%5D

Sample response:

{
  "Url":"/signalr",
  "ConnectionToken":"X97dw3uxW4NPPggQsYVcNcyQcuz4w2",
  "ConnectionId":"05265228-1e2c-46c5-82a1-6a5bcc3f0143",
  "KeepAliveTimeout":10.0,
  "DisconnectTimeout":5.0,
  "TryWebSockets":true,
  "ProtocolVersion":"1.5",
  "TransportConnectTimeout":30.0,
  "LongPollDelay":0.0
}

Url – path to the SignalR endpoint. Currently not used by the client.
ConnectionToken – connection token assigned by the server. See this article for more details. This value needs to be sent in each subsequent request as the value of the connectionToken parameter
ConnectionId – the id of the connection
KeepAliveTimeout – the amount of time in seconds the client should wait before attempting to reconnect if it has not received a keep alive message. If the server is configured to not send keep alive messages this value is null.
DisconnectTimeout – the amount of time within which the client should try to reconnect if the connection goes away.
TryWebSockets – whether the server supports websockets
ProtocolVersion – the version of the protocol used for communication
TransportConnectTimeout – the maximum amount of time the client should try to connect to the server using a given transport

» connect – starts a transport
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs)
Optional parameters: queryString
Sample request:

wss://host/signalr/connect?transport=webSockets&clientProtocol=1.5&connectionToken=LkNk&connectionData=%5B%7B%22name%22%3A%22chat%22%7D%5D

Sample response (a.k.a. init message):

{"C":"s-0,2CDDE7A|1,23ADE88|2,297B01B|3,3997404|4,33239B5","S":1,"M":[]}

Remarks:
The connect request starts a transport. If you are using the webSockets transport the client will use the ws:// or wss:// scheme to open a websocket. If you are using the serverSentEvents transport the client will open an event stream. For the longPolling transport the connect request is treated by the server as the first poll request. The response to the connect request is sent using the newly opened channel and is a JSON object containing the property "S" set to 1 (a.k.a. init message). The server, however, does not guarantee that this message will be the first message sent to the client (e.g. there can be a broadcast in progress which will be sent to the client before the server sends the init message). This is interesting in the case of the longPolling transport because the server's first response will close the pending connect request even though it is not the init message. The init message will in that case be sent as a response to a subsequent poll request.

» start – informs the server that transport started successfully
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs)
Optional parameters: queryString
Sample request:

http://host/signalr/start?transport=webSockets&clientProtocol=1.5&connectionToken=LkNk&connectionData=%5B%7B%22name%22%3A%22chat%22%7D%5D

Sample response:

{"Response":"started"}

Remarks:
The start request was added in version 1.4 of the protocol to make some scenarios work reliably on the server side. Adding this request to the start sequence made things complicated on the client, though, since there are quite a few things that can go wrong after the client has received the init message but before it has received a response to the start request (e.g. the connection is lost and the client starts reconnecting, the user stops the connection etc.).

» reconnect – sent to the server when the connection is lost and the client is reconnecting
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs), messageId, groupsToken (if the connection belongs to a group)
Optional parameters: queryString
Sample request:

ws://host/signalr/reconnect?transport=webSockets&clientProtocol=1.4&connectionToken=Aa-aQA&connectionData=%5B%7B%22Name%22:%22hubConnection%22%7D%5D&messageId=d-3104A0A8-H,0%7CL,0%7CM,2%7CK,0&groupsToken=AQ

Sample response: N/A
Remarks:
Similarly to the connect request, the reconnect request starts (re-starts) the transport. For the longPolling transport, from the client's perspective, it is just yet another form of poll; for the serverSentEvents transport a new event stream will be opened; and for the webSockets transport a new websocket will be opened. The messageId tells the server which message the client received last and the groupsToken tells the server what groups the client belonged to before reconnecting.
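All the connection management requests share the same shape: the endpoint, the action name and a list of url-encoded parameters. A minimal sketch of assembling such a URL is shown below. The names build_request_url, url_encode and query_params are hypothetical, and the encoder is stricter than the samples in this post (it also encodes ':' and ','), which yields an equivalent URL.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

using query_params = std::vector<std::pair<std::string, std::string>>;

// Percent-encodes every character except the RFC 3986 unreserved ones.
std::string url_encode(const std::string& input)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char c : input)
    {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
        {
            encoded += static_cast<char>(c);
        }
        else
        {
            encoded += '%';
            encoded += hex[c >> 4];
            encoded += hex[c & 0x0F];
        }
    }
    return encoded;
}

// Appends the action and the encoded parameters to the endpoint URL.
std::string build_request_url(const std::string& base_url, const std::string& action,
                              const query_params& params)
{
    std::string url = base_url + "/" + action;
    char separator = '?';
    for (const auto& param : params)
    {
        url += separator;
        url += param.first + "=" + url_encode(param.second);
        separator = '&';
    }
    return url;
}
```

For a reconnect request the parameter list would contain transport, clientProtocol, connectionToken, connectionData, messageId and (if the connection belongs to a group) groupsToken; a user-provided queryString would have to be appended separately since it is already a query string.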

» abort – stops the connection
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs)
Optional parameters: queryString
Sample request:

http://host/signalr/abort?transport=longPolling&clientProtocol=1.5&connectionToken=QcnlM&connectionData=%5B%7B%22name%22%3A%22chathub%22%7D%5D

Sample response: empty
Remarks: The JavaScript and C++ clients send the abort request in a fire-and-forget manner and ignore all errors. The .NET client blocks until a response is received or a timeout occurs, which, apart from taking more time, causes some issues (like this bug).

» ping – pings the server
Required parameters: none
Optional parameters: queryString
Sample request:

http://host/signalr/ping

Sample response:

{ "Response": "pong" }

Remarks: The ping request is not really a “connection management request”. The sole purpose of this request is to keep the ASP.NET session alive. It is only sent by the JavaScript client.

SignalR Messages
Before we can take a look at the messages SignalR puts on the wire we need to discuss how different transports send and receive messages. The webSockets transport is quite simple since it creates a full-duplex communication channel used to send data from the server to the client and from the client to the server. Once the channel is set up there are no further HTTP requests until the client is stopped (the abort request) or the connection is lost and the client tries to re-establish it (the reconnect request). The serverSentEvents transport creates an event stream that is used to receive messages from the server. If the client wants to send a message to the server it creates a send HTTP POST request and sends the data in the request body. The longPolling transport creates a long running HTTP request which the server will respond to if it has a message for the client. If the server does not send any data within a configured timeout (calculated as the ConnectionTimeout received in the response to the negotiate request plus 10 seconds, which by default adds up to 120 seconds) the current poll request will be closed and the client will start a new poll request (this is to prevent proxies from closing the long running request which would result in unnecessary reconnects). Sending messages works in the same way as for the serverSentEvents transport – a send HTTP request containing the message in the request body is sent to the server. Here are the descriptions of the send and poll requests.

» send – sends data to the server. Used by the serverSentEvents and longPolling transports
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs), data (sent in the request body)
Optional parameters: queryString
Sample request:

http://host/signalr/send?transport=longPolling&clientProtocol=1.5&connectionToken=Ac5y5&connectionData=%5B%7B%22name%22%3A%22chathub%22%7D%5D

Data sent in the request body (url encoded, see the description below):

data=%7B%22H%22%3A%22chathub%22%2C%22M%22%3A%22Send%22%2C%22A%22%3A%5B%22a%22%2C%22test+msg%22%5D%2C%22I%22%3A0%7D

Sample response (see the description below):

{ "I" : 0 }
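The body of the send request shown above is ordinary form encoding of the message under the data key. The sketch below reproduces that sample body; form_encode and build_send_body are hypothetical helper names, and the message is assumed to be already serialized to a JSON string.

```cpp
#include <cctype>
#include <string>

// Form-encodes a value: spaces become '+', everything except
// letters, digits and "-_.~" is percent-encoded.
std::string form_encode(const std::string& input)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char c : input)
    {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
        {
            encoded += static_cast<char>(c);
        }
        else if (c == ' ')
        {
            encoded += '+';
        }
        else
        {
            encoded += '%';
            encoded += hex[c >> 4];
            encoded += hex[c & 0x0F];
        }
    }
    return encoded;
}

// Builds the request body for the send request from an already serialized message.
std::string build_send_body(const std::string& message)
{
    return "data=" + form_encode(message);
}
```

Passing the string {"H":"chathub","M":"Send","A":["a","test msg"],"I":0} to build_send_body produces exactly the sample body shown above.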

» poll – starts a (potentially) long running polling request that the server will use to send data to the client. Used only by the longPolling transport
Required parameters: transport, clientProtocol, connectionToken, connectionData (when using hubs), messageId (the JavaScript client sends messageId in the request body)
Optional parameters: queryString
Sample request:

http://host/signalr/poll?transport=longPolling&clientProtocol=1.5&connectionToken=A12-FX&connectionData=%5B%7B%22name%22%3A%22chathub%22%7D%5D&messageId=d-53B8FCED-B%2C1%7CC%2C0%7CD%2C1

Sample response (see the description below):

{
  "C":"d-53B8FCED-B,4|C,0|D,1",
  "M":
  [
    {"H":"ChatHub","M":"broadcastMessage","A":["client","test msg1"]},
    {"H":"ChatHub","M":"broadcastMessage","A":["client","test msg2"]},
    {"H":"ChatHub","M":"broadcastMessage","A":["client","qwerty"]}
  ]
}

Persistent Connection Messages

The protocol used for persistent connections is quite simple. Messages sent to the server are just raw strings. There isn’t any specific format they have to be in. The C# client has a convenience Send() method that takes an object that is supposed to be sent to the server, but all this method does is convert the object to JSON and invoke the Send() overload that takes a string. Messages sent to the client are more structured. They are JSON strings with a number of properties. Depending on the purpose of the message different properties can be present in the payload or the message may have no properties (KeepAlive messages). The properties you can find in the message are as follows:

C – message id, present for all non-KeepAlive messages

M – an array containing actual data.

{"C":"d-9B7A6976-B,2|C,2","M":["Welcome!"]}

S – indicates that the transport was initialized (a.k.a. init message)

{"C":"s-0,2CDDE7A|1,23ADE88|2,297B01B|3,3997404|4,33239B5","S":1,"M":[]}

G – groups token – an encrypted string representing group membership

{"C":"d-6CD4082D-B,0|C,2|D,0","G":"92OXaCStiSZGy5K83cEEt8aR2ocER=","M":[]}

T – if the value is 1 the client should transition into the reconnecting state and try to reconnect to the server (i.e. send the reconnect request). The server is sending a message with this property set to 1 if it is being shut down or restarted. Applies to the longPolling transport only.

L – the delay between re-establishing poll connections. Applies to the longPolling transport only. Used only by the JavaScript client. Configurable on the server by setting the IConfigurationManager.LongPollDelay property.

{"C":"d-E9D15DD8-B,4|C,0|D,0","L":2000,
  "M":[{"H":"ChatHub","M":"broadcastMessage","A":["C++","msg"]}]}

KeepAlive messages
KeepAlive messages are empty JSON objects (i.e. {}) and can be used by SignalR clients to detect network problems. The SignalR server sends keep alive messages at the configured time interval. If the client has not received any message (including a keep alive message) from the server within a certain period of time it will try to restart the connection. Note that not all the clients currently support restarting the connection based on network activity (most notably it is not supported by the SignalR C++ Client). Sending keep alive messages by the server can be turned off by setting the KeepAlive server configuration property to null.
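The keep alive check boils down to comparing the time of the last received message against the KeepAliveTimeout from the negotiate response. Here is a minimal sketch of that idea. The keep_alive_monitor class is hypothetical, it takes the current time as a parameter to stay testable, and it models a null KeepAliveTimeout as a non-positive timeout; actual clients may use a different policy.

```cpp
#include <chrono>

// Hypothetical helper that tracks the time of the last received message.
class keep_alive_monitor
{
public:
    using clock = std::chrono::steady_clock;

    // A non-positive timeout models KeepAliveTimeout being null (keep alive turned off).
    keep_alive_monitor(std::chrono::duration<double> timeout, clock::time_point now)
        : m_timeout(timeout), m_last_message(now)
    {
    }

    // Call for every message received from the server, including {} keep alive messages.
    void on_message(clock::time_point now)
    {
        m_last_message = now;
    }

    // True if nothing has been received for longer than the timeout.
    bool should_reconnect(clock::time_point now) const
    {
        if (m_timeout <= std::chrono::duration<double>::zero())
        {
            return false;
        }
        return (now - m_last_message) > m_timeout;
    }

private:
    std::chrono::duration<double> m_timeout;
    clock::time_point m_last_message;
};
```

With the KeepAliveTimeout of 10 seconds from the sample negotiate response, should_reconnect stays false as long as any message arrived within the last 10 seconds.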

Hubs Messages

Hubs API makes it possible to invoke server methods from the client and client methods from the server. The protocol used for persistent connection is not rich enough to allow expressing RPC (remote procedure call) semantics. It does not mean however that the protocol used for hub connections is completely different from the protocol used for persistent connections. Rather, the protocol used for hub connections is mostly an extension of the protocol for persistent connections.
When a client invokes a server method it no longer sends a free-form string as it did for persistent connections. Instead it sends a JSON string containing all the information needed to invoke the method. Here is a sample message a client would send to invoke a server method:

{"H":"chathub","M":"Send","A":["JS Client","Test message"],"I":0,
  "S":{"customProperty" : "abc"}}

The payload has the following properties:
I – invocation identifier – allows responses to be matched up with requests
H – the name of the hub
M – the name of the method
A – arguments (an array, can be empty if the method does not have any parameters)
S – state – a dictionary containing additional custom data (optional, currently not supported by the C++ client)
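To make the payload structure concrete, here is a sketch that serializes an invocation message by hand. It is deliberately simplified: json_string and build_invocation are hypothetical names, all arguments are assumed to be strings, and the optional S (state) property is left out. A real client would use a JSON library instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Quotes a string and escapes the characters JSON requires to be escaped.
std::string json_string(const std::string& value)
{
    static const char hex[] = "0123456789abcdef";
    std::string out = "\"";
    for (unsigned char c : value)
    {
        if (c == '"' || c == '\\')
        {
            out += '\\';
            out += static_cast<char>(c);
        }
        else if (c < 0x20)
        {
            out += "\\u00";
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
        else
        {
            out += static_cast<char>(c);
        }
    }
    return out + "\"";
}

// Builds the H/M/A/I payload. Assumption: every argument is a string.
std::string build_invocation(const std::string& hub, const std::string& method,
                             const std::vector<std::string>& string_args, int invocation_id)
{
    std::string args;
    for (std::size_t i = 0; i < string_args.size(); ++i)
    {
        if (i > 0)
        {
            args += ",";
        }
        args += json_string(string_args[i]);
    }
    return "{\"H\":" + json_string(hub) + ",\"M\":" + json_string(method) +
           ",\"A\":[" + args + "],\"I\":" + std::to_string(invocation_id) + "}";
}
```

Calling build_invocation("chathub", "Send", {"JS Client", "Test message"}, 0) gives the sample message above without the S property.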

The message sent from the server to the client can be one of the following:

a result of a server method call
an invocation of a client method
a progress message

Server Side Hub Method Invocation Result

When a server method is invoked the server returns a confirmation that the invocation has completed by sending the invocation id to the client and – if the method returned a value – the return value, or – if invoking the method failed – the error. There are two kinds of errors – general errors and hub errors. In case of a general error the response contains only an error message and the error is turned by the client into a generic exception – the .NET client throws an InvalidOperationException, the C++ client throws a std::runtime_error and the JavaScript client creates an Error with the Exception as the source. Hub errors contain a boolean property set to true to indicate that they are hub errors and they may contain some additional error data. Hub errors are turned into a HubException by the .NET client, a signalr::hub_exception by the C++ client and the JavaScript client creates an Error with source set to HubException. Here are sample results of a server method call:

{"I":"0"}

A server void method whose invocation identifier was "0" completed successfully.

{"I":"0", "R":42}

A server method returning a number whose invocation identifier was "0" completed successfully and returned the value 42.

{"I":"0", "E":"Error occurred"}

A server method whose invocation identifier was "0" failed with the error "Error occurred"

{"I":"0","E":"Hub error occurred", "H":true, "D":{"ErrorNumber":42}}

A server method whose invocation identifier was "0" failed with the hub error "Hub error occurred" and sent some additional error data.

Here is the full list of properties that can be present in the result of server method invocation:

I – invocation Id (always present)
R – the value returned by the server method (present if the method is not void)
E – error message
H – true if this is a hub error
D – an object containing additional error data (can only be present for hub errors)
T – stack trace (if detailed error reporting (i.e. the HubConfiguration.EnableDetailedErrors property) is turned on on the server). Note that none of the clients currently propagate the stack trace to the user but if tracing is turned on it will be logged with the message
S – state – a dictionary containing additional custom data (optional, currently not supported by the C++ client)
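The distinction between general errors and hub errors can be sketched as a small function that turns an already parsed result into the matching exception. Everything here is hypothetical: invocation_result is an assumed shape for the parsed I/E/H/D properties and sample_hub_exception is a local stand-in, not the signalr::hub_exception class of the C++ client.

```cpp
#include <stdexcept>
#include <string>

// Assumed shape of a parsed invocation result.
struct invocation_result
{
    std::string invocation_id;  // "I"
    bool has_error = false;     // true when "E" is present
    std::string error_message;  // "E"
    bool is_hub_error = false;  // "H"
    std::string error_data;     // "D" kept as raw JSON text
};

// Local stand-in for a hub exception carrying the additional error data.
class sample_hub_exception : public std::runtime_error
{
public:
    sample_hub_exception(const std::string& message, const std::string& error_data)
        : std::runtime_error(message), m_error_data(error_data)
    {
    }

    const std::string& error_data() const { return m_error_data; }

private:
    std::string m_error_data;
};

// Does nothing for successful results; throws for general and hub errors.
void throw_if_failed(const invocation_result& result)
{
    if (!result.has_error)
    {
        return;
    }
    if (result.is_hub_error)
    {
        throw sample_hub_exception(result.error_message, result.error_data);
    }
    throw std::runtime_error(result.error_message);
}
```

Because the hub exception derives from std::runtime_error in this sketch, callers that only care about "it failed" can catch the base type, while callers interested in the D property catch the derived type first.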

Client Side Hub Method Invocation

To invoke a client method the server extends the protocol used for persistent connections. The difference is that instead of sending free-form text in the message portion of the message the server sends a JSON string that contains all the details needed to invoke the method (like the hub and method names and arguments). Here is an example of a message sent by the server to invoke a hub method on the client:

{"C":"d-F430FB19", "M":[{"H":"my_hub", "M":"broadcast", "A":["Hi!", 1]}] }

As you can see the “envelope” in form of message id or message property is the same as for persistent connections. The interesting part from the hub point of view is the value of the M property:

{"H":"my_hub", "M":"broadcast", "A":["Hi!", 1]}

This structure is quite similar to what the client is using to invoke a server hub method (except there is no invocation id since the server does not expect any response to this message).
H – the name of the hub
M – the name of the hub method
A – arguments (an array, can be empty if the method does not have any parameters)
S – state – a dictionary containing additional custom data (optional, currently not supported (ignored) by the C++ client)

Progress Message

The last kind of message sent from the server to the client is a progress message. When a server method is a long running method the server can send the information about the progress of execution of the method to the client. Similarly to the client method invocation the progress information is embedded in the message portion of a persistent connection message. The entire message looks like this:

{"C":"d-5E80A020-A,1|B,0|C,15|D,0", "M":[{"I":"P|1", "P":{"I":"0", "D":1}}] }

but the progress message itself looks like this:

{"I":"P|1", "P":{"I":"0", "D":1}}

The structure containing information about progress contains two properties:
I – kind of an invocation id but prepended with "P|". Used only by older clients.
P – an object containing actual information about progress

The object containing “real” progress information has the following properties:
I – invocation id that tells which invocation this progress message applies to
D – progress data returned by the method

Note that there might be multiple progress messages sent to the client before the server sends the actual result of the invoked method.
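Since several progress messages can precede the final result, a client needs to keep an invocation pending until the result arrives. The sketch below shows one possible way to do the bookkeeping keyed by invocation id. The invocation_tracker and invocation_callbacks names are hypothetical, and the D and R values are passed around as raw JSON text to keep the example free of a JSON library.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

struct invocation_callbacks
{
    std::function<void(const std::string&)> on_progress; // receives the "D" value
    std::function<void(const std::string&)> on_result;   // receives the "R" value
};

class invocation_tracker
{
public:
    void add(const std::string& invocation_id, invocation_callbacks callbacks)
    {
        m_pending[invocation_id] = std::move(callbacks);
    }

    // Progress message: the invocation stays pending.
    bool on_progress(const std::string& invocation_id, const std::string& data) const
    {
        auto it = m_pending.find(invocation_id);
        if (it == m_pending.end())
        {
            return false;
        }
        if (it->second.on_progress)
        {
            it->second.on_progress(data);
        }
        return true;
    }

    // Final result: the invocation is removed before its callback runs.
    bool on_result(const std::string& invocation_id, const std::string& result)
    {
        auto it = m_pending.find(invocation_id);
        if (it == m_pending.end())
        {
            return false;
        }
        invocation_callbacks callbacks = std::move(it->second);
        m_pending.erase(it);
        if (callbacks.on_result)
        {
            callbacks.on_result(result);
        }
        return true;
    }

    std::size_t pending_count() const { return m_pending.size(); }

private:
    std::map<std::string, invocation_callbacks> m_pending;
};
```

For the sample progress message above, the client would call on_progress("0", "1") using the I and D values of the inner P object.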

Recent Protocol Revisions

1.4 – introduction of the start request
1.5 – requests can now be sent using the POST method. This helps avoid a memory leak when using the longPolling transport in Chrome and IE browsers (bug 2953). Only used by the JS client with the longPolling transport. Note that the only properties the server checks the request body for are the groupsToken and the messageId

That’s pretty much it. The SignalR protocol is not very complex but the little caveats and exceptions may make the implementation a bit troublesome.

C++ Async Development (not only for) for C# Developers Part IV: Exception handling

Last time we were able to run some tasks asynchronously. Things worked great and it was pretty straightforward. However real life scenarios are not as simple as the ones I used in the previous post. For instance the networking environment can be quite hostile – the server you are connecting to may go down for any reason, the connection might get dropped at any time, etc. Any of these conditions will typically result in an exception which, if unhandled, will crash your process and bring down your application. C++ async is no different – you can easily check it for yourself by running this code:

pplx::task_from_result()
    .then([]()
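    {
        // std::runtime_error is used because the std::exception(const char*)
        // constructor is a Microsoft-specific extension.
        throw std::runtime_error("test exception");
    })
    .get();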

(the “for C# Developers” part – Note that in the .NET Framework 4 UnobservedTaskExceptions would terminate the application. It was later changed in .NET Framework 4.5 where UnobservedTaskExceptions no longer terminate applications (it is still recommended to observe and handle exceptions though). The behavior in C++ async with Casablanca is more in line with the .NET Framework 4 – any unobserved exception will eventually lead to a crash).
You might think that the way to handle this exception is just to wrap this call in a try…catch block. This would work if you blocked (e.g. used .get()) since you would be executing the code synchronously. However if you start a task, it will run on a different thread and the exception will be thrown on this new thread so, not where you are trying to catch it. As a result your app would still crash. The idea is that you have to observe exceptions not where tasks are started but where they are completed (i.e. where you call .get() or .wait()). Using a continuation for exception handling seems like a great choice because continuations run only after the previous task has completed. So, let’s build on the previous code snippet and add a continuation that handles the exception. It would look like this (I am still using .get() at the very end but it is only to prevent the main thread from exiting and terminating the other thread):

pplx::task_from_result()
    .then([]()
    {
        throw std::runtime_error("test exception");
    })
    .then([](pplx::task<void> previous_task)
    {
        try
        {
            previous_task.get();
        }
        catch (const std::exception& e)
        {
            std::cout << e.what() << std::endl;
        }
    }).get();

One very important thing to notice is that the continuation I added takes pplx::task<void> as the parameter. This is a so-called “task based continuation” and is different from the continuations we have seen so far, which took the value returned by the previous task (or did not take any parameter if the previous task was void). The continuations we had worked with before were “value based continuations” (we will come back to value based continuations in the context of exception handling shortly). With task based continuations you don’t receive the result from the previous task but the task itself. Now you are in the business of retrieving the result yielded by this task. As we know from the previous post the way to get the result returned by a task is to call .get() or .wait(). Since exceptions are, in a sense, also results of executing a task, if the task threw, calling .get()/.wait() will rethrow the exception. We can then catch and handle it, which makes the exception “observed” so the process will no longer crash. When I first came across this pattern it puzzled me a bit. I thought ‘.get() is blocking and I use async to avoid blocking so isn’t it a contradiction?’. But then I realized that we are already in a continuation so the task has already completed and .get() no longer blocks – it merely lets us get the result of the previous task (be it a value or an exception).
Coming back to value based continuations – let’s see what would happen if we added a value based continuation after the continuation that throws but before the continuation that handles this exception – just like this:

pplx::task_from_result()
    .then([]()
    {
        throw std::runtime_error("test exception");
    })
    .then([]()
    {
        std::cout << "calculating The Answer..." << std::endl;
        return 42;
    })
    .then([](pplx::task<int> previous_task)
    {
        try
        {
            std::cout << previous_task.get() << std::endl;
        }
        catch (const std::exception& e)
        {
            std::cout << e.what() << std::endl;
        }
    }).get();

(One thing to notice – the continuation we inserted now returns int (or actually pplx::task<int>; there are some pretty clever C++ tricks that allow a continuation to just return a value or just throw an exception even though the .then() function ultimately returns a pplx::task<T> or pplx::task<void>), so the task based continuation now has to take a parameter of type pplx::task<int> instead of pplx::task<void>.) If you run the above code the result will be exactly the same as in the previous example. Why? When a task throws an exception all value based continuations are skipped until a task based continuation is encountered, which will be invoked and will have a chance to handle the exception. This is a big and very important difference between task based and value based continuations. It also makes a lot of sense – something bad happened and in a value based continuation you have no way of knowing that it did or what it was since you have no access to the exception. There is also nothing to pass in if the previous task would have returned a value had it not been for the exception. As a result, executing value based continuations as if nothing had happened would be plainly wrong.
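The skipping rule can be modeled without Casablanca at all. The following is a deliberately tiny, synchronous toy and not how pplx is implemented: mini_task, then_value and then_task are made-up names, the task only carries an int, and everything runs on the calling thread. It only illustrates that a value based continuation is bypassed when the previous task is faulted while a task based continuation always runs.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Toy model of a completed task holding either an int or a captured exception.
class mini_task
{
public:
    static mini_task from_value(int value)
    {
        mini_task task;
        task.m_value = value;
        return task;
    }

    static mini_task from_exception(std::exception_ptr error)
    {
        mini_task task;
        task.m_error = error;
        return task;
    }

    // Value based continuation: skipped when this task is faulted.
    mini_task then_value(const std::function<int(int)>& continuation) const
    {
        if (m_error)
        {
            return *this; // the fault travels on unchanged
        }
        try { return from_value(continuation(m_value)); }
        catch (...) { return from_exception(std::current_exception()); }
    }

    // Task based continuation: always invoked and receives the task itself.
    mini_task then_task(const std::function<int(const mini_task&)>& continuation) const
    {
        try { return from_value(continuation(*this)); }
        catch (...) { return from_exception(std::current_exception()); }
    }

    // Returns the value or rethrows the captured exception.
    int get() const
    {
        if (m_error)
        {
            std::rethrow_exception(m_error);
        }
        return m_value;
    }

private:
    mini_task() = default;

    int m_value = 0;
    std::exception_ptr m_error;
};
```

Chaining a throwing then_value, a second then_value and a final then_task mirrors the example above: the second value based continuation never runs and the task based one sees the exception when it calls get().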
If you have played a little bit with Casablanca or have seen some more advanced code that is using Casablanca you might have come across the pplx::task_from_exception() function. You might have been wondering why it is needed if you can just throw an exception. Typically tasks are executed on multiple threads and it is very common that an exception thrown on one thread is observed on a different thread. As a result it is impossible to just unwind the stack when trying to find an exception handler. Rather, the exception has to be saved (which will make the task faulted) and is then re-thrown when the user calls .get() or .wait() to get the result. If you use the .then() function all this happens behind the scenes – you throw an exception from a continuation and the .then() function will catch it and turn it into a faulted task which will be passed to the next available task based continuation. However consider the following function:

pplx::task<int> exception_test(bool should_throw)
{
    if (should_throw)
    {
        // std::runtime_error is portable; std::exception(const char*) is MSVC-only.
        throw std::runtime_error("bogus exception");
    }

    return pplx::task_from_result<int>(42);
}

If you pass true it will throw an exception, otherwise it will return a value. Note that I cannot just return 42; here because the return type of the function is pplx::task<int> and not int and there is no Casablanca magic involved which could turn my 42 into a task. Therefore I have to use pplx::task_from_result<int>() to return a completed task with the result. Now, let’s try to build a task based continuation that observes the exception we throw – something like this:

exception_test(true)
    .then([](pplx::task<int> previous_task)
    {
        try
        {
            previous_task.get();
        }
        catch (const std::exception& e)
        {
            std::cout << "exception: " << e.what() << std::endl;
        }
    }).get();

If you run this code it will crash. The reason is simple – we just synchronously throw from the exception_test function and no one is handling this exception. Note that we are not able to handle this exception in the continuation since it is never invoked – because there was no handler the exception crashed the application before it got to the .then(). To fix this the exception_test function needs to be modified as follows:

pplx::task<int> exception_test(bool should_throw)
{
    if (should_throw)
    {
        return pplx::task_from_exception<int>(std::runtime_error("bogus exception"));
    }

    return pplx::task_from_result<int>(42);
}

Now instead of throwing an exception we return a faulted task. This task is then passed to our task based continuation which can now handle the exception.
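The difference between the two versions of exception_test can also be shown with nothing but the standard library. In the sketch below stored_result is a hypothetical stand-in for an already completed pplx::task<int>; it captures the exception in a std::exception_ptr and rethrows it from get(), which is the same save-and-rethrow idea described above. It is not the Casablanca implementation.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for a completed task: a value or a captured exception.
class stored_result
{
public:
    static stored_result from_value(int value)
    {
        stored_result result;
        result.m_value = value;
        return result;
    }

    static stored_result from_exception(std::exception_ptr error)
    {
        stored_result result;
        result.m_error = error;
        return result;
    }

    // Returns the value or rethrows the captured exception.
    int get() const
    {
        if (m_error)
        {
            std::rethrow_exception(m_error);
        }
        return m_value;
    }

private:
    stored_result() = default;

    int m_value = 0;
    std::exception_ptr m_error;
};

// Mirrors the first version: the exception escapes before any result object exists.
stored_result throwing_version(bool should_throw)
{
    if (should_throw)
    {
        throw std::runtime_error("bogus exception");
    }
    return stored_result::from_value(42);
}

// Mirrors the fixed version: the exception is stored and surfaces only in get().
stored_result faulted_version(bool should_throw)
{
    if (should_throw)
    {
        return stored_result::from_exception(
            std::make_exception_ptr(std::runtime_error("bogus exception")));
    }
    return stored_result::from_value(42);
}
```

With throwing_version(true) the caller never gets an object to attach a handler to, whereas faulted_version(true) returns normally and the exception is observed wherever get() is called.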

That’s it for today. Next time we will look at cancellation.