By Anthony Brown

In this article, you will learn how Akka.NET handles the sending of messages, and how to create systems that reliably guarantee that messages are delivered.

Save 37% off Reactive Applications with Akka.NET with code fccbrown.


Failure Handling

A modern application sits on top of several layers of infrastructure, including the CPU internals, the operating system (which may itself depend on the role of the application), and a network connecting multiple machines. Each of these supporting layers can fail, and such a failure can prevent a message from reaching its destination. By default, Akka.NET sends messages with an “at most once” guarantee: a message is delivered once or not at all, and the framework makes no attempt to redeliver a message that is lost.

Whilst the underlying delivery guarantees of Akka.NET can’t be changed, we can build an “at least once” delivery guarantee on top of the “at most once” guarantee it provides. To create systems that reliably send messages, we need a component that keeps trying until delivery is confirmed. In this article, we’ll look at how we can build a tool that allows us to communicate freely without worrying about the underlying communication layer.

To create such a delivery system, we’ll need an API capable of repeatedly trying to send a message until an acknowledgement informs it that the task has been completed. Many kinds of work can be expressed through an actor, and this scenario is no different. To repeatedly attempt to deliver our message to a target, we can create an actor which is responsible for sending the message and waiting for an acknowledgement. If it doesn’t receive a response within a fixed timeout, it automatically resends the message, repeating the process until it either receives an acknowledgement or reaches the configured maximum number of attempts.

We’ll first consider the states that our message delivery actor can exist in. The main state is awaiting a response. In this state, the actor accepts one of two messages: either an acknowledgement from the intended recipient, notifying it that the recipient has successfully received the message, or a timeout message, telling it to send the message again. If it receives an acknowledgement message, then it should shut down. If it fails to receive any acknowledgement across several timeouts, it responds to the original sender with a message reporting the failure to deliver the message to the intended target. We can see these states and associated transitions in the diagram below.


Figure 1 – The state machine for an actor providing at least once delivery semantics is simple, requiring only 2 states with 2 possible events


The first thing to consider is the messages which we’ll be sending between the actors. As we can see in the diagram, we’ve got several events which cause state transitions, notably the Acknowledgement and ReceiveTimeout events. These two messages allow us to see most of the state transitions. We also have Success and Failure messages, provided by the framework, allowing our guaranteed delivery actor to inform the original sender of whether it successfully delivered the message.

Due to the high potential for failures in asynchronous systems, Akka.NET provides an API as part of the actor which allows us to specify that, after a certain period of inactivity, a ReceiveTimeout message is automatically sent to the actor. This gives us a means of developing actors which need to be aware of how long it has been since they last received a message.
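As a minimal sketch of the receive timeout API in isolation, the actor below asks for a ReceiveTimeout message after five seconds of inactivity. The actor name, the string message type, and the five-second period are assumptions chosen purely for illustration, and the code assumes the Akka NuGet package is referenced.

```csharp
using System;
using Akka.Actor;

// Hypothetical actor used only to illustrate the receive timeout API.
public class IdleAwareActor : ReceiveActor
{
    public IdleAwareActor()
    {
        // Ask the framework to send this actor a ReceiveTimeout message
        // whenever five seconds pass without any other message arriving.
        Context.SetReceiveTimeout(TimeSpan.FromSeconds(5));

        // Any ordinary message resets the inactivity timer.
        Receive<string>(text => Console.WriteLine($"Received: {text}"));

        Receive<ReceiveTimeout>(_ =>
        {
            Console.WriteLine("No messages for five seconds.");

            // Passing null switches the receive timeout off again.
            Context.SetReceiveTimeout(null);
        });
    }
}
```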

We now need a message that tells the repeated sender that the target has successfully received the message and that it should stop resending. To do this, we can define a simple class which the target sends back to confirm receipt. Such a class, which we’ll call “Ack”, doesn’t need to hold any additional data associated with the received message.
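A minimal sketch of such an acknowledgement message might look like the following. Only the name Ack comes from the text; leaving the class completely empty is an assumption that follows from it carrying no data.

```csharp
// Acknowledgement message: it carries no data, because its type alone
// tells the delivery actor that the target received the message.
public class Ack
{
}
```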

We’ve now got the messages we use as state transitions in our state machine; from here we need to implement each of the states. As is usual with the actor model, we’ll define an actor to do this resending work, freeing up the original sender to process other work while we attempt to communicate with the target. When we create the actor, we need to supply the destination of the message, the message we want to send, the maximum number of retries we should attempt, and the timeout between retries. Our actor is designed to send one message and then stop, so we’ll pass these values as constructor arguments.

The first thing you might notice about this design is the decision to take an actor selection for the target, rather than a direct reference to an actor in the form of an IActorRef. The difference is that an IActorRef is tied to one specific incarnation of an actor: if that actor is stopped and a new one is created at the same path, for example after the process hosting it restarts, the old reference no longer reaches the new incarnation. An ActorSelection, by contrast, is resolved by path each time a message is sent. In this scenario, we want to be able to send the message to the target regardless of any failures that happen, such as a network failure. By using an IActorRef, we wouldn’t be accounting for the possibility that the given actor incarnation might be stopped and replaced.

In the constructor, the call to the SetReceiveTimeout method tells the framework that the actor should receive a ReceiveTimeout message after the period specified in the timeout.

Because this state machine has only one core state, we don’t need any of the finite state machine features and can use a basic actor instead. We’ve seen in the state transition diagram how the two core events need to be handled: we will receive either a ReceiveTimeout message or an Ack message. If we receive a ReceiveTimeout message, we first need to check whether we’ve reached the maximum number of retries for sending the message. If we have, we need to notify the original sender that we have failed to send the message, cancel the receive timeout, and shut the actor down. If we have retries remaining, we send the message to the target again and increment the retries counter. This logic forms the receive handler for the timeout messages.

We also need to deal with the case in which we receive an Ack message in response from the target. In this case, we need to inform the original sender that we’ve successfully delivered the requested message, and we need to cancel the receive timeout before finally shutting down the actor. This logic forms a second receive handler.
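Putting the constructor and both receive handlers together, a sketch of the delivery actor could look like the code below. It is an illustration under stated assumptions rather than a definitive implementation: the class and field names are invented here, the original sender is assumed to be the parent that created this actor, success and failure are reported with the framework’s Status.Success and Status.Failure messages, and the maximum is interpreted as the number of resends after the first attempt. It assumes the Akka NuGet package is referenced.

```csharp
using System;
using Akka.Actor;

// Empty acknowledgement message sent back by the target.
public class Ack { }

// Hypothetical name; sends one message and stops once it is acknowledged
// or the retries are exhausted.
public class GuaranteedDeliveryActor : ReceiveActor
{
    private readonly ActorSelection _target;
    private readonly object _message;
    private readonly int _maxRetries;
    private int _retries;

    public GuaranteedDeliveryActor(
        ActorSelection target, object message, int maxRetries, TimeSpan timeout)
    {
        _target = target;
        _message = message;
        _maxRetries = maxRetries;

        // Deliver a ReceiveTimeout to this actor whenever `timeout`
        // passes without any message arriving.
        Context.SetReceiveTimeout(timeout);

        // First attempt. Self is passed as the sender so that the
        // target's Ack comes back to this actor.
        _target.Tell(_message, Self);

        Receive<ReceiveTimeout>(_ => OnTimeout());
        Receive<Ack>(_ => OnAck());
    }

    private void OnTimeout()
    {
        if (_retries >= _maxRetries)
        {
            // Out of retries: cancel the timeout, report failure, stop.
            Context.SetReceiveTimeout(null);
            // Assumption: the parent is the original sender.
            Context.Parent.Tell(new Status.Failure(
                new TimeoutException("Message was not acknowledged.")));
            Context.Stop(Self);
            return;
        }

        // Retries remain: count this attempt and resend.
        _retries++;
        _target.Tell(_message, Self);
    }

    private void OnAck()
    {
        // Delivered: cancel the timeout, report success, stop.
        Context.SetReceiveTimeout(null);
        Context.Parent.Tell(new Status.Success(_message));
        Context.Stop(Self);
    }
}
```

Under these assumptions, an actor that wants a message delivered would create one instance per message as a child, for example with Context.ActorOf(Props.Create(() => new GuaranteedDeliveryActor(target, message, 3, TimeSpan.FromSeconds(2)))), where target comes from Context.ActorSelection and the path, retry count, and timeout are values of your choosing.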

Finally, having completed the sending side, we need to deal with guaranteed delivery at the receiving end. The target has to tell the delivery actor that the message has arrived, which it does by sending an Ack message to the Sender of the message it received.
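On the receiving side, a sketch of a target actor that acknowledges each message could look like this. The actor name and the use of string messages are assumptions made for illustration, and the code assumes the Akka NuGet package is referenced.

```csharp
using System;
using Akka.Actor;

// Empty acknowledgement message expected by the delivery actor.
public class Ack { }

// Hypothetical target actor that confirms every message it receives.
public class TargetActor : ReceiveActor
{
    public TargetActor()
    {
        Receive<string>(text =>
        {
            Console.WriteLine($"Processing: {text}");

            // Sender is whoever sent this message, which is the delivery
            // actor when the message arrived through it. The Ack stops
            // that actor from resending.
            Sender.Tell(new Ack());
        });
    }
}
```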

It’s important to consider the operations that the target actor undertakes upon receiving a message sent using our delivery actor. There’s the possibility that the target could end up processing the same message multiple times, over multiple timeouts, if its acknowledgement doesn’t arrive within the timeout period. To get around this problem, we can filter out messages which the target has already processed, for example by attaching a unique identifier to each message and storing the set of processed identifiers within the actor. Alternatively, we can design our target actor so that processing the same message more than once leads to the same outcome as processing it once. This property is known as idempotence.
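The filtering approach can be sketched as follows. The Envelope wrapper, its Guid identifier, and the actor name are all hypothetical and introduced only for this example; the code assumes the Akka NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Akka.Actor;

// Empty acknowledgement message expected by the delivery actor.
public class Ack { }

// Hypothetical wrapper that attaches a unique identifier to a payload.
public class Envelope
{
    public Envelope(Guid id, object payload)
    {
        Id = id;
        Payload = payload;
    }

    public Guid Id { get; }
    public object Payload { get; }
}

// Hypothetical target that processes each identifier at most once.
public class DeduplicatingTargetActor : ReceiveActor
{
    // Identifiers of the messages this actor has already processed.
    private readonly HashSet<Guid> _processed = new HashSet<Guid>();

    public DeduplicatingTargetActor()
    {
        Receive<Envelope>(envelope =>
        {
            // HashSet.Add returns false when the identifier is already
            // present, so redelivered messages skip the processing step.
            if (_processed.Add(envelope.Id))
            {
                Console.WriteLine($"Processing: {envelope.Payload}");
            }

            // Acknowledge duplicates as well: the earlier Ack may have
            // been lost or may have arrived after the timeout.
            Sender.Tell(new Ack());
        });
    }
}
```

Note that in this sketch the set of identifiers grows without bound and lives only in memory, so it is lost if the actor restarts; a production design would need to decide how long identifiers are kept and whether they should be persisted.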

We have now defined an actor that gives us a best-effort, at least once delivery guarantee by repeatedly attempting to send a message to a given target until it is acknowledged or the retries run out.

Hopefully you have learned a good bit about building messaging systems with Akka.NET from this article. For more, download the free first chapter of Reactive Applications with Akka.NET. Don’t forget to save 37% with code fccbrown.