Retry and circuit-breaker patterns are the two most common approaches when coding for resiliency.
There are plenty of other articles that go into the nitty-gritty details of each approach, so I’ll just reference some of them at the end of this article. But to establish some context, here’s a quick overview:
Retry: If something goes wrong, repeat the same operation up to x times before giving up.
Generally, this approach is used when you have a flaky dependency which you have no control over. An example might be your service calling a third-party API.
Circuit-breaker: If something goes wrong, hit the panic button that prevents any further attempts to repeat the operation.
This is typically used when you have an extremely unreliable dependency. In this case, we want to stop calling it altogether, as additional attempts to call it might worsen the situation. An example of this might be an overloaded database.
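To make the two ideas concrete, here is a minimal hand-written sketch of each. The names ManualResilience and ManualCircuitBreaker are hypothetical, the breaker trips after a single failure, and neither class is thread-safe. This is purely an illustration of the concepts, not production code.

```csharp
using System;
using System.Threading.Tasks;

public static class ManualResilience
{
    // Retry: run the operation up to maxAttempts times.
    // The exception from the final attempt is allowed to propagate.
    public static async Task<T> RetryAsync<T>(Func<Task<T>> operation, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure and loop around for another attempt.
            }
        }
    }
}

// Circuit breaker: after one failure, reject every call until the break expires.
public sealed class ManualCircuitBreaker
{
    private readonly TimeSpan _breakDuration;
    private DateTime _openUntilUtc = DateTime.MinValue;

    public ManualCircuitBreaker(TimeSpan breakDuration)
    {
        _breakDuration = breakDuration;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        if (DateTime.UtcNow < _openUntilUtc)
        {
            // The dependency is not called at all while the circuit is open.
            throw new InvalidOperationException("Circuit is open; call rejected.");
        }

        try
        {
            return await operation();
        }
        catch
        {
            // Trip the breaker, then let the original exception propagate.
            _openUntilUtc = DateTime.UtcNow + _breakDuration;
            throw;
        }
    }
}
```

The key difference is visible in the control flow: the retry helper keeps calling the dependency after a failure, while the breaker stops calling it for a while.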
I decided to write this article because I keep coming across .NET code where both approaches have been written manually — i.e. using loops for retries and locks for circuit breakers (or variations of these).
FYI — there’s no need to write this stuff manually!
Just use Polly. It’s a mature library which is almost synonymous with app resiliency, in the same way that Newtonsoft.Json is the de facto library for JSON (de)serialization.
Solution Layout
For this article, I’ve written a simple example API to demonstrate how to use Polly to implement both retry and circuit-breaker policies.
The full solution is available on GitHub here:
Here’s what the project looks like:
The API is pretty simple, with only 2 routes:
- /message/hello
- /message/goodbye
I’ve followed a standard repository pattern. The dependency chain looks like this:
MessageController -> MessageService -> MessageRepository
For simplicity, MessageRepository is just reading configuration from appsettings.json. In a real-world example, this could be making an API call, or reading from a database or some other data source.
I’ve chosen to implement retry and circuit-breaker policies in MessageService (the service layer) because it works well for demo purposes, but in the real world you would probably do this at the repository layer.
Simulating a Faulty Dependency
To make the example easy to demonstrate, I’ve written MessageRepository so that it throws an exception 50% of the time.
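A repository that fails roughly half the time could look something like the sketch below. The class name FlakyMessageRepository, its constructor parameters, and the exception type are assumptions for illustration. The real project reads its message from appsettings.json and may simulate failure differently.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the article's MessageRepository.
public class FlakyMessageRepository
{
    private readonly string _helloMessage;
    private readonly Random _random;

    public FlakyMessageRepository(string helloMessage, Random random)
    {
        _helloMessage = helloMessage;
        _random = random;
    }

    public Task<string> GetHelloMessage()
    {
        // Next(0, 2) returns 0 or 1, so this branch is taken about half the time.
        if (_random.Next(0, 2) == 0)
        {
            return Task.FromException<string>(
                new InvalidOperationException("Simulated repository failure"));
        }

        return Task.FromResult(_helloMessage);
    }
}
```

Passing the Random in through the constructor makes it possible to use a fixed seed when you want repeatable behavior in tests.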
Using Polly
Let’s take a look at the MessageService class, where I’m using Polly to implement both the retry and circuit-breaker patterns.
We’ll then go through each section in detail.
Retry Policies
A retry policy is typically created using the following signature:
    Policy
        .Handle<Exception>()
        .WaitAndRetryAsync(int retryCount, Func<int, TimeSpan> sleepDurationProvider)
- .Handle<Exception>: Specifies the type of exceptions the policy can handle. Specifying Exception means the policy will apply to all exception types. In real-world scenarios, you probably want to be more selective about which exception types the policy handles.
- WaitAndRetryAsync(int retryCount, Func<int, TimeSpan> sleepDurationProvider):
  - retryCount is how many times you want the policy to retry.
  - Func<int, TimeSpan> is a delegate which determines how long to wait before retrying. It receives the retry attempt number, which is a pretty clever design: it lets us choose between waiting the same amount of time on every retry and implementing exponential waits.
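As a sketch of the "more selective" advice, a policy can be limited to specific exception types by chaining Or<TException>() after Handle<TException>(). The choice of HttpRequestException and TimeoutException here is an assumption about what a flaky HTTP dependency might throw, and the AsyncRetryPolicy return type assumes Polly 7.x.

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.Retry;

public static class SelectiveRetry
{
    // Retries only on the listed exception types; anything else
    // (for example ArgumentException) propagates immediately without a retry.
    public static AsyncRetryPolicy Create()
    {
        return Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutException>()
            .WaitAndRetryAsync(2, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
    }
}
```

This keeps programming errors, which a retry cannot fix, from being retried and delayed for no benefit.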
In my example, I’ve implemented exponential wait times. I also write to the console for debugging purposes:
    .WaitAndRetryAsync(2, retryAttempt =>
    {
        // Exponential wait time
        var timeToWait = TimeSpan.FromSeconds(Math.Pow(2, retryAttempt));
        Console.WriteLine($"Waiting {timeToWait.TotalSeconds} seconds");
        return timeToWait;
    });
But you could also just do a flat value like this:
    .WaitAndRetryAsync(2, retryAttempt =>
    {
        Console.WriteLine($"Attempt {retryAttempt}. Waiting 10 seconds");
        return TimeSpan.FromSeconds(10); // Wait 10 seconds
    });
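To see what the exponential delegate actually produces, the small helper below evaluates the same formula for each retry attempt. BackoffSchedule is a hypothetical name, and the helper does not use Polly at all. It relies only on the fact that Polly passes the retry attempt number starting at 1.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSchedule
{
    // Returns the wait before each retry, using 2^retryAttempt seconds.
    public static List<TimeSpan> Exponential(int retryCount)
    {
        var delays = new List<TimeSpan>();
        for (var retryAttempt = 1; retryAttempt <= retryCount; retryAttempt++)
        {
            delays.Add(TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
        }
        return delays;
    }
}
```

With a retry count of 2, BackoffSchedule.Exponential(2) yields waits of 2 seconds and then 4 seconds, so a call that fails every time gives up after roughly 6 seconds of waiting. The flat version would wait 20 seconds in the same situation.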
You can then use the RetryPolicy to execute an action using the ExecuteAsync method. In my example, I’m executing the GetHelloMessage() method of MessageRepository.
    public async Task<string> GetHelloMessage()
    {
        return await _retryPolicy.ExecuteAsync<string>(async () => await _messageRepository.GetHelloMessage());
    }
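Putting the pieces together, a service with a retry policy might be wired up roughly as follows. This is a sketch rather than the code from the GitHub solution: the IMessageRepository interface, the RetryingMessageService name, and the injected sleepDurationProvider are assumptions made so the example is self-contained, and the AsyncRetryPolicy type assumes Polly 7.x.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

// Hypothetical abstraction over the repository used in the article.
public interface IMessageRepository
{
    Task<string> GetHelloMessage();
}

public class RetryingMessageService
{
    private readonly IMessageRepository _messageRepository;
    private readonly AsyncRetryPolicy _retryPolicy;

    public RetryingMessageService(
        IMessageRepository messageRepository,
        Func<int, TimeSpan> sleepDurationProvider)
    {
        _messageRepository = messageRepository;

        // Up to 2 retries, so the repository is called at most 3 times in total.
        _retryPolicy = Policy
            .Handle<Exception>()
            .WaitAndRetryAsync(2, sleepDurationProvider);
    }

    public async Task<string> GetHelloMessage()
    {
        // If every attempt fails, the last exception propagates to the caller.
        return await _retryPolicy.ExecuteAsync<string>(
            async () => await _messageRepository.GetHelloMessage());
    }
}
```

Injecting the delay function keeps the policy testable: a test can pass attempt => TimeSpan.Zero, while production code passes an exponential delegate.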
This is what happens when the MessageRepository throws an exception:
Circuit-breaker Policies
Circuit-breaker policies are typically created using this signature:
    Policy
        .Handle<Exception>()
        .CircuitBreakerAsync(
            int exceptionsAllowedBeforeBreaking,
            TimeSpan durationOfBreak,
            Action<Exception, TimeSpan> onBreak,
            Action onReset);
- .Handle<Exception>: Same as with retry policies. This specifies the type of exceptions the policy can handle.
- CircuitBreakerAsync(…):
  - int exceptionsAllowedBeforeBreaking specifies how many exceptions in a row will trigger a circuit break.
  - TimeSpan durationOfBreak specifies how long the circuit will remain broken.
  - Action<Exception, TimeSpan> onBreak is a delegate which allows you to perform some action (typically logging) when the circuit is broken.
  - Action onReset is a delegate which allows you to perform some action (again, typically logging) when the circuit is reset.
In my example, I create a circuit-breaker policy which kicks in after 1 failure. Obviously this is just for demonstration purposes. In real-world scenarios, the threshold will vary based on the service you’re attempting to call.
    _circuitBreakerPolicy = Policy.Handle<Exception>()
        .CircuitBreakerAsync(1, TimeSpan.FromMinutes(1),
            (ex, t) =>
            {
                Console.WriteLine("Circuit broken!");
            },
            () =>
            {
                Console.WriteLine("Circuit Reset!");
            });
I then implement it like this:
    public async Task<string> GetGoodbyeMessage()
    {
        try
        {
            Console.WriteLine($"Circuit State: {_circuitBreakerPolicy.CircuitState}");
            return await _circuitBreakerPolicy.ExecuteAsync<string>(async () =>
            {
                return await _messageRepository.GetGoodbyeMessage();
            });
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }
Notice how I’ve wrapped the execution in a try-catch block?
This is because when a circuit-breaker policy is in a broken state, any further attempts to execute the action will automatically throw a BrokenCircuitException. Rather than properly handling the error, I just pass the error message back as the return value. Again, this is just for demonstration purposes so you get to see what the error looks like.
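Outside of a demo, you would usually want to tell "the circuit is open" apart from "the call itself failed". One possible shape for that is sketched below, assuming Polly 7.x. The GuardedGoodbyeService name, the fetch parameter, and the fallback strings are all hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

public class GuardedGoodbyeService
{
    // Breaks after 1 handled exception and stays open for 1 minute, as in the article.
    private readonly AsyncCircuitBreakerPolicy _circuitBreakerPolicy =
        Policy.Handle<Exception>().CircuitBreakerAsync(1, TimeSpan.FromMinutes(1));

    public CircuitState State => _circuitBreakerPolicy.CircuitState;

    public async Task<string> GetGoodbyeMessage(Func<Task<string>> fetch)
    {
        try
        {
            return await _circuitBreakerPolicy.ExecuteAsync<string>(fetch);
        }
        catch (BrokenCircuitException)
        {
            // The circuit is open: the dependency was not called at all.
            return "Service temporarily unavailable";
        }
        catch (Exception ex)
        {
            // The dependency was called and threw; the breaker rethrows that exception.
            return $"Call failed: {ex.Message}";
        }
    }
}
```

The more specific catch has to come first, because BrokenCircuitException derives from Exception and would otherwise be swallowed by the general handler.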
This is what happens when a circuit-breaker kicks in:
From the consumer’s point of view, there’s major service impact (it’s down!). However, we’ve protected the failing service by giving it some breathing room to recover.
If the failure scenario is occurring due to load issues (e.g. a database is trying to handle too many operations), a circuit-breaker policy will prevent the situation from getting worse, whereas a retry policy would almost certainly have caused a bigger problem.
Closing Thoughts
It’s almost a certainty that services we depend on will fail at some point.
With this in mind, any services we write should themselves be written to handle failure scenarios — either to lessen the impact to our consumers, or to protect other services which may be having problems.
Polly is a perfect library for this. It’s open-source, easy to use and does what it’s supposed to. Use it!