Handling Transient Failures with Polly in .NET

Explore how to use Polly with IHttpClientFactory to handle transient failures in .NET applications using built-in resilience features such as retry policies with jitter backoff.

Introduction to Polly

Polly is a powerful resilience and transient-fault-handling library for .NET. It allows developers to implement fault-handling strategies like retries, circuit breakers, and fallback mechanisms. These strategies help in building robust applications by managing transient failures that occur intermittently due to network issues, database timeouts, or external service disruptions.

Why Use Polly with IHttpClientFactory?

In distributed systems or cloud environments, transient failures are inevitable. Using Polly with IHttpClientFactory in ASP.NET Core allows you to create resilient HTTP clients that can automatically retry failed operations with advanced backoff strategies. A key feature of Polly is its support for jitter in retry policies, which helps prevent retry storms.

Benefits of Using Polly with IHttpClientFactory:

Centralized HttpClient Configuration: With IHttpClientFactory, you can easily manage your HttpClient instances and apply consistent resilience policies.
Retry Policies with Jitter: Jitter adds randomness to retry intervals, preventing simultaneous retries across multiple clients that could overload your system.
Improved Application Stability: By handling transient failures automatically, your application remains more reliable under various failure conditions.

Implementing Polly’s Built-in Jitter Backoff

Decorrelated jitter backoff is available through the Backoff class in the companion Polly.Contrib.WaitAndRetry package (Backoff.DecorrelatedJitterBackoffV2). It generates randomized retry delays that spread retries out more evenly, so clients do not all retry at the same moment. This avoids the “thundering herd” problem.
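To get a feel for the delays this produces, the following minimal sketch enumerates the sequence returned by Backoff.DecorrelatedJitterBackoffV2. It assumes the Polly.Contrib.WaitAndRetry NuGet package is referenced; the class name BackoffPreview is hypothetical and exists only for illustration. The values are random, so no output is shown.

```csharp
using System;
using Polly.Contrib.WaitAndRetry;

// Hypothetical console helper, used only to inspect the generated delays.
public static class BackoffPreview
{
    public static void Main()
    {
        // A median first delay of about 1 second and 3 retries,
        // the same arguments the retry policy in this article uses.
        var delays = Backoff.DecorrelatedJitterBackoffV2(
            medianFirstRetryDelay: TimeSpan.FromSeconds(1),
            retryCount: 3);

        int attempt = 1;
        foreach (TimeSpan delay in delays)
        {
            // Each value is randomized, so two runs (or two clients)
            // will normally print different numbers.
            Console.WriteLine($"Retry {attempt++}: wait {delay.TotalMilliseconds:F0} ms");
        }
    }
}
```

Running it a few times is a quick way to check that the delays grow from one retry to the next while still varying between runs.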

Here’s how to configure Polly with built-in jitter backoff using IHttpClientFactory in an ASP.NET Core application.

Example: Configuring Polly with IHttpClientFactory

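// Requires the Microsoft.Extensions.Http.Polly and Polly.Contrib.WaitAndRetry NuGet packages.
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Contrib.WaitAndRetry;
using Polly.Extensions.Http;

// The two methods below belong in your Startup class.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient("MyHttpClient")
        .AddPolicyHandler(GetRetryPolicy()); // Attach the retry policy to HttpClient
}

private IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()  // Handle HttpRequestException, HTTP 5xx and 408
        .OrResult(msg => msg.StatusCode == System.Net.HttpStatusCode.NotFound)  // Also handle 404 responses
        .WaitAndRetryAsync(
            // Three jittered delays with a median first delay of about 1 second
            Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromSeconds(1), retryCount: 3),
            onRetry: (outcome, timespan, retryAttempt, context) =>
            {
                Console.WriteLine($"Retry {retryAttempt} encountered an error. Waiting {timespan} before retrying.");
            });
}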

Explanation of Polly Configuration

Retry Policy with Decorrelated Jitter Backoff: Backoff.DecorrelatedJitterBackoffV2 (from the Polly.Contrib.WaitAndRetry package) generates the randomized retry intervals. The medianFirstRetryDelay argument sets the median of the first delay (1 second in this example); later delays grow roughly exponentially, each with randomness applied. With retryCount set to 3, the policy makes up to 3 retries after the initial attempt.
Handle Transient HTTP Errors: HandleTransientHttpError() covers HttpRequestException, HTTP 5xx responses, and HTTP 408 (Request Timeout). It does not cover 429 (Too Many Requests); add another OrResult clause if you want to retry on it. The OrResult clause in this example adds 404, which is only worth retrying if the resource is expected to appear shortly.
Policy Handler: The AddPolicyHandler() method attaches the retry policy to the named HttpClient, so every request sent through that client follows the jitter backoff pattern. This spreads retry attempts over varying intervals, reducing the risk of overloading the server with simultaneous retries.
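The policy only applies to clients created through the factory under the registered name. As a minimal sketch of a consumer, assuming a hypothetical CatalogService class and a placeholder URL (neither comes from the configuration above):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical consumer class; register it in ConfigureServices,
// for example with services.AddTransient<CatalogService>().
public class CatalogService
{
    private readonly IHttpClientFactory _httpClientFactory;

    public CatalogService(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<string> GetCatalogAsync()
    {
        // "MyHttpClient" must match the name passed to AddHttpClient,
        // otherwise the client is built without the retry handler.
        HttpClient client = _httpClientFactory.CreateClient("MyHttpClient");

        // Placeholder URL (assumption). Retries run inside the handler
        // pipeline, so this await completes only after the final attempt.
        using HttpResponseMessage response =
            await client.GetAsync("https://example.com/api/catalog");

        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The calling code contains no retry logic of its own. That is the main design benefit of attaching the policy at registration time.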

Benefits of Jitter in Retry Policies

Preventing Retry Storms

When multiple clients experience transient failures, they may all retry at the same intervals, leading to a “retry storm” where the system gets overwhelmed with repeated requests. Polly’s built-in decorrelated jitter backoff solves this by introducing randomness into the retry intervals. This ensures that retries are staggered, preventing synchronized retries from causing further strain on your application.
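The effect can be illustrated without Polly at all. The sketch below compares 100 simulated clients that all wait a fixed 1 second before their first retry with 100 clients that each wait a random time. It uses a simple "full jitter" scheme as a stand-in, not Polly's decorrelated formula, and every name in it (RetryStormDemo, CountRetryWindows, and so on) is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetryStormDemo
{
    // Counts how many distinct 100 ms windows the clients' first retries
    // fall into. One window means every client retries together.
    public static int CountRetryWindows(IEnumerable<TimeSpan> firstRetryDelays)
    {
        return firstRetryDelays
            .Select(d => (long)(d.TotalMilliseconds / 100))
            .Distinct()
            .Count();
    }

    // Fixed backoff: every client waits exactly baseDelay.
    public static List<TimeSpan> FixedDelays(int clients, TimeSpan baseDelay)
    {
        return Enumerable.Repeat(baseDelay, clients).ToList();
    }

    // Simple "full jitter" (illustrative stand-in, not Polly's algorithm):
    // each client waits a random time between 0 and 2 * baseDelay.
    public static List<TimeSpan> JitteredDelays(int clients, TimeSpan baseDelay, Random rng)
    {
        var delays = new List<TimeSpan>(clients);
        for (int i = 0; i < clients; i++)
        {
            double ms = rng.NextDouble() * 2 * baseDelay.TotalMilliseconds;
            delays.Add(TimeSpan.FromMilliseconds(ms));
        }
        return delays;
    }

    public static void Main()
    {
        var baseDelay = TimeSpan.FromSeconds(1);
        var rng = new Random();

        int fixedWindows = CountRetryWindows(FixedDelays(100, baseDelay));
        int jitteredWindows = CountRetryWindows(JitteredDelays(100, baseDelay, rng));

        Console.WriteLine($"Fixed backoff:    {fixedWindows} window(s)");
        Console.WriteLine($"Jittered backoff: {jitteredWindows} window(s)");
    }
}
```

With the fixed schedule all 100 retries land in a single 100 ms window. With jitter they are spread across many windows, and the exact count varies from run to run.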

Smoother Load Distribution

Adding jitter spreads the retries over a range of time, which distributes load more evenly. This smooths out traffic spikes and helps the system recover more gracefully from transient failures.

Optimized Performance

Because staggered retries absorb transient errors without piling extra load onto a struggling service, jittered backoff is particularly valuable in high-demand, cloud-based, or distributed environments.

Combining Polly Policies

In addition to retry policies with jitter, Polly allows you to combine multiple resilience strategies, such as circuit breakers, timeouts, and bulkhead isolation. For example, you can combine a retry policy with a circuit breaker to stop excessive retries after repeated failures:

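public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient("MyHttpClient")
        .AddPolicyHandler(GetRetryPolicy())            // Added first, so it is the outer policy
        .AddPolicyHandler(GetCircuitBreakerPolicy());  // Inner policy: sees every individual attempt
}

private IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        // DecorrelatedJitterBackoffV2 returns the whole sequence of delays,
        // so pass it directly instead of calling it once per attempt.
        .WaitAndRetryAsync(
            Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromSeconds(1), retryCount: 3));
}

private IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    // Policy<HttpResponseMessage> makes the result type match IAsyncPolicy<HttpResponseMessage>.
    return Policy<HttpResponseMessage>
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(2, TimeSpan.FromMinutes(1));
}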

Explanation of Combined Policies

Retry Policy with Jitter: This retry policy handles transient errors and spreads retries using the jitter backoff pattern.
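Circuit Breaker Policy: The circuit breaker opens after two consecutive handled failures (HttpRequestException here) and stays open for one minute. While it is open, calls fail immediately with a BrokenCircuitException instead of reaching the server. The retry policy does not handle that exception, so retries stop as well.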
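The breaker's behavior is easier to see in isolation. The following sketch assumes the Polly NuGet package with its v7-style API; the CircuitBreakerDemo class and the FailAsync helper are hypothetical. It sends three calls that always fail through a breaker with the same thresholds as above.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

public static class CircuitBreakerDemo
{
    // Simulated downstream call that always fails (hypothetical helper).
    private static Task FailAsync() =>
        Task.FromException(new HttpRequestException("simulated failure"));

    public static async Task Main()
    {
        // Open after 2 consecutive handled failures, stay open for 1 minute.
        AsyncCircuitBreakerPolicy breaker = Policy
            .Handle<HttpRequestException>()
            .CircuitBreakerAsync(2, TimeSpan.FromMinutes(1));

        for (int call = 1; call <= 3; call++)
        {
            try
            {
                await breaker.ExecuteAsync(() => FailAsync());
            }
            catch (BrokenCircuitException)
            {
                // The circuit is open: the call was rejected without running.
                Console.WriteLine($"Call {call}: rejected, circuit is {breaker.CircuitState}");
            }
            catch (HttpRequestException)
            {
                // The call ran and failed; the breaker counted the failure.
                Console.WriteLine($"Call {call}: failed, circuit is {breaker.CircuitState}");
            }
        }
    }
}
```

With these settings, the first two calls should surface the HttpRequestException and the third should be rejected by the open circuit. When the breaker sits inside a retry policy, that rejection is what cuts the retry sequence short.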

Conclusion

Polly, combined with IHttpClientFactory, offers a powerful way to handle transient failures in .NET applications. By leveraging Polly’s decorrelated jitter backoff, you can ensure that retries are spaced out unpredictably, preventing retry storms and reducing system overload. With additional features like circuit breakers, timeouts, and bulkheads, Polly allows you to build resilient, reliable, and high-performing distributed systems.

By configuring Polly within ASP.NET Core, you centralize resilience policies for your HTTP clients, ensuring that transient failures are handled automatically and intelligently across your entire application.