---
layout: pattern
title: Circuit Breaker
folder: circuit-breaker
permalink: /patterns/circuit-breaker/
categories: Behavioral
language: en
tags:
- Performance
- Decoupling
- Cloud distributed
---

## Intent

Handle costly remote service calls in such a way that the failure of a single service/component
cannot bring the whole application down, and we can reconnect to the service as soon as possible.

## Explanation

Real world example

> Imagine a web application that has both local files/images and remote services that are used for
> fetching data. These remote services may be healthy and responsive at times, or may become slow
> and unresponsive at some point due to a variety of reasons. If one of the remote services is slow
> or not responding successfully, our application will try to fetch a response from it using
> multiple threads/processes, and soon all of them will hang (also called
> [thread starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science))), causing our
> entire web application to crash. We should be able to detect this situation and show the user an
> appropriate message so that they can explore the parts of the app unaffected by the remote
> service failure. Meanwhile, the other services that are working normally should keep functioning
> unaffected by this failure.

In plain words

> Circuit Breaker allows graceful handling of failed remote services. It's especially useful when
> all parts of our application are highly decoupled from each other, and failure of one component
> doesn't mean the other parts will stop working.

Wikipedia says

> Circuit breaker is a design pattern used in modern software development. It is used to detect
> failures and encapsulates the logic of preventing a failure from constantly recurring, during
> maintenance, temporary external system failure or unexpected system difficulties.

## Programmatic Example

So, how does this all come together? With the above example in mind, we will imitate the
functionality in a simple example. A monitoring service mimics the web app and makes both local and
remote calls.

The service architecture is as follows:

In terms of code, the end user application is:

```java
public class App {

  private static final Logger LOGGER = LoggerFactory.getLogger(App.class);

  /**
   * Program entry point.
   *
   * @param args command line args
   */
  public static void main(String[] args) {

    var serverStartTime = System.nanoTime();

    var delayedService = new DelayedRemoteService(serverStartTime, 5);
    var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
        2000 * 1000 * 1000);

    var quickService = new QuickRemoteService();
    var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
        2000 * 1000 * 1000);

    //Create an object of monitoring service which makes both local and remote calls
    var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
        quickServiceCircuitBreaker);

    //Fetch response from local resource
    LOGGER.info(monitoringService.localResourceResponse());

    //Fetch response from delayed service 2 times, to meet the failure threshold
    LOGGER.info(monitoringService.delayedServiceResponse());
    LOGGER.info(monitoringService.delayedServiceResponse());

    //Fetch current state of delayed service circuit breaker after crossing failure threshold limit
    //which is OPEN now
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    //Meanwhile, the delayed service is down, fetch response from the healthy quick service
    LOGGER.info(monitoringService.quickServiceResponse());
    LOGGER.info(quickServiceCircuitBreaker.getState());

    //Wait for the delayed service to become responsive
    try {
      LOGGER.info("Waiting for delayed service to become responsive");
      Thread.sleep(5000);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }

    //Check the state of delayed circuit breaker, should be HALF_OPEN
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    //Fetch response from delayed service, which should be healthy by now
    LOGGER.info(monitoringService.delayedServiceResponse());
    //As successful response is fetched, it should be CLOSED again.
    LOGGER.info(delayedServiceCircuitBreaker.getState());
  }
}
```
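
The `RemoteService` implementations used above (`DelayedRemoteService`, `QuickRemoteService`) are
not reproduced in this README. A minimal sketch of what they might look like, assuming
`RemoteService` is a single-method interface whose `call` throws `RemoteServiceException` on
failure (the actual classes in the repository may differ in detail):

```java
// Sketch of the service types assumed by the example above.
interface RemoteService {
  String call() throws RemoteServiceException;
}

// Simulates a remote service that stays down for the first few seconds after startup.
class DelayedRemoteService implements RemoteService {
  private final long serverStartTime;
  private final int delay;

  DelayedRemoteService(long serverStartTime, int delay) {
    this.serverStartTime = serverStartTime;
    this.delay = delay;
  }

  @Override
  public String call() throws RemoteServiceException {
    var elapsedSeconds = (System.nanoTime() - serverStartTime) / (1000 * 1000 * 1000);
    // Fail while the simulated startup delay has not yet elapsed
    if (elapsedSeconds < delay) {
      throw new RemoteServiceException("Delayed service is down");
    }
    return "Delayed service is working";
  }
}

// A healthy remote service that always responds successfully.
class QuickRemoteService implements RemoteService {
  @Override
  public String call() {
    return "Quick Service is working";
  }
}
```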

The monitoring service:

```java
public class MonitoringService {

  private final CircuitBreaker delayedService;
  private final CircuitBreaker quickService;

  public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
    this.delayedService = delayedService;
    this.quickService = quickService;
  }

  //Assumption: Local service won't fail, no need to wrap it in a circuit breaker logic
  public String localResourceResponse() {
    return "Local Service is working";
  }

  /**
   * Fetch response from the delayed service (with some simulated startup time).
   *
   * @return response string
   */
  public String delayedServiceResponse() {
    try {
      return this.delayedService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }

  /**
   * Fetches response from a healthy service without any failure.
   *
   * @return response string
   */
  public String quickServiceResponse() {
    try {
      return this.quickService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }
}
```
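
Both `MonitoringService` and the breaker implementation below program against a `CircuitBreaker`
interface that this README does not reproduce. A plausible minimal version, inferred from the calls
made on it (the repository's interface may declare more members):

```java
// Inferred from usage in MonitoringService and DefaultCircuitBreaker.
public interface CircuitBreaker {

  // Bookkeeping hooks that drive the state machine
  void recordSuccess();

  void recordFailure(String response);

  // Current state name, e.g. "CLOSED", "OPEN" or "HALF_OPEN"
  String getState();

  // Force a state transition manually
  void setState(State state);

  // Make the guarded call to the remote service
  String attemptRequest() throws RemoteServiceException;
}
```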

As can be seen, the monitoring service calls the local resource directly, but wraps the call to the
remote (costly) service in a circuit breaker object, which prevents faults as follows:

```java
public class DefaultCircuitBreaker implements CircuitBreaker {

  private final long timeout;
  private final long retryTimePeriod;
  private final RemoteService service;
  long lastFailureTime;
  private String lastFailureResponse;
  int failureCount;
  private final int failureThreshold;
  private State state;
  //Note the 1000L: plain int multiplication would overflow here
  private final long futureTime = 1000L * 1000 * 1000 * 1000;

  /**
   * Constructor to create an instance of Circuit Breaker.
   *
   * @param timeout          Timeout for the API request. Not necessary for this simple example
   * @param failureThreshold Number of failures we receive from the depended-on service before
   *                         changing state to 'OPEN'
   * @param retryTimePeriod  Time period after which a new request is made to the remote service
   *                         for a status check.
   */
  DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
      long retryTimePeriod) {
    this.service = serviceToCall;
    // We start in a closed state hoping that everything is fine
    this.state = State.CLOSED;
    this.failureThreshold = failureThreshold;
    // Timeout for the API request.
    // Used to break the calls made to remote resource if it exceeds the limit
    this.timeout = timeout;
    this.retryTimePeriod = retryTimePeriod;
    //An absurd amount of time in future which basically indicates the last failure never happened
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.failureCount = 0;
  }

  // Reset everything to defaults
  @Override
  public void recordSuccess() {
    this.failureCount = 0;
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.state = State.CLOSED;
  }

  @Override
  public void recordFailure(String response) {
    failureCount = failureCount + 1;
    this.lastFailureTime = System.nanoTime();
    // Cache the failure response for returning on open state
    this.lastFailureResponse = response;
  }

  // Evaluate the current state based on failureThreshold, failureCount and lastFailureTime.
  protected void evaluateState() {
    if (failureCount >= failureThreshold) { //Then something is wrong with remote service
      if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
        //We have waited long enough and should try checking if service is up
        state = State.HALF_OPEN;
      } else {
        //Service would still probably be down
        state = State.OPEN;
      }
    } else {
      //Everything is working fine
      state = State.CLOSED;
    }
  }

  @Override
  public String getState() {
    evaluateState();
    return state.name();
  }

  /**
   * Break the circuit beforehand if it is known that the service is down, or connect the circuit
   * manually if the service comes online before expected.
   *
   * @param state State to which the circuit is set
   */
  @Override
  public void setState(State state) {
    this.state = state;
    switch (state) {
      case OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime();
        break;
      case HALF_OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime() - retryTimePeriod;
        break;
      default:
        this.failureCount = 0;
    }
  }

  /**
   * Executes service call.
   *
   * @return Value from the remote resource, stale response or a custom exception
   */
  @Override
  public String attemptRequest() throws RemoteServiceException {
    evaluateState();
    if (state == State.OPEN) {
      // Return cached response if the circuit is in OPEN state
      return this.lastFailureResponse;
    } else {
      // Make the API request if the circuit is not OPEN
      try {
        //In a real application, this would be run in a thread and the timeout
        //parameter of the circuit breaker would be utilized to know if service
        //is working. Here, we simulate that based on server response itself
        var response = service.call();
        // Yay!! the API responded fine. Let's reset everything.
        recordSuccess();
        return response;
      } catch (RemoteServiceException ex) {
        recordFailure(ex.getMessage());
        throw ex;
      }
    }
  }
}
```
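
Two small supporting types round out the example. The state names match those used by
`evaluateState` above, while the exception body is an assumption:

```java
// The three classic circuit breaker states referenced by evaluateState().
public enum State {
  CLOSED,    // normal operation, calls pass through to the remote service
  OPEN,      // remote service presumed down, calls are short-circuited
  HALF_OPEN  // waiting period elapsed, a trial call is allowed through
}

// Checked exception thrown by RemoteService.call() on failure; sketched here,
// the repository's version may carry more context.
public class RemoteServiceException extends Exception {
  public RemoteServiceException(String message) {
    super(message);
  }
}
```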

How does the above pattern prevent failures? Let's understand via the finite state machine it
implements.

- We initialize the Circuit Breaker object with certain parameters: `timeout`, `failureThreshold` and `retryTimePeriod`, which help determine how resilient the API is (a sketch of how the `timeout` could be enforced follows after this list).
- Initially, we are in the `closed` state and no remote calls to the API have occurred.
- Every time the call succeeds, we reset the state to what it was in the beginning.
- If the number of failures crosses a certain threshold, we move to the `open` state, which acts just like an open circuit and prevents remote service calls from being made, thus saving resources. (Here, we return the cached `lastFailureResponse` instead of hitting the remote service.)
- Once we exceed the retry timeout period, we move to the `half-open` state and make another call to the remote service to check if the service is working so that we can serve fresh content. A failure sets it back to the `open` state and another attempt is made after the retry timeout period, while a success sets it to the `closed` state so that everything starts working normally again.
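
As the comment in `attemptRequest` notes, this example simulates failure via the server response
itself rather than enforcing the `timeout` parameter. In a real application the guarded call could
run on a separate thread and be abandoned once `timeout` elapses. A minimal sketch of that idea
using `ExecutorService` (the `TimedCall` helper and its names are illustrative, not part of the
repository code):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: enforce the breaker's timeout with a Future.
public class TimedCall {

  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  public String callWithTimeout(RemoteService service, long timeoutMillis)
      throws RemoteServiceException {
    Future<String> future = executor.submit(service::call);
    try {
      // A response slower than the timeout counts as a failure,
      // so the breaker can record it and eventually open the circuit.
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the hanging call
      throw new RemoteServiceException("Remote service timed out");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RemoteServiceException("Interrupted while waiting for remote service");
    } catch (ExecutionException e) {
      throw new RemoteServiceException("Remote service failed: " + e.getCause().getMessage());
    }
  }
}
```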

## Class diagram

## Applicability

Use the Circuit Breaker pattern when

- Building a fault-tolerant application where failure of some services shouldn't bring the entire application down.
- Building a continuously running (always-on) application, so that its components can be upgraded without shutting it down entirely.

## Related Patterns

- [Retry Pattern](https://github.com/iluwatar/java-design-patterns/tree/master/retry)

## Real world examples

* [Spring Circuit Breaker module](https://spring.io/guides/gs/circuit-breaker)
* [Netflix Hystrix API](https://github.com/Netflix/Hystrix) (a usage sketch follows below)
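
For comparison, here is roughly how the same guard looks with Netflix Hystrix, which wraps each
remote call in a command with a built-in circuit breaker, timeout and fallback. A minimal sketch,
reusing the hypothetical `RemoteService` from this example:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Sketch only: wraps RemoteService.call() in a Hystrix command.
public class RemoteCallCommand extends HystrixCommand<String> {

  private final RemoteService service;

  public RemoteCallCommand(RemoteService service) {
    super(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroup"));
    this.service = service;
  }

  @Override
  protected String run() throws Exception {
    // Executed with Hystrix's circuit breaker and timeout applied
    return service.call();
  }

  @Override
  protected String getFallback() {
    // Served when the call fails, times out, or the circuit is open
    return "Fallback response";
  }
}
```

Calling `new RemoteCallCommand(service).execute()` then behaves much like `attemptRequest` above.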

## Credits

* [Understanding Circuit Breaker Pattern](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
* [Martin Fowler on Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
* [Fault tolerance in a high volume, distributed system](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* [Circuit Breaker pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)