---
layout: pattern
title: Circuit Breaker
folder: circuit-breaker
permalink: /patterns/circuit-breaker/
categories: Behavioral
language: en
tags:
 - Performance
 - Decoupling
 - Cloud distributed
---

## Intent

Handle costly remote service calls in such a way that the failure of a single service/component
cannot bring the whole application down, and we can reconnect to the service as soon as possible.

## Explanation

Real world example

> Imagine a web application that has both local files/images and remote services that are used for
> fetching data. These remote services may be healthy and responsive at times, or may become slow
> and unresponsive at some point due to a variety of reasons. If one of the remote services is slow
> or not responding successfully, our application will keep trying to fetch a response from it
> using multiple threads/processes, and soon all of them will hang (also called
> [thread starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science))), causing our
> entire web application to crash. We should be able to detect this situation and show the user an
> appropriate message so that they can explore the parts of the app unaffected by the remote
> service failure. Meanwhile, the other services that are working normally should keep functioning,
> unaffected by this failure.

In plain words

> Circuit Breaker allows graceful handling of failed remote services. It's especially useful when
> all parts of our application are highly decoupled from each other, and failure of one component
> doesn't mean the other parts will stop working.

Wikipedia says

> Circuit breaker is a design pattern used in modern software development. It is used to detect
> failures and encapsulates the logic of preventing a failure from constantly recurring, during
> maintenance, temporary external system failure or unexpected system difficulties.

## Programmatic Example

So, how does this all come together? With the above example in mind, we will imitate the
functionality in a simple example. A monitoring service mimics the web app and makes both local
and remote calls.

The service architecture is as follows:

![alt text](./etc/ServiceDiagram.png "Service Diagram")
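The example relies on a few small supporting types whose definitions are not listed in this
article. As a rough sketch, with the signatures inferred from how the code below uses them, they
could look like this:

```java
// Possible states of the circuit breaker.
public enum State {
  CLOSED, OPEN, HALF_OPEN
}

// Thrown when a remote call fails; its message doubles as the failure response.
public class RemoteServiceException extends Exception {
  public RemoteServiceException(String message) {
    super(message);
  }
}

// A remote service whose calls may fail.
public interface RemoteService {
  String call() throws RemoteServiceException;
}

// The circuit breaker contract used by the monitoring service.
public interface CircuitBreaker {
  void recordSuccess();
  void recordFailure(String response);
  String getState();
  void setState(State state);
  String attemptRequest() throws RemoteServiceException;
}
```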
In terms of code, the end user application is:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class App {

  private static final Logger LOGGER = LoggerFactory.getLogger(App.class);

  /**
   * Program entry point.
   *
   * @param args command line args
   */
  public static void main(String[] args) {
    var serverStartTime = System.nanoTime();

    var delayedService = new DelayedRemoteService(serverStartTime, 5);
    var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
        2000 * 1000 * 1000);

    var quickService = new QuickRemoteService();
    var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
        2000 * 1000 * 1000);

    // Create an object of monitoring service which makes both local and remote calls
    var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
        quickServiceCircuitBreaker);

    // Fetch response from local resource
    LOGGER.info(monitoringService.localResourceResponse());

    // Fetch response from delayed service 2 times, to meet the failure threshold
    LOGGER.info(monitoringService.delayedServiceResponse());
    LOGGER.info(monitoringService.delayedServiceResponse());

    // Fetch current state of delayed service circuit breaker after crossing failure threshold
    // limit, which is OPEN now
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    // Meanwhile, the delayed service is down; fetch response from the healthy quick service
    LOGGER.info(monitoringService.quickServiceResponse());
    LOGGER.info(quickServiceCircuitBreaker.getState());

    // Wait for the delayed service to become responsive
    try {
      LOGGER.info("Waiting for delayed service to become responsive");
      Thread.sleep(5000);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }

    // Check the state of delayed circuit breaker, should be HALF_OPEN
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    // Fetch response from delayed service, which should be healthy by now
    LOGGER.info(monitoringService.delayedServiceResponse());

    // As a successful response is fetched, it should be CLOSED again
    LOGGER.info(delayedServiceCircuitBreaker.getState());
  }
}
```

The monitoring service:

```java
public class MonitoringService {

  private final CircuitBreaker delayedService;
  private final CircuitBreaker quickService;

  public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
    this.delayedService = delayedService;
    this.quickService = quickService;
  }

  // Assumption: Local service won't fail, so there is no need to wrap it in circuit breaker logic
  public String localResourceResponse() {
    return "Local Service is working";
  }

  /**
   * Fetch response from the delayed service (with some simulated startup time).
   *
   * @return response string
   */
  public String delayedServiceResponse() {
    try {
      return this.delayedService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }

  /**
   * Fetches response from a healthy service without any failure.
   *
   * @return response string
   */
  public String quickServiceResponse() {
    try {
      return this.quickService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }
}
```
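The two remote services wired up in `App`, `DelayedRemoteService` and `QuickRemoteService`, are not
listed in this article either. A minimal sketch of what they might look like, assuming the delayed
service simply fails until its simulated startup delay (given in seconds) has elapsed:

```java
// Hypothetical sketch: fails until `delay` seconds have passed since `serverStartTime`.
public class DelayedRemoteService implements RemoteService {

  private final long serverStartTime;
  private final int delay;

  public DelayedRemoteService(long serverStartTime, int delay) {
    this.serverStartTime = serverStartTime;
    this.delay = delay;
  }

  @Override
  public String call() throws RemoteServiceException {
    // Within the simulated startup window the service behaves as if it were down.
    if (System.nanoTime() - serverStartTime < delay * 1_000_000_000L) {
      throw new RemoteServiceException("Delayed service is down");
    }
    return "Delayed service is working";
  }
}

// Hypothetical sketch: a healthy service that always responds successfully.
public class QuickRemoteService implements RemoteService {

  @Override
  public String call() {
    return "Quick service is working";
  }
}
```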
As can be seen, the monitoring service calls the local resource directly, but wraps calls to the
remote (costly) service in a circuit breaker object, which prevents faults as follows:

```java
public class DefaultCircuitBreaker implements CircuitBreaker {

  private final long timeout;
  private final long retryTimePeriod;
  private final RemoteService service;
  long lastFailureTime;
  private String lastFailureResponse;
  int failureCount;
  private final int failureThreshold;
  private State state;
  // Note the long literal: a plain int expression would overflow here
  private final long futureTime = 1000L * 1000 * 1000 * 1000;

  /**
   * Constructor to create an instance of Circuit Breaker.
   *
   * @param serviceToCall    The remote service to wrap.
   * @param timeout          Timeout for the API request. Not necessary for this simple example.
   * @param failureThreshold Number of failures we receive from the depended-on service before
   *                         changing state to 'OPEN'.
   * @param retryTimePeriod  Time period after which a new request is made to the remote service
   *                         for a status check.
   */
  DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
      long retryTimePeriod) {
    this.service = serviceToCall;
    // We start in a closed state hoping that everything is fine
    this.state = State.CLOSED;
    this.failureThreshold = failureThreshold;
    // Timeout for the API request.
    // Used to break the calls made to the remote resource if it exceeds the limit
    this.timeout = timeout;
    this.retryTimePeriod = retryTimePeriod;
    // An absurd amount of time in the future which basically indicates the last failure never happened
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.failureCount = 0;
  }

  // Reset everything to defaults
  @Override
  public void recordSuccess() {
    this.failureCount = 0;
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.state = State.CLOSED;
  }

  @Override
  public void recordFailure(String response) {
    failureCount = failureCount + 1;
    this.lastFailureTime = System.nanoTime();
    // Cache the failure response for returning on open state
    this.lastFailureResponse = response;
  }

  // Evaluate the current state based on failureThreshold, failureCount and lastFailureTime.
  protected void evaluateState() {
    if (failureCount >= failureThreshold) { // Then something is wrong with the remote service
      if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
        // We have waited long enough and should try checking if the service is up
        state = State.HALF_OPEN;
      } else {
        // Service would still probably be down
        state = State.OPEN;
      }
    } else {
      // Everything is working fine
      state = State.CLOSED;
    }
  }

  @Override
  public String getState() {
    evaluateState();
    return state.name();
  }

  /**
   * Break the circuit beforehand if it is known that the service is down, or connect the circuit
   * manually if the service comes online before expected.
   *
   * @param state State at which circuit is in
   */
  @Override
  public void setState(State state) {
    this.state = state;
    switch (state) {
      case OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime();
        break;
      case HALF_OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime() - retryTimePeriod;
        break;
      default:
        this.failureCount = 0;
    }
  }

  /**
   * Executes service call.
   *
   * @return Value from the remote resource, stale response or a custom exception
   */
  @Override
  public String attemptRequest() throws RemoteServiceException {
    evaluateState();
    if (state == State.OPEN) {
      // Return cached response if the circuit is in OPEN state
      return this.lastFailureResponse;
    } else {
      // Make the API request if the circuit is not OPEN
      try {
        // In a real application, this would be run in a thread and the timeout
        // parameter of the circuit breaker would be utilized to know if the service
        // is working. Here, we simulate that based on the server response itself.
        var response = service.call();
        // Yay!! The API responded fine. Let's reset everything.
        recordSuccess();
        return response;
      } catch (RemoteServiceException ex) {
        recordFailure(ex.getMessage());
        throw ex;
      }
    }
  }
}
```

How does the above pattern prevent failures? Let's understand it via the finite state machine the
breaker implements.

![alt text](./etc/StateDiagram.png "State Diagram")

- We initialize the Circuit Breaker object with certain parameters: `timeout`, `failureThreshold`
and `retryTimePeriod`, which help determine how resilient the API is.
- Initially, we are in the `closed` state and no remote calls to the API have occurred.
- Every time a call succeeds, we reset the breaker to its initial state.
- If the number of failures crosses a certain threshold, we move to the `open` state, which acts
just like an open circuit and prevents remote service calls from being made, thus saving resources.
(Here, we return the cached last failure response, i.e. a stale response from the API.)
- Once we exceed the retry timeout period, we move to the `half-open` state and make another call
to the remote service to check if the service is working so that we can serve fresh content. A
failure sets it back to the `open` state and another attempt is made after the retry timeout
period, while a success sets it to the `closed` state so that everything starts working normally
again. These transitions are sketched in code below.
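To make the transitions concrete, here is a minimal sketch (not part of the original example) that
drives the breaker above through its states. It assumes the demo class lives in the same package
as `DefaultCircuitBreaker`, since the constructor is package-private, and uses a hypothetical
always-failing stub service:

```java
public class StateTransitionDemo {

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical stub: a remote service that always fails.
    RemoteService failingService = () -> {
      throw new RemoteServiceException("Failing service is down");
    };

    // Failure threshold of 2; retry period of 1 second (the breaker measures time in nanoseconds).
    var breaker = new DefaultCircuitBreaker(failingService, 3000, 2, 1000 * 1000 * 1000);
    System.out.println(breaker.getState()); // CLOSED: no failures recorded yet

    // Two failed calls reach the threshold and trip the breaker.
    for (int i = 0; i < 2; i++) {
      try {
        breaker.attemptRequest();
      } catch (RemoteServiceException e) {
        // Expected: the stub always fails.
      }
    }
    System.out.println(breaker.getState()); // OPEN: further calls are short-circuited

    // Once the retry period has elapsed, the breaker allows a trial request through.
    Thread.sleep(1100);
    System.out.println(breaker.getState()); // HALF_OPEN: ready to probe the service

    // A successful probe closes the circuit again; here we force it via setState.
    breaker.setState(State.CLOSED);
    System.out.println(breaker.getState()); // CLOSED
  }
}
```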
## Class diagram

![alt text](./etc/circuit-breaker.urm.png "Circuit Breaker class diagram")

## Applicability

Use the Circuit Breaker pattern when

- Building a fault-tolerant application where failure of some services shouldn't bring the entire
application down.
- Building a continuously running (always-on) application, so that its components can be upgraded
without shutting it down entirely.

## Related Patterns

- [Retry Pattern](https://github.com/iluwatar/java-design-patterns/tree/master/retry)

## Real world examples

* [Spring Circuit Breaker module](https://spring.io/guides/gs/circuit-breaker)
* [Netflix Hystrix API](https://github.com/Netflix/Hystrix)

## Credits

* [Understanding Circuit Breaker Pattern](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
* [Martin Fowler on Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
* [Fault tolerance in a high volume, distributed system](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* [Circuit Breaker pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)