---
layout: pattern
title: Circuit Breaker
folder: circuit-breaker
permalink: /patterns/circuit-breaker/
categories: Behavioral
tags:
 - Performance
 - Decoupling
 - Cloud distributed
---

## Intent

Handle costly remote service calls in such a way that the failure of a single service/component cannot bring the whole application down, and we can reconnect to the service as soon as possible.

## Explanation

Real world example

> Imagine a web application that has both local files/images and remote services used for fetching data. These remote services may be healthy and responsive at times, or may become slow and unresponsive at some point due to a variety of reasons. So if one of the remote services is slow or not responding successfully, our application will try to fetch a response from it using multiple threads/processes, and soon all of them will hang (this is also called thread starvation), causing our entire web application to crash. We should be able to detect this situation and show the user an appropriate message so that he or she can explore the parts of the application unaffected by the remote service failure. Meanwhile, the other services that are working normally should keep functioning, unaffected by this failure.

In plain words

> Circuit Breaker allows graceful handling of failed remote services. It is especially useful when all parts of our application are highly decoupled from each other, and the failure of one component does not mean the other parts will stop working.

Wikipedia says

> Circuit breaker is a design pattern used in modern software development. It is used to detect failures and encapsulates the logic of preventing a failure from constantly recurring, during maintenance, temporary external system failure or unexpected system difficulties.

## Programmatic Example

So, how does this all come together? With the above example in mind, we will imitate the functionality in a simple example. A monitoring service mimics the web application by making both local and remote calls.

The service architecture is as follows:

*(Service architecture diagram: the End User calls the Monitoring Service, which fetches local resources directly and reaches the delayed and quick remote services through circuit breakers.)*

In terms of code, the end user application is:

```java
@Slf4j
public class App {

  private static final Logger LOGGER = LoggerFactory.getLogger(App.class);

  /**
   * Program entry point.
   *
   * @param args command line args
   */
  public static void main(String[] args) {

    var serverStartTime = System.nanoTime();

    var delayedService = new DelayedRemoteService(serverStartTime, 5);
    var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
        2000 * 1000 * 1000);

    var quickService = new QuickRemoteService();
    var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
        2000 * 1000 * 1000);

    //Create an object of monitoring service which makes both local and remote calls
    var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
        quickServiceCircuitBreaker);

    //Fetch response from local resource
    LOGGER.info(monitoringService.localResourceResponse());

    //Fetch response from delayed service 2 times, to meet the failure threshold
    LOGGER.info(monitoringService.delayedServiceResponse());
    LOGGER.info(monitoringService.delayedServiceResponse());

    //Fetch current state of delayed service circuit breaker after crossing failure threshold limit
    //which is OPEN now
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    //Meanwhile, the delayed service is down, fetch response from the healthy quick service
    LOGGER.info(monitoringService.quickServiceResponse());
    LOGGER.info(quickServiceCircuitBreaker.getState());

    //Wait for the delayed service to become responsive
    try {
      LOGGER.info("Waiting for delayed service to become responsive");
      Thread.sleep(5000);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
    //Check the state of delayed circuit breaker, should be HALF_OPEN
    LOGGER.info(delayedServiceCircuitBreaker.getState());

    //Fetch response from delayed service, which should be healthy by now
    LOGGER.info(monitoringService.delayedServiceResponse());
    //As successful response is fetched, it should be CLOSED again.
    LOGGER.info(delayedServiceCircuitBreaker.getState());
  }
}
```

The monitoring service:

```java
public class MonitoringService {

  private final CircuitBreaker delayedService;

  private final CircuitBreaker quickService;

  public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
    this.delayedService = delayedService;
    this.quickService = quickService;
  }

  //Assumption: Local service won't fail, no need to wrap it in a circuit breaker logic
  public String localResourceResponse() {
    return "Local Service is working";
  }

  /**
   * Fetch response from the delayed service (with some simulated startup time).
   *
   * @return response string
   */
  public String delayedServiceResponse() {
    try {
      return this.delayedService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }

  /**
   * Fetches response from a healthy service without any failure.
   *
   * @return response string
   */
  public String quickServiceResponse() {
    try {
      return this.quickService.attemptRequest();
    } catch (RemoteServiceException e) {
      return e.getMessage();
    }
  }
}
```

As can be seen, it makes the call to fetch the local resource directly, but it wraps the call to the remote (costly) service in a circuit breaker object, which prevents faults as follows:

```java
public class DefaultCircuitBreaker implements CircuitBreaker {

  private final long timeout;
  private final long retryTimePeriod;
  private final RemoteService service;
  long lastFailureTime;
  private String lastFailureResponse;
  int failureCount;
  private final int failureThreshold;
  private State state;
  // Note the long literal: 1000 * 1000 * 1000 * 1000 as plain ints would overflow.
  private final long futureTime = 1000L * 1000 * 1000 * 1000;

  /**
   * Constructor to create an instance of Circuit Breaker.
   *
   * @param timeout Timeout for the API request. Not necessary for this simple example
   * @param failureThreshold Number of failures we receive from the depended service before changing
   *                         state to 'OPEN'
   * @param retryTimePeriod Time period after which a new request is made to remote service for
   *                        status check.
   */
  DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
      long retryTimePeriod) {
    this.service = serviceToCall;
    // We start in a closed state hoping that everything is fine
    this.state = State.CLOSED;
    this.failureThreshold = failureThreshold;
    // Timeout for the API request.
    // Used to break the calls made to remote resource if it exceeds the limit
    this.timeout = timeout;
    this.retryTimePeriod = retryTimePeriod;
    //An absurd amount of time in future which basically indicates the last failure never happened
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.failureCount = 0;
  }

  // Reset everything to defaults
  @Override
  public void recordSuccess() {
    this.failureCount = 0;
    this.lastFailureTime = System.nanoTime() + futureTime;
    this.state = State.CLOSED;
  }

  @Override
  public void recordFailure(String response) {
    failureCount = failureCount + 1;
    this.lastFailureTime = System.nanoTime();
    // Cache the failure response for returning on open state
    this.lastFailureResponse = response;
  }

  // Evaluate the current state based on failureThreshold, failureCount and lastFailureTime.
  protected void evaluateState() {
    if (failureCount >= failureThreshold) { //Then something is wrong with remote service
      if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
        //We have waited long enough and should try checking if service is up
        state = State.HALF_OPEN;
      } else {
        //Service would still probably be down
        state = State.OPEN;
      }
    } else {
      //Everything is working fine
      state = State.CLOSED;
    }
  }

  @Override
  public String getState() {
    evaluateState();
    return state.name();
  }

  /**
   * Break the circuit beforehand if it is known service is down Or connect the circuit manually if
   * service comes online before expected.
   *
   * @param state State at which circuit is in
   */
  @Override
  public void setState(State state) {
    this.state = state;
    switch (state) {
      case OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime();
        break;
      case HALF_OPEN:
        this.failureCount = failureThreshold;
        this.lastFailureTime = System.nanoTime() - retryTimePeriod;
        break;
      default:
        this.failureCount = 0;
    }
  }

  /**
   * Executes service call.
   *
   * @return Value from the remote resource, stale response or a custom exception
   */
  @Override
  public String attemptRequest() throws RemoteServiceException {
    evaluateState();
    if (state == State.OPEN) {
      // return cached response if the circuit is in OPEN state
      return this.lastFailureResponse;
    } else {
      // Make the API request if the circuit is not OPEN
      try {
        //In a real application, this would be run in a thread and the timeout
        //parameter of the circuit breaker would be utilized to know if service
        //is working. Here, we simulate that based on server response itself
        var response = service.call();
        // Yay!! the API responded fine. Let's reset everything.
        recordSuccess();
        return response;
      } catch (RemoteServiceException ex) {
        recordFailure(ex.getMessage());
        throw ex;
      }
    }
  }
}
```

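The example above also relies on a few supporting types (`CircuitBreaker`, `State`, `RemoteService`, `RemoteServiceException`) that are defined elsewhere in the module. The following is only a rough sketch inferred from how the example code uses them; the actual declarations in the repository may differ in details:

```java
// Sketch only - each type would normally live in its own file.
enum State {
  CLOSED,
  OPEN,
  HALF_OPEN
}

interface RemoteService {
  String call() throws RemoteServiceException;
}

class RemoteServiceException extends Exception {
  RemoteServiceException(String message) {
    super(message);
  }
}

interface CircuitBreaker {
  // Book-keeping used by attemptRequest().
  void recordSuccess();

  void recordFailure(String response);

  // Current state, evaluated lazily from the failure counters and returned as the enum name.
  String getState();

  // Manually force the breaker into a given state (e.g. when the service is known to be down).
  void setState(State state);

  // Call the wrapped remote service through the breaker.
  String attemptRequest() throws RemoteServiceException;
}
```
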
How does the above pattern prevent failures? Let's understand it via the finite state machine it implements.

*(State diagram: the circuit breaker switches between the CLOSED, OPEN and HALF_OPEN states as described below.)*

- We initialize the circuit breaker object with certain parameters: `timeout`, `failureThreshold` and `retryTimePeriod`, which help determine how resilient the API is.
- Initially, the circuit breaker is in the `closed` state and no remote calls to the API have been made.
- Every time a call succeeds, we reset the state to what it was at the beginning.
- If the number of failures crosses a certain threshold (`failureThreshold`), the circuit breaker goes to the `open` state; it then acts just like an open electrical circuit and prevents remote service calls from being made, thus saving resources.
- Once the retry time period (`retryTimePeriod`) has elapsed, the circuit breaker moves to the `half-open` state and makes another call to the remote service to check whether it is healthy again, so that fresh content can be served. A failure sets it back to the `open` state and another attempt is made after the retry time period, while a success takes it to the `closed` state so that everything starts working normally again. The short sketch after this list walks through these transitions in code.

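To make these transitions concrete, here is a minimal sketch that drives the `DefaultCircuitBreaker` from this example through the states just described. It assumes the example classes above are available in the same package; `StateTransitionSketch` and the exact timings are illustrative only, not part of the module:

```java
public class StateTransitionSketch {

  public static void main(String[] args) throws InterruptedException {
    // The delayed service fails for roughly the first 5 seconds after the given start time.
    var service = new DelayedRemoteService(System.nanoTime(), 5);
    // failureThreshold = 1 failure, retryTimePeriod = 2 seconds (in nanoseconds).
    var breaker = new DefaultCircuitBreaker(service, 3000, 1, 2L * 1000 * 1000 * 1000);

    System.out.println(breaker.getState()); // CLOSED - no failures recorded yet

    try {
      breaker.attemptRequest();             // the delayed service throws, one failure is recorded
    } catch (RemoteServiceException e) {
      // expected while the simulated remote service is still "starting up"
    }

    System.out.println(breaker.getState()); // OPEN - failureThreshold (1) has been reached

    Thread.sleep(3000);                     // wait longer than retryTimePeriod (2 seconds)
    System.out.println(breaker.getState()); // HALF_OPEN - the breaker is ready to probe again
  }
}
```
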
## Class diagram

*(Class diagram of the Circuit Breaker example.)*

## Applicability

Use the Circuit Breaker pattern when

- Building a fault-tolerant application where the failure of some services should not bring the entire application down.
- Building a continuously running (always-on) application, so that its components can be upgraded without shutting it down entirely.

## Related Patterns

- [Retry Pattern](https://github.com/iluwatar/java-design-patterns/tree/master/retry)

## Real world examples

* [Spring Circuit Breaker module](https://spring.io/guides/gs/circuit-breaker)
* [Netflix Hystrix API](https://github.com/Netflix/Hystrix)

## Credits

* [Understanding Circuit Breaker Pattern](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
* [Martin Fowler on Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
* [Fault tolerance in a high volume, distributed system](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* [Circuit Breaker pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)