2021-05-07 01:57:16 +08:00
---
layout: pattern
title: Circuit Breaker
folder: circuit-breaker
permalink: /patterns/circuit-breaker/
categories: Behavioral
2021-08-01 22:55:54 +08:00
language: zh
2021-05-07 01:57:16 +08:00
tags:
- Performance
- Decoupling
- Cloud distributed
---
2021-08-01 22:55:54 +08:00
## 意图
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
以这样一种方式处理昂贵的远程服务调用,即单个服务/组件的故障不会导致整个应用程序宕机,我们可以尽快重新连接到服务。
2021-05-07 01:57:16 +08:00
## 解释
2021-08-01 22:55:54 +08:00
真实世界例子
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
> 想象一个 Web 应用程序,它同时具有用于获取数据的本地文件/图像和远程服务。 这些远程服务有时可能健康且响应迅速,或者由于各种原因可能在某 个时间点变得缓慢和无响应。因此,如果其中一个远程服务缓慢或未成功响应,我们的应用程序将尝试使用多个线程/进程从远程服务获取响应,很快它们都会挂起(也称为 [线程饥饿][thread starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science)))导致我们的整个 Web 应用程序崩溃。我们应该能够检测到这种情况并向用户显示适当的消息,以便他/她可以探索不受远程服务故障影响的应用程序的其他部分。 同时,其他正常工作的服务应保持正常运行,不受此故障的影响。
>
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
通俗地说
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
> 断路器允许优雅地处理失败的远程服务。当我们应用程序的所有部分彼此高度解耦时,它特别有用,一个组件的故障并不意味着其他部分将停止工作。
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
维基百科说
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
> 断路器是现代软件开发中使用的一种设计模式。 它用于检测故障并封装防止故障不断重复发生、维护期间、临时外部系统故障或意外系统困难的逻辑。
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
## 程序示例
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
So, how does this all come together? With the above example in mind we will imitate the
functionality in a simple example. A monitoring service mimics the web app and makes both local and
remote calls.
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
那么,这一切是如何结合在一起的呢? 记住上面的例子,我们将在一个简单的例子中模仿这个功能。 监控服务模仿 Web 应用程序并进行本地和远程调用。
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
服务架构如下:
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00

在代码方面,最终用户应用程序是:
2021-05-07 01:57:16 +08:00
```java
@Slf4j
public class App {
private static final Logger LOGGER = LoggerFactory.getLogger(App.class);
/**
* Program entry point.
*
* @param args command line args
*/
public static void main(String[] args) {
var serverStartTime = System.nanoTime();
var delayedService = new DelayedRemoteService(serverStartTime, 5);
var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
2000 * 1000 * 1000);
var quickService = new QuickRemoteService();
var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
2000 * 1000 * 1000);
2021-08-01 22:55:54 +08:00
// 创建一个可以进行本地和远程调用的监控服务对象
2021-05-07 01:57:16 +08:00
var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
quickServiceCircuitBreaker);
2021-08-01 22:55:54 +08:00
// 获取本地资源
2021-05-07 01:57:16 +08:00
LOGGER.info(monitoringService.localResourceResponse());
2021-08-01 22:55:54 +08:00
// 从延迟服务中获取响应 2 次,以满足失败阈值
2021-05-07 01:57:16 +08:00
LOGGER.info(monitoringService.delayedServiceResponse());
LOGGER.info(monitoringService.delayedServiceResponse());
2021-08-01 22:55:54 +08:00
// 在超过故障阈值限制后获取延迟服务断路器的当前状态
// 现在是打开状态
2021-05-07 01:57:16 +08:00
LOGGER.info(delayedServiceCircuitBreaker.getState());
2021-08-01 22:55:54 +08:00
// 同时,延迟服务宕机,从健康快速服务获取响应
2021-05-07 01:57:16 +08:00
LOGGER.info(monitoringService.quickServiceResponse());
LOGGER.info(quickServiceCircuitBreaker.getState());
2021-08-01 22:55:54 +08:00
// 等待延迟的服务响应
2021-05-07 01:57:16 +08:00
try {
LOGGER.info("Waiting for delayed service to become responsive");
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
2021-08-01 22:55:54 +08:00
// 检查延时断路器的状态, 应该是HALF_OPEN
2021-05-07 01:57:16 +08:00
LOGGER.info(delayedServiceCircuitBreaker.getState());
2021-08-01 22:55:54 +08:00
// 从延迟服务中获取响应,现在应该是健康的
2021-05-07 01:57:16 +08:00
LOGGER.info(monitoringService.delayedServiceResponse());
2021-08-01 22:55:54 +08:00
// 获取成功响应后,它的状态应该是关闭。
2021-05-07 01:57:16 +08:00
LOGGER.info(delayedServiceCircuitBreaker.getState());
}
}
```
2021-08-01 22:55:54 +08:00
监控服务类:
2021-05-07 01:57:16 +08:00
```java
public class MonitoringService {
private final CircuitBreaker delayedService;
private final CircuitBreaker quickService;
public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
this.delayedService = delayedService;
this.quickService = quickService;
}
2021-08-01 22:55:54 +08:00
// 假设:本地服务不会失败,无需将其包装在断路器逻辑中
2021-05-07 01:57:16 +08:00
public String localResourceResponse() {
return "Local Service is working";
}
/**
* Fetch response from the delayed service (with some simulated startup time).
*
* @return response string
*/
public String delayedServiceResponse() {
try {
return this.delayedService.attemptRequest();
} catch (RemoteServiceException e) {
return e.getMessage();
}
}
/**
* Fetches response from a healthy service without any failure.
*
* @return response string
*/
public String quickServiceResponse() {
try {
return this.quickService.attemptRequest();
} catch (RemoteServiceException e) {
return e.getMessage();
}
}
}
```
2021-08-01 22:55:54 +08:00
可以看出,它直接调用获取本地资源,但它将对远程(昂贵)服务的调用包装在断路器对象中,防止故障如下:
2021-05-07 01:57:16 +08:00
```java
public class DefaultCircuitBreaker implements CircuitBreaker {
private final long timeout;
private final long retryTimePeriod;
private final RemoteService service;
long lastFailureTime;
private String lastFailureResponse;
int failureCount;
private final int failureThreshold;
private State state;
private final long futureTime = 1000 * 1000 * 1000 * 1000;
/**
* Constructor to create an instance of Circuit Breaker.
*
* @param timeout Timeout for the API request. Not necessary for this simple example
* @param failureThreshold Number of failures we receive from the depended service before changing
* state to 'OPEN'
* @param retryTimePeriod Time period after which a new request is made to remote service for
* status check.
*/
DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
long retryTimePeriod) {
this.service = serviceToCall;
2021-08-01 22:55:54 +08:00
// 我们从关闭状态开始希望一切都是正常的
2021-05-07 01:57:16 +08:00
this.state = State.CLOSED;
this.failureThreshold = failureThreshold;
2021-08-01 22:55:54 +08:00
// API的超时时间.
// 用于在超过限制时中断对远程资源的调用
2021-05-07 01:57:16 +08:00
this.timeout = timeout;
this.retryTimePeriod = retryTimePeriod;
//An absurd amount of time in future which basically indicates the last failure never happened
this.lastFailureTime = System.nanoTime() + futureTime;
this.failureCount = 0;
}
2021-08-01 22:55:54 +08:00
// 重置所有
2021-05-07 01:57:16 +08:00
@Override
public void recordSuccess() {
this.failureCount = 0;
this.lastFailureTime = System.nanoTime() + futureTime;
this.state = State.CLOSED;
}
@Override
public void recordFailure(String response) {
failureCount = failureCount + 1;
this.lastFailureTime = System.nanoTime();
// Cache the failure response for returning on open state
this.lastFailureResponse = response;
}
2021-08-01 22:55:54 +08:00
// 根据 failureThreshold、failureCount 和 lastFailureTime 评估当前状态。
2021-05-07 01:57:16 +08:00
protected void evaluateState() {
if (failureCount >= failureThreshold) { //Then something is wrong with remote service
if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
2021-08-01 22:55:54 +08:00
// 我们已经等得够久了,应该尝试检查服务是否已启动
2021-05-07 01:57:16 +08:00
state = State.HALF_OPEN;
} else {
2021-08-01 22:55:54 +08:00
// 服务可能仍会出现故障
2021-05-07 01:57:16 +08:00
state = State.OPEN;
}
} else {
2021-08-01 22:55:54 +08:00
// 一切正常
2021-05-07 01:57:16 +08:00
state = State.CLOSED;
}
}
@Override
public String getState() {
evaluateState();
return state.name();
}
/**
* Break the circuit beforehand if it is known service is down Or connect the circuit manually if
* service comes online before expected.
*
* @param state State at which circuit is in
*/
@Override
public void setState(State state) {
this.state = state;
switch (state) {
case OPEN:
this.failureCount = failureThreshold;
this.lastFailureTime = System.nanoTime();
break;
case HALF_OPEN:
this.failureCount = failureThreshold;
this.lastFailureTime = System.nanoTime() - retryTimePeriod;
break;
default:
this.failureCount = 0;
}
}
/**
* Executes service call.
*
* @return Value from the remote resource, stale response or a custom exception
*/
@Override
public String attemptRequest() throws RemoteServiceException {
evaluateState();
if (state == State.OPEN) {
2021-08-01 22:55:54 +08:00
// 如果电路处于打开状态,则返回缓存的响应
2021-05-07 01:57:16 +08:00
return this.lastFailureResponse;
} else {
2021-08-01 22:55:54 +08:00
// 如果电路未打开,则发出 API 请求
2021-05-07 01:57:16 +08:00
try {
2021-08-01 22:55:54 +08:00
//在实际应用程序中,这将在线程中运行,并且将利用断路器的超时参数来了解服务
// 是否正在工作。 在这里,我们根据服务器响应本身模拟
2021-05-07 01:57:16 +08:00
var response = service.call();
2021-08-01 22:55:54 +08:00
// api 响应正常,重置所有。
2021-05-07 01:57:16 +08:00
recordSuccess();
return response;
} catch (RemoteServiceException ex) {
recordFailure(ex.getMessage());
throw ex;
}
}
}
}
```
2021-08-01 22:55:54 +08:00
上述模式如何防止失败? 让我们通过它实现的这个有限状态机来理解。
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00

2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
- 我们使用某些参数初始化断路器对象:`timeout` 、`failureThreshold` 和 `retryTimePeriod` ,这有助于确定 API 的弹性。
- 最初,我们处于“关闭”状态,没有发生对 API 的远程调用。
- 每次调用成功时,我们都会将状态重置为开始时的状态。
- 如果失败次数超过某个阈值, 我们将进入“open”状态, 这就像开路一样, 阻止远程服务调用, 从而节省资源。 (这里,我们从 API 返回名为 ```stale response` `` 的响应)
- 一旦超过重试超时时间,我们就会进入“半开”状态并再次调用远程服务以检查服务是否正常工作,以便我们可以提供新鲜内容。 失败将其设置回“打开”状态,并在重试超时时间后进行另一次尝试,而成功将其设置为“关闭”状态,以便一切重新开始正常工作。
2021-05-07 01:57:16 +08:00
## 类图
2021-08-01 22:55:54 +08:00

2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
## 适用性
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
在以下情况下使用断路器模式
2021-05-07 01:57:16 +08:00
2021-08-01 22:55:54 +08:00
- 构建一个容错应用程序,其中某些服务的故障不应导致整个应用程序宕机。
- 构建一个持续运行(永远在线)的应用程序,这样它的组件就可以在不完全关闭的情况下升级。
2021-05-07 01:57:16 +08:00
## 相关模式
- [Retry Pattern ](https://github.com/iluwatar/java-design-patterns/tree/master/retry )
2021-08-01 22:55:54 +08:00
## 真实世界例子
2021-05-07 01:57:16 +08:00
* [Spring Circuit Breaker module ](https://spring.io/guides/gs/circuit-breaker )
* [Netflix Hystrix API ](https://github.com/Netflix/Hystrix )
2021-08-01 22:55:54 +08:00
## 鸣谢
2021-05-07 01:57:16 +08:00
* [Understanding Circuit Breaker Pattern ](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42 )
* [Martin Fowler on Circuit Breaker ](https://martinfowler.com/bliki/CircuitBreaker.html )
* [Fault tolerance in a high volume, distributed system ](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a )
2021-08-01 22:55:54 +08:00
* [Circuit Breaker pattern ](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker )