Mike Liu c5a4068e84
docs: Translation for zh (#1805)
* add state and callback pattern

* add command and template-method pattern

* add iterator pattern

* add bridege and DI pattern

* fix issue #1600

* add converter,proxy,visitor pattern

* add caching,composite,delegation,dirty-flag,interpreter patterns

* add dao and producer-consumer

* add dto and provate class data pattern

* fix #1646 png path problems

* fix #1646 composite png path case problem

* add abstract document pattern and version-number pattern

* add ambassador pattern

* add acyclic-visitor and api-gateway pattern

* add abstract-factory pattern

* add active-object pattern

* add aggregator-microservices and arrange-act-assert pattern

* update async-method-invocation pattern

* add balking and business-delegate pattern

* add bytecode and circuit-break pattern

* update arrange/act/assert pattern problems

* add csch pattern

* add language code, correct pic path

* #1805 update permalink

Co-authored-by: Subhrodip Mohanta <subhrodipmohanta@gmail.com>
Co-authored-by: Mike <admin@xiaod.info>
Co-authored-by: Ilkka Seppälä <iluwatar@users.noreply.github.com>
2021-08-01 20:25:54 +05:30

318 lines
12 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: pattern
title: Circuit Breaker
folder: circuit-breaker
permalink: /patterns/circuit-breaker/
categories: Behavioral
language: zh
tags:
- Performance
- Decoupling
- Cloud distributed
---
## 意图
以这样一种方式处理昂贵的远程服务调用,即单个服务/组件的故障不会导致整个应用程序宕机,我们可以尽快重新连接到服务。
## 解释
真实世界例子
> 想象一个 Web 应用程序,它同时具有用于获取数据的本地文件/图像和远程服务。 这些远程服务有时可能健康且响应迅速,或者由于各种原因可能在某 个时间点变得缓慢和无响应。因此,如果其中一个远程服务缓慢或未成功响应,我们的应用程序将尝试使用多个线程/进程从远程服务获取响应,很快它们都会挂起(也称为 [线程饥饿][thread starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science)))导致我们的整个 Web 应用程序崩溃。我们应该能够检测到这种情况并向用户显示适当的消息,以便他/她可以探索不受远程服务故障影响的应用程序的其他部分。 同时,其他正常工作的服务应保持正常运行,不受此故障的影响。
>
通俗地说
> 断路器允许优雅地处理失败的远程服务。当我们应用程序的所有部分彼此高度解耦时,它特别有用,一个组件的故障并不意味着其他部分将停止工作。
维基百科说
> 断路器是现代软件开发中使用的一种设计模式。 它用于检测故障并封装防止故障不断重复发生、维护期间、临时外部系统故障或意外系统困难的逻辑。
## 程序示例
So, how does this all come together? With the above example in mind we will imitate the
functionality in a simple example. A monitoring service mimics the web app and makes both local and
remote calls.
那么,这一切是如何结合在一起的呢? 记住上面的例子,我们将在一个简单的例子中模仿这个功能。 监控服务模仿 Web 应用程序并进行本地和远程调用。
服务架构如下:
![alt text](../../../circuit-breaker/etc/ServiceDiagram.png "Service Diagram")
在代码方面,最终用户应用程序是:
```java
@Slf4j
public class App {
private static final Logger LOGGER = LoggerFactory.getLogger(App.class);
/**
* Program entry point.
*
* @param args command line args
*/
public static void main(String[] args) {
var serverStartTime = System.nanoTime();
var delayedService = new DelayedRemoteService(serverStartTime, 5);
var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
2000 * 1000 * 1000);
var quickService = new QuickRemoteService();
var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
2000 * 1000 * 1000);
// 创建一个可以进行本地和远程调用的监控服务对象
var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
quickServiceCircuitBreaker);
// 获取本地资源
LOGGER.info(monitoringService.localResourceResponse());
// 从延迟服务中获取响应 2 次,以满足失败阈值
LOGGER.info(monitoringService.delayedServiceResponse());
LOGGER.info(monitoringService.delayedServiceResponse());
// 在超过故障阈值限制后获取延迟服务断路器的当前状态
// 现在是打开状态
LOGGER.info(delayedServiceCircuitBreaker.getState());
// 同时,延迟服务宕机,从健康快速服务获取响应
LOGGER.info(monitoringService.quickServiceResponse());
LOGGER.info(quickServiceCircuitBreaker.getState());
// 等待延迟的服务响应
try {
LOGGER.info("Waiting for delayed service to become responsive");
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
// 检查延时断路器的状态应该是HALF_OPEN
LOGGER.info(delayedServiceCircuitBreaker.getState());
// 从延迟服务中获取响应,现在应该是健康的
LOGGER.info(monitoringService.delayedServiceResponse());
// 获取成功响应后,它的状态应该是关闭。
LOGGER.info(delayedServiceCircuitBreaker.getState());
}
}
```
监控服务类:
```java
public class MonitoringService {
private final CircuitBreaker delayedService;
private final CircuitBreaker quickService;
public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
this.delayedService = delayedService;
this.quickService = quickService;
}
// 假设:本地服务不会失败,无需将其包装在断路器逻辑中
public String localResourceResponse() {
return "Local Service is working";
}
/**
* Fetch response from the delayed service (with some simulated startup time).
*
* @return response string
*/
public String delayedServiceResponse() {
try {
return this.delayedService.attemptRequest();
} catch (RemoteServiceException e) {
return e.getMessage();
}
}
/**
* Fetches response from a healthy service without any failure.
*
* @return response string
*/
public String quickServiceResponse() {
try {
return this.quickService.attemptRequest();
} catch (RemoteServiceException e) {
return e.getMessage();
}
}
}
```
可以看出,它直接调用获取本地资源,但它将对远程(昂贵)服务的调用包装在断路器对象中,防止故障如下:
```java
public class DefaultCircuitBreaker implements CircuitBreaker {
private final long timeout;
private final long retryTimePeriod;
private final RemoteService service;
long lastFailureTime;
private String lastFailureResponse;
int failureCount;
private final int failureThreshold;
private State state;
private final long futureTime = 1000 * 1000 * 1000 * 1000;
/**
* Constructor to create an instance of Circuit Breaker.
*
* @param timeout Timeout for the API request. Not necessary for this simple example
* @param failureThreshold Number of failures we receive from the depended service before changing
* state to 'OPEN'
* @param retryTimePeriod Time period after which a new request is made to remote service for
* status check.
*/
DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
long retryTimePeriod) {
this.service = serviceToCall;
// 我们从关闭状态开始希望一切都是正常的
this.state = State.CLOSED;
this.failureThreshold = failureThreshold;
// API的超时时间.
// 用于在超过限制时中断对远程资源的调用
this.timeout = timeout;
this.retryTimePeriod = retryTimePeriod;
//An absurd amount of time in future which basically indicates the last failure never happened
this.lastFailureTime = System.nanoTime() + futureTime;
this.failureCount = 0;
}
// 重置所有
@Override
public void recordSuccess() {
this.failureCount = 0;
this.lastFailureTime = System.nanoTime() + futureTime;
this.state = State.CLOSED;
}
@Override
public void recordFailure(String response) {
failureCount = failureCount + 1;
this.lastFailureTime = System.nanoTime();
// Cache the failure response for returning on open state
this.lastFailureResponse = response;
}
// 根据 failureThreshold、failureCount 和 lastFailureTime 评估当前状态。
protected void evaluateState() {
if (failureCount >= failureThreshold) { //Then something is wrong with remote service
if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
// 我们已经等得够久了,应该尝试检查服务是否已启动
state = State.HALF_OPEN;
} else {
// 服务可能仍会出现故障
state = State.OPEN;
}
} else {
// 一切正常
state = State.CLOSED;
}
}
@Override
public String getState() {
evaluateState();
return state.name();
}
/**
* Break the circuit beforehand if it is known service is down Or connect the circuit manually if
* service comes online before expected.
*
* @param state State at which circuit is in
*/
@Override
public void setState(State state) {
this.state = state;
switch (state) {
case OPEN:
this.failureCount = failureThreshold;
this.lastFailureTime = System.nanoTime();
break;
case HALF_OPEN:
this.failureCount = failureThreshold;
this.lastFailureTime = System.nanoTime() - retryTimePeriod;
break;
default:
this.failureCount = 0;
}
}
/**
* Executes service call.
*
* @return Value from the remote resource, stale response or a custom exception
*/
@Override
public String attemptRequest() throws RemoteServiceException {
evaluateState();
if (state == State.OPEN) {
// 如果电路处于打开状态,则返回缓存的响应
return this.lastFailureResponse;
} else {
// 如果电路未打开,则发出 API 请求
try {
//在实际应用程序中,这将在线程中运行,并且将利用断路器的超时参数来了解服务
// 是否正在工作。 在这里,我们根据服务器响应本身模拟
var response = service.call();
// api 响应正常,重置所有。
recordSuccess();
return response;
} catch (RemoteServiceException ex) {
recordFailure(ex.getMessage());
throw ex;
}
}
}
}
```
上述模式如何防止失败? 让我们通过它实现的这个有限状态机来理解。
![alt text](../../../circuit-breaker/etc/StateDiagram.png "State Diagram")
- 我们使用某些参数初始化断路器对象:`timeout``failureThreshold``retryTimePeriod`,这有助于确定 API 的弹性。
- 最初,我们处于“关闭”状态,没有发生对 API 的远程调用。
- 每次调用成功时,我们都会将状态重置为开始时的状态。
- 如果失败次数超过某个阈值我们将进入“open”状态这就像开路一样阻止远程服务调用从而节省资源。 (这里,我们从 API 返回名为 ```stale response``` 的响应)
- 一旦超过重试超时时间,我们就会进入“半开”状态并再次调用远程服务以检查服务是否正常工作,以便我们可以提供新鲜内容。 失败将其设置回“打开”状态,并在重试超时时间后进行另一次尝试,而成功将其设置为“关闭”状态,以便一切重新开始正常工作。
## 类图
![alt text](../../../circuit-breaker/etc/circuit-breaker.urm.png "Circuit Breaker class diagram")
## 适用性
在以下情况下使用断路器模式
- 构建一个容错应用程序,其中某些服务的故障不应导致整个应用程序宕机。
- 构建一个持续运行(永远在线)的应用程序,这样它的组件就可以在不完全关闭的情况下升级。
## 相关模式
- [Retry Pattern](https://github.com/iluwatar/java-design-patterns/tree/master/retry)
## 真实世界例子
* [Spring Circuit Breaker module](https://spring.io/guides/gs/circuit-breaker)
* [Netflix Hystrix API](https://github.com/Netflix/Hystrix)
## 鸣谢
* [Understanding Circuit Breaker Pattern](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
* [Martin Fowler on Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
* [Fault tolerance in a high volume, distributed system](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* [Circuit Breaker pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)