Update README.md
parent 3f4d637510
commit 19378f3fdd
110
retry/README.md
@@ -10,39 +10,40 @@ tags:
---

## Intent

Transparently retry certain operations that involve communication with external resources,
particularly over the network, isolating calling code from the retry implementation details.

## Explanation

The Retry pattern consists of retrying operations on remote resources over the network a set number
of times. It depends closely on both business and technical requirements: How long will the
business allow the end user to wait while the operation finishes? How do the remote resource and
our own application perform during peak loads, as more threads wait for the remote resource to
become available? Among the errors returned by the remote service, which can be safely ignored in
order to retry? Is the operation [idempotent](https://en.wikipedia.org/wiki/Idempotence)?

Another concern is the impact that implementing the retry mechanism has on the calling code. The
retry mechanics should ideally be completely transparent to the calling code (the service interface
remains unaltered). There are two general approaches to this problem: from an enterprise
architecture standpoint (strategic), and from a shared library standpoint (tactical).

From a strategic point of view, this would be solved by having requests redirected to a separate
intermediary system, traditionally an [ESB](https://en.wikipedia.org/wiki/Enterprise_service_bus),
but more recently a [Service Mesh](https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a).

From a tactical point of view, this would be solved by reusing shared libraries like
[Hystrix](https://github.com/Netflix/Hystrix) (please note that Hystrix is a complete implementation
of the [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/) pattern, of
which the Retry pattern can be considered a subset). This is the type of solution showcased in
the simple example that accompanies this `README.md`.

Real world example

> Our application uses a service providing customer information. Once in a while the service seems
> to be flaky and can return errors or sometimes it just times out. To circumvent these problems we
> apply the retry pattern.

In plain words

@@ -50,11 +51,14 @@ In plain words

[Microsoft documentation](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry) says

> Enable an application to handle transient failures when it tries to connect to a service or
> network resource, by transparently retrying a failed operation. This can improve the stability of
> the application.

**Programmatic Example**

In our hypothetical application, we have a generic interface for all operations on remote
interfaces.

```java
public interface BusinessOperation<T> {
@@ -73,16 +77,14 @@ public final class FindCustomer implements BusinessOperation<String> {
}
```

Our `FindCustomer` implementation can be configured to throw `BusinessException`s before returning
the customer's ID, thereby simulating a flaky service that intermittently fails. Some exceptions,
like the `CustomerNotFoundException`, are deemed to be recoverable after some hypothetical analysis
because the root cause of the error stems from "some database locking issue". However, the
`DatabaseNotAvailableException` is considered to be a definite showstopper: the application should
not attempt to recover from this error.

We can model a recoverable scenario by instantiating `FindCustomer` like this:

```java
final var op = new FindCustomer(
@@ -93,15 +95,12 @@ final var op = new FindCustomer(
);
```

In this configuration, `FindCustomer` will throw `CustomerNotFoundException` three times, after
which it will consistently return the customer's ID (`12345`).
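
The fail-then-succeed behaviour described here can be sketched roughly as follows. This is a minimal illustration, not the accompanying project's `FindCustomer`: the class name `FlakyFindCustomer`, the unchecked exception hierarchy, and the no-argument `perform()` signature are all assumptions made to keep the snippet self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical unchecked stand-ins for the exception types named in the text.
class BusinessException extends RuntimeException {
  BusinessException(String message) {
    super(message);
  }
}

class CustomerNotFoundException extends BusinessException {
  CustomerNotFoundException(String message) {
    super(message);
  }
}

// Assumed shape of the generic operation interface.
interface BusinessOperation<T> {
  T perform();
}

// Throws each queued exception once, in order, then keeps returning the customer ID.
final class FlakyFindCustomer implements BusinessOperation<String> {
  private final String customerId;
  private final Deque<BusinessException> errors;

  FlakyFindCustomer(String customerId, List<BusinessException> errors) {
    this.customerId = customerId;
    this.errors = new ArrayDeque<>(errors);
  }

  @Override
  public String perform() {
    if (!errors.isEmpty()) {
      throw errors.pop(); // removes and throws the oldest queued error
    }
    return customerId;
  }
}
```

Queuing three `CustomerNotFoundException` instances makes the first three calls to `perform()` fail and every later call return the ID, which is the scenario the text describes.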

In our hypothetical scenario, our analysts indicate that this operation typically fails 2-4 times
for a given input during peak hours, and that each worker thread in the database subsystem typically
needs 50ms to "recover from an error". Applying these policies would yield something like this:

```java
final var op = new Retry<>(
@@ -117,26 +116,27 @@ final var op = new Retry<>(
);
```

Executing `op` once would automatically trigger at most 5 retry attempts, with a 100 millisecond
delay between attempts, ignoring any `CustomerNotFoundException` thrown while trying. In this
particular scenario, due to the configuration for `FindCustomer`, there will be 1 initial attempt
and 3 additional retries before finally returning the desired result `12345`.

If our `FindCustomer` operation were instead to throw a fatal `DatabaseNotAvailableException`,
which we were instructed not to ignore and, more importantly, did not instruct our `Retry` to
ignore, then the operation would fail immediately upon receiving the error, no matter how many
attempts were left.
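
As a rough sketch of this behaviour, a retry decorator might look like the code below. It is not the project's `Retry` class: the name `SimpleRetry`, the use of `Supplier`, the unchecked exception stand-ins, and the reading of the limit as the total number of attempts (first call included) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical unchecked stand-ins for the exceptions named in the text.
class CustomerNotFoundException extends RuntimeException {
  CustomerNotFoundException(String message) {
    super(message);
  }
}

class DatabaseNotAvailableException extends RuntimeException {
  DatabaseNotAvailableException(String message) {
    super(message);
  }
}

// Illustrative retry decorator (assumed name); not the project's Retry class.
final class SimpleRetry<T> implements Supplier<T> {
  private final Supplier<T> operation;
  private final int maxAttempts;  // total attempts, including the first one
  private final long delayMillis; // pause between two consecutive attempts
  private final List<Class<? extends RuntimeException>> ignored;
  private int attempts;           // attempts made by the latest call to get()

  SimpleRetry(Supplier<T> operation, int maxAttempts, long delayMillis,
      List<Class<? extends RuntimeException>> ignored) {
    this.operation = operation;
    this.maxAttempts = maxAttempts;
    this.delayMillis = delayMillis;
    this.ignored = List.copyOf(ignored);
  }

  @Override
  public T get() {
    attempts = 0;
    while (true) {
      attempts++;
      try {
        return operation.get();
      } catch (RuntimeException e) {
        boolean recoverable = ignored.stream().anyMatch(type -> type.isInstance(e));
        // Rethrow errors we were not told to ignore, or when attempts run out.
        if (!recoverable || attempts >= maxAttempts) {
          throw e;
        }
        pause();
      }
    }
  }

  int attempts() {
    return attempts;
  }

  private void pause() {
    try {
      Thread.sleep(delayMillis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", ie);
    }
  }
}
```

With a limit of 5 and only `CustomerNotFoundException` on the ignore list, an operation that fails three times succeeds on the fourth attempt, while a `DatabaseNotAvailableException` propagates after the first attempt. The calling code only sees a `Supplier`, so the retry mechanics stay hidden from it.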

## Class diagram



## Applicability

Use the Retry pattern whenever an application needs to communicate with an external resource,
particularly in a cloud environment, and the business requirements allow it.

## Consequences

**Pros:**

* Resiliency