diff --git a/retry/README.md b/retry/README.md
index 056674a18..8cf6bb94f 100644
--- a/retry/README.md
+++ b/retry/README.md
@@ -10,39 +10,40 @@ tags:
 ---
 
 ## Intent
-Transparently retry certain operations that involve communication with external resources, particularly over the
-network, isolating calling code from the retry implementation details.
+
+Transparently retry certain operations that involve communication with external resources,
+particularly over the network, isolating calling code from the retry implementation details.
 
 ## Explanation
 
-Retry pattern consists retrying operations on remote resources over the
-network a set number of times. It closely depends on both business and technical
-requirements: how much time will the business allow the end user to wait while
-the operation finishes? What are the performance characteristics of the
-remote resource during peak loads as well as our application as more threads
-are waiting for the remote resource's availability? Among the errors returned
-by the remote service, which can be safely ignored in order to retry? Is the
-operation [idempotent](https://en.wikipedia.org/wiki/Idempotence)?
-Another concern is the impact on the calling code by implementing the retry
-mechanism. The retry mechanics should ideally be completely transparent to the
-calling code (service interface remains unaltered). There are two general
-approaches to this problem: from an enterprise architecture standpoint
-(strategic), and a shared library standpoint (tactical).
+Retry pattern consists retrying operations on remote resources over the network a set number of
+times. It closely depends on both business and technical requirements: How much time will the
+business allow the end user to wait while the operation finishes? What are the performance
+characteristics of the remote resource during peak loads as well as our application as more threads
+are waiting for the remote resource's availability? Among the errors returned by the remote service,
+which can be safely ignored in order to retry? Is the operation
+[idempotent](https://en.wikipedia.org/wiki/Idempotence)?
 
-From a strategic point of view, this would be solved by having requests
-be redirected to a separate intermediary system, traditionally an
-[ESB](https://en.wikipedia.org/wiki/Enterprise_service_bus), but more recently
-a [Service Mesh](https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a).
+Another concern is the impact on the calling code by implementing the retry mechanism. The retry
+mechanics should ideally be completely transparent to the calling code (service interface remains
+unaltered). There are two general approaches to this problem: From an enterprise architecture
+standpoint (strategic), and a shared library standpoint (tactical).
 
-From a tactical point of view, this would be solved by reusing shared libraries
-like [Hystrix](https://github.com/Netflix/Hystrix) (please note that *Hystrix* is a complete implementation of
-the [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/) pattern, of which the Retry pattern
-can be considered a subset of.). This is the type of solution showcased in the simple example that accompanies this
-*README*.
+From a strategic point of view, this would be solved by having requests redirected to a separate
+intermediary system, traditionally an [ESB](https://en.wikipedia.org/wiki/Enterprise_service_bus),
+but more recently a [Service Mesh](https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a).
+
+From a tactical point of view, this would be solved by reusing shared libraries like
+[Hystrix](https://github.com/Netflix/Hystrix) (please note that Hystrix is a complete implementation
+of the [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/) pattern, of
+which the Retry pattern can be considered a subset of). This is the type of solution showcased in
+the simple example that accompanies this `README.md`.
 
 Real world example
 
-> Our application uses a service providing customer information. Once in a while the service seems to be flaky and can return errors or sometimes it just times out. To circumvent these problems we apply the retry pattern.
+> Our application uses a service providing customer information. Once in a while the service seems
+> to be flaky and can return errors or sometimes it just times out. To circumvent these problems we
+> apply the retry pattern.
 
 In plain words
 
@@ -50,11 +51,14 @@ In plain words
 
 [Microsoft documentation](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry) says
 
-> Enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application.
+> Enable an application to handle transient failures when it tries to connect to a service or
+> network resource, by transparently retrying a failed operation. This can improve the stability of
+> the application.
 
 **Programmatic Example**
 
-In our hypothetical application, we have a generic interface for all operations on remote interfaces.
+In our hypothetical application, we have a generic interface for all operations on remote
+interfaces.
 
 ```java
 public interface BusinessOperation {
@@ -73,16 +77,14 @@ public final class FindCustomer implements BusinessOperation {
 }
 ```
 
-Our `FindCustomer` implementation can be configured to throw
-`BusinessException`s before returning the customer's ID, thereby simulating a
-'flaky' service that intermittently fails. Some exceptions, like the
-`CustomerNotFoundException`, are deemed to be recoverable after some
-hypothetical analysis because the root cause of the error stems from "some
-database locking issue". However, the `DatabaseNotAvailableException` is
-considered to be a definite showstopper - the application should not attempt
-to recover from this error.
+Our `FindCustomer` implementation can be configured to throw `BusinessException`s before returning
+the customer's ID, thereby simulating a flaky service that intermittently fails. Some exceptions,
+like the `CustomerNotFoundException`, are deemed to be recoverable after some hypothetical analysis
+because the root cause of the error stems from "some database locking issue". However, the
+`DatabaseNotAvailableException` is considered to be a definite showstopper - the application should
+not attempt to recover from this error.
 
-We can model a 'recoverable' scenario by instantiating `FindCustomer` like this:
+We can model a recoverable scenario by instantiating `FindCustomer` like this:
 
 ```java
 final var op = new FindCustomer(
@@ -93,15 +95,12 @@ final var op = new FindCustomer(
 );
 ```
 
-In this configuration, `FindCustomer` will throw `CustomerNotFoundException`
-three times, after which it will consistently return the customer's ID
-(`12345`).
+In this configuration, `FindCustomer` will throw `CustomerNotFoundException` three times, after
+which it will consistently return the customer's ID (`12345`).
 
-In our hypothetical scenario, our analysts indicate that this operation
-typically fails 2-4 times for a given input during peak hours, and that each
-worker thread in the database subsystem typically needs 50ms to
-"recover from an error". Applying these policies would yield something like
-this:
+In our hypothetical scenario, our analysts indicate that this operation typically fails 2-4 times
+for a given input during peak hours, and that each worker thread in the database subsystem typically
+needs 50ms to "recover from an error". Applying these policies would yield something like this:
 
 ```java
 final var op = new Retry<>(
@@ -117,26 +116,27 @@ final var op = new Retry<>(
 );
 ```
 
-Executing `op` *once* would automatically trigger at most 5 retry attempts,
-with a 100 millisecond delay between attempts, ignoring any
-`CustomerNotFoundException` thrown while trying. In this particular scenario,
-due to the configuration for `FindCustomer`, there will be 1 initial attempt
+Executing `op` once would automatically trigger at most 5 retry attempts, with a 100 millisecond
+delay between attempts, ignoring any `CustomerNotFoundException` thrown while trying. In this
+particular scenario, due to the configuration for `FindCustomer`, there will be 1 initial attempt
 and 3 additional retries before finally returning the desired result `12345`.
 
-If our `FindCustomer` operation were instead to throw a fatal
-`DatabaseNotFoundException`, which we were instructed not to ignore, but
-more importantly we did *not* instruct our `Retry` to ignore, then the operation
-would have failed immediately upon receiving the error, not matter how many
-attempts were left.
+If our `FindCustomer` operation were instead to throw a fatal `DatabaseNotFoundException`, which we
+were instructed not to ignore, but more importantly we did not instruct our `Retry` to ignore, then
+the operation would have failed immediately upon receiving the error, not matter how many attempts
+were left.
 
 ## Class diagram
+
 ![alt text](./etc/retry.png "Retry")
 
 ## Applicability
-Whenever an application needs to communicate with an external resource, particularly in a cloud environment, and if
-the business requirements allow it.
+
+Whenever an application needs to communicate with an external resource, particularly in a cloud
+environment, and if the business requirements allow it.
 
 ## Consequences
+
 **Pros:**
 
 * Resiliency