Resilience4j is a fault tolerance library designed for Java 8 and functional programming. It is lightweight, modular, and fast. We will get to its modules and functionality later; first, let's take a brief look at fault tolerance itself.
Fault Tolerance
Fault tolerance is the ability of a system to keep functioning properly when some of its components fail. It sounds simple, but it is not easy to achieve: to make a system fault tolerant, you have to design for it at every level and in every subsystem. And it's not just about proper error handling. You should keep your failure domain as small as possible by working on fault isolation and on the system's ability to stabilize itself.
Error Handling
Error-handling bugs are the second largest category (18%) after logic bugs. The authors break down error-handling bugs into three classes of problems.
- Error/Failure Detection – Errors are often ignored or detected incorrectly.
- Error Propagation – This class of problems arises in layered systems, where error detection and error handling code live on different layers and propagating errors across those layers is itself a source of problems.
- Error Handling – Sometimes it's not clear how to handle rare corner cases, and the lack of such specifications leads to error-prone code.
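The propagation problem can be illustrated with a small plain-Java sketch. It is only an illustration under stated assumptions: `PropagationSketch`, `ProfileUnavailableException`, `readFromStorage`, and both `loadProfile` variants are hypothetical names and do not come from Resilience4j or from any study.

```java
// Illustrative only: all class and method names here are hypothetical.
final class PropagationSketch {

    // Domain-level exception that carries the low-level cause upward.
    static final class ProfileUnavailableException extends RuntimeException {
        ProfileUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Lowest layer: this is where the failure is detected.
    static String readFromStorage(String userId) {
        throw new IllegalStateException("storage timeout");
    }

    // Middle layer, lossy variant: swallows the error, so callers only see null.
    static String loadProfileLossy(String userId) {
        try {
            return readFromStorage(userId);
        } catch (RuntimeException e) {
            return null;
        }
    }

    // Middle layer, preserving variant: adds context and keeps the original cause.
    static String loadProfile(String userId) {
        try {
            return readFromStorage(userId);
        } catch (RuntimeException e) {
            throw new ProfileUnavailableException("cannot load profile for " + userId, e);
        }
    }

    public static void main(String[] args) {
        // The lossy variant hides what went wrong.
        System.out.println(loadProfileLossy("42"));
        // The preserving variant lets the top layer decide how to handle it.
        try {
            loadProfile("42");
        } catch (ProfileUnavailableException e) {
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```

In the lossy variant the handling layer cannot tell a missing profile from a storage outage. In the preserving variant the cause travels with the exception.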
Modularization
Resilience4j can help you implement a wide range of fault tolerance ideas. You don't need to adopt the whole library; you can pick only the modules you need. Resilience4j provides several core modules and add-on modules.
Core modules
- resilience4j-circuitbreaker
- resilience4j-ratelimiter
- resilience4j-bulkhead
- resilience4j-retry
- resilience4j-cache
Add-on modules
- resilience4j-retrofit: Retrofit adapter.
- resilience4j-feign: Feign adapter.
- resilience4j-consumer: Circular Buffer Event consumer.
- resilience4j-kotlin: Kotlin coroutines support.
Framework modules
- resilience4j-spring-boot: Spring Boot Starter.
- resilience4j-spring-boot2: Spring Boot 2 Starter.
- resilience4j-ratpack: Ratpack Starter.
- resilience4j-vertx: Vert.x Future decorator.
Reactive modules
- resilience4j-rxjava2: Custom RxJava2 operators.
- resilience4j-reactor: Custom Spring Reactor operators.
Metrics modules
- resilience4j-micrometer: Micrometer Metrics exporter.
- resilience4j-metrics: Dropwizard Metrics exporter.
- resilience4j-prometheus: Prometheus Metrics exporter.
Example
An example of a call that could potentially fail.
// Simulates a microservice for user management
public interface UserService {
    Picture fetchProfilePicture(String userId);
}
Traditional example
try {
    profilePicture = userService.fetchProfilePicture(userId);
} catch (Exception e) {
    Logger.error("The world is not a perfect place ", e);
}
Yes, logging is a very important part of failure detection, but we can be a little smarter about it, and this is where Resilience4j can help. The library will not automatically fix every possible bug; all the important tasks and choices are still up to you. It can only make the hard way a little easier. Besides logging, there are many additional actions we can take when a failure occurs. Here are some options.
- Define “fallback” operations that can go to another host, query a backup DB, or reuse the latest successful response. The example uses Vavr's Try monad to recover from an exception and invoke another lambda expression as a fallback.
Supplier<Picture> fetchTargetPicture = () -> userService.fetchProfilePicture(targetID);
// in case of failure you'll receive a stub picture
Picture profilePicture = Try.ofSupplier(fetchTargetPicture)
    .recover(throwable -> Picture.defaultForProfile()).get();
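The "reuse the latest successful response" option can be sketched in plain Java without any library types. This is a minimal sketch, not something Resilience4j or Vavr provides in this form: `LastGoodFallbackSketch` and its `decorate` method are hypothetical names, and strings stand in for the `Picture` type.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative only: plain Java, no Resilience4j or Vavr types involved.
final class LastGoodFallbackSketch {

    // Wraps a supplier so that a failure is answered with the most recent
    // successful result, or with defaultValue if nothing has succeeded yet.
    static <T> Supplier<T> decorate(Supplier<T> primary, T defaultValue) {
        AtomicReference<T> lastGood = new AtomicReference<>(defaultValue);
        return () -> {
            try {
                T fresh = primary.get();
                lastGood.set(fresh); // remember the latest successful response
                return fresh;
            } catch (RuntimeException e) {
                return lastGood.get(); // fall back instead of failing
            }
        };
    }

    public static void main(String[] args) {
        boolean[] healthy = {true};
        // Hypothetical remote call that fails once the service is unhealthy.
        Supplier<String> fetch = () -> {
            if (!healthy[0]) {
                throw new IllegalStateException("service down");
            }
            return "fresh-picture";
        };
        Supplier<String> safeFetch = decorate(fetch, "default-picture");
        System.out.println(safeFetch.get()); // service is up: fresh result
        healthy[0] = false;
        System.out.println(safeFetch.get()); // service is down: last good result
    }
}
```

The trade-off is staleness: callers get an answer during an outage, but it may be out of date, so this suits data such as profile pictures better than balances or permissions.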
- Apply automatic retries and configure the maximum number of attempts and the wait duration between them.
Supplier<Picture> fetchTargetPicture = () -> userService.fetchProfilePicture(targetID);
RetryConfig retryConfig = RetryConfig.custom().maxAttempts(3).build();
Retry retry = Retry.of("userService", retryConfig);
// it will try to fetch the image up to 3 times, pausing 500 ms (the default wait duration) between attempts
fetchTargetPicture = Retry.decorateSupplier(retry, fetchTargetPicture);
Picture profilePicture = Try.ofSupplier(fetchTargetPicture)
    .recover(throwable -> Picture.defaultForProfile()).get();
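To make the retry behaviour concrete, here is a minimal plain-Java sketch of what a retry decorator does conceptually. It is not Resilience4j's implementation: `RetrySketch`, `decorate`, and the `maxAttempts` and `waitMillis` parameters are hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Illustrative only: a hand-rolled retry decorator, not the Resilience4j implementation.
final class RetrySketch {

    // Wraps a supplier so that it is invoked up to maxAttempts times,
    // sleeping waitMillis after each failed attempt except the last.
    static <T> Supplier<T> decorate(Supplier<T> supplier, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return supplier.get();
                } catch (RuntimeException e) {
                    last = e;
                    if (attempt < maxAttempts) {
                        sleep(waitMillis);
                    }
                }
            }
            throw last; // every attempt failed: rethrow the last failure
        };
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky call: fails twice, then succeeds.
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("temporary failure");
            }
            return "picture";
        };
        String result = decorate(flaky, 3, 10).get();
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Retries only help with transient failures. If the service is down for longer, each caller simply pays the full retry cost before failing, which is one reason to combine retries with a circuit breaker.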
- Use circuit breaking, where you can track error rates of some service/component and, in case of problems, stop all operations with it to help it recover.
Supplier<Picture> fetchTargetPicture = () -> userService.fetchProfilePicture(targetID);
CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("userService");
// while the circuit is open, calls to the original fetchTargetPicture are rejected without being executed
fetchTargetPicture = CircuitBreaker
    .decorateSupplier(circuitBreaker, fetchTargetPicture);
RetryConfig retryConfig = RetryConfig.custom().maxAttempts(3).build();
Retry retry = Retry.of("userService", retryConfig);
fetchTargetPicture = Retry.decorateSupplier(retry, fetchTargetPicture);
Picture profilePicture = Try.ofSupplier(fetchTargetPicture)
    .recover(throwable -> Picture.defaultForProfile()).get();
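The state machine behind circuit breaking can be sketched in a few lines of plain Java. This is a deliberately simplified model and not how Resilience4j implements it: the sketch opens after a number of consecutive failures rather than tracking a failure rate, and `CircuitBreakerSketch`, `CallRejectedException`, and the constructor parameters are hypothetical names. The clock is injected so the example can run without real waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative only: a simplified circuit breaker, not the Resilience4j implementation.
final class CircuitBreakerSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    // Thrown when a call is rejected because the circuit is open.
    static final class CallRejectedException extends RuntimeException {
        CallRejectedException() {
            super("circuit is open, call rejected");
        }
    }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long the circuit stays open
    private final LongSupplier clock;   // current time in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Supplier<T> supplier) {
        beforeCall(); // a rejection here is not counted as a failure
        try {
            T result = supplier.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void beforeCall() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt >= openMillis) {
                state = State.HALF_OPEN; // wait time is over: let trial calls through
            } else {
                throw new CallRejectedException();
            }
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(2, 1000L, () -> now[0]);
        Supplier<String> failing = () -> {
            throw new IllegalStateException("service down");
        };
        // Two failures open the circuit; the third call is rejected immediately.
        for (int i = 0; i < 3; i++) {
            try {
                breaker.call(failing);
            } catch (RuntimeException e) {
                System.out.println(e.getMessage());
            }
        }
        now[0] = 1000L; // simulate the open period elapsing
        System.out.println(breaker.call(() -> "picture") + " / " + breaker.state());
    }
}
```

Rejecting calls while the circuit is open is what gives the failing service room to recover: it stops receiving traffic, and callers fail fast instead of waiting on timeouts.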
- Send an event directly to the monitoring system to speed up problem detection.
Supplier<Picture> fetchTargetPicture = () -> userService.fetchProfilePicture(targetID);
CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("userService");
fetchTargetPicture = CircuitBreaker
    .decorateSupplier(circuitBreaker, fetchTargetPicture);
// they know what to do with it
circuitBreaker.getEventPublisher()
    .onError(event -> Houston.weHaveAProblem(event));
RetryConfig retryConfig = RetryConfig.custom().maxAttempts(3).build();
Retry retry = Retry.of("userService", retryConfig);
fetchTargetPicture = Retry.decorateSupplier(retry, fetchTargetPicture);
Picture profilePicture = Try.ofSupplier(fetchTargetPicture)
    .recover(throwable -> Picture.defaultForProfile()).get();
- Instant event-based notifications are great, but in general you should always have a separate monitoring system that polls the health checks of your nodes and watches for anomalies in your metrics. Resilience4j has add-on modules for integration with Prometheus and Dropwizard Metrics, so you can easily publish your metrics to these systems.
// Dropwizard Metrics registry that collects the circuit breaker's metrics
final MetricRegistry metricRegistry = new MetricRegistry();
metricRegistry.registerAll(CircuitBreakerMetrics.ofCircuitBreaker(circuitBreaker));
Now you can see what sets Resilience4j apart from an API point of view.