Lagom Circuit Breaker: What, Why and How?


When our service depends on a third-party service’s endpoint for a response, anything from a bad connection to an internal server error on the third party’s side can cause our own service to fail. A circuit breaker is employed to prevent such failures from cascading through our services. And if you are here, you are probably looking to customise the circuit breaker within your own service. So, shall we?

First off, when working with Lagom in Java, the circuit breaker functionality is on even if you don’t do anything about it. But it is always better to customise things to suit our own needs, right? So let’s first get the concept of a circuit breaker clear in our heads. A circuit breaker is in one of the following three states at any instant:

  1. Closed,
  2. Open, and
  3. Half-Open

Also, there are three parameters that govern the state of our circuit breaker, which we can specify in the application.conf file in the /resources folder:

  1. max-failures
  2. reset-timeout
  3. call-timeout

Consider two services S1 (client service) and S2 (supplier service), and imagine a circuit between them. A circuit breaker is like a switch that exists between these two services. Initially, this switch is closed, and the flow of request/response between the two services goes smoothly. Also, assume that we have set the values of our three parameters as below:

  • max-failures = 10,
  • reset-timeout = 15 seconds, and
  • call-timeout = 10 seconds.

Now, if for whatever reason S2 fails to deliver a response to S1 within the call-timeout set in the configuration (here, 10 seconds), it is counted as a failure. Once the number of failures reaches max-failures as specified in application.conf (i.e. when 10 failures have occurred in our case), the circuit breaker goes into the open state and stays there for the duration of reset-timeout (15 seconds in this example). During this state, if S1 tries to reach S2, the circuit breaker fails the request with a CircuitBreakerOpenException.

After the reset-timeout period is over, the circuit breaker enters the half-open state temporarily and lets the next request from S1 through to S2. If that request fails as well, the circuit breaker goes back to the open state for another reset-timeout period. However, if S1 receives a successful response from S2 (e.g. with status code 200), the circuit breaker goes back to the closed state.
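
To make the three states concrete, here is a minimal sketch of the state machine described above in plain Java. It is a toy, not Lagom's actual implementation: the class and method names are made up for illustration, the clock is injected so the example does not need to sleep, it is not thread-safe, and a call that exceeds call-timeout is simply modelled as a call that throws.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative toy breaker; names are hypothetical, not Lagom's API. */
class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static class OpenException extends RuntimeException {
        OpenException() { super("circuit breaker is open"); }
    }

    private final int maxFailures;
    private final long resetTimeoutMillis;
    private final LongSupplier clock;   // injected so the demo needs no sleeping
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SketchCircuitBreaker(int maxFailures, long resetTimeoutMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> remoteCall) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < resetTimeoutMillis) {
                throw new OpenException();   // fail fast, S2 is not contacted
            }
            state = State.HALF_OPEN;         // reset-timeout elapsed: allow one trial call
        }
        try {
            T result = remoteCall.get();
            failures = 0;                    // a success closes the circuit again
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            failures++;
            // A failed trial call, or reaching max-failures, opens the circuit.
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            throw e;
        }
    }
}

class SketchCircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};   // fake clock in milliseconds
        SketchCircuitBreaker breaker = new SketchCircuitBreaker(10, 15_000, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("S2 timed out"); };

        for (int i = 0; i < 10; i++) {
            try { breaker.call(failing); } catch (IllegalStateException expected) { }
        }
        System.out.println(breaker.state());           // OPEN

        now[0] = 15_000;                               // reset-timeout has passed
        System.out.println(breaker.call(() -> "ok"));  // ok
        System.out.println(breaker.state());           // CLOSED
    }
}
```

The demo uses the same values as the example (max-failures = 10, reset-timeout = 15 seconds): ten failures open the circuit, and the first successful call after the reset-timeout closes it again.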


[Figure: Circuit-Breaker-Flow]

So let’s get to the configuration part in our implementation.

In the /resources/application.conf within your service-impl, add the following configuration for now.

#default circuit breaker configuration.
lagom.circuit-breaker {

  # Default configuration that is used if a configuration section
  # with the circuit breaker identifier is not defined.
  default {
    # Enable/Disable circuit breaker.
    enabled = on

    # Number of failures before opening the circuit.
    max-failures = 10
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}

    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}

    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
}

Once you’ve done that, you have specified a default configuration for your circuit breaker. You can modify this configuration to suit your own requirements. But, is it really this simple?

Actually, yes! You can customise the configuration for your service even further, down to individual calls. I assume your Lagom service’s API descriptor looks something like this:

default Descriptor descriptor() {
    return named("demo").withCalls(
            restCall(GET, "/demopath1/:param1/:param2/?param3&param4", this::demoMethod),
            // optional: use the circuit breaker identified by "demo2" for this call only
            restCall(GET, "/demopath2", this::health)
                    .withCircuitBreaker(CircuitBreaker.identifiedBy("demo2")))
            .withAutoAcl(true);
}

Without any further configuration in application.conf, the circuit breaker uses the default parameters we specified previously. Now replace that configuration with the following in application.conf:

lagom.circuit-breaker {

  # Default configuration that is used if a configuration section
  # with the circuit breaker identifier is not defined.
  default {
    # Enable/Disable circuit breaker.
    enabled = on 

    # Number of failures before opening the circuit.
    max-failures = 10
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}

    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}

    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }

  demo {
    # Enable/Disable circuit breaker.
    enabled = off

    # Number of failures before opening the circuit.
    max-failures = 2
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}

    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}

    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }

  demo2 {
    # Enable/Disable circuit breaker.
    enabled = on

    # Number of failures before opening the circuit.
    max-failures = 3
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}

    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}

    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
}

You will observe that the configuration specified for demo overrides the default configuration for all the API calls within the descriptor named demo (i.e. for /demopath1 and for /demopath2). In addition, since we have explicitly specified that /demopath2 should use the circuit breaker identified by demo2, that call uses the demo2 configuration instead.
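
The lookup rule can be sketched in a few lines of plain Java. This is only an illustration of the behaviour described above, not Lagom's code: the class names are hypothetical, only two of the settings are modelled, and a missing section falls back to default as a whole.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for two of the values found in one config section. */
class BreakerSection {
    final boolean enabled;
    final int maxFailures;

    BreakerSection(boolean enabled, int maxFailures) {
        this.enabled = enabled;
        this.maxFailures = maxFailures;
    }
}

/** Toy lookup; a sketch of the fallback rule, not Lagom's actual code. */
class BreakerSectionLookup {
    private final Map<String, BreakerSection> sections = new HashMap<>();

    BreakerSectionLookup define(String id, boolean enabled, int maxFailures) {
        sections.put(id, new BreakerSection(enabled, maxFailures));
        return this;
    }

    /** A call-level identifier wins; otherwise the descriptor name is the identifier. */
    static String breakerId(String descriptorName, String callLevelId) {
        return callLevelId != null ? callLevelId : descriptorName;
    }

    /** Use the section named after the identifier, or "default" if none is defined. */
    BreakerSection sectionFor(String id) {
        return sections.getOrDefault(id, sections.get("default"));
    }
}

class BreakerSectionLookupDemo {
    public static void main(String[] args) {
        BreakerSectionLookup lookup = new BreakerSectionLookup()
                .define("default", true, 10)
                .define("demo", false, 2)
                .define("demo2", true, 3);

        String path1Id = BreakerSectionLookup.breakerId("demo", null);     // /demopath1
        String path2Id = BreakerSectionLookup.breakerId("demo", "demo2");  // /demopath2

        System.out.println(lookup.sectionFor(path1Id).maxFailures);  // 2
        System.out.println(lookup.sectionFor(path2Id).maxFailures);  // 3
        System.out.println(lookup.sectionFor("other").maxFailures);  // 10
    }
}
```

An identifier without a section of its own ("other" here) ends up with the default values, which is what the comment on the default section in application.conf says.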

Please note that in the example above, if CIRCUIT_BREAKER_MAX_FAILURES, CIRCUIT_BREAKER_RESET_TIMEOUT, and CIRCUIT_BREAKER_CALL_TIMEOUT are set as environment variables, their values override the ones hard-coded in application.conf. Since all three sections (default, demo and demo2) reference the same variables, setting one of them overrides that parameter in every section.
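
The effect of a hard-coded value followed by a ${?VAR} line can be mimicked with a tiny plain-Java sketch. The helper below is hypothetical and only models the "override if the variable is set" rule; the environment is passed in as a map so the example stays deterministic.

```java
import java.util.Collections;
import java.util.Map;

class OptionalSubstitution {
    /**
     * Mimics the pair "key = hardCoded" followed by "key = ${?VAR}":
     * the second line only takes effect when VAR is set.
     */
    static String resolve(String hardCoded, String varName, Map<String, String> env) {
        String fromEnv = env.get(varName);
        return fromEnv != null ? fromEnv : hardCoded;
    }
}

class OptionalSubstitutionDemo {
    public static void main(String[] args) {
        String var = "CIRCUIT_BREAKER_MAX_FAILURES";
        Map<String, String> unset = Collections.emptyMap();
        Map<String, String> set = Collections.singletonMap(var, "5");

        // Variable not set: each section keeps its own hard-coded value.
        System.out.println(OptionalSubstitution.resolve("10", var, unset)); // 10 (default)
        System.out.println(OptionalSubstitution.resolve("2", var, unset));  // 2  (demo)
        System.out.println(OptionalSubstitution.resolve("3", var, unset));  // 3  (demo2)

        // Variable set: all three sections end up with the same value.
        System.out.println(OptionalSubstitution.resolve("10", var, set));   // 5
        System.out.println(OptionalSubstitution.resolve("2", var, set));    // 5
        System.out.println(OptionalSubstitution.resolve("3", var, set));    // 5
    }
}
```

If the sections are meant to keep different values even when environment variables are used, one option is to give each section its own variable names.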

Also, while implementing my own circuit breaker, I observed that my integration tests got the CircuitBreakerOpenException once max-failures had occurred, but I did not get the same exception when I sent multiple requests using cURL or my browser. The reason is that the circuit breaker guards calls made from one service to another; it does not come into the picture when there is just one independent service and end users hit that service directly using a browser or an HTTP client. The role of the circuit breaker is clearly defined in the documentation: “A circuit breaker is used to provide stability and prevent cascading failures in distributed systems. These should be used in conjunction with judicious timeouts at the interfaces between services to prevent the failure of a single service from bringing down other services.”
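
The difference between the two situations can be shown with a small plain-Java sketch. Everything here is hypothetical and simplified: FailingSupplierService stands in for S2, GuardedServiceClient stands in for the client that S1 uses to call S2, and the reset-timeout is left out for brevity. The only point is that the breaker lives on the calling side, so requests that bypass the client never see it.

```java
/** Stands in for S2: counts how often it is actually reached, and always fails. */
class FailingSupplierService {
    int hits = 0;

    String endpoint() {
        hits++;
        throw new IllegalStateException("S2: internal server error");
    }
}

/** Stands in for S1's client for S2; the breaker lives here, not inside S2. */
class GuardedServiceClient {
    static class BreakerOpenException extends RuntimeException { }

    private final FailingSupplierService s2;
    private final int maxFailures;
    private int failures = 0;

    GuardedServiceClient(FailingSupplierService s2, int maxFailures) {
        this.s2 = s2;
        this.maxFailures = maxFailures;
    }

    String call() {
        if (failures >= maxFailures) {
            throw new BreakerOpenException();   // fail fast without contacting S2
        }
        try {
            String result = s2.endpoint();
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            throw e;
        }
    }
}

class ClientSideBreakerDemo {
    public static void main(String[] args) {
        FailingSupplierService s2 = new FailingSupplierService();

        // Direct hits (like cURL or a browser): every request reaches S2.
        for (int i = 0; i < 5; i++) {
            try { s2.endpoint(); } catch (IllegalStateException e) { /* plain failure */ }
        }
        System.out.println(s2.hits);       // 5

        // Through the client: after 3 failures the breaker stops contacting S2.
        GuardedServiceClient client = new GuardedServiceClient(s2, 3);
        int fastFailures = 0;
        for (int i = 0; i < 5; i++) {
            try {
                client.call();
            } catch (GuardedServiceClient.BreakerOpenException e) {
                fastFailures++;
            } catch (IllegalStateException e) {
                /* S2 was reached and failed */
            }
        }
        System.out.println(s2.hits);       // 8
        System.out.println(fastFailures);  // 2
    }
}
```

The five direct requests all reach S2 and just fail, whereas the guarded client gives up on S2 after three failures and rejects the remaining two calls itself.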

Hope this was helpful. I’m still looking further into the circuit breaker implementation in Lagom, so if you have any queries about it, please feel free to drop a comment below. And if you found this helpful at any level, please like & share this blog. We are all here to learn! Cheers. 🙂

References:

  1. https://www.lagomframework.com/documentation/1.3.x/java/ServiceClients.html
  2. https://martinfowler.com/bliki/CircuitBreaker.html

Keep learning, Like & share.

 
