What is the Correct Caching Strategy?


While looking for ways to speed up our application on Google App Engine, we decided to use Memcache. This led to an interesting discussion, which I am reproducing here to get your input.

If you have been following our blog, you will know that there are two potential ways to cache: invasive and non-invasive. Maybe there is a third way that you can tell us about. We decided that entities which do not change much but are fetched again and again should be cached.

For fetching the entities, there are again two possible ways:

  1. Fetch the entity individually or
  2. Fetch the entity as a part of a group of entities.

This becomes clearer with an example. Say we have to find the tasks assigned to a person. These task assignments are based on a range of dates, so the API we are talking about is:

List<TaskAssignment> fetchTaskAssigments(User user, Date startDate, Date endDate);

Behind the scenes, this query goes to the datastore and fetches the TaskAssignment(s) for that user based on the date range. In our scenario the date ranges are canned: for simplicity, they are month ranges. Hence we would be interested in the TaskAssignment(s) for the months of Jun, Jul, Aug, Sep and so on.

CASE I

One way to cache is to cache lists, i.e. cache all assignments belonging to Jun, Jul, Aug and Sep. We would then have 4 cached lists: one List<TaskAssignment> each for Jun, Jul, Aug and Sep.

Benefits of caching this way:

  • Once the results are cached, there is no more computation necessary. All the lists would be fetched from the cache.
  • We can apply non-invasive caching on the methods as aspects. The results are put into the cache and the business logic does not need to know about the caching framework.

Limitations of caching this way:

  • TaskAssignment entities are cached in duplicate. If the months of Jun, Jul and Sep share the same TaskAssignment, that entity is present in the cache 3 times.
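As a rough illustration of Case I, here is a minimal sketch that caches one list per user and month. Everything in it is an assumption made for the example: a HashMap stands in for Memcache, another map stands in for the datastore, users are plain String ids, and TaskAssignment is reduced to an id with a start and end date.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the TaskAssignment entity.
final class TaskAssignment {
    final String id;
    final LocalDate start;
    final LocalDate end;

    TaskAssignment(String id, LocalDate start, LocalDate end) {
        this.id = id;
        this.start = start;
        this.end = end;
    }
}

// Case I sketch: one cached list per (user, month) key.
class MonthlyListCache {
    // Assumption: a plain map stands in for Memcache.
    private final Map<String, List<TaskAssignment>> cache = new HashMap<>();
    // Assumption: a map of user id to assignments stands in for the datastore.
    private final Map<String, List<TaskAssignment>> datastore;
    int datastoreQueries = 0;

    MonthlyListCache(Map<String, List<TaskAssignment>> datastore) {
        this.datastore = datastore;
    }

    List<TaskAssignment> fetchTaskAssigments(String userId, YearMonth month) {
        String key = userId + ":" + month; // e.g. "u1:2010-06"
        List<TaskAssignment> cached = cache.get(key);
        if (cached == null) {
            cached = queryDatastore(userId, month);
            cache.put(key, cached);
        }
        return cached;
    }

    // Returns the user's assignments whose date range overlaps the month.
    private List<TaskAssignment> queryDatastore(String userId, YearMonth month) {
        datastoreQueries++;
        List<TaskAssignment> result = new ArrayList<>();
        List<TaskAssignment> all =
                datastore.getOrDefault(userId, Collections.<TaskAssignment>emptyList());
        for (TaskAssignment a : all) {
            boolean overlaps = !a.end.isBefore(month.atDay(1))
                    && !a.start.isAfter(month.atEndOfMonth());
            if (overlaps) {
                result.add(a);
            }
        }
        return result;
    }

    // Total entries across all cached lists; an assignment that appears
    // in several months is counted once per month.
    int cachedEntryCount() {
        int total = 0;
        for (List<TaskAssignment> list : cache.values()) {
            total += list.size();
        }
        return total;
    }
}
```

In this in-memory sketch the lists share object references, but with a real cache each list would be serialized on its own, so an assignment spanning three requested months would be stored three times.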

CASE II

Another way to cache is to get all the TaskAssignment(s) for the user, irrespective of the date range, and then cache that list. Effectively, we are talking about the following API:

List<TaskAssignment> fetchTaskAssigments(User user);

Now, when there is a need to invoke the date-range method of the API

List<TaskAssignment> fetchTaskAssigments(User user, Date startDate, Date endDate);

then the implementation would be something like this:

public List<TaskAssignment> fetchTaskAssigments(User user, Date startDate, Date endDate) {
	// Fetch every assignment for the user (from the cache or the datastore).
	List<TaskAssignment> assignments = fetchTaskAssigments(user);
	// Narrow the full list down to the requested date range in memory.
	return filterAssignmentsOnDateRange(assignments, startDate, endDate);
}

Here, a non-invasive cache aspect would be applied to the fetchTaskAssigments(User user) method, which fetches the list either from the cache or from the datastore.

Benefits of caching this way:
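A minimal sketch of Case II, under stated assumptions, might look like the following. The HashMap cache, the map-backed datastore, the String user ids, the reduced TaskAssignment class and the overlap rule inside filterAssignmentsOnDateRange are all hypothetical choices for the example, since the post does not define them.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the TaskAssignment entity.
final class TaskAssignment {
    final String id;
    final LocalDate start;
    final LocalDate end;

    TaskAssignment(String id, LocalDate start, LocalDate end) {
        this.id = id;
        this.start = start;
        this.end = end;
    }
}

// Case II sketch: cache the full list per user, filter by date in memory.
class PerUserCache {
    // Assumption: plain maps stand in for Memcache and the datastore.
    private final Map<String, List<TaskAssignment>> cache = new HashMap<>();
    private final Map<String, List<TaskAssignment>> datastore;
    int datastoreQueries = 0;

    PerUserCache(Map<String, List<TaskAssignment>> datastore) {
        this.datastore = datastore;
    }

    // All assignments for the user; hits the datastore only on a cache miss.
    List<TaskAssignment> fetchTaskAssigments(String userId) {
        List<TaskAssignment> cached = cache.get(userId);
        if (cached == null) {
            datastoreQueries++;
            cached = datastore.getOrDefault(userId, Collections.<TaskAssignment>emptyList());
            cache.put(userId, cached);
        }
        return cached;
    }

    List<TaskAssignment> fetchTaskAssigments(String userId, LocalDate startDate, LocalDate endDate) {
        return filterAssignmentsOnDateRange(fetchTaskAssigments(userId), startDate, endDate);
    }

    // Assumption: an assignment matches when its range overlaps [startDate, endDate].
    static List<TaskAssignment> filterAssignmentsOnDateRange(
            List<TaskAssignment> assignments, LocalDate startDate, LocalDate endDate) {
        List<TaskAssignment> result = new ArrayList<>();
        for (TaskAssignment a : assignments) {
            if (!a.end.isBefore(startDate) && !a.start.isAfter(endDate)) {
                result.add(a);
            }
        }
        return result;
    }
}
```

Each date-range call pays for a pass over the user's full list, but the datastore is queried once per user no matter how many months are requested.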

  • There is NO duplication of TaskAssignment being cached. Each TaskAssignment is cached only once.
  • This caching is also non-invasive since the business logic is not aware of the cache.

Limitations of caching this way:

  • A computation (filtering) has to be done every time the TaskAssignment(s) need to be returned for a date range.
  • Some extra logic needs to be written for fetching all the TaskAssignment(s), which was not required earlier.

So, in a nutshell, instead of filtering on the datastore we are filtering in the code, and instead of storing duplicate entities we are storing each entity once.

Let us assume that the number of TaskAssignment(s) is not huge, so fetchTaskAssigments(User user) in Case II is not very expensive. Also assume that we have enough cache space available, so storing duplicate entities in Case I is not very expensive either.

Given these facts, which strategy would you use and why? Are there any other benefits or limitations of the above approaches that would help you make your decision? For our case we went with Case I, since we could quickly write an around aspect and inject caching, but we are not sure whether it is the best way to go. What are your thoughts and recommendations?
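To make the "non-invasive" idea concrete, here is a small sketch that uses a JDK dynamic proxy in place of an around aspect. It is not our actual aspect: the TaskService interface, the DatastoreTaskService class, the String arguments and the HashMap cache are all hypothetical stand-ins, and the cache key is simply the method name plus its arguments.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical service interface; the business code depends only on this.
interface TaskService {
    List<String> fetchTaskAssigments(String userId, String month);
}

// Hypothetical implementation that would normally query the datastore.
class DatastoreTaskService implements TaskService {
    int calls = 0;

    public List<String> fetchTaskAssigments(String userId, String month) {
        calls++;
        return Arrays.asList(userId + "-task-" + month);
    }
}

// Wraps any interface-typed target so that repeated calls with the same
// arguments are answered from a map (a stand-in for Memcache).
final class CachingProxy {
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type) {
        Map<String, Object> cache = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            // Key on method name plus arguments, e.g. "fetchTaskAssigments[u1, Jun]".
            String key = method.getName() + Arrays.deepToString(args);
            if (cache.containsKey(key)) {
                return cache.get(key);
            }
            Object result = method.invoke(target, args);
            cache.put(key, result);
            return result;
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

A caller would obtain the service with CachingProxy.wrap(new DatastoreTaskService(), TaskService.class) and use it like any other TaskService. Neither the caller nor DatastoreTaskService contains any cache code, which is the property that made Case I quick for us to adopt.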

About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development. Knoldus does niche Reactive and Big Data product development on Scala, Spark and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). To know more, send a mail to hello@knoldus.com or visit www.knoldus.com