Google App Engine: Understanding Caching


If you have been following the Google App Engine downtime notify group, you will know that the datastore has been behaving erratically over the last two weeks. We had been thinking about using Memcache for caching for a while, and those two weeks gave us the perfect excuse to do it. Another reason was a comment on our blog by Gianni Mariani, which confirmed that Memcache is going to be significantly faster than the datastore.

Datastore transactions are generally much more expensive than Memcache transactions and are a significant source of application contention in heavily used apps.

Before we get to the actual implementation, there are some interesting facts about GAE caching that you should know.

  • Values can expire from Memcache at any time, including before the expiration deadline set for the value. The App Engine platform manages the expiration, so the application must be prepared to lose a value earlier than the expiration it defined. Why?
  • Because Memcache is shared. It is segmented by application, which means that every app is guaranteed a fair share of Memcache space, but you are still sharing it with a lot of other applications. If other apps are Memcache-hungry, memory pressure will eventually push your inactive values out of the cache before they expire.
  • The Memcache API respects namespaces, so it is easy to use in a multi-tenant application like ours.
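To make the eviction point concrete, here is a toy model in plain Java. It is only an illustration under stated assumptions: `BoundedLruCache` and `EvictionDemo` are hypothetical names, a small bounded least-recently-used map stands in for the shared cache, and nothing here reflects how Memcache is actually implemented. What it shows is that an entry can vanish before any expiration you set, so every read needs a fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a shared cache: a bounded LRU map that evicts under pressure. */
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
	private final int maxEntries;

	BoundedLruCache(int maxEntries) {
		// accessOrder = true keeps entries ordered from least to most recently used
		super(16, 0.75f, true);
		this.maxEntries = maxEntries;
	}

	@Override
	protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
		// Drop the least recently used entry once the map is over capacity
		return size() > maxEntries;
	}
}

class EvictionDemo {
	/** Returns the cached value, or the fallback when the entry is gone. */
	static String readOrFallback(Map<String, String> cache, String key, String fallback) {
		String value = cache.get(key);
		return value != null ? value : fallback;
	}

	public static void main(String[] args) {
		Map<String, String> cache = new BoundedLruCache<>(2);
		cache.put("a", "1");
		cache.put("b", "2");
		// Touch "a" so that "b" becomes the least recently used entry
		cache.get("a");
		// Adding a third entry exceeds the capacity, so "b" is evicted
		cache.put("c", "3");
		System.out.println(readOrFallback(cache, "b", "reload from datastore"));
	}
}
```

The inactive entry is the one that disappears, which is why cached values should always be treated as reloadable from the datastore.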

GAE supports JCache, a proposed interface standard for memory caches described by JSR 107, as an interface to the App Engine Memcache. App Engine provides this interface through the net.sf.jsr107cache package. Keep in mind that you need to include appengine-jsr107cache-1.3.7.jar in your war. If you are using a Maven project structure, the dependency would be

<dependency>
	<groupId>com.google.appengine</groupId>
	<artifactId>appengine-jsr107cache</artifactId>
	<version>${gae.version}</version>
</dependency>

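and to install the jar into your local repository use the command below (for a shared repository such as Nexus, the corresponding Maven goal is deploy:deploy-file)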

gaev=1.3.7

mvn install:install-file -Dfile=$GAE_SDK_PATH/lib/user/appengine-jsr107cache-$gaev.jar -DgroupId=com.google.appengine -DartifactId=appengine-jsr107cache -Dversion=$gaev -DgeneratePom=true -Dpackaging=jar

JCache provides an easy, map-like interface. For example, our CacheController looks like this

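import java.util.HashMap;
import java.util.Map;

import net.sf.jsr107cache.Cache;
import net.sf.jsr107cache.CacheException;
import net.sf.jsr107cache.CacheFactory;
import net.sf.jsr107cache.CacheManager;
import net.sf.jsr107cache.CacheStatistics;

import com.google.appengine.api.memcache.jsr107cache.GCacheFactory;

public class CacheController {

	static Cache cache;
	static {
		try {
			// Create the cache once, when the class is loaded
			CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
			Map<Object, Object> props = createPolicyMap();
			cache = cacheFactory.createCache(props);
		} catch (CacheException e) {
			// Note: cache stays null in this case, so later put/get calls will fail
			e.printStackTrace();
		}
	}

	private static Map<Object, Object> createPolicyMap() {
		Map<Object, Object> props = new HashMap<Object, Object>();
		// Entries expire 1800 seconds (30 minutes) after they are set
		props.put(GCacheFactory.EXPIRATION_DELTA, 1800);
		return props;
	}

	public static String fetchCacheStatistics() {
		CacheStatistics stats = cache.getCacheStatistics();
		int hits = stats.getCacheHits();
		int misses = stats.getCacheMisses();
		return "Cache Hits=" + hits + " : Cache Misses=" + misses;
	}

	public static void put(String key, Object value) {
		cache.put(key, value);
	}

	// Returns null on a cache miss
	public static Object get(String key) {
		return cache.get(key);
	}
}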

As you can see, we obtain the expensive CacheFactory once, in a static block, and keep a reference to the Cache, which we use throughout the application.
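One consequence of the static block is that if cache creation throws, the exception is only printed and the cache field stays null, so every later put or get fails with a NullPointerException. The following is a minimal sketch of a guard against that, not our production code. `SafeCacheController` is a hypothetical name, and a plain `ConcurrentHashMap` stands in for the JCache instance so that the sketch runs without the App Engine SDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical variant of CacheController that tolerates a missing cache. */
class SafeCacheController {
	// Created once when the class loads; null if creation failed
	private static final Map<String, Object> cache = createCache();

	private static Map<String, Object> createCache() {
		try {
			// Stand-in for cacheFactory.createCache(props) in the real controller
			return new ConcurrentHashMap<>();
		} catch (RuntimeException e) {
			// Without a cache, callers simply see misses and use the datastore
			return null;
		}
	}

	static void put(String key, Object value) {
		// ConcurrentHashMap rejects null values, so skip them
		if (cache != null && value != null) {
			cache.put(key, value);
		}
	}

	/** Returns the cached value, or null on a miss or when no cache exists. */
	static Object get(String key) {
		return cache == null ? null : cache.get(key);
	}
}
```

With this shape, a failed cache setup degrades to "always a miss" instead of breaking the services that call the controller.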

Once you have the CacheController, there are two ways to write your caching strategy.

  1. Invasive – the services are aware of the caching service
  2. Non-invasive – the services are not aware that a caching framework exists

In this post we look at the former, i.e. the invasive strategy.

I call it invasive because the application is well aware of the caching framework and uses it directly. I also call it invasive because my services do more than they are supposed to: apart from their own logic, they also take care of caching, which violates the Single Responsibility Principle (SRP) and tangles a cross-cutting concern with the application logic.

Nevertheless, it is simple. Let us look at an example in which we pull the projects assigned to a user, knowing that these are not going to change within the GCacheFactory.EXPIRATION_DELTA of 1800 seconds that we have configured the cache with.

@SuppressWarnings("unchecked")
public List<ProjectAssignment> getProjectAssignmentsForUser(User user, DateRange dateRange) {
	// The cache key is the user's encoded key only; dateRange is not part of it
	String cacheKey = user.getEncodedKey();
	List<ProjectAssignment> validAssignments = (List<ProjectAssignment>) CacheController.get(cacheKey);
	if (validAssignments == null) {
		// Cache miss: load from the datastore and cache the result
		validAssignments = projectAssignmentDAO.findProjectAssignmentsForUser(user, dateRange);
		CacheController.put(cacheKey, validAssignments);
	}
	return validAssignments;
}

Here the service method does the following:

  1. Checks the cache to see if the data exists
  2. If it exists, returns the cached data; otherwise
  3. Fetches the data from the datastore and
  4. Puts it in the cache, so that next time it can be served from the cache.
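The steps above can be sketched as one reusable helper. This is a sketch under assumptions rather than the code we run: `CacheAside` and `getOrLoad` are hypothetical names, an in-memory map stands in for Memcache, and the loader stands in for the DAO call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper that wraps the check, load and store steps. */
class CacheAside {
	// In-memory stand-in for the Memcache-backed cache
	private final Map<String, Object> cache = new ConcurrentHashMap<>();

	/** Returns the cached value for key, loading and caching it on a miss. */
	@SuppressWarnings("unchecked")
	<T> T getOrLoad(String key, Supplier<T> loader) {
		// Steps 1 and 2: check the cache and return on a hit
		T cached = (T) cache.get(key);
		if (cached != null) {
			return cached;
		}
		// Step 3: fetch from the slow store (the datastore in our case)
		T loaded = loader.get();
		// Step 4: remember the value for the next caller
		if (loaded != null) {
			cache.put(key, loaded);
		}
		return loaded;
	}
}

class CacheAsideDemo {
	public static void main(String[] args) {
		CacheAside cacheAside = new CacheAside();
		int[] loads = {0};
		Supplier<String> loader = () -> {
			loads[0]++;
			return "projects-for-user42";
		};
		cacheAside.getOrLoad("user42", loader);
		cacheAside.getOrLoad("user42", loader);
		// The second call is served from the cache
		System.out.println("loader calls: " + loads[0]); // prints "loader calls: 1"
	}
}
```

The service still knows about the cache, so the approach remains invasive, but the four steps now live in one place.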

This is a standard way to make sure that your application uses the cache effectively and does not hit the datastore for the same keys again and again. The best candidates for caching are entities that you would otherwise fetch repeatedly, such as user preferences and session data.
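One detail of the example is worth noting: the cache key is derived from the user alone, so two calls with different date ranges would share the same cached list. That is fine as long as every caller asks for the same range. If not, the range has to become part of the key. A minimal sketch follows, assuming the range can be reduced to a start and an end date; `CacheKeys` and `assignmentsKey` are hypothetical names and not part of our code base.

```java
import java.time.LocalDate;

/** Hypothetical key builder that makes the date range part of the cache key. */
class CacheKeys {
	/** Builds a key such as "user42:2010-09-01:2010-09-30". */
	static String assignmentsKey(String userKey, LocalDate from, LocalDate to) {
		// LocalDate.toString() yields ISO-8601 dates, e.g. 2010-09-01
		return userKey + ":" + from + ":" + to;
	}
}
```

A call such as `CacheKeys.assignmentsKey(user.getEncodedKey(), from, to)` would then replace the bare `user.getEncodedKey()` in both the get and the put.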

In the next post, we will look at the non-invasive way of caching.

About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development. Knoldus does niche Reactive and Big Data product development on Scala, Spark and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). To know more, send a mail to hello@knoldus.com or visit www.knoldus.com