If you have been following the Google App Engine downtime notify group, you will have noticed that the datastore has been behaving erratically over the last 2 weeks. We had been thinking about using Memcache for caching for a while, and the last 2 weeks proved to be the perfect excuse. Another reason was a comment on our blog by Gianni Mariani, which confirmed that Memcache is going to be significantly faster than the datastore.
Datastore transactions are generally much more expensive than Memcache operations and are a significant source of contention in heavily used apps.
Before we get to the actual implementation, there are some interesting facts about GAE caching that you should know.
- Values can expire from Memcache at any time, possibly before the expiration deadline set for the value. The App Engine platform manages expiration, so we should be prepared to lose a value before the expiration that we defined.
- The reason is that, although Memcache is segmented by application so that every app is guaranteed a fair share of Memcache space, it is still shared with a lot of other applications. If other apps are Memcache hungry, memory pressure will eventually push your inactive values out of the cache even before they expire.
- The Memcache API respects namespaces, so it is easy to use in a multi-tenant application like ours.
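To make the first two points concrete, here is a minimal sketch of why a lookup has to treat a miss as a normal outcome. It does not use the App Engine Memcache API at all: `TinyLruCache` and `EvictionDemo` are hypothetical names, and a small bounded `LinkedHashMap` stands in for a cache under memory pressure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a shared cache: bounded size, LRU eviction. */
class TinyLruCache<K, V> {
    private final Map<K, V> entries;

    TinyLruCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first in line for eviction.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // simulate eviction under memory pressure
            }
        };
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    /** Returns null on a miss, which is also what the caller sees for an evicted key. */
    V get(K key) {
        return entries.get(key);
    }
}

class EvictionDemo {
    /** Treat a miss as normal and fall back to the source of truth (faked here). */
    static String lookup(TinyLruCache<String, String> cache, String key) {
        String cached = cache.get(key);
        return cached != null ? cached : "reloaded:" + key;
    }

    public static void main(String[] args) {
        TinyLruCache<String, String> cache = new TinyLruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // pushes out "a", the least recently used entry
        System.out.println(lookup(cache, "a")); // falls back, "a" is gone
        System.out.println(lookup(cache, "c")); // still cached
    }
}
```

The value for "a" disappears even though nothing ever asked for it to be removed, which is the situation the platform documentation warns about. Code that reads from the cache should therefore always have a fallback path.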
GAE supports JCache, a proposed interface standard for memory caches described by JSR 107, as an interface to the App Engine Memcache. App Engine provides this interface using the net.sf.jsr107cache interface package. One thing to keep in mind is that you need to include appengine-jsr107cache-1.3.7.jar in your WAR. If you are using Maven, the dependency is
[sourcecode language="xml"]
<dependency>
<groupId>com.google.appengine</groupId>
<artifactId>appengine-jsr107cache</artifactId>
<version>${gae.version}</version>
</dependency>
[/sourcecode]
and to install it into your local repository (for Nexus or another remote repository you would use deploy:deploy-file instead) use
[sourcecode language="bash"]
gaev=1.3.7
mvn install:install-file -Dfile=$GAE_SDK_PATH/lib/user/appengine-jsr107cache-$gaev.jar -DgroupId=com.google.appengine -DartifactId=appengine-jsr107cache -Dversion=$gaev -DgeneratePom=true -Dpackaging=jar
[/sourcecode]
JCache provides a simple map-like interface. For example, our CacheController looks like this
[sourcecode language="java"]
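import java.util.HashMap;
import java.util.Map;

import net.sf.jsr107cache.Cache;
import net.sf.jsr107cache.CacheException;
import net.sf.jsr107cache.CacheFactory;
import net.sf.jsr107cache.CacheManager;
import net.sf.jsr107cache.CacheStatistics;

import com.google.appengine.api.memcache.jsr107cache.GCacheFactory;

public class CacheController {

    // Shared cache handle; stays null if creation fails in the static block below.
    static Cache cache;

    static {
        try {
            CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
            cache = cacheFactory.createCache(createPolicyMap());
        } catch (CacheException e) {
            e.printStackTrace();
        }
    }

    // Entries expire 1800 seconds (30 minutes) after they are written.
    private static Map<Object, Object> createPolicyMap() {
        Map<Object, Object> props = new HashMap<Object, Object>();
        props.put(GCacheFactory.EXPIRATION_DELTA, 1800);
        return props;
    }

    public static String fetchCacheStatistics() {
        CacheStatistics stats = cache.getCacheStatistics();
        int hits = stats.getCacheHits();
        int misses = stats.getCacheMisses();
        return "Cache Hits=" + hits + " : Cache Misses=" + misses;
    }

    // Stores a value; it may still be evicted before the expiration deadline.
    public static void put(String key, Object value) {
        cache.put(key, value);
    }

    // Returns the cached value, or null on a miss.
    public static Object get(String key) {
        return cache.get(key);
    }
}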
[/sourcecode]
As you can see, we create the expensive CacheFactory once, in a static block, and keep a reference to the Cache, which we use throughout the application.
Once you have the CacheController, there are 2 ways to write your caching strategy.
- Invasive: your services are aware of the caching service
- Non-invasive: the services are not aware that a caching framework exists
In this post we will look at the former, i.e. the invasive strategy.
I call it invasive because the application is well aware of the caching framework and uses it directly. I also call it invasive because my services do more than they are supposed to: apart from their own logic, they also take care of caching, which violates the Single Responsibility Principle (SRP) and tangles a cross-cutting concern with the application logic.
Nevertheless, it is simple. Let us look at an example in which we pull the projects assigned to a user, knowing that these are not going to change within the GCacheFactory.EXPIRATION_DELTA of 1800 seconds (30 minutes) that we configured the cache with.
[sourcecode language="java"]
@SuppressWarnings("unchecked")
public List<ProjectAssignment> getProjectAssignmentsForUser(User user, DateRange dateRange) {
    // The cache key is the user's encoded key only; the date range is not part of it.
    String cacheKey = user.getEncodedKey();
    List<ProjectAssignment> assignments = (List<ProjectAssignment>) CacheController.get(cacheKey);
    if (assignments == null) {
        // Cache miss (never cached, expired, or evicted): load from the datastore and cache the result.
        assignments = projectAssignmentDAO.findProjectAssignmentsForUser(user, dateRange);
        CacheController.put(cacheKey, assignments);
    }
    return assignments;
}
[/sourcecode]
Here the service method does the following:
- Checks the cache to see if the data exists.
- If it exists, returns the cached data.
- Otherwise, fetches the data from the datastore and
- sets it in the cache, so that next time it can be served from the cache.
This is one standard way to make sure that your application uses the cache effectively and does not hit the datastore for the same keys again and again. The best candidates for caching are the entities that you would otherwise pull again and again: user preferences, session data, etc.
In the next post, we will look at the non-invasive way of caching.
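If several service methods repeat these steps, they can be folded into one helper. The following is only a sketch under assumptions: it needs Java 8 or later for lambdas, `ReadThroughCache`, `getOrLoad` and `findAssignments` are hypothetical names, and a plain `ConcurrentHashMap` stands in for the real cache so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper that keeps the check / load / store steps in one place. */
class ReadThroughCache {
    // Plain map standing in for the real cache (CacheController in this post).
    private final Map<String, Object> store = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T getOrLoad(String key, Supplier<T> loader) {
        Object cached = store.get(key);
        if (cached != null) {
            return (T) cached;      // hit: skip the expensive load
        }
        T fresh = loader.get();     // miss: go to the source of truth
        if (fresh != null) {
            store.put(key, fresh);  // remember it for the next caller
        }
        return fresh;
    }
}

class ReadThroughDemo {
    static int datastoreCalls = 0;

    // Pretend datastore query; the counter shows how often it really runs.
    static List<String> findAssignments(String userKey) {
        datastoreCalls++;
        return Arrays.asList(userKey + "-projectA", userKey + "-projectB");
    }

    public static void main(String[] args) {
        ReadThroughCache cache = new ReadThroughCache();
        List<String> first = cache.getOrLoad("user42", () -> findAssignments("user42"));
        List<String> second = cache.getOrLoad("user42", () -> findAssignments("user42"));
        System.out.println(first.equals(second));
        System.out.println("datastore calls: " + datastoreCalls);
    }
}
```

The second call is answered from the map, so the pretend datastore is queried only once. The service is still aware of the cache, so this remains the invasive style; it just avoids copying the same if/else into every method.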