We were in process of porting an application to GAE. The application used JPA for persistence and we decided to use the same for GAE as well. After deploying, the application worked fine on Google App Engine. Over a period of time we started getting errors, which seemed strange. All our tests were running fine and application worked without any issues with our seed data. Where was the problem!

The application had some IN queries in our JPA code which failed after the data changed on GAE production environment.

We have a JPA Entity TimesheetEntry and a Data Access Object JpaDetailedReportDao which fetches TimesheetEntry based on ProjectAssignment Keys. We used JPA IN Query to extract timesheet entries. Let’s look at the code listing.

public class TimesheetEntry {
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private Key timesheetEntryKey;

	private Key projectAssignmentKey;

	private Float hours;

	private Date entryDate;
	. . .

TimesheetEntry has an unowned relationship with ProjectAssignment. It stores ProjectAssignment’s Key as an attribute. Now we have a JpaReportDao which fetches all TimesheetEnry entities for a bunch of ProjectAssignment Keys. Let’s look at the code listing.

public class JpaDetailedReportDao {
   . . .
   public List<TimesheetEntry> getTimesheetEntriesForProjectAssignments(List<Key> projectAssignmentKeys) {
   String projectAssignmentInClause = createInClause(projectAssignmentKeys);
   List<TimesheetEntry> entries = entityManager.createQuery("SELECT FROM TimesheetEntry WHERE projectAssignmentKey IN (" + projectAssignmentInClause + ")").getResultList();
   return entries;
   . . .

createInClause() method constructs a string which contains comma separated ProjectAssignment Keys. You can read about correct string representation of Key to generate projectAssignmentInClause.

There were no error in above implementation, our tests ran fine. The application worked after deployment on GAE for some time. The only thing which changed was the data and that was the problem.

Datastore does not natively support IN queries. The Jpa query is broken down in underlying queries, whose results are then combined in memory. If N is the number of values in IN clause, then there will be N underlying queries. If we have a Query like this

SELECT FROM TimesheetEntry WHERE timesheetEntryKey IN ("key1", "key2", "key3")

Then the underlying queries will be

SELECT FROM TimesheetEntry WHERE timesheetEntryKey = "key1"
SELECT FROM TimesheetEntry WHERE timesheetEntryKey = "key2"
SELECT FROM TimesheetEntry WHERE timesheetEntryKey = "key3"

There is a limit that there cannot be more than 30 underlying queries for a JPA IN clause. In our case when production data increased over time it exceeded that limit and we started getting exceptions.

When issuing a Jpa IN query we have to be aware of number of underlying queries it will fire. It is equal to number of values in IN clause. If we are not sure of the number of values in IN clause then they should be avoided.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.