Let Us Grid Compute


Since early times, oxen have been used for heavy pulling. Sometimes a log was so huge that a single ox could not pull it. The smart people of those times did not try to breed a bigger ox; instead, they yoked two or three together.

Simple, isn’t it?

The same concept lies behind linking multiple pieces of commodity hardware together to provide super processing capabilities, as compared to using a big, heavy, and costly supercomputer.

Grid computing, as the name suggests, is a special type of parallel computing that relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public, or the Internet) by a conventional network interface such as Ethernet. The advantages are manifold: each node is inexpensive commodity hardware, nodes can be distributed across the globe, scaling is as easy as adding a few more machines, and redundancy can be high. Today, with the advent of cloud-based computing, you do not even need to worry about the boxes; thanks to utility computing, all of that happens behind the scenes for you.

One of the early movers in the area of grid computing is GridGain. It has since graduated to a cloud offering that abstracts us from the underlying logistics of setting up a grid computing cloud. The 3.0 EA version should be interesting for those who want a taste of cloud computing with ease.

Grid Computing = Compute Grids (GridGain) + Data Grids (like Gigaspaces, Oracle Coherence etc)

Compute grids provide parallel execution, whereas data grids provide parallel storage.

In this post we will set up a few GridGain nodes on the same machine and distribute our jobs to them. GridGain uses the MapReduce concept for task execution.

[Image: GridGain MapReduce task execution flow. Picture courtesy: GridGain]

The steps for any task are:

  1. Task execution request
  2. Task splits into jobs
  3. Result of job execution
  4. Aggregation of job results
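The four steps above can be sketched in plain Java. This is a hypothetical single-JVM illustration only; the names `TaskFlowSketch`, `splitIntoJobs`, `executeJob`, and `aggregate` are my own, not GridGain APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-JVM illustration of the split/execute/aggregate flow.
// No GridGain classes are used here.
public class TaskFlowSketch {

    // Step 2: split the task (a list of records) into one job per record.
    static List<String> splitIntoJobs(List<String> records) {
        return new ArrayList<>(records);
    }

    // Step 3: each job "executes" and returns a counter of 1.
    static int executeJob(String job) {
        return 1;
    }

    // Step 4: aggregate the individual job results into one answer.
    static int aggregate(List<Integer> results) {
        int total = 0;
        for (int r : results) {
            total += r;
        }
        return total;
    }

    public static void main(String[] args) {
        // Step 1: the task execution request arrives with some records.
        List<String> records = List.of("ACME", "Globex", "Initech");
        List<Integer> results = new ArrayList<>();
        for (String job : splitIntoJobs(records)) {
            // On a real grid, each job would run on a different node.
            results.add(executeJob(job));
        }
        System.out.println(aggregate(results)); // prints 3
    }
}
```

On a real grid, step 3 is the only part that leaves the local JVM; the split and aggregate steps run where the task was submitted.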

Now let us quickly dive into a simple scenario of computing on multiple nodes.

Assume that we get a bulky feed file from a mainframe system for processing. This file is delivered at regular intervals, and the current mechanism takes quite long to work through it and then make entries in the database.

Let us see what the client looks like:


public class GridComputeClient {

    // Logger setup is not shown in the original post.
    private static final Logger logger = Logger.getLogger(GridComputeClient.class.getName());

    public static void main(String[] args) throws GridException {
        GridComputeClient computeClient = new GridComputeClient();

        // Start the local grid node, optionally with a custom configuration file.
        if (args.length == 0) {
            GridFactory.start();
        } else {
            GridFactory.start(args[0]);
        }
        computeClient.process();
    }

    private void process() {
        try {
            int numberOfCompaniesProcessed = processFeedFile("/feed.txt");

            logger.info(">>>");
            logger.info(">>> Finished executing Gridify with the company records.");
            logger.info(">>> Total number of companies processed is '" + numberOfCompaniesProcessed + "'.");
        } finally {
            // Stop the grid node, cancelling any remaining jobs.
            GridFactory.stop(true);
        }
    }
}

As you will note, we have a processFeedFile("/feed.txt") method. We want this method to be gridified. Let us look at the method now:


@Gridify(taskClass = GridSplitAndReduceTask.class, timeout = 3000)
public static int processFeedFile(String fileLocation) {
    // The body is only a placeholder; the @Gridify aspect intercepts the call
    // and delegates the real work to the task class.
    return 0;
}

The task class, in this case GridSplitAndReduceTask, is responsible for splitting method execution into sub-jobs.

The @Gridify annotation uses AOP (aspect-oriented programming) to automatically "gridify" the method, registering it with the job scheduling system. When the application comes up and triggers execution, the method is scheduled through the job scheduling system and allocated to nodes.

Let us look at what the task looks like:


/**
 * @author <a href="mailto:info@inphina.com">Inphina Technologies</a>
 *
 */
@SuppressWarnings("serial")
public class GridSplitAndReduceTask extends GridifyTaskSplitAdapter<Integer> {

Each task has to implement two methods: split and reduce, again based on the MapReduce concept. The main function of split is to create multiple jobs out of the main task. Each job is then sent to a different node for processing. Once the processing is done, all the job results are passed to the reduce method, which can then work with the job results and pass the final result back to the client.


/**
 * {@inheritDoc}
 */
 @Override
 @Override
 protected Collection<? extends GridJob> split(int gridSize, GridifyArgument gridArguments) throws GridException {

     // The first method parameter of processFeedFile() is the feed file location.
     Object[] feedFiles = gridArguments.getMethodParameters();
     ArrayList<String> records = splitFileIntoIndividualRecords((String) feedFiles[0]);

     // Create one job per record; each job is shipped to a grid node.
     List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(records.size());

     processJobs(records, jobs);

     return jobs;
 }
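The helper methods splitFileIntoIndividualRecords and processJobs are not shown in the post. A minimal sketch of the splitting step could look like the following, assuming one company record per line (the actual feed format is an assumption, and `FeedSplitter`/`splitIntoRecords` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for splitFileIntoIndividualRecords. The real helper
// would read the feed from fileLocation; here we split raw feed content,
// assuming one company record per line.
public class FeedSplitter {

    public static ArrayList<String> splitIntoRecords(String feedContent) {
        ArrayList<String> records = new ArrayList<>();
        for (String line : feedContent.split("\n")) {
            // Skip blank lines so they do not become empty jobs.
            if (!line.isBlank()) {
                records.add(line.trim());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = splitIntoRecords("ACME,100\nGlobex,200\n\nInitech,300\n");
        System.out.println(records.size()); // prints 3
    }
}
```

Each returned record would then be wrapped in a job (for example, via GridJobAdapter) by the processJobs helper before being returned from split.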


/**
 * {@inheritDoc}
 */
 @Override
 public Integer reduce(List<GridJobResult> results) throws GridException {
     int totalRecordsProcessed = 0;
     for (GridJobResult res : results) {
         // Every job returns with a counter of 1 to show that it was processed.
         Integer counter = res.getData();
         totalRecordsProcessed += counter;
     }
     return totalRecordsProcessed;
 }
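Outside of GridGain, the reduce step is just a fold over the per-job counters. A plain-Java equivalent, for comparison (`ReduceSketch` is a hypothetical name):

```java
import java.util.List;

// Plain-Java equivalent of the reduce step: sum the per-job counters
// into the total number of records processed.
public class ReduceSketch {

    public static int reduce(List<Integer> jobResults) {
        return jobResults.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // Four jobs, each reporting a counter of 1.
        System.out.println(reduce(List.of(1, 1, 1, 1))); // prints 4
    }
}
```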

As you will have observed, it is really simple to make your method and its processing grid-enabled. The main work that remains is the creation of jobs according to your logic; once that is done, GridGain takes them to the various nodes and executes them.

Other advantages of GridGain include:

  • Failover support
  • Transparent redeployment of code changes to all nodes
  • Integration with multiple data grid systems
  • Presence on the cloud

Download the source code of this post.

About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development. Knoldus does niche Reactive and Big Data product development on Scala, Spark and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). To know more, send a mail to hello@knoldus.com or visit www.knoldus.com
This entry was posted in Architecture, Java.

