Let Us Grid Compute


Since early times, oxen have been used for heavy pulling. Sometimes a log was so huge that a single ox could not pull it. The smart people of those times did not try to grow a bigger ox. Instead, they used two or three together.

Simple, isn’t it?

The same concept lies behind linking multiple commodity machines together to provide massive processing capability, instead of using a big, heavy and costly supercomputer.

Grid computing, as the name suggests, is a form of parallel computing that relies on complete computers (each with its own CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) through a conventional network interface such as Ethernet. The advantages are manifold. Each node is commodity hardware that does not cost much. Nodes can be distributed across the globe. Scaling is easy: just add a few more nodes. Redundancy can be high. Today, with the advent of cloud-based computing, you do not even need to worry about the boxes. Thanks to utility computing, all that happens behind the scenes for you.

One of the early movers in grid computing is GridGain. It has since graduated to a cloud offering that abstracts away the logistics of setting up a grid computing cloud. The 3.0 EA version should be interesting for those who want an easy taste of cloud computing.

Grid Computing = Compute Grids (GridGain) + Data Grids (such as GigaSpaces, Oracle Coherence, etc.)

Compute grids provide parallel execution, whereas data grids provide parallel storage.

In this post we will set up a few GridGain nodes on the same machine and distribute our jobs to them. GridGain uses the MapReduce concept for task execution.

picture courtesy: GridGain

The steps for any task are:

  1. Task execution request
  2. Task splits into jobs
  3. Result of job execution
  4. Aggregation of job results
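The four steps above can be sketched in plain Java, with a local thread pool standing in for the grid nodes. This is only an illustration of the split, execute and reduce flow and does not use the GridGain API; the class `SplitReduceSketch`, its methods and the sample records are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a thread pool plays the role of the grid nodes.
final class SplitReduceSketch {

    // Step 2: split the task into one job per non-blank record.
    static List<Callable<Integer>> split(List<String> records) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String record : records) {
            if (record.trim().isEmpty()) {
                continue; // skip blank lines in the feed
            }
            // A real job would parse the record and write it to the database.
            // Here each job just reports that it handled one record.
            jobs.add(() -> 1);
        }
        return jobs;
    }

    // Step 4: aggregate the individual job results into one value.
    static int reduce(List<Future<Integer>> results) {
        int total = 0;
        try {
            for (Future<Integer> result : results) {
                total += result.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("A job failed", e);
        }
        return total;
    }

    // Steps 1 and 3: accept the task, run its jobs in parallel, collect results.
    static int execute(List<String> records, int nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            return reduce(pool.invokeAll(split(records)));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> feed = Arrays.asList("ACME,100", "Globex,250", "", "Initech,75");
        System.out.println("Companies processed: " + execute(feed, 3));
    }
}
```

On a real grid the jobs would travel to other JVMs, so they and their arguments would also need to be serializable; the thread pool hides that detail.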

Now let us quickly dive into a simple scenario of computing on multiple nodes.

Assume that we get a bulky feed file from a mainframe system for processing. The file is delivered at regular intervals, and the current mechanism takes a long time to work through it and write the entries to the database.

Let us see what the client looks like:

[sourcecode language="java"]

public static void main(String[] args) throws GridException {

    GridComputeClient computeClient = new GridComputeClient();

    // Start the grid with defaults, or with the value passed as the first argument.
    if (args.length == 0) {
        GridFactory.start();
    } else {
        GridFactory.start(args[0]);
    }
    computeClient.process();
}

private void process() {
    try {
        // This call is the one we want to run on the grid.
        int numberOfCompaniesProcessed = processFeedFile("/feed.txt");

        logger.info(">>>");
        logger.info(">>> Finished executing Gridify with the company records.");
        logger.info(">>> Total number of companies processed is '" + numberOfCompaniesProcessed + "'.");

    } finally {
        // Always stop the grid, even if processing fails.
        GridFactory.stop(true);
    }
}

[/sourcecode]

Note the call to processFeedFile("/feed.txt"). This is the method we want to gridify. Let us look at it now:

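[sourcecode language="java"]

@Gridify(taskClass = GridSplitAndReduceTask.class, timeout = 3000)
public static int processFeedFile(String fileLocation) {
    // Placeholder body: in this example the work is done by the jobs that
    // GridSplitAndReduceTask creates, and the caller receives the reduced result.
    return 0;
}

[/sourcecode]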

The task class, in this case GridSplitAndReduceTask, is responsible for splitting method execution into sub-jobs.

The @Gridify annotation uses AOP (aspect-oriented programming) to automatically “gridify” the method. This registers the method with the job scheduling system. When the application comes up and triggers execution, the method is scheduled through that system and allocated to nodes.
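To make the interception idea concrete, here is a minimal sketch that uses a JDK dynamic proxy. It is not how GridGain implements @Gridify; it only shows how a call to an annotated method can be rerouted before the method body ever runs. The names `InterceptSketch`, `Distributed`, `FeedProcessor`, `LocalFeedProcessor` and the fixed `GRID_RESULT` are all assumptions made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class InterceptSketch {

    // Hypothetical marker annotation; this is NOT GridGain's @Gridify.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Distributed {}

    interface FeedProcessor {
        @Distributed
        int processFeedFile(String fileLocation);

        String describe();
    }

    // The local implementation; its annotated method has a placeholder body.
    static final class LocalFeedProcessor implements FeedProcessor {
        public int processFeedFile(String fileLocation) { return 0; }
        public String describe() { return "local"; }
    }

    // Stand-in for the value a real grid would return after reduce.
    static final int GRID_RESULT = 3;

    // Names of the methods whose calls were rerouted by the proxy.
    static final List<String> rerouted = new ArrayList<>();

    static FeedProcessor intercept(FeedProcessor target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isAnnotationPresent(Distributed.class)) {
                // Annotated call: skip the local body and hand over to the "grid".
                rerouted.add(method.getName());
                return GRID_RESULT;
            }
            // Everything else runs locally as usual.
            return method.invoke(target, args);
        };
        return (FeedProcessor) Proxy.newProxyInstance(
                FeedProcessor.class.getClassLoader(),
                new Class<?>[] {FeedProcessor.class},
                handler);
    }

    public static void main(String[] args) {
        FeedProcessor processor = intercept(new LocalFeedProcessor());
        System.out.println(processor.processFeedFile("/feed.txt"));
        System.out.println(processor.describe());
    }
}
```

This also explains why a gridified method can get away with a body that simply returns 0: once the call is intercepted, the value seen by the caller comes from whatever the interceptor returns.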

Let us look at what the task looks like.

[sourcecode language="java"]

/**
 * @author <a href="mailto:info@inphina.com">Inphina Technologies</a>
 *
 */
@SuppressWarnings("serial")
public class GridSplitAndReduceTask extends GridifyTaskSplitAdapter<Integer> {

[/sourcecode]

Each task has to implement two methods, split and reduce, again following the MapReduce concept. The main function of split is to create multiple jobs out of the main task. The jobs are then distributed across the nodes for processing. Once processing is done, all the job results are passed to the reduce method, which can combine them and pass the final result back to the client.

[sourcecode language="java"]

/**
 * {@inheritDoc}
 */
@Override
protected Collection<? extends GridJob> split(int gridSize, GridifyArgument gridArguments) throws GridException {

    // The first parameter of the gridified method is the feed file location.
    Object[] feedFiles = gridArguments.getMethodParameters();
    // Helper (not shown here) that reads the feed file into individual records.
    ArrayList<String> records = splitFileIntoIndividualRecords((String) feedFiles[0]);

    List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(records.size());

    // Helper (not shown here) that fills the list with jobs built from the records.
    processJobs(records, jobs);

    return jobs;
}

[/sourcecode]

[sourcecode language="java"]

/**
 * {@inheritDoc}
 */
public Integer reduce(List<GridJobResult> results) throws GridException {
    int totalRecordsProcessed = 0;
    for (GridJobResult res : results) {
        // Every job returns with a counter of 1 to show that it was processed.
        Integer counter = res.getData();
        totalRecordsProcessed += counter;
    }
    return totalRecordsProcessed;
}
[/sourcecode]

As you can see, it is really simple to make your method and its processing grid-enabled. The main requirement is to create the jobs according to your logic. Once that is done, GridGain takes them to the various nodes and executes them.

Other advantages of GridGain are:

  • Failover support
  • Transparent redeployment of code changes to all nodes
  • Integration with multiple data grid systems
  • Presence on the cloud.
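As a rough illustration of what failover means for a single job, the sketch below tries a job on one node after another until one of them succeeds. It is a simplified model, not GridGain's failover mechanism: the class `FailoverSketch` is hypothetical and a "node" is modelled as a plain function that may throw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a node is just a function that may fail with an exception.
final class FailoverSketch {

    // Runs the job on the first node that completes without throwing.
    static <T, R> R executeWithFailover(T job, List<Function<T, R>> nodes) {
        RuntimeException lastFailure = null;
        for (Function<T, R> node : nodes) {
            try {
                return node.apply(job);
            } catch (RuntimeException e) {
                lastFailure = e; // this node failed; fall through to the next one
            }
        }
        throw new IllegalStateException("All nodes failed for job: " + job, lastFailure);
    }

    public static void main(String[] args) {
        Function<String, Integer> brokenNode = record -> {
            throw new IllegalStateException("node is down");
        };
        Function<String, Integer> healthyNode = record -> 1;

        // The first node fails, so the job is retried on the second one.
        int processed = executeWithFailover("ACME,100", Arrays.asList(brokenNode, healthyNode));
        System.out.println("Records processed: " + processed);
    }
}
```

The caller never sees the failure of the first node; it only sees an error if every node in the list fails.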

Download the source code of this post.

Written by 

Vikas is the CEO and Co-Founder of Knoldus Inc. Knoldus does niche Reactive and Big Data product development on Scala, Spark, and Functional Java. Knoldus has a strong focus on software craftsmanship which ensures high-quality software development. It partners with the best in the industry like Lightbend (Scala Ecosystem), Databricks (Spark Ecosystem), Confluent (Kafka) and Datastax (Cassandra). Vikas has been working in the cutting edge tech industry for 20+ years. He was an ardent fan of Java with multiple high load enterprise systems to boast of till he met Scala. His current passions include utilizing the power of Scala, Akka and Play to make Reactive and Big Data systems for niche startups and enterprises who would like to change the way software is developed. To know more, send a mail to hello@knoldus.com or visit www.knoldus.com
