Map reduce with Akka and Scala: The famous word count

After working for around a decade with Java and its family, I recently tried my hand at Scala and Akka. Yes, changing taste buds is not at all easy, but working with Scala is fun! Before this, every time I started building some APIs, the first thing that came to my mind was:

The number of Java classes or beans. Honestly, a POJO is rarely more than a set of getters and setters. (Do I need them?)

Scala just made it easy. Let’s see how!

Java

Honestly, in previous Java blog posts I used to skip these setters and getters by simply writing

// getters and setters!

With Scala it’s really easy.
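As a minimal sketch of the idea (the `Person` class and its fields are hypothetical, chosen only for illustration and not taken from the original code), declaring constructor parameters as `var` is enough for the compiler to generate the accessor and mutator for each field:

```scala
// Hypothetical Person class for illustration (not the original post's code).
// Declaring constructor parameters as `var` makes the compiler generate
// both an accessor (p.age) and a mutator (p.age = ...) for each field.
class Person(var name: String, var age: Int)

object PersonDemo {
  // Builds a Person, updates one field through the generated setter,
  // and returns a short description.
  def describe(): String = {
    val p = new Person("Alice", 30)
    p.age = 31
    s"${p.name} is ${p.age}"
  }
}
```

Using `val` instead of `var` would generate only the accessor, which is the usual choice when the object should be immutable.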

By the way, you can also define explicit getter and setter methods for a field and still access it as a property.
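A small sketch of that property style, assuming a hypothetical `GuardedPerson` class with a private backing field (the validation rule is an illustrative assumption too). The setter has to be named `age_=` and needs a matching getter `age` in the same class for the `p.age = value` syntax to work:

```scala
// Hypothetical class for illustration: explicit accessors instead of a plain `var`.
class GuardedPerson {
  private var _age: Int = 0          // backing field, hidden from callers

  def age: Int = _age                // getter, called as p.age

  // Setter, called as p.age = value. Scala rewrites that assignment
  // into a call to the method named `age_=`.
  def age_=(value: Int): Unit = {
    require(value >= 0, "age must be non-negative") // illustrative validation
    _age = value
  }
}
```

Callers cannot tell the difference from a plain field, so validation like this can be added later without changing their code.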
There are a number of things that give Scala an advantage over Java (e.g. immutability, utility functions, etc.), but I will skip these for now and come back to the topic we will walk through together.

Map Reduce
Map reduce is a model for parallel processing of data distributed across multiple data nodes. It was originally published in a Google paper ("MapReduce: Simplified Data Processing on Large Clusters", 2004). Since Hadoop came into existence in the mid-2000s, map reduce has been widely used for large-scale data processing. More detail can be found in the Hadoop map reduce documentation.

Here we will take on the famous word count example, which reads words from a file and counts them in map reduce fashion.
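Before bringing Akka in, the word count itself can be sketched with plain Scala collections. This is only an illustration of the map and reduce steps, not the actor-based code of this post; the object and method names are made up, and splitting on non-word characters is an assumption about what counts as a word:

```scala
// Plain-Scala sketch of word count as map + reduce (no Akka involved).
object WordCountSketch {
  // Map step: turn one line into (word, 1) pairs.
  // Assumption: words are lower-cased and split on non-word characters.
  def mapLine(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).map(w => (w, 1)).toSeq

  // Reduce step: group the pairs by word and sum the ones.
  def reduce(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

  // Full pipeline over all lines of the input.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    reduce(lines.flatMap(mapLine))
}
```

For example, `WordCountSketch.wordCount(Seq("a b a", "b c"))` gives a map with `a -> 2`, `b -> 2` and `c -> 1`. The actor version spreads the same two steps across several actors.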

I assume the reader is familiar with Akka. In short, it is all about asynchronous, distributed message processing, and it can process millions of messages per second on a local box, which lets an application use the CPU and other resources to the fullest.

Let’s discuss a few of the components (or rather, Scala methods) used to define and process a file on a local box.

FileReceiver: An Akka actor that receives a file name as an input message to initiate the word count, and finally broadcasts a message on completion.

LineCollector: An Akka actor that receives the file information, opens a file channel to read it, and distributes the lines onward as chunk messages. (Local mapper)

LocalAggregator: An Akka actor that acts as a local chunk collector and performs the word count aggregation. (Local reducer)

CountAggregator: An Akka actor that acts as the global aggregator and publishes the final word count after the file has been read successfully.
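To make the division of work concrete, here is a rough plain-Scala sketch of the data each stage might hand to the next. It deliberately leaves Akka out so it runs on its own; the message types (`FileToCount`, `LineChunk`, `PartialCount`), the function names and the chunk size are all hypothetical and not taken from the original code. In the actor version, each function body would sit inside the corresponding actor's `receive` handler:

```scala
// Hypothetical message types, one per hand-off between the stages.
final case class FileToCount(path: String)               // sent to FileReceiver
final case class LineChunk(lines: Seq[String])           // LineCollector -> LocalAggregator
final case class PartialCount(counts: Map[String, Int])  // LocalAggregator -> CountAggregator

object PipelineSketch {
  // LineCollector role: cut the file's lines into fixed-size chunks.
  def toChunks(lines: Seq[String], chunkSize: Int): Seq[LineChunk] =
    lines.grouped(chunkSize).map(c => LineChunk(c)).toSeq

  // LocalAggregator role: count the words of a single chunk.
  def countChunk(chunk: LineChunk): PartialCount = {
    val words = chunk.lines
      .flatMap(line => line.toLowerCase.split("\\W+").toSeq)
      .filter(_.nonEmpty)
    PartialCount(words.groupBy(identity).map { case (w, ws) => (w, ws.size) })
  }

  // CountAggregator role: merge all partial counts into the final result.
  def merge(partials: Seq[PartialCount]): Map[String, Int] =
    partials.foldLeft(Map.empty[String, Int]) { (total, p) =>
      p.counts.foldLeft(total) { case (acc, (w, n)) =>
        acc.updated(w, acc.getOrElse(w, 0) + n)
      }
    }
}
```

One thing the sketch hides is completion tracking: with real actors, the global aggregator has to know how many chunks to expect before it can publish the final count.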

Finally, what we need is an App to run this. Here it is!

That’s it! We can write all this in a single Scala file. What about Java?
Also, try this out on your end; it is incredibly fast on my local box (an 8 GB dual-core Dell laptop)!

Oh yes! I forgot to mention the configuration.

Load your Akka configuration file and get the ActorSystem up and running. The configuration looks very simple!

You just need to configure the remote actor provider, the host, and the port, and that’s it.
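For reference, a configuration along those lines might look like the fragment below. This is a sketch that assumes classic Akka remoting over Netty TCP, as used in the Akka 2.2 to 2.5 releases; the hostname and port are placeholder values, not the settings of the original project, and newer Akka versions use different keys:

```
akka {
  actor {
    # Classic remoting provider (assumed Akka version; see the note above)
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"   # placeholder host
      port = 2552              # placeholder port
    }
  }
}
```

Placed in `application.conf` on the classpath, a fragment like this is picked up by `ConfigFactory.load()`, which is also what `ActorSystem` uses by default.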

Have fun and happy programming.

References: http://dustinmartin.net/getters-and-setters-in-scala/

3 thoughts on “Map reduce with Akka and Scala: The famous word count”

  1. IMHO it’d be better not to simply copy-paste stuff from places and compile into one.
    And your first piece of Scala code doesn’t compile:
    class Person
    {
    var age:Int = _
    def age = _age
    def age_=(age: Int) = _age = age
    }

    At least add some references instead of just plagiarizing.

  2. Hi,
    I was able to set up the project and build it without any errors, but when I run the program it hangs after it has read all the chunks. The program control does not go into CountAggregator! How should I make this work?
