BigData Specifications – Part 1: Configuring MySQL Metastore in Apache Hive

Reading Time: 2 minutes

Apache Hive is used as a data warehouse on top of Hadoop, giving users a way to load, analyze and query data from various sources. The data is stored in databases or file systems such as HDFS (Hadoop Distributed File System). Queries are written in HiveQL, and Hive can run them on execution engines such as Spark. Hive keeps its metadata in the metastore, which contains the following information: IDs of tables, IDs …
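To make the metastore setup a little more concrete, here is a minimal sketch, in Scala, that renders the JDBC-related properties Hive reads from `hive-site.xml` when its metastore lives in MySQL. The property names (`javax.jdo.option.ConnectionURL`, `ConnectionDriverName`, `ConnectionUserName`, `ConnectionPassword`) are the standard Hive ones; the host, database name, user and password are hypothetical placeholders, and the driver class assumes MySQL Connector/J 8.x (older 5.x drivers use `com.mysql.jdbc.Driver`).

```scala
// Minimal sketch: render hive-site.xml properties that point the Hive
// metastore at a MySQL database. Host, database, user and password passed
// to mysqlMetastoreProps are hypothetical placeholders.
object MetastoreConfig {

  // Standard Hive/DataNucleus property names for the metastore's JDBC connection.
  // The driver class assumes MySQL Connector/J 8.x.
  def mysqlMetastoreProps(host: String, db: String, user: String, password: String): Seq[(String, String)] =
    Seq(
      "javax.jdo.option.ConnectionURL" -> s"jdbc:mysql://$host:3306/$db?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionDriverName" -> "com.mysql.cj.jdbc.Driver",
      "javax.jdo.option.ConnectionUserName" -> user,
      "javax.jdo.option.ConnectionPassword" -> password
    )

  // Escape the characters that may not appear verbatim in XML text.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Render the properties as a <configuration> element in hive-site.xml layout.
  def toHiveSiteXml(props: Seq[(String, String)]): String = {
    val body = props.map { case (name, value) =>
      s"  <property>\n    <name>${escape(name)}</name>\n    <value>${escape(value)}</value>\n  </property>"
    }.mkString("\n")
    s"<configuration>\n$body\n</configuration>"
  }

  def main(args: Array[String]): Unit =
    println(toHiveSiteXml(mysqlMetastoreProps("localhost", "metastore", "hiveuser", "hivepassword")))
}
```

The generated block would go into `hive-site.xml`; the MySQL JDBC driver jar also has to be on Hive's classpath, and the metastore schema is typically initialized with `schematool -dbType mysql -initSchema`.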

Effective Programming in Scala – Part 3: Powering Up your code implicitly in Scala

Reading Time: 5 minutes

Hi folks! In this series we talk about concepts that make code written in Scala better defined, so that our methods perform their tasks in a better way. Let's have a look at what we have covered in the series so far: Effective Programming in Scala – Part 1: Standardizing code in better way …
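Since this part is about powering up code implicitly, here is a small sketch of two common uses of Scala 2 implicits: an implicit parameter that the compiler fills in from scope, and an implicit class that adds an extension method to an existing type. The names `Currency`, `formatPrice` and `RichInt` are hypothetical and only serve the illustration; they are not taken from the article.

```scala
// Minimal sketch of two common uses of Scala implicits.
// Currency, formatPrice and RichInt are hypothetical names for illustration.
object ImplicitsDemo {

  final case class Currency(symbol: String)

  // Implicit parameter: if the caller has an implicit Currency in scope,
  // the compiler passes it automatically.
  def formatPrice(amount: BigDecimal)(implicit currency: Currency): String =
    s"${currency.symbol}${amount.setScale(2, BigDecimal.RoundingMode.HALF_UP)}"

  // Implicit class (a value class here): adds an extension method to Int.
  implicit class RichInt(val n: Int) extends AnyVal {
    def percentOf(total: Int): Double = n * 100.0 / total
  }

  def main(args: Array[String]): Unit = {
    implicit val usd: Currency = Currency("$")
    println(formatPrice(BigDecimal(19.5))) // `usd` is supplied implicitly
    println(25.percentOf(200))             // RichInt wraps 25 behind the scenes
  }
}
```

Running `main` prints `$19.50` and `12.5`. Extending `AnyVal` lets the compiler avoid allocating a wrapper object in most cases.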

Intercepting Nutch Crawl Flow with a Scala Plugin

Reading Time: 4 minutes

Apache Nutch is an open-source web search project. One interesting thing it can be used for is as a web crawler. What makes Nutch especially interesting is that it provides several extension points through which we can plug in our own custom functionality. Some of the existing extension points can be found here. It supports a plugin system similar to the one used in Eclipse.
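The extension-point idea can be sketched in plain Scala. This is deliberately not Nutch's API: in real Nutch, plugins implement Java interfaces such as `HtmlParseFilter` or `IndexingFilter` and are declared in a `plugin.xml` descriptor. Here `CrawledPage`, `ParseFilter` and `FilterChain` are hypothetical stand-ins that show how a host can pass each page through a chain of pluggable filters, letting custom code intercept the crawl flow.

```scala
// Minimal sketch of the extension-point pattern. NOT Nutch's API:
// CrawledPage, ParseFilter, FilterChain and both filters are hypothetical.
object ExtensionPointSketch {

  final case class CrawledPage(url: String, content: String, metadata: Map[String, String])

  // The "extension point": a contract that plugins implement.
  trait ParseFilter {
    def filter(page: CrawledPage): CrawledPage
  }

  // A custom plugin that records the content length in the page metadata.
  object ContentLengthFilter extends ParseFilter {
    def filter(page: CrawledPage): CrawledPage =
      page.copy(metadata = page.metadata + ("content-length" -> page.content.length.toString))
  }

  // Another plugin that tags pages fetched over HTTPS.
  object HttpsTagFilter extends ParseFilter {
    def filter(page: CrawledPage): CrawledPage =
      if (page.url.startsWith("https://")) page.copy(metadata = page.metadata + ("secure" -> "true"))
      else page
  }

  // The host runs the registered plugins in order, feeding each output to the next.
  final class FilterChain(filters: Seq[ParseFilter]) {
    def run(page: CrawledPage): CrawledPage =
      filters.foldLeft(page)((current, f) => f.filter(current))
  }

  def main(args: Array[String]): Unit = {
    val chain = new FilterChain(Seq(ContentLengthFilter, HttpsTagFilter))
    println(chain.run(CrawledPage("https://example.com", "<html></html>", Map.empty)).metadata)
  }
}
```

Because the host only depends on the `ParseFilter` trait, new behavior can be added by registering another implementation without touching the crawl loop itself, which is the property Nutch's extension points provide.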