programming

Simple Java program to Append to a file in HDFS

Reading Time: 2 minutes In this blog, I will present a Java program to append to a file in HDFS, using Maven as the build tool. To start, we need to add the Maven dependencies in pom.xml. Continue Reading

R-The Statistical Programming Language

Reading Time: 5 minutes R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 90s. It is one of the most popular languages used by statisticians, data analysts, researchers, and marketers to retrieve, clean, analyze, visualize and present data. It is open source and free. It supports cross-platform interoperability, i.e., R code written on one platform can easily be ported Continue Reading

BigData Specifications – Part 1 : Configuring MySQL Metastore in Apache Hive

Reading Time: 2 minutes Apache Hive is used as a data warehouse over Hadoop to provide users a way to load, analyze and query data from various resources. Data is stored in databases or file systems like HDFS (Hadoop Distributed File System). Hive can use Spark SQL or HiveQL for the implementation of queries. Hive uses its metastore, which contains the following information: IDs of tables, IDs Continue Reading

Effective Programming In Scala – Part 3 : Powering Up your code implicitly in Scala

Reading Time: 5 minutes Hi Folks, In this series we talk about the concepts that provide a better definition to the code written in Scala. We provide the methods with some definitions that help them perform a task in a better way. Let's have a look at what we have done in the series so far: Effective Programming in Scala – Part 1 : Standardizing code in better way Continue Reading

Intercepting Nutch Crawl Flow with a Scala Plugin

Reading Time: 4 minutes Apache Nutch is an open source web search project. One of the interesting things it can be used for is crawling. Nutch also provides several extension points through which we can plug in our custom functionality. Some of the existing extension points can be found here. It supports a plugin system, which is used in Eclipse as well. Continue Reading