BigData

HDFS: A Conceptual View

There has been a significant boom in distributed computing over the past few years. Various components communicate with each other over the network in spite of being deployed on different physical machines. A distributed file system (DFS) is a file system with data stored on a server. The data is accessed and processed as if it were stored on the local client machine. The DFS makes it convenient to share information ...

Getting Introduced with Presto

Hi Folks! In today’s blog I will be introducing you to a new open source distributed SQL query engine: Presto. It is designed for running SQL queries over big data (petabytes of data). It was designed by the people at Facebook. Introduction: quoting its formal definition, “Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of ...

Business Intelligence-Data Visualization: Tableau

Spark, Big Data, NoSQL, and Hadoop are some of the most widely used, chart-topping technologies that we frequently work with at Knoldus. When these terms are used, one thing comes to mind: ‘huge data, millions or billions of records’. Knoldus developers use these terms frequently; managing (which here means storing data, rectifying it, normalizing it, cleaning it, and much more) such an amount of data is really ...

Cassandra Counter Column And Table

This blog describes the usage of counter columns and counter tables in Cassandra.

Solr Relevance Search Using SolrJ In Scala

In this blog we will see how to perform relevance (or relevant) search in Solr using the SolrJ HTTP API in Scala. To briefly explain what relevance search is: a developer working on search relevancy focuses on the following areas as the “first line of defense”: Text Analysis: the act of “normalizing” text from both a search query and a search result to ...

Apache Spark + Cassandra: Basic steps to install and configure Cassandra and use it with Apache Spark, with an example

To build an application using Apache Spark and Cassandra, you can use the DataStax spark-cassandra-connector to let the two communicate. Before we communicate through the connector, we should know how to configure Cassandra. The following are the prerequisites for running the example smoothly. Steps to install and configure Cassandra: if you are new to Cassandra, we first need to install Cassandra on our ...

Handling Large Data File Using Scala and Akka

We needed to handle large data files, reaching gigabytes in size, in a Scala-based Akka application of ours. We are interested in reading data from the file and then operating on it. In our application, a single line in a file forms a unit of data to be worked upon. That means that we can only operate on lines in our big data file. These are ...
