Tag Archives: Big Data

Apache Hadoop vs Apache Spark


The term Big Data has already generated a lot of hype in the business world. Hadoop and Spark are both Big Data frameworks: they provide some of the most popular tools used to carry out common Big Data-related tasks. … Continue reading

Posted in apache spark, big data, Scala

What’s new in Apache Spark 2.2


Apache recently released a new version of Spark, Apache Spark 2.2. The release brings improvements as well as new functionality. The major addition in this release is Structured Streaming. It has been marked as production … Continue reading

Posted in apache spark, big data, Scala, Spark, Streaming | 3 Comments

Having Issue How To Order Streamed Dataframe?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. So the question arises: how do we maintain order? Yes, I can use orderBy with a streaming DataFrame using … Continue reading

Posted in Apache Kafka, apache spark, big data, Scala, Spark, Streaming | 1 Comment

Spark Structured Streaming: A Simple Definition


We hear the term “Structured Streaming” quite a lot in the Apache Spark ecosystem these days, as it is being promoted as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream having … Continue reading

Posted in Scala, Spark, Streaming | 2 Comments

Installing and Running Presto


Hi folks! In my previous blog, I talked about Getting Introduced with Presto. In today’s blog, I will talk about setting up (installing) and running Presto. The basic prerequisites for setting up Presto are: Linux or Mac OS … Continue reading

Posted in big data, database, Scala

Partition-Aware Data Loading in Spark SQL


Data loading in Spark SQL means loading data into the memory/cache of the Spark worker nodes. For this, we typically write the following code:

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties)

Here we are … Continue reading

Posted in Scala, Spark | 7 Comments

Short Interview With SMACK Tech Stack !!!


Hello guys, today we conduct a short interview with SMACK about its architecture and its uses. Let’s start with a brief introduction. Interviewer: How would you describe yourself? SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka) and … Continue reading

Posted in Akka, Apache Kafka, apache spark, big data, Cassandra, Scala, Spark

Tableau: Getting into Tableau Public


Big Data visualization and Business Intelligence have become easy with Tableau. Millions or even billions of records can be analyzed in one go; whether your data is in Excel, CSV, text, or a database, Tableau makes it easy for you. So … Continue reading

Posted in apache spark, big data, Scala, Spark, Tableau

Business Intelligence-Data Visualization: Tableau


Spark, Big Data, NoSQL, and Hadoop are some of the most widely used, chart-topping technologies that we frequently use at Knoldus. When these terms come up, one thing comes to mind: ‘Huge Data, millions/billions of records’. Knoldus developers use … Continue reading

Posted in Scala, Tableau | 2 Comments

Setting Up Multi-Node Hadoop Cluster, just got easy!


In this blog, we are going to embark on the journey of setting up a Hadoop multi-node cluster in a distributed environment. So let’s not waste any time and get started. Here are the steps you need to perform. Prerequisite: … Continue reading

Posted in Architecture, big data, Scala | 7 Comments