Tag Archives: Big Data

Partition-Aware Data Loading in Spark SQL


In Spark SQL, data loading means loading data into the memory/cache of the Spark worker nodes. For this, we usually write the following code:

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF = spark.read.jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties)

In here we are … Continue reading

Posted in Scala, Spark | 7 Comments

Short Interview With SMACK Tech Stack !!!


Hello guys, today we conduct a short interview with SMACK about its architecture and its uses. Let’s start with some introduction. Interviewer: How would you describe yourself? SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka) and … Continue reading

Posted in Akka, Apache Kafka, apache spark, big data, Cassandra, Scala, Spark | Leave a comment

Tableau: Getting into Tableau Public


Big Data visualization and Business Intelligence have become easy with Tableau: millions or even billions of records can be analyzed in one go, and whether your data is in Excel, CSV, text or a database, Tableau makes it easy for you. So … Continue reading

Posted in apache spark, big data, Scala, Spark, Tableau | Leave a comment

Business Intelligence-Data Visualization: Tableau


Spark, Big Data, NoSQL and Hadoop are some of the most used, chart-topping technologies that we frequently work with at Knoldus. When these terms are used, one thing comes into the picture: ‘huge data, millions/billions of records’. Knoldus developers use … Continue reading

Posted in Scala, Tableau | 2 Comments

Setting Up Multi-Node Hadoop Cluster , just got easy !


In this blog, we are going to embark on the journey of setting up a Hadoop multi-node cluster in a distributed environment. So let’s not waste any time and get started. Here are the steps you need to perform. Prerequisite: … Continue reading

Posted in Architecture, big data, Scala | 6 Comments

Cassandra Data Modeling – Primary , Clustering , Partition , Compound Keys


In this post we are going to discuss the different keys available in Cassandra. The primary key concept in Cassandra is different from that in relational databases, so it is worth spending time to understand it. Let’s take an example … Continue reading

Posted in Best Practices, big data, Cassandra, database, NoSql, Scala | 4 Comments

Spark – IoT : Combining Big Data Analysis with IoT


Welcome back, folks! Time for a new gig! I think that the last series, i.e. Scala – IOT, was pretty amazing and got an overwhelming response from you all, which resulted in pumping up the idea of … Continue reading

Posted in apache spark, IOT, Scala, Spark | 2 Comments

Hive-Metastore : A Basic Introduction


As we know, the database is the most important and powerful part of any organisation. It is an organized collection of data, made up of schemas, tables, relationships, queries and views. But have you ever thought about these questions … Continue reading

Posted in database, Scala | 1 Comment

Is using Accumulators really worth ? Apache Spark


Before jumping right into the topic, you must know what Accumulators are; for that you can refer to this blog. Now that we know the what and why of Accumulators, let’s jump to the main point. Description: Spark automatically deals with failed or … Continue reading

Posted in apache spark, Scala, Spark | Leave a comment

Broadcast variables in Spark, how and when to use them?


As the documentation for Spark Broadcast variables states, they are immutable shared variables which are cached on each worker node of a Spark cluster. In this blog, we will demonstrate a simple use case of broadcast variables. When to use Broadcast variables? … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment