SQL

Cafeteria Management System with a Twist

Reading Time: 2 minutes A system to digitize the manual management of a cafeteria.

Understanding Spark’s Logical and Physical Plan in Layman’s Terms

Reading Time: 5 minutes This blog pertains to Apache Spark 2.x. We will find out how Spark SQL works internally, in layman’s terms, and try to understand what the Logical and Physical Plans are. We will also look into the Catalyst Optimizer. So let’s get started. First, let’s see what Apache Spark is. The official definition of Apache Spark says that “Apache Spark™ is a unified analytics engine for large-scale Continue Reading

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes In our previous blog on Apache Spark, we briefly discussed what Transformations and Actions are. Now we will go deeper into the topic and understand what they actually are and how they play a vital role in working with Apache Spark. What is a Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects Continue Reading

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world, and it is going to be generated even more rapidly in the future. 90% of the world’s total data was produced in the last two years alone, and it is estimated that in 2020 the world’s total data would reach 45 ZB and the data generated each day would be enough that if we try to store it Continue Reading

RAP: Let’s discuss the architecture and technical details

Reading Time: 4 minutes In the previous blog, we discussed the journey and the features of RAP. In this blog, we will discuss the architecture and technical details. RAP Architecture: We have tried to follow Domain-Driven Design and Reactive principles in designing the RAP architecture. All RAP team members completed all the Reactive Architecture courses launched by Lightbend on cognitive classes to ensure adherence to reactive principles. These courses Continue Reading

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. These functions are achieved with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started in 2014 under the umbrella of Airbnb, and since then it has earned an excellent reputation, with approximately 800 contributors and 13000 stars on GitHub. Continue Reading

Using Vertica with Spark-Kafka: Reading

Reading Time: 4 minutes We live in a world of big data, where the volume of data is huge even when the results are small. This is the result of the rapid increase in data collection in the modern world. This massive volume of data calls for tools that can work on such big chunks of data. I am pretty sure that you guys Continue Reading

KSQL: Streams and Tables

Reading Time: 3 minutes By now you must be familiar with KSQL and how to get started with it. If not, check out Part 1 of this series, KSQL: Getting started with Streaming SQL for Apache Kafka. In this blog, we’ll move one step forward and look at the dual streaming model to see what abstractions KSQL uses to process the data. All the data that we Continue Reading

Streaming data from PostgreSQL using Akka Streams and Slick in Play Framework

Reading Time: 4 minutes In this blog post I’ll try to explain how you can stream data directly from a PostgreSQL database using Slick (a database access/query library for Scala) and Akka Streams (an implementation of the Reactive Streams specification on top of the Akka toolkit) in Play Framework. The process is going to be pretty straightforward in terms of implementation where data is read from one Continue Reading

KnolX: Understanding Spark Structured Streaming

Reading Time: < 1 minute Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session. Slides:

presto server using JDBC

Knolx: Getting started with Presto

Reading Time: < 1 minute Hi all, Knoldus organized a 1-hour session on 8th September 2017. The topic was “Getting started with Presto”. Many people joined and enjoyed the session. I am going to share the slides here. Please let me know if you have any questions related to the linked slides or video. The slides of the Knolx are here: And here’s the video of the session: For any Continue Reading

SQL made easy and secure with Slick

Reading Time: 5 minutes Slick stands for Scala Language-Integrated Connection Kit. It is a Functional Relational Mapping (FRM) library for Scala that makes it easy to work with relational databases. Slick can be considered a replacement for writing SQL queries as strings, with a nicer API for handling connections, fetching results, and using a query language that is integrated more nicely into Scala. You can write your database queries Continue Reading
