Apache Kafka

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In the two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing it to Vertica, but in batch mode, i.e. reading data from Kafka Continue Reading

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In the previous blog of this series, we took a glance at the basic definitions of Spark and Vertica. We also did a code overview of reading data from Vertica using Spark as a DataFrame and saving the data into Kafka. In this blog we will do the reverse flow, i.e. reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

Using Vertica with Spark-Kafka: Reading

Reading Time: 4 minutes We live in a world of big data, where the volume of data is huge even for small results. This is the result of a rapid increase in data collection in the modern world. This massive volume of data calls for tools that can work on such big chunks of data. I am pretty sure that you guys Continue Reading

Flinkathon: What makes Flink better than Kafka Streams?

Reading Time: 2 minutes Initially, I would like you all to focus on a few questions before comparing the frameworks: 1. Is there any comparison or similarity between Flink and Kafka? 2. What could be better in Flink than in Kafka? 3. Is it the problem or the system requirement that decides the use of one over the other? Before talking about Flink's advantages and use cases over Kafka, let’s first understand their Continue Reading

Hands-on: Apache Kafka with Scala

Reading Time: 4 minutes Apache Kafka is an open-source distributed streaming platform used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies. Before the introduction of Apache Kafka, data pipelines used to be very complex and time-consuming. A separate streaming pipeline was needed for every consumer. You can guess the complexity of it with Continue Reading

Exactly-Once Semantics with Apache Kafka

Reading Time: 4 minutes Kafka’s exactly-once semantics were recently introduced in version 0.11, which enabled a message to be delivered exactly once to the end consumer even if the producer retries sending it. This major release raised many eyebrows in the community, as people believed that this was not mathematically possible in distributed systems. Jay Kreps, co-founder of Confluent and co-creator of Apache Kafka, explained its Continue Reading

Assimilation of Spark Streaming With Kafka

Reading Time: 2 minutes As we know, Spark is used at a wide range of organizations to process large datasets. It seems like Spark is becoming mainstream. In this blog we will talk about the assimilation of Spark Streaming with Kafka. So, let's get started. How can Kafka be integrated with Spark? Kafka provides a messaging and integration platform for Spark Streaming. Kafka acts as the central hub for real-time streams of Continue Reading

Setting It Up: KAFKA Multi-Broker System

Reading Time: 5 minutes In this blog, I am going to cover the leftovers of my last blog, “A Beginners Approach To KAFKA”, in which I tried to explain the details of Kafka, such as its terminology and advantages, and demonstrated how to set up the Kafka environment, get our single-broker cluster up, and then test that it works. So the main thing that I am going to cover here is How Continue Reading

Short Interview With SMACK Tech Stack !!!

Reading Time: 3 minutes Hello guys, today we conduct a short interview with SMACK about its architecture and its uses. Let’s start with an introduction. Interviewer: How would you describe yourself? SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka), and all of my components are open source technologies. A Mesosphere and Cisco collaboration bundles these technologies together into a product called Infinity, which is used to solve Continue Reading

Meetup: Introduction to Apache Kafka

Reading Time: 1 minute Knoldus organized a Meetup on Wednesday, 3 June 2016. The topics covered in this meetup were: 1) Overview of Kafka and the Kafka ecosystem. 2) Configuration of brokers, consumers and producers. 3) Design and motivation behind Apache Kafka. 4) Implementation of the Consumer and Producer APIs. This meetup was completed in two parts. In the first part, we covered points 1 to 3 mentioned above. Continue Reading
