Studio-Scala

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In the two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series covered reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing it to Vertica, but in batch mode, i.e. reading data from Kafka Continue Reading

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In the previous blog of this series, we took a quick look at the basic definitions of Spark and Vertica. We also walked through the code for reading data from Vertica into a Spark DataFrame and saving that data into Kafka. In this blog we will do the reverse flow, i.e. reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

Using Vertica with Spark-Kafka: Reading

Reading Time: 4 minutes We live in a world of big data, where even small results can depend on very large volumes of data. This is the result of the rapid increase in data collection in the modern world. Data at this scale calls for tools that can work on such large chunks of data. I am pretty sure that you guys Continue Reading

Scala 2.13: Has Scala done it again?

Reading Time: 5 minutes Scala 2.13 had been talked about for quite a long time, and it was finally released last month, i.e. June 2019. This version brings quite a few changes for its users. With the intent of explaining some of the features that Scala has introduced or improved in its latest version, I, Anmol Sarna, welcome back the Continue Reading

Kafka Streams: Data Enrichment with External lookup

Reading Time: 2 minutes Kafka Streams is a client library where the input and output data are stored in an Apache Kafka cluster. It combines the simplicity of building and deploying Java and Scala processing applications on the client side with the benefits of Kafka’s server-side cluster technology. When working with Kafka Streams, there are times when the stream processing application requires integration with data external Continue Reading

Take a deep dive into Kafka – Producer API

Reading Time: 4 minutes I am starting a series of blogs on the Kafka API, and this blog is part of that series. In this blog, we are going to learn about the Producer API. If you are new to Kafka, I recommend first getting a basic idea of it from kafka-quickstart. There are many reasons an application might Continue Reading

Manage API Doc via Swagger

Reading Time: 2 minutes Swagger is a framework for describing your API using a common language that everyone can understand. Think of it as a blueprint for a house: you can use whatever building materials you like, but you can’t step outside the parameters of the blueprint. Swagger takes the manual work out of API documentation and provides a range of solutions for generating, visualizing, and maintaining API docs. For millions Continue Reading

Running a Cron Job in Docker Container

Reading Time: 3 minutes Setting up a cron job within a Docker container might not sound new to many of us. But depending on the base image that we use to build the Docker image, we might end up struggling with different issues. In this blog, I will walk you through the challenges that I dealt with while setting up a cron job using bash in a Docker container. Continue Reading

Writing Java APIs using Apache Atlas Client

Reading Time: 2 minutes In the previous blog, Data Governance using Apache ATLAS, we discussed the advantages and use cases of Apache Atlas as a data governance tool. Continuing from there, we will discuss building our own Java APIs that interact with Apache Atlas through the Apache Atlas client to create new entities and types in it. How to create new Entities and Types using Continue Reading

API Security in Apigee: Introduction to OAuth 2.0

Reading Time: 3 minutes In today’s world of web APIs, how can you protect your APIs from malicious attacks? How can you build a trustworthy system?

Do you really need Spark? Think Again!

Reading Time: 5 minutes With the massive growth in big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything: data ingestion, data processing, data retrieval, data storage, etc. Today we are going to focus on one of those popular big data technologies, i.e. Apache Spark. Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark Continue Reading

Knoldus Inc. recognized by Clutch as a Top Hadoop Consultant

Reading Time: 2 minutes Here at Knoldus Inc., we pride ourselves on being one of the best developers of Scala, big and fast data, microservices, and artificial intelligence, all of which have become increasingly important over the past years. However large and daunting these tasks may be, our clients are always our biggest priority. This is why we are ecstatic that Clutch has chosen us as one of the Continue Reading