Optimizations In Spark: For BETTER OR For WORSE
Reading Time: 5 minutes This blog focuses on some of the problems faced while working with the Spark SQL
Reading Time: 5 minutes Streaming of data is the need of the hour. This blog focuses on the developer’s need to process this stream, benefits, and the challenges it introduces.
Reading Time: 3 minutes In two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing data to Vertica, but in batch mode, i.e. reading data from Kafka Continue Reading
Reading Time: 4 minutes In the previous blog of this series, we glanced at the basic definitions of Spark and Vertica. We also walked through the code for reading data from Vertica using Spark as a DataFrame and saving the data into Kafka. In this blog we will do the reverse flow, i.e. reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading
Reading Time: 4 minutes We live in a world of Big Data, where even small results require working through very large volumes of data. This is the result of the rapid increase in data collection in the modern world. Data at this scale calls for tools that can work on such big chunks of data. I am pretty sure that you guys Continue Reading
Reading Time: 6 minutes Fan of Apache Spark? I am too. The reason is simple: interesting APIs to work with, fast and distributed processing, far less disk I/O overhead than MapReduce, fault tolerance and much more. With this much, you can do a lot in this world of Big Data and Fast Data. From “processing huge chunks of data” to “working on streaming data”, Spark works well in all of them. In this Continue Reading
Reading Time: 3 minutes We have been working on streaming data for a long time now. Using Apache Spark for that can be very convenient. Spark provides two APIs for streaming data. One is Spark Streaming, which is a separate library provided by Spark. The other is Structured Streaming, which is built upon the Spark SQL library. We will discuss the trade-offs and differences between these two libraries in Continue Reading
Reading Time: 2 minutes Logistic regression, a predictive analysis technique, is mostly used for classification with binary outcomes and can be extended to multiple classes as well. We have already studied the algorithm in depth in this blog. Today we will be using the KSAI library to build our logistic regression model. Setup
Reading Time: 6 minutes DevOps engineers have long needed an open source tool that makes it easy to deploy code, one that has developed through all the ups and downs to reach this far and is considerably more capable of evolving (pun intended). As we all know, in this world of agile we need to shift our requirements after a short duration of time. Be it the addition of a feature or tweaking Continue Reading
Reading Time: < 1 minute Hi all, Knoldus organized a 30-minute session on 8th December 2017 at 4:15 PM. The topic was Machine Learning with Artificial Neural Networks. Many people joined and enjoyed the session. I am going to share the slides here. Please let me know if you have any questions related to the linked slides. Machine Learning with Artificial Neural Networks from Knoldus Inc. Here’s the video of the Continue Reading
Reading Time: 4 minutes The term “Deep Learning” has been on fire for the past two decades. Every machine learning enthusiast wants to work on it, and many big companies are already making an impact on the data science field by exploring it, e.g. the Google Brain project from Google or DeepFace from Facebook. The reason is simple; experts say, and I quote, “for most flavors of the old generations of learning algorithms … performance will Continue Reading
Reading Time: 3 minutes Frankly, I don’t think there’s any need to tell us, “The Developers”, about the need for proper testing, or unit testing to be precise (QAs, don’t be flattered :P). Unit test cases are the quickest way to know there’s something wrong with our code. “Unit testing is important because it is one of the earliest testing efforts performed on the code and the earlier defects are detected, the easier Continue Reading
Reading Time: 3 minutes The world as we know it is moving towards machines big time. But we cannot fully utilize any machine without a lot of human interaction. To change that, we need some kind of intelligence for the machines. This is where Artificial Intelligence comes in. It is the concept of machines being smart enough to carry out numerous tasks without Continue Reading