Big Data

CuriosityX: RDDs – The backbone of Apache Spark

In our last blog, we looked at using Spark Streaming to transform and transport data between Kafka topics. After reading it, many readers asked us for a brief description of the RDDs in Spark that we used. So this blog is dedicated entirely to RDDs in Spark. Let’s start with the most basic question that comes to mind…

Code Combat II: The Code Battle For The Vanguard Continues…

“If you can dream it, you can do it.” – Walt Disney. For some, coding is a job. For others, it is an exercise. But for us folks here at Knoldus, it’s a passion. So, to add a twist to the daily work schedule, Knoldus held an overnight hackathon within the organization on 18th May 2018, which gave every Knolder (employees…

Spark Stream-Stream Join

Tuning Spark on YARN

In this blog, we will learn how to tune Spark on YARN in both modes, yarn-client and yarn-cluster. The only requirement to get started is a Hadoop-based YARN cluster running Spark. If you want to create a cluster, you can follow this blog here. 1. yarn-client mode: In client mode, the driver runs in the client process, and the application master is only used…

Structured Streaming: Philosophy behind it

In our previous blogs, “Structured Streaming: What is it?” and “Structured Streaming: How it works?”, we learned two major points about Structured Streaming. It is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. It treats the live data stream as a table that is continuously appended/updated, which allows us to express our streaming computation as…
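To make the "live stream as a continuously appended table" idea concrete, here is a minimal plain-Python sketch. It does not use Spark at all; it assumes a simple word-count query and simulates each trigger appending newly arrived lines to a conceptual input table, then emitting the whole result table, roughly in the spirit of Spark's "complete" output mode. The function names and sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def batch_word_count(table: List[str]) -> Dict[str, int]:
    """Word count over a static table (an ordinary batch query)."""
    return dict(Counter(word for line in table for word in line.split()))


def streaming_word_count(batches: Iterable[List[str]]) -> List[Dict[str, int]]:
    """Simulate a streaming word count using the 'unbounded table' model.

    Each batch stands for the lines that arrived since the last trigger.
    Conceptually they are appended to an ever-growing input table, and the
    result is kept equal to batch_word_count over every line seen so far.
    """
    counts: Counter = Counter()  # the result table, updated incrementally
    results: List[Dict[str, int]] = []
    for batch in batches:
        # Only the newly appended rows are processed on each trigger.
        for line in batch:
            counts.update(line.split())
        # Like "complete" output mode: emit the entire result table each trigger.
        results.append(dict(counts))
    return results


if __name__ == "__main__":
    arrivals = [["cat dog"], ["dog owl", "cat"]]
    for i, result in enumerate(streaming_word_count(arrivals), start=1):
        print(f"after trigger {i}: {result}")
```

The point of the design is that the streaming answer after the second trigger, `{'cat': 2, 'dog': 2, 'owl': 1}`, is exactly what the batch query would return over all three lines, even though each trigger only processed the new rows.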

KnolX: NAIVE BAYES CLASSIFIER

Hi all, Knoldus organized a 30-minute session on 27th April 2018 at 4:00 PM. The topic was the Naive Bayes Classifier. Many people joined and enjoyed the session. I am sharing the slides here. Please let me know if you have any questions about the linked slides.

Structured Streaming: How it works?

In our previous blog post, “Structured Streaming: What is it?”, we learned that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users build streaming applications. Now it’s time to learn how it works. In this blog post, we will look at how a structured stream works through an example. So, let’s take a…

Running Spark on DC/OS

DevOps engineers have long needed an open-source tool that makes it easy to deploy code that has made it through all the ups and downs of development, and that is considerably more capable of evolving (pun intended). As we all know, in this agile world we need to shift our requirements at short notice, be it adding a feature or tweaking…

KnolX: Machine Learning with Artificial Neural Networks

Hi all, Knoldus organized a 30-minute session on 8th December 2017 at 4:15 PM. The topic was Machine Learning with Artificial Neural Networks. Many people joined and enjoyed the session. I am sharing the slides here. Please let me know if you have any questions about the linked slides. Here’s the video of the…

Scorex: A Modular Blockchain Framework in Scala

Scorex is an open-source project written in Scala with loosely coupled, pluggable components. Scorex provides many abstractions for which you supply concrete implementations, so you can build a blockchain that meets your requirements on top of Scorex. It is backed by IOHK, a technology company based in Hong Kong. Scorex is still experimental and raw; it appears to be at an early stage of development. All the…

Are you missing out on the digital wave?

It has been an interesting week for Knoldus. At this moment, almost half of the organization is awake across Toronto, Singapore, Berlin, Chicago, Miami, Mumbai and Noida, participating in CodeCombat 2018, our 24-hour hackathon. This week Knoldus also spoke at ScalaDays 2018 in Berlin. The other wonderful part is onboarding a huge healthcare organization that would like to transform…

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink and Storm, developers stopped worrying about issues inherent to streaming applications, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business challenges. The reason is that these frameworks provided built-in support for all of them. For example, in Spark Streaming, by just adding…

Introduction to Hyperledger Sawtooth

Hyperledger Sawtooth is one of Hyperledger’s open-source blockchain platforms, following closely on Hyperledger Fabric. It is an enterprise distributed ledger proposed by Intel and was one of the first projects to join the Linux Foundation’s Hyperledger umbrella. The platform is modular and scalable, has an innovative consensus model, supports both permissioned and permissionless infrastructure, and has the potential for very large network sizes.

MachineX: Total Support Tree for Association Rule Generation

In our previous blogs on Association Rule Learning, we looked at the FP-Tree and the FP-Growth algorithm, and we used FP-Growth to generate frequent itemsets. A problem arises, however, when we try to mine association rules from these frequent itemsets: the number of frequent itemsets is generally massive, so running an algorithm over them is very memory-inefficient. So, to store these…
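Why rule mining gets expensive is easiest to see in the naive approach. The sketch below is a plain baseline, not the Total Support Tree this post introduces: it assumes `support` is a dictionary (a hypothetical input format) mapping every frequent itemset, as produced by something like FP-Growth, to its support count, and it enumerates every candidate rule X → Y with confidence(X → Y) = support(X ∪ Y) / support(X). Every non-empty proper subset of a k-item itemset is a candidate antecedent, giving 2^k − 2 candidates per itemset, and each lookup needs the full support table held in memory.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (antecedent X, consequent Y, confidence of X -> Y)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def generate_rules(support: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Naively derive association rules from frequent itemsets.

    Assumes `support` contains *all* frequent itemsets. Since every subset of
    a frequent itemset is also frequent, support[antecedent] always exists.
    """
    rules: List[Rule] = []
    for itemset, itemset_support in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Every non-empty proper subset is a candidate antecedent: 2^k - 2 rules.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                confidence = itemset_support / support[x]
                if confidence >= min_conf:
                    rules.append((x, itemset - x, confidence))
    return rules


if __name__ == "__main__":
    counts = {
        frozenset({"bread"}): 4,
        frozenset({"butter"}): 3,
        frozenset({"bread", "butter"}): 3,
    }
    for x, y, conf in generate_rules(counts, min_conf=0.7):
        print(sorted(x), "->", sorted(y), f"confidence={conf:.2f}")
```

With only two items this is trivial, but the candidate count per itemset doubles with every extra item, which is the memory and time pressure the post describes.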
