MachineX: Understanding FP-Tree construction


In my previous blog, MachineX: Why no one uses apriori algorithm for association rule learning?, we discussed one of the first algorithms in association rule learning, the Apriori algorithm. Although it is simple and clear, it has some weaknesses, as discussed in that blog. The FP-Growth algorithm is a significant improvement over Apriori.

To understand how the FP-Growth algorithm finds frequent itemsets, we first have to understand the data structure it uses to do so, the FP-Tree, which is our focus in this blog.

FP-Tree

To put it simply, an FP-Tree is a compressed representation of the input data. It is constructed by reading the dataset one transaction at a time and mapping each transaction onto a path in the FP-Tree. Because different transactions can share items, their paths may overlap, and the more they overlap, the more compression the tree achieves.
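To make the idea of overlapping paths concrete, here is a minimal sketch in Java. The class names (`FpNode`, `FpTree`, `FpTreeDemo`) are hypothetical and chosen only for illustration. The sketch also assumes that the items of each transaction already arrive in one fixed, consistent order; a full FP-Growth implementation would first drop infrequent items and sort the rest by support, which is left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type: one item plus the number of transactions whose path passes through it.
class FpNode {
    final String item;   // null for the root
    int count;
    final Map<String, FpNode> children = new LinkedHashMap<>();

    FpNode(String item) {
        this.item = item;
    }
}

// Hypothetical, simplified FP-Tree that only supports insertion.
class FpTree {
    final FpNode root = new FpNode(null);

    // Map one transaction onto a path, reusing any prefix that already exists.
    void insert(List<String> transaction) {
        FpNode node = root;
        for (String item : transaction) {
            node = node.children.computeIfAbsent(item, FpNode::new);
            node.count++;
        }
    }

    // Number of item nodes (root excluded); fewer nodes means more sharing.
    int size() {
        return countNodes(root) - 1;
    }

    private static int countNodes(FpNode node) {
        int total = 1;
        for (FpNode child : node.children.values()) {
            total += countNodes(child);
        }
        return total;
    }
}

class FpTreeDemo {
    public static void main(String[] args) {
        FpTree tree = new FpTree();
        tree.insert(List.of("a", "b"));
        tree.insert(List.of("b", "c", "d"));
        tree.insert(List.of("a", "b", "c")); // shares the a -> b prefix with the first transaction
        System.out.println("item nodes: " + tree.size());
        System.out.println("count on node a: " + tree.root.children.get("a").count);
    }
}
```

The three transactions contain eight items in total, but the tree needs fewer nodes because the first and third transactions share a prefix.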

Posted in Artificial intelligence, machine learning, Scala

Lombok: Never write another getter, setter or equals method


Having moved from Scala to Java in one of my projects, I was not happy, as I would have to write 10 lines of Java for a single line of Scala. Thankfully, I was introduced to Lombok through one of our clients.

Project Lombok is a Java library that automatically plugs into your editor and build tools, spicing up your Java.

Let me first give you a simple example of plain Java versus Scala code.

Here is a POJO Employee in Java:

public class Employee {

    private String id;
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

The same class takes just a single line of Scala:

case class Employee(id: String, name: String)

Posted in Java, knoldus | 1 Comment

Docker Architecture


In my previous blog, we had a brief glimpse of what Docker is. It’s time to go one step further. Let’s understand more about Docker through its architecture.

A Quick Recap

WHAT?

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

WHY?

Suppose I have built a Java application and set up an environment with a Tomcat server installed in it.

Posted in Devops, Scala

A Quick Guide: Java 8 DateTime API


The DateTime API in Java 8 overcomes many shortcomings of the older java.util.Date and java.util.Calendar classes. Let us see how the Java 8 DateTime API resolves these issues and how it provides better support in the JDK core.

Issues in the Existing Date/Time APIs:

  1. Thread safety.
  2. Zoned date and time.
  3. Design and ease of use.

Core Idea of the Java 8 DateTime API:

The issues mentioned above are resolved by the Java 8 DateTime API. The following three points explain how.

  • Immutable-value classes: The main issue with the existing date and formatter classes is that they are not thread-safe. The Java 8 DateTime API resolves this by ensuring that its core classes are immutable and represent well-defined values.
  • Domain-Driven Design: In the new API, classes are more specific and closely model the different use cases for date and time. The new API focuses on Domain-Driven Design and offers long-term benefits in clarity and understandability.
  • Chronologies: The new API lets developers work with different time zones and calendar systems without any extra effort.
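The three points above can be sketched with a few calls from the java.time package. This is only an illustration: the class name `DateTimeDemo`, the helper method names and the sample dates and zones are assumptions made for the example, while the java.time classes and methods themselves are part of the JDK.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.chrono.JapaneseDate;

class DateTimeDemo {
    // Immutable value: plusWeeks returns a new LocalDate and leaves the argument untouched.
    static LocalDate oneWeekAfter(LocalDate date) {
        return date.plusWeeks(1);
    }

    // The same instant, viewed in another time zone.
    static ZonedDateTime inZone(ZonedDateTime time, String zoneId) {
        return time.withZoneSameInstant(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        // Specific types for specific use cases: LocalDate carries no time and no zone.
        LocalDate start = LocalDate.of(2018, 5, 14);
        LocalDate next = oneWeekAfter(start);
        System.out.println(start + " -> " + next); // start itself is unchanged

        // ZonedDateTime carries the zone rules with the value.
        ZonedDateTime paris = ZonedDateTime.of(2018, 5, 14, 9, 0, 0, 0, ZoneId.of("Europe/Paris"));
        System.out.println(inZone(paris, "Asia/Tokyo"));

        // A non-ISO calendar system (chronology) for the same day.
        System.out.println(JapaneseDate.from(start));
    }
}
```

Because every operation returns a new object instead of modifying the existing one, these values can be shared between threads without synchronization.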

Posted in Scala

MachineX: Why no one uses apriori algorithm for association rule learning?


In my previous blog, MachineX: Two parts of Association Rule Learning, we discussed that there are two parts to performing association rule learning, namely, frequent itemset generation and rule generation. In this blog, we are going to talk about one of the algorithms for frequent itemset generation, the Apriori algorithm.

The Apriori Principle

The Apriori algorithm uses the support measure to eliminate itemsets with low support. The use of support for pruning candidate itemsets is guided by the following principle:

If an itemset is frequent, then all of its subsets must also be frequent.

The above principle is known as the Apriori Principle. Equivalently, if an itemset is infrequent, then all of its supersets must be infrequent too, which is what makes pruning possible.
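As a rough illustration of how this principle allows pruning, the sketch below counts support on a tiny, made-up set of transactions and discards a candidate as soon as one of its subsets is known to be infrequent. The class name `AprioriPrincipleDemo`, the sample data and the minimum support count of 3 are all assumptions for the example and are not taken from any particular library.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AprioriPrincipleDemo {
    // Made-up sample data: four market-basket transactions.
    static final List<Set<String>> SAMPLE = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "diapers", "beer"),
            Set.of("milk", "diapers", "beer"),
            Set.of("bread", "milk", "diapers"));

    // Support count: number of transactions that contain every item of the itemset.
    static long supportCount(List<Set<String>> transactions, Set<String> itemset) {
        return transactions.stream().filter(t -> t.containsAll(itemset)).count();
    }

    // A candidate can be pruned without counting it when any subset obtained by
    // dropping one item is missing from the frequent itemsets of the previous level.
    static boolean canPrune(Set<String> candidate, Set<Set<String>> frequentSmaller) {
        for (String item : candidate) {
            Set<String> subset = new HashSet<>(candidate);
            subset.remove(item);
            if (!frequentSmaller.contains(subset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long minSupportCount = 3; // assumed threshold for this example
        Set<Set<String>> frequentSingles = new HashSet<>();
        for (String item : List.of("bread", "milk", "diapers", "beer")) {
            if (supportCount(SAMPLE, Set.of(item)) >= minSupportCount) {
                frequentSingles.add(Set.of(item));
            }
        }
        // {beer} appears in only 2 transactions, so any pair containing it is pruned.
        System.out.println(canPrune(Set.of("bread", "beer"), frequentSingles));
        System.out.println(canPrune(Set.of("bread", "milk"), frequentSingles));
    }
}
```

Note that surviving the pruning check does not make a candidate frequent; its support still has to be counted against the data.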

Posted in Artificial intelligence, machine learning, Scala | 1 Comment

MachineX: Two parts of Association Rule Learning


In our previous blog, MachineX: Layman guide to Association Rule Learning, we discussed what association rule learning is all about. As you can already tell, with a large dataset, which almost every market has, finding association rules isn’t easy. For this purpose, we introduced measures of interestingness: support, confidence and lift. Support tells us how frequent an itemset is in a given dataset, and confidence tells us how reliable a rule is. On large datasets, it is computationally very expensive to compute both support and confidence for every candidate rule.
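For reference, the three measures can be computed as in the small sketch below, which uses their standard definitions: support is the fraction of transactions containing an itemset, the confidence of a rule X -> Y is support(X and Y together) divided by support(X), and lift is that confidence divided by support(Y). The class name `RuleMeasures` and the sample transactions are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RuleMeasures {
    // Made-up sample data: four market-basket transactions.
    static final List<Set<String>> SAMPLE = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "diapers", "beer"),
            Set.of("milk", "diapers", "beer"),
            Set.of("bread", "milk", "diapers"));

    // Fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // Confidence of the rule x -> y. The result is NaN when x never occurs.
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> both = new HashSet<>(x);
        both.addAll(y);
        return support(transactions, both) / support(transactions, x);
    }

    // Lift of the rule x -> y: confidence relative to how common y is on its own.
    static double lift(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        return confidence(transactions, x, y) / support(transactions, y);
    }

    public static void main(String[] args) {
        Set<String> x = Set.of("diapers");
        Set<String> y = Set.of("beer");
        System.out.println("support    = " + support(SAMPLE, Set.of("diapers", "beer")));
        System.out.println("confidence = " + confidence(SAMPLE, x, y));
        System.out.println("lift       = " + lift(SAMPLE, x, y));
    }
}
```

Every call scans the whole dataset, which hints at why computing these measures for every possible rule becomes expensive as the data grows.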

An initial step towards improving the performance of association rule mining algorithms is to decouple the support and confidence requirements. Let’s look at it more closely.

Suppose a dataset exists such as the one below –

Posted in Artificial intelligence, machine learning, Scala | 2 Comments

DC/OS: The Architecture #2


In my previous post, we went through the types of nodes, tasks and distributed process management. Now we’re going under the hood to explore the components that make DC/OS what it is.

You’ll find the basic working of each component here. Let me show you around…

Introduction

First, let’s check out the map we’re going to follow to get where we want to go. Although we’re only taking a stroll here, one should still be aware of where they are.

The following are the floors of this huge establishment. There are 3 floors.

Posted in Devops

Deploying Services In DC/OS


DC/OS uses Marathon to manage processes and services. Marathon is the “init system” for DC/OS. It starts and monitors your applications and services, automatically healing failures. A native Marathon instance is installed as part of the DC/OS installation. DC/OS services are Marathon applications that are deployed on DC/OS. To start a service, we either create one or install a package from the Catalog. Both the services you create and those you install from Universe appear on the Services tab of the DC/OS web interface when they are running. In this blog, we will learn how to configure and run a service through JSON. We will launch a Tomcat server using a Docker image.

Prerequisites

A running DC/OS 1.11 cluster with at least 1 public agent node.

Posted in Devops

DC/OS: Dynamic Resourcing


Suppose that we are running a service on a node and it is producing some data to be used later. Then the node is restarted. What will happen to the data? It will evaporate, and on the next boot the machine will be empty.

Let’s extend the example and consider that we have to reserve some resources on the node. How would we do that? If we did, how would we ensure that they remain available? Also, is there a way to let a resource be used by some other application when it is not in use? The last problem was a challenge in DC/OS, but it and all of the problems above are solved by Dynamic Resourcing in DC/OS.

Posted in Scala

Unit Testing – The good, bad & ugly


Unit testing is common to all the code that we write. We all do it, so it’s important to understand this most “looked forward to” (sarcasm) aspect of development. However overlooked and underestimated it may be, unit testing forms the most important part of any development cycle and hence is worth delving into. So let’s begin.

So the big question at hand is, what is Unit Testing?

“A unit test is an automated piece of code that invokes a unit of work in the system and then checks a single assumption about the behavior of that unit of work.”

–       Roy Osherove

A fair definition but let’s break it down. I want to focus on two important constituents of this definition:

“invokes a unit of work”

Here, a unit of work is a single logical use case of the system irrespective of its size. It could span a method, multiple methods, a class or even classes. All that matters is this combination achieves a single logical purpose.

“checks a single assumption about the behavior of that unit of work”

A unit test case checks something specific about a piece of functionality; it doesn’t cover it entirely. It’s bad practice to create something called testAllThings and call every method in the namespace.

Another important aspect of a unit test is that it works with the use case in isolation.

This means it should not depend on, or be affected by, any external entity, so changing another part of the codebase should not affect a unit test case. Simply put, if you have a unit test case that doesn’t run without a proper setup, or malfunctions in the presence of one, you haven’t written a unit test case.
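Here is a small sketch of what such a test might look like. Everything in it is hypothetical: `PriceCalculator` stands in for a unit of work, `DiscountSource` for an external dependency, and the test is written in plain Java so that it stays self-contained. In a real project you would most likely use a test framework instead of a hand-written check.

```java
// Hypothetical external dependency, for example a database or a remote service.
interface DiscountSource {
    double rateFor(String customerId);
}

// Hypothetical unit of work: applies a customer's discount rate to a list price.
class PriceCalculator {
    private final DiscountSource discounts;

    PriceCalculator(DiscountSource discounts) {
        this.discounts = discounts;
    }

    double finalPrice(String customerId, double listPrice) {
        return listPrice * (1.0 - discounts.rateFor(customerId));
    }
}

class PriceCalculatorTest {
    // One test, one assumption: a 25% discount rate is applied to the list price.
    static void appliesDiscountRate() {
        // The stub replaces the real dependency, so the test needs no external setup.
        PriceCalculator calculator = new PriceCalculator(customerId -> 0.25);

        double price = calculator.finalPrice("any-customer", 200.0);

        if (Math.abs(price - 150.0) > 1e-9) {
            throw new AssertionError("expected 150.0 but got " + price);
        }
    }

    public static void main(String[] args) {
        appliesDiscountRate();
        System.out.println("appliesDiscountRate passed");
    }
}
```

The test checks one assumption only, and because the discount rate comes from a stub, a change in the real discount source cannot break it.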

Posted in Best Practices