Apache NiFi – The Ingestion Tool

What is Apache NiFi?

Apache NiFi is open source software for automating and managing the flow of data between systems, built around the concept of Extract, Transform, and Load (ETL). It is a powerful and reliable system for processing and distributing data. It also has a web-based user interface for the design, control, feedback, and monitoring of dataflows.

History of Apache NiFi

  • Based on the “NiagaraFiles” software previously developed by the US National Security Agency (NSA), which is also the source of part of its present name.
  • Developed at the NSA for over eight years.
  • 2014: Donated to the Apache Software Foundation.
  • 2015: NiFi became an official top-level Apache project.

What Makes Apache NiFi the Better Choice?

Visual: Develop and monitor your NiFi dataflow visually through the web-based UI, with immediate visual feedback and flow monitoring.

Agile: Evolve your dataflow on the fly in response to business conditions, solution design, and performance needs.

Ingestion: Pull data into NiFi from numerous data sources and create FlowFiles from it.

Connected: NiFi has built-in interoperability with Apache Hadoop, AWS services, generic HTTP web services, and more.

Tour of the NiFi UI

Each segment of the NiFi UI is responsible for a different part of the application's functionality.

Status Bar -> 

The Status Bar shows the number of threads currently active in the flow, the amount of data currently in the flow, how many Remote Process Groups exist on the canvas in each state, and so on.

Operate Palette ->

The Operate Palette consists of buttons that dataflow managers (DFMs) use to manage the flow.

It is also used to manage system configuration properties.

Navigate Palette -> 

The UI offers several ways to move around the canvas. You can use the Navigate Palette to pan around the canvas and to zoom in and out.

Bird’s Eye View ->

The Bird’s Eye View provides a high-level view of the dataflow.

Architectural Overview

Apache NiFi consists of a web server, a flow controller, and processors, all of which run on the Java Virtual Machine (JVM).

It also has three repositories: the FlowFile Repository, the Content Repository, and the Provenance Repository.

Web Server ->

The web server hosts NiFi's HTTP-based command and control API.

Flow Controller ->

The flow controller is the brain of the operation. It provides threads for extensions to run on and manages the schedule of when extensions receive resources to run.
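
As a rough mental model, and not NiFi's actual implementation, the flow controller's role can be sketched as a shared thread pool that lends threads to each extension when it is scheduled. The names `run_flow`, `to_upper`, and `add_suffix` are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def to_upper(data: str) -> str:
    """Hypothetical extension: upper-case the payload."""
    return data.upper()


def add_suffix(data: str) -> str:
    """Hypothetical extension: append a marker to the payload."""
    return data + "!"


def run_flow(extensions, payloads, max_threads=2):
    """Toy 'flow controller': lends pool threads to each extension in turn."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for extension in extensions:
            # Each payload is handed to a pool thread; map() keeps input order.
            payloads = list(pool.map(extension, payloads))
    return payloads


print(run_flow([to_upper, add_suffix], ["a", "b"]))  # prints ['A!', 'B!']
```

The point of the sketch is only that the extensions do not own threads themselves; a central component decides when each one gets to run.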

Content Repository ->

It stores the actual content of every FlowFile. The default approach is a fairly simple mechanism that stores blocks of data in the file system.
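
The idea of keeping content as blocks on the file system can be illustrated with a small sketch. This is a toy model under simplifying assumptions (one file per block), not NiFi's on-disk format; the class `ToyContentRepository` and its methods are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path


class ToyContentRepository:
    """Hypothetical stand-in: stores each content block as one file."""

    def __init__(self, directory: Path):
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def write(self, content: bytes) -> str:
        """Persist the content and return a claim id that points to it."""
        claim_id = uuid.uuid4().hex
        (self.directory / claim_id).write_bytes(content)
        return claim_id

    def read(self, claim_id: str) -> bytes:
        """Load the content that a claim id refers to."""
        return (self.directory / claim_id).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    repo = ToyContentRepository(Path(tmp) / "content_repository")
    claim = repo.write(b"hello nifi")
    print(repo.read(claim))  # prints b'hello nifi'
```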

FlowFile Repository ->

The FlowFile Repository holds the current state and attributes of each FlowFile that passes through the NiFi dataflow.

It keeps track of the state of every FlowFile that is currently active in the flow.

By default, this repository is located under the NiFi root directory; the location can be changed.
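
To make the split between metadata and content concrete, here is a minimal sketch of what a FlowFile repository conceptually tracks. It is an in-memory toy, not NiFi's implementation; `ToyFlowFile`, `ToyFlowFileRepository`, and the queue names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFlowFile:
    """Hypothetical FlowFile record: attributes plus a pointer to content."""
    flowfile_id: str
    attributes: dict = field(default_factory=dict)
    content_claim: str = ""       # where the content lives, not the content itself
    current_queue: str = "input"  # which part of the flow holds it right now


class ToyFlowFileRepository:
    """Tracks the latest state of every FlowFile still active in the flow."""

    def __init__(self):
        self._active = {}

    def save(self, flowfile: ToyFlowFile) -> None:
        self._active[flowfile.flowfile_id] = flowfile

    def move(self, flowfile_id: str, queue: str) -> None:
        self._active[flowfile_id].current_queue = queue

    def remove(self, flowfile_id: str) -> None:
        # A FlowFile that leaves the flow is no longer tracked.
        del self._active[flowfile_id]

    def active_ids(self) -> list:
        return sorted(self._active)


repo = ToyFlowFileRepository()
repo.save(ToyFlowFile("ff-1", {"filename": "a.csv"}, content_claim="claim-1"))
repo.save(ToyFlowFile("ff-2", {"filename": "b.csv"}, content_claim="claim-2"))
repo.move("ff-1", "transformed")
repo.remove("ff-2")
print(repo.active_ids())  # prints ['ff-1']
```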

Provenance Repository ->

This repository tracks and stores all the events for every FlowFile that flows through NiFi. There are two provenance repositories:

  • volatile provenance repository
  • persistent provenance repository
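
The difference between the two can be sketched as follows: a volatile store keeps a bounded number of events in memory, while a persistent store appends them to disk. This is a toy illustration, not NiFi's implementation; the class names, the JSON-lines file format, and the event type strings are assumptions made for the example.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


class ToyVolatileProvenance:
    """Keeps only the most recent events in memory; lost on restart."""

    def __init__(self, max_events: int = 3):
        self.events = deque(maxlen=max_events)

    def record(self, flowfile_id: str, event_type: str) -> None:
        self.events.append({"flowfile": flowfile_id, "type": event_type})


class ToyPersistentProvenance:
    """Appends every event to a file so history survives a restart."""

    def __init__(self, path: Path):
        self.path = path

    def record(self, flowfile_id: str, event_type: str) -> None:
        event = {"flowfile": flowfile_id, "type": event_type}
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")

    def history(self) -> list:
        with self.path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle]


volatile = ToyVolatileProvenance(max_events=2)
for event_type in ("RECEIVE", "CONTENT_MODIFIED", "SEND"):
    volatile.record("ff-1", event_type)
# The oldest event was evicted once the in-memory limit was reached.
print([e["type"] for e in volatile.events])  # prints ['CONTENT_MODIFIED', 'SEND']

with tempfile.TemporaryDirectory() as tmp:
    persistent = ToyPersistentProvenance(Path(tmp) / "events.jsonl")
    persistent.record("ff-1", "RECEIVE")
    print(len(persistent.history()))  # prints 1
```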

Advantages of Apache NiFi

  1. Apache NiFi offers a web-based user interface (UI), so it can be used from a web browser by pointing it at localhost and the configured port.
  2. In the web browser, Apache NiFi can use the HTTPS protocol to secure user interaction.
  3. It supports the SFTP protocol, which enables fetching data from remote machines.
  4. It also provides security policies at the process group level, the user level, and for other modules.
  5. NiFi runs on any device that runs Java.
  6. It provides real-time control that eases the movement of data between source and destination.
  7. Apache NiFi supports clustering, so it can run on multiple nodes that execute the same flow on different data, which increases data processing performance.
  8. NiFi supports over 188 processors, and users can create custom plugins to support various types of data systems.
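
The clustering idea from point 7, the same flow running on several nodes with each node handling different data, can be sketched as below. Round-robin distribution is used here purely for illustration and is not a description of how NiFi itself balances load; `same_flow` and `distribute` are hypothetical names.

```python
def same_flow(record: str) -> str:
    """Hypothetical flow that every node runs identically."""
    return record.strip().lower()


def distribute(records, node_count):
    """Round-robin the records so each node gets a different slice."""
    return [records[i::node_count] for i in range(node_count)]


records = [" A ", " B ", " C ", " D ", " E "]
for node_id, node_records in enumerate(distribute(records, node_count=2)):
    # Every node applies the same flow, but only to its own slice of the data.
    print(node_id, [same_flow(r) for r in node_records])
# prints:
# 0 ['a', 'c', 'e']
# 1 ['b', 'd']
```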

Written by 

Chitra Sapkal is a software consultant at Knoldus Inc. with 2 years of experience. Knoldus does Big Data product development on Scala, Spark, and Functional Java. She is a self-motivated, passionate person who is recognized as a good team player. Her hobbies include playing badminton and travelling.