Introduction to InfluxDB

InfluxDB

InfluxDB is a time series database: it stores and manages data in time series form.

What is a Time Series?

A time series is a sequence of observations of a well-defined data item, obtained through repeated measurements over time. The data points are indexed in time order.

What is a Time Series Database?

A time series database (TSDB) is a database system optimized for storing and serving time series data, that is, values associated with timestamps.

A time series database holds measurements or events that are tracked and monitored, and then refined over time through downsampling and aggregation. Typical examples are application monitoring data, server metrics, sensor data, and market trading data such as stock exchange prices.

Time series databases are built for high write throughput; some are capable of ingesting millions of data points per second.

A classic real-world example of a time series is stock or currency exchange price data.

What is the TICK Stack?

The TICK Stack is a platform of open-source tools built to collect, store, graph, and alert on time series data easily and efficiently. The name is an acronym of its components, and the “I” in TICK is InfluxDB. The components are:

TELEGRAF: Telegraf is a metrics collection agent that gathers metrics and sends them to InfluxDB.

INFLUXDB: InfluxDB is an open-source time series database written in Go and developed by InfluxData. It is optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, IoT sensor data, and real-time analytics.

InfluxDB is a high-performance time series database that can store hundreds of thousands of points per second. It is queried with InfluxQL, a SQL-like query language built specifically for time series data.

CHRONOGRAF: Chronograf is the UI for the whole TICK Stack. It is used to set up graphs and dashboards of the data in InfluxDB, and it integrates with Kapacitor alerts.

KAPACITOR: Kapacitor is a metrics and event processing and alerting system. It is used to crunch time series data into actionable alerts and send those alerts to products such as Slack and PagerDuty.

The entire TICK Stack is interoperable, and each component also provides significant value as a standalone application.

So why do we need specific storage for time series? Why can’t we use a traditional database like MySQL, Cassandra, MongoDB, or Elasticsearch?

To answer these questions, you should consider your use case. There are various published benchmarks comparing InfluxDB with other databases, and on time series workloads InfluxDB generally comes out ahead in them.

But it isn’t only about performance. Time series is a specific domain, and InfluxDB, as a time series database, provides dedicated capabilities for working with time. This is probably the most important reason to use InfluxDB.

InfluxDB offers a powerful engine and two entry points to interact with it: an HTTP API, which listens on port 8086 by default and supports both reading and writing, and UDP, which is supported as a write-only protocol.
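As a rough illustration of the HTTP entry point, here is a minimal Python sketch that only builds the requests, without sending them. It assumes the InfluxDB 1.x-style `/write` and `/query` endpoints, an instance on `localhost`, and a database named `mydb`; the helper names `build_write_request` and `build_query_request` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local InfluxDB 1.x instance on the default port.
BASE_URL = "http://localhost:8086"


def build_write_request(database: str, lines: list[str]) -> Request:
    """Build a POST to /write carrying one line-protocol point per line."""
    url = f"{BASE_URL}/write?{urlencode({'db': database})}"
    body = "\n".join(lines).encode("utf-8")
    return Request(url, data=body, method="POST")


def build_query_request(database: str, query: str) -> Request:
    """Build a GET to /query with the query passed in the `q` parameter."""
    url = f"{BASE_URL}/query?{urlencode({'db': database, 'q': query})}"
    return Request(url, method="GET")


write_req = build_write_request("mydb", ["cpu,host=server01 usage=0.64"])
query_req = build_query_request("mydb", "SELECT * FROM cpu LIMIT 1")
print(write_req.get_method(), write_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`; that step is left out here so the sketch runs without a server.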

The Data Model

The data model is the structure of the data that InfluxDB manages. You can think of a measurement as a table containing a set of points that usually belong to the same domain. Every point is labeled with tags and fields.

We call a set of tags the tagset. The main difference between tags and fields is the index: tags are indexed, and fields are not. Indexing tags allows you to run optimized queries that consume fewer resources. Both tags and fields are key-value pairs, but tag values are always strings, whereas field values can be floats, integers, strings, or Booleans.

Every point has a time; we call it a timestamp. We use a protocol called line protocol to describe this data model:

measurement,tag=value,tag1=value1 field=value,field1=value1 timestamp

The timestamp is optional; if it is not specified, InfluxDB assigns the server's current time.

Understanding the InfluxDB model is important in order to design a fast structure around your dataset. The combination of measurement + tagset is called a series. To identify a specific point, the right combination is measurement + tagset + timestamp.
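To make the format concrete, here is a small Python sketch that renders a point as a line of line protocol. It is a simplified formatter written for this article, not an official client: the function names are hypothetical, and it covers only the basic escaping rules (commas, spaces, and equals signs in keys and tag values, quotes in string fields).

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is written as a double-quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Render one point; the timestamp is appended only when given."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tag order does not change the series
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line("cpu", {"host": "server01", "region": "us-west"},
              {"usage": 0.64, "cores": 8}, 1556813561098000000))
# cpu,host=server01,region=us-west usage=0.64,cores=8i 1556813561098000000
```

Leaving out `timestamp_ns` produces a line without the trailing timestamp, in which case the server supplies the time.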

To keep cardinality low and to increase the performance of your InfluxDB instance, you should keep the number of series as low as possible.
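The effect of tag choice on the number of series can be sketched in a few lines of Python. This is only a model of the definition above (series = measurement + tagset), not how InfluxDB counts series internally; the helper names and the `request_id` tag are invented for illustration.

```python
def series_key(measurement: str, tags: dict) -> tuple:
    """A series is identified by its measurement plus its full tagset."""
    return (measurement, tuple(sorted(tags.items())))


def series_cardinality(points) -> int:
    """Count distinct series among (measurement, tags) pairs."""
    return len({series_key(measurement, tags) for measurement, tags in points})


hosts = ["a", "b"] * 3  # six points coming from two hosts

# Tagging by host only: every point lands in one of two series.
per_host = [("cpu", {"host": h}) for h in hosts]

# Adding a unique value as a tag: every point creates its own series.
per_request = [("cpu", {"host": h, "request_id": str(i)})
               for i, h in enumerate(hosts)]

print(series_cardinality(per_host))     # 2
print(series_cardinality(per_request))  # 6
```

Values that are unique per point, like the hypothetical `request_id` here, are usually better stored as fields so they do not multiply the number of series.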

Influx CLI

InfluxDB ships with a CLI called influx; if you install InfluxDB via apt, yum, or Docker, you will find it on your system. The CLI is the default entry point. You can do everything from there: insert points, run queries, and manage database access. It uses the HTTP API to communicate with InfluxDB.

Query Engine

InfluxDB uses a SQL-like query language. It’s a bit controversial, and there are a lot of internal conversations about where to take it in the next major release.

The benefit of this query language is the onboarding process, which is very simple since a lot of people know SQL. For complex queries, however, it can become complicated and hard to manipulate. That’s why the InfluxDB team is thinking about a different solution.

SELECT * FROM xyz WHERE time > now() - 1h LIMIT 10000
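Beyond selecting raw points, the same language can aggregate over time windows, which is where it differs most from plain SQL. The Python sketch below just builds such a query string; the function name, the `cpu` measurement, and the `usage` field are assumptions for the example, and the plain string interpolation is not safe for untrusted input.

```python
def mean_over_windows(measurement: str, field: str, lookback: str, window: str) -> str:
    """Build a query averaging `field` per `window` over the last `lookback`."""
    return (
        f'SELECT MEAN("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {lookback} GROUP BY time({window})"
    )


print(mean_over_windows("cpu", "usage", "1h", "5m"))
# SELECT MEAN("usage") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)
```

The resulting query returns one averaged value per five-minute window instead of every raw point from the last hour.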

Retention Policy

The number of series and points stored grows rapidly; that is the nature of monitoring, where you are continuously collecting and storing data. At some point, read performance will become a problem.

On the other hand, if you don’t need to keep all your data in InfluxDB forever, there is a feature called retention policy. By default, it is set to keep your data forever, but you can change it. If you set the retention policy to two weeks, all points stored under that policy will be removed after two weeks.

This helps you automatically keep your InfluxDB instance clean and fast. You can have multiple retention policies in the same database for various series. You just need to specify which one to use.
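As an illustration, the two-week policy mentioned above could be created with an InfluxQL statement like the one this Python sketch builds. The helper name, the policy name `two_weeks`, and the database name `mydb` are assumptions for the example; the replication factor of 1 assumes a single-node setup.

```python
def create_retention_policy(name: str, database: str, duration: str,
                            replication: int = 1, default: bool = False) -> str:
    """Build a CREATE RETENTION POLICY statement for the given database."""
    statement = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        # DEFAULT makes this the policy used when a write names none.
        statement += " DEFAULT"
    return statement


print(create_retention_policy("two_weeks", "mydb", "2w"))
# CREATE RETENTION POLICY "two_weeks" ON "mydb" DURATION 2w REPLICATION 1
```

The statement can then be run from the influx CLI or sent through the HTTP API.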

knoldus

Written by 

Sumit Agarwal is a Software Consultant with more than one year of experience. He has good problem-solving skills. He likes to watch football and cricket. His experience is mainly in Java and MySQL.