Prometheus is an open-source monitoring system for processing time series metric data. It collects, organizes, and stores metrics as time series, each identified by a metric name and a set of key-value labels, with a timestamp on every recorded value. DevOps teams and developers query that data using PromQL and then visualize it in a UI such as Grafana.
Basic Terminology of Prometheus
- Monitoring : Monitoring is the systematic process of collecting and recording the activity of a target project, programme or service, and then using those recorded values to check whether the target is meeting its objectives.
- Alert/Alerting : An alert is the outcome of an alerting rule in Prometheus that is actively firing. With Prometheus, you define a set of conditions and rules as PromQL expressions, which Prometheus evaluates continuously; when a rule's condition is met, the alert fires and Prometheus sends it to the Alertmanager.
- Alertmanager : The Alertmanager receives alerts from the Prometheus server, aggregates them into groups, de-duplicates them, applies silences and throttling, and then sends out notifications via email, PagerDuty, Slack, etc.
- Target : A target is an object whose metrics you want to scrape and monitor. For example, a target can be Prometheus itself, since it exposes its own metrics to be scraped, or it can be a Linux machine, a Windows machine, or your own set of applications.
- Instance : An instance is an endpoint that you can scrape, usually identified by a host:port address. For example, 18.104.22.168:5670 and 18.104.22.168:5671 are two different instances.
- Job : A job is a collection of instances with the same purpose. For example, a group of similar processes replicated for scalability or reliability and monitored together is called a job.
- Sample: A sample is a single value of a metric, retrieved at a point in time, in a time series.
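To make the relationship between these terms concrete, here is a minimal Python sketch, assuming a simplified in-memory model: a time series is identified by its metric name plus a set of labels, Prometheus attaches `job` and `instance` labels to every series it scrapes, and each sample is a timestamp-value pair. The names `SeriesID`, `Sample`, `TimeSeries`, `series_for_target` and the metric `http_requests_total` are hypothetical illustrations, not Prometheus's internal storage format.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the Prometheus data model (illustration only).

@dataclass(frozen=True)
class SeriesID:
    """A time series is identified by its metric name plus its label set."""
    name: str
    labels: frozenset  # frozenset of (label_name, label_value) pairs


@dataclass
class Sample:
    """A single value of a metric at a point in time."""
    timestamp_ms: int
    value: float


@dataclass
class TimeSeries:
    series_id: SeriesID
    samples: list = field(default_factory=list)


def series_for_target(name, job, instance, **labels):
    """Build a series ID the way a scrape would label it: the metric's own
    labels plus the automatically attached `job` and `instance` labels."""
    all_labels = dict(labels, job=job, instance=instance)
    return SeriesID(name, frozenset(all_labels.items()))


# Two instances (same host, different ports) belonging to one job.
a = series_for_target("http_requests_total", job="my_app", instance="18.104.22.168:5670")
b = series_for_target("http_requests_total", job="my_app", instance="18.104.22.168:5671")

ts = TimeSeries(a)
ts.samples.append(Sample(timestamp_ms=1_700_000_000_000, value=1.0))

print(a == b)  # False: a different instance label means a different series
print(dict(a.labels)["job"], len(ts.samples))
```

Because the `instance` label differs, the same metric scraped from the two ports ends up as two separate time series inside one job.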
The Prometheus server is the core binary of the Prometheus system. It consists of the following three parts:
- Retrieval
- Storage (time series database)
- HTTP Server
The retrieval block scrapes data from its targets, which can be any system or application, and then stores the scraped data in storage. Prometheus stores data locally in a custom time series database, and that storage can be an HDD or an SSD. The HTTP server makes the stored data available to visualization tools like Grafana, and you can query the data using PromQL over HTTP.
Prometheus mostly uses the ‘pull’ method to monitor systems. If you have deployed Prometheus to monitor your systems or applications, your applications do not need to send metrics to Prometheus, as they would with some other monitoring tools; instead, Prometheus ‘itself’ pulls (scrapes) the metrics from the targets over HTTP. Because an application only has to expose its metrics on an HTTP endpoint, this pull model gives you more flexibility when developing and changing your applications.
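The pull model can be sketched in a few lines of Python, assuming the target exposes its metrics in the Prometheus text exposition format: one `name{labels} value` line per sample, plus `#` comment lines such as `# HELP` and `# TYPE`. The helpers `parse_exposition` and `scrape` are hypothetical and deliberately simplified (no escaping rules, optional timestamps ignored), so this is not a complete parser. In a real deployment the payload would come from an HTTP GET against the target's `/metrics` endpoint; a hard-coded string stands in for that response here.

```python
import re
import time

# Simplified pattern for one sample line: name, optional {labels}, value.
# Hypothetical helper: ignores escaping and any optional trailing timestamp.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = LINE_RE.match(line)
        if m is None:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(payload, job, instance, store):
    """Pull-model step: parse what the target exposes and store each sample
    with the scrape time and the job/instance labels attached."""
    now_ms = int(time.time() * 1000)
    for name, labels, value in parse_exposition(payload):
        labels.update(job=job, instance=instance)
        key = (name, tuple(sorted(labels.items())))
        store.setdefault(key, []).append((now_ms, value))


# Stand-in for the body of GET http://<target>/metrics (illustrative data).
payload = """
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
"""

store = {}
scrape(payload, job="my_app", instance="localhost:8080", store=store)
for (name, labels), points in store.items():
    print(name, dict(labels), points[-1][1])
```

Note that the target only renders text; deciding when to scrape and attaching the `job` and `instance` labels happen on the Prometheus side, which is what makes the model "pull".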
Some components, such as short-lived service-level batch jobs, cannot be scraped, because they may finish before Prometheus gets a chance to scrape them. The Pushgateway is used to monitor these kinds of jobs. It allows short-lived batch jobs to ‘push’ their time series to an intermediary service, and Prometheus then uses its usual pull approach to fetch those pushed metrics from the Pushgateway. In other words, the jobs push metrics into an intermediate entity, and Prometheus pulls those metrics from there into its core.
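As a rough sketch of the push side, the Python below builds (but does not send) a request to the Pushgateway's HTTP API, which accepts metrics in the text format at `/metrics/job/<job_name>`, optionally followed by `/<label_name>/<label_value>` grouping pairs; 9091 is the Pushgateway's default port. The gateway address, the job name `nightly_backup`, the metric `backup_last_success_timestamp_seconds` and the helper `build_push_request` are all hypothetical. In practice, client libraries usually wrap this for you (the Python `prometheus_client` package, for instance, offers `push_to_gateway`).

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics_text, grouping=None):
    """Build (but do not send) a POST to a Pushgateway's
    /metrics/job/<job>{/<label>/<value>} endpoint.

    `gateway` is assumed to look like "http://localhost:9091".
    Note: this sketch does not handle label values containing '/'."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += "/" + urllib.parse.quote(name, safe="")
        path += "/" + urllib.parse.quote(value, safe="")
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=metrics_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical batch job reporting when it last succeeded.
# The body must end with a newline, like any text-format payload.
body = (
    "# TYPE backup_last_success_timestamp_seconds gauge\n"
    "backup_last_success_timestamp_seconds 1700000000\n"
)
req = build_push_request("http://localhost:9091", "nightly_backup", body, {"instance": "db1"})
print(req.get_method(), req.full_url)

# To actually push (requires a running Pushgateway):
# urllib.request.urlopen(req)
```

Prometheus would then be configured to scrape the Pushgateway like any other target, which is how the pushed values end up in its storage.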
To make Prometheus aware of your targets, you have two options. The first is to hard-code your targets in the configuration, which is not the best approach, as you may have a long list of dynamic targets that grows over time. The second is service discovery.
In real-world environments, we usually keep an inventory database of our applications and machines. These inventories may be based on DNS, Kubernetes, Consul, or EC2, and Prometheus has integrations with these common service discovery mechanisms. So whenever a change is made in the inventory, for example when a node is added, Prometheus automatically adds that node as a target and starts scraping it.
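Besides the built-in integrations (such as `kubernetes_sd_configs`, `consul_sd_configs`, `ec2_sd_configs` and `dns_sd_configs`), Prometheus supports file-based service discovery (`file_sd_configs`), which reads target groups from JSON or YAML files and applies changes as the files change. The sketch below assumes a hypothetical inventory export with `host`, `port`, `role` and `env` fields and turns it into the file_sd JSON shape: a list of objects, each with a `targets` list and a `labels` map. Port 9100 is used only because it is the Node Exporter's default.

```python
import json
from collections import defaultdict

# Hypothetical inventory, e.g. exported from a CMDB. Field names are assumptions.
inventory = [
    {"host": "10.0.0.11", "port": 9100, "role": "web", "env": "prod"},
    {"host": "10.0.0.12", "port": 9100, "role": "web", "env": "prod"},
    {"host": "10.0.0.21", "port": 9100, "role": "db", "env": "prod"},
]


def to_file_sd(machines):
    """Group inventory entries into file_sd target groups:
    a list of {"targets": [...], "labels": {...}} objects."""
    groups = defaultdict(list)
    for m in machines:
        groups[(m["role"], m["env"])].append(f'{m["host"]}:{m["port"]}')
    return [
        {"targets": sorted(targets), "labels": {"role": role, "env": env}}
        for (role, env), targets in sorted(groups.items())
    ]


target_groups = to_file_sd(inventory)
print(json.dumps(target_groups, indent=2))

# Writing this to a file referenced by file_sd_configs lets Prometheus pick up
# added or removed machines without editing its main configuration:
# with open("targets.json", "w") as f:
#     json.dump(target_groups, f)
```

Regenerating the file whenever the inventory changes gives roughly the same "node added, node scraped" behaviour for inventories that have no native integration.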
Prometheus Web UI
Using the Prometheus web UI, you can request raw data with Prometheus's query language, PromQL. Prometheus also has built-in support for producing graphs from metrics, and you can integrate Grafana to create your own dashboards. In addition, Prometheus exposes an HTTP API, which gives you the freedom to use 3rd-party API clients to do custom things with metrics.
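For a custom client, the usual entry point is the instant-query endpoint `/api/v1/query`, which takes a `query` parameter (and an optional `time`) and returns JSON with `status` and `data` fields; an instant-vector result lists each series' labels under `metric` and a `[timestamp, "value"]` pair under `value`, with the value encoded as a string. The sketch assumes Prometheus on its default port 9090 at `localhost`; the helper names and the sample response are illustrative, not real data.

```python
import json
import urllib.parse


def instant_query_url(base, promql, at=None):
    """Build a GET URL for the /api/v1/query (instant query) endpoint."""
    params = {"query": promql}
    if at is not None:
        params["time"] = str(at)  # evaluation timestamp in Unix seconds
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def vector_to_rows(response):
    """Flatten an instant-vector response into (labels, timestamp, value)."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    rows = []
    for series in data["result"]:
        ts, value = series["value"]  # [unix_time, "value as a string"]
        rows.append((series["metric"], ts, float(value)))
    return rows


url = instant_query_url("http://localhost:9090", 'up{job="prometheus"}')
print(url)
# Against a live server: json.load(urllib.request.urlopen(url))

# Illustrative response body in the documented JSON shape (not real data).
sample_body = """{"status": "success", "data": {"resultType": "vector",
  "result": [{"metric": {"__name__": "up", "job": "prometheus",
  "instance": "localhost:9090"}, "value": [1700000000.123, "1"]}]}}"""
for labels, ts, value in vector_to_rows(json.loads(sample_body)):
    print(labels["instance"], ts, value)
```

Converting the string value with `float()` is left to the client because PromQL values can also be `NaN` or `+Inf`, which JSON numbers cannot represent.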
The Prometheus server pushes alerts to the Alertmanager. After receiving alerts from one or more Prometheus servers, the Alertmanager aggregates them into groups, de-duplicates them, applies silences and throttling, and finally sends out notifications to email, PagerDuty, Slack, or other services.
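The grouping, de-duplication and silencing steps can be illustrated with a toy Python model. This is a hypothetical simplification, not how Alertmanager is implemented: alerts are plain label dictionaries, two alerts with identical label sets count as duplicates (as happens when two Prometheus servers in a high-availability pair fire the same rule), a silence is just a predicate on labels, and each resulting group stands for one notification. Throttling (waiting and repeat intervals) is not modelled.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname",), silenced=lambda labels: False):
    """Toy model of the Alertmanager pipeline: de-duplicate identical alerts,
    drop silenced ones, and group the rest by the `group_by` labels."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = frozenset(labels.items())  # identical label sets = same alert
        if fingerprint in seen or silenced(labels):
            continue
        seen.add(fingerprint)
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts, e.g. from two Prometheus servers running the same rules.
incoming = [
    {"alertname": "InstanceDown", "instance": "10.0.0.11:9100", "severity": "page"},
    {"alertname": "InstanceDown", "instance": "10.0.0.11:9100", "severity": "page"},  # duplicate
    {"alertname": "InstanceDown", "instance": "10.0.0.12:9100", "severity": "page"},
    {"alertname": "DiskFull", "instance": "10.0.0.21:9100", "severity": "ticket"},
]

notifications = group_alerts(
    incoming,
    group_by=("alertname",),
    silenced=lambda labels: labels["alertname"] == "DiskFull",  # an active silence
)
for key, members in notifications.items():
    print(dict(key), "->", len(members), "alert(s)")
```

Grouping by `alertname` is why an outage affecting many instances can produce a single notification listing all of them instead of one message per instance.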
Prometheus Metric Types
- Counters : A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase, or be reset to zero on restart. Counters are mainly used to track how often something happens, such as how often a particular code path is executed. For example, you can use a counter to represent the number of requests served, tasks completed, or errors encountered. Counters have one key method, inc(), which by default increments the counter value by one; you can also pass it any non-negative amount.
- Gauge : A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. Gauges represent a snapshot of some current state. For example, gauges are typically used for measured values like temperature, current memory usage, the number of active threads, or anything else whose value can go both up and down. Gauges have three main methods: inc(), dec(), and set(). The set() method sets the gauge to any arbitrary value, while inc() and dec() increment or decrement the value, by one by default.
- Summary : A summary samples observations such as request durations (how long your application took to respond to a request, i.e. its latency) and response sizes. A summary tracks the number of events and their total size, and it can also calculate configurable quantiles on the client side. Summaries have one primary method, observe(), to which you pass the size of the event.
- Histogram : A histogram also samples observations such as request durations or response sizes, but it counts these observations into configurable buckets, alongside the total count and sum of the observed values. The instrumentation for histograms is the same as for summaries. The main purpose of using a histogram is calculating quantiles, which PromQL does on the server side with the histogram_quantile() function.
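To show how the four metric types behave, here is a small Python sketch using toy classes that mimic the methods described above (`inc`, `dec`, `set`, `observe`). Official client libraries, such as the Python `prometheus_client` package, provide real `Counter`, `Gauge`, `Summary` and `Histogram` classes; the ones below are hypothetical stand-ins written only to illustrate the semantics. `estimate_quantile` mirrors the idea behind PromQL's `histogram_quantile()`: find the bucket containing the target rank and interpolate linearly inside it, assuming non-negative observations and `0 < q <= 1`. A summary would keep the same sum and count as the histogram, but without buckets.

```python
import bisect
import math


class Counter:
    """Only goes up (or resets to zero when the process restarts)."""
    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Can go up and down, or be set to an arbitrary value."""
    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations into cumulative buckets (upper bound inclusive)
    and also tracks their sum and count."""
    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]  # last bucket is +Inf
        self.buckets = [0] * len(self.bounds)      # cumulative counts
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Cumulative buckets: increment every bucket whose upper bound >= value.
        for i in range(bisect.bisect_left(self.bounds, value), len(self.bounds)):
            self.buckets[i] += 1


def estimate_quantile(q, bounds, cumulative):
    """Estimate quantile q by linear interpolation inside the bucket that
    contains the target rank (similar in spirit to histogram_quantile())."""
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    i = next(j for j, c in enumerate(cumulative) if c >= rank)
    if bounds[i] == math.inf:
        return bounds[i - 1]  # cannot interpolate into the +Inf bucket
    lower = bounds[i - 1] if i > 0 else 0.0
    below = cumulative[i - 1] if i > 0 else 0
    return lower + (bounds[i] - lower) * (rank - below) / (cumulative[i] - below)


requests = Counter()
requests.inc()
requests.inc(5)

in_flight = Gauge()
in_flight.inc()
in_flight.inc()
in_flight.dec()
in_flight.set(7)  # overwrite with an arbitrary value

latency = Histogram([0.1, 0.25, 0.5, 1.0])  # bucket bounds in seconds
for seconds in [0.05, 0.2, 0.3, 0.4, 0.8]:
    latency.observe(seconds)

print(requests.value, in_flight.value, latency.buckets, latency.count)
print(estimate_quantile(0.5, latency.bounds, latency.buckets))
```

The estimated median (0.3125 s) differs from the exact median of the five observations (0.3 s) because only bucket counts are kept; finer buckets around the latencies you care about make the estimate closer.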
Prometheus is a very powerful tool for collecting and querying metric data. Using PromQL through the Prometheus web UI or tools like Grafana, we can query and analyse the differences between metric samples over time to model how the data changes. Its ease of use, versatility, and wide range of integrations make it a favorite in the monitoring and alerting world.