Prometheus Query Language Explained


Prometheus is a very powerful tool, but using it well requires mastering its specialized query language for exploring time series. The language is not very difficult to learn, but it does have many layers of complexity. In this post we will explain how to build PromQL queries that expose the insights in your data.

First I will explain each of the different parts of a Prometheus query, starting with the basic types, then how to access data using tag filtering, and finally some operators and functions that can be applied to these metrics.

Prometheus metrics types

Time series in Prometheus can be measured using three main techniques:

  1. An always-increasing count of events, also known as counters: number of visits to a server, error counts, CPU use as units of time used… Counters make several techniques easy to apply. For example, you can get the rate at which the value changes, or, if the value suddenly decreases, infer that the server doing the measurement has been restarted (check the resets function).

  2. Gauges, which hold values that can go up and down, like current memory usage.

  3. Finally, histograms and summaries cover more advanced uses where detailed statistics of the data are needed, such as quantile aggregations, sums, counts and so on. These are “complex” types that use several gauges and/or counters to expose the data, and can be consumed by advanced functions. We will not explore them in this article.
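The counter behaviour described in the first item can be sketched in a few lines of Python. This is a simplified model, not Prometheus code: the function names and the sample values are made up for illustration, and the real rate and increase functions also extrapolate to the window boundaries, which is ignored here.

```python
def count_resets(samples):
    """Count how many times a counter decreased between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)


def total_increase(samples):
    """Sum the increases, treating any decrease as a restart from zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter starts again at 0, so the whole
        # current value counts as new increase.
        total += cur if cur < prev else cur - prev
    return total


visits = [10, 15, 22, 3, 8]    # hypothetical data; the drop from 22 to 3 suggests a restart
print(count_resets(visits))    # 1
print(total_increase(visits))  # 5 + 7 + 3 + 5 = 20.0
```

A gauge cannot be treated this way, because a decrease in a gauge is a legitimate value and not a sign of a restart.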

These metrics provide scalar values (just a number). Periodically, Prometheus polls the value of each metric for each combination of tags; for example, the number of requests by type and return code. It then attaches a timestamp to each sample and stores the data.

When querying there are three main data types: scalars, vectors of scalars (called instant vectors in the Prometheus documentation), and vectors of vectors of scalars (timespans, called range vectors in the Prometheus documentation).

Anatomy of a Prometheus Metric

The first thing to select in any query is the metric, and which specific data of that metric we want to analyze.

node_memory_MemFree{instance="web"}[5m] offset 1m
  |                         |        |     |
  |                         |        |     +- Offset value
  |                         |        +- Timespan range selector
  |                         +- Filtering selector
  +- Time series metric selection

You first choose which time series metric you are getting data from, then you can apply a filter selector based on the tags, and finally add a timespan range selector. You can also set an offset to get data from some time in the past, instead of the data point at that moment.

Conceptually this query does this:

1. Grab the time series for the node_memory_MemFree (nmM)

timestamp              5   6   7   8   9
nmM{instance="web"}    5   6   7   8   9
nmM{instance="test"}  20  20  20  20  20
nmM{instance="mail"}  30  30  30  30  30

Each of the tagged rows is a vector of scalars, with each cell being a scalar.

2. Filter to keep only the series whose instance tag is web

timestamp              5   6   7   8   9
nmM{instance="web"}    5   6   7   8   9

There are many possible selector operations:

Matcher  Description
=        Tag value matches exactly
!=       Tag value does not match
=~       Tag value matches a regular expression
!~       Tag value does not match a regular expression
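To make the four matchers concrete, here is a minimal Python sketch that applies them to the three example series. The helper names matches and select are hypothetical. Two details are modelled on purpose: Prometheus regular expressions are fully anchored, so re.fullmatch is used, and a missing tag behaves like an empty value. Python's re module is only an approximation of the RE2 syntax that Prometheus uses.

```python
import re


def matches(labels, name, op, value):
    """Evaluate one label matcher against the label set of a series."""
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


series = [{"instance": "web"}, {"instance": "test"}, {"instance": "mail"}]


def select(op, value):
    """Return the instance names of the series accepted by the matcher."""
    return [s["instance"] for s in series if matches(s, "instance", op, value)]


print(select("=", "web"))        # ['web']
print(select("!=", "web"))       # ['test', 'mail']
print(select("=~", "web|mail"))  # ['web', 'mail']
print(select("!~", "w.*"))       # ['test', 'mail']
```

Because of the anchoring, a pattern such as "we" would not select the web instance; it would have to be written as "we.*".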

3. For each of these series, create a range with the data of the last 5 minutes.

timestamp            5          6          7          8          9
nmM{instance="web"}  5,4,3,2,1  6,5,4,3,2  7,6,5,4,3  8,7,6,5,4  9,8,7,6,5

This generates a vector of vectors of scalars. We will call them timespans, as they span many values over a time range. We will see how to use them in the Advanced functions section.

4. Offset the data 1 minute. So it does not show current data, but from 1 minute ago.

timestamp            5        6          7          8          9
nmM{instance="web"}  4,3,2,1  5,4,3,2,1  6,5,4,3,2  7,6,5,4,3  8,7,6,5,4

Finally, for each time point we now have 5 minutes of data, starting from 1 minute ago. If there is not enough data, as at timestamp 5, the range simply contains as much as is available.
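Steps 3 and 4 can be reproduced with a small Python sketch. It assumes one sample per minute whose value equals its timestamp, as in the tables above, and a window that excludes its oldest boundary; the function name range_window is hypothetical.

```python
def range_window(samples, at, span, offset=0):
    """Return the values in the window (at - offset - span, at - offset], newest first."""
    end = at - offset
    start = end - span
    picked = [(ts, value) for ts, value in samples.items() if start < ts <= end]
    return [value for ts, value in sorted(picked, reverse=True)]


# One sample per minute; the value equals the timestamp, as in the tables above.
web = {ts: ts for ts in range(1, 10)}

print(range_window(web, at=9, span=5))            # [9, 8, 7, 6, 5]
print(range_window(web, at=9, span=5, offset=1))  # [8, 7, 6, 5, 4]
print(range_window(web, at=5, span=5, offset=1))  # [4, 3, 2, 1]
```

The last call shows the case where there is not enough history: the window is simply shorter.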

PromQL Operations

With PromQL you can combine several time series using simple mathematical expressions. These expressions are applied in parallel to each time point of the related metrics.

For example, to get the fraction of free memory (multiply it by 100 for a percentage), you can use:

node_memory_MemFree / node_memory_MemTotal

As time series:

timestamp            5     6     7     8     9
nmM{job="web"}      10     9     8     9    11
nmM{job="test"}      5     4     2     6     1
/
nmT{job="web"}      20
nmT{job="test"}     10
=
result{job="web"}    0.5   0.45  0.4   0.45  0.55
result{job="test"}   0.5   0.4   0.2   0.6   0.1

Normally you don’t need much more than this, but PromQL has many more possibilities.

Inner workings and advanced operations

Internally, operators are applied to each value of both time series. If the samples do not line up, for example because of downtime on one side or different polling periods, the system uses the most recent previous value on each side.
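A minimal Python sketch of this "use the previous value" rule follows. The function value_at and the sample data are hypothetical. Prometheus also limits how far back it looks for a sample (5 minutes by default), which is modelled here with the lookback parameter; the exact boundary handling is an assumption.

```python
import bisect


def value_at(samples, t, lookback=300):
    """Latest value at or before time t, or None if it is older than `lookback` seconds.

    `samples` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    i = bisect.bisect_right(timestamps, t)
    if i == 0:
        return None  # no sample at or before t
    ts, value = samples[i - 1]
    return value if t - ts <= lookback else None


# Hypothetical data: free memory polled every 60 s, total memory every 120 s.
free = [(0, 10.0), (60, 9.0), (120, 8.0)]
total = [(0, 20.0), (120, 20.0)]

# At t=60 there is no fresh sample for total, so the one from t=0 is reused.
print(value_at(free, 60) / value_at(total, 60))  # 0.45
print(value_at(total, 500))                      # None
```

The second call returns nothing because the last known sample is too old to be considered current.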

An expression like the example above generates a new set of time series, where the series with the same set of tags on each side are matched and the result is calculated.

This matching has several implications; for example, if one side has a set of tags that does not appear on the other side, the result will not contain that set of tags.

For example, to get the fraction of free memory only for the instance web, it is enough to filter one side. The MemTotal series of the other instances find no partner on the left and are dropped, so the result is the same as if {instance="web"} were also applied to MemTotal:

node_memory_MemFree{instance="web"} / node_memory_MemTotal

These two operations are also equivalent:

node_memory_MemFree / node_memory_MemTotal{instance="web"}
node_memory_MemFree{instance="web"} / node_memory_MemTotal{instance="web"}
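The equivalence of these expressions follows from how series are paired. The Python sketch below models an instant vector as a dictionary keyed by label set and keeps only the pairs whose label sets are identical; the metric name takes no part in the matching. All names and values are made up for illustration.

```python
def labelset(**labels):
    """Build a hashable label set usable as a dictionary key."""
    return frozenset(labels.items())


def divide(left, right):
    """Divide two instant vectors, pairing samples whose label sets are identical."""
    result = {}
    for labels, value in left.items():
        if labels in right:  # series without a partner are dropped
            result[labels] = value / right[labels]
    return result


def only_web(vector):
    """Model the {instance="web"} selector."""
    return {k: v for k, v in vector.items() if ("instance", "web") in k}


mem_free = {labelset(instance="web"): 10.0, labelset(instance="test"): 5.0}
mem_total = {labelset(instance="web"): 20.0, labelset(instance="test"): 10.0}

a = divide(only_web(mem_free), mem_total)
b = divide(mem_free, only_web(mem_total))
c = divide(only_web(mem_free), only_web(mem_total))
print(a == b == c)       # True
print(list(a.values()))  # [0.5]
```

Filtering either side, or both, leaves the same single pair to divide.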

This also means that if you try to combine two time series whose tags do not match, the result will normally be empty, so you may want to force the match on a set of tags, or to ignore some tags.

http_errors{code="500"} / ignoring(code) http_requests
http_errors{code="500"} / on(method) http_requests

In this database, http_errors stores the code and method tags, and http_requests stores only the method tag.

The first expression ignores the code label on the left side, so it can be matched with the data on the right; without ignoring(code) there would be no matches, as http_requests does not store the return code, only the request type.

The second expression forces the match on the method tag only, so we get new time series that carry just that common tag.

Same result, different paths.
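Both modifiers can be modelled as a change in the key used to pair the series. The following Python sketch is an illustration with made-up values, not Prometheus code. One simplification to keep in mind: if two series on the same side end up with the same key, Prometheus reports an error for a one-to-one match, while this sketch silently keeps the last one.

```python
def match_key(labels, on=None, ignoring=()):
    """Reduce a label set to the labels that take part in matching."""
    if on is not None:
        return frozenset((k, v) for k, v in labels.items() if k in on)
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def divide(left, right, on=None, ignoring=()):
    """One-to-one division of two lists of (labels, value) samples."""
    right_by_key = {match_key(labels, on, ignoring): value for labels, value in right}
    out = []
    for labels, value in left:
        key = match_key(labels, on, ignoring)
        if key in right_by_key:
            # The result keeps only the labels that took part in the match.
            out.append((dict(key), value / right_by_key[key]))
    return out


http_errors = [({"method": "get", "code": "500"}, 5.0),
               ({"method": "post", "code": "500"}, 2.0)]
http_requests = [({"method": "get"}, 100.0), ({"method": "post"}, 50.0)]

print(divide(http_errors, http_requests))                      # []
print(divide(http_errors, http_requests, ignoring=("code",)))  # [({'method': 'get'}, 0.05), ({'method': 'post'}, 0.04)]
print(divide(http_errors, http_requests, on=("method",)))      # same as the previous line
```

The first call returns nothing because the label sets differ; the other two arrive at the same pairs by removing code or by keeping only method.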

All the usual suspects are available as operators: +, -, *, /, %, ^, and, or. You can learn more about PromQL operations on the documentation page.

PromQL functions

You can also apply functions such as avg, sum, max, min… to your data. For example, to get the highest memory usage across your full cluster of machines, use:

max(node_memory_MemTotal - node_memory_MemFree)

It is possible to fan out the aggregation functions to give results partitioned by a tag.

For example, to group by jobs:

max by (job) (node_memory_MemTotal - node_memory_MemFree)
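What the by clause does can be sketched in Python as a group-and-reduce over the samples. The function max_by, the instance names and the values are hypothetical, and the input stands for the already computed used memory per series.

```python
from collections import defaultdict


def max_by(vector, by=()):
    """Aggregate (labels, value) samples with max, keeping only the `by` labels."""
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: max(values) for key, values in groups.items()}


used_memory = [
    ({"job": "web", "instance": "web1"}, 6.0),
    ({"job": "web", "instance": "web2"}, 9.0),
    ({"job": "db", "instance": "db1"}, 12.0),
]

print(max_by(used_memory))               # {(): 12.0}
print(max_by(used_memory, by=("job",)))  # {(('job', 'web'),): 9.0, (('job', 'db'),): 12.0}
```

Without by, every series falls into a single group with no tags; with by (job), there is one result per job and all other tags are dropped.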

Most common functions include:

Function  Description
min       Minimum value of a metric across all tags
max       Maximum value of a metric across all tags
avg       Average value of a metric across all tags
time      Seconds since January 1st, 1970 (Unix time)
resets    Number of times a counter was reset over a timespan

There are a lot of functions in Prometheus. Check out the reference documentation for the full list.

Advanced functions

As we saw in the anatomy of a PromQL expression, it is also possible to gather timespans of scalars, which are lists of lists of values over some time. This is useful to calculate statistics that apply over time, namely rates.

For example one of the most useful expressions when monitoring servers is this one:

100 - (avg by (instance) (irate(node_cpu{job="node",mode="idle"}[5m])) * 100)

It uses the time range selector [5m] to calculate the rate of CPU idleness over all the CPUs assigned to the job node, and averages it grouping by instance. This aggregation abstracts over the number of CPUs each server may have, the number of nodes assigned to the job and any other possible tag. It then “reverses” the value to provide the CPU usage percentage.

If, for example, you are more interested in the maximum CPU usage of any server, take the maximum of the per-instance usage:

max(100 - (avg by (instance) (irate(node_cpu{mode="idle"}[1m])) * 100))

These functions can be applied over a timespan of values:

Function       Description
min_over_time  Minimum value over a timespan
max_over_time  Maximum value over a timespan
avg_over_time  Average value over a timespan
irate          Per-second instant rate, calculated from the last two samples of a timespan
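As a rough illustration of how these functions consume a timespan, here is a Python sketch of avg_over_time and irate. It is a simplified model: the sample data is invented, and details such as staleness handling are left out.

```python
def avg_over_time(window):
    """Average of the values in a window of (timestamp, value) samples."""
    return sum(value for _, value in window) / len(window)


def irate(window):
    """Per-second rate computed from the last two samples of a counter window."""
    if len(window) < 2:
        return None  # not enough data to compute a rate
    (t1, v1), (t2, v2) = window[-2], window[-1]
    delta = v2 if v2 < v1 else v2 - v1  # a decrease is treated as a counter reset
    return delta / (t2 - t1)


# Hypothetical idle CPU counter, in seconds, sampled every 15 seconds.
idle = [(0, 100.0), (15, 112.0), (30, 124.0), (45, 131.5)]

print(irate(idle))              # (131.5 - 124.0) / 15 = 0.5
print(100 - irate(idle) * 100)  # 50.0
print(avg_over_time(idle))      # 116.875
```

An idle rate of 0.5 means the CPU spent half of the last interval idle, which the 100 - … * 100 step turns into 50% usage, the same "reversal" used in the CPU query above.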

Closing remarks

There are many details to the PromQL language and mastering it is not an easy task, but it’s not an impossible one.

If you are planning to use Prometheus in the near future and you are looking for beautiful dashboards to show your data, or easier installation, or improved integration, or a way to connect your data to other sources that are not yet using Prometheus, or you are more comfortable using SQL, or you want an advanced automation system, or all of the above, give Serverboards a try.

It’s Open Source and simplifies many of the management tasks around Prometheus, letting you play with the insides of PromQL where required.