Wednesday, August 16

The WITHIN GROUP and FILTER SQL clauses of PostgreSQL 9.4

PostgreSQL 9.4 extends the SQL standard by inserting two new clauses that facilitate many operations required during the development of applications: the WITHIN GROUP and FILTER clauses.

within-group-and-filter


The WITHIN GROUP clause

The WITHIN GROUP clause is particularly useful when performing aggregations on ordered subsets of data.

PostgreSQL introduced window functions since version 9.0 in order to work on subsets of data that can be correlated to each current record of tables, defining a sort of “aggregates” centred on any specific record as the query is gradually executed via the SQL OVER(PARTITION BY/ORDER BY) clause and by using the functions that can be performed on those aggregations.

With version 9.4 of PostgreSQL the SQL WITHIN GROUP clause was introduced: this simplified many operations that had previously only been possible with the use of the window functions, defining aggregations of ordered subsets of data.
In addition, new functions were introduced that can be applied to these subsets and expand the collection of available window
functions:

  • percentile_cont(), percentile_disc() for the calculation of
    percentiles;
  • mode() a statistical function that calculates the mode on ordered
    subsets;
  • rank(), dense_rank(), percent_rank(), cume_dist(): window functions already available in PostgreSQL to be executed on the subsets obtained using the OVER (PARTITION BY/ORDER BY) clause and now able to take as a parameter ordered subsets produced by the WITHIN GROUP clause.

To better clarify the situation, let’s say, for example, that we want to calculate the 25th, the 50th, the 75th and the 100th percentile of the first 20 integers. Until now, this was only possible by dividing the numbers into 4 sets via the OVER (PARTITION BY/ORDER BY) clause, then order them internally into 4 ordered subsets from which to then take the maximum value – for example, by using a CTE:

$ CREATE TABLE t AS SELECT generate_series(1,20) AS val;

$ WITH subset AS (
    SELECT val,
       ntile(4) OVER (ORDER BY val) AS tile
    FROM t
  )
  SELECT max(val)
  FROM subset GROUP BY tile ORDER BY tile;

   max
  ------
   5
  10
  15
  20
 (4 rows)

With PostgreSQL 9.4 everything is reduced to a single SQL command, resulting in significant advantages in terms of readability of the scripts and execution of the commands:

$ CREATE TABLE t AS SELECT generate_series(1,20) AS val;

$ SELECT unnest(percentile_disc(array[0.25,0.5,0.75,1])
    WITHIN GROUP (ORDER BY val))
  FROM t;

   max
  ------
   5
  10
  15
  20
 (4 rows)

The FILTER clause

This second clause of SQL commands is useful when you want to apply filters on subsets of data without necessarily
performing aggregations.
For example, it is now possible to perform a total count of the records of a table and also a partial code>count of one of its subsets that satisfies a certain condition (expressed by the WHERE clause) within a single query, without having to use further ones to be performed on aggregations:

$ SELECT count(*) count_all,
         count(*) FILTER(WHERE bid=1) count_1,
         count(*) FILTER(WHERE bid=2) count_2
  FROM pgbench_history;

 count_all | count_1 | count_2
 ----------+---------+---------­­­­­­­­­
      7914 |     758 |     784
 (1 row)

In this particular case, this also simplifies the readability of scripts and improves execution performances.


Conclusions

The extension of the SQL standard through the introduction of these new clauses further facilitates the work of developers, who are increasingly able to delegate manipulation and aggregation of subsets of data to the database.
By using the WITHIN GROUP clause, the management of subsets of data that can be ordered becomes easier through the introduction of new window functions.
The FILTER clause helps to manage subsets of data that meet certain conditions, thereby avoiding aggregations.



Last updated 2015-05-21 10:51:41 CEST

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *