Monday, June 26

PostgreSQL’s CTEs are optimisation fences

As part of AXLE project work I recently found myself re-examining issues around optimisation fencing and non-semantic query execution changes in PostgreSQL. One key issue was the inability to use a CTE where optimisation (inlining, push-up/pull-down, flattening, etc) was desired. I’d like to explain that in more detail, as it’s something that can surprise new and experienced PostgreSQL users alike.

When optimising queries in PostgreSQL (true at least in 9.4 and older), it’s worth keeping in mind that – unlike newer versions of various other databases – PostgreSQL will always materialise a CTE term in a query.

This can have quite surprising effects for those used to working with DBs like MS SQL:

  • A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile; and
  • You cannot UPDATE or DELETE FROM a CTE term, because it’s more like a read-only temp table rather than a dynamic view.

It means that these two queries are surprisingly different:

WITH aggs(k,n) AS (SELECT x, count(x) FROM big_table GROUP BY x)
   k, n
FROM aggs
WHERE k = 42;


   k, n
FROM (SELECT x, count(x) FROM big_table GROUP BY x) aggs(k,n)
WHERE k = 42;

In other DBs they’d generally be different ways of writing the same thing, but in PostgreSQL the former will read and aggregate the whole table, then throw most of the work away, wheras the latter will generally get turned into:

   k, n
FROM (SELECT x, count(x) FROM big_table GROUP BY x WHERE x = 42) aggs(k,n)

by the query optimiser and only read a small subset of the underlying table.

As there is significant opposition to changing this behaviour, you may find that you want to stick to using subqueries in FROM even when a CTE would be a lot clearer to read.

Essentially, PostgreSQL currently uses CTEs as query hints in order to avoid having actual query hints like Oracle’s /*+ materialize */. So the semantics of the query are conflated with execution policy about the query.

Because people have been advised to use CTEs as optimisation fences to work around planner issues in the past:

Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.

… it’s going to be more difficult to get this changed in future, though it doesn’t look like the SQL spec requires CTEs to behave this way.

Note that PostgreSQL also uses another hint technique for blocking qualifier pull-up or push-down, the “offset 0” hack, e.g.:

   k, n
FROM (SELECT x, count(x) FROM big_table GROUP BY x OFFSET 0) aggs(k,n)
WHERE k = 42;

… which also acts as an optimisation fence, but is an obviously quirky anomaly that doesn’t serve any other purpose, unlike WITH, so it stands out as a hint way to modify execution without changing query semantics.

There are a few more like it, too.

If this CTE quirk bites you, feel free to leave a comment here or drop a note on pgsql-general mentioning how you ran into the issue. Probably calm down a bit before writing one like this though.

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633


  • RobJ

    I’m a developer/DBA, not a Postgres hacker. To me, it seems natural that a with-query would be materialized, because it could be used for other with-queries in the same statement, not just the main SELECT statement. However, if the DBMS knows that a faster execution plan would merge the with query into the statement, it makes sense to allow it to do so.

    If merging optimization is a worthy goal, then this sounds like a good candidate for a postgresql.conf setting. The first release to include the setting would default it to the current fencing behavior, then a later release would default it to allow the merging behavior in a later release, and a final release would remove the setting and always allow merging. Calm folks could settle on the exact timeline.

    • That’s my view as well, since we’re stuck with the fence as a feature. I seem to recall significant opposition to doing that last time it came up on -hackers; I’d need to re-read the whole thread to find it.

      (On second thoughts, a GUC would be an unwise choice. It can only affect the whole query, not a part of it. It isn’t easy to set it to affect just that query. And we’ve generally experienced pain with “big hammer” planner GUCs in the past.)

        • wCTEs must obviously be fenced, but that’s no good reason for a SELECT to be. We know at parse time whether a CTE term is a wCTE or not.

          That isn’t true of functions with side effects in a regular SELECT term in a CTE. Though that isn’t really greatly different to subqueries, views, etc – we’re not, AFAIK, particularly careful to avoid pullup/pushdown of quals if it might affect the rows affected by VOLATILE functions.

          I suppose in an ideal world we’d be able to mark functions as “VOLATILE READ ONLY” vs “VOLATILE READ WRITE” (or something) then scan the parse tree for r/w functions. But we haven’t had it so far and we still inline views and all sorts of other things without it.

          Ideally (to me) a SELECT term in a CTE should work just like a view that’s only visible for the life of the query – complete with being simply updatable if applicable. I’m not even sure it’s complicated to implement and I’ve done enough work in the rewriter that I should have a go.

          If people want them to be materialized they should use our materialize hint OFFSET 0 hack.

  • Jon Erdman

    You might want to point out that the CTEs themselves are now writable (i.e. you can do an UPDATE/DELETE/INSERT inside the CTE itself, which can do some pretty interesting things when combined with RETURNING).

  • Josh Kupershmidt

    One notable exception to the “(non-writeable) CTEs are always materialized” rule is that if a non-writeable CTE node is not referenced anywhere, it won’t actually be evaluated. For example, this query returns 1 instead of bombing out:

    WITH not_executed AS (SELECT 1/0),
    executed AS (SELECT 1)
    SELECT * FROM executed;

  • Noah Yetter

    Given that the Postgres development group stubbornly and foolishly refuses to implement hints, this is fully necessary. If CTEs *didn’t* act as an optimization fence, there would be essentially no way to hand-tune queries in Postgres.

    • Actually, I showed how to do just that above: the OFFSET 0 hack prevents pushdown/pullup too. Unlike CTEs it has no other utility – it really is a hint, it just isn’t called a hint.

      So I disagree that CTE fencing is the only way to achieve such tweaks.

  • Len Jaffe

    Why the subqueries at all?

    Why aren’t you just writing

    select x, count(*) from table where x = 42 group by x; ?

    • I needed to construct a simple query that’d demonstrate the behaviour without being complex or confusing enough to detract from the explanation.

      Most of the real queries where this becomes a problem are complicated enough that they’d need a fair bit of explanation that’s unrelated to the topic at hand.

      You’re correct that you can just flatten it by hand. That’s not the point of the article, though.

  • ypercube

    Minor typo (WHERE – GROUP BY order). The:

    FROM (SELECT x, count(x) FROM big_table GROUP BY x WHERE x = 42) aggs(k,n)

    should be:

    FROM (SELECT x, count(x) FROM big_table WHERE x = 42 GROUP BY x) aggs(k,n)

  • I was bitten by this. :-)

    Professionally I work with SQL Server, but I’m using PostgreSQL in an academic project to perform a data quality assessment.

    PostgreSQL has some joyful programmability features like COUNT DISTINCT over multiple columns, string aggregates, and interval arithemetic. The equivalent solutions in SQL Server are cumbersome and hacky.

    But when a complex query starts to look like arrow code, I almost wish I built my project using SQL Server!

    For readability’s sake, please let’s tear down the CTE fence and flatten the arrow code!

  • ypercube

    Follow up on Josh Kupershmidt’s comment.

    This variation will raise the “division by zero” error:

    WITH a AS (SELECT 1/0),
    b AS (SELECT * FROM a),
    executed AS (SELECT 1)
    SELECT * FROM executed;

  • r90t

    Thank you for your article! Do you know if there is there any performance improvement on using with output multiple times in join condition?

    • It’s hard to say without a concrete example, especially as I don’t know what you’re comparing to. Performance improvement compared to what?

      If you’re asking whether a CTE term (“WITH output”) gets evaluated once for each time it’s referenced, the answer is no. It’s evaluated once and materialized. So it’s like using a temp table. (In PostgreSQL, that, is; other systems differ).

      Whether this is faster than evaluating it multiple times if it appears more than once in the list of relations being joined to actually depends on the other predicates. If you have 1 million rows, one join condition selects 100 and the other join condition selects 200 then you’re probably greatly better off evaluating the CTE term twice with different predicates. But only if you can push the predicates down, and only if there aren’t any top-level nodes like a sort and limit. Wheras if it’d have to evaluate the CTE term for all 1 million rows anyway it’ll be much more efficient to evaluate it once and re-use the result.

      In other words “it depends”. The problem is that right now the query planner is denied the freedom to make that choice, and by documenting this limitation as a feature we’ve painted ourselves into a corner where we can’t easily change it later.

  • ypercube

    CTEs are always materialized, fine. Does that include `ORDER BY` in the CTE? I mean a query with:

    WITH a AS (SELECT x, y FROM big_table ORDER BY x)
    SELECT *
    FROM a ;

    Can one rely that the final order will be by x? I would assume that the answer is No and the ORDER BY in the CTE can be removed by the planner but I’m not sure.

    (Related question at DBA.SE: )

    • It’s like quite a few other cases – right now PostgreSQL will always return the rows in the order they’re output by the CTE, but you shouldn’t technically rely on it. Who knows what future features could change that?

      Unfortunately IIRC it’s also not smart enough to *recognise* that the rows are ordered correctly if you add another ORDER BY in the outer query. (Haven’t tested, but pretty sure). So it’s not necessarily free to do it the “right” way.

  • TechnicalJohn

    Okay, so I get that this is different than other DBs… in that it ACTUALLY FOLLOWS THE STANDARD… And yes, I do have some instances where I wish that I could turn of the fencing…

    But I would like to take issue with the thought that there is OPPOSITION to implementing a non-fenced CTE. Instead, there are REASONS for not implementing a non-fenced CTE. From the thread you link to the reasons are valid, and the devs are being careful about implementing a solution instead of some divergence from the standard that would end up biting them, and us, later down the line.

    I’ve always appreciated this about PostgreSQL development: while they do sometimes lag behind for a little while, their careful consideration and (mostly) consistent adherence to standards ends up eventually putting them far ahead once they do settle on a real SOLUTION.

    As far as for my own issues with this, I find that it usually makes me take a step back and ask myself “why am I doing it this way?”, and often I see that I’m trying to force something into the database that shouldn’t be there.

  • Chris Cogdon

    The premise of the article is not quite true. The statement in the with query is not necessarily fully materialized. If the parent query does not refer to the CTE, then the CTE is not executed. (Exception: if the CTE is data modifying then it’s executed exactly once). If the parent query limits the number of rows (with a LIMIT statement) then only enough rows from the CTE are generated necessary to satisfy the LIMIT clause.

    However, the part that’s important is that conditions in the parent query are less likely to be pushed down into the CTE. Especially true if the CTE is referred to in more than one place.

    From the documentation (9.1 and onwards):

    “…that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)”

  • Dan

    I find this issue a bit maddening. In almost every possible way, PostgreSQL is far more developer-friendly than Oracle, and I’ve been pushing for my company to convert our flagship db from Oracle to PG…

    … but this is kind of a dealbreaker.

    We have tons and tons and tons of views written like this:

    WITH q1 as (

    q2 as (

    q3 as (

    select bar, baz
    from q1
    join q2 on (…)
    join q3 on (…)

    I personally find this linear style of subqueries/CTEs far, far easier to read and to debug than nested subqueries.

    But performance with this coding style is frequently terrible in PG, due to the CTE optimization fence. It’d be really useful to have a keyword to treat WITH-SELECT CTEs exactly the same as subqueries.

    • I couldn’t agree more. Personally I think it was a horrible mistake to do this in the first place. But we’re stuck with it to a degree.

      I’d like to try to handle optimisation across CTEs, but don’t know the planner/optimiser code well enough and I’m super busy on logical replication work. Really, we need someone to step up and fund development on this, but so far that hasn’t happened and everyone’s busy on other priorities.

  • hans

    I too think that this optimization is needed. There might have been good reasons to choose this solution when CTEs were first introduced, but I think it’s time to re-think that position 7 years later.

    What about starting a crowd-funding project for this and see how much that would yield?

    • It’s hard to imagine that working well unless you also find someone who’s keen enough to work on it for rather less than their usual rates, but also has the skills required to successfully complete the work. It’s hopelessly impractical to crowd-fund realistic consulting-development work at even discounted rates unless it’s for quite small things.

      It’s not just development, you see. You also have to convince everyone it’s a good idea, get the feature through multiple rounds of review, revise (or rewrite) it after each round of review, find a committer who agrees it should be included, etc.

      In this case you’d have to come up with a solution to the backward compatibility issues proper CTEs would introduce, given that we’ve documented our current implementation’s limitations as features for some time. It’s exceptionally unlikely that “use a GUC” (configuration option) would be acceptable, given how poorly that worked for standard_conforming_strings, bytea_output, etc. A number of committers are extremely opposed to GUCs that change query semantics, and for pretty solid reasons.

      So yeah. While I think trying to collect crowd funding pledges sounds nice, I’m not sure how practical it is. What’s really needed is someone who wants this enough to implement it or hire someone to implement it. So far everyone I’ve worked with on various porting efforts etc has just grumbled about it then adapted their queries to use subqueries in FROM or other workarounds.

Leave a Reply

Your email address will not be published. Required fields are marked *