Monday, December 11

Column Store Plans

Over at pgsql-general, Bráulio Bhavamitra asks:

I wonder if there is any plans to move postgresql entirely to a columnar store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. Completely replacing the current row-based store wouldn’t be a good idea: it has served us extremely well and I’m pretty sure that replacing it entirely with a columnar store would be disastrous performance-wise for OLTP use cases.

Some columns
Some columns. Picture courtesy of Yiming Sun on Flickr

That doesn’t mean columnar stores are a bad idea in general — because they aren’t. They just have a more limited use case than “the whole database”. For analytical queries on append-mostly data, a columnar store is a much more appropriate representation than the regular row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do a lot more than just change the underlying storage: you need to ensure that the rest of the system can take advantage of the changed representation, so that it can execute queries optimally; for instance, you may want aggregates that operate in a SIMD mode rather than one-value-at-a-time as it is today. This, in itself, is a large undertaking, and there are other challenges too.

As it turns out, there’s a team at 2ndQuadrant working precisely on these matters. We posted a patch last year, but it wasn’t terribly interesting — it only made a single-digit percentage improvement in TPC-H scores; not enough to bother the development community with (it was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you’re going to be able to say Dear server, for this table kindly set up columnar storage for me, would you? Thank you very much. And then you’re going to get a table which may be slower for regular usage but which will rock for analytics. For most of your tables the current row-based store will still likely be the best option, because row-based storage is much better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

3 Comments

  • vdp

    Isn’t that essentially what cstore_fdw already provides ? Or does its fdw nature make it hard to optimize ? What about a table with some fields stored in colums and the other fields stored in rows ?

  • Anon

    So I watched Mark Wong youtube clip summarising patched and CS comparisons..

    From what I could tell, CS1 ans CS2 were 3-4x faster for queries. XL was 160 faster than PG. so therefore XL about half the speed of CS DBs?
    I also assume that XL was based on 9.5? and doesnt have any of the 9.6 perf improvements which looks like 100% read improvement approx and 2 CPU parallel query also looks like a fairly linear 100% gain as well (not sure if XL would improve the stats if it did have 9.6 perf code in it?)

    If the 9.6 code would improve XL.. and with similar perf gain e.g. 100% gain, then it would be similar to CS for queries would it not?

    So a Column store only benefit would be the write benefits..
    So is there much benefit of a column store? wouldnt focus on improvement PG as it is be a better option?

    Also, not sure I understand the full PG history, but if Cstore_FDW already exists.. why not incorporate that and just improve that rather than reinvent the wheel? I assume it may not be easy to utilise SIMD but surely tweaking cstore is better than from scratch, especially from a time to market – well get it into a release perspective..

    I could be well off, and since im not technical, any insight, corrections and thoughts would be appreciated :)

Leave a Reply

Your email address will not be published. Required fields are marked *