Wednesday, November 14

Adding new table columns with default values in PostgreSQL 11

In PostgreSQL version 10 or less, if you add a new column to a table without specifying a default value then no change is made to the actual values stored. Any existing row will just fill in a NULL for that column. But if you specify a default value, the entire table gets rewritten with the default value filled in on every row. That rewriting behavior changes in PostgreSQL 11.

In a new feature I worked on and committed, the default value is just stored in the catalog and used where needed in rows existing at the time the change was made. New rows, and new versions of existing rows, are written with the default value in place, as happens now. Any row that doesn’t have that column must have existed before the table change was made, and uses this value stored in the catalog when the row is fetched. The great advantage of this is that adding a column with a default value is now quite a fast and cheap operation, whereas before for very large tables it has been horribly, often intolerably slow. Rewriting a whole multi-terabyte table is really something you want to avoid.

The default value doesn’t have to be a static expression . It can be any non-volatile expression, e.g. CURRENT_TIMESTAMP. Volatile expressions such as random() will still result in table rewrites. For a non-volatile expression, it is evaluated at the time the statement runs and the result is what is stored in the catalog for use with existing rows. Of course, for new rows the expression will be re-evaluated at the time the row is created, as happens now.

Any time that the table does get rewritten, say by VACUUM FULL, all this is cleaned up, as there will be no rows left needing the value from the catalog. Likewise, any time that a row is updated the column’s value is filled in on the new row version, so over time the proportion of rows using the value fetched from the catalog will decrease.

One Comment

  • Tometzky

    Was there any discussion on allowing for non-storing default values for new and updated rows also?

    There are often quite a few columns on the database with over 90% of values equal for to some static value. For example issue_status=’CLOSED’, invoice_currency=’USD’, address_country=’USA’ etc. With this functionality the requirement for storing the value for 90% of rows disappeared.

    Would it be possible to add the functionality to:
    – on row update which does not update the column, to leave it not-stored?
    – on row update which sets the column value to DEFAULT to not-store it?
    – on new row creation which sets the column value to DEFAULT to not-store it?

    This has a potential for significantly decreasing storage size requirements for some tables.

Leave a Reply

Your email address will not be published. Required fields are marked *