Monday, June 26

PostgreSQL 10 identity columns explained

For PostgreSQL 10, I have worked on a feature called “identity columns”. Depesz already wrote a blog post about it and showed that it works pretty much like serial columns:

CREATE TABLE test_old (
    id serial PRIMARY KEY,
    payload text
);

INSERT INTO test_old (payload) VALUES ('a'), ('b'), ('c') RETURNING *;

and

CREATE TABLE test_new (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    payload text
);

INSERT INTO test_new (payload) VALUES ('a'), ('b'), ('c') RETURNING *;

do pretty much the same thing, except that the new way is more verbose. ;-)

So why bother?

Compatibility

The new syntax conforms to the SQL standard. Creating auto-incrementing columns has been a notorious area of incompatibility between different SQL implementations. Some have lately been adopting the standard SQL syntax, however. So now you can move code around between, for example, PostgreSQL, DB2, and Oracle without any change (in this area).

Permissions

A general problem with the old way is that the system doesn’t actually remember that the user typed serial. It expands this at parse time into something like

CREATE SEQUENCE test_old_id_seq;

CREATE TABLE test_old (
    id int NOT NULL PRIMARY KEY,
    payload text
);

ALTER TABLE test_old
    ALTER COLUMN id SET DEFAULT nextval('test_old_id_seq');

ALTER SEQUENCE test_old_id_seq OWNED BY test_old.id;

The OWNED BY in the last command is an attempt to remember something about the serialness, but it is still insufficient in some cases.

The new way creates the sequence as a proper internal dependency of the table, so that various weird implementation details of the serial pseudotype are not exposed.

One common problem is that permissions for the sequence created by a serial column need to be managed separately:

CREATE USER foo;
GRANT INSERT ON test_old TO foo;
GRANT INSERT ON test_new TO foo;

SET SESSION AUTHORIZATION foo;

INSERT INTO test_old (payload) VALUES ('d');
ERROR:  permission denied for sequence test_old_id_seq

INSERT INTO test_new (payload) VALUES ('d');
-- OK

You can fix the error by also running

GRANT USAGE ON SEQUENCE test_old_id_seq;

If you have deployment scripts, this is annoying and problematic, because the name of the sequence is automatically generated. Here, of course, it appears in the error message, and it is easy to guess, but sometimes a slightly different name is chosen, and then your deployment scripts will fail.

Managing sequences

You also need to know the name of the sequence if you want to make some changes to the sequence:

ALTER SEQUENCE test_old_id_seq RESTART WITH 1000;

With an identity column, you don’t need to know the name of the sequence:

ALTER TABLE test_new ALTER COLUMN id RESTART WITH 1000;

Schema management

Since serial is not a real type, it can only be used in certain circumstances. You can specify serial as a column type when creating a table or when adding a column. But dropping serialness from an existing column or adding it to an existing column is not straightforward.

To drop serialness, you can drop the sequence (again, after ascertaining the name) with the CASCADE option, which cascades to remove the default value of the associated column:

DROP SEQUENCE test_old_id_seq CASCADE;

If you instead drop the default value like

ALTER TABLE test_old ALTER COLUMN id DROP DEFAULT;

it will drop the default but leave the sequence in place.

If you want to take an existing integer column and turn it into a serial column, there is no single command to do that. You will have to manually assemble the CREATE SEQUENCE and ALTER TABLE ... SET DEFAULT commands shown earlier.

Dropping the identity property of an existing column is easy:

ALTER TABLE test_new ALTER COLUMN id DROP IDENTITY;

You cannot accidentally make a mistake and drop a default, because there is none. But just in case, you get a nice error message:

=> ALTER TABLE test_new ALTER COLUMN id DROP DEFAULT;
ERROR:  column "id" of relation "test_new" is an identity column
HINT:  Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead.

You can also turn an existing integer column into an identity column with one command:

ALTER TABLE test_new
    ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

Copying table structures

If you use the CREATE TABLE / LIKE functionality to copy the structure of a table, serial columns pose a problem:

CREATE TABLE test_old2 (LIKE test_old INCLUDING ALL);
INSERT INTO test_old2 (payload) VALUES ('e') RETURNING *;
 id | payload
----+---------
  4 | e

Note that even though the new table is a separate table, it keeps using the old sequence.

This gets even more confusing when you want to drop the first table:

=> DROP TABLE test_old;
ERROR:  cannot drop table test_old because other objects depend on it
DETAIL:  default for table test_old2 column id depends on sequence test_old_id_seq
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

(You can use CASCADE as hinted, or drop test_old2 first. The latter works without CASCADE because the sequence is still linked to the first table.)

When you copy a table with an identity column in this way, you get a new sequence:

CREATE TABLE test_new2 (LIKE test_new INCLUDING ALL);
INSERT INTO test_new2 (payload) VALUES ('e') RETURNING *;
 id | payload
----+---------
  1 | e

Upgrading

Perhaps you are convinced and you want to “upgrade” all your messy serial columns to this new identity column thing. (Note that you don’t have to “upgrade”. You can keep using serial columns the same way as before.) Here is a PL/pgSQL function that you can use:

CREATE OR REPLACE FUNCTION upgrade_serial_to_identity(tbl regclass, col name)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  colnum smallint;
  seqid oid;
  count int;
BEGIN
  -- find column number
  SELECT attnum INTO colnum FROM pg_attribute WHERE attrelid = tbl AND attname = col;
  IF NOT FOUND THEN
    RAISE EXCEPTION 'column does not exist';
  END IF;

  -- find sequence
  SELECT INTO seqid objid
    FROM pg_depend
    WHERE (refclassid, refobjid, refobjsubid) = ('pg_class'::regclass, tbl, colnum)
      AND classid = 'pg_class'::regclass AND objsubid = 0
      AND deptype = 'a';

  GET DIAGNOSTICS count = ROW_COUNT;
  IF count < 1 THEN
    RAISE EXCEPTION 'no linked sequence found';
  ELSIF count > 1 THEN
    RAISE EXCEPTION 'more than one linked sequence found';
  END IF;  

  -- drop the default
  EXECUTE 'ALTER TABLE ' || tbl || ' ALTER COLUMN ' || quote_ident(col) || ' DROP DEFAULT';

  -- change the dependency between column and sequence to internal
  UPDATE pg_depend
    SET deptype = 'i'
    WHERE (classid, objid, objsubid) = ('pg_class'::regclass, seqid, 0)
      AND deptype = 'a';

  -- mark the column as identity column
  UPDATE pg_attribute
    SET attidentity = 'd'
    WHERE attrelid = tbl
      AND attname = col;
END;
$$;

Call it like this:

SELECT upgrade_serial_to_identity('test_old', 'id');

If you are using schemas, it would look like this:

SELECT upgrade_serial_to_identity('public.test_old', 'id');

Check your table definition before and after:

=> \d test_old
                             Table "public.test_old"
 Column  |  Type   | Collation | Nullable |               Default
---------+---------+-----------+----------+--------------------------------------
 id      | integer |           | not null | nextval('test_old_id_seq'::regclass)
 payload | text    |           |          |
=> \d test_old
                           Table "public.test_old"
 Column  |  Type   | Collation | Nullable |             Default
---------+---------+-----------+----------+----------------------------------
 id      | integer |           | not null | generated by default as identity
 payload | text    |           |          |

Further reading

More information can be found in the PostgreSQL documentation, starting from the CREATE TABLE reference page.

3 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *