Friday, December 14

Performance of Sequences and Serials in Postgres-XL

In Postgres-XL, sequences are maintained at the Global Transaction Manager (GTM) to ensure that they are assigned non-conflicting values when they are incremented from multiple nodes. This adds significant overhead for a query doing thousands of INSERTs in a table with a serial column, incrementing sequence one at a time and making a network roundtrip to the GTM, for every INSERT.

Shaun Thomas in a recent blog complained about INSERTs running a magnitude slower on Postgres-XL as compared to vanilla PostgreSQL. There is already a way to improve performance for sequences, but it’s clearly not well advertised. I thought this is a good opportunity to explain the facility.

Postgres-XL provides a user-settable GUC called sequence_range. Every backend requests a block of sequence values as controlled by this GUC. Given that COPY is popularly used to bulk load data in Postgres, Postgres-XL automatically overrides this GUC during COPY operation and sets it to 1000, thus dramatically improving COPY performance. Unfortunately, for regular INSERTs, the default is 1 and unless user explicitly sets sequence_range to a reasonably higher value, INSERT performance suffers. Here is an example, using the same sample schema as used by Shaun in his blog post.

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
) DISTRIBUTE BY HASH (sensor_log_id);

postgres=# \timing
Timing is on.
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                                         SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                    FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 12067.911 ms

postgres=# set sequence_range TO 1000;
SET
Time: 1.231 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                                         SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                    FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 397.406 ms

So by appropriately setting sequence_range to 1000, performance of the INSERT query improved by nearly 30 times.

When this feature was added, the default value of sequence_range GUC was set to 1 because it can leave holes in the sequence values. But looking at the performance implications for a very common use case, we decided to increase the default to 1000 and this has now been committed to the XL9_5_STABLE branch of the repository.

Its important to note that while a high value of sequence_range will improve performance for sequences and serials, it can also leave large holes in sequence ranges since the sequence ranges are cached at a backend level. To address this issue, Postgres-XL starts with the specified CACHE parameter value used at sequence creation time and doubles it every time (limited by sequence_range) if sequences are being consumed at a very high rate.

Similar improvement can also be achieved by increasing the CACHE parameter value of the sequence so that a chunk of sequence values are cached at the backend level. Following example shows how to do that for a serial column. But the sequence_range GUC provides an easy way to override the global default and also ensures that the sequences are cached only when they are getting incremented very rapidly.

postgres=# ALTER SEQUENCE sensor_log_sensor_log_id_seq CACHE 1000;                                                                                                             ALTER SEQUENCE
Time: 8.683 ms
postgres=# SET sequence_range TO 1;
SET
Time: 2.341 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                            SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                  FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 418.068 ms

You may choose any of these techniques to improve the performance. Though now that the default value of sequence_range is changed to 1000, not many users may see the difference in performance.

7 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *