Limitations removed in PostgreSQL 9.1

The release of PostgreSQL 9.1 lining up with this week’s Postgres Open conference has resulted in my getting asked the same question by several people now: “how long should I wait before deploying 9.1?” It’s an interesting question that highlights some of the benefits and hazards of the PostgreSQL release schedule, and one I have a very strong opinion on: you should consider 9.1 for new projects instead of 9.0 starting right now.

Read on and I’ll detail why I feel that way.

Most changes to a software project are sorted into two main groups: bug fixes and features. PostgreSQL is very good about backporting bug fixes to all supported versions of the program, when practical to do so. But feature changes aren’t done in older releases. Stability is highly prioritized in the stable versions–sometimes to the point where some very annoying things that could be fixed aren’t.

The last release, PostgreSQL 9.0, started its life as version 8.5. When it became obvious several big, disruptive features were going to be included, the version number was bumped in part to make people a little more wary of the release. The bug count over the last year hasn’t been too terrible, but there have been plenty of serious bugs. April’s 9.0.4 corrected both a nasty data loss bug (for users of pg_upgrade) and a case where new database crash recovery code could introduce a problem even when not using the feature it was modified to support.

There is a third category of changes here though: removing limitations. Sometimes the new “feature” added by a new version is that a part of the database code is made to support new cases, because they didn’t do everything people really wanted . These are considered new features and not backported, even though it’s often the case that people are running into the limitation on older versions.

While all the genuine new features are great, too, what makes 9.1 a really compelling upgrade from my perspective is how many of these limitations are blown away by it.

Here’s my short list of limitations removed that I’ve run into on multiple installations of 9.0 and earlier versions now:

  • Queries using MIN or MAX against a partitioned table can now use an index to return that value. This is a showstopper for sites that run into it, one that normally doesn’t show up until after you’ve got a large amount of data into PostgreSQL; then it hits you by surprise. The most common way I’ve run into this are sites that monitor the last time/entry inserted into some busy table. SELECT now() – max(timestampfield) FROM t is a good way to tell how long it’s  been since the last record was inserted into your database, to set an alarm if that grows too big. On a busy site that means something has probably broken and stopped importing records. But if you use this approach, and you switch to partitioned tables…you’re toast. Until 9.1, you had to rewrite the query to fix this. And in some places, application changes like that take a huge amount of QA work to do. Now it just works.
  • Replication lag time can easily be monitored using pg_stat_replication. All of our 9.0 deployments integrate our repmgr tool in order to provide this capability. In 9.1, you get that part in the core server.  (We’re adding some new stuff to repmgr to keep it relevant–it’s always going to be where between release features we’re working on can incubate at)
  • Systems using Hot Standby to run long reports against standby systems can now set a large value for max_standby_archive_delay and max_standby_streaming_delay. Changing the mechanics of these two parameters was one of the very last code changes in 9.0 before its feature code freeze. And subsequent testing missed the fact that the maximum value you can set these two only allows 35 minutes of delay until it was too late to fix in 9.0.
  • Long reports on the standby can use the new hot_standby_feedback mechanism to keep the data they are looking at from being removed from the master. This is really the right way to handle running most longer reports on the standby. This limitation was recognized before 9.0 shipped, with one fix proposed and a prototype built. It just wasn’t possible to get the code into 9.0 before its feature freeze.
  • There is a maximum replication_time setting. It’s far too easy to have connections to a system that isn’t working get stuck in TCP/IP retries for several minutes in 9.0.
  • Checkpoints on busy servers are much less likely to get stuck when the background writer’s internal fsync aborb queue fills. The worst case of this problem can cause your checkpoints for a long time–I’ve seen what should have been a 5
    minute checkpoint take six hours now. And even mild cases you might not normally notice can slow your system down.

When I talk with people who are looking into a 9.0 upgrade from an earlier version, it’s often so that they can take advantage of all the benefits Hot Standby allows: read scaling by adding standby servers, reporting servers, live monitoring of the standby. If you do this on 9.0, you are likely to run into one of these limitations. And as more sites grow their PostgreSQL installations, and new sites are converting increasingly big databases over to it, I’m seeing a lot more of them run into either the partitioning MIN/MAX issue for the checkpoint one.

Normally, people don’t like to consider the latest version of a database program because of the perception that it’s more likely to have bugs than an older one. Well, in this case, the dynamic is a little different. If you deploy PostgreSQL 9.0, and you run into one of the bugs described above, there is little you can do about them other than upgrading to the newer version.

Some workarounds exist, and plugging holes like that between version releases is part of why consultants and support companies have something to sell anyway when the software is free.

If you deploy 9.1 instead, you will not run into these problems. The question you should be asking is not whether there are any unknown bugs to be scared of in 9.1; there surely are. Given the sort of code changes that happened, I expect there to be a steady stream of bug reports against the new features introduced. But there weren’t that many things changed that will impact users who don’t use those features.

But if you deploy on 9.0, the odds are decent you will run into one of these known limitations of that version. Which would you rather have: the possibility of running into a bug, or the certainty you will run into a known limitation?

Many people choose wrong and cling to older versions longer than they should, only to waste a lot of resources fighting known limitations and bugs that are already fixed in the newer release. You don’t want to be one of those people, and now that 9.1 has shipped you should seriously consider deploying it instead of 9.0.

This Post Has 2 Comments

  1. Robert Haas says:

    s/replication_time/replication_timeout/

  2. titanofold says:

    PostgreSQL 9.1.0 Released

    PostgreSQL 9.1.0 was released on September 11, 2011. It’s available now at a Portage mirror near you. There have been some major improvements in this release. So much so that there really isn’t a really good reason to stick with anything le…

Leave A Reply