Monday, June 26

Putting a PostgreSQL tablespace on a ramdisk risks ALL your data

I periodically see people being advised to put their tablspaces on RAM disks or tempfs volumes. This is very bad advice. Do not put a PostgreSQL TABLESPACE on a RAM disk or tempfs.

Why you shouldn’t put a tablespace on a ramdisk

Unlike MySQL and some other databases, PostgreSQL tablespaces are not completely independent of the rest of the database system. You can’t just throw a tablespace away and have the rest of the database system keep on working.

Most importantly, the write-ahead log (WAL) that provides PostgreSQL’s crash safety is stored in pg_xlog, and is shared across all tablespaces. PostgreSQL expects to be able to replay this log in order and without errors after a crash or shutdown. Until the log has replayed, the database is assumed to be in an unsafe state and connections will be refused with:

FATAL:  the database system is starting up

If the write-ahead log contains changes to tables/indexes/etc in a tablespace that no longer exists, PostgreSQL cannot replay the WAL. It will get stuck, and will never start up successfully, or will abort startup. You will be unable to access any of your tables on any database in that PostgreSQL instance. PostgreSQL will fail to start with an error like:

LOG:  could not open tablespace directory "pg_tblspc/16389/PG_9.4_201405111": No such file or directory
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/89ECF00
FATAL:  could not create directory "pg_tblspc/16389/PG_9.4_201405111": No such file or directory
CONTEXT:  xlog redo insert: rel 16389/13060/16390; tid 8283/206
LOG:  startup process (PID 28209) exited with exit code 1
LOG:  aborting startup due to startup process failure

PostgreSQL can’t just throw the changes in the transaction logs away, because it doesn’t know you meant to remove the tablespace. What if that’s your tablespace for vitally important accounting data on a separate encrypted hard drive, and the admin hasn’t entered the password required to mount the drive yet? Or, more likely, what if you moved it and forgot to update the tablespace symlink? You don’t want PostgreSQL saying “oh, I guess you didn’t want that data after all”.

So. Don’t put a tablespace on a ramdisk.

The same problem affects removable storage. It’s fine to put a tablespace on removable storage, but you can’t remove the storage until you first drop the tablespace.

How this could improve

There are a few improvements we could make to PostgreSQL in this area. I’ve already helped make one of them: 9.4 now has a warning about this admin error in the documentation.

One useful, albeit platform specific option, would be to try to detect temporary storage and warn the user. It would be ultimately pretty futile, though – what about removable storage? What if the user creates the tablespace on durable storage then moves the symlink? What about a SAN volume on non-durable storage? etc. It’d be at best an attempt to catch obvious admin mistakes. It might still be useful, but it’s not clear that it’s worth adding the complexity required to implement it.

Instead, we need two things:

  • An option that can be used to recover from this mistake. Akin to zero_damaged_pages, the (currently nonexistent) discard_missing_tablespaces option would throw away all WAL writes destined to tablespaces that do not currently exist. It would have to mark the tablespace as broken in the system catalogs so all attempts to access it failed. The admin could restart PostgreSQL with this option enabled to recover from an error like this, or could even run with it always enabled if they regularly used tablespaces on temporary storage
  • A TEMPORARY TABLESPACE feature that marks a tablespace as disposable. If PostgreSQL sees WAL writes to a tablespace marked TEMPORARY and it doesn’t exist, it would discard the writes and drop the tablespace.

Recovery from lost tablespaces

I’ll write about options for recovering databases where you’ve discarded/lost an in-use tablespace in a follow-up post, along with some suggestions on what you can do safely as an alternative. In the mean time, here’s a Stack Overflow post I wrote a while ago about how to improve PostgreSQL’s performance for unit/integration testing without relying on ramdisks.

If you’re seeing errors like:

ERROR:  could not open file "pg_tblspc/16384/PG_9.4_201405111/13060/16385": No such file or directory

from a psql session, you might just be able to DROP TABLESPACE and DROP TABLE. Recovery is only a serious problem when the database cannot start because of WAL records that cannot be replayed.

At least on PostgreSQL 9.4, you can also just create an empty directory where PostgreSQL expects to find the tablespace. It will assume that you’ve lost the tablespace contents and will proceed with recovery, discarding changes that apply to that tablespace.


  • alhashash

    It should work if the tablespace has only unlogged tables which should not create any xlog entries except for DDL queries.

    • Exactly. It’s that “except” that’s a concern.

      You can get away with a tablespace on volatile storage much of the time, but it’s risky, and it’s usually unnecesary given how well Linux’s virtual memory management works. UNLOGGED tables + a tablespace on a separate filesystem are generally quite sufficient.

  • Misha

    It is save.

    |MASTER + TMPFS| === wal ===> |STANDBY + TMPFS|

    If there is any crash happens on MASTER — then just switch to standby.
    It is simple like any others cases. TMPFS or NOT TMPFS doesn’t matter at all.

    • Sure, that’s the same logic as using an EC2 instance store. The database instance might be destroyed on crash, but you just fail over to your synchronous replica.

      That doesn’t make the use of a tablespace on a tempfs/ramdisk or other volatile storage any less likely to destroy that particular PostgreSQL instance.

      After all, you can rm -rf /var/lib/pgsql/9.3/data and then recover from a standby, too. That doesn’t make it safe for the instance that just got deleted.

  • Diego

    What about putting the “temp_tablespaces” on a RAM disk? Is it safe since in “temp_tablespaces” only TEMP TABLES are stored?

  • humberto.ibanez

    We done some tests with tablespaces at Ramdisk, in which we create only unlogged tables. Using the Postgresql-9.4 we done:

    1) remove data from Ramdisk with rm command or rename the symlink,
    2) restart the postgresql.
    The postgresql was restarted without problem.

    Would be secure maintain the tablespaces on Ramdisk that are populated exclusively by unlogged tables?

  • Frank

    I have requirements for a table that is volatile. The usage would basically be that of a scratchpad. When the server is online it will write and use this table. If the data or table is lost, there is no negative as the server will just recreate this scratchpad. I had thought that using a RAM disk or storing the table in memory would be wise since I would gain additional speed and performance. There will be many reads/writes to this table and since data integrity is not of concern, the performance is all that needs focus.

    Reading your article, I start to see this is a bad idea. It seems PostgreSQL wouldn’t handle this well. You linked to a post on stackoverflow that goes into a bit more detail of explaining things you can do to increase performance. Specifically you mentioned using ‘UNLOGGED’ and also creating a new cluster using initdb on a RAM disk.

    Is there more I can read regarding the benefits and drawbacks of both ‘UNLOGGED’ and creating a new cluster? How they compare and the advantages each have over the other?

    • Yep, it should be safe(ish) for tablespaces containing only TEMPORARY or UNLOGGED tables. PostgreSQL might refuse to start up thoug until you re-create the tablespace directory after restart. It probably also won’t recreate the relfilenodes for the unlogged tables properly so you may not be able to actually use the UNLOGGED tables, but it shouldn’t stop the rest of Pg working.

  • ergo

    While I can confirm that UNLOGGED tables work with tablespaces in RAM, I also found, that a RAMdisk is no magic bullet.

    What you can expect:

    a) faster writes, but the increase is not dramatic if your server is correctly configured for fast writes anyway and you are already using UNLOGGED tables

    b) consistent read performance from the first query onwards

    What you can’t expect:

    c) much faster queries if you have enough RAM for the OS filesystem cache and shared_buffers anyway. Once caching kicks in, the read performance will equal that of a RAMdisk

    So, if you don’t need b), your money is better spent on server class SSDs and the RAM is better dedicated to caching. And if you need b), I’d recommend trying pg_prewarm and maybe pg_hibernator first.

Leave a Reply

Your email address will not be published. Required fields are marked *