Don’t set fsync=off if you want to keep your data
There are a lot of amazing features coming in PostgreSQL 9.6, but I’m personally very happy about a really small, simple one that helps close a long-standing user foot-gun.
commit a31212b429cd3397fb3147b1a584ae33224454a6
Author: Robert Haas
Date:   Wed Apr 27 13:46:26 2016 -0400

    Change postgresql.conf.sample to say that fsync=off will corrupt data.

    Discussion: [email protected]

    Per a suggestion from Craig Ringer.  This wording from Tom Lane,
    following discussion.
There’s a bit of terrible advice floating around that turning fsync=off will make PostgreSQL run faster. Which is true, as far as it goes, but it neglects that little “risk of massive data corruption on crash” part. So users get bitten. I’ve tried to scrub as much of that advice from the Internet as I can, or get it qualified with warnings, but I still sometimes see people advised to do it on random forums.
The docs do a good job of explaining that setting fsync=off is a bad idea, but the sample config file made it seem pretty innocuous:
#fsync = on # turns forced synchronization on or off
so users keep turning it off. Then teams like 2ndQuadrant land up doing expensive forensic data recovery on their database clusters once they crash weeks, months or years later and experience severe corruption… or notice the corruption they experienced a while ago. Or users just write off their data and start again because it’s all too hard and expensive.
To make turning fsync=off a little less attractive the sample now reads:
#fsync = on                 # flush data to disk for crash safety
                            # (turning this off can cause
                            # unrecoverable disk corruption)
It won’t stop someone from setting it with ALTER SYSTEM SET without realising the risk, but there’s only so much you can do.
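For readers who haven’t used it: ALTER SYSTEM SET doesn’t edit postgresql.conf at all. It writes the setting to postgresql.auto.conf in the data directory, which is read after postgresql.conf and therefore overrides it, so the new warning comment is never in front of the person running the command. A rough sketch of the resulting entry follows; the header comments PostgreSQL itself writes into that file are deliberately not reproduced, so treat the layout as illustrative.

```ini
# Illustrative contents of postgresql.auto.conf after running
#   ALTER SYSTEM SET fsync = off;
# This file overrides postgresql.conf, and nothing here repeats the
# "can cause unrecoverable corruption" warning from the sample file.
fsync = 'off'
```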
I’m really happy about this. It’s nice to knock off such minor improvements that have a disproportionate impact on usability and UX.
If you are ever tempted to set fsync=off, pretend it’s called eat_my_data_if_you_feel_like_it=on and see if you still want to set it. synchronous_commit=off is probably a better choice. Read the manual.
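A minimal postgresql.conf sketch of that alternative, assuming everything else is left at its defaults. With synchronous_commit=off a commit returns before its WAL has been flushed, so a crash can lose the most recently committed transactions, but unlike fsync=off it cannot leave the cluster in an inconsistent, corrupted state.

```ini
# Trade durability of the last few commits for speed, WITHOUT risking corruption.
fsync = on                  # leave on: this is what makes crash recovery safe
synchronous_commit = off    # a crash may lose recently committed transactions,
                            # but the database stays consistent
#wal_writer_delay = 200ms   # default; per the manual the loss window is at
                            # most about three times this value
```

synchronous_commit can also be changed per session or per transaction (for example `SET LOCAL synchronous_commit = off` inside a transaction), so the trade-off can be limited to bulk or low-value writes rather than applied cluster-wide.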
I have one use-case for fsync=off. If I am restoring a dump into a brand-new PostgreSQL instance, I do the restore with fsync=off. Then I stop PostgreSQL, run the Unix command “sync”, and turn fsync back on. Then I restart PostgreSQL.
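The restore procedure just described, sketched as a temporary postgresql.conf change. This assumes the cluster was freshly created and holds nothing except the data being restored; the steps that happen outside the config file are listed as comments.

```ini
# TEMPORARY: only while loading a dump into a brand-new, otherwise empty cluster.
fsync = off

# Once the restore has finished:
#   1. stop the server cleanly
#   2. run the Unix `sync` command so the OS writes all cached data to disk
#   3. set fsync = on (or delete the line above)
#   4. start the server again before putting it into service
```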
Other than that… yeah, there’s no valid reason for turning it off unless you don’t care about your data.
That’s very reasonable, and one of the non-test/development reasons the “off” setting exists at all.
It’s only safe if the whole instance is new. If you turn fsync off to restore a DB into an instance with other running DBs, you’ll still risk a giant mess on crash, since the DB you’re loading data into shares the xlogs, clog, shared catalog tables, etc. with the other DBs on the instance. You probably know that, but I’m being explicit for other readers.
Another reasonable reason to turn fsync off is if you intend to rely entirely on synchronous replication for durability. In particular, if you’re running on something like an AWS instance using local instance storage rather than EBS, the storage isn’t durable anyway. On crash it might just go away permanently and unrecoverably; it certainly will if you stop and start the instance. In this case fsync doesn’t do you any good: the data’s gone, so crash safety is irrelevant. However, you must make sure that if the instance crashes but the data is *not* lost, you still throw it away, since it could be corrupted, and if you start running with it you might then sync corrupt data to your replicas. Fencing and STONITH are important.
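A rough postgresql.conf sketch for the primary in that kind of setup. 'standby1' is a hypothetical standby name (it has to match the application_name the standby connects with), and the standby is assumed to run on separate hardware. The point is that each commit waits for the standby, so a committed transaction exists somewhere other than this disposable node.

```ini
# Primary on non-durable local instance storage, relying on a synchronous standby.
fsync = off                             # acceptable ONLY because this node's storage
                                        # is treated as disposable
synchronous_commit = on                 # commits wait until the synchronous standby
                                        # confirms it has flushed the commit record
synchronous_standby_names = 'standby1'  # hypothetical standby name

# Nothing in this file enforces the other half of the rule: after ANY crash,
# discard this data directory and rebuild from a replica (fencing/STONITH).
```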
For most other purposes it’s better to just set synchronous_commit = off if you don’t mind the risk of losing recent data but don’t want to risk possible corruption.
Would it be possible to let ALTER SYSTEM SET return a warning in this case?
I can also recommend this for those who like fsync=off: https://www.flamingspork.com/projects/libeatmydata/