2ndQuadrant Ltd official blog

Archives April 2010

If you're running Linux, and particularly if you're running a database on Linux, it's been hard to recommend any filesystem other than plain old ext3 in recent years.  Some of the alternatives that once looked interesting--jfs, ReiserFS--are completely abandoned at this point.  The one that has been almost viable for some time now is XFS, originally an SGI project.  And it's back in the limelight again this week.

XFS had suffered from a number of problems in the past.  Since it was designed for stable hardware, it wasn't as robust on standard cheap PC hardware at first; quite a bit of that was only cleaned up two years ago.  It had this odd problem with zeroed files that scared some people off.  It was treated as a second-class citizen in business-oriented Linux distributions like RedHat, requiring you to compile your own kernel; even on the less restrictive CentOS, you had to do some strange-looking setup steps to add XFS support, and the result was quite obviously unsupported.  And as one of the first filesystems to turn on and aggressively utilize write barriers, deployments were vulnerable to drives and controllers that didn't flush their caches when told to, an issue you don't find as often on modern hardware if you configure it right (except for SSDs, but that's another story).

So why bother?  Well, performance is one major reason.  I found myself using XFS again while working with Greenplum's free Single Node Edition software recently.  Greenplum told me flat out that they didn't recommend anything but XFS for high-performance installs, and given the underlying similarities to community PostgreSQL I felt it was worth investigating why.

The timing on that turned out to be perfect.  One of the other limitations of ext3 is that on common hardware it will only support 16TB of storage.  Since you can put that much storage in a medium-sized disk rack now, that's clearly not enough for high-end systems nowadays, much less a few years from now.  Realizing that, RedHat has been seriously reviving their support for XFS in their distribution of Linux.  RHEL 5.4, released a few months ago, added it back in as an optional module for some customers.  You still couldn't install on XFS, and even the CentOS version didn't support 32-bit installs, but it was clearly making steps toward the mainstream again.
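The 16TB figure falls out of simple arithmetic.  Here is a minimal sketch of it, assuming ext3 addresses blocks with 32-bit block numbers and that 4 KiB (the page size on common x86 hardware) is the largest usable block size.  The names are purely illustrative.

```python
# Assumptions (not stated in the post): ext3 uses 32-bit block numbers,
# and 4 KiB blocks are the largest block size usable on common x86 hardware.
BLOCK_NUMBER_BITS = 32
BLOCK_SIZE_BYTES = 4 * 1024


def max_filesystem_bytes(block_number_bits: int, block_size_bytes: int) -> int:
    """Largest volume addressable with the given block number width."""
    return (2 ** block_number_bits) * block_size_bytes


limit = max_filesystem_bytes(BLOCK_NUMBER_BITS, BLOCK_SIZE_BYTES)
limit_tib = limit // 2 ** 40
print(f"{limit_tib} TiB")  # prints "16 TiB"
```

Under those assumptions, a larger volume needs either bigger blocks or wider block numbers, which is why a filesystem with 64-bit addressing is attractive for big installs.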

Yesterday the first public beta of RHEL6 was released, and XFS is back to being right in the major feature set.  It's sitting next to ext4 on the supported filesystem list, with its suitability for large installations pointed out in particular.  So I can now tell people that they have XFS support available in somewhat rough form in RHEL/CentOS 5.4, with the expectation that it will be a first-class supported filesystem as systems are upgraded to RHEL6 and its derivatives in the future, and have some hope that it will be reliable.

With the enterprise Linux support, and accordingly the perceived stability side of the XFS code, finally under control again, how about the performance?  Well, it turns out Greenplum was right about XFS being worth the trouble to get running.  I took my test server and reformatted one of its moderately fast drives with three different filesystem/mount combinations:  ext3 ordered, ext3 journal, and xfs.  After three bonnie++ 1.96 runs with each filesystem, the results I saw broke down like this:

  • ext3 ordered:  39-58MB/s write, 44-72MB/s read
  • ext3 journal:  25-30MB/s write, 49-67MB/s read
  • xfs:  68-72MB/s write, 72-77MB/s read

While the best of the ext3 read results approached what xfs was capable of, on average xfs did much better.  And the write results were at least 25% better in all cases.  I liked the tighter, more predictable throughput as well; inconsistent performance is something I often struggle with on ext3.
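To put numbers on "tighter, more predictable", here is a small sketch that summarizes the ranges listed above.  Since only the low and high figures from each set of runs are given, the midpoint is used as a stand-in for the average; that is an assumption, not a measured mean.

```python
# Throughput ranges in MB/s, copied from the bonnie++ results listed above.
results = {
    "ext3 ordered": {"write": (39, 58), "read": (44, 72)},
    "ext3 journal": {"write": (25, 30), "read": (49, 67)},
    "xfs": {"write": (68, 72), "read": (72, 77)},
}


def midpoint(rng):
    """Middle of a (low, high) range; a rough proxy for the average."""
    low, high = rng
    return (low + high) / 2


def spread(rng):
    """Width of a (low, high) range; smaller means more consistent runs."""
    low, high = rng
    return high - low


def summarize(results):
    """Map each filesystem and operation to its midpoint and spread."""
    return {
        fs: {op: {"midpoint": midpoint(r), "spread": spread(r)} for op, r in ops.items()}
        for fs, ops in results.items()
    }


summary = summarize(results)
for fs, ops in summary.items():
    for op, stats in ops.items():
        print(f"{fs:13s} {op:5s} midpoint {stats['midpoint']:5.1f} MB/s, "
              f"spread {stats['spread']:2d} MB/s")
```

The xfs write spread works out to 4 MB/s against 19 MB/s for ext3 ordered, which is the consistency difference described above.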

I'm not normally one to be an early adopter of new Linux releases, but the RHEL6 beta with full XFS support has replaced the thoroughly underwhelming new Ubuntu release at the top of my list of OSes to install next.  It's not often you see filesystem technology get a second chance to impress, but XFS seems to have made an unexpected transition back to being completely relevant again, for now.  I'm not sure how long that will be true, with both ext4 available already and btrfs coming closer to production quality by recently reaching a stable disk format.  It will be interesting to see how this reinvigorated set of filesystem choices on Linux plays out.

A few weeks ago I presented an updated 2010 version of my talk on database hardware benchmarking at PG East; slides are available from our talks page.  CPU and memory performance are particularly important for a PostgreSQL database, because every individual query runs as a single process.  Therefore, the speed of your fastest core determines how fast any one query can execute, and in modern systems that's quite likely to bottleneck based on memory speed.

One of the things that's obvious from recent memory speed results is that all of AMD's processors have been stuck in a distant second place for almost 18 months now.  While AMD continues to use DDR2-800, Intel's "Nehalem" processors, shipping in volume since early 2009, have been adopting increasingly fast DDR3 in well-performing multi-channel configurations--the exact area AMD used to be the king of.  In the normal single or dual core server configuration, Intel has had such a lead that it's been impossible to recommend AMD for anything but a completely disk-bound workload for some time now.

Like many commentaries on PC hardware, my suggestions were only cutting edge for...drumroll please...one week.  Basically, the minute my talk was over, AMD released a new line of 12-core processors that use DDR3-1333, and they've closed most of the gap with Intel again.  In raw memory performance, they've improved 130% over their earlier design, and actually pulled ahead on that low-level benchmark.

How about database workloads?  One of the supporting bits of data I pointed to for how much CPU/memory performance could impact a database workload was the set of Oracle Charbench "Calling Circle" OLTP benchmark results run by AnandTech.  Their new Calling Circle results show where the market is at now.  Intel still owns the top part of the market, but AMD's results with their Opteron 6174 are back to respectable.

If you have a workload where more cores is what you need most of the time, the new processors from AMD could be just what you're looking for.  They're fast enough for single queries again, and they scale up quite well to handle workloads with many clients.  Memory technology really matters, and you should make sure to note (and benchmark yourself!) the memory speed of any system you're considering or using to make sure it's appropriate for your workload.
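If you just want a quick sanity check before reaching for a proper memory benchmark, something like the following rough sketch can help.  It times how fast a single process can copy a large buffer, which loosely mirrors the single-process limit a query runs into.  The function name and buffer size are my own choices for illustration, and the number it reports is only useful for comparing machines against each other, not as an absolute bandwidth figure.

```python
import time


def copy_throughput_mb_s(size_mb: int = 128, repeats: int = 5) -> float:
    """Best observed single-process copy rate, in MB/s, for a size_mb buffer.

    The buffer should be much larger than the CPU caches so the copy
    actually exercises main memory.
    """
    source = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy = bytes(source)  # one pass reading source and writing the copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del copy
    return size_mb / best


if __name__ == "__main__":
    print(f"{copy_throughput_mb_s():.0f} MB/s copied by one process")
```

Taking the best of several runs reduces noise from other activity on the machine.  Run it on each candidate system under the same conditions and compare the results.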

How long will this situation continue?  Well, Intel's next big server processor refresh, codenamed Sandy Bridge, is expected by the end of 2010.  Progress marches on.