<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>2ndQuadrant, Professional PostgreSQL</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/" />
    <link rel="self" type="application/atom+xml" href="http://blog.2ndquadrant.com/en/atom.xml" />
    <id>tag:blog.2ndquadrant.com,2009-06-22:/en//3</id>
    <updated>2010-07-25T22:17:15Z</updated>
    <subtitle>2ndQuadrant Ltd official blog</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Open Source 4.12</generator>

<entry>
    <title>Heads in the cloud at CHAR(10)</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/07/heads-in-the-cloud-at-char10.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.97</id>

    <published>2010-07-25T21:14:37Z</published>
    <updated>2010-07-25T22:17:15Z</updated>

    <summary>Whether or not you made it to our CHAR(10) conference last month, you can now relive part of the experience by downloading the conference slides. Some of those were posted live during the conference, some showed up later, but almost everything...</summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="International News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresqlcloudchar10" label="postgresql cloud char10" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>Whether or not you made it to our CHAR(10) conference last month, you can now relive part of the experience by downloading the <a href="http://projects.2ndquadrant.com/char10">conference slides</a>.  Some of those were posted live during the conference, some showed up later, but almost everything is there now.  Sadly, Nic Ferrier's entertaining presentation about how <a href="http://www.woome.com/">WooMe</a> was scaled up using Londiste and Django wasn't available in a form we could easily replay.  For that one, you certainly did have to be there, in more ways than one.</p>

<p>The two talks I found the most informative were the updates on the states of pgpool-II and pgmemcache.  Both those tools have that slightly frustrating combination of being really useful and a bit underdocumented relative to how complicated they are (in English at least!), so getting additional insight into them from those actually working on the code was great.</p>

<p>Markus's discussion of MVCC and clustering also had a fun twist to it.  His talk ended with a performance analysis of his Postgres-R against pgpool-II, Postgres-XC, and PostgreSQL 9 using Streaming Replication plus Hot Standby, all used in cluster configurations to accelerate dbt2 test results.  I don't quite agree with his premise there that network congestion is the most vital cluster component because "overall computing power, memory and storage capacity scale easily"--that's not always true--but it was satisfying to see that the PG9 HS/SR pairing is efficient in that regard.</p>

<p>The conference set aside two sessions to talk about general clustering topics in a relatively unstructured way.  The more heated of the two was about what would make PostgreSQL deployments on cloud computing infrastructure easier to deal with.  That stirred up enough ideas to generate two <a href="http://blog.2ndquadrant.com/en/2010/07/some-ideas-about-lowlevel-reso.html">blog</a> <a href="http://blog.tapoueh.org/articles/blog/_MVCC_in_the_Cloud.html">entries</a> from my coworkers already.</p>

<p>One of the ideas from that session I found particularly interesting was the observation that if you have a deployment where nodes are added in the "elastic" way people like to discuss in relation to the cloud concept, there's a manageability gap right now in making it easy for applications to talk to that node set.  If you put pgpool-II or pgBouncer between your application and the set of nodes, you can already abstract away some of the details of what's behind it.  But now you've added another layer, and therefore a potential bottleneck, to the whole thing.  That's the opposite of what elastic cloud deployments are supposed to be about:  just adding capacity as needed with minimal management work.</p>

<p>One suggested approach was making it easier to build a database routing directory at the application level, so that apps can just ask for the type of node they need and get one to connect to directly.  Nodes register themselves with the directory as they are brought online, and are removed from it as they are taken down.  This has similarities to some components that are already floating around.  The directory lookup part you might put into LDAP; PostgreSQL servers can already announce themselves via ZeroConf, AKA Bonjour.  It's not hard to imagine bolting those two together:  an application layer that does LDAP lookups, connected to a routing backend that tracks available nodes via any number of protocols.  As usual, the devil's in the details.  Things like timing out failed nodes, distinguishing between read and write traffic (pgpool-II does it by actually parsing the SQL, which is expensive), and caching the resulting directory broadcasts for high performance while still supporting cache invalidation are all tricky implementation details to get right.</p>
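
<p>To make the idea a little more concrete, here is a minimal sketch of the register-and-lookup cycle, with a flat file standing in for the LDAP directory.  Nothing like this was shown at the conference:  the file location, the function names, and the role labels are all invented for illustration, and the tricky parts (timeouts, caching, invalidation) are left out entirely.</p>

```shell
#!/bin/sh
# Hypothetical sketch: a flat file stands in for the LDAP directory.
# Each line is "ROLE HOST:PORT"; every name here is invented for illustration.
NODE_DIR="${NODE_DIR:-/tmp/pg_node_directory.txt}"

# A node (or a monitor) removes the entry when the node is taken down.
deregister_node() {   # usage: deregister_node HOST:PORT
  [ -f "$NODE_DIR" ] || return 0
  # keep every line whose second field is a different node
  awk -v node="$1" '$2 != node' "$NODE_DIR" > "$NODE_DIR.tmp"
  mv "$NODE_DIR.tmp" "$NODE_DIR"
}

# A node announces itself when it is brought online.
register_node() {     # usage: register_node ROLE HOST:PORT
  deregister_node "$2"                      # drop any stale entry first
  printf '%s %s\n' "$1" "$2" >> "$NODE_DIR"
}

# An application asks for a node of a given type and connects to it directly.
lookup_node() {       # usage: lookup_node ROLE
  [ -f "$NODE_DIR" ] || return 1
  # print the first matching node; exit non-zero when none is registered
  awk -v role="$1" '$1 == role { print $2; found = 1; exit }
                    END { exit !found }' "$NODE_DIR"
}
```

<p>A node would call something like <code>register_node read-only db2:5432</code> when it comes online, and an application would run <code>lookup_node read-only</code> to get an address it can connect to directly, with no pooler sitting in the data path.</p>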

<p>With PostgreSQL 9.0 featuring more ways than ever to scale a database architecture upward, this problem isn't going away.  I'm not sure yet what form the solution will take, but it's a common enough problem that it's worth solving.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Installing Greenplum Single Node Edition on Ubuntu 10.04 (Lucid)</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/07/installing-greenplum-sne-on-ubuntu-lucid.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.96</id>

    <published>2010-07-17T19:44:45Z</published>
    <updated>2010-07-19T11:36:32Z</updated>

    <summary>Officially Greenplum Database Single Node Edition (SNE) is only installable on Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES), but while surfing the web I have seen many requests on how to install it on Debian/Ubuntu. Here...</summary>
    <author>
        <name>Marco Nenciarini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="Greenplum" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="greenplumsneonubuntulinux" label="Greenplum SNE on Ubuntu Linux" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>Officially, Greenplum Database Single Node Edition (SNE) can only be installed on Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES), but while surfing the web I have seen many requests for instructions on installing it on Debian/Ubuntu. Here is some advice on how to do it.</p>
]]>
        <![CDATA[<p>Before installing Greenplum Database SNE, you need to adjust the following OS configuration parameters:</p>

<p>Set the following parameters in the <code>/etc/sysctl.conf</code> file:</p>

<pre><code>kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 64000 100 512
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
</code></pre>

<p>To activate these parameters you can either run <code>sudo sysctl -p</code> or reboot the system.</p>
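
<p>As a sanity check, the values in the file can be compared with what the running kernel reports. This is only a sketch: <code>check_sysctl</code> is a name made up for this example, and it assumes one <code>key = value</code> setting per line.</p>

```shell
#!/bin/sh
# Sketch: report settings in a sysctl-style file that differ from the running kernel.
# check_sysctl is an invented helper name, not part of Greenplum or Ubuntu.
check_sysctl() {   # usage: check_sysctl /etc/sysctl.conf
  # skip comments and blank lines, then split each setting on the first "="
  grep -v -e '^[[:space:]]*[#;]' -e '^[[:space:]]*$' "$1" |
  while IFS='=' read -r key want; do
    key=$(echo $key)     # unquoted on purpose: trims surrounding whitespace
    want=$(echo $want)   # trims and collapses runs of whitespace
    have=$(sysctl -n "$key" 2>/dev/null || true)
    have=$(echo $have)   # multi-value settings may come back tab-separated
    [ "$want" = "$have" ] || echo "MISMATCH $key: file='$want' kernel='$have'"
  done
}
```

<p>Running <code>check_sysctl /etc/sysctl.conf</code> after <code>sudo sysctl -p</code> should print nothing if every setting in the file was applied.</p>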

<p>Set the following parameters in the <code>/etc/security/limits.conf</code> file:</p>

<pre><code>* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
</code></pre>

<p>In the file <code>/etc/hosts</code>, comment out the line beginning with <code>::1</code>, as it could confuse the database when it resolves the hostname for localhost. Also make sure that both localhost and your hostname resolve to a local address.</p>
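
<p>One way to check the result is to ask the resolver directly. The helper below is only a sketch with an invented name (<code>resolves_locally</code>); it relies on the standard <code>getent</code> utility and only recognises the obvious loopback cases.</p>

```shell
#!/bin/sh
# Sketch: show what a name resolves to and flag anything that is not IPv4 loopback.
# resolves_locally is an invented helper name.
resolves_locally() {   # usage: resolves_locally NAME
  # getent consults /etc/hosts and whatever else nsswitch.conf lists
  addr=$(getent hosts "$1" | awk 'NR == 1 { print $1 }')
  case "$addr" in
    "")    echo "$1: does not resolve"; return 1 ;;
    127.*) echo "$1 -> $addr (loopback)" ;;
    ::1)   echo "$1 -> $addr (IPv6 loopback, a ::1 entry is still in effect)"; return 1 ;;
    *)     echo "$1 -> $addr (make sure this address belongs to this machine)" ;;
  esac
}
```

<p>For example, run <code>resolves_locally localhost</code> and <code>resolves_locally "$(hostname)"</code> and check that neither reports a problem.</p>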

<p>You have now finished preparing the environment for Greenplum Database SNE. The next step is to create the user account that will administer your installation. This user is usually called gpadmin.</p>

<pre><code>sudo adduser --gecos "Greenplum Administrator" gpadmin
</code></pre>

<p>At this point you have to download or copy the installer file to the system. Installer files are available at <a href="http://www.greenplum.com/products/single-node">http://www.greenplum.com/products/single-node</a>. You should choose the RHEL installer for your architecture. I have an x86_64 machine, so I will use that installer in the examples from now on.</p>

<p>To start the installation run the following commands (you need the unzip program installed):</p>

<pre><code>unzip greenplum-db-3.3.6.1-build-1-SingleNodeEdition-RHEL5-x86_64.zip
sudo bash greenplum-db-3.3.6.1-build-1-SingleNodeEdition-RHEL5-x86_64.bin
</code></pre>

<p>Follow the on screen instructions. Accept the license and choose the installation path. The default one is fine. The installer will create a <code>greenplum-db</code> symbolic link one directory level above your chosen installation directory. The symbolic link is used to facilitate patch maintenance and upgrades between versions. From now on the install location will be referred to as <code>$GPHOME</code>.</p>

<p>Change the ownership of the installation so that it is owned by the gpadmin user and group.</p>

<pre><code>sudo chown -R gpadmin:gpadmin $GPHOME
</code></pre>

<p>Now it is time to choose the data directory locations. Nothing explains how to choose them better than the official quick-start guide:</p>

<blockquote>
  <p>Every Greenplum Database SNE instance has a designated storage area on disk that
is called the data directory location. This is the file system location where the database
data is stored. In the Greenplum Database SNE, you initialize a Greenplum Database
SNE master instance and two or more segment instances on the same system, each 
requiring a data directory location. These directories should have sufficient disk space
for your data and be owned by the gpadmin user.</p>

<p>Remember that the data directories of the segment instances are where the user data
resides, so they must have enough disk space to accommodate your planned data
capacity. For the master instance, only the system catalog tables and system
metadata are stored in the master data directory.</p>
</blockquote>

<p>For this guide we will use the default layout, with the master (<code>/gpmaster</code>) and two segments (<code>/gpdata1</code> and <code>/gpdata2</code>).</p>

<pre><code>sudo mkdir /gpmaster /gpdata1 /gpdata2
sudo chown gpadmin:gpadmin /gpmaster /gpdata1 /gpdata2
</code></pre>

<p>A <code>greenplum_path.sh</code> file is provided in your <code>$GPHOME</code> directory with environment variable settings for Greenplum Database SNE. You should source this file in the gpadmin user&#8217;s shell startup profile (such as <code>.bashrc</code>) by adding a line like the following:</p>

<pre><code>source /usr/local/greenplum-db/greenplum_path.sh
</code></pre>

<p>Before continuing, we need a small trick to avoid failures when Ubuntu programs pick up the libraries shipped with Greenplum SNE.</p>

<pre><code>#!/bin/sh

# abort if GPHOME is unset: otherwise the cd below would land in /lib
# and the loop would delete system libraries
: "${GPHOME:?GPHOME is not set}"
cd "$GPHOME/lib" || exit 1

# libraries shipped with Greenplum SNE
gplibs="$(find . -maxdepth 1 -type f | cut -f 2 -d /)"

# libraries with the same ABI already installed via dpkg
deblibs="$(dpkg -S $gplibs 2&gt; /dev/null | cut -f 2 -d ' ')"

# remove the Greenplum copies to avoid "no version information available" errors
for lib in $deblibs; do
  rm -f "$(basename "$lib")"
done
</code></pre>

<p>For your convenience you can find the script attached to this guide.</p>

<p><span class="mt-enclosure mt-enclosure-file" style="display: inline;"><a href="http://blog.2ndquadrant.com/en/2010/07/17/fixlibs.sh">fixlibs.sh</a></span></p>

<p>It&#8217;s now time to initialize the database system. All of the following steps must be executed as the gpadmin user.</p>

<pre><code>su - gpadmin

cp $GPHOME/docs/cli_help/single_hostlist_example ./single_hostlist
cp $GPHOME/docs/cli_help/gp_init_singlenode_example ./gp_init_singlenode
</code></pre>

<p>If you do not want to use the default configuration, data directory locations, ports, or other configuration options, edit the <code>gp_init_singlenode</code> file and enter your configuration settings.</p>

<p>Run the gpssh-exkeys utility to exchange ssh keys for the local host:</p>

<pre><code>gpssh-exkeys -h 127.0.0.1 -h localhost
</code></pre>

<p>Run the following command to initialize the database:</p>

<pre><code>gpinitsystem -c gp_init_singlenode
</code></pre>

<p>The utility verifies your setup information and makes sure that the data directories specified in the <code>gp_init_singlenode</code> configuration file are accessible. If all of the verification checks are successful, the utility prompts you to confirm the configuration before creating the system. </p>

<p>At the end of a successful setup, the utility starts your system. You should see:</p>

<pre><code>=&gt; Greenplum Database instance successfully created.
</code></pre>

<p>The management utilities require that you set the <code>MASTER_DATA_DIRECTORY</code> environment variable. This should specify the directory created by the gpinitsystem utility in the master data directory location.</p>

<pre><code>echo "export MASTER_DATA_DIRECTORY=/gpmaster/gpsne-1" &gt;&gt; ~/.bashrc
source ~/.bashrc
</code></pre>

<p>Now you can connect to the master database using the psql client program:</p>

<pre><code>psql postgres
</code></pre>

<p>Please note that a system installed by following this guide should be considered an <strong>evaluation platform only</strong>, and is not intended for production installations of Greenplum Database.</p>
]]>
    </content>
</entry>

<entry>
    <title>Installing PostGIS on Greenplum Single Node Edition</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/07/installing-postgis-on-greenplum-single-node-edition.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.95</id>

    <published>2010-07-13T15:08:58Z</published>
    <updated>2010-07-15T05:43:56Z</updated>

    <summary>One of the main reasons users switch from other relational databases to PostgreSQL is the advanced support for geographic objects included in the PostGIS extension. Being PostgreSQL specialists at 2ndQuadrant, we have tried to investigate if it was possible (and...</summary>
    <author>
        <name>Gabriele Bartolini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="Greenplum" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgisongreenplumsinglenodeedition" label="PostGIS on Greenplum Single Node Edition" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>One of the main reasons users switch from other relational databases to PostgreSQL is the advanced support for geographic objects included in the PostGIS extension.</p>

<p>Being PostgreSQL specialists at 2ndQuadrant, we investigated whether (and how) PostGIS could be installed on Greenplum Single Node Edition. Let's see how Marco Nenciarini, 2ndQuadrant consultant and a long-time Debian developer, went about it.</p>
]]>
        <![CDATA[<p>Greenplum Single Node Edition (SNE) is a free version of the <a href="http://www.greenplum.com/products/greenplum-database/">Greenplum database</a>, one of the most advanced solutions for data warehousing and analytics, which is based on a shared nothing architecture and allows for data distribution and parallel processing on several nodes (servers).</p>

<p>The Single Node edition of Greenplum is a freely distributed version of Greenplum which can be installed on a single node. On a multi-processor architecture, Greenplum Single Node Edition lets you create multiple <strong>segments</strong> (usually one per core) and hence take advantage of parallel processing. <a href="http://www.greenplum.com/products/single-node/">Greenplum Single Node Edition can be downloaded for free</a> from the main website.</p>

<p>Since Greenplum was originally based on a PostgreSQL 8.2 branch, Marco downloaded the latest version of PostGIS compatible with PostgreSQL 8.2: version 1.4.2 (http://postgis.refractions.net/download/postgis-1.4.2.tar.gz).</p>

<p>The system we used ran CentOS Linux 5.5 with Greenplum Database 3.3.6.1. Following the PostGIS requirements, we installed <code>proj4</code> and GEOS (http://www.argeo.org/linux/argeo-el/5/gis/x86_64/).</p>

<pre><code>yum install gcc
yum install make
rpm -Uvh http://www.argeo.org/linux/argeo-el/5/gis/x86_64/proj-4.7.0-1.el5.argeo.x86_64.rpm
rpm -Uvh http://www.argeo.org/linux/argeo-el/5/gis/x86_64/proj-devel-4.7.0-1.el5.argeo.x86_64.rpm
rpm -Uvh http://www.argeo.org/linux/argeo-el/5/gis/x86_64/geos-3.2.2-1.el5.argeo.x86_64.rpm
rpm -Uvh http://www.argeo.org/linux/argeo-el/5/gis/x86_64/geos-devel-3.2.2-1.el5.argeo.x86_64.rpm
</code></pre>

<p>Unfortunately, the plain <code>configure/make/make install</code> process for PostGIS did not work straight away. We mainly encountered two types of issues: <strong>configuration issues</strong> and <strong>compilation issues</strong>.</p>

<p>The workaround we have developed consists of:</p>

<ul>
<li>a wrapper script for <code>pg_config</code></li>
<li>a <em>patch</em> for PostGIS</li>
</ul>

<p><em>(The files are attached to this entry)</em></p>

<p>Once you have downloaded and uncompressed PostGIS, patch the source code using the attached patch (<code>0001-Fix-all-compile-issues.patch</code>). Then place the modified <code>pg_config</code> file in the PostGIS source directory and launch:</p>

<pre><code>./configure --with-pgconfig=$PWD/pg_config</code></pre>

<p>Then:</p>

<pre><code>make PERL=$(which perl)
make PERL=$(which perl) install
</code></pre>

<p>You can perform PostGIS regression tests with:</p>

<pre><code>make PERL=$(which perl) check
</code></pre>

<p>Keep in mind that the test results officially fail, because of some NOTICE messages raised by Greenplum (which complains that PostGIS does not specify a distribution key). A thorough look at the diff file shows that most of these errors are harmless and can be ignored. We will, however, continue to test the environment in the coming weeks.</p>

<p>Please do not hesitate to send your feedback to us and to Greenplum, either here or on the <a href="http://community.greenplum.com/">community support forum</a>. It would be great if PostGIS support could be integrated into Greenplum, and I am confident that Greenplum staff will be supportive.</p>

<p>For the moment, we hope this patch will come in handy.</p>

<p><strong>Attachments:</strong></p>

<div><strong>wrapper script</strong>: <span class="mt-enclosure mt-enclosure-file" style="display: inline;"><a href="http://blog.2ndquadrant.com/en/2010/07/13/pg_config">pg_config</a></span></div>

<div><strong>patch</strong>: <span class="mt-enclosure mt-enclosure-file" style="display: inline;"><a href="http://blog.2ndquadrant.com/en/2010/07/13/0001-Fix-all-compile-issues.patch">0001-Fix-all-compile-issues.patch</a></span></div>
]]>
    </content>
</entry>

<entry>
    <title>Some ideas about low-level resource pooling in PostgreSQL</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/07/some-ideas-about-lowlevel-reso.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.92</id>

    <published>2010-07-06T19:28:11Z</published>
    <updated>2010-07-06T20:57:02Z</updated>

    <summary>Last week at the CHAR(10) conference we had a workshop on &quot;Cloud Databases&quot;. To put it simply: what to do when the use case requirements exceed the resources available in the database server. This was a main topic of the...</summary>
    <author>
        <name>Gianni Ciolli</name>
        <uri>http://www.2ndquadrant.it</uri>
    </author>
    
        <category term="Gianni&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="lowlevelresourcepoolinginpostgresql" label="Low-level resource pooling in PostgreSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>Last week at the <code>CHAR(10)</code> conference we had a workshop on "Cloud Databases". To put it simply: what to do when the use case requirements exceed the resources available in the database server.</p>

<p>This was a main topic of the whole conference, and several solutions have been illustrated during the day. A common theme has been that no solution fits all the use cases, and that each solution comes with its cost; hence you have to choose the solution that your use case can afford.</p>]]>
        <![CDATA[<p>Another common (albeit implicit) point has been the focus on "high-level" solutions, that is: connecting several database servers at a higher level to emulate a single server with larger resources.</p>

<p>An obvious advantage is that you don't need to alter the well-scrutinised PostgreSQL code; a drawback is that by using multiple database servers, each with its own independent timeline, you lose some useful properties. Two examples: the partial loss of transactional semantics generates conflicts; pre-parsing each query outside the database introduces limitations on the accepted queries.</p>

<p>The discussion was quite interesting, and when Dimitri Fontaine mentioned remote tablespaces I started wondering about a related but distinct idea, namely: whether a lower-level approach to the problem of resource pooling would really be impractical. Before I could elaborate on the details the workshop ended, and I could only sketch the idea to some of the people who were around the whiteboard (among them Gabriele Bartolini, Nic Ferrier, Marko Kreen, Hannu Krosing, Greg Smith) together with the basic questions "does it look feasible?" and "does that resemble something you already know?".</p>

<p>A brief sketch: an application stack can be represented in this way</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
</pre>


<p>where the resources used by the database include storage, <span class="caps">RAM </span>and <span class="caps">CPU</span>s. The purpose is to allow the application to command more resources in order to increase capacity and speed. "Clever" applications that manage several databases can be represented as</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
      |
      +---------&gt; (connection) --&gt; (db server) --&gt; (resources)
</pre>


<p>while "connection pooling" solutions can be represented as</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
                       |
                       +---------&gt; (db server) --&gt; (resources)
</pre>


<p>by "lower-level" solutions I mean something like</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
                                       |
                                       +---------&gt; (resources)
</pre>


<p>which might resemble something familiar, but it is not what I am proposing here. To explain the difference I can increase the detail and write</p>


<pre>
(resources) = (virtual resources) --&gt; (physical resources)
</pre>


<p>to represent the fact that at the lowest level you can have a non-trivial mapping between physical objects and virtual ones. For instance, <span class="caps">SAN </span>storage or <span class="caps">RAID </span>striping can provide larger virtual disks by joining together smaller physical disks. Such cases could be pictured as</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (virt.res.) --&gt; (ph.res.)
                                                        |
                                                        +--------&gt; (ph.res.)
</pre>


<p>My proposal is to pool resources at the <em>database server</em> level, so that we can have a more efficient "virtualisation" by using the knowledge of the specific use cases for each resource (CPU, <span class="caps">RAM, </span>disk), and at the same time we can avoid many of the difficulties of the transactional paradigm. The picture would be:</p>


<pre>
(application) --&gt; (connection) --&gt; (db server) --&gt; (virt.res.) --&gt; (ph.res.)
                                        |
                                        +--------&gt; (virt.res.) --&gt; (ph.res.)
</pre>


<p>The advantage is that we don't need to manage all the possible use cases for each virtual resource; we just have to manage (and optimise for) the use cases that are actually needed by PostgreSQL. For instance: <span class="caps">WAL </span>should still be written in local "unvirtualised" storage, the bgwriter will access local and remote resources (RAM and disk), etc.</p>

<p>Some final words about reliability. To operate properly the whole system needs each subsystem; partial failures are not managed, because this architecture is not redundant. It is a distributed system, but not shared. If this architecture could provide cheap and simple scalability via a virtual database server which is functionally equivalent to a physical server with larger resources, then high availability could be obtained in the standard way by setting up two identical virtual servers in a Hot Standby configuration.</p>

<p>Network quality has a large impact on the overall performance; this design might be useful only if you have an array of machines in the same <span class="caps">LAN, </span>not only for speed reasons but also because a network failure would actually be a system failure.  Even with these restrictions, my opinion is that having this option would be quite useful.</p>

<p>This is still a sketch, to be used as a reference for further discussion. Next possible steps:</p>


<ul>
<li>to make a detailed list of the resource use cases</li>
<li>to decide which technologies can help best in each use case</li>
<li>to estimate the actual performance/development costs</li>
</ul>

]]>
    </content>
</entry>

<entry>
    <title>CHAR(10) Conference 1st to 3rd July 2010 - Secure Your Place Now</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/06/char10-conference-1st-to-3rd-j.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.91</id>

    <published>2010-06-15T09:46:47Z</published>
    <updated>2010-07-01T16:15:55Z</updated>

    <summary>CHAR(10) - Clustering, high availability, and replication conference with top international speakers. On-line booking registration now being taken at http://www.char10.org....</summary>
    <author>
        <name>Keith</name>
        <uri>http://www.2ndQuadrant.com/</uri>
    </author>
    
        <category term="International News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United Kingdom News" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresqlchar10conference" label="PostgreSQL CHAR(10) Conference" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>CHAR(10) - Clustering, high availability, and replication conference with top international speakers. On-line booking registration now being taken at <a href="http://www.char10.org/">http://www.char10.org</a>.</p>
]]>
        <![CDATA[<p>PostgreSQL CHAR(10) will be held at Oriel College, Oxford University in the centre of Oxford, UK between 1<sup>st</sup> and 3<sup>rd</sup> July 2010. Accommodation is available on-site, a few minutes' walk from the River Thames and in the heart of a busy town, both ancient and modern. Courtyard Drinks and Dinner on Friday evening and Conference Banquet on Saturday evening are included in the attendance fee.</p>

<p>Secure your place via the on-line booking registration at <a href="http://www.char10.org/">http://www.char10.org</a>.</p>

<p>CHAR(10) stands for Clustering, High availability, Replication, though it also covers all forms of parallel, distributed and grid architectures.</p>

<p>An Open Source coding day is followed by a full 2-day schedule of international speakers, covering new and existing technologies of PostgreSQL.</p>

<p>Optional tutorial days are available from 5<sup>th</sup> to 9<sup>th</sup> July taught by Simon Riggs and Greg Smith.</p>

<p>Visit our training page at <a href="http://www.2ndquadrant.com/postgresql-training/">http://www.2ndquadrant.com/postgresql-training/</a>.</p>
]]>
    </content>
</entry>

<entry>
    <title>PostgreSQL, FreeBSD, and Free Dog Food</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/05/postgresql-freebsd-and-free-do.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.89</id>

    <published>2010-05-14T05:28:46Z</published>
    <updated>2010-05-14T07:37:24Z</updated>

    <summary><![CDATA[ This week I did something I'd prefer to never repeat:&nbsp; I left the country, did something useful, and made it back again in the same day.&nbsp; The occasion was the FreeBSD Developer Summit, held just before BSDCan--the convention that...]]></summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresqlfreebsd" label="postgresql freebsd" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[ This week I did something I'd prefer to never repeat:&nbsp; I left the country, did something useful, and made it back again in the same day.&nbsp; The occasion was the FreeBSD Developer Summit, held just before <a href="http://www.bsdcan.org/">BSDCan</a>--the convention that happens in Ottawa the week before PGCon every year.&nbsp; So I get to head right back again next week, but stay a while that time.<br /><br />The FreeBSD developers were nice enough to sponsor my trip so that we could talk about both the business and technical hurdles that I felt were keeping the sort of companies I work with from deploying their databases on FreeBSD more often than they do.&nbsp; My slightly updated slides are available on our <a href="http://projects.2ndquadrant.it/talks">talks page</a>; I cleaned up a couple of things from what was presented (I'll talk about the most important rewording below).<br /><br />I was very pleased at how friendly and receptive the developers were even to some of my critical comments.&nbsp; FreeBSD and PostgreSQL have very like-minded communities:&nbsp; a BSD license open for any purpose, academic roots, developers focused on stability, and even a strong documentation culture.&nbsp; There's been plenty of cross-over too.<br /><br />Much of the PostgreSQL infrastructure has been run using FreeBSD jails for quite some time (although plans are moving toward using more Debian in its place; details on why are at <a href="http://postgresqlconference.org/2010/east/talks/inside_the_postgresql_project_infrastructure">Inside the PostgreSQL Project Infrastructure</a>).&nbsp; My running joke during the talk was that if PostgreSQL developers are eating their own dog food by deploying critical infrastructure that depends on the database, much of that has been served in a FreeBSD bowl.&nbsp; (The lunch at the conference session was pizza, a much better choice.)<br /><br />And there's been plenty of FreeBSD development that's used PostgreSQL benchmarking as a 
measuring stick for the success of their advances.&nbsp; The very popular <a href="http://people.freebsd.org/%7Ekris/scaling/7.0%20Preview.pdf">Introducing FreeBSD 7.0</a> slides not only showed FreeBSD achieving performance parity with Linux in that release, they also doubled as a document showing how PostgreSQL outscales MySQL.&nbsp; Cheers all around for community driven, BSD licensed code.<br /><br />One bit of audience contention during my talk came from my assertion that not having support for Emulex fiber channel cards in FreeBSD was preventing a significant amount of "big iron" adoption for databases, because Emulex is perceived as the market leader for connecting up expensive hardware like SANs.&nbsp; The guys from FreeBSD hardware and support vendor <a href="http://www.ixsystems.com/">iXsystems</a> called me out on that, suggesting that the alternative vendor here--QLogic--is both completely trusted by the big boys and has top notch FreeBSD driver quality.<br /><br />I did a bit more research into whether I was suffering from sampling bias from the set of people I'd talked to about this, and it looks like that was the case.&nbsp; While Emulex claims they've been named Sun's "<a href="http://www.emulex.com/partners/oems/sun-microsystems.html">Best-in-Class Supplier for OEM products</a>", and all the Sun FC cards I've personally run into came from them, there are tons of Sun rebrands of both <a href="http://blogs.sun.com/jmcp/entry/current_sun_emulex_fc_hba">Emulex</a> and <a href="http://blogs.sun.com/jmcp/entry/current_sun_qlogic_fc_hba">QLogic</a> cards.&nbsp; The same thing is true at all the other vendors I mentioned in my talk; you can get FC cards from both manufacturers via <a href="http://h18006.www1.hp.com/storage/saninfrastructure/hba.html">HP</a> and <a href="http://www.dell.com/us/en/enterprise/networking/blade-fibre-channel-card/cp.aspx?refid=blade-fibre-channel-card">Dell</a> too.&nbsp; I think my general point, that not supporting both Emulex and 
QLogic hurts the perception of FreeBSD as a serious choice for large businesses, still stands; it's just not quite as bad as I'd feared.&nbsp; Accordingly, I tweaked the wording in the slides I'm publishing so that they match reality better than the ones I presented.<br /><br />In addition to the solid core they've been growing for years, FreeBSD's license has allowed them to incorporate two very valuable features Sun released as open source, ZFS and DTrace, into their operating system, both of which are incompatible with Linux's license and are extremely valuable for PostgreSQL deployments.&nbsp; It's still not ideal yet; FreeBSD DTrace can currently be used <a href="http://www.freebsd.org/doc/en/books/handbook/dtrace-implementation.html">only by root</a>, for example.&nbsp; Limitations such as these have in the past kept me from being particularly motivated to work with FreeBSD.&nbsp; The existence of a free commercial Solaris that ran on generic hardware, combined with the steady progress and open enough community around OpenSolaris, satisfied my needs better.&nbsp; While not many of my PostgreSQL installations have been on Solaris, it has a monopoly share for hosting the terabyte scale databases I've worked with.&nbsp; High quality filesystem snapshots via ZFS and the additional peace of mind you get from disk block checksums alone justified those platform decisions.<br /><br />The problem today is that hating everything about how Oracle does business is what got me working with PostgreSQL in the first place, and now that they own Sun they're doing the same things to Solaris.&nbsp; No more Solaris on non-Sun hardware, serious cutbacks on the open-source version (OpenSolaris looks like a walking corpse to me), cutting off even basic OS patches unless you have a support contract--that's what we've seen just in the first round from Oracle here.&nbsp; Solaris isn't free in any sense of the word anymore; we're right back to the same dynamics that pushed me away from 
them and toward Linux fifteen years ago.<br /><br />But I continue to be disappointed at how little focus there is on quality control in Linux.&nbsp; How poorly the filesystem mechanics work for the sorts of database work I do doesn't help either.&nbsp; The Linux OOM killer might as well be named the Linux PostgreSQL Hater for how it acts on my servers.&nbsp; And those sexy Solaris features I know work so well for databases are still not there (even if SystemTap is getting better at DTrace emulation).<br /><br />Meanwhile, FreeBSD has the whole "free" thing sorted out right in their name, and their quality control paranoia is similar to that of your typical good DBA.&nbsp; It looks to me like they're very close to fully assimilating ZFS and DTrace to the point where they can start improving them, rather than just working on getting the original feature set Solaris already had complete and the matching code stable.&nbsp; I think all of us who work on business critical PostgreSQL deployments and who value free software should do a sanity check on just what dog food we're chewing on, and start making sure there's a FreeBSD bowl there at least sometimes.&nbsp; From what I heard this week, the FreeBSD developers are gearing up for another round of chewing on ours too.&nbsp; They're looking into database oriented performance improvements as part of future development, and they're not any happier about using MySQL for that than I am about running PostgreSQL on Solaris.&nbsp; Looks like it might be bowls of dog food all around.&nbsp; Nobody said that leading the software industry was going to be tasty.<br />]]>
        
    </content>
</entry>

<entry>
    <title>How to install multiple PostgreSQL servers on RedHat Linux</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/05/install-multiple-postgresql-servers-redhat-linux.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.88</id>

    <published>2010-05-10T13:00:05Z</published>
    <updated>2010-05-10T14:47:04Z</updated>

    <summary>If you have a Linux server of the RedHat family (including CentOS and Fedora), you might envy the way Debian/Ubuntu distributions handle PostgreSQL cluster management. Although it is not easy to install different PostgreSQL versions on the same RedHat Linux...</summary>
    <author>
        <name>Gabriele Bartolini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="Gabriele&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="multiplepostgresqlserversonredhatlinux" label="Multiple PostgreSQL servers on RedHat Linux" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>If you have a Linux server of the RedHat family (including CentOS and Fedora), you might envy the way Debian/Ubuntu distributions handle PostgreSQL cluster management.</p>

<p>Although it is not easy to install different PostgreSQL versions on the same RedHat Linux server using RPMs, it is much simpler to install several instances of PostgreSQL (servers) and, at the same time, take advantage of the services infrastructure.</p>
]]>
        <![CDATA[<p>Once you have set up the RPM installation, by following the instructions that you find at the <a href="http://yum.pgsqlrpms.org/">PostgreSQL YUM Repository</a>, you will notice that the process creates two files, among others:</p>

<ul>
<li><code>/etc/init.d/postgresql</code>: init script for the PostgreSQL server</li>
<li><code>/etc/sysconfig/pgsql/postgresql</code>: system configuration for the postgresql service</li>
</ul>

<p>By default, the PostgreSQL data directory (<code>PGDATA</code>) points to the <code>/var/lib/pgsql/data</code> directory. It is possible to change it by modifying the <code>/etc/sysconfig/pgsql/postgresql</code> file.</p>

<p>Let's suppose we want to run two PostgreSQL servers on the same RedHat Linux machine, adding a second server, to be used for development purposes, alongside the default one. We will call this <code>postgresql-devel</code>. It will be installed in the <code>/var/lib/pgsql/data-devel</code> directory and will listen on port 5433.</p>

<p>We create a symbolic link to the main <code>postgresql</code> init script, and call it <code>postgresql-devel</code>:</p>

<pre><code>
cd /etc/init.d/
ln -s postgresql postgresql-devel
</code></pre>

<p>Then we start filling the <code>postgresql-devel</code> configuration file in the <code>/etc/sysconfig/pgsql</code> directory. <strong>It is important to note that the init script and the system configuration file have the same name</strong>.</p>

<pre><code>
cat <<EOF > /etc/sysconfig/pgsql/postgresql-devel
PGDATA=/var/lib/pgsql/data-devel
PGPORT=5433
PGLOG=/var/lib/pgsql/pgstartup.\${PGPORT}.log
EOF
</code></pre>
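
<p>If you expect to add more than one extra instance, the same step can be wrapped in a small helper. This is only a sketch: the <code>write_pg_instance_config</code> function name and its optional third argument (a target directory, handy for a dry run) are made up for illustration and are not part of the RPM packaging.</p>

```shell
#!/bin/sh
# Sketch: write the sysconfig file for an additional PostgreSQL instance.
# Usage: write_pg_instance_config NAME PORT [CONFIG_DIR]
# CONFIG_DIR defaults to /etc/sysconfig/pgsql; pass a scratch directory for a dry run.
write_pg_instance_config() {
    name=$1
    port=$2
    dir=${3:-/etc/sysconfig/pgsql}
    # The file name must match the name of the init script symlink.
    {
        printf 'PGDATA=/var/lib/pgsql/data-%s\n' "$name"
        printf 'PGPORT=%s\n' "$port"
        # Single quotes keep ${PGPORT} literal in the file, as in the heredoc version.
        printf 'PGLOG=/var/lib/pgsql/pgstartup.${PGPORT}.log\n'
    } > "$dir/postgresql-$name"
}
```

<p>For example, <code>write_pg_instance_config devel 5433</code> writes the same three lines as the commands above. You still need to create the matching symbolic link in <code>/etc/init.d</code>.</p>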

<p>Once this is done, you can initialise the data directory by running: <code>/etc/init.d/postgresql-devel initdb</code> or simply <code>service postgresql-devel initdb</code>.</p>

<p>Similarly, you can start and stop the service by running, respectively:</p>

<ul>
<li><code>service postgresql-devel start</code></li>
<li><code>service postgresql-devel stop</code></li>
</ul>

<p>You can add/remove the script from the startup and the shutdown of the system by using <code>chkconfig</code> the same way you would with other services.</p>

<p>The <a href="http://wiki.postgresql.org/wiki/PostgreSQL_on_RedHat_Linux">PostgreSQL wiki contains a detailed page about this topic</a>, and I suggest that you read it along with this one. However, this simple article shows you how to easily integrate multiple PostgreSQL instances on the same Linux server, and manage them using the standard RedHat services infrastructure (thanks to the great job done by Devrim Gunduz).</p>
]]>
    </content>
</entry>

<entry>
    <title>The Return of XFS on Linux</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.86</id>

    <published>2010-04-22T15:29:20Z</published>
    <updated>2010-04-22T16:41:25Z</updated>

    <summary><![CDATA[If you're running Linux, and particularly if you're running a database on Linux, it's been hard to recommend any filesystem other than plain old ext3 in recent years.&nbsp; Some of the alternatives that looked interesting at one point--jfs, ReiserFS--are completely...]]></summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="xfslinuxgreenplum" label="xfs linux greenplum" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[If you're running Linux, and particularly if you're running a database on Linux, it's been hard to recommend any filesystem other than plain old ext3 in recent years.&nbsp; Some of the alternatives that looked interesting at one point--jfs, ReiserFS--are completely abandoned at this point.&nbsp; The one that has been almost viable for some time now is XFS, originally an SGI project.&nbsp; And it's back to being in the limelight again this week.<br /><br />XFS had suffered from a number of problems in the past.&nbsp; Since it was <a href="http://lkml.org/lkml/2007/3/28/316">designed for stable hardware</a>, it wasn't as robust on standard cheap PC hardware at first; quite a bit of that was just <a href="http://thread.gmane.org/gmane.comp.file-systems.xfs.general/22268">cleaned up two years ago</a>.&nbsp; It had this odd problem with <a href="http://madduck.net/blog/2006.08.11:xfs-zeroes/">zeroed files</a> that scared some people off.&nbsp; It was treated as a second-class citizen in business oriented Linux distributions like RedHat, requiring you to <a href="http://web.archive.org/web/20080403003724/http://phaq.phunsites.net/2008/02/04/enabling-reiserfs-xfs-jfs-on-redhat-enterprise-linux/">compile your own kernel</a>; even on the less restrictive CentOS, you had to do some strange looking <a href="http://blogwords.neologix.net/neils/?p=1">setup steps</a> to add XFS support, and the result was quite obviously unsupported.&nbsp; And as one of the first filesystems to <a href="http://xfs.org/index.php/XFS_FAQ">turn on and aggressively utilize write barriers</a>, deployments were vulnerable to drives and controllers that didn't flush their caches when told to, an issue you don't find as often on modern hardware anymore if you configure it right (except for SSDs, but that's another story).<br /><br />So why bother?&nbsp; Well, performance is one major reason.&nbsp; I found myself working with XFS again when working with Greenplum's free <a 
href="http://www.greenplum.com/products/single-node/">Single Node Edition</a> software recently.&nbsp; Greenplum told me flat out that they didn't recommend anything but XFS for high-performance installs, and given the underlying similarities to community PostgreSQL I felt it was worth investigating why.<br /><br />The timing on that turned out to be perfect.&nbsp; One of the other limitations of ext3 is that on common hardware it will only support <a href="http://en.wikipedia.org/wiki/Ext3">16TB of storage</a>.&nbsp; Since you can put that much storage in a medium sized disk rack now, that's clearly not enough for high-end systems nowadays, much less a few years from now.&nbsp; Realizing that, RedHat has been seriously reviving their support for XFS in their distribution of Linux.&nbsp; RHEL 5.4, released a few months ago, added it back in as an optional module for some customers.&nbsp; You still couldn't <a href="http://phaq.phunsites.net/2008/02/04/enabling-reiserfs-xfs-jfs-on-redhat-enterprise-linux/">install on XFS</a>, and even the CentOS version <a href="http://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.4">didn't support 32-bit installs</a>, but it was clearly making steps toward mainstream again.<br /><br />Yesterday the first <a href="http://press.redhat.com/2010/04/21/red-hat-enterprise-linux-6-beta-available-today-for-public-download/">public beta of RHEL6</a> was released, and XFS is back to being right in the major feature set.&nbsp; It's sitting next to ext4 on the <a href="http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6-Beta/html/Beta_Release_Notes/filesystems.html">supported filesystem</a> list, pointing out its suitability for large installations in particular.&nbsp; So I can now tell people that they have XFS support available in somewhat rough form in RHEL/CentOS 5.4, with the expectation that it's a first class supported filesystem as systems are upgraded to RHEL6 and its derivatives in the future, and have some 
hope that it will be reliable.<br /><br />With the enterprise Linux support and accordingly the perceived stability side of the XFS code finally under control again, how about the performance?&nbsp; Well, it turns out Greenplum was right about XFS being worth the trouble to get running.&nbsp; I took my test server and reformatted one of its moderately fast drives with three different filesystem/mount combinations:&nbsp; ext3 ordered, ext3 journal, and xfs.&nbsp; After three bonnie++ 1.96 runs with each filesystem, the results I saw broke down like this:<br /><br /><ul><li>ext3 ordered:&nbsp; 39-58MB/s write, 44-72MB/s read</li><li>ext3 journal:&nbsp; 25-30MB/s write, 49-67MB/s read</li><li>xfs:&nbsp; 68-72MB/s write, 72-77MB/s read</li></ul><br />While the best of the ext3 read results approached similar levels to what xfs was capable of, on average xfs did much better.&nbsp; And the write results were at least 25% better in all cases.&nbsp; I liked the tighter, more predictable throughput as well; inconsistent performance is something I often struggle with on ext3.<br /><br />I'm not normally one to be an early adopter of new Linux releases, but the RHEL6 beta with full XFS support has replaced the thoroughly underwhelming new Ubuntu release at the top of my list of OSes to install next.&nbsp; It's not often you see filesystem technology get a second chance to impress, but XFS seems to have made an unexpected transition back to completely relevant again, for now.&nbsp; I'm not sure how long that will be true, with both ext4 available already and btrfs coming closer to production quality by recently reaching a <a href="https://btrfs.wiki.kernel.org/index.php/Main_Page">stable disk format</a>.&nbsp; It will be interesting to see how this reinvigorated set of filesystem choices on Linux plays out. <br />]]>
        
    </content>
</entry>

<entry>
    <title>AMD, Intel, and PostgreSQL</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/04/amd-intel-and-postgresql.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.85</id>

    <published>2010-04-14T19:00:27Z</published>
    <updated>2010-04-14T19:31:29Z</updated>

    <summary><![CDATA[A few weeks ago I presented an updated 2010 version of my talk on database hardware benchmarking at PG East; slides available from our talks page.&nbsp; CPU and memory performance are particularly important for a PostgreSQL database, because every individual...]]></summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[A few weeks ago I presented an updated 2010 version of my talk on database hardware benchmarking at PG East; slides available from our <a href="http://projects.2ndquadrant.com/talks">talks page</a>.&nbsp; CPU and memory performance are particularly important for a PostgreSQL database, because every individual query runs as a single process.&nbsp; Therefore, the speed of your fastest core determines how fast any one query can execute, and in modern systems that's quite likely to bottleneck based on memory speed.<br /><br />One of the things that's obvious from recent memory speed results is that all of AMD's processors have been stuck in a distant second place for almost 18 months now.&nbsp; While AMD continues to use DDR2-800, Intel's "Nehalem" processors, shipping in volume since early 2009, have been adopting increasingly fast DDR3 in good performing multi-channel configurations--the exact area AMD used to be the king of.&nbsp; In the normal single or dual core server configuration, Intel has had such a lead that it's been impossible to recommend AMD for anything but a completely disk-bound workload for some time now.<br /><br />Like many commentaries on PC hardware, my suggestions were only cutting edge for...drumroll please...one week.&nbsp; Basically, the minute my talk was over, AMD released a new line of 12-core processors that use DDR3-1333, and they've closed most of the gap with Intel again.&nbsp; They've increased raw memory performance <a href="http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/5">130%</a> over their earlier design, and actually pulled ahead on that low-level benchmark.<br /><br />How about database workloads?&nbsp; One of the supporting bits of data I pointed to for how much the CPU/memory performance could impact a database workload was the set of Oracle Charbench "Calling Circle" OLTP benchmark results run by AnandTech.&nbsp; Their <a 
href="http://it.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon/8">new Calling Circle results</a> show where the market is now.&nbsp; Intel still owns the top part of the market, but AMD's results with their Opteron 6174 are back to respectable.&nbsp; <br /><br />If you have a workload where more cores is what you need most of the time, the new processors from AMD could be just what you're looking for.&nbsp; They're fast enough for single queries again, and they scale up quite well to handle workloads with many clients.&nbsp; Memory technology really matters, and you should make sure to note (and benchmark yourself!) the speed of any system you're considering or using to make sure it's appropriate for your workload.<br /><br />How long will this situation continue?&nbsp; Well, Intel's next big server processor refresh, codenamed <a href="http://en.wikipedia.org/wiki/Intel_Sandy_Bridge_%28microarchitecture%29">Sandy Bridge</a>, is expected by the end of 2010.&nbsp; Progress marches on.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Installing Greenplum Single Node Edition on Amazon&apos;s EC2</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/03/installing-greenplum-sne-ec2.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.82</id>

    <published>2010-03-23T17:00:02Z</published>
    <updated>2010-03-30T05:38:43Z</updated>

    <summary>I have been thinking for a while now about adding Greenplum support to an open-source application for web analytics that I wrote a few years ago, which is called ht://Miner and uses PostgreSQL. In order to do this, I need...</summary>
    <author>
        <name>Gabriele Bartolini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="Gabriele&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Greenplum" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="greenplumsneonamazonsec2" label="Greenplum SNE on Amazon&apos;s EC2" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>I have been thinking for a while now about adding Greenplum support to an open-source application for web analytics that I wrote a few years ago, which is called <a href="http://www.htminer.org/">ht://Miner</a> and uses PostgreSQL.</p>

<p>In order to do this, I need a multi-CPU environment. While still waiting to get our new servers installed here in our data centre in Italy, I decided to look at <a href="http://aws.amazon.com/ec2/">Amazon's Elastic Compute Cloud (EC2) infrastructure</a>. My intention is to do some benchmarking and spot the main differences in terms of performance between <a href="http://www.greenplum.com/products/single-node/">Greenplum Single Node Edition</a> and <a href="http://www.postgresql.org/">PostgreSQL 8.4</a>, my favourite DBMS.</p>

<p>If you wish to follow this article, you need to have an Amazon AWS account with a valid credit card. Do not worry, this test will only cost you a couple of dollars!</p>
]]>
        <![CDATA[<p>Greenplum SNE is a free version of the <a href="http://www.greenplum.com/products/greenplum-database/">Greenplum</a> database, one of the most advanced solutions for data warehousing and analytics, which is based on a shared nothing architecture and allows for data distribution and parallel processing on several nodes (servers).</p>

<p>The Single Node Edition of Greenplum is a freely distributed version of Greenplum which can be installed on a single node. On a multi-processor architecture, Greenplum Single Node Edition allows you to create multiple <strong>segments</strong> (usually one per core) and hence to take advantage of parallel processing. <a href="http://www.greenplum.com/products/single-node/">Greenplum Single Node Edition can be downloaded for free</a> from the main website.</p>

<p>My intention is to install it on a <strong><a href="http://aws.amazon.com/ec2/#instance">Large Instance</a></strong> running CentOS Linux 5.4 on Amazon. EC2's large instance has the following characteristics:</p>

<ul>
<li>7.5 GB of memory</li>
<li>4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)</li>
<li>850 GB of local instance storage</li>
<li>64-bit platform</li>
</ul>

<p>I also decided to get a 10GB volume of Elastic Block Store (1 dollar a month), which I will format using the XFS file system. This volume will contain Greenplum data directories (this time I will try with just one single volume - next time I will try with a volume per segment).</p>

<p>The first step is to log into your <a href="https://console.aws.amazon.com/ec2/home">Amazon AWS management console</a>. Get your 10GB EBS volume and then launch a large instance using the <code>ami-ebe4cf9f</code> AMI file (AMI stands for Amazon Machine Image), a <a href="http://support.rightscale.com/18-Release_Notes/02-AMI/RightImages_Release_Notes">CentOS 5.4 image file distributed by RightScale</a> for a 64 bit architecture. You may have a different code, as I use a Europe based server.</p>

<p>I then attach the created volume to the instance I just started. The management console informs me that the volume has been attached on <code>/dev/sdf</code>. I grab the public DNS information and connect to the server via ssh as root, using my EC2 identity.</p>

<p>I install the YUM packages for XFS support, by running:</p>

<pre><code>yum install kmod-xfs.x86_64 xfsprogs xfsdump</code></pre>

<p>I create a primary partition on /dev/sdf using fdisk and format it:</p>

<pre><code>mkfs -t xfs /dev/sdf1 </code></pre>

<p>I then add the entry to <code>/etc/fstab</code>:</p>

<pre><code>/dev/sdf1 /greenplum xfs noatime 0 0</code></pre>

<p>and mount the partition on the <code>/greenplum</code> mount point:</p>

<pre><code>mkdir /greenplum
mount /greenplum</code></pre>
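
<p>Before going any further, it is worth checking that the volume really is mounted as XFS. A minimal sketch follows; the <code>fs_type_of</code> helper is a hypothetical name of mine, and it simply reads the kernel's mount table in <code>/proc/mounts</code> (or any file in the same format passed as second argument).</p>

```shell
#!/bin/sh
# Sketch: print the filesystem type of a mount point.
# Usage: fs_type_of MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts, whose columns are:
# device, mount point, filesystem type, options, dump, pass.
fs_type_of() {
    awk -v mp="$1" '$2 == mp { print $3 }' "${2:-/proc/mounts}"
}
```

<p>On the instance, <code>fs_type_of /greenplum</code> should print <code>xfs</code>; an empty result means that the partition is not mounted.</p>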

<p>Download Greenplum's Quickstart guide from the download area. Grab the URL of the 64bit RedHat installation of Greenplum and download it from the EC2 server using <code>wget</code> (or upload it from your computer using <code>scp</code>).</p>

<p>Follow the instructions in the quickstart guide about preparing your system for Greenplum (in particular kernel settings and limits).</p>

<p>Unzip the Greenplum zip file and execute the .bin file. Answer yes to all the questions; at the end of the process, Greenplum is installed in the <code>/usr/local/greenplum-db</code> directory.</p>

<p>Create the <code>gpadmin</code> user and set the password:</p>

<pre><code>useradd gpadmin
passwd gpadmin</code></pre>

<p>Prepare the data directories for the master and the segments:</p>

<pre><code>mkdir -p /greenplum/master
mkdir -p /greenplum/segment1
mkdir -p /greenplum/segment2
chown -R gpadmin:gpadmin /greenplum
</code></pre>

<p>Become <code>gpadmin</code> using the <code>su</code> command and add <code>source /usr/local/greenplum-db/greenplum_path.sh</code> to gpadmin's <code>~/.bashrc</code> file. Load these settings. Edit the <code>~/single_host_file</code> file, add <code>localhost</code> to its contents and launch:</p>

<pre><code>gpssh-exkeys -f ~/single_host_file</code></pre>

<p>Create the <code>~/gp_init_config</code> file with the following content:</p>

<pre><code>ARRAY_NAME="Greenplum" 
MACHINE_LIST_FILE=/home/gpadmin/single_host_file
SEG_PREFIX=gp
PORT_BASE=50000
declare -a DATA_DIRECTORY=(/greenplum/segment1 /greenplum/segment2)
MASTER_HOSTNAME=localhost
MASTER_DIRECTORY=/greenplum/master
MASTER_PORT=5432
ENCODING=UNICODE
</code></pre>
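
<p>The configuration above lists one data directory per segment, which matches the two virtual cores of the large instance. On a machine with more cores you would need a longer list, so here is a small sketch that builds it. The <code>segment_dirs</code> function is a hypothetical helper of mine, not a Greenplum utility, and it assumes the <code>/greenplum/segmentN</code> naming used in this article.</p>

```shell
#!/bin/sh
# Sketch: print COUNT segment directories on one line, separated by spaces.
# Usage: segment_dirs COUNT [BASE_DIR]   (BASE_DIR defaults to /greenplum)
segment_dirs() {
    count=$1
    base=${2:-/greenplum}
    i=1
    dirs=""
    while [ "$i" -le "$count" ]; do
        # Add a separating space only when the list is not empty.
        dirs="$dirs${dirs:+ }$base/segment$i"
        i=$((i + 1))
    done
    printf '%s\n' "$dirs"
}
```

<p>With two segments, <code>segment_dirs 2</code> prints <code>/greenplum/segment1 /greenplum/segment2</code>, which is the list used in <code>DATA_DIRECTORY</code> above. The same list can be passed to <code>mkdir -p</code> when preparing the directories.</p>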

<p>Finally launch:</p>

<pre><code>gpinitsystem -c ~/gp_init_config</code></pre>

<p>At the end of the process, Greenplum SNE is installed on your Amazon EC2 server running CentOS 5.4. On this server you can test the solution at quite a reasonable price (I was on the server for 7 hours today and I spent only 3 dollars).</p>

<p>I will post a few more articles on this topic in the next few days, and hopefully I will be able to post the first benchmarks too. Enjoy!</p>
]]>
    </content>
</entry>

<entry>
    <title>PGEast, Hardware Benchmarking, and the PG Performance Farm</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/03/pgeast-hardware-benchmarking-a.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.80</id>

    <published>2010-03-11T18:47:38Z</published>
    <updated>2010-03-11T19:54:54Z</updated>

    <summary><![CDATA[Today is the deadline for the special room rate at the hotel hosting this month's PostgreSQL Conference East 2010.&nbsp; If you've been procrastinating booking a spot at the conference, as of tomorrow that will start costing you. My talk is on...]]></summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United States News" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="pgbench" label="pgbench" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[Today is the deadline for the <a href="http://www.postgresqlconference.org/east/2010/accommodations">special room rate</a> at the hotel hosting this month's PostgreSQL Conference East 2010.&nbsp; If you've been procrastinating booking a spot at the conference, as of tomorrow that will start costing you.<br /><br />My talk is on <a href="http://postgresqlconference.org/2010/east/talks/database_hardware_benchmarking">Database Hardware Benchmarking</a> and is scheduled for late afternoon on the first day, Thursday March 25th.&nbsp; Those who might have seen this talk before, either live at <a href="http://www.pgcon.org/2009/schedule/events/152.en.html">PGCon 2009</a> or via the video link available there, might be wondering if I'm going to drag out the same slides and talk again.&nbsp; Not the case; while the general philosophy of the talk ("trust no one, run your own benchmarks") stays the same, the examples and test mix suggested have been updated to reflect another year's worth of hardware advances, PostgreSQL work, and my own research during that time.&nbsp; The Intel vs. 
AMD situation in particular has changed quite a bit, requiring a new set of memory benchmarks to really follow what's going on now.<br /><br />And PostgreSQL 9.0 fixed a major problem that kept pgbench from normally delivering accurate results on Linux, due to a <a href="http://kerneltrap.org/mailarchive/linux-kernel/2008/5/21/1899434">kernel regression</a> that made an already far too common situation much worse:&nbsp; it's easy for a single pgbench client to become the bottleneck when running it, rather than the database itself.&nbsp; The review I did for <a href="http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907291918380.19638@westnet.com">multi-threaded pgbench</a> (which can also be multi-process pgbench on systems that don't support threads) suggested a solid &gt;30% speedup even on systems that didn't have the bad kernel incompatibility on them.&nbsp; Subsequent testing suggests it can easily take 8 pgbench processes to get full throughput out of even inexpensive modern processors under recent Linux kernels.&nbsp; I'll go over exactly how that ends up playing out on such systems, and how this new feature makes it possible again to use pgbench as the primary way to measure CPU performance running the database.<br /><br /><br />Recently I've also made an update to the <a href="http://github.com/gregs1104/pgbench-tools">git repo for pgbench-tools</a> that adds working support for PostgreSQL 8.4 and basic 9.0 compatibility, and the next update will include support for the multi-threaded option now that I've mapped out how that needs to work.&nbsp; This is all leading somewhere.&nbsp; Once we have accurate measurements for PostgreSQL performance that are CPU limited on the server side, something that hasn't often been the case for over two years now, those again become a useful way to monitor for performance regressions in the PostgreSQL codebase.&nbsp; The tests included will need to expand for that to cover more eventually, but for now we've reached a 
point where pgbench can be used to find regressions that impact how fast simple SELECT statements execute.&nbsp; I know that works as expected, because every time I accidentally build PostgreSQL with assertions on, that's caught because I see the average processing rate drop dramatically.<br /><br />Once I've got a couple of systems set up here to test for such regressions, the question becomes how to automate what I'm doing, and then to do the same thing against a wider range of build checkouts.&nbsp; Ideally, you'd be able to see a graph of average SELECT performance each day, broken down by version, so that when a commit that reduced it was introduced, the drop would be immediately obvious.&nbsp; This is the dream goal for building a performance farm similar to the <a href="http://buildfarm.postgresql.org/">PostgreSQL buildfarm</a>.&nbsp; The pieces are almost all together now:&nbsp; my pgbench parts are wrapping up, extensions to the buildfarm to make it speak directly to git are moving along (not a requirement, but nobody working on this project wants to use CVS if we can avoid it), and the main thing missing at this point is someone to put the time in to integrate what I've been doing into a buildfarm-like client.<br /><br />And it looks like we now have a corporate sponsor willing to help with that chunk of work, whom I'll let take credit when we're all done, and that's scheduled to happen this summer.&nbsp; I fully expect that PostgreSQL 9.1 development, and 9.0 backpatching, is going to happen with an early performance farm in place to guard against performance regressions.&nbsp; If we can backport the new multi-threaded pgbench to older PostgreSQL versions we might include them in the mix as well.&nbsp; I already maintain a backport of the 8.3 pgbench, which has a lot of improvements, just for testing 8.2 systems.&nbsp; With pgbench as a fairly standalone contrib module, it's possible to build a later one 
different from the rest of the system, so long as it doesn't expect newer database features to exist too.<br /><br />If that's something you're interested in, my talk at the conference is going to map out the foundations I expect it to be built on.&nbsp; Regardless, I hope you can make it to the conference and enjoy the long list of talks being presented there.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Trade-offs in Hot Standby Deployments</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.79</id>

    <published>2010-02-24T07:25:39Z</published>
    <updated>2010-02-24T18:46:44Z</updated>

    <summary>The new Hot Standby feature in the upcoming PostgreSQL 9.0 allows running queries against standby nodes that previously did nothing but execute a recovery process. Two common expectations I&apos;ve heard from users anticipating this feature are that it will allow...</summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="hotstandbyusergroup" label="Hot Standby User Group" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>The new Hot Standby feature in the upcoming PostgreSQL 9.0 allows running queries against standby nodes that previously did nothing but execute a recovery process.  Two common expectations I've heard from users anticipating this feature are that it will allow either distributing short queries across both nodes, or running long reports against the standby without using resources on the master.  Both are possible right now, but unless you understand the trade-offs involved in how Hot Standby works, you may run into some unanticipated behavior.</p>

<h2>Standard Long-running Queries</h2>

<p>One of the traditional problems in a database using MVCC, like PostgreSQL, is that a long-running query has to keep open a resource--referred to as a <a href="http://www.postgresql.org/docs/current/static/transaction-iso.html">snapshot</a> in the current Postgres implementation--to prevent the database from removing data the query needs to operate.  For example, just because another client has deleted a row and committed, if an already running query needs that row to complete you can't actually wipe the physical disk blocks related to that row out just yet.  You have to wait until no open queries that expect that row to be visible are still around.</p>

<h2>Hot Standby Limitations</h2>

<p>If you have a long-running query you want Hot Standby to execute, there are a couple of types of bad things that can happen when the recovery process is applying updates.  These are described in detail in the <a href="http://developer.postgresql.org/pgdocs/postgres/hot-standby.html">Hot Standby Documentation</a>.  Some of these bad things will cause queries running on the standby to be canceled for reasons that might not be intuitively obvious:</p>

<ul>
<li>A HOT update or VACUUM-related update arrives to delete something the query expects to be visible</li>
<li>A B-tree deletion appears</li>
<li>There is a locking conflict between the query you're running and the locks required for the update to be processed</li>
</ul>

<p>The lock situation is difficult to deal with, but not very likely to happen in practice for all that long if you're just running read-only queries on the standby, because those will be isolated via MVCC.  The other two are not hard to run into.  The basic thing to understand is that <em>any</em> UPDATE or DELETE on the master can lead to interrupting any query on the standby; it doesn't matter whether the changes even relate to what the query is doing.</p>

<h2>Good, fast, cheap:  pick two</h2>

<p>Essentially, there are three things people might want to prioritize:</p>

<ol>
<li>Avoid master limiting:  Allow xids and associated snapshots to advance unbounded on the master, so that VACUUM and similar cleanup isn't held back by what the standby is doing</li>
<li>Unlimited queries:  Run queries on the standby for any arbitrary period of time</li>
<li>Current recovery:  Keep the recovery process on the standby up to date with what's happening on the master, allowing fast fail-over for HA</li>
</ol>

<p>In any situation with Hot Standby, it's literally impossible to have all three at once.  You can only pick your trade-off.  The tunable parameters available already let you optimize a couple of ways:</p>

<ul>
<li>Disabling all these delay/defer settings optimizes for always current recovery, but then you'll discover queries are more likely to be canceled than you might expect.</li>
<li><em>max_standby_delay</em> optimizes for longer queries, at the expense of keeping recovery current.  This delays applying updates to the standby once one that will cause a problem (HOT, VACUUM, B-tree delete, etc.) appears.</li>
<li><em>vacuum_defer_cleanup_age</em> and some snapshot hacks can introduce some master limiting to improve on the other two issues, but with a weak UI to do that.  vacuum_defer_cleanup_age is in units of transaction IDs.  You need to have some idea of the average amount of xid churn on your system per unit of time to turn the way people think about this problem ("defer by at least 1 hour so my reports will run") into a setting for this value.  xid consumption rate just isn't a common or even reasonable thing to measure/predict.  Alternately, you can open a snapshot on the primary before starting a long-running query on the standby.  dblink is suggested in the Hot Standby documentation as a way to accomplish that.  Theoretically a user-land daemon, living on the primary, could be written to work around this problem for the standby too (Simon has a basic design for one).  Basically, you start a series of processes that each acquire a snapshot and then sleep for a period before releasing it.  By staggering how long each one sleeps, you could ensure xid snapshots never advance too quickly on the master.  It should already sound obvious how much of a terrible hack this would be.</li>
</ul>
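
<p>To make the unit conversion in the last item above concrete:  if you did have an estimate of xid consumption, turning a time window into a vacuum_defer_cleanup_age value is a single multiplication.  The following is only a sketch.  The function name is hypothetical, and the rate of 50 xids per second is an invented number, not a measurement from any real system.</p>

<pre><code>
# Hypothetical helper: convert "defer cleanup by N seconds" into a
# vacuum_defer_cleanup_age value, given an estimated xid consumption rate.
# Arguments: xids_per_second window_seconds
defer_age_for_window() {
  xids_per_second=$1
  window_seconds=$2
  echo $(( xids_per_second * window_seconds ))
}

# Invented example: about 50 xids/second and a one-hour reporting window
defer_age_for_window 50 3600    # prints 180000
</code></pre>

<p>The result is only as good as the rate you feed it, which is exactly the weakness described above:  a burst of write activity consumes xids faster and shrinks the effective window.</p>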

<h2>Potential Improvements</h2>

<p>The only one of these you can really do something about cleanly is tightening up and improving the UI for the master limiting.  That turns this into the traditional problem already present in the database:  a long-running query holds open a snapshot (or at least limits the advance of visibility related transaction IDs) on the master, preventing the master from removing things needed for that query to complete.  You might alternately think of this as an auto-tuning vacuum_defer_cleanup_age.</p>

<p>The question is how to make the <em>primary</em> respect the needs of long running queries on the <em>standby</em>.  This might be possible if more information about the transaction visibility requirements of the standby were shared with the master.  Doing that sort of exchange would really be something more appropriate for the new Streaming Replication implementation to share.  The way a simple Hot Standby server is provisioned does not provide any feedback toward the master suitable for this data to be exchanged, besides approaches like the already mentioned dblink hack.</p>

<p>With PostgreSQL 9.0 just reaching a fourth alpha release, there may still be time to see some improvements in this area before the 9.0 release.  It would be nice to see Hot Standby and Streaming Replication really integrated together in a way that accomplishes things that neither is fully capable of doing on its own before coding on this release completely freezes.</p>]]>
        
    </content>
</entry>

<entry>
    <title>2ndQuadrant US Launch Party - February 12 New York City - Canceled</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/02/2ndquadrant-us-launch-party.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.78</id>

    <published>2010-02-10T15:50:15Z</published>
    <updated>2010-02-12T12:57:53Z</updated>

    <summary>Due to the weather related travel issues this week, the launch party we had planned for this Friday has been canceled....</summary>
    <author>
        <name>Gabriele Bartolini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="International News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United Kingdom News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United States News" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="2ndquadrantuslaunchparty" label="2ndQuadrant US launch party" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[Due to the weather related travel issues this week, the launch party we had planned for this Friday has been canceled.<br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Measuring PostgreSQL Checkpoint Statistics</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html" />
    <id>tag:blog.2ndquadrant.com,2010:/en//3.75</id>

    <published>2010-01-29T07:25:26Z</published>
    <updated>2010-01-29T18:29:41Z</updated>

    <summary>Checkpoints can be a major drag on write-heavy PostgreSQL installations. The first step toward identifying issues in this area is to monitor how often they happen, and the database recently gained an easier-to-use interface for doing that....</summary>
    <author>
        <name>Greg Smith</name>
        <uri>http://www.2ndQuadrant.us/</uri>
    </author>
    
        <category term="Greg&apos;s PlanetPostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="PostgreSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="postgresqlperformance" label="postgresql performance" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>Checkpoints can be a major drag on write-heavy PostgreSQL installations.  The first step toward identifying issues in this area is to monitor how often they happen, and the database recently gained an easier-to-use interface for doing that.</p>]]>
        <![CDATA[<p>Checkpoints are periodic maintenance operations the database performs to make sure that everything it's been caching in memory has been synchronized with the disk.  The idea is that once you've finished one, you can eliminate needing to worry about older entries placed into the write-ahead log of the database.  That means less time to recover after a crash.</p>

<p>The problem with checkpoints is that they can be very intensive, because to complete one requires writing every single bit of changed data in the database's buffer cache out to disk.  There were a number of features added to PostgreSQL 8.3 that allow you to better monitor the checkpoint overhead, and to lower it by spreading the activity over a longer period of time.  I wrote a long article about those changes called  <a href="http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm">Checkpoints and the Background Writer</a> that goes over what changed, but it's pretty dry reading.</p>

<p>What you probably want to know is how to monitor checkpoints on your production system, and how to tell if they're happening too often.  Even though things have improved, "checkpoint spikes" where disk I/O becomes really heavy are still possible even in current PostgreSQL versions.  And it doesn't help that the default configuration is tuned for very low disk space and fast crash recovery rather than performance.  The checkpoint_segments parameter that's one input on how often a checkpoint happens defaults to 3, which forces a checkpoint after only 48MB of writes.</p>
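
<p>As a rough check on that 48MB figure, here is the arithmetic as a small shell sketch.  It assumes the default 16MB WAL segment size, the function name is made up for illustration, and it only models the segment-count trigger, not checkpoint_timeout.</p>

<pre><code>
# Hypothetical helper: approximate WAL volume (in MB) written before a
# segment-based checkpoint is forced, assuming 16MB WAL segments.
# Argument: the checkpoint_segments setting
wal_mb_before_checkpoint() {
  segments=$1
  segment_mb=16
  echo $(( segments * segment_mb ))
}

wal_mb_before_checkpoint 3    # prints 48
</code></pre>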

<p>You can find out checkpoint frequency two ways.  You can turn on log_checkpoints and watch what happens in the logs.  You can also use the pg_stat_bgwriter view, which gives a count of each of the two sources for checkpoints (time passing and writes occurring) as well as statistics about how much work they did.</p>

<p>The main problem with making that easier to do is that until recently, it's been impossible to reset the counters inside of pg_stat_bgwriter.  That means you have to take a snapshot with a timestamp on it, wait a while, take another snapshot, then subtract all the values to derive any useful statistics from the data.  That's a pain.</p>

<p>Enough of a pain that I  <a href="http://archives.postgresql.org/message-id/4B4F8A96.5080004@2ndquadrant.com">wrote a patch</a> to make it easier.  With the current development version of the database, you can now call pg_stat_reset_shared('bgwriter') and pop all these values back to 0 again.  This allows following a practice that used to be common on PostgreSQL.  Before 8.3, there was a parameter named stats_reset_on_server_start you could turn on.  That reset all of the server's internal statistics each time you started it.  That meant that you could call the handy pg_postmaster_start_time() function, compare with the current time, and always have an accurate count in terms of operations/second of any statistic available on the system.</p>

<p>It's still not automatic, but now that resetting these shared pieces is possible you can do it yourself.  The first key is to integrate statistics clearing into your server startup sequence.  A script like this will work:</p>

<pre><code>
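# Start the server and wait (-w) until it is ready to accept connections
pg_ctl start -l "$PGLOG" -w
# Clear the regular statistics, then the shared background writer counters
psql -c "select pg_stat_reset();"
psql -c "select pg_stat_reset_shared('bgwriter');"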
</code></pre>

<p>Note the "-w" on the start command there--that will make pg_ctl wait until the server is finished starting before it returns, which is vital if you want to immediately execute a statement against it.</p>

<p>If you've done that, and your server start time is essentially the same as when the background writer stats started collection, you can now use this fun query:</p>

<pre><code>
SELECT 
  total_checkpoints,
  -- NULLIF avoids a division-by-zero error before the first checkpoint
  seconds_since_start / NULLIF(total_checkpoints, 0) / 60 AS minutes_between_checkpoints
FROM 
  (SELECT 
      EXTRACT(EPOCH FROM (now() - pg_postmaster_start_time())) AS seconds_since_start,
      (checkpoints_timed+checkpoints_req) AS total_checkpoints 
    FROM pg_stat_bgwriter
  ) AS sub;
</code></pre>

<p>And get a simple report of exactly how often checkpoints are happening on your system.  The output looks like this:</p>

<pre><code>
total_checkpoints           | 9
minutes_between_checkpoints | 3.82999310740741
</code></pre>

<p>What you do with this information is stare at the average time interval and see if it seems too short.  Normally, you'd want a checkpoint to happen no more often than every five minutes, and on a busy system you might need to push it to ten minutes or more to have a hope of keeping up.  With this example, every 3.8 minutes is probably too frequent--this is a system that needs checkpoint_segments to be higher.</p>

<p>Using this technique to measure the checkpoint interval lets you know if you need to increase the checkpoint_segments and checkpoint_timeout parameters in order to achieve that goal.  You can compute the numbers manually right now, and once 9.0 ships it's something you can consider making completely automatic--so long as you don't mind your stats going away each time the server restarts.</p>
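
<p>Computing the numbers manually means the two-snapshot subtraction described earlier.  The arithmetic can be sketched as below, assuming you have written down the epoch time and the sum of checkpoints_timed and checkpoints_req at two points.  The function name and the sample values are hypothetical, not output from a real server.</p>

<pre><code>
# Hypothetical helper: average minutes between checkpoints, computed from
# two manual snapshots of pg_stat_bgwriter.
# Arguments: epoch1 total1 epoch2 total2
minutes_between_checkpoints() {
  awk -v t1="$1" -v c1="$2" -v t2="$3" -v c2="$4" 'BEGIN {
    checkpoints = c2 - c1
    # No checkpoints happened in the window, so there is no interval
    if (checkpoints == 0) { print "n/a"; exit }
    printf "%.2f\n", (t2 - t1) / checkpoints / 60
  }'
}

# Invented sample: 12 checkpoints over one hour
minutes_between_checkpoints 1264750000 40 1264753600 52    # prints 5.00
</code></pre>

<p>The same subtraction applies to any other counter in the view; only the interpretation of the result changes.</p>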

<p>There are some other interesting ways to analyze the data the background writer provides for you in pg_stat_bgwriter, but I'm not going to give away all of my tricks today.</p>]]>
    </content>
</entry>

<entry>
    <title>Hot Standby User Group (online) - 13 Jan at 1600UTC</title>
    <link rel="alternate" type="text/html" href="http://blog.2ndquadrant.com/en/2010/01/hot-standby-user-group-online.html" />
    <id>tag:blog.2ndquadrant.it,2010:/en//3.71</id>

    <published>2010-01-11T10:54:49Z</published>
    <updated>2010-01-15T10:34:31Z</updated>

    <summary>Come on-line and discuss the feature set of Hot Standby for PostgreSQL 8.5. Meeting is for anybody planning to actively use the Hot Standby in the next main release of PostgreSQL. Major focus is on providing user feedback direct to...</summary>
    <author>
        <name>Gabriele Bartolini</name>
        <uri>http://www.2ndQuadrant.it/</uri>
    </author>
    
        <category term="International News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United Kingdom News" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="United States News" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="hotstandbyusergroup" label="Hot Standby User Group" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blog.2ndquadrant.com/en/">
        <![CDATA[<p>Come on-line and discuss the feature set of Hot Standby for PostgreSQL 8.5.<br />
<br />
Meeting is for anybody planning to actively use the Hot Standby in the
next main release of PostgreSQL. Major focus is on providing user
feedback direct to developers to guide the final weeks of development
before we go into Beta.<br />
<br /> </p>
]]>
        <![CDATA[<p>Topics:<br />
* Overview of Hot Standby - Simon<br />
* Lightning Talks<br />
* User/tester feedback to developers<br />
* Planned enhancements - Simon<br />
* Questions &amp; Answers<br />
* Roundup &amp; Development priorities<br />
<br />
Lightning Talks from users and developers are welcome. These will be
limited to 5 mins each and should cover topics such as specific
implementation architecture, test results on performance, failover
timing, usage experience or anything directly and solely related to Hot
Standby.<br />
Meeting is at 1600UTC on 13 Jan 2010.<br />
Please allow 1-2 hours.<br />
Please come and give your feedback directly to the developers. Help us make PostgreSQL 8.5 even better.<br />
Please register via <a href="http://2ndquadrant.com/about/contact/">http://www.2ndQuadrant.com/</a>. Attendee numbers may be limited.<br />
Meeting is in English, will be hosted by Simon Riggs and other project contributors. <br /></p>
]]>
    </content>
</entry>

</feed>
