Wednesday, January 16

How not to build PostgreSQL 9.0 extensions on RPM platforms

For a long time, adding packages to Red Hat-derived Linux systems has been called “RPM Hell”, for good reason.  Particularly before the yum utility came along to help, getting RPM to do the right thing was often a troublesome task.  I was reminded of this again today, while trying to compile a PostgreSQL extension on two nearly identical CentOS systems.

PostgreSQL provides an API named PGXS that lets you build server extensions that both leverage the code library of the server and communicate with it.  We use PGXS to install our repmgr utility, and having that well-defined API let the program be developed externally from the main server core.  Many popular PostgreSQL add-ons rely on PGXS to build themselves.  In fact, the contrib modules that come with PostgreSQL itself are often built this way.  Grabbing a similar contrib module and hacking on it from there is a well-trodden path toward building a new PostgreSQL extension.

PGXS relies upon the pg_config utility being in your PATH.  pg_config comes with the postgresql-devel package, which nowadays is actually named postgresql90-devel.  Unfortunately it’s not in the PATH for anyone by default.  So the first step in building with PGXS is to put it there.  Something like this will work for most UNIX systems:

export PATH="/usr/pgsql-9.0/bin:$PATH"
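
If you want to be sure the directory really holds pg_config before touching PATH, a small guard can help. This is only a sketch: use_pg_bindir is a made-up helper name, and the directory you pass it is whatever your packages installed into.

```shell
# Hypothetical helper: prepend a directory to PATH only if it
# actually contains an executable pg_config.
use_pg_bindir() {
    if [ -x "$1/pg_config" ]; then
        PATH="$1:$PATH"
        export PATH
        echo "using $(command -v pg_config)"
    else
        echo "no pg_config in $1" >&2
        return 1
    fi
}

# Example call with the directory used in this post:
# use_pg_bindir /usr/pgsql-9.0/bin
```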

Here’s how building repmgr looked on the working system:

$ make USE_PGXS=1
gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -I/usr/include/et -DLINUX_OOM_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -I/usr/pgsql-9.0/include -I. -I. -I/usr/pgsql-9.0/include/server -I/usr/pgsql-9.0/include/internal -I/usr/include/et -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o dbutils.o dbutils.c

This includes -m64 -mtune=generic, the gcc options that say to build for a 64-bit platform and to tune the code for common processors in general rather than one specific model (see the gcc documentation for details).  Nowadays the result normally comes out optimized for x86_64 if you have a 64-bit system.  That kind of generic tuning mattered more back when the choices were i386, i486, i586, and i686.

On to the troublesome system.  I thought I’d put PostgreSQL on here identically, yet the build didn’t work at all:

gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -I/usr/include/et -DLINUX_OOM_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -I/usr/pgsql-9.0/include -I. -I. -I/usr/pgsql-9.0/include/server -I/usr/pgsql-9.0/include/internal -I/usr/include/et -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o dbutils.o dbutils.c

/usr/bin/ld: skipping incompatible /usr/pgsql-9.0/lib/libpq.so when searching for -lpq
/usr/bin/ld: skipping incompatible /usr/lib64/libtermcap.so when searching for -ltermcap
/usr/bin/ld: skipping incompatible /usr/lib64/libtermcap.a when searching for -ltermcap
/usr/bin/ld: cannot find -ltermcap
collect2: ld returned 1 exit status

What?  This is trying to build 32 bit code:  “-m32 -march=i386 -mtune=generic”.  Because of that, when it tries to link with all the 64-bit libraries on the server like libpq and libtermcap, it can’t.  How in the world is this happening?

You can see where the information that goes into a PGXS build command comes from by using pg_config.  Here’s how to check the CFLAGS, which is where the bit size info lives:

$ pg_config --cflags
-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -I/usr/include/et -DLINUX_OOM_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv

Now I’m pissed.  This is saying build for 64 bits as well, yet it’s still finding 32-bit information.  Where is that coming from?

Some digging into the PGXS interface trying to trace this back eventually led me to /usr/pgsql-9.0/lib/pgxs/src/Makefile.global, and here’s where the clue started to show up.  That file listed 32-bit compiler options!  Where did they come from?
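
A quick way to spot this kind of disagreement is to pull the -m32/-m64 flag out of each source and compare. The helper below is a rough sketch, not part of PGXS: bitness_of is a name made up for illustration, and the commented grep assumes Makefile.global carries its flags on a line starting with CFLAGS.

```shell
# Hypothetical helper: report whether a string of compiler flags
# asks for a 32-bit or 64-bit build.
bitness_of() {
    # Collapse newlines and tabs so every flag is surrounded by spaces.
    flags=" $(printf '%s' "$1" | tr '\n\t' '  ') "
    case "$flags" in
        *" -m32 "*) echo 32 ;;
        *" -m64 "*) echo 64 ;;
        *) echo unknown ;;
    esac
}

# On a live system (assumes pg_config is on PATH and the path from this post):
# bitness_of "$(pg_config --cflags)"
# bitness_of "$(grep '^CFLAGS' /usr/pgsql-9.0/lib/pgxs/src/Makefile.global)"
```

If the two calls print different numbers, the files on disk came from packages built for different architectures.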

At this point I started looking at exactly what RPMs were installed on each server,
because something had to be different between them.  Here’s a handy command to know:

$ rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | grep postgres | sort
compat-postgresql-libs    i686
compat-postgresql-libs    x86_64
postgresql90-contrib    x86_64
postgresql90-devel    x86_64
postgresql90-libs    i386
postgresql90-libs    x86_64
postgresql90-server    x86_64
postgresql90    x86_64

RHEL5 is capable of running 32-bit and 64-bit applications side by side; you just have to be careful how you compile them.  So it’s normal that the library packages compat-postgresql-libs and postgresql90-libs are installed for both architectures.  You might have both 32-bit and 64-bit apps that want to talk to the same server.  This is often annoying, for example when you want to delete a package and it tells you your request matches more than one and does nothing; you need --allmatches to fix that.

What do we see on the server that won’t compile?  Not quite the same thing:

compat-postgresql-libs    i686
compat-postgresql-libs    x86_64
postgresql90-contrib    x86_64
postgresql90-devel    i386
postgresql90-devel    x86_64
postgresql90-libs    i386
postgresql90-libs    x86_64
postgresql90-server    x86_64
postgresql90    x86_64

What are postgresql90-devel packages for both i386 and x86_64 doing there?  That doesn’t make any sense at all!
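
Spotting this by eye in a long package list is easy to miss, so here is a small filter that does it mechanically. It is a sketch under one assumption: the input is tab-separated name and architecture pairs, one per line, which is what rpm prints when given a query format of '%{NAME}\t%{ARCH}\n'. The function name find_multiarch_devel is made up for this example.

```shell
# Hypothetical helper: read "name<TAB>arch" lines on stdin and print
# every -devel package that is installed for more than one architecture.
find_multiarch_devel() {
    awk -F'\t' '
        # Count each distinct (package, arch) pair once.
        $1 ~ /-devel$/ && !seen[$1, $2]++ {
            archs[$1] = archs[$1] " " $2
            count[$1]++
        }
        END {
            for (p in count)
                if (count[p] > 1)
                    print p ":" archs[p]
        }
    ' | sort
}

# On a live system (assumes rpm is available):
# rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | find_multiarch_devel
```

Library packages installed for both architectures are deliberately ignored, since having both of those is normal.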

While testing to try and make sense of this, I found that if you have either -devel package installed and try to install the other, it kicks back the right series of errors for files that conflict, like this:

  file /usr/pgsql-9.0/lib/pgxs/src/Makefile.global from install of postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64 conflicts with file from package postgresql90-devel-9.0.2-2PGDG.rhel5.i386

The packager knows perfectly well that they overwrite the same Makefile.global.  How did I end up with both?  After wiping everything out I found exactly how:

# yum install postgresql90-devel

=========================================================================
 Package               Arch      Version              Repository    Size
=========================================================================
Installing:
 postgresql90-devel    i386      9.0.2-2PGDG.rhel5    pgdg90        1.5 M
 postgresql90-devel    x86_64    9.0.2-2PGDG.rhel5    pgdg90        1.6 M

Transaction Summary
===================
Install       2 Package(s)
Upgrade       0 Package(s)

Total size: 3.1 M
Total download size: 1.5 M
Is this ok [y/N]:

It certainly is not OK!  yum is perfectly happy to combine them, and I must have done that without noticing before.  It turns out that if you do let them both install like this, the copy you’re left with may not report the right information back to PGXS–unsurprisingly, it is confused. That’s how I ended up with my problem.  I was using the Makefile.global installed by the i386 version, but everything else on the system was x86_64.

So how to clean up?  Given the mix of files here, you can’t really trust that just deleting the unwanted one is enough.  Do that and you may have no copies left of the files that conflicted.  The only safe choice is to nuke them both, then install just the x86_64 one, now that we know exactly which version is available from the test above:

rpm -e postgresql90-devel --allmatches
yum install postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64

With this sorted out, now my PGXS extension builds just fine, and development
on repmgr proceeds again, after a day of lost time to figure this all out.

Lessons for today:  be careful when installing the postgresql90-devel package via yum, and do not let it put both architectures of that package there.  Only use the one that matches the platform of your main postgresql90 package.  And if you are trying to build a PGXS extension on a RHEL/CentOS system, and you see the skipping incompatible library message, start by looking at the PostgreSQL development package(s) you have installed.
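
One way to follow that advice is to ask for the architecture explicitly, since yum accepts package specs of the form name.arch. The helper below is a sketch with a made-up name, devel_spec_for; it only builds the spec string, and the commented lines show how it might be fed from the installed main package.

```shell
# Hypothetical helper: build a yum package spec for the -devel package
# matching a base package name ($1) and architecture ($2).
devel_spec_for() {
    printf '%s-devel.%s\n' "$1" "$2"
}

# On a live system (assumes the main postgresql90 package is installed
# for exactly one architecture):
# arch=$(rpm -q --queryformat '%{ARCH}\n' postgresql90)
# yum install "$(devel_spec_for postgresql90 "$arch")"
```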

We’ll probably get this particular bad combination blocked by future updates to the PostgreSQL 9.0 packages.  I thought it was interesting to share anyway, because there aren’t many good examples of doing troubleshooting like this on RPM.  I once wrote one titled Installing the PostgreSQL 8.2 RPMs on RHEL 5/CentOS 5 that goes through some more of the background here.  But those were simpler days, before 64-bit platforms were popular, and before you could install more than one PostgreSQL version via RPM at the same time.  Knowing the right RPM incantation to list packages installed with their associated architecture is a vital trick nowadays to navigating your way out of RPM hell.
