Monday, December 11

pgxc_ctl: Teaching Postgres-XL in New York City

At the 2ndQuadrant PostgreSQL conference in New York City this year, I had the pleasure of delivering a training on Postgres-XL together with Andrew Dunstan. We capped attendance at 8 because we wanted plenty of lab time with the students and time to help them get set up with Postgres-XL. The class sold out. We had attendees not only from New York City but also from other parts of the United States, and one from Europe. I was honored to have folks travel so far to take the course Andrew and I put together, based on work Pavan Deolasee had done previously. We received a lot of positive feedback on the class. Andrew put together a script that enabled all the students to operate on 3 separate 5-node Postgres-XL clusters.

Each student could have had their own 5-node cluster, but we bumped into a limit on our Amazon account, which only allowed a fixed number of virtual machines to be active. Even so, in a single day everybody was able to configure clusters, add nodes, create tables, run performance tests and get OmniDB connected to a coordinator node. (For those of you who don’t know, OmniDB has a lot of cool support for Postgres-XL as well as Postgres-BDR. A little off topic, but I was blown away by how quickly OmniDB features keep getting released. This coming week I am looking forward to the monitoring dashboard. For those of you who are interested in OmniDB and Postgres-XL, please see a great blog post by William Ivanski located here: https://blog.2ndquadrant.com/postgres-xl-omnidb/).

Andrew gave a great presentation on the PostgreSQL query planner. The presentation was based on 2ndQuadrant’s training material but was augmented to show both a good plan and a bad plan for a shared-nothing sharded solution. A good plan is one where portions of a query are distributed to different nodes and aggregated on the coordinator node. A bad plan is one where there are a lot of cross-node join operations.
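
To make the good plan vs. bad plan distinction concrete, here is a minimal sketch and not the material from the class. The table and column names are hypothetical, and I am assuming the Postgres-XL DISTRIBUTE BY HASH clause and the coordinator port 30001 from the setup script in this post. The first query joins on the distribution key, so each data node should be able to join its own rows; the second joins on a column that is not the distribution key, so rows would likely have to move between nodes.

```shell
#!/bin/sh
# Sketch only: customers/orders and their columns are hypothetical.
COORD_PORT=${COORD_PORT:-30001}   # first coordinator in this post's setup script

# Print the demo SQL so it can be inspected or piped to psql.
plan_demo_sql() {
cat <<'SQL'
-- Both tables are hash-distributed on customer_id, so rows with the
-- same customer_id are stored on the same data node.
CREATE TABLE customers (customer_id int, name text) DISTRIBUTE BY HASH (customer_id);
CREATE TABLE orders (order_id int, customer_id int, amount numeric) DISTRIBUTE BY HASH (customer_id);

-- Join on the distribution key: each node can work on its own rows.
EXPLAIN SELECT customer_id, sum(o.amount)
FROM customers c JOIN orders o USING (customer_id)
GROUP BY customer_id;

-- Join on a non-distribution column: expect cross-node data movement.
EXPLAIN SELECT count(*)
FROM customers c JOIN orders o ON c.customer_id = o.order_id;
SQL
}

# Against a live cluster: plan_demo_sql | psql -p "$COORD_PORT" test
```

Comparing the two EXPLAIN outputs on a running cluster is the quickest way to see how much of each query is pushed down to the data nodes.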

I developed some training material on pgbench and gave students time to run different benchmarks against Postgres vs. our 5-node Postgres-XL cluster (3 data nodes, 1 coordinator and 1 Global Transaction Manager). People were able to come up with demonstrations where Postgres-XL showed linear scalability across the nodes.
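
For readers who have not used pgbench, a comparison run can look roughly like the sketch below. This is not the class material: the scale factor, client count and duration are illustrative, port 5432 for the plain Postgres instance is an assumption, and 30001 is the coordinator port from the setup script in this post. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: all numbers are illustrative, not the values used in class.
PGPORT_PG=${PGPORT_PG:-5432}    # assumed port of a plain Postgres instance
PGPORT_XL=${PGPORT_XL:-30001}   # Postgres-XL coordinator port
DRY_RUN=${DRY_RUN:-1}           # 1 = print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

bench() {
    port=$1
    run pgbench -i -s 10 -p "$port" test          # load pgbench tables at scale factor 10
    run pgbench -c 8 -j 2 -T 60 -p "$port" test   # 8 clients, 2 threads, 60 seconds
}

bench "$PGPORT_PG"   # baseline: single Postgres instance
bench "$PGPORT_XL"   # same workload through the coordinator
```

Running the same workload against both ports and comparing the reported transactions per second gives a first, rough picture of how the cluster scales.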

When we decided to deliver the course, I became a bit concerned that we would struggle to set up the Postgres-XL clusters. It had been years since I had tried to set up Postgres-XL, and I was a bit intimidated by all the potential issues with editing configuration files and getting things set up and running properly. I was ecstatic to discover pgxc_ctl. This is a tool that enables you to create a cluster very easily and add and remove nodes at will. You can do it on a single host or across a group of hosts. Many thanks to Tomas Vondra and Pavan for pointing this out to me.

To cut right to the commands we used to set up the clusters, here they are (thanks once again to Tomas):

#!/bin/sh

# Base directory for all node data, and the host every node runs on
XLDIR=/var/lib/pgsql
HOST=localhost

# Start from an empty pgxc_ctl configuration, then add the nodes one by one
pgxc_ctl prepare config empty
pgxc_ctl add gtm master gtm "$HOST" 20001 "$XLDIR/gtm"
pgxc_ctl add coordinator master coord_1 "$HOST" 30001 31001 "$XLDIR/coord-1" none none
pgxc_ctl add coordinator master coord_2 "$HOST" 30002 31002 "$XLDIR/coord-2" none none
pgxc_ctl add datanode master datanode_1 "$HOST" 40001 41001 "$XLDIR/data-1" none none none
pgxc_ctl add datanode master datanode_2 "$HOST" 40002 41002 "$XLDIR/data-2" none none none

# Check that all nodes are running, then create a test database via coord_1
pgxc_ctl monitor all
createdb -p 30001 test

The above shell script sets up the cluster we used during the class on a single machine (again, in the class we did this with 5 separate machines). Interestingly enough, I was able to observe performance wins using Postgres-XL, even on a single machine, vs. a Postgres instance on the same machine. This was because certain queries can be parallelized across Postgres-XL nodes but could not be parallelized by the version of Postgres I was using (9.5), and I was on a two-CPU machine. Obviously the above script can be modified to set up XL to use multiple hosts. If you are interested in the details of the parameters above, the online documentation is located here: https://www.postgres-xl.org/documentation/pgxc-ctl.html . For the specifics of the above script, go to F.32.12.

One thing I will add: to get the above script working you will need to set up password-less access for the user running the scripts (in this case the postgres user). Obviously, this is not ideal, but it can be easily fixed once the cluster is created. With a little more effort you could eliminate this requirement altogether. Also, you need to get the XL installation directory into the path of the user. We took the brute-force approach of just adding the path to the file /etc/environment. I need to go back and look at why this was required at a later point in time.
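
The password-less access and PATH check can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact steps we used: I am assuming key-based ssh for the postgres user, and the HOSTS list and key path are placeholders. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: HOSTS and KEY are assumptions; adjust for your machines.
HOSTS=${HOSTS:-localhost}             # space-separated list of cluster hosts
KEY=${KEY:-$HOME/.ssh/id_ed25519}     # private key for the postgres user
DRY_RUN=${DRY_RUN:-1}                 # 1 = print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_access() {
    # Create a key without a passphrase if none exists yet.
    [ -f "$KEY" ] || run ssh-keygen -t ed25519 -N "" -f "$KEY"
    for h in $HOSTS; do
        run ssh-copy-id -i "$KEY.pub" "postgres@$h"   # install the public key on the host
        run ssh "postgres@$h" command -v pgxc_ctl     # is the XL bin directory on the remote PATH?
    done
}

setup_access
```

If the last command prints nothing on a host, the XL binaries are not on the PATH for ssh command sessions there. Such sessions do not necessarily read the same startup files as a login shell, which may be part of why the /etc/environment change was needed, though I have not confirmed that.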

It was great to teach the class with Andrew Dunstan, a longtime Postgres expert from whom I learn something every time I speak with him. It was also great to interact with the students as much as I did. We truly had a great group (a note to the students who attended: thanks for spending the day with us, and I hope you enjoyed it as much as Andrew and I did!). It was also awesome to see my buddy Bruce Momjian join in for a short period of time. As an aside, Bruce was coming by for the conference. He did our opening keynote at the Chicago conference. Magnus did our keynote in New York City. Simon Riggs, our founder and CTO, did the closing keynote at both locations. They were all great. I hope to write another blog post about our conference in general at a later date.

I would like to close on a personal note. I am the General Manager of North America at 2ndQuadrant, so a significant portion of my time is spent dealing with operational aspects of what we do in North America. However, 2ndQuadrant is a place where everybody has great opportunities to do everything, which is one of the many things I love about being part of this organization. I have worked at many great places over the years and have always strived to be in a role where I can balance working with people and working with technology, and have the result of doing both well be a growing business. No question about it: 2ndQuadrant is such a place!
