XMLTABLE in PostgreSQL

I just committed a patch by Pavel Stěhule that adds the XMLTABLE functionality to PostgreSQL 10.  XMLTABLE is a very useful feature dictated by the SQL/XML standard, that lets you turn your XML data into relational form, so that you can mix it with the rest of your relational data. This feature has many uses; keep reading for some details on it.

Probably the most interesting use case of XMLTABLE is to extract data from some XML document to insert into a relational table, during in-database ETL processing. However, XMLTABLE can be used on-the-fly on data stored in XML columns, so that once the data is in relational form, you can apply any standard operations you want, such as adding WHERE clauses, doing aggregations, joining to other tables, and so on.

A Simple Example

As an example, let’s suppose you administer a hotel chain, and that the data is stored thusly:

CREATE TABLE hoteldata AS SELECT xml
$$<hotels>
 <hotel id="mancha">
  <name>La Mancha</name>
  <rooms>
   <room id="201"><capacity>3</capacity><comment>Great view of the Channel</comment></room>
   <room id="202"><capacity>5</capacity></room>
  </rooms>
  <personnel>
   <person id="1025">
    <name>Ferdinando Quijana</name><salary currency="PTA">45000</salary>
   </person>
  </personnel>
 </hotel>
  <hotel id="valpo">
  <name>Valparaíso</name>
  <rooms>
   <room id="201"><capacity>2</capacity><comment>Very noisy</comment></room>
   <room id="202"><capacity>2</capacity></room>
  </rooms>
  <personnel>
   <person id="1026"><name>Katharina Wuntz</name><salary currency="EUR">50000</salary></person>
   <person id="1027"><name>Diego Velázquez</name><salary currency="CLP">1200000</salary></person>
  </personnel>
 </hotel>
</hotels>$$ AS hotels;

With XMLTABLE, you can turn this into a relationally-formatted table consisting of room numbers and capacity, annotating for each hotel in your chain:

SELECT xmltable.*
  FROM hoteldata,
       XMLTABLE ('/hotels/hotel/rooms/room' PASSING hotels
                 COLUMNS
                    id FOR ORDINALITY,
                    hotel_name text PATH '../../name' NOT NULL,
                    room_id int [email protected]' NOT NULL,
                    capacity int,
                    comment text PATH 'comment' DEFAULT 'A regular room'
                );
id hotel_name room_id capacity comment
1 La Mancha 201 3 Great view of the Channel
2 La Mancha 202 5 A regular room
3 Valparaíso 201 2 Very noisy
4 Valparaíso 202 2 A regular room

Explaining the syntax

Let’s study the query above. The XMLTABLE clause must go in the FROM part of the query. We also have hoteldata in the FROM, which is what feeds the data into XMLTABLE.

First, the PASSING clause is where we specify the XML data that we want to process. In this case, the data comes from the hotels column in the hoteldata table. We call this the document expression.

Just before the PASSING clause you see an XPath expression '/hotels/hotel/rooms/room'. We call this the row-generating expression or just the row expression.

We have the COLUMNS clause next, declaring a few columns. For each column we indicate a data type as well as an optional PATH clause, which we call the column expression.

XMLTABLE‘s theory of operation is that the row expression is applied to the document expression, slicing the document into pieces to generate rows; for each row so generated, the various column expressions are applied to obtain the values for each column.

The column expression is an XPath expression that obtains a value starting from the XML for the current row. If no PATH is specified, then the column name itself is used as the XPath expression. Note that in the column hotel_name we used a path having “../“, which means to “go up” in the XML document to grab values from the the “container” objects in the document. We can also use xml PATH '.' in a row, which gives us the full source XML for that row.

One column can be marked FOR ORDINALITY. The column is then of type INTEGER, and is numbered sequentially for each row obtained from the document. (If there are multiple input documents, such as when you have multiple rows in a table, the counter starts from 1 for each new document).

There is also a DEFAULT clause. If the XPath for a column does not match a value for a certain row, then the DEFAULT value is used.

Some of these columns have been marked as NOT NULL. If there is no match, and no DEFAULT clause is specified (or the DEFAULT also evaluates to NULL), an error is thrown.

I won’t go into more detail of XPath, which is a powerful language, but I can offer the XPath article in Wikipedia, and the official recommendation document from W3C as useful resources.

The full XMLTABLE syntax

The documented syntax synopsis is:

xmltable( [XMLNAMESPACES(namespace uri AS namespace name[, ...])]
         row_expression PASSING [BY REF] document_expression [BY REF]
         COLUMNS name { type [PATH column_expression] [DEFAULT expr] [NOT NULL | NULL]
                       | FOR ORDINALITY }
                  [, ...]
)

Note that the document expression can be a reference to some a table you have in the FROM clause, or it can be a complete XML document as a string literal. The BY REF clauses don’t have any effect; they are there for compatibility with the standard and with other database systems.

I have not covered the XMLNAMESPACES clause in this post; I’m leaving that for a future installment.

Applying SQL on top

As mentioned, once XMLTABLE has processed the data into relational form, you can do whatever you want using well-known tools. For instance, if you had another XML document with more personnel in each hotel,

INSERT INTO hoteldata VALUES (xml
$$<hotels> 
 <hotel id="mancha">
  <name>La Mancha</name>
  <personnel>
   <person id="1028">
    <name>Sancho Panza</name><salary currency="PTA">35000</salary>
   </person>
  </personnel>
 </hotel>
 <hotel id="valpo">
  <name>Valparaíso</name>
  <personnel>
   <person id="1029"><name>Kurt Werner</name><salary currency="EUR">30000</salary></person>
  </personnel>
 </hotel>
</hotels>$$);

It’s easy to obtain the total salaries for each currency you need to pay at each hotel,

  SELECT hotel, currency, sum(salary)
    FROM hoteldata,
XMLTABLE ('/hotels/hotel/personnel/person' PASSING hotels
       COLUMNS hotel text PATH '../../name' NOT NULL,
               salary integer PATH 'salary' NOT NULL,
               currency text PATH [email protected]' NOT NULL
) GROUP BY hotel, currency;
hotel currency sum
Valparaíso CLP 1200000
Valparaíso EUR 80000
La Mancha PTA 80000

Conclusion

In this article, I have covered the new feature XMLTABLE to appear in PostgreSQL version 10. I think XMLTABLE is a great feature for integrating external data, and I hope you find it valuable too. Please test it and report any problems, so that we can sort them out before the final release. If you like XMLTABLE, be sure to let us know leaving a comment!

This Post Has 3 Comments

  1. That’s great, now we should hack XMLTABLE infrastructure to support JSON_TABLE.

    • alvherre says:

      Yes, thanks!

      One thing I noticed in your patch is that multiple FOR ORDINALITY columns are allowed, but this framework only supports one (because that’s what the standard requires). Does JSON_TABLE require more than one?

Leave A Reply