Author: Mark Wong

Visualizing sar data

Mark's PlanetPostgreSQL
Hopefully you are now regularly collecting system statistics after reading Tomas Vondra’s defense of using sar. If you don’t have anything in place that readily visualizes the data, I have some examples that you can hopefully build on to see everything you need to see. In addition to sar, the sysstat package also has a utility called sadf to display collected data. Simply running sadf without any arguments will show you the recently aggregated processor utilization data, resulting in:

gentoo 599 2017-05-25 17:10:01 UTC all %user 13.03
gentoo 599 2017-05-25 17:10:01 UTC all %nice 0.01
gentoo 599 2017-05-25 17:10:01 UTC all %system 5.16
gentoo 599 2017-05-25 17:10:01 UTC all %iowait 0.47
gentoo 599 2017-05-25 17:10:01 UTC all %steal 0.00
gentoo 599 2017-05-25 17:10:01 (more…)
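For feeding the data into a charting tool, sadf’s machine-readable output is the useful part. Here is a minimal sketch; the /var/log/sa path is the common Red Hat-style default (Debian-based systems use /var/log/sysstat instead), and sa25 is whichever daily data file you want to export:

# -d emits semicolon-separated records that are easy to load into a
# database or plotting library; options after -- are passed through
# to sar, and -u selects the processor utilization report.
sadf -d /var/log/sa/sa25 -- -u > cpu-utilization.csv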

SCALE 14x

Mark's PlanetPostgreSQL
I had the pleasure of speaking about PostgreSQL performance at The 14th Annual Southern California Linux Expo and meeting many enthusiastic database users in the expo hall. There was a great turnout at the conference, and many PostgreSQL presentations to see. Simon Riggs also talked about SERIALIZABLE, Eventual Consistency and Really Scary Stuff. Thanks to everyone who came to talk with me, Peter, Simon and Karen. We are looking forward to next year (more…)

Testing Postgres-XL with DBT-3

Mark's PlanetPostgreSQL
As people have been working on merging Postgres-XL with the PostgreSQL 9.5 branch, I’ve been trying out Postgres-XL’s ability to run a decision support-like workload. The DBT-3 test kit, originally developed at the OSDL as a TPC-H derivative, has some new changes so that it can run against Postgres-XL. I’ll describe how to use DBT-3 and what you can expect it to tell you. What is DBT-3? DBT-3 is an open source derivative implementation of the TPC Benchmark™ H, a business-oriented ad-hoc decision support workload simulating a parts supplier’s data warehouse. The purpose of DBT-3 is to help characterize system performance to aid in the development of open source solutions. In addition to providing a synthetic workload, this test kit gathers database and (more…)
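To give a feel for the mechanics, here is a rough sketch of fetching the kit and kicking off a run. The repository URL reflects where the OSDL test kits live, but the run script name, its flags, and the scale factor shown are assumptions; check the README in the source tree for the exact invocation:

# Fetch the DBT-3 kit.
git clone https://github.com/osdldbt/dbt3.git
cd dbt3
# Hypothetical invocation: run the workload at scale factor 10 and
# write results to ./results (script name and options may differ).
./scripts/dbt3-run-workload -s 10 -o results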

Index Overhead on a Growing Table

Mark's PlanetPostgreSQL
This is another simple test, continuing from last time. We will start with the same lineitem table as in the previous example. We will measure the time it takes to load the same 7.2GB text file repeatedly until the table size grows to about 1TB. We create a baseline with a table that has no indexes built on it, then repeat with a B-tree index on the l_shipdate DATE column, and again after replacing the B-tree index with a BRIN index. Our baseline shows that as the table grows, the time it takes to insert data also increases. The difference in the time that it takes to insert data when the table is near 1TB compared to when it is empty is about 12 seconds. With the B-tree index in place, the difference increases to 84 seconds. Finally, the change is only about 15 seconds with (more…)
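A minimal sketch of the three configurations, assuming a database named dbt3 and a flat file named lineitem.tbl as in the previous post; the index names are only illustrative:

# Baseline: no indexes. Time each repeated load of the flat file.
time psql -d dbt3 -c "\copy lineitem FROM 'lineitem.tbl' WITH (DELIMITER '|')"

# Second pass: a B-tree index on the date column, then repeat the loads.
psql -d dbt3 -c "CREATE INDEX l_shipdate_btree ON lineitem (l_shipdate)"

# Third pass: replace the B-tree with a BRIN index and repeat again.
psql -d dbt3 -c "DROP INDEX l_shipdate_btree"
psql -d dbt3 -c "CREATE INDEX l_shipdate_brin ON lineitem USING brin (l_shipdate)"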

Loading Tables and Creating B-tree and Block Range Indexes

Mark's PlanetPostgreSQL
I have been looking at the new Block Range Indexes (BRIN) being developed for PostgreSQL 9.5. BRIN indexes are designed to provide similar benefits to partitioning, especially for large tables, just without the need to declare partitions. That sounds pretty good, but let's look in greater detail to see if it lives up to the hype. How large? Here's one data point. Using the dbgen utility provided with the TPC Benchmark™ H, we created data for the lineitem table at the 10GB scale factor, which results in a 7.2GB text file. We're going to compare a couple of basic tasks. The first look will be at the impact of inserting data into a table using the COPY command. We will do a simple experiment of creating a table without any indexes or constraints on it and time how long it takes to load the (more…)
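Here is a sketch of that experiment, with the usual hedges: the dbgen flags shown (-s for scale factor, -T L for lineitem only) are typical of the TPC-H tools but worth confirming against dbgen -h, and the cleanup step is there because dbgen ends every row with a trailing '|' that COPY would treat as an extra column:

# Generate the lineitem data at the 10GB scale factor.
./dbgen -s 10 -T L
# Strip the trailing delimiter dbgen appends to each row.
sed -i 's/|$//' lineitem.tbl
# Time the load into a table with no indexes or constraints.
time psql -d dbt3 -c "\copy lineitem FROM 'lineitem.tbl' WITH (DELIMITER '|')"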