PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the third post of the series and we’ll talk about timeline issues and their effects on fault tolerance and dependability of PostgreSQL.
If you would like to witness the evolution progress from the beginning, please check the first two blog posts of the series:
Evolution of Fault Tolerance in PostgreSQL
Evolution of Fault Tolerance in PostgreSQL: Replication Phase
The ability to restore the database to a previous point in time creates some complexities which we’ll cover some of the cases by explaining failover (Fig. 1), switchover (Fig. 2) and pg_rewind (Fig (more…)
I had the pleasure to attend PGDay UK last week - a very nice event, hopefully I'll have the chance to come back next year. There was plenty of interesting talks, but the one that caught my attention in particular was Performace for queries with grouping by Alexey Bashtanov.
I have given a fair number of similar performance-oriented talks in the past, so I know how difficult it is to present benchmark results in a comprehensible and interesting way, and Alexey did a pretty good job, I think. So if you deal with data aggregation (i.e. BI, analytics, or similar workloads) I recommend going through the slides and if you get a chance to attend the talk on some other conference, I highly recommend doing so.
For a long time, one of the most known shortcomings of PostgreSQL was the ability to parallelise queries. With the release of version 9.6, this will no longer be an issue. A great job has been done on this subject, starting from the commit 80558c1, the introduction of parallel sequential scan, which we will see in the course of this article.
First, you must take note: the development of this feature has been continuous and some parameters have changed names between a commit and another. This article has been written using a checkout taken on June 17 and some features here illustrated will be present only in the version 9.6 beta2.
Compared to the 9.5 release, new parameters have been introduced inside the configuration file. These are:
max_parallel_workers_per_gather: the (more…)
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the second post of the series and we'll talk about replication and its importance on fault tolerance and dependability of PostgreSQL.
If you would like to witness the evolution progress from the beginning, please check the first blog post of the series: Evolution of Fault Tolerance in PostgreSQL
Database replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system. Keeping a reliable copy of a running system is one of the biggest concerns of redundancy and we all like maintainable, easy-to-use and (more…)
If you're trying to optimise the performance of your PostgreSQL-based application you're probably focusing on the usual tools: EXPLAIN (BUFFERS, ANALYZE), pg_stat_statements, auto_explain, log_statement_min_duration, etc.
Maybe you're looking into lock contention with log_lock_waits, monitoring your checkpoint performance, etc too.
But did you think about network latency? Gamers know about network latency, but did you think it mattered for your application server?
A small peek into the future of what should be arriving for PostgreSQL 9.6.
Today PostgreSQL took a big step ahead in the data warehouse world and we now are able to perform aggregation in parallel using multiple worker processes! This is great news for those of you who are running large aggregate queries over 10's of millions or even billions of records, as the workload can now be divided up and shared between worker processes seamlessly.
We performed some tests on a 4 CPU 64 core server with 256GB of RAM using TPC-H @ 100 GB scale on query 1. This query performs some complex aggregation on just over 600 million records and produces 4 output rows.
The base time for this query without parallel aggregates (max_parallel_degree = 0) is 1375 seconds. If we add a single worker ( (more…)
Although in the future most database servers (particularly those handling OLTP-like workloads) will use a flash-based storage, we're not there yet - flash storage is still considerably more expensive than traditional hard drives, and so many systems use a mix of SSD and HDD drives. That however means we need to decide how to split the database - what should go to the spinning rust (HDD) and what is a good candidate for the flash storage that is more expensive but much better at handling random I/O.
There are solutions that try to handle this automatically at the storage level by automatically using SSDs as a cache, automatically keeping the active part of the data on SSD. Storage appliances / SANs often do this internally, there are hybrid SATA/SAS drives with large HDD and small SSD in (more…)
I'm extremely pleased to see that after months of efforts and contributions by different people around the world, Postgres-XL 9.5 R1 Beta1 has finally arrived. This release is significantly better, in all respects such as performance, stability and high availability, as compared to the past release. Enormous amount of work has gone into PostgreSQL in the last few years and majority of that is now available in this release of Postgres-XL 9.5R1 which is based on the latest PostgreSQL release.
If you're looking for a PostgreSQL based technology that can support terabytes of data, massively parallel processing, supports OLAP and OLTP workloads, offers a consistent view of the entire cluster and is compliant with PostgreSQL as far as client APIs are concerned, have a look at the latest (more…)
One of the problems with Postgres-XL 9.2 is that it assigns a global transaction identifier (GXID) for every transaction that is started in the cluster. Since its hard for the system to know if a transaction is read-only or not (remember, even SELECT queries can do write activities), Postgres-XL would always assign a GXID and send that to the datanodes. This is quite bad for system performance because every read-only transaction now must go to the GTM, start a new transaction, grab an GXID and also finish the transaction on the GTM when its committed or aborted. For short transactions, like read-only pgbench workload, this adds a lot of overhead and latency. The overhead is even more noticeable when the actual work done by the transaction is very fast, say because the data is fully (more…)
PgBouncer is a lightweight connection pooler for PostgreSQL. PgBouncer 1.7 was announced on the 18th of December 2015. In this blog post we'll talk about the major new improvements in PgBouncer. (more…)