<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>2ndQuadrant &#187; Gianni&#8217;s PlanetPostgreSQL</title>
	<atom:link href="http://blog.2ndquadrant.com/giannis-planetpostgresql/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.2ndquadrant.com</link>
	<description>PostgreSQL expertise from specialists with a source code level understanding of RDBMS</description>
	<lastBuildDate>Mon, 17 Jun 2013 08:00:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>CTE and the Birthday Paradox</title>
		<link>http://blog.2ndquadrant.com/cte-birthday-paradox/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=cte-birthday-paradox</link>
		<comments>http://blog.2ndquadrant.com/cte-birthday-paradox/#comments</comments>
		<pubDate>Thu, 20 Sep 2012 11:52:40 +0000</pubDate>
		<dc:creator>Gianni Ciolli</dc:creator>
				<category><![CDATA[Gianni's PlanetPostgreSQL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Birthday Paradox]]></category>
		<category><![CDATA[CTE]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[recursive]]></category>

		<guid isPermaLink="false">http://blog.2ndquadrant.com/?p=386</guid>
		<description><![CDATA[An interesting query has been tweeted by Will Leinweber from Postgres Open: -- this returns a different result each...]]></description>
			<content:encoded><![CDATA[<p>An <a title="interesting query" href="https://postgres.heroku.com/dataclips/yarorlkmpftaeoscmitucejqxsfe">interesting query</a> has been tweeted by Will Leinweber from <a href="http://www.postgresopen.org">Postgres Open</a>:</p>
<pre><code style="font-family: monospace;">-- this returns a different result each time it is run
with recursive s as (
  select random()
union
  select random() from s
) select count(*) from s;</code></pre>
<p>I like this example: a surprising result, which can be explained by (and indeed helps to explain) CTE behaviour.</p>
<p><span id="more-386"></span></p>
<p>An unexpected truth is often called a &#8220;paradox&#8221;, and in fact this is a manifestation (an &#8220;instance&#8221;, in programmers&#8217; jargon) of what is known as the <strong>Birthday Paradox</strong>.</p>
<p>Its simplest formulation is probably this: if you choose 23 people at random, the probability that at least two of them share the same birthday is greater than 50%.</p>
<p>The result is unexpected, because there are 366 different birthdays, and the number 23 seems very small compared to 366.</p>
<p>However, it is correct, as can be shown by a direct computation. In PostgreSQL we can run another recursive CTE:</p>
<pre><code style="font-family: monospace;">WITH RECURSIVE r(i,acc) AS (
  SELECT 1, 1 :: double precision
UNION
  SELECT i + 1,
    acc * (((366 - i) :: double precision) / 366)
  FROM r WHERE acc > 0.5
) SELECT count(1) FROM r;</code></pre>
<p>producing 23 as the result.</p>
<p>A recursive CTE stops when the recursive step does not add any new rows. In the last query, <code style="font-family: monospace;">acc</code> represents the probability that the first <code style="font-family: monospace;">i</code> birthdays are distinct, so recursion stops when that number is not above 50%.</p>
<p>In the query mentioned at the beginning, which we&#8217;ll call the &#8220;random query&#8221; for short, the recursive CTE terminates when <code style="font-family: monospace;">random()</code> does not add a new row. That is, when the randomly-computed value has already been computed in a previous iteration; that&#8217;s because the recursive CTE is using <code style="font-family: monospace;">UNION</code> instead of <code style="font-family: monospace;">UNION ALL</code>.</p>
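<p>To see this stopping rule in isolation, here is a small deterministic sketch (an illustration only, not part of the original tweet; the column name <code style="font-family: monospace;">v</code> and the modulus 5 are arbitrary choices). The values cycle through 0, 1, 2, 3, 4; as soon as the recursive step produces 0 again, <code style="font-family: monospace;">UNION</code> discards it as a duplicate, no new row is added, and the recursion ends:</p>
<pre><code style="font-family: monospace;">-- counts the distinct values produced before the first repetition
WITH RECURSIVE s(v) AS (
  SELECT 0
UNION
  SELECT (v + 1) % 5 FROM s
) SELECT count(*) FROM s;</code></pre>
<p>This should return 5. With <code style="font-family: monospace;">UNION ALL</code> the duplicate would be kept and the query would never terminate on its own. The random query stops for exactly the same reason, except that there the repetition happens by chance.</p>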
<p>This is indeed the Birthday Paradox, with 366 replaced by the maximum possible number of distinct values that <code style="font-family: monospace;">random()</code> can produce. What exactly is that number?</p>
<p>The <code style="font-family: monospace;">random()</code> function returns a double precision value, whose exact definition depends on the system. Not all double precision values can be produced, though; the underlying C function can produce 2^31 different results, regardless of the bit size of a double precision value. This is good enough in practice, and it ensures compatibility across the various architectures and library implementations.</p>
<p>So we can replace 366 with 2^31 in our query, and instead of 23 we get 54563 as the answer.</p>
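<p>For reference, the modified query might look like the following sketch. It simply writes 2^31 out as the literal 2147483648 and otherwise follows the earlier query, so it relies on the same assumption about how many distinct values <code style="font-family: monospace;">random()</code> can return:</p>
<pre><code style="font-family: monospace;">WITH RECURSIVE r(i,acc) AS (
  SELECT 1, 1 :: double precision
UNION
  -- 2147483648 = 2^31, the assumed number of distinct random() values
  SELECT i + 1,
    acc * (((2147483648 - i) :: double precision) / 2147483648)
  FROM r WHERE acc > 0.5
) SELECT count(1) FROM r;</code></pre>
<p>As a sanity check, the usual approximation sqrt(2 N ln 2) for the 50% threshold gives about 54562.3 when N = 2^31, in line with the 54563 quoted above.</p>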
<p>Is that close to the actual output of the random query? Let us run it a few times, collect the results, and compute the average:</p>
<pre><code style="font-family: monospace;">gianni=# create table t(n int);
CREATE TABLE

gianni=# with recursive s as (
  select random()
union
  select random() from s
) insert into t select count(1) from s;
INSERT 0 1

/* repeat the last query 49 times */

gianni=# select count(1), avg(n) from t;

 count | avg
-------+--------------------
    50 | 54712.060000000000
 (1 row)</code></pre>
<p>The average of the actual results is quite close to the computed 50% threshold of 54563; the difference is less than 0.3%, quite <em>orthodoxically</em>, we might say!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.2ndquadrant.com/cte-birthday-paradox/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>pgChess code published</title>
		<link>http://blog.2ndquadrant.com/pgchess_code_published/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=pgchess_code_published</link>
		<comments>http://blog.2ndquadrant.com/pgchess_code_published/#comments</comments>
		<pubDate>Wed, 08 Dec 2010 12:21:07 +0000</pubDate>
		<dc:creator>Gianni Ciolli</dc:creator>
				<category><![CDATA[Gianni's PlanetPostgreSQL]]></category>
		<category><![CDATA[chess]]></category>
		<category><![CDATA[European PostgreSQL conference]]></category>
		<category><![CDATA[pgChess]]></category>
		<category><![CDATA[PGDay.eu 2010]]></category>
		<category><![CDATA[PostgreSQL 9.0]]></category>

		<guid isPermaLink="false">http://blog.2ndquadrant.com/?p=28</guid>
		<description><![CDATA[I have been lucky enough to be invited to the marvellous PGDay.eu 2010 conference in Stuttgart, which...]]></description>
			<content:encoded><![CDATA[<p>I have been lucky enough to be invited to <a href="http://2010.pgday.eu">the marvellous <strong>PGDay.eu 2010 conference</strong> in Stuttgart</a>, which ended just yesterday.</p>
<p>The topic of <a href="http://www.postgresql.eu/events/schedule/pgday2010/session/54-play-chess-against-postgresql-and-get-beaten/">the first of my two talks</a> was a collection of <strong>PostgreSQL</strong> objects that <em>play chess</em>, either between themselves or against a human (<a href="http://twitpic.com/3dg42m">see this nice photo, thanks steerio!</a>).</p>
<p><span id="more-28"></span><br />
My intention was to present the software, tidy up the code a bit (out of respect for the fellow PostgreSQL user :-) and then <strong>release it as free software</strong>, for the usual reasons, if there was any interest during the talk.<br />
That being the case, I have named the program <strong>pgChess</strong> and <a href="http://github.com/gciolli/pgChess">published the code on GitHub</a>, so that the interested reader can verify my claim that the code weighs only about 20 KB.<br />
I split the core of the code into two files: the <strong>pgchess</strong> library, which in turn invokes the <strong>2podg</strong> library, whose name stands for &#8220;2-player open deterministic games&#8221;: <em>open</em> meaning that each player knows 100% of the state of the game, and <em>deterministic</em> because the whole game is determined uniquely by the choices that the players make.<br />
Two (counter)examples: most card games are not open; Backgammon is not deterministic.<br />
The strategy is quite simple, and is tailored to PostgreSQL features. In brief:</p>
<ul style="position: static; z-index: auto;">
<li>custom types are defined both for the game state, and for a single move</li>
<li>the chessboard is defined as an 8&#215;8 array</li>
<li>a quite large function computes the valid moves available from a given game state</li>
<li>a recursive algorithm populates the tree of possible moves, up to the specified level (to David Fetter: I am really looking forward to writable CTEs for this part!)</li>
<li>a recursive SQL query selects the &#8220;best&#8221; move, according to the desired strategy which is encoded in the query itself</li>
<li>a multi-transactional psql script displays the chessboard, thanks to the Unicode chess characters: ♟,♜,♞,♝,♛ and ♚ for the Black; ♙,♖,♘,♗,♕ and ♔ for the White.</li>
<li>players can type their move like &#8220;b2Pb4&#8221;, or &#8220;g8Nh6&#8221; (the piece letter always uppercase regardless of the colour).</li>
</ul>
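<p>As a rough illustration of the move notation in the last point (a sketch only, not code taken from the pgChess sources; the column names are made up for the example), a move string can be split into starting square, piece and destination square with plain string functions:</p>
<pre><code style="font-family: monospace;">-- hypothetical helper query: positions 1-2 are the starting square,
-- position 3 is the piece letter, positions 4-5 the destination square
SELECT m,
       substr(m, 1, 2) AS from_square,
       substr(m, 3, 1) AS piece,
       substr(m, 4, 2) AS to_square
FROM (VALUES ('b2Pb4'), ('g8Nh6')) AS moves(m);</code></pre>
<p>For &#8220;b2Pb4&#8221; this yields b2, P and b4; for &#8220;g8Nh6&#8221; it yields g8, N and h6.</p>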
<p>The purpose of the talk was to show that PostgreSQL is an excellent framework for developing application prototypes quickly (which is not a big secret; but providing one more example reinforces the claim). Rapid prototyping is also one key point that I recall from Simon Phipps in <a href="http://www.postgresql.eu/events/schedule/pgday2010/session/41-keynote-back-to-the-future-of-open-source/">his remarkable keynote speech</a>, whose slides I recommend once they become available.<br />
If you attended the conference, please consider <a href="http://2010.pgday.eu/feedback">giving feedback</a>: who knows, you could win free entry to the 2011 conference!<br />
Finally, please note that my talk slides will shortly be published on the conference website, and that this article will be updated accordingly.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.2ndquadrant.com/pgchess_code_published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some ideas about low-level resource pooling in PostgreSQL</title>
		<link>http://blog.2ndquadrant.com/some_ideas_about_lowlevel_reso/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=some_ideas_about_lowlevel_reso</link>
		<comments>http://blog.2ndquadrant.com/some_ideas_about_lowlevel_reso/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 20:28:11 +0000</pubDate>
		<dc:creator>Gianni Ciolli</dc:creator>
				<category><![CDATA[Gianni's PlanetPostgreSQL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[low-level]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[resource pooling]]></category>

		<guid isPermaLink="false">http://blog.2ndquadrant.com/?p=18</guid>
		<description><![CDATA[Last week at the CHAR(10) conference we had a workshop on &#8220;Cloud Databases&#8221;. To put it simply:...]]></description>
			<content:encoded><![CDATA[<p>Last week at the <strong>CHAR(10)</strong> conference we had a workshop on &#8220;Cloud Databases&#8221;. To put it simply: what to do when the use case requirements exceed the resources available in the database server.<br />
This was a main topic of the whole conference, and several solutions were presented during the day. A common theme was that no solution fits all the use cases, and that each solution comes with its cost; hence you have to choose the solution that your use case can afford.</p>
<p><span id="more-18"></span><br />
Another common (albeit implicit) point has been the focus on &#8220;high-level&#8221; solutions, that is: connecting several database servers at a higher level to emulate a single server with larger resources.<br />
An obvious advantage is that you don&#8217;t need to alter the well-scrutinised PostgreSQL code; a drawback is that, by using multiple database servers with independent timelines, you lose some useful properties. Two examples: the partial loss of transactional semantics generates conflicts; pre-parsing each query outside the database introduces limitations on the accepted queries.<br />
The discussion was quite interesting, and when Dimitri Fontaine mentioned remote tablespaces I started wondering about a related but distinct idea, namely whether a lower-level approach to the problem of resource pooling would really be impractical. Before I could elaborate on the details the workshop ended, and I could only sketch the idea to some of the people who were around the whiteboard (among them Gabriele Bartolini, Nic Ferrier, Marko Kreen, Hannu Krosing, Greg Smith) together with the basic questions &#8220;does it look feasible?&#8221; and &#8220;does that resemble something you already know?&#8221;.<br />
A brief sketch: an application stack can be represented in this way</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)</pre>
<p>where the resources used by the database include storage, RAM and CPUs. The purpose is to allow the application to command more resources in order to increase capacity and speed. &#8220;Clever&#8221; applications that manage several databases can be represented as</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
|
+---------&gt; (connection) --&gt; (db server) --&gt; (resources)</pre>
<p>while &#8220;connection pooling&#8221; solutions can be represented as</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
                  |
                  +---------&gt; (db server) --&gt; (resources)</pre>
<p>By &#8220;lower-level&#8221; solutions I mean something like</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (resources)
                                   |
                                   +---------&gt; (resources)</pre>
<p>which might resemble something familiar, but it is not what I am proposing here. To explain the difference I can increase the detail and write</p>
<pre>(resources) = (virtual resources) --&gt; (physical resources)</pre>
<p>to represent the fact that at the lowest level you can have a non-trivial mapping between physical objects and virtual ones. For instance, SAN storage or RAID striping can provide larger virtual disks by joining together smaller physical disks. Such cases could be pictured as</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (virt.res.) --&gt; (ph.res.)
                                                   |
                                                   +--------&gt; (ph.res.)</pre>
<p>My proposal is to pool resources at the <em>database server</em> level, so that we can have a more efficient &#8220;virtualisation&#8221; by using the knowledge of the specific use cases for each resource (CPU, RAM, disk), and at the same time we can avoid many of the difficulties of the transactional paradigm. The picture would be:</p>
<pre>(application) --&gt; (connection) --&gt; (db server) --&gt; (virt.res.) --&gt; (ph.res.)
                                   |
                                   +--------&gt; (virt.res.) --&gt; (ph.res.)</pre>
<p>The advantage is that we don&#8217;t need to manage all the possible use cases for each virtual resource; we just have to manage (and optimise for) the use cases that are actually needed by PostgreSQL. For instance: WAL should still be written in local &#8220;unvirtualised&#8221; storage, the bgwriter will access local and remote resources (RAM and disk), etc.<br />
Some final words about reliability. To operate properly the whole system needs each subsystem; partial failures are not managed, because this architecture is not redundant. It is a distributed system, but not shared. If this architecture could provide cheap and simple scalability via a virtual database server which is functionally equivalent to a physical server with larger resources, then high availability could be obtained in the standard way by setting up two identical virtual servers in a Hot Standby configuration.<br />
Network quality has a large impact on the overall performance; this design might be useful only if you have an array of machines in the same LAN, not only for speed reasons but also because a network failure would actually be a system failure. Even with these restrictions, my opinion is that having this option would be quite useful.<br />
This is still a sketch, to be used as a reference for further discussion. Next possible steps:</p>
<ul style="position: static; z-index: auto;">
<li>to make a detailed list of the resource use cases</li>
<li>to decide which technologies can help best in each use case</li>
<li>to estimate the actual performance/development costs</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.2ndquadrant.com/some_ideas_about_lowlevel_reso/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
