Wednesday, January 16

Tag: PostgreSQL

Using PL/Java in Greenplum

Greenplum
In this article we are going to show you how to write PL/Java functions in Greenplum. I assume that you have a working Greenplum (or Greenplum Community Edition) at your disposal. In this example we will use version **4.0.4**, installed in /usr/local/greenplum-db-4.0.4.0 (which is the default location). (more…)

Association rules with MADlib in Greenplum

Data Mining, Greenplum
[*MADlib*](http://madlib.net) is an open-source library for scalable in-database analytics, targeting the PostgreSQL and Greenplum databases. MADlib version 0.2beta needs to be properly installed to follow this article, so we encourage you to read the [official documentation](https://github.com/madlib/madlib/wiki/Installation-Guide-%28v0.2beta%29) to install it in a Greenplum database. I'm going to show you how to mine association rules using MADlib and Greenplum. (more…)
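MADlib exposes this through its own SQL interface, which the teaser doesn't show; as a rough illustration of what association-rule mining computes, here is a standalone Python sketch of the support and confidence measures over a toy set of market baskets. The data and function names are ours for illustration, not MADlib's API.

```python
# Toy transaction data: each set is one market basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# Example rule: {bread} -> {milk}
print(support({"bread", "milk"}))        # 0.5
print(confidence({"bread"}, {"milk"}))
```

A rule is typically kept only when both measures clear user-chosen thresholds; MADlib's implementation does the same filtering, but in parallel across Greenplum segments.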

EuroPython 2011: “The Python and the Elephant”

International News, United Kingdom News
During EuroPython 2011, the major annual event for Python developers and users in Europe, 2ndQuadrant will deliver a special hands-on training session entitled "The Python and the Elephant". This 4-hour workshop will take place on Thursday June 23 and will cover the two main techniques for writing Python applications for PostgreSQL: standard client applications using Psycopg, or internal extensions using the PL/Python language for stored procedures. (more…)

Data generation and hardware quality

Greg's PlanetPostgreSQL, PostgreSQL, United States News
One of the challenges when dealing with a new database design is that you don't know things like how big the tables will end up being until they're actually populated with a fair amount of data.  But if the design has to factor in the eventual scalability concerns, you can't deploy it to obtain that data until the estimation is done.  One way around this is to aggressively prototype things.  Use staging hardware that new applications can live on temporarily while you sort out details like this.  You can just factor in that you'll need to move the app, and possibly redesign it, after a few months, when you have a better idea what data is going to show up in it.  The other way to get around this chicken/egg problem is to write a data generator.  Construct (more…)
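As a minimal sketch of such a data generator (the row layout and names here are invented for illustration, not taken from the article), this Python script emits synthetic CSV rows that could be bulk-loaded with PostgreSQL's COPY to measure realistic table sizes:

```python
import csv
import random
import string
import sys

def random_row(row_id):
    """One synthetic row for a hypothetical customers table:
    (id, name, balance_in_cents)."""
    name = "".join(random.choices(string.ascii_lowercase, k=8))
    return (row_id, name, random.randint(0, 1_000_000))

def generate(n_rows, out):
    """Write n_rows of CSV, suitable for loading via COPY ... CSV."""
    writer = csv.writer(out)
    for i in range(1, n_rows + 1):
        writer.writerow(random_row(i))

generate(5, sys.stdout)
```

Loading the output with `COPY customers FROM '/path/data.csv' CSV` and then checking `pg_relation_size('customers')` gives the real numbers the size estimation needs, long before production data exists.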

Tuning Linux for low PostgreSQL latency

Greg's PlanetPostgreSQL, United States News
One of the ugly parts of Linux with PostgreSQL is that the OS will happily cache up to around 5% of memory before getting aggressive about writing it out.  I've just updated a long list of pgbench runs showing how badly that can turn out, even on a server with a modest 16GB of RAM.  Note that I am intentionally trying to provoke the bad situation here, so this is not typical performance.  The workload that pgbench generates is not representative of any real-world workload; it's as write-intensive as it's possible to be.  Check out test set 5, which is running a stock development version of PostgreSQL 9.1.  Some of the pauses where the database is unresponsive during checkpoints, as shown by the max_latency figure there (which is in milliseconds), regularly exceed 40 seconds. (more…)
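To put rough numbers on that 5% figure, here is a back-of-envelope sketch, assuming it refers to the old `vm.dirty_background_ratio` default of 5 (an assumption on our part; check your kernel's actual sysctl values):

```python
def dirty_threshold_bytes(ram_bytes, ratio_percent):
    """Dirty page cache Linux tolerates before starting background
    writeback, per the vm.dirty_background_ratio sysctl (a % of RAM)."""
    return ram_bytes * ratio_percent // 100

GIB = 1024 ** 3
ram = 16 * GIB  # the 16GB benchmark server from the post

print(dirty_threshold_bytes(ram, 5) // (1024 ** 2))  # 819 (MiB)
```

Roughly 800MiB of dirty data piling up, then hitting disk all at once during a checkpoint fsync, is consistent with the multi-second stalls the pgbench results show.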

Reducing the postgresql.conf, parameter at a time

Greg's PlanetPostgreSQL, PostgreSQL, United States News
One of the more useful bits of PostgreSQL documentation I ever worked on is Tuning Your PostgreSQL Server.  When that was written in the summer of 2008, a few months after the release of PostgreSQL 8.3, it was hard to find any similar guide that was both (relatively) concise and current.  Since then, many other PostgreSQL contributors and I have helped keep that document up to date as changes to PostgreSQL were made.  The interesting and helpful trend during that period is that parameters keep disappearing from the set you need to worry about.  In PostgreSQL 8.2, there was a long list of parameters you likely needed to adjust for good performance on a PostgreSQL server: shared_buffers, effective_cache_size, checkpoint_segments, autovacuum, max_fsm_pages, (more…)
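For the first parameter in that list, the starting point usually cited in guides of that era can be expressed as a rule of thumb; the 25% figure and the 8GB cap below are common heuristics we are sketching here, not values quoted from the article:

```python
def suggest_shared_buffers_gb(ram_gb):
    """Rule-of-thumb starting point: 25% of RAM for shared_buffers,
    capped (here at 8GB) since returns tend to diminish beyond that."""
    return min(ram_gb * 0.25, 8.0)

for ram_gb in (4, 16, 64):
    print(f"{ram_gb}GB RAM -> shared_buffers ~ {suggest_shared_buffers_gb(ram_gb)}GB")
```

A starting point is all it is; the right value still depends on workload, which is exactly why benchmarking guides like the one above exist.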

PostgreSQL CommitFest #3 for 9.1 needs you!

Greg's PlanetPostgreSQL, United States News
PostgreSQL development is now done with periodic pauses to review and commit patches that have been submitted, called CommitFests.  The patches themselves are stored in a custom web app, and documentation about the process is on the PostgreSQL wiki.  I'm returning to the role of CommitFest manager for this one, which means I get to harass people all month trying to get patch review finished.  There are currently 26 patches that have no reviewer assigned to them.  If you were ever interested in making a contribution to the PostgreSQL code base, getting involved in the review process is a great way to learn the attributes of patches that get committed, and to see why others fail to gain traction.  I submitted three patches myself this time, and one of them got committed in (more…)

Some ideas about low-level resource pooling in PostgreSQL

Gianni's PlanetPostgreSQL, PostgreSQL
Last week at the CHAR(10) conference we had a workshop on "Cloud Databases". To put it simply: what to do when the use case requirements exceed the resources available on the database server. This was a main topic of the whole conference, and several solutions were illustrated during the day. A common theme was that no solution fits every use case, and that each solution comes with its own cost; hence you have to choose the solution that your use case can afford. (more…)

How to install multiple PostgreSQL servers on RedHat Linux

Gabriele's PlanetPostgreSQL, PostgreSQL
If you have a Linux server from the RedHat family (including CentOS and Fedora), you might envy the way Debian/Ubuntu distributions handle PostgreSQL cluster management. Although it is not easy to install different PostgreSQL versions on the same RedHat Linux server using RPMs, it is much simpler to install several PostgreSQL instances (servers) and, at the same time, take advantage of the services infrastructure. (more…)