When working with databases, one of the most common tasks is loading data from one or more CSV files.
Several tools are available for this. Some run from the command line, like COPY (using psql); others, like ETL systems, are more complex.
We will start today with Talend and, in the coming weeks, move on to Kettle (Pentaho Data Integration).
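
For the command-line route, here is a minimal Python sketch that builds a psql invocation using the client-side `\copy` meta-command. The database name `mydb`, the table `sales` and the file `sales.csv` are hypothetical, and the helper `build_copy_command` is only an illustration, not part of any tool mentioned here.

```python
import shlex


def build_copy_command(dbname, table, csv_path, header=True):
    """Return the psql argument list that loads csv_path into table.

    Assumption: table and csv_path are trusted values; this sketch does
    no quoting or escaping of identifiers or file names.
    """
    options = "FORMAT csv, HEADER true" if header else "FORMAT csv"
    # \copy runs on the client, so the file is read where psql runs
    meta_command = f"\\copy {table} FROM '{csv_path}' WITH ({options})"
    return ["psql", "-d", dbname, "-c", meta_command]


# Hypothetical names, for illustration only
cmd = build_copy_command("mydb", "sales", "sales.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where psql is installed and the target table already exists.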
[*MADlib*](http://madlib.net) is an open-source library for scalable in-database analytics which targets the PostgreSQL and the Greenplum databases.
MADlib version 0.2beta needs to be installed properly to follow this article, so we encourage you to read the [official documentation](https://github.com/madlib/madlib/wiki/Installation-Guide-%28v0.2beta%29) to install it in a Greenplum database.
I'm going to show you how to mine association rules using MADlib and Greenplum.
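
As a reminder of what an association rule measures, here is a small plain-Python sketch of the two usual metrics, support and confidence. It does not use MADlib or reproduce its interface; the basket data and the function names are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread -> milk
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

An in-database implementation computes the same quantities over a table of transactions rather than a Python list.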
Because of PostgreSQL Conference Europe I had to reschedule the German trainings. The next upcoming training will be the 4-day Administration, Performance, Streaming Replication training. There are still a few seats left. Schedule: October 7 - 10, 2011. Location: Bielefeld. Come to the nice East Westphalia town and join our training. Register now! You will find detailed information in German in our flyer: http://www.2ndquadrant.de/img/10 Feel free to contact us and ask for more details:
Greenplum Community Edition is available in different flavours, including a VMWare virtual machine based on CentOS with all the fancy tools and the documentation already installed. This allows you to easily try and evaluate this powerful platform for data warehousing.
[Greg Smith, from our 2ndQuadrant team, recently explained how to install this image on Linux](http://www.greenplum.com/community/forums/showthread.php?486-Getting-Started-with-VMWare-on-Linux).
This article will guide you through installing this image, which was prepared specifically for VMWare, on VirtualBox, so that VirtualBox users can easily test and evaluate Greenplum too.
Picking back up this week's theme of where you can publicize your PostgreSQL-related project, you're probably reading this blog entry because it appeared on the Planet PostgreSQL blog aggregator. There are "Planet" feeds around many open-source projects. The Debian and GNOME ones spawned off the Planet software, which now powers a ton of other blogs such as the well-regarded Planet Python. Occasionally you'll find general open-source database news posted both here and on Planet MySQL. And I used to read Planet CentOS back when I used to care if they'd ever release CentOS 6.

Planet PostgreSQL has been around for about seven years, since Devrim Gündüz first made the service available on one of his servers. Like many good open-source projects, it has some history
One of the coolest features that Greenplum offers to data warehousing and business intelligence operators, as far as ETL is concerned, is the combination of read-only external tables with gpfdist, Greenplum's parallel file distribution server. The typical use case for this solution is parallel loading of text files (coming from heterogeneous sources, such as databases or applications) into a Greenplum data warehouse. For those of you who want to know more about Greenplum, I suggest that you visit the ... the Community Edition ... install it and start testing it.
During EuroPython 2011, the major annual event for Python developers and users in Europe, 2ndQuadrant will deliver a special hands-on training session entitled "The Python and the Elephant". This 4-hour workshop will take place on Thursday, June 23, and will cover the two main techniques for writing applications in Python for PostgreSQL: standard client applications using PsycoPG and internal extensions using the PL/Python language for stored procedures.
I doubt many people can tell you exactly when they first read a map. My first time was memorable though. Circa 3rd grade, I went through the usual battery of standardized tests for the first time, which included map reading. I did pretty badly, which was odd because it was the only section I bombed like that. Concerned that perhaps I had some sort of learning problem related to spatial data or visualization, a guidance counselor reviewing my scores quizzed me about that section and what I thought of it. I told her I thought it was pretty neat, and that I was looking forward to learning about these "maps" one day. Turns out, due to a school change and differences in class order between schools, I had never been shown one before the exam. For
With Peter E. always in the background, I have been active in the SQL Standard (ISO/IEC 9075) committee since 2008. The membership expired after I switched companies. I got the pleasant news this morning that the German office of my new company has extended the membership. This means I will be the PostgreSQL agent in the SQL Standard committee again for the next 12 months. And this time, 100% PostgreSQL. My company has no interest in forcing me to represent anything other than PostgreSQL.

To be exact: DIN is the institute for standardisation in Germany (similar to ANSI in the USA or BSI in the UK). My company (the office in Germany) became a member of DIN for me. Thank you for supporting this. It means I will represent PostgreSQL in DIN, and officially I will represent Germany at the international level. At the international level (ISO) every
I already did the long conference entry here, so just a quick update: slides from PGEast are posted, and next week I'll be at the increasingly misnamed MySQL Conference in Santa Clara, California.

One thing I'm known for now is ranting about cheap Solid State Drives and how they suck for database use. The Reliable Writes wiki page collects most of the background here. The situation for the last few years has been that no inexpensive drive on the market has a write cache that is safe for database use. Every customer of mine who has purchased one of Intel's SSDs, for example, either the X25-M or the not-enterprise-at-all X25-E, has suffered at least one massive data corruption loss. In order to make a flash drive safe, you need to have a battery-backup on the