As usual, Greenplum Community Edition 4.2.1 is available as a VMware virtual machine based on CentOS, with all the fancy tools and the documentation already installed. This makes it easy to try.
This article is an updated guide to running the provided image on VirtualBox.
**External web tables** are one of the most useful features when you have to load
data into a Greenplum database from different sources.
JDBC is the standard API for accessing a database from Java. Greenplum has a fully working JDBC implementation.
In this short article we'll see how to use it.
With an announcement on the forum, Greenplum staff have spoken about the new version of their database management system.
I can't resist blogging about some of its new features.
In the [previous article](https://blog.2ndquadrant.com/en/2011/12/a-greenplum-41-handbook.html) we saw how to install Greenplum on multiple nodes.
After the installation steps, we must initialize the entire system.
Let's see how.
One of the main advantages of Greenplum is that it gains power when it runs on multiple nodes.
Horizontal scalability is one of its key features.
Here is a compact handbook to install a multi-node Data Warehouse environment with Greenplum.
Greenplum does not officially support Ubuntu Server 11.10 as an underlying operating system.
However, I needed to install it on the most recent Ubuntu Server release just to perform some tests and evaluate it.
In this article, we are going to complete the MapReduce job started in the [previous article](https://blog.2ndquadrant.com/en/2011/10/mapreduce-in-greenplum.html).
We have a remote data source, served by a gpfdist server. We need to import the data into a Greenplum database, while performing some ETL manipulation during the import.
It is possible to accomplish this goal with a simple transformation in a few steps using Kettle.