Monday, October 15

Ansible Loves PostgreSQL

Ansible is simply great and PostgreSQL is surely awesome. Let’s see how amazingly they work together!


==================== Prime time announcement ! ====================

PGConf Europe 2015 will be on 27th-30th of October in Vienna this year.

I assume you’re interested in configuration management, server orchestration, and automated deployment (that’s why you’re reading this blog post, right?), and that you like working with PostgreSQL (for sure), perhaps on AWS (optionally). If so, you might want to join my talk “Managing PostgreSQL with Ansible” on 28th October, 15:00-15:50.

Please check the amazing schedule and don’t miss the chance to attend Europe’s biggest PostgreSQL event!

Hope to see you there, yes I like drinking coffee after talks :)

==================== Prime time announcement ! ====================

What is Ansible and how does it work?

Ansible’s motto, quoting from the Ansible docs, is “simple, agentless and powerful open source IT automation”.

As can be seen in the figure below, Ansible’s homepage lists the main usage areas of Ansible as provisioning, configuration management, app deployment, continuous delivery, security and compliance, and orchestration. The overview menu also shows which platforms Ansible integrates with, i.e. AWS, Docker, OpenStack, Red Hat, and Windows.

Ansible Overview Menu

Let’s check the main use cases of Ansible to understand how it works and how helpful it is in IT environments.

Provisioning

Ansible is your loyal friend when you want to automate everything in your system. It’s agentless, so you can simply manage your resources (i.e. servers, load balancers, switches, firewalls) via SSH. Whether your systems are running on bare-metal or cloud servers, Ansible will be there to help you provision your instances. Its idempotent nature ensures that you always end up in the state you desired (and expected) to be in.

Configuration Management

One of the hardest things is not repeating yourself in routine operational tasks, and here Ansible again comes to mind as a savior. In the good old days when times were bad, sysadmins wrote many, many scripts and connected to many, many servers to apply them, which was obviously not the best part of their lives. As we all know, manual tasks are error prone; they lead to a heterogeneous environment instead of a homogeneous, more manageable one, and they definitely make our lives more stressful.

With Ansible, you can write simple playbooks (with the help of very informative documentation and the support of its huge community), and in your tasks you can call a wide range of modules (i.e. AWS, Nagios, PostgreSQL, SSH, APT, and File modules). As a result, you can focus on more creative activities than managing configurations manually.

App Deployment

Having the artifacts ready, it is super easy to deploy them to any number of servers. Since Ansible communicates over SSH, there is no need to pull from a repository on each server or hassle with ancient methods like copying files over FTP. Ansible can synchronize artifacts, ensuring that only new or updated files are transferred and obsolete files are removed. This also speeds up file transfers and saves a lot of bandwidth.

Besides transferring files, Ansible also helps make servers ready for production usage. Prior to a transfer, it can pause monitoring, remove servers from load balancers, and stop services. After the deployment, it can start services, add servers back to load balancers, and resume monitoring.

All of this does not have to happen at once for all servers. Ansible can work on a subset of servers at a time to provide zero-downtime deployments. For example, it can deploy to 5 servers at a time, then move on to the next 5 once those are finished.
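This batching maps directly to the “serial” keyword of a play. A minimal sketch, assuming a hypothetical “webservers” host group and placeholder paths for the deployment task:

```yaml
- hosts: webservers
  serial: 5                  # work on 5 hosts at a time
  tasks:
  - name: deploy the new artifacts
    # src/dest are placeholders for illustration
    synchronize: src=build/ dest=/var/www/app delete=yes
```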

After implementing this scenario, it can be executed anywhere. Developers or members of the QA team can run deployments on their own machines for testing purposes. Also, to roll back a deployment for any reason, all Ansible needs is the location of the last known working artifacts. It can then easily redeploy them on the production servers to put the system back in a stable state.

Continuous Delivery

Continuous delivery means adopting a fast and simple approach to releases. To achieve that goal, it is crucial to use the best tools, ones that enable frequent releases without downtime and require as little human intervention as possible. Since we have learnt about the application deployment capabilities of Ansible above, it is quite easy to perform deployments with zero downtime. The other requirement for continuous delivery is fewer manual processes, and that means automation. Ansible can automate any task, from provisioning servers to configuring services to be production ready. After creating and testing scenarios in Ansible, it becomes trivial to put them in front of a continuous integration system and let Ansible do its job.

Security and Compliance

Security is always considered to be the most important thing, but keeping systems secure is one of the hardest things to achieve. You need to be sure about the security of your data as well as the security of your customers’ data. To be sure of the security of your systems, defining security is not enough; you need to be able to apply that security and constantly monitor your systems to ensure they remain compliant.

Ansible is easy to use whether it’s about setting up firewall rules, locking down users and groups, or applying custom security policies. It’s safe by nature, since you can repeatedly apply the same configuration and it will only make the changes necessary to put the system back into compliance.
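As a small illustration, a firewall-hardening play on Ubuntu hosts might use the “ufw” module like this (the allowed ports are examples; re-running the play leaves an already compliant host unchanged):

```yaml
- hosts: dbservers
  tasks:
  - name: allow SSH and PostgreSQL traffic
    ufw: rule=allow port={{ item }} proto=tcp
    with_items:
    - 22
    - 5432

  - name: enable the firewall with a default deny policy
    ufw: state=enabled policy=deny
```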

Orchestration

Ansible ensures that all given tasks run in the proper order and establishes harmony between all of the resources it manages. Orchestrating complex multi-tier deployments is easier with Ansible’s configuration management and deployment capabilities. For example, in the deployment of a software stack, concerns like making sure that all database servers are ready before activating application servers, or having the network configured before adding servers to the load balancer, are no longer complicated problems.

Ansible also helps orchestrate other orchestration tools such as Amazon’s CloudFormation, OpenStack’s Heat, Docker’s Swarm, etc. This way, instead of learning different platforms, languages, and rules, users can concentrate solely on Ansible’s YAML syntax and powerful modules.

What is an Ansible Module?

Modules (or module libraries) provide Ansible with the means to control or manage resources on local or remote servers. They perform a variety of functions; for example, a module may be responsible for rebooting a machine, or it may simply display a message on the screen.

Ansible allows users to write their own modules and also provides out-of-the-box core or extras modules.

Quick Tip: What is the meaning of core modules or extras modules in Ansible?

Core modules:

These are modules that the core Ansible team maintains and that will always ship with Ansible itself. They also receive slightly higher priority for all requests than modules in “extras”.

The source of these modules is hosted on GitHub in the ansible-modules-core repository.

Extras modules:

These modules are currently shipped with Ansible, but might be shipped separately in the future. They are also mostly maintained by the community. Non-core modules are still fully usable, but may receive slightly lower response rates for issues and pull requests.

Popular “extras” modules may be promoted to core modules over time.

The source for these modules is hosted on GitHub in the ansible-modules-extras repository.

What about Ansible playbooks?

Ansible lets us organize our work in various ways. In its most direct form, we can work with Ansible modules using the “ansible” command-line tool and an inventory file.

Inventory

One of the most important concepts is the inventory. We need an inventory file to let Ansible know which servers it needs to connect to using SSH, what connection information it requires, and optionally which variables are associated with those servers.

The inventory file is in an INI-like format. In it, we can specify more than one host and group hosts under more than one host group.

Our example inventory file, hosts.ini, looks like the following:

<code>[dbservers]
db.example.com
</code>

Here we have a single host called “db.example.com” in a host group called “dbservers”. In the inventory file, we can also specify custom SSH ports, SSH usernames, SSH keys, proxy information, variables, etc.
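For instance, a slightly richer inventory could look like the following. The second host and the key path are made up for illustration; the ansible_ssh_* names are the variables Ansible recognizes for connection settings:

```ini
[dbservers]
db.example.com
db2.example.com ansible_ssh_port=2222 ansible_ssh_user=deploy

[dbservers:vars]
ansible_ssh_private_key_file=~/.ssh/id_rsa
```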

Since we have an inventory file ready, to see the uptimes of our database servers we can invoke Ansible’s “command” module and execute the “uptime” command on those servers:

<code>ansible dbservers -i hosts.ini -m command -a "uptime"
</code>

Here we instructed Ansible to read the hosts from the hosts.ini file, connect to them using SSH, execute the “uptime” command on each of them, and then print the output to the screen. This type of module execution is called an ad-hoc command.

The output of the command will look like this:

<code>gulcin@apatheticmagpie ~/blog/ansible-loves-postgresql # ansible dbservers -i hosts.ini -m command -a "uptime"
db.example.com | success | rc=0 >>
21:16:24 up 93 days,  9:17,  4 users,  load average: 0.08, 0.03, 0.05
</code>

However, if our solution contains more than a single step, it becomes difficult to manage it using only ad-hoc commands.

Here Ansible playbooks come in. They allow us to organize our solution in a playbook file, integrating all of the steps by means of tasks, variables, roles, templates, handlers, and an inventory.

Let’s take a brief look at some of these terms to understand how they can help us.

Tasks

Another important concept is tasks. Each Ansible task contains a name, a module to be called, module parameters, and optional pre/post-conditions. Tasks allow us to call Ansible modules and pass information to consecutive tasks.
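As a quick sketch, a task that starts the PostgreSQL service only on Debian-family hosts could look like this (the service name is an example):

```yaml
- name: ensure PostgreSQL is running
  service: name=postgresql state=started
  when: ansible_os_family == "Debian"
```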

Variables

There are also variables. They are very useful for reusing information we have provided or gathered. We can define them in the inventory, in external YAML files, or in playbooks.
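For example, variables can be loaded from an external YAML file with “vars_files” (the file name here is made up):

```yaml
- hosts: dbservers
  vars_files:
  - dbservers_vars.yml       # e.g. contains "dbname: module_test"
  tasks:
  - name: show which database we will manage
    debug: msg="Managing {{ dbname }}"
```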

Playbook

Ansible playbooks are written in YAML syntax. A playbook may contain more than one play. Each play contains the name of a host group to connect to and the tasks it needs to perform. It may also contain variables, roles, and handlers, if defined.

Now we can look at a very simple playbook to see how it can be structured:

<code>---
 
- hosts: dbservers
  gather_facts: no
 
  vars:
    who: World
 
  tasks:
  - name: say hello
    debug: msg="Hello {{ who }}"
 
  - name: retrieve the uptime
    command: uptime</code>

In this very simple playbook, we told Ansible that it should operate on servers defined in the “dbservers” host group. We created a variable called “who” and then we defined our tasks. Notice that in the first task where we print out a debug message, we used the “who” variable and caused Ansible to print “Hello World” to the screen. In the second task, we told Ansible to connect to each host and then execute the “uptime” command there.

Ansible PostgreSQL Modules

Ansible provides a number of modules for PostgreSQL. Some of them are located under core modules while others can be found under extras modules.

All PostgreSQL modules require the Python psycopg2 package to be installed on the same machine as the PostgreSQL server. Psycopg2 is a PostgreSQL database adapter for the Python programming language.

On Debian/Ubuntu systems, the psycopg2 package can be installed using the following command:

<code>apt-get install python-psycopg2
</code>

Now we will examine these modules in detail. For example purposes, we will work with a PostgreSQL server at host db.example.com on port 5432, with the postgres user and an empty password.

postgresql_db

This core module creates or removes a given PostgreSQL database. In Ansible terminology, it ensures that a given PostgreSQL database is present or absent.

The most important option is the required “name” parameter, which represents the name of the database in the PostgreSQL server. Another significant parameter is “state”, which takes one of two values: present or absent. This allows us to create or remove the database identified by the name parameter.

Some workflows may also require specification of connection parameters such as login_host, port, login_user, and login_password.

Let’s create a database called “module_test” on our PostgreSQL server by adding the lines below to our playbook file:

<code>- postgresql_db: name=module_test
                 state=present
                 login_host=db.example.com
                 port=5432
                 login_user=postgres
</code>

Here, we connected to our test database server at db.example.com with the user postgres. However, it doesn’t have to be the postgres user; the username can be anything.

Removing the database is as easy as creating it:

<code>- postgresql_db: name=module_test
                 state=absent
                 login_host=db.example.com
                 port=5432
                 login_user=postgres
</code>

Note the “absent” value in the “state” parameter.

postgresql_ext

PostgreSQL is known for its very useful and powerful extensions. For example, a recent extension is tsm_system_rows, which helps fetch an exact number of rows in table sampling. (For further information, you can check my previous post about table sampling methods.)

This extras module adds or removes PostgreSQL extensions from a database. It requires two mandatory parameters: db and name. The db parameter refers to the database name and the name parameter refers to the extension name. We also have the state parameter, which takes present or absent values, and the same connection parameters as in the postgresql_db module.

Let’s start by creating the extension we talked about:

<code>- postgresql_ext: db=module_test
                  name=tsm_system_rows
                  state=present
                  login_host=db.example.com
                  port=5432
                  login_user=postgres
</code>

postgresql_user

This core module allows adding or removing users and roles from a PostgreSQL database.

It is a very powerful module: while ensuring that a user is present in the database, it also allows modification of the user’s privileges and roles at the same time.

Let’s start by looking at the parameters. The single mandatory parameter here is “name”, which refers to the name of a user or role. As in most Ansible modules, the “state” parameter is also important. It takes one of the values present or absent, and its default value is present.

In addition to the connection parameters seen in the previous modules, some other important optional parameters are:

  • db: Name of the database where permissions will be granted
  • password: Password of the user
  • priv: Privileges in “priv1/priv2” or table privileges in “table:priv1,priv2,…” format
  • role_attr_flags: Role attributes. Possible values are:
    • [NO]SUPERUSER
    • [NO]CREATEROLE
    • [NO]CREATEUSER
    • [NO]CREATEDB
    • [NO]INHERIT
    • [NO]LOGIN
    • [NO]REPLICATION

In order to create a new user called ada with password lovelace and a connection privilege to the database module_test, we may add the following to our playbook:

<code>- postgresql_user: db=module_test
                   name=ada
                   password=lovelace
                   state=present
                   priv=CONNECT
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
</code>

Now that we have the user ready, we can assign her some roles. To allow “ada” to log in and create databases:

<code>- postgresql_user: name=ada
                   role_attr_flags=LOGIN,CREATEDB
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
</code>

We can also grant global or table-based privileges such as “INSERT”, “UPDATE”, “SELECT”, and “DELETE” using the priv parameter. One important point to consider is that a user cannot be removed until all of their granted privileges have been revoked.
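As a sketch of the table-privilege format, the following would grant SELECT and INSERT on a hypothetical “users” table to ada:

```yaml
- postgresql_user: db=module_test
                   name=ada
                   priv=users:SELECT,INSERT
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
```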

postgresql_privs

This core module grants or revokes privileges on PostgreSQL database objects. Supported objects are: table, sequence, function, database, schema, language, tablespace, and group.

The required parameters are “database”, the name of the database to grant or revoke privileges on, and “roles”, a comma-separated list of role names.

The most important optional parameters are:

  • type: Type of the object to set privileges on. Can be one of: table, sequence, function, database, schema, language, tablespace, group. Default value is table.
  • objs: Database objects to set privileges on. Can have multiple values. In that case, objects are separated using a comma.
  • privs: Comma separated list of privileges to grant or revoke. Possible values include: ALL, SELECT, UPDATE, INSERT.

Let’s see how this works by granting all privileges on the “public” schema to “ada”:

<code>- postgresql_privs: db=module_test
                    privs=ALL
                    type=schema
                    objs=public
                    role=ada
                    login_host=db.example.com
                    port=5432
                    login_user=postgres
</code>
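The same module also revokes privileges when state=absent, which is exactly what we need before a user can be dropped. A sketch of the cleanup order:

```yaml
# first revoke what was granted...
- postgresql_privs: db=module_test
                    state=absent
                    privs=ALL
                    type=schema
                    objs=public
                    role=ada
                    login_host=db.example.com
                    port=5432
                    login_user=postgres

# ...then the user can be removed
- postgresql_user: name=ada
                   state=absent
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
```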

postgresql_lang

One of the very powerful features of PostgreSQL is its support for virtually any language as a procedural language. This extras module adds, removes, or changes procedural languages in a PostgreSQL database.

The single mandatory parameter is “lang”, the name of the procedural language to add or remove. Other important options are “db”, the name of the database to add the language to or remove it from, and “trust”, which makes the language trusted or untrusted for the selected database.

Let’s enable the PL/Python language for our database:

<code>- postgresql_lang: db=module_test
                   lang=plpython2u
                   state=present
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
</code>
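The “trust” option works the same way. As a sketch, assuming the PL/Perl package is installed on the server, we could mark plperl as trusted for our database:

```yaml
- postgresql_lang: db=module_test
                   lang=plperl
                   state=present
                   trust=yes
                   login_host=db.example.com
                   port=5432
                   login_user=postgres
```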

Putting it All Together

Now that we know how an Ansible playbook is structured and which PostgreSQL modules are available to us, we can combine our knowledge in a single Ansible playbook.

The final form of our playbook, main.yml, is as follows:

<code>---
 
- hosts: dbservers
  sudo: yes
  sudo_user: postgres
  gather_facts: yes
 
  vars:
    dbname: module_test
    dbuser: postgres
 
  tasks:
  - name: ensure the database is present
    postgresql_db: >
      state=present
      db={{ dbname }}
      login_user={{ dbuser }}
 
  - name: ensure the tsm_system_rows extension is present
    postgresql_ext: >
      name=tsm_system_rows
      state=present
      db={{ dbname }}
      login_user={{ dbuser }}
 
  - name: ensure the user has access to database
    postgresql_user: >
      name=ada
      password=lovelace
      state=present
      priv=CONNECT
      db={{ dbname }}
      login_user={{ dbuser }}
 
  - name: ensure the user has necessary privileges
    postgresql_user: >
      name=ada
      role_attr_flags=LOGIN,CREATEDB
      login_user={{ dbuser }}
 
  - name: ensure the user has schema privileges
    postgresql_privs: >
      privs=ALL
      type=schema
      objs=public
      role=ada
      db={{ dbname }}
      login_user={{ dbuser }}
 
  - name: ensure the postgresql-plpython-9.4 package is installed
    apt: name=postgresql-plpython-9.4 state=latest
    sudo_user: root
 
  - name: ensure the PL/Python language is available
    postgresql_lang: >
      lang=plpython2u
      state=present
      db={{ dbname }}
      login_user={{ dbuser }}
</code>

Now we can run our playbook using the “ansible-playbook” command:

<code>gulcin@apatheticmagpie ~/blog/ansible-loves-postgresql # ansible-playbook -i hosts.ini main.yml
 
PLAY [dbservers] **************************************************************
 
GATHERING FACTS ***************************************************************
ok: [db.example.com]
 
TASK: [ensure the database is present] ****************************************
changed: [db.example.com]
 
TASK: [ensure the tsm_system_rows extension is present] ***********************
changed: [db.example.com]
 
TASK: [ensure the user has access to database] ********************************
changed: [db.example.com]
 
TASK: [ensure the user has necessary privileges] ******************************
changed: [db.example.com]
 
TASK: [ensure the user has schema privileges] *********************************
changed: [db.example.com]
 
TASK: [ensure the postgresql-plpython-9.4 package is installed] ***************
changed: [db.example.com]
 
TASK: [ensure the PL/Python language is available] ****************************
changed: [db.example.com]
 
PLAY RECAP ********************************************************************
db.example.com             : ok=8    changed=7    unreachable=0    failed=0
</code>

You can find the inventory and playbook files in the GitHub repository I created for this blog post. There is also another playbook, “remove.yml”, that undoes everything the main playbook did.

For more information about Ansible:

  • Check out their well-written docs.
  • Watch the Ansible quick start video, which is a really helpful tutorial.
  • Follow their webinar schedule; there are some cool upcoming webinars on the list.
