PostgreSQL uses the tagline "the world's most advanced open source relational database." For PostgreSQL, part of being "advanced" means supporting multiple server-side procedural languages, both built-in and provided by third parties. Luckily for us here at the blog, one of the built-in languages is Python. Unluckily, it's not completely obvious how to get started using Python inside PostgreSQL. This post will provide a short walkthrough demonstrating how to do that. It's written for macOS, but many of the steps are generic.

Server Programming Basics

Relational database servers often support user-defined functions and/or procedures. In the best implementations, they can be used whenever a built-in function or procedure can be used. For example, they might be used in a SQL SELECT statement to compute values for a column in a result set, or they might be executed automatically as a trigger when data in a table changes, or they might even be called directly by a client to perform some complex action.

The database servers that support user-defined functions and procedures tend to offer a custom language to write them in. These languages tend to be somewhat similar to SQL. While that can ease writing queries and updates, it can also lead to baroque code when the logic gets more complicated.

In PostgreSQL, that SQL-derived language is called PL/pgSQL, in Oracle database it's called PL/SQL, and in Microsoft SQL server, it's called T-SQL. MySQL doesn't seem to give a name to its dialect, simply referring to SQL Compound-Statements. All of these languages are very different from one another (although PL/pgSQL and PL/SQL share some similarities).

PostgreSQL goes beyond that, using its extension mechanism to support multiple server-side procedural languages. These include:

Installing PostgreSQL

The first step to working with any of those procedural languages is to install PostgreSQL. Because all of these languages are extensions, and thus optional, we need to also compile the extensions we wish to use. Here we'll be using MacPorts to install PostgreSQL 10 with support for Python:

$ sudo port install postgresql10 +python
--->  Computing dependencies for postgresql10
--->  Fetching archive for postgresql10
--->  Some of the ports you installed have notes:
  postgresql10 has the following notes:
    To use the postgresql server, install the postgresql10-server port

We want to use the server, not just the client, so we follow the instructions to install the server:

$ sudo port install postgresql10-server
--->  Computing dependencies for postgresql10-server
--->  Fetching archive for postgresql10-server
...

--->  Some of the ports you installed have notes:
  postgresql10-server has the following notes:
    To create a database instance, after install do
      sudo mkdir -p /opt/local/var/db/postgresql10/defaultdb
      sudo chown postgres:postgres /opt/local/var/db/postgresql10/defaultdb
      sudo su postgres -c 'cd /opt/local/var/db/postgresql10 && /opt/local/lib/postgresql10/bin/initdb -D /opt/local/var/db/postgresql10/defaultdb'

We are prompted to create the initial database, so we follow the instructions there too:

$ sudo mkdir -p /opt/local/var/db/postgresql10/defaultdb
$ sudo chown postgres:postgres /opt/local/var/db/postgresql10/defaultdb
$ sudo su postgres -c 'cd /opt/local/var/db/postgresql10 && /opt/local/lib/postgresql10/bin/initdb -D /opt/local/var/db/postgresql10/defaultdb'
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
...

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

     /opt/local/lib/postgresql10/bin/pg_ctl -D /opt/local/var/db/postgresql10/defaultdb -l logfile start

The instructions at the end for starting the database server will spin it up in the background and write logging information to the file logfile. For the purposes of debugging and understanding, I prefer to run the server in the foreground, like so:

$ sudo -u postgres /opt/local/lib/postgresql10/bin/postgres -D /opt/local/var/db/postgresql10/defaultdb/
2018-09-21 07:48:14.363 CDT [18023] LOG:  listening on IPv6 address "::1", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on IPv6 address "fe80::1%lo0", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 07:48:14.364 CDT [18023] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-09-21 07:48:14.381 CDT [18024] LOG:  database system was shut down at 2018-09-21 07:46:28 CDT
2018-09-21 07:48:14.384 CDT [18023] LOG:  database system is ready to accept connections
...

We now leave that program running in its own terminal window where we can watch its output as we proceed. Everything else from now on will be done in a new terminal window.

Creating Users

Now that the server is running, we're ready to connect a client and start experimenting. We'll be using the stock psql interactive terminal that comes with PostgreSQL. MacPorts installs that as /opt/local/bin/psql10. (pgcli is an excellent alternative shell.)

$ psql10
psql10: FATAL:  role "jmadden" does not exist

When we ran initdb, we were warned that trust authentication would be used for local connections. What this means is that our local macOS user name would be used to connect to the server. My local user name is 'jmadden', and there is no matching user on the server (users and roles in PostgreSQL are essentially interchangeable), so we need to create one. PostgreSQL ships a createuser command for this purpose, but it's more instructive to look under the hood to see what this does by actually issuing the commands ourself.

We do this by connecting as the only user that's pre-existing and can login, the postgres super-user, and creating the role:

$ sudo -u postgres psql10
psql10 (10.5)
Type "help" for help.

postgres=# create role jmadden;
CREATE ROLE

We're not done yet, we need to grant login rights to that role:

postgres=# \du
                                List of roles
Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
jmadden   | Cannot login                                               | {}
postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=# alter role jmadden with createdb login createrole;
ALTER ROLE

postgres=# \du
                                List of roles
Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
jmadden   | Create role, Create DB                                     | {}
postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

In psql, \du is a shortcut command for "display users". In PostgreSQL, almost all information about database objects is kept in system tables, where it can be read and manipulated with SQL. In this case, the table we're updating is the pg_roles table:

postgres=# select * from pg_roles;
      rolname        | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil | rolbypassrls | rolconfig |  oid
----------------------+----------+------------+---------------+-------------+-------------+----------------+--------------+-------------+---------------+--------------+-----------+-------
pg_signal_backend    | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  4200
postgres             | t        | t          | t             | t           | t           | t              |           -1 | ********    |               | t            |           |    10
pg_read_all_stats    | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3375
pg_monitor           | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3373
jmadden              | f        | t          | t             | t           | t           | f              |           -1 | ********    |               | f            |           | 16384
pg_read_all_settings | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3374
pg_stat_scan_tables  | f        | t          | f             | f           | f           | f              |           -1 | ********    |               | f            |           |  3377
(7 rows)

Now that we can login, surely we can connect:

$ psql10
psql10: FATAL:  database "jmadden" does not exist

Nope! By default, PostgreSQL will connect our session to a database matching our user name. That database doesn't exist yet. It's a good idea to create one to hold our experiments in anyway, as opposed to using one of the existing databases (to which we probably don't have access rights anyway). We'll use the createdb command to create this database and then try to login again:

$ /opt/local/lib/postgresql10/bin/createdb jmadden
$ psql10
psql10 (10.5)
Type "help" for help.

jmadden=> \l
                               List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 jmadden   | jmadden  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

Tip

Notice that the psql prompt is different for regular users and the superuser; regular users have => in the prompt, while the super user has =#.

Creating Languages

Fantastic! Now we're logged in to our own database. Let's create the first example Python function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
          RETURNS integer
          AS $$
             if a > b:
                 return a
             return b
          $$ LANGUAGE plpythonu;
ERROR:  language "plpythonu" does not exist
HINT:  Use CREATE LANGUAGE to load the language into the database.

Well that didn't work. If we back up to the introduction to PL/Python, it says we need to create the extension first (the "PL" means "procedural language"; we'll see what the "u" means in a little bit):

jmadden=> create extension plpythonu;
ERROR:  permission denied to create extension "plpythonu"
HINT:  Must be superuser to create this extension.

Hmm, ok, I suppose that makes sense. Lets use our superuser login to create the extension:

$ sudo -u postgres psql10
psql10 (10.5)
Type "help" for help.

postgres=# create extension plpythonu;
CREATE EXTENSION

Ok, extension created. Now lets go back to our own database and create that example function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
          RETURNS integer
          AS $$
             if a > b:
                 return a
             return b
          $$ LANGUAGE plpythonu;
ERROR:  language "plpythonu" does not exist
HINT:  Use CREATE LANGUAGE to load the language into the database.

Same error! Argh!

After examining the documentation a bit, it turns out that extensions are local to particular databases. So we need to use the 'jmadden' database as the superuser 'postgres' to create the extension in that database:

$ sudo -u postgres psql10 jmadden
psql10 (10.5)
Type "help" for help.

jmadden=# create extension plpythonu;
CREATE EXTENSION

Untrusted Languages

Surely now we can create that simple example function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
          RETURNS integer
          AS $$
             if a > b:
                 return a
             return b
          $$ LANGUAGE plpythonu;
ERROR:  permission denied for language plpythonu

At least we got a different error this time.

It turns out that the "u" in "plpythonu" means that the language is "untrusted." I'll quote the docs for what that means:

PL/Python is only available as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it and is therefore named plpythonu. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu.

The safest way to work with untrusted languages is to only temporarily escalate your privileges by using a superuser role to create "safe" functions, as outlined in this StackExchange post. But that's no fun. Since we're only experimenting on our own machine, and we don't want to have superuser priviliges if we can avoid it (to limit the risks of accidentally damaging our database), we have another option: we can mark the language as trusted.

To do this, we'll need our superuser shell again:

postgres=# select * from pg_language;
  lanname  | lanowner | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator | lanacl
-----------+----------+---------+--------------+---------------+-----------+--------------+--------
 internal  |       10 | f       | f            |             0 |         0 |         2246 |
 c         |       10 | f       | f            |             0 |         0 |         2247 |
 sql       |       10 | f       | t            |             0 |         0 |         2248 |
 plpgsql   |       10 | t       | t            |         12545 |     12546 |        12547 |
 plpythonu |       10 | t       | f            |         16416 |     16417 |        16418 |
(5 rows)

postgres=# update pg_language set lanpltrusted = true where lanname = 'plpythonu';
UPDATE 1

Now our regular user can create and execute this function:

jmadden=> CREATE FUNCTION pymax (a integer, b integer)
          RETURNS integer
          AS $$
             if a > b:
                 return a
             return b
          $$ LANGUAGE plpythonu;
CREATE FUNCTION

jmadden=> select pymax(1, 2);
 pymax
-------
   2
(1 row)

More

PL/Python can do much more than just compare integers, of course. It can query the database, processing rows in batches or iteratively. It can access and produce PostgreSQL "compound types." It can be used for row or statement level triggers. That's all beyond the scope of this post, but you can read more about it in the PostgreSQL documentation.