Exporting CSV from PostgreSQL
Streamlining your reporting with variables and psql2csv
Running a query and exporting the resulting data is a common requirement when
working with databases, be it for visualizing the data, using it as input to
another tool, or simply sharing it with others. PostgreSQL provides the COPY
command, allowing us to do so from psql.
GUIs such as pgAdmin usually have a data export feature built in. In this article I’ll be focusing on psql and the command line in general. After all, I’m a fan of using PostgreSQL on the command line.
Exporting Data in a psql Session
The COPY
command copies data from a table or from a query to a file.
It is also used to move data from a file to a table. As we’re talking about
exporting data, we’ll be focusing on the former use case.
The interface of the COPY TO command is as follows:
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    OIDS [ boolean ]
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    ENCODING 'encoding_name'
Let’s introduce a small table that we’ll use to explore the COPY
command
a bit:
CREATE TABLE users (
    id serial PRIMARY KEY,
    name text NOT NULL,
    age integer NOT NULL
);

INSERT INTO users (name, age) VALUES
    ('Anne', 24),
    ('Bill', 37),
    ('Carrie', 39),
    ('Daniel', 19);
We can now copy this entire table to a file in a psql session:
db=# COPY users TO 'users.csv';
ERROR: relative path not allowed for COPY to file
Looks like that didn’t work… Maybe the docs for COPY can help. The description of the filename argument specifically states:
An input file name can be an absolute or relative path, but an output file name must be an absolute path.
Fair enough, we can specify an absolute path if we have to, although it would obviously be nicer to work with relative paths.
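For example, with an assumed absolute path such as /tmp/users.csv (and assuming the server process is allowed to write there), the command goes through:
db=# COPY users TO '/tmp/users.csv';
COPY 4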
Further, the docs also state:
COPY
with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible by the PostgreSQL user (the user ID the server runs as) and the name must be specified from the viewpoint of the server.
This sounds like a showstopper. When connecting to a remote database, any file on my computer is obviously inaccessible to the PostgreSQL user on the server.
To remedy this, psql provides a meta-command called \copy
that works just like
COPY
, but without the above limitations. As per the psql docs for
\copy
:
Performs a frontend (client) copy. This is an operation that runs an SQL
COPY
command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
To achieve this, \copy
uses COPY TO STDOUT
under the hood (and COPY FROM
STDIN
when importing data).
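In other words, as far as the server is concerned, \copy boils down to something you could also do by hand, redirecting the stream on the client side (connection options omitted):
$ psql --quiet -c 'COPY users TO STDOUT' > users.csv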
Let’s retry the failed command from before, this time with \copy:
db=# \copy users TO 'users.csv'
COPY 4
Seems like the command copied all four rows. Here’s what the resulting users.csv file looks like:
1 Anne 24
2 Bill 37
3 Carrie 39
4 Daniel 19
Nice! But that doesn’t quite look like CSV.
That’s because we did not specify the FORMAT
option, which has three possible
values: text
(default), csv
, and binary
.
Let’s get some CSV this time:
db=# \copy users TO 'users.csv' WITH (FORMAT csv)
COPY 4
1,Anne,24
2,Bill,37
3,Carrie,39
4,Daniel,19
What about the headers? We can get those with the HEADER
option:
db=# \copy users TO 'users.csv' WITH (FORMAT csv, HEADER)
COPY 4
id,name,age
1,Anne,24
2,Bill,37
3,Carrie,39
4,Daniel,19
As initially mentioned, we can not only copy entire tables, but also query results.
How about omitting the id
column, reordering and renaming the other two
columns, filtering, and sorting the data?
db=# \copy (SELECT age AS "Age", name AS "First Name" FROM users WHERE age >= 21 ORDER BY age DESC) TO 'users.csv' WITH (FORMAT csv, HEADER)
COPY 3
Age,First Name
39,Carrie
37,Bill
24,Anne
As you can see, with our custom query we can bring the data into the desired format.
In addition to the HEADER
option, there are several other options that can be
specified:
DELIMITER
- Specify the value delimiter (defaults to
,
) QUOTE
- Specify the quote character to be used when a value contains the delimiter
or when the column is being quoted forcefully (defaults to
"
) ESCAPE
- Specify the escape character to be used when a value contains the quote
character (defaults to the
QUOTE
value) NULL
- Specify how
NULL
values are represented (defaults to empty string) FORCE_QUOTE
- Force quote some or all columns, even if quotes aren’t necessary (defaults to no column)
ENCODING
- Specify the encoding (defaults to the client encoding)
From my own experience these are rarely needed. Consult the COPY
docs
for more details.
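Still, to see how a few of them combine, here’s a hedged example; the semicolon delimiter, the literal NULL marker, and force-quoting the name column are arbitrary choices for demonstration, not recommendations:
db=# \copy users TO 'users.csv' WITH (FORMAT csv, HEADER, DELIMITER ';', NULL 'NULL', FORCE_QUOTE (name))
COPY 4
id;name;age
1;"Anne";24
2;"Bill";37
3;"Carrie";39
4;"Daniel";19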
Extracting CSV Without Entering a psql Shell
You might want to use the exported CSV data in a script or pipe it to another program. Or you might simply want to execute a single query and get the output as a CSV, without having to enter the interactive psql shell.
psql has the --command
option (-c
for short), with which we can execute the
specified command and exit. Alternatively, the command can be passed in through
standard input. This allows us to execute a COPY
command and print the output
to standard output.
$ psql --quiet --command \
'COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)'
name,age
Anne,24
Bill,37
Carrie,39
Daniel,19
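Since the CSV arrives on standard output, we can redirect it to a file or pipe it straight into another program. For a quick aligned look at the data, the standard column utility does the trick:
$ psql --quiet --command \
  'COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)' \
  | column -t -s ','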
This use case was so frequent for me that I needed a tool which would allow me
to specify just the raw SQL query without the surrounding COPY
command.
Introducing psql2csv
psql2csv is a thin wrapper around psql that I wrote, which does exactly that. It puts the focus on the query, hiding the boilerplate. The above becomes:
$ psql2csv 'SELECT name, age FROM users'
name,age
Anne,24
Bill,37
Carrie,39
Daniel,19
It also accepts the query through standard input, allowing us to read a query stored in a file. Storing the extracted CSV is a matter of redirecting the output to a file:
$ psql2csv < query.sql > data.csv
psql2csv will read the query, remove any trailing semicolon, and wrap it in a
COPY TO STDOUT
command.
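In other words, given a query file containing SELECT name, age FROM users; the statement that actually reaches the server looks roughly like this (modulo the exact option handling):
COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)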
With this, psql2csv can streamline some of your manual reporting work. For instance, you might have a collection of SQL queries you use periodically. Throwing together a small script that runs them all through psql2csv is quite simple.
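Here’s a minimal sketch of what such a script might look like; the queries/ and reports/ directory names are assumptions for this example, and connection settings come from the usual psql defaults and PG* environment variables:
#!/bin/sh
# Run every query in queries/ through psql2csv and collect
# the results as CSV files in reports/.
mkdir -p reports
for query in queries/*.sql; do
  name=$(basename "$query" .sql)
  psql2csv < "$query" > "reports/$name.csv"
done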
It also has flags for all of the COPY formatting options explained above, for extra convenience. Further, it allows you to specify the time zone you want the query to be run in (useful when doing counts per day in a specific time zone) as well as the search_path for the case where you’re using schemas for multi-tenancy.
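For illustration, a tenant-scoped daily sign-up count in a specific time zone might look like the following; the flag spellings, the tenant_a schema, and the registered_at timestamptz column are assumptions here, so double-check psql2csv --help before relying on them:
$ psql2csv --timezone=Europe/Zurich --search-path=tenant_a \
  'SELECT registered_at::date AS day, count(*) AS sign_ups FROM users GROUP BY 1 ORDER BY 1'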
psql Variables
psql has a neat feature that I wish I had discovered earlier. Through the
--set
or --variable
(-v
for short) option you can pass variables to the
psql session that can be used in your queries either
- as literals (:VAR),
- as strings (:'VAR'), or
- as identifiers (:"VAR").
Unfortunately, this only works when passing the command through standard input,
not through the --command
or -c
option:
$ psql --quiet -v INT=123 -v STR=foobar -v IDENTIFIER=xyz <<SQL
SELECT :INT, :'STR' AS :"IDENTIFIER"
SQL
?column? | xyz
----------+--------
123 | foobar
(1 row)
Because psql2csv is just a thin wrapper around psql that forwards options to psql, we can make use of variables in psql2csv, too:
$ psql2csv -v INT=123 -v STR=foobar -v IDENTIFIER=xyz <<SQL
SELECT :INT, :'STR' AS :"IDENTIFIER"
SQL
?column?,xyz
123,foobar
This is quite powerful in combination with SQL queries stored in files as we can parameterize them.
Imagine a SQL script that we use once a month to gather some stats on our user base:
SELECT
    count(*) AS total,
    count(*) FILTER (
        WHERE date_trunc('month', registered_at) = (:'MONTH' || '-01')::timestamptz
    ) AS sign_ups
FROM users
WHERE date_trunc('month', registered_at) <= (:'MONTH' || '-01')::timestamptz
We can now extract the CSV for this query every month without having to modify the SQL file at all:
$ psql2csv -v MONTH=2019-04 < query.sql > data.csv
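And if the month should simply be the current one, the value can come straight from date instead of being typed out each time (a shell convenience, not a psql2csv feature):
$ psql2csv -v MONTH="$(date +%Y-%m)" < query.sql > "stats-$(date +%Y-%m).csv"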
PostgreSQL offers a great built-in way to extract CSV. By using standalone commands we can streamline the extraction without having to enter an interactive psql session.
psql2csv offers an even simpler mechanism to do so and I hope I have convinced you to give it a spin.
Currently psql2csv is available as a standalone script or through Homebrew:
$ brew install psql2csv
If you’d like to have psql2csv available through other package managers, I’d be happy to publish it with some help from you.