Exporting CSV from PostgreSQL
Streamlining your reporting with variables and psql2csv
Running a query and exporting the resulting data is a common requirement when
working with databases, be it for visualizing the data, using it as input to
another tool, or simply sharing it with others. PostgreSQL provides the COPY
command, allowing us to do so from psql.
GUIs such as pgAdmin usually have a data export feature built in. In this article I’ll be focusing on psql and the command line in general. After all, I’m a fan of using PostgreSQL on the command line.
Exporting Data in a psql Session
The COPY
command copies data from a table or from a query to a file.
It is also used to move data from a file to a table. As we’re talking about
exporting data, we’ll be focusing on the former use case.
The interface of the COPY TO command is as follows:
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    OIDS [ boolean ]
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    ENCODING 'encoding_name'
Let’s introduce a small table that we’ll use to explore the COPY
command
a bit:
CREATE TABLE users (
    id serial PRIMARY KEY,
    name text NOT NULL,
    age integer NOT NULL
);

INSERT INTO users (name, age) VALUES
    ('Anne', 24),
    ('Bill', 37),
    ('Carrie', 39),
    ('Daniel', 19);
We can now copy this entire table to a file in a psql session:
db=# COPY users TO 'users.csv';
ERROR: relative path not allowed for COPY to file
Looks like that didn’t work… Maybe the docs for COPY can help. The description of the filename argument specifically states:
An input file name can be an absolute or relative path, but an output file name must be an absolute path.
Fair enough, we can specify an absolute path if we have to, although it would obviously be nicer to work with relative paths.
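For example, with an assumed absolute path such as /tmp/users.csv (and assuming the server process is allowed to write there), the command goes through:
db=# COPY users TO '/tmp/users.csv';
COPY 4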
Further, the docs also state:
COPY
with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible by the PostgreSQL user (the user ID the server runs as) and the name must be specified from the viewpoint of the server.
This sounds like a showstopper. When connecting to a remote database, any file on my computer is obviously inaccessible to the PostgreSQL user on the server.
To remedy this, psql provides a meta-command called \copy
that works just like
COPY
, but without the above limitations. As per the psql docs for
\copy
:
Performs a frontend (client) copy. This is an operation that runs an SQL
COPY
command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
To achieve this, \copy
uses COPY TO STDOUT
under the hood (and COPY FROM
STDIN
when importing data).
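In other words, as far as the server is concerned, \copy boils down to something you could also do by hand, redirecting the stream on the client side (connection options omitted):
$ psql --quiet -c 'COPY users TO STDOUT' > users.csv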
Let’s retry the failed command from before, this time with \copy:
db=# \copy users TO 'users.csv'
COPY 4
Seems like the command copied all four rows. Here’s what the resulting users.csv file looks like:
1 Anne 24
2 Bill 37
3 Carrie 39
4 Daniel 19
Nice! But that doesn’t quite look like CSV.
That’s because we did not specify the FORMAT
option, which has three possible
values: text
(default), csv
, and binary
.
Let’s get some CSV this time:
db=# \copy users TO 'users.csv' WITH (FORMAT csv)
COPY 4
1,Anne,24
2,Bill,37
3,Carrie,39
4,Daniel,19
What about the headers? We can get those with the HEADER
option:
db=# \copy users TO 'users.csv' WITH (FORMAT csv, HEADER)
COPY 4
id,name,age
1,Anne,24
2,Bill,37
3,Carrie,39
4,Daniel,19
As initially mentioned, we can not only copy entire tables, but also query results.
How about omitting the id
column, reordering and renaming the other two
columns, filtering, and sorting the data?
db=# \copy (SELECT age AS "Age", name AS "First Name" FROM users WHERE age >= 21 ORDER BY age DESC) TO 'users.csv' WITH (FORMAT csv, HEADER)
COPY 3
Age,First Name
39,Carrie
37,Bill
24,Anne
As you can see, with our custom query we can bring the data into the desired format.
In addition to the HEADER
option, there are several other options that can be
specified:
DELIMITER
- Specify the value delimiter (defaults to
,
) QUOTE
- Specify the quote character to be used when a value contains the delimiter
or when the column is being quoted forcefully (defaults to
"
) ESCAPE
- Specify the escape character to be used when a value contains the quote
character (defaults to the
QUOTE
value) NULL
- Specify how
NULL
values are represented (defaults to empty string) FORCE_QUOTE
- Force quote some or all columns, even if quotes aren’t necessary (defaults to no column)
ENCODING
- Specify the encoding (defaults to the client encoding)
From my own experience these are rarely needed. Consult the COPY
docs
for more details.
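Still, to see how a few of them combine, here’s a hedged example; the semicolon delimiter, the literal NULL marker, and force-quoting the name column are arbitrary choices for demonstration, not recommendations:
db=# \copy users TO 'users.csv' WITH (FORMAT csv, HEADER, DELIMITER ';', NULL 'NULL', FORCE_QUOTE (name))
COPY 4
id;name;age
1;"Anne";24
2;"Bill";37
3;"Carrie";39
4;"Daniel";19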
Extracting CSV Without Entering a psql Shell
You might want to use the exported CSV data in a script or pipe it to another program. Or you might simply want to execute a single query and get the output as a CSV, without having to enter the interactive psql shell.
psql has the --command
option (-c
for short), with which we can execute the
specified command and exit. Alternatively, the command can be passed in through
standard input. This allows us to execute a COPY
command and print the output
to standard output.
$ psql --quiet --command \
'COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)'
name,age
Anne,24
Bill,37
Carrie,39
Daniel,19
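Since the CSV arrives on standard output, we can redirect it to a file or pipe it straight into another program. For a quick aligned look at the data, the standard column utility does the trick:
$ psql --quiet --command \
  'COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)' \
  | column -t -s ','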
This use case was so frequent for me that I needed a tool which would allow me
to specify just the raw SQL query without the surrounding COPY
command.
Introducing psql2csv
psql2csv is a thin wrapper around psql that I wrote, which does exactly that. It puts the focus on the query, hiding the boilerplate. The above becomes:
$ psql2csv 'SELECT name, age FROM users'
name,age
Anne,24
Bill,37
Carrie,39
Daniel,19
It also accepts the query through standard input, allowing us to read a query stored in a file. Storing the extracted CSV is a matter of redirecting the output to a file:
$ psql2csv < query.sql > data.csv
psql2csv will read the query, remove any trailing semicolon, and wrap it in a
COPY TO STDOUT
command.
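In other words, given a query file containing SELECT name, age FROM users; the statement that actually reaches the server looks roughly like this (modulo the exact option handling):
COPY (SELECT name, age FROM users) TO STDOUT WITH (FORMAT csv, HEADER)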
With this, psql2csv can streamline some of your manual reporting work. For instance, you might have a collection of SQL queries you use periodically. Throwing together a small script that runs them all through psql2csv is quite simple.
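Here’s a minimal sketch of what such a script might look like; the queries/ and reports/ directory names are assumptions for this example, and connection settings come from the usual psql defaults and PG* environment variables:
#!/bin/sh
# Run every query in queries/ through psql2csv and collect
# the results as CSV files in reports/.
mkdir -p reports
for query in queries/*.sql; do
  name=$(basename "$query" .sql)
  psql2csv < "$query" > "reports/$name.csv"
done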
It also has flags for all of the COPY formatting options explained above, for extra convenience. Further, it allows you to specify the time zone you want the query to be run in (useful when doing counts per day in a specific time zone) as well as the search_path for the case where you’re using schemas for multi-tenancy.
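For illustration, a tenant-scoped daily sign-up count in a specific time zone might look like the following; the flag spellings, the tenant_a schema, and the registered_at timestamptz column are assumptions here, so double-check psql2csv --help before relying on them:
$ psql2csv --timezone=Europe/Zurich --search-path=tenant_a \
  'SELECT registered_at::date AS day, count(*) AS sign_ups FROM users GROUP BY 1 ORDER BY 1'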
psql Variables
psql has a neat feature that I wish I had discovered earlier. Through the
--set
or --variable
(-v
for short) option you can pass variables to the
psql session that can be used in your queries either
- as literals (:VAR),
- as strings (:'VAR'), or
- as identifiers (:"VAR").
Unfortunately, this only works when passing the command through standard input,
not through the --command
or -c
option:
$ psql --quiet -v INT=123 -v STR=foobar -v IDENTIFIER=xyz <<SQL
SELECT :INT, :'STR' AS :"IDENTIFIER"
SQL
?column? | xyz
----------+--------
123 | foobar
(1 row)
Because psql2csv is just a thin wrapper around psql that forwards options to psql, we can make use of variables in psql2csv, too:
$ psql2csv -v INT=123 -v STR=foobar -v IDENTIFIER=xyz <<SQL
SELECT :INT, :'STR' AS :"IDENTIFIER"
SQL
?column?,xyz
123,foobar
This is quite powerful in combination with SQL queries stored in files as we can parameterize them.
Imagine a SQL script that we use once a month to gather some stats on our user base:
SELECT
    count(*) AS total,
    count(*) FILTER (
        WHERE date_trunc('month', registered_at) = (:'MONTH' || '-01')::timestamptz
    ) AS sign_ups
FROM users
WHERE date_trunc('month', registered_at) <= (:'MONTH' || '-01')::timestamptz
We can now extract the CSV for this query every month without having to modify the SQL file at all:
$ psql2csv -v MONTH=2019-04 < query.sql > data.csv
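And if the month should simply be the current one, the value can come straight from date instead of being typed out each time (a shell convenience, not a psql2csv feature):
$ psql2csv -v MONTH="$(date +%Y-%m)" < query.sql > "stats-$(date +%Y-%m).csv"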
PostgreSQL offers a great built-in way to extract CSV. By using standalone commands we can streamline the extraction without having to enter an interactive psql session.
psql2csv offers an even simpler mechanism to do so and I hope I have convinced you to give it a spin.
Currently psql2csv is available as a standalone script or through Homebrew:
$ brew install psql2csv
If you’d like to have psql2csv available through other package managers, I’d be happy to publish it with some help from you.