Before we move on to manipulating array values stored in the database, it is useful to first learn the basics.
Representing a list
SELECT '{a,b,c}'::text[];
text
----------
{a,b,c}
(1 row)
We have to specify the type. Else postgres would not know the type since it looks like a string.
Or we could also use the ARRAY[] style
# SELECT ARRAY['a','b','c'];
array
-------
{a,b,c}
SELECTing FROM an array
So why doesn’t the following work?
# SELECT * FROM ARRAY['a','b','c']
ERROR: 42601: syntax error at or near "ARRAY"
LINE 1: SELECT * FROM ARRAY['a','b','c'];
^
LOCATION: scanner_yyerror, scan.l:1134
SELECT FROM expects that the data source is:
- a set of rows
- OR points to a set of word (table)
ARRAY[] returns an atomic value. Which is why it works with just SELECT but fails when FROM is used.
Convert arrays into rows using UNNEST
# SELECT UNNEST(ARRAY['a','b']);
unnest
--------
a
b
(2 rows)
Convert rows into arrays using ARRAY()
Watch out. We are not using ARRAY[]. We are using array().
SELECT array(
SELECT UNNEST(ARRAY['a','b'])
);
array
-------
{a,b}
(1 row)
To summarize:
UNNESTconverts array into a series of rows.array()converts a bunch of rows into an array.ARRAY[]used to represent/define an array.
Concatenate arrays using array_cat()
# SELECT array_cat('{a,b}'::text[], '{b,c}'::text[]);
array_cat
-----------
{a,b,b,c}
(1 row)
We coerced the argument into text arrays to indicate to array_cat what data types it was dealing with. If we used the ARRAY[] initializer, this would not be required. Example below.
SELECT array_cat(ARRAY['a','b'], ARRAY['b', 'c']);
Notice that the final array contains b twice. We would love to remove duplicates. There are two ways to do it and we'll explore both below.
Removing duplicates from an array using DISTINCT
This method works with integers, text, float, etc
DISTINCT helps remove duplicates from all data types. But it requires rows as input. Not arrays.
- So first convert an array to rows using
UNNEST - Pass it to DISTINCT
- And then if you still require an array as output, use
array()
SELECT DISTINCT UNNEST(
'{apple, banana, cat, apple}'::text[]
);
unnest
--------
banana
cat
apple
(3 rows)
SELECT DISTINCT UNNEST(
'{1.245, 1.3, 1.5, 1.2}'::float[]
) ORDER BY 1;
And we have columns. But we need an array back as output. So convert using array()
SELECT array(
SELECT DISTINCT UNNEST(
'{apple, banana, cat, apple}'::text[]
)
);
array
--------------------
{banana,cat,apple}
(1 row)
What if we want it sorted? Use ORDER BY 1
SELECT array(
SELECT DISTINCT UNNEST(
'{apple, banana, cat, apple}'::text[]
) ORDER BY 1
);
ORDER BY 1would sort by the first column (We could starting with1here). In this case we are working with on-the-fly arrays, without giving them column names. So we use column number references.
Want to try this with a float array? Let's do it.
SELECT array(
SELECT DISTINCT UNNEST(
'{1.2, 1.245, 2.3, 1.245, 1.246}'::text[]
)
);
array
-----------------------
{1.2,1.246,2.3,1.245}
(1 row)
Removing duplicates from integer arrays using sort and uniq
Caveat: This method only works for integer arrays (yeah so no floats too).
sort and uniq are functions available in the intarray extension. Enable the extension in your database by running the following statement.
CREATE EXTENSION IF NOT EXISTS intarray;
Try using the uniq function.
SELECT uniq(
'{1,2,1,3}'::int[]
);
uniq
-----------
{1,2,1,3}
(1 row)
Oops. Does not work. uniq only works on a sorted list. So use the sort function to sort it first.
SELECT uniq(
sort(
'{1,2,1,3}'::int[]
)
);
uniq
---------
{1,2,3}
(1 row)
I haven't yet benchmarked in detail which method works the fastest for sorting integers. I used both methods as a part of another benchmarking to update 100k rows (coming soon). For now, all I can say is, both aren't slow. I'll try a more detailed benchmarking another day.
What else?
These are essentials that I find useful to know. There are a lot more array functions that Postgres supports. Go check it out.
I'll be sharing more from my notes on PostgreSQL. Subscribe to get updates.
Top comments (2)
thanks for sharing. Good stuff
How would I go about creating a unique set out a query like this
the expected result would be {thoughts,energy,mind, postgres,"data base"}
lovely article