loading...

Database Views vs Table Functions

chochos profile image Enrique Zamudio ・5 min read

Views are a very useful feature of relational databases. You can define a complex query and store it as a view, and then query that view as if it were a table.

I have a complex query involving a join across 4 tables. That's a lot of columns. So I created a view to simplify things, and it works beautifully. Except, one day we had to change the type of a column, because we had screwed up, and it required us to drop the view. The situation is similar to this:

--A simple table as an example
CREATE TABLE example(
  table_id BIGSERIAL PRIMARY KEY,
  created  TIMESTAMP,
  other_stuff VARCHAR(20)
);

--An example of a view, as a simple query to the table
CREATE VIEW example_view AS SELECT * FROM example;

--If we try to modify the table, we get an error
ALTER TABLE example ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE;

--We need to drop the view so we can modify the table
DROP VIEW example_view;

--Then we can alter it
ALTER TABLE example ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE;

--Finally, re-create the view
CREATE VIEW example_view AS SELECT * FROM example;

In my case, the table we need to modify had over 8 million rows, so the ALTER TABLE command took several seconds to run, during which there was no view, so all functionality that depended on that view had to be taken offline.

After altering the table, creating the view takes no time at all, but the downtime during the table modification was an inconvenience I wanted to eliminate. So I started looking for alternatives.

Table Functions

Reading the PostgreSQL documentation I came across table functions. These are functions that return a table-like structure and you can query them as you would query a table. Following the previous example, we can create a table function instead of the view like so:

CREATE FUNCTION example_function()
  --The return type must include column names and types, like a table definition
  RETURNS TABLE(table_id BIGINT, created TIMESTAMP, other_stuff VARCHAR) AS
$$
BEGIN
  RETURN QUERY SELECT * FROM example;
END
$$
language plpgsql;

And then we can query it like this:

SELECT * FROM example_function();

If we later decide to alter the table, we can do so without getting an error. But, next time we query the table function we will get an error, because the table structure no longer matches the return type of the function. So we still have a problem in which queries stop working after altering a table.

A bit of metaprogramming

One approach to solve this is with a bit of metaprogramming. Instead of directly creating the table function, we can create a function to create the table function, dynamically defining the return type from the metadata of the table. This way, when we alter the table, we can simply run the generator function again and the table function will be updated with the new table metadata. It's something like this:

CREATE FUNCTION create_table_function() RETURNS INT AS
$META$
DECLARE
  columns TEXT;
  cmd     TEXT;
BEGIN
  --Here we create a string with all the column names and their types
  --separated by commas, to use as the return type of the function
  SELECT (SELECT STRING_AGG(column_name || ' ' || data_type, ', ')
    from information_schema.columns where table_name = 'example'
    group by table_name) INTO columns;

  --Here we create the whole function definition
  SELECT 'CREATE FUNCTION example_function() RETURNS TABLE(' ||
    columns || ') AS $$ BEGIN ' ||
    'RETURN QUERY SELECT * FROM example; END ' ||
    '$$ language plpgsql;'
    INTO cmd;
  --First we need to drop the function, because we can't replace
  --an existing one due to the different return type
  DROP FUNCTION IF EXISTS example_function();
  EXECUTE cmd;
  RETURN 1;
END
$META$
language plpgsql;

The function above creates the command to create example_function, constructing the return type from the example table metadata. When we call this generator function, it will create the table function we need. So after altering the table, we only need to call this again and the table function will be updated instantly. Well, almost instantly.

There is still a minor problem: what if someone alters a column from a table used by our table function, and forgets to regenerate the table function itself? A lot of errors will start to occur. One workaround for this is to make the query performed by the table function cast every column to the type defined in the function's return type. This requires some tweaking to the table function, again using the table's metadata:

CREATE FUNCTION create_table_function() RETURNS INT AS
$META$
DECLARE
  columns TEXT;
  query   TEXT;
  cmd     TEXT;
BEGIN
  --This remains the same
  SELECT (SELECT STRING_AGG(column_name || ' ' || data_type, ', ')
    from information_schema.columns where table_name = 'example'
    group by table_name) INTO columns;

  --Now we create a string with each column, casting it to the type
  --it currently has, so it always matches the return type of the function
  SELECT (SELECT STRING_AGG('CAST(example.' || column_name || ' AS ' || data_type, '), ')
    from information_schema.columns where table_name = 'foo'
    group by table_name) INTO query;

  --we create the function with the new query
  SELECT 'CREATE FUNCTION example_function() RETURNS TABLE(' ||
    columns || ') AS $$ BEGIN ' ||
    --we need a closing parens because of the way STRING_AGG works
    'RETURN QUERY SELECT ' || query || ') FROM example;' ||
    'END $$ language plpgsql;'
    INTO cmd;
  DROP FUNCTION IF EXISTS example_function();
  EXECUTE cmd;
  RETURN 1;
END
$META$
language plpgsql;

Because every column is cast to the data type that the table had when the table function was created, altering one of these columns does not affect the function's return type, so now the function is not affected at all by the modification (except if you drop a column from the table, but this is also a problem with views). The table function can be updated later.

Everything's a trade-off

Of course there are disadvantages to this approach. The most evident one is the clunkiness of creating the table function dynamically, compared to the simplicity of creating a view.

A big disadvantage however, is that you can't index table functions; the only way I've found around this limitation is to have one function per query, or add parameters to your function so it can perform different queries depending on the arguments passed to it.

In any case, altering the types of table columns is not something you should be doing often. The most common case is extending VARCHAR columns, which would not affect the table function (but it can affect a view, so this is yet another advantage of table functions vs views).

I'm not trying to advocate for table functions instead of views for every case, but I think this is a very useful trick to have at hand if you find yourself in a situation where you need to alter a table that's involved in a view and need to do something to minimize downtime.

Posted on by:

chochos profile

Enrique Zamudio

@chochos

Author of j8583 and jAlarms. Java Champion, worked on the Ceylon to Javascript compiler 2012-2017, moved to cryptocurrencies since then.

Discussion

markdown guide
 
 

Something I didn't go into in the article is that the view I originally wrote was also created dynamically, using the metadata for the tables involved in the join, mostly because I didn't want to have to write the big query because it's not just a matter of writing it once but because it's something that has to be maintained, whereas we only need to call a function to regenerate the view (or the table function) after altering something to regenerate the view (or table function).

I couldn't find a way to do this with CTE's, and the original query needs to return all those columns from the joins anyway.

 

Got it, thanks for the explanation! :)