loading...
RITER.CO

SQL tests in your smart framework

aratak profile image Alexey Osipenko Updated on ・6 min read

Many complain that sql-code, in one form or another, becomes unsupported very quickly and, in fact, it is not even worth starting to write any logic in sql. "Use sql only as tables, and implement all the logic inside the code", they say not entirely without some slyness at the same time.

Well, tell me, some specific data types are nevertheless permissible in sql, aren't they? For example, is it considered shameful to use enum instead of string, or it is still acceptably? It seems to be permitted even in the eyes of ardent defenders of code purity. And what about using of a floating point numbers of certain accuracy (precision and scale) - is it still eligible? Or reasonable default values like created_at timestamp without time zone DEFAULT CURRENT_TIMESTAMP, we are allowed to use them so far, right?

If answers for all questions above are positive, then most likely we can also add data validation to be sure that the format of, say, a string is met. We mean, for example, checking the format of the email address, absence of dangerous characters in fields-identifiers of the address bar (slug or uid), or even validation of uniqueness in a specific subset of records (scope).

And then it's a short road to stored procedures. Let's say, you have a 'name' field, and you want to make sure that the value of this field is unique without matching case. It's a typical task, right? Traditional MVP-frameworks say that verification should be done in advance, like SELECT 1 FROM table_name WHERE lower(name) = $1, and if a record is present, then saving process mustn't even be started. A database, as a concept, tells us that it should be checked at the time of recording. In some cases constraints on tables come to our rescue, in others we can apply rules and triggers. In general, it would be much more convenient to do that with a database, except, support of the written code becomes more complicated.

In our project riter.co we came up with some simple rules, adhering to which you can worry about sql-code no more than about some other piece of your project, for example javascript-code or css-rules.

  • It is necessary to clearly distinguish cases when you create a rule, a trigger, and a constraint. Rules should be better defined on your own in a separate project, but anyway you will certainly follow some priority of creation. Like, if it is possible to create a constraint, then create a constraint. If not, you should create a rule. As a last resort, if even a rule is not suitable in your case, that use a trigger.

  • Standards for naming triggers and rules. If you want to create some sort of a trigger on the table, the name must be uniquely defined. I propose to begin thinking about this naming rule with "#{table_name}_#{affected_colum_name}_#{verb}_on_#{action}". You will get something like "companies_subdomain_lowercase_on_insert". In addition, then it will be much easier to find existing rules with the mask:

  SELECT routine_name AS name
  FROM information_schema.routines
  WHERE
    routine_name LIKE 'companies_%_on_insert' AND
    routine_type ='FUNCTION' AND
    specific_schema='public';
  • KISS. All procedures in your database must be simple. Complex things result in complex bugs, and bugs at the level of the database are quite scary. Let's keep them simple. It is better to leave complicated logic to high-level programming languages with their frameworks. For example, we can convert everything to a lowercase or generate a unique identifier inside the database, but when it is about checking the intersection of time intervals, it is better, probably, to do that in the code.

  • Tests. Actually, the entire post is written for the sake of this point, and all that is said above is more like a preamble.

Tests should be presented in three ways, as always: integration, unit and acceptance.

Unit-tests have much less sense and in the overwhelming majority they must be written in a high-level code. It's a good practice to write unit tests for some single procedures independent of the current state of the database. For example, we have such a procedure that looks very nice in the context of unit tests:

CREATE FUNCTION public.generate_slug(slug character varying, table_name character varying) RETURNS character varying
    LANGUAGE plpgsql
    AS $$
      DECLARE
        key varchar;
        query text;
        found varchar;
      BEGIN
        query := 'SELECT slug FROM ' || quote_ident(table_name) || ' WHERE slug = ';
        key := slug;

        EXECUTE query || quote_literal(slug) INTO found;

          WHILE found IS NOT NULL LOOP
            key := encode(gen_random_bytes(5), 'base64');
            key := replace(key, '/', '-');
            key := replace(key, '+', '-');
            key := replace(key, '=', '');
            key := slug || '-' || key;

            EXECUTE query || quote_literal(key) INTO found;
          END LOOP;

        RETURN key;
      END;
      $$;

By and large, it's impossible to implement a full mock and stub, so real unit-tests can not be written either.

Integration tests should be written in the language of the framework chosen for the project, and they must check that all your sql-triggers' actions are reflected in the objects and structures of your program. For example:

subject { create(:company, name: 'CamelCase') }
its(:name) { is_expected.to eq('camelcase') }

Firstly, it will check the properly working RETURNING name. Secondly, it will test work of your framework and its willingness to cooperate with the database. For example, if your framework is ready to monitor the correctness of data in some fields, then such tests can reveal a conflict and it can be resolved in time.

What about acceptance-tests, that's where things get interesting. After all, it really does not matter what exactly makes all letters from the name field lowercase, whether triggers, or rules. It is much more important to be convinced of the results.

First, you need to agree on the structure. We have considered several options and reached the most reasonable one - for each individual test we create a separate function, then we call it and check the return value. Of course, we want these tests to run along with all the other tests, so we use a small wrapper with our frameworks.

Here's the test itself:

it { expect(ApplicationRecord.connection.execute("SELECT #{f.function_name}()").to_a.first[f.function_name]).to eq('ok') }

It's terribly simple, isn't it? Certainly, before running the test, you need to create this function. We do this right before the test:

before do
  ApplicationRecord.connection.execute("CREATE OR REPLACE FUNCTION #{f.function_name}() RETURNS varchar LANGUAGE plpgsql AS $$ #{f.content} $$;")
end

And after the test we immediately delete the function:

after do
  ApplicationRecord.connection.execute("DROP FUNCTION IF EXISTS #{f.function_name}()")
rescue ActiveRecord::StatementInvalid
  nil
end

Now about content of this function. The most convenient solution is to keep the content in separate files with the .sql extension and read it before creating the sql-function:

    Dir[File.expand_path('**/*_spec.sql', __dir__)].map do |path|
      content = IO.read(path).strip
      basename = Pathname.new(path)
                         .relative_path_from(Rails.root.join('spec', 'sql'))
                         .to_s
                         .gsub('/', '__')
                         .gsub(/_spec.sql\z/, '')
      OpenStruct.new(path: path.to_s, content: content, empty?: content.empty?, basename: basename, function_name: "rspec_#{basename}")
    end

A name of the function, that is function_name, can be generated from the file name with some authentic prefix, so that you could distinguish test-function from all other functions. Then you will be able to make sure that there is nothing superfluous left in the database once more before and after running all the tests:

DO $$
DECLARE
  routine record;
BEGIN
  FOR routine IN
    SELECT routine_name AS name
    FROM information_schema.routines
    WHERE
      routine_name LIKE 'rspec_%' AND
      routine_type ='FUNCTION' AND
      specific_schema='public'
  LOOP
    EXECUTE 'DROP FUNCTION ' || quote_ident(routine.name);
  END LOOP;
END $$ LANGUAGE plpgsql;

And the test itself is a matter of technique. Here is, for example, a separately saved company_name_spec.sql, which will check the uniqueness of the name field in the companies table.

DECLARE
  company_name companies.name%TYPE;
BEGIN

  INSERT INTO companies (name) VALUES ('qwe1') RETURNING name INTO company_name;
  ASSERT company_name = 'qwe1', 'companies.name is allowed to be a string';

  BEGIN
    INSERT INTO companies (name) VALUES ('qwe1') RETURNING name INTO company_name;
    ASSERT FALSE, 'companies.name should raise unique violation';
  EXCEPTION
    WHEN unique_violation THEN
      ASSERT TRUE;
  END;

  RETURN 'ok';
END;

Instead of conclusions, I would like to say that an intelligent database is the very place that must be thoroughly covered with tests. And leave your excuses in the past, now you know how to write them.

Discussion

pic
Editor guide