DEV Community

matthieucham for Stack Labs


Brace yourself: Using BigQuery as an operational backend

BigQuery as a backend? WTF?

BigQuery has become a major player in the field of data analytics solutions. It provides an ever-growing list of powerful features in an easy, performant and cost-effective way. However, BigQuery is firmly OLAP, while the sensible option for an application backend is OLTP.
Therefore, using BigQuery as a backend may seem weird, and... it is indeed! Just to scratch the surface of the many problems that would arise:

  • BigQuery is not optimized for writing, but for performing complex queries.
  • Single-row inserts are discouraged.
  • BigQuery does not enforce primary keys, foreign keys or any other constraints.
  • BigQuery does not perform well with normalized schemas; on the contrary, it encourages denormalization.
  • ...

So, if you think about BigQuery to be the storage backend of a full-blown application, you should definitely think again.

There are some situations, though, where this design has its appeal...

Think of a huge analytics data platform revolving around BigQuery. Now imagine a corner case of the platform where some configuration has to be stored in a relational fashion, and this configuration data will have an impact on how the data is accessed. To update the configuration, an API is exposed. There are now two options for storing the configuration values:

  • The normal way: set up a transactional database like PostgreSQL or MySQL, maybe through Cloud SQL since we're dealing with the Google Cloud Platform. Use it as the application's storage backend. Set up redundancy and a backup strategy. Sort out IAM permissions. Query the configuration data from BigQuery via federated queries. All of this will, of course, cost you some extra dollars.

  • The hacky way: store the configuration in a BigQuery dataset somewhere, and benefit from the near-free hosting and redundancy provided by this serverless database. Use the BigQuery client API to integrate with the application. Query it like any other dataset.


Don't try this at home! This stunt is being performed by professionals.

Enforcing a uniqueness constraint on a BigQuery table

All warnings having been given, let's proceed with the implementation. It's not really difficult, because Google provides BigQuery client libraries for many languages. It is also possible to use the REST API.

The problem is, even if you agreed on cutting corners by not provisioning a proper OLTP database, you may still need to enforce some constraints.

Imagine that your configuration table consists of the following schema:

| Id (STRING) | Attribute (STRING) | Value (STRING) |

And you want to enforce that Id values are unique. In an OLTP database this would be a simple UNIQUE(Id) constraint, but no such statement exists in BigQuery.

Luckily, a new BigQuery feature comes to the rescue: transactions.

BigQuery Transactions

The multi-statement transactions feature is a Pre-GA offering at the time of writing. It enables wrapping standard SQL scripts into atomic transactions.

So, to compensate for the absence of a traditional UNIQUE constraint, we can implement the following sequence when saving or modifying an entry in the table:

  1. Open a transaction
  2. Check whether the Id to save already exists; raise an error if found
  3. Insert the new entry
  4. Commit the transaction

Here is an implementation of this script:

-- 1
BEGIN TRANSACTION;
-- 2
SELECT * FROM (
    SELECT COUNT(1) AS conflict
    FROM `configds.configtable`
    WHERE Id=@input_id
)
WHERE IF(conflict=0, TRUE, ERROR("Id already exists"));
-- 3
INSERT INTO `configds.configtable`
VALUES(@input_id,@input_attribute,@input_value);
-- 4
COMMIT TRANSACTION;

At step 2, ERROR() automatically rolls back the transaction, so that step 3 never occurs. Named parameters are used here to protect against SQL injection.

Call this script from the application backend whenever the configuration is saved, and the table is protected against concurrent inserts from the application's clients.

Limitations

As mentioned earlier, this way of implementing constraints should not be generalized. Here are the most prominent limitations:

  • Transactions are only Pre-GA at the moment.
  • Uniqueness is enforced at the application level, not at the database level. Nothing prevents another BigQuery client, such as the BigQuery console itself, from inserting rows regardless of the uniqueness of Id. Only the application is safe.
  • Performance is poor: the script takes several seconds to run.
  • There is no ORM support; you have to write plain SQL queries and be careful about SQL injection.

Conclusion

The new BigQuery multi-statement transactions feature makes BigQuery somewhat workable as an application backend, which can come in handy if used with great caution. Still, carefully consider the trade-offs versus a traditional OLTP database, and be prepared to defend your choice if you follow this path!

Thanks for reading! I'm Matthieu, data engineer at Stack Labs.
If you want to discover the Stack Labs Data Platform or join an enthusiastic Data Engineering team, please contact us.


Cover photo by Domi Nemeth on Unsplash

Top comments (2)

Maciej Sawicki

I understand that the example showing how cumbersome the Cloud SQL approach is was just an excuse to present a nice BigQuery hack, but it's worth mentioning that in such a case Firestore may be a better option. For example:

db.collection(u'configtable').add({u'input_attribute': input_value})

matthieucham

Hi Maciej, you're right, Firestore is definitely an option to consider. Here, however, I needed to store the attributes in a place where I could query them from BigQuery and join with other BigQuery tables. So Firestore was of no help.
Thanks for your comment!