HAP

Posted on Aug 19, 2021

Table Partitioning with django

#django #postgres

"This is my django partitioning solution. There are many like it, but this one is mine."

The impatient can jump to the codey bits.

Background

The axiom can be applied to anything, but "django is great... until it isn't". There is no explicit or implicit support for partitioning in django, but it's not impossible. Join me for my tale of hacking a solution to this problem.

Are you sitting comfortably? Then we'll begin.

I had already made some code to "convert" a table in postgres to a partitioned table. It, too, worked great... until it didn't. I had tried some other modules, but they all had their problems and simply did not work with our app or how we are deploying. Despite our convert code, we still had the issue of being unable to initially create a partitioned table. So I took a deep dive into the django code and eventually found the pieces I needed in the BaseDatabaseSchemaEditor class.

So, I tried to subclass this and figure out how to tell the migration app to use it. But with all of my searching (and a definite time window for completion) it looked like I would have to write my own migration app. This was distasteful because we were already using a tenant module's app for schema migrations and I really did not want to write yet another django app just to change one SQL statement and emit a few deferred ones. So I focused on the methods in BaseDatabaseSchemaEditor to see exactly what happened during create and delete model (table).

In the BaseDatabaseSchemaEditor class, there were two methods I focused on: table_sql for creating a table, and delete_model for dropping a table. Once I had read over these methods and understood what they were doing, I formulated my hack. This required some method overrides, an enable function, a disable function, a list of known partitioned model names, a partition information metaclass, and two support functions.

Python Code

Support Functions

I wanted a function that would be able to get me any model I wanted, globally. It needed to be able to return a model based on the model_name, app_label.model_name or table_name. This function's data would need to be cached and that cache loaded at runtime to ensure that all models were compiled and loaded into django's app/model cache.

_load_db_models

import django.apps
...
DB_MODELS = {}


def _load_db_models():
    """Initialize the global dict that will hold the table_name/model_name -> Model map"""
    qualified_only = set()
    for app, _models in django.apps.apps.all_models.items():
        for lmodel_name, model in _models.items():
            qualified_model_name = f"{app}.{lmodel_name}"
            if lmodel_name in DB_MODELS:
                qualified_only.add(lmodel_name)
                del DB_MODELS[lmodel_name]

            DB_MODELS[qualified_model_name] = model
            DB_MODELS[model._meta.db_table] = model
            if lmodel_name not in qualified_only:
                DB_MODELS[lmodel_name] = model

This uses a module-level cache variable named DB_MODELS defined as an empty dict. There is also a module-level lock defined as:

import threading
...
DB_MODELS_LOCK = threading.Lock()

get_model

def get_model(model_or_table_name):
    """Get a model class from the model name or table name"""
    with DB_MODELS_LOCK:
        if not DB_MODELS:
            _load_db_models()
    return DB_MODELS[model_or_table_name.lower()]

Thus if I had an app named eek and an ORM model defined in this app as

from django.db import models
...
class MySuperCoolModel(models.Model):
    class Meta:
        db_table = "my_super_cool_model"
    id = models.UUIDField(primary_key=True)
    start_date = models.DateField()
    ...

Then each of these calls would return the MySuperCoolModel class:

get_model("mysupercoolmodel")
get_model("MySuperCoolModel")
get_model("eek.MySuperCoolModel")
get_model("my_super_cool_model")
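
The name-resolution rule in _load_db_models can be tried out without django. What follows is only a minimal sketch: it assumes a plain dict shaped like django.apps.apps.all_models, and the names build_model_map, fake_model and the sample apps are hypothetical stand-ins, not part of my module.

```python
from types import SimpleNamespace


def fake_model(table_name):
    """Hypothetical stand-in for a model class: only _meta.db_table is needed."""
    return SimpleNamespace(_meta=SimpleNamespace(db_table=table_name))


def build_model_map(all_models):
    """Map qualified name, table name and (if unambiguous) bare name to a model."""
    db_models = {}
    qualified_only = set()
    for app, models in all_models.items():
        for lmodel_name, model in models.items():
            if lmodel_name in db_models:
                # A second app uses the same bare name: drop the bare key so
                # that only the app-qualified names remain usable.
                qualified_only.add(lmodel_name)
                del db_models[lmodel_name]
            db_models[f"{app}.{lmodel_name}"] = model
            db_models[model._meta.db_table] = model
            if lmodel_name not in qualified_only:
                db_models[lmodel_name] = model
    return db_models


cool = fake_model("my_super_cool_model")
eek_thing = fake_model("eek_thing")
ook_thing = fake_model("ook_thing")
model_map = build_model_map(
    {
        "eek": {"mysupercoolmodel": cool, "thing": eek_thing},
        "ook": {"thing": ook_thing},
    }
)
```

In this sketch "mysupercoolmodel", "eek.mysupercoolmodel" and "my_super_cool_model" all resolve to the same object, while the bare name "thing" is dropped because two apps define it. Only "eek.thing", "ook.thing" and the two table names still resolve.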

PartitionedModels

This is simply a list of model names that should be partitioned when created. I called my list PARTITIONED_MODEL_NAMES.

Metaclass

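This is an ORM model attribute that is itself a class containing the information needed to make a partitioned table. Strictly speaking it is a nested class, like django's own Meta, rather than a metaclass in the Python sense. It takes this form: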

class PartitionInfo:
    partition_type = <str: a partition type such as "RANGE" or "LIST" or "HASH">
    partition_cols = <list[str]: a list of ORM model columns (each name needs to match the column name as it appears in the database)>

As applied to MySuperCoolModel:

PARTITIONED_MODEL_NAMES = [
    ...
    "MySuperCoolModel",
]

class MySuperCoolModel(models.Model):
    class PartitionInfo:
        partition_type = "RANGE"
        partition_cols = ["start_date"]
    class Meta:
        db_table = "my_super_cool_model"
    id = models.UUIDField(primary_key=True)
    start_date = models.DateField()
    ...
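
Since PartitionInfo is just a convention, a typo in it will only surface when the migration runs. A small check like the one below could catch that earlier. This is a sketch under my own assumptions: validate_partition_info is a hypothetical helper that is not part of the code in this post, and the plain classes stand in for ORM models.

```python
VALID_PARTITION_TYPES = {"RANGE", "LIST", "HASH"}


def validate_partition_info(model_cls):
    """Return (partition_type, partition_cols) or raise ValueError."""
    info = getattr(model_cls, "PartitionInfo", None)
    if info is None:
        raise ValueError(f"{model_cls.__name__} has no PartitionInfo class")

    partition_type = str(getattr(info, "partition_type", "")).upper()
    if partition_type not in VALID_PARTITION_TYPES:
        raise ValueError(f"Unsupported partition type: {partition_type!r}")

    partition_cols = list(getattr(info, "partition_cols", []))
    if not partition_cols or not all(isinstance(c, str) and c for c in partition_cols):
        raise ValueError("partition_cols must be a non-empty list of column names")

    return partition_type, partition_cols


class GoodModel:  # stand-in for an ORM model
    class PartitionInfo:
        partition_type = "range"
        partition_cols = ["start_date"]


class BadModel:  # stand-in with an unsupported partition type
    class PartitionInfo:
        partition_type = "BUCKET"
        partition_cols = ["start_date"]


checked = validate_partition_info(GoodModel)
```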

Override Methods

table_sql

I made a new function called p_table_sql (p_ for partitioned). This function will call the original function which I call o_table_sql (o_ for original) to get the initial SQL created as normal. Then I check to see if there is a name match from the model class passed in against the PARTITIONED_MODEL_NAMES list. If there is a match, I use get_model to retrieve the full ORM model class.

If I have a name match and the ORM model has the PartitionInfo class, then I proceed with the new functionality. Otherwise, I return the original SQL.

The new functionality consists of adding the PARTITION BY <type> (<cols>) clause to the create table SQL. The PRIMARY KEY qualifier is also removed from the SQL, and the primary key is instead added to the deferred_sql list as an ALTER TABLE statement. This statement is inserted into the deferred_sql list before any FOREIGN KEY statements. If you need tracking information or other post-create SQL to be run for the new table, append that SQL to the deferred_sql list.

The primary key is handled this way because a partitioned table requires the partition column(s) to be part of the primary key, and django does not support multiple column primary keys well, if at all. For partitioned models, it is best that any column that you would consider a primary key be created explicitly and not be an automatically incrementing field. The UUID type is a good candidate for this.
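
To make the string handling concrete, here is a django-free sketch of the same transformation. The two templates are the ones I set on the schema editor, but partition_create_sql and quote_cols are hypothetical names, and the CREATE TABLE string is hand-written for illustration rather than captured from django.

```python
PARTITION_TMPL = " PARTITION BY %(partition_type)s (%(partition_cols)s) "
PK_TMPL = (
    "ALTER TABLE %(table_name)s ADD CONSTRAINT %(constraint_name)s "
    "PRIMARY KEY (%(constraint_cols)s)"
)


def quote_cols(cols):
    """Double-quote each column name and join them with commas."""
    return ", ".join(f'"{c}"' for c in cols)


def partition_create_sql(create_sql, table, partition_type, partition_cols, pk_col):
    """Return (partitioned create SQL, deferred primary key SQL)."""
    # Strip the inline qualifier and append the partition clause
    new_sql = create_sql.replace("PRIMARY KEY", "") + PARTITION_TMPL % {
        "partition_type": partition_type.upper(),
        "partition_cols": quote_cols(partition_cols),
    }
    # The composite key is partition columns first, then the model's own pk
    pk_sql = PK_TMPL % {
        "table_name": f'"{table}"',
        "constraint_name": f'"{table}_pk"',
        "constraint_cols": quote_cols(list(partition_cols) + [pk_col]),
    }
    return new_sql, pk_sql


# Hand-written sample input, assumed to resemble what django would emit
create_sql = (
    'CREATE TABLE "my_super_cool_model" '
    '("id" uuid NOT NULL PRIMARY KEY, "start_date" date NOT NULL)'
)
new_sql, pk_sql = partition_create_sql(
    create_sql, "my_super_cool_model", "range", ["start_date"], "id"
)
```

Here new_sql ends with PARTITION BY RANGE ("start_date") and no longer contains PRIMARY KEY, while pk_sql carries the composite key ("start_date", "id") as a separate ALTER TABLE statement.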

import logging

from django.db.models import ForeignKey
...
LOG = logging.getLogger(__name__)


def _count_fk_fields(model):
    num_fk = 0
    for f in model._meta.fields:
        num_fk += int(isinstance(f, ForeignKey))
    return num_fk


def p_table_sql(self, model):
    # Use default model class for the original django SQL generation
    sql, params = self.o_table_sql(model)

    # Based on model name match, get the defined model from the app
    # from the django migration processing
    if model.__name__ in PARTITIONED_MODEL_NAMES:
        pmodel = get_model(model.__name__)
    else:
        pmodel = None

    # If there was a partition name match and the class has the required attribute,
    # use this information to add the partition clause to the create table sql
    # Otherwise, return the original sql and params
    if pmodel is not None and hasattr(pmodel, "PartitionInfo"):
        LOG.info(f"Creating PARTITIONED TABLE {pmodel._meta.db_table}")
        partition_cols = pmodel.PartitionInfo.partition_cols
        sparams = {
            "partition_type": pmodel.PartitionInfo.partition_type.upper(),
            "partition_cols": ", ".join(f'"{c}"' for c in partition_cols),
        }

        # The primary key will be overridden here
        # Partitioned tables require that the partition column(s) be part of the primary key
        p_sql = self.sql_partitioned_table % sparams
        sql = sql.replace("PRIMARY KEY", "") + p_sql

        pk_cols = partition_cols[:]
        try:
            mod_pk = pmodel._meta.pk.get_attname()
        except Exception:
            pass
        else:
            pk_cols.append(mod_pk)

        sparams = {
            "table_name": f'"{pmodel._meta.db_table}"',
            "constraint_name": f'"{pmodel._meta.db_table}_pk"',
            "constraint_cols": ", ".join(f'"{c}"' for c in pk_cols),
        }
        pk_constraint = self.sql_partitioned_pk % sparams

        num_fk = _count_fk_fields(pmodel)
        if num_fk:
            self.deferred_sql.insert(-num_fk - 1, pk_constraint)
        else:
            self.deferred_sql.append(pk_constraint)

        # Add any deferred sql statement for post-create to
        # the self.deferred_sql list
    else:
        LOG.info(f"Creating TABLE {model._meta.db_table}")

    return sql, params
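
The placement of the primary key statement relies on negative list indexing, which is easy to misread. The sketch below shows the idea on a plain list. It assumes the foreign key statements are the last entries in the list at that moment, and insert_before_trailing is a hypothetical helper name.

```python
def insert_before_trailing(statements, new_statement, trailing_count):
    """Insert new_statement ahead of the last trailing_count entries."""
    if trailing_count:
        position = max(len(statements) - trailing_count, 0)
        statements.insert(position, new_statement)
    else:
        statements.append(new_statement)
    return statements


with_fks = insert_before_trailing(["other", "fk_1", "fk_2"], "pk", 2)
without_fks = insert_before_trailing(["other"], "pk", 0)
```

Note that p_table_sql uses insert(-num_fk - 1, ...), which lands one slot earlier than this sketch does. Either position keeps the primary key ahead of the foreign key statements, which is the only ordering that matters here.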

delete_model

This function (which I've named p_delete_model) needs to do any pre-processing before the table is dropped. Use the original functionality (which I've reassigned to o_delete_model) to do the actual table drop.

For my implementation, I have a tracking table that lists all of the partitions of any of my partitioned tables. So I want to clean that up.

def p_delete_model(self, model):
    if model.__name__ in PARTITIONED_MODEL_NAMES:
        pmodel = get_model(model.__name__)
    else:
        pmodel = None

    if pmodel is not None and hasattr(pmodel, "PartitionInfo"):
        sparams = {"partitioned_table_name": pmodel._meta.db_table}
        with self.connection.cursor() as cur:
            drop_partitions_sql = cur.mogrify(self.sql_drop_partitions, sparams).decode("utf-8")
        self.execute(drop_partitions_sql)

    self.o_delete_model(model)

Enable Function

This function will set SQL templates for partitioning as well as reassign original functionality to o_ attributes and assign the new functionality to the original attributes.

import types
...
def set_partitioned_schema_editor(schema_editor):
    """
    Add attributes and override method of given schema_editor to allow partition table sql statements
    to be emitted.
    """
    # Add SQL templates, if not already present
    if not hasattr(schema_editor, "sql_partitioned_table"):
        setattr(schema_editor, "sql_partitioned_table", " PARTITION BY %(partition_type)s (%(partition_cols)s) ")

    if not hasattr(schema_editor, "sql_partitioned_pk"):
        setattr(
            schema_editor,
            "sql_partitioned_pk",
            "ALTER TABLE %(table_name)s ADD CONSTRAINT %(constraint_name)s PRIMARY KEY (%(constraint_cols)s)",
        )

    # Template to drop partitions by using the partition manager trigger function set
    # on the table
    if not hasattr(schema_editor, "sql_drop_partitions"):
        drop_partitions_sql = """
DELETE
  FROM partitioned_tables
 WHERE schema_name = current_schema
   AND partition_of_table_name = %(partitioned_table_name)s
"""
        setattr(schema_editor, "sql_drop_partitions", drop_partitions_sql)

    # Backup original method to emit create table sql and replace with the new method
    if not hasattr(schema_editor, "o_table_sql"):
        setattr(schema_editor, "o_table_sql", schema_editor.table_sql)
        setattr(schema_editor, "table_sql", types.MethodType(p_table_sql, schema_editor))

    if not hasattr(schema_editor, "o_delete_model"):
        setattr(schema_editor, "o_delete_model", schema_editor.delete_model)
        setattr(schema_editor, "delete_model", types.MethodType(p_delete_model, schema_editor))

Disable Function

This function will restore the schema_editor instance back to default.

def unset_partitioned_schema_editor(schema_editor):
    # Delete partition template attributes, if present
    if hasattr(schema_editor, "sql_partitioned_table"):
        delattr(schema_editor, "sql_partitioned_table")

    if hasattr(schema_editor, "sql_partitioned_pk"):
        delattr(schema_editor, "sql_partitioned_pk")

    if hasattr(schema_editor, "sql_drop_partitions"):
        delattr(schema_editor, "sql_drop_partitions")

    # Restore original functionality for create table sql emit
    if hasattr(schema_editor, "o_table_sql"):
        setattr(schema_editor, "table_sql", schema_editor.o_table_sql)
        delattr(schema_editor, "o_table_sql")

    # Restore original functionality for delete_model method
    if hasattr(schema_editor, "o_delete_model"):
        setattr(schema_editor, "delete_model", schema_editor.o_delete_model)
        delattr(schema_editor, "o_delete_model")
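
The enable and disable pair boils down to patching one instance with types.MethodType and then undoing it. Here is that round trip in isolation, as a sketch only: Editor is a toy stand-in for the schema editor, and patch and unpatch are hypothetical names. Unlike my functions above, unpatch removes the instance-level override so the class method shows through again.

```python
import types


class Editor:
    """Toy stand-in for a schema editor."""

    def table_sql(self, model):
        return f"CREATE TABLE {model}"


def p_table_sql(self, model):
    # Call the saved original, then decorate its result
    return self.o_table_sql(model) + " PARTITION BY RANGE (start_date)"


def patch(editor):
    # Only patch once, otherwise o_table_sql would point at the patched method
    if not hasattr(editor, "o_table_sql"):
        editor.o_table_sql = editor.table_sql
        editor.table_sql = types.MethodType(p_table_sql, editor)


def unpatch(editor):
    if hasattr(editor, "o_table_sql"):
        del editor.table_sql  # removes the instance attribute set by patch
        del editor.o_table_sql


editor = Editor()
patch(editor)
patch(editor)  # second call is a no-op
patched_sql = editor.table_sql("t")
unpatch(editor)
restored_sql = editor.table_sql("t")
```

Because only the instance is patched, any other Editor instance keeps the stock behaviour the whole time.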

Migrations

Once all of this was done, I had to use it in a migration file. After digging through the django logic that runs migrations, I could see that (at least with migrate_schemas from django_tenant_schemas) the schema_editor is created as a context manager and used throughout all of the migration files being applied from the current state.

This means that I can set and clear the schema editor from within a migration file as discrete RunPython migration operations. The first operation should be to set the schema_editor for partitioning with the reverse action to unset. The last operation should be to unset the schema_editor with the reverse action to set.

Example:

from django.db import migrations, models

from <your_partition_code_module> import set_partitioned_schema_editor
from <your_partition_code_module> import unset_partitioned_schema_editor

def set_partition_mode(apps, schema_editor):
    set_partitioned_schema_editor(schema_editor)


def unset_partition_mode(apps, schema_editor):
    unset_partitioned_schema_editor(schema_editor)


class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.RunPython(code=set_partition_mode, reverse_code=unset_partition_mode),
        migrations.CreateModel(
            name="OCPAllCostLineItemProjectDailySummaryP",
            fields=[
                ("id", models.UUIDField(primary_key=True, serialize=False)),
                ("source_type", models.TextField()),
                ...
            ]
        ),
        migrations.RunPython(code=unset_partition_mode, reverse_code=set_partition_mode),
    ]
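
The swapped reverse actions make sense once you remember that django unapplies a migration's operations in reverse order. A toy simulation, with no django involved, shows the resulting sequences. The run function and the string labels are made up for illustration.

```python
# Each operation is a (forward_action, reverse_action) pair
OPERATIONS = [
    ("set_partition_mode", "unset_partition_mode"),
    ("create_table", "drop_table"),
    ("unset_partition_mode", "set_partition_mode"),
]


def run(operations, backwards=False):
    """Return the actions executed when applying or unapplying a migration."""
    if backwards:
        # Unapplying walks the list from the end and uses the reverse actions
        return [reverse for _, reverse in reversed(operations)]
    return [forward for forward, _ in operations]


forward_log = run(OPERATIONS)
backward_log = run(OPERATIONS, backwards=True)
```

In both directions the table operation is bracketed by set and unset, so the partition-aware delete_model is in effect when the table is dropped.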

It's a little complicated, but it does work.

I hope that this can be adapted to your projects and open up the possibility of partitioning to those projects that need it.

Top comments (1)

Anuj Sharma • Apr 14 '23

@redhap I kind of want to try your code to implement partition in my project, but it's unclear where to place which code block. It would be better if you can mention where the code blocks will go (in which file), and even better if you can share an example on GitHub.
