Introduction
In the first article, we delved into how Elasticsearch works under the hood.
In this article, we will implement Elasticsearch in a Django application.
This article is intended for readers already familiar with Django; we will not explain project setup or core functionality such as models and views in depth.
Setup
Clone this repository into a folder of your choosing.
git clone git@github.com:robinmuhia/elasticSearchPOC.git .
or
Get the repo from this GitHub link
We need three specific libraries; they abstract much of what we need to implement Elasticsearch:
django-elasticsearch-dsl==8.0
elasticsearch==8.0.0
elasticsearch-dsl==8.12.0
Create a virtual environment, activate it, and install the dependencies from the requirements.txt file:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Your project structure should look like the image below:
Now we're ready to go.
Understanding the project
Settings file
The project is a simple Django application with the usual setup structure.
In the config folder, we have our settings.py file.
For the purposes of this project, our Elasticsearch settings are simple, as shown below:
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "whitenoise.runserver_nostatic",
    "django.contrib.staticfiles",
    "django_extensions",
    "django_elasticsearch_dsl",
    "rest_framework",
    "elastic_search.books",
]

ELASTICSEARCH_DSL = {
    "default": {
        "hosts": [os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")],
    },
}

ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = "django_elasticsearch_dsl.signals.RealTimeSignalProcessor"
ELASTICSEARCH_DSL_INDEX_SETTINGS = {}
ELASTICSEARCH_DSL_AUTOSYNC = True
ELASTICSEARCH_DSL_AUTO_REFRESH = True
ELASTICSEARCH_DSL_PARALLEL = False
In a production-ready application, I would recommend using the CelerySignalProcessor. The RealTimeSignalProcessor re-indexes documents immediately whenever a model changes. The CelerySignalProcessor handles re-indexing asynchronously, so users do not experience added latency when they modify our models. You would have to set up Celery, though.
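If Celery is already set up in your project, switching processors is a small settings change. A minimal sketch, assuming a Redis broker (the broker URL and environment variable name here are illustrative assumptions, not part of this repository):

```python
# settings.py -- re-index asynchronously via Celery instead of in-request.
# Requires a running Celery worker; the broker URL below is an example.
import os

CELERY_BROKER_URL = os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/0")

ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = (
    "django_elasticsearch_dsl.signals.CelerySignalProcessor"
)
```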
Read more about the nuances of settings here.
Models
from django.db import models


class GenericMixin(models.Model):
    """Generic mixin to be inherited by all models."""

    id = models.AutoField(primary_key=True, editable=False, unique=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True
        ordering = ("-updated_at", "-created_at")


class Country(GenericMixin):
    name = models.CharField(max_length=200)

    def __str__(self):
        return self.name


class Genre(GenericMixin):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name


class Author(GenericMixin):
    name = models.CharField(max_length=200)

    def __str__(self):
        return self.name


class Book(GenericMixin):
    title = models.CharField(max_length=100)
    description = models.TextField()
    genre = models.ForeignKey(Genre, on_delete=models.CASCADE, related_name="genres")
    country = models.ForeignKey(Country, on_delete=models.CASCADE, related_name="countries")
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="authors")
    year = models.IntegerField()
    rating = models.FloatField()

    def __str__(self):
        return self.title
The GenericMixin holds fields that all models should inherit. For a production application, I would recommend using a UUID as the primary key, but we use a plain auto-incrementing integer field here because it is simpler for this project.
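If you do want UUID primary keys, the mixin could be adapted roughly as follows. This is an illustrative sketch, not part of the repository; the class name UUIDMixin is mine:

```python
# Illustrative variant of GenericMixin with a UUID primary key.
import uuid

from django.db import models


class UUIDMixin(models.Model):
    """Abstract base model using a random UUID as the primary key."""

    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True
        ordering = ("-updated_at", "-created_at")
```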
The models are pretty self-explanatory. We will be indexing and querying the Book model: our goal is to search for a book by its title and description, while also being able to filter by year and rating.
Documents file
We have a documents.py file in the books folder.
This file is important and should be named exactly that, as django-elasticsearch-dsl auto-discovers documents defined in a documents.py module. Our documents will be written here. For our Book model, the code is shown below:
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from elastic_search.books.models import Author, Book, Country, Genre


@registry.register_document
class BookDocument(Document):
    genre = fields.ObjectField(
        properties={
            "name": fields.TextField(),
        }
    )
    country = fields.NestedField(
        properties={
            "name": fields.TextField(),
        }
    )
    author = fields.NestedField(
        properties={
            "name": fields.TextField(),
        }
    )

    class Index:
        name = "books"

    class Django:
        model = Book
        fields = [
            "title",
            "description",
            "year",
            "rating",
        ]
        related_models = [Genre, Country, Author]

    def get_queryset(self):
        return super().get_queryset().select_related("genre", "author", "country")

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Genre):
            return related_instance.genres.all()
        elif isinstance(related_instance, Country):
            return related_instance.countries.all()
        elif isinstance(related_instance, Author):
            return related_instance.authors.all()
        else:
            return []
Import Statements:
We import necessary modules and classes from django_elasticsearch_dsl and our Django models.
Document Definition:
We define a BookDocument class which inherits from Document, provided by django_elasticsearch_dsl.
Registry Registration:
We register the BookDocument class with the registry using the @registry.register_document decorator. This tells the Elasticsearch DSL library to manage this document.
Index Configuration:
We specify the name of the Elasticsearch index for this document as "books". This index name should be unique within the Elasticsearch cluster.
Django Model Configuration:
Under the Django class nested within BookDocument, we link the document to the Django model (Book) and specify which fields of the model should be indexed.
Fields Mapping:
Inside the BookDocument class, we define fields for the Elasticsearch document. These map to the fields in the Django model; genre is declared as an object field, while country and author are nested fields.
Related Models Handling:
We specify related models (Genre, Country, Author) that should be indexed along with the Book model. For each related model, we define how to retrieve instances related to the main model. This involves specifying which fields to index from related models.
Queryset Configuration:
We override the get_queryset method to specify how the queryset should be retrieved. In this case, we use select_related to fetch related objects efficiently.
Instances from Related:
We define the get_instances_from_related method to handle instances from related models. This method is used to retrieve instances related to the main model for indexing purposes.
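To make the resulting index shape concrete, here is a rough, hand-written sketch of the mapping the document above generates, expressed as a plain dictionary. The concrete field types are assumptions based on django-elasticsearch-dsl's default Django-to-Elasticsearch type mapping; the real mapping can be inspected with GET /books/_mapping:

```python
# Approximate shape of the "books" index mapping produced by BookDocument.
# Field types are assumptions based on the library's defaults.
book_mapping = {
    "properties": {
        "title": {"type": "text"},
        "description": {"type": "text"},
        "year": {"type": "integer"},
        "rating": {"type": "double"},
        # ObjectField: a sub-object with its own properties.
        "genre": {"properties": {"name": {"type": "text"}}},
        # NestedField: indexed as independent nested documents.
        "country": {"type": "nested", "properties": {"name": {"type": "text"}}},
        "author": {"type": "nested", "properties": {"name": {"type": "text"}}},
    }
}
```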
Views
import copy
from abc import abstractmethod

from elasticsearch_dsl import Document, Q
from rest_framework.decorators import action
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.request import Request
from rest_framework.response import Response
from rest_framework.viewsets import ModelViewSet

from elastic_search.books.documents import BookDocument
from elastic_search.books.models import Book
from elastic_search.books.serializers import BookSerializer


class PaginatedElasticSearchAPIView(ModelViewSet, LimitOffsetPagination):
    document_class: Document = None

    @abstractmethod
    def generate_search_query(self, search_terms_list, param_filters):
        """This method should be overridden
        and return a Q() expression."""

    @action(methods=["GET"], detail=False)
    def search(self, request: Request):
        try:
            params = copy.deepcopy(request.query_params)
            search_terms = params.pop("search", None)
            query = self.generate_search_query(
                search_terms_list=search_terms, param_filters=params
            )
            search = self.document_class.search().query(query)
            response = search.to_queryset()
            results = self.paginate_queryset(response)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            # Return the error message; a raw exception object is not serializable.
            return Response(str(e), status=500)


class BookViewSet(PaginatedElasticSearchAPIView):
    serializer_class = BookSerializer
    queryset = Book.objects.all()
    document_class = BookDocument

    def generate_search_query(self, search_terms_list: list[str], param_filters: dict):
        if search_terms_list is None:
            return Q("match_all")

        # Strip null bytes and treat commas as whitespace.
        search_terms = search_terms_list[0].replace("\x00", "")
        search_terms = search_terms.replace(",", " ")
        search_fields = ["title", "description"]
        filter_fields = ["year", "rating"]

        query = Q("multi_match", query=search_terms, fields=search_fields, fuzziness="auto")
        wildcard_query = Q(
            "bool",
            should=[
                Q("wildcard", **{field: f"*{search_terms.lower()}*"}) for field in search_fields
            ],
        )
        query = query | wildcard_query
        if len(param_filters) > 0:
            filters = []
            for field in filter_fields:
                if field in param_filters:
                    filters.append(Q("term", **{field: param_filters[field]}))
            filter_query = Q("bool", should=[query], filter=filters)
            query = query & filter_query
        return query
Structure
The PaginatedElasticSearchAPIView class has two important methods. The generate_search_query method has an @abstractmethod decorator, which means any class that inherits from it must implement that method.
The search method adds a search endpoint that accepts a GET request and handles the search functionality. It copies the parameters from the URL and passes them to the generate_search_query function. That function should return an Elasticsearch query, which is executed and then converted to a queryset. The queryset is paginated and returned to the user.
In a production app, I would recommend handling the exception by logging the error and falling back to Django Rest Framework's built-in search, so that at the very least our search always works.
Implementation
In the BookViewSet, we provide the document that we will execute the search on.
We also implement the abstract method. Let us walk through the query step by step.
Input Parameters:
search_terms_list: These are the words or phrases a user types into the search bar when looking for a book.
param_filters: These are additional conditions a user might apply to narrow down the search, such as only showing books published in a certain year or with a certain rating.
Understanding the Search Process:
If the user doesn't provide any search terms, it means they want to see all the books available. So, we create a "match-all" query to fetch all books.
If the user provides search terms, we want to look for those terms in specific fields of our books, like title or description. We also want to be flexible with our search, allowing for slight misspellings or variations in the search terms. That's where the "fuzziness" parameter comes into play. It helps us find similar words even if the user misspells something.
Additionally, we might want to support wildcard searches, where a placeholder like '*' matches any sequence of characters. For example, the pattern '*hist*' would match 'history', 'historic', etc.
If there are any filter parameters provided, we want to apply those filters to our search results. For example, if a user wants to see only books published in the year 2022, we want to include that condition in our search.
Constructing the Query:
We use the Elasticsearch DSL (Domain-Specific Language) to construct our search query. This query is like a set of instructions written in a language Elasticsearch understands.
We build our query step by step, considering all the different scenarios mentioned above.
We use the Q class from Elasticsearch DSL to create different parts of our query, such as match queries, wildcard queries, and filter queries.
Finally, we combine all these parts to form a comprehensive search query that captures both the user's search terms and any additional filters they might have applied.
Output:
The method returns the constructed search query, ready to be executed against our Elasticsearch index.
This query will fetch the relevant books based on the user's search terms and filters, providing them with accurate and tailored search results.
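To make this concrete, here is a rough sketch, written by hand as a plain dictionary, of the query body generate_search_query would build for a request like ?search=consumer&year=2022. The exact body the Q objects serialize to may differ slightly; this is illustrative only:

```python
# Approximate Elasticsearch query body for ?search=consumer&year=2022.
# Hand-written to mirror generate_search_query; illustrative, not exact.
search_body = {
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            # Fuzzy full-text match over both search fields.
                            {
                                "multi_match": {
                                    "query": "consumer",
                                    "fields": ["title", "description"],
                                    "fuzziness": "auto",
                                }
                            },
                            # Substring matching via wildcards.
                            {"wildcard": {"title": "*consumer*"}},
                            {"wildcard": {"description": "*consumer*"}},
                        ]
                    }
                }
            ],
            # Exact filter conditions from the query parameters.
            "filter": [{"term": {"year": 2022}}],
        }
    }
}
```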
URLS
We now set this up in our urls.py file:
from rest_framework.routers import SimpleRouter
from elastic_search.books import views
router = SimpleRouter()
router.register("books", views.BookViewSet)
urlpatterns = router.urls
Data
We need data to search against, so there is a factories.py file that will populate the database for us.
First, let's create a database. Set up Postgres and run the following commands:
sudo -u postgres psql
DROP USER IF EXISTS elastic;
CREATE USER elastic WITH CREATEDB CREATEROLE SUPERUSER LOGIN PASSWORD 'elastic';
DROP DATABASE IF EXISTS elastic;
CREATE DATABASE elastic WITH OWNER postgres;
GRANT ALL ON DATABASE elastic TO elastic;
\q
Populate the data in the database:
python manage.py generate_test_data 1000
This will create a large dataset for us to run our queries against.
Set up Elasticsearch
Run the following to start a local Elasticsearch instance with Docker:
docker run --rm --name elasticsearch_container -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.10.2
Populate the index
Now we can populate our index to test out the application:
python manage.py search_index --rebuild
Query Time!!!
Start the server
python manage.py runserver
Head to Postman or any API testing platform of your choice.
Our base query will be this:
http://localhost:8000/api/books/search/
A GET request is shown below.
Let's make a query for a book with 'consumer' (e.g. ?search=consumer).
Let's misspell 'consumer'; thanks to fuzziness, we get the same result.
Let's test the filter (e.g. ?search=consumer&year=2022):
Conclusion
We have implemented Elasticsearch and tested it live, getting the expected results. Other query types exist, such as nested queries that could bring author and country into the search and filters, but they are out of scope for this tutorial. In a future article, I may add them. In our next article, however, we will add a CI/CD pipeline that can be used to test our application.