Atlas Search for the Gutenberg Project

Timon Vogel — Wed, 12 Jan 2022 13:52:51 +0000

Overview of My Submission

Atlas Search is one of MongoDB's most powerful features.
For this project, I decided to build a database cluster and use Atlas Search to query it.

But what data should go into the database?
I chose Project Gutenberg as the project's data source.
Why? Because everyone loves books and free stuff!

Here's the code: https://github.com/timonvogel/gutenberg-search

The Web Application

The web application is a straightforward Python Flask app. It serves only the index page and uses two simple templates. The page has a search box with the results displayed below it.

Working with MongoDB and Atlas Search, which I cover in the following sections, was the most interesting part.

MongoDB Gutenberg Cluster

I began by following the excellent tutorial in the MongoDB documentation at docs.atlas.mongodb.com. After playing around with a test cluster, I built the real cluster and created a user with write access.

It was rather straightforward, particularly connecting to the cluster with pymongo and the connection string.

The connection string was kept in a separate file called _secrets.py:

import pymongo
import _secrets

client = pymongo.MongoClient(_secrets.connection_string)

books = client.gutenberg.books

This is how the application accesses the Gutenberg cluster.
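The contents of _secrets.py are not shown here. As a rough sketch only, the connection string could be assembled like this. The user name, password, and host below are made-up placeholders, and build_connection_string is a hypothetical helper, not something from the repository. Percent-escaping the credentials matters when they contain characters such as @ or /.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host):
    """Assemble a mongodb+srv URI with percent-escaped credentials.

    Hypothetical helper for illustration; the real _secrets.py may
    simply hold a hard-coded string copied from the Atlas web interface.
    """
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"
        "?retryWrites=true&w=majority"
    )


# Placeholder values, not real credentials or a real cluster host.
connection_string = build_connection_string(
    "writer", "p@ss/word", "cluster0.example.mongodb.net"
)
```

Keeping this file out of version control keeps the credentials out of the public repository.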

Populating with Data

There is a handy repository for downloading the whole Gutenberg Project database: https://github.com/pgcorpus/gutenberg.

I obtained the metadata I needed by running the get_data.py script.
Then it was just a matter of writing a little script to parse the CSV data and push it to my new Gutenberg cluster.

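for row in csv_it:

  # Flush the buffer to MongoDB once it is full, then start a fresh one
  if len(books_buffer) > BOOKS_BUFF_LEN:
    books.insert_many(books_buffer)
    books_buffer = []

  # Map the CSV columns to the fields of a book document
  book_info = {
    "book_id":row[0],
    "title":row[1],
    "author":insert_author((row[2], row[3], row[4])),
    "language":row[5],
    "subjects":row[7],
  }

  books_buffer.append(book_info)

# Insert whatever is left over after the last full batch
if books_buffer:
  books.insert_many(books_buffer)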

Note how the whole buffer is inserted with just one command: books.insert_many(books_buffer)
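The buffering idea can be pulled out into a small helper. This is only a sketch: insert_in_batches and its parameters are names I made up for illustration, and the helper takes the insert function as an argument so it can be tried without a database. With a real collection you would pass books.insert_many.

```python
def insert_in_batches(documents, insert_many, batch_size=1000):
    """Send documents to insert_many in chunks of at most batch_size.

    Returns the number of documents handed over. insert_many is any
    callable that accepts a list, for example books.insert_many.
    """
    buffer = []
    inserted = 0
    for doc in documents:
        buffer.append(doc)
        if len(buffer) >= batch_size:
            insert_many(buffer)
            inserted += len(buffer)
            buffer = []  # start a new list so the sent batch is not reused
    # Flush the final partial batch; skip the call if nothing is left,
    # since inserting an empty list is not useful.
    if buffer:
        insert_many(buffer)
        inserted += len(buffer)
    return inserted


# Demonstration with a stand-in for the collection: record each batch.
sent_batches = []
total = insert_in_batches(
    ({"book_id": str(i)} for i in range(5)),
    sent_batches.append,
    batch_size=2,
)
```

Fewer, larger inserts mean fewer round trips to the cluster than inserting one document at a time.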

Script: https://github.com/timonvogel/gutenberg-search/blob/main/metadata_to_mongodb.py

Fetching Gutenberg Books

When a user submits the search form, the value is sent to the server as a URL parameter. It is then passed to the atlas_search function, which performs the Atlas Search query. The code looks like this:
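To make the URL parameter step concrete, here is a standard-library sketch of pulling a search term out of a request URL and rejecting empty or oversized values. The parameter name q, the length limit, and the function name are assumptions for illustration. In a Flask view the raw value would typically come from request.args.get instead of parsing the URL by hand.

```python
from urllib.parse import parse_qs, urlsplit

MAX_TERM_LENGTH = 200  # hypothetical upper bound, not taken from the project


def extract_search_term(url, param="q"):
    """Return the cleaned search term from the URL, or None if unusable."""
    query = parse_qs(urlsplit(url).query)
    values = query.get(param, [])
    if not values:
        return None  # parameter missing or submitted empty
    term = values[0].strip()
    if not term or len(term) > MAX_TERM_LENGTH:
        return None  # whitespace only, or suspiciously long
    return term
```

A caller can skip the database query entirely when the function returns None and just render the empty search page.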

results = books.aggregate([
    {
        '$search': {
            'index': 'default',
            'text': {
                'query': search_term,
                'path': {
                  'wildcard': '*'
                }
            }
        }
    }, {
        '$limit': 20
    }, {
        '$project': {
            "title": 1,
            "author": 1,
            "book_id": 1,
            "subjects": 1,
            "_id": 0
        }
    }
])

Getting the search query right was a little tough, but the examples in the Atlas Search documentation helped: https://docs.atlas.mongodb.com/atlas-search/index-definitions/

In the example above, the search index default is used and the query string is stored in the variable search_term. The other important part is the path field, since it controls which document fields Atlas Search looks in. I ended up with many empty result sets because I got this field wrong in the beginning.
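Since the path field caused the most trouble, here is a small sketch of a function that builds the same pipeline with either the wildcard path or an explicit list of fields. build_search_pipeline and its parameters are hypothetical names, and the field list in the usage line is just an example. The returned list is what would be passed to books.aggregate.

```python
def build_search_pipeline(search_term, paths=None, limit=20, index="default"):
    """Build an aggregation pipeline with a $search text stage.

    paths=None searches every indexed field via the wildcard path;
    otherwise only the listed field names are searched.
    """
    path = {"wildcard": "*"} if paths is None else list(paths)
    return [
        {"$search": {"index": index, "text": {"query": search_term, "path": path}}},
        {"$limit": limit},
        {"$project": {"title": 1, "author": 1, "book_id": 1, "subjects": 1, "_id": 0}},
    ]


# Example: restrict the search to two fields instead of the wildcard.
pipeline = build_search_pipeline("moby dick", paths=["title", "subjects"])
```

Building the pipeline in one place makes it easy to print it and paste it into the Atlas web interface when a query returns nothing.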

Putting it together

Once the Atlas query was implemented, everything appeared to be ready!
MongoDB was ready to provide the data, and the web application was ready to display the results.

In the search.html template, I built a simple results display and made sure it doesn't allow any invalid inputs and can handle connections.

Lessons learned

Did I learn anything from this?
Without a doubt!
Once you've mastered the fundamentals of MongoDB (and there isn't much to learn), you'll be tempted to use it for your next project instead of, say, MySQL, which requires you to deal with datatypes and sophisticated query expressions.
MongoDB is a lot easier to use, which I really appreciate.

I'm also glad to have Python Flask on hand, which lets me quickly build simple web applications.
This allowed me to concentrate on the most crucial aspect of the project, MongoDB and Atlas Search.

During this project, I also discovered the MongoDB web interface. It came in handy in several ways, especially for testing Atlas Search queries.

https://github.com/timonvogel/gutenberg-search

Submission Category:

This falls under the Choose Your Own Adventure category, even though it is built around Atlas Search.

Link to Code

timonvogel / gutenberg-search

Web application to search the Gutenberg Project's database, made with Python Flask and MongoDB

Gutenberg Search

A simple MongoDB web application.

About

This is a straightforward search interface for the Project Gutenberg database. It features a more appealing look than the original gutenberg.org website.

The data is stored in a MongoDB cluster and was retrieved using the scripts from the following repository: github.com/pgcorpus/gutenberg

The stack of this application can be summarized as follows:
docker-container{ python-flask --> uwsgi --> nginx --> :80 }.

The server connects to the MongoDB cluster and performs an Atlas Search query for each search request.

Installation

Install the python modules flask and pymongo.
pip: pip install flask pymongo
Clone this repo and follow the Development and deployment section.

Creating a Gutenberg MongoDB cluster

The result of this step is publicly available. To find the cluster and access credentials, look through the source code.

If you want to reproduce this work, follow these steps:

git clone github.com/pgcorpus/gutenberg
Run python get_data.py…


DEV Community: Timon Vogel