<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Timon Vogel</title>
    <description>The latest articles on DEV Community by Timon Vogel (@timonvogel).</description>
    <link>https://dev.to/timonvogel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F791132%2Fa499d1cd-7947-4668-88b1-37b8152bbae8.png</url>
      <title>DEV Community: Timon Vogel</title>
      <link>https://dev.to/timonvogel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/timonvogel"/>
    <language>en</language>
    <item>
      <title>Atlas Search for the Gutenberg Project</title>
      <dc:creator>Timon Vogel</dc:creator>
      <pubDate>Wed, 12 Jan 2022 13:52:51 +0000</pubDate>
      <link>https://dev.to/timonvogel/mongodb-gutenberg-project-4o9j</link>
      <guid>https://dev.to/timonvogel/mongodb-gutenberg-project-4o9j</guid>
      <description>&lt;h3&gt;
  
  
  Overview of My Submission
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Atlas Search&lt;/strong&gt; is one of MongoDB's most powerful features.&lt;br&gt;
For this project, I built a database cluster and used Atlas Search to query it.&lt;/p&gt;

&lt;p&gt;But what data should go into the database?&lt;br&gt;
I chose Project Gutenberg as the data source.&lt;br&gt;
Why? Because everyone loves books and free stuff!&lt;/p&gt;

&lt;p&gt;Here's the code: &lt;a href="https://github.com/timonvogel/gutenberg-search"&gt;https://github.com/timonvogel/gutenberg-search&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DVz_XEcf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rxwed7oo6pi0strh17kb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DVz_XEcf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rxwed7oo6pi0strh17kb.png" alt="Screenshot of the website" width="880" height="457"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Web Application
&lt;/h3&gt;

&lt;p&gt;The web application is a straightforward &lt;strong&gt;Python Flask&lt;/strong&gt; app. It serves just the index page, which is rendered from two simple templates. The page shows a search box with the results below it.&lt;/p&gt;

&lt;p&gt;The most interesting part was working with MongoDB and Atlas Search, which I cover in the following sections.&lt;/p&gt;
&lt;h3&gt;
  
  
  MongoDB Gutenberg Cluster
&lt;/h3&gt;

&lt;p&gt;I began by following MongoDB's excellent tutorial at &lt;a href="//docs.atlas.mongodb.com"&gt;docs.atlas.mongodb.com&lt;/a&gt;. After playing around with a test cluster, I went ahead and created the real cluster along with a user that has write access.&lt;/p&gt;

&lt;p&gt;This was straightforward, particularly connecting to the cluster with pymongo and the connection string.&lt;/p&gt;

&lt;p&gt;The connection string was kept in a separate file called _secrets.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pymongo
import _secrets

client = pymongo.MongoClient(_secrets.connection_string)

books = client.gutenberg.books
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is how the application accesses the Gutenberg cluster.&lt;/p&gt;
&lt;h3&gt;
  
  
  Populating with Data
&lt;/h3&gt;

&lt;p&gt;There is a handy repository for downloading the whole Gutenberg Project database: &lt;a href="https://github.com/pgcorpus/gutenberg"&gt;https://github.com/pgcorpus/gutenberg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I obtained the metadata I needed by running the &lt;code&gt;get_data.py&lt;/code&gt; script.&lt;br&gt;
Then it was just a matter of writing a little script to parse the CSV data and push it to my new Gutenberg cluster.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for row in csv_it:

  if len(books_buffer) &amp;gt; BOOKS_BUFF_LEN:
    books.insert_many(books_buffer)

  book_info = {
    "book_id":row[0],
    "title":row[1],
    "author":insert_author((row[2], row[3], row[4])),
    "language":row[5],
    "subjects":row[7],
  }

  books_buffer.append(book_info)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Note how the whole buffer is inserted with just one command: &lt;code&gt;books.insert_many(books_buffer)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Script: &lt;a href="https://github.com/timonvogel/gutenberg-search/blob/main/metadata_to_mongodb.py"&gt;https://github.com/timonvogel/gutenberg-search/blob/main/metadata_to_mongodb.py&lt;/a&gt;&lt;/p&gt;
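
&lt;p&gt;The buffering idea can be tried without a database. The following is a minimal sketch, not the linked script: &lt;code&gt;row_to_book&lt;/code&gt; and &lt;code&gt;insert_in_batches&lt;/code&gt; are hypothetical helper names, the sample rows are made up, and the author field is left out because &lt;code&gt;insert_author&lt;/code&gt; is not shown in this post. Any callable can stand in for &lt;code&gt;books.insert_many&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import io


def row_to_book(row):
    """Map one metadata row to a document (column order as in the excerpt above)."""
    return {
        "book_id": row[0],
        "title": row[1],
        "language": row[5],
        "subjects": row[7],
    }


def insert_in_batches(rows, insert_many, batch_size=1000):
    """Send rows to insert_many in batches and return how many documents were sent."""
    buffer = []
    total = 0
    for row in rows:
        buffer.append(row_to_book(row))
        if len(buffer) == batch_size:
            insert_many(buffer)
            total += len(buffer)
            buffer = []  # start a fresh batch so nothing is sent twice
    if buffer:  # flush the final, partially filled batch
        insert_many(buffer)
        total += len(buffer)
    return total


# Demo with an in-memory CSV (made-up rows) and a list standing in for the collection.
sample = io.StringIO(
    "1,Book A,x,x,x,en,x,fiction\n"
    "2,Book B,x,x,x,en,x,poetry\n"
    "3,Book C,x,x,x,de,x,drama\n"
)
batches = []
count = insert_in_batches(csv.reader(sample), batches.append, batch_size=2)
print(count, [len(batch) for batch in batches])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a real collection, the call would look like &lt;code&gt;insert_in_batches(csv_it, books.insert_many)&lt;/code&gt;. Starting a new list after every write and flushing once after the loop ensures that no document is sent twice and that the last rows are not lost.&lt;/p&gt;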
&lt;h3&gt;
  
  
  Fetching Gutenberg Books
&lt;/h3&gt;

&lt;p&gt;When a user submits the search form, the search term is sent to the server as a URL parameter. It is then passed to the &lt;code&gt;atlas_search&lt;/code&gt; function, which runs the Atlas Search query. The code looks like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;results = books.aggregate([
    {
        '$search': {
            'index': 'default',
            'text': {
                'query': search_term,
                'path': {
                  'wildcard': '*'
                }
            }
        }
    }, {
        '$limit': 20
    }, {
        '$project': {
            "title": 1,
            "author": 1,
            "book_id": 1,
            "subjects": 1,
            "_id": 0
        }
    }
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The search query was a little tough to get right, but the Atlas Search documentation helped me with some examples: &lt;a href="https://docs.atlas.mongodb.com/atlas-search/index-definitions/"&gt;https://docs.atlas.mongodb.com/atlas-search/index-definitions/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, the search index &lt;code&gt;default&lt;/code&gt; is used and the query string is stored in the variable &lt;code&gt;search_term&lt;/code&gt;. The other important part is the &lt;code&gt;path&lt;/code&gt; field, since it controls which indexed fields Atlas Search looks in. The wildcard &lt;code&gt;*&lt;/code&gt; searches all of them. I got many empty results in the beginning because I had set this field incorrectly.&lt;/p&gt;
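
&lt;p&gt;To experiment with &lt;code&gt;path&lt;/code&gt;, it can help to build the pipeline in a small function. This is a sketch under assumptions: &lt;code&gt;build_search_pipeline&lt;/code&gt; is a hypothetical helper that is not part of the project, and restricting the search to named fields only returns matches if the search index covers those fields. Besides the wildcard form, &lt;code&gt;path&lt;/code&gt; also accepts a field name or a list of field names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def build_search_pipeline(search_term, fields=None, limit=20, index="default"):
    """Build an aggregation pipeline for an Atlas Search text query."""
    # Without explicit fields, fall back to the wildcard path used above.
    path = list(fields) if fields else {"wildcard": "*"}
    return [
        {"$search": {"index": index, "text": {"query": search_term, "path": path}}},
        {"$limit": limit},
        {"$project": {"title": 1, "author": 1, "book_id": 1, "subjects": 1, "_id": 0}},
    ]


everything = build_search_pipeline("whale")
titles_only = build_search_pipeline("whale", fields=["title"])

# Compare the two path variants without touching the database.
print(everything[0]["$search"]["text"]["path"])
print(titles_only[0]["$search"]["text"]["path"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the &lt;code&gt;books&lt;/code&gt; collection from earlier, either pipeline can be passed to &lt;code&gt;books.aggregate(...)&lt;/code&gt;. Printing the pipeline first makes it easier to spot a wrong &lt;code&gt;path&lt;/code&gt; before the query comes back empty.&lt;/p&gt;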
&lt;h3&gt;
  
  
  Putting it together
&lt;/h3&gt;

&lt;p&gt;Once the Atlas Search query was implemented, everything was in place!&lt;br&gt;
MongoDB was ready to provide the data, and the web application was ready to display the results.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;search.html&lt;/code&gt; template, I built a simple results display, making sure that it rejects invalid input and copes with connection problems.&lt;/p&gt;
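
&lt;p&gt;The post does not show how the input is checked, so the following is only an illustration of one way to do it with the standard library. The parameter name &lt;code&gt;q&lt;/code&gt;, the length limit, and the function &lt;code&gt;extract_search_term&lt;/code&gt; are all assumptions. In a Flask view, the raw value would normally come from &lt;code&gt;request.args&lt;/code&gt; instead of being parsed by hand.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.parse import parse_qs, urlsplit

MAX_TERM_LENGTH = 200  # hypothetical limit, not taken from the project


def extract_search_term(url, param="q"):
    """Return the cleaned search term from a URL, or None if it is missing or invalid."""
    values = parse_qs(urlsplit(url).query).get(param, [])
    if not values:
        return None
    term = " ".join(values[0].split())  # trim and collapse whitespace
    if len(term) not in range(1, MAX_TERM_LENGTH + 1):
        return None  # empty after trimming, or too long
    return term


print(extract_search_term("/?q=moby+dick"))
print(extract_search_term("/?q=%20%20"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for unusable input lets the page skip the database query and simply show the empty search form.&lt;/p&gt;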
&lt;h3&gt;
  
  
  Lessons learned
&lt;/h3&gt;

&lt;p&gt;Did I learn anything from this project?&lt;br&gt;
Without a doubt!&lt;br&gt;
Once you've mastered the fundamentals of MongoDB (and there isn't much to learn), you'll be tempted to use it for your next project instead of, say, MySQL, which makes you deal with data types and complex query expressions.&lt;br&gt;
MongoDB is a lot easier to use, which I really appreciate.&lt;/p&gt;

&lt;p&gt;I'm also glad to have Python Flask on hand, which lets me quickly build simple web applications.&lt;br&gt;
This allowed me to concentrate on the most crucial aspect of the project, MongoDB and Atlas Search.&lt;/p&gt;

&lt;p&gt;Along the way, I also discovered the MongoDB web interface. It came in handy in a variety of ways, especially for testing Atlas Search queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/timonvogel/gutenberg-search"&gt;https://github.com/timonvogel/gutenberg-search&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Submission Category:
&lt;/h3&gt;

&lt;p&gt;This fits the &lt;strong&gt;Choose Your Own Adventure&lt;/strong&gt; category, although it is built around Atlas Search.&lt;/p&gt;
&lt;h3&gt;
  
  
  Link to Code
&lt;/h3&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--566lAguM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/timonvogel"&gt;
        timonvogel
      &lt;/a&gt; / &lt;a href="https://github.com/timonvogel/gutenberg-search"&gt;
        gutenberg-search
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Web application to search the Gutenberg Project's database, made with Python Flask and MongoDB
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/timonvogel/gutenberg-search./website-demo.png"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_8bFyaZl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/timonvogel/gutenberg-search./website-demo.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
Gutenberg Search&lt;/h1&gt;
&lt;p&gt;A simple MongoDB web application.&lt;/p&gt;
&lt;h2&gt;
About&lt;/h2&gt;
&lt;p&gt;This is a straightforward search interface for the Project Gutenberg database. It features a more appealing look than the original &lt;a href="https://gutenberg.org" rel="nofollow"&gt;gutenberg.org&lt;/a&gt; website.&lt;/p&gt;
&lt;p&gt;The data is stored in a MongoDB cluster and was retrieved using the scripts from the following repository: &lt;a href="https://github.com/pgcorpus/gutenberg"&gt;github.com/pgcorpus/gutenberg&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The stack of this application can be summarized as follows:&lt;br&gt;
&lt;code&gt;docker-container{ python-flask --&amp;gt; uwsgi --&amp;gt; nginx --&amp;gt; :80 }&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The server connects to the MongoDB cluster to perform an &lt;strong&gt;Atlas Search&lt;/strong&gt; query for each request.&lt;/p&gt;
&lt;h2&gt;
Installation&lt;/h2&gt;
&lt;p&gt;Install the Python modules &lt;code&gt;flask&lt;/code&gt; and &lt;code&gt;pymongo&lt;/code&gt;.&lt;br&gt;
pip: &lt;code&gt;pip install flask pymongo&lt;/code&gt;&lt;br&gt;
Clone this repo and follow the Development and deployment section.&lt;/p&gt;
&lt;h2&gt;
Creating a Gutenberg MongoDB cluster&lt;/h2&gt;
&lt;p&gt;The result of this step is publicly available. To find the cluster and access credentials, look through the source code.&lt;/p&gt;
&lt;p&gt;If you want to reproduce this work, follow these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone github.com/pgcorpus/gutenberg&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;python get_data.py&lt;/code&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/timonvogel/gutenberg-search"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



</description>
      <category>atlashackathon</category>
    </item>
  </channel>
</rss>
