# Implementing a Search Page with MongoDB and Elasticsearch

In this article, we will walk through the process of setting up a search page that leverages MongoDB and Elasticsearch to search across multiple referenced collections. This approach is particularly useful when you have complex data relationships and need powerful full-text search capabilities.

## Prerequisites

- **MongoDB**: Installed on your server, or a managed service like MongoDB Atlas.
- **Elasticsearch**: Installed on your server, or a managed service like Elastic Cloud.
- **Node.js**: For the backend implementation.
- **Logstash**: To sync data from MongoDB to Elasticsearch.

To effectively utilize Elasticsearch's powerful search capabilities, it's crucial to transform and synchronize data from MongoDB into a format suitable for indexing. This involves flattening nested data structures and ensuring that only relevant, indexable data is included. In this section, we'll cover the process of data exchange, transformation, and synchronization using Logstash.

## Understanding Data Exchange

### Data Extraction

1. **MongoDB Source Configuration**: Logstash can be configured to extract data directly from MongoDB. This involves setting up a connection to your MongoDB instance and specifying the collections to monitor.

### Data Transformation

1. **Flattening Data**: MongoDB collections often contain nested documents and references to other collections. To make this data indexable in Elasticsearch, we flatten these nested structures by merging related data from different collections into a single, cohesive document.

2. **Example**: Suppose we have the following collections:

   - `users`: Contains user information.
   - `posts`: Contains posts made by users, with each post referencing a user by `user_id`.

   We need to transform these collections into a single document structure containing both post and user information, as sketched below.
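Here is a minimal sketch of that target shape; the concrete field names (`post_content`, `name`, `email`) are assumptions for this example, chosen to match the fields we search later:

```javascript
// Source documents in MongoDB (hypothetical shapes):
const post = { _id: 'p1', user_id: 'u1', post_content: 'Hello, world!' };
const user = { _id: 'u1', name: 'Ada', email: 'ada@example.com' };

// The flattened document we want to index in Elasticsearch:
const flattened = {
  post_content: 'Hello, world!',
  user: { name: 'Ada', email: 'ada@example.com' }
};
```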

## Example Configuration with Logstash

### Logstash Input Configuration

Logstash can read from MongoDB using the community-maintained `mongodb` input plugin. It does not ship with Logstash, so install it first with `bin/logstash-plugin install logstash-input-mongodb`. Here's an example configuration:

```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}
```
- `uri`: Connection string for MongoDB.
- `placeholder_db_dir`: Directory where the plugin stores its state.
- `placeholder_db_name`: Name of the SQLite database file that holds that state.
- `collection`: Name of the MongoDB collection to monitor.
- `batch_size`: Number of documents to process in each batch.

### Logstash Filter Configuration

To flatten the data and enrich posts with user information, we use the aggregate filter plugin. Because this filter relies on events arriving in order, Logstash must run with a single pipeline worker (`-w 1`); the merge behavior is illustrated after the option list below:

```plaintext
filter {
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}
```
- `task_id`: The key used to correlate related events; in this case, `user_id`.
- `code`: The Ruby snippet that merges each event's fields into the aggregation map.
- `push_previous_map_as_event`: Pushes the accumulated map as a single new event once a new `task_id` is seen.
- `timeout`: Seconds to wait before pushing an unfinished map as an event.
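To make the merge concrete, here is an illustration (plain JavaScript objects, not pipeline code) of what this configuration does to events that share a `user_id`; the field names are hypothetical:

```javascript
// Two pipeline events that share user_id 'u1':
const event1 = { user_id: 'u1', name: 'Ada', email: 'ada@example.com' }; // a user document
const event2 = { user_id: 'u1', post_content: 'Hello, world!' };        // a post document

// The aggregate filter folds the fields of both events into one map
// under the 'user' key, then pushes that map as a single combined event:
const pushed = {
  user: {
    user_id: 'u1',
    name: 'Ada',
    email: 'ada@example.com',
    post_content: 'Hello, world!'
  }
};
```

Note that enrichment only happens if user documents actually flow through the same pipeline (for example, via a second `mongodb` input reading the `users` collection), and in practice you would restrict the `code` block so post fields stay at the top level, which is the document shape the search queries later in this article expect.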

### Logstash Output Configuration

Finally, configure the output to send the transformed data to Elasticsearch:

```plaintext
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```
- `hosts`: Elasticsearch server address.
- `index`: Name of the Elasticsearch index to store the data in.

## Keeping Data Updated

### Real-time Synchronization

1. **Change Data Capture**: Use MongoDB Change Streams to capture real-time changes in your collections, so that updates, insertions, and deletions in MongoDB are reflected in Elasticsearch (see the sketch after this list).

2. **Logstash Configuration**: Logstash, when configured with the appropriate input plugins, can continuously monitor MongoDB collections for changes and apply them to Elasticsearch.
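Here is a minimal sketch of the Change Streams approach: a hypothetical Node.js watcher that mirrors changes from the `posts` collection into the `posts_with_users` index. It assumes the local URIs used throughout this article, the official `mongodb` driver and the v7 `@elastic/elasticsearch` client, and a MongoDB deployment running as a replica set (Change Streams require one):

```javascript
const { MongoClient } = require('mongodb');
const { Client } = require('@elastic/elasticsearch');

const mongo = new MongoClient('mongodb://localhost:27017');
const es = new Client({ node: 'http://localhost:9200' });

async function run() {
  await mongo.connect();
  const posts = mongo.db('mydatabase').collection('posts');

  // fullDocument: 'updateLookup' delivers the complete post on updates,
  // not just the changed fields.
  const stream = posts.watch([], { fullDocument: 'updateLookup' });

  stream.on('change', async (change) => {
    const id = change.documentKey._id.toString();
    switch (change.operationType) {
      case 'insert':
      case 'update':
      case 'replace':
        // Re-index the full document under the same id.
        await es.index({ index: 'posts_with_users', id, body: change.fullDocument });
        break;
      case 'delete':
        await es.delete({ index: 'posts_with_users', id });
        break;
    }
  });
}

run().catch(console.error);
```

Unlike the Logstash batch pipeline, this keeps the index current as writes happen, at the cost of running your own process.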

### Indexing Only Relevant Data

1. **Selective Indexing**: Index only the fields that are relevant to your search queries. This reduces index size and improves search performance.

2. **Example**: If you only need to search posts by content and user details, configure Logstash to send only `post_content`, `user.name`, and `user.email` to Elasticsearch (the `mutate` filter in the full pipeline below drops unneeded fields). You can also enforce this on the Elasticsearch side, as sketched after this list.
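As a complementary, hypothetical sketch on the Elasticsearch side: create the index up front with an explicit mapping and `dynamic: false`, so only the declared fields are indexed regardless of what Logstash sends (undeclared fields remain in `_source` but are not searchable):

```javascript
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function createIndex() {
  await client.indices.create({
    index: 'posts_with_users',
    body: {
      mappings: {
        dynamic: false, // fields not declared below are kept in _source but not indexed
        properties: {
          post_content: { type: 'text' },
          user: {
            properties: {
              name: { type: 'text' },
              email: { type: 'keyword' }
            }
          }
        }
      }
    }
  });
}

createIndex().catch(console.error);
```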

## Example Logstash Pipeline

Here is a complete example of a Logstash pipeline that extracts, transforms, and loads data from MongoDB to Elasticsearch:

```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}

filter {
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }

  # Select only relevant fields
  mutate {
    remove_field => ["_id", "user_id"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```

## Step 1: Set Up MongoDB and Elasticsearch

### MongoDB Installation

Follow the [MongoDB installation guide](https://docs.mongodb.com/manual/installation/) for your operating system. Alternatively, you can use a managed service like MongoDB Atlas.

### Elasticsearch Installation

Follow the [Elasticsearch installation guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) for your operating system. Alternatively, you can use a managed service like Elastic Cloud.

## Step 2: Data Modeling and Indexing

### Identify Collections and Relationships

Assume we have two collections:

- `users`: Contains user information.
- `posts`: Contains posts made by users, with each post referencing a user.

### Flatten Data for Elasticsearch

Elasticsearch works best with denormalized (flattened) data, so we create a single document structure that includes fields from both `users` and `posts`, as in the flattened example shown earlier.

## Step 3: Sync Data from MongoDB to Elasticsearch

### Use a Data Sync Tool

To keep your Elasticsearch index updated with data from MongoDB, you can use tools like Logstash, Mongo-Connector (no longer actively maintained), or custom scripts built on MongoDB Change Streams, as sketched earlier.

### Example with Logstash

#### Install Logstash

Follow the [Logstash installation guide](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html) for your operating system.

#### Create a Logstash Configuration File

Here’s an example configuration that denormalizes data from `users` and `posts` collections into a single index:

```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}

filter {
  # Enrich posts with user data
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```


#### Run Logstash

Start Logstash with your configuration file. Because the pipeline uses the aggregate filter, run it with a single pipeline worker:

```sh
bin/logstash -f logstash.conf -w 1
```


## Step 4: Index Data in Elasticsearch

Ensure that your data is indexed correctly in Elasticsearch. You can verify this by querying the Elasticsearch index:

```sh
curl -X GET "localhost:9200/posts_with_users/_search?pretty"
```


## Step 5: Create the Search Page

### Backend Setup

#### Choose a Backend Framework

We will use Node.js for this example.

#### Install Elasticsearch Client

Install Express and the Elasticsearch client library for Node.js:

```sh
npm install express @elastic/elasticsearch
```


#### Example Code

Create a file named `app.js` and add the following code:

```javascript
const { Client } = require('@elastic/elasticsearch');
const express = require('express');
const app = express();

const client = new Client({ node: 'http://localhost:9200' });

// Note: this targets the v7 client, whose responses wrap the body.
// With @elastic/elasticsearch v8+, client.search() resolves to the body directly.
async function search(query) {
  const { body } = await client.search({
    index: 'posts_with_users',
    body: {
      query: {
        multi_match: {
          query: query,
          fields: ['post_content', 'user.name', 'user.email']
        }
      }
    }
  });
  return body.hits.hits;
}

app.get('/search', async (req, res) => {
  const query = req.query.q;
  if (!query) {
    return res.status(400).json({ error: 'Missing query parameter "q"' });
  }
  const results = await search(query);
  res.json(results);
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
```


#### Run the Server

Start your Node.js server:

```sh
node app.js
```
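With the server running, a request to `http://localhost:3000/search?q=<term>` returns the matching posts, enriched with user details, as JSON.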


