
Hasini Sivaram


The Brain Behind Intelligent AI: MongoDB Meets Zero-Shot Learning

Team Members

This project was developed by:

Introduction

Artificial Intelligence systems today are powerful—but they often come with a limitation. Most models are trained to perform well only within predefined categories. The moment a new or unseen input appears, their performance drops. This becomes a serious issue in real-world applications like a college ERP system, where user queries are dynamic and unpredictable.

Our project was built to solve exactly this challenge. Instead of relying solely on traditional prediction-based models, we designed a system that not only classifies unseen text using Zero-Shot Learning but also remembers past interactions, analyzes patterns, and provides meaningful insights. The goal was simple yet ambitious: to build a system that behaves intelligently over time, not just instantly.

The Problem We Set Out to Solve

While working with AI systems, we identified a key limitation: models perform well within predefined boundaries but struggle with unfamiliar inputs. In dynamic environments like a College ERP, where queries constantly change, this makes traditional classification less effective.

Although zero-shot learning helps handle unseen inputs, it is not enough. The system does not retain past interactions, compare similar queries, or provide meaningful insights, and often behaves like a black box. These gaps showed the need for a system that learns from experience, builds context, and evolves over time.

How We Designed the Solution

To address these challenges, we built a Zero-Shot Classification system integrated with MongoDB. Instead of using the database only for storage, we made it a central part of the system’s intelligence, where data actively enhances functionality at every stage.

MongoDB serves multiple roles in this architecture: as a memory layer to store past queries and predictions, a search engine for efficient retrieval, an analytics engine for generating insights, and an explainability layer to provide context behind outputs.

This approach shifts the system from being model-centric to system-centric, enabling it to learn, adapt, and evolve over time.

The System Flow

A user query is first converted into embeddings, then processed through search and classification layers, stored for future use, and finally analyzed to generate insights.
This pipeline connects AI with data in a way that enables real-world usability.
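The flow above can be sketched in Python. This is a minimal illustration, not the project's actual code: the `embed` function is a stand-in for a real sentence-embedding model, and the keyword-based classification is a placeholder for the zero-shot classifier.

```python
# Sketch of the query pipeline: embed -> search -> classify -> store.
# All helper logic here is illustrative; a real system would call an
# embedding model and a zero-shot classifier instead.

def embed(text: str) -> list[float]:
    # Stand-in for a transformer-based sentence encoder.
    return [float(ord(c) % 7) for c in text[:8]]

def handle_query(query: str, store: list[dict]) -> dict:
    vector = embed(query)                            # 1. convert query to an embedding
    similar = [d for d in store                      # 2. look up related past queries (toy match)
               if d["predicted_label"].lower() in query.lower()]
    # 3. classify; a real system would use zero-shot classification here
    label = "Attendance" if "attendance" in query.lower() else "General"
    record = {"query_text": query, "embedding": vector,
              "predicted_label": label, "similar_count": len(similar)}
    store.append(record)                             # 4. persist for future queries
    return record                                    # 5. stored records feed later analytics

memory: list[dict] = []
result = handle_query("Show my attendance less than 75", memory)
```

Each stored record becomes input for the search and analytics stages described below, which is what lets the system "remember" rather than answer each query in isolation.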

MongoDB at the Core

Choosing MongoDB early proved to be crucial for our system. AI applications handle diverse and unstructured data—such as queries, embeddings, predictions, and analytics—which are difficult to manage with rigid relational schemas.

MongoDB’s flexible document model allowed us to store varied data easily, adapt to changes, and scale as needed. Its smooth integration with AI pipelines simplified development, letting us focus on building features instead of redesigning the database.
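As an illustration of that flexibility, a single prediction document can hold text, a vector, a label, and metadata side by side. The field names below are assumptions for this sketch, not the project's actual schema:

```python
# Illustrative shape of one document in a "predictions" collection.
# Field names and values are examples, not the project's real schema.
prediction_doc = {
    "query_text": "Show my attendance less than 75",
    "embedding": [0.12, -0.07, 0.33],       # truncated for readability
    "predicted_label": "Attendance",
    "confidence_score": 0.91,
    "value": 68,                             # e.g. the attendance percentage
    "created_at": "2024-01-15T10:30:00Z",
}
```

A relational schema would need separate tables (and migrations) for each of these concerns; a document database keeps them in one record that can evolve as the system does.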

Full-Text Search — Fast Retrieval Layer

Before applying intelligence, the system must quickly identify relevant data.

db.predictions.createIndex({ query_text: "text" });

db.predictions.find({
  $text: { $search: "attendance below 75" }
});


In a College ERP system, this is useful when:

  • Students search for attendance-related queries

  • The system needs to filter relevant records quickly

Full-text search acts as a first-level filter, reducing the search space and improving response time.

Vector Search — Core Intelligence

This is where the system becomes truly intelligent. Unlike keyword-based search, vector search understands the meaning behind queries. For example, “Show my attendance less than 75” and “attendance below 75%” may differ in wording but share the same intent, which the system recognizes.

Each query is converted into an embedding that captures its meaning, and is compared with stored vectors to retrieve results based on semantic similarity rather than exact words.

This enables better understanding, more accurate classification, and flexible handling of diverse queries.

db.predictions.aggregate([
  {
    $vectorSearch: {
      index: "embedding_index",
      path: "embedding",
      queryVector: [/* user query embedding */],
      numCandidates: 100,
      limit: 5
    }
  }
]);
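Under the hood, vector search ranks documents by a similarity metric such as cosine similarity between embeddings. The sketch below shows that scoring idea with hand-picked toy vectors; real embeddings would come from an embedding model, not be written by hand:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (assumed, for illustration only):
query_vec = [0.9, 0.1, 0.0]   # "attendance below 75"
doc_a     = [0.8, 0.2, 0.1]   # "Show my attendance less than 75"
doc_b     = [0.1, 0.1, 0.9]   # "fee payment deadline"

scores = sorted(
    [("doc_a", cosine(query_vec, doc_a)), ("doc_b", cosine(query_vec, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

The attendance-related document scores far higher than the unrelated one, which is exactly why the two differently worded attendance queries above retrieve each other.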

Aggregation Pipeline — Analytics Engine

Once data is stored, its value comes from analysis. The aggregation pipeline transforms raw data into meaningful insights instead of just storing it. In a College ERP system, it helps identify students with low attendance, analyze query trends, and monitor system usage, turning raw data into useful and actionable information.

db.predictions.aggregate([
  {
    $facet: {
      categoryDistribution: [
        {
          $group: {
            _id: "$predicted_label",
            count: { $sum: 1 }
          }
        },
        {
          $sort: { count: -1 }
        }
      ],
      attendanceStatus: [
        {
          $match: { predicted_label: "Attendance" }
        },
        {
          $project: {
            student_id: 1,
            attendance: "$value",
            status: {
              $cond: {
                if: { $lt: ["$value", 75] },
                then: "Low",
                else: "Safe"
              }
            }
          }
        }
      ],
    }
  }
]);


$facet — Multi-Analysis Optimization

In real-world systems, multiple insights are often needed at once. Instead of running separate queries, the $facet operator performs multiple analyses in a single operation, such as calculating attendance distribution, low-attendance counts, and average attendance.

This improves performance and efficiency while enabling faster, real-time updates for dashboards.

db.predictions.aggregate([
  {
    $facet: {
      categoryDistribution: [
        {
          $group: {
            _id: "$predicted_label",
            count: { $sum: 1 }
          }
        }
      ],
      averageConfidence: [
        {
          $group: {
            _id: null,
            avgConfidence: { $avg: "$confidence_score" }
          }
        }
      ]
    }
  }
]);
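To make the two facets concrete, here is the same pair of analyses computed in plain Python over a few toy records (the field names follow the illustrative schema used throughout this post, not a confirmed production schema):

```python
from collections import Counter

# Toy prediction records; field names are illustrative.
records = [
    {"predicted_label": "Attendance", "confidence_score": 0.91},
    {"predicted_label": "Attendance", "confidence_score": 0.85},
    {"predicted_label": "Fees",       "confidence_score": 0.78},
]

# Facet 1: category distribution (the $group / $sum stage).
category_distribution = Counter(r["predicted_label"] for r in records)

# Facet 2: average confidence (the $avg stage).
avg_confidence = sum(r["confidence_score"] for r in records) / len(records)
```

The advantage of `$facet` is that MongoDB computes both results in one pass over the collection on the server, instead of the application issuing two round trips as this client-side version would.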

$lookup — Explainability Layer

AI systems often act like black boxes, showing results without clear reasoning. Using the $lookup operator, we connect predictions with additional context, making outputs more meaningful with explanations.

For example, for the query “Show my attendance less than 75”, the system returns the value along with status and an explanation that it is below the minimum requirement.

This improves transparency, builds trust, and enhances usability.

db.predictions.aggregate([
  {
    $lookup: {
      from: "labels",
      localField: "predicted_label",
      foreignField: "label_name",
      as: "label_details"
    }
  },
  {
    $unwind: "$label_details"
  }
]);
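Conceptually, `$lookup` is a join: each prediction is enriched with the matching label document. The sketch below does the same join in Python to show how the threshold from the label metadata turns a raw value into an explanation (field names are illustrative):

```python
# In-memory stand-in for the "labels" collection, keyed by label_name.
labels = {
    "Attendance": {
        "description": "Student attendance details and thresholds",
        "threshold": 75,
    },
}

prediction = {"predicted_label": "Attendance", "value": 68}

# The "join": attach label metadata to the prediction (what $lookup does).
detail = labels[prediction["predicted_label"]]
explained = {
    **prediction,
    "label_details": detail,
    "explanation": (
        f"Value {prediction['value']} is below the minimum threshold of {detail['threshold']}"
        if prediction["value"] < detail["threshold"]
        else "Value meets the minimum threshold"
    ),
}
```

The user no longer sees a bare number; they see the number, the rule it was checked against, and why the result matters.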

Example labels collection:

{
  label_name: "Attendance",
  description: "Student attendance details and thresholds",
  threshold: 75
}

Indexing — Performance Optimization

As data grows, maintaining performance becomes critical. By using full-text indexing and vector indexing, the system ensures fast query execution and efficient data retrieval. This not only supports scalability but also provides a smooth user experience. Such optimization is essential in ERP systems, where data continuously increases over time.

db.predictions.createIndex({ query_text: "text" });

db.predictions.createIndex({ predicted_label: 1 });

db.predictions.createIndex({ predicted_label: 1, value: 1 });

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "cosine"
    }
  ]
}

How MongoDB Fits Perfectly in This Project

| Feature       | Role              | Benefit                    |
| ------------- | ----------------- | -------------------------- |
| NoSQL         | Flexible schema   | Handles dynamic AI outputs |
| JSON storage  | Native format     | Easy integration           |
| Vector search | Semantic matching | Intelligent system         |
| Aggregation   | Data analysis     | Real-time insights         |
| Indexing      | Performance       | Scalability                |

It is not just a database—it is a core part of the system architecture.

Advantages of Using MongoDB in the Right Places

Storage Layer

  • Stores queries, embeddings, and predictions

  • Acts as system memory

Search Layer

  • Combines full-text and vector search

  • Provides both keyword and semantic matching

Analytics Layer

  • Aggregation and $facet

  • Generates real-time insights

Explainability Layer

  • $lookup operations

  • Connects outputs with meaningful explanations

A Real Example: College ERP Query Flow

For the query “Show my attendance less than 75”, the system converts the input into embeddings, performs vector search to find similar queries, and classifies it under “Attendance.” It then retrieves the relevant data, stores the result, updates analytics, and displays the output.

The response includes the attendance value, its status (below threshold), and an actionable insight, such as suggesting additional classes. This shows how AI and MongoDB work together to create a complete and intelligent system.
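The final response-building step can be sketched as a small function. The 75% threshold comes from the example above; the wording of the insight is illustrative:

```python
def attendance_insight(value: float, threshold: float = 75) -> dict:
    # Compare the retrieved attendance value against the minimum threshold
    # and attach an actionable suggestion (wording is illustrative).
    status = "Low" if value < threshold else "Safe"
    insight = (
        "Attend additional classes to reach the minimum requirement"
        if status == "Low"
        else "Attendance meets the minimum requirement"
    )
    return {"value": value, "status": status, "insight": insight}

response = attendance_insight(68)
```

This is the last link in the chain: classification finds the right category, vector search finds the right records, and a simple rule layer turns the retrieved value into advice.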

What This System Achieves

By combining Zero-Shot Learning with MongoDB, the system delivers:

  • Semantic understanding

  • Efficient data processing

  • Real-time analytics

  • Explainable AI

What We Learned

Building this system highlighted an important insight:
AI models alone are not enough.
Real-world systems require:

  • Memory

  • Context

  • Analytics

  • Explainability

The real value comes from combining AI with the right data system.

What’s Next

  • Improved semantic search

  • Enhanced analytics dashboards

  • Real-time processing pipelines

  • Scalable deployment

Conclusion

This is not just a classification system, but a complete intelligent system designed to go beyond simple predictions. It is capable of remembering past interactions, understanding user intent, analyzing data for insights, and evolving over time.
At the core of this transformation is MongoDB, which enables the system to function as a cohesive and continuously improving solution.

GitHub Repository: https://github.com/nsree0507/PFSD_Team7.git
Demo video:
