Team Members
This project was developed by:
@bavuna_aashritha
We would like to express our sincere gratitude to @chanda_rajkumar for his valuable guidance and support throughout this project. His insights into system design, architecture, and development played a crucial role in shaping our College ERP (text classification using zero-shot learning).
Introduction
Artificial Intelligence systems today are powerful—but they often come with a limitation. Most models are trained to perform well only within predefined categories. The moment a new or unseen input appears, their performance drops. This becomes a serious issue in real-world applications like a college ERP system, where user queries are dynamic and unpredictable.
Our project was built to solve exactly this challenge. Instead of relying solely on traditional prediction-based models, we designed a system that not only classifies unseen text using Zero-Shot Learning but also remembers past interactions, analyzes patterns, and provides meaningful insights. The goal was simple yet ambitious: to build a system that behaves intelligently over time, not just instantly.
The Problem We Set Out to Solve
While working with AI systems, we identified a key limitation: models perform well within predefined boundaries but struggle with unfamiliar inputs. In dynamic environments like a College ERP, where queries constantly change, this makes traditional classification less effective.
Although zero-shot learning helps handle unseen inputs, it is not enough. The system does not retain past interactions, compare similar queries, or provide meaningful insights, and often behaves like a black box. These gaps showed the need for a system that learns from experience, builds context, and evolves over time.
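To make the zero-shot idea concrete: the model scores a query against each candidate label description and picks the best match, with no label-specific training data. Below is a minimal Python sketch of that idea, using a toy bag-of-words embedding as a stand-in for a real sentence encoder; the label descriptions here are illustrative, not the project's actual labels.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a
    # pretrained sentence encoder instead (assumption for illustration).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_classify(query, candidate_labels):
    # Score the query against every label description and return the best;
    # no training examples for any label are required.
    q = embed(query)
    scores = {label: cosine(q, embed(desc))
              for label, desc in candidate_labels.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

labels = {
    "Attendance": "attendance percentage classes present absent",
    "Fees": "fees payment dues amount receipt",
}
label, score = zero_shot_classify("show my attendance below 75", labels)
```

Because labels are just text descriptions, new categories can be added at runtime without retraining anything.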
How We Designed the Solution
To address these challenges, we built a Zero-Shot Classification system integrated with MongoDB. Instead of using the database only for storage, we made it a central part of the system’s intelligence, where data actively enhances functionality at every stage.
MongoDB serves multiple roles in this architecture: as a memory layer to store past queries and predictions, a search engine for efficient retrieval, an analytics engine for generating insights, and an explainability layer to provide context behind outputs.
This approach shifts the system from being model-centric to system-centric, enabling it to learn, adapt, and evolve over time.
The System Flow
A user query is first converted into embeddings, then processed through search and classification layers, stored for future use, and finally analyzed to generate insights.
This pipeline connects AI with data in a way that enables real-world usability.
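The flow above can be sketched as a single function, with a plain Python list standing in for the MongoDB predictions collection and toy stand-ins for the embedding, retrieval, and classification steps (all names here are illustrative, not the production code):

```python
store = []  # stands in for the MongoDB predictions collection

def handle_query(query):
    vec = sorted(set(query.lower().split()))              # 1. toy "embedding"
    similar = [d for d in store
               if set(d["embedding"]) & set(vec)]         # 2. retrieve similar past queries
    label = ("Attendance" if "attendance" in vec
             else "General")                              # 3. toy classifier
    store.append({"query": query, "embedding": vec,
                  "predicted_label": label})              # 4. persist for future use
    counts = {}                                           # 5. analytics over the history
    for d in store:
        counts[d["predicted_label"]] = counts.get(d["predicted_label"], 0) + 1
    return label, len(similar), counts

label, n_similar, counts = handle_query("show my attendance below 75")
```

Each new query both benefits from the stored history (step 2) and enriches it (step 4), which is what lets the system improve over time.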
MongoDB at the Core
Choosing MongoDB early proved to be crucial for our system. AI applications handle diverse and unstructured data—such as queries, embeddings, predictions, and analytics—which are difficult to manage with rigid relational schemas.
MongoDB’s flexible document model allowed us to store varied data easily, adapt to changes, and scale as needed. Its smooth integration with AI pipelines simplified development, letting us focus on building features instead of redesigning the database.
Full-Text Search — Fast Retrieval Layer
Before applying intelligence, the system must quickly identify relevant data.
db.predictions.createIndex({ query_text: "text" });
db.predictions.find({
$text: { $search: "attendance below 75" }
});
In a College ERP system, this is useful when:
Students search for attendance-related queries
The system needs to filter relevant records quickly
Full-text search acts as a first-level filter, reducing search space and improving response time.
Vector Search — Core Intelligence
This is where the system becomes truly intelligent. Unlike keyword-based search, vector search understands the meaning behind queries. For example, “Show my attendance less than 75” and “attendance below 75%” may differ in wording but share the same intent, which the system recognizes.
Each query is converted into an embedding that captures its meaning, and is compared with stored vectors to retrieve results based on semantic similarity rather than exact words.
This enables better understanding, more accurate classification, and flexible handling of diverse queries.
db.predictions.aggregate([
{
$vectorSearch: {
index: "embedding_index",
path: "embedding",
queryVector: [/* user query embedding */],
numCandidates: 100,
limit: 5
}
}
]);
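Under the hood, $vectorSearch scores candidate documents against the query embedding and returns the closest matches. Here is a minimal Python sketch of the same idea, using made-up three-dimensional embeddings purely for illustration (real encoders produce hundreds of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec, docs, num_candidates=100, limit=5):
    # Mirrors the $vectorSearch stage: consider up to num_candidates
    # documents, rank by cosine similarity, return the top `limit`.
    candidates = docs[:num_candidates]
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return ranked[:limit]

docs = [
    {"query_text": "attendance below 75%", "embedding": [0.9, 0.1, 0.0]},
    {"query_text": "pay semester fees", "embedding": [0.0, 0.2, 0.9]},
    {"query_text": "show my attendance less than 75", "embedding": [0.8, 0.2, 0.1]},
]
top = vector_search([0.85, 0.15, 0.05], docs, limit=2)
```

Note that the two attendance queries rank highest even though their wording differs, which is exactly the semantic matching described above.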
Aggregation Pipeline — Analytics Engine
Once data is stored, its value comes from analysis. The aggregation pipeline transforms raw data into meaningful insights instead of just storing it. In a College ERP system, it helps identify students with low attendance, analyze query trends, and monitor system usage, turning raw data into useful and actionable information.
db.predictions.aggregate([
{
$facet: {
categoryDistribution: [
{
$group: {
_id: "$predicted_label",
count: { $sum: 1 }
}
},
{
$sort: { count: -1 }
}
],
attendanceStatus: [
{
$match: { predicted_label: "Attendance" }
},
{
$project: {
student_id: 1,
attendance: "$value",
status: {
$cond: {
if: { $lt: ["$value", 75] },
then: "Low",
else: "Safe"
}
}
}
}
],
}
}
]);
$facet — Multi-Analysis Optimization
In real-world systems, multiple insights are often needed at once. Instead of running separate queries, the $facet operator performs multiple analyses in a single operation, such as calculating attendance distribution, low-attendance counts, and average attendance.
This improves performance and efficiency while enabling faster, real-time updates for dashboards.
db.predictions.aggregate([
{
$facet: {
categoryDistribution: [
{
$group: {
_id: "$predicted_label",
count: { $sum: 1 }
}
}
],
averageConfidence: [
{
$group: {
_id: null,
avgConfidence: { $avg: "$confidence_score" }
}
}
]
}
}
]);
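For intuition, the $facet stage above is equivalent to computing both analyses in a single pass over the data. A small Python sketch of that equivalence, with illustrative sample documents:

```python
def facet(docs):
    # Compute both facets in one pass, mirroring the $facet stage:
    # a per-label count and an overall average confidence.
    distribution, conf_sum = {}, 0.0
    for d in docs:
        label = d["predicted_label"]
        distribution[label] = distribution.get(label, 0) + 1
        conf_sum += d["confidence_score"]
    avg = conf_sum / len(docs) if docs else None
    return {"categoryDistribution": distribution, "averageConfidence": avg}

docs = [
    {"predicted_label": "Attendance", "confidence_score": 0.9},
    {"predicted_label": "Attendance", "confidence_score": 0.8},
    {"predicted_label": "Fees", "confidence_score": 0.7},
]
result = facet(docs)
```

Doing this inside the database avoids shipping every document to the application just to count and average them.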
$lookup — Explainability Layer
AI systems often act like black boxes, showing results without clear reasoning. Using the $lookup operator, we connect predictions with additional context, attaching explanations that make the output more meaningful.
For example, for the query “Show my attendance less than 75”, the system returns the value along with status and an explanation that it is below the minimum requirement.
This improves transparency, builds trust, and enhances usability.
db.predictions.aggregate([
{
$lookup: {
from: "labels",
localField: "predicted_label",
foreignField: "label_name",
as: "label_details"
}
},
{
$unwind: "$label_details"
}
]);
Example labels collection:
{
label_name: "Attendance",
description: "Student attendance details and thresholds",
threshold: 75
}
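Once the label document is joined in, building the explanation is a small application-side step. A minimal Python sketch, assuming the prediction document carries a numeric value field as in the earlier aggregation (the field and document shapes here are illustrative):

```python
def explain(prediction, labels):
    # Manual equivalent of $lookup + $unwind: attach the matching label
    # document, then derive status and a human-readable explanation from it.
    detail = next(l for l in labels
                  if l["label_name"] == prediction["predicted_label"])
    status = "Low" if prediction["value"] < detail["threshold"] else "Safe"
    direction = "below" if status == "Low" else "at or above"
    return {
        **prediction,
        "label_details": detail,
        "status": status,
        "explanation": (f"{prediction['value']}% is {direction} the "
                        f"{detail['threshold']}% minimum requirement"),
    }

result = explain(
    {"student_id": "S101", "predicted_label": "Attendance", "value": 68},
    [{"label_name": "Attendance",
      "description": "Student attendance details and thresholds",
      "threshold": 75}],
)
```

Keeping thresholds in the labels collection means the explanation logic never hard-codes policy values.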
Indexing — Performance Optimization
As data grows, maintaining performance becomes critical. By using full-text indexing and vector indexing, the system ensures fast query execution and efficient data retrieval. This not only supports scalability but also provides a smooth user experience. Such optimization is essential in ERP systems, where data continuously increases over time.
db.predictions.createIndex({ query_text: "text" });
db.predictions.createIndex({ predicted_label: 1 });
db.predictions.createIndex({ predicted_label: 1, value: 1 });
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 768,
"similarity": "cosine"
}
]
}
How MongoDB Fits Perfectly in This Project
| Feature | Role | Benefit |
|---|---|---|
| NoSQL | Flexible schema | Handles dynamic AI outputs |
| JSON Storage | Native format | Easy integration |
| Vector Search | Semantic matching | Intelligent system |
| Aggregation | Data analysis | Real-time insights |
| Indexing | Performance | Scalability |
It is not just a database—it is a core part of the system architecture.
Advantages of Using MongoDB in the Right Places
Storage Layer
Stores queries, embeddings, and predictions
Acts as system memory
Search Layer
Combines full-text and vector search
Provides both keyword and semantic matching
Analytics Layer
Aggregation and $facet
Generates real-time insights
Explainability Layer
$lookup operations
Connects outputs with meaningful explanations
A Real Example: College ERP Query Flow
For the query “Show my attendance less than 75”, the system converts the input into embeddings, performs vector search to find similar queries, and classifies it under “Attendance.” It then retrieves the relevant data, stores the result, updates analytics, and displays the output.
The response includes the attendance value, its status (below threshold), and an actionable insight, such as suggesting additional classes. This shows how AI and MongoDB work together to create a complete and intelligent system.
What This System Achieves
By combining Zero-Shot Learning with MongoDB, the system delivers:
Semantic understanding
Efficient data processing
Real-time analytics
Explainable AI
What We Learned
Building this system highlighted an important insight:
AI models alone are not enough.
Real-world systems require:
Memory
Context
Analytics
Explainability
The real value comes from combining AI with the right data system.
What’s Next
Improved semantic search
Enhanced analytics dashboards
Real-time processing pipelines
Scalable deployment
Conclusion
This is not just a classification system, but a complete intelligent system designed to go beyond simple predictions. It is capable of remembering past interactions, understanding user intent, analyzing data for insights, and evolving over time.
At the core of this transformation is MongoDB, which enables the system to function as a cohesive and continuously improving solution.
GitHub Repository: https://github.com/nsree0507/PFSD_Team7.git
Demo video :