Written by Mahendra, Dev Rajeev, and Nihal
Guided by Chanda Rajkumar
WHY PYTHON END-TO-END?
When the Neuro-Morph team chose their language, the answer was deliberate: Python end-to-end. Python sits at a rare intersection — dominant in machine learning, mature in web frameworks, and deeply integrated with container orchestration tooling. A single language across every layer means minimal glue code, zero context-switching overhead, and a unified type system from the API boundary all the way down to the RL training loop.
TECH STACK OVERVIEW
API — FastAPI
Async REST endpoints, Pydantic validation, zero-config OpenAPI docs
ML — PyTorch
DQN RL agent training & inference with dynamic computation graphs
QUEUE — Celery + Redis
Async mutation task dispatch, retry logic, result backend
OPS — Docker + K8s
Minimal multi-stage images, Python operators via kopf
OBS — Prometheus + Grafana
Metrics scraping, alerting rules, real-time dashboards
CI/CD — GitHub Actions
Lint → test → build → push in under four minutes
TEST — Pytest + Hypothesis
Unit, integration, property-based, and end-to-end suites
[API LAYER] FastAPI in Production
The control plane backbone is FastAPI, chosen over Flask for three non-negotiable reasons: native async/await, automatic request validation via Pydantic, and zero-effort OpenAPI documentation. For a system that dispatches mutation commands with millisecond responsiveness, synchronous WSGI was simply not an option.
Mutation Endpoint
Below is the core endpoint that receives a trigger from the RL engine and dispatches it as an async Celery task:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from app.tasks import dispatch_mutation

app = FastAPI(title='Neuro-Morph Control API', version='1.0')

class MutationRequest(BaseModel):
    action: str  # e.g. 'rotate_port', 'redeploy_container'
    target_service: str
    priority: int = 1

@app.post('/mutations/dispatch')
async def trigger_mutation(req: MutationRequest):
    if req.priority not in range(1, 4):
        raise HTTPException(status_code=422, detail='Priority must be 1-3')
    task = dispatch_mutation.delay(req.action, req.target_service)
    return {'task_id': task.id, 'status': 'queued'}
```
Pydantic's BaseModel validates incoming JSON, converts types, and returns structured 422 errors — all with zero boilerplate. The endpoint is a pure async function, freeing the event loop while Celery handles the heavy lifting in the background worker pool.
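The same validation can be exercised outside the request cycle. Here is a minimal sketch (redeclaring the MutationRequest model from the endpoint above) of the coercion and error behavior:

```python
from pydantic import BaseModel, ValidationError

class MutationRequest(BaseModel):
    action: str
    target_service: str
    priority: int = 1

# Well-typed input is coerced where safe (the string '2' becomes int 2)
req = MutationRequest(action='rotate_port', target_service='api-gateway', priority='2')

# A missing required field raises ValidationError; FastAPI turns this
# into the structured 422 response body automatically
try:
    MutationRequest(action='rotate_port')
except ValidationError as exc:
    missing_fields = [err['loc'][0] for err in exc.errors()]
```

Inside FastAPI none of this is written by hand: the framework catches the ValidationError and serializes the error list into the 422 response.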
[ML LAYER] The PyTorch RL Agent
The Reinforcement Learning engine is Neuro-Morph's most Python-native component. Built in PyTorch with a Deep Q-Network (DQN) architecture, the agent continuously learns the optimal mutation policy from environment interactions. PyTorch was chosen over TensorFlow for its dynamic computation graph, which makes rapid policy experimentation significantly easier.
RL Module Structure
neuro_morph/rl/
├── agent.py # DQN: select_action(), learn(), update_target()
├── environment.py # Gym-compatible: step(), reset(), observation_space
├── replay_buffer.py # Experience replay: push(), sample(batch_size)
├── reward.py # Shaping: attack_blocked, service_uptime
├── train.py # Episode runner, checkpoint saving
└── serve.py # Inference server: load model, expose predict(state)
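The buffer internals are not shown in the tree, but the push()/sample() interface of replay_buffer.py maps onto a short, self-contained sketch (capacity and storage details here are illustrative, not the repository's actual code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay with the push()/sample() interface above."""

    def __init__(self, capacity: int = 10_000):
        # deque with maxlen evicts the oldest transition automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        # Transpose a list of transitions into batched tuples
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random breaks the temporal correlation between consecutive environment steps, which is what makes DQN training stable.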
The DQN Agent Core
```python
import torch
import torch.nn as nn

class DQNAgent(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_dim)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # Q-values for each action

    def select_action(self, state, epsilon=0.05):
        # Epsilon-greedy: explore with probability epsilon
        if torch.rand(1).item() < epsilon:
            return torch.randint(self.net[-1].out_features, (1,)).item()
        with torch.no_grad():
            return self.forward(state).argmax().item()
```
The state vector encodes real-time metrics: connection anomaly scores, port scan frequency, CPU/memory deltas, and recent mutation history. The action space maps to concrete operations — port rotation, container redeploy, IP swap, API key refresh, TLS certificate cycle. Rewards balance two competing objectives: maximize attack disruption while minimizing legitimate traffic latency.
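As a rough illustration of that balance, a reward function with the signals named above might look like this (the weights are invented for the example; the real shaping lives in reward.py):

```python
def compute_reward(attack_blocked: bool, service_uptime: float,
                   added_latency_ms: float) -> float:
    """Balance attack disruption against the cost imposed on legitimate traffic.

    service_uptime is a fraction in [0, 1]; the weights are illustrative.
    """
    reward = 0.0
    if attack_blocked:
        reward += 1.0                  # primary objective: disrupt the attacker
    reward += 0.5 * service_uptime     # keep services available
    reward -= 0.01 * added_latency_ms  # penalize latency added for real users
    return reward
```

With a shaping like this, a mutation that blocks an attack at zero latency cost dominates one that blocks it while degrading legitimate traffic, which is exactly the trade-off described above.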
Dev Tip: Use torch.jit.script() to compile your trained DQN to TorchScript before serving — it cuts inference latency by 30–40% and, when the compiled graph is served from C++ or multiple worker threads, sidesteps the Python GIL bottleneck in high-throughput environments.
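A sketch of that workflow, using a stand-in network with the same layer shape as the agent above (the dimensions and in-memory round-trip are for illustration only):

```python
import io

import torch
import torch.nn as nn

# Stand-in for the trained DQN (same layer shape as the agent above)
net = nn.Sequential(
    nn.Linear(8, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 5),
)
net.eval()

# Compile to TorchScript; the result is serializable and runs without
# the original Python class definition being importable
scripted = torch.jit.script(net)
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

# In the serving process: load the archive and run inference
policy = torch.jit.load(buffer)
with torch.no_grad():
    action = policy(torch.zeros(1, 8)).argmax(dim=1).item()
```

In production you would save to a file (or model registry) instead of an in-memory buffer; the load/inference path is the same.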
[TASK QUEUE] Celery + Redis
Mutation operations are never executed synchronously inside the API request lifecycle. Redeploying a container or rotating a port can take several seconds — blocking an async FastAPI handler would be wasteful and unpredictable. The solution: a Celery task queue backed by Redis, which decouples command dispatch from command execution.
```python
# app/tasks.py
from celery import Celery
from app.orchestrator import KubernetesOrchestrator

celery_app = Celery('neuro_morph', broker='redis://redis:6379/0')
k8s = KubernetesOrchestrator()

@celery_app.task(bind=True, max_retries=3, default_retry_delay=5)
def dispatch_mutation(self, action: str, service: str):
    try:
        if action == 'redeploy_container':
            k8s.rolling_restart(service)
        elif action == 'rotate_port':
            k8s.patch_service_port(service)
        return {'status': 'success', 'action': action}
    except Exception as exc:
        raise self.retry(exc=exc)
```
Celery's bind=True pattern gives the task access to its own context for retry logic. With max_retries=3 and a five-second default_retry_delay, transient Kubernetes API errors are retried automatically without manual intervention. Redis serves double duty — broker for task messages and result backend for task status queries from the monitoring dashboard.
[INFRA] Docker, Kubernetes & Python Operators
Every Neuro-Morph service is packaged as a minimal Docker image built from python:3.11-slim. Multi-stage builds keep production images under 180 MB — the build stage installs all dependencies; the runtime stage copies only the compiled application.
```dockerfile
# Stage 1 — Build
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt

# Stage 2 — Runtime
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Kubernetes mutations are applied via a custom operator written in Python using the kopf framework. The operator watches for MutationEvent custom resources and applies rolling restarts, network policy patches, and config map updates. It runs as a standard Deployment using in-cluster service account credentials — no external access required. Kopf handles leader election, retry logic, and structured logging, keeping the operator code focused purely on business logic.
[TESTING] Pytest, Fixtures & Mocking
Neuro-Morph uses Pytest as the sole test runner across all layers, with four distinct test categories:
- Unit Tests — individual RL agent methods, reward functions, and API validators in isolation using mock Kubernetes clients and synthetic state tensors.
- Integration Tests — spin up a test FastAPI app with TestClient, verify that POST /mutations/dispatch correctly enqueues a Celery task and returns the expected task ID.
- Property-Based Tests — use Hypothesis to generate random state vectors and verify that the DQN agent always returns a valid action index.
- End-to-End Tests — a Docker Compose environment with a stubbed Kubernetes API server runs full mutation cycles, verifying observable state changes within defined time bounds.
```python
# tests/test_api.py
from fastapi.testclient import TestClient
from unittest.mock import patch

from app.main import app

client = TestClient(app)

@patch('app.tasks.dispatch_mutation.delay')
def test_mutation_dispatch_queues_task(mock_delay):
    mock_delay.return_value.id = 'abc-123'
    resp = client.post('/mutations/dispatch', json={
        'action': 'rotate_port',
        'target_service': 'api-gateway',
        'priority': 2
    })
    assert resp.status_code == 200
    assert resp.json()['task_id'] == 'abc-123'
    mock_delay.assert_called_once_with('rotate_port', 'api-gateway')
```
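The property-based category can be illustrated with a short Hypothesis test (the stand-in network and its dimensions are invented for the example):

```python
import torch
import torch.nn as nn
from hypothesis import given, settings, strategies as st

STATE_DIM, ACTION_DIM = 8, 5

# Stand-in policy network with the same greedy-action interface as the agent
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                       nn.Linear(32, ACTION_DIM))

def select_action(state: torch.Tensor) -> int:
    with torch.no_grad():
        return policy(state).argmax().item()

# Hypothesis generates arbitrary finite state vectors; the property asserts
# the agent always returns an in-range action index
@given(st.lists(st.floats(min_value=-1e3, max_value=1e3,
                          allow_nan=False, allow_infinity=False),
                min_size=STATE_DIM, max_size=STATE_DIM))
@settings(max_examples=50, deadline=None)
def test_action_index_is_valid(values):
    state = torch.tensor(values, dtype=torch.float32)
    action = select_action(state)
    assert 0 <= action < ACTION_DIM
```

Unlike hand-picked fixtures, Hypothesis actively searches for edge-case inputs (all-zero states, extreme magnitudes) and shrinks any failure to a minimal counterexample.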
Coverage Target: 90% on API and task layers, 80% on the RL module. Use pytest-cov with --cov-fail-under=80 in CI to enforce the floor and prevent regressions from sneaking into main.
GitHub Repository: https://github.com/N1hal-tech/Neuro-Morph
