<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sarvagya Jaiswal</title>
    <description>The latest articles on DEV Community by Sarvagya Jaiswal (@sarvagya_jaiswal).</description>
    <link>https://dev.to/sarvagya_jaiswal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791525%2F7b17bd15-2b5e-425e-ac7d-b5f74cd344f4.jpg</url>
      <title>DEV Community: Sarvagya Jaiswal</title>
      <link>https://dev.to/sarvagya_jaiswal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sarvagya_jaiswal"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Sarvagya Jaiswal</dc:creator>
      <pubDate>Sun, 15 Mar 2026 11:51:12 +0000</pubDate>
      <link>https://dev.to/sarvagya_jaiswal/-3jaa</link>
      <guid>https://dev.to/sarvagya_jaiswal/-3jaa</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/sarvagya_jaiswal" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791525%2F7b17bd15-2b5e-425e-ac7d-b5f74cd344f4.jpg" alt="sarvagya_jaiswal"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/sarvagya_jaiswal/catching-deepfakes-in-real-time-a-spatial-temporal-approach-with-efficientnet-b0-and-bi-lstm-80p" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Catching Deepfakes in Real-Time: A Spatial-Temporal Approach with EfficientNet-B0 and Bi-LSTM&lt;/h2&gt;
      &lt;h3&gt;Sarvagya Jaiswal ・ Mar 15&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#computervision&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#pytorch&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>Catching Deepfakes in Real-Time: A Spatial-Temporal Approach with EfficientNet-B0 and Bi-LSTM</title>
      <dc:creator>Sarvagya Jaiswal</dc:creator>
      <pubDate>Sun, 15 Mar 2026 11:50:58 +0000</pubDate>
      <link>https://dev.to/sarvagya_jaiswal/catching-deepfakes-in-real-time-a-spatial-temporal-approach-with-efficientnet-b0-and-bi-lstm-80p</link>
      <guid>https://dev.to/sarvagya_jaiswal/catching-deepfakes-in-real-time-a-spatial-temporal-approach-with-efficientnet-b0-and-bi-lstm-80p</guid>
      <description>&lt;h2&gt;Catching Deepfakes in Real-Time: A Spatial-Temporal Approach with EfficientNet-B0 and Bi-LSTM&lt;/h2&gt;

&lt;p&gt;The problem with most early deepfake detection models is that they treat video as a collection of static images. They pass individual frames through a Convolutional Neural Network (CNN) and look for spatial artifacts—weird blurring around the jawline, mismatched skin tones, or pixelated boundaries. &lt;/p&gt;

&lt;p&gt;But modern deepfakes (especially those generated by GANs and diffusion models) have virtually eliminated static spatial artifacts. A single frame often looks flawless. What gives a deepfake away isn't the &lt;em&gt;space&lt;/em&gt;; it is the &lt;em&gt;time&lt;/em&gt;. The blink rate is unnatural. The micro-expressions jitter. The lip-sync drifts off by a fraction of a second.&lt;/p&gt;

&lt;p&gt;To catch a modern deepfake, you cannot just look at a picture. You have to understand the sequence. Here is how I built a &lt;strong&gt;Spatial-Temporal Deepfake Detector&lt;/strong&gt; using PyTorch, combining an &lt;strong&gt;EfficientNet-B0&lt;/strong&gt; backbone for spatial feature extraction with a &lt;strong&gt;Bi-LSTM&lt;/strong&gt; network for temporal sequence analysis.&lt;/p&gt;

&lt;h2&gt;1. The Architecture: Why Spatial-Temporal?&lt;/h2&gt;

&lt;p&gt;Processing raw video directly is computationally brutal. Instead, the architecture works in two distinct phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Spatial Extraction (The "What"):&lt;/strong&gt; We sample &lt;em&gt;N&lt;/em&gt; frames from a video and pass each frame through a pre-trained EfficientNet-B0. We discard the final classification layer, using the network strictly as a feature extractor. EfficientNet-B0 was chosen because it pairs rich, high-dimensional feature extraction with low computational overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal Analysis (The "When"):&lt;/strong&gt; The sequence of extracted feature vectors is then fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The Bi-LSTM analyzes the sequence both forwards and backwards, searching for temporal inconsistencies and unnatural frame-to-frame transitions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;2. Building the Hybrid Model in PyTorch&lt;/h2&gt;

&lt;p&gt;Stitching a CNN to an RNN requires careful tensor dimension management. Here is the core PyTorch module that bridges the two networks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchvision&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DeepfakeDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sequence_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lstm_layers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DeepfakeDetector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequence_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sequence_length&lt;/span&gt;

        &lt;span class="c1"&gt;# 1. Spatial Feature Extractor (EfficientNet-B0)
&lt;/span&gt;        &lt;span class="n"&gt;efficientnet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;efficientnet_b0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;efficientnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;children&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1280&lt;/span&gt; 

        &lt;span class="c1"&gt;# 2. Temporal Sequence Modeler (Bi-LSTM)
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lstm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;input_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;hidden_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;num_layers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lstm_layers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;batch_first&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;bidirectional&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Final Classifier
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="c1"&gt;# Sigmoid for binary probability
&lt;/span&gt;            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sigmoid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# x shape: (Batch, Sequence_Length, C, H, W)
&lt;/span&gt;        &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Reshape for CNN processing
&lt;/span&gt;        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;spatial_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feature_extractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Reshape back to sequence for LSTM
&lt;/span&gt;        &lt;span class="n"&gt;spatial_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spatial_features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;lstm_out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lstm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spatial_features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Use final timestep for classification
&lt;/span&gt;        &lt;span class="n"&gt;final_timestep_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lstm_out&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_timestep_out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
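
&lt;p&gt;Before wiring this into a training loop, it is worth smoke-testing the shape plumbing with a dummy batch. A minimal check, assuming the standard 224x224 RGB input that EfficientNet-B0 expects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Two videos of 20 frames each, 3x224x224 per frame
model = DeepfakeDetector()
model.eval()

dummy = torch.randn(2, 20, 3, 224, 224)  # (Batch, Sequence, C, H, W)
with torch.no_grad():
    scores = model(dummy)

print(scores.shape)  # torch.Size([2, 1]): one fake-probability per video
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;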



&lt;h2&gt;3. The Deployment Challenge: Video Processing in Gradio&lt;/h2&gt;

&lt;p&gt;Deploying this model introduces a unique challenge: you aren't just handling images; you are handling video streams. &lt;/p&gt;

&lt;p&gt;When deploying this to Hugging Face via Gradio, I had to write a custom preprocessing pipeline using OpenCV (&lt;code&gt;cv2.VideoCapture&lt;/code&gt;) to extract the video frames, sample them evenly to match the model's &lt;code&gt;sequence_length&lt;/code&gt;, and stack them into a 5D PyTorch tensor &lt;code&gt;(Batch, Sequence, Channels, Height, Width)&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;gradio&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# OpenCV logic to extract exactly 20 frames
&lt;/span&gt;    &lt;span class="c1"&gt;# Apply Resize and Normalization transforms
&lt;/span&gt;    &lt;span class="c1"&gt;# Perform model inference
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prediction_score&lt;/span&gt;

&lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Interface&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;process_video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authenticity Analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Spatial-Temporal Deepfake Detector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
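
&lt;p&gt;Here is a minimal sketch of what that preprocessing can look like. It is illustrative rather than the exact production pipeline: it assumes the trained &lt;code&gt;DeepfakeDetector&lt;/code&gt; from Section 2 is in scope as &lt;code&gt;model&lt;/code&gt;, samples indices with &lt;code&gt;numpy.linspace&lt;/code&gt;, and applies standard ImageNet normalization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import cv2
import numpy as np
import torch

SEQUENCE_LENGTH = 20
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def extract_frames(video_path, num_frames=SEQUENCE_LENGTH):
    # Evenly spaced frame indices across the whole clip
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total - 1, num_frames, dtype=int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (224, 224))
        # HWC uint8 to CHW float, then ImageNet-normalize
        tensor = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        frames.append((tensor - IMAGENET_MEAN) / IMAGENET_STD)
    cap.release()

    # Stack into the 5D tensor the model expects: (1, Sequence, C, H, W)
    return torch.stack(frames).unsqueeze(0)

def process_video(video_path):
    batch = extract_frames(video_path)
    with torch.no_grad():
        score = model(batch).item()  # sigmoid output in [0, 1]
    # gr.Label renders a dict of label-to-confidence pairs
    return {"FAKE": score, "REAL": 1.0 - score}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;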



&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;Catching deepfakes is no longer just a computer vision problem; it is a sequence modeling problem. By utilizing an EfficientNet-B0 to understand the &lt;em&gt;space&lt;/em&gt; and a Bi-LSTM to understand the &lt;em&gt;time&lt;/em&gt;, we can flag the unnatural temporal micro-jitters that standard frame-by-frame analysis misses.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Sarvagya Jaiswal</dc:creator>
      <pubDate>Sun, 15 Mar 2026 11:45:31 +0000</pubDate>
      <link>https://dev.to/sarvagya_jaiswal/-154</link>
      <guid>https://dev.to/sarvagya_jaiswal/-154</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/sarvagya_jaiswal" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791525%2F7b17bd15-2b5e-425e-ac7d-b5f74cd344f4.jpg" alt="sarvagya_jaiswal"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/sarvagya_jaiswal/building-an-adaptive-rag-agent-with-langgraph-dynamic-routing-and-stateful-memory-3f05" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Building an Adaptive RAG Agent with LangGraph: Dynamic Routing and Stateful Memory&lt;/h2&gt;
      &lt;h3&gt;Sarvagya Jaiswal ・ Mar 15&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#langchain&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
    </item>
    <item>
      <title>Building an Adaptive RAG Agent with LangGraph: Dynamic Routing and Stateful Memory</title>
      <dc:creator>Sarvagya Jaiswal</dc:creator>
      <pubDate>Sun, 15 Mar 2026 11:45:06 +0000</pubDate>
      <link>https://dev.to/sarvagya_jaiswal/building-an-adaptive-rag-agent-with-langgraph-dynamic-routing-and-stateful-memory-3f05</link>
      <guid>https://dev.to/sarvagya_jaiswal/building-an-adaptive-rag-agent-with-langgraph-dynamic-routing-and-stateful-memory-3f05</guid>
      <description>&lt;h2&gt;Building an Adaptive RAG Agent with LangGraph: Dynamic Routing and Stateful Memory&lt;/h2&gt;

&lt;p&gt;Building a basic "Retrieve and Generate" (RAG) pipeline takes about ten lines of code these days. But what happens when a user asks a simple greeting? Your system wastes compute querying a vector database. What happens on turn five of a conversation when the user says, &lt;em&gt;"Wait, explain that second point again?"&lt;/em&gt; A naive RAG system suffers from amnesia and fails entirely.&lt;/p&gt;

&lt;p&gt;To build a production-grade AI assistant, you need more than a linear chain. You need a stateful, decision-making agent.&lt;/p&gt;

&lt;p&gt;Here is how I engineered an Adaptive RAG Assistant using &lt;strong&gt;LangGraph&lt;/strong&gt; to handle dynamic search routing and stateful memory injection, completely eliminating context amnesia.&lt;/p&gt;

&lt;h2&gt;1. The Core Problem: Linear Chains vs. State Machines&lt;/h2&gt;

&lt;p&gt;Standard LangChain workflows are Directed Acyclic Graphs (DAGs). Data flows from A -&amp;gt; B -&amp;gt; C. But real human conversation is cyclical. We loop back, we clarify, and we change topics.&lt;/p&gt;

&lt;p&gt;I migrated the architecture to &lt;strong&gt;LangGraph&lt;/strong&gt; because it treats the LLM workflow as a state machine. By defining a global &lt;code&gt;State&lt;/code&gt; object that gets passed between nodes, the application can loop, make decisions, and retain context over time.&lt;/p&gt;

&lt;p&gt;Here is the foundation of the graph state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;

&lt;span class="c1"&gt;# The graph state that persists across all nodes
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;routing_decision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;2. The Brain: Dynamic Routing Strategy&lt;/h2&gt;

&lt;p&gt;Not every query requires a massive vector database search. To optimize latency and compute, I built a routing node that evaluates the user's query and assigns it to one of three strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Light Search:&lt;/strong&gt; For greetings or general knowledge (&lt;em&gt;"Hello," "What is Python?"&lt;/em&gt;). Bypasses the retriever entirely and uses the LLM's internal knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard Search:&lt;/strong&gt; For direct factual questions. Triggers a standard semantic search against the vector store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Search:&lt;/strong&gt; For complex, multi-hop queries. Triggers an agentic loop that might query the database multiple times to synthesize an answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is what that routing logic looks like in the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Prompting the LLM to act as a router
&lt;/span&gt;    &lt;span class="n"&gt;router_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze this query and classify the required search depth: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Standard&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deep&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router_prompt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Defining the LangGraph conditional edges
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;router_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Light&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_direct_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_search_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agentic_research_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
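
&lt;p&gt;For context, here is roughly how the &lt;code&gt;workflow&lt;/code&gt; object referenced above can be assembled and compiled. The node names match the conditional-edge mapping; the node functions other than &lt;code&gt;route_query&lt;/code&gt; are assumed to be defined elsewhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

# Register the router plus the three strategy nodes
workflow.add_node("router_node", route_query)
workflow.add_node("llm_direct_node", llm_direct)              # assumed defined
workflow.add_node("vector_search_node", retrieve_and_inject)
workflow.add_node("agentic_research_node", agentic_research)  # assumed defined

workflow.set_entry_point("router_node")
# ... the add_conditional_edges call from above goes here ...

# Each strategy path ends the run (in the full app, a generation
# node would sit between retrieval and END)
workflow.add_edge("llm_direct_node", END)
workflow.add_edge("vector_search_node", END)
workflow.add_edge("agentic_research_node", END)

app = workflow.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;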



&lt;h2&gt;3. Curing Amnesia: Stateful Memory Injection&lt;/h2&gt;

&lt;p&gt;The most frustrating part of interacting with a standard RAG bot is its inability to remember the previous message. &lt;/p&gt;

&lt;p&gt;Because LangGraph inherently passes the &lt;code&gt;AgentState&lt;/code&gt; object through the execution graph, I structured the &lt;code&gt;messages&lt;/code&gt; key to append every new interaction natively using &lt;code&gt;operator.add&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;When the workflow routes to the retrieval node, it doesn't just embed the user's latest message. It injects the last 3 turns of conversation into a contextualizer prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_and_inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract chat history
&lt;/span&gt;    &lt;span class="n"&gt;chat_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;latest_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Rewrite the query based on conversation history
&lt;/span&gt;    &lt;span class="n"&gt;contextualized_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;contextualize_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;History: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Latest: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latest_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Rewrite query for vector search:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Perform retrieval using the rewritten, context-aware query
&lt;/span&gt;    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contextualized_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the user says, &lt;em&gt;"Tell me about LangGraph,"&lt;/em&gt; and then follows up with, &lt;em&gt;"How does it compare to LangChain?"&lt;/em&gt;, the retriever understands that "it" refers to LangGraph, pulling the correct documents from the vector space.&lt;/p&gt;
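
&lt;p&gt;Assuming &lt;code&gt;app&lt;/code&gt; is the compiled graph, carrying memory across turns is a matter of feeding the accumulated &lt;code&gt;messages&lt;/code&gt; back in; the &lt;code&gt;operator.add&lt;/code&gt; annotation keeps appending to the list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from langchain_core.messages import HumanMessage

# Turn 1: the router sends this to the vector store
state = app.invoke({
    "messages": [HumanMessage(content="Tell me about LangGraph")],
    "context": "", "routing_decision": ""
})

# Turn 2: pass the accumulated history back in, so the
# contextualizer resolves "it" to LangGraph
followup = list(state["messages"]) + [
    HumanMessage(content="How does it compare to LangChain?")
]
state = app.invoke({"messages": followup, "context": "", "routing_decision": ""})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;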

&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;If you are building an AI application meant for real users, you have to move past naive linear chains. &lt;/p&gt;

&lt;p&gt;By leveraging LangGraph for stateful orchestration, you can build systems that actually &lt;em&gt;think&lt;/em&gt; about how to answer a question before they start searching, saving compute and creating a vastly superior, context-aware user experience.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
