<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratyay Banerjee</title>
    <description>The latest articles on DEV Community by Pratyay Banerjee (@neilblaze).</description>
    <link>https://dev.to/neilblaze</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F242559%2F25f55e4b-bea3-47c6-a261-e0a2d8a786c2.jpg</url>
      <title>DEV Community: Pratyay Banerjee</title>
      <link>https://dev.to/neilblaze</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/neilblaze"/>
    <language>en</language>
    <item>
      <title>Kaizen — 𝘓𝘦𝘵 𝘺𝘰𝘶𝘳 𝘧𝘰𝘤𝘶𝘴 𝘧𝘰𝘭𝘭𝘰𝘸 𝘺𝘰𝘶! 🎯</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Wed, 04 Mar 2026 07:15:11 +0000</pubDate>
      <link>https://dev.to/neilblaze/kaizen-let-your-focus-follow-you-pi0</link>
      <guid>https://dev.to/neilblaze/kaizen-let-your-focus-follow-you-pi0</guid>
      <description>&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; &lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/gemini"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Presenting Kaizen 🦄
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kaizen&lt;/strong&gt; is a multi-agent Chromium extension that quietly tracks where your attention actually settles while you browse. It doesn’t block anything or try to push productivity. Instead, it helps you notice when your mind has drifted and guides you back to the thread you were following. Our brains are wired to conserve energy, so the moment we pause to ponder, we begin to wander. This is why long browsing sessions often feel scattered. Kaizen steps in at that exact point, especially for people who experience &lt;em&gt;attention drift or ADHD-like patterns&lt;/em&gt;, helping the web feel connected again rather than fragmented.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Try it out here&lt;/strong&gt;: &lt;a href="https://kaizen.apps.sandipan.dev" rel="noopener noreferrer"&gt;https://kaizen.apps.sandipan.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Product Demo ▶️
&lt;/h3&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/vHot7yKQgAQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The name "&lt;strong&gt;Kaizen&lt;/strong&gt;" comes from the Japanese concept of continuous improvement (改善). Small, steady gains. That's the whole philosophy — not perfection, just awareness.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid100h0ydv2m5nkbi76u.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid100h0ydv2m5nkbi76u.gif" alt="breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspiration 💡
&lt;/h2&gt;

&lt;p&gt;If you write code or study on the web, you’ve likely lived this moment — a tab for documentation leads to a blog post, then a video, then a forum thread, and somewhere between the scrolls, the thread of your original question frays. Minutes later, you know you saw something useful, but you can’t quite recall where, or what 😭&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Distraction is the modern poverty. Focus is the new wealth.” — James Clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkboyba9u3zr0fehnf5nw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkboyba9u3zr0fehnf5nw.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The truth is, the internet has made information abundant, but our ability to retain and build on that knowledge hasn't kept pace.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I suffer from ADHD. My co-builder &lt;a class="mentioned-user" href="https://dev.to/sandipndev"&gt;@sandipndev&lt;/a&gt; does too. We've both tried the usual focus apps, especially the ones that block websites or guilt-trip you with timers. They made us feel worse, not better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, around New Year's, we decided that our resolution for 2026 would be to actually &lt;em&gt;fix&lt;/em&gt; this problem: &lt;strong&gt;not&lt;/strong&gt; with another blocker or pomodoro clone, but &lt;em&gt;with something that genuinely understands how attention works&lt;/em&gt;! That motivation landed us at the &lt;a href="https://www.encodeclub.com/programmes/comet-resolution-v2-hackathon" rel="noopener noreferrer"&gt;Commit To Change: An AI Agents Hackathon 2026&lt;/a&gt; hosted by Encode Club, and that's where &lt;a href="https://kaizen.apps.sandipan.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Kaizen&lt;/strong&gt;&lt;/a&gt; was born.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmutsxl7fvroo7bcvgdi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmutsxl7fvroo7bcvgdi.png" alt="Kaizen_logo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We asked ourselves: what if we could use AI to understand where your attention actually settles, be it the content you read, the figure you saw, or the video you watched, and turn those moments into a private learning loop? We wanted to turn scattered web browsing into genuine learning, not by blocking sites or nagging you to focus, but by understanding what you're actually paying attention to and helping you build on it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowvmsdg7c9ummfiu9e8r.gif" alt="kaizen-landing.gif"&gt;&lt;/th&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo75sqsse7l4u9g3n26y.gif" alt="kaizen-glimpse.gif"&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Built on Google &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/" rel="noopener noreferrer"&gt;gemini-3.1-flash-lite&lt;/a&gt;, Kaizen supercharges your browsing while keeping your data private. It notices, gently reflects, and helps you remember: the kind of help that keeps you rooted in the activity, because progress is felt, not forced!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Codebase / App Repository 🔗
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kaizen&lt;/strong&gt; 👉 &lt;a href="https://github.com/anikvox/kaizen" rel="noopener noreferrer"&gt;github.com/anikvox/kaizen&lt;/a&gt; [Open Source]&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live App&lt;/strong&gt; 👉 &lt;a href="https://kaizen.apps.sandipan.dev" rel="noopener noreferrer"&gt;kaizen.apps.sandipan.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/anikvox" rel="noopener noreferrer"&gt;
        anikvox
      &lt;/a&gt; / &lt;a href="https://github.com/anikvox/kaizen" rel="noopener noreferrer"&gt;
        kaizen
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      overcome adhd
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Kaizen&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://kaizen.apps.sandipan.dev" rel="nofollow noopener noreferrer"&gt;kaizen.apps.sandipan.dev&lt;/a&gt;&lt;/strong&gt; · Focus that Follows You&lt;/p&gt;
&lt;p&gt;A privacy-first browser extension that tracks where your attention actually goes and gently helps you stay on track — without blocking content or enforcing rigid workflows.&lt;/p&gt;
&lt;p&gt;Built by CS students with ADHD who wanted a tool that understands attention patterns, not one that locks you out.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Screenshots&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/anikvox/kaizen/apps/web/public/screenshots/extension-sidepanel.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fanikvox%2Fkaizen%2Fapps%2Fweb%2Fpublic%2Fscreenshots%2Fextension-sidepanel.png" alt="Extension Side Panel" width="280"&gt;&lt;/a&gt;
    
  &lt;a rel="noopener noreferrer" href="https://github.com/anikvox/kaizen/apps/web/public/screenshots/focus-nudge.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fanikvox%2Fkaizen%2Fapps%2Fweb%2Fpublic%2Fscreenshots%2Ffocus-nudge.png" alt="Focus Guardian Nudge" width="400"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;em&gt;Left: Extension side panel with focus tracking and growing tree. Right: Gentle nudge when you drift.&lt;/em&gt;
&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Features&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive Attention Tracking&lt;/strong&gt; — Knows what you're reading, watching, and listening to — not just which tabs are open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focus Guardian Agent&lt;/strong&gt; — Detects doomscrolling and distraction patterns, sends supportive nudges instead of blocking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Chat with Memory&lt;/strong&gt; — Ask "What was I reading about today?" and get context-aware answers from your browsing history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Generated Quizzes&lt;/strong&gt; — Turn passive reading into active recall with knowledge verification quizzes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insights &amp;amp; Achievements&lt;/strong&gt; — Track streaks, milestones, and focus patterns over time&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/anikvox/kaizen" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h0814uhtrh5sl8t8blj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h0814uhtrh5sl8t8blj.png" alt="techy_blank_space"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What it does 🤔
&lt;/h2&gt;

&lt;p&gt;Kaizen acts as your AI co-pilot for focused learning on the web. It runs silently in the background, tracking what you actually pay attention to, including what you read, watch, and explore, and utilizes Google Gemini &amp;amp; Opik to help you stay focused, remember what matters, and test your understanding. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sjuc2lnwq3a5brgubt2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sjuc2lnwq3a5brgubt2.gif" alt="kaizen-demo.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you browse, &lt;em&gt;&lt;strong&gt;Kaizen gradually turns your natural attention into learning&lt;/strong&gt;&lt;/em&gt;. When your focus slips, it offers gentle nudges to bring you back. When you finish reading or watching something, it surfaces quick recall prompts to reinforce what you just absorbed. It occasionally slips in short, well-timed quizzes to check your understanding while the idea is still fresh. And when you want to go deeper, its context-aware chat remembers where you’ve been, helping you connect ideas and build knowledge over time.&lt;/p&gt;

&lt;p&gt;Kaizen packs a rich set of features:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;Cognitive Attention Tracking&lt;/strong&gt; — tracks where your mind actually settles across text, images, audio &amp;amp; YouTube&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;Multi-Agent AI System&lt;/strong&gt; — four coordinated agents (Focus Guardian, Chat, Focus Clustering, Mental Health) powered by Gemini&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;Agentic Co-Pilot Chat&lt;/strong&gt; — tool-calling assistant that synthesizes your reading sessions with context-aware insights&lt;/li&gt;
&lt;li&gt;🌊 &lt;strong&gt;Supportive Pulse Nudges&lt;/strong&gt; — gentle reminders when you drift, never blocking — with self-calibrating sensitivity&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Knowledge Quizzes&lt;/strong&gt; — auto-generated verification quizzes from your actual browsing content&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Cognitive Analytics Dashboard&lt;/strong&gt; — attention entropy, browsing fragmentation, late-night patterns over 7–90 day windows&lt;/li&gt;
&lt;li&gt;🌱 &lt;strong&gt;Growing Plant Gamification&lt;/strong&gt; — a virtual plant that grows with your focus time&lt;/li&gt;
&lt;li&gt;🔐 &lt;strong&gt;Privacy-First Engine&lt;/strong&gt; — PII anonymization, encrypted API keys (AES-256-GCM), GDPR-compliant with full data export/deletion&lt;/li&gt;
&lt;li&gt;🔭 &lt;strong&gt;Full Opik Observability&lt;/strong&gt; — every LLM call, tool invocation, and agent decision traced end-to-end&lt;/li&gt;
&lt;/ul&gt;


&lt;/blockquote&gt;

&lt;p&gt;For transparency, we also present a comparative analysis of existing approaches:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Traditional Approach&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Kaizen Approach&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Blocks websites entirely&lt;/td&gt;
      &lt;td&gt;🟢 Supportive pulse nudges — zero blocking&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Binary "focused" / "distracted" state&lt;/td&gt;
      &lt;td&gt;🟢 Granular cognitive attention sensing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Punishes distraction&lt;/td&gt;
      &lt;td&gt;🟢 Understands attention patterns and gently guides&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 No understanding of what you're learning&lt;/td&gt;
      &lt;td&gt;🟢 Tracks reading, images, audio, video — builds context&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Cloud-locked data silos&lt;/td&gt;
      &lt;td&gt;🟢 Privacy-first, PII-anonymized AI&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We wanted something that &lt;em&gt;understands&lt;/em&gt; where your attention actually goes and gently helps you stay on track — without locking you out of anything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Gemini powers the system ⚡
&lt;/h2&gt;

&lt;p&gt;"We used Gemini" tells you nothing. So let me be specific about how deeply it's woven into every layer of Kaizen.&lt;/p&gt;

&lt;p&gt;Gemini is the &lt;strong&gt;system default provider&lt;/strong&gt; throughout Kaizen, integrated via the &lt;a href="https://sdk.vercel.ai/" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt; (&lt;code&gt;@ai-sdk/google&lt;/code&gt; v3.0.22) alongside the direct Google SDK (&lt;code&gt;@google/genai&lt;/code&gt; v1.40.0). Every agent, every summarization call, every quiz — &lt;strong&gt;Gemini handles it&lt;/strong&gt; unless the user explicitly switches to another provider (Anthropic Claude or OpenAI GPT-4 are available as alternatives).&lt;/p&gt;

&lt;p&gt;Here's the core provider resolution logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// service.ts — LLM Provider Resolution&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LLMService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;getProvider&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;LLMProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Check user's custom provider + encrypted API key&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;llmProvider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryCreateUserProvider&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// 2. Fall back to system Gemini&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSystemProvider&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// → gemini-2.5-flash-lite&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// models.ts — System Defaults&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_DEFAULT_PROVIDER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMProviderType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_DEFAULT_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini-2.5-flash-lite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the &lt;code&gt;GeminiProvider&lt;/code&gt; class wraps the Vercel AI SDK with full tool-calling, multimodal content (text + base64 images), and streaming support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// providers/gemini.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GeminiProvider&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;LLMProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;providerType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMProviderConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMGenerateOptions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;LLMResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;google&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;experimental_telemetry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getTelemetrySettings&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`gemini-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="c1"&gt;// Extract toolCalls and toolResults from response...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMStreamOptions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;google&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fullContent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  The four agents 🤖
&lt;/h3&gt;

&lt;p&gt;Kaizen isn't the usual GPT wrapper with a focus timer bolted on. It's a coordinated multi-agent system where each agent has a specific job, its own set of tools, and its own Gemini-powered decision loop.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;How Gemini is used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🛡️ &lt;strong&gt;Focus Guardian&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Monitors your browsing every 60 seconds. Detects doomscrolling, distraction, and focus drift. Sends nudges when confidence is high enough.&lt;/td&gt;
&lt;td&gt;Gemini analyzes 15 minutes of activity context (domain switches, dwell times, social media ratio) and returns a structured JSON decision at &lt;code&gt;temperature: 0.1&lt;/code&gt; for consistency (a shape sketch follows this table).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💬 &lt;strong&gt;Chat Agent&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Conversational AI with tool-calling. You can ask "what was I reading about today?" and it queries your actual attention data.&lt;/td&gt;
&lt;td&gt;Gemini streams responses via &lt;code&gt;streamText()&lt;/code&gt; with up to &lt;strong&gt;5 agentic steps&lt;/strong&gt;. It autonomously selects from 11 tools to ground answers in real data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🎯 &lt;strong&gt;Focus Agent&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Clusters your attention into focus sessions. Figures out what topics you're working on and tracks evolution.&lt;/td&gt;
&lt;td&gt;Gemini runs an agentic loop (up to &lt;strong&gt;10 iterations&lt;/strong&gt;) calling tools like &lt;code&gt;create_focus&lt;/code&gt;, &lt;code&gt;merge_focuses&lt;/code&gt;, &lt;code&gt;update_focus&lt;/code&gt; to organize attention data into coherent sessions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧘 &lt;strong&gt;Mental Health Agent&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Generates cognitive wellness reports — fragmentation, sleep patterns, media balance, quiz retention.&lt;/td&gt;
&lt;td&gt;Gemini runs another agentic loop with specialized tools (&lt;code&gt;analyze_sleep_patterns&lt;/code&gt;, &lt;code&gt;analyze_focus_quality&lt;/code&gt;, &lt;code&gt;analyze_media_balance&lt;/code&gt;, &lt;code&gt;think_aloud&lt;/code&gt;) and produces a full report in supportive, non-clinical language.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
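
&lt;p&gt;To make the Focus Guardian's structured decision concrete, here is a minimal sketch of the JSON shape it could parse out of a &lt;code&gt;temperature: 0.1&lt;/code&gt; call. The field names are illustrative assumptions, not the repo's actual schema; only the nudge types come from Kaizen itself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch, not the repo's actual schema.
// A low-temperature decision call should return strict JSON
// that is validated before any nudge goes out.
interface GuardianDecision {
  shouldNudge: boolean;          // hypothetical field name
  nudgeType:
    | "doomscroll" | "distraction" | "break"
    | "focus_drift" | "encouragement" | "all_clear";
  confidence: number;            // 0..1, nudge only above a threshold
  reason: string;                // short, user-facing explanation
}

function parseGuardianDecision(raw: string): GuardianDecision | null {
  try {
    const parsed = JSON.parse(raw) as GuardianDecision;
    // Reject malformed output instead of nudging on garbage
    if (typeof parsed.shouldNudge !== "boolean") return null;
    return parsed;
  } catch {
    return null; // non-JSON output: skip this cycle
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;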

&lt;h3&gt;
  
  
  ◉ Temperature tuning across tasks 🌡️
&lt;/h3&gt;

&lt;p&gt;Different tasks need different levels of creativity. We tuned Gemini's temperature for each use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config.ts — LLM Configuration Presets&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;LLM_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;   &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Should we nudge? Yes/no.&lt;/span&gt;
  &lt;span class="na"&gt;summarization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;  &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Factual, deterministic&lt;/span&gt;
  &lt;span class="na"&gt;focusAnalysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;   &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Concise clustering&lt;/span&gt;
  &lt;span class="na"&gt;imageDescription&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;  &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Vision captions&lt;/span&gt;
  &lt;span class="na"&gt;titleGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;   &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Creative but short&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Chat — balanced&lt;/span&gt;
  &lt;span class="na"&gt;quizGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// We *want* variety!&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At &lt;code&gt;0.1&lt;/code&gt;, Gemini is disciplined — it gives consistent nudge decisions. At &lt;code&gt;0.9&lt;/code&gt;, it generates creative quiz question phrasing without going off the rails. That predictability across the temperature range was one of the reasons we kept Gemini as the default over other providers.&lt;/p&gt;

&lt;h3&gt;
  
  
  ◉ Tool-calling in practice 🔧
&lt;/h3&gt;

&lt;p&gt;The Chat Agent's tool-calling is where Gemini's structured output really shines. When you ask "what have I been focusing on?", here's what actually happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message arrives
  → Gemini evaluates available tools
  → Selects: get_active_focus
  → Tool executes Prisma query against PostgreSQL
  → Results returned to Gemini
  → Gemini composes a response grounded in your data
  → Response streamed back via SSE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 11 tools available to the Chat Agent (a declaration sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_attention_data&lt;/code&gt; — recent text/image/audio/YouTube attention&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_active_website&lt;/code&gt; — what tab you're on right now&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_active_focus&lt;/code&gt; — your current focus topics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search_browsing_history&lt;/code&gt; — search past activity&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_reading_activity&lt;/code&gt; — reading session data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_youtube_history&lt;/code&gt; — YouTube watch history&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_focus_history&lt;/code&gt; — past focus sessions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_current_time&lt;/code&gt; — current time in user's timezone&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_current_weather&lt;/code&gt; — weather at user's location&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;set_user_location&lt;/code&gt; — remember location (geocoding via OpenMeteo)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;set_translation_language&lt;/code&gt; — language preferences&lt;/li&gt;
&lt;/ul&gt;
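
&lt;p&gt;As a rough sketch of what one of these declarations can look like with the Vercel AI SDK's &lt;code&gt;tool()&lt;/code&gt; helper (depending on the SDK version the schema key is &lt;code&gt;parameters&lt;/code&gt; or &lt;code&gt;inputSchema&lt;/code&gt;; the Prisma model and fields below are assumptions, not Kaizen's actual &lt;code&gt;get_active_focus&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch of a Chat Agent tool. The Prisma model and
// field names are hypothetical; the real get_active_focus lives in
// the Kaizen repo.
import { tool } from "ai";
import { z } from "zod";
import type { PrismaClient } from "@prisma/client";

export const makeGetActiveFocus = (prisma: PrismaClient, userId: string) =&gt;
  tool({
    description: "Return the user's currently active focus topics",
    parameters: z.object({
      limit: z.number().int().min(1).max(10).default(5),
    }),
    execute: async ({ limit }) =&gt;
      // Ground the answer in real attention data instead of guessing
      prisma.focusSession.findMany({
        where: { userId, endedAt: null },
        take: limit,
        select: { topic: true, startedAt: true },
      }),
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Gemini only ever sees the tool's name, description, and schema; the &lt;code&gt;execute&lt;/code&gt; body runs server-side against your own data.&lt;/p&gt;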

&lt;p&gt;Gemini picks which tools to call, interprets the results, and sometimes chains multiple tool calls in a single turn. We capped it at 5 steps per message to prevent runaway loops. Here's the actual execution from &lt;code&gt;agent.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// chat/agent.ts — Agentic Chat Execution&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Fetched from Opik prompt library&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;coreMessages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Supports multimodal (text + images)&lt;/span&gt;
  &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxSteps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;onStepFinish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Create Opik span for each tool call&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolCalls&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolCalls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolCalls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolSpan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`tool:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="nx"&gt;toolSpan&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="cm"&gt;/* tool output */&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We tested Gemini, Claude, and GPT-4 for this pipeline. Gemini's tool selection was the most reliable for our use case — it rarely picked the wrong tool or returned malformed tool calls across 11 different tool schemas. That's why it became the default.&lt;/p&gt;

&lt;h3&gt;
  
  
  ◉ Multimodal attention — Gemini Vision 👁️
&lt;/h3&gt;

&lt;p&gt;When you linger on an image while browsing, the extension tracks your hover duration and confidence score. If you're actually paying attention, Kaizen sends the image as base64-encoded data directly to Gemini for caption generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// providers/gemini.ts — Multimodal content formatting&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;formatUserContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMMessageContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;base64,&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the Chat Agent can later tell you &lt;em&gt;"you were looking at a diagram of TCP handshakes"&lt;/em&gt; instead of just &lt;em&gt;"you visited a networking article."&lt;/em&gt; The image summaries + text summaries together form Kaizen's memory layer.&lt;/p&gt;
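
&lt;p&gt;A hypothetical call into this provider, for an image the sensors decided you actually attended to, might look like the following; the prompt text and variable names are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative usage of the GeminiProvider shown above.
// imageBase64 would come from the extension's image sensor.
declare const provider: GeminiProvider;
declare const imageBase64: string;

const response = await provider.generate({
  systemPrompt: "Describe the image in one factual sentence.",
  messages: [
    {
      role: "user",
      content: [
        { type: "image", mimeType: "image/png", data: imageBase64 },
        { type: "text", text: "What is the user looking at?" },
      ],
    },
  ],
});
// The resulting caption is stored alongside text summaries,
// forming the memory layer described above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;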

&lt;h3&gt;
  
  
  ◉ Quiz generation from real attention 📝
&lt;/h3&gt;

&lt;p&gt;When you hit "Generate Quiz," a &lt;code&gt;pg-boss&lt;/code&gt; background job fires. The Quiz Agent pulls your recent attention data, feeds it to Gemini at &lt;code&gt;temperature: 0.9&lt;/code&gt;, and generates 10 multiple-choice questions based on what you've been reading. A content hash prevents duplicate questions across sessions. The quiz stays valid for 24 hours.&lt;/p&gt;

&lt;p&gt;This is probably the feature I'm most proud of. Passive reading becomes active recall, and you didn't have to do anything extra. You just browsed normally, and now there's a quiz waiting for you. 🎯&lt;/p&gt;

&lt;h3&gt;
  
  
  ◉ Focus Guardian — the self-learning nudge engine 🛡️
&lt;/h3&gt;

&lt;p&gt;The Focus Guardian runs autonomously, analyzing your last 15 minutes of activity. Here's what actually happens in the decision loop (from &lt;code&gt;focus-agent.ts&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// agent/focus-agent.ts — Focus Guardian Decision&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;promptData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

RECENT ACTIVITY (last 15 minutes):
- Domains visited: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recentDomains&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;
- Number of different sites: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domainSwitchCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
- Average time per page: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;averageDwellTime&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;s
- Social media/entertainment time: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;socialMediaTime&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;s
- Reading time (estimated): &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;readingTime&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;s
- Has active focus: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasActiveFocus&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`Yes (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;focusTopics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;

USER FEEDBACK HISTORY:
- False positive rate: &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;falsePositiveRate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;%
- Acknowledged rate: &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;acknowledgedRate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;%
- Sensitivity setting: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sensitivity&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nudge types: &lt;code&gt;doomscroll&lt;/code&gt;, &lt;code&gt;distraction&lt;/code&gt;, &lt;code&gt;break&lt;/code&gt;, &lt;code&gt;focus_drift&lt;/code&gt;, &lt;code&gt;encouragement&lt;/code&gt;, and &lt;code&gt;all_clear&lt;/code&gt;. There's a configurable cooldown between nudges so it never feels like nagging.&lt;/p&gt;
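
&lt;p&gt;The cooldown gate itself is easy to picture. A sketch, assuming a stored per-user timestamp and a configurable window:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative cooldown gate; the setting and field names are assumptions.
function canNudge(lastNudgeAt: Date | null, cooldownMinutes: number): boolean {
  if (!lastNudgeAt) return true; // never nudged before
  const elapsedMs = Date.now() - lastNudgeAt.getTime();
  return elapsedMs &gt;= cooldownMinutes * 60_000;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;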

&lt;p&gt;&lt;strong&gt;The system self-calibrates.&lt;/strong&gt; Every nudge records whether you acknowledged it, dismissed it, or marked it as a false positive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Sensitivity auto-adjustment from user feedback&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;false_positive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;newSensitivity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newSensitivity&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// fewer nudges&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;acknowledged&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;newSensitivity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newSensitivity&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// nudge was helpful&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, the agent learns &lt;em&gt;your&lt;/em&gt; patterns. If it keeps getting it wrong, it backs off. If it's on point, it stays the course.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack ⚙️
&lt;/h2&gt;

&lt;p&gt;Everything runs on a &lt;strong&gt;TypeScript monorepo&lt;/strong&gt; (pnpm workspaces):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kaizen/
├── apps/
│   ├── api/          # Hono backend — agents, data ingestion, SSE
│   ├── extension/    # Plasmo browser extension — attention sensors
│   └── web/          # Next.js dashboard — analytics, chat, settings
├── packages/
│   ├── api-client/   # Shared typed API client
│   └── ui/           # Shared component library
└── docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What we used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js 22+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hono v4.6.14, Prisma ORM v6.2.1, PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Job Queue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;pg-boss v12 (single-concurrency, resource-aware)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom SSE (Server-Sent Events) for cross-device sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clerk v1.21.4 (web), device token handshake (extension)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Gemini via Vercel AI SDK v6.0.77 (&lt;code&gt;@ai-sdk/google&lt;/code&gt; + &lt;code&gt;@google/genai&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Comet Opik v1.0.6 — tracing, prompt library, anonymizers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extension&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plasmo, React, TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 15, Tailwind CSS, Lucide Icons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AES-256-GCM for API key storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Attention sensors 📡
&lt;/h3&gt;

&lt;p&gt;The extension runs separate monitors for different content types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sensor&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;What it tracks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;📖 &lt;strong&gt;Text&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;monitor-text.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paragraphs read, words processed, reading progress, sustained attention duration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🖼️ &lt;strong&gt;Image&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;monitor-image.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hover duration, confidence score → triggers Gemini Vision for caption generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔊 &lt;strong&gt;Audio&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;monitor-audio.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Playback duration, active listening time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📺 &lt;strong&gt;YouTube&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;background scripts&lt;/td&gt;
&lt;td&gt;Watch time, captions ingestion, video context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each sensor generates a &lt;strong&gt;confidence score (0–100)&lt;/strong&gt; based on hover duration, scroll velocity, and viewport position. A quick skim doesn't count as learning. Sustained attention does.&lt;/p&gt;
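&lt;p&gt;A minimal sketch of how those signals could blend into one score; the weights and names here are illustrative assumptions, not the extension's actual values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical weighting: sustained hover dominates, calm scrolling and a
// central viewport position add the rest. Saturation keeps the score bounded.
interface AttentionSignals {
  hoverMs: number;        // time the element stayed under the cursor
  scrollVelocity: number; // px/s; fast scrolling suggests skimming
  viewportRatio: number;  // 0–1, how centered the element sits in the viewport
}

function confidenceScore(s: AttentionSignals): number {
  const hover = Math.min(s.hoverMs / 10_000, 1); // saturates at ~10s
  const calm = 1 / (1 + s.scrollVelocity / 500); // slower scroll scores higher
  const raw = 0.5 * hover + 0.3 * calm + 0.2 * s.viewportRatio;
  return Math.round(raw * 100);                  // 0–100, matching the sensors
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;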




&lt;h3&gt;
  
  
  Database Schema 🗄️
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Core attention tracking
TextAttention    → text, wordsRead, confidence, timestamp
ImageAttention   → src, alt, hoverDuration, summary (AI-generated)
AudioAttention   → playbackDuration, activeTime
YoutubeAttention → captions, activeWatchTime

// Agentic features
Focus            → item, keywords[], isActive, lastActivityAt
AgentNudge       → type, message, confidence, reasoning, response
Pulse            → userId, message (short nudges)

// Quiz system
Quiz             → questions (JSON), contentHash (deduplication)
QuizAnswer       → selectedIndex, isCorrect
QuizResult       → totalQuestions, correctAnswers

// User settings (encrypted API keys)
UserSettings     → geminiApiKeyEncrypted, llmProvider, llmModel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
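&lt;p&gt;To make the feedback loop concrete, here's roughly how a nudge response might be recorded against that schema (a hedged Prisma Client sketch; the &lt;code&gt;id&lt;/code&gt; field and surrounding wiring are assumed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Sketch: persist the user's reaction so the sensitivity loop can learn from it.
// Field names follow the schema sketch above; the call site is hypothetical.
async function recordNudgeResponse(nudgeId: string, response: string) {
  await prisma.agentNudge.update({
    where: { id: nudgeId },
    data: { response },
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;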



&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-time SSE events 📡
&lt;/h3&gt;

&lt;p&gt;Custom Server-Sent Events sync state across browser extension + dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pomodoro-tick&lt;/code&gt; — Timer updates&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chat-message-created/updated&lt;/code&gt; — Chat streaming&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;active-tab-changed&lt;/code&gt; — Tab context sync&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;focus-changed&lt;/code&gt; — Focus session state&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settings-updated&lt;/code&gt; — Cross-device settings sync&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pulses-updated&lt;/code&gt; — Nudge notifications&lt;/li&gt;
&lt;/ul&gt;
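&lt;p&gt;On the receiving end, a consumer can be as simple as the browser's built-in &lt;code&gt;EventSource&lt;/code&gt;. A minimal sketch, assuming a &lt;code&gt;/api/sse&lt;/code&gt; endpoint and hypothetical UI hooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal SSE consumer sketch; endpoint path and payload shapes are assumptions.
declare function updateTimerUI(remainingSeconds: number): void; // hypothetical hook
declare function applySettings(settings: unknown): void;        // hypothetical hook

const source = new EventSource("/api/sse", { withCredentials: true });

source.addEventListener("pomodoro-tick", (event) =&gt; {
  const { remainingSeconds } = JSON.parse((event as MessageEvent).data);
  updateTimerUI(remainingSeconds);
});

source.addEventListener("settings-updated", (event) =&gt; {
  applySettings(JSON.parse((event as MessageEvent).data));
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;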

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability with Opik 🔭
&lt;/h2&gt;

&lt;p&gt;We integrated &lt;a href="https://www.comet.com/site/products/opik/" rel="noopener noreferrer"&gt;Comet Opik&lt;/a&gt; for full observability across the entire agent system. This turned out to be one of the best decisions we made, as you can't evaluate what you can't see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9tgv4clw562wi4ibu50.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9tgv4clw562wi4ibu50.gif" alt="opik.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What we instrumented
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;🔗 Tracing&lt;/strong&gt; — Every LLM call, every tool invocation, every agent decision is traced end-to-end. Traces are grouped by thread ID so you can follow the full decision flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// telemetry.ts — Opik Trace Hierarchy&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;trace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;anonymizeInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kaizen&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Nested spans for each step&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool:get_attention_data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anonymizeInput&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;processedOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;endTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting trace hierarchy looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trace: chat-agent
├── Span: streamText [type: llm]
│   ├── Span: tool:get_active_website [type: tool]
│   ├── Span: tool:get_attention_data [type: tool]
│   └── Span: tool:search_browsing_history [type: tool]
└── Span: followUp-streamText [type: llm]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;📚 Prompt Library&lt;/strong&gt; — All 11 system prompts live in Opik under named entries, fetched fresh on every call with local fallbacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prompt-provider.ts — Opik-first, local fallback&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPromptWithMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PromptName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isOpikPromptsEnabled&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;opikPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getPromptFromOpik&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opikPrompt&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;opikPrompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;opik&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="na"&gt;promptVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;opikPrompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commit&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LOCAL_PROMPT_MAP&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;local&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This let us iterate on prompts &lt;em&gt;without redeploying code&lt;/em&gt;. We'd see a bad nudge in a trace, tweak the prompt in Opik's dashboard, and the fix was live immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔒 Anonymizers&lt;/strong&gt; — Before anything gets logged to Opik, we strip PII using &lt;code&gt;@cdssnc/sanitize-pii&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// anonymizer.ts — PII Protection&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isSensitiveKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sensitivePatterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sr"&gt;/^userId$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/password/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/secret/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sr"&gt;/token/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/api&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;_-&lt;/span&gt;&lt;span class="se"&gt;]?&lt;/span&gt;&lt;span class="sr"&gt;key/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/auth/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sr"&gt;/credential/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/private&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;_-&lt;/span&gt;&lt;span class="se"&gt;]?&lt;/span&gt;&lt;span class="sr"&gt;key/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;sensitivePatterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// User inputs → anonymized. LLM outputs → preserved for debugging.&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;anonymizeInput&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;anonymizeData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;anonymizeOutput&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* only redact sensitive keys */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🛡️ Guardrails&lt;/strong&gt; — Agents validate inputs before tool execution. The Focus Guardian only fires a nudge when confidence exceeds a dynamically adjusted threshold. The Chat Agent validates tool arguments before running Prisma queries.&lt;/p&gt;
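&lt;p&gt;The threshold check itself is tiny. A sketch, assuming confidence is normalized to &lt;code&gt;[0, 1]&lt;/code&gt; and sensitivity is the self-calibrating value from earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative guardrail: the nudge only fires when model confidence clears a
// bar derived from the self-calibrating sensitivity. Names are hypothetical.
function shouldFireNudge(confidence: number, sensitivity: number): boolean {
  const threshold = 1 - sensitivity; // higher sensitivity lowers the bar
  return confidence &gt;= threshold;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;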

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Opik mattered 🎯
&lt;/h3&gt;

&lt;p&gt;Early on, the Focus Guardian was nudging people during legitimate deep dives. Someone would be reading a 30-minute technical article, and the agent would flag it as distraction because the domain switching pattern looked similar to aimless browsing.&lt;/p&gt;

&lt;p&gt;Without &lt;a href="https://www.comet.com/docs/opik/" rel="noopener noreferrer"&gt;&lt;strong&gt;Opik&lt;/strong&gt;&lt;/a&gt;, we'd have said "the AI is dumb" and guessed at fixes.&lt;/p&gt;

&lt;p&gt;With tracing, we could pull up the exact decision chain: here's the 15 minutes of context the agent saw, here's the domain switch count, here's the confidence score, here's the prompt, here's the output. The problem was obvious — the prompt didn't have a strong enough signal for sustained single-topic browsing. We tweaked the prompt in Opik, the fix deployed without a code change, and false positives dropped.&lt;/p&gt;

&lt;p&gt;That cycle — &lt;strong&gt;trace the failure → find the root cause → fix the prompt → verify in production&lt;/strong&gt; — happened dozens of times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What worked well ✅
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool-calling was reliable.&lt;/strong&gt; We tested Gemini, Claude, and GPT-4 for our agent pipelines, and Gemini's structured output parsing was the most consistent for our use case. The Chat Agent makes autonomous tool selections across 11 different tools, and Gemini rarely picked the wrong one or returned malformed tool calls. This is why it became our system default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The million-token context window was a real advantage.&lt;/strong&gt; Gemini &lt;code&gt;3.1-flash&lt;/code&gt; and &lt;code&gt;3.1-flash-lite&lt;/code&gt; both support &lt;strong&gt;1M token context windows&lt;/strong&gt;. For the Focus Agent's clustering loop, which sometimes processes hours of attention data across many topics, having that headroom meant we didn't have to aggressively truncate context. We could pass in a richer activity history and get better clustering decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temperature control behaved predictably.&lt;/strong&gt; From &lt;code&gt;0.1&lt;/code&gt; for binary decisions to &lt;code&gt;0.9&lt;/code&gt; for quiz generation, Gemini responded consistently. At &lt;code&gt;0.1&lt;/code&gt; it was disciplined; at &lt;code&gt;0.9&lt;/code&gt; it got creative without going off the rails. That predictability across the full range was a real win.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal input worked out of the box.&lt;/strong&gt; We send base64-encoded images directly to Gemini for caption generation. The quality of image descriptions was good enough that the Chat Agent could later reference them meaningfully. No separate vision pipeline needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The model fetcher dynamically discovers new models.&lt;/strong&gt; We use &lt;code&gt;@google/genai&lt;/code&gt; to fetch the live model list from the Gemini API (filtering for &lt;code&gt;generateContent&lt;/code&gt; support), with sorting priority baked in for upcoming Gemini-series models, so when a new model lands, Kaizen picks it up automatically (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
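&lt;p&gt;For reference, the model discovery mentioned in the last point could look roughly like this (a sketch against our reading of &lt;code&gt;@google/genai&lt;/code&gt;; treat field names like &lt;code&gt;supportedActions&lt;/code&gt; as assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { GoogleGenAI } from "@google/genai";

// Sketch: list live models and keep only those that can generateContent.
async function listGenerationModels(apiKey: string): Promise&lt;string[]&gt; {
  const ai = new GoogleGenAI({ apiKey });
  const names: string[] = [];
  for await (const model of await ai.models.list()) {
    if (model.supportedActions?.includes("generateContent")) {
      names.push(model.name ?? "");
    }
  }
  return names;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;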

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Research 📚
&lt;/h2&gt;

&lt;p&gt;We don’t remember things just because we saw them. We remember them when we bring them back to mind. A small, well-timed reminder can turn a passing moment online into something that actually sticks. And it works better when the reminder supports your intention rather than trying to control your behavior. When your browser can quietly keep track of the ideas you spent time on and surface them again when you need them, the pressure of “trying to hold everything in your head” eases up.&lt;/p&gt;

&lt;p&gt;This is especially supportive for people with &lt;a href="https://en.wikipedia.org/wiki/Attention_deficit_hyperactivity_disorder" rel="noopener noreferrer"&gt;&lt;em&gt;ADHD&lt;/em&gt;&lt;/a&gt;, where working memory and task switching can feel heavy, and for people who experience &lt;a href="http://en.wikipedia.org/wiki/Dementia" rel="noopener noreferrer"&gt;&lt;em&gt;early memory decline&lt;/em&gt;&lt;/a&gt;, where gentle spaced recall helps keep learning active. Kaizen helps keep the thread. Small nudges, quick check-ins, and context that stays with you, so you don’t have to start from scratch every time you return to a thought.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://news.umich.edu/distractibility-trait-linked-to-adhd/" rel="noopener noreferrer"&gt;Distractibility trait linked to ADHD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://timesofindia.indiatimes.com/life-style/health-fitness/health-news/fighting-adhd-along-the-hustle-culture-how-can-employees-keep-their-mental-health-in-check/articleshow/105568659.cms" rel="noopener noreferrer"&gt;Fighting ADHD along the hustle culture: How can employees keep their mental health in check&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6198603/" rel="noopener noreferrer"&gt;Study of Internet addiction in children with attention-deficit hyperactivity disorder and normal control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.childrenandscreens.org/learn-explore/research/adhd-youth-and-digital-media-use/" rel="noopener noreferrer"&gt;ADHD Youth and Digital Media Use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.apa.org/news/podcasts/speaking-of-psychology/attention-spans" rel="noopener noreferrer"&gt;Attention Spans — Podcast (APA)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.additudemag.com/tag-digital-distractions/" rel="noopener noreferrer"&gt;Digital Distractions (ADDitude Magazine – tag archive)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.newyorker.com/magazine/2025/01/27/the-sirens-call-chris-hayes-book-review" rel="noopener noreferrer"&gt;What if the Attention Crisis Is All a Distraction? (The New Yorker)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.news24.com/life/wellness/mind/distraction-fatigue-vs-adhd-how-technology-is-reshaping-our-attention-spans-20250910-0534" rel="noopener noreferrer"&gt;Distraction fatigue vs ADHD: How technology is reshaping our attention spans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02997/full" rel="noopener noreferrer"&gt;Retrieval practice helps strengthen memory by actively recalling information&lt;/a&gt; (FPSYG, 2019)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mavmatrix.uta.edu/cgi/viewcontent.cgi?article=1160&amp;amp;context=psychology_theses" rel="noopener noreferrer"&gt;Spaced retrieval improves retention by revisiting information over time&lt;/a&gt; (Mavmatrix, art. 1160)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.focusbear.io/blog-post/how-to-improve-memory-with-adhd-effective-techniques" rel="noopener noreferrer"&gt;Small external cues and short recall prompts support working memory in ADHD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481376/" rel="noopener noreferrer"&gt;Gentle reminders and cueing tools help reduce memory load in dementia&lt;/a&gt; (NCBI, 2017)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sciencedirect.com/topics/social-sciences/memory-retrieval" rel="noopener noreferrer"&gt;Retrieval reactivates and stabilizes memory traces when spaced and repeated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.alzheimers.org.uk/get-support/living-with-dementia/memory-aids-and-tools" rel="noopener noreferrer"&gt;Everyday memory aids help maintain independence and reduce strain in dementia&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Kaizen keeps attention anchored to meaning, not effort! ✨&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges we ran into 😤
&lt;/h2&gt;

&lt;p&gt;We did run into a few challenges along the way. Since we were working from different time zones, coordinating calls and staying in sync took some extra effort. Most of our collaboration happened asynchronously, which meant we had to be very clear about decisions and hand-offs.&lt;/p&gt;

&lt;p&gt;On the technical side, figuring out what “real attention” meant was something we had to refine multiple times. We experimented with how much weight to give scroll patterns, mouse movement, viewport position, and reading pace so that quick skims didn’t count as learning. Handling different types of content also required care, especially images and YouTube videos, since the context needed to stay meaningful, not noisy.&lt;/p&gt;

&lt;p&gt;Beyond that, a few of the other challenges we faced include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn coherence degraded over long conversations.&lt;/strong&gt; After 10+ turns with interleaved tool calls, the Chat Agent would sometimes lose track of earlier context or repeat information. We partially fixed this by injecting a conversation summary into the system prompt (sketched after this list), but it meant extra token usage. Not unique to Gemini, but noticeable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming with tool calls needed careful handling.&lt;/strong&gt; When Gemini decides mid-stream to call a tool, the handoff between text chunks and tool-call events required state management in our SSE layer. The Vercel AI SDK abstracted most of it, but edge cases (tool call at the very start, multiple rapid tool calls) needed explicit handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Occasional overconfidence in Focus Guardian decisions.&lt;/strong&gt; At &lt;code&gt;temperature: 0.1&lt;/code&gt;, when the Focus Guardian is wrong, it's &lt;em&gt;confidently&lt;/em&gt; wrong. A few times it classified focused research (lots of Stack Overflow tabs) as aimless browsing. The fix was better prompting + the feedback loop, not a model change.&lt;/li&gt;
&lt;/ul&gt;
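&lt;p&gt;The summary-injection workaround from the first point, sketched with hypothetical names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Message = { role: "system" | "user" | "assistant"; content: string };

// Sketch: condense older turns into a summary and keep only recent turns
// verbatim, trading a little extra prompt text for multi-turn coherence.
function buildMessages(
  systemPrompt: string,
  summary: string,
  recentTurns: Message[],
): Message[] {
  return [
    {
      role: "system",
      content: `${systemPrompt}\n\nConversation so far (summarized):\n${summary}`,
    },
    ...recentTurns,
  ];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;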

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What we learned 🙌
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Proper sleep is very important!&lt;/strong&gt; 😛&lt;/p&gt;

&lt;p&gt;Well, a lot of things, on both the technical &amp;amp; non-technical sides. We learned that it’s one thing to get the AI features working, and another to make them feel good while someone is actually browsing. Most of our time went into small details: when to nudge, when to stay quiet, how to store attention history without slowing down the browser, and how to keep things calm instead of distracting. Shipping Kaizen from a barebones idea into something stable took a lot of iteration, testing, and rethinking. It reminded us that real products are built in the tiny decisions, not the big demos! 🤗&lt;/p&gt;

&lt;p&gt;There are a few more lessons that I'd love to share with the community:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟦 &lt;strong&gt;Building for attention requires restraint.&lt;/strong&gt; The hardest design decisions weren't technical. They were about when &lt;em&gt;not&lt;/em&gt; to act. Our early Focus Guardian nudged aggressively and mostly felt like a backseat driver. The lesson: if your tool annoys people, they'll uninstall it. Being right isn't enough; you have to be right at the right moment.&lt;/li&gt;
&lt;li&gt;🟦 &lt;strong&gt;Agents need structure, not freedom.&lt;/strong&gt; We initially gave the Chat Agent broad instructions. The results were inconsistent. What worked was constraining each agent to a narrow job with specific tools and clear decision boundaries. The Focus Guardian doesn't chat. The Chat Agent doesn't nudge. In short: &lt;code&gt;Specialization&lt;/code&gt; + &lt;code&gt;Coordination&lt;/code&gt; &amp;gt; &lt;code&gt;Generalization&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;🟦 &lt;strong&gt;Observability isn't optional for agent systems.&lt;/strong&gt; Without Opik traces, we'd still be guessing why nudges misfired. We stopped treating the AI as a black box and started treating it like any other system component with logs and metrics.&lt;/li&gt;
&lt;li&gt;🟦 &lt;strong&gt;The real product is the quiet moments.&lt;/strong&gt; Nobody remembers the quiz that worked perfectly. They remember the time the extension stayed silent for 45 minutes during a Wikipedia deep dive they genuinely cared about, and then gently reminded them about the assignment they'd originally set out to work on. Getting those moments right took dozens of prompt iterations and hundreds of traced decisions.&lt;/li&gt;
&lt;li&gt;🟦 &lt;strong&gt;Gemini as a default provider was the right call.&lt;/strong&gt; After benchmarking all three providers, Gemini's combination of reliable tool-calling, 1M context window, and consistent temperature behavior made it the best fit. Our system makes potentially dozens of Gemini calls per user per hour — attention summaries, focus clustering, guardian checks — and reliability at that volume mattered more than peak performance on any single call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next? 🚀
&lt;/h2&gt;

&lt;p&gt;We're continuing to develop Kaizen and planning the next release cycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⏰ &lt;strong&gt;Spaced repetition&lt;/strong&gt; — surfacing what you read at the moment you're most likely to forget it&lt;/li&gt;
&lt;li&gt;🕸️ &lt;strong&gt;Topic relationship mapping&lt;/strong&gt; — showing how things you learn connect across sessions&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Better batching&lt;/strong&gt; — optimized Gemini call grouping during long browsing sessions&lt;/li&gt;
&lt;li&gt;📤 &lt;strong&gt;Export to note-taking tools&lt;/strong&gt; — so learning doesn't stay trapped in the extension&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Shareable study threads&lt;/strong&gt; — lightweight collaboration for shared focus sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For Gemini specifically, we're interested in &lt;strong&gt;structured output (JSON mode)&lt;/strong&gt; for agent responses. Right now we parse freeform text from several agent pipelines, and guaranteed JSON would let us simplify those parsing layers.&lt;/p&gt;
&lt;/blockquote&gt;
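&lt;p&gt;A taste of what that might look like (a sketch of Gemini's JSON mode via &lt;code&gt;responseSchema&lt;/code&gt;; the decision shape and prompt are hypothetical, not our production pipeline):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { GoogleGenAI, Type } from "@google/genai";

// Sketch: request a structured nudge decision instead of parsing freeform text.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const result = await ai.models.generateContent({
  model: "gemini-3.1-flash", // the model family referenced above
  contents: "Given this activity summary, should the user be nudged?",
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        nudge: { type: Type.BOOLEAN },
        reason: { type: Type.STRING },
      },
    },
  },
});
const decision = JSON.parse(result.text ?? "{}");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;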

&lt;h2&gt;
  
  
  End notes 🙌🏻
&lt;/h2&gt;

&lt;p&gt;As CS students who struggle with ADHD, we primarily built Kaizen because we needed it ourselves. Traditional blockers felt like punishment. Our New Year's resolution was to build the tool we wished existed, something that doesn't lock you out, doesn't judge, just watches where your attention goes, learns your patterns, and gently, continuously helps you get better. That's what kaizen (改善) stands for, i.e. &lt;strong&gt;&lt;em&gt;continuous improvement&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Huge thanks to &lt;a href="https://dev.to"&gt;DEV&lt;/a&gt; and &lt;a href="https://mlh.io" rel="noopener noreferrer"&gt;MLH&lt;/a&gt; for hosting this writing challenge, and to the &lt;strong&gt;Google Gemini&lt;/strong&gt; team for building models that actually hold up under real multi-agent workloads! 🙌&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissive License ⚖️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/anikvox/kaizen/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT License&lt;/a&gt;&lt;/p&gt;

</description>
      <category>geminireflections</category>
      <category>ai</category>
      <category>gemini</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>Fragments — 𝙏𝙞𝙣𝙮 𝙁𝙧𝙖𝙜𝙢𝙚𝙣𝙩𝙨 𝙢𝙖𝙠𝙞𝙣𝙜 𝙮𝙤𝙪𝙧 𝙇𝙞𝙛𝙚 𝙁𝙪𝙡𝙡 ✨</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Mon, 05 Jan 2026 03:25:34 +0000</pubDate>
      <link>https://dev.to/neilblaze/fragments--2h12</link>
      <guid>https://dev.to/neilblaze/fragments--2h12</guid>
      <description>&lt;h2&gt;
  
  
  𝗖𝗮𝘁𝗲𝗴𝗼𝗿𝘆 𝗦𝘂𝗯𝗺𝗶𝘀𝘀𝗶𝗼𝗻: &lt;strong&gt;Best Use of Mux&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We'd also like to consider our project under the &lt;strong&gt;Show and Tell&lt;/strong&gt; track. &lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mux-2025-12-03"&gt;DEV's Worldwide Show and Tell Challenge Presented by Mux&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Participants: &lt;a class="mentioned-user" href="https://dev.to/neilblaze"&gt;@neilblaze&lt;/a&gt; &amp;amp; &lt;a class="mentioned-user" href="https://dev.to/achalbajpai"&gt;@achalbajpai&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Ever lose that &lt;em&gt;perfect video moment&lt;/em&gt; because you forgot where you saw it? That frame that could've sparked your next big idea – gone. &lt;em&gt;Your brain isn't a hard drive&lt;/em&gt;. That's why we built &lt;a href="https://fragmentsofmux.vercel.app/" rel="noopener noreferrer"&gt;&lt;strong&gt;Fragments&lt;/strong&gt;&lt;/a&gt;! ✨&lt;/p&gt;

&lt;h3&gt;
  
  
  Video ▶️
&lt;/h3&gt;

&lt;p&gt;

&lt;iframe src="https://player.mux.com/Jjc4zR38yy9dh2rw5vIoCbra6P7bOrQJ5ef5qyXGkCY" width="710" height="399"&gt;
&lt;/iframe&gt;



&lt;/p&gt;

&lt;center&gt; — OR — &lt;/center&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/oF72-3ybIcg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built 🤔
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/neilblaze/Fragments" rel="noopener noreferrer"&gt;Fragments&lt;/a&gt;&lt;/strong&gt; is your &lt;strong&gt;visual second brain&lt;/strong&gt; for creators who believe great work is shaped by &lt;em&gt;intentional consumption&lt;/em&gt;. It's a web app and Chrome extension working in tandem, letting you capture, organize, and rediscover video moments that spark insight! Ideas don't appear in isolation — they emerge from moments you notice, save, and revisit over time. In many creative fields, a short video clip can convey more insight than pages of text. ⚡&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fragmentsofmux.vercel.app" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb90a1kchvgumcqx80fs.gif" alt="landing-fragment.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 The whole idea behind Fragments is that &lt;em&gt;great creators aren't just skilled at making — they're careful about what they consume&lt;/em&gt;. We built a tool that makes capturing those fleeting "aha!" moments as effortless as a keyboard shortcut. Whether you're a designer spotting a slick animation, a developer watching a tutorial, or a researcher collecting interview clips — &lt;strong&gt;Fragments ensures nothing gets lost&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Try it out here&lt;/strong&gt;: &lt;a href="https://fragmentsofmux.vercel.app" rel="noopener noreferrer"&gt;https://fragmentsofmux.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works 💣
&lt;/h2&gt;

&lt;p&gt;Users install our Chrome extension and sign up via &lt;strong&gt;Supabase Auth&lt;/strong&gt; with Google OAuth. Once authenticated, they can capture any screen content with a simple &lt;strong&gt;Alt+Shift&lt;/strong&gt; shortcut. The recording (max 60 seconds) gets uploaded directly to &lt;strong&gt;&lt;a href="https://mux.com" rel="noopener noreferrer"&gt;Mux&lt;/a&gt;&lt;/strong&gt; for processing. Mux handles video storage, streaming, thumbnail generation, and AI-powered transcription. Users add tags, notes, and categories while saving. The dashboard provides a beautiful gallery view with GIF previews, full-text search across titles/tags/transcripts, and detailed analytics per fragment. Everything syncs in real-time via &lt;strong&gt;&lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;&lt;/strong&gt; PostgreSQL database.&lt;/p&gt;
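&lt;p&gt;The capture-to-Mux handoff takes just two requests from the extension's side. A minimal sketch, assuming a &lt;code&gt;/api/mux/upload&lt;/code&gt; route that wraps the &lt;code&gt;createUpload()&lt;/code&gt; call shown later in this post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch: ask our backend for a Mux direct-upload URL, then PUT the recording.
// The route name and recordingBlob are illustrative placeholders.
declare const recordingBlob: Blob; // the captured screen recording

const { uploadUrl } = await fetch("/api/mux/upload", { method: "POST" })
  .then((res) =&gt; res.json());

await fetch(uploadUrl, { method: "PUT", body: recordingBlob }); // direct-to-Mux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;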

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  Codebase / App Repository 🔗
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fragments&lt;/strong&gt; 👉  &lt;a href="https://github.com/neilblaze/fragments" rel="noopener noreferrer"&gt;github.com/neilblaze/fragments&lt;/a&gt; [Open Source] &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragments Extension&lt;/strong&gt; 👉  &lt;a href="https://github.com/Neilblaze/fragments/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;github.com/Neilblaze/fragments/releases/tag/v1.0.0&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Neilblaze" rel="noopener noreferrer"&gt;
        Neilblaze
      &lt;/a&gt; / &lt;a href="https://github.com/Neilblaze/fragments" rel="noopener noreferrer"&gt;
        fragments
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Tiny Fragments making your Life Full ✨
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Fragments&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;Fragments is a web-app and chromium extension designed for creators who believe that great work is shaped by intentional consumption. Ideas do not appear in isolation. They emerge from moments you notice, save, and revisit over time. In many creative fields, a short video clip can convey more insight than pages of text.&lt;/p&gt;
&lt;p&gt;Fragments helps you capture, organize, and rediscover those moments so they are available when you need them most, built with &lt;a href="http://mux.com/" rel="nofollow noopener noreferrer"&gt;Mux&lt;/a&gt; and &lt;a href="https://supabase.com/" rel="nofollow noopener noreferrer"&gt;Supabase&lt;/a&gt; for the &lt;a href="https://dev.to/devteam/devs-worldwide-show-and-tell-challenge-presented-by-mux-pitch-your-projects-3000-in-prizes-40g7" rel="nofollow"&gt;DEV's Worldwide Show and Tell Challenge 2025&lt;/a&gt;.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Tech Stack&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Next.js 15&lt;/li&gt;
&lt;li&gt;Mux (Video processing and streaming and AI features)&lt;/li&gt;
&lt;li&gt;Supabase (Authentication and Database)&lt;/li&gt;
&lt;li&gt;Tailwind CSS + Shadcn UI&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Prerequisites&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Node.js 20 or higher&lt;/li&gt;
&lt;li&gt;Mux account and API tokens&lt;/li&gt;
&lt;li&gt;Supabase project and service role key&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Env Configuration&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;Create a .env file in the root directory with the following variables.&lt;/p&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key

MUX_TOKEN_ID=your_mux_token_id
MUX_TOKEN_SECRET=your_mux_token_secret
MUX_WEBHOOK_SECRET=your_mux_webhook_secret

DASHBOARD_URL=http://localhost:3000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Database Setup&lt;/h2&gt;

&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Open your Supabase…&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Neilblaze/fragments" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  Features 🎠
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt; with invisible capture (Alt+Shift shortcut, max 60s)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mux Video Processing&lt;/strong&gt; for streaming, thumbnails, and GIF previews&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Transcription&lt;/strong&gt; via Mux's auto-generated captions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Search&lt;/strong&gt; across titles, tags, notes, and transcripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neobrutalist UI&lt;/strong&gt; with RetroUI-inspired design (Tailwind + ShadCN)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google OAuth&lt;/strong&gt; via Supabase Authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public/Private Sharing&lt;/strong&gt; with NSFW detection and age verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit-style Voting&lt;/strong&gt; (upvotes/downvotes) on community fragments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;View Analytics&lt;/strong&gt; tracking per fragment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MP4 Downloads&lt;/strong&gt; via Mux Static Renditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tag Management&lt;/strong&gt; with popular tag suggestions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NSFW Restrictions&lt;/strong&gt; with NSFW.js on every fragment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responsive Design&lt;/strong&gt; works on desktop and mobile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Updates&lt;/strong&gt; via Supabase subscriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-first&lt;/strong&gt; with Row Level Security (RLS) policies&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy &amp;amp; Security 🔐
&lt;/h3&gt;

&lt;p&gt;Fragments deals with personal video captures, which can be sensitive. We've implemented &lt;strong&gt;Row Level Security (RLS)&lt;/strong&gt; policies in Supabase ensuring users can only access their own fragments. Public sharing is explicit and opt-in. NSFW content is detected and requires age verification to view. All API calls are authenticated, and video processing happens securely through Mux's infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ggob5phm8ic28vptl77.png" alt="transparent-divider"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Background 📜
&lt;/h3&gt;

&lt;p&gt;Here's the thing — &lt;strong&gt;creators consume thousands of videos&lt;/strong&gt; but only a handful of moments truly matter. Those 10-second clips that demonstrate a technique, explain a concept, or inspire an idea. But where do they go? 😔&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  The problem with current solutions:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bookmarks get lost, notes lack context, and rewatching entire videos to find that one moment is painful&lt;/strong&gt;. Creators need a system that captures moments &lt;em&gt;in context&lt;/em&gt;, makes them searchable, and surfaces them when relevant. &lt;strong&gt;It's 2026, and we're still losing inspiration to forgotten browser tabs&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The core problem is that video content is hard to search and organize. Text notes can be searched, but video moments require watching. Until now.&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Traditional Approach&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Fragments Solution&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Bookmarking full videos&lt;/td&gt;
      &lt;td&gt;🟢 Capture only the moment that matters&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Text notes without visual context&lt;/td&gt;
      &lt;td&gt;🟢 Video clips with searchable transcripts&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Scattered across multiple apps&lt;/td&gt;
      &lt;td&gt;🟢 Single organized library&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Can't search video content&lt;/td&gt;
      &lt;td&gt;🟢 Full-text search across transcripts&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 No way to preview quickly&lt;/td&gt;
      &lt;td&gt;🟢 Looping GIF previews in gallery&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 No sharing workflow&lt;/td&gt;
      &lt;td&gt;🟢 Public/private with community feed&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fragments transforms passive video consumption into an active, searchable knowledge base. Capture what matters, search by what was said, and build your visual second brain! &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Fragments&lt;/strong&gt; changes the game by &lt;em&gt;making video moments as searchable as text, as organized as notes, and as shareable as links&lt;/em&gt;! 👪&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Mux?
&lt;/h3&gt;

&lt;p&gt;Video processing is make-or-break for a tool like Fragments. &lt;strong&gt;Mux&lt;/strong&gt; gives us everything we need without the headaches.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mux handles video upload, processing, streaming, thumbnail generation, GIF creation, and AI transcription — all through a single, elegant API. This lets us focus on the user experience instead of video infrastructure.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Mux Integration in Fragments 🎬&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;muxService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createUpload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;corsOrigin&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getMux&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mux&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;uploads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;cors_origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;corsOrigin&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;new_asset_settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;playback_policy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="na"&gt;static_renditions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;resolution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;highest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                    &lt;span class="na"&gt;generated_subtitles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                        &lt;span class="na"&gt;language_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;English (auto)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;uploadUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;uploadId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="nf"&gt;getThumbnailUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png?time=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="nf"&gt;getGifUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/animated.gif?start=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;end=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mux powers our entire video stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upload &amp;amp; Processing&lt;/strong&gt; — Direct uploads from the extension&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt; — HLS playback via @mux/mux-player-react&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnails &amp;amp; GIFs&lt;/strong&gt; — Instant preview generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Transcription&lt;/strong&gt; — Auto-generated captions searchable in our database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Renditions&lt;/strong&gt; — MP4 downloads for users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics&lt;/strong&gt; — View counts and engagement metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FLsB4G3Gz%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Data Overview&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;

    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FvTjG5q61%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Assets&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;

    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F9fpVLFqy%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Engagement&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;
  &lt;/tr&gt;

  &lt;tr&gt;
    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FSRXhQ6tm%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;

    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FGhyCZjyH%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Error Logs&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;

    &lt;td&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FzvHsYJV1%2Fimage.png"&gt;
      
        &lt;center&gt;&lt;strong&gt;Views Metrics&lt;/strong&gt;&lt;/center&gt;
      
    &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We dove deep into &lt;strong&gt;Mux's API&lt;/strong&gt; for webhooks, asset management, and playback customization. Building a video-first app was a learning curve — we had to understand encoding, streaming protocols, and optimal UX for video galleries. &lt;strong&gt;Mux made it manageable!&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Use of Mux (Additional Prize Category) 🎬
&lt;/h3&gt;

&lt;p&gt;As mentioned before, Fragments utilizes &lt;strong&gt;7 distinct Mux features&lt;/strong&gt; beyond just video hosting. Here's a deep dive into our implementation:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Direct Uploads (Extension → Mux)
&lt;/h4&gt;

&lt;p&gt;We use &lt;a href="https://docs.mux.com/guides/upload-files-directly" rel="noopener noreferrer"&gt;Mux Direct Uploads&lt;/a&gt; to enable our Chrome extension to upload screen recordings directly to Mux without routing through our server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Chrome extension uploads directly to Mux&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mux&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;uploads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;cors_origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;new_asset_settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;playback_policy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;static_renditions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;resolution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;highest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="na"&gt;generated_subtitles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;language_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;English (auto)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Extension uses upload.url to PUT video directly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Mux Player React
&lt;/h4&gt;

&lt;p&gt;We use &lt;a href="https://github.com/muxinc/elements/tree/main/packages/mux-player-react" rel="noopener noreferrer"&gt;@mux/mux-player-react&lt;/a&gt; for seamless HLS playback with built-in controls, customizable theming, and analytics tracking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MuxPlayer&lt;/span&gt;
    &lt;span class="na"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;video_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;video_title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;player_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Fragments Dashboard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;accentColor&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"#ff6101"&lt;/span&gt;
    &lt;span class="na"&gt;streamType&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"on-demand"&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Dynamic Thumbnails
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://docs.mux.com/guides/get-images-from-a-video" rel="noopener noreferrer"&gt;Mux Image API&lt;/a&gt; generates thumbnails on-the-fly for our gallery cards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Thumbnails at any timestamp&lt;/span&gt;
&lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png?time=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;width=640`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Animated GIF Previews
&lt;/h4&gt;

&lt;p&gt;We create looping GIF previews for gallery hover states using Mux's GIF endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 5-second looping previews&lt;/span&gt;
&lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/animated.gif?start=0&amp;amp;end=5&amp;amp;fps=15&amp;amp;width=320`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. AI-Powered Auto-Transcription
&lt;/h4&gt;

&lt;p&gt;We enable &lt;a href="https://docs.mux.com/guides/add-autogenerated-captions" rel="noopener noreferrer"&gt;auto-generated subtitles&lt;/a&gt; during asset creation. The transcripts are searchable in our database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="na"&gt;generated_subtitles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;language_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;English (auto)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6. Static Renditions (MP4 Downloads)
&lt;/h4&gt;

&lt;p&gt;Users can download fragments as MP4 files via &lt;a href="https://docs.mux.com/guides/enable-static-mp4-renditions" rel="noopener noreferrer"&gt;Static Renditions&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Download URL for users&lt;/span&gt;
&lt;span class="s2"&gt;`https://stream.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/highest.mp4?download=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  7. Webhooks + NSFW Moderation Pipeline
&lt;/h4&gt;

&lt;p&gt;Our most creative use of Mux! We listen to &lt;a href="https://docs.mux.com/guides/listen-for-webhooks" rel="noopener noreferrer"&gt;Mux Webhooks&lt;/a&gt; and trigger content moderation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Webhook handler: video.asset.ready&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;video.asset.ready&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Update fragment status&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fragments&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;mux_asset_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;assetId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;mux_playback_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ready&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;thumbnail_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Trigger NSFW detection using Mux thumbnails!&lt;/span&gt;
    &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/moderate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;fragmentId&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
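&lt;p&gt;Since the webhook endpoint is public, we also verify Mux's signature before trusting a payload. Here's a minimal sketch, assuming a Node runtime and a &lt;code&gt;MUX_WEBHOOK_SECRET&lt;/code&gt; env var (header parsing is simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import crypto from "crypto";

// Mux signs `${timestamp}.${rawBody}` with the webhook secret and sends it
// in the `mux-signature` header as `t=&amp;lt;timestamp&amp;gt;,v1=&amp;lt;hex hmac&amp;gt;`
function verifyMuxSignature(rawBody: string, header: string, secret: string): boolean {
    const parts = Object.fromEntries(header.split(",").map((kv) =&amp;gt; kv.split("=")));
    const expected = crypto
        .createHmac("sha256", secret)
        .update(`${parts.t}.${rawBody}`)
        .digest("hex");
    return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(parts.v1));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;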



&lt;p&gt;&lt;strong&gt;The NSFW moderation leverages Mux thumbnails&lt;/strong&gt; — we extract frames at multiple timestamps and run them through OpenAI's &lt;code&gt;omni-moderation-latest&lt;/code&gt; model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Moderation uses Mux thumbnail API for frame extraction&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;thumbnailUrls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png?time=1&amp;amp;width=640`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png?time=5&amp;amp;width=640`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`https://image.mux.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;playbackId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/thumbnail.png?time=10&amp;amp;width=640`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Each frame is analyzed for NSFW content&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;moderations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;omni-moderation-latest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image_url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;thumbnailUrl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If flagged, the fragment is automatically marked as NSFW and requires age verification to view publicly. Note that, after exhausting our OpenAI credits, we rolled this feature back to &lt;a href="https://nsfwjs.com/" rel="noopener noreferrer"&gt;NSFW.js&lt;/a&gt;, which runs an ultra-lightweight on-device MobileNet v2 model (4.2 MB) and works flawlessly!&lt;/p&gt;
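&lt;p&gt;For reference, the client-side fallback looks roughly like this (the flagged classes and the &lt;code&gt;0.7&lt;/code&gt; threshold are illustrative choices, not our exact production values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import * as nsfwjs from "nsfwjs";

// Classify a rendered frame entirely on-device with NSFW.js
async function isFrameNsfw(img: HTMLImageElement): Promise&amp;lt;boolean&amp;gt; {
    const model = await nsfwjs.load(); // default MobileNet v2 weights
    const predictions = await model.classify(img);
    return predictions.some(
        (p) =&amp;gt; ["Porn", "Hentai", "Sexy"].includes(p.className) &amp;amp;&amp;amp; p.probability &amp;gt; 0.7,
    );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;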


&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  Design 🎨
&lt;/h2&gt;

&lt;p&gt;We were heavily inspired by the &lt;strong&gt;Neobrutalist&lt;/strong&gt; design — bold borders, stark shadows, and intentional imperfection. Our UI uses &lt;strong&gt;&lt;a href="https://github.com/Logging-Studio/RetroUI" rel="noopener noreferrer"&gt;RetroUI&lt;/a&gt;&lt;/strong&gt; components combined with &lt;strong&gt;Tailwind CSS&lt;/strong&gt; and &lt;strong&gt;ShadCN UI&lt;/strong&gt; for a distinctive aesthetic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capture&lt;/strong&gt;: Minimal, invisible UI that doesn't interrupt flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Organize&lt;/strong&gt;: Tags, categories, and notes for context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search&lt;/strong&gt;: Find by what was said, not just titles (see the sketch below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt;: Public community feed with voting&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
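&lt;p&gt;The transcript search mentioned above boils down to a single full-text query against Supabase. A sketch, assuming a &lt;code&gt;transcript&lt;/code&gt; column on our &lt;code&gt;fragments&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Full-text search over auto-generated captions (column name assumed);
// `supabase` client and `query` string are assumed to be in scope
const { data } = await supabase
    .from("fragments")
    .select("id, title, mux_playback_id")
    .textSearch("transcript", query, { type: "websearch" });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;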

&lt;p&gt;We focused on making the gallery feel like a creative workspace — GIF previews loop automatically, cards have subtle hover effects, and the search experience is fast and intuitive.&lt;/p&gt;
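&lt;p&gt;To give a flavor of the styling, here's a hypothetical gallery card (the Tailwind classes are illustrative, not lifted from our codebase):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;// Bold border, hard offset shadow, and a playful hover nudge
export function FragmentCard({ title, gifUrl }: { title: string; gifUrl: string }) {
    return (
        &amp;lt;div className="border-4 border-black bg-white shadow-[6px_6px_0_#000] transition-transform hover:-translate-x-1 hover:-translate-y-1"&amp;gt;
            &amp;lt;img src={gifUrl} alt={title} loading="lazy" /&amp;gt;
            &amp;lt;p className="p-2 font-mono font-bold"&amp;gt;{title}&amp;lt;/p&amp;gt;
        &amp;lt;/div&amp;gt;
    );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;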

&lt;p&gt;&lt;strong&gt;CREDITS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design Resources&lt;/strong&gt;: RetroUI, Neobrutalist principles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Icons&lt;/strong&gt;: Lucide React, Hugeicons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typography&lt;/strong&gt;: Syne, Space Mono, Geist Mono&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Challenges we ran into 😤
&lt;/h3&gt;

&lt;p&gt;Building a screen capture extension + video platform brought some interesting technical challenges.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The biggest headache was getting screen capture to work reliably across different websites and tab contexts.&lt;/strong&gt; Chrome's Manifest V3 has strict security policies, and coordinating the capture → upload → process → display flow required careful state management.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Real-time video processing&lt;/strong&gt; meant handling async webhooks from Mux and updating the UI accordingly. We implemented polling for status updates while waiting for transcription to complete.&lt;/p&gt;
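&lt;p&gt;The polling loop itself is simple. A sketch (the endpoint path is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Poll our API until the Mux webhook has flipped the fragment to "ready"
async function waitUntilReady(fragmentId: string, intervalMs = 3000, maxAttempts = 100) {
    for (let i = 0; i &amp;lt; maxAttempts; i++) {
        const res = await fetch(`/api/fragments/${fragmentId}`);
        const { status } = await res.json();
        if (status === "ready") return;
        await new Promise((r) =&amp;gt; setTimeout(r, intervalMs));
    }
    throw new Error("Timed out waiting for fragment processing");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;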

&lt;p&gt;&lt;strong&gt;Performance optimization&lt;/strong&gt; for the gallery was crucial — loading dozens of GIF previews without janky scrolling required lazy loading and careful memory management.&lt;/p&gt;
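&lt;p&gt;A sketch of that idea with an &lt;code&gt;IntersectionObserver&lt;/code&gt; (the data attributes and class name are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Swap in the heavy animated GIF only while a card is on screen
const observer = new IntersectionObserver((entries) =&amp;gt; {
    for (const entry of entries) {
        const img = entry.target as HTMLImageElement;
        img.src = entry.isIntersecting
            ? img.dataset.gifSrc!     // animated preview
            : img.dataset.posterSrc!; // cheap static thumbnail
    }
});
document.querySelectorAll("img.fragment-preview").forEach((el) =&amp;gt; observer.observe(el));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;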

&lt;p&gt;We're really proud of creating a capture experience that feels invisible and a gallery that makes rediscovery delightful! :)&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next? 🚀
&lt;/h2&gt;

&lt;p&gt;Fragments has serious potential to become the go-to visual knowledge base for creators. We've built the foundation, and there's so much more to explore!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we're building next:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser Integration&lt;/strong&gt;: Support for Firefox and Edge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collections&lt;/strong&gt;: Group related fragments into themed collections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Summaries&lt;/strong&gt;: Auto-generated descriptions for each fragment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt;: Share collections with teams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile App&lt;/strong&gt;: Capture from mobile screens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Access&lt;/strong&gt;: Let developers build on top of Fragments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're excited to expand Mux integration, improve search accuracy, and build a thriving creator community!&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h3&gt;
  
  
  End Notes 🙌🏻
&lt;/h3&gt;

&lt;p&gt;Huge thanks to &lt;a href="https://dev.to"&gt;DEV&lt;/a&gt; for hosting this challenge, the &lt;strong&gt;Mux team&lt;/strong&gt; for incredible video infrastructure and documentation, and to the open-source community for inspiration! 🙌&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissive License ⚖️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/neilblaze/fragments/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT License&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" alt="breaker.png"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>muxchallenge</category>
      <category>showandtell</category>
      <category>video</category>
    </item>
    <item>
      <title>Dawn — 𝘾𝙚𝙣𝙨𝙤𝙧𝙨𝙝𝙞𝙥 𝙍𝙚𝙨𝙞𝙨𝙩𝙖𝙣𝙩 𝙈𝙚𝙙𝙞𝙖 ⚡</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Sun, 07 Sep 2025 19:28:29 +0000</pubDate>
      <link>https://dev.to/neilblaze/dawn-1gl5</link>
      <guid>https://dev.to/neilblaze/dawn-1gl5</guid>
      <description>&lt;h2&gt;
  
  
  𝗖𝗮𝘁𝗲𝗴𝗼𝗿𝘆 𝗦𝘂𝗯𝗺𝗶𝘀𝘀𝗶𝗼𝗻 for the &lt;a href="https://dev.to/challenges/midnight-2025-08-20"&gt;Midnight Network "Privacy First" Challenge&lt;/a&gt;: &lt;strong&gt;Protect That Data prompt&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We'd also like to consider our project for the &lt;strong&gt;Best Tutorial&lt;/strong&gt; track!&lt;/p&gt;

&lt;p&gt;👥 &lt;strong&gt;Participants:&lt;/strong&gt; &lt;a class="mentioned-user" href="https://dev.to/neilblaze"&gt;@neilblaze&lt;/a&gt;, &lt;a class="mentioned-user" href="https://dev.to/sandipndev"&gt;@sandipndev&lt;/a&gt; &amp;amp; &lt;a class="mentioned-user" href="https://dev.to/subhamx"&gt;@subhamx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Video ▶️
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/aOMphbtUe2M"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;TIP:&lt;/strong&gt; Watch at 1.25x for a better experience!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI-Demo ▶️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://asciinema.org/a/738916" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf3lcvqnjgmujiqhlaje.gif" alt="CLI_Demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🔎 Why we built it
&lt;/h2&gt;

&lt;p&gt;Today, whistleblowers and citizens face two major problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fear of retaliation:&lt;/strong&gt; Speaking out against powerful entities risks jobs, reputations, and even lives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Censorship of evidence:&lt;/strong&gt; Reports can be deleted, altered, or ignored, leaving no permanent public record.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dawn&lt;/strong&gt; solves this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allowing &lt;em&gt;anyone&lt;/em&gt; to publish anonymously but with verifiable authenticity via attestation&lt;/li&gt;
&lt;li&gt;Making every report &lt;em&gt;permanent&lt;/em&gt; and &lt;em&gt;censorship-resistant&lt;/em&gt; through blockchain&lt;/li&gt;
&lt;li&gt;Organizing reports into Boards (categorization) so the public, journalists, and NGOs can discover, verify, and act on credible information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 In short: we protect voices, preserve evidence, and rebuild trust in media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🌘 What is Dawn?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dawn&lt;/strong&gt; is a censorship-resistant media platform built on the &lt;a href="https://midnight.network" rel="noopener noreferrer"&gt;Midnight Network&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It’s designed to ensure that your voice cannot be silenced, while the authenticity of every report is cryptographically verified.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nmpa2269jz4uek80vxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nmpa2269jz4uek80vxl.png" alt="screenshot_array"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Users can publish Reports into thematic Bulletins (like Governance, Healthcare, or Corporate). Each report is immutable, linked to a decentralized store, and backed by zero-knowledge attestations that prove the author’s role without revealing their identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🌟 Why We Stand Out
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lace Wallet Integration in the Web&lt;/strong&gt;: One of the only projects with a fully functioning end-to-end Midnight Lace Wallet integration in the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Working Deployment&lt;/strong&gt;: &lt;a href="https://midnightodawn.xyz" rel="noopener noreferrer"&gt;Midnightodawn&lt;/a&gt; is live, publicly accessible, and works on any Chromium browser with the Midnight Lace Wallet extension&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Attestation Service&lt;/strong&gt;: its signed proofs are verified against its public key inside our smart contract&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj7n8blfy00tutohc8mn.png" alt="Breaker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ⛓ Nodes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attestation Service&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Uses an LLM to determine board access for a user based on their email&lt;/li&gt;
&lt;li&gt;Generates signed attestations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://attestor.midnightodon.xyz" rel="noopener noreferrer"&gt;https://attestor.midnightodon.xyz&lt;/a&gt;, &lt;a href="https://mail.midnightodawn.xyz" rel="noopener noreferrer"&gt;https://mail.midnightodawn.xyz&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;User Interface&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Clean web UI deployed to Vercel&lt;/li&gt;
&lt;li&gt;Features: connect wallet, browse Reports, filter by board, view/download attached files&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Command Line Interface&lt;/strong&gt; (CLI)

&lt;ul&gt;
&lt;li&gt;Alternate client for publishing Reports without using the web UI&lt;/li&gt;
&lt;li&gt;Uses same attestation and contract verification pipeline&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🏃🏻‍♂️ Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Knowledge Role Verification&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Users prove eligibility (e.g., government employee, healthcare professional, citizen) via attestations&lt;/li&gt;
&lt;li&gt;Attestation service issues a signed board proof after OTP email verification&lt;/li&gt;
&lt;li&gt;Smart contract verifies proofs with stored EDDSA (Poseidon) keys&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Censorship-Resistant Publishing&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Publish immutable Reports with title, content, summary, and attachments (PDFs, docs)&lt;/li&gt;
&lt;li&gt;Reports are organized into Bulletins under Categories&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Wallet Integration&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Deployed on Midnight testnet&lt;/li&gt;
&lt;li&gt;Complete end-to-end integration through Lace Wallet for transaction signing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Smart Contracts&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Witness-based verification of signed attestations&lt;/li&gt;
&lt;li&gt;On-chain state stores Reports, linked to decentralized storage&lt;/li&gt;
&lt;li&gt;Supports multiple contract deployments for duplicate instances&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Infrastructure &amp;amp; Deployment&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Web frontend on Vercel&lt;/li&gt;
&lt;li&gt;Attestation/email services containerized with Docker Compose, running via static IP&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Continuous testing with GitHub Actions&lt;/strong&gt; for smart contracts&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  📢 Proof of Work
&lt;/h2&gt;

&lt;p&gt;You can see and interact with the DApp yourself using the links below. We encourage you to try it so you can get a feel for our application.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; You'll need a funded Lace Midnight Wallet on Midnight Network Testnet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🏠 &lt;strong&gt;Portal:&lt;/strong&gt; &lt;a href="https://midnightodawn.xyz" rel="noopener noreferrer"&gt;https://midnightodawn.xyz&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔗 &lt;strong&gt;Dummy Emails:&lt;/strong&gt; &lt;a href="https://mail.midnightodawn.xyz" rel="noopener noreferrer"&gt;https://mail.midnightodawn.xyz&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔗 &lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/sandipndev/dawn" rel="noopener noreferrer"&gt;https://github.com/sandipndev/dawn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💎 &lt;strong&gt;BONUS:&lt;/strong&gt; Follow the &lt;a href="https://github.com/sandipndev/dawn/blob/main/readme.md" rel="noopener noreferrer"&gt;&lt;em&gt;getting started&lt;/em&gt;&lt;/a&gt; guide to set up &lt;a href="https://midnightodawn.xyz/" rel="noopener noreferrer"&gt;Dawn&lt;/a&gt; locally! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h0814uhtrh5sl8t8blj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h0814uhtrh5sl8t8blj.png" alt="techy_blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ How it works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Visit the portal&lt;/strong&gt; → &lt;a href="https://midnightodawn.xyz" rel="noopener noreferrer"&gt;midnightodawn.xyz&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect Lace Wallet&lt;/strong&gt; → Authorize the DApp in your &lt;a href="https://chromewebstore.google.com/detail/lace-midnight-preview/hgeekaiplokcnmakghbdfbgnlfheichg" rel="noopener noreferrer"&gt;Midnight Lace Wallet&lt;/a&gt; (testnet)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browse Reports&lt;/strong&gt; → Instantly see all existing Reports and filter them by board/category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a Report&lt;/strong&gt; → Click Create Report to start publishing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get an Attestation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Enter your email → receive a one-time code (via our &lt;a href="https://mail.midnightodawn.xyz" rel="noopener noreferrer"&gt;toy SMTP service&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Paste the code to verify → the Attestation Service issues a signed proof of which boards you can post to&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write your Report&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Select your Board → Attestation grants you access to boards. Everyone can post to Citizen Board, but only verified domains can access Government or Healthcare boards&lt;/li&gt;
&lt;li&gt;Fill in Title, Content, Summary, and optional Footnotes/References.&lt;/li&gt;
&lt;li&gt;Content and Footnotes support rich text and file uploads (e.g., PDFs)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Submit the Report → sign the transaction in Lace Wallet&lt;/li&gt;
&lt;li&gt;The smart contract verifies the attestation on-chain using stored public keys (EDDSA over Poseidon; see the sketch after this list)&lt;/li&gt;
&lt;li&gt;If valid, the Report is permanently stored, linked to its off-chain attachments&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Done&lt;/strong&gt; 🎉 → The Report appears in the list instantly. Your eligibility is proven, but your identity is never revealed or saved&lt;/li&gt;
&lt;/ol&gt;
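&lt;p&gt;For intuition, here's a minimal off-chain sketch of that final signature check. The real verification runs inside our smart contract; the field names and the &lt;code&gt;circomlibjs&lt;/code&gt; helpers below are illustrative assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { buildEddsa, buildPoseidon } from "circomlibjs";

// Illustrative attestation shape; the contract uses its own on-chain types
type Attestation = { boardId: bigint; nullifier: bigint; signature: any };

async function verifyAttestation(att: Attestation, attestorPubKey: any) {
    const eddsa = await buildEddsa();
    const poseidon = await buildPoseidon();
    // Hash the claim, then check the attestor's EdDSA-over-Poseidon signature
    const msg = poseidon([att.boardId, att.nullifier]);
    return eddsa.verifyPoseidon(msg, att.signature, attestorPubKey);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;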

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System Architecture 📊
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtixo1lgtmfiyk8iqw9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtixo1lgtmfiyk8iqw9x.png" alt="SysArch.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 From Demo to Vision
&lt;/h2&gt;

&lt;p&gt;Our hackathon demo proves the concept. The MVP vision goes further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pluggable verifiers:&lt;/strong&gt; support multiple attestation methods (gov councils, NGOs, zk-email, Merkle roots).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spam resistance:&lt;/strong&gt; rate-limiting nullifiers to prevent Sybil abuse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Richer discovery tools:&lt;/strong&gt; allow journalists and citizens to navigate Bulletins, Sub-Categories, and Categories at scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t5ozp7ntc1n37arp6o5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t5ozp7ntc1n37arp6o5.png" alt="anonymous_face_fukCensor.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiavlt6owfse2hf24zd7z.png" alt="blank_space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🌅 Closing
&lt;/h2&gt;

&lt;p&gt;Midnightodawn demonstrates how censorship-resistant media can be built on Midnight. Every Report is permanent, every author is protected, and every piece of evidence can be trusted.&lt;/p&gt;

&lt;p&gt;From Midnight to Dawn, truth survives the night!&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>midnightchallenge</category>
      <category>web3</category>
      <category>blockchain</category>
    </item>
    <item>
      <title>Wynnie 🦄 — 𝘠𝘰𝘶𝘳 𝘚𝘩𝘰𝘱𝘱𝘪𝘯𝘨, 𝘯𝘰𝘸 𝘰𝘯 𝘈𝘶𝘵𝘰𝘱𝘪𝘭𝘰𝘵!</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Mon, 28 Jul 2025 02:17:17 +0000</pubDate>
      <link>https://dev.to/neilblaze/wynnie--4po3</link>
      <guid>https://dev.to/neilblaze/wynnie--4po3</guid>
      <description>&lt;h2&gt;
  
  
  𝗖𝗮𝘁𝗲𝗴𝗼𝗿𝘆 𝗦𝘂𝗯𝗺𝗶𝘀𝘀𝗶𝗼𝗻: &lt;strong&gt;Business Automation Voice Agent&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We'd also like to consider our project under the &lt;strong&gt;Real-Time Voice Performance&lt;/strong&gt; track.&lt;/p&gt;

&lt;p&gt;Participants: &lt;a class="mentioned-user" href="https://dev.to/neilblaze"&gt;@neilblaze&lt;/a&gt; &amp;amp; &lt;a class="mentioned-user" href="https://dev.to/achalbajpai"&gt;@achalbajpai&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Video ▶️
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/V3EziqzxxLQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built 🤔
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/achal-b/wynnie/" rel="noopener noreferrer"&gt;Wynnie&lt;/a&gt;&lt;/strong&gt; is your smart &lt;strong&gt;autonomous AI shopping companion / agent&lt;/strong&gt; that revolutionizes how people shop online through simple &lt;em&gt;natural language&lt;/em&gt;. It's like having a personal shopping genie 🧞 that actually &lt;em&gt;understands what you want&lt;/em&gt; and handles everything automatically, starting from finding products to getting the best deals, it got all covered! We built this using &lt;strong&gt;AssemblyAI's speech recognition&lt;/strong&gt; that can detect over 50 languages on the fly! ⚡ &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k1bmch7z8wfpwtits4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k1bmch7z8wfpwtits4f.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 The whole idea behind a smart system like this is that it &lt;em&gt;eliminates all the tedious shopping work&lt;/em&gt; while making everything accessible to literally anyone - whether you speak English, Hindi, Spanish, or whatever. Plus it scales incredibly well because each user gets their own personalized shopping brain. Most importantly, it caters to the huge mass of elderly people who have money to spend and need to shop, but are being shut out by friction designed for digital natives. That's millions of potential customers sitting there, frustrated and underserved, and they're exactly who we (and business owners) want to serve!&lt;/p&gt;

&lt;p&gt;🏠&lt;strong&gt;Homepage&lt;/strong&gt;: &lt;a href="https://wynnie-v1.vercel.app" rel="noopener noreferrer"&gt;https://wynnie-v1.vercel.app&lt;/a&gt;&lt;br&gt;
🔗&lt;strong&gt;Try it out here&lt;/strong&gt;: &lt;a href="https://wynnie.vercel.app" rel="noopener noreferrer"&gt;https://wynnie.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;
&lt;h2&gt;
  
  
  How it works? 💣
&lt;/h2&gt;

&lt;p&gt;Users simply sign up using Google OAuth, and we handle the same via &lt;a href="https://firebase.google.com/docs/auth" rel="noopener noreferrer"&gt;Firebase&lt;/a&gt;. Once they’re in, they land on the dashboard, and from there, they can ask for anything, either by typing it out or just talking. If it’s voice, &lt;a href="https://www.assemblyai.com/" rel="noopener noreferrer"&gt;AssemblyAI&lt;/a&gt;'s speech recognition kicks in to transcribe everything in real-time, even down to the word-level timestamps and formatting. That transcription is then piped to &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, which pulls out what the user wants, any key details, and even picks up on tone or emotion. Then our AI agents jump in, digging through &lt;a href="https://serpapi.com" rel="noopener noreferrer"&gt;SERP APIs&lt;/a&gt; and &lt;a href="https://www.perplexity.ai/sonar" rel="noopener noreferrer"&gt;Perplexity Sonar&lt;/a&gt; to find the best product matches, factoring in things like location, coupons, and what they’ve liked before. Once the picks are ready, they hit the inventory, coupons get auto-applied using Synphase, and payments are seamlessly handled through &lt;a href="https://www.npci.org.in/what-we-do/upi-lite/upi-lite-x/product-overview" rel="noopener noreferrer"&gt;UPI LiteX&lt;/a&gt;. All of it’s tracked and stored in &lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;, keeping everything clean, secure, and seamless.&lt;/p&gt;
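&lt;p&gt;To make the voice step concrete, here's a minimal sketch with the AssemblyAI Node SDK (variable names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! });

const recordingUrl = "https://example.com/voice-command.webm"; // illustrative

// Transcribe a voice command, letting AssemblyAI detect the language
const transcript = await client.transcripts.transcribe({
    audio: recordingUrl,      // URL of the user's captured audio
    language_detection: true, // auto-detects among 50+ languages
    format_text: true,        // punctuation + casing before the OpenAI step
});

console.log(transcript.text, transcript.language_code);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;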

&lt;p&gt;ㅤ&lt;/p&gt;
&lt;h2&gt;
  
  
  App Repository 🔗
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Wynnie.AI&lt;/strong&gt; 👉  &lt;a href="https://github.com/achal-b/wynnie/" rel="noopener noreferrer"&gt;https://github.com/achal-b/wynnie&lt;/a&gt; [Open source on GitHub]&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;
&lt;h2&gt;
  
  
  Features 🎠
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Autonomous AI shopping buddy that handles the whole flow, from search to checkout!&lt;/li&gt;
&lt;li&gt;AssemblyAI voice recognition that works across 50+ languages automatically!&lt;/li&gt;
&lt;li&gt;Scales like crazy with our multi-agent orchestrated architecture&lt;/li&gt;
&lt;li&gt;Real-time product hunting using Perplexity AI (&lt;a href="https://sonar.perplexity.ai" rel="noopener noreferrer"&gt;Sonar&lt;/a&gt;) &amp;amp; &lt;a href="https://serpapi.com" rel="noopener noreferrer"&gt;SERP API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Smart deal finder that optimizes your cart automatically!&lt;/li&gt;
&lt;li&gt;Intelligent delivery planning with eco-friendly routing&lt;/li&gt;
&lt;li&gt;Works offline as a Progressive Web App (PWA)!&lt;/li&gt;
&lt;li&gt;Speaks your language - literally any of 50+ languages, with &lt;a href="https://www.assemblyai.com/docs/speech-to-text/pre-recorded-audio/supported-languages" rel="noopener noreferrer"&gt;AssemblyAI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Knows who's talking with speaker identification &amp;amp; automatically isolates background noise. &lt;/li&gt;
&lt;li&gt;Clean, modern interface built with Next.js &amp;amp; Tailwind&lt;/li&gt;
&lt;li&gt;Google OAuth SSO via Google Firebase&lt;/li&gt;
&lt;li&gt;Live price tracking and bundle suggestions!&lt;/li&gt;
&lt;li&gt;True AI recommendations, fine-tuned to each user's experience!&lt;/li&gt;
&lt;li&gt;Supabase backend for blazing fast performance!&lt;/li&gt;
&lt;li&gt;Seamless payments via UPI LiteX, highly secure &amp;amp; E2E encrypted!&lt;/li&gt;
&lt;li&gt;Batteries included, with CI/CD via GitHub Actions.&lt;/li&gt;
&lt;li&gt;Saves you 💰 + tons of time!&lt;/li&gt;
&lt;li&gt;Works for everyone - accessibility first!&lt;/li&gt;
&lt;li&gt;Privacy-focused and GDPR* compliant!&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;ㅤ&lt;/p&gt;
&lt;h3&gt;
  
  
  System Architecture 📊
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffswk03acycv3dn1bz866.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffswk03acycv3dn1bz866.png" alt="SysArch.png"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Privacy &amp;amp; Security 🔐
&lt;/h3&gt;

&lt;p&gt;Wynnie deals with your shopping data and payment info, which is pretty sensitive stuff. We've gone all in on security to make sure everything stays locked down and &lt;strong&gt;100% GDPR compliant&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;All communication happens over encrypted channels, and we use Supabase's built-in security features. Voice data gets processed securely through AssemblyAI's endpoints. Down the road, we're planning full end-to-end encryption for everything.&lt;/p&gt;
&lt;h3&gt;
  
  
  Background 📜
&lt;/h3&gt;

&lt;p&gt;Here's the thing - online shopping is still a pain for way &lt;em&gt;too many&lt;/em&gt; people! &lt;strong&gt;Language barriers, confusing interfaces, and just the overwhelming number of choices&lt;/strong&gt; make it really hard for people to find what they actually need and get good deals. 😔&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vvxvpbf4ihsw3ez1e0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vvxvpbf4ihsw3ez1e0h.png" alt="Retailwire.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;b&gt;Source:&lt;/b&gt; &lt;a href="https://retailwire.com/discussion/are-retailers-making-it-too-tough-for-seniors-to-shop-online" rel="noopener noreferrer"&gt;RetailWire: Are retailers making it too tough for seniors to shop online?&lt;/a&gt;&lt;/center&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Most shopping sites basically dump you into this maze where you have to:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Navigate complex menus, compare tons of products manually, figure out which deals are actually good, and somehow optimize everything yourself&lt;/strong&gt;. If you don't speak the main language perfectly, or if you have accessibility needs, or if you just find tech interfaces confusing - you're pretty much out of luck. &lt;strong&gt;It's 2025 and we're still making people work way too hard just to buy stuff&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The core problem is that e-commerce platforms are built like databases instead of conversations. Nobody shops by filling out forms; we shop by talking about what we need.&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Traditional E-commerce Pain Points&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;&lt;center&gt;Wynnie's AI Solution&lt;/center&gt;&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Users manually search for products&lt;/td&gt;
      &lt;td&gt;🟢 AI-driven intent detection + voice/text input&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Overwhelming product listings&lt;/td&gt;
      &lt;td&gt;🟢 Personalized, context-aware recommendations&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 No clarity on best deals or coupons&lt;/td&gt;
      &lt;td&gt;🟢 Auto-applied coupons via Synphase Scraper&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Complex checkout flows&lt;/td&gt;
      &lt;td&gt;🟢 Streamlined voice-first ordering system&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Static dashboards and limited insights&lt;/td&gt;
      &lt;td&gt;🟢 Dynamic dashboard with conversational UX&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 No real-time decision feedback&lt;/td&gt;
      &lt;td&gt;🟢 LLM-as-Judge provides on-the-fly optimization&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Limited customer engagement&lt;/td&gt;
      &lt;td&gt;🟢 Conversational agents tailored to user needs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;🟠 Siloed services &amp;amp; fragmented UX&lt;/td&gt;
      &lt;td&gt;🟢 Unified AI Orchestrator with agent collaboration&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of making people navigate complex websites, Wynnie lets you just talk naturally about what you're looking for. The AI figures out your intent, researches products automatically, finds the best deals, and presents you with optimized options. It's like having a really smart friend who knows everything about shopping! &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Wynnie&lt;/strong&gt; changes the game by &lt;em&gt;understanding natural conversation, making smart decisions independently, and handling all the tedious optimization work automatically&lt;/em&gt;! 👪&lt;/p&gt;

&lt;p&gt;Beyond just understanding what you say, Wynnie does the heavy lifting with &lt;strong&gt;real-time product research through Perplexity AI&lt;/strong&gt;, &lt;strong&gt;automatic deal optimization&lt;/strong&gt;, and &lt;strong&gt;smart delivery planning&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The kicker? &lt;strong&gt;Users don't need to learn anything new&lt;/strong&gt; — just talk naturally about what you want! This makes shopping accessible to everyone, regardless of language, tech skills, or physical abilities.&lt;/p&gt;

&lt;p&gt;We're aiming for shopping that's &lt;em&gt;fast, smart, and genuinely helpful through AI conversations that understand context, preferences, and optimize for the best outcomes&lt;/em&gt; automatically! ✨&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9blv6t8q0tlaj3taqauj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9blv6t8q0tlaj3taqauj.png" alt="AgenticWorkflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our mission is making online shopping &lt;strong&gt;work for everyone&lt;/strong&gt; through natural AI conversations that connect people with exactly what they need through &lt;strong&gt;&lt;em&gt;Intelligent Shopping Automation&lt;/em&gt;&lt;/strong&gt;. &lt;/p&gt;


&lt;h3&gt;
  
  
  Snapshots 🖼️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwur5dfbjo29203ymuh9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwur5dfbjo29203ymuh9d.png" alt="ScreenshotPanel.png"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Why AssemblyAI?
&lt;/h3&gt;

&lt;p&gt;Voice recognition is make-or-break for accessible shopping. &lt;strong&gt;AssemblyAI's Universal Speech Model&lt;/strong&gt; gives us the accuracy and language support we need without the headaches.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AssemblyAI automatically detects what language someone's speaking from a list of 50+, figures out who's talking when, and gives us word-level timing. This lets us build shopping experiences that actually work for real people having real conversations.&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AssemblyAI Voice Processor for Wynnie 🦄&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;transcribeAudio&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;speech_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;universal&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;language_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;punctuate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;format_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;speaker_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;speakers_expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;AssemblyAITranscriptionRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AssemblyAITranscriptionResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;audioUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uploadAudio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transcriptionJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startTranscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;audioUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;speech_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;language_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;punctuate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;format_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;speaker_labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;speakers_expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completedTranscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pollForCompletion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transcriptionJob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;completedTranscription&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error transcribing audio with AssemblyAI:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
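
&lt;p&gt;The helpers referenced above (like &lt;code&gt;pollForCompletion&lt;/code&gt;) aren't shown here. A minimal version, assuming AssemblyAI's documented &lt;code&gt;GET /v2/transcript/{id}&lt;/code&gt; endpoint and an API key in an environment variable, could look roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch of the polling helper, simplified from our error handling.
async function pollForCompletion(transcriptId) {
  const url = `https://api.assemblyai.com/v2/transcript/${transcriptId}`;
  const headers = { authorization: process.env.ASSEMBLYAI_API_KEY };

  while (true) {
    const res = await fetch(url, { headers });
    const transcript = await res.json();

    if (transcript.status === 'completed') return transcript;
    if (transcript.status === 'error') {
      throw new Error(`Transcription failed: ${transcript.error}`);
    }
    // Still 'queued' or 'processing': wait a bit, then check again.
    await new Promise(function (resolve) { setTimeout(resolve, 3000); });
  }
}
&lt;/code&gt;&lt;/pre&gt;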



&lt;p&gt;&lt;strong&gt;AssemblyAI processes all our voice input&lt;/strong&gt; and turns natural speech into shopping intent. The multi-language support means we can help users regardless of what language feels most comfortable to them.&lt;/p&gt;

&lt;p&gt;Also, thanks for the $50 in credits, which helped us get started quickly! 🙏🏻&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6el89f0twe2r1xko4fl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6el89f0twe2r1xko4fl.png" alt="Cost_AssemblyAI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our &lt;strong&gt;AI agent system&lt;/strong&gt; (powered by OpenAI's &lt;a href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence" rel="noopener noreferrer"&gt;GPT-4o mini&lt;/a&gt;) &lt;strong&gt;generates fast, smart shopping recommendations&lt;/strong&gt;: &lt;strong&gt;AssemblyAI handles the voice input, and our orchestrator coordinates specialized agents&lt;/strong&gt; for finding products, optimizing deals, and planning delivery. Everything works together seamlessly! 🙂&lt;/p&gt;
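
&lt;p&gt;For a rough idea of the intent-extraction step (the prompt and JSON shape here are illustrative, not the exact ones we ship):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch: pulling shopping intent out of a transcript with
// GPT-4o mini via OpenAI's Chat Completions API.
async function extractIntent(transcriptText) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Extract the shopping intent, key details, and tone from the user message. Reply as JSON.' },
        { role: 'user', content: transcriptText },
      ],
      response_format: { type: 'json_object' },
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
&lt;/code&gt;&lt;/pre&gt;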

&lt;p&gt;We dove deep into &lt;strong&gt;AssemblyAI's advanced features&lt;/strong&gt; like speaker diarization and confidence scoring. &lt;em&gt;Building voice-first interfaces&lt;/em&gt; was &lt;strong&gt;definitely a learning curve&lt;/strong&gt; since most of us come from traditional web development. &lt;strong&gt;We had to study voice interaction patterns and accessibility guidelines from scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Besides AssemblyAI, we learned tons about autonomous agent architectures, real-time AI coordination, and building systems that actually scale. 🌟&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  Design 🎨
&lt;/h2&gt;

&lt;p&gt;We were heavily inspired by the revised version of the &lt;strong&gt;Double Diamond&lt;/strong&gt; design process, a model popularized by the &lt;a href="https://www.designcouncil.org.uk/our-work/news-opinion/double-diamond-universally-accepted-depiction-design-process/" rel="noopener noreferrer"&gt;British Design Council&lt;/a&gt;, which covers not just visual design but a full-fledged research cycle: you discover and define your problem before tackling your solution &amp;amp; finally deploying it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7wt90i43psamzewcegl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7wt90i43psamzewcegl.png" alt="Design Process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt;: Understanding why current shopping experiences fail so many people.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define&lt;/strong&gt;: Figuring out what an autonomous shopping agent actually needs to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop&lt;/strong&gt;: Building the multi-agent system that handles real conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver&lt;/strong&gt;: Launching with PWA support and continuous learning from real users.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;We used Figma extensively, focusing on voice interaction flows and accessibility patterns. Our friend &lt;a href="https://www.linkedin.com/in/praveenlodhiofficial" rel="noopener noreferrer"&gt;Praveen&lt;/a&gt; did the user testing, which helped us refine how the AI responds and when it asks for clarification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CREDITS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design Resources&lt;/strong&gt;: Figma Community, Web Accessibility Initiative&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Icons&lt;/strong&gt;: Lucide React, accessibility-focused icon sets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typography&lt;/strong&gt;: Manrope and other system fonts for maximum readability&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Challenges we ran into 😤
&lt;/h3&gt;

&lt;p&gt;Building an autonomous shopping agent brought some really interesting technical challenges, especially around coordinating multiple AI services in real-time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The biggest headache was getting all our AI services to work together smoothly&lt;/strong&gt; without creating noticeable delays. We needed AssemblyAI for voice processing, Perplexity AI for product research, and OpenAI for reasoning, all of them working in harmony. &lt;strong&gt;Orchestrating these different agents while maintaining fast response times&lt;/strong&gt; required some creative prompt engineering and smart fallback strategies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Getting voice recognition accuracy right&lt;/strong&gt; across different accents and speaking styles was trickier than expected. AssemblyAI's auto-detection helped a ton, but we still had to fine-tune confidence thresholds and build intelligent fallback mechanisms.&lt;/p&gt;
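
&lt;p&gt;In code, the gate looked roughly like this (the threshold value is illustrative, and &lt;code&gt;promptUserToConfirm&lt;/code&gt; is a hypothetical UI hook):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Sketch of the confidence-gated fallback. Assumes transcribeAudio from
// the snippet above is in scope; completed AssemblyAI transcripts carry
// an overall confidence score we can check against a threshold.
const MIN_CONFIDENCE = 0.7; // illustrative, tuned per language in practice

function promptUserToConfirm(text) {
  // Hypothetical: surface the transcript so the user can correct it.
  return text;
}

async function transcribeWithFallback(file) {
  const transcript = await transcribeAudio({ file, speech_model: 'universal' });

  if (typeof transcript.confidence === 'number') {
    if (transcript.confidence &amp;gt;= MIN_CONFIDENCE) return transcript.text;
  }
  // Low confidence: confirm with the user instead of acting on a guess.
  return promptUserToConfirm(transcript.text);
}
&lt;/code&gt;&lt;/pre&gt;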

&lt;p&gt;&lt;strong&gt;Real-time optimization without sacrificing privacy&lt;/strong&gt; was another puzzle. We ended up doing as much processing as possible on the client side while using secure API calls for the AI services.&lt;/p&gt;

&lt;p&gt;We're really proud of creating a shopping experience that genuinely works across languages and provides intelligent optimization. The multi-agent architecture successfully handles complex shopping tasks without human intervention! :)&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next? 🚀
&lt;/h2&gt;

&lt;p&gt;Wynnie has serious potential to change how people think about online shopping. We want this to be the thing that finally makes e-commerce work for everyone, regardless of language, tech comfort, or physical abilities!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we're building next:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictive Shopping&lt;/strong&gt;: AI that suggests things before you even ask&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Product Search&lt;/strong&gt;: Point your camera at something and find it online&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Group Shopping&lt;/strong&gt;: Shop with friends and family through shared conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustainability Scoring&lt;/strong&gt;: See the environmental impact of your purchases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Platform&lt;/strong&gt;: Native mobile apps and smart speaker integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're excited to expand language support, improve AI accuracy, and connect with more online retailers!&lt;/p&gt;

&lt;p&gt;ㅤ&lt;/p&gt;

&lt;h3&gt;
  
  
  End Notes 🙌🏻
&lt;/h3&gt;

&lt;p&gt;Huge thanks to &lt;a href="https://dev.to"&gt;DEV&lt;/a&gt; for hosting this challenge, to the AssemblyAI team for excellent documentation and API design, and to the open-source community for inspiration and support! 🙌&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissive License ⚖️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/achal-b/wynnie/blob/main/LICENSE" rel="noopener noreferrer"&gt;Apache 2.0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" alt="breaker.png"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>assemblyaichallenge</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>Kleio — Transform Meetings into Actionable Insights ⚡</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Mon, 25 Nov 2024 07:54:48 +0000</pubDate>
      <link>https://dev.to/neilblaze/kleio-11bd</link>
      <guid>https://dev.to/neilblaze/kleio-11bd</guid>
      <description>&lt;h3&gt;
  
  
  Categories of Submission for the &lt;a href="https://dev.to/challenges/assemblyai"&gt;AssemblyAI Challenge &lt;/a&gt;:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No More Monkey Business&lt;/strong&gt; 🙈&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Really Rad Real-Time&lt;/strong&gt; 🤖&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Team 👥 : &lt;a class="mentioned-user" href="https://dev.to/neilblaze"&gt;@neilblaze&lt;/a&gt; &amp;amp; &lt;a class="mentioned-user" href="https://dev.to/rds_agi"&gt;@rds_agi&lt;/a&gt;
&lt;/h3&gt;




&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kleio.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;Kleio&lt;/strong&gt;&lt;/a&gt; is a bleeding-edge SaaS AI solution that enhances your meeting experience. Our platform seamlessly integrates with your video conferencing tools to capture, analyze, and distill the essence of your meetings into clear, concise, and actionable formats ✨&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.cloudinary.com%2Fdmlwye965%2Fimage%2Fupload%2Fv1732592902%2Fxemc9uhtg99emuncctod.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.cloudinary.com%2Fdmlwye965%2Fimage%2Fupload%2Fv1732592902%2Fxemc9uhtg99emuncctod.png" alt="body_thumb" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it's much much more than that! Check out the video below 👇🏻&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo Video ▶️
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/_pzrM36CUVQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  App Tryout Link 🔗
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kleio&lt;/strong&gt; 👉  &lt;a href="https://kleio.vercel.app" rel="noopener noreferrer"&gt;&lt;strong&gt;kleio.vercel.app&lt;/strong&gt;&lt;/a&gt; [Deployed on Vercel ▲]&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" alt="filler" width="1" height="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How does it Work?
&lt;/h2&gt;

&lt;p&gt;Kleio operates through a browser extension that, upon user authentication and consent, integrates with Google Meet sessions to capture audio streams. The system implements dual processing: primary audio segments are stored in Cloudflare R2 Datastore and processed through AssemblyAI's speech-to-text engine, while a fallback mechanism leverages the WebSpeech API to store temporary caption snapshots in IndexedDB. Real-time processing via AssemblyAI's &lt;a href="https://www.assemblyai.com/blog/lemur" rel="noopener noreferrer"&gt;&lt;strong&gt;LeMUR&lt;/strong&gt;&lt;/a&gt; generates contextual summarization, analytics, sentiment analysis, and mindmaps, all accessible through an end-to-end encrypted Next.js dashboard built with ShadCN UI components. The dashboard enables real-time querying and converts discussions into AI-generated handwritten study notes, with all insights available immediately post-meeting.&lt;/p&gt;
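
&lt;p&gt;As a rough sketch of the LeMUR step (using the &lt;code&gt;assemblyai&lt;/code&gt; npm SDK; the prompt wording here is illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch of Kleio's LeMUR summarization step.
import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

async function summarizeMeeting(transcriptId) {
  const { response } = await client.lemur.task({
    transcript_ids: [transcriptId],
    prompt: 'Summarize this meeting: key decisions, action items, and overall sentiment.',
  });
  return response;
}
&lt;/code&gt;&lt;/pre&gt;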

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" alt="filler" width="1" height="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jncav5emkadxilqvsnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jncav5emkadxilqvsnx.png" alt="uxflow" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" alt="filler" width="1" height="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" alt="filler" width="1" height="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Features 🎠
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Intelligent Transcription&lt;/strong&gt; with advanced speech-to-text technology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Meeting Summary Generation&lt;/strong&gt; highlighting key points and action items&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-Click Presentation Creation&lt;/strong&gt; transforming meetings into professional slides&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive Meeting Analytics&lt;/strong&gt; tracking participation, patterns, and productivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Handwritten Answer Note Generation&lt;/strong&gt; (especially useful for students)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless Platform Integration, supports Google Meet, Microsoft Teams, and Zoom (Web)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom Chrome Extension for ease of access!&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Meeting Transcript Analysis&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise-Grade Security with E2E Encryption&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable Infrastructure&lt;/strong&gt; supporting meetings of all sizes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant Insights Delivery&lt;/strong&gt; through intuitive web application&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instant Collaborative Sharing of Meeting Insights&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpkl0fdx4y1voy22upud.png" alt="filler" width="1" height="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Target Audience 👥
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Students&lt;/li&gt;
&lt;li&gt;Business Professionals&lt;/li&gt;
&lt;li&gt;Startups&lt;/li&gt;
&lt;li&gt;Enterprise Teams&lt;/li&gt;
&lt;li&gt;Remote Workers&lt;/li&gt;
&lt;li&gt;Collaborative Teams&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  So, how's AssemblyAI's LeMUR being used here? 🤔
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic transcription of audio recordings&lt;/li&gt;
&lt;li&gt;Generating context-aware answers to specific questions (sketched after this list)&lt;/li&gt;
&lt;li&gt;Supporting multiple question formats&lt;/li&gt;
&lt;li&gt;Handling predefined answer options&lt;/li&gt;
&lt;li&gt;Processing spoken data with intelligent retrieval&lt;/li&gt;
&lt;li&gt;Extracting insights from meeting transcripts&lt;/li&gt;
&lt;li&gt;Enabling Q&amp;amp;A functionality on audio content&lt;/li&gt;
&lt;li&gt;Flexible summarization of meeting discussions&lt;/li&gt;
&lt;/ul&gt;
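
&lt;p&gt;As promised above, here's a sketch of the question-answering piece, reusing the LeMUR client from the earlier snippet (the questions and answer options are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch of LeMUR question-answering over a meeting transcript.
async function askMeeting(transcriptId) {
  const { response } = await client.lemur.questionAnswer({
    transcript_ids: [transcriptId],
    questions: [
      { question: 'What deadline was agreed for the launch?', answer_format: 'short sentence' },
      { question: 'Was a follow-up meeting scheduled?', answer_options: ['Yes', 'No'] },
    ],
  });
  return response; // an array of { question, answer } pairs
}
&lt;/code&gt;&lt;/pre&gt;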




&lt;h2&gt;
  
  
  Design 🎨
&lt;/h2&gt;

&lt;p&gt;We were heavily inspired by the revised version of the &lt;strong&gt;Double Diamond&lt;/strong&gt; design process, a model popularized by the &lt;a href="https://www.designcouncil.org.uk/our-work/news-opinion/double-diamond-universally-accepted-depiction-design-process/" rel="noopener noreferrer"&gt;British Design Council&lt;/a&gt;, which covers not just visual design but a full-fledged research cycle: you discover and define your problem before tackling your solution &amp;amp; finally deploying it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiohh2d159njqb8stbk18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiohh2d159njqb8stbk18.png" alt="UPDATETHIS" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt;: a deep dive into the problem we are trying to solve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define&lt;/strong&gt;: synthesizing the information from the discovery phase into a problem definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop&lt;/strong&gt;: think up solutions to the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver&lt;/strong&gt;: pick the best solution and build that.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Moreover, we utilized design tools like Figma &amp;amp; Photoshop to prototype our designs before doing any coding. This let us gather iterative feedback, so we spent less time rewriting code.&lt;/p&gt;




&lt;h4&gt;
  
  
  🟦 GitHub Repository: &lt;a href="https://github.com/H4CK4TH0N/kleio" rel="noopener noreferrer"&gt;https://github.com/H4CK4TH0N/kleio&lt;/a&gt;
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;To run the app locally, follow &lt;a href="https://res.cloudinary.com/dmlwye965/raw/upload/v1732594173/at7l7djcgo10xgmtvpku.md" rel="noopener noreferrer"&gt;this&lt;/a&gt; guide.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  🟦 License: &lt;a href="https://github.com/H4CK4TH0N/kleio/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT&lt;/a&gt;
&lt;/h4&gt;




&lt;h3&gt;
  
  
  What's next? 🚀
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;We believe that our App has great potential&lt;/em&gt;. We just really want this project to have a positive impact on people's lives! We would love to make it more &lt;em&gt;cross-platform&lt;/em&gt; and &lt;em&gt;multilingual&lt;/em&gt; so that user engagement grows significantly! &lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion 🐣
&lt;/h3&gt;

&lt;p&gt;It's been great fun, and we got to learn so many things in such a short span 🙌. Thank you #DEV #DEVCommunity &amp;amp; #AssemblyAI for hosting this hackathon! 💙&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>assemblyaichallenge</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>VisuSpeak — 𝙑𝙞𝙨𝙪𝙖𝙡𝙞𝙯𝙚 𝙩𝙤 𝙎𝙥𝙚𝙖𝙠 👀🗣️</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Mon, 24 Jun 2024 06:51:25 +0000</pubDate>
      <link>https://dev.to/neilblaze/visuspeak--24j</link>
      <guid>https://dev.to/neilblaze/visuspeak--24j</guid>
      <description>&lt;p&gt;This project has been archived!&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>twiliochallenge</category>
      <category>ai</category>
      <category>twilio</category>
    </item>
    <item>
      <title>HealthifAI — 𝘚𝘦𝘢𝘮𝘭𝘦𝘴𝘴 𝘏𝘦𝘢𝘭𝘵𝘩𝘤𝘢𝘳𝘦 𝘴𝘰𝘭𝘶𝘵𝘪𝘰𝘯𝘴 𝘧𝘰𝘳 𝘗𝘳𝘰𝘷𝘪𝘥𝘦𝘳𝘴 🏥⚕️</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Mon, 20 Feb 2023 21:43:43 +0000</pubDate>
      <link>https://dev.to/neilblaze/healthifai--5ad4</link>
      <guid>https://dev.to/neilblaze/healthifai--5ad4</guid>
      <description>&lt;h3&gt;
  
  
  Category Submission:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wacky Wildcard&lt;/strong&gt; 🃏&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smooth Shifters&lt;/strong&gt; 🌬️&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What we built 🤗
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HealthifAI&lt;/strong&gt; is a smart &lt;em&gt;Web application&lt;/em&gt; built to provide &lt;em&gt;Seamless Healthcare solutions for Providers&lt;/em&gt; &amp;amp; is fueled by &lt;a href="https://linode.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Linode&lt;/strong&gt;&lt;/a&gt;. 🏥⚕️&lt;/p&gt;

&lt;h4&gt;
  
  
  Creators :
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Pratyay Banerjee   (&lt;a href="https://dev.to/neilblaze"&gt;&lt;strong&gt;@neilblaze&lt;/strong&gt;&lt;/a&gt;) &lt;/li&gt;
&lt;li&gt;Subham Sahu   (&lt;a href="https://dev.to/subhamx"&gt;&lt;strong&gt;@subhamx&lt;/strong&gt;&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inspiration 💡
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Healthcare&lt;/strong&gt; &lt;em&gt;is one of the most important and critical industries in the world&lt;/em&gt;. Providing quality medical care to patients is essential, but it is often hindered by various challenges such as overburdened healthcare workers, lack of medical devices in rural areas, and administrative stress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe104bfe8g2iegpdcodcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe104bfe8g2iegpdcodcp.png" alt="image" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the advent of &lt;em&gt;artificial intelligence&lt;/em&gt; and &lt;em&gt;machine learning&lt;/em&gt;, the healthcare industry has a unique opportunity to tackle these challenges head-on and revolutionize the way medical care is delivered.&lt;/p&gt;

&lt;p&gt;With this as context, we plan to tackle the &lt;strong&gt;Provider Shortage &amp;amp; Burnout&lt;/strong&gt; and &lt;strong&gt;Access to Care&lt;/strong&gt; strategic themes. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F218312595-e9a81ade-d336-4aa4-bb21-af1a6ed2d353.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F218312595-e9a81ade-d336-4aa4-bb21-af1a6ed2d353.png" alt="image" width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does 🤔
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;HealthifAI&lt;/em&gt; aims to tackle several key pain points in the healthcare industry — specifically the following:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider Shortage &amp;amp; Burnout :&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intuitive, easy &amp;amp; safe digital patient record entry which eliminates the need for manual and legacy record entry methods.&lt;/li&gt;
&lt;li&gt;We provide an ML-powered "soft diagnosis" to save time for doctors and nurses.&lt;/li&gt;
&lt;li&gt;We have location-based COVID-19 alerts to better equip workers.&lt;/li&gt;
&lt;li&gt;Multilingual &lt;em&gt;speech-to-text&lt;/em&gt; notes, because it's easier!&lt;/li&gt;
&lt;li&gt;Reminder system to help with medication/check-ups. Keeping track of everything is hard!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr72jmemwygwcxm242hku.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr72jmemwygwcxm242hku.gif" alt="Group 30" width="600" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access to care :&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multilingual communication model that transcribes speech from any language into English. This is particularly helpful in rural areas where communication is a barrier. &lt;/li&gt;
&lt;li&gt;Experimental Computer-Vision powered heart rate monitor. This transforms everyday hand-held devices into medical devices - an exciting &lt;em&gt;vision&lt;/em&gt; for the future!&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  App Tryout Link 🔗
&lt;/h3&gt;

&lt;p&gt;👉  Home : &lt;a href="https://healthifai-with.tech" rel="noopener noreferrer"&gt;&lt;strong&gt;https://healthifai-with.tech&lt;/strong&gt;&lt;/a&gt;  [Frontend deployed on Vercel ▲ &amp;amp; Backend deployed on &lt;strong&gt;Linode&lt;/strong&gt;]&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Alternate URL : &lt;a href="http://45.79.166.94/login" rel="noopener noreferrer"&gt;http://45.79.166.94&lt;/a&gt; (Hosted on Linode)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h5&gt;
  
  
  📌 Endpoint List:
&lt;/h5&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://healthifai-with.tech/login" rel="noopener noreferrer"&gt;https://healthifai-with.tech/login&lt;/a&gt; →  &lt;strong&gt;Login page&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://healthifai-with.tech/result" rel="noopener noreferrer"&gt;https://healthifai-with.tech/result&lt;/a&gt; →  &lt;strong&gt;Bad Request (4xx)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://healthifai-with.tech/result?sym1=cough&amp;amp;sym2=itching&amp;amp;sym3=none&amp;amp;sym4=none&amp;amp;sym5=none" rel="noopener noreferrer"&gt;https://healthifai-with.tech/result?sym1=cough&amp;amp;sym2=itching&amp;amp;sym3=none&amp;amp;sym4=none&amp;amp;sym5=none&lt;/a&gt; →  &lt;strong&gt;Disease Prediction&lt;/strong&gt; [Note : Passing "none" is allowed, provied we have to pass all five symptoms. Feel free to explore list of symptoms over &lt;a href="https://www.kaggle.com/datasets/itachi9604/disease-symptom-description-dataset?select=dataset.csv" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Methods allowed : &lt;code&gt;POST&lt;/code&gt; &amp;amp; &lt;code&gt;GET&lt;/code&gt;]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6saaf1fm9za0v69hv1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6saaf1fm9za0v69hv1t.png" alt="image" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://healthifai-with.tech/transcribe" rel="noopener noreferrer"&gt;https://healthifai-with.tech/transcribe&lt;/a&gt; →  &lt;strong&gt;OpenAI Whisper Auto-Translate EN Transcription&lt;/strong&gt; [ You can view the uploaded &lt;code&gt;recdummy.mp3&lt;/code&gt; file recorded in &lt;code&gt;Hindi&lt;/code&gt; (HI) language &lt;a href="https://anonymfile.com/8pN8N/recdummy.mp3" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Methods allowed : &lt;code&gt;POST&lt;/code&gt; &amp;amp; &lt;code&gt;GET&lt;/code&gt;]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5u5abxn8q0k0ch208bjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5u5abxn8q0k0ch208bjh.png" alt="image" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://healthifai-with.tech/d3data?sym1=cough&amp;amp;sym2=itching&amp;amp;sym3=none&amp;amp;sym4=none&amp;amp;sym5=none" rel="noopener noreferrer"&gt;https://healthifai-with.tech/d3data?sym1=cough&amp;amp;sym2=itching&amp;amp;sym3=none&amp;amp;sym4=none&amp;amp;sym5=none&lt;/a&gt; →  &lt;strong&gt;Returns only precautions&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq8fqbeh43wyifcku447.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq8fqbeh43wyifcku447.png" alt="image" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Video ▶️
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/e4ZOZ8Dzr2M"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy &amp;amp; Security 🔐
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HealthifAI&lt;/strong&gt; handles a wide range of &lt;em&gt;sensitive information&lt;/em&gt; as &lt;em&gt;healthcare data&lt;/em&gt;. In the wrong hands, this data could dramatically harm individuals. We took special care to ensure that our platform protects the privacy and sensitive information of all our users, making it &lt;strong&gt;100% GDPR compliant!&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How we built it ⚙️
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HealthifAI&lt;/strong&gt; was built using &lt;strong&gt;cutting-edge AI and Machine Learning technologies&lt;/strong&gt;, including OpenAI's Whisper as well as the DETR (End-to-End Object Detection) model with a ResNet-50 backbone. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1czh0nh4wyrhzm5dtw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1czh0nh4wyrhzm5dtw9.png" alt="image" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the disease detection part, we've used a Kaggle dataset which can be found &lt;a href="https://www.kaggle.com/datasets/itachi9604/disease-symptom-description-dataset" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Our machine learning model performs extremely well at &lt;em&gt;disease prediction&lt;/em&gt;: we benchmarked an accuracy of more than 92% over a prolonged period. Based on the inference, we also return symptom severity and basic precautions in no time!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5a6q2fkn5wt380w767i0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5a6q2fkn5wt380w767i0.png" alt="image" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We used the Flask framework to build a RESTful API that can handle incoming requests and return appropriate responses. For the front-end, we used &lt;em&gt;React.js&lt;/em&gt; with &lt;em&gt;Tailwind&lt;/em&gt; as the CSS framework. Authentication (OAuth) is handled by &lt;em&gt;Firebase&lt;/em&gt;, &amp;amp; we’re also using the &lt;em&gt;Cloud Firestore database&lt;/em&gt; for storing user logs. We have deployed the &lt;em&gt;front-end of our Webapp on Vercel&lt;/em&gt; &amp;amp;, most importantly, &lt;em&gt;the backend is running on&lt;/em&gt; &lt;strong&gt;Linode&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj07zm9nz86bw3wkj2vm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj07zm9nz86bw3wkj2vm.gif" alt="linodedesc" width="720" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The API was integrated with OpenAI's speech-to-text model &lt;em&gt;Whisper&lt;/em&gt; to transcribe speech from &lt;strong&gt;any&lt;/strong&gt; language into English. Further, we implemented Gaussian Naive Bayes classification to "soft diagnose" patients based on their symptoms.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.openai.com%2Fwhisper%2Fdraft-20220919a%2Fasr-details-desktop.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.openai.com%2Fwhisper%2Fdraft-20220919a%2Fasr-details-desktop.svg" alt="image" width="1355" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're also running our custom algorithm to analyse and return the heartbeat in &lt;em&gt;realtime&lt;/em&gt; using a concept called &lt;a href="https://www.google.com/search?q=photoplethysmography" rel="noopener noreferrer"&gt;&lt;strong&gt;photoplethysmography&lt;/strong&gt;&lt;/a&gt;: we leverage a camera with face detection to record images of facial skin, since skin reflects changes in arterial blood volume between the systolic and diastolic phases of the cardiac cycle, and we then extract the ROI. The computer-vision-powered heart rate monitor was built using image processing techniques in OpenCV. In essence, the camera detects subtle changes in the neck and forehead, which are then used to infer heart rate.&lt;/p&gt;
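
&lt;p&gt;To give a feel for the signal-processing half of that idea, here's a toy sketch: given the mean green-channel intensity of the skin ROI for each video frame, it picks the dominant frequency in the plausible heart-rate band. This illustrates the concept only; the production pipeline is the OpenCV one described above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy PPG heart-rate estimate: project the de-trended signal onto
// sin/cos at each candidate frequency and keep the strongest one.
function estimateBpm(signal, fps) {
  const n = signal.length;
  const mean = signal.reduce(function (a, b) { return a + b; }, 0) / n;

  let bestBpm = 0;
  let bestPower = -1;

  // Scan candidate heart rates from 42 to 240 BPM (0.7 to 4 Hz).
  for (let bpm = 42; bpm !== 241; bpm += 1) {
    const freq = bpm / 60; // Hz
    let re = 0;
    let im = 0;
    for (let i = 0; i !== n; i += 1) {
      const x = signal[i] - mean; // remove the DC component
      const angle = 2 * Math.PI * freq * (i / fps);
      re += x * Math.cos(angle);
      im += x * Math.sin(angle);
    }
    const power = re * re + im * im;
    if (power &amp;gt; bestPower) {
      bestPower = power;
      bestBpm = bpm;
    }
  }
  return bestBpm; // e.g. estimateBpm(meanGreenPerFrame, 30)
}
&lt;/code&gt;&lt;/pre&gt;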

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0sscg1xf4a8l4kz9apc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0sscg1xf4a8l4kz9apc.png" alt="image" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Screenshots 🖼️
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Disease Prediction&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Heartbeat Monitor&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8sng2g0v5ktg2izstw2.gif" alt="DPx" width="600" height="308"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ofoi33mr8lv7juh5vfy.gif" alt="HBx" width="720" height="370"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220217321-95b04386-9c0f-463f-b787-5bd5cf339150.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220217321-95b04386-9c0f-463f-b787-5bd5cf339150.png" alt="image" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Link to Source Code 👨‍💻
&lt;/h3&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Neilblaze" rel="noopener noreferrer"&gt;
        Neilblaze
      &lt;/a&gt; / &lt;a href="https://github.com/Neilblaze/HealthifAI" rel="noopener noreferrer"&gt;
        HealthifAI
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      HealthifAI — Crafted with 💙
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://user-images.githubusercontent.com/48355572/220209075-b5382401-8c5d-4492-8733-f54183ae736a.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220209075-b5382401-8c5d-4492-8733-f54183ae736a.gif" alt="DevThumb"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Category Submission:&lt;/h3&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wacky Wildcard&lt;/strong&gt; 🃏&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smooth Shifters&lt;/strong&gt; 🌬️&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;What we built 🤗&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;HealthifAI&lt;/strong&gt; is a smart &lt;em&gt;Web application&lt;/em&gt; built to provide &lt;em&gt;Seamless Healthcare solutions for Providers&lt;/em&gt; &amp;amp; is fueled by &lt;a href="https://linode.com" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;Linode&lt;/strong&gt;&lt;/a&gt;. 🏥⚕️&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Inspiration 💡&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Healthcare&lt;/strong&gt; &lt;em&gt;is one of the most important and critical industries in the world&lt;/em&gt;. Providing quality medical care to patients is essential, but it is often hindered by various challenges such as overburdened healthcare workers, lack of medical devices in rural areas, and administrative stress.&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://user-images.githubusercontent.com/48355572/220201124-d1813a1f-dbba-4a3b-bf64-4308bc7f4e2b.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220201124-d1813a1f-dbba-4a3b-bf64-4308bc7f4e2b.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With the advent of &lt;em&gt;artificial intelligence&lt;/em&gt; and &lt;em&gt;machine learning&lt;/em&gt;, the healthcare industry has a unique opportunity to tackle these challenges head-on and revolutionize the way medical care is delivered.&lt;/p&gt;
&lt;p&gt;With this as context, we plan to tackle the &lt;strong&gt;Provider Shortage &amp;amp; Burnout&lt;/strong&gt; and &lt;strong&gt;Access to Care&lt;/strong&gt; strategic themes.&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://user-images.githubusercontent.com/48355572/218312595-e9a81ade-d336-4aa4-bb21-af1a6ed2d353.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F218312595-e9a81ade-d336-4aa4-bb21-af1a6ed2d353.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What it does 🤔&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;&lt;em&gt;HealthifAI&lt;/em&gt; aims to tackle several key pain points in the healthcare industry — specifically for the following…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Neilblaze/HealthifAI" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Permissive License ⚖️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/Neilblaze/HealthifAI/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Linode?
&lt;/h3&gt;

&lt;p&gt;Building &lt;em&gt;scalable systems&lt;/em&gt; is always tricky. Our app isn't serving millions of customers yet, but as software enthusiasts, &lt;em&gt;we strive to build an infinitely scalable application&lt;/em&gt;. And here &lt;em&gt;Linode&lt;/em&gt; helped us a lot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsz1ajrwtvssejh4m5uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsz1ajrwtvssejh4m5uo.png" alt="image" width="800" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thanks to Linode for providing us with $100 credits! 😊&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Linode is a cloud-based virtual machine service that we used to deploy the backend of our HealthifAI project. We chose Linode because it provides a reliable and scalable hosting solution for our web application, and it also offers competitive pricing. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668005-17bc65bc-0a70-4157-bc60-1090227c656d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668005-17bc65bc-0a70-4157-bc60-1090227c656d.png" alt="image" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once our virtual machine was up and running, we installed and configured the software and other tools required for our application. We also set up security measures, including firewalls and SSL certificates, to ensure that our backend was protected from potential cyber threats.&lt;/p&gt;
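
&lt;p&gt;For illustration, here is a minimal sketch of what TLS termination on such a VM can look like in Node.js. This is not our exact setup; the certificate paths, port, and response are placeholders.&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;// Hypothetical sketch: serving a backend over HTTPS on a Linode VM.
// Certificate paths are placeholders for wherever your certs are provisioned.
const https = require('https');
const fs = require('fs');

const options = {
  key: fs.readFileSync('/etc/ssl/private/server.key'),  // private key
  cert: fs.readFileSync('/etc/ssl/certs/server.crt'),   // certificate chain
};

https.createServer(options, (req, res) =&gt; {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: 'ok' }));
}).listen(443); // standard HTTPS port
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;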

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668099-379304f0-4d7d-4099-af40-e23edb3f9c4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668099-379304f0-4d7d-4099-af40-e23edb3f9c4d.png" alt="image" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linode provided us with an &lt;em&gt;easy-to-use interface to manage our virtual machine&lt;/em&gt;, including monitoring tools to track server performance and resource usage. It also allowed us to scale our resources up or down depending on the demands of our application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668230-c70dfbfc-2f69-4e98-ae4b-ba67a9d7f61c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F220668230-c70dfbfc-2f69-4e98-ae4b-ba67a9d7f61c.png" alt="image" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, using Linode allowed us to deploy a reliable and scalable backend for our HealthifAI project, without worrying about the complexities of managing our own physical servers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design 🎨
&lt;/h2&gt;

&lt;p&gt;We were heavily inspired by the revised version of the &lt;strong&gt;Double Diamond&lt;/strong&gt; design process, a model popularized by the &lt;a href="https://www.designcouncil.org.uk/our-work/news-opinion/double-diamond-universally-accepted-depiction-design-process/" rel="noopener noreferrer"&gt;British Design Council&lt;/a&gt;, which includes not only visual design but a full-fledged research cycle in which you must discover and define your problem before tackling your solution &amp;amp; finally deploying it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falpslq96a80jz2ody83y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falpslq96a80jz2ody83y.png" alt="image" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt;: a deep dive into the problem we are trying to solve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define&lt;/strong&gt;: synthesizing the information from the discovery phase into a problem definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop&lt;/strong&gt;: think up solutions to the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver&lt;/strong&gt;: pick the best solution and build that.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Moreover, we utilized design tools like Figma, Photoshop &amp;amp; Illustrator to prototype our designs before doing any coding. Through this, we were able to get iterative feedback and spent less time rewriting code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfe33z6dshuo071yumzm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfe33z6dshuo071yumzm.png" alt="image" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Research 📚
&lt;/h3&gt;

&lt;p&gt;Research is the key to empathizing with users: we found our specific user group early, and that paved the way for our whole project. Here are a few of the resources that were helpful to us —&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5801881/" rel="noopener noreferrer"&gt;What do healthcare workers spend most time on?&lt;/a&gt; | NIH&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://alumni.media.mit.edu/~djmcduff/assets/remote-physiology.html" rel="noopener noreferrer"&gt;Measuring Heart-rate through muted videos&lt;/a&gt; | MIT Media lab&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.ruralhealthinfo.org/topics/healthcare-access" rel="noopener noreferrer"&gt;An overview of healthcare in rural areas&lt;/a&gt; | Rural Health Information&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.ncbi.nlm.nih.gov/books/NBK538330/" rel="noopener noreferrer"&gt;Provider Burnout&lt;/a&gt; | NIH&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://optimizingruralhealth.org/communication-in-healthcare/" rel="noopener noreferrer"&gt;Communication in rural healthcare&lt;/a&gt; | Optimizing rural healthcare&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8950225/" rel="noopener noreferrer"&gt;Can we use ML to diagnose diseases?&lt;/a&gt; | NIH&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.reuters.com/article/us-braindrain-idINTRE49001E20081001" rel="noopener noreferrer"&gt;Lack of medical workers plagues developing world&lt;/a&gt; | Reuters&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linode.com/docs" rel="noopener noreferrer"&gt;Linode docs&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linode.com/docs/products/compute/compute-instances/faqs" rel="noopener noreferrer"&gt;Linode Compute Instances&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CREDITS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design Resources&lt;/strong&gt; : Freepik, Behance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Icons&lt;/strong&gt; : Icons8, fontawesome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Font&lt;/strong&gt; : Righteous / Roboto / Raleway &lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Challenges we ran into 😤
&lt;/h3&gt;

&lt;p&gt;Building &lt;em&gt;HealthifAI&lt;/em&gt; was not without its challenges. One of them was integrating the various AI and machine learning technologies into a cohesive, functional system, which required a deep understanding of each technology as well as expertise in data processing and software engineering. We held hourly review sessions to share findings from our distributed research; our biggest challenge was sticking to tight schedules! We also ran into trouble while deploying the backend on Linode, but thanks to its excellent documentation, things got sorted quite quickly!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24iv94ji3fhspc0mu3x5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24iv94ji3fhspc0mu3x5.png" alt="image" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are proud of finishing the project on time, which seemed like a tough task since we started working on it quite late due to other commitments. We were also able to add most of the features that we envisioned for the app during ideation. And as always, working overnight was pretty fun! :)&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next? 🚀
&lt;/h3&gt;

&lt;p&gt;The sky's the limit for &lt;strong&gt;HealthifAI&lt;/strong&gt;. We are already exploring new ways to improve and expand the platform, including incorporating new technologies and partnering with healthcare providers to bring our vision to a wider audience. We're committed to making a real impact in the healthcare industry and changing lives for the better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion 🐣
&lt;/h3&gt;

&lt;p&gt;That's it for now! We can't wait to see the impact that HealthifAI will have on the world. Stay tuned for updates and more exciting developments! I would also love to thank my project partner &lt;a class="mentioned-user" href="https://dev.to/subhamx"&gt;@subhamx&lt;/a&gt; for helping me, and special thanks go to &lt;a class="mentioned-user" href="https://dev.to/devencourt"&gt;@devencourt&lt;/a&gt; for resolving everyone's doubts! 🙌&lt;/p&gt;

&lt;p&gt;And as always, thank you #DEV #DEVCommunity &amp;amp; #Linode for hosting this hackathon! 💚&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" alt="breaker.png" width="800" height="53"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>Binoculearn.ai — 𝘓𝘦𝘢𝘳𝘯𝘪𝘯𝘨 𝘪𝘯 𝘭𝘰𝘸-𝘣𝘢𝘯𝘥𝘸𝘪𝘥𝘵𝘩 𝘪𝘯𝘵𝘦𝘳𝘯𝘦𝘵 𝘙𝘦𝘥𝘦𝘧𝘪𝘯𝘦𝘥⚡</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Thu, 08 Dec 2022 21:47:04 +0000</pubDate>
      <link>https://dev.to/neilblaze/binoculearnai--4ld0</link>
      <guid>https://dev.to/neilblaze/binoculearnai--4ld0</guid>
      <description>&lt;h3&gt;
  
  
  Category Submission:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose Your Own Adventure&lt;/strong&gt; ⚛&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Superstar&lt;/strong&gt; ☁️&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What we built 🤔
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://binoculearn-5jjb23a4oq-ue.a.run.app" rel="noopener noreferrer"&gt;Binoculearn&lt;/a&gt;&lt;/strong&gt; is a bleeding-edge smart P2P &lt;em&gt;educational&lt;/em&gt; &lt;strong&gt;&lt;em&gt;video conferencing web application&lt;/em&gt;&lt;/strong&gt; that aims to deliver a &lt;em&gt;reliable frame rate&lt;/em&gt; with low latency and low jitter (smooth and consistent), as well as high audio quality. We do this by &lt;em&gt;converting the video stream into ASCII characters on the client side&lt;/em&gt; and sending it via &lt;em&gt;WebRTC&lt;/em&gt; using &lt;a href="https://www.twilio.com/docs/video" rel="noopener noreferrer"&gt;&lt;strong&gt;Twilio’s video conferencing service&lt;/strong&gt;&lt;/a&gt;, deployed on &lt;strong&gt;&lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;Google Cloud Run ☁️&lt;/a&gt;&lt;/strong&gt; &amp;amp; fuelled by &lt;a href="https://www.mongodb.com/atlas/database" rel="noopener noreferrer"&gt;&lt;strong&gt;MongoDB Atlas&lt;/strong&gt;&lt;/a&gt;! 🍃&lt;/p&gt;
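
&lt;p&gt;To make the transport side concrete, here is a minimal, illustrative sketch of publishing ASCII frames over a Twilio Video data track. This is not our production code: the token, room name, frame source, and send rate are placeholders.&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;const { connect, LocalDataTrack } = require('twilio-video');

// Illustrative sketch: stream ASCII frames on a data track instead of raw video.
async function joinAndStream(token, getAsciiFrame) {
  const dataTrack = new LocalDataTrack();
  const room = await connect(token, { name: 'classroom', tracks: [dataTrack] });

  // Send one ASCII frame every 100 ms (10 fps keeps bandwidth tiny).
  setInterval(() =&gt; dataTrack.send(getAsciiFrame()), 100);

  // Render incoming ASCII frames from other participants as plain text.
  room.on('participantConnected', (participant) =&gt; {
    participant.on('trackSubscribed', (track) =&gt; {
      if (track.kind === 'data') {
        track.on('message', (frame) =&gt; {
          document.getElementById('remote-screen').textContent = frame;
        });
      }
    });
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;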

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3v2zb4om2sdne0bc6aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3v2zb4om2sdne0bc6aa.png" alt="image" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 Implementing video conferencing using this technique &lt;em&gt;saves bandwidth bidirectionally&lt;/em&gt;, especially on the receiver end. This method is both &lt;em&gt;vertically&lt;/em&gt; &amp;amp; &lt;em&gt;horizontally scalable&lt;/em&gt;, as we can serve more users as they join the conference.&lt;/p&gt;
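
&lt;p&gt;For a rough, illustrative sense of scale (example numbers, not measurements): a raw 640×480 RGB frame is 640 × 480 × 3 ≈ 0.92 MB, i.e. roughly 22 MB/s at 24 fps before codec compression, whereas an 80×45 grid of ASCII characters is only about 3.6 KB per frame, or around 86 KB/s at the same frame rate.&lt;/p&gt;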

&lt;h3&gt;
  
  
  App Tryout Link 🔗
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Binoculearn.Ai&lt;/strong&gt; 👉  &lt;a href="https://binoculearn-5jjb23a4oq-ue.a.run.app" rel="noopener noreferrer"&gt;https://binoculearn-5jjb23a4oq-ue.a.run.app&lt;/a&gt; [Deployed on Google Cloud Run ☁️]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq8fqbeh43wyifcku447.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq8fqbeh43wyifcku447.png" alt="image" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Features 🎠
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;P2P lag-free video conferencing app with ultra-low bandwidth support!&lt;/li&gt;
&lt;li&gt;Bleeding-edge Image Compression Algorithm!&lt;/li&gt;
&lt;li&gt;Vertically &amp;amp; Horizontally Scalable [Currently capped at 4, because of Twilio Credits]&lt;/li&gt;
&lt;li&gt;Twilio Live Transcription [Stored in MongoDB Atlas]&lt;/li&gt;
&lt;li&gt;P2P Messaging with Sentiment Analysis via Natural Language API!&lt;/li&gt;
&lt;li&gt;Generate Summary &amp;amp; Transcript of the meeting!&lt;/li&gt;
&lt;li&gt;File Sharing (blob) via MongoDB Atlas!&lt;/li&gt;
&lt;li&gt;User Dashboard with Previous Activity Tracker!&lt;/li&gt;
&lt;li&gt;Minimalist UI/UX powered by ReactJS &amp;amp; Tailwind CSS&lt;/li&gt;
&lt;li&gt;High Quality Multiplexed Audio!&lt;/li&gt;
&lt;li&gt;Overall Meeting Emotion Tracker&lt;/li&gt;
&lt;li&gt;MongoDB Atlas as Non-SQL DB&lt;/li&gt;
&lt;li&gt;Deployed on Google Cloud Run&lt;/li&gt;
&lt;li&gt;Saves 💰 + Internet Data!&lt;/li&gt;
&lt;li&gt;Secure O-Auth via Firebase by Google!&lt;/li&gt;
&lt;li&gt;100% GDPR compliant &amp;amp; SEO friendly interface!&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Video ▶️
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/OCoe5Dilt44"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy &amp;amp; Security 🔐
&lt;/h3&gt;

&lt;p&gt;Binoculearn deals with a wide range of &lt;em&gt;sensitive information&lt;/em&gt;. In the wrong hands, this data could dramatically harm individuals. We made special efforts to ensure that our platform protects the privacy and sensitive information of all of our users, making it &lt;strong&gt;100% GDPR compliant!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We also made sure that all data is sent securely over the network. Binoculearn leverages &lt;em&gt;TLS&lt;/em&gt; for encryption in transit. We additionally encoded all payloads using Base64; note that Base64 is a reversible encoding, not encryption, so confidentiality still rests entirely on TLS. In a future iteration, we would like to encrypt all data using a more robust end-to-end method.&lt;/p&gt;
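
&lt;p&gt;A two-line Node.js illustration of that caveat (the payload here is made up):&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;// Base64 is a reversible encoding, not encryption: it protects nothing by itself.
const encoded = Buffer.from('{"msg":"hello"}').toString('base64'); // 'eyJtc2ciOiJoZWxsbyJ9'
const decoded = Buffer.from(encoded, 'base64').toString('utf8');   // back to the original
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;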

&lt;h3&gt;
  
  
  Background 📜
&lt;/h3&gt;

&lt;p&gt;As ubiquitous and fast as the internet seems in developed countries, &lt;strong&gt;developing countries still struggle with reliable internet connections&lt;/strong&gt;. &lt;strong&gt;&lt;a href="https://msutoday.msu.edu/news/2020/poor-internet-connection-leaves-rural-students-behind" rel="noopener noreferrer"&gt;Poor internet connectivity exacerbates the education inequality between children from prosperous countries and children from developing countries&lt;/a&gt;&lt;/strong&gt;, because the latter cannot benefit from remote learning via video conferencing. 😔&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv02yuzrz0adi753x7hv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv02yuzrz0adi753x7hv7.png" alt="MSU News" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last year, it was found that &lt;strong&gt;millions of students&lt;/strong&gt; in the state of Odisha in India were stuck at home with access to neither the internet nor online education. My friend (teammate) &lt;a href="https://dev.to/subhamx"&gt;&lt;strong&gt;Subham Sahu&lt;/strong&gt;&lt;/a&gt;, an Odisha native, has had first-hand experience of such interruptions during his undergraduate studies.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Even for those who have access to the internet, the price is premium and the bandwidth is limited. For instance, while talking to his parents in India, &lt;a href="https://dev.to/subhamx"&gt;Subham&lt;/a&gt; found that they frequently run out of their allocated 1 GB well before the allowance period ends, after which the bandwidth gets throttled: &lt;strong&gt;&lt;em&gt;stalled frames, choppy audio, painful delays, disconnections, and subsequent retries are a normal occurrence&lt;/em&gt;&lt;/strong&gt;, yet this is still arguably much better than a normal telephone conversation because he gets to &lt;strong&gt;&lt;em&gt;“see”&lt;/em&gt;&lt;/strong&gt; them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the heart of the problem may lie a lack of tele-infrastructure for delivering education on virtual platforms.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Why are RGB frames heavy?&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why of What?&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88v30nds45y3xdmvavmk.png" alt="image" width="800" height="450"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzf6ugzwsqo6pd26ikf3o.png" alt="image" width="800" height="450"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Videos are arrays of images, which amount to a large number of bytes shipped in packets from one place to another. Our algorithm compresses each frame, converts it into a grayscale bitmap, and then replaces pixel blocks with ASCII characters during transmission, as sketched below. 🪄&lt;/p&gt;
&lt;/blockquote&gt;
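
&lt;p&gt;The real implementation lives in the repo; as a rough sketch of the idea (the character ramp, grid size, and luma weights below are illustrative assumptions), the client downsamples a frame onto a small canvas, averages each cell to a grayscale value, and maps it to a character:&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;// Rough sketch of frame → ASCII conversion (details are illustrative).
const RAMP = '@#S%?*+;:,. '; // dark → light
function frameToAscii(video, cols = 80, rows = 45) {
  const canvas = document.createElement('canvas');
  canvas.width = cols;
  canvas.height = rows;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, cols, rows); // downsample: one pixel per character cell
  const { data } = ctx.getImageData(0, 0, cols, rows);
  let out = '';
  for (let y = 0; y &amp;lt; rows; y++) {
    for (let x = 0; x &amp;lt; cols; x++) {
      const i = (y * cols + x) * 4;
      // Perceptual grayscale from the RGB channels.
      const gray = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
      out += RAMP[Math.floor((gray / 256) * RAMP.length)];
    }
    out += '\n';
  }
  return out; // a few KB of text instead of a megapixel frame
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;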

&lt;p&gt;To address this problem, we propose a new approach based on the insight that &lt;em&gt;if we are willing to give up some realism or realistic rendering of faces and screens, then there is a whole new world of face and screen representations that can be derived for ultra-low bandwidth, with an acceptable quality of experience&lt;/em&gt;! 👪&lt;/p&gt;

&lt;p&gt;The proposed solution can be primarily implemented as software &lt;strong&gt;needing no change in the underlying infrastructure&lt;/strong&gt;! This would in turn be &lt;strong&gt;cheaper&lt;/strong&gt;, and &lt;em&gt;allow internet access to people that are currently being marginalized based on their &lt;a href="https://www.cnbc.com/2021/10/06/heres-why-high-speed-internet-is-so-expensive-in-the-us.html" rel="noopener noreferrer"&gt;affordability&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;We aim to prioritize a &lt;em&gt;reliable frame rate with low latency and low jitter (smooth and consistent), as well as high audio quality serving the purpose of online education for the better!&lt;/em&gt; ✨&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvejynwa4c22xv1f7dig.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvejynwa4c22xv1f7dig.gif" alt="Group-1686550957.gif" width="1280" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our goal with this platform is to connect students in poorly connected areas with highly qualified teachers in metropolitan areas and abroad to &lt;strong&gt;&lt;em&gt;facilitate Cost-Effective Stable Remote Collaboration&lt;/em&gt;&lt;/strong&gt;. &lt;/p&gt;




&lt;h3&gt;
  
  
  Screenshots 🖼️
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Home Page [Before O-Auth]&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Home Page [After O-Auth]&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207170255-63bee0a9-335d-498f-9cd0-1125299c376a.png" alt="image" width="800" height="393"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207170335-104e0675-77a4-49fa-802f-70a25f30b6ba.png" alt="image" width="800" height="393"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Host a Meeting [Old / 16:9]&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Join a Meeting [Old / 16:9]&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206564693-06c567bd-ec04-414d-a51c-365dc6e210a7.png" alt="image" width="800" height="459"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206564807-7775e11e-128f-487e-a6c7-52f56474a9a4.png" alt="image" width="800" height="459"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Host a Meeting [New]&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Join a Meeting [New]&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207169963-edf70d20-976e-41cd-82ff-7cb76b610b72.png" alt="image" width="800" height="392"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207170102-8dbab8e2-3158-442e-b8be-cdbbe7fe47b5.png" alt="image" width="800" height="392"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;User Dashboard [New]&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Dashboard Insight [New]&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207169331-de1c1f7f-cd0e-41f2-b371-fe221dcc5dbc.png" alt="image" width="800" height="392"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207169526-48a96216-d4f8-467d-b2a5-940abb9f5ae3.png" alt="image" width="800" height="392"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lz7mcrya2a78itlkztl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lz7mcrya2a78itlkztl.gif" alt="ezgif-2-2129dd4bf5" width="480" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;☝️ Final Demo done by Pratyay [Click Above]&lt;/center&gt;



&lt;h3&gt;
  
  
  Description 🦄
&lt;/h3&gt;

&lt;p&gt;Experience a &lt;em&gt;superfast&lt;/em&gt;, &lt;em&gt;low-latency&lt;/em&gt; P2P video chat even on &lt;em&gt;ultra-low-bandwidth&lt;/em&gt; networks. Bridging the communication gap, Binoculearn is an &lt;strong&gt;MIT-licensed&lt;/strong&gt; open-source project made by students, for students, &amp;amp; will be &lt;strong&gt;Free Forever&lt;/strong&gt;! ⚡ &lt;/p&gt;

&lt;p&gt;On top of the bandwidth-saving functionality, we also offer educational and content-moderation tools like &lt;strong&gt;Sentiment Analysis via Google Cloud's &lt;a href="https://cloud.google.com/natural-language" rel="noopener noreferrer"&gt;Natural Language API&lt;/a&gt;&lt;/strong&gt; for session chat QnA, and our custom-trained ML model deployed on GCP for &lt;strong&gt;Meeting Summarization&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36hagxx8dfcpk14q236n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36hagxx8dfcpk14q236n.png" alt="image" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These features allow both educators and students to maintain decorum during the meeting, and give them follow-up material to retain what was covered!&lt;/p&gt;
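
&lt;p&gt;The sentiment part can be as small as the following server-side sketch. It uses the official &lt;code&gt;@google-cloud/language&lt;/code&gt; client; the request shape matches the API, while the function and how we wire it in are illustrative:&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;const { LanguageServiceClient } = require('@google-cloud/language');
const client = new LanguageServiceClient();

// Illustrative: score a chat message from -1 (negative) to +1 (positive).
async function scoreChatMessage(text) {
  const [result] = await client.analyzeSentiment({
    document: { content: text, type: 'PLAIN_TEXT' },
  });
  const { score, magnitude } = result.documentSentiment;
  return { score, magnitude }; // e.g. flag strongly negative messages for moderation
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;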

&lt;h3&gt;
  
  
  Link to Source Code 👨‍💻
&lt;/h3&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Neilblaze" rel="noopener noreferrer"&gt;
        Neilblaze
      &lt;/a&gt; / &lt;a href="https://github.com/Neilblaze/Binoculearn.AI" rel="noopener noreferrer"&gt;
        Binoculearn.AI
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      𝘓𝘦𝘢𝘳𝘯𝘪𝘯𝘨 𝘪𝘯 𝘭𝘰𝘸-𝘣𝘢𝘯𝘥𝘸𝘪𝘥𝘵𝘩 𝘪𝘯𝘵𝘦𝘳𝘯𝘦𝘵 𝘙𝘦𝘥𝘦𝘧𝘪𝘯𝘦𝘥⚡ — Project Submission for MongoDB Atlas Hackathon'22
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://user-images.githubusercontent.com/48355572/206574506-a4f3ea19-b32a-4941-91d5-8917e6b3c7ee.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206574506-a4f3ea19-b32a-4941-91d5-8917e6b3c7ee.png" alt="thumbnail-gh"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Binoculearn.ai&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;𝘓𝘦𝘢𝘳𝘯𝘪𝘯𝘨 𝘪𝘯 𝘭𝘰𝘸-𝘣𝘢𝘯𝘥𝘸𝘪𝘥𝘵𝘩 𝘪𝘯𝘵𝘦𝘳𝘯𝘦𝘵 𝘙𝘦𝘥𝘦𝘧𝘪𝘯𝘦𝘥⚡ — Project Submission for MongoDB Atlas Hackathon'22 🍃&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Binoculearn&lt;/strong&gt; is a bleeding-edge smart P2P &lt;em&gt;educational&lt;/em&gt; &lt;em&gt;&lt;strong&gt;video conferencing web application&lt;/strong&gt;&lt;/em&gt; that aims to deliver a &lt;em&gt;reliable frame rate&lt;/em&gt; with low latency and low jitter (smooth and consistent), as well as high audio quality. We do this by &lt;em&gt;converting the video stream into ASCII characters on the client side&lt;/em&gt; and sending it via &lt;em&gt;WebRTC&lt;/em&gt; using &lt;a href="https://www.twilio.com/docs/video" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;Twilio’s video conferencing service&lt;/strong&gt;&lt;/a&gt; &amp;amp; is fuelled by &lt;a href="https://www.mongodb.com/atlas/database" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;MongoDB Atlas&lt;/strong&gt;&lt;/a&gt;! 🍃&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://user-images.githubusercontent.com/48355572/206498204-02bf9689-74f3-4bb2-82a6-bf1f3c54de77.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206498204-02bf9689-74f3-4bb2-82a6-bf1f3c54de77.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;💡 Implementing video conferencing using this technique &lt;em&gt;saves bandwidth bidirectionally&lt;/em&gt;, especially on the receiver end. This method is &lt;em&gt;horizontally scalable&lt;/em&gt;, as we can serve more users as they enter the conference.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Installing / Getting started&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;There are two folders, where &lt;code&gt;my-app&lt;/code&gt; is for the front-end &amp;amp; &lt;code&gt;server&lt;/code&gt; is for the backend.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Setting up Dev [Make sure &lt;code&gt;.env&lt;/code&gt; is loaded with your own credentials]&lt;/h3&gt;
&lt;/div&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;git clone&lt;/pre&gt;…
&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Neilblaze/Binoculearn.AI" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Permissive License ⚖️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/Neilblaze/Binoculearn.AI/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How we built it ⚙️
&lt;/h3&gt;

&lt;p&gt;First and foremost, it is crafted with 💙. The whole process can be broken down into the following points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;➤ React.JS, Redux + Tailwind CSS on the frontend&lt;/li&gt;
&lt;li&gt;➤ Express.js, Node.js, Sockets, WebRTC, Twilio Live on the backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7gzh9qjt5bsi63eo53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7gzh9qjt5bsi63eo53.png" alt="image" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;➤ Prisma for connecting the frontend to MongoDB Atlas to store user data + logs (a usage sketch follows this list)&lt;/li&gt;
&lt;li&gt;➤ External services like Twilio, GCP Natural Language API&lt;/li&gt;
&lt;li&gt;➤ GitHub as CI/CD and Google's App Engine for Deployment [Duplicate Deployment done on Vercel]&lt;/li&gt;
&lt;/ul&gt;
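
&lt;p&gt;For flavour, here is a minimal sketch of what writing through Prisma can look like. The model name and fields are illustrative, not our exact schema:&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient(); // reads the Atlas connection string from .env

// Illustrative: persist a meeting record once a session ends.
async function saveMeetingLog(roomSid, hostId, transcript) {
  return prisma.meeting.create({
    data: { roomSid, hostId, transcript, endedAt: new Date() },
  });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;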

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm1fr21we3hpfzv7d4tt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm1fr21we3hpfzv7d4tt.png" alt="image" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;QnA model Architecture BERT:&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Summarizer Architecture&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjakf83vyuj5nqak5c8c.png" alt="image-172.png" width="800" height="290"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq62d7osfwysqiefulfs.png" alt="image-173.png" width="800" height="454"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;➤  &lt;strong&gt;Prisma Schema for MongoDB Atlas:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206562434-7e8b63d0-3c28-4460-aa89-68609e280435.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F206562434-7e8b63d0-3c28-4460-aa89-68609e280435.png" alt="image" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➤  &lt;strong&gt;Database Deployments for MongoDB Atlas [Click Below .Gif ⬇️]:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37ohk0w6fr0zrunoxe8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37ohk0w6fr0zrunoxe8x.png" alt="image" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➤  &lt;strong&gt;Data Services — MongoDB Atlas [Click Below .Gif ⬇️]:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkhutgbcselgqd9pmboj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkhutgbcselgqd9pmboj.gif" alt="Recording" width="600" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➤  &lt;strong&gt;Collections — MongoDB Atlas [Binoculearn]:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvd6gv1db9txtapivuii.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvd6gv1db9txtapivuii.gif" alt="Recording2" width="720" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➤  &lt;strong&gt;Session Timeline-Chart MongoDB Atlas [Binoculearn]:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207320336-eadb35e7-604c-4860-aada-4fd075815993.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207320336-eadb35e7-604c-4860-aada-4fd075815993.png" alt="image" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➤  &lt;strong&gt;Session Sentiment — Timeline MongoDB Atlas [Binoculearn]:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207328185-ee3627c1-8044-42e7-9a57-390ba53de974.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207328185-ee3627c1-8044-42e7-9a57-390ba53de974.png" alt="image" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why MongoDB Atlas?
&lt;/h3&gt;

&lt;p&gt;Databases are always tricky, as they're stateful in nature. Although our app is not serving millions of customers yet, as software enthusiasts we strive to build an infinitely scalable application, and MongoDB Atlas helped a lot here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207319379-2a1971f0-18f3-4b27-aa6c-879a649461cc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F48355572%2F207319379-2a1971f0-18f3-4b27-aa6c-879a649461cc.png" alt="image" width="800" height="83"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We didn't have to manage the database, and there was no need to work on complex networking between shards, etc. (And thanks to Google Cloud Run, our server can scale too. Hopefully this app goes viral and we're able to cater to a million real users! 😊)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;We used MongoDB as our primary database to store user sessions, meeting details, user socket IDs (which power our real-time WebSockets engine), etc.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're also &lt;strong&gt;generating a summary of the meeting&lt;/strong&gt;, and instead of generating it in a synchronous flow, &lt;strong&gt;we're using a serverless function hosted on MongoDB Realm to generate the summary&lt;/strong&gt;, roughly as sketched below. This way, we keep the app more cohesive! 🙂&lt;/p&gt;
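
&lt;p&gt;A Realm (Atlas App Services) function for this can look roughly like the following; the service, database, and collection names are placeholders, and &lt;code&gt;summarize()&lt;/code&gt; stands in for the call to our GCP-hosted model:&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;// Illustrative Realm serverless function: summarize a meeting asynchronously.
exports = async function (meetingId) {
  const transcripts = context.services
    .get('mongodb-atlas')      // linked cluster name (placeholder)
    .db('binoculearn')         // database name (placeholder)
    .collection('transcripts');

  const doc = await transcripts.findOne({ meetingId });
  const summary = await summarize(doc.text); // stand-in for the model call
  await transcripts.updateOne({ meetingId }, { $set: { summary } });
};
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;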

&lt;p&gt;We learnt a lot of things, including most of &lt;strong&gt;MongoDB Realm&lt;/strong&gt; and many of the features of &lt;strong&gt;MongoDB Atlas&lt;/strong&gt;. &lt;em&gt;Data modelling&lt;/em&gt; in MongoDB was a &lt;strong&gt;bit tricky&lt;/strong&gt;, &lt;em&gt;as both Neel and I come from a traditional relational-database background&lt;/em&gt;. &lt;strong&gt;Together we studied a few of the video lectures from MongoDB University to understand the anti-patterns and in-depth data modelling in MongoDB.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apart from MongoDB, we learnt about Google App Engine, Cloud Run, and Cloud Build. 🌟&lt;/p&gt;




&lt;h2&gt;
  
  
  Design 🎨
&lt;/h2&gt;

&lt;p&gt;We were heavily inspired by the revised version of the &lt;strong&gt;Double Diamond&lt;/strong&gt; design process, a model popularized by the &lt;a href="https://www.designcouncil.org.uk/our-work/news-opinion/double-diamond-universally-accepted-depiction-design-process/" rel="noopener noreferrer"&gt;British Design Council&lt;/a&gt;, which includes not only visual design but a full-fledged research cycle in which you must discover and define your problem before tackling your solution &amp;amp; finally deploying it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4uox0rfpknebvz16g3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4uox0rfpknebvz16g3z.png" alt="UPDATETHIS" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt;: a deep dive into the problem we are trying to solve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define&lt;/strong&gt;: synthesizing the information from the discovery phase into a problem definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Develop&lt;/strong&gt;: think up solutions to the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver&lt;/strong&gt;: pick the best solution and build that.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Moreover, we utilized design tools like Figma, Photoshop &amp;amp; Illustrator to prototype our designs before doing any coding. Through this, we were able to get iterative feedback and spent less time rewriting code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdr8blp86gbqayw9t7vdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdr8blp86gbqayw9t7vdn.png" alt="image-178.png" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Research 📚
&lt;/h1&gt;

&lt;p&gt;Research is the key to empathizing with users: we found our specific user group early, and that paved the way for our whole project. Here are a few of the resources that were helpful to us —&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://msutoday.msu.edu/news/2020/poor-internet-connection-leaves-rural-students-behind" rel="noopener noreferrer"&gt;https://msutoday.msu.edu/news/2020/poor-internet-connection-leaves-rural-students-behind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theguardian.com/technology/2021/nov/30/more-than-a-third-of-worlds-population-has-never-used-the-internet-says-un" rel="noopener noreferrer"&gt;https://www.theguardian.com/technology/2021/nov/30/more-than-a-third-of-worlds-population-has-never-used-the-internet-says-un&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bit.ly/3UCmir3" rel="noopener noreferrer"&gt;https://bit.ly/3UCmir3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/video-calling-for-billions-without-internet-40d10069c464" rel="noopener noreferrer"&gt;https://towardsdatascience.com/video-calling-for-billions-without-internet-40d10069c464&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.retrium.com/blog/dont-let-slow-internet-connections-ruin-your-retrospectives" rel="noopener noreferrer"&gt;https://www.retrium.com/blog/dont-let-slow-internet-connections-ruin-your-retrospectives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.statista.com/chart/17247/the-average-cost-of-mobile-data-in-selected-countries/" rel="noopener noreferrer"&gt;https://www.statista.com/chart/17247/the-average-cost-of-mobile-data-in-selected-countries/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.broadbandsearch.net/blog/internet-statistics" rel="noopener noreferrer"&gt;https://www.broadbandsearch.net/blog/internet-statistics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.speechly.com/blog/create-a-webrtc-video-chat-app-with-speechly-transcription" rel="noopener noreferrer"&gt;https://www.speechly.com/blog/create-a-webrtc-video-chat-app-with-speechly-transcription&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://aclanthology.org/2020.lrec-1.825.pdf" rel="noopener noreferrer"&gt;https://aclanthology.org/2020.lrec-1.825.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.currentscience.ac.in/Volumes/110/01/0069.pdf" rel="noopener noreferrer"&gt;https://www.currentscience.ac.in/Volumes/110/01/0069.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Prisma Docs : &lt;a href="https://www.prisma.io/docs/guides/database/using-prisma-with-mongodb" rel="noopener noreferrer"&gt;https://www.prisma.io/docs/guides/database/using-prisma-with-mongodb&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Atlas Docs : &lt;a href="https://www.mongodb.com/docs/atlas" rel="noopener noreferrer"&gt;https://www.mongodb.com/docs/atlas&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GCP Natural Language API with MongoDB Atlas : &lt;a href="https://youtu.be/4DoU32EHC8c?t=3131" rel="noopener noreferrer"&gt;https://youtu.be/4DoU32EHC8c?t=3131&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ReactJS Docs : &lt;a href="https://reactjs.org/docs/getting-started.html" rel="noopener noreferrer"&gt;https://reactjs.org/docs/getting-started.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://iimskills.com/the-future-of-online-education-in-india" rel="noopener noreferrer"&gt;https://iimskills.com/the-future-of-online-education-in-india&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ucf.edu/online/leadership-management/news/why-the-future-of-learning-is-digital-and-for-everyone" rel="noopener noreferrer"&gt;https://www.ucf.edu/online/leadership-management/news/why-the-future-of-learning-is-digital-and-for-everyone&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CREDITS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design Resources&lt;/strong&gt; : Freepik, Behance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Icons&lt;/strong&gt; : Icons8, fontawesome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Font&lt;/strong&gt; : Urbanist / Roboto / Raleway &lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Challenges we ran into 😤
&lt;/h3&gt;

&lt;p&gt;This project was initially built in &lt;a href="https://devpost.com/software/binoculearn" rel="noopener noreferrer"&gt;&lt;em&gt;under 24 hours&lt;/em&gt;, 19th–20th Nov, 2022&lt;/a&gt;, &amp;amp; I actually got to know about &lt;a href="https://dev.to/devteam/announcing-the-mongodb-atlas-hackathon-2022-on-dev-2107"&gt;the MongoDB Atlas hackathon&lt;/a&gt; from &lt;a href="https://dev.to/subhamx"&gt;&lt;strong&gt;Subham&lt;/strong&gt;&lt;/a&gt; on 20th November, 2022 (IST) while we were eagerly waiting for &lt;a href="https://metrohacks.dev/" rel="noopener noreferrer"&gt;MetroHack&lt;/a&gt;'s results. Personally, I'm a hackathon freak. I love hackathons because they help generate specific ideas in a distinct domain within a short span of time; as a result, they not only boost creativity but also spark curiosity while tackling corner cases during the build. Since then, we kept committing to this project. We [I, Subham &amp;amp; Gaurang] did face some challenges during the hackathon, many of which, ironically, related to working remotely. One of the major ones was the time difference: all of us participated from different time zones, which created communication challenges.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;One of the biggest challenges we faced was deployment.&lt;/strong&gt; Since our application has a &lt;em&gt;web socket&lt;/em&gt; endpoint, we had very few deployment options. We initially thought that the App Engine standard environment would serve the purpose, but eventually realised that it doesn't support WebSockets. We finally migrated our deployment stack to Google Cloud Run, which gave us flexibility in the runtime environment and also allowed us to use WebSockets.&lt;/p&gt;
&lt;/blockquote&gt;
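
&lt;p&gt;The fix itself is small: Cloud Run only requires the container to listen on the port it injects, and WebSocket upgrades ride on the same HTTP server. A minimal, illustrative sketch using the &lt;code&gt;ws&lt;/code&gt; package (the echo handler is a placeholder for our real socket logic):&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;const http = require('http');
const { WebSocketServer } = require('ws');

// One HTTP server handles both health checks and WebSocket upgrades.
const server = http.createServer((req, res) =&gt; res.end('ok'));
const wss = new WebSocketServer({ server });

wss.on('connection', (socket) =&gt; {
  socket.on('message', (msg) =&gt; socket.send(msg)); // echo, for brevity
});

// Cloud Run injects the port to listen on via the PORT env variable.
server.listen(process.env.PORT || 8080);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;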

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnrhsqvii2hg3f54yu8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnrhsqvii2hg3f54yu8a.png" alt="image" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are proud of finishing the project on time, which seemed like a tough task since we started working on it quite late due to other commitments. We were also able to add most of the features that we envisioned for the app during ideation. And as always, working overnight was pretty fun! :)&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next? 🚀
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;We believe that our app has great potential&lt;/em&gt;. We just really want this project to have a positive impact on people's lives! We would love to make it more &lt;em&gt;scalable&lt;/em&gt; &amp;amp; &lt;em&gt;cross-platform&lt;/em&gt; to greatly increase user interaction. We also have a bunch of ideas on our bucket list that we look forward to turning into reality!&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion 🐣
&lt;/h3&gt;

&lt;p&gt;It has been all fun, &amp;amp; I would love to thank my buddies &lt;a class="mentioned-user" href="https://dev.to/subhamx"&gt;@subhamx&lt;/a&gt; &amp;amp; &lt;a href="https://www.gaurang-ruparelia.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Gaurang&lt;/strong&gt;&lt;/a&gt; for helping me, &amp;amp; Special thanks goes to &lt;a class="mentioned-user" href="https://dev.to/stanimiravlaeva"&gt;@stanimiravlaeva&lt;/a&gt;, &lt;a class="mentioned-user" href="https://dev.to/mlynn"&gt;@mlynn&lt;/a&gt; &amp;amp; &lt;a class="mentioned-user" href="https://dev.to/joel__lord"&gt;@joel__lord&lt;/a&gt; 🙌. And as always, thank you #DEV #DEVCommunity &amp;amp; #MongoDB for hosting this hackathon! 💚&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update [13/12/2022] ⚠️ — A few days ago we ran out of credits, so the API credentials were revoked! You can still run the app locally using your own credentials. We have since re-deployed it via Google Cloud Run, so hopefully you can explore our live app at — &lt;a href="https://binoculearn-5jjb23a4oq-ue.a.run.app/" rel="noopener noreferrer"&gt;https://binoculearn-5jjb23a4oq-ue.a.run.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update [15/12/2022] ⚠️ — Replaced Old Demo Video with &lt;a href="https://youtu.be/OCoe5Dilt44" rel="noopener noreferrer"&gt;New&lt;/a&gt; one!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohy7k856tgac9wp6yhf8.png" alt="breaker.png" width="800" height="53"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>portfolio</category>
      <category>career</category>
    </item>
    <item>
<title>Who-of-us? — Find him, else he'll escape 😉</title>
      <dc:creator>Pratyay Banerjee</dc:creator>
      <pubDate>Sun, 10 Jan 2021 12:10:28 +0000</pubDate>
      <link>https://dev.to/neilblaze/who-of-us-find-him-else-he-ll-escape-1plp</link>
      <guid>https://dev.to/neilblaze/who-of-us-find-him-else-he-ll-escape-1plp</guid>
      <description>&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Whoofus is a sleuth game built with &lt;em&gt;Phaser.js&lt;/em&gt;, where the crux of the matter is that you need to be on utmost alert to find out the killer!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Immerse yourself as a &lt;strong&gt;Detective&lt;/strong&gt; &amp;amp; solve the mystery to find out &lt;em&gt;who of us&lt;/em&gt; did it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqxozom1pdrihtizixzfr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqxozom1pdrihtizixzfr.png" alt="Whoofus" width="600" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Category Submission: &lt;strong&gt;Random Roulette&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  App Link
&lt;/h3&gt;

&lt;p&gt;Whoofus is deployed on both DigitalOcean &amp;amp; Vercel. The game can be played at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whoofus ~ DigitalOcean ➟ &lt;a href="https://whoofus-qahgs.ondigitalocean.app" rel="noopener noreferrer"&gt;&lt;strong&gt;Play Here&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Whoofus ~ Vercel ➟ &lt;a href="https://whoof.us" rel="noopener noreferrer"&gt;&lt;strong&gt;Play Here&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Gameplay Demo :- &lt;a href="https://youtu.be/x7WqmYT5tLc" rel="noopener noreferrer"&gt;https://youtu.be/x7WqmYT5tLc&lt;/a&gt;
&lt;/h4&gt;

&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbnyh7hd9ernew2h1nwty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbnyh7hd9ernew2h1nwty.png" alt="Alt Text" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Homepage&lt;/center&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7gezrelyxuzuv11uzoog.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7gezrelyxuzuv11uzoog.jpg" alt="Alt Text" width="573" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;The main map&lt;/center&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsu2vz1if6vq5997amo72.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsu2vz1if6vq5997amo72.gif" alt="Alt Text" width="404" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Game Demo&lt;/center&gt;



&lt;h3&gt;
  
  
  Description
&lt;/h3&gt;

&lt;p&gt;As previously mentioned, &lt;em&gt;Whoofus&lt;/em&gt; is a single-player, third-person-view sleuth game built with Phaser.js, where the crux of the matter is that &lt;strong&gt;you need to be on utmost alert to find out the killer!&lt;/strong&gt; Immerse yourself as a Detective &amp;amp; solve the mystery to find out who did it! There are 8 characters roaming about the map, along walkways and pathways where they can meet one another. When they meet, one of them can either kill the other, or they could just pass each other by. When there is a murder, your task as the detective is to find out who murdered that specific body; a stripped-down sketch of this meeting logic follows below. Not to mention, we also added some Easter eggs inside the game! 😉&lt;/p&gt;
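
&lt;p&gt;For a taste of the mechanics, here is an illustrative Phaser 3 sketch of that meeting check. The sprite key, speeds, and kill probability are made-up placeholders, not the game's actual values:&lt;/p&gt;

&lt;div class="highlight"&gt;
&lt;pre&gt;&lt;code&gt;// Stripped-down sketch of the meeting logic using Phaser 3 arcade physics.
class MapScene extends Phaser.Scene {
  create() {
    this.suspects = this.physics.add.group();
    for (let i = 0; i &amp;lt; 8; i++) {
      const s = this.suspects.create(100 + i * 60, 200, 'suspect'); // placeholder sprite key
      s.setVelocity(Phaser.Math.Between(-80, 80), Phaser.Math.Between(-80, 80));
    }
    // When two characters overlap ("meet"), one may murder the other.
    this.physics.add.overlap(this.suspects, this.suspects, (a, b) =&gt; {
      if (a !== b &amp;amp;&amp;amp; Math.random() &amp;lt; 0.01) b.setTint(0xff0000); // mark a victim
    });
  }
}

new Phaser.Game({
  type: Phaser.AUTO,
  width: 800,
  height: 600,
  physics: { default: 'arcade' },
  scene: MapScene,
});
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;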

&lt;p&gt;Whoofus runs on both PC &amp;amp; mobile directly in the browser (on Canvas), although we always recommend playing on PC for a much better experience! ✨&lt;/p&gt;
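
&lt;p&gt;Responsive canvas behaviour like that usually comes from Phaser's scale manager. A minimal sketch, assuming placeholder dimensions rather than the game's real config:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Sketch of a Phaser 3 config that fits the canvas to any screen.
// The width/height values are placeholders, not Whoofus's real ones.
const config = {
  type: Phaser.AUTO,  // WebGL if available, otherwise Canvas
  width: 800,
  height: 600,
  scale: {
    mode: Phaser.Scale.FIT,               // shrink on phones, grow on PC
    autoCenter: Phaser.Scale.CENTER_BOTH  // keep the canvas centred
  },
  physics: { default: 'arcade' },
  scene: { create: create }
};

new Phaser.Game(config);
&lt;/code&gt;&lt;/pre&gt;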

&lt;h3&gt;
  
  
  Link to Source Code
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Repository : &lt;a href="https://github.com/Neilblaze/Whoofus" rel="noopener noreferrer"&gt;https://github.com/Neilblaze/Whoofus&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Neilblaze/Whoofus/commit/0501f2725419d10b556fdaa5638d0b23f01d424d" rel="noopener noreferrer"&gt;Initial commit&lt;/a&gt; - 12th Dec'2020&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Permissive License
&lt;/h3&gt;

&lt;p&gt;➤ Whoofus is licensed under the &lt;a href="https://github.com/Neilblaze/Whoofus/blob/main/LICENSE" rel="noopener noreferrer"&gt;Apache License 2.0&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;It all started with a detective-themed hackathon on 12th Dec'20. Personally, I'm a hackathon freak. I love hackathons because they help generate focused ideas in a specific domain within a very short period of time &amp;amp; as a result, they not only boost creativity but also spark curiosity while tackling corner cases during the build. Unfortunately, we couldn't finish &lt;em&gt;Whoofus&lt;/em&gt; during the hackathon itself (since my exams were going on then), but now it's ready to play! :D&lt;/p&gt;

&lt;h3&gt;
  
  
  How I built it
&lt;/h3&gt;

&lt;p&gt;We used the following tech stack to build our project:&lt;/p&gt;

&lt;p&gt;➤ &lt;a href="https://phaser.io" rel="noopener noreferrer"&gt;Phaser Js&lt;/a&gt;&lt;br&gt;
➤ &lt;a href="https://vuejs.org" rel="noopener noreferrer"&gt;Vue.js&lt;/a&gt;&lt;br&gt;
➤ Vanilla Javascript&lt;br&gt;
➤ HTML&lt;br&gt;
➤ &lt;a href="https://getbootstrap.com" rel="noopener noreferrer"&gt;Bootstrap&lt;/a&gt;&lt;br&gt;
➤ &lt;a href="https://tailwindcss.com" rel="noopener noreferrer"&gt;Tailwind CSS&lt;/a&gt;&lt;br&gt;
➤ &lt;a href="https://jquery.com" rel="noopener noreferrer"&gt;jQuery&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Challenges we ran into
&lt;/h4&gt;

&lt;p&gt;We weren't particularly expert in Phaser JS, so we ran into a lot of issues setting up the map and placing the characters onto it. On the flip side, Phaser's documentation is just awesome, as it provides lots of material to get started! The app shell around the game engine is built with Vue.js, which wasn't a piece of cake for all of us since most of us prefer working in React.js. It was also a bit difficult to collaborate in a virtual setting, but we somehow managed to finish the project on time.&lt;/p&gt;
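
&lt;p&gt;For a flavour of the setup that tripped us up, loading a Tiled map and dropping a character onto it in Phaser 3 looks roughly like this. All asset keys, file paths &amp;amp; layer names below are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical Phaser 3 map setup (asset keys and paths are placeholders).
function preload() {
  this.load.image('tiles', 'assets/tileset.png');
  this.load.tilemapTiledJSON('map', 'assets/map.json');
  this.load.spritesheet('character-0', 'assets/char0.png',
    { frameWidth: 32, frameHeight: 32 });
}

function create() {
  const map = this.make.tilemap({ key: 'map' });
  const tileset = map.addTilesetImage('tileset', 'tiles');

  // Draw the walkable layer, then mark the wall tiles as solid.
  const walls = map.createLayer('walls', tileset, 0, 0);
  walls.setCollisionByProperty({ collides: true });

  // Place a character and keep it off the walls.
  const hero = this.physics.add.sprite(64, 64, 'character-0');
  this.physics.add.collider(hero, walls);
}
&lt;/code&gt;&lt;/pre&gt;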

&lt;h4&gt;
  
  
What we learned
&lt;/h4&gt;

&lt;p&gt;A lot of things, on both the technical &amp;amp; non-technical sides. On the technical side, we learned a great deal about configuring the game engine, which is written entirely in Phaser JS. We also picked up some UI/UX skills while one of us was building the frontend of the project. Not to mention, Stack Overflow was a gem for us while we were troubleshooting some complicated issues.&lt;/p&gt;

&lt;p&gt;Coming to deployment, I've been deploying apps to DigitalOcean since Hacktoberfest'19 😉&lt;br&gt;
And I can fairly say it's very easy to deploy apps there, especially using the DigitalOcean App Platform! For newcomers, this &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-deploy-a-static-website-to-the-cloud-with-digitalocean-app-platform" rel="noopener noreferrer"&gt;guide&lt;/a&gt; from DO will get you started!&lt;br&gt;
As mentioned above, the same build is also deployed via &lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  What's next for Whoofus
&lt;/h4&gt;

&lt;p&gt;We're planning many changes &amp;amp; want to add the following updates to the project in the future:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improve the UI of the game.&lt;/li&gt;
&lt;li&gt;Add more characters (currently 8 are available).&lt;/li&gt;
&lt;li&gt;Add a multiplayer mode on Socket.io (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Add live chat support.&lt;/li&gt;
&lt;/ul&gt;
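
&lt;p&gt;For the multiplayer item, the usual Socket.io pattern is to broadcast each player's position and mirror everyone else's. A rough, hypothetical client-side sketch; the server URL, event names &amp;amp; helper objects are all made up, nothing here is committed code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical Socket.io wiring for a future multiplayer mode.
import { io } from 'socket.io-client';

const socket = io('https://example-game-server.dev'); // placeholder URL
const remotePlayers = {}; // id -&gt; sprite, filled as peers join

// Send our detective's position (hero is the local Phaser sprite).
function broadcastMove(hero) {
  socket.emit('player:move', { id: socket.id, x: hero.x, y: hero.y });
}

// Mirror other players when the server relays their movement.
socket.on('player:move', function (data) {
  const other = remotePlayers[data.id];
  if (other) {
    other.setPosition(data.x, data.y);
  }
});
&lt;/code&gt;&lt;/pre&gt;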

&lt;p&gt;Whoofus can be developed further with more features to make it more attractive &amp;amp; fluid on every device! That will involve some research work, which we're planning to take up soon!&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Resources/Info
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://phaser.io/docs" rel="noopener noreferrer"&gt;Phaser Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://phaser.io/docs" rel="noopener noreferrer"&gt;Vuejs Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;It has all been great fun, &amp;amp; I would love to thank my friend &lt;a href="https://sandipan.dev" rel="noopener noreferrer"&gt;Sandipan&lt;/a&gt; for helping me. As always, thank you #DEV #DEVCommunity &amp;amp; DigitalOcean for hosting this hackathon! ❤️&lt;/p&gt;

</description>
      <category>dohackathon</category>
      <category>gamedev</category>
      <category>phaserjs</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
