Tamesh Sivaguru · DEV Community
Take your voice anywhere, transcribe on YOUR hardware.

GitHub Copilot CLI Challenge Submission

*This is a submission for the GitHub Copilot CLI Challenge*

🎤 Whisper-Typing Mobile

Privacy-First Speech-to-Text, Anywhere.

I transformed an existing open-source Windows desktop app into a full-scale, cross-platform mobile ecosystem in a single 3-hour session using the GitHub Copilot CLI.

The Challenge: How do you use high-end speech-to-text on a phone while keeping audio data 100% private and avoiding expensive cloud API fees?

The Solution: A self-hosted mobile architecture that leverages your home PC's GPU power over a secure mesh network.

🚀 The Build at a Glance

| Metric | Result |
| --- | --- |
| Time to Build | ~3 Hours |
| Lines of Code | ~6,500 production lines |
| Files Created | 50+ files |
| Architecture | 8 Phases (Backend ➔ Docker ➔ Mobile ➔ Docs) |
| Status | Production-Ready |

πŸ› οΈ The Tech Stack

  • Frontend: Flutter (Material Design 3) + gRPC Client
  • Backend: Python 3.13 + FastAPI + gRPC + Protocol Buffers
  • Inference: faster-whisper + Ollama (NVIDIA CUDA 12.4)
  • Networking: Tailscale Mesh Network (Encrypted Tunnel)
  • DevOps: Docker with GPU Passthrough

πŸ—οΈ Architecture Overview

┌─────────────────┐
│  Android Phone  │  Push-to-talk recording
│  Flutter App    │  Real-time transcription
└────────┬────────┘
         │
         │ gRPC over Tailscale (E2E Encrypted)
         ▼
┌──────────────────────────────────────────┐
│            Docker Container              │
│  ┌──────────────────┐                    │
│  │   gRPC Server    │ Port 50051         │
│  │ (Transcription)  │                    │
│  └──────────────────┘                    │
│  ┌──────────────────┐                    │
│  │  Web Admin Panel │ Port 8080          │
│  │ (Configuration)  │                    │
│  └──────────────────┘                    │
│  ┌──────────────────┐                    │
│  │    Whisper AI    │ Utilizes Home GPU  │
│  │  faster-whisper  │ via NVIDIA CUDA    │
│  └──────────────────┘                    │
└──────────────────────────────────────────┘

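The Whisper box in that diagram is, at its core, a thin wrapper around the faster-whisper library. A minimal server-side sketch of what the transcription path could look like (the function names and the "large-v3" model choice are my assumptions, not the project's actual code):

```python
import io


def join_segments(segments) -> str:
    """Concatenate per-segment text into a single transcript string."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_wav(audio: bytes) -> str:
    """Run faster-whisper on raw WAV bytes using the local NVIDIA GPU."""
    # Deferred import: loading CTranslate2/CUDA weights is heavy, so keep
    # it out of module import time.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    # transcribe() returns a lazy generator of segments plus metadata
    segments, _info = model.transcribe(io.BytesIO(audio), beam_size=5)
    return join_segments(segments)
```

In a real server the model would be constructed once at startup and reused across gRPC requests, since model loading dominates latency.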

✨ Key Features

📱 Mobile App

  • Push-to-Talk: Simple, intuitive recording interface.
  • AI Improvement: Integrated Gemini support to polish transcriptions.
  • Onboarding Wizard: A 4-page setup flow for permissions and connection testing.
  • History & Clipboard: Session-based history with one-tap copy.

🔌 Backend & Security

  • Privacy-First: Your voice never touches the cloud. Phone ➔ Tailscale ➔ Your PC.
  • Hardware Ownership: Use your own NVIDIA GPU for blazing-fast local transcription.
  • Web Admin: Browser-based monitoring and configuration; no SSH required.
  • One-Command Deploy: docker-compose up -d and you're live.
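A compose file for that one command might look roughly like this (a sketch: the service name and build context are my assumptions; the ports match the architecture diagram, and the GPU block uses Compose's standard NVIDIA device reservation syntax):

```yaml
services:
  whisper-server:
    build: .
    restart: unless-stopped
    ports:
      - "50051:50051"   # gRPC transcription service
      - "8080:8080"     # web admin panel
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

With Tailscale running on the host, nothing here needs to be exposed to the public internet; the phone reaches both ports over the mesh's private IPs.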

🧠 My Experience with GitHub Copilot CLI

This wasn't just "autocomplete"; it was a senior pair programmer. Here is how the CLI changed the game:

1. From Idea to Production in 180 Minutes

Starting from a Windows-only desktop app, I asked the CLI to plan a cross-platform expansion. It designed an 8-phase architecture and helped me execute every single one. Without it, this would have been 2–3 weeks of research and prototyping.

2. Context-Aware Engineering

The CLI didn't just write code; it wrote my code.

  • It respected my strict linting rules (ruff with ALL enabled).
  • It matched my Google-format docstring style.
  • It understood Python 3.10+ type hint requirements automatically.

Example: Copilot knew to use lazy logging to comply with ruff G004 and used Python 3.10+ generics without being prompted.

# Generated by Copilot CLI to match my project standards
import logging

logger = logging.getLogger(__name__)


def transcribe(audio: bytes) -> str:
    """Transcribes audio using faster-whisper.

    Args:
        audio: Raw audio bytes in WAV format.

    Returns:
        Transcribed text string.
    """
    logger.info("Processing %d bytes of audio", len(audio))  # Lazy %-formatting (ruff G004)
    ...

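A quick aside on why G004 matters: %-style logging defers string formatting until a record is actually emitted, so suppressed log levels cost almost nothing. A stdlib-only sketch (the class name is mine, not from the project):

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("demo")


class CountingPayload:
    """Counts how many times it is actually rendered into a log message."""

    def __init__(self) -> None:
        self.formatted = 0

    def __str__(self) -> str:
        self.formatted += 1
        return "payload"


obj = CountingPayload()

# INFO is below the WARNING threshold: no record is emitted,
# so %-formatting (and __str__) never runs.
logger.info("processing %s", obj)

# WARNING is emitted: the message is formatted exactly once.
logger.warning("processing %s", obj)
```

One caveat: lazy logging defers formatting, not argument evaluation, so an expensive function call passed as an argument still runs. The pattern pays off for objects with costly `__str__`/`__repr__` methods, and it keeps ruff's G004 happy.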

3. Documentation as a First-Class Citizen

Normally, documentation is the last thing developers do. The CLI made it part of the flow, generating 7 comprehensive guides (Docker, Backend, User Guides, and QA procedures) that were accurate to the code we just wrote.

💡 The "Aha!" Moments

  • Parallel Tool Calling: Watching the CLI read three files simultaneously to understand a cross-service bug was eye-opening.
  • Context Retention: It remembered a Tailscale IP discussion from Phase 1 while we were working on Phase 8.
  • Error Recovery: When a command failed, it didn't quit; it analyzed the stack trace, proposed a fix, and kept moving.

🔗 Links & Resources

Final Verdict: The Copilot CLI doesn't replace developer judgment; it amplifies it. It handled the mechanical boilerplate with zero fatigue, allowing me to focus entirely on the privacy architecture and user experience.
