I nearly shipped a build that looked healthy until the first launch hit a missing dependency. That kind of failure is useful because it tells you exactly where the app still feels like a script instead of software. The first thing I fixed was not the recording loop, not the UI, and not the transcription flow. I fixed startup.
I think of that moment as a turnstile: the app either has the pieces it needs and moves forward, or it stops immediately and tells you what is missing. There is no half-start, no confusing traceback buried after partial initialization, and no false sense that the app is ready when it is not.
Yapper is a speech-to-text desktop app that types wherever the cursor is positioned, and the repo supports two entry points: console mode and tray mode. The cleanest place to study the startup path is the main entry file, because it shows the order of operations with almost no ceremony. It sets up import search paths, checks dependencies, and only then pulls in the rest of the application core.
The first real job: fail early and explain why
The most important thing the entry file does is not transcription. It is dependency validation. The file defines a check_dependencies() function before any of the heavier imports happen, and that function checks the exact packages the console path needs: pyaudio, keyboard, and openai.
That order matters. If those imports are delayed until after the app has already begun constructing runtime objects, the failure becomes noisy and expensive to debug. By checking first, Yapper turns missing prerequisites into a single, readable startup message.
Here is the core of that function:
```python
import platform
import sys


def check_dependencies():
    missing = []

    try:
        import pyaudio  # noqa: F401
    except ImportError:
        missing.append('pyaudio')

    try:
        import keyboard  # noqa: F401
    except ImportError:
        missing.append('keyboard')

    try:
        import openai  # noqa: F401
    except ImportError:
        missing.append('openai')

    if missing:
        print()
        print(' Missing dependencies. Please install them:')
        print(' pip install ' + ' '.join(missing))

        if 'pyaudio' in missing:
            system = platform.system()
            if system == 'Windows':
                print()
                print(' For PyAudio on Windows, you may need:')
                print(' pip install pipwin')
                print(' pipwin install pyaudio')
            elif system == 'Darwin':
                print()
                print(' For PyAudio on macOS:')
                print(' brew install portaudio')
                print(' pip install pyaudio')
            else:
                print()
                print(' For PyAudio on Linux, first install:')
                print(' sudo apt-get install python3-pyaudio portaudio19-dev')

        sys.exit(1)
```
That code is doing a few things at once, and each one is deliberate.
First, it records missing modules in a list instead of failing on the first import error. That means the user sees the whole install problem in one run, not one missing package at a time. If pyaudio, keyboard, and openai are all absent, the app reports all three together. That saves a round trip through the startup path.
Second, it gives pyaudio special handling. That is the package most likely to require platform-specific installation guidance, so the function branches on the detected operating system and prints the right next step for Windows, macOS, or Linux. That is not cosmetic. It is the difference between a useful error and a support ticket.
Third, it exits immediately. That is the correct move. If a desktop app cannot import its essential runtime packages, it should not limp into a partial state and wait to fail somewhere deeper in the audio pipeline.
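The collect-then-report idea can also be written as a loop over module names. This is a minimal sketch, not code from the repo: `REQUIRED_MODULES`, `find_missing`, and `format_report` are hypothetical names. It probes with `importlib.util.find_spec` instead of importing, so it behaves slightly differently from the try/except version: a package can be locatable and still fail when it is actually imported.

```python
import importlib.util

# Hypothetical list; the console path described above checks these three.
REQUIRED_MODULES = ["pyaudio", "keyboard", "openai"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when no finder can locate the module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


def format_report(missing):
    """Build one install message covering every missing module."""
    if not missing:
        return ""
    return "Missing dependencies. Please install them:\n  pip install " + " ".join(missing)
```

Calling `format_report(find_missing(REQUIRED_MODULES))` produces a single message for the whole set. The caller decides whether to print it and exit.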
Why the import order matters
Right above the dependency check, the entry file adds the application directory to the Python import path. Then it runs the dependency check. Only after that does it import the rest of the core layer from core.
That ordering is the real startup architecture.
The file is saying: make the application modules importable, confirm the external packages exist, and only then assemble the working system. That means the environment is validated before the app constructs the settings object, audio recorder, transcriber, text typer, sound player, voice activity detector, or transcription history.
That matters because those classes are not decorative. The recorder depends on the audio stack. The transcriber depends on the OpenAI client. The keyboard module drives hotkey behavior. If any of those dependencies are missing, the app should fail before it tries to bind them into a live recording session.
Here is the launch flow as a small map of the real startup sequence:
```mermaid
flowchart TD
    entry["app/yapper.py starts"] --> path["Add APP_DIR to sys.path"]
    path --> deps["Run check_dependencies()"]
    deps -- "missing packages" --> missing["Print install guidance and exit"]
    deps -- "all packages present" --> core["Import core modules"]
    core --> settings["Load Settings"]
    settings --> build["Create recorder, transcriber, typer, etc."]
    build --> run["Start console interaction"]
```
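The first three steps of that flow can be sketched in a few lines. This is an illustration under assumptions, not the entry file itself: `ensure_on_path` and `start` are hypothetical helpers, and the check and core-loading steps are passed in as callables so the ordering is visible without the real modules.

```python
import sys
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def ensure_on_path(app_dir: Path) -> None:
    """Put the application directory at the front of sys.path, once."""
    entry = str(app_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)


def start(app_dir: Path, check: Callable[[], None], load_core: Callable[[], T]) -> T:
    """Run the startup steps in the order the flow describes."""
    ensure_on_path(app_dir)  # 1. make application modules importable
    check()                  # 2. validate external packages (may exit)
    return load_core()       # 3. only now import and assemble the core
```

In a real entry file, `check` would be `check_dependencies` and `load_core` would perform the deferred `from core import ...` statements.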
The settings file is pinned to the app tree
The other startup decision that matters is where settings live. The config module centralizes that decision with a private settings file variable, a getter, and a setter.
The default path is computed from the application directory, not from the current working directory. That is the right move for a portable desktop app and the right move for repeatable startup behavior. It means the app can be launched from different shells or entry points without changing where it finds the settings file.
Here is the path logic:
```python
from pathlib import Path
from typing import Optional

_SETTINGS_FILE: Optional[Path] = None


def get_settings_path() -> Path:
    global _SETTINGS_FILE
    if _SETTINGS_FILE is None:
        app_dir = Path(__file__).parent.parent.resolve()
        _SETTINGS_FILE = app_dir / 'settings.json'
    return _SETTINGS_FILE


def set_settings_path(path: Path) -> None:
    global _SETTINGS_FILE
    _SETTINGS_FILE = path
```
This is one of those details that users never notice when it is correct and instantly notice when it is wrong. If settings depend on the current working directory, the app becomes fragile: start it from one folder and it behaves one way, start it from another and it behaves differently. By anchoring the file path to the app tree, the config layer keeps persistence stable across launches.
The config module also imports default settings, the settings schema, supported languages, and hotkey definitions from a constants module. That tells you where the contract lives. The constants file defines what a valid configuration looks like; the config layer loads, saves, and validates against that contract.
That separation is important. I do not want the persistence layer deciding what a language means or which hotkeys are valid. I want it to enforce the contract and get out of the way.
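To make that division of labor concrete, here is a minimal sketch of a loader that enforces a contract it does not define. Everything in it is an assumption for illustration: the `DEFAULT_SETTINGS` keys, the `SUPPORTED_LANGUAGES` values, and the `load_settings` name and fallback policy are not taken from the repo.

```python
import json
from pathlib import Path

# Hypothetical stand-ins for what a constants module might export.
DEFAULT_SETTINGS = {"language": "en", "audio_feedback": True, "save_history": False}
SUPPORTED_LANGUAGES = {"en", "de", "fr"}


def load_settings(path: Path) -> dict:
    """Load settings from disk, falling back to defaults for anything invalid."""
    settings = dict(DEFAULT_SETTINGS)
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return settings  # missing or unreadable file: start from defaults
    if not isinstance(stored, dict):
        return settings

    for key, default in DEFAULT_SETTINGS.items():
        value = stored.get(key, default)
        # Keep a stored value only if its type matches the default's type.
        if type(value) is type(default):
            settings[key] = value

    # The loader checks membership; the constants decide what is supported.
    if settings["language"] not in SUPPORTED_LANGUAGES:
        settings["language"] = DEFAULT_SETTINGS["language"]
    return settings
```

Unknown keys are dropped and invalid values revert to defaults, so the rest of the app only ever sees a dictionary that matches the contract.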
How startup settings shape the runtime
Once the app passes dependency validation, Yapper builds its runtime from the settings object. This is where the startup path turns into a working session.
The constructor turns the settings dictionary into the live configuration for the rest of the app. The recorder gets the selected audio device index. The transcriber gets the API key, language hint, translation target, grammar correction flag, and a flag that turns wake-word cleaning off. The sound feedback object respects the audio feedback toggle. Voice activity detection reads the volume threshold and can be disabled entirely. Transcription history respects the save-history flag.
That is a lot of state flowing through one constructor, and that is exactly why the startup path is worth understanding. It is not just loading a JSON file. It is deciding how the entire session should behave.
A few details are especially important:
- Wake-word cleaning is explicitly disabled in console mode. That behavior belongs to the background-oriented tray path, not the hotkey-driven console path.
- The recorder is parameterized with the selected device index, so the app can target a specific microphone instead of assuming a default device forever.
- Voice activity detection is optional. If disabled in settings, the app does not create it.
- Transcription history is also optional, controlled by the save-history setting.
That constructor is not flashy, but it is where startup settings become runtime behavior.
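The wiring described above might look roughly like this. It is a sketch with stub classes, not the app's constructor: the class names, the settings keys (`device_index`, `api_key`, `vad_enabled`, `vad_threshold`), and the default values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Stub components standing in for the real core classes.
@dataclass
class Recorder:
    device_index: Optional[int]


@dataclass
class Transcriber:
    api_key: str
    language: str
    clean_wake_word: bool


@dataclass
class VoiceActivityDetector:
    threshold: float


class ConsoleSession:
    """Turn a settings dictionary into live components for console mode."""

    def __init__(self, settings: dict):
        # None means "use the system default microphone" in this sketch.
        self.recorder = Recorder(device_index=settings.get("device_index"))
        self.transcriber = Transcriber(
            api_key=settings["api_key"],
            language=settings.get("language", "en"),
            clean_wake_word=False,  # console mode never cleans wake words
        )
        # Optional component: only created when the setting enables it.
        self.vad = (
            VoiceActivityDetector(settings.get("vad_threshold", 0.01))
            if settings.get("vad_enabled", True)
            else None
        )
```

Representing a disabled feature as `None` keeps the decision in one place: the rest of the session checks for the object instead of re-reading settings.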
The audio layer is why dependency checks exist at all
If you look at the audio module, the reason for the startup gate becomes obvious. The recorder is built around a specific audio configuration: 16 kHz sample rate, mono input, 1024-sample chunks, and 16-bit sample width. That is not a vague wrapper around audio. It is a concrete recording path that expects the audio stack to be present and operational.
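Those four numbers pin down the size and duration of every chunk the recorder reads. The following sketch works through the arithmetic; `AudioFormat` and its property names are hypothetical and only the four values come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate: int = 16_000  # samples per second
    channels: int = 1          # mono
    chunk_frames: int = 1024   # frames read per chunk
    sample_width: int = 2      # bytes per sample (16-bit)

    @property
    def bytes_per_chunk(self) -> int:
        return self.chunk_frames * self.channels * self.sample_width

    @property
    def chunk_seconds(self) -> float:
        return self.chunk_frames / self.sample_rate

    @property
    def bytes_per_second(self) -> int:
        return self.sample_rate * self.channels * self.sample_width
```

With the defaults, each chunk is 2048 bytes and covers 64 milliseconds of audio, and one second of recording is 32,000 bytes.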
The device enumeration module adds another piece: microphone enumeration and device selection. The app can only make a meaningful choice about recording if it can inspect available input devices. That is another reason the audio library gets special handling in the dependency check.
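As a rough illustration of microphone enumeration, the sketch below filters PyAudio device records down to those with input channels. The function names are hypothetical and this is not the repo's module. The PyAudio calls (`get_device_count`, `get_device_info_by_index`, `terminate`) and the `maxInputChannels` key are part of PyAudio's documented interface.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query PyAudio for all devices and return the ones with input channels."""
    import pyaudio  # imported lazily so the filter above works without it

    audio = pyaudio.PyAudio()
    try:
        infos = [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        audio.terminate()  # always release PortAudio resources
    return input_devices(infos)
```

Keeping the filter separate from the hardware query means the selection logic can be exercised with plain dictionaries, even on a machine with no audio stack.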
I like this design because it pushes risk to the front. If audio support is missing, the app tells you before it opens a recorder or starts waiting for a hotkey. That is the right place for the failure.
The API client is the next explicit failure boundary
The startup gate is not the only disciplined part of the app. The API module wraps the OpenAI interaction in a narrow client that uses explicit exception types and retry settings.
The client defines custom error types for general failures, connection problems, and rate limiting. It sets a thirty-second timeout, a maximum of three retries, and exponential backoff delays of one, two, and four seconds.
```python
class APIError(Exception):
    pass


class APIConnectionError(APIError):
    pass


class APIRateLimitError(APIError):
    pass


class APIClient:
    DEFAULT_TIMEOUT = 30
    MAX_RETRIES = 3
    RETRY_DELAYS = [1, 2, 4]
```
That wrapper matters because startup validation and runtime network handling solve different problems. The dependency check protects the app from missing local prerequisites. The API client protects transcription from network instability and rate limiting. Together they create a predictable failure model: the app either cannot start because the machine is missing something, or it can start and then report network problems in a controlled way.
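The source shows the constants but not the loop that uses them, so the following is only one plausible reading. It assumes one initial attempt plus up to three retries, with each retry preceded by the matching delay. `call_with_retries` and its injectable `sleep` parameter are hypothetical.

```python
import time


class APIError(Exception):
    """Base class for API failures."""


class APIConnectionError(APIError):
    """The request could not reach the service."""


class APIRateLimitError(APIError):
    """The service asked the client to slow down."""


MAX_RETRIES = 3
RETRY_DELAYS = [1, 2, 4]  # seconds to wait before retry 1, 2, and 3


def call_with_retries(operation, sleep=time.sleep):
    """Run operation, retrying transient failures with increasing delays."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return operation()
        except (APIConnectionError, APIRateLimitError) as exc:
            if attempt == MAX_RETRIES:
                # Out of retries: surface one general error, keep the cause.
                raise APIError(f"gave up after {MAX_RETRIES} retries") from exc
            sleep(RETRY_DELAYS[attempt])
```

Only the two transient error types are retried. Any other exception propagates immediately, which keeps programming errors from being hidden behind backoff delays.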
That is what makes the system feel intentional. The app does not treat the OpenAI client like an afterthought; it gives the API layer the same kind of explicit structure it gives the startup path.
Closing the loop
What I trust most in this codebase is that it refuses to pretend startup is trivial. The entry file validates the environment before it imports the core layers. The config module pins settings to a known location. The API module wraps network interaction in explicit error types and retries. Those decisions make the app easier to reason about, easier to launch, and easier to debug.
That is what a good desktop startup path does: it narrows the unknowns before the user ever records a word. Once that gate opens, the rest of the app can do its real job—capture audio, transcribe it, and type the result—without carrying avoidable startup surprises forward into the session.