Part 5: From Local Model to Live Demo - Publishing to Hugging Face with Warp/Oz

At the end of the first two-day Warp/Oz agentic development sprint, the project had a working data pipeline, a fine-tuned AST model, and an ONNX export path for Raspberry Pi 5.

The project has continued to improve since then, with more data and better models. This post captures what was achieved in that initial publishing and serving cycle, including the 973-chunk dataset snapshot discussed below, not the repository's current state.

The hardware story had reached its milestone. But a model that lives in experiments/baseline_v2/checkpoint-best on a personal laptop is not a community contribution. It is a local artefact.

This post covers the remaining work: publishing the model, ONNX variants, and dataset to Hugging Face Hub; discovering why pipeline_tag: audio-classification does not give you a working inference widget; building a Gradio Space to fill that gap; and debugging four container failures in Hugging Face's runtime. The live Space is at huggingface.co/spaces/syamaner/coffee-first-crack-detection.


The Packaging & Open Source Milestone

By the end of that sprint, the pipeline was validated end-to-end. The last remaining story in the epic was making it publicly accessible. The /push-to-hub skill, one of the four parameterised skills from Post 1, handled the publication sequence. I invoked it once; Oz ran the full chain from inside the Warp terminal.

The chain had three parts. First, the model. model.push_to_hub(repo_id) and extractor.push_to_hub(repo_id) are the Hugging Face-native packaging contract: both the model weights and the ASTFeatureExtractor configuration land together at syamaner/coffee-first-crack-detection. Any consumer calling ASTForAudioClassification.from_pretrained("syamaner/coffee-first-crack-detection") gets everything they need in one call. I uploaded the README separately with an explicit HfApi.upload_file call to make the model card explicit and reproducible.
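For orientation, here is a minimal sketch of that first step. The repo ID and checkpoint path are the ones from this series; the actual scripts/push_to_hub.py is structured differently:

```python
# Sketch of the model publish step (illustrative; the project's script differs)
from huggingface_hub import HfApi
from transformers import ASTFeatureExtractor, ASTForAudioClassification

repo_id = "syamaner/coffee-first-crack-detection"
checkpoint = "experiments/baseline_v2/checkpoint-best"

model = ASTForAudioClassification.from_pretrained(checkpoint)
extractor = ASTFeatureExtractor.from_pretrained(checkpoint)

model.push_to_hub(repo_id)      # weights + config.json
extractor.push_to_hub(repo_id)  # preprocessor_config.json

# The model card is uploaded explicitly so the publish step is reproducible
HfApi().upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id=repo_id,
)
```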

Before the upload, _validate_model_card() runs:

```python
# scripts/push_to_hub.py
def _validate_model_card(model_dir: Path) -> None:
    ...  # parses the YAML frontmatter from model_dir/README.md into `metadata`
    required = {"pipeline_tag", "license", "base_model"}
    missing = required - set(metadata or {})
    if missing:
        raise ValueError(f"README.md frontmatter missing required fields: {missing}")
```

This is spec-driven development enforcing itself. One missing required field in the model card and this publish path fails before upload.
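For reference, a frontmatter block that passes the check looks something like the following. pipeline_tag and the licence are confirmed elsewhere in this post; the base_model value here is an assumed example, not taken from the repo:

```yaml
# README.md frontmatter (illustrative; base_model value is an assumption)
pipeline_tag: audio-classification
license: apache-2.0
base_model: MIT/ast-finetuned-audioset-10-10-0.4593
```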

Second, the ONNX variants. Both the FP32 (345 MB) and INT8 (89.9 MB) models were uploaded under onnx/fp32/ and onnx/int8/ subfolders on the same model repo. Each subfolder includes a copy of preprocessor_config.json, making every variant self-contained:

```python
ASTFeatureExtractor.from_pretrained(
    "syamaner/coffee-first-crack-detection", subfolder="onnx/int8"
)
```

A Raspberry Pi 5 with no repo clone can download and run the INT8 model with a single Hub call. That was the design goal from the beginning: hf_hub_download as the deployment primitive, not scp.
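From the Pi's side that looks roughly like the sketch below, assuming onnxruntime is installed. The ONNX filename is an assumption; check the repo's onnx/int8/ listing for the exact name:

```python
# Pull the INT8 model straight from the Hub and open an ONNX Runtime session
from huggingface_hub import hf_hub_download
import onnxruntime as ort

model_path = hf_hub_download(
    repo_id="syamaner/coffee-first-crack-detection",
    filename="onnx/int8/model.onnx",  # assumed name; verify against the repo
)
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```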

Third, and most importantly: the dataset. DatasetDict.push_to_hub("syamaner/coffee-first-crack-audio") published all 973 annotated chunks across the training, validation, and test splits, with audio cast to 16 kHz at push time. Anyone pulling the dataset gets properly formatted audio without running the four-step chunking pipeline from Post 2.
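Consuming it is a single call. A quick sketch, with split names as published and the audio column following from the cast above:

```python
# Load the published dataset; audio arrives already cast to 16 kHz
from datasets import load_dataset

ds = load_dataset("syamaner/coffee-first-crack-audio")
print(ds)  # DatasetDict with train / validation / test splits
print(ds["train"][0]["audio"]["sampling_rate"])  # 16000
```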

The dataset at huggingface.co/datasets/syamaner/coffee-first-crack-audio is, as far as I can determine, the first public annotated audio dataset for coffee roasting first crack detection. I could not find an equivalent dataset on Hugging Face, Kaggle, or in the papers I reviewed. The model weights are useful to anyone replicating this specific setup. The dataset is the contribution that remains useful even if someone rebuilds with a different architecture, adds more recording sessions, or trains for a different roasting event entirely. A model is one implementation. A labelled dataset is infrastructure.

The Widget Illusion

The model card had pipeline_tag: audio-classification. I assumed that was enough because I had seen that metadata associated with inference widgets on supported Hugging Face model pages: upload a file, click Compute, get predictions. I pushed the model and opened the Hub page expecting that experience.

The widget area read: "This model isn't deployed by any Inference Provider."

Hugging Face model card showing 'This model isn't deployed by any Inference Provider'

Hugging Face inference widgets require the model to be actively served by a provider. For this custom AST fine-tune, automatic hosted inference was not available. A paid Hugging Face Inference Endpoint could have served it, and commercial providers may serve some models, but this model needed an explicit backend. pipeline_tag describes the task; it does not provision compute.

My first attempt was to add a widget YAML block with example audio URLs, the documented approach for pre-loading inputs into an inference widget (commit 1cf2b21):

```yaml
# README.md (Hugging Face model card frontmatter)
widget:
  - src: https://huggingface.co/syamaner/coffee-first-crack-detection/resolve/main/audio_examples/first_crack_sample.wav
    example_title: "First crack (10s clip)"
  - src: https://huggingface.co/syamaner/coffee-first-crack-detection/resolve/main/audio_examples/no_first_crack_sample.wav
    example_title: "No first crack (10s clip)"
```

It did nothing. The documentation explains how to enrich an inference widget with example inputs. In this case, there was no deployed backend, so the widget panel did not render and the metadata had nowhere to appear.

The correct path was a Gradio Space: a containerised app that you own, instrument, and deploy yourself, running on Hugging Face's free CPU tier. pipeline_tag signals intent; a Space provides the actual serving path.

The Pivot to Gradio

A Hugging Face Space is a Git repository with a README.md containing YAML frontmatter that tells Hugging Face's runtime what to run and how. The core runtime configuration for this Space was only a few lines:

```yaml
# spaces/README.md
sdk: gradio
sdk_version: "6.11.0"
app_file: app.py
pinned: false
license: apache-2.0
models:
  - syamaner/coffee-first-crack-detection
```

Hugging Face provisions a container, installs the dependencies from spaces/requirements.txt, and runs app.py. The Space gets a public URL. For this deployment, there was no Dockerfile, no Kubernetes manifest, and no CI configuration in the serving path.

I specified the UI requirements: a dropdown to select example clips, an audio upload component, a classify button, and a label output showing the two-class probabilities. Oz built the initial spaces/app.py in one pass as part of PR #28, using gr.Blocks for layout control rather than the higher-level gr.Interface:

```python
# spaces/app.py
with gr.Blocks(title="☕ Coffee First Crack Detection") as demo:
    with gr.Row():
        with gr.Column():
            example_dd = gr.Dropdown(choices=list(_EXAMPLES), label="Try an example")
            audio_in   = gr.Audio(type="filepath", label="Upload Audio (WAV / MP3)")
            submit_btn = gr.Button("Classify", variant="primary")
        with gr.Column():
            output = gr.Label(num_top_classes=2, label="Prediction")

    example_dd.change(fn=load_example, inputs=example_dd, outputs=audio_in)
    submit_btn.click(fn=classify, inputs=audio_in, outputs=output)
```
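The two callbacks wired up above are short. A sketch of their likely shape, assuming _EXAMPLES maps example titles to bundled file paths and using the lazy _get_pipe() shown below:

```python
# Sketch of the event handlers (bodies assumed, not quoted from the repo)
import gradio as gr

def load_example(name: str) -> str:
    # Resolve the dropdown choice to a bundled example file path
    return _EXAMPLES[name]

def classify(audio_path: str) -> dict:
    if not audio_path:
        raise gr.Error("Upload a clip or pick an example first.")
    results = _get_pipe()(audio_path)  # lazily loaded pipeline, defined below
    return {r["label"]: r["score"] for r in results}  # gr.Label wants {label: score}
```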

Copilot's review of PR #28 caught the most important structural issue: the original implementation initialised the pipeline at import time, with _pipe = pipeline(...) running unconditionally on module load. In a containerised Space, this means a cold-start crash if the Hub is temporarily unreachable during startup, with no error surfaced to the user. Copilot flagged it; the fix was a lazy initialisation pattern:

```python
# spaces/app.py
_pipe: object = None

def _get_pipe() -> object:
    global _pipe
    if _pipe is None:
        _pipe = hf_pipeline("audio-classification", model=_REPO_ID)
    return _pipe
```

The pipeline now loads on the first inference call and is cached for subsequent requests. A startup failure becomes a user-visible gr.Error on first classify, not a silent container crash.

Copilot also flagged that spaces/requirements.txt listed neither gradio nor huggingface-hub explicitly; both were implicit transitive dependencies. The explicit gradio==6.11.0 pin was added; without it the Space is not reproducible outside the Hugging Face runtime. PR #28 received 10 inline comments across three review passes.

The first deployment did not work. Four container failures appeared in sequence before the Space came up cleanly.

Agentic Debugging in the Container

Container debugging on Hugging Face Spaces followed a specific boundary. Oz could run local commands and edit files from Warp, but it could not directly see the Space log stream in the browser. I deployed, waited for the build, copied the relevant log excerpt into Oz, and ran its proposed fix-and-verify loop in the terminal. This was not autonomous log watching; it was spec-driven local execution plus human-in-the-loop transfer of external platform logs.

The four failures were:

| Bug | Symptom in container log | Fix |
| --- | --- | --- |
| colorFrom: "brown" in Space YAML frontmatter | Hugging Face rejected the metadata: brown is not a valid Space colour | Changed to a valid colour value |
| sdk_version: "5.0.0" (Gradio 5.x imported HfFolder, removed from huggingface_hub 0.23+) | ImportError: cannot import name 'HfFolder' from 'huggingface_hub' | Bump sdk_version to "6.11.0" (a12c46e) |
| hf_hub_download() returns a path in /root/.cache/huggingface/, outside Gradio 6.x allowed directories | gradio.exceptions.InvalidPathError: Cannot move /root/.cache/huggingface/hub/... | Add local_dir="/tmp" (882559e) |
| Gradio event loop cleanup during startup | ValueError: Invalid file descriptor: -1 logged from asyncio/base_events.py | Rule out SSR first, then apply a narrow cleanup patch before importing Gradio |

The first three were one-commit fixes. None appeared in local testing because they were specific to the containerised Hugging Face runtime. The fourth looked like SSR at first, but the actual bug was lower in Gradio's event loop cleanup.

The SSR Bug

The fourth failure was the one worth dissecting. After the first three fixes landed, the Space built successfully; the build log showed no errors. Then this appeared at startup:

```
Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set ssr_mode=False in launch())
```

The app still launched, but the logs looked broken: startup emitted a runtime ValueError in the Space logs (issue #32):

```
ValueError: Invalid file descriptor: -1
  File "asyncio/base_events.py", BaseEventLoop.__del__
```

The error came from Python's asyncio event loop destructor. Gradio 6.x creates intermediate event loops during startup. When those loops are garbage-collected, BaseEventLoop.__del__ tries to close an already-invalid file descriptor (-1). Python catches the resulting exception, logs the traceback under Exception ignored in:, and discards it. The app runs correctly; the error is log noise rather than a functional failure.

The initial diagnosis pointed to Gradio's experimental SSR mode, which is enabled by default. Adding ssr_mode=False to demo.launch() removed the SSR startup banner but did not suppress the GC error. It occurs regardless of SSR state, on both Python 3.12 and 3.13.

The actual fix was a monkey-patch applied before Gradio is imported. It is intentionally narrow: it suppresses only ValueError: Invalid file descriptor: -1 during event loop cleanup and re-raises other ValueErrors.

```python
# spaces/app.py, applied before `import gradio`
import asyncio.base_events as _base_events

def _patch_asyncio_event_loop_del():
    original_del = getattr(_base_events.BaseEventLoop, "__del__", None)
    if original_del is None:
        return
    if getattr(original_del, "_spaces_app_patched", False):
        return  # already patched; don't wrap twice

    def _patched_del(self: _base_events.BaseEventLoop) -> None:  # noqa: ANN
        try:
            original_del(self)
        except ValueError as exc:
            # Suppress only the known cleanup noise; re-raise anything else
            if str(exc) != "Invalid file descriptor: -1":
                raise

    _patched_del._spaces_app_patched = True  # type: ignore[attr-defined]
    _base_events.BaseEventLoop.__del__ = _patched_del  # type: ignore[attr-defined]

_patch_asyncio_event_loop_del()
```

That diagnosis took two iterations of the same loop described above. First, ssr_mode=False ruled out SSR as the direct cause. Then the narrower event-loop patch fixed the noisy cleanup path.

CI/CD: What Was Designed

Every change to the model card, dataset card, or Space currently requires a manual upload to Hugging Face Hub. Three Hugging Face targets, three separate operations, and an easy place to get out of sync. After PR #28 landed, I filed issue #34 to automate it.

The design: a single GitHub Actions workflow triggered on every push to main, running three parallel jobs:

| Source file | Hugging Face target | Destination |
| --- | --- | --- |
| README.md | Model repo: syamaner/coffee-first-crack-detection | README.md |
| data/DATASET_CARD.md | Dataset repo: syamaner/coffee-first-crack-audio | README.md (renamed) |
| spaces/app.py, spaces/README.md, spaces/requirements.txt | Space repo: syamaner/coffee-first-crack-detection | same filenames |

The approach uses HfApi.upload_file rather than a full git-push sync. The model card is a single file at the repo root, and the dataset card requires renaming on upload. Selective file upload is simpler than mirroring the entire repository. An HF_TOKEN secret in GitHub Actions is the only prerequisite.
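A minimal sketch of that sync step, assuming HF_TOKEN is exposed to the job's environment; names are illustrative and the issue body holds the actual design:

```python
# Sketch of the designed sync job (illustrative; see issue #34 for the real design)
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the environment in CI

# Model card
api.upload_file(path_or_fileobj="README.md", path_in_repo="README.md",
                repo_id="syamaner/coffee-first-crack-detection", repo_type="model")

# Dataset card, renamed on upload
api.upload_file(path_or_fileobj="data/DATASET_CARD.md", path_in_repo="README.md",
                repo_id="syamaner/coffee-first-crack-audio", repo_type="dataset")

# Space files, same filenames
for name in ("app.py", "README.md", "requirements.txt"):
    api.upload_file(path_or_fileobj=f"spaces/{name}", path_in_repo=name,
                    repo_id="syamaner/coffee-first-crack-detection", repo_type="space")
```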

This is still open. It is not a complex workflow to write; the issue body has the full design. Blog post drafting and follow-on dataset work took priority. It is the last gap between "it works" and "it stays consistent without manual intervention."

The Result

The Space is live at huggingface.co/spaces/syamaner/coffee-first-crack-detection. Upload a 10-second WAV or MP3, click Classify, and you get the model's probability output for first_crack and no_first_crack.

The project has continued to improve since then, but this series captures the first milestone: making the detector usable outside my development machine.

What the series delivered: a complete, public, reproducible audio ML pipeline, from recording sessions and Label Studio annotation through Hugging Face-native training, ONNX INT8 edge deployment, and a live inference UI. The model, dataset, and ONNX variants are all on the Hub. The source is on GitHub. The annotated coffee roasting audio dataset is public for anyone who wants to build on it.

The prototype that started this ran on a laptop. This one runs on a low-cost ARM board, or as a hosted Gradio Space accessed through a browser.

Links

Project:

  • Live Space: huggingface.co/spaces/syamaner/coffee-first-crack-detection
  • Model: huggingface.co/syamaner/coffee-first-crack-detection
  • Dataset: huggingface.co/datasets/syamaner/coffee-first-crack-audio


References

1. Hugging Face Spaces & Gradio

2. Hugging Face Inference Providers

  • Hugging Face Inference Providers Documentation: Documents which model types are eligible for automatic inference widget hosting and which require explicit provider deployment. The gap between pipeline_tag and a working widget is explained here.

3. Python asyncio & Gradio Event Loop Cleanup

  • CPython asyncio BaseEventLoop.__del__: The destructor that raises ValueError: Invalid file descriptor: -1 when garbage-collecting event loops whose self-pipe is already closed. Affects Python 3.12 and 3.13 when Gradio creates intermediate loops during startup.
