OMOTAYO OMOYEMI

Updating ASR examples in Hugging Face Transformers: Hub datasets, clearer args, smoother Windows setup

I recently merged a docs/examples update to 🤗 Transformers. Here's what it fixes and why.

The problem (why this change was needed)

Some users following the Automatic Speech Recognition (ASR) examples hit setup errors because the older instructions referenced local dataset scripts. The modern 🤗 Datasets library expects datasets to be loaded from the Hub (e.g., Common Voice, pinned by version and language), so commands like --dataset_name="common_voice" were fragile or ambiguous, especially on Windows.

What changed (at a glance)

  • Use explicit Hub dataset IDs in CTC commands
    Example: mozilla-foundation/common_voice_17_0 (versioned, reproducible)

  • Clarify arguments in the example script help
    dataset_name → the dataset ID on the Hub
    dataset_config_name → the subset/language (e.g., en, tr, clean)

  • Tiny Windows notes
    How to activate venv in PowerShell
    How to run formatters without make
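To make the two arguments concrete, here's a minimal sketch (not the actual example script, which defines many more options) of how `dataset_name` and `dataset_config_name` are typically parsed and then passed straight through to `datasets.load_dataset`:

```python
import argparse

# Minimal sketch of the two arguments discussed above; the real
# run_speech_recognition_ctc.py defines many more options.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataset_name", type=str,
    help="Dataset ID on the Hugging Face Hub, e.g. mozilla-foundation/common_voice_17_0")
parser.add_argument(
    "--dataset_config_name", type=str,
    help="Subset/language of the dataset, e.g. en, tr, clean")

args = parser.parse_args([
    "--dataset_name=mozilla-foundation/common_voice_17_0",
    "--dataset_config_name=tr",
])

# These two values map directly onto datasets.load_dataset(name, config):
#   load_dataset(args.dataset_name, args.dataset_config_name, split="train")
print(args.dataset_name, args.dataset_config_name)
```

In other words: the first value picks the repository on the Hub, the second picks the configuration inside it.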

Before vs After (one line that mattered)

Before

--dataset_name="common_voice" \
--dataset_config_name="tr" \

After

--dataset_name="mozilla-foundation/common_voice_17_0" \
--dataset_config_name="tr" \

This small change prevents the “dataset scripts are no longer supported” error and makes runs reproducible.
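If you have older commands or scripts lying around, a tiny (hypothetical) helper can translate legacy short names to their versioned Hub IDs. The mapping below covers only the Common Voice case from this post; extend it for other datasets you use:

```python
# Hypothetical helper: translate legacy short dataset names to
# versioned Hub IDs. Only the mapping shown in this post is included.
LEGACY_TO_HUB_ID = {
    "common_voice": "mozilla-foundation/common_voice_17_0",
}

def to_hub_dataset_id(name: str) -> str:
    """Return a versioned Hub dataset ID for a legacy short name;
    names that are already Hub IDs pass through unchanged."""
    return LEGACY_TO_HUB_ID.get(name, name)

print(to_hub_dataset_id("common_voice"))
print(to_hub_dataset_id("mozilla-foundation/common_voice_17_0"))
```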

Quickstart: CTC finetuning (Common Voice, Turkish)

python run_speech_recognition_ctc.py \
  --dataset_name="mozilla-foundation/common_voice_17_0" \
  --dataset_config_name="tr" \
  --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
  --output_dir="./wav2vec2-common_voice-tr-demo" \
  --overwrite_output_dir \
  --num_train_epochs="15" \
  --per_device_train_batch_size="16" \
  --gradient_accumulation_steps="2" \
  --learning_rate="3e-4" \
  --warmup_steps="500" \
  --eval_strategy="steps" \
  --text_column_name="sentence" \
  --length_column_name="input_length" \
  --save_steps="400" \
  --eval_steps="100" \
  --layerdrop="0.0" \
  --save_total_limit="3" \
  --freeze_feature_encoder \
  --gradient_checkpointing \
  --fp16 \
  --group_by_length \
  --push_to_hub \
  --do_train --do_eval

Change dataset_config_name to your language (e.g., en, hi, …).
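If you want to sweep several languages, building the command as a Python list keeps the flags readable and makes the config a one-argument change. A sketch (launch it with `subprocess.run` when you're ready; only a few of the flags from the full command above are shown):

```python
# Sketch: build the CTC finetuning command for a given language so
# the Common Voice config is easy to swap.
def build_ctc_command(lang: str) -> list:
    return [
        "python", "run_speech_recognition_ctc.py",
        "--dataset_name=mozilla-foundation/common_voice_17_0",
        f"--dataset_config_name={lang}",
        "--model_name_or_path=facebook/wav2vec2-large-xlsr-53",
        f"--output_dir=./wav2vec2-common_voice-{lang}-demo",
        "--do_train", "--do_eval",
        # ...plus the remaining flags from the full command above
    ]

cmd = build_ctc_command("tr")
print(cmd)
# To launch: subprocess.run(cmd, check=True)
```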

Windows notes (PowerShell)

# activate venv
.\.venv\Scripts\Activate.ps1

# if 'make' isn't available, run formatters directly
python -m black <changed_paths>
python -m ruff check <changed_paths> --fix
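As a small cross-platform convenience, this sketch prints the venv activation command for the current OS (assuming the default .venv layout used above):

```python
import os

def venv_activate_hint(os_name: str = os.name) -> str:
    """Return the venv activation command for the given os.name
    ('nt' for Windows/PowerShell; anything else treated as POSIX)."""
    if os_name == "nt":
        return r".\.venv\Scripts\Activate.ps1"   # PowerShell
    return "source .venv/bin/activate"           # bash/zsh

print(venv_activate_hint())
```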

Impact

  • Fewer setup failures (no local script pitfalls)
  • Reproducible examples (pinned, versioned datasets)
  • Better cross-platform DX (Windows included)
  • Consistency with current 🤗 Datasets guidance

Proof (validation)

Thanks to the Transformers maintainers for the review and merge!

If you try the example or want to improve the docs further, drop a comment or open a PR. 🙌
