I merged a docs/examples update to 🤗 Transformers that:
- Pins CTC training commands to Hub datasets (instead of deprecated local scripts)
- Clarifies the dataset_name vs dataset_config_name help text (matches 🤗 Datasets docs)
- Adds small Windows setup notes

Merged PR: https://github.com/huggingface/transformers/pull/41027
The problem (why this change was needed)
Some users following the Automatic Speech Recognition (ASR) examples hit setup errors because older instructions referenced local dataset scripts. Recent versions of 🤗 Datasets expect Hub datasets (e.g., Common Voice pinned by version and language), so commands like --dataset_name="common_voice" were fragile or ambiguous, especially on Windows.
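To see the difference in code, here is a small illustration (not part of the PR): the old bare name fails on recent 🤗 Datasets, while the explicit Hub ID works. Note that this Common Voice release is gated, so you may need to accept its terms on the Hub and log in first.

from datasets import load_dataset

# Old style: a bare name that used to resolve to a packaged dataset script.
# On recent 🤗 Datasets releases this fails (dataset scripts are no longer supported).
# ds = load_dataset("common_voice", "tr")

# Hub style: an explicit, versioned dataset ID plus a config (here: Turkish).
ds = load_dataset("mozilla-foundation/common_voice_17_0", "tr", split="train")
print(ds[0]["sentence"])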
What changed (at a glance)
- Use explicit Hub dataset IDs in CTC commands
  Example: mozilla-foundation/common_voice_17_0 (versioned, reproducible)
- Clarify arguments in the example script help (see the sketch after this list)
  dataset_name → the dataset ID on the Hub
  dataset_config_name → the subset/language (e.g., en, tr, clean)
- Tiny Windows notes
  How to activate the venv in PowerShell
  How to run formatters without make
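For context, the example scripts declare these arguments in a dataclass parsed by HfArgumentParser. A rough sketch of that pattern (field names match the script; the help strings here paraphrase the clarified wording rather than quoting it):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:
    # The dataset identifier on the Hugging Face Hub,
    # e.g. "mozilla-foundation/common_voice_17_0".
    dataset_name: str = field(
        metadata={"help": "Dataset identifier on the Hub."}
    )
    # The configuration (subset) of the dataset,
    # e.g. a language code such as "tr" or "en".
    dataset_config_name: Optional[str] = field(
        default=None,
        metadata={"help": "Dataset configuration name (e.g. a language)."},
    )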
Before vs After (one line that mattered)
Before
--dataset_name="common_voice" \
--dataset_config_name="tr" \
After
--dataset_name="mozilla-foundation/common_voice_17_0" \
--dataset_config_name="tr" \
This small change prevents the “dataset scripts are no longer supported” error and makes runs reproducible.
Quickstart: CTC finetuning (Common Voice, Turkish)
python run_speech_recognition_ctc.py \
--dataset_name="mozilla-foundation/common_voice_17_0" \
--dataset_config_name="tr" \
--model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
--output_dir="./wav2vec2-common_voice-tr-demo" \
--overwrite_output_dir \
--num_train_epochs="15" \
--per_device_train_batch_size="16" \
--gradient_accumulation_steps="2" \
--learning_rate="3e-4" \
--warmup_steps="500" \
--eval_strategy="steps" \
--text_column_name="sentence" \
--length_column_name="input_length" \
--save_steps="400" \
--eval_steps="100" \
--layerdrop="0.0" \
--save_total_limit="3" \
--freeze_feature_encoder \
--gradient_checkpointing \
--fp16 \
--group_by_length \
--push_to_hub \
--do_train --do_eval
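Once training finishes, you can sanity-check the checkpoint with the ASR pipeline. A minimal sketch: the model path is the --output_dir from the command above, and sample.wav is a placeholder for any local audio file.

from transformers import pipeline

# Load the fine-tuned checkpoint (or pass the Hub repo ID if --push_to_hub uploaded it).
asr = pipeline("automatic-speech-recognition", model="./wav2vec2-common_voice-tr-demo")

# Transcribe a local audio file; "sample.wav" is just a placeholder.
print(asr("sample.wav")["text"])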
Change dataset_config_name to your language (e.g., en, hi, …).
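Not sure which configs exist? A quick way to list them (assuming you have accepted the dataset's terms and are logged in to the Hub):

from datasets import get_dataset_config_names

# Lists every config (language) available for this Common Voice release.
configs = get_dataset_config_names("mozilla-foundation/common_voice_17_0")
print(configs[:10])  # first few language codes; the full list depends on the release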
Windows notes (PowerShell)
# activate venv
.\.venv\Scripts\Activate.ps1
# if 'make' isn't available, run formatters directly
python -m black <changed_paths>
python -m ruff check <changed_paths> --fix
Impact
- Fewer setup failures (no local script pitfalls)
- Reproducible examples (pinned, versioned datasets)
- Better cross-platform DX (Windows included)
- Consistency with current 🤗 Datasets guidance
Proof (validation)
- Reviewed and approved by maintainers; merged to main
- All CI checks green (quality + example tests)
- PR: https://github.com/huggingface/transformers/pull/41027
Thanks to the Transformers maintainers for the review and merge!
If you try the example or want to improve the docs further, drop a comment or open a PR. 🙌