Originally published at aimodels.fyi

A beginner's guide to the Basic-Pitch model by Rhelsing on Replicate

This is a simplified guide to an AI model called Basic-Pitch, maintained by Rhelsing. If you like these kinds of analyses, you should join AImodels.fyi or follow us on Twitter.

Model overview

basic-pitch is a lightweight, efficient, and easy-to-use Python library for Automatic Music Transcription (AMT) developed by Spotify's Audio Intelligence Lab. Despite its small footprint, its performance is comparable to much larger and more resource-hungry AMT systems in multipitch support, ability to generalize across instruments, and note accuracy. Unlike related models such as musicgen-songstarter-v0.2, cantable-diffuguesion, riffusion, and stable-audio-prod, basic-pitch focuses specifically on polyphonic note transcription and multipitch estimation rather than general music generation.

Model inputs and outputs

basic-pitch takes an audio file as input and generates a MIDI file transcription, complete with pitch bends. The model is instrument-agnostic and supports polyphonic instruments, so it can transcribe a wide variety of musical recordings.

Inputs

  • Audio file: Any sound file compatible with the librosa library, including .mp3, .ogg, .wav, .flac, and .m4a. The audio will be downmixed to mono and resampled to 22050 Hz before processing (see the sketch below).
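
This preprocessing step can be reproduced with librosa itself. A minimal sketch, where the input file name is hypothetical:

```python
import librosa

# mono=True downmixes multi-channel audio to a single channel;
# sr=22050 resamples the waveform to 22050 Hz, matching the
# preprocessing described above
y, sr = librosa.load("song.mp3", sr=22050, mono=True)
print(y.shape, sr)  # 1-D waveform array and the target sample rate
```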

Outputs

  • MIDI file: A MIDI file containing the transcribed notes, including pitch bends.
  • WAV file: An optional WAV file rendering of the MIDI transcription.
  • Model outputs: Raw model outputs can be saved as an NPZ file.
  • Note events: Predicted note events can be saved as a CSV file.
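
To make that input/output mapping concrete, here is a minimal sketch using the basic-pitch Python library's published inference API. The file and directory names are hypothetical, and exact argument names or ordering may vary between library versions.

```python
# A sketch of transcribing audio with the basic-pitch Python library
from basic_pitch.inference import predict, predict_and_save

# In-memory use: returns the raw model outputs, a PrettyMIDI
# transcription, and the predicted note events
model_output, midi_data, note_events = predict("guitar_riff.wav")  # hypothetical file

# Batch use: writes the output artifacts listed above to a directory
predict_and_save(
    ["guitar_riff.wav"],  # list of input audio paths (hypothetical)
    "transcriptions/",    # output directory
    True,                 # save_midi: write the .mid transcription
    True,                 # sonify_midi: render the MIDI to a .wav file
    True,                 # save_model_outputs: dump raw outputs to an .npz file
    True,                 # save_notes: write predicted note events to a .csv file
)
```

The library also exposes the same options through a basic-pitch command-line tool, so a transcription can be produced without writing any Python.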

Capabilities

basic-pitch is capable of accurately...

Click here to read the full guide to Basic-Pitch
