A beginner's guide to the Basic-Pitch model by Rhelsing on Replicate


This is a simplified guide to an AI model called Basic-Pitch, maintained by Rhelsing. If you like this kind of analysis, you can join AImodels.fyi or follow us on Twitter.

Model overview

basic-pitch is a lightweight, easy-to-use Python library for Automatic Music Transcription (AMT) developed by Spotify's Audio Intelligence Lab. It is comparable to much larger and more resource-hungry AMT systems in multipitch support, ability to generalize across instruments, and note accuracy. Unlike music generation models such as musicgen-songstarter-v0.2, cantable-diffuguesion, riffusion, and stable-audio-prod, basic-pitch focuses specifically on polyphonic note transcription and multipitch estimation.

Model inputs and outputs

basic-pitch takes an audio file as input and generates a MIDI file transcription, complete with pitch bends. The model is instrument-agnostic and supports polyphonic instruments, so it can transcribe a wide variety of musical recordings.
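To make the audio-in, MIDI-out flow concrete, here is a minimal sketch that runs the open-source basic-pitch Python package locally rather than through Replicate. It assumes the package is installed (for example with pip install basic-pitch) and that its predict function returns the raw model output, a PrettyMIDI object, and a list of note events, as described in the project's README. The helper name transcribe_to_midi and the file paths are illustrative and not part of the library.

```python
from pathlib import Path


def transcribe_to_midi(audio_path, midi_path):
    """Transcribe one audio file to MIDI and return the number of notes found.

    Hypothetical helper: only `predict` comes from basic-pitch itself.
    """
    audio_path = Path(audio_path)
    if not audio_path.is_file():
        raise FileNotFoundError(f"No audio file at {audio_path}")

    # Imported lazily so the file check above works even when
    # basic-pitch is not installed.
    from basic_pitch.inference import predict

    # Assumed return shape: (raw model output, PrettyMIDI object, note events).
    model_output, midi_data, note_events = predict(str(audio_path))

    # PrettyMIDI objects can write themselves out as a standard MIDI file.
    midi_data.write(str(midi_path))
    return len(note_events)
```

Calling transcribe_to_midi("song.wav", "song.mid") would write the transcription next to the input. The first call can be slow because the model has to be loaded.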

Inputs

Audio file: Any sound file compatible with the librosa library, including .mp3, .ogg, .wav, .flac, and .m4a. The audio will be downmixed to mono and resampled to 22050 Hz before processing.
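The preprocessing step described above (downmix to mono, resample to 22050 Hz) can be illustrated with a small pure-Python sketch. This is not how basic-pitch or librosa implement it: librosa uses a higher-quality, band-limited resampler, while the version below uses plain linear interpolation with no anti-aliasing filter, purely to show what the two operations do to the samples. The function names are made up for this example.

```python
def downmix_to_mono(channels):
    """Average equal-length channels sample by sample."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("All channels must have the same number of samples")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate=22050):
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if not samples:
        return []
    n_out = int(round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample grid.
        pos = i * src_rate / dst_rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Tiny stereo clip at 44100 Hz: left and right channels of four samples each.
left = [0.0, 1.0, 2.0, 3.0]
right = [2.0, 3.0, 4.0, 5.0]
mono = downmix_to_mono([left, right])
resampled = resample_linear(mono, src_rate=44100, dst_rate=22050)
```

Halving the sample rate halves the number of samples, so the four-sample clip becomes two samples. In practice you do not need to do any of this yourself, since the model handles it before inference.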

Outputs

MIDI file: A MIDI file containing the transcribed notes, including pitch bends.
WAV file: An optional WAV file rendering of the MIDI transcription.
Model outputs: Raw model outputs can be saved as an NPZ file.
Note events: Predicted note events can be saved as a CSV file.
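As a rough illustration of working with the note-events CSV, the sketch below reads rows and converts MIDI pitch numbers into note names and frequencies. The column names start_time_s, end_time_s, and pitch_midi are an assumption based on what the basic-pitch library writes, so check the header of your own file before relying on them. The sample data is invented for the example.

```python
import csv
import io

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Convert a MIDI pitch number to a note name, with MIDI 60 as C4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


def midi_to_hz(pitch):
    """Convert a MIDI pitch number to frequency, with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def read_note_events(csv_file):
    """Parse note events from an open CSV file (column names are assumed)."""
    notes = []
    for row in csv.DictReader(csv_file):
        pitch = int(row["pitch_midi"])
        notes.append({
            "start_s": float(row["start_time_s"]),
            "end_s": float(row["end_time_s"]),
            "name": midi_to_name(pitch),
            "hz": midi_to_hz(pitch),
        })
    return notes


# Invented sample data standing in for a real note-events file.
sample = io.StringIO(
    "start_time_s,end_time_s,pitch_midi,velocity\n"
    "0.00,0.50,69,100\n"
    "0.50,1.00,60,90\n"
)
notes = read_note_events(sample)
```

With a real file, replace the StringIO object with open("notes.csv", newline=""). Rows with extra trailing columns, such as per-note pitch bend values, are tolerated because only the named columns are read.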