Richard Shaju

How Does Song Recognition Technology Work?

Have you ever wondered how music recognition services like Shazam and Google Assistant can identify a song from just a few seconds of audio, and in Google's case even when you hum it?

After reading this article, you will have a better understanding of how this works.

Like many other voice assistants, Google Assistant uses "audio fingerprinting" technology to recognize songs.

Let's look at how audio fingerprinting works, step by step:

Digital Representation: The audio captured by the microphone is converted into a digital format: a sequence of amplitude samples taken at a fixed rate (for example, 44,100 samples per second for CD-quality audio). Information about the sound, such as its frequency content and duration, can be derived from these samples.
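To make this concrete, here is a minimal sketch of what "digital audio" looks like in code. It generates a pure tone instead of reading from a real microphone, and the 8 kHz sample rate, the 16-bit range, and the function name `sample_tone` are all assumptions picked for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed; real recordings often use 44100)

def sample_tone(freq_hz, duration_s, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return 16-bit integer samples of a sine tone, similar to what an
    analog-to-digital converter produces from a microphone signal."""
    n_samples = round(duration_s * sample_rate)
    return [
        round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(n_samples)
    ]

samples = sample_tone(440.0, 0.01)  # 10 ms of a 440 Hz tone
print(len(samples))                 # 80
print(samples[:5])                  # the first few amplitude values
```

Each number in the list is just the loudness of the wave at one instant. Everything else in the pipeline is computed from a list like this.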

Segmentation: The audio signal is split into short chunks, or frames, which often overlap. Analyzing these smaller, more manageable segments captures how the sound changes over time and allows for more precise matching.
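A small sketch of segmentation, assuming fixed-size frames that overlap. The helper name `split_into_frames` and the tiny frame and hop sizes are made up so the output is easy to read.

```python
def split_into_frames(samples, frame_size, hop_size):
    """Split samples into frames of frame_size, moving forward by hop_size
    each time. A trailing partial frame is dropped."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]

# Ten stand-in "samples", frames of 4 with a hop of 2 (50% overlap)
frames = split_into_frames(list(range(10)), frame_size=4, hop_size=2)
print(frames)
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With a hop smaller than the frame size, neighbouring frames share samples, so a sound that starts near a frame boundary still appears whole in at least one frame.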

Transformation: Each segment may undergo transformations that emphasize certain characteristics or make the data more suitable for analysis. A common one is converting the audio into a frequency-domain representation using the Fourier transform, which shows how much each frequency contributes to the sound.
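Here is a rough sketch of that frequency-domain step. It uses a naive discrete Fourier transform written with the standard library only, which is fine for a 64-sample demo; a real system would use a fast Fourier transform such as `numpy.fft.rfft`. The sample rate, frame size, and the name `dft_magnitudes` are assumptions for this example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(N^2) discrete Fourier transform.
    Returns the magnitude of each frequency bin from 0 up to N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

SAMPLE_RATE = 8000  # assumed
FRAME_SIZE = 64     # assumed; each bin is 8000 / 64 = 125 Hz wide

# One frame of a 1000 Hz tone, which lands exactly on bin 1000 / 125 = 8
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME_SIZE)]

mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin, peak_bin * SAMPLE_RATE / FRAME_SIZE)  # 8 1000.0
```

The raw samples only say how loud the wave is at each instant. After the transform, the strongest bin tells us which pitch dominates the frame, and that is the kind of feature a fingerprint can be built from.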

Fingerprint Generation: A fingerprint, or signature, is generated from the transformed audio data. It is a condensed representation of the audio segment's distinguishing characteristics. Various methods can be used to create fingerprints, such as hashing techniques or more advanced signal processing methods.
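One simple way to picture fingerprint generation is a bit-based scheme: compare the energy in neighbouring frequency bands and store only which one is larger. The sketch below is a simplified illustration, not the method any particular service uses, and the band energies are made-up numbers.

```python
def frame_fingerprint(band_energies):
    """Pack one frame's band energies into an integer.
    Each bit is 1 when a band has more energy than the next band;
    the first comparison becomes the most significant bit."""
    bits = 0
    for left, right in zip(band_energies, band_energies[1:]):
        bits = (bits << 1) | (1 if left > right else 0)
    return bits

energies = [0.9, 0.4, 0.7, 0.2, 0.3]  # hypothetical energies for 5 bands
fp = frame_fingerprint(energies)
print(format(fp, "04b"))  # 1010

# Playing the same audio twice as loud scales every band equally,
# so the comparisons, and therefore the fingerprint, stay the same.
louder = [2 * e for e in energies]
print(frame_fingerprint(louder) == fp)  # True
```

Because only the comparisons are kept, the fingerprint is far smaller than the audio it came from and does not depend on overall volume.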

Query Matching: When a user requests song recognition, the algorithm captures the audio, extracts features, generates fingerprints, and compares them against fingerprints that were computed in advance for known songs and stored in a database. The comparison relies on a similarity measure, such as Hamming distance (for bit-string fingerprints) or cosine similarity (for feature vectors).
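Both similarity measures fit in a few lines. In this sketch the database is a plain dictionary with one 8-bit fingerprint per song; the song titles, fingerprint values, and the `best_match` helper are hypothetical, and a real service would store many fingerprints per song behind an index instead of scanning everything.

```python
import math

def hamming_distance(a, b):
    """Number of bit positions where two integer fingerprints differ."""
    return bin(a ^ b).count("1")

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical database: title -> fingerprint
database = {
    "Song A": 0b10101100,
    "Song B": 0b01010011,
    "Song C": 0b10111100,
}

def best_match(query, db):
    """Return the title whose fingerprint has the fewest differing bits."""
    return min(db, key=lambda title: hamming_distance(query, db[title]))

query = 0b10101110  # a noisy recording: one bit differs from Song A
print(best_match(query, database))                  # Song A
print(hamming_distance(query, database["Song A"]))  # 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # 0.0 (nothing in common)
```

A smaller Hamming distance means a closer match, while cosine similarity works the other way round: values near 1 mean the vectors point in almost the same direction.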

Result Presentation: Finally, if a match is found, the algorithm retrieves the corresponding metadata from the database and presents it to the user, typically including information such as the song title, artist, and album.

That was a brief overview of audio fingerprinting. If you're interested, it's well worth reading more about it.

Wait... what are the Fourier transform and cosine similarity?

Let me explain that in another article.

Till then, bye 👋
