Do you love generating static songs with classic text-to-music models? Then prepare to conduct a never-ending symphony. Introducing Lyria RealTime, Google DeepMind's experimental model that doesn't just generate music: it jams with you, like it did during the Toro y Moi I/O pre-show:
While traditional music generation models work like a jukebox (input prompt -> wait -> get song), Lyria RealTime operates on the principle of "Music as a Verb." It creates a persistent, bidirectional streaming connection that produces a continuous 48kHz stereo stream. You can steer, warp, and morph the audio in the moment, making it the first generative model truly designed for interactive experiences.
And the best part? Right now the model is free to use!
This guide will walk you through building with Lyria RealTime using the Gemini API. Here's what it covers:
- How Lyria RealTime Works (The "Goldfish Memory" Architecture)
- Project Setup
- Basic Streaming (The "Hello World" of Music)
- Steering the Stream (Weighted Prompts)
- Advanced Configuration (BPM, Density, & Scale)
- Blueprints for the Future: Advanced Use Cases
- Prompting Strategies & Best Practices
- Where to play with Lyria RealTime
Jump straight to the last section if you just want to play with Lyria RealTime, for example as a DJ, driving a spaceship, or using your camera.
Note: for an interactive version of this post, check out the Python cookbook.
1) How Lyria RealTime Works
Lyria RealTime uses a low-latency WebSocket connection to maintain a live communication channel with the model. Unlike offline models that plan a whole song structure (Intro-Verse-Chorus), Lyria operates on a chunk-based autoregression system.
It generates audio in 2-second chunks, looking back for a few seconds of context to maintain the rhythmic "groove" while looking forward at your current controls to decide the style. This means the model doesn't "compose songs" in the traditional sense; it navigates musical states.
2) Project Setup
To follow this guide, you will need:
- An API key from Google AI Studio (it can be a free one).
- The Google Gen AI SDK.
Install the SDK:
Python (3.12+ recommended):
pip install "google-genai>=1.52.0"
JavaScript / TypeScript:
You'll need at least version 1.30 of the JS/TS SDK:
npm install @google/genai
Note: The following examples use the Python SDK for demonstration. For JS/TS code samples, check the AI Studio apps.
3) Basic Streaming
To start a session, you connect to the model (models/lyria-realtime-exp), send an initial configuration, and start the stream. The interaction loop is asynchronous: you send commands, and the server continuously yields raw audio chunks.
Note: Ensure you are using the v1alpha API version for experimental models like Lyria.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(http_options={'api_version': 'v1alpha'})

async def main():
    async def receive_audio(session):
        """Background task to process incoming audio chunks."""
        while True:
            async for message in session.receive():
                if message.server_content.audio_chunks:
                    # 'data' is raw 16-bit PCM audio at 48kHz
                    audio_data = message.server_content.audio_chunks[0].data
                    # Add your audio playback logic here!
                await asyncio.sleep(10**-12)  # yield control back to the event loop

    async with (
        client.aio.live.music.connect(model='models/lyria-realtime-exp') as session,
        asyncio.TaskGroup() as tg,
    ):
        # 1. Start listening for audio
        tg.create_task(receive_audio(session))

        # 2. Send initial musical concept
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text='elevator music', weight=1.0)]
        )

        # 3. Set the vibe (BPM, Temperature)
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
        )

        # 4. Drop the beat
        await session.play()

        # Keep the session alive
        await asyncio.sleep(30)

if __name__ == "__main__":
    asyncio.run(main())
Congratulations, you've got some elevator music!
Not impressed? That's just the beginning, dear padawan. Now comes the cool part.
4) Steering the Stream (Weighted Prompts)
This is where the magic happens. Unlike static generation, you can send new WeightedPrompt messages while the music is playing to smoothly transition the genre, instruments, or mood.
The weight parameter is your fader. A weight of 1.0 is standard, but you can use multiple prompts to blend influences.
Example: Morphing from Piano to Live Performance
from google.genai import types

# Send this while the loop is running to shift the style
await session.set_weighted_prompts(
    prompts=[
        # Keep the piano strong
        {"text": "Piano", "weight": 2.0},
        # Add a subtle meditative layer
        types.WeightedPrompt(text="Meditation", weight=0.5),
        # Push the 'Live' feeling
        types.WeightedPrompt(text="Live Performance", weight=1.0),
    ]
)
Note: As the model generates chunk after chunk, changes can take a few seconds (usually around 2s) to be reflected in the music.
Pro Tip: Cross-fading
Drastic prompt changes can be abrupt. For professional results, implement client-side cross-fading by sending intermediate weight values rapidly (e.g., every 500ms) to "morph" the music smoothly.
Example: The "Morph" Function
import asyncio
from google.genai import types

async def cross_fade(session, old_prompt, new_prompt, duration=2.0, steps=10):
    """Smoothly morphs from one musical idea to another."""
    step_time = duration / steps
    for i in range(steps + 1):
        # Calculate the blend ratio (alpha goes from 0.0 to 1.0)
        alpha = i / steps
        await session.set_weighted_prompts(
            prompts=[
                # Fade out the old
                types.WeightedPrompt(text=old_prompt, weight=1.0 - alpha),
                # Fade in the new
                types.WeightedPrompt(text=new_prompt, weight=alpha),
            ]
        )
        await asyncio.sleep(step_time)

# Usage in your main loop:
# Morph from 'Ambient' to 'Techno' over 5 seconds
await cross_fade(session, "Ambient Drone", "Hard Techno", duration=5.0)
Note that this code sample assumes both prompts peak at a weight of 1.0, which might not match your actual prompt set.
5) Advanced Configuration (The Knobs)
Lyria RealTime exposes parametric controls that change the structure of the music. If you aren't a musician, think of these controls as the physics of the audio world:
- Density (0.0 - 1.0): Think of this as "Busyness."
  - Low (0.1): A lonely drummer playing once every few seconds. Sparse.
  - High (0.9): A chaotic orchestra where everyone plays at once. Intense.
- Brightness (0.0 - 1.0): Think of this as "Muffled vs. Crisp."
  - Low (0.1): Listening to music from outside a club, through a wall. Dark and bass-heavy.
  - High (0.9): Listening through high-end headphones. Sharp, clear, and treble-heavy.
- BPM (60 - 200): The heartbeat of the track (Beats Per Minute).
- Scale: The "Mood." It forces the music into a specific set of notes (Key/Mode).
Important: While density and brightness can be changed smoothly on the fly, changing the BPM or Scale is a fundamental structural shift. You must call reset_context() for these changes to take effect. This will clear the model's "short-term memory," causing a hard cut in the audio.
Example: The "Hard Drop"
# Changing structural parameters requires a context reset
await session.set_music_generation_config(
    config=types.LiveMusicGenerationConfig(
        bpm=140,
        scale=types.Scale.C_MAJOR_A_MINOR,  # Force happy/neutral mood
    )
)
# This command is mandatory for BPM/Scale changes to apply!
await session.reset_context()
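For contrast, the non-structural knobs can be nudged mid-stream with no reset at all. Here's a minimal sketch, assuming an active session from the earlier examples (the specific values are just illustrative):

# Density and brightness changes apply smoothly, no reset_context() needed
await session.set_music_generation_config(
    config=types.LiveMusicGenerationConfig(
        density=0.3,     # calmer, sparser arrangement
        brightness=0.8,  # crisper, treble-heavy mix
    )
)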
6) Blueprints for the Future: Advanced Use Cases
We’ve covered basic streaming, but Lyria’s parametric controls allow for applications that connect the physical world to the audio stream. Here are four ideas to get you started.
Use Case A: The "Biometric Beat" (Fitness & Health)
Most fitness apps use static playlists that rarely match your actual pace. Because Lyria allows for real-time bpm and density control, you can build a music engine that is biologically coupled to the user (sketched after the list below).
- Heart Rate Monitor (HRM) -> BPM: Map the user's heart rate directly to the track's tempo.
- Accelerometer -> Density: If the user is sprinting (high variance in movement), increase density to 1.0 to add percussion and complexity. If they stop to rest, drop density to 0.2 for an ambient breakdown.
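Here's a minimal sketch of that mapping. read_heart_rate and read_movement_variance are hypothetical sensor helpers; the Lyria calls are the same ones covered above, and it assumes the session, types, and asyncio imports from the basic streaming example:

async def biometric_loop(session):
    """Couple the stream to the runner's body (hypothetical sensor helpers)."""
    last_bpm = None
    while True:
        heart_rate = await read_heart_rate()        # hypothetical sensor call
        movement = await read_movement_variance()   # hypothetical sensor call
        bpm = max(60, min(200, int(heart_rate)))    # clamp to Lyria's supported range
        # Sprinting -> busy percussion; resting -> ambient breakdown
        density = 1.0 if movement > 0.7 else 0.2
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=bpm, density=density)
        )
        if bpm != last_bpm:
            await session.reset_context()  # BPM is structural: reset so it applies
            last_bpm = bpm
        await asyncio.sleep(10)  # re-sync with the sensors every 10 seconds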
Use Case B: The "Democratic DJ" (Social Streaming)
Since WeightedPrompt weights accept float values, you can build a collaborative radio station for Twitch streams or Discord bots where the audience votes on the genre (see the sketch after this list). Instead of a winner-take-all system, Lyria can blend the votes.
- Input: 100 users vote. 60 vote "Cyberpunk", 30 vote "Jazz", 10 vote "Reggae".
- Normalization: Convert votes to weights (0.6, 0.3, 0.1).
- Result: The model generates a dominant Cyberpunk track with clear Jazz harmonies and a subtle Reggae backbeat, and it shifts over time as the votes change.
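A sketch of the vote-to-weight conversion; the votes dict is whatever your chat integration produces, and the rest is the same set_weighted_prompts call from section 4:

async def apply_votes(session, votes):
    """Blend audience votes into a single weighted prompt set."""
    total = sum(votes.values())
    if total == 0:
        return  # nobody voted yet, keep the current mix
    await session.set_weighted_prompts(
        prompts=[
            types.WeightedPrompt(text=genre, weight=count / total)
            for genre, count in votes.items()
        ]
    )

# e.g. 100 votes collected from chat: 60 Cyberpunk, 30 Jazz, 10 Reggae
await apply_votes(session, {"Cyberpunk": 60, "Jazz": 30, "Reggae": 10})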
Use Case C: "Focus Flow" (Productivity)
Deep work requires different audio textures than brainstorming. You can map Lyria's brightness and guidance parameters to a Pomodoro timer to guide the user's cognitive state, as in the sketch after this list.
- Deep Work Phase: Low brightness (darker, warmer sounds), low density (minimal distractions), high guidance (repetitive, predictable).
- Break Phase: High brightness (energetic, crisp), high density, low guidance (creative, surprising).
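A minimal sketch of that mapping, assuming your Pomodoro timer calls on_phase_change when the phase flips (the exact parameter values are illustrative, not tuned):

PHASE_CONFIGS = {
    # Deep work: dark, sparse, predictable
    "deep_work": types.LiveMusicGenerationConfig(brightness=0.2, density=0.2, guidance=5.0),
    # Break: crisp, busy, surprising
    "break": types.LiveMusicGenerationConfig(brightness=0.9, density=0.8, guidance=1.5),
}

async def on_phase_change(session, phase):
    """Swap the audio texture when the Pomodoro timer flips phase."""
    await session.set_music_generation_config(config=PHASE_CONFIGS[phase])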
Use Case D: "Real-Time Game Music" (Gaming)
Coming from the gaming industry, I couldn't help but think of a gaming use case for Lyria RealTime. You could have Lyria create the game's music in real time based on (see the sketch after this list):
- The game's own style: a set of prompts that defines the game and its overall ambiance.
- The environment: different prompts depending on whether you're in a busy city, in a forest, or sailing the Greek seas.
- The player's actions: if they're fighting, add an "epic" prompt; if they're investigating instead, swap it for a "mysterious" one.
- The player's current condition: you could change the BPM and the weight of a "danger" prompt depending on the player's health bar. The lower it is, the more stressful the music becomes.
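Here's a rough sketch of how a game loop might drive this; game_state and its fields are hypothetical, while the Lyria calls are the ones covered above:

async def update_game_music(session, game_state):
    """Recompute the prompt mix from the current (hypothetical) game state."""
    prompts = [
        # The game's overall identity stays constant
        types.WeightedPrompt(text="Orchestral adventure score", weight=1.0),
        # The environment layer (busy city, forest, Greek seas...)
        types.WeightedPrompt(text=game_state.environment_prompt, weight=0.8),
        # The action layer: epic when fighting, mysterious when investigating
        types.WeightedPrompt(
            text="epic" if game_state.in_combat else "mysterious", weight=0.7
        ),
    ]
    # The lower the health bar, the heavier the 'danger' layer
    danger = 1.0 - game_state.health_ratio
    if danger > 0:
        prompts.append(types.WeightedPrompt(text="danger", weight=danger))
    await session.set_weighted_prompts(prompts=prompts)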
7) Prompting Strategies & Best Practices
The Prompt Formula:
Through testing, a reliable formula has emerged: [Genre Anchor] + [Instrumentation] + [Atmosphere] (for example, "Acid Jazz, 303 Acid Bass, Ethereal Ambience").
- Instruments: 303 Acid Bass, Buchla Synths, Hang Drum, TR-909 Drum Machine...
- Genres: Acid Jazz, Bengal Baul, Glitch Hop, Shoegaze, Vaporwave...
- Moods: Crunchy Distortion, Ethereal Ambience, Ominous Drone, Swirling Phasers...
Developer Best Practices:
- Buffer Your Audio: Because this is real-time streaming over the network, implement client-side audio buffering (2-3 chunks) to handle network jitter and ensure smooth playback (see the sketch after this list).
- The "Settling" Period: When you start a stream or reset context, the model needs about 5-10 seconds to "settle" into a stable groove.
- Safety Filters: The model checks prompts against safety filters. Avoid asking for specific copyrighted artists ("Style of Taylor Swift"); instead, deconstruct their sound into descriptors ("Pop, female vocals, acoustic guitar").
- Instrumental Only: The model is only instrumental. While you can set music_generation_mode to VOCALIZATION, it produces vocal-like textures (oohs/aahs), not coherent lyrics.
- Session duration limit: Sessions are currently limited to 10 minutes, but you can simply start a new one afterwards.
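Here's a minimal sketch of the buffering tip above, using an asyncio.Queue as a jitter buffer: playback only starts once a few chunks are queued. play_raw_pcm is a placeholder for whatever audio backend you use (pyaudio, sounddevice, a browser AudioWorklet...):

import asyncio

BUFFER_TARGET = 3  # hold a few chunks before starting playback

async def receive_into_buffer(session, buffer: asyncio.Queue):
    """Producer: push raw PCM chunks from the session into the jitter buffer."""
    async for message in session.receive():
        if message.server_content.audio_chunks:
            await buffer.put(message.server_content.audio_chunks[0].data)

async def playback_from_buffer(buffer: asyncio.Queue):
    """Consumer: wait for the buffer to fill, then play continuously."""
    while buffer.qsize() < BUFFER_TARGET:
        await asyncio.sleep(0.1)  # absorb network jitter before the first note
    while True:
        chunk = await buffer.get()
        play_raw_pcm(chunk)  # placeholder: 16-bit PCM, 48kHz stereo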
More details and prompt ideas in Lyria RealTime's documentation.
8) Ready to Jam? Choose your preferred way to play with Lyria RealTime
One of the easiest places to try is AI Studio, where a couple of cool apps are available for you to play with, and to vibe-customize to your needs:
- Space DJ lets you navigate the universe of music genres with a spacecraft! I personally love navigating around the italo-disco and euro-france planets.
- Lyria Camera creates music in real time based on what it sees. I'd love to have that connected to my dashcam!
The Magenta website also features a lot of cool demos. It's also a great place to get more details on DeepMind's music generation models.
Finally, check out the magical mirror demo I made, which uses Lyria to create background music matching what the mirror says (Gemini generates the prompts on the fly):
And now the floor is yours: what will you create using Lyria RealTime?
Resources:
- Documentation
- Magenta website and blog for the latest news on the music generation models.
- AI Studio gen-media apps

