Standard subtitles are functional, but what if they could be smarter? What if they could automatically highlight key information as it's spoken, making your content more engaging and easier to follow?
In this tutorial, we'll write a Python script that does just that. We'll take a simple cooking video and create subtitles that automatically:
- Animate word-by-word.
- Highlight specific ingredients in one color.
- Emphasize critical actions (like "mix" or "bake") in another.
This is what I call "content-aware" subtitling. We'll use Pycaps, an open-source library I built to make this kind of complex video processing accessible.
Here’s a glimpse of what we’ll create:
Let's get started.
Project Setup
First, make sure you have pycaps installed. The instructions are in the project's README.
Next, create a folder for our project and add two empty files:
- render.py (our Python script)
- styles.css (our stylesheet)
You'll also need a video file. I'll use one named cooking_video.mp4.
Step 1: Basic Subtitles with Custom Styling
Let's start by getting some styled subtitles on the screen. We'll write the script to transcribe the video and apply our custom CSS for a clean, professional look.
In styles.css, let's define our base style. We'll use a friendly, bold font and a "karaoke" effect where the current word is highlighted.
/* styles.css */
.word {
  /* Default style for all words */
  font-family: cursive, sans-serif;
  font-weight: 700;
  font-size: 18px;
  color: white;
  padding: 2px 3px;
  /* Four offset shadows draw a black outline; the last one adds a soft drop shadow */
  text-shadow:
    1px 1px 0px black, -1px 1px 0px black,
    1px -1px 0px black, -1px -1px 0px black,
    2px 2px 3px rgba(0, 0, 0, 0.7);
}

.word-being-narrated {
  /* The word currently being spoken */
  color: #ffc107; /* A warm yellow */
}
Now, in render.py, we'll write the code to use these styles.
# render.py
from pycaps import *
# Initialize the pipeline builder
builder = CapsPipelineBuilder()
builder.with_input_video("cooking_video.mp4")
builder.add_css("styles.css")
# Keep each subtitle segment short: roughly 10-15 characters per on-screen line
builder.add_segment_splitter(LimitByCharsSplitter(min_limit=10, max_limit=15))
# Build and run the pipeline
pipeline = builder.build()
pipeline.run()
print("Styled subtitles added!")
Run this script (python render.py). You'll get a new video file with perfectly timed, styled subtitles. This is our foundation. The output will look something like this:
Step 2: Content-Aware Highlighting with Custom Tags
This is where it gets interesting. We want to style words based on their meaning. To do this, we'll use the pycaps Tagging System to automatically label words.
Let's define rules for two AI custom tags:
- ingredient: We'll use a bit of AI to identify cooking ingredients.
- key-action: Another rule will identify important cooking verbs.
These AI rules require an API key; see the docs for setup details.
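The exact setup lives in the docs, so take this as a sketch only: assuming the AI rules are backed by an OpenAI-style model that reads its key from an environment variable (the variable name below is my assumption, not something pycaps guarantees), a small guard at the top of render.py avoids a confusing failure halfway through a render:
# Sketch only: assumes the key is read from an environment variable named
# OPENAI_API_KEY. Check the pycaps docs for the actual variable and setup.
import os

if "OPENAI_API_KEY" not in os.environ:
    raise SystemExit("Missing API key: export OPENAI_API_KEY before running render.py")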
We'll use the SemanticTagger to manage these rules in our script.
# render.py (updated with SemanticTagger)
from pycaps import *
# 1. Create a tagger to hold our custom rules
tagger = SemanticTagger()
# 2. Rule 1: Use AI to tag cooking ingredients.
tagger.add_ai_rule(
    tag=Tag("ingredient"),
    instructions="cooking ingredients (e.g. flour, sugar, eggs)"
)
# 3. Rule 2: Use AI to tag important cooking actions.
tagger.add_ai_rule(
    tag=Tag("key-action"),
    instructions="important cooking verbs or techniques (e.g. mix, bake)"
)
# Initialize the pipeline builder
builder = CapsPipelineBuilder()
builder.with_input_video("cooking_video.mp4")
builder.add_css("styles.css")
builder.add_segment_splitter(LimitByCharsSplitter(min_limit=10, max_limit=15))
# 4. Add our custom tagger to the pipeline
builder.with_semantic_tagger(tagger)
# Build and run the pipeline
pipeline = builder.build()
pipeline.run()
print("Styled subtitles added!")
Now that our script is applying the tags, let's give them a unique style in styles.css. Each tag is applied to the word as an extra CSS class, so we can target it with a simple compound selector.
/* styles.css (additions for custom tags) */
/* ... (keep the previous .word styles) ... */

/* Style for words tagged as 'ingredient' */
.word.ingredient {
  color: #4ade80; /* A fresh green */
}

/* Style for words tagged as 'key-action' */
.word.key-action {
  color: #60a5fa; /* A nice blue */
  font-weight: 800; /* Extra-Bold */
}
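A nice side effect of tags being plain CSS classes: assuming the narration class and the tag classes end up on the same word element (the selectors above suggest they do, but check your own render), you can give an ingredient an extra nudge at the exact moment it is spoken:
/* Optional: deepen the green only while the ingredient is the narrated word */
.word.ingredient.word-being-narrated {
  color: #22c55e;
}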
Run the script again. Now, your subtitles will automatically change color whenever an ingredient or a key action is mentioned!
See the results:
Step 3: Contextual Animations
Let's add a final layer of polish. We'll add a zoom-in animation for all ingredient and key-action words.
# render.py (final version)
from pycaps import *
tagger = SemanticTagger()
tagger.add_ai_rule(
    tag=Tag("ingredient"),
    instructions="cooking ingredients (e.g. flour, sugar, eggs)"
)
tagger.add_ai_rule(
    tag=Tag("key-action"),
    instructions="important cooking verbs or techniques (e.g. mix, bake)"
)
# Initialize the pipeline builder
builder = CapsPipelineBuilder()
builder.with_input_video("cooking_video.mp4")
builder.add_css("styles.css")
builder.add_segment_splitter(LimitByCharsSplitter(min_limit=10, max_limit=15))
builder.with_semantic_tagger(tagger)
# --- Add Animations ---
# A 'zoom' animation for every word with tag 'key-action' or 'ingredient'
builder.add_animation(
    animation=ZoomInPrimitive(duration=0.3, init_scale=0.7, overshoot=OvershootConfig()),
    when=EventType.ON_NARRATION_STARTS,
    what=ElementType.WORD,
    tag_condition=TagConditionFactory.parse("key-action or ingredient")
)
# Build and run the pipeline
pipeline = builder.build()
pipeline.run()
print("Final animated video is ready!")
And there we have it. A fully automated script that produces beautiful, dynamic, and content-aware subtitles.
Beyond the Script: Templates and the CLI
While this script is powerful, for a recurring task you'd want to reuse this style. This is where Templates come in. You can package your styles.css and all your pipeline logic (including tagger rules and animations) into a reusable pycaps.template.json file. For all the options, see the JSON Configuration Reference.
Once you have a template, you can process new videos in a single line from your terminal:
pycaps render --input episode2.mp4 --template my-cooking-style
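And since that is a single self-contained command, batch-processing a folder of videos is just a shell loop around it (the loop itself is plain bash and assumes nothing about pycaps beyond the command above):
# Render every .mp4 in the current folder with the same template
for f in *.mp4; do
  pycaps render --input "$f" --template my-cooking-style
done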
Final Words
We've seen how a combination of Python scripting, CSS, and a smart tagging engine can automate the creation of high-quality, animated subtitles. This approach gives you the power to create a unique look for your video content.
The project is fully open-source. I invite you to explore it on GitHub, try the online Web UI demo if you want to experiment without code, and see what you can build.