Descript is one of those tools where the description sounds too good to be true -- "you edit audio by editing the transcript, like a word processor" -- and then you actually try it and realize it's exactly what they said.
I want to give you a proper walkthrough here, because the learning curve isn't in learning how to use Descript. The learning curve is in recognizing when to trust it and when to work around its limitations. That comes from time with the tool, and I can give you a head start.
What Makes Descript Different
Most audio editors are waveform editors. You see colorful blobs representing sound, you cut and rearrange the blobs, you zoom into the waveform to find the exact frame where a word starts. This requires training, and it takes practice to get fast.
Descript works the opposite way. You record or import audio, Descript transcribes it, and you edit the transcript. Highlight a section of text you don't want -- a stumble, an "um," a whole tangent that went nowhere -- and delete it. The audio at that part of the recording disappears.
For podcasters, whose editing is primarily subtractive (cutting things out rather than building things up), this is dramatically faster than traditional methods.
The other key feature: Overdub. Overdub lets you type new words into the transcript and have Descript generate audio in your voice to fill the gap. Correct a mispronounced word. Add a sentence that would clarify an explanation. Fix a stumble without re-recording the whole segment.
I'll be honest about Overdub's limitations in a minute. But the core workflow -- import, transcribe, edit as text -- genuinely changes the experience of audio editing.
Setting Up Your First Project
Sign up for Descript. A free tier is available. Creator ($24/month) gives you 10 hours of transcription per month, which is enough for regular podcast production. Pro ($40/month) has unlimited transcription.
When you first open Descript, you'll see a project workspace that looks like a cross between a word processor and a video editor. Don't be intimidated -- you'll mostly be working in the transcript area on the right side of the screen.
Create a new project. Name it after your episode. I use "[Show Name] Ep[Number]" as a convention -- makes finding projects later much easier.
Step 1: Import Your Audio
Drag and drop your recorded audio file into the Descript project window. Accepted formats include MP3, WAV, M4A, AIFF, and most common audio types.
Descript immediately starts transcribing. For a 30-minute episode, transcription takes roughly 2-4 minutes. For a 60-minute episode, expect 5-8 minutes.
While it's transcribing: don't close the window. You can have multiple Descript projects open in separate tabs, but if you navigate away entirely, transcription may pause.
Multiple speakers: Descript supports speaker identification. When you import, you can tell it how many speakers are in the recording and it will attempt to label sections by speaker. This isn't perfect -- especially when speakers interrupt each other or talk over each other -- but for well-recorded two-host conversations, it works reasonably well.
Step 2: Review the Transcript
When transcription completes, you'll see your audio represented two ways: the timeline at the bottom (traditional waveform view) and the transcript in the main window.
First task: scan through the transcript and fix obvious errors.
Descript's transcription accuracy is good -- in the 95-98% range for clear, well-recorded audio with standard English and no unusual names or jargon. But it will regularly get proper nouns, unusual names, and technical terms wrong.
Click anywhere in the transcript to edit text directly, just like a word processor. At this stage you're correcting the transcript for accuracy: fixing a mistranscribed word changes only the text representation, not the audio it's linked to.
This step matters more if you're going to publish the transcript as show notes or accessibility text. If you're only using it for editing, you can skim rather than thoroughly proofread.
Step 3: Remove Filler Words
This is where Descript earns its subscription cost for most podcasters.
In the menu: Action → Remove Filler Words (or look for "Filler Words" in the Actions panel on the right).
Descript will show you every detected instance of "um," "uh," "like," "you know," and other filler words. You can preview them before applying. You can set a threshold -- if you use "like" conversationally but not as a filler word, you can tell it to only remove the clearly stumbled uses.
Apply filler word removal, and every detected instance disappears from both the transcript and the audio simultaneously.
A word of honest advice: don't remove every filler word. Some "ums" and "uhs" are natural thought pauses that make conversation feel real. If you strip every filler word from every episode, you'll end up with something that sounds weirdly robotic -- like a person who never hesitates, which isn't how people speak. I remove the obvious stumbles and leave the occasional natural pause.
Step 4: Tighten Pauses
Action → Shorten Silences. Set a maximum silence length -- I use 0.8-1.0 seconds as a default. Silences longer than your threshold will be compressed to that length.
This removes the "dead air" that makes podcast listening feel sluggish without removing natural pauses between sentences.
Preview the results before applying. If something sounds unnaturally rushed, increase the threshold slightly.
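To make the threshold idea concrete, here is a minimal Python sketch of what "compress silences longer than X" means. It is not Descript's implementation or API; the `Segment` format and the `shorten_silences` function are made up for illustration, and the durations are invented sample values.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, duration_in_seconds),
# where kind is either "speech" or "silence".
Segment = Tuple[str, float]


def shorten_silences(segments: List[Segment], max_silence: float = 0.9) -> List[Segment]:
    """Cap every silence at max_silence seconds; speech is left untouched."""
    result: List[Segment] = []
    for kind, duration in segments:
        if kind == "silence" and duration > max_silence:
            duration = max_silence  # compress, don't delete
        result.append((kind, duration))
    return result


# Invented sample episode: one long pause, one natural short pause.
episode = [("speech", 12.0), ("silence", 2.5), ("speech", 8.0), ("silence", 0.4)]
tightened = shorten_silences(episode, max_silence=1.0)

saved = sum(d for _, d in episode) - sum(d for _, d in tightened)
print(f"Time saved: {saved:.1f} seconds")
```

Note that the 0.4-second pause is untouched: only silences above the threshold shrink, which is why raising the threshold is the fix when the result sounds rushed.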
Step 5: Edit the Transcript Directly
Now for the main editing work.
Highlight any section of the transcript you don't want -- a false start, a tangent that went nowhere, a stumble, an extended silence where someone looked up a reference. Press delete. The audio disappears.
This is the fundamental Descript workflow and it's as intuitive as it sounds. You're reading your episode, cutting what doesn't work, the same way you'd edit a written piece.
Tips:
- For sections that stumble and restart, you'll see the stumbled version then the cleaner restart in the transcript. Highlight the stumble, delete, keep the restart.
- If a deletion creates a jarring cut between two sentences, Descript has a "heal" feature that tries to smooth the transition. Action → Heal Audio.
- Save frequently. Descript auto-saves, but habits are good.
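If it helps to picture why deleting text removes audio, here is a rough Python sketch of the idea: each transcribed word carries a start and end time, and deleting words leaves a list of time ranges to keep. This is only a mental model under my own assumptions, not how Descript is actually built; the `Word` class, `keep_ranges` function, and timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def keep_ranges(words: List[Word], deleted: Set[int]) -> List[Tuple[float, float]]:
    """Return the audio time ranges that survive after deleting words by index."""
    ranges: List[Tuple[float, float]] = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # Previous word was kept too: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first word after a cut: start a new range.
            ranges.append((word.start, word.end))
    return ranges


# Invented timestamps for "So um welcome back".
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("welcome", 0.8, 1.4),
    Word("back", 1.4, 1.8),
]

# Deleting the word at index 1 ("um") leaves two ranges to stitch together.
print(keep_ranges(words, deleted={1}))  # [(0.0, 0.3), (0.8, 1.8)]
```

The boundary between those two ranges is where a jarring cut can show up, which is the spot a smoothing feature has to work on.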
Step 6: Overdub -- AI Voice Corrections
Overdub is Descript's most impressive feature. It's also the one with the most caveats.
To use Overdub, you first need to train a voice model. This requires recording a script Descript provides -- approximately 10-30 minutes of reading. Descript uses this recording to create an AI model of your voice that can generate new speech.
Once you have an Overdub voice, you can click into any point in the transcript, type new words, and Descript will generate audio in your voice for those words.
Use cases:
- A mispronounced word you can hear bothering you on playback
- A sentence that would help clarify an explanation but wasn't in the original recording
- A stumble you couldn't cut cleanly because the words on either side were too connected
The honest limitation: Overdub voices sound good but have a slight AI quality that careful listeners will notice. The generated audio doesn't quite match the specific vocal texture of the surrounding recording -- it's "you-adjacent" rather than perfectly seamless. For large sections of narration, this is audible. For small corrections of a word or two, it usually works.
Use Overdub for surgical corrections, not wholesale replacement of recorded audio.
Step 7: Audio Enhancement
Descript includes basic noise removal under Actions → Enhance Voice or Noise Reduction depending on your version. This processes the audio to reduce background noise and room reverb.
For most recordings in a quiet room, this step isn't necessary. For recordings with any audible room noise, air conditioning, or hum, run the enhancement.
For more aggressive cleanup of problem audio, export from Descript and run through Adobe Podcast Enhance (free, web-based) before your final export. Descript for the structural editing, Adobe for the audio quality polish.
Step 8: Export Your Episode
File → Export and choose your settings.
For podcast distribution, export as MP3 at 128kbps mono (for speech-only podcasts) or 192kbps stereo (for shows with music). These are standard podcast compression settings.
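For a sense of what those bitrates mean for file size, here is a quick back-of-the-envelope calculation in Python. It assumes a constant bitrate and ignores metadata and cover art, so treat the results as estimates; `mp3_size_mb` is just a helper name I made up.

```python
def mp3_size_mb(bitrate_kbps: int, minutes: float) -> float:
    """Estimate MP3 size in megabytes for a constant-bitrate export."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits/s -> bytes/s
    total_bytes = bytes_per_second * minutes * 60
    return total_bytes / 1_000_000


# A 30-minute speech-only episode vs. a 60-minute show with music.
print(f"128 kbps, 30 min: ~{mp3_size_mb(128, 30):.1f} MB")
print(f"192 kbps, 60 min: ~{mp3_size_mb(192, 60):.1f} MB")
```

Roughly speaking, 128kbps works out to a little under 1 MB per minute of audio.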
Name the file with a clear convention: showname-ep007-descriptive-title.mp3.
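If you'd rather not type that by hand every week, a few lines of Python can build the name for you. This is just one way to do it; `episode_filename` is my own helper, and the zero-padding to three digits and the collapsed show name simply mirror the example above.

```python
import re


def episode_filename(show: str, number: int, title: str) -> str:
    """Build a name like showname-ep007-descriptive-title.mp3."""

    def slug(text: str) -> str:
        # Lowercase, then turn every run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    show_part = slug(show).replace("-", "")  # "Show Name" -> "showname"
    return f"{show_part}-ep{number:03d}-{slug(title)}.mp3"


print(episode_filename("Show Name", 7, "Descriptive Title"))
# showname-ep007-descriptive-title.mp3
```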
Step 9: Publish
Upload your exported MP3 to your podcast hosting platform -- Buzzsprout, Transistor, Castos, or whatever you're using. Add your title, description, and any chapter markers. Publish.
If you exported a Descript transcript (under the export options), paste it into your episode description or create a dedicated transcript page.
Descript Pricing: Is It Worth It?
Free tier: 1 hour of transcription per month. Overdub access for testing. No watermarks on audio export. Good enough to evaluate whether the workflow fits you.
Creator ($24/month): 10 hours of transcription per month, which accommodates roughly 8-10 standard podcast episodes depending on length. Overdub for voice corrections. Full export options. This is the tier most active podcasters need.
Pro ($40/month): Unlimited transcription, higher Overdub quality, more collaboration features, 4K video support (relevant if you're recording video podcasts). Worth it if you're producing daily content or working with a team.
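Here is the tier math as a small Python sketch, using only the prices and transcription limits quoted in this article (which may have changed by the time you read this). The `pick_tier` function is hypothetical and considers transcription hours only, not Overdub quality or team features.

```python
from typing import Tuple


def pick_tier(hours_per_month: float) -> Tuple[str, int]:
    """Return (tier name, monthly price in USD) based on transcription hours alone."""
    if hours_per_month <= 1:
        return ("Free", 0)       # 1 hour of transcription per month
    if hours_per_month <= 10:
        return ("Creator", 24)   # 10 hours of transcription per month
    return ("Pro", 40)           # unlimited transcription


# Example: four 45-minute episodes per month is 3 hours of audio.
monthly_hours = 4 * 45 / 60
tier, price = pick_tier(monthly_hours)
print(f"{monthly_hours:.1f} hours/month -> {tier} (${price}/month)")
```

Count the raw recording length, not the finished episode length, since the transcription happens before you cut anything.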
Honest Limitations
Overdub can sound slightly robotic. This is a real limitation, not a minor complaint. For word-level corrections, it works fine. For adding whole sentences you didn't record, the result sounds AI-generated to attentive ears.
The pricing gets expensive at volume. Creator covers 10 hours of transcription per month, so if you're producing more than that, you'll want the Pro tier. $40/month is reasonable; it's just worth understanding the math before assuming Creator covers everything.
Learning curve for new users. Descript is intuitive for people who are comfortable with text editing and understand what they're trying to do. For someone who has never edited audio and doesn't quite understand what the tool is doing, there's more onboarding friction. The Descript YouTube channel has genuinely good tutorial content if you need a visual walkthrough.
Collaboration has a learning curve. The team features work, but if you're handing projects off to a collaborator who's new to Descript, plan for some setup time.
My Bottom Line
Descript is the right editing tool for podcasters who want to spend their time on content rather than on learning audio engineering. The transcript-based workflow is genuinely faster for the editing work that most podcast episodes require.
The Overdub limitations are real but they don't break the workflow -- they just mean you set the right expectations for what AI voice corrections can and can't do.
Start on the free tier. Edit one episode. If the workflow clicks for you, Creator at $24/month is a fair price for what you get.