Supports video and audio files up to 2048MB

About Podcast Captions and Transcripts

Drop an MP3 or WAV file on the upload box and Whisper runs against the audio just as it does for a video. The output is a full transcript you can edit inline, plus SRT and VTT files you can attach to the episode page or embed on a podcast hosting platform. Despite the name of the box, nothing is sent to a server: the file is processed in your browser. That matters for unreleased episodes under embargo, interviews with named guests whose audio is not yet cleared for public distribution, or any sensitive content that cannot leave the device.

The plain transcript on its own is worth more than the SRT for SEO purposes. Most podcast hosts let you paste a transcript onto the episode page, and doing that gives the episode a fighting chance in search for the topics you actually covered, rather than just the title field and an episode number.

When this fits

Episode page SEO

Pasting a full transcript onto the episode page turns a single-line title and description into thousands of indexable words. Episodes with transcripts tend to do better on long-tail topic queries than episodes without them, because the page actually contains the terms people search for.

Audiograms for social

When you cut an audiogram clip for Instagram or LinkedIn, the audio needs captions to be watchable in a sound-off feed. The same transcription run produces the SRT you burn into the audiogram.

Accessibility compliance

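Captioning rules written for video often do not apply to audio-only content, but a transcript is still the standard text alternative for audio (WCAG Success Criterion 1.2.1 calls for one for prerecorded audio-only content), and published transcripts are increasingly expected for university and government podcasts. This page produces them with no extra workflow.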

Show notes and chapter markers

A clean transcript is the source material for show notes and chapter markers. Edit the transcript here, copy the result into your hosting platform, done.
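
For chapter markers, many hosting platforms accept plain timestamp lines in the show notes. Here is a minimal Python sketch, assuming you noted the start time of each chapter in seconds while editing. The function names and the HH:MM:SS line format are illustrative assumptions, not something this page exports, so check what your platform expects.

```python
def format_timestamp(seconds: float) -> str:
    """Format a start time in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_lines(chapters):
    """Turn (start_seconds, title) pairs into one show-notes line per chapter."""
    # Sort by start time so the lines come out in playback order.
    return [f"{format_timestamp(start)} {title}" for start, title in sorted(chapters)]


# Hypothetical chapter list for a one-hour episode.
for line in chapter_lines([(0, "Intro"), (754.6, "Guest interview"), (3725, "Wrap-up")]):
    print(line)
```

Paste the printed lines into the show-notes editor under the transcript or description.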

How to transcribe a podcast episode

1. Drop the audio file

Select or drop your MP3 or WAV. The file stays on your device for the whole flow.

2. Wait for the transcript

Whisper transcribes the audio locally. A 60-minute episode typically finishes in 5 to 10 minutes on a recent laptop.

3. Edit and clean up

Scan the transcript for misheard words, add speaker labels where the speaker changes, and split paragraphs where it makes sense.

4. Export

Download the SRT for video audiograms, the VTT for embedded players, or copy the text directly into your show-notes editor.
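
The two caption formats in the export step are close cousins. A WebVTT file starts with a WEBVTT header line and uses a period before the milliseconds, where SRT uses a comma. If you only kept the SRT, this rough Python sketch shows the conversion. It assumes a well-formed SRT with standard timing lines, and srt_to_vtt is a hypothetical helper name, not part of this tool.

```python
import re

# An SRT timing line looks like "00:00:01,000 --> 00:00:04,200".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3})(\s+-->\s+)(\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT text (header plus period-separated milliseconds)."""
    body = srt.replace("\r\n", "\n").strip()
    # Only timing lines are rewritten; cue text is left untouched.
    body = SRT_TIMING.sub(r"\1.\2\3\4.\5", body)
    # The numeric SRT cue indices are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"
```

Going the other way means dropping the header and swapping the periods back to commas on the timing lines.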

Frequently asked questions

Does this work with audio-only files like MP3?

Yes. The upload box accepts both video and audio. Whisper does its speech recognition pass against the audio track of whatever you drop in, so a standalone MP3 or WAV file works exactly the same as a video file with audio.

How long can a podcast episode be?

There is no hard length cap, but transcription depends on your device having enough memory to hold the audio and run Whisper. On most laptops you can comfortably transcribe a 90-minute interview. Past two hours, splitting the episode in half and running the tool twice is the safer route.
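
If you do split a long episode, the SRT for the second half starts again at 00:00:00, so its timestamps need shifting by the length of the first half before you join the two files. Here is a minimal Python sketch, assuming standard SRT timing lines. shift_srt is a hypothetical helper, and it does not renumber the cue indices.

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt: str, offset_seconds: float) -> str:
    """Shift every timestamp on SRT timing lines by offset_seconds."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # clamp so a negative offset never goes below zero
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt.splitlines():
        # Only timing lines contain the "-->" arrow; cue text is left alone.
        if "-->" in line:
            line = STAMP.sub(shift, line)
        shifted.append(line)
    return "\n".join(shifted)
```

For example, if the first half runs exactly one hour, call shift_srt(second_half_srt, 3600) and append the result to the first file.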

Can I export just the text without the SRT timing?

The SRT and VTT formats both include the transcript text alongside the timecodes. If you want plain text without timing, copy it directly out of the inline transcript panel after the run completes. The text is one block per spoken segment, ready to paste into a show-notes editor.
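
If you only have the SRT file to hand, the timing can also be stripped afterwards. Here is a small Python sketch, assuming a well-formed SRT in which each cue is a number line, a timing line, then the text. srt_to_text is a hypothetical helper name, not something the page provides.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> str:
    """Return the spoken text of an SRT document, one line per cue."""
    paragraphs = []
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the leading cue number, then the timing line, and keep the rest.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n".join(paragraphs)
```

Each cue becomes one line of text, which mirrors the one-block-per-segment layout of the inline panel.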

Do you support multi-speaker labelling?

Whisper produces a flat transcript without speaker labels. You can add speaker labels manually in the inline transcript editor (the convention is "Host: ..." or "Guest: ..." at the start of each speaker change), but automated speaker diarisation is not part of this tool today.
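
Sticking to the "Host: ..." convention pays off later, because the labelled text is easy to process. This Python sketch splits a labelled transcript into speaker turns. It assumes each label is a short capitalised name followed by a colon at the start of a line, so any other line that happens to start that way (for example "Note: ...") would be mistaken for a speaker. split_turns is a hypothetical helper.

```python
import re

# A label is a short capitalised name, a colon, then the spoken text.
LABEL = re.compile(r"^([A-Z][\w .'-]{0,30}):\s+(.*)$")


def split_turns(transcript: str):
    """Group transcript lines into (speaker, text) turns."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LABEL.match(line)
        if match:
            # A new label starts a new turn.
            turns.append([match.group(1), match.group(2)])
        elif turns:
            # Unlabelled lines continue the current speaker's turn.
            turns[-1][1] += " " + line
        else:
            # Text before the first label has no known speaker.
            turns.append(["Unknown", line])
    return [(speaker, text) for speaker, text in turns]
```

The resulting pairs can feed show notes, pull quotes, or a per-speaker word count.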

How accurate is the transcription on a noisy field recording?

Whisper is more robust to background noise than older speech-to-text systems, but a windy outdoor field recording can still cost roughly 10 to 15 percentage points of accuracy compared to a clean studio interview. The inline editor is built for catching the words it gets wrong.
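
To put a number on your own recordings, the usual measure is word error rate (WER): the substitutions, deletions, and insertions needed to turn the machine transcript into a hand-corrected one, divided by the number of words in the corrected version. The Python sketch below assumes the "points" quoted above refer to word-level accuracy, which this page does not spell out. word_error_rate is a hypothetical helper that ignores case and does not strip punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

Correct a few minutes of transcript by hand, run both versions through the function, and compare the result for a studio episode against a field recording.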

Your file never leaves your device

All processing happens locally in your browser, and your files never leave your device. The page reads your audio or video file through a standard browser file input, holds the bytes in memory, runs Whisper for speech recognition in a Web Worker, and saves the results (the SRT and VTT files, or a captioned MP4 for video) back to your disk. No upload, no cloud transcription queue, no external copy.

Related Tools and Resources

Free SRT Generator

Standalone SRT-only export flow for video audiograms.

Free VTT Generator

WebVTT for HTML5 podcast players.

Extract audio from video

Pull an MP3 out of a video interview before transcribing.

Video editor for podcasters

Edit a podcast video version alongside the audio episode.