Podcast transcript generator

# Podcast Captions and Transcripts

MP3 and WAV input · Full transcript · SRT and VTT exports

Podcast captioning has two unrelated jobs. One is accessibility, making the episode usable for deaf and hard-of-hearing listeners. The other is search, where a transcript turns a 60-minute audio file into thousands of indexable words that Google can read. This page covers both at the same time.

The upload box accepts video and audio files up to 2048 MB.

## About Podcast Captions and Transcripts

Drop an MP3 or WAV file on the upload box and Whisper runs against the audio exactly as it would for a video. The output is a full transcript you can edit inline, plus SRT and VTT files you can attach to the episode page or embed on a podcast hosting platform. Despite the upload box's name, the file is never sent to a server. That matters for unreleased episodes under embargo, interviews with named guests whose audio is not yet cleared for public distribution, and any sensitive content that cannot leave the device.

The plain transcript on its own is worth more than the SRT for SEO purposes. Most podcast hosts let you paste a transcript onto the episode page, and doing that gives the episode a fighting chance in search for the topics you actually covered, rather than just the title field and an episode number.

## When this fits

#### Episode page SEO

Pasting a full transcript onto the episode page turns a single-line title and description into thousands of indexable words. That gives episodes with transcripts far more text to match against long-tail topic queries than episodes without them.

#### Audiograms for social

When you cut an audiogram clip for Instagram or LinkedIn, the audio needs captions to be watchable in a sound-off feed. The same transcription run produces the SRT you burn into the audiogram.
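If the transcription run covered the whole episode, the exported SRT still carries episode-relative timecodes, so it has to be trimmed to the clip window before it is burned into the audiogram. The following is a minimal sketch of that step, assuming the export uses standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines. The function name `clip_srt` and its `start_ms` and `end_ms` parameters are hypothetical and not part of this tool.

```python
import re

# One SRT timing line: start and end stamps separated by "-->".
TIMING = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})\s+-->\s+(\d{2,}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four captured stamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def fmt(total_ms: int) -> str:
    """Format milliseconds as an SRT stamp, HH:MM:SS,mmm."""
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def clip_srt(srt: str, start_ms: int, end_ms: int) -> str:
    """Keep cues overlapping [start_ms, end_ms) and rebase them to start at zero."""
    kept = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing line is the first or second line of a cue,
        # depending on whether a numeric cue index precedes it.
        found = None
        for i, line in enumerate(lines[:2]):
            match = TIMING.search(line)
            if match:
                found = (i, match)
                break
        if found is None:
            continue
        i, match = found
        groups = match.groups()
        cue_start, cue_end = to_ms(*groups[:4]), to_ms(*groups[4:])
        if cue_end <= start_ms or cue_start >= end_ms:
            continue  # cue lies entirely outside the clip
        # Clamp to the clip window, then shift so the clip starts at 00:00:00,000.
        cue_start = max(cue_start, start_ms) - start_ms
        cue_end = min(cue_end, end_ms) - start_ms
        kept.append((cue_start, cue_end, lines[i + 1:]))
    blocks = [
        "\n".join([str(n), f"{fmt(a)} --> {fmt(b)}", *text])
        for n, (a, b, text) in enumerate(kept, start=1)
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Cues that straddle the clip boundary are clamped rather than dropped, so a sentence already in progress when the clip starts still shows on screen.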

#### Accessibility compliance

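Captioning rules are written mostly with video in mind, so audio-only content falls outside many of them. Transcripts are a different matter: WCAG success criterion 1.2.1 (Level A) asks for a text alternative for prerecorded audio-only content, and published transcripts are increasingly expected for university and government podcasts. This page produces them with no extra workflow.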

#### Show notes and chapter markers

A clean transcript is the source material for show notes and chapter markers. Edit the transcript here, copy the result into your hosting platform, done.

## How to transcribe a podcast episode

#### 1. Drop the audio file

Drop your MP3 or WAV onto the upload box. The file stays on your device for the whole flow.

#### 2. Wait for the transcript

Whisper transcribes the audio locally. A 60-minute episode typically finishes in 5 to 10 minutes on a recent laptop.

#### 3. Edit and clean up

Scan the transcript for misheard words, add speaker labels at change-of-speaker points, and split paragraphs where it makes sense.

#### 4. Export

Download the SRT for video audiograms, the VTT for embedded players, or copy the text directly into your show-notes editor.

## Frequently asked questions

### Does this work with audio-only files like MP3?

Yes. The upload box accepts both video and audio. Whisper does its speech recognition pass against the audio track of whatever you drop in, so a standalone MP3 or WAV file works exactly the same as a video file with audio.

### How long can a podcast episode be?

There is no hard length cap, but transcription depends on your device having enough memory to hold the audio and run Whisper. A recent laptop can comfortably transcribe a 90-minute interview. Past two hours, splitting the episode in half and running the tool twice is the safer route.
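Running the tool twice produces two SRT files that each start at zero, so the second one needs its timecodes shifted by the length of the first half before the two are joined. Here is a rough sketch of that merge, assuming you know the exact cut point in milliseconds. `merge_halves` and `offset_ms` are hypothetical names used for illustration only.

```python
import re

# A single SRT stamp, HH:MM:SS,mmm.
STAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_stamp(match: re.Match, offset_ms: int) -> str:
    """Return the matched stamp moved forward by offset_ms."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms, 0)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def cues(srt: str) -> list:
    """Split an SRT document into (timing_line, text_lines) pairs."""
    out = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if lines[0].strip().isdigit():
            lines = lines[1:]  # drop the cue index; cues are renumbered later
        if lines and "-->" in lines[0]:
            out.append((lines[0].strip(), lines[1:]))
    return out


def merge_halves(first: str, second: str, offset_ms: int) -> str:
    """Join two SRT files, shifting the second by offset_ms and renumbering."""
    merged = cues(first)
    for timing, text in cues(second):
        # Only the timing line is rewritten; cue text is left untouched.
        shifted = STAMP.sub(lambda m: shift_stamp(m, offset_ms), timing)
        merged.append((shifted, text))
    blocks = [
        "\n".join([str(n), timing, *text])
        for n, (timing, text) in enumerate(merged, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

For an episode cut at exactly one hour, `offset_ms` would be `3_600_000`. Cutting at a pause between sentences keeps a cue from being split across the two files.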

### Can I export just the text without the SRT timing?

The SRT and VTT formats both include the transcript text alongside the timecodes. If you want plain text without timing, copy it directly out of the inline transcript panel after the run completes. The text is one block per spoken segment, ready to paste into a show-notes editor.
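If you only have the exported SRT on hand, the timing can also be stripped after the fact. The sketch below is one way to do it in Python, assuming a standard SRT layout of cue index, timing line, then text. The helper name `srt_to_text` is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(
    r"^\d{2,}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2,}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return the spoken text of an SRT document, one line per cue."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the numeric cue index if present, then the timing line.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            # A cue wrapped over several lines becomes one line of text.
            cues.append(" ".join(lines))
    return "\n".join(cues)
```

The result keeps one line per spoken segment, matching the block-per-segment shape of the inline transcript panel.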

### Do you support multi-speaker labelling?

Whisper produces a flat transcript without speaker labels. You can add speaker labels manually in the inline transcript editor (the convention is "Host: ..." or "Guest: ..." at the start of each speaker change), but automated speaker diarisation is not part of this tool today.
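Once labels are typed in using that convention, the copied transcript can be split into speaker turns for show notes or a quick check of who spoke when. This is an illustrative sketch only. The function `split_turns` and the default label set are assumptions, and it does not perform any diarisation itself.

```python
import re


def split_turns(transcript: str, speakers=("Host", "Guest")) -> list:
    """Group a manually labelled transcript into (speaker, text) turns.

    A line starting with a known label such as "Host:" opens a new turn.
    Unlabelled lines are treated as a continuation of the current turn.
    """
    label = re.compile(
        r"^(%s):\s*(.*)$" % "|".join(re.escape(s) for s in speakers)
    )
    turns = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = label.match(line)
        if match:
            turns.append((match.group(1), [match.group(2)]))
        elif turns:
            turns[-1][1].append(line)
        else:
            # Text before the first label has no known speaker.
            turns.append(("Unknown", [line]))
    return [(speaker, " ".join(p for p in parts if p)) for speaker, parts in turns]
```

Matching against an explicit list of labels avoids mistaking an ordinary sentence that contains a colon for a change of speaker.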

### How accurate is the transcription on a noisy field recording?

Whisper is unusually robust to background noise compared to older speech-to-text systems, but a windy outdoor field recording will still drop accuracy by 10 to 15 percentage points compared to a clean studio interview. The inline editor is built for catching the words it gets wrong.

Privacy by architecture

## All processing happens locally in your browser, and your files never leave your device.

No upload step, no server queue, no waiting.

Verify in 30 seconds

1. Press `⌘⌥I` to open DevTools and switch to the Network panel.
2. Filter to fetch and XHR requests.
3. Drop your file in and start the tool.
4. You will see the app bundle, the WASM binary and Whisper model on first visit, and nothing involving your file.

How this works

Video and audio processing runs through FFmpeg compiled to WebAssembly. Hardware decoding goes through the browser's WebCodecs API. Speech recognition runs against a Whisper model that downloads once and caches in your browser, never streamed from a third-party server.

## Related Tools and Resources

#### [Free SRT Generator](/free-srt-generator)

Standalone SRT-only export flow for video audiograms.

#### [Free VTT Generator](/vtt-generator)

WebVTT for HTML5 podcast players.

#### [Extract audio from video](/audio/extraction)

Pull an MP3 out of a video interview before transcribing.

#### [Video editor for podcasters](/video-editor-for-podcasters)

Edit a podcast video version alongside the audio episode.

---
Source: [https://vidstudio.app/captions-for-podcasts](https://vidstudio.app/captions-for-podcasts)
