Supports video and audio files up to 2048MB

About SRT Generator

This page turns a video or audio file into an SRT file in three steps. Drop the file on the page, wait for Whisper to transcribe the audio, click Download SRT. The output is a clean .srt file with millisecond-precision timecodes and one caption per spoken segment, ready to upload alongside a YouTube video, drop into Premiere or Final Cut, or attach to a Vimeo player. The encoding is UTF-8, which covers every language Whisper transcribes.

If you need styling, alignment cues, or per-caption positioning, look at VTT (WebVTT) instead. SRT does not support any of that, by design. Most workflows do not need it, which is why SRT remains the default.

When this fits

YouTube longform uploads

YouTube Studio accepts SRT uploads as the standard caption format. Upload the SRT alongside the video and the CC button works on every viewer surface.

Editing in Premiere or Final Cut

Both NLEs import SRT directly as a subtitle track. Drag the file onto the timeline and the captions land as an editable text layer.

Vimeo and embedded players

Vimeo, Wistia, Mux, and most pro video hosts use SRT as the canonical input for closed captions. The export from this page works with all of them.

Localisation handoff

A translator localises an SRT one block at a time. Generating the source-language SRT here gives the translator a clean starting point with timecodes already correct.
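A minimal sketch of that block-by-block handoff, assuming the translator returns one translated string per caption block. The helper names `split_blocks` and `replace_text` are hypothetical and are not part of this page's code. The point it illustrates is that only the text lines change, while sequence numbers and timecodes pass through untouched.

```python
import re

# Hypothetical helpers for illustration; not this page's implementation.
def split_blocks(srt_text: str) -> list[list[str]]:
    """Split SRT text into blocks, each block being a list of lines."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    return [block.split("\n") for block in re.split(r"\n{2,}", normalized) if block]

def replace_text(srt_text: str, translations: list[str]) -> str:
    """Swap caption text block by block, keeping numbers and timecodes."""
    blocks = split_blocks(srt_text)
    if len(blocks) != len(translations):
        raise ValueError("need exactly one translation per caption block")
    out = []
    for lines, translated in zip(blocks, translations):
        header = lines[:2]  # sequence number and timecode line stay as they are
        out.append("\n".join(header + translated.split("\n")))
    return "\n\n".join(out) + "\n"

source = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n\n"
    "2\n00:00:03,000 --> 00:00:04,200\nHow are you?\n"
)
translated = replace_text(source, ["Bonjour.", "Comment allez-vous ?"])
print(translated)
```

Translated text often runs longer than the source, so the timecodes may still need a review pass for reading speed.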

How to generate an SRT file from a video

1 Drop the video or audio

Any video or audio format works. The file is processed in your browser, never uploaded.

2 Let Whisper transcribe

The model runs locally and produces a transcript with millisecond timecodes for each segment.

3 Edit if needed

Fix any wrong words in the inline transcript panel. Split or merge segments where the timing feels off.

4 Download the SRT

Click Download SRT. The file lands in your downloads folder as a standard SubRip .srt file.
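The split and merge edits in step 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the page's actual editor logic: the `Segment` type and both functions are hypothetical, and dividing the time at a split in proportion to character count is just one plausible heuristic.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical caption segment with times in milliseconds."""
    start_ms: int
    end_ms: int
    text: str

def merge(a: Segment, b: Segment) -> Segment:
    """Join two adjacent segments into one caption."""
    return Segment(a.start_ms, b.end_ms, f"{a.text} {b.text}")

def split(seg: Segment, word_index: int) -> tuple[Segment, Segment]:
    """Split before the word at word_index, dividing time by character share."""
    words = seg.text.split()
    if not 0 < word_index < len(words):
        raise ValueError("word_index must fall strictly inside the segment")
    left = " ".join(words[:word_index])
    right = " ".join(words[word_index:])
    # Assumption: speaking time is roughly proportional to text length.
    share = len(left) / (len(left) + len(right))
    cut_ms = seg.start_ms + round((seg.end_ms - seg.start_ms) * share)
    return Segment(seg.start_ms, cut_ms, left), Segment(cut_ms, seg.end_ms, right)

first, second = split(Segment(0, 4000, "one two three four"), 2)
print(first, second)
print(merge(first, second))
```

Because the cut point is only an estimate, a split caption may still need its boundary nudged by hand.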

Frequently asked questions

What exactly is in an SRT file?

Plain text. Each caption block starts with a sequence number, then the timecode range written as HH:MM:SS,mmm --> HH:MM:SS,mmm (a comma separates the seconds from the three-digit milliseconds), then one or more lines of caption text, then a blank line before the next block. The file extension is .srt and the encoding should be UTF-8 for non-English text.
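As a rough illustration of that layout, the sketch below serializes a list of segments into SRT text. The function names and the `(start_seconds, end_seconds, text)` tuple shape are assumptions made for the example, not the page's actual export code.

```python
def format_timecode(seconds: float) -> str:
    """Render seconds as HH:MM:SS,mmm (comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Serialize (start_seconds, end_seconds, text) tuples as SRT blocks."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        timecodes = f"{format_timecode(start)} --> {format_timecode(end)}"
        blocks.append(f"{number}\n{timecodes}\n{text.strip()}\n")
    # Joining with a newline leaves exactly one blank line between blocks.
    return "\n".join(blocks)

srt = to_srt([(0.0, 2.5, "Hello there."), (3.0, 4.2, "How are you?")])
print(srt)
```

When saving the result, writing it with an explicit UTF-8 encoding (for example `open("captions.srt", "w", encoding="utf-8")`) keeps non-English text intact.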

How accurate are the timecodes?

Whisper reports timecodes to the millisecond, and the SRT export preserves that precision. Precision is not the same as accuracy, though: how closely a timecode matches the actual spoken word depends on Whisper's segmentation, which is usually correct within 200 milliseconds.

Can I edit the SRT before downloading?

Yes. The inline transcript editor on this page lets you edit each caption's text and adjust segment boundaries before export. The downloaded SRT reflects whatever you have in the editor at the moment you click Download.

Does the SRT include line breaks?

Whisper produces one caption per spoken segment, which is usually a single line. For longer segments the inline editor lets you insert line breaks manually, which then appear in the SRT as separate lines within the same caption block.
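A minimal sketch of how a long caption could be broken into lines automatically instead of by hand. The 42-character line width and the two-line cap are common subtitling guidelines assumed here for illustration; they are not settings taken from this page, and `wrap_caption` is a hypothetical helper.

```python
import textwrap

def wrap_caption(text: str, max_chars: int = 42, max_lines: int = 2) -> str:
    """Break caption text into lines of at most max_chars characters."""
    lines = textwrap.wrap(text, width=max_chars)
    if len(lines) > max_lines:
        # Too much text for one caption: splitting the segment is the better fix.
        raise ValueError("caption too long; split the segment instead")
    return "\n".join(lines)

wrapped = wrap_caption("The quick brown fox jumps over the lazy dog near the river bank")
print(wrapped)
```

Each resulting line becomes a separate text line inside the same caption block, so the sequence number and timecode are unaffected.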

How is SRT different from VTT?

SRT is older and more universal. VTT supports styling cues and positioning that SRT does not. For most platforms (YouTube, social, NLEs), SRT is the right pick. For embedded HTML5 video on a website you control, VTT is the better pick.
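For plain captions the two formats are close enough that converting is mostly mechanical: WebVTT needs a `WEBVTT` header line and uses a period rather than a comma before the milliseconds. The sketch below shows that conversion under the assumption of well-formed input with two-digit hours; `srt_to_vtt` is a hypothetical helper, not this page's exporter. SRT sequence numbers are left in place, where WebVTT treats them as optional cue identifiers.

```python
import re

# Matches a full SRT timecode line such as "00:00:01,000 --> 00:00:02,500".
TIMECODE_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, use '.' before milliseconds."""
    out = ["WEBVTT", ""]
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        # Only timecode lines are rewritten, so commas in caption text survive.
        out.append(TIMECODE_LINE.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"

vtt = srt_to_vtt("1\n00:00:01,000 --> 00:00:02,500\nHello, world.\n")
print(vtt)
```

Going the other way loses information, since any VTT styling or positioning has no SRT equivalent.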

Your video never leaves your device

All processing happens locally in your browser, and your files never leave your device. The page reads your video through a standard browser file input, holds the bytes in memory, runs Whisper for speech recognition in a Web Worker, and writes the .srt file back to your disk. No upload, no cloud transcription queue, no external copy.

Related Tools and Resources

Free VTT Generator

WebVTT for HTML5 video and embedded players.

Auto Caption Generator

Full flow including burn-in to MP4, not just SRT export.

Podcast Captions and Transcripts

Same SRT export, focused on audio-only input.

Closed Captions Generator

WCAG-oriented version for accessibility compliance.