Supports video and audio files up to 2048MB

About VTT Generator

This page exports VTT in addition to SRT. Drop a video, wait for Whisper, click Download VTT. The output is a .vtt file with the standard WEBVTT header line, millisecond cue timing, and one caption block per spoken segment. The file is ready to attach to an HTML5 video element, which every modern browser supports.
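To make the file layout concrete, here is a minimal sketch of how timed segments could be serialised as WebVTT. It is not this page's actual export code: the Segment shape and the function names formatTimestamp and toWebVtt are assumptions made for illustration.

```typescript
// Hypothetical segment shape: start and end in seconds, plus the recognised text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm (a dot before the milliseconds).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the file: the WEBVTT header line, then one cue block per segment,
// with a blank line between blocks.
function toWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.text.trim()}`
  );
  return ["WEBVTT"].concat(cues).join("\n\n") + "\n";
}
```

Each cue is a timing line followed by its text, and blank lines separate the cues. The sketch assumes the segment text contains no blank lines and no "-->" sequence, either of which would need handling before export.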

If your video is going to YouTube, TikTok, or any social platform, you probably want SRT instead. VTT specifically wins when the destination is a website you control, where the styling features are useful and the integration is direct via the track element.

When this fits

Self-hosted video on a website

When you serve a video file directly from your own server, the HTML5 video tag with a track child is the simplest way to add captions. VTT plugs into that pattern with no conversion step.

Documentation site demos

A demo video embedded on a docs page wants captions for accessibility. VTT lands directly in the HTML5 track element without going through a player library.

Mux, Cloudflare Stream, and JW Player

Pro video infrastructure platforms accept both SRT and VTT, with a slight preference for VTT because of the styling support. Either works.

Reactive web apps with custom players

Custom React or Vue video players built on top of HTML5 video tend to standardise on VTT because the spec is web-native and the parsing libraries are smaller.

How to generate a WebVTT file

1 Drop the video or audio

The captioning runs locally in this browser tab.

2 Wait for the transcript

Whisper produces a millisecond-timed transcript ready for either SRT or VTT export.

3 Optional editing

Fix any misheard words in the inline transcript editor before exporting.

4 Download VTT

Click Download VTT. The file lands in your downloads folder with the standard WEBVTT header and cue timing.

Frequently asked questions

What is the difference between SRT and VTT?

SRT is older, simpler, and universal across video editors and social platforms. VTT (WebVTT) is the web-native format and supports styling cues, positioning, and metadata that SRT cannot represent. For YouTube and social, use SRT. For HTML5 video on your own website, VTT is the cleaner pick.
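For plain captions without styling, the two formats differ mainly in the header line and the decimal separator in timestamps (SRT uses a comma, VTT a dot). The sketch below illustrates that difference by converting a simple SRT string to VTT. The function name srtToVtt is hypothetical, and the code assumes well-formed SRT with two-digit hours.

```typescript
// Convert simple SRT text to WebVTT. A sketch, assuming well-formed input.
function srtToVtt(srt: string): string {
  const body = srt
    .replace(/^\uFEFF/, "")   // drop a byte-order mark if present
    .replace(/\r\n?/g, "\n")  // normalise line endings to LF
    .trim()
    // On timing lines only, swap the SRT comma for the WebVTT dot.
    .replace(
      /^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})/gm,
      "$1.$2 --> $3.$4"
    );
  // WebVTT files start with the WEBVTT header followed by a blank line.
  return "WEBVTT\n\n" + body + "\n";
}
```

The numeric SRT counters are left in place because WebVTT allows an optional identifier line above each timing line. Commas inside caption text are untouched, since the replacement is anchored to timing lines.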

How do I add a VTT file to an HTML5 video element?

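Add a track element inside your video tag, with src pointing to the .vtt file, kind set to "captions" or "subtitles", srclang set to the language code, and label set to the name shown in the captions menu. If the video tag has the controls attribute, the browser's built-in player shows a captions button automatically. Add the default attribute to the track if captions should be on from the start.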
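As an illustration of the markup involved, the following sketch builds a video tag with track children as an HTML string. The CaptionTrack shape, the helper names, and the file paths in the usage note are all assumptions. On a static page you would normally write the resulting markup by hand.

```typescript
// Hypothetical description of one caption track.
interface CaptionTrack {
  src: string;       // URL of the .vtt file
  srclang: string;   // language code, for example "en"
  label: string;     // name shown in the captions menu
  kind?: "captions" | "subtitles";
  isDefault?: boolean;
}

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Build a <video> tag with one <track> child per caption file.
function videoMarkup(videoSrc: string, tracks: CaptionTrack[]): string {
  const trackTags = tracks.map((t) => {
    const attrs = [
      `kind="${t.kind || "captions"}"`,
      `src="${escapeAttr(t.src)}"`,
      `srclang="${escapeAttr(t.srclang)}"`,
      `label="${escapeAttr(t.label)}"`,
    ];
    if (t.isDefault) attrs.push("default"); // captions on from the start
    return `  <track ${attrs.join(" ")}>`;
  });
  return [`<video controls src="${escapeAttr(videoSrc)}">`]
    .concat(trackTags, ["</video>"])
    .join("\n");
}
```

Calling videoMarkup("/media/demo.mp4", [{ src: "/media/demo.en.vtt", srclang: "en", label: "English", isDefault: true }]) yields a video tag containing a single track line with kind, src, srclang, label, and default set.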

Does the VTT file support multiple languages?

One VTT file represents one language. To support multiple languages, generate one VTT per language, then add one track element per language to the same video tag, each with its own srclang and label attributes. The browser exposes a language picker in the captions menu.

Can I add styling cues to the VTT output?

Not from this page directly. The export produces standard caption blocks without styling cues. If you need cue settings (positioning, alignment, vertical text), you can add them manually to the downloaded .vtt file in any text editor.
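If the same cue settings should apply to every caption, the manual edit can be scripted. The sketch below appends a settings string to each timing line of a downloaded file. The function name addCueSettings is hypothetical, and the code leaves any timing line that already carries settings untouched.

```typescript
// Append WebVTT cue settings (for example "line:10% align:start") to every
// timing line that does not already have settings.
function addCueSettings(vtt: string, settings: string): string {
  // A bare timing line: optional hours, then MM:SS.mmm, on both sides of "-->".
  const timingLine = /^(\d{2,}:)?\d{2}:\d{2}\.\d{3} --> (\d{2,}:)?\d{2}:\d{2}\.\d{3}$/;
  return vtt
    .split(/\r?\n/)
    .map((line) => (timingLine.test(line) ? `${line} ${settings}` : line))
    .join("\n");
}
```

For example, addCueSettings(fileText, "line:10% align:start") moves every caption near the top of the frame and aligns its text to the start edge. Settings go after the end timestamp on the timing line, separated by spaces.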

Why does my video player not show the captions even after I added the track?

The most common cause is a CORS issue. Browsers refuse to load VTT files cross-origin by default, so the simplest setup serves the page and the VTT file from the same origin. If the VTT file lives on a different origin, the video tag needs the crossorigin attribute and the server holding the VTT file needs CORS headers allowing access from the page's origin.
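On the server side, the fix usually comes down to a few response headers. The sketch below computes headers for a .vtt response given the request's Origin header and an allow-list. The function name vttResponseHeaders and the allow-list approach are assumptions; adapt them to whatever server or CDN actually serves the file.

```typescript
// Compute response headers for a .vtt file. A sketch, assuming the caller
// passes the request's Origin header and a list of origins allowed to load captions.
function vttResponseHeaders(
  requestOrigin: string | undefined,
  allowedOrigins: string[]
): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "text/vtt; charset=utf-8", // registered MIME type for WebVTT
    "Vary": "Origin",                          // the response differs per requesting origin
  };
  // Echo the origin back only when it is on the allow-list.
  if (requestOrigin !== undefined && allowedOrigins.indexOf(requestOrigin) !== -1) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  return headers;
}
```

On the page side, a cross-origin track also requires crossorigin="anonymous" on the video tag. Without it, the browser will not request the track in CORS mode and the captions fail to load.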

Your video never leaves your device

All processing happens locally in your browser, and your files never leave your device. The page reads your video through a standard browser file input, holds the bytes in memory, runs Whisper for speech recognition in a Web Worker, and saves the caption file to your disk. No upload, no cloud transcription queue, no external copy.

Related Tools and Resources

Free SRT Generator

Same flow, exports SRT for YouTube, NLEs, and social platforms.

Auto Caption Generator

Full flow including burn-in to MP4.

Closed Captions Generator

Accessibility-framed version with SDH conventions.

Subtitles and Text

Manual subtitle overlay flow for cases where transcription is not needed.