Drop your video or audio file here

or click to browse

Supports video and audio files up to 2048 MB

About Closed Captions Generator

This page generates the speech part of closed captions automatically, then provides an inline editor where you can add the rest by hand. The convention is to put non-speech audio in square brackets, like [door slams] or [laughs], and to put speaker labels at the start of a caption when there are multiple speakers and the speaker changes. Both conventions are followed in the editor: square brackets render as captions, and a leading word followed by a colon registers as a speaker label.
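
The two conventions can be illustrated with a small sketch. This is not the editor's actual matching logic, just hypothetical patterns showing how a bracketed cue and a leading speaker label can be recognized in a caption line:

```python
import re

# Illustrative patterns only; the editor's real rules may differ.
CUE_RE = re.compile(r"\[[^\]]+\]")            # bracketed non-speech cue, e.g. [door slams]
SPEAKER_RE = re.compile(r"^(\w[\w .'-]*):\s")  # leading "Name: " speaker label

def classify(line: str) -> dict:
    """Report which caption conventions a line uses."""
    return {
        "cues": CUE_RE.findall(line),
        "speaker": (m.group(1) if (m := SPEAKER_RE.match(line)) else None),
    }

print(classify("Host: Welcome back [audience laughs]"))
# {'cues': ['[audience laughs]'], 'speaker': 'Host'}
```

Note that a loose rule like this also matches lines such as "Note: remember this", which is why speaker labels are a convention to apply deliberately rather than something that can be inferred mechanically.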

If your captions need formal WCAG compliance, two extra steps usually matter beyond auto-generation. Review every line for accuracy by listening to the audio while reading the caption, and verify that each caption appears in sync with the actual speech (a common rule of thumb is within about 200 milliseconds, though WCAG itself does not specify a number). Auto-captions are a starting point; compliance comes from the review pass that follows.
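
As a sketch of what the timing check means, assuming SRT-style timecodes and treating the 200 ms figure as a working tolerance rather than a formal requirement:

```python
def srt_ts_to_ms(ts: str) -> int:
    """Convert an SRT timestamp like '00:01:02,500' to milliseconds."""
    hms, ms = ts.split(",")
    h, m, s = map(int, hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

def in_sync(caption_start: str, speech_onset: str, tolerance_ms: int = 200) -> bool:
    """True if the caption starts within tolerance_ms of the speech onset."""
    return abs(srt_ts_to_ms(caption_start) - srt_ts_to_ms(speech_onset)) <= tolerance_ms

print(in_sync("00:00:05,100", "00:00:05,000"))  # True: 100 ms offset
```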

When this fits

University course recordings

Most universities require captions on recorded lectures for ADA compliance. The auto-flow covers roughly 90 percent of the work; the review pass and the SDH cues for non-speech audio are the remaining 10 percent.

Government and public-sector video

Section 508 in the US and the EU Web Accessibility Directive both require captions on government video content. SRT and VTT exports from this page satisfy the file-format requirements.

Corporate accessibility programs

Internal training, town halls, and public-facing marketing all increasingly require closed captions. A single tool that handles transcription, SDH editing, and export keeps the workflow simple.

Healthcare and pharma video

Patient education and clinical training video usually fall under regulated content rules that require captions. Browser-local processing avoids the data-transfer question entirely.

How to generate closed captions

1 Upload the video

Browser-local processing means the file stays on your device through the captioning and review pass.

2 Run the transcript

Whisper produces the speech portion of the captions with accurate timecodes.

3 Add SDH cues

In the editor, add bracketed non-speech audio cues and speaker labels where needed. Listen while you review.

4 Export SRT or VTT

Download the captions as SRT for YouTube and NLE workflows, or VTT for HTML5 video on a website you control.
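
The two export formats differ mostly in surface syntax: WebVTT adds a WEBVTT header and uses a dot instead of a comma in timestamps. A minimal conversion sketch (real converters also handle styling and cue settings, which this ignores):

```python
def srt_to_vtt(srt: str) -> str:
    """Minimal SRT -> WebVTT conversion: prepend the WEBVTT header and
    switch the timestamp decimal separator from comma to dot."""
    lines = []
    for line in srt.strip().splitlines():
        if "-->" in line:
            # Only timestamp lines are touched; caption text may contain commas.
            line = line.replace(",", ".")
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"

srt = """1
00:00:01,000 --> 00:00:03,000
Host: Welcome to the show
"""
print(srt_to_vtt(srt))
```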

Frequently asked questions

What is the difference between subtitles and closed captions?

Subtitles transcribe spoken dialog and are aimed at viewers who can hear the audio but still want text, such as for translation or noisy environments. Closed captions add non-speech audio cues like [door slams] and speaker labels for deaf and hard-of-hearing viewers. Closed captions are required for WCAG compliance; subtitles are not.

Are auto-generated closed captions WCAG compliant?

Auto-generation alone is not sufficient. WCAG requires that captions accurately represent the audio, which means a human review pass to fix transcription errors and add non-speech audio cues. This page automates the speech transcription part and provides the editor for the review pass; the compliance call comes from the review.

How do I add non-speech audio cues like [music playing]?

In the inline transcript editor, edit any caption block and add the bracketed cue on its own line or alongside the spoken text. The convention is short descriptions like [music plays], [door slams], [audience laughs], and [phone rings]. The square brackets signal to viewers that the content is non-speech audio.
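
For example, a caption block with a cue on its own line might look like this in the exported SRT (index and timecodes are illustrative):

```
12
00:03:41,200 --> 00:03:43,900
[door slams]
Host: Who's there?
```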

How do I add speaker labels?

Type the speaker name followed by a colon at the start of the caption (for example, "Host: Welcome to the show"). When the speaker changes mid-segment, split the segment in the editor and add a new speaker label to the second half. Two-letter abbreviations work too if the speakers were introduced earlier.

Does this support SDH (Subtitles for the Deaf and Hard of Hearing)?

SDH is the practical name for closed captions with non-speech audio cues. The conventions described above (bracketed audio cues, speaker labels at change-of-speaker, accurate timing) match what SDH calls for. The output format is standard SRT or VTT, which players treat as either subtitles or closed captions depending on the metadata you set when uploading.

Your video never leaves your device

All processing happens locally in your browser, and your files never leave your device. The page reads your video through a standard browser file input, holds the bytes in memory, runs Whisper for speech recognition in a Web Worker, and writes the captioned MP4 back to your disk. No upload, no cloud transcription queue, no external copy.

Related Tools and Resources

Auto Caption Generator

Faster flow for subtitles when WCAG compliance is not the goal.

Free SRT Generator

SRT-only export flow.

Free VTT Generator

WebVTT export for HTML5 video.

Podcast Captions and Transcripts

Audio-only version of the same accessibility flow.