Supports video and audio files up to 2048MB

About Auto Captions, No Upload

This page takes the WebAssembly path. When you select a video, JavaScript reads the bytes into browser memory. A Web Worker downloads a quantised Whisper model on first run (around 80 MB, cached after that) and runs the speech recognition pass entirely inside the worker thread. FFmpeg, also compiled to WebAssembly, handles the audio extraction and the optional burn-in step. The final output, whether an SRT file or a captioned MP4, is written to your own disk.
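As a rough sketch of the order of operations described above, the flow can be written as one function that chains the local stages. Every name here (`LocalStages`, `captionInBrowser`, the stage methods) is hypothetical and stands in for the FFmpeg and Whisper WebAssembly calls; this is not the page's actual code.

```typescript
// All names are hypothetical; they only illustrate the order of the stages.
interface CaptionCue {
  startSec: number;
  endSec: number;
  text: string;
}

interface LocalStages {
  // Stands in for FFmpeg (WASM) decoding the media into audio samples.
  extractAudio(media: Uint8Array): Promise<Float32Array>;
  // Stands in for the Whisper worker turning samples into timed text.
  transcribe(pcm: Float32Array): Promise<CaptionCue[]>;
  // Stands in for the optional FFmpeg (WASM) burn-in pass.
  burnIn(media: Uint8Array, cues: CaptionCue[]): Promise<Uint8Array>;
}

interface CaptionResult {
  cues: CaptionCue[];
  captionedVideo?: Uint8Array;
}

// Runs every stage against bytes already held in memory.
// Nothing in this function touches the network.
async function captionInBrowser(
  media: Uint8Array,
  stages: LocalStages,
  options: { burnIn: boolean },
): Promise<CaptionResult> {
  const pcm = await stages.extractAudio(media);
  const cues = await stages.transcribe(pcm);
  if (!options.burnIn) {
    // Subtitle-only export: the video itself is left untouched.
    return { cues };
  }
  const captionedVideo = await stages.burnIn(media, cues);
  return { cues, captionedVideo };
}
```

Passing the stages in as a parameter keeps the orchestration independent of how each stage is implemented, which also makes it easy to exercise with stand-in stages.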

The verification takes about 30 seconds: open DevTools before you drop the file, switch to the Network tab, and filter to XHR and fetch requests. You will see the initial bundle load, the one-time model fetch, and nothing else. Run the same check on Kapwing, VEED, Clipchamp, or Canva and the difference shows up immediately as a multi-megabyte POST going out.

When this fits

NDA and embargoed content

Sponsorship material before launch, unreleased product demos, and any video tied to a confidentiality clause should not be uploaded to a third-party caption server, even briefly.

Medical and legal recordings

Patient consultations and legal depositions carry storage requirements that a cloud captioning service usually cannot satisfy. Browser-local processing keeps the question off the table.

Air-gapped and offline environments

After first load, the captioning flow works without a network connection. That covers secure facilities, trains with patchy connectivity, and devices in airplane mode.

Enterprise compliance reviews

When the security team has to approve a captioning tool for internal use, "no upload" is a much shorter conversation than "uploads but encrypted in transit and at rest with these specific retention guarantees".

How to caption a video without uploading

1. Open the page

The Whisper model and FFmpeg WASM binary download on first visit and cache for future runs.

2. Drop the file

Your video stays in browser memory. The Network panel in DevTools will confirm no outbound request includes your media.

3. Verify locally

While the transcription runs, watch the Network panel. You will see no upload, only the cached model and bundle loading.

4. Export

Download the SRT, VTT, or burned-in MP4. The output is written back to your disk by the browser, with no server involvement.
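To make the export step concrete, here is a minimal sketch of turning timed transcript segments into SRT and WebVTT text. The `Cue` shape and the function names are assumptions for illustration; only the timestamp formats (comma before milliseconds in SRT, period in WebVTT, plus the `WEBVTT` header line) come from the formats themselves.

```typescript
// Hypothetical cue shape; the page's real data structure is not documented.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let out = String(n);
  while (out.length < width) {
    out = "0" + out;
  }
  return out;
}

// Formats seconds as HH:MM:SS<sep>mmm.
// SRT uses "," before the milliseconds, WebVTT uses ".".
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SRT: numbered cues, each followed by a blank line.
function toSrt(cues: Cue[]): string {
  return cues
    .map((cue, i) =>
      (i + 1) + "\n" +
      formatTimestamp(cue.startSec, ",") + " --> " + formatTimestamp(cue.endSec, ",") + "\n" +
      cue.text + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((cue) =>
      formatTimestamp(cue.startSec, ".") + " --> " + formatTimestamp(cue.endSec, ".") + "\n" +
      cue.text + "\n")
    .join("\n");
  return "WEBVTT\n\n" + body;
}
```

In a browser, the resulting string could then be wrapped in a `Blob` and offered as a download through an object URL, which keeps the save entirely on the client.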

Frequently asked questions

How can a caption tool run without uploading anything?

The captioning pipeline (Whisper for speech recognition, FFmpeg for audio extraction and burn-in) is compiled to WebAssembly and shipped to your browser as part of the page. Your browser executes the WASM code locally against your video file. The browser tab is doing the work that other captioning services do on their servers.

Is this slower than cloud-based captioning?

For short and medium videos, no: cloud captioning spends time on upload and queue wait, and browser-local processing skips both. For very long videos (over an hour), it can be slower, because a single laptop CPU is competing with a multi-GPU cloud server. The crossover point is around 30 minutes of input audio on a recent laptop.
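The trade-off can be sketched with a simple linear model. This is an illustration under stated assumptions, not a measurement: every parameter below is hypothetical, and the model ignores download time for the cloud result, model warm-up, and anything else not mentioned in the answer above.

```typescript
// All fields are hypothetical inputs chosen by the reader, not measured values.
interface SpeedAssumptions {
  mediaBitrateBps: number; // file size per second of media, in bits
  uplinkBps: number;       // upload bandwidth, in bits per second
  queueWaitSec: number;    // fixed wait before a cloud job starts
  cloudSpeedup: number;    // seconds of audio processed per second on the server
  localSpeedup: number;    // seconds of audio processed per second in the browser
}

// Cloud time: upload, then queue, then server-side transcription.
function cloudSeconds(durationSec: number, a: SpeedAssumptions): number {
  const uploadSec = (durationSec * a.mediaBitrateBps) / a.uplinkBps;
  return uploadSec + a.queueWaitSec + durationSec / a.cloudSpeedup;
}

// Local time: transcription only, since nothing is uploaded or queued.
function localSeconds(durationSec: number, a: SpeedAssumptions): number {
  return durationSec / a.localSpeedup;
}

// Media duration at which both paths take the same time, or null when
// local processing is never slower under these assumptions.
function crossoverSeconds(a: SpeedAssumptions): number | null {
  const cloudPerSec = a.mediaBitrateBps / a.uplinkBps + 1 / a.cloudSpeedup;
  const localPerSec = 1 / a.localSpeedup;
  if (localPerSec <= cloudPerSec) {
    return null;
  }
  return a.queueWaitSec / (localPerSec - cloudPerSec);
}
```

Under this model a crossover exists only when the browser's per-second cost exceeds the cloud's combined upload and compute cost; the fixed queue wait is then what local processing saves on shorter inputs.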

What is the one-time download on first visit?

About 80 MB combined: the Whisper model weights and the FFmpeg WASM binary. Both are cached in your browser after the first run, so subsequent visits skip the download. That download is the only network traffic involved in the captioning flow.
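The download-once behaviour follows a common cache-first pattern, sketched below. The `AssetStore` interface, `Downloader` type, and `loadAsset` function are hypothetical; in a browser the store could be backed by the Cache API or IndexedDB, but which mechanism this page uses is not stated.

```typescript
// Hypothetical storage interface; a browser implementation might wrap the
// Cache API or IndexedDB. Nothing here is taken from the page's real code.
interface AssetStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical downloader, for example a thin wrapper around fetch().
type Downloader = (url: string) => Promise<Uint8Array>;

interface LoadedAsset {
  bytes: Uint8Array;
  fromCache: boolean;
}

// Cache-first load: hit the network only when the asset is not stored yet.
async function loadAsset(
  url: string,
  store: AssetStore,
  download: Downloader,
): Promise<LoadedAsset> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  // Persist for later visits so the large download happens once.
  await store.put(url, bytes);
  return { bytes, fromCache: false };
}
```

On a first visit `fromCache` would be `false` for both the model and the FFmpeg binary; on later visits both calls should resolve without any request appearing in the Network panel.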

How do I verify that nothing is uploaded?

Open DevTools, switch to the Network panel, filter to fetch and XHR requests. Drop a video in and let it transcribe. You will see only the initial app bundle, the model fetch on first visit, and no further outbound requests for your media. This check works against any web tool claiming to be browser-local.
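The same check can be done after the fact on a saved capture. The Network panel can export a HAR file, and the sketch below scans one for requests that carried a large body. The function name and the size threshold are assumptions; the fields read (`log.entries[].request.method`, `url`, `bodySize`) are part of the HAR 1.2 format, where `bodySize` is `-1` when the size is unknown.

```typescript
// Minimal subset of the HAR 1.2 structure that this check reads.
interface HarEntry {
  request: { method: string; url: string; bodySize: number };
}

interface Har {
  log: { entries: HarEntry[] };
}

interface SuspectedUpload {
  method: string;
  url: string;
  bodyBytes: number;
}

// Returns every request whose body is at least `minBodyBytes` long.
// A media upload would show up here as a large POST or PUT.
function findLargeRequestBodies(har: Har, minBodyBytes: number): SuspectedUpload[] {
  const hits: SuspectedUpload[] = [];
  for (const entry of har.log.entries) {
    const method = entry.request.method;
    const url = entry.request.url;
    const bodySize = entry.request.bodySize;
    // Entries with bodySize -1 (unknown) are skipped by this comparison,
    // so they still need a manual look in the Network panel.
    if (bodySize >= minBodyBytes) {
      hits.push({ method: method, url: url, bodyBytes: bodySize });
    }
  }
  return hits;
}
```

For a tool that really is browser-local, calling this with a threshold well below the size of the dropped file should return an empty list.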

What if my IT policy blocks WebAssembly?

Some enterprise security policies disable WASM in browsers. If WASM is blocked, this page will fail at the captioning step. The simplest workaround is to load the page on a personal device or in a browser profile that allows WASM. There is no server fallback because the entire pipeline depends on WASM running.
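A page can detect this situation up front instead of failing midway. The sketch below is one common way to probe for usable WebAssembly, assuming that a blocking policy either removes the `WebAssembly` global or makes compilation throw; `isWebAssemblyUsable` is a hypothetical name, and whether this page performs such a check is not stated.

```typescript
// Returns true only if a WebAssembly module can actually be compiled and
// instantiated, not merely if the WebAssembly global exists.
function isWebAssemblyUsable(): boolean {
  try {
    if (typeof WebAssembly !== "object" || typeof WebAssembly.Module !== "function") {
      return false;
    }
    // Smallest valid module: the "\0asm" magic number followed by version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    const compiled = new WebAssembly.Module(emptyModule);
    const instance = new WebAssembly.Instance(compiled);
    return instance instanceof WebAssembly.Instance;
  } catch {
    // Compilation or instantiation was refused, for example by policy.
    return false;
  }
}
```

Running the probe before the model download starts would let the page show a clear message rather than stalling at the captioning step.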

Your video never leaves your device

All processing happens locally in your browser, and your files never leave your device. The page reads your video through a standard browser file input, holds the bytes in memory, runs Whisper for speech recognition in a Web Worker, and writes the captioned MP4 back to your disk. No upload, no cloud transcription queue, no external copy.

Related Tools and Resources

Auto Captions, No Signup

Same browser-local engine with no account gate either.

Free Captions, No Watermark

Clean burn-in output, no logo, no signup wall.

Private video editor

Same browser-local architecture applied to full video editing.

Video editor, no upload

Technical explanation of the WebCodecs + FFmpeg WASM editor.