Turn your Mac into a powerful transcription machine using the same AI model that powers voice transcription in OpenAI's ChatGPT. With just a few Terminal commands, you can convert audio and video files into accurate text in minutes.
If you've never touched Terminal before, don't worry — setting up Whisper on macOS Sequoia 15 is easier than it looks and worth it. Whether you're working with YouTube videos, interviews, lectures, or voice notes, Whisper can handle all the heavy lifting.
Whisper is a free, open-source speech-to-text neural network from OpenAI that runs entirely on your machine — no internet required after setup. Once you get it going, it's fast, secure, and dead simple — and it can chew through just about any audio or video format you throw at it. And it's the perfect tool if you're sick of glitchy web-based transcription services, expensive Mac apps, and clunky browser extensions with limitations, such as file size caps, watermarks, ads, or lousy accuracy.
Yes, it lives in the Terminal — that black box of mystery most folks avoid. But here's the thing: if you can copy and paste, you can run Whisper. Once it's installed, transcribing a file is literally one line. There is no bloated interface, no uploading and waiting, and no monthly fee.
And if you're not ready to mess with the command line? You still have options. There are Mac apps like MacWhisper and Whisper Transcription that give you a drag-and-drop interface powered by Whisper under the hood. Browser-based services like Whisper demo on Hugging Face make it even easier — though you'll usually trade some privacy and flexibility for convenience. However, the command-line version is still the most powerful and flexible way to use Whisper, and it's the official implementation maintained by OpenAI. If you want complete control, this is the version you want.
Or you can skip all of it and just send ChatGPT the file via its web or desktop app — it can transcribe or translate it for you using Whisper.
So if you're tired of jumping through hoops just to get clean transcripts — whether you're a student, podcaster, journalist, or just someone trying to archive your Zoom calls — it's time to take five minutes and set up something that just works. Let's dive in.
Requirements
Through the instructions below, you'll be installing and using the following tools:
Whisper command-line tool from OpenAI: The core transcription engine that converts speech to text.
FFmpeg: Required for Whisper to open, convert, and process audio and video files.
Python 3.10 or later: The programming language Whisper is written in.
Homebrew: A package manager that makes installing Whisper, FFmpeg, and Python easy.
To run these tools successfully, you'll need:
A Mac running macOS Monterey 12.3 or later: Preferably macOS Sequoia 15 or later on an Apple Silicon chip for faster performance.
At least 8 GB of RAM and some free disk space: Larger Whisper models can use a lot of memory — especially on long files — but smaller models work fine on most setups.
Terminal app: Preinstalled on macOS — you'll use it to enter the setup and transcription commands.
Setting up Whisper on macOS
Follow these steps to install everything you need and start transcribing files. If you already have Homebrew, Python, and FFmpeg installed, it's still worth checking those steps out to ensure everything is up to date.
Open Terminal on your Mac
Terminal is the command-line app built into macOS — it’s how you’ll install and run Whisper. You don’t need to know how to code, just how to paste in commands. To open Terminal, press Command + Space, type “Terminal,” and hit Return. You can also find it in the Utilities folder in your Applications directory or in the Other folder in Launchpad.

Install or update Homebrew
Homebrew is a package manager for macOS — like the App Store but for powerful command-line tools. It makes it easy to install everything Whisper needs behind the scenes.
If you don't have Homebrew installed, paste this command and press Return:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
This command may look intimidating, but here's what it all means:
/bin/bash -c
tells your Mac to run a command in the Bash shell. The part in quotes —
"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
— uses curl (a tool that fetches files from the internet) to download Homebrew's official installer script from GitHub. Here's what those flags mean:
-f = fail silently on errors.
-s = run silently (don't show the download progress bar).
-S = show errors if any occur (used with -s).
-L = follow redirects automatically.
The whole script installs Homebrew safely and automatically.
If Homebrew is already installed, update it by running:
brew update
Install Python 3.10 (or newer)
Python is the programming language Whisper is written in. Apple includes an older version on macOS, but Whisper needs a newer one to run properly. Homebrew makes it easy to install the correct version.
Whisper requires Python 3.10 or above. Install it with:
brew install python
If you already have Python installed but aren't sure if it's the right version, check it with:
python3 --version
If it's older than 3.10, you can upgrade it with:
brew upgrade python
You're good to go once you're on Python 3.10 or newer.
Install FFmpeg
FFmpeg is a tool for processing audio and video files. It helps Whisper handle all kinds of media formats, such as MP3, MP4, M4A, and WAV. Without FFmpeg, Whisper can't read or convert your files.
To install it using Homebrew:
brew install ffmpeg
If you already have FFmpeg installed, make sure it's up to date:
brew upgrade ffmpeg
You can verify that FFmpeg is working by running:
ffmpeg -version
If it prints version info, you're good.
Install Whisper via pip
Pip is Python's built-in package manager — it's how you install Python apps like Whisper. You'll use pip to download and install Whisper directly from OpenAI's GitHub repository.
First, make sure pip is up to date:
pip3 install --upgrade pip
Then install Whisper:
pip3 install git+https://github.com/openai/whisper.git
If either command stops with an "externally-managed-environment" error (newer Homebrew Python versions block system-wide pip installs), create and activate a virtual environment first — run python3 -m venv ~/whisper-env, then source ~/whisper-env/bin/activate — and rerun the two commands above.
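To confirm the install worked, ask Whisper for its help text — if it prints usage information, the command-line tool is on your PATH and ready to use:

```shell
# Should print Whisper's usage text and full option list
whisper --help
```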
Run a transcription with Whisper
Once Whisper is installed, you can transcribe audio and video files (MP3, MP4, M4A, WAV, and more) using a single command. It supports a range of pretrained models, from lightweight and fast to large and highly accurate.
Audio files are transcribed much faster than video files, so you may want to extract the audio from your videos and use that with Whisper instead — especially when working with a larger model. On a Mac, you can quickly export audio from a video file using QuickTime Player.
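If you'd rather stay in Terminal, FFmpeg (installed earlier) can do the same extraction in one line. A minimal sketch — the filenames are placeholders, and -c:a copy assumes the video's audio track is in an MP4-friendly codec such as AAC:

```shell
# Pull the audio track out of a video without re-encoding it.
# -vn drops the video stream; -c:a copy keeps the original audio codec.
ffmpeg -i your_video.mp4 -vn -c:a copy your_audio.m4a
```

If the copied track won't open, re-encode it instead by dropping -c:a copy (for example, ffmpeg -i your_video.mp4 -vn your_audio.mp3).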
Basic usage (auto-detects language)
The --model tiny option runs the fastest and uses the least memory, while the --model large option offers the best accuracy but requires significantly more RAM and takes longer to process.
whisper your_file.mp4 --model tiny
whisper your_file.mp4 --model base
whisper your_file.mp4 --model small
whisper your_file.mp4 --model medium
whisper your_file.mp4 --model large
Specify language for faster, more accurate results
If you know your file is in English, you can specify it using --language en or --language English:
whisper your_file.mp4 --language English --model tiny
whisper your_file.mp4 --language English --model base
whisper your_file.mp4 --language English --model small
whisper your_file.mp4 --language English --model medium
whisper your_file.mp4 --language English --model large
When using one of the commands above, the output will print directly in the same Terminal window.

By default, Whisper also saves transcription files in the same directory as the original media file: .txt (plain transcript), .srt (the standard subtitle format used by most video players and editors), .vtt (the Web Video Text Tracks format used for HTML5 video, YouTube, etc.), plus .tsv and .json. Add a flag like --output_format txt to keep just one of those formats, or --task translate to translate foreign-language audio into English.
For example, the following transcribes the file in English and outputs it to a .txt document in the same directory.
whisper your_file.mp4 --language en --model small --output_format txt
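If you'd rather not clutter the media folder, the --output_dir flag (listed in the help output below) sends the results elsewhere. A variant of the command above — the ~/Transcripts path is just an example:

```shell
# Save the .txt transcript to a dedicated folder instead of
# the folder you run the command from
whisper your_file.mp4 --language en --model small --output_format txt --output_dir ~/Transcripts
```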
To generate English subtitles for a foreign-language video, the following command translates the audio and saves transcription files in every supported format in the same folder as your video or audio file.
whisper your_file.mp4 --task translate --model medium
Want just the English subtitle file (like .srt) and not the other formats? Run:
whisper your_file.mp4 --task translate --output_format srt
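Transcribing a whole folder is just a shell loop around the same command. A sketch, assuming the folder holds English-language .mp3 files — adjust the glob, language, and model to taste:

```shell
# Transcribe every .mp3 in the current folder to a .txt transcript
for f in *.mp3; do
  whisper "$f" --language en --model base --output_format txt
done
```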
To see all available options:
whisper --help
Final thoughts
Whisper in Terminal isn't just a transcription tool — it's a secret weapon for creators, journalists, students, and anyone who deals with spoken content. The setup process might feel a bit technical the first time, but once it's up and running, it's incredibly simple to use.
That said, Whisper models run locally and can be slow, depending on your Mac's hardware. If you work with large files and want faster results, stick to the tiny or base models. If you need higher accuracy and don't mind the extra processing time, go for medium or large.
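Not sure which model your Mac can comfortably handle? One quick way to decide is to time a couple of models on a short sample clip before committing an hour-long file (sample.mp3 is a placeholder):

```shell
# Compare how long each model takes on the same short clip
time whisper sample.mp3 --language en --model tiny
time whisper sample.mp3 --language en --model base
```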
Full list of Whisper arguments and options
If you want to explore everything Whisper can do — including output formats, language support, and advanced flags — you can run whisper --help in Terminal. Here's the complete list of available options for quick reference:
usage: whisper [-h] [--model MODEL] [--model_dir MODEL_DIR] [--device DEVICE]
[--output_dir OUTPUT_DIR]
[--output_format {txt,vtt,srt,tsv,json,all}]
[--verbose VERBOSE] [--task {transcribe,translate}]
[--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}]
[--temperature TEMPERATURE] [--best_of BEST_OF]
[--beam_size BEAM_SIZE] [--patience PATIENCE]
[--length_penalty LENGTH_PENALTY]
[--suppress_tokens SUPPRESS_TOKENS]
[--initial_prompt INITIAL_PROMPT]
[--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT]
[--fp16 FP16]
[--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK]
[--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD]
[--logprob_threshold LOGPROB_THRESHOLD]
[--no_speech_threshold NO_SPEECH_THRESHOLD]
[--word_timestamps WORD_TIMESTAMPS]
[--prepend_punctuations PREPEND_PUNCTUATIONS]
[--append_punctuations APPEND_PUNCTUATIONS]
[--highlight_words HIGHLIGHT_WORDS]
[--max_line_width MAX_LINE_WIDTH]
[--max_line_count MAX_LINE_COUNT]
[--max_words_per_line MAX_WORDS_PER_LINE] [--threads THREADS]
[--clip_timestamps CLIP_TIMESTAMPS]
[--hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD]
audio [audio ...]
positional arguments:
audio audio file(s) to transcribe
options:
-h, --help show this help message and exit
--model MODEL name of the Whisper model to use (default: turbo)
--model_dir MODEL_DIR
the path to save model files; uses ~/.cache/whisper by
default (default: None)
--device DEVICE device to use for PyTorch inference (default: cpu)
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
directory to save the outputs (default: .)
--output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
format of the output file; if not specified, all
available formats will be produced (default: all)
--verbose VERBOSE whether to print out the progress and debug messages
(default: True)
--task {transcribe,translate}
whether to perform X->X speech recognition
('transcribe') or X->English translation ('translate')
(default: transcribe)
--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
language spoken in the audio, specify None to perform
language detection (default: None)
--temperature TEMPERATURE
temperature to use for sampling (default: 0)
--best_of BEST_OF number of candidates when sampling with non-zero
temperature (default: 5)
--beam_size BEAM_SIZE
number of beams in beam search, only applicable when
temperature is zero (default: 5)
--patience PATIENCE optional patience value to use in beam decoding, as in
https://arxiv.org/abs/2204.05424, the default (1.0) is
equivalent to conventional beam search (default: None)
--length_penalty LENGTH_PENALTY
optional token length penalty coefficient (alpha) as
in https://arxiv.org/abs/1609.08144, uses simple
length normalization by default (default: None)
--suppress_tokens SUPPRESS_TOKENS
comma-separated list of token ids to suppress during
sampling; '-1' will suppress most special characters
except common punctuations (default: -1)
--initial_prompt INITIAL_PROMPT
optional text to provide as a prompt for the first
window. (default: None)
--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT
if True, provide the previous output of the model as a
prompt for the next window; disabling may make the
text inconsistent across windows, but the model
becomes less prone to getting stuck in a failure loop
(default: True)
--fp16 FP16 whether to perform inference in fp16; True by default
(default: True)
--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK
temperature to increase when falling back when the
decoding fails to meet either of the thresholds below
(default: 0.2)
--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD
if the gzip compression ratio is higher than this
value, treat the decoding as failed (default: 2.4)
--logprob_threshold LOGPROB_THRESHOLD
if the average log probability is lower than this
value, treat the decoding as failed (default: -1.0)
--no_speech_threshold NO_SPEECH_THRESHOLD
if the probability of the <|nospeech|> token is higher
than this value AND the decoding has failed due to
`logprob_threshold`, consider the segment as silence
(default: 0.6)
--word_timestamps WORD_TIMESTAMPS
(experimental) extract word-level timestamps and
refine the results based on them (default: False)
--prepend_punctuations PREPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation
symbols with the next word (default: "'“¿([{-)
--append_punctuations APPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation
symbols with the previous word (default:
"'.。,,!!??::”)]}、)
--highlight_words HIGHLIGHT_WORDS
(requires --word_timestamps True) underline each word
as it is spoken in srt and vtt (default: False)
--max_line_width MAX_LINE_WIDTH
(requires --word_timestamps True) the maximum number
of characters in a line before breaking the line
(default: None)
--max_line_count MAX_LINE_COUNT
(requires --word_timestamps True) the maximum number
of lines in a segment (default: None)
--max_words_per_line MAX_WORDS_PER_LINE
(requires --word_timestamps True, no effect with
--max_line_width) the maximum number of words in a
segment (default: None)
--threads THREADS number of threads used by torch for CPU inference;
supercedes MKL_NUM_THREADS/OMP_NUM_THREADS (default:
0)
--clip_timestamps CLIP_TIMESTAMPS
comma-separated list start,end,start,end,...
timestamps (in seconds) of clips to process, where the
last end timestamp defaults to the end of the file
(default: 0)
--hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD
(requires --word_timestamps True) skip silent periods
longer than this threshold (in seconds) when a
possible hallucination is detected (default: None)
Cover photo, screenshots, and GIFs by Gadget Hacks.