top of page

Can ChatGPT Transcribe Audio and How Does It Work?


ChatGPT & Audio Transcription — A Complete Overview


1. What “transcribe audio with ChatGPT” really means

Scenario

Built-in to ChatGPT?

How to do it today

Live dictation (you speak, words appear)

Yes — with the Voice icon in the mobile apps, desktop app, and web

Tap or click Voice, grant mic access, and start talking.

Continuous voice dialogue (“Jarvis mode”)

Yes — Advanced Voice (GPT-4o / 4o-mini) on Plus, Team, and Pro plans; daily preview for Free users

Same Voice icon → pick an output voice; ChatGPT listens and replies aloud in real time.

Upload a prerecorded file (MP3/WAV) for a transcript

No — not in the chat UI

Use the Whisper API or a third-party tool that calls it.


2. Option A — Speak directly in the ChatGPT interface

  • Platforms: iOS, Android, desktop app, and web.

  • How it works:

    1. Press Voice and start speaking.

    2. Standard Voice converts your speech to text (Whisper) before sending it to a GPT model.

    3. Advanced Voice feeds raw audio directly into GPT-4o for faster, more natural replies.

  • Cost & limits: Standard Voice is free; Advanced Voice uses a daily or monthly audio-minute allowance that varies by plan.

  • Best for: Quick dictation, hands-free Q&A.

  • Not for: Long recordings—you still can’t attach files in chat.


3. Option B — Whisper API for file transcription

Key facts

Details

Price

About $0.006 per audio minute, billed by the second.

Max upload size

25 MB per request—split larger files and stitch results later.

Languages

50-plus, auto-detected.

Output formats

json, text, srt, and vtt.

Typical uses

Meeting recordings, podcasts, caption generation, searchable archives.

Tip: Pipe the returned text into a ChatGPT completion call if you need summaries, action items, or analysis.


4. Option C — GPT-4o Realtime Audio (developer preview)

  • What it is: A streaming API that lets you send and receive live audio with sub-second latency.

  • Indicative pricing: Roughly $0.06 per audio minute in, $0.24 per minute out (token-based, subject to change).

  • Best for: Voice-first apps, live translation, call-center agents, or anywhere you need conversational AI on the phone.

  • Status: In preview for selected developers, with broader availability expected later in 2025.


5. Common pitfalls & quick fixes

Pitfall

Fix

Trying to drag-and-drop an audio file into chat

Use Whisper API or a wrapper; the chat UI still rejects audio uploads.

Oversize file errors (HTTP 400)

Compress to ≤25 MB or chunk and reassemble transcripts.

Unexpected Whisper costs

Whisper charges by duration, not file size—trim silence before uploading.

Privacy concerns

Enterprise and regulated customers can enable zero-retention or host an open-source Whisper model on-prem.

6. Quick decision tree

  1. Want instant dictation or voice chat?→ Use the Voice icon inside ChatGPT.

  2. Have a recorded file under 25 MB?→ Call the Whisper API.

  3. Building a fully voice-driven product with live audio?→ Explore the GPT-4o Realtime Audio preview.


__________

ChatGPT itself can’t take audio file uploads yet, but between free in-app dictation, the affordable Whisper API, and the new GPT-4o realtime stack, there’s a solution for nearly every audio-to-text need in 2025.

bottom of page