Alexander AI Voice
The open-source AI voice studio

Clone, dictate and create.

Clone voices, generate speech across seven TTS engines, dictate into any app, and talk to agents in voices you own. A free and local alternative to ElevenLabs and WisprFlow, running entirely on your machine.

macOS, Windows, Linux



Sample voice profiles (all English):
Jarvis: Dry wit, composed British AI assistant
Samuel L. Jackson: Commanding intensity with sharp, punchy delivery
Bob Ross: Gentle, soothing voice full of quiet encouragement
Sam Altman: Measured, thoughtful Silicon Valley cadence
Morgan Freeman: Rich, warm baritone with gravitas and calm authority
Linus Tech Tips: Enthusiastic, fast-paced tech explainer energy
Fireship: Rapid-fire, deadpan tech humor with zero filler
Scarlett Johansson: Smooth, low alto with understated warmth
Dario Amodei: Calm, precise articulation with academic depth
David Attenborough: Warm, reverent narration with wonder and precision
Zendaya: Relaxed, modern delivery with effortless cool
Barack Obama: Measured cadence with rhythmic pauses and gravitas
Recent generations:
Morgan Freeman (en, Qwen 1.7B, 0:08): "The neural pathways of human speech contain more complexity than any language model can fully capture, yet we keep pushing the boundaries of what is possible."
Samuel L. Jackson (en, Qwen 1.7B, 0:07): "In a world increasingly shaped by artificial intelligence, the human voice remains our most powerful tool for connection and storytelling."
Jarvis (en, Qwen 0.6B, 0:09): "The architecture of modern text-to-speech systems reveals an elegant interplay between transformer models and acoustic feature prediction."
Bob Ross (en, Chatterbox, 0:06): "Welcome to the next chapter. Every great story begins with a single voice, and today that voice can be yours."
Linus Tech Tips (en, Qwen 1.7B, 0:05): "Local inference gives you complete control over your voice data. No cloud, no subscriptions, no compromises."
Sponsor Alexander AI Voice

Get your logo in front of 170k+ monthly visitors.

Alexander AI Voice is open-source and used by creators, voice artists, podcasters, writers, developers, accessibility users, and curious humans all over the world. Sponsor the project and your logo lands on the homepage, in the app, in the README, and on the sponsors page — in front of every one of them.

Become a sponsor: from $500/month

Professional voice tools, zero compromise

Everything you need to clone voices, generate speech, and produce multi-voice content — running entirely on your machine.

Near-Perfect Voice Cloning

Seven TTS engines for exceptional voice quality. Clone any voice from a few seconds of audio with natural intonation and emotion.

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations between characters.

Audio Effects Pipeline

Apply pitch shift, reverb, delay, compression, and more — then save as presets. Preview effects live and set defaults per voice profile.
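A preset is an ordered list of effects, applied one after another. The sketch below illustrates that idea under stated assumptions:

- The effect names, parameter shapes, preset format and per-profile default map are hypothetical.
- Only a decibel gain and a single-tap echo (delay) are implemented. Reverb, pitch shift and compression are omitted.

```javascript
// Hypothetical effect chain: each effect takes mono Float32Array samples
// plus its parameters and returns a new Float32Array.
const EFFECTS = {
  // Scale every sample by a gain given in decibels.
  gain: (input, { db }) => {
    const g = 10 ** (db / 20);
    return input.map((s) => s * g);
  },
  // Single-tap echo: mix in a copy of the signal delayed by `ms` milliseconds.
  delay: (input, { sampleRate, ms, mix }) => {
    const offset = Math.round((sampleRate * ms) / 1000);
    const out = Float32Array.from(input);
    for (let i = offset; i < input.length; i++) {
      out[i] += input[i - offset] * mix;
    }
    return out;
  },
};

// Apply a preset (an ordered list of { effect, params }) to a buffer.
function applyPreset(samples, preset) {
  return preset.reduce((buf, { effect, params }) => {
    const fn = EFFECTS[effect];
    if (!fn) throw new Error(`unknown effect: ${effect}`);
    return fn(buf, params);
  }, samples);
}

// Hypothetical saved preset and per-profile default.
const roomEcho = [
  { effect: "delay", params: { sampleRate: 48000, ms: 90, mix: 0.35 } },
  { effect: "gain", params: { db: -3 } },
];
const profileDefaults = new Map([["Morgan", roomEcho]]);
```

Because every effect returns a fresh buffer, previewing a preset never mutates the original take.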

Local or Remote

Run GPU inference locally with Metal, CUDA, ROCm, Intel Arc, or DirectML — or connect to a remote machine. One-click server setup with automatic discovery.

Audio Transcription

Powered by Whisper for accurate speech-to-text. Automatically extract reference text from voice samples.

Unlimited Generation Length

Generate up to 50,000 characters in one go. Text is auto-split at sentence boundaries, generated per-chunk, and crossfaded seamlessly.
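The auto-split and crossfade can be sketched in a few lines. This is not the app's code, and it rests on these assumptions:

- The 300-character chunk size is an assumption.
- The regex sentence splitter is an assumption. It also splits on abbreviations such as "Dr.".
- The linear crossfade is an assumption chosen for clarity.
- Audio is modeled as mono Float32Array samples returned by some TTS call.

```javascript
// Sketch of long-form generation: split text at sentence boundaries,
// synthesize each chunk separately, then crossfade the chunk audio.
const MAX_TOTAL_CHARS = 50000; // per-request limit stated on this page
const MAX_CHUNK_CHARS = 300;   // hypothetical chunk size

// Pack whole sentences (ending in . ! or ?) into chunks of at most maxChars.
// A single sentence longer than maxChars becomes its own chunk.
function splitIntoChunks(text, maxChars = MAX_CHUNK_CHARS) {
  if (text.length > MAX_TOTAL_CHARS) {
    throw new Error(`text exceeds ${MAX_TOTAL_CHARS} characters`);
  }
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  const chunks = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (!current || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Join two mono buffers, overlapping `overlap` samples with a linear
// fade-out of `a` and fade-in of `b`.
function crossfade(a, b, overlap) {
  const n = Math.min(overlap, a.length, b.length);
  const out = new Float32Array(a.length + b.length - n);
  out.set(a.subarray(0, a.length - n), 0);
  for (let i = 0; i < n; i++) {
    const t = (i + 1) / (n + 1); // weight of b rises toward 1 across the overlap
    const j = a.length - n + i;
    out[j] = a[j] * (1 - t) + b[i] * t;
  }
  out.set(b.subarray(n), a.length);
  return out;
}

// Join all per-chunk buffers in order.
function joinChunks(buffers, overlap) {
  if (buffers.length === 0) return new Float32Array(0);
  return buffers.reduce((acc, buf) => crossfade(acc, buf, overlap));
}
```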

Any clip becomes a voice.

Three ways to get a sample in. Upload a clip, record from your microphone, or capture audio playing on your system. Alexander AI Voice clones the voice from as little as 3 seconds of audio.

Upload a clip
Drag and drop any audio file — WAV, MP3, FLAC, or WebM.
Record from microphone
Live waveform preview while you record. Up to 30 seconds.
System audio capture
Clone a voice from a YouTube video, podcast, or any app playing audio.
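A client that feeds samples into cloning might check them before sending. The sketch below is hypothetical:

- The accepted formats and the 3-second minimum come from this page.
- Applying the 30-second cap only to microphone recordings is an assumption.
- The function and field names are also assumptions.

```javascript
// Hypothetical pre-upload check for a voice-cloning reference clip.
const ACCEPTED_EXTENSIONS = new Set(["wav", "mp3", "flac", "webm"]);
const MIN_SECONDS = 3;      // "as little as 3 seconds of audio"
const MAX_MIC_SECONDS = 30; // assumed to apply to microphone recordings only

function validateReferenceClip({ fileName, durationSeconds, source }) {
  const problems = [];
  const ext = fileName.includes(".") ? fileName.split(".").pop().toLowerCase() : "";
  if (!ACCEPTED_EXTENSIONS.has(ext)) {
    problems.push(`unsupported format ".${ext}"`);
  }
  if (durationSeconds < MIN_SECONDS) {
    problems.push(`clip is ${durationSeconds}s; at least ${MIN_SECONDS}s is needed`);
  }
  if (source === "microphone" && durationSeconds > MAX_MIC_SECONDS) {
    problems.push(`microphone recordings are capped at ${MAX_MIC_SECONDS}s`);
  }
  return { ok: problems.length === 0, problems };
}
```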

Capture

Dictate anywhere. Paste into any app.

Hold a shortcut anywhere on your machine, speak, and release. The transcript lands in the focused text field of any app, or on your clipboard. Agents speak back through the same pill in any cloned voice.

Hold the shortcut on macOS, or Ctrl + Alt on Windows, from anywhere on your machine.
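The hold, speak, release loop can be modeled as a small controller. This is an illustrative sketch, not the app's implementation.

The recorder, the transcriber and the two delivery functions are hypothetical stand-ins. A real build would wire them to:

- the OS shortcut hook and the microphone;
- Whisper, for transcription;
- the accessibility and clipboard APIs, for delivery.

```javascript
// Sketch of hold-to-talk: key down starts recording; key up stops it,
// transcribes, and delivers the text. All collaborators are injected.
class HoldToTalk {
  constructor({ recorder, transcribe, insertIntoFocusedField, copyToClipboard }) {
    this.recorder = recorder;
    this.transcribe = transcribe;
    this.insertIntoFocusedField = insertIntoFocusedField;
    this.copyToClipboard = copyToClipboard;
    this.recording = false;
  }

  keyDown() {
    if (this.recording) return; // ignore key-repeat events while the key is held
    this.recording = true;
    this.recorder.start();
  }

  async keyUp() {
    if (!this.recording) return null;
    this.recording = false;
    const audio = this.recorder.stop();
    const text = await this.transcribe(audio);
    // Prefer the focused text field; fall back to the clipboard when
    // no field accepts the insert.
    const inserted = this.insertIntoFocusedField(text);
    if (!inserted) this.copyToClipboard(text);
    return { text, target: inserted ? "field" : "clipboard" };
  }
}
```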
Whisper Base: 74M parameters, 99 languages
Whisper Small: 244M parameters, 99 languages
Whisper Medium: 769M parameters, 99 languages
Whisper Large: 1.5B parameters, 99 languages
Whisper Turbo: 809M parameters, 99 languages

Whisper, sized for every machine

Base, Small, Medium, Large, and Turbo. Pick the size that fits your hardware and quality bar — 99 languages across every tier, all running locally.

Raw: um so like i think we should ship it on friday, actually no wait, tuesday
Clean: I think we should ship it on Tuesday.
Refined by Qwen3.

Refined transcripts

A local LLM strips filler words like "um", resolves self-corrections, and fixes punctuation without rephrasing. Optional, toggleable, and fully on-device.

Example pill (Claude Code via MCP), speaking as Morgan: "Tests passing. Ready to merge."

Agents speak in voices you own

Any MCP-aware agent — Claude Code, Cursor, Cline — gets a voice with one tool call. The pill surfaces when an agent is speaking, so you always see what’s coming out of your machine.

MCP

Every agent gets a voice.

One tool call, voicebox.speak, and any MCP-aware agent can talk to you in a voice you’ve cloned: Claude Code, Cursor, Cline, or anything that speaks MCP.

01 Add Alexander AI Voice to your MCP config
{
  "mcpServers": {
    "voicebox": {
      "url": "http://127.0.0.1:17493/mcp"
    }
  }
}
02 The tool is now available
// In any MCP-aware agent:
await voicebox.speak({
  text: "Deploy complete.",
  profile: "Morgan",
})
Also exposed as POST /speak for anything that doesn’t speak MCP: ACP, A2A, shell scripts, or custom harnesses.
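For callers without MCP, the same request can go over plain HTTP. This sketch makes two assumptions the page does not confirm:

- POST /speak takes a JSON body with the same `text` and `profile` fields as the `voicebox.speak` tool call. Check the app's API reference for the real schema.
- A global `fetch` is available (Node 18+ or a browser).

```javascript
// Sketch of calling POST /speak directly. The JSON body shape is assumed
// to mirror the voicebox.speak tool arguments.
const VOICE_SERVER = "http://127.0.0.1:17493";

function buildSpeakRequest(text, profile) {
  return {
    url: `${VOICE_SERVER}/speak`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, profile }),
    },
  };
}

// fetchImpl is injectable so the request can be tested without a server.
async function speak(text, profile, fetchImpl = fetch) {
  const { url, init } = buildSpeakRequest(text, profile);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`speak failed: HTTP ${res.status}`);
  return res;
}
```

From any async script: `await speak("Deploy complete.", "Morgan")`.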
Claude Code
$ claude run
Tests passing (42 files)
Build succeeded in 12.4s
voicebox.speak({ profile: "Morgan" })
On your desktop: Speaking · Morgan, "Tests passing. Ready to merge."

Per-agent voice

Bind each MCP client to a voice profile. Claude Code in Morgan, Cursor in Scarlett — you know which agent is talking without looking.

Always visible

Every agent-initiated utterance surfaces the pill. No silent background TTS: you always see what’s coming out of your machine.

Open protocols

MCP ships day one. ACP, A2A, and anything else built on a tool-call primitive slots into the same endpoint.
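Binding each MCP client to a voice profile, as described under Per-agent voice, amounts to a lookup table with a fallback. Everything specific in this sketch is hypothetical:

- the client identifiers (`claude-code`, `cursor`);
- the fallback profile;
- the rule that an explicit `profile` argument overrides the binding.

```javascript
// Hypothetical binding table: MCP client identifier -> voice profile.
const agentVoices = new Map([
  ["claude-code", "Morgan"],
  ["cursor", "Scarlett"],
]);
const DEFAULT_PROFILE = "Jarvis"; // assumed fallback for unbound clients

// An explicit profile in the tool call wins; otherwise use the binding.
function resolveProfile(clientName, requestedProfile) {
  if (requestedProfile) return requestedProfile;
  return agentVoices.get(clientName.toLowerCase()) ?? DEFAULT_PROFILE;
}
```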

Personalities

Voices with a personality.

Give any voice profile a free-form personality. Then Rewrite your text in their voice, or let them Compose a fresh line of their own — your cloned voice, in full character.

Marlowe
Voice profile · cloned from a 12s sample
Personality: 1940s noir detective. World-weary, cynical, every situation a metaphor for the city's underbelly. Talks like he's seen one stack trace too many.
Your text (Rewrite): the build is done and we shipped to production
Marlowe, in character: Build's wrapped, ship's left the dock. Another stack of code makes its way into prod, another row of green checks lining the wall.

Rewrite

Restate your text in their voice while preserving every idea. Same content, their delivery — for scripts, dubs, and consistent character voice across long-form work.

Compose

No input needed — hit the button and the character improvises a fresh line of their own. Roll again for another take. Useful for game dialogue, narration cues, or character barks.
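The two modes differ in what the local language model is asked to do. The message format and prompt wording below are hypothetical; the page does not document the prompts the app sends to Qwen3. The sketch only shows the contract:

- Rewrite requires input text and must keep its ideas.
- Compose needs only the personality.

```javascript
// Hypothetical chat messages for the Rewrite and Compose persona modes.
function buildPersonaMessages(mode, personality, text = "") {
  const system = {
    role: "system",
    content: `Stay in character. Personality: ${personality}`,
  };
  if (mode === "rewrite") {
    if (!text.trim()) throw new Error("Rewrite needs input text");
    return [
      system,
      { role: "user", content: `Restate this in your own voice. Keep every idea and add none:\n\n${text}` },
    ];
  }
  if (mode === "compose") {
    // No user text: the character improvises a fresh line.
    return [system, { role: "user", content: "Improvise one short, fresh line in character." }];
  }
  throw new Error(`unknown mode: ${mode}`);
}
```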

Built-in REST API

Your local voice API

Every engine you download becomes a REST endpoint on your machine. Build apps, games, and voice tools with full programmatic control — no API keys, no rate limits, no per-character fees.

API Reference (http://127.0.0.1:17493)
POST /generate: Generate speech
POST /generate/{id}/cancel: Cancel a generation
GET /profiles: List voice profiles
POST /profiles: Create a new profile
GET /models/status: Model catalog & state
GET /history: Past generations
GET /health: Server health
Generate a line (curl):
curl -X POST http://127.0.0.1:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to the game, player one.",
    "profile_id": "b3f1c2d4-5e6f-4a7b-8c9d-0e1f2a3b4c5d",
    "engine": "qwen_custom_voice",
    "instruct": "warm, slow, cinematic"
  }' \
  --output line.wav
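Here is the same request from JavaScript, as a minimal sketch:

- It assumes `/generate` responds with the audio bytes directly, as the curl `--output line.wav` flag suggests.
- The field names are copied from the curl example.
- The `fetchImpl` parameter exists only so the function can be tested without a running server.

```javascript
// Sketch of POST /generate from JavaScript (Node 18+ for global fetch).
const API_BASE = "http://127.0.0.1:17493";

// Build the JSON body; engine and instruct are optional, as in the curl call.
function buildGenerateBody({ text, profileId, engine, instruct }) {
  const body = { text, profile_id: profileId };
  if (engine) body.engine = engine;
  if (instruct) body.instruct = instruct; // natural-language delivery instruction
  return JSON.stringify(body);
}

async function generateLine(params, fetchImpl = fetch) {
  const res = await fetchImpl(`${API_BASE}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildGenerateBody(params),
  });
  if (!res.ok) throw new Error(`generate failed: HTTP ${res.status}`);
  return new Uint8Array(await res.arrayBuffer()); // assumed to be WAV bytes
}
```

To save the result as the curl example does, a Node script can pass the returned bytes to `writeFile("line.wav", bytes)` from `node:fs/promises`.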

Games

Generate NPC dialogue on the fly, localize characters into new languages, or ship expressive voice lines without a studio.

Apps & agents

Give your app or AI agent a voice. Real-time narration, accessibility readouts, voice replies — all running on the user's machine.

Scripts & tools

Batch-generate audiobook chapters, automate podcast intros, or wire Alexander AI Voice into your Stream Deck. It's just a localhost URL.
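A batch job is a loop over that URL. In this sketch:

- `generate` and `save` are hypothetical functions supplied by the caller, for example a wrapper around POST /generate and a file write.
- Chapters are rendered sequentially rather than in parallel. This is an assumption meant to avoid oversubscribing a single local GPU.

```javascript
// Sketch of batch chapter rendering with caller-supplied generate/save.
async function renderChapters(chapters, generate, save) {
  const written = [];
  for (const [index, text] of chapters.entries()) {
    const fileName = `chapter-${String(index + 1).padStart(2, "0")}.wav`;
    const audio = await generate(text); // one request at a time
    await save(fileName, audio);
    written.push(fileName);
  }
  return written;
}
```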

No API keys · No rate limits · No per-character fees · Works offline · Your audio, your machine

Supported models

Pick the right model for every job — TTS, transcription, refinement. All models run locally on your hardware. Download once, use forever.

TTS Engines

Text → speech. Voice cloning, preset voices, and delivery control.

07 models

Qwen3-TTS

by Alibaba
1.7B · 0.6B

High-quality multilingual cloning with natural prosody. The only cloning engine with delivery instructions: control tone, pace, and emotion with natural language.

10 langs · Delivery instructions

Chatterbox

by Resemble AI

Production-grade voice cloning with the broadest language support. 23 languages with zero-shot cloning and emotion exaggeration control.

23 langs

Chatterbox Turbo

by Resemble AI
350M

Lightweight and fast. Supports paralinguistic tags — embed [laugh], [sigh], [gasp] directly in your text for expressive speech.

Fast · [tag] support

LuxTTS

by ZipVoice

Ultra-fast, CPU-friendly cloning at 48kHz. Exceeds 150x realtime on GPU with ~1GB of VRAM. The fastest engine for quick iterations.

150x realtime · 48kHz

Qwen CustomVoice

by Alibaba
1.7B · 0.6B

Nine premium preset speakers with natural-language style control. "Speak slowly with warmth", "authoritative and clear" — tone and pace adapt.

Instruct control · 10 langs

TADA

by Hume AI
3B · 1B

Speech-language model with text-acoustic dual alignment. Built for long-form — 700s+ coherent audio without drift. Multilingual at 3B.

10 langs · Long-form

Kokoro

by hexgrad · Apache 2.0
82M

Tiny 82M-parameter TTS that runs in real time on CPU with negligible VRAM. Pre-built voice styles: pick a voice, type, generate.

CPU realtime · Preset voices

Transcription

Speech → text. Multi-language STT for dictation and captures.

02 models

Whisper

by OpenAI
1.5B · 769M · 244M · 74M

The default. Mature multilingual ASR across a wide size range: pick Base for speed or Large for best accuracy.

99 langs

Whisper Turbo

by OpenAI
809M

Pruned Whisper Large v3. Near-best quality at roughly 8x the speed — the right default for real-time dictation.

99 langs · 8x faster

Language Models

Transcript refinement, persona replies, and on-device reasoning.

01 model

Qwen3

by Alibaba
4B · 1.7B · 0.6B

Powers transcript cleanup, persona voice replies, and the voice I/O loop. Shares its runtime with the TTS/STT stack — one model cache, one GPU story.

Refinement · Persona replies

Download Alexander AI Voice

Available for macOS, Windows, and Linux. No dependencies required.