Provider Capabilities

Feature support by provider and tier. Requests that include an unsupported modifier return HTTP 400.

Capability Matrix

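| Provider | Tier | SSML | Markup | Speed | Multi-Speaker | Model Selection | Prompt | Bitrate Config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google [1] | Premium | Conditional | Conditional | Conditional | No | No | No | Yes |
| Google [2] | Ultra | Yes | No | No | Conditional | Yes | Yes | Yes |
| Polly | Premium | Yes | No | No | No | No | No | No |
| Polly | Ultra | Yes | No | No | No | No | No | No |
| Kokoro | Premium | No | No | Yes | No | No | No | Yes |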

[1] SSML, markup, and speed vary by voice family — check GET /api/v1/voices.

[2] Multi-speaker and text limits vary by selected model — see Google Ultra Model Details.
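As a rough illustration of how a client might consult the matrix before sending a request, the sketch below transcribes the table into a lookup. The short column keys (`ssml`, `multi_speaker`, and so on) and the `check_modifier` helper are hypothetical names chosen for this example; they are not part of the API.

```python
# Capability matrix transcribed from the table above.
# "conditional" means the answer depends on the voice family or model.
# The column keys are illustrative labels, not API field names.
COLUMNS = ("ssml", "markup", "speed", "multi_speaker",
           "model_selection", "prompt", "bitrate")

_ROWS = {
    ("google", "premium"): ("conditional", "conditional", "conditional",
                            "no", "no", "no", "yes"),
    ("google", "ultra"): ("yes", "no", "no", "conditional", "yes", "yes", "yes"),
    ("polly", "premium"): ("yes", "no", "no", "no", "no", "no", "no"),
    ("polly", "ultra"): ("yes", "no", "no", "no", "no", "no", "no"),
    ("kokoro", "premium"): ("no", "no", "yes", "no", "no", "no", "yes"),
}

CAPABILITIES = {key: dict(zip(COLUMNS, row)) for key, row in _ROWS.items()}


def check_modifier(provider: str, tier: str, modifier: str) -> str:
    """Return "yes", "no", or "conditional" for a provider/tier/modifier."""
    key = (provider.lower(), tier.lower())
    if key not in CAPABILITIES or modifier not in COLUMNS:
        raise ValueError(f"unknown combination: {provider}/{tier}/{modifier}")
    return CAPABILITIES[key][modifier]
```

A "conditional" result is a signal to look at the individual voice (or the selected model) before sending the modifier, since the provider-level answer alone is not enough.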

Models By Tier

These are the public voice families you will see in GET /api/v1/voices, grouped by provider and tier.

| Provider | Tier | Models |
| --- | --- | --- |
| Google | Premium | Casual, Chirp-HD, Chirp3-HD, Neural2, News, Polyglot, Standard, Studio, Wavenet |
| Google | Ultra | Gemini 2.5 Flash TTS, Gemini 2.5 Pro TTS, Gemini 2.5 Flash Lite Preview TTS, Gemini 3.1 Flash TTS Preview |
| Polly | Premium | Generative, Neural, Standard |
| Polly | Ultra | Long-Form |
| Kokoro | Premium | Kokoro |

Looking for supported languages and live voice availability? Browse the Voice Library for a visual view, or use GET /api/v1/voices for the full API response. For per-voice capability discovery (streaming, formats, limits, speed, prompt), use GET /api/v1/voices/{voice_id}.

Text And Prompt Limits

The limits below show the maximum accepted text for each provider and tier, plus prompt limits where prompt is supported. Google Ultra limits vary by selected model.

| Provider | Tier | Max Text (bytes) | Streaming Max (bytes) | Prompt Limits (bytes) |
| --- | --- | --- | --- | --- |
| Google | Premium | 500,000 | 5,000 | Not supported |
| Google | Ultra | Varies by model | Varies by model | Varies by model |
| Polly | Premium | 100,000 | 3,000 | Not supported |
| Polly | Ultra | 100,000 | 3,000 | Not supported |
| Kokoro | Premium | 5,000 | 5,000 | Not supported |

When prompt is used, chars_charged includes both text and prompt bytes.
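Because the limits are expressed in bytes rather than characters, non-ASCII text can hit a limit sooner than its character count suggests. The sketch below shows one way to pre-check a request on the client. It assumes the service counts UTF-8 encoded bytes and that the charge is the plain sum of text and prompt bytes; neither detail is confirmed here, and the helper names are hypothetical.

```python
# Limits in bytes, transcribed from the table above. Google Ultra is left out
# because its limits depend on the selected model.
LIMITS = {
    ("google", "premium"): {"max_text": 500_000, "stream_max": 5_000},
    ("polly", "premium"): {"max_text": 100_000, "stream_max": 3_000},
    ("polly", "ultra"): {"max_text": 100_000, "stream_max": 3_000},
    ("kokoro", "premium"): {"max_text": 5_000, "stream_max": 5_000},
}


def byte_len(s: str) -> int:
    # Assumption: the service measures UTF-8 encoded bytes.
    return len(s.encode("utf-8"))


def fits(provider: str, tier: str, text: str) -> dict:
    """Report whether text fits the async and streaming limits."""
    limits = LIMITS[(provider, tier)]
    n = byte_len(text)
    return {
        "bytes": n,
        "async_ok": n <= limits["max_text"],
        "stream_ok": n <= limits["stream_max"],
    }


def estimated_charge(text: str, prompt: str = "") -> int:
    # Estimate only: assumes chars_charged is text bytes plus prompt bytes.
    return byte_len(text) + byte_len(prompt)
```

For example, `fits("polly", "premium", text)` with 3,001 bytes of text reports that the request is within the async limit but too long for streaming.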

Google Ultra Model Details

Google Ultra limits vary by selected model. Use the model field in your request to select a specific model.

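| Model | Model ID | Max Text | Stream Max | Prompt Max | Combined Max | Multi-Speaker |
| --- | --- | --- | --- | --- | --- | --- |
| Flash TTS | gemini-2.5-flash-tts | 4,000 | 4,000 | 4,000 | 8,000 | Yes |
| Pro TTS | gemini-2.5-pro-tts | 4,000 | 4,000 | 4,000 | 8,000 | Yes |
| Flash Lite Preview | gemini-2.5-flash-lite-preview-tts | 300 | 300 | 300 | 300 | No |
| 3.1 Flash TTS Preview | gemini-3.1-flash-tts-preview | 4,000 | 4,000 | 4,000 | 8,000 | Yes |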

Flash Lite Preview is a reduced-capacity preview model best suited for very short text. Use Flash TTS or Pro TTS for longer content.
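The per-model limits interact: a request can satisfy the text and prompt limits individually and still exceed the combined limit. The following sketch checks all of them together. It assumes the values are UTF-8 byte counts, as in the provider-level limits table, and `limit_violations` is a hypothetical helper, not an API call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    max_text: int
    stream_max: int
    prompt_max: int
    combined_max: int
    multi_speaker: bool


# Values transcribed from the model details table above.
MODEL_LIMITS = {
    "gemini-2.5-flash-tts": ModelLimits(4000, 4000, 4000, 8000, True),
    "gemini-2.5-pro-tts": ModelLimits(4000, 4000, 4000, 8000, True),
    "gemini-2.5-flash-lite-preview-tts": ModelLimits(300, 300, 300, 300, False),
    "gemini-3.1-flash-tts-preview": ModelLimits(4000, 4000, 4000, 8000, True),
}


def limit_violations(model: str, text: str, prompt: str = "",
                     streaming: bool = False,
                     multi_speaker: bool = False) -> list:
    """Return a list of human-readable problems; empty means the request fits."""
    limits = MODEL_LIMITS[model]
    # Assumption: limits are measured in UTF-8 bytes.
    t = len(text.encode("utf-8"))
    p = len(prompt.encode("utf-8"))
    text_cap = limits.stream_max if streaming else limits.max_text
    problems = []
    if t > text_cap:
        problems.append(f"text is {t} bytes, limit is {text_cap}")
    if p > limits.prompt_max:
        problems.append(f"prompt is {p} bytes, limit is {limits.prompt_max}")
    if t + p > limits.combined_max:
        problems.append(f"text + prompt is {t + p} bytes, limit is {limits.combined_max}")
    if multi_speaker and not limits.multi_speaker:
        problems.append(f"{model} does not support multi-speaker")
    return problems
```

With Flash Lite Preview, 200 bytes of text plus 200 bytes of prompt passes both individual checks but fails the combined one, which is the case this helper is meant to surface.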

Voice-Level Capabilities

The matrix above covers provider/tier-level features. For voice-specific capabilities, use GET /api/v1/voices. Each voice includes:

| Field | Description |
| --- | --- |
| supports_ssml | Whether this voice supports SSML input format |
| supports_markup | Whether this voice supports markup input format |
| supports_multispeaker | Whether this voice supports multi-speaker mode |
| model_type | Voice tier: "premium" or "ultra" |
| language | Voice language code (e.g., en-US) |
| provider | Voice provider (e.g., google, polly, kokoro) |
| available_models | Available ultra model IDs for this voice (ultra voices with model selection only) |

Always check voice-level fields before sending optional modifiers. See GET /api/v1/voices for the full response shape.
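A minimal sketch of that check follows. It takes a voice record as a plain dictionary and uses only the field names listed above; the function name, its keyword arguments, and the example record are illustrative assumptions rather than part of the API.

```python
def unsupported_modifiers(voice: dict, *, ssml: bool = False,
                          markup: bool = False, multispeaker: bool = False,
                          model=None) -> list:
    """Return the requested modifiers that this voice record does not allow."""
    problems = []
    if ssml and not voice.get("supports_ssml", False):
        problems.append("ssml")
    if markup and not voice.get("supports_markup", False):
        problems.append("markup")
    if multispeaker and not voice.get("supports_multispeaker", False):
        problems.append("multispeaker")
    # available_models may be absent on voices without model selection.
    if model is not None and model not in (voice.get("available_models") or []):
        problems.append("model")
    return problems


# Illustrative record only; real responses may carry more fields.
example_voice = {
    "supports_ssml": True,
    "supports_markup": False,
    "supports_multispeaker": True,
    "model_type": "ultra",
    "language": "en-US",
    "provider": "google",
    "available_models": ["gemini-2.5-flash-tts", "gemini-2.5-pro-tts"],
}

print(unsupported_modifiers(example_voice, ssml=True, markup=True))
```

An empty list means every requested modifier is allowed for that voice; anything else names what to drop or change before sending the request.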

Output Formats

The async API (POST /api/v1/tts) supports three output formats, identical across all providers. Set output_format in your request (default: wav).

| Format | Sample Rates | Bitrate Range | Default Bitrate | Notes |
| --- | --- | --- | --- | --- |
| wav | 8k–48k | | | Uncompressed PCM. Largest files, no bitrate setting. |
| mp3 | 8k–48k | 32–320 kbps | 128 kbps | Widest playback compatibility. |
| ogg_opus | 8k, 12k, 16k, 24k, 48k | 6–320 kbps | 64 kbps | Best quality-to-size ratio. |

Bitrate is configurable for Google and Kokoro (output_bitrate_kbps). Other providers manage bitrate internally. Use sample_rate_hertz to control sample rate for any format.
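The sketch below shows one way to assemble and validate the output-related request fields (`output_format`, `sample_rate_hertz`, `output_bitrate_kbps`) before calling the async API. It reads "8k" as 8000 Hz and treats "8k–48k" as an inclusive range, which is an assumption; the service may accept only specific rates inside that range. The `output_options` helper is hypothetical.

```python
# Transcribed from the output formats table above.
# bitrate is (min_kbps, max_kbps, default_kbps), or None when not configurable.
FORMATS = {
    "wav": {"rates": range(8000, 48001), "bitrate": None},
    "mp3": {"rates": range(8000, 48001), "bitrate": (32, 320, 128)},
    "ogg_opus": {"rates": (8000, 12000, 16000, 24000, 48000),
                 "bitrate": (6, 320, 64)},
}

# Providers for which bitrate is configurable, per the text above.
BITRATE_PROVIDERS = {"google", "kokoro"}


def output_options(provider: str, output_format: str = "wav",
                   sample_rate_hertz=None, output_bitrate_kbps=None) -> dict:
    """Build the output fields of a request body, raising on invalid values."""
    spec = FORMATS.get(output_format)
    if spec is None:
        raise ValueError(f"unsupported output_format: {output_format}")
    options = {"output_format": output_format}
    if sample_rate_hertz is not None:
        if sample_rate_hertz not in spec["rates"]:
            raise ValueError(f"{sample_rate_hertz} Hz not valid for {output_format}")
        options["sample_rate_hertz"] = sample_rate_hertz
    if output_bitrate_kbps is not None:
        if spec["bitrate"] is None:
            raise ValueError(f"{output_format} has no bitrate setting")
        if provider not in BITRATE_PROVIDERS:
            raise ValueError(f"bitrate is not configurable for {provider}")
        low, high, _default = spec["bitrate"]
        if not low <= output_bitrate_kbps <= high:
            raise ValueError(f"bitrate must be {low}-{high} kbps")
        options["output_bitrate_kbps"] = output_bitrate_kbps
    return options
```

Leaving `sample_rate_hertz` and `output_bitrate_kbps` unset keeps them out of the returned dictionary, so the service defaults apply.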

Streaming Delivery

Short-form generation supports streaming delivery. In the web app, use Stream Preview in the advanced settings on /tts. For the public API, use POST /api/v1/tts/stream.

Every streaming request creates a normal job with the same billing and storage behavior as async generation. If matching audio is already available, the service returns the completed result immediately instead of opening a new live stream.

Streaming is short-form only, and the maximum text length varies by provider. See the Streaming Max column in the Text and Prompt Limits table.

| Provider | Tier | Family | Streaming | Default Transport | Supported Formats (API v1) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Google | Premium | Chirp3HD | Yes | ogg_opus | ogg_opus (24k/48k), wav (24k/48k), pcm (24k/48k), mulaw (8k), alaw (8k) | Max speaking rate: 2.0 |
| Google | Premium | ChirpHD | Yes | ogg_opus | ogg_opus (24k/48k), wav (24k/48k), pcm (24k/48k), mulaw (8k), alaw (8k) | Max speaking rate: 2.0 |
| Google | Premium | Wavenet | No | | | |
| Google | Premium | Neural2 | No | | | |
| Google | Premium | Studio | No | | | |
| Google | Premium | Standard | No | | | |
| Google | Premium | Casual | No | | | |
| Google | Premium | Polyglot | No | | | |
| Google | Premium | News | No | | | |
| Google | Ultra | Gemini TTS | Yes | wav | wav (24k), pcm (24k) | Speaking rate not supported |
| Polly | Premium | Standard | Yes | ogg_opus | ogg_opus (48k), mp3 (8k–24k), wav (8k/16k), ogg_vorbis (8k–24k), pcm (8k/16k), mulaw (8k), alaw (8k) | |
| Polly | Premium | Neural | Yes | ogg_opus | ogg_opus (48k), mp3 (8k–24k), wav (8k/16k), ogg_vorbis (8k–24k), pcm (8k/16k), mulaw (8k), alaw (8k) | |
| Polly | Premium | Generative | Yes | ogg_opus | ogg_opus (48k), mp3 (8k–24k), wav (8k/16k), ogg_vorbis (8k–24k), pcm (8k/16k), mulaw (8k), alaw (8k) | |
| Polly | Ultra | Long-Form | Yes | ogg_opus | ogg_opus (48k), mp3 (8k–24k), wav (8k/16k), ogg_vorbis (8k–24k), pcm (8k/16k), mulaw (8k), alaw (8k) | |
| Kokoro | Premium | Kokoro | Yes | ogg_opus | ogg_opus (24k), mp3 (24k), wav (24k), pcm (24k) | |

API v1 callers can request any supported format from the matrix above via output_format and sample_rate_hertz. If omitted, the default stream format is used. The /tts web app exposes only the browser-playable subset (ogg_opus, mp3, wav). Bitrate selection is not available in stream mode. The saved artifact uses the same streamed format. See API v1 Streaming for request/response details.
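To illustrate how a client might validate a stream format against the streaming matrix, here is a sketch covering a few of its rows. Only families with discrete sample rates are transcribed; the Polly rows are omitted because their "8k–24k" ranges do not say which rates inside the range are accepted. The `stream_options` helper and the tuple keys are hypothetical, and "24k" is read as 24000 Hz.

```python
# Subset of the streaming matrix. None marks a family without streaming.
STREAM_FORMATS = {
    ("google", "premium", "Chirp3HD"): {
        "default": "ogg_opus",
        "formats": {"ogg_opus": {24000, 48000}, "wav": {24000, 48000},
                    "pcm": {24000, 48000}, "mulaw": {8000}, "alaw": {8000}},
    },
    ("google", "premium", "Wavenet"): None,
    ("google", "ultra", "Gemini TTS"): {
        "default": "wav",
        "formats": {"wav": {24000}, "pcm": {24000}},
    },
    ("kokoro", "premium", "Kokoro"): {
        "default": "ogg_opus",
        "formats": {"ogg_opus": {24000}, "mp3": {24000},
                    "wav": {24000}, "pcm": {24000}},
    },
}


def stream_options(family_key: tuple, output_format=None,
                   sample_rate_hertz=None) -> dict:
    """Return the format fields to send with a stream request, or raise."""
    if family_key not in STREAM_FORMATS:
        raise ValueError(f"family not in this sketch: {family_key}")
    entry = STREAM_FORMATS[family_key]
    if entry is None:
        raise ValueError(f"{family_key[2]} does not support streaming")
    # When no format is given, validate the rate against the default format.
    fmt = output_format or entry["default"]
    if fmt not in entry["formats"]:
        raise ValueError(f"{fmt} is not a stream format for {family_key[2]}")
    if sample_rate_hertz is not None and sample_rate_hertz not in entry["formats"][fmt]:
        raise ValueError(f"{sample_rate_hertz} Hz is not valid for {fmt}")
    options = {}
    if output_format is not None:
        options["output_format"] = output_format
    if sample_rate_hertz is not None:
        options["sample_rate_hertz"] = sample_rate_hertz
    return options
```

Calling it with no format or rate returns an empty dictionary, which corresponds to omitting both fields and letting the default stream format apply.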

© 2026 AI TTS Microservice. All rights reserved.