How to Use a TTS API: From First Request to Production

Integrating a text-to-speech API into your app is straightforward once you understand the pattern: authenticate, submit text, poll for the result, and retrieve the audio. This tutorial walks through each step with working examples in curl, Python, and Node.js — using the AI TTS Microservice API as the reference. It covers the async flow (submit, poll, download) and introduces streaming delivery for real-time playback.

Step 1: Get an API Key

API access requires a key. In AI TTS Microservice, you create one from the Dashboard under the API tab. The key is shown once — copy it immediately. Keys are prefixed with tts_.

API keys are available to pay-as-you-go, Pro, and Enterprise accounts.
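
Because the key is shown only once and is tied to billing, it is usually worth keeping it out of source code. A minimal sketch, assuming a hypothetical environment variable named TTS_API_KEY (the variable name and the auth_headers helper are illustrative, not part of the API):

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from an environment variable.

    TTS_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get("TTS_API_KEY", "")
    # Keys are prefixed with tts_, so a missing prefix usually means a bad value.
    if not key.startswith("tts_"):
        raise RuntimeError("TTS_API_KEY is missing or does not start with 'tts_'")
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed as the headers argument in the Python examples that follow.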

Step 2: Submit Your First Request

The generation endpoint accepts text and a voice ID, and returns a job ID immediately. The audio is generated asynchronously — you don't wait for it in the same request.

curl

curl -X POST https://aitts.theproductivepixel.com/api/v1/tts \
  -H "Authorization: Bearer tts_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world! This is my first TTS request.",
    "voice_id": "kokoro:en-US-Kokoro-Bella"
  }'

Python

import requests

API_KEY = "tts_YOUR_KEY"
BASE = "https://aitts.theproductivepixel.com/api/v1"

response = requests.post(
    f"{BASE}/tts",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "text": "Hello world! This is my first TTS request.",
        "voice_id": "kokoro:en-US-Kokoro-Bella"
    }
)
data = response.json()["data"]
print(f"Job ID: {data['job_id']}")
print(f"Characters charged: {data['chars_charged']}")

Node.js

const API_KEY = "tts_YOUR_KEY";
const BASE = "https://aitts.theproductivepixel.com/api/v1";

const response = await fetch(`${BASE}/tts`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    text: "Hello world! This is my first TTS request.",
    voice_id: "kokoro:en-US-Kokoro-Bella"
  })
});
const { data } = await response.json();
console.log(`Job ID: ${data.job_id}`);

The response comes back immediately with HTTP 202:

{
  "success": true,
  "data": {
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "pending",
    "poll_url": "/api/v1/tts/550e8400-e29b-41d4-a716-446655440000",
    "audio_endpoint": "/api/v1/tts/550e8400-e29b-41d4-a716-446655440000/audio",
    "chars_charged": 43
  }
}

Step 3: Poll for the Result

Use the job ID to check status. Poll until the status is completed or failed.

curl

curl https://aitts.theproductivepixel.com/api/v1/tts/YOUR_JOB_ID \
  -H "Authorization: Bearer tts_YOUR_KEY"

Python

import time

job_id = data["job_id"]

while True:
    status_resp = requests.get(
        f"{BASE}/tts/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    )
    status = status_resp.json()["data"]

    if status["status"] == "completed":
        print(f"Audio endpoint: {status['audio_endpoint']}")
        break
    elif status["status"] == "failed":
        print(f"Failed: {status['error']}")
        break

    time.sleep(1)

Node.js

async function pollForResult(jobId) {
  while (true) {
    const res = await fetch(`${BASE}/tts/${jobId}`, {
      headers: { "Authorization": `Bearer ${API_KEY}` }
    });
    const { data: status } = await res.json();

    if (status.status === "completed") return status.audio_endpoint;
    if (status.status === "failed") throw new Error(status.error?.message);

    await new Promise(r => setTimeout(r, 1000));
  }
}

const audioEndpoint = await pollForResult(data.job_id);
console.log(`Audio endpoint: ${audioEndpoint}`);

When complete, the response includes a signed audio URL:

{
  "success": true,
  "data": {
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "audio_url": "https://storage.googleapis.com/...",
    "audio_url_expires_at": "2026-01-08T10:00:00.000Z",
    "audio_endpoint": "/api/v1/tts/550e8400-e29b-41d4-a716-446655440000/audio",
    "chars_charged": 43
  }
}

The audio_url is a temporary signed URL that expires after 24 hours; it is provided for backward compatibility. For production use, prefer audio_endpoint (e.g. /api/v1/tts/{job_id}/audio), which redirects to a fresh signed URL on every request and keeps working for as long as the audio is retained.
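
If you do store audio_url, the audio_url_expires_at field can tell you whether it is still worth trying. A small sketch, assuming the timestamp always has the ISO 8601 shape shown in the sample response; the url_is_fresh helper and its five-minute safety margin are illustrative choices:

```python
from datetime import datetime, timedelta, timezone


def url_is_fresh(expires_at, now=None, margin=timedelta(minutes=5)):
    """Return True if a signed audio_url is still usable, with a safety margin."""
    # The sample timestamp ends in "Z"; older Python versions need "+00:00".
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now + margin < expiry
```

When this returns False, requesting audio_endpoint again yields a new signed URL.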

Step 4: Download the Audio

curl

# Using the temporary audio_url (expires in 24h):
curl -o output.wav "AUDIO_URL_FROM_RESPONSE"

# Using the durable audio_endpoint (recommended):
curl -L -o output.wav \
  https://aitts.theproductivepixel.com/api/v1/tts/JOB_ID/audio \
  -H "Authorization: Bearer tts_YOUR_KEY"

Python

# Download via the durable audio_endpoint; requests follows the redirect.
audio = requests.get(
    f"{BASE}/tts/{job_id}/audio",
    headers={"Authorization": f"Bearer {API_KEY}"},
    allow_redirects=True
)
audio.raise_for_status()  # avoid writing an error body to disk
with open("output.wav", "wb") as f:
    f.write(audio.content)

Choosing a Voice

Voice IDs follow the format provider:language-Family-Name. You can browse available voices programmatically:

curl "https://aitts.theproductivepixel.com/api/v1/voices?language=en-US" \
  -H "Authorization: Bearer tts_YOUR_KEY"

The response includes voice metadata: provider, language, gender (one of male, female, neutral, or unknown), characteristics (styles, pitch, age, and more), tier, and capability flags such as supports_ssml and supports_multispeaker. You can filter by provider, model_type (premium or ultra), and gender.

Or browse the voice gallery visually — no account required.
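
The provider:language-Family-Name format can be split on the client, for example to group voices by provider. This is a sketch that assumes the family and name parts never contain a hyphen, which holds for the IDs used in this tutorial but is not guaranteed here; VoiceId and parse_voice_id are illustrative names.

```python
from typing import NamedTuple


class VoiceId(NamedTuple):
    provider: str
    language: str
    family: str
    name: str


def parse_voice_id(voice_id: str) -> VoiceId:
    """Split 'provider:language-Family-Name' into its four parts."""
    provider, sep, rest = voice_id.partition(":")
    # Split from the right so a hyphenated language tag like en-US stays whole.
    parts = rest.rsplit("-", 2)
    if not sep or not provider or len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    language, family, name = parts
    return VoiceId(provider, language, family, name)
```

For instance, parse_voice_id("kokoro:en-US-Kokoro-Bella") yields provider "kokoro", language "en-US", family "Kokoro", and name "Bella".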

Output Formats

The default output is WAV. You can request MP3 or OGG Opus instead, with optional bitrate and sample rate control:

curl -X POST https://aitts.theproductivepixel.com/api/v1/tts \
  -H "Authorization: Bearer tts_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "MP3 output with custom bitrate.",
    "voice_id": "polly:en-GB-Generative-Brian",
    "output_format": "mp3",
    "sample_rate_hertz": 24000,
    "output_bitrate_kbps": 128
  }'

Supported formats and defaults:

wav — lossless, no bitrate config. Default.
mp3 — lossy, default 128 kbps. Configurable: 32–320 kbps.
ogg_opus — lossy, default 64 kbps. Configurable: 6–320 kbps.

Sample rates available: 8000, 12000, 16000, 22050, 24000, 44100, 48000 Hz (OGG Opus excludes 12000, 22050, and 44100).
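
Checking these ranges on the client can save a round trip that would end in VALIDATION_ERROR. The sketch below encodes only the limits listed above; the audio_options helper is hypothetical, and the server remains the authority on what it accepts.

```python
BITRATE_RANGE_KBPS = {"mp3": (32, 320), "ogg_opus": (6, 320)}
SAMPLE_RATES_HZ = {8000, 12000, 16000, 22050, 24000, 44100, 48000}
OPUS_EXCLUDED_HZ = {12000, 22050, 44100}


def audio_options(output_format="wav", sample_rate_hertz=None, output_bitrate_kbps=None):
    """Return the output-related request fields, or raise ValueError."""
    if output_format != "wav" and output_format not in BITRATE_RANGE_KBPS:
        raise ValueError(f"unsupported format: {output_format!r}")
    options = {"output_format": output_format}
    if output_bitrate_kbps is not None:
        if output_format == "wav":
            raise ValueError("wav is lossless and takes no bitrate")
        low, high = BITRATE_RANGE_KBPS[output_format]
        if not low <= output_bitrate_kbps <= high:
            raise ValueError(f"{output_format} bitrate must be {low}-{high} kbps")
        options["output_bitrate_kbps"] = output_bitrate_kbps
    if sample_rate_hertz is not None:
        allowed = SAMPLE_RATES_HZ
        if output_format == "ogg_opus":
            allowed = SAMPLE_RATES_HZ - OPUS_EXCLUDED_HZ
        if sample_rate_hertz not in allowed:
            raise ValueError(f"{sample_rate_hertz} Hz is not available for {output_format}")
        options["sample_rate_hertz"] = sample_rate_hertz
    return options
```

The result can be merged into the request body, for example {"text": text, "voice_id": voice_id, **audio_options("mp3", 24000, 128)}.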

Safe Retries With Idempotency Keys

Network failures happen. If your request times out and you're not sure whether it was processed, retrying could create a duplicate job and charge you twice. Idempotency keys prevent this.

Add an Idempotency-Key header with a unique value (UUID v4 recommended). If you retry with the same key and the same request body, you get the same response — no duplicate job, no double charge.

curl -X POST https://aitts.theproductivepixel.com/api/v1/tts \
  -H "Authorization: Bearer tts_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: 7c9e6679-7425-40de-944b-e07fc1f90ae7" \
  -d '{
    "text": "Safe to retry this request.",
    "voice_id": "kokoro:en-US-Kokoro-Bella"
  }'

If the original request with that key is still being processed, you'll get a 409 REQUEST_IN_PROGRESS. If the key was already used with a different request body, you'll get a 409 IDEMPOTENCY_KEY_REUSE.
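
The key only helps if every retry of the same logical request reuses it. A minimal sketch of that pattern, assuming a hypothetical send(payload, headers) callable that performs the POST and raises the built-in ConnectionError or TimeoutError on network failure (a wrapper around an HTTP library would need to translate that library's own exceptions):

```python
import time
import uuid


def submit_with_retries(send, payload, attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Retry a submission on network errors, reusing one idempotency key.

    send is a hypothetical callable: send(payload, headers) -> response.
    """
    # Generate the key once, outside the loop, so all attempts share it.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(attempts):
        try:
            return send(payload, headers)
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Generating a new UUID inside the loop would defeat the purpose, since each attempt would then look like a new request.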

Error Handling

All errors follow a consistent shape:

{
  "success": false,
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "message": "Need 5.00 credits, have 2.50"
  }
}

Common errors you should handle:

401 UNAUTHORIZED — invalid or missing API key.
400 VALIDATION_ERROR — malformed request. Check the message for details.
400 INVALID_VOICE — voice ID not found or not available.
402 INSUFFICIENT_CREDITS — not enough balance. The message tells you how much you need.
429 RATE_LIMIT_EXCEEDED — too many requests. Check Retry-After header.
500 GENERATION_FAILED — audio generation failed on the provider side.
503 MAINTENANCE — API temporarily down. Retry after 300 seconds.

Request validation runs before voice/provider checks. A malformed payload gets VALIDATION_ERROR before INVALID_VOICE.
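
One way to act on this list is to separate errors worth retrying from errors that need a change on your side. The sketch below assumes Retry-After carries a number of seconds (HTTP also allows a date there) and treats every other status as non-retryable, including 500 GENERATION_FAILED, which is a conservative guess rather than documented behavior. The retry_delay_seconds helper and its 60-second fallback are illustrative.

```python
def retry_delay_seconds(status_code, headers):
    """Return how long to wait before retrying, or None if a retry is unlikely to help."""
    if status_code == 429:
        # Assumption: Retry-After is a number of seconds, not an HTTP date.
        try:
            return float(headers.get("Retry-After", ""))
        except ValueError:
            return 60.0  # arbitrary fallback chosen for this sketch
    if status_code == 503:
        return 300.0  # maintenance: retry after 300 seconds
    # 400, 401, 402 and others need a fix on the caller's side first.
    return None
```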

Rate Limits

Rate limits are per-account and depend on your plan:

Plan	Requests	Window
Pay-as-you-go	10	15 minutes
Pro	100	15 minutes
Enterprise	1,000	15 minutes

Every response includes rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and X-RateLimit-Bucket.
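
A quick calculation shows what these limits mean for a steady stream of requests. This is plain arithmetic on the table above; whether status polls count against the same bucket is not stated here, so treat the result as a rough pacing guide. The plan keys and the helper name are illustrative.

```python
WINDOW_SECONDS = 15 * 60
PLAN_LIMITS = {"pay-as-you-go": 10, "pro": 100, "enterprise": 1000}


def min_interval_seconds(plan):
    """Average spacing between requests that stays within the plan's window."""
    return WINDOW_SECONDS / PLAN_LIMITS[plan]
```

That works out to one request every 90 seconds on pay-as-you-go, every 9 seconds on Pro, and every 0.9 seconds on Enterprise.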

Webhooks (Enterprise)

Instead of polling, Enterprise accounts can receive webhook notifications when jobs complete or fail. Configure your webhook endpoint once, and every job then triggers a callback:

# Configure your webhook URL
curl -X PUT https://aitts.theproductivepixel.com/api/user/enterprise/webhook \
  -H "Authorization: Bearer FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-server.com/tts-webhook"}'

When a job completes, your endpoint receives:

{
  "event": "job.completed",
  "created_at": "2026-01-07T12:00:00.000Z",
  "data": {
    "job_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "audio_url": "https://storage.googleapis.com/...",
    "audio_endpoint": "/api/v1/tts/550e8400-e29b-41d4-a716-446655440000/audio",
    "chars_charged": 150
  }
}

Webhooks are signed with HMAC-SHA256 via the X-TTS-Signature header. Always verify the signature before processing. Deliveries retry with exponential backoff up to 8 attempts over approximately 38 hours.

See the webhook documentation for full setup, signature verification examples, and retry policy details.
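
As a rough illustration of the verification step, the sketch below recomputes an HMAC-SHA256 over the raw request body and compares it in constant time. The exact scheme is an assumption: it presumes the X-TTS-Signature header holds a bare hex digest of the body, signed with a shared secret. The webhook documentation defines the real encoding and any prefix or timestamp that takes part in the signature.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, raw_body: bytes, signature_header: str) -> bool:
    """Check a webhook signature, assuming a hex HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.strip().encode())
```

Verify against the bytes exactly as received; parsing and re-serializing the JSON first can change whitespace and break the comparison.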

Durable Audio Retrieval

The audio_url in the status response is a temporary signed URL that expires after 24 hours. For production integrations, use the audio_endpoint instead, a stable API path that returns a fresh signed URL via HTTP redirect:

curl -L https://aitts.theproductivepixel.com/api/v1/tts/YOUR_JOB_ID/audio \
  -H "Authorization: Bearer tts_YOUR_KEY" \
  -o output.wav

This endpoint returns 307 with a Location header pointing to a fresh signed URL. Use -L in curl to follow the redirect. The endpoint works as long as the audio is retained — no expiry to worry about.

Streaming Delivery

The async flow above works well for batch processing and background generation. For use cases where you want audio to start playing as it's generated — interactive previews, chatbot responses, live demos — the API also supports streaming delivery.

Streaming uses a two-step flow: POST to create the job and get a one-time stream_url, then GET that URL for chunked audio. The same voices, billing, and quality apply — streaming is a delivery mode, not a different product.

# Step 1: Create streaming job
curl -X POST https://aitts.theproductivepixel.com/api/v1/tts/stream \
  -H "Authorization: Bearer tts_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming lets you hear audio as it is generated.",
    "voice_id": "google:en-US-Chirp3HD-Charon"
  }'

# Step 2: Open the stream URL from the response
curl -N "STREAM_URL_FROM_RESPONSE" --output audio.ogg

After streaming completes, the durable audio is available at audio_endpoint just like an async job. For the full streaming guide — format selection, transport details, and when to use streaming vs async — see our streaming delivery guide.
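
On the client, the second step amounts to reading the response in chunks and handing each chunk to a file or player as it arrives. A minimal sketch using only file-like objects, so it works with any HTTP client that exposes a readable stream; save_stream and the chunk size are illustrative choices:

```python
def save_stream(source, destination, chunk_size=8192):
    """Copy a binary stream to destination chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Possible usage with the standard library, where stream_url is the one-time
# URL taken from the POST response:
#
#   from urllib.request import urlopen
#   with urlopen(stream_url) as resp, open("audio.ogg", "wb") as out:
#       save_stream(resp, out)
```

A player would replace the file write with a call that feeds each chunk to its decoder.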

Putting It All Together

Here's a complete Python example that handles the full flow (submit, poll, download) with basic error handling and an idempotency key attached to the submission:

import requests
import time
import uuid

API_KEY = "tts_YOUR_KEY"
BASE = "https://aitts.theproductivepixel.com/api/v1"

def generate_and_download(text, voice_id, output_path="output.wav"):
    # Submit with idempotency key
    idem_key = str(uuid.uuid4())
    resp = requests.post(
        f"{BASE}/tts",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Idempotency-Key": idem_key
        },
        json={"text": text, "voice_id": voice_id}
    )

    if not resp.ok:
        error = resp.json().get("error", {})
        raise Exception(f"{error.get('code')}: {error.get('message')}")

    job_id = resp.json()["data"]["job_id"]
    print(f"Job submitted: {job_id}")

    # Poll for completion
    for _ in range(60):
        status_resp = requests.get(
            f"{BASE}/tts/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"}
        )
        status = status_resp.json()["data"]

        if status["status"] == "completed":
            # Download via durable audio_endpoint
            audio = requests.get(
                f"{BASE}/tts/{job_id}/audio",
                headers={"Authorization": f"Bearer {API_KEY}"},
                allow_redirects=True)
            audio.raise_for_status()  # don't write an error body to disk
            with open(output_path, "wb") as f:
                f.write(audio.content)
            print(f"Saved to {output_path}")
            return

        if status["status"] == "failed":
            raise Exception(f"Generation failed: {status.get('error')}")

        time.sleep(1)

    raise TimeoutError("Job did not complete after 60 polling attempts")

# Usage
generate_and_download(
    text="This is a complete example with error handling.",
    voice_id="polly:en-GB-Generative-Brian",
    output_path="demo.wav"
)

Next Steps

Browse the voice gallery to find voices for your project.
Read the full API reference for all endpoints and parameters.
Check provider capabilities to understand what each provider and tier supports.
Read the streaming delivery guide for real-time audio playback via the API.
Download the OpenAPI spec for code generation in any language.

Get started: Create an API key from the Dashboard and make your first request in under two minutes.