POST /v1/audio/voice-design
Voice Design (MiniMax)
curl --request POST \
  --url https://kymaapi.com/v1/audio/voice-design \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "description": "<string>",
  "model": "<string>",
  "name": "<string>",
  "preview_text": "<string>",
  "gender": "<string>",
  "age_group": "<string>"
}
'

Synchronous endpoint. Describe a voice in plain English, get back a voice_id you can immediately use in /v1/audio/speech on any MiniMax voice model. Use this when you don’t have voice talent, you’re prototyping a fictional character, or you want a brand-safe persona voice from scratch.
curl -X POST https://kymaapi.com/v1/audio/voice-design \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Warm female narrator with a slight British accent, mid-30s, calm cadence",
    "gender": "female",
    "age_group": "young"
  }'

Request

The body is application/json.

description (string, required)
Natural-language voice description. Max 1000 characters. The key text is accepted as an alias for description.

model (string, default: "minimax-voice-design")
Voice design SKU. Currently only minimax-voice-design is supported.

name (string)
Optional human-readable label. Max 64 characters.

preview_text (string)
Optional sample text that MiniMax renders in the new voice for an internal preview. Max 500 characters. The preview audio is not returned in the response; call /v1/audio/speech afterward to render audio with the new voice.

gender (string)
Optional hint: male or female.

age_group (string)
Optional hint: child, young, middle-aged, or elderly.
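
The limits above can be checked on the client before spending a request. The following is a minimal sketch, not an official SDK: the function name build_voice_design_body is hypothetical, Python's len counts code points (the server may count characters differently), and rejecting unlisted gender or age_group values is an assumption, since the docs describe them only as hints.

```python
# Hypothetical client-side helper; limits are taken from the field list above.
ALLOWED_GENDERS = {"male", "female"}
ALLOWED_AGE_GROUPS = {"child", "young", "middle-aged", "elderly"}


def build_voice_design_body(description, name=None, preview_text=None,
                            gender=None, age_group=None,
                            model="minimax-voice-design"):
    """Return a JSON-ready dict, raising ValueError on a documented limit."""
    if not description:
        raise ValueError("description is required")
    if len(description) > 1000:
        raise ValueError("description exceeds 1000 characters")
    if name is not None and len(name) > 64:
        raise ValueError("name exceeds 64 characters")
    if preview_text is not None and len(preview_text) > 500:
        raise ValueError("preview_text exceeds 500 characters")
    # Assumption: treat values outside the documented hints as client errors.
    if gender is not None and gender not in ALLOWED_GENDERS:
        raise ValueError("gender must be male or female")
    if age_group is not None and age_group not in ALLOWED_AGE_GROUPS:
        raise ValueError("age_group must be child, young, middle-aged, or elderly")

    body = {"description": description, "model": model}
    optional = {"name": name, "preview_text": preview_text,
                "gender": gender, "age_group": age_group}
    # Leave unset optional fields out of the payload entirely.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Passing the result to json.dumps yields a body in the same shape as the curl example.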

Response

Returns 200 OK with a JSON body in the same shape as /v1/audio/voice-clone.
{
  "voice_id": "kyma_a91f4d2e7c8b5301",
  "name": null,
  "model": "minimax-voice-design",
  "cost_usd": 4.20,
  "balance_usd": 45.80
}
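
As an illustration of how a client might send the request and read this response, here is a minimal sketch using only the Python standard library. The names DesignedVoice, make_request, and parse_designed_voice are hypothetical; the URL, headers, and field names are taken from the examples on this page.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

ENDPOINT = "https://kymaapi.com/v1/audio/voice-design"


@dataclass
class DesignedVoice:
    voice_id: str
    name: Optional[str]
    model: str
    cost_usd: float
    balance_usd: float


def make_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_designed_voice(raw: bytes) -> DesignedVoice:
    """Decode a 200 OK response body into a DesignedVoice."""
    data = json.loads(raw)
    return DesignedVoice(
        voice_id=data["voice_id"],
        name=data.get("name"),  # null in the sample response
        model=data["model"],
        cost_usd=data["cost_usd"],
        balance_usd=data["balance_usd"],
    )
```

To send the request, pass make_request(api_key, body) to urllib.request.urlopen and hand the bytes from resp.read() to parse_designed_voice. Non-2xx statuses raise urllib.error.HTTPError, whose read() method returns the error body.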

Pricing

Flat $4.20 per designed voice. This is a one-time charge: once a voice is designed, its voice_id can be reused in unlimited TTS calls. Voice design costs roughly 2× a voice clone because synthesizing timbre from text is strictly more compute-intensive than reproducing a captured voice.
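
The flat price makes it easy to check a budget before calling the endpoint. This is a small sketch with a hypothetical helper, balance_after_design; it uses Decimal to avoid floating-point drift on currency. The $50.00 starting balance in the usage note is an assumption inferred from the sample response, which does not state it.

```python
from decimal import Decimal

VOICE_DESIGN_PRICE = Decimal("4.20")  # flat per-voice price stated above


def balance_after_design(balance_usd: Decimal, voices: int = 1) -> Decimal:
    """Return the balance left after designing `voices` voices.

    Raises ValueError when the balance cannot cover the charge, which is
    the situation the API reports as 402 insufficient_credits.
    """
    total = VOICE_DESIGN_PRICE * voices
    if balance_usd < total:
        raise ValueError(f"balance {balance_usd} is below required {total}")
    return balance_usd - total
```

For example, balance_after_design(Decimal("50.00")) returns Decimal("45.80"), which matches the balance_usd shown in the sample response if the account started at $50.00.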

Ownership

Same gating as voice clone: a designed voice ID is owned by the user who requested it. A request that uses the voice_id from another account is rejected with 403 voice_not_owned.

Errors

| Status | error.code | When |
| --- | --- | --- |
| 400 | not_a_voice_design_model | model is not a design SKU |
| 400 | description_too_long | description > 1000 chars |
| 400 | invalid_request | missing description |
| 402 | insufficient_credits | balance below $4.20 |
| 500 | ownership_write_failed | design succeeded but ownership row insert failed |
| 502 | provider_error | upstream MiniMax failure |
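
One way a client might act on these codes is sketched below. Two things here are assumptions rather than documented behavior: the error body is taken to be shaped like {"error": {"code": "..."}} (inferred from the error.code column name), and the suggested reactions are illustrative. In particular, ownership_write_failed is routed to support instead of retried because this page does not say whether the charge applies or whether a retry is safe.

```python
import json
from typing import Optional

# Suggested client reaction per error.code. The codes come from the table
# above; the reactions are assumptions of this sketch.
ACTIONS = {
    "not_a_voice_design_model": "fix_request",
    "description_too_long": "fix_request",
    "invalid_request": "fix_request",
    "insufficient_credits": "top_up",
    "ownership_write_failed": "contact_support",
    "provider_error": "retry",
}


def extract_error_code(raw_body: bytes) -> Optional[str]:
    """Read error.code, assuming a body like {"error": {"code": "..."}}."""
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    error = data.get("error") if isinstance(data, dict) else None
    return error.get("code") if isinstance(error, dict) else None


def suggest_action(raw_body: bytes) -> str:
    """Map an error response body to a suggested action, or "unknown"."""
    return ACTIONS.get(extract_error_code(raw_body), "unknown")
```

A caller would typically retry with backoff only when suggest_action returns "retry" and surface every other result to the user or an operator.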

See also