Overview
minimax-voice-clone is a voice-cloning service, not a direct TTS model. Upload reference audio of someone speaking (10 seconds to 5 minutes) and get back a voice_id. That ID then becomes a usable voice on /v1/audio/speech with any MiniMax voice model.
It suits brand voice cloning when you have voice talent, character voices for games and animation, and custom narrator profiles for podcasts.
Specs
| Field | Value |
|---|---|
| Model ID | minimax-voice-clone |
| Creator | MiniMax |
| Best for | Brand voice cloning, character voices, custom narrators |
| Reference audio | 10 sec – 5 min, MP3/WAV/M4A, max 10 MB |
| Output | voice_id (kyma-namespaced) |
| Pricing mode | Per call (flat) |
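Because each clone call bills a flat fee, it can help to check a recording against the limits in the table before uploading. The following is a minimal sketch, not part of the Kyma API: `check_reference_audio` is a hypothetical local helper, the caller supplies the duration, and "10 MB" is read as decimal megabytes (the stricter interpretation, an assumption).

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # MP3/WAV/M4A from the Specs table
MIN_SECONDS = 10
MAX_SECONDS = 5 * 60
MAX_BYTES = 10 * 1000 * 1000  # assumption: "10 MB" read as decimal megabytes


def check_reference_audio(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file fits the documented limits."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 10 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration outside 10 s to 5 min")
    return problems
```

An empty result only means the file matches the limits listed here; the service may apply further checks that this page does not describe.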
Pricing
| Item | Cost |
|---|---|
| Per cloned voice | $2.10 flat (one-time) |
Once created, the voice_id is free to use on /v1/audio/speech; only the speech calls themselves bill at MiniMax’s per-character rate.
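The arithmetic can be sketched as a one-time clone fee plus per-character speech charges. This page does not state the per-character rate, so the hypothetical helper below takes it as a required argument rather than assuming a value.

```python
from decimal import Decimal

CLONE_FEE = Decimal("2.10")  # flat one-time fee per cloned voice, from the pricing table


def estimate_total_cost(cloned_voices: int, speech_characters: int,
                        rate_per_character: Decimal) -> Decimal:
    """One-time cloning fees plus speech charges at a caller-supplied per-character rate."""
    clone_cost = CLONE_FEE * cloned_voices
    speech_cost = rate_per_character * speech_characters
    return clone_cost + speech_cost
```

`Decimal` is used so that currency amounts are not distorted by binary floating-point rounding.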
Use this when
- You have voice talent and want their voice usable across all your TTS workflows.
- You’re building a character cast for a game or animation.
- You want a single brand voice spanning multiple languages and use cases.
Pick something else when
- You don’t have a reference recording → minimax-voice-design ($4.20/voice, generated from a text description).
- You’re fine with stock MiniMax voices → skip cloning, browse system voices via GET /v1/audio/voices, and use them directly.
Example
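The two-step flow (clone once, then reuse the voice_id for speech) might look like the sketch below. It only builds the requests and does not send them. Several details are assumptions not confirmed on this page: the placeholder `BASE_URL`, Bearer authentication, the multipart field names `model` and `file`, and the JSON fields `model`, `voice`, and `input` on the speech call. Check the endpoint reference for the actual request shape.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.example.invalid"  # placeholder: substitute the real Kyma API base URL


def build_clone_request(api_key: str, audio_bytes: bytes, filename: str) -> urllib.request.Request:
    """Build (but do not send) a multipart upload to POST /v1/audio/voice-clone.

    Assumed, not confirmed: form fields "model" and "file", Bearer auth.
    """
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'minimax-voice-clone\r\n'.encode(),
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        f'\r\n--{boundary}--\r\n'.encode(),
    ]
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/voice-clone",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def build_speech_request(api_key: str, voice_id: str, text: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a JSON request to POST /v1/audio/speech with a cloned voice.

    Assumed, not confirmed: JSON fields "model", "voice", and "input".
    """
    body = json.dumps({"model": model, "voice": voice_id, "input": text}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it. The `model` argument of `build_speech_request` is left to the caller because this page does not name the MiniMax voice models.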
Ownership
Cloned voice IDs are gated per Kyma user. If user A passes user B’s voice_id to /v1/audio/speech, the request returns 403 voice_not_owned.
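A client may want to surface this case separately from other failures. The sketch below assumes only that the string voice_not_owned appears somewhere in the body of the 403 response; the exact error payload is not documented here, and `VoiceNotOwnedError` and `check_speech_response` are hypothetical names.

```python
class VoiceNotOwnedError(PermissionError):
    """Raised when a speech request uses a voice_id that belongs to another Kyma user."""


def check_speech_response(status: int, body: str) -> None:
    """Raise a specific error for the ownership case, a generic one for other HTTP errors."""
    # Assumption: the code "voice_not_owned" appears somewhere in the 403 response body.
    if status == 403 and "voice_not_owned" in body:
        raise VoiceNotOwnedError("voice_id is owned by a different Kyma user")
    if status >= 400:
        raise RuntimeError(f"speech request failed with HTTP {status}")
```

Catching `VoiceNotOwnedError` lets an application prompt for a voice the current user owns instead of retrying the same request.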
See also
- POST /v1/audio/voice-clone: endpoint reference
- POST /v1/audio/voice-design: generate a voice from a text description instead