An overview of all the endpoint parameters.
Text
This is the text to be synthesized to audio.
VoiceId
- Dan: Young Male
- Will: Mature Male
- Scarlett: Young Female
- Liv: Young Female
- Amy: Mature Female
Bitrate
Defaults to 192k.
- Use lower values for low bandwidth or to reduce the transferred file size.
- Use higher values for higher fidelity.
Speed
Defaults to 0. Examples:
- 0.5: makes the audio 50% faster (e.g. 60-second audio becomes 42 seconds)
- -0.5: makes the audio 50% slower (e.g. 60-second audio becomes 90 seconds)
Pitch
Defaults to 1. However, on the landing page, we default male voices to 0.92, as people tend to prefer lower/deeper male voices.
Codec
Defaults to libmp3lame (MP3).
- Use pcm_mulaw for phone calls.
- pcm_s16le returns 22050 Hz raw audio.
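Because pcm_s16le output is raw samples with no header, most players need it wrapped in a container first. The following is a minimal sketch that wraps the bytes in a WAV file using Python's standard wave module. The 22050 Hz rate comes from the description above; the mono channel count and the helper name pcm_s16le_to_wav are assumptions made for illustration.

```python
import io
import wave

SAMPLE_RATE_HZ = 22050   # stated above for pcm_s16le
SAMPLE_WIDTH_BYTES = 2   # s16le means signed 16-bit little-endian samples
CHANNELS = 1             # assumption: the audio is mono


def pcm_s16le_to_wav(pcm: bytes) -> bytes:
    """Wrap raw pcm_s16le samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a .wav file or handed to any audio library that reads WAV.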
Temperature
Defaults to 0.25.
- Lower values make the audio more deterministic and more stable.
- Higher values make the audio more expressive and less deterministic.
- With a high Temperature value, the audio will be different every time. However, it also increases the probability of mispronunciation.
TimestampType
By default, the endpoint returns per-sentence timestamps. Use word to get per-word timestamps.
The timestamp feature is currently not supported via the /stream endpoint.
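To show how the parameters above fit together, here is a hedged sketch that assembles a request body with the documented defaults filled in. It assumes the parameters are sent as JSON keys spelled exactly as the headings on this page; the helper name build_payload is hypothetical, and no value ranges are checked because none are stated here.

```python
import json

# Voices listed under VoiceId.
VOICE_IDS = {"Dan", "Will", "Scarlett", "Liv", "Amy"}

# Defaults as documented on this page.
DEFAULTS = {
    "Bitrate": "192k",
    "Speed": 0,
    "Pitch": 1,
    "Codec": "libmp3lame",
    "Temperature": 0.25,
}

# TimestampType has no documented default value name, so it is only
# included when the caller passes it explicitly (e.g. "word").
OPTIONAL_KEYS = {"TimestampType"}


def build_payload(text, voice_id, **overrides):
    """Return a request body dict (hypothetical helper, not an official client)."""
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unknown VoiceId: {voice_id!r}")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"Text": text, "VoiceId": voice_id, **DEFAULTS, **overrides}


if __name__ == "__main__":
    body = build_payload("Hello there.", "Scarlett", Speed=-0.5, TimestampType="word")
    print(json.dumps(body, indent=2))
```

Keeping the defaults in one dictionary makes it easy to see which values a request actually overrides.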
CallbackUrl
If provided, the server will POST a JSON body to the CallbackUrl. A sample body looks like this:
{
  "TaskId": "8282b92d",
  "TaskStatus": "completed"
}
TaskStatus is either "completed" or "failed".
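On the receiving side, the callback body can be parsed with any JSON library. The sketch below assumes the body contains only the two fields shown in the sample, TaskId and TaskStatus; the function name parse_callback is hypothetical, and how the body reaches it (web framework, serverless handler) is left out.

```python
import json
from typing import Tuple


def parse_callback(body: bytes) -> Tuple[str, bool]:
    """Parse a callback body and return (task_id, succeeded)."""
    data = json.loads(body)
    task_id = data["TaskId"]
    status = data["TaskStatus"]
    # Only the two statuses shown in the sample are accepted here.
    if status not in ("completed", "failed"):
        raise ValueError(f"unexpected TaskStatus: {status!r}")
    return task_id, status == "completed"
```

Rejecting unknown statuses early keeps an unexpected value from being silently treated as a failure.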