text-to-speech
Fish Audio S2 Pro Text to Speech
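The Fish Audio S2 Pro text-to-speech model converts text into natural-sounding speech, with support for reference voices, sampling control, chunking, audio formats, and prosody control.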
MiniMax Sound Design
Generate a personalized voice based on a text description. Returns a `voice_id` that can be used with the T2A text-to-speech API, along with a hex-encoded preview audio sample.
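Because the preview is returned as a hex string rather than raw bytes, it has to be decoded before it can be played or saved. The following is a minimal sketch of that step, not the official client: the response is assumed to be an already-parsed dictionary, and the field name `preview_audio_hex` and the helper `save_preview_audio` are hypothetical (only `voice_id` is named in the description above). The audio container format is not stated, so the caller chooses the output file name.

```python
from pathlib import Path


def save_preview_audio(response: dict, out_path: str) -> str:
    """Decode the hex-encoded preview audio, write it to out_path,
    and return the voice_id for later text-to-speech calls."""
    voice_id = response["voice_id"]
    # Hypothetical field name; check the real response schema.
    hex_audio = response["preview_audio_hex"]
    # bytes.fromhex raises ValueError if the string is not valid hex.
    audio_bytes = bytes.fromhex(hex_audio)
    Path(out_path).write_bytes(audio_bytes)
    return voice_id
```

The returned `voice_id` would then be passed as the voice selector in a later T2A request.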
Gemini 2.5 Flash Text-to-Speech
The Google Gemini series emphasizes multimodal understanding and instruction following, balancing speed and cost to make it suitable for production-level use. Gemini 2.5 Flash prioritizes low latency and cost-effectiveness, making it ideal for real-time scenarios. Text-to-speech supports multiple languages and emotional tone control, and can be used for voiceovers, announcements, customer service, and character dialogue. The Instant Inference API offers stable performance, no waiting time, and affordable pricing.
MiniMax Speech 2.8 Turbo Async Text-to-Speech
The MiniMax series offers reliable speech synthesis designed for production use, prioritizing stability and predictable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
MiniMax Speech 2.8 HD Asynchronous Text-to-Speech
The MiniMax series offers reliable speech synthesis designed for production use, prioritizing stability and predictable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
MiniMax Speech 2.8 Turbo Sync Text-to-Speech
The MiniMax series offers reliable speech synthesis designed for production use, prioritizing stability and predictable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
MiniMax Speech 2.8 HD Sync Text-to-Speech
The MiniMax series offers reliable speech synthesis designed for production use, prioritizing stability and predictable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
GLM Text-to-Speech
The GLM series offers reliable speech synthesis designed for production use, prioritizing stability and predictable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs Flash v2.5 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs Flash v2 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs Multilingual v2 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs Turbo v2.5 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs Turbo v2 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.
ElevenLabs v3 Text-to-Speech
The ElevenLabs series offers reliable speech generation designed for production use, prioritizing stability and controllable output. It supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time and is affordably priced.