ChatTTS

A generative speech model for daily dialogue.

Stars: 39.0k
Stars/month: +53
Releases (6m): 0

Star Growth

+8 (0.0%), Mar 27 to Apr 1

Overview

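ChatTTS is a generative speech model designed for dialogue scenarios, optimized in particular for conversational applications such as LLM assistants. The model is trained on more than 100,000 hours of Chinese and English audio and can generate natural, fluent conversational speech. Its core strength is task-specific optimization for dialogue: it supports multiple speakers and can predict and control fine-grained prosodic features, including laughter, pauses, and interjections. In prosody, the model outperforms most open-source TTS models, giving dialogue systems more realistic and natural speech. The current open-source release is a model pretrained on 40,000 hours; it supports streaming audio generation and zero-shot inference and is intended for academic research and development experiments.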

Deep Analysis

Key Differentiator

Purpose-built for dialogue TTS, with fine-grained control over prosody (laughter, pauses, interjections) that most TTS models lack. Trained on 100K+ hours of audio (the open-source release is a 40K-hour pretrained model), with multi-speaker and streaming support. Audio quality in the released model is deliberately degraded as a safety measure.

Capabilities

  • Text-to-speech optimized for dialogue scenarios
  • Multi-speaker support for interactive conversations
  • Fine-grained prosodic control (laughter, pauses, interjections)
  • Streaming audio generation
  • Speaker embedding sampling for voice variety
  • Word-level and sentence-level manual control
  • Chinese and English language support
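
The word-level and sentence-level control listed above can be illustrated with a small text-preparation sketch. It assumes the control-token syntax as recalled from the project's README, which this page does not itself show: sentence-level prompts of the form `[oral_N][laugh_N][break_N]` and word-level tokens such as `[uv_break]` and `[laugh]`. The helper names `sentence_prompt` and `with_word_tokens` are hypothetical, and the value ranges are an assumption that should be checked against the current README.

```python
# Hypothetical helpers for building ChatTTS-style control strings.
# Token names and ranges are assumptions, not confirmed by this page.

def sentence_prompt(oral=2, laugh=0, pause=6):
    """Build a sentence-level prompt such as '[oral_2][laugh_0][break_6]'."""
    # Assumed ranges: oral 0-9, laugh 0-2, break 0-7.
    limits = (("oral", oral, 9), ("laugh", laugh, 2), ("break", pause, 7))
    for name, value, upper in limits:
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")
    return f"[oral_{oral}][laugh_{laugh}][break_{pause}]"


def with_word_tokens(words, tokens_after):
    """Insert a word-level token after each word index listed in tokens_after."""
    out = []
    for index, word in enumerate(words):
        out.append(word)
        if index in tokens_after:
            out.append(f"[{tokens_after[index]}]")
    return " ".join(out)


prompt = sentence_prompt()
text = with_word_tokens(
    ["What", "is", "your", "favorite", "food?"],
    {1: "uv_break", 4: "laugh"},
)
print(prompt)
print(text)
```

The resulting strings would then be passed to the model as the refinement prompt and the input text respectively; how they are passed depends on the installed ChatTTS version.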

Integrations

PyTorch, Hugging Face, torchaudio, Google Colab

Best For

  • Research on conversational TTS with prosodic control
  • Building dialogue-oriented voice interfaces (non-commercial)
  • Chinese language TTS applications

Not Ideal For

  • Commercial TTS applications (license restriction)
  • High-fidelity audio production requiring studio quality

Languages

Python

Deployment

Local GPU (4GB+ VRAM), Google Colab, Self-hosted

Known Limitations

  • Model weights are licensed CC BY-NC 4.0 (non-commercial) and cannot be used in commercial products
  • English support is still experimental
  • Requires a GPU with at least 4GB VRAM
  • Audio quality is intentionally degraded (MP3 compression + high-frequency noise) as a safety measure

Pros

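  • + Optimized for dialogue scenarios, with multi-speaker support and natural conversational flow
  • + Fine-grained prosody control that can generate laughter, pauses, and other dialogue elements
  • + Prosody quality that surpasses most open-source TTS models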

Cons

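  • - The open-source release is for academic use only; commercial use is restricted
  • - Currently supports only two languages, Chinese and English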

Use Cases

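  • Voice interaction for LLM assistants and chatbots
  • Multi-role dialogue systems and virtual assistant applications
  • Speech synthesis research and dialogue system development experiments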

Getting Started

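Install the ChatTTS package with pip, download the pretrained model files from Hugging Face, then load the model through the Python API and pass in text to generate speech.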
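
The steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions: that the package installs with `pip install ChatTTS`, that the installed version exposes `ChatTTS.Chat()`, `chat.load()`, and `chat.infer()` (earlier releases used different method names), that `load()` fetches the weights when they are not cached locally, and that output is 24 kHz mono. The helpers `float_to_pcm16`, `save_wav`, `synthesize`, and `main` are illustrative names, not part of the library.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: ChatTTS output sample rate


def float_to_pcm16(samples):
    """Convert floats in [-1, 1] to little-endian 16-bit PCM, clipping overflow."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    return b"".join(struct.pack("<h", int(round(c * 32767))) for c in clipped)


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit WAV file using only the standard library."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))


def synthesize(texts):
    """Load the model and return one waveform per input text."""
    import ChatTTS  # imported lazily so the helpers above work without it

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # assumed to download weights if not cached
    return chat.infer(texts)


def main():
    import numpy as np  # installed as a ChatTTS dependency

    waveforms = synthesize(["Hello, this is a short test sentence."])
    for index, waveform in enumerate(waveforms):
        # Flatten in case the waveform has a leading channel dimension.
        samples = np.asarray(waveform, dtype=float).ravel()
        save_wav(f"output_{index}.wav", samples)
```

Calling `main()` on a machine with a suitable GPU should write one WAV file per input sentence. The project's own examples save audio with torchaudio instead; the standard-library writer is used here only to keep the sketch free of extra dependencies.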
