MeloTTS-English-MLX

MLX port of myshell-ai/MeloTTS-English for Apple Silicon.

Architecture

VITS2-based end-to-end TTS with:

Transformer text encoder with relative position attention
Stochastic + deterministic duration predictors
TransformerCouplingLayer normalizing flow (4 layers)
HiFi-GAN vocoder
BERT prosodic features (bert-base-uncased)
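
The normalizing-flow item above can be illustrated with a minimal sketch of additive coupling, the general idea behind coupling-layer flows. This is not the model's actual TransformerCouplingLayer: `toy_shift` is a hypothetical stand-in for the transformer that predicts the shift, and the plain-list representation is an assumption made to keep the example dependency-free. It shows why such a stack is exactly invertible: each layer changes only one half of the channels, using a shift computed from the other half.

```python
import math


def toy_shift(x0, layer):
    # Hypothetical stand-in for the network that predicts the shift
    # from the untouched half; the real model uses a transformer here.
    return [math.tanh(v + layer) for v in x0]


def coupling_forward(x, n_layers=4):
    """Apply n_layers additive coupling steps, flipping channels after each."""
    half = len(x) // 2  # assumes an even number of channels
    for layer in range(n_layers):
        x0, x1 = x[:half], x[half:]
        shift = toy_shift(x0, layer)
        x1 = [a + s for a, s in zip(x1, shift)]  # only the second half changes
        x = (x0 + x1)[::-1]  # flip so the other half is transformed next
    return x


def coupling_inverse(y, n_layers=4):
    """Undo coupling_forward by reversing the steps in the opposite order."""
    half = len(y) // 2
    for layer in reversed(range(n_layers)):
        y = y[::-1]  # undo the flip
        y0, y1 = y[:half], y[half:]
        shift = toy_shift(y0, layer)  # y0 is unchanged, so the shift is recoverable
        y1 = [a - s for a, s in zip(y1, shift)]
        y = y0 + y1
    return y
```

Calling `coupling_inverse(coupling_forward(x))` on an even-length list of floats should return the original values up to floating-point rounding, whatever function is used to compute the shift.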

Sample rate: 44,100 Hz
Speakers: EN-US, EN-BR, EN_INDIA, EN-AU, EN-Default

Usage

from mlx_audio.tts.models.melotts import Model, ModelConfig
from mlx_audio.tts.models.melotts.text import process_text
import mlx.core as mx
import json, soundfile as sf, numpy as np
from safetensors.numpy import load_file
from huggingface_hub import snapshot_download
from pathlib import Path

# Download model
model_path = Path(snapshot_download("mlx-community/MeloTTS-English-MLX"))

# Load
with open(model_path / "config.json") as f:
    config_data = json.load(f)
model = Model(ModelConfig.from_dict(config_data))
weights = load_file(str(model_path / "model.safetensors"))
model.load_weights([(k, mx.array(v)) for k, v in weights.items()])
Model.post_load_hook(model, model_path)

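# Generate. generate() may yield several segments, so concatenate them
# instead of overwriting output.wav on every iteration.
segments, sample_rate = [], None
for result in model.generate("Hello, this is MeloTTS running on MLX!"):
    segments.append(np.array(result.audio))
    sample_rate = result.sample_rate
sf.write("output.wav", np.concatenate(segments), sample_rate)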
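
As a quick sanity check on the generated file, the sketch below reads the WAV header with Python's standard `wave` module and reports the sample rate, channel count, and duration. It assumes the file was written as integer PCM, which is soundfile's default for `.wav`; the `wave` module cannot read float-encoded WAV files. The helper name `describe_wav` and the constant `EXPECTED_SAMPLE_RATE` are introduced here for illustration only.

```python
import wave

EXPECTED_SAMPLE_RATE = 44_100  # sample rate stated in this model card


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


# Example (run after generating output.wav):
# rate, channels, duration = describe_wav("output.wav")
# assert rate == EXPECTED_SAMPLE_RATE
```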

Converted from

The weights were converted from myshell-ai/MeloTTS-English using mlx_audio.tts.models.melotts.convert.

Model size: 51.8M params
Tensor type: F32 (safetensors)
