This report introduces a new corpus of music, speech, and noise. The dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination, and it is released under a flexible Creative Commons license. It consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate the use of this corpus for music/speech discrimination on broadcast news and for VAD for speaker identification.