Investigating the Prospects of Voice Cloning for Tamil

Group members
Aadharsh Aadhithya A (CB.EN.U4AIE20001)
Jayanth M (CB.EN.U4AIE20024)
Madhav Kishor (CB.EN.U4AIE200033)
Vishnu Radhakrishnan (CB.EN.U4AIE200074)
Visweswaran M (CB.EN.U4AIE200075)

WHAT IS VOICE CLONING

Apple wants to clone your voice in iOS 17. Can Apple's Siri now be in your own voice?
https://www.livemint.com/technology/tech-news/apple-wwdc-2023-new-ios-17-feature-that-could-allow-you-to-clone-your-voice-in-just-15-minutes-11685955925070.html

Amazon used Amitabh Bachchan's voice for its Alexa. Alexa responds to you in the voice of your favorite celebrity.
https://www.amazon.in/Amitabh-Bachchan-celebrity-voice-Alexa/dp/B092L9LQ38

LOW RESOURCE LANGUAGE

Very few attempts have been made to clone voices in Indic languages. Most Indian languages, except a few such as Hindi and Bengali, lack the linguistic corpora and speech data required for developing robust language technologies. Voice cloning requires a high-quality annotated speech corpus, which makes it a challenge for a low-resource language.

FINE TUNING TRIALS

Fine-tuning trial for the English corpus
Cloning voice in real time
Fine-tuning trial for the Hindi corpus
Chinese dialect fine-tuning
[Audio samples in the original slides: target wave audio and generated audio]

DATASET

Tamil Mozilla Common Voice Corpus. The corpus has 231,120 samples from 19,817 contributors, resulting in 399 hours. We used the Tamil Common Voice Corpus 13.0, published on 15/03/2023, with 850 different voice samples, recorded for over 391 hours and validated for 229 hours.

Organizing Data

The data had to be further organized into directories for training with the Python PyTorch tool. The audio files associated with a single speaker were identified using their IDs in the audio file and placed under the same folder.

Preprocessing Data

METHODOLOGY

Speaker Encoder (Embeddings)

A neural network trained using a speaker verification loss, calculated by trying to predict whether two utterances are from the same user or not. The encoder maps a sequence of log-mel spectrogram frames, from speech of arbitrary length, to a fixed-dimensional embedding vector known as a d-vector. The network is trained to optimize a generalized end-to-end speaker verification loss, so that embeddings of utterances from the same speaker have high cosine similarity while those of utterances from different speakers are far apart in the embedding space.

Synthesizer (Embeddings)

Generates a mel spectrogram of the corresponding text input, conditioned on speaker embeddings. The model is trained by minimizing the L2 loss of the generated spectrogram, using the text and the target speech samples from the Mozilla corpus. Tacotron is used as the synthesizer.

Vocoder

To generate audio from the spectrogram we use a vocoder network. A sample-by-sample autoregressive WaveNet model is used to perform voice generation. This model takes the mel spectrogram as input and generates time-domain waveforms. WaveNet uses causal convolutions for sequential prediction, giving it an upper hand over recurrent neural networks in terms of tim...

RESULTS

[Figure panels: Generated vs. Actual (three pairs); Sample 527500, Sample 51400]

Results Discussion

We retrained the synthesizer to capture the unique acoustic characteristics of the new language. A few samples were separated from the training data and used as validation samples. Mel spectrograms were generated from the trained synthesizer, conditioned on speaker embeddings. We examined the quality of voice generation across training steps with random samples from the validation set. Results show that the synthetic voices were perceived as natural and indistinguishable from the target speaker's voice, even with limited training data.

[Table or figure: Mean Similarity Opinion Scores]

FUTURE WORKS

Our future work will include building a corpus of a common dialect of Tamil, retraining the speaker encoder module, and exploring other methods of voice cloning, such as meta-learning for speaker adaptation, along with an analysis of the performance of various synthesizers and vocoders.
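The data organization step described under DATASET (placing all audio files of one speaker under the same folder) could look roughly like the sketch below. It is not the authors' script. It assumes a Common Voice style TSV with `client_id` and `path` columns and a flat folder of clips; the function name `group_clips_by_speaker` and the directory arguments are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def group_clips_by_speaker(tsv_path, clips_dir, out_dir):
    """Copy each clip into out_dir/<speaker id>/ and return clips-per-speaker counts.

    Assumption: the TSV has 'client_id' (speaker id) and 'path' (clip file name)
    columns, as in Common Voice releases.
    """
    clips_dir, out_dir = Path(clips_dir), Path(out_dir)
    counts = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: sentences may contain quote characters that are not CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            speaker, clip = row["client_id"], row["path"]
            src = clips_dir / clip
            if not src.is_file():
                continue  # skip rows whose audio file is missing
            dest = out_dir / speaker
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest / clip)
            counts[speaker] = counts.get(speaker, 0) + 1
    return counts
```

The returned counts make it easy to drop speakers with too few clips before training, since a speaker verification objective needs several utterances per speaker.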
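To make the speaker encoder objective from METHODOLOGY concrete, here is a minimal pure-Python sketch of a generalized end-to-end style loss in its softmax form. It is illustrative only: the embeddings are plain lists rather than network outputs, and the scale `w` and offset `b` are fixed at illustrative values, whereas in a real encoder they would be learned along with the network. Each speaker needs at least two utterances.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """batch[j][i] is the d-vector of utterance i from speaker j.

    Each utterance is pulled toward its own speaker's centroid and pushed
    away from the other speakers' centroids. Returns the mean loss.
    """
    total, count = 0.0, 0
    for j, utterances in enumerate(batch):
        for i, e in enumerate(utterances):
            sims = []
            for k, others in enumerate(batch):
                # For the true speaker, leave the utterance itself out of the centroid.
                pool = [x for m, x in enumerate(others) if not (k == j and m == i)]
                sims.append(w * cosine(e, centroid(pool)) + b)
            total += -sims[j] + math.log(sum(math.exp(s) for s in sims))
            count += 1
    return total / count
```

A batch whose speakers form tight, well-separated clusters yields a loss close to zero, while a batch in which utterances of one speaker point in different directions yields a large loss. That is the behavior the training objective rewards.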
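The synthesizer's training criterion, an L2 loss between the generated and the target mel spectrogram, can be written out in a few lines. This is a sketch under the assumption that both spectrograms are already aligned and have the same shape (a list of frames, each a list of mel-bin values); the function name `l2_loss` is hypothetical and the averaging over all bins is one common convention.

```python
def l2_loss(predicted, target):
    """Mean squared error between two spectrograms given as lists of frames."""
    if len(predicted) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total, n = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of mel bins")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            n += 1
    return total / n
```

For example, `l2_loss([[1.0, 2.0]], [[1.0, 4.0]])` averages the squared errors 0 and 4 over two bins.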
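The causal convolutions mentioned for the WaveNet vocoder can be illustrated with a small pure-Python sketch. It shows only the core idea, that the output at time t depends on inputs at time t and earlier, plus the receptive field of a stack of dilated layers. It is not a WaveNet implementation: gating, residual connections, and spectrogram conditioning are omitted, and the function names are hypothetical.

```python
def causal_conv1d(x, weights, dilation=1):
    """1-D causal convolution with implicit zero padding on the left.

    weights[-1] multiplies the current sample; weights[-2] multiplies the
    sample `dilation` steps in the past, and so on. No future sample is used.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - (k - 1 - i) * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel of size 2 and dilations 1, 2, 4, 8, `receptive_field` gives 16 samples, which is how stacked dilated layers cover long contexts with few layers. Because no layer has a recurrent connection, every time step of a training sequence can be computed in parallel.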
