Username Grapheme-to-Phoneme Transformer
I’ve been running text-to-speech on my Discord bot for almost 8 years now. I mainly use it to announce when users join or leave a channel, similar to how Ventrilo does it. I started out with gTTS, but later switched to free ElevenLabs credits whenever they were available, falling back to good old gTTS once the credits run out. While both voices are pretty good, they still have trouble pronouncing some usernames. That seemed like a great excuse to try training a Seq2Seq transformer model that takes a username as input and outputs some kind of pronunciation data, such as a phoneme sequence.