Translatotron: An End-to-End Speech-to-Speech Translation Model
Cool stuff. I wonder how long it will take until speech synthesis is at the point where it can be used to create dialogue for video games. Obviously it would (at least initially) replace just the minor lines spoken by background characters or less important NPCs, but even that could be a big improvement.
Just imagine how much more immersive a game like Skyrim would be if the writers could just write the lines and then run them through a speech synthesizer to get finished dialogue. No need to hire multiple actors and get them to a studio to record their lines. It would be so much easier and faster to create a massive amount of unique dialogue, and you wouldn't have to listen to the same "arrow to the knee" line spoken by the same couple of actors over and over again everywhere you go.
It would improve user-created mods as well, since just about anyone could then create new characters with completely custom voices and dialogue.
Is it just me, or are the final results (the Translatotron translations) not playable, while the initial audio samples work fine?
A perfect illustration of how Chromium's monopoly is killing the open web. The sample audio doesn't play in Firefox, because who cares about anything other than Chrome, right?
I'm just going to say that this is flat out amazing. Really super impressive.
Am I reading this correctly, that there's no explicit semantic representation going on at any stage, it's purely audio input frequencies -> ML -> audio output frequencies? If so, that's... so much for Jerry Fodor and his Language of Thought Hypothesis, eh? (Yeah, mostly joking, but still...)
Is the model available yet?
There seem to be lots of questionable translations in their source data - missing emphasis, wrong words, and sometimes stuttering or mistaken vocalisations.
If they could fix that, I think results could be much better.