A Brisbane startup pitching itself as “Photoshop for voice” has taken the wraps off its first product, a creative studio powered by an AI it says can faithfully replicate someone’s voice.
Fittingly named Replica, the startup is headed up by founding trio Shreyas Nivas, Riccardo Grinover and Keni Mardira, and has been operating in a stealth mode of sorts for the past two years while the three built the product.
The founders bring varying areas of expertise, but Nivas, who holds a degree in aerospace engineering, tells StartupSmart the team was brought together by a love of games and inspired by research into voice synthesis coming out of Google.
“Around late 2016, Google released a paper on WaveNet, which was a neural network for voice synthesis which could create human-like speech. It was a natural transition from text-to-speech, and it was almost indistinguishable from actual conversation,” he says.
“We noticed the potential AI could have to power creatives in the same way digital has empowered creatives over the past decade. This was going to be the next generation of tools.”
Hence, Replica’s ‘Photoshop for voice’ was born. The product lets users pick from a library of voices, or synthesise their own, and then create dialogue through an interface, complete with appropriate emotion.
Nivas says the initial studio product is aimed at creatives in the music, advertising and gaming space, but notes the company has a grander vision for its technology.
“While the product is something that empowers creatives, we’re also developing a marketplace for voices, sort of like a Freelancer but for voice,” he says.
“That means everyone would have the ability to store and replicate their voice on the platform, and creatives could license out your voice and you’d be paid for it.”
This could break new ground in the world of audiobooks and podcasting, Nivas says. He envisions a world where authors could create audio versions of their novels in hours rather than weeks. He says the technology is good enough to sound almost completely natural, and can even infer emotion from text.
“We train the audio with text simultaneously, so it can see that this punctuation and those words means a certain emotion, which means it’s able to naturally predict anger, sadness — all sorts of emotion,” he says.
The software also includes a manual option to apply emotion to the dialogue, which Nivas likens to an “Instagram filter for your voice”.
Ethics and privacy in question
The startup went through the TechStars Music accelerator in San Francisco earlier this year, and is backed by small-scale investor Mawson, which was founded by prominent Australian entrepreneurs Stephen Phillips and Graeme Wood, the founder of Wotif.
Right now, the team has around 200 active users, and more than 1,000 on the waitlist, with Nivas saying they’re keen to iron out all the potential bugs before giving everyone access.
“We’re monitoring the activity for bad actors, we want to plug all the holes before we open it to the public,” he says.
These ‘bad actors’ point to the issues inherent in voice replication, which has become a hot topic since a completely fake video of Mark Zuckerberg did the rounds on Facebook, stoking concerns about privacy, ethics and copyright.
Nivas says addressing these concerns is Replica’s “first mandate”, and stresses that the company is not taking other people’s voices and making them available to the world; its celebrity demonstration video is just a proof of concept.
Instead, he says, companies and individuals will be able to synthesise their own voices or the voices of characters they own, and register and license them through the Replica platform. Such a platform is essential for weeding out nefarious actors, he says.
“We’re trying to do it in the most ethical way we can, because if we don’t, the company won’t succeed. While it’s taken us years to get to the point we are, the tech is advancing so fast that in just two years, a couple of kids with coding skills could replicate anyone’s voice,” he says.
“So how do you prevent that?”
“That’s the conversation we need to be having, and we need to fund the companies who are making solutions so we can stay ahead of malicious hackers.”
Nivas likens Replica’s approach to how “piracy died when streaming came along”, saying the goal of its marketplace is to license voices so that any voice which is not licensed can be easily identified as a fake.
He also calls on governments to work with businesses like Replica to determine how best to crack down on synthetic media, urging a constructive approach rather than a heavy-handed one.
“We live in a world where, with just a couple of images and a recording, you can synthesise a person from any angle, saying anything. Our identity is under threat, and governments shouldn’t crack down on the companies being open about it,” he says.