OpenAI has introduced Voice Engine, a model of voice generation based on the — pattern, which turns out to have already been heard by mass users

OpenAI introduced the results of the Voice Engine, a tool for realistic voice synthesis based on a 15-second sample and text, which has been in development for about two years. But there is no public access to it — due to the company’s obvious security concerns.

“We hope to start a dialog about the responsible use of synthetic voices and how society can adapt to these new opportunities. Based on these conversations and the results of these small tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the OpenAI blog says.

The generative artificial intelligence model that works with Voice Engine has been hiding in plain sight for some time. It’s at the heart of ChatGPT’s voice and read-aloud capabilities, as well as the pre-configured voices available in the OpenAI text-to-speech API. Spotify has also been using it since the beginning of September to dub podcasts in different languages.

The company sees several applications for the technology: assisting those who cannot read for some reason, translation, providing voice services to remote communities, supporting people with voice disorders, and helping with voice recovery. Examples of applications with samples in several languages are also presented in the blog.

Website TechCrunch asked the company’s representative Jeff Harris what materials Voice Engine was trained on. He replied that the Voice Engine model was trained on a mixture of licensed and publicly available data. The details of training artificial intelligence models can be both a competitive advantage and a source of legal problems, so the lack of details is not surprising. Voice Engine uses user data with extreme caution:

«We take a small sample of audio and text and create realistic speech that matches the original speaker, — says Harris. — The audio that is used is deleted after the query is complete».

According to the website, the price of the future service will be «bite». OpenAI has removed the price of using Voice Engine from its marketing materials, but documents reviewed by TechCrunch indicate a cost of $15 per million characters, or ~162,500 words in English. That’s a little more than a novel «Oliver Twist» Dickens. This translates to about 18 hours of audio, so the price is slightly below $1 per hour.

The cost is less than one of the most popular competitors, ElevenLabs, — $11 for 100,000 characters per month. Interestingly, the HD quality option costs twice as much, but an OpenAI representative told TechCrunch that there is no difference between HD and non-HD voices — this can be understood as you wish. Voice Engine also doesn’t offer any controls over pitch, tone, or other voice characteristics.

The cost of a voice actor’s work on ZipRecruiter ranges from $12 to $79 per hour, which is much more expensive than Voice Engine. Actors with agents will receive a much higher salary. There’s also the problem of dipshares. So the company is moving very cautiously so far, as with the above use cases.

OpenAI has introduced Voice Engine, a model of voice generation based on the — pattern, which turns out to have already been heard by mass users

Your comment (optional):