Problem: The gaming industry is worth an estimated 200 billion USD. Within its B2B market, roughly 2 billion USD (TAM) is spent each year solely on hiring voice actors for narration, character creation and audio post-editing. Of this, approximately 1.6 billion USD is spent by large gaming companies and 0.6 billion USD by indie game developers. On top of this, finding the right voices can take months, and some gaming companies spend up to half of their total budget on voice acting for narrative games. Indie game developers are known for quickly adopting new cost-reducing technologies, provided these do not significantly harm immersion or user experience. Text-to-speech solutions, often criticized for their emotionless and monotonous speech, fail that condition, which is why they have not been adopted in this segment. No solution yet combines quality, ease of use and economic viability.
Solution: We have developed a series of A.I. techniques that can segment, modify and reconstruct different aspects of human speech in real time. Our technology can modify accents and emotions, and can also generate and clone voices. Building on this, our long-term goal is to develop a single, comprehensive voice transformation API endpoint that other companies can use to solve their own problems, much as OpenAI does with its GPT models. We will start with the gaming industry, since this segment experiences the most pain and gives us a base from which to expand. There, we will focus on our voice-swapping solution, which performs natural-sounding speech-to-speech conversion. Because our approach retains the emotion, prosody and emphasis of the original voice, the resulting synthesized A.I. voices sound authentic and natural. This makes them well suited to entertainment productions, where they maintain realism while significantly reducing costs.
USP: Our solution takes an innovative approach to voice segmentation, breaking speech down into distinct components such as message, emotion, intonation, accent and identity. This lets our models focus on, and transform, only the aspects of speech that are relevant to the task at hand. As a result, computational cost drops by one order of magnitude, and robustness and naturalness improve markedly. The same decomposition also lays the groundwork for our ultimate goal: a comprehensive voice transformation API designed to modify every aspect of a human voice in real time, according to customer requirements. All our features work together, forming a single package that businesses can integrate easily. In the near term, our superior quality and naturalness, combined with unmatched cost-efficiency, give us a unique edge as we enter the voice generation segment of the gaming industry. From there, we aim to integrate our solution into existing game engines, build momentum and broaden our market reach.