I think Ninezero highlights some limitations of the current AI engine, and I’m curious to see how Dreamtonics moves forward.
For comparison, consider Megpoid V4 Complete. This is a package of Vocaloid4 voicebanks with Native, Adult, Power, Sweet, and Whisper variants, which (back before the release of Vocaloid5) could be combined with the cross-synthesis feature, but were otherwise completely separate voicebanks sold as a bundle.
While I think we can all agree that SynthV has drastically more impressive output quality than Vocaloid, there’s something to be said for the sheer flexibility of GUMI V4. Not only can she sing in a wide variety of styles, she can also fill each of those niches without compromise, because each voicebank is an entirely separate product dedicated to that style.
Vocal modes are clearly intended to mimic the Vocaloid4 concept of combining multiple voicebanks to cover a wide variety of styles. However, since these are all packaged within a single product, the result is often a “jack of all trades, master of none” situation.
No matter how much you tell the engine to favor Kevin’s belt mode, the AI engine still can’t “forget” about everything else it has learned that isn’t belting (which includes a lot of softer singing!). This is, in my opinion, the biggest drawback of the vocal mode approach.
This was also one of the most common reactions when people first compared the Standard and AI voice databases released by AHS: the AI ones simply did not have the same ability to perform in the upper range without becoming thin and inconsistent, even when compared against the Standard version from the same voice provider (such as Saki vs Saki AI).
Vocal modes have certainly improved the situation, but Ninezero has, in my opinion, highlighted the limitation of Dreamtonics’ current approach; in order to make a voice database cater to a niche, it seems that it must also exclude anything that falls outside that niche.
Of course, I could be wrong, and maybe Dreamtonics will surprise us. Some people will find that Ninezero fits exactly what they want, but the fact that he can only belt is going to be very limiting for songs with any sort of stylistic progression.