The possibility to create custom voicebanks would be greatly appreciated. Before reading the opinions here I thought there should be no discussion, just do it. I see now that there are a lot of caveats to publishing an SDK, so I’m going to talk about a few options I think could be viable. Ultimately, it’ll be up to Kanru to decide based on how likely it is to succeed, of course.
About the engineering budget, it’s probably best not to have one. Creating custom banks should be free, and community-created banks should not be marketable:
Maybe release the SDK just the way it is used internally, providing only some documentation and letting the interested parties figure out the rest. I, myself, likely wouldn’t need a GUI to work with it.
A similar option would be to open source the SDK and let a community grow around it. I’d be willing to bet people would come up with a GUI soon enough.
Maybe using a custom bank could require a premium version of synthv. Just enough to offset the cost of writing the aforementioned documentation, since there wouldn’t be ongoing support for the SDK.
Someone raised the issue of the number of low-quality banks that could flood the space. That’s another very valid concern, but I think the solution to it is already implemented. Similarly to Vocaloid, Pro/Licensed banks should be few, feature-rich, and adequately priced, so that when browsing through synthv songs I know to only click songs that tag Eleanor Forte or the other official voices (which is why they should be few: I will want to remember all the official singers). Basically, when you see the title of a song, what guarantees the recording is good is the name of the digital singer rather than the software that was used.
I don’t know enough to have an opinion on the workings of phonemes, but I think going through with releasing an SDK that is not language-agnostic is a huge waste of potential regardless of its eventual funding model. Having community-created voicebanks in many languages opens up the market for Synthesizer V to the whole world, just like that. I arrived at this thread exactly because I want to make a Brazilian Portuguese bank, not because I’m unsatisfied with the voices that are available. Let’s say I did make my own voicebank (regardless of quality): I have quite a few friends who would want to play around with synthv just for that.
Anyways. I’m really impressed to see how Kanru deals with this community. I know that even if synthv goes with the exact opposite of what I suggested, it’ll be because I failed to understand something about this environment, and not because of greed or lack of imagination by the dev.
My stance on it remains unchanged: there should not be a public SDK at all, period.
The main problem with low-quality banks flooding the space is the flooding itself; it has nothing to do with making people guess which ones are good and which ones aren’t. The reason UTAU doesn’t get as much attention as Vocaloid mainly has to do with the fact that the only actual official UTAU is Defoko. The rest are all fan-made, so sure, you have some stars like Teto, but there’s such a large number of the other “loids” that you wouldn’t have heard of half of them, and the rabbit hole of UTAUloids goes down further than the other side of the planet. On top of that, voicebanks recorded by individuals are never going to be that good unless you DIY your own anechoic chamber and spend $5,000,000,000 on various pro-level microphones, because every voice calls for a completely different mic altogether (I have a diploma in this subject). Most people will settle for something like the Blue Yeti, which is, no, you need something a lot more substantial for recording (and even if you’re only trying to record yourself, you’re still going to end up buying a bunch of different mics because you still have to figure out which one works for you, so good luck). Not only that, one website isn’t going to have the budget to store links to nine septillion different voicebanks, even if each of them had its own site/portal. It just costs too much for all that space. You’ll need a supercomputer more than 9,000 times bigger than the ones they have at NASA. Nobody has the budget for that, not even the Queen or the President of the United States.
(TL;DR - Even if you don’t wind up spending a ton of money on studio time to get the banks recorded professionally, which is how you should be doing this if you were to do it at all, the web space these banks have to be hosted on is going to cost way too much, and nobody has the money for that.)
As for different languages, there are only four voicebanks that aren’t Chinese. Three are Japanese and one is English. The rest are all Chinese banks, and while I’m not against any language in particular, that gives me the impression that the developers are going to focus strictly on the Chinese language alone and nothing else, and there’s no hope of any globalisation, so… yea, you might as well give up on that too.
That is because mainland China’s 「Quadimension」 is so influential… They have a whole set of plans for characters, and more singers will come out in the future. And 「AHS (AH-Software)」 in Japan also has a lot of singer characters that need to be iterated on.
(Translated with a translator)
I have an idea for how a custom voicebank system could be implemented. Of course, Kanru will not get into this topic, but it would be nice to know if this is possible.
My idea is that a separate Synthesizer V could be developed that would be the same as Studio Basic, but that would only support custom voicebanks.
I think this way it would be possible to draw a line between paid voicebanks and custom ones.
To host custom voicebanks, it would be possible to make a separate general site where people can upload their voicebanks and which, using a rating system, shows which voicebanks are of higher quality.
The idea of a single site is meant to avoid sending voicebanks to the developers for verification, because the ability to create your own voicebanks could attract a lot of people, and Dreamtonics physically cannot check each one.
But this is difficult to implement: there is still the question of an interface for creating custom voicebanks that users can understand how to configure, and money would have to be spent on developing a separate version of Synthesizer V Studio for custom voicebanks and on a site for hosting.
Information about voicebanks is considered proprietary information. For example, Eclipsed Sounds had to sign NDAs about the process. I’m sure the format and technologies are also considered proprietary, and won’t be released.
You’re not even going to get a reclist. Even the price of creating a voicebank is considered proprietary.
So I doubt that Dreamtonics will be supporting free voicebanks in the near future.
It’s important to note that Dreamtonics is very protective of their proprietary tech.
Third party developers like Eclipsed Sounds, Audiologie, etc. do a lot of manual work when creating a voice database, but all of the machine learning is done by Dreamtonics themselves. They don’t even give their partners access to that process.
Additionally, there’s not really anything to gain by opening the platform up. They’d only be creating competition for themselves without any real benefit.
As much as it would be great for the vocal synthesis field and userbase as a whole, it’s somewhat unrealistic.
I think it’s unrealistic to expect custom voicebanks. Having said that, I think what people want is a virtual singer that sounds the way they want it to sound. I think the perfect solution is available now. The solution is to create a vocal in Synth V and then convert it using RVC voice models. This is super super easy in something like Kits.ai or in one of the Google Colab RVC options (which I hate btw). If you aren’t familiar with this tech, here’s an example… Billie Eilish singing Radiohead:
Now, what would be really slick is if Synthesizer V allowed you to load and apply your own RVC voice models to a synth v track. My understanding is that there are realtime RVC voice changer solutions.
I have been doing exactly that for the past few months, and it sometimes works great and sometimes not at all. It always depends on the quality of the data the model has been trained on, and also on the vocal range and style of the original voice. Some notes cannot be converted correctly.