Custom voicebanks

#1

I saw a lot of people on Twitter saying they’d only be interested in Synth V if it allowed custom voicebanks. And a lot of other people are arguing that it’s unfair to expect Synth V to be free. These two ideas are not mutually exclusive. Synth V could remain commercial while allowing open library development. On the off chance that Kanru Hua hasn’t settled on a development model, I wanted to see what other people think about my dream model. :smiley:

Kontakt by Native Instruments, an extremely popular engine for sample libraries (i.e., soundbanks), is commercial and allows users to create their own banks in the full version. It also has a demo version that times out after 15 minutes and cannot be used to make custom banks. There are banks that cost hundreds or thousands of dollars, and there are free banks, and barring a partnership between Native Instruments and a particular bank’s developer, you need the full version of Kontakt ($500) to use a bank.

I think an adapted Kontakt model could potentially work for Synth V:

  • A) you need full Synth V to develop a voicebank or B) you need to buy a development kit to develop a voicebank
  • “official” voicebanks (those made in collaboration with Dreamtonics, like Eleanor and Aiko) continue to run in both full and trial Synth V.
  • “fanmade” voicebanks cannot run in trial Synth V, or only run with limitations (e.g., they stop working after 15 minutes)
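
The three rules above can be sketched as a small policy table. This is only an illustration of the proposal, not anything Dreamtonics has announced: the names `Edition`, `BankKind`, `session_limit_minutes`, and `can_develop_banks` are made up for this sketch, and the 15-minute limit is borrowed from the Kontakt demo as an example value.

```python
from enum import Enum


class Edition(Enum):
    TRIAL = "trial"
    FULL = "full"


class BankKind(Enum):
    OFFICIAL = "official"  # made in collaboration with Dreamtonics
    FANMADE = "fanmade"    # made by users


# Hypothetical value, mirroring the Kontakt demo timeout.
TRIAL_FANMADE_LIMIT_MINUTES = 15


def session_limit_minutes(edition, bank):
    """Return None for unlimited use, or the session limit in minutes."""
    if edition is Edition.FULL or bank is BankKind.OFFICIAL:
        return None
    # Trial edition with a fanmade bank: time-limited.
    return TRIAL_FANMADE_LIMIT_MINUTES


def can_develop_banks(edition, has_dev_kit, option="A"):
    """Option A: full Synth V is required. Option B: a dev kit is required."""
    if option == "A":
        return edition is Edition.FULL
    return has_dev_kit
```

Under this sketch, a trial user loading a fanmade bank gets a 15-minute session, while official banks stay unlimited in both editions.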

Pros of an open model:

  • access to hundreds of free and paid voicebanks makes buying the full engine a more appealing choice
  • could use a Steam/app-store type model, where paid fanmade voicebanks can get more attention in exchange for a revenue split with the store
  • makes Synth V a competitor with only UTAU, rather than with Vocaloid, CeVIO, Alter/Ego, Sharpkey, Realivox Blue, and Emvoice One

Cons of an open model:

  • may devalue paid banks that don’t stand out
  • ??? (I can’t really think of anything else)

Does anyone have holes to poke in this, or other ideas? Also, please note that I’m not saying I wouldn’t buy Synth V without custom banks, but right now I’m not in love with any of the banks available and like the engine itself more than the banks that are out.


#2

Thank you for this informative post. We haven’t decided what to do regarding custom DBs yet but I’ll mark this post and watch as you guys discuss.
I’d like to explain some major concerns about making a dev kit available.

  • Quality control. Science is not black magic. Synthesizers are like rocket engines: they run on fuel (data), and you can’t fire a rocket fueled with charcoal. Back to synthesizers: the quality of the recordings sets an upper bound on how good the results can get; you can’t expect the engine to magically make the vocals sound better than they originally were.
    • Custom recordings come in all levels of quality. I don’t think it is the right time to release custom DB support before we have built up a solid brand image of high-quality voices with the commercial DBs.
  • Different market segments. Pro producers and DIY hobbyists expect different types of features. If your goal is to get your song onto the charts, you’d want a few solid, high-quality, well-supported DBs covering a wide range of voice registers, plus various workflow-related features such as VSTi/AU/AAX… support, audio format compatibility, customization, … You probably won’t need any of these if you just want to make your own DB and let your friends know you’ve made a really cool thing. So there’s a risk of the product ending up vaguely targeted if we were to release it with a dev kit.
    • The case of Kontakt is different because sampling a musical instrument is a lot easier than sampling a voice, so pro users may also want to invest some time in attempting it.
  • Customer support. Imagine 200 emails flooding your customer support inbox every day, 24/7, with nit-picking questions about why someone can’t get their own DB to run, when it turns out they skipped a chapter in the manual. We’re a small company. We need to figure out how to make this level of support doable, which is why we’re running this forum (as an experiment in some sense).
  • Revealing technical secrets is not a serious concern. This sounds taboo to a research-focused business, but everyone in the industry knows how to design a statistically optimized reclist. You know how to do it if you have a bachelor’s degree in computational linguistics. Just some advanced party tricks.

#3

I too would like to make my own voicebank(s), but then I realize that it would be a voicebank of me, and that would be only marginally better (maybe) than stepping up to the mic and singing, then tuning my “performance” as I do now.


#4

Regarding quality and brand image, I agree that making custom DBs available right now may not be a good idea. Custom DBs should be an “extra” type of feature (extra voices to pick from). Doing it right now may lead people trying to find out more about SynthV to think it’s the main feature, given the likely quantity of custom DBs.

As for support, many businesses resort to community-only support for their free/cheaper tiers and offer direct support only to their higher-tier customers. Honestly, I wouldn’t expect anything beyond “here are the tools and the manual” even if the dev kit for free-only DBs was something I had to pay for.

As for pricing I imagine it could go like this:

  • Dev kits are offered at a fixed price (equal to the current editor price, maybe?), but only allow the development of non-commercial DBs and come with no promised support beyond the community forum. This has the advantage of creating an entry barrier, which should in theory both limit the quantity of DBs and increase their average quality.

    • Of course there is the option of bundling the dev kit with the editor, but from what I’ve seen you post on Twitter, it is very unlikely it could be marketed as a feature for producers, due to the time and computational costs involved.
  • There is an option to sign a business deal with you, where you do offer support in creating the DB (knowledge, configuration, computing power, etc.), but there is a revenue split and/or other costs associated, depending on the situation.


#5

Another issue, though orthogonal to the ones already discussed: it might also be helpful to allow voicebanks to define their own list of phonemes – either to get better pronunciations for some words or to support a language not supported by SynthV itself. (Of course, conversion to phonemes is also an issue, which can be addressed by allowing voicebanks to include their own dictionaries.) Are there any possible problems with this proposal?
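
A minimal sketch of how a per-voicebank dictionary could override a built-in one, assuming both are simple word-to-phoneme mappings. The function name `to_phonemes`, the dictionary layout, and the sample entries are hypothetical and do not reflect how SynthV actually stores pronunciations.

```python
def to_phonemes(word, bank_dictionary, engine_dictionary):
    """Look a word up in the voicebank's dictionary first, then the engine's.

    Returns a list of phoneme symbols, or None if neither dictionary
    knows the word.
    """
    key = word.lower()
    if key in bank_dictionary:
        return bank_dictionary[key]
    return engine_dictionary.get(key)


# Hypothetical entries, written in X-SAMPA for illustration only.
engine_dictionary = {"tomato": ["t", "@", "m", "eI", "t", "oU"]}
bank_dictionary = {"tomato": ["t", "@", "m", "A", "t", "@U"]}

print(to_phonemes("Tomato", bank_dictionary, engine_dictionary))
print(to_phonemes("tomato", {}, engine_dictionary))
```

The first call uses the voicebank's own pronunciation; the second falls back to the engine's entry because the voicebank supplies none.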


#6

Yes, I believe that would lead to the same sort of fragmentation we see in UTAU, where if you want to use an English UST, you’ll likely have to manually convert it because it’s either in Arpasing, CVVC, VCCV, or Delta X-SAMPA. There’s a reason all English Vocaloids use British X-SAMPA regardless of whether their accent actually uses British phonemes or not, and that’s so that English VSQs generally work for all English Vocaloids.

Not allowing user-defined phonemes would mean not allowing development for new languages though. Maybe only X-SAMPA phonemes should be allowed, somehow? That includes pretty much every phoneme for every language and prevents people from creating new names for the same phonemes.
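
As a rough sketch of what "only X-SAMPA phonemes allowed" could mean in practice, a voicebank's declared phoneme list could be checked against a fixed symbol table. The set below is only a small illustrative subset of X-SAMPA, and `XSAMPA_SUBSET` and `unknown_phonemes` are made-up names, not part of any real tool.

```python
# A small, illustrative subset of X-SAMPA symbols (not the full table).
XSAMPA_SUBSET = frozenset({
    "p", "b", "t", "d", "k", "g", "m", "n", "N",
    "f", "v", "T", "D", "s", "z", "S", "Z", "h",
    "l", "r\\", "j", "w",
    "i", "I", "e", "E", "{", "A", "Q", "O", "U", "u", "V", "@", "a", "o",
})


def unknown_phonemes(declared):
    """Return the declared symbols that are not in the allowed table, sorted."""
    return sorted(set(declared) - XSAMPA_SUBSET)


# A bank mixing X-SAMPA with Arpabet-style names would be flagged.
print(unknown_phonemes(["k", "{", "t", "ae", "aa"]))  # ['aa', 'ae']
```

A check like this would reject invented names for existing phonemes while still letting a bank use symbols for a language the engine does not ship with.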

@Scarfmonster I think it’s fine to include a feature that isn’t really widely marketable. Everyone who will want to use it will find out about it.

I agree that support shouldn’t be offered for custom voicebanks, or really anything but official, paid voicebanks. In fact, you could even make it so the contact form for support is only available to people with a valid serial code for one of the official voicebanks. (Although, this presents a new issue: people who bought the engine by itself wouldn’t get support.)


#8

This might be a stupid opinion of mine, but since Kanru mentioned that we should discuss this topic (before he reaches a conclusion), I thought I could offer a few ideas for keeping both commercial DBs and user-made DBs. These ideas might not be great, but I’ll try to make a good impression.

Just like what @Scarfmonster said, a user-made DB should be an extra thing. But instead of requiring a serial code for an official voicebank to get support (since Eleanor and Renri are free), why not let users download or purchase the DB dev kit after they purchase the license for the editor itself? A DB created with that dev kit would be limited to what’s available in the Technical Preview version (since Kanru did mention that the TP and the full release are different). That means the user-made DB would be restricted from using most of the extras that commercial DBs have (VSTi/AU/AAX support, glottal effects, etc.). If users want their DBs to have access to those extras, they can partner with Dreamtonics and turn the DB into an actual commercial release. Some text-to-speech companies do something similar: they let users create their own voice for free, but users must pay the company if they want an API for other uses.

Like I’ve said, this might not be the best option in this thread, but it is what I can come up with. There might be some flaws since Dreamtonics is a small company (compared to the TTS companies I mentioned), and some other complications… but hopefully my opinion can shed some light on resolving this issue.


#9

I definitely agree that the phoneme set needs to be limited for custom banks; it just makes things easier.

Kanru said he doesn’t want to include “pro-features” IIRC. So limiting custom voicebanks to the licensed version won’t work.

Honestly, I believe UTAU itself is a great example of professional VBs and hobby VBs co-existing. There are thousands of voices available, in a vast range of quality. Yet people will ALWAYS come back to the “vipperloids” because of their professional quality, design, and management. If Kasane Teto, Ruko Yokune, and Namine Ritsu cost money, I’m 100% sure they’d sell like hot cakes. In fact, Windows100% has successfully sold commercial VBs for years.


#10

To be fair, most UTAU made by VIP@2ch aren’t actually of notable quality apart from their iconic designs, and have next to no management apart from a few at the top that receive occasional updates, but companies like VOICEMITH have been producing professionally recorded UTAU with commercialised designs like Xia Yu Yao for a while now.

The way Win100% handles distribution of their libraries is a little different, as they’re usually packaged with a guide on UTAU/etc, so in theory you’re buying the guide or the publication and the libraries come as a bonus. The most obvious example of commercial UTAU in its purest sense I can think of would be the Macne series, which was usable with UTAU and distributed with oto.ini.

Apologies for going on a bit of a tangent, but my main point is that UTAU is vastly dominated by hobbyist libraries, with professionally produced libraries being in the minority.

I’ve refrained from posting in this thread because I don’t think I can present a very coherent argument particularly for or against. In my view, it isn’t necessarily a good idea to even consider the possibility of custom libraries, be it on a small or large scale, until Synthesizer V has been allowed more time to grow and more third parties have come forward with the intention of developing libraries for Synthesizer V as Kanru currently intends. I understand that at present there is a reasonable threshold for this when a company buys in, which ensures quality of result on all sides and demonstrates appropriate funding. I’m sure that as Dreamtonics grows they will be able to either figure out a way of improving efficiency in customer support or hire someone specifically to manage it, and from there this discussion will be more fruitful in its results.


#11

I think this is a good idea :slight_smile: but only for personal use.


#12

I’d like to share some ideas and thoughts that I have on this subject.
First, I’d like to state very clearly that I’m not saying “X or Y should be done like this”; my intention is just to contribute in some way to this topic.

I’m going to start by talking about the users. I believe that a considerable portion of Synthesizer V’s userbase are already UTAU users (myself included), and are therefore used to being able to build their own voicebanks.
Also, the fact that Synthesizer V is being developed by the same person who created Moresampler helps create expectations among users regarding custom voicebanks.

Now, on the topic itself. I believe that allowing third-party voicebanks would be beneficial to Synthesizer V. It would help increase the variety of voices available, and also reduce the workload of Synthesizer V’s developers, who otherwise would need to work on every single voicebank released in the future.

Regarding voicebank quality, I have an idea for separate Commercial and Community Kits, each with its own license agreement.
The Commercial Kit could be purchased, and would offer the most freedom for developing a voicebank. One purchase would be enough for a person, group or company to develop any number of voicebanks.
However, in order to release the developed product, the Synthesizer V team would receive the almost-finished product and check whether it meets the quality standards. A fee could be charged for this review process.
If the voicebank is rejected, it could be adjusted to meet the criteria and resubmitted for approval, or it could be kept for private use. There could also be rules regarding usage of the Commercial Kit for private banks.

The Community Kit would be free, but have some restrictions. Some ideas are: limiting the number of pitches and/or sub-libraries, and forbidding custom phonemes. Since it would be distributed for free, restrictions on commercial usage could be applied. The voicebanks developed with this kit could not be commercialized, and instead would be distributed freely in the community.
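
To make the two-kit idea a bit more concrete, here is a rough sketch of how the restrictions could be expressed and checked. Everything in it is an assumption for illustration: the names `KitRules`, `BankSpec`, and `violations` are invented, the numeric limits are made up, and the quality review step for the Commercial Kit is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KitRules:
    max_pitches: Optional[int]        # None means no limit
    max_sublibraries: Optional[int]   # None means no limit
    allow_custom_phonemes: bool
    allow_commercial_release: bool


# The numbers here are invented for the example.
COMMUNITY_KIT = KitRules(3, 1, False, False)
COMMERCIAL_KIT = KitRules(None, None, True, True)


@dataclass(frozen=True)
class BankSpec:
    pitches: int
    sublibraries: int
    custom_phonemes: bool
    commercial: bool


def violations(bank, rules):
    """List the kit rules a voicebank breaks (empty list if none)."""
    problems = []
    if rules.max_pitches is not None and bank.pitches > rules.max_pitches:
        problems.append("too many pitches")
    if (rules.max_sublibraries is not None
            and bank.sublibraries > rules.max_sublibraries):
        problems.append("too many sub-libraries")
    if bank.custom_phonemes and not rules.allow_custom_phonemes:
        problems.append("custom phonemes not allowed")
    if bank.commercial and not rules.allow_commercial_release:
        problems.append("commercial release not allowed")
    return problems


bank = BankSpec(pitches=5, sublibraries=1, custom_phonemes=True, commercial=False)
print(violations(bank, COMMUNITY_KIT))
print(violations(bank, COMMERCIAL_KIT))
```

The same bank fails the Community Kit check on pitch count and custom phonemes, but passes the Commercial Kit rules.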

Again, those are just some ideas that I have, and may need better structuring and polishing in order to work. But I wanted to contribute in some way.