Custom voicebanks

corasundae · December 29, 2018, 9:10 AM

I saw a lot of people on Twitter saying they’d only be interested in Synth V if it allowed custom voicebanks. And a lot of other people are arguing that it’s unfair to expect Synth V to be free. These two ideas are not mutually exclusive. Synth V could remain commercial while allowing open library development. On the off chance that Kanru Hua hasn’t settled on a development model, I wanted to see what other people think about my dream model.

Kontakt by Native Instruments, an extremely popular engine for sample libraries (i.e., soundbanks), is commercial and allows users to create their own banks in the full version. It also has a demo version that times out after 15 minutes and cannot be used to make custom banks. There are banks that cost hundreds or thousands of dollars, and there are free banks, and barring a partnership between Native Instruments and a particular bank’s developer, you need the full version of Kontakt ($500) to use a bank.

I think an adapted Kontakt model could potentially work for Synth V:

A) you need the full version of Synth V to develop a voicebank, or B) you need to buy a development kit to develop a voicebank
“official” voicebanks (those made in collaboration with Dreamtonics, like Eleanor and Aiko) continue to run in both full and trial Synth V.
“fanmade” voicebanks cannot run in trial Synth V, or only run with limitations (e.g., they stop working after 15 minutes).

Pros of an open model:

access to hundreds of free and paid voicebanks makes buying the full engine a more appealing choice
could use a Steam/app-store type model, where paid fanmade voicebanks can get more attention in exchange for a revenue split with the store
makes Synth V a competitor to UTAU too, not just to Vocaloid, Cevio, Alter/Ego, Sharpkey, Realivox Blue, and Emvoice One

Cons of an open model:

may devalue paid banks that don’t stand out
??? (I can’t really think of anything else)

Can anyone poke holes in this, or suggest other ideas? Also, please note that I’m not saying I wouldn’t buy Synth V without custom banks, but right now I’m not in love with any of the available banks, and I like the engine itself more than the banks that are out.

khuasw · December 29, 2018, 10:11 AM

Thank you for this informative post. We haven’t decided what to do regarding custom DBs yet but I’ll mark this post and watch as you guys discuss.
I’d like to explain some major concerns about making a dev kit available.

Engineering budget. The dev kit is not meant for end users. To make it somewhat usable for non-experts, we would need to work on graphical interfaces. There are also going to be promotional and logistical expenses (for building an ecosystem). Is this a good investment? How large is the market? How many users will pay for the DIY voice service? A back-of-the-envelope estimate tells me that we need about 1000 paying users to sustain development and maintenance. The singing synth market is already quite a niche one; will there be enough users in the DIY voice sub-market?
Quality control. Science is not black magic. Synthesizers are like rocket engines: they run on fuel (data), and you can’t fire a rocket fueled with charcoal. Back to synthesizers, the quality of the recordings sets an upper bound on how good the results can get; you can’t expect the synthesizer to magically make the vocals sound better than they originally were.
- Custom recordings come in all levels of quality. I don’t think now is the right time to add support, before we have built up a solid brand image of high-quality voices with the commercial DBs.
Different market segments. Pro producers and DIY hobbyists expect different types of features. If your goal is to get your song onto the charts, you’d want a few solid, high-quality and well-supported DBs covering a wide range of voice registers, plus various workflow-related features such as VSTi/AU/AAX… support, audio format compatibility, customization, … You probably won’t need any of these if you just want to make your own DB and show your friends the really cool thing you’ve made. So there’s a risk that the product would end up vaguely targeted if we were to release it with a dev kit.
- The case of Kontakt is different because sampling a musical instrument is a lot easier than sampling a voice, so pro users may also want to invest some time in attempting it.

bitman · December 29, 2018, 3:15 PM

I too would like to make my own voicebank(s), but then I realize that it would be a voicebank of me, and that would be only marginally better (maybe) than stepping up to the mic and singing, then tuning my “performance” as I do now.

Scarfmonster · December 30, 2018, 12:36 AM

Regarding quality and brand image, I agree that making custom DBs available right now may not be a good idea. Custom DBs should be an “extra” type of feature (extra voices to pick from). Doing it right now may cause people trying to find out more about SynthV to think it’s the main feature, due to the potential quantity of custom DBs.

As for support, many businesses resort to community-only support for their free/cheaper tiers and offer direct support only to their higher-tier customers. Honestly, I wouldn’t expect anything beyond “here are the tools and the manual”, even if the dev kit for free-only DBs was something I had to pay for.

As for pricing, I imagine it could go like this:

Dev kits are offered at a fixed price (perhaps equal to the current editor price?), but only allow the development of non-commercial DBs and come with no promised support beyond the community forum. This has the advantage of creating an entry barrier, which should in theory both limit the quantity of DBs and increase their average quality.
- Of course, there is the option of bundling the dev kit with the editor, but from what I’ve seen you post on Twitter, it is very unlikely it could be marketed as a feature for producers, due to the associated time and computational costs.
There is also the option of signing a business deal with you, where you do offer support in creating the DB (knowledge, configuration, computing power, etc.), but with a revenue split and/or other costs, depending on the situation.

kozet · December 30, 2018, 5:54 AM

Another issue, though orthogonal to the ones already discussed: it might also be helpful to allow voicebanks to define their own list of phonemes – either to get better pronunciations for some words or to support a language not supported by SynthV itself. (Of course, conversion to phonemes is also an issue, which can be addressed by allowing voicebanks to include their own dictionaries.) Are there any possible problems with this proposal?

corasundae · December 31, 2018, 11:22 AM

Yes, I believe that would lead to the same sort of fragmentation we see in UTAU, where if you want to use an English UST, you’ll likely have to convert it manually, because it could be in Arpasing, CVVC, VCCV, or Delta X-SAMPA. There’s a reason all English Vocaloids use British X-SAMPA regardless of whether their accent actually uses British phonemes: it means English VSQs generally work with all English Vocaloids.

Not allowing user-defined phonemes would mean not allowing development for new languages though. Maybe only X-SAMPA phonemes should be allowed, somehow? That includes pretty much every phoneme for every language and prevents people from creating new names for the same phonemes.
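Restricting custom banks to one shared X-SAMPA inventory, as proposed here, would amount to a validation step when a bank is built. A minimal Python sketch, assuming a bank simply declares a list of phoneme symbols; the symbol set below is a small illustrative subset of X-SAMPA rather than the full alphabet, and the function names are hypothetical, not part of any Synthesizer V tool:

```python
# Illustrative subset of X-SAMPA symbols only (an assumption for this sketch);
# a real shared inventory would cover far more of the alphabet.
ALLOWED_XSAMPA = frozenset({
    "a", "e", "i", "o", "u", "@", "{", "V", "I", "U",
    "p", "b", "t", "d", "k", "g", "f", "v", "s", "z",
    "S", "Z", "T", "D", "h", "m", "n", "N", "l", "r\\", "j", "w",
})


def find_nonstandard_phonemes(declared, allowed=ALLOWED_XSAMPA):
    """Return the declared symbols that are not in the shared inventory, sorted."""
    return sorted(set(declared) - allowed)


def is_portable(declared, allowed=ALLOWED_XSAMPA):
    """Treat a bank as portable if every symbol it declares is in the shared inventory."""
    return not find_nonstandard_phonemes(declared, allowed)


if __name__ == "__main__":
    # "aa" is an Arpasing-style name, not an X-SAMPA symbol, so it gets flagged.
    print(find_nonstandard_phonemes(["h", "@", "l", "o", "aa"]))
    print(is_portable(["s", "a", "k", "u", "r\\", "a"]))
```

Because every bank is checked against the same inventory, a project file written for one conforming bank uses phoneme names that every other conforming bank also accepts, which is the portability argument made above.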

@Scarfmonster I think it’s fine to include a feature that isn’t really widely marketable. Everyone who will want to use it will find out about it.

I agree that support shouldn’t be offered for custom voicebanks, or really anything but official, paid voicebanks. In fact, you could even make it so the contact form for support is only available to people with a valid serial code for one of the official voicebanks. (Although, this presents a new issue: people who bought the engine by itself wouldn’t get support.)

HoodyP · January 6, 2019, 6:14 AM

This might be a stupid opinion, but since Kanru mentioned that we should discuss this topic (before he reaches a conclusion), I thought I could offer a few ideas on how to keep both commercial DBs and user-made DBs. These ideas might not be great, but I’ll do my best.

Just like @Scarfmonster said, a user-made DB should be an extra. But instead of requiring a serial code for an official voicebank to get support (since Eleanor and Renri are free), why not offer users the DB dev kit to download or purchase after they buy a license for the editor itself? A DB created with that dev kit would be limited to what was available in the Technical Preview (Kanru did mention that the TP and the full release are different). That means a user-made DB would be restricted from most of the extras that commercial DBs have (VSTi/AU/AAX support, glottal effects, etc.). If users want their DBs to have those extras, they can partner with Dreamtonics for an actual commercial release. Some text-to-speech companies use a similar approach: they let users create their own voice for free, but users must pay the company if they want API access for other uses.

Like I’ve said, this might not be the best option in this thread, but it’s what I could come up with. There might be some flaws, since Dreamtonics is a small company (compared to the TTS companies I mentioned), along with some other complications… but hopefully my opinion can shed some light on resolving this issue.

pantran · January 7, 2019, 11:05 PM

I definitely agree that the phoneme set needs to be limited for custom banks; it just makes things easier.

Kanru said he doesn’t want to include “pro-features” IIRC. So limiting custom voicebanks to the licensed version won’t work.

Honestly, I believe UTAU itself is a great example of professional VBs and hobby VBs coexisting. There are thousands of voices available, in a vast range of quality. Yet people will ALWAYS come back to the “vipperloids” because of their professional quality, design, and management. If Kasane Teto, Ruko Yokune, and Namine Ritsu cost money, I’m 100% sure they’d sell like hot cakes. In fact, Windows100% has successfully sold commercial VBs for years.

xuu_u · January 8, 2019, 2:05 AM

To be fair, most UTAU made by VIP@2ch don’t actually have notable quality apart from their iconic designs, and have next to no management except for a few at the top who receive occasional updates, but companies like VOICEMITH have been producing professionally recorded UTAU with commercialised designs, like Xia Yu Yao, for a while now.

The way Win100% handles distribution of their libraries is a little different as they’re usually packaged with a guide on UTAU/etc, so in theory you’re buying for the guide or the publication and the libraries come as a bonus. The most obvious example of commercial UTAU in its purest sense I can think of would be the Macne series who were usable with UTAU and distributed with oto.ini.

Apologies for going on a bit of a tangent, but my main point is that UTAU is vastly dominated by hobbyist libraries, with professionally produced libraries being in the minority.

I’ve refrained from posting in this thread because I don’t think I can present a very coherent argument either for or against. In my view, it isn’t necessarily a good idea to even consider custom libraries, on a small or large scale, until Synthesizer V has had more time to grow and more third parties have come forward intending to develop libraries for Synthesizer V, as Kanru currently intends. As I understand it, there is currently a reasonable threshold for this when a company buys in, which ensures quality of results on all sides and demonstrates appropriate funding. I’m sure that as Dreamtonics grows, they will be able to either find a way of improving efficiency in customer support or hire someone specifically to manage it, and from there this discussion will be more fruitful.

Okip12 · January 10, 2019, 4:07 PM

I think this is a good idea, but only for personal use.

tady159 · January 18, 2019, 8:41 PM

I’d like to share some ideas and thoughts that I have on this subject.
First, I’d like to state very clearly that I’m not saying “X or Y should be done like this”, my intention is just to contribute in some way to this topic.

I’m going to start talking about the users. I believe that a considerable portion of Synthesizer V userbase are already UTAU users (myself included), therefore, they are used to the possibility of being able to build their own voicebanks.
Also, the fact that Synthesizer V is being developed by the same person who created Moresampler raises users’ expectations regarding custom voicebanks.

Now, on the topic itself. I believe that allowing third-party voicebanks would be beneficial to Synthesizer V. It would help to increase the variety of voices available, and also reduce the workload on Synthesizer V’s developers, who would otherwise need to work on every single voicebank released in the future.

Regarding voicebank quality, I have an idea for separate Commercial and Community Kits, each with its own license agreement.
The Commercial Kit could be purchased, and would offer the most freedom for developing a voicebank. One purchase would be enough for a person, group or company to develop any amount of voicebanks.
However, in order to release the developed product, the creator would send the almost-finished product to the Synthesizer V team, who would check whether it meets the quality standards. A fee could be charged for this review process.
If the voicebank is rejected, it could be adjusted to meet the criteria and be sent to approval again, or it could be kept for private use. There could also be rules regarding usage of the Commercial kit for private banks.

The Community Kit would be free, but have some restrictions. Some ideas are: limiting the number of pitches and/or sub-libraries, and forbidding custom phonemes. Since it would be distributed for free, restrictions on commercial usage could be applied: voicebanks developed with this kit could not be commercialized and would instead be distributed freely in the community.

Again, those are just some ideas that I have, and may need better structuring and polishing in order to work. But I wanted to contribute in some way.

emgy805 · February 3, 2019, 12:20 AM

Yeah! I think it’d be pretty neat for people to be able to make some custom voicebanks, especially with UTAUs and whatnot. It’d be such a neat feature!

Cambionn · February 6, 2019, 12:49 AM

When I noticed Synthesizer V, it caught my attention because of the quality of its output (and then it held my attention with its native Linux support and generous licensing & cost). I’m not even a producer, I’m a software engineer who was looking around for a voice synthesizer for a little project of mine, and I was originally planning to use Vocaloid for it (which is, for now, switched out for Synthesizer V in said project). The English actually sounded really good, understandable, smooth, and pleasant to hear.

Since so many people are in favour of custom voicebanks, I also wanted to reply. To be very honest, I hope it won’t be opened up except to actual professionals (companies, etc.) and maybe the better hobbyists/freelancers, and if it is added, I hope for a really good implementation. I would rather see this software as a competitor to Vocaloid and other professional software than as a competitor to UTAU. A small company with passion and its own vision will be an interesting addition to the market. Please note that this is a personal view, and I don’t mean to tell others their opinion is wrong! It’s simply how I personally look at it, based on my own preferences, uses, experiences, and opinions.

The biggest reasons I prefer it “closed” are quality, branding, and what that will mean for me.

I generally skip songs made with UTAU because of the large number of low-quality voices and how they are used. Even if there are some higher-quality ones, there are so few of them that it feels like searching for a needle in a haystack, and that’s generally not worth it considering there are other options around. Even after finding them, the few well-made voicebanks also have to be used well, or the final product (generally, a song) will still sound bad. To be very honest, when looking up Vocaloid songs, I’m already kinda sad/annoyed at the number of poorly produced songs, often, though not exclusively, covers made with VSQ(X) files that others have provided, and when I look up Eleanor Forte on YouTube, similar cases already exist. If people are allowed to make voicebanks without professional quality control, I fear there will be thousands of low-quality releases for each good one, making it even harder to find the good ones. If it gets too bad, I might end up avoiding it like I do UTAU now.

With Vocaloid, although a lot of not-so-professional stuff is made, there is still a lot of good stuff to be found. The proportion of lower-quality to high-quality stuff is different. I hope that with time, this will also be the case for Synthesizer V as it makes a name for itself (right now, the program is too young and there is too little to really judge the proportion of professionally produced songs to clearly hobbyist projects). But to make a name, it’ll need a good reputation, which brings me to the second point.

I fear a drop in the average quality people see/hear will make the software come across as less professional and make people think it performs worse than it does. This is a problem partly because I’d like to see it used professionally, as I mentioned before, but also because I feel the software has a lot of potential, and I’d like to see what it can become as higher-end vocal synthesis software, for which it needs to be supported by more than hobbyists. And when trying to become recognised as high-end software, having a “hobbyist reputation” can really hurt.
Regarding the “most users are already UTAU users” point: the current userbase doesn’t need to be the same as the market it’s aimed at, or the userbase it’ll have in the long term, especially with software this new. I stumbled across this randomly and it caught my attention right off the bat. This can happen with more software engineers, producers, etc. I think the way it’s marketed is especially important for this: I wouldn’t have heard about it if not through a mention on a Vocaloid-related page, and when I looked into what it was about, it looked good right away. First impressions count, as do the places where it’s advertised and talked about.

It might be less nice community-wise to keep it closed, but personally I care about this kind of software because I like how it sounds in (professionally produced) music, and because of my general interest in my own work and hobby field (ICT combined with human-related topics, from very simple HR interaction, to “human” acts (like singing) performed by machines, to full-on android robotics and their humanlike AI). Most of the nice songs are made by decent producers. Some of them are hobbyists, some of them are professionals. But all of them are people who know how to produce and work with their software, and who value quality over quantity. In my own work/hobby field, quality is important as well. Communities matter a lot (especially for things made by smaller companies like this, or open-source software), but in terms of their contributions, personally I have no use for a great community where most of what’s created is low quality. I would say: let hobbyists who really have something to offer contact Dreamtonics to apply to make an official voicebank, if they can convince you they can pass your quality check (with an application fee, if needed, to ensure only serious people apply). I do love it when “small” people like hobbyists and small companies get a fair chance and aren’t automatically beaten by big companies. This is also part of the reason your generous terms of use and low cost make me happy, apart from how they affect me.

That all being said, I would be very interested in a dev kit (for example, C libraries) to use this as a background engine for synthesised vocals in other software. Right now I’m making my software launch it and send input as if it were a user, using playback for output, which, as I’m sure you can imagine, isn’t ideal. I understand if this is not going to be a thing, but it would be great from my perspective.

Again, I’d like to stress that this is a personal opinion based only on my perspective, and I understand many will not agree with it, or perhaps will even dislike me saying this. But for me, opening up the possibility of making voicebanks without proper quality control is really not a positive thing. I’d like to see this as software aimed fully at professionals. Also, note that when I say “professional” I mean it comes across as such, not that the person involved has to be a professional.

corasundae · February 6, 2019, 1:20 AM

I feel that a price barrier is enough quality control. How often do you hear genuinely bad Kontakt libraries?

Also, considering that Vocaloid continues to have a good reputation despite the existence of Arsloid, Sonika, ZOLA Project, etc., I don’t think bad voicebanks coexisting out there with good voicebanks harms the brand.

Cambionn · February 6, 2019, 2:00 AM

A price barrier can help, but it’s not a fix-all solution. It’s also not quality control; it’s a filter to get more people who are serious about it, because they have to make a commitment that affects them (and that filter isn’t perfect either, since there are always people who spend money on professional software without being able to take full advantage of its capabilities; serious people just make up a higher proportion of the userbase). When it has a price tag, piracy will certainly become a thing sooner or later, especially if the program becomes more well known and popular (which I hope it does). When anyone can buy a voicebank dev kit without restriction, it also becomes easy to get a copy to crack, after which it becomes hard to check whether all the voicebanks around have been made legally (probably not, but then, which are and which aren’t?), especially for a small company like Dreamtonics. Just look at how big Pocaloid and all similar stuff is, and how little it takes for them to release a cracked version of even the most limited data, although this applies to any software.

There is far more decent synthesizer software for instruments than for vocals, in any price range, and it’s also easier to make instruments sound good than vocals. Odd as it may sound, it’s not really comparable software in this respect.

The issue is not a few bad, rarely used voicebanks, as with Vocaloid. It’s many bad voicebanks compared to good ones, along with lots of easy-to-find non-professional work compared to professional work, as with UTAU. It’ll put professional and commercial users off, whether or not that’s justified. Besides that, Vocaloid 5 seems to be having a surprisingly hard time in the commercial & professional market compared to previous releases. I’m not saying it’s entirely a mess, but I hear an awful lot of complaints and see far fewer high-quality produced songs. Which honestly makes me sad, because it has some great features and there are some voicebanks I’m really looking forward to hearing more from.

Synthesizer V is already really cheap, asking only 79 USD for a permanent licence. A one-year licence for just one IDE generally costs more already, and I’ve seen commercial software go up to thousands. The low price and the non-commercial trial are great for keeping the software usable for community use. It’s hard to support both quality and community development, and they are obviously already in favour of the community the way things are. I would say that keeping voicebank creation closed, except for people who personally reach out to Dreamtonics and are willing to go through an official process similar to what bigger companies do, would be the best way to secure quality at this moment and give them a chance to grow in markets other than community-based ones. I also think the fact that it sounds so much better than UTAU will already help them a lot, and that they don’t need hundreds of voicebanks as long as the ones they have are good. Perhaps in the future things might change and voicebank dev kits would be an option, but not now or anytime soon.

Anyway, I was just giving my two cents. I’m not exactly in the hobbyist part of the market (well, I program as a hobby as well, but I still look for professional software whenever possible when I want to incorporate it), and obviously the hobby & community-driven group exists too and has very different needs and opinions on this than I do. I don’t mean it negatively or to say other people’s opinions are wrong; I’m really just telling my point of view.

Run_djp_Run · February 6, 2019, 10:49 AM

Users should buy Synthesizer V instead of abusing the unlimited trial version; this could help Kanru develop Dreamtonics.
Could a donation system be set up so that users could donate money (5, 10, 15, 20 euros, dollars, etc.)?

Cambionn · February 6, 2019, 12:36 PM

A bit off-topic, but I agree they shouldn’t abuse it. Previously, I was just saying that the current rules are quite nice and favourable to hobbyists and other small groups who (apart from some) don’t pay big bucks. Whatever people should or shouldn’t do, if you give people the option to use it for free, a decent-sized group will use it that way. Just look at WinRAR, for example. But personally, I’m a big fan of paying developers for their work and using trials only to try stuff out.

Run_djp_Run · February 6, 2019, 12:43 PM

Oops, my English is very bad, sorry.

Sorry, it’s off-topic, but let’s think about it.

xuu_u · February 6, 2019, 1:36 PM

Personally, a price barrier and the developer (Kanru Hua) checking over DBs and ensuring a minimum quality threshold would be enough, though I have a feeling this may already be in place for interested third parties. It’s just a matter of interested parties applying now, no?

Villano · May 5, 2019, 2:40 PM

For me, custom content is a big thing nowadays.
It would be cool to have voicebanks and plugins made by the community; it would help show what’s lacking in SynthV, which types of voices are favorites, and which new settings please the majority of customers.
Of course, there would need to be some kind of barrier or perks separating paid and free content to make paid content more appealing, because there are certainly good-quality VBs that are free for everyone.

First things first
The VB creation software doesn’t necessarily need to be paid. Free VBs could be made with a free version of the software that limits how much material can go into a bank, e.g. 1 to 5 .s5d files (similar to Renri’s voicebank), while paid voicebanks would use a paid version where the creator can go wild.
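The free/paid split suggested here reduces to a cap on how much material a free-tier bank may contain. A minimal Python sketch, assuming the cap is counted in .s5d files as described; the tier names, the cap of 5, and `check_bank` are hypothetical and do not reflect any actual Dreamtonics tooling:

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical caps on the number of .s5d files per tier; None means uncapped.
MAX_S5D_FILES = {"free": 5, "paid": None}


@dataclass
class CheckResult:
    ok: bool
    message: str


def count_s5d(files):
    """Count file names whose extension is .s5d (case-insensitive)."""
    return sum(1 for name in files if PurePath(name).suffix.lower() == ".s5d")


def check_bank(files, tier):
    """Accept or reject a bank's file list against its tier's cap (hypothetical rule)."""
    cap = MAX_S5D_FILES[tier]
    count = count_s5d(files)
    if cap is not None and count > cap:
        return CheckResult(False, f"{tier} tier allows at most {cap} .s5d files, found {count}")
    return CheckResult(True, f"{count} .s5d file(s) accepted for the {tier} tier")


if __name__ == "__main__":
    # Seven sample files plus a readme: too many for the free tier, fine for paid.
    bank = [f"sample_{i}.s5d" for i in range(7)] + ["readme.txt"]
    print(check_bank(bank, "free").message)
    print(check_bank(bank, "paid").message)
```

Keeping the caps in a single table means that changing a limit, or adding a further tier, is a data change rather than a code change.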

For plugins, there’s no need for mystery. I think an open API and a guide could do the trick.

For quality checks, we could use a website where anyone can register their plugins and VBs, both paid and free. Paid content would have a special section for everybody to see.
The website would just list the name of the content, a place to put the link to buy or download it, and a description that can contain text, video, images, and SoundCloud links.
People could leave star ratings and comments about what they like or dislike about the product, and they would need to be registered to rate and comment.