I wanted to create a British accent for Solaria. I spent a long time with different phonetic spellings and parameter changes, but a couple of essential vowel sounds are missing. Would it be possible to extend the range of vowel sounds for voices so that more customisation for accents / languages is possible? I think this would also be a good way to help users make their voices sound unique.
Out of curiosity, which phonemes are missing?
In hindsight, ‘missing’ probably isn’t the word I wanted. Often, a version of the phoneme exists, but doesn’t work well for a British accent. A couple of common examples just using Solaria:
The ‘oo’ sound as in ‘true’, ‘you’, etc. The uw phoneme isn’t flat enough and has an accented tail to it. It can almost work by cutting the duration to the minimum, or by splitting words, but doing so can introduce problems in the context of phrases.
‘O’ as in oar / or / on / divorce etc. This can be approximated in a few cases with ‘ow ah’, but again severely cutting the duration and strength can make the phrase sound unnatural.
The heavily accented ‘er’ – sometimes a short / quiet ‘ah’ can work instead, but not reliably.
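For anyone experimenting along the same lines, the substitutions above can be collected into a small lookup table. This is only a sketch of the workarounds described in this post; the mappings are approximations, not an official SynthV phoneme chart:

```python
# Rough ARPABET-style approximations for British vowel sounds, based on
# the workarounds described above. Each entry maps a target sound to a
# candidate phoneme sequence plus a note on how to tune it.
BRITISH_VOWEL_APPROX = {
    "oo (true, you)": (["uw"], "cut duration to minimum to flatten the tail"),
    "o (oar, or, divorce)": (["ow", "ah"], "severely cut duration and strength"),
    "er (heavily accented)": (["ah"], "keep it short and quiet; not reliable"),
}

def candidates(target: str):
    """Return (phoneme sequence, tuning note) for a target sound, if known."""
    return BRITISH_VOWEL_APPROX.get(target)
```

A table like this is mainly useful as a personal cheat sheet while experimenting, since which approximation works depends heavily on the voice and the surrounding phrase.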
This being said, I love Synth V and it’s not a deal breaker.
Ninezero is the first example of an English voice database that isn’t based on an American-English voice provider (in this case Australian).
He was only just released, but it’ll be interesting to see if the phonemes actually reflect that accent. Arpabet’s phoneme symbols were designed for American English, but it’s possible that the existing symbols will sound somewhat different with Ninezero than with the existing English voices despite using the same notation.
It might be possible to have some extension of cross-lingual synthesis that allows selection of alternate accents or dialects, but considering cross-lingual synthesis already results in a noticeable accent in many situations I also wonder how feasible that is.
Yeah, it seems to me that’s the strength and weakness of SynthV. By not trying to use generic phonemes, it does a better job capturing the sound of the singer, but that includes the dialect.
Which is great… until you’re trying to get a specific sound that the singer wouldn’t normally make. As you’ve said, you can blend together vowels, but that’s really not the same.
As claire pointed out, this is especially problematic in cross-lingual synthesis, where a voice might not have a sound at all, such as /ey/. In those cases, sometimes the best you can do is find an approximate vowel, make it short, and hope the listener mentally fills in the missing portion.
I posted this feature request mainly to flag my interest…in case such a thing becomes feasible.
In the meantime, one option I’ve yet to try is stretching / minutely splicing an exported audio file. This might be a way to cut or greatly minimise unwanted parts of a phoneme. It’s an interesting challenge.
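As a rough sketch of that approach, Python’s standard wave module can cut a span of samples out of an exported WAV file. The file paths below are placeholders, this isn’t tied to any SynthV export format, and a real edit would also need crossfades at the cut points to avoid clicks:

```python
import wave

def splice_out(src_path: str, dst_path: str, start_s: float, end_s: float) -> None:
    """Copy a WAV file, removing the samples between start_s and end_s (seconds).

    A crude way to trim an unwanted portion of a phoneme from exported audio.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
        # Bytes per frame = sample width * channel count.
        frame_bytes = src.getsampwidth() * src.getnchannels()
    a = int(start_s * rate) * frame_bytes
    b = int(end_s * rate) * frame_bytes
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the wave module fixes the frame count on close
        dst.writeframes(frames[:a] + frames[b:])
```

Stretching is a harder problem than cutting, since naive resampling changes pitch; for that, a dedicated audio editor with time-stretching is probably the more practical route.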
The old Software Automatic Mouth (SAM) formant synthesizer also used the Arpabet.
Internally, each phoneme held only a single value, so complex phonemes like diphthongs, which have multiple parts, were implemented behind the scenes using “extra” phonemes. There were also some extra phonemes that served as softer versions of vowels:
- /yx/: diphthong ending used in /ey/, /ay/, /oy/
- /wx/: diphthong ending used in /aw/, /ow/, /uw/
- /ux/: first half of /uw/ (i.e. /uw/ = /ux wx/)
- /oh/: first half of /ow/ (i.e. /ow/ = /oh wx/)
- /ax/: softer version of /ae/
- /ix/: softer version of /ih/
These would mostly be useful for cross-lingual synthesis, as a number of voices seem to drop the second half of a diphthong and this would give a way to force it back on. The Phoneme Strength option already gives something akin to the “softer” versions of phonemes.
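The decomposition above can be written out as a small table. Only the /uw/ → /ux wx/ and /ow/ → /oh wx/ pairings are stated in the list; treating every other phoneme as indivisible is an assumption for illustration:

```python
# SAM-style diphthong decomposition, per the list above: each diphthong
# maps to the single-valued internal phonemes that make it up.
DIPHTHONG_PARTS = {
    "uw": ("ux", "wx"),  # first half /ux/, soft /wx/ ending
    "ow": ("oh", "wx"),  # first half /oh/, soft /wx/ ending
}

def decompose(phoneme: str):
    """Split a diphthong into its internal parts; other phonemes pass through."""
    return DIPHTHONG_PARTS.get(phoneme, (phoneme,))
```

Selecting the second element of such a pair is essentially the “force the diphthong ending back on” trick described above.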
Anyway, the ability to mix-and-match is sometimes helpful in trying to build or remove an accent.
You might try just spelling the word in question wrong, or how you want it to sound. That’s how I do it. For instance, the words “Whiffle Balls” were sung by Kevin as written. I wanted a more Black or Southern “Wiffo Bowz”, so that’s what the lyric is. Works a treat, and it’s easier for me than phoneme charts.
Yes, the resulting lyrics are wrong, but that’s alright.
Hi Northwave, I know this doesn’t answer your question, but to give hope to your quest: there is a person on YouTube who is getting Asterian and Qing Su to sing in Spanish, and not just Spanish, but ethnic Spanish with all the idiosyncrasies of those dialects. Don’t ask me how he’s doing it, but he must be creating his own phonemes.
Creating new phonemes would require additional development of the voice database, and is not possible by users.
Voice databases can only produce a finite set of sounds, based on the phonemes available through cross-lingual synthesis. This means their “native” language and, for AI voices, the two other supported languages, with some inaccuracies or accented sounds.
Anyone making songs outside of the supported languages is doing so by manually selecting similar phonemes from the available sets and adjusting the timings or parameters to get as close to the unsupported language as they can.
He must have a lot of time on his hands and excellent ears because he has done pretty amazing work considering the voices only support 3 languages. Thanks for the explanation.
@bitman yes, I use phonetic spelling a lot, both with Synth V and others. The difficulty I’ve had is that the sound I’m looking for doesn’t currently exist in a pure form. But the upside of phonetic spelling is that you can still add a little individual character to the voice.
@ViolinMusic It’s amazing and inspiring what some users have managed to do. As mentioned further up the thread, when time allows I’m going to try splicing the output audio files. For my purposes, it might be possible to stretch a word and then do some micro-surgery editing to get closer to the sound I want.
Thanks for the replies, and happy weekend, everyone.