There is something about digital singers

There is something about digital singers: unless I know what the words are, they are difficult to understand. So many digital singer videos display the lyrics, but without them there is just something missing that is present in flesh and blood. If I knew what that was, I would write my own synth.



I’ll expound on this a bit more.

If the lyrics are shown, then yeah, it’s obvious what Forte, for example, is singing. But if not, something is missing and the words are not obvious at all. There was a cover posted yesterday where only the chorus hook/title was displayed. The song was “You Won’t See Me Anymore” or something like that. I was listening and trying to make out the verses but could not.

I wonder if those of you who have listened to my song “Password to My Heart” can tell what is being sung in the verses. I can, because I know the song.

I think you should pay attention more to real singers. They’re rarely perfectly understandable. That’s why we have mondegreens.

Taylor Swift is a human singer, and yet in Blank Space, everyone thinks she’s saying “all the lonely Starbucks lovers” instead of “got a long list of ex-lovers.” This is probably because of the weird cadence placing unnatural emphasis on “of.” Western Vocaloid producers tend to ignore cadence when writing lyrics. If all the words ARE emPHAsized weirdLY, that makes it difficult to understand.

Another major contributor to whether lyrics are comprehensible is predictability. Early Fall Out Boy songs are known for their hard-to-make-out lyrics. If you listen to their song Lake Effect Kid (a new studio version of a song they wrote a long, long time ago) it’s almost impossible to understand because the lyrics are unusual and every line is about a completely new subject. Western Vocaloid producers don’t usually write about typical subjects (love, dancing, etc.) so it’s difficult to predict what will be said.

This is actually one of my issues with vocal synths. There’s too much of an emphasis on pronunciation being understandable over it being natural for a singer.


I’ve actually found that Eleanor works quite well when using the more natural pronunciation you’d hear from a singer. A typical example, found in many songs, of something being pronounced differently from how it’s written is:

“up and I”

Putting those lyrics straight into SynthV will result in something best described as passable, but stiff and unnatural sounding. What tends to work a lot better is something that sounds more like:

“uh pan die”.

As Corasundae suggests, we’re a lot more used to hearing this kind of pronunciation with actual singers and are therefore more likely to recognize the same thing coming from a virtual singer.

One more notable example that comes to mind is “I’m”. Most of the time you’ll find that what you’re actually looking for is “um”.

Correct me if I’m wrong, but I think those two are phonetically the same in Arpabet, because it doesn’t distinguish between aspirated and unaspirated versions of phonemes. So if anything were different there, it’d be specifically with the phoneme timing, though that can make a difference sometimes.
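To illustrate the point about the two spellings being phonetically the same, here is a rough Python sketch. The transcriptions in `ARPABET` are hand-written assumptions for these six words (not taken from SynthV’s own dictionary), and `flatten` is a hypothetical helper. Once word boundaries are dropped, both lyrics give the same phoneme sequence, so any audible difference would have to come from how the phonemes are split across notes and timed.

```python
# Hand-written Arpabet transcriptions for illustration only (assumed, not
# pulled from SynthV's dictionary).
ARPABET = {
    "up": ["ah", "p"],
    "and": ["ae", "n", "d"],
    "i": ["ay"],
    "uh": ["ah"],
    "pan": ["p", "ae", "n"],
    "die": ["d", "ay"],
}


def flatten(lyric):
    """Return the phonemes of a lyric as one flat list, ignoring word boundaries."""
    return [p for word in lyric.lower().split() for p in ARPABET[word]]


original = flatten("up and I")
respelled = flatten("uh pan die")
same = original == respelled  # identical phoneme sequences
```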

I agree that it’s possible overly perfect pronunciation is throwing people off. I honestly feel like this should be considered a problem on the recording end, even if the user can mitigate the issues a lot of the time in the way you suggest (i.e., ideally the voicer would’ve put more effort into “singing” the words as if they were lyrics rather than reading them as if they were a script).

Maybe it’s because I’m American, but for “up and I,” I’d probably put either “up n I” or “.ah p/.ax n/.ay”. Other than that, I generally agree on how to make pronunciation more natural.

Of course, there is the variable of rhythm too. The rhythm and speed of the phrase mostly dictate what sort of accent/slurs you’re going for. In my case, for the song I’m working on, the “up and I” sounds quite different when pronounced as “uh pan die” and fits in much more naturally.


Usually when I’m tuning in SynthV, I adjust the phonemes and move ending consonants to the start of the next note if it begins with a vowel. For some reason it makes it sound a lot more natural, I guess because it’s how you’d sing it anyway? .aa .p ax n .dx ay would probably be my go-to personally. Messing up pronunciation a bit is natural, and if everything is pronounced exactly, it sounds unnatural when the notes are also exact.
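For anyone who wants to see the consonant-shifting trick spelled out, here is a minimal Python sketch. It is not a SynthV script and does not touch any SynthV API: each note is just a list of lowercase Arpabet phonemes, and `shift_final_consonants` is a hypothetical helper. It assumes only the last consonant of a note moves, and only when the note keeps at least one vowel and the next note starts with a vowel.

```python
# Arpabet vowel symbols, lowercase (an assumed list for this sketch).
VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
    "ih", "iy", "ow", "oy", "uh", "uw",
}


def is_vowel(phoneme):
    return phoneme in VOWELS


def shift_final_consonants(notes):
    """Move a note's final consonant onto the next note if that note starts with a vowel.

    `notes` is a list of phoneme lists, one per note. The input is not modified.
    """
    result = [list(note) for note in notes]
    for i in range(len(result) - 1):
        cur, nxt = result[i], result[i + 1]
        if not cur or not nxt:
            continue
        ends_in_consonant = not is_vowel(cur[-1])
        keeps_a_vowel = any(is_vowel(p) for p in cur[:-1])
        if ends_in_consonant and keeps_a_vowel and is_vowel(nxt[0]):
            nxt.insert(0, cur.pop())
    return result


# "up and I", one word per note
notes = [["ah", "p"], ["ae", "n", "d"], ["ay"]]
shifted = shift_final_consonants(notes)
# shifted is [["ah"], ["p", "ae", "n"], ["d", "ay"]], i.e. "uh pan die"
```

Swapping individual phonemes afterwards (a flap `dx` for `d`, or a reduced `ax` for `ae`) is a separate, by-ear decision that this sketch does not attempt.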



OMG! I just tried this on a problem pair of words where Forte sounded too strident. You know, like you are playing her words with one finger on the piano. Anyway that fixed it. It turned “The - Summer” into “The summer” nice and smooth like!

Now, if that were not so tedious. :grin:


Glad it worked! It’s a shame that it’s so tedious, though I feel it’s still better to have constant clear enunciation with the room for it to be edited like this than for it to be potentially unclear in the first place.

I feel like it’s easier to edit pronunciation to be more clear than it is to edit it to be more natural. Improving clarity usually comes down to adding extra phonemes where needed (e.g., if you don’t think she says the r in [.er] well enough, you can just try [.er r]). But when it comes to sounding natural, I can’t use anything but the phonemes that are already there.


Been mixing my second song with Ms. Forte.
I high-passed her pretty steeply and then, using a high shelf, raised the top end from 3 kHz up quite a bit. This brought out the “ess” sounds better and makes her more intelligible.
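For the curious, that EQ move can be sketched outside a DAW in plain Python. This is only an illustration using the widely circulated “Audio EQ Cookbook” biquad formulas, not whatever plugin was used on the mix. The 48 kHz sample rate, the 120 Hz high-pass cutoff, and the +6 dB shelf gain are all assumptions; only the 3 kHz shelf frequency comes from the post. A single biquad high-pass is also only 12 dB/octave, so a really steep cut would need several in series.

```python
import cmath
import math


def highpass(fs, f0, q=1 / math.sqrt(2)):
    """Second-order high-pass biquad; returns (b, a) with a[0] normalized to 1."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def high_shelf(fs, f0, gain_db, slope=1.0):
    """High-shelf biquad boosting (or cutting) everything above roughly f0."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((amp + 1 / amp) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(amp) * alpha
    a0 = (amp + 1) - (amp - 1) * cos_w0 + k
    b = [
        amp * ((amp + 1) + (amp - 1) * cos_w0 + k) / a0,
        -2 * amp * ((amp - 1) + (amp + 1) * cos_w0) / a0,
        amp * ((amp + 1) + (amp - 1) * cos_w0 - k) / a0,
    ]
    a = [1.0, 2 * ((amp - 1) - (amp + 1) * cos_w0) / a0,
         ((amp + 1) - (amp - 1) * cos_w0 - k) / a0]
    return b, a


def response_db(coeffs, fs, freq):
    """Magnitude of the filter's frequency response at `freq`, in dB."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(max(abs(h), 1e-12))


def apply_biquad(coeffs, samples):
    """Run samples through the filter (Direct Form I)."""
    b, a = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


FS = 48000                        # assumed sample rate
hp = highpass(FS, 120.0)          # assumed cutoff
shelf = high_shelf(FS, 3000.0, 6.0)  # 3 kHz from the post, +6 dB assumed

# To process a vocal: apply_biquad(shelf, apply_biquad(hp, samples))
```

Calling `response_db(shelf, FS, freq)` at a few frequencies is a quick way to check that the low end is untouched and the top end is lifted before committing to a setting.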

In other news:

I wish vocal synths would come up with a way to make the melody envelopes more legato or “sing-songy” and less stair-stepped. Someone had asked if there was a way to have them sing it first and then have SynthV follow that lead. - Someday, someday.
