Can we SynthV people be at the forefront of this? Now’s the time!
In music, “tuning” almost always refers to the intonation of an instrument, i.e. making it “in tune”, like tuning guitar or violin strings.
In digital music, “tuning” refers to the same thing, but with more emphasis on post-production, i.e. using Autotune or Melodyne to “tune” a vocal track after it’s been recorded.
So to see people talk about “tuning” their SynthV/UTAU…it’s just confusing and honestly, inaccurate. It took me a few minutes wondering…“Why the heck would the new SynthV have an autotune feature?” before realizing what was going on.
I’d suggest “editing” or “tweaking”.
But, editing and tweaking is a bit vague, no?
Maybe, but I don’t think “vague” is necessarily confusing. Vocaloid “tuning” is a vague concept, right? It includes so many parameters. When talking to other musicians, “tuning” usually means only one thing: pitch accuracy. So using “tuning” to mean, say, “consonant placement” or “rhythmic timing”…that’s confusing.
I think in the computer audio world, “editing” generally covers a range of similar things.
You have a valid point, and I had the same thought when I encountered the term. “Tuned? Is there a problem with Vocaloids™ being out of tune?”
But it’s likely a shortened version of “fine tuned”, rather than referring solely to pitch.
And even “tuning” vocals with Autotune or Melodyne often involves more than simple pitch correction - there’s crafting of the performance, more akin to “tuning” synthetic vocals with Synthesizer V.
So it’s akin to the term “programming” applied to drums, or “hack” applied to… well, everything that’s not a hack. There may be better terms, but the community has adopted this one, and imbued it with its own meaning.
Yep, I assumed it was short for “fine tuning”!
You’re right - Melodyne can involve more steps. But if you used Melodyne to tame vibrato and fix the timing of a real voice, but left it out of tune, you would definitely not say “I tuned these vocals”. “Tuning” still means correcting pitch. The rest is “editing”.
I think “programming” drums mostly refers to the note input, right? Further work might be “humanizing”? Which isn’t a great solution for SynthV, since often the goal is NOT to sound human.
Your last point is actually my main concern! Usually with this kind of topic I’d say “yeah that word just means something different to that community, no problem.” But in this case, the two “worlds” of Synth users and all other musicians are bound to intersect. At least I hope they do…I WANT to play bass or guitar in a band with a SynthV singer!
I do like “editing”! I would perhaps even suggest “training”, as the Japanese communities refer to it.
“Tuning” can also imply that the vocal synths are instruments rather than actual people (which they are), but it can also detract from the life that we as a community give these personified instruments.
I believe that “training” can imply that the people who are working with these vocal synths are sitting in the room with them, practicing and teaching them how to better sing the songs they’re working on.
This is, in fact, the correct term, and it’s what happens during the creation of the voicebank. Neural networks are a system of equations used to transform input data into output data (in this case, typed language into the sound of that language sung). During training, a set of inputs with known outputs (recordings of the singer used as the basis for the voicebank) is submitted to the system thousands of times, and each time the parameters of the equations are tweaked until the system is able to reproduce the expected output as closely as possible.
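To make that tweak-and-compare cycle concrete, here’s a toy sketch of the idea (not SynthV’s actual code, and a real voicebank has millions of parameters, not one): a single-parameter model is repeatedly nudged toward known input/output pairs via gradient descent.

```python
# Toy illustration of training: fit the one parameter w so that
# w * x reproduces the known outputs (here the targets follow y = 2x).
inputs = [1.0, 2.0, 3.0, 4.0]   # stand-ins for the known inputs
targets = [2.0, 4.0, 6.0, 8.0]  # stand-ins for the known outputs

w = 0.0    # the single tweakable parameter
lr = 0.01  # how much to tweak per example

for _ in range(1000):            # "submitted to the system thousands of times"
    for x, y in zip(inputs, targets):
        pred = w * x             # current guess at the output
        grad = 2 * (pred - y) * x  # direction that reduces the squared error
        w -= lr * grad           # tweak the parameter slightly

print(round(w, 3))  # prints 2.0 — the system now reproduces the expected output
```

The same loop, scaled up to enormous equation systems and audio data, is what “training” a voicebank refers to.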
The problem with this is that none of the major singing programs - Vocaloid, UTAU, or SynthV - actually uses neural networks for synthesis. They all use some form of concatenative synthesis.
SynthV uses parameterized speech, which means that instead of storing audio samples, it stores parameters which are passed to the resynthesis engine. The process of reconstructing singing requires pitch- and time-stretching the original audio. Because SynthV is able to perform resynthesis at a lower level of detail, the result has fewer artifacts than other programs.
SynthV apparently does use neural networks to label speech as well as analyze and apply styles in “Auto Tuning”.
But it’s got nothing to do with the synthesis of the voice itself.
“Tuned? Is there a problem with Vocaloids™ being out of tune?”
Since vocal synths are trying to emulate humans, they often do generate sounds that are either out of tune or otherwise not adherent to the MIDI notes. It’s more noticeable with some voicebanks than others, and is less of an issue in SynthV than in Vocaloid. Many vocal synth producers apply some subtle autotune or MIDI tuning (e.g. GSnap or Pitcher) to account for this.
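The core idea behind that corrective autotune step can be sketched in a few lines (a hypothetical illustration only - plugins like GSnap or Pitcher add pitch detection, scale selection, and smoothing on top of this): snap a detected frequency to the nearest equal-tempered semitone.

```python
import math

def snap_to_semitone(freq_hz, a4=440.0):
    """Quantize a frequency to the nearest equal-tempered semitone."""
    midi = 69 + 12 * math.log2(freq_hz / a4)  # frequency -> MIDI note number
    nearest = round(midi)                     # snap to the nearest semitone
    return a4 * 2 ** ((nearest - 69) / 12)    # MIDI note number -> frequency

# A note sung slightly flat of A4 gets pulled up to exactly 440 Hz:
print(round(snap_to_semitone(436.0), 1))  # prints 440.0
```

A “subtle” correction would blend the snapped frequency with the original rather than replacing it outright, which is roughly what the strength/amount knob on such plugins controls.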
As for whether “tuning” is a representative term for the process of customizing a voicebank for a specific song… perhaps “characterizing” is a good alternative. Whatever the producer’s goal is, modern tuning tends to describe the process of adding character to a synthesized vocal, whether that be to humanize it or lean into a more robotic/synth sound.
Since I like playing with accepted terms, I prefer to think of myself as the VB’s vocal coach. ;D