AI Retakes: Timbre

Luzula · April 25, 2023, 3:31 PM

Something I haven’t seen discussed a lot is the Timbre Retakes function. I’ve been using it for some time now, but more often than not I fail to hear an actual difference. What is it supposed to do? The name would suggest it has to do with timbre, but I’ve also seen it mentioned a few times that it affects dynamics.

claire · April 25, 2023, 3:46 PM

Timbre is a relatively vague and broad term, but in this context it basically means the qualities of the sound that are not the pitch. Sometimes this will affect dynamics, enunciation, tone color, and various other aspects.

In the official demonstration (skip to 2:20) it was used to remove vocal fry (glottal sounds), smooth a transition (dynamics), and adjust an ending “s” sound (enunciation).

I think that dynamics got most of the focus when the feature was introduced because at the time the HDVM feature was unstable, and reducing the Timbre Retakes expressiveness slider for the entire track was the main way to address inconsistent dynamics. That’s no longer relevant as of 1.7.1 though.

That section in the manual seems like it could do with a revision, so I’ll add that to the “to-do” list.

Luzula · April 25, 2023, 7:06 PM

Brilliant – thank you, Claire.

Am I right in presuming that the effects of Timbre Retakes are somewhat hard to predict, and that the outcome will vary depending on the voice database? For example, I’ve never managed to eliminate vocal fry from Solaria by way of Timbre Retakes.

claire · April 25, 2023, 7:31 PM

Yes, there is some randomness involved in the synthesis engine, and all retakes do is generate a new random result.

If we use a simplified analogy, imagine rolling a 6-sided die where option 1 is a result with no vocal fry, and options 2-6 result in vocal fry. By default, SynthV will generate a result based on the note’s context (the pitch of this note, whether it’s higher or lower than its neighbors, etc.).

That list of potential results will be different for each situation and with each voice database, depending on which biases are present in the original dataset and analysis. SOLARIA’s voice provider tends to use vocal fry, and therefore SOLARIA does as well when the engine “recognizes” similar situations to those original recordings.

Using AI Retakes is just like rerolling the dice, but the possible outcomes don’t change. If the chance of a desirable result is 1 in 6 like in the example above, it will be just as unpredictable as rolling a standard die and hoping to get a specific number.
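To put a rough number on that, here is a minimal Python sketch of the analogy only. It says nothing about how the synthesis engine is actually implemented; the function name `retakes_until_success` and the fixed 1-in-6 chance are hypothetical, taken from the die example above.

```python
import random

def retakes_until_success(p_success, rng):
    """Count rerolls until the desired outcome appears.

    p_success is the (assumed, fixed) chance that one retake is desirable.
    """
    attempts = 1
    while rng.random() >= p_success:  # this roll missed, so roll again
        attempts += 1
    return attempts

rng = random.Random(0)   # seeded so the experiment is repeatable
p = 1 / 6                # hypothetical: only 1 face of the die is "no vocal fry"

# Repeat the experiment many times and average the number of retakes needed.
trials = [retakes_until_success(p, rng) for _ in range(10_000)]
average = sum(trials) / len(trials)
print(f"average retakes needed: {average:.2f}")
```

On average this lands close to 6 retakes, but any individual attempt can take far fewer or far more, which is why generating retakes can feel so unpredictable.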

By contrast, Vocal Modes actually change the possible results of that randomness by changing the bias. Maybe by changing a vocal mode you can increase the probability that a “dice roll” will result in the desired behavior (perhaps now it’s both 1 and 2 instead of only a roll of 1), because you’re giving certain behaviors more or less “weight”.

Of course, in reality we don’t have only 6 results; we have millions of subtly different ones. It’s still the same idea though: as a matter of probability, there are a number of factors that will make your desired result more or less likely (effectively changing the list of possibilities), but generating more AI Retakes is just rerolling the dice on that same list of possibilities.

So those situations where you can’t get SOLARIA to not use vocal fry are probably very similar to situations where the original voice provider would use vocal fry, and therefore your AI Retakes might only have a small percent chance of landing on a desirable result – effectively requiring more dice rolls.

And then of course, since the AI-generated behaviors are context-specific, changing the note or its neighbors in some way risks changing that underlying set of possibilities. So even though your “dice roll” is locked in as a retake, it might no longer give the same result. For example, if the “no vocal fry” option is now on a roll of 5 instead of 1, your “1” is still locked in, but it no longer means the same thing because the available options have changed.
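Here is a toy Python sketch of that last point. It is purely an illustration of the analogy: the `render` function, the outcome tables, and the idea of a retake being stored as a single number are all hypothetical and are not a description of how SynthV stores or applies retakes.

```python
def render(locked_roll, outcomes):
    """Look up what a locked-in roll produces for a given context's outcome table."""
    return outcomes[locked_roll - 1]  # die faces are 1-based, lists are 0-based

# Hypothetical outcome tables for the same note before and after editing its neighbors.
original_context = ["clean", "fry", "fry", "fry", "fry", "fry"]
edited_context = ["fry", "fry", "fry", "fry", "clean", "fry"]

locked_roll = 1  # the retake that was kept because it sounded clean

before = render(locked_roll, original_context)
after = render(locked_roll, edited_context)
print(before, after)
```

The roll itself never changed, but in the edited context the same roll now maps to a result with vocal fry, so a new round of retakes would be needed.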

Luzula · April 26, 2023, 2:30 PM

Wow. I’m speechless. I have no speech! Thank you very much for this answer. Your dice analogy is brilliant; it all makes perfect sense.

Why not include the above in the Unofficial Manual’s AI Retakes section? I suspect I’m not the only one who’s been a bit confused by AI Retakes, so I think it would be helpful to elaborate, the way you’ve done above, on what it does.