I tried reducing stress and tension, and also increasing the soft and clear vocal modes, but I still get sibilance with Kevin, on S and Z and even on P, K, T, and D (I don’t know if it’s called sibilance for those or something else).
The best way I’ve found is to reduce the duration of the sibilant consonant, but most of the time I don’t like how it’s pronounced then.
Strangely, render quality affects sibilance negatively: “Prefer Quality” is more sibilant than “Prefer Speed”.
I haven’t experimented with rendering and processing the aspiration separately. Does it work? What is your workflow?
The simplest answer is to export the aspirant sounds separately, then recombine things after you’ve adjusted the levels of each component in isolation.
Don’t forget to use a de-esser just like you would with any other vocal recording. Of course, targeting the correct frequencies is much simpler when the aspirant sounds are isolated.
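If anyone’s curious what a de-esser actually does under the hood, here’s a rough NumPy/SciPy sketch of the core idea: isolate the sibilant band, track its level, and turn it down only while it’s too loud. The 5–9 kHz band, the threshold, and the ratio are toy values I picked, and I’m using offline zero-phase filtering for simplicity; a real de-esser works causally and sounds much better.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def simple_deesser(audio, sr, band=(5000.0, 9000.0), threshold=0.05, ratio=4.0):
    """Turn the sibilant band down only while it exceeds a threshold."""
    # Isolate the sibilant band (roughly where harsh 's'/'z' energy lives).
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    sib = sosfiltfilt(sos, audio)  # zero-phase, so (audio - sib) splits cleanly
    # Short (~5 ms) RMS envelope of the sibilant band.
    win = max(1, int(sr * 0.005))
    env = np.sqrt(np.convolve(sib ** 2, np.ones(win) / win, mode="same"))
    # Compressor-style gain reduction wherever the envelope is over threshold.
    gain = np.ones_like(env)
    hot = env > threshold
    gain[hot] = (threshold / env[hot]) ** (1.0 - 1.0 / ratio)
    # Attenuate only the sibilant band; leave everything else untouched.
    return (audio - sib) + sib * gain

# Toy signal: a quiet 200 Hz "voice" plus a loud 6 kHz "sss".
sr = 44100
t = np.arange(sr) / sr
voice = 0.1 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
tamed = simple_deesser(voice, sr)
```

The band-split is also why de-essing is easier when the aspirated sounds are exported separately: you already know where the offending energy is.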
The other sounds you’re referring to are mostly plosives. They tend to be less problematic than sibilance, but you can tackle them with small dips in the Loudness parameter, as well as any of the usual techniques you’d apply to human vocals.
You could always try higher sample rates when saving, and/or leaving a small gap between the words where the sibilance appears. Sometimes I have to use Sound Forge with the smooth tool on the ‘sssss’ bits, and RX10’s de-click helps as well.
A few further thoughts on this. I find that Harrison AVA Vocal Flow has one of the best de-essers, if you can get it cheap enough. It might also be worth running the vocal track (MAKE BACKUPS) through a de-clicker.
“Also I’m not sure how it will help to reduce sibilance.”
Because 16-bit is digitized sound, it will tend to be at more risk of having extra digital crackle in it if the ‘sssss’ is causing clipping.
I use the Reaper DAW and, from various production lessons, have got into the habit of using 32-bit when exporting final mixdowns; I also do that for important stuff like vocals. My understanding, which you might want to confirm with an audio engineer in case I’ve got it wrong, is that 32-bit float allows clipping to be better controlled because it’s such a higher digital quality than 16-bit. This also affects the file sizes, obviously.
I am finding Jun is very sibilant when set to gentle and breathy, and I’m hoping to find a solution; aspirations might be it. After that I use the Sibilance VST plugin from Waves, which is paid but a good all-rounder, and if that’s not quite getting it I use ToneBoosters’ free Sibilance VST plugin, which is amazing and can dampen exactly the right frequencies with a bit of tweaking. (These are called de-essers or sibilance controllers when you’re looking for them, but they are pretty much essential for most vocals, and I guess synth vox are no different.)
In both the sample rate and bit depth cases, they’re not particularly relevant to sibilance. If you have clipping issues, then the clipping is the problem to fix first, not the sibilance.
If you then still have sibilance issues after fixing the clipping, then bit depth is no longer helpful, because your final track will be 16-bit on most streaming services anyway. Using a higher quality to address a problem doesn’t work if the audience isn’t listening to that version of the file.
The simplest solution to clipping is to reduce the volume/loudness; you’ll be managing gain levels in the mixing stage anyway. In general, SynthV Studio won’t clip unless you’ve needlessly increased the loudness somewhere.
Harsh sibilance is the result of specific sounds being proportionally louder than the rest of the track, and reducing the volume to address clipping doesn’t change the dynamics of the track as a whole.
A de-esser is the most common way to tackle sibilance, among other options like targeted EQ or compression (effectively using other tools to do what a de-esser is made for), or manually editing the samples as is described by dcuny here:
On the other hand, when you clip a noisy sound like /s/, the amplitude and frequency of the waveform are by definition random, so it’ll sound like clipped noise, because… well, that’s exactly what it is. And clipped noise sounds a lot harsher than a clipped harmonic signal:
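You can see the harmonic half of this numerically. A quick NumPy sketch (my own toy example, not from the post above): hard-clipping a sine adds energy only at discrete odd harmonics, whereas hard-clipped noise has no discrete peaks to add, so its distortion is smeared across the whole band.

```python
import numpy as np

sr = 48000
t = np.arange(sr) / sr

# A 1 kHz sine driven ~3.5 dB over full scale, hard-clipped at 0 dBFS.
clipped_sine = np.clip(1.5 * np.sin(2 * np.pi * 1000 * t), -1.0, 1.0)
spec = np.abs(np.fft.rfft(clipped_sine)) / len(clipped_sine)
freqs = np.fft.rfftfreq(len(clipped_sine), 1.0 / sr)

def level(f):
    """Spectral magnitude at the bin nearest frequency f."""
    return spec[np.argmin(np.abs(freqs - f))]

# Symmetric clipping adds energy at odd harmonics only (3 kHz, 5 kHz, ...),
# while even harmonics (2 kHz, 4 kHz) stay essentially at zero:
print(level(1000), level(2000), level(3000))

# Clipped noise, by contrast, is still just broadband noise.
rng = np.random.default_rng(0)
clipped_noise = np.clip(1.5 * rng.standard_normal(sr), -1.0, 1.0)
nspec = np.abs(np.fft.rfft(clipped_noise)) / len(clipped_noise)
```

The clipped sine’s spectrum is still dominated by a few discrete peaks (which the ear reads as harmonic distortion), while the clipped noise spectrum stays flat-ish, with no peak standing out.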
I think where a lot of people get confused with bit depth is the concept of “headroom”.
The bit depth determines the noise floor of a sample, due to the “digital noise” introduced by digital audio encoding.
Using a higher bit depth doesn’t give you any extra space above 0 dBFS; exceeding that will clip no matter what. The added “headroom” which avoids clipping refers to how a higher bit depth enables you to save and work with quieter recordings, thanks to the lower noise floor, without sacrificing quality.
That’s also why it isn’t particularly relevant to SynthV Studio, where the synthesized output is rendered at a reasonable volume from the start.
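The noise floor claim is easy to check numerically. A toy NumPy sketch (the −6 dBFS 997 Hz test tone and the round-to-nearest quantizer are my own assumptions): each extra bit lowers the noise floor by roughly 6 dB.

```python
import numpy as np

def quantize(x, bits):
    """Round to the nearest step of an N-bit fixed-point grid."""
    steps = 2.0 ** (bits - 1)
    return np.round(x * steps) / steps

sr = 48000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 997 * t)   # a -6 dBFS test tone

floors = {}
for bits in (16, 24):
    err = quantize(tone, bits) - tone      # the quantization ("digital") noise
    floors[bits] = 20 * np.log10(np.sqrt(np.mean(err ** 2)))

print(floors)  # roughly -101 dB at 16-bit vs roughly -149 dB at 24-bit
```

That ~48 dB gap between 16- and 24-bit matches the usual “about 6 dB per bit” rule of thumb, and none of it has anything to do with how loud a sibilant is relative to the rest of the track.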
Thanks for the detailed explanation and for clarifying my incorrect assumption, that’s great. But I then wondered why I had the impression 32-bit floating point was the better choice, and a quick search reminded me where I got this from. There is some further detail on it here if anyone is interested.
Incidentally, my issue with the Jun voice creating bad sibilance when the ‘gentle’ and ‘breathy’ modes were increased was solved by reducing each individual “s” where it occurred to about 30%, or sometimes less, which counterbalanced the increase of the other settings. So far so good.
tl;dr: 32-bit float audio can’t clip, but rendering your 32-bit project to a 16- or 24-bit file can introduce clipping.
This is kind of true but also kind of misleading. Let’s consider this in the context of what is meant in the typical discussion of DAWs and audio production. The most common bit depths you will hear discussed are 16-, 24-, and 32-bit. In this context, 16- and 24-bit are FIXED point and 32-bit is FLOATING point. This is important. Yes, you can have fixed point at any bit depth and floating point at any bit depth, but when discussing audio, 16 and 24 are almost always going to mean FIXED point and 32 is almost always going to mean FLOATING point. All FIXED point bit depths will “clip” when the audio goes above the maximum value or below the minimum value, because FIXED point literally cannot represent values outside those boundaries.
In FLOATING point audio formats, clipping is literally IMPOSSIBLE (until you are gajillions of dB above zero, which is theoretically possible, but I doubt any DAW could ever actually reach it, so it’s impossible in any real-world situation). This is because what we think of as 0 dB (the maximum possible level) sits at 1.0 in a FLOATING point format, and floating point can easily represent values above 1.0 without causing any clipping or signal distortion.
IMPORTANT concept: FLOATING point audio can NOT clip as long as it remains in the floating point realm.
The problem only comes when you (probably inevitably) render down to a fixed-point audio format. At that point, any floating-point values beyond 1.0 in the rendered audio will be clipped, and only during that conversion.
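Here is a minimal NumPy illustration of exactly that conversion step (the 32767 scale factor is just the usual 16-bit PCM convention, and the +6 dB sine is a toy signal of mine):

```python
import numpy as np

# A signal peaking at +6 dBFS (linear 2.0): perfectly legal in 32-bit float.
t = np.linspace(0.0, 1.0, 48000, endpoint=False, dtype=np.float32)
hot = np.float32(2.0) * np.sin(2 * np.pi * 100 * t).astype(np.float32)
assert hot.max() > 1.0   # float happily stores values above "0 dB" (1.0)

# Converting to 16-bit fixed point is where the clipping actually happens:
pcm = (np.clip(hot, -1.0, 1.0) * 32767).astype(np.int16)
clipped_samples = int(np.sum(np.abs(pcm) >= 32767))

# Turning the gain down *before* the conversion avoids it entirely:
safe = (hot / np.abs(hot).max() * 32767).astype(np.int16)
print(clipped_samples, int(np.sum(np.abs(safe) >= 32767)))
```

The float buffer sails past 1.0 without complaint; the damage only appears once the values are forced onto the fixed-point grid, which is why gain staging before the final render matters.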
VST2 and VST3 do all audio processing I/O as floating point. They can NOT ever clip anything. However, a clumsy plugin implementation could improperly clip its I/O, and a poorly implemented DAW could do the same thing. DAWs like REAPER keep all internal audio paths floating-point even when your source audio is fixed point. Any DAW that supports VST has to have the audio in floating point for a VST to process it; hopefully the DAW keeps it floating-point instead of constantly converting back and forth. Therefore, clipping is NOT possible while you’re in 32-bit (float).
I don’t know of any DAW that does 32-bit fixed-point audio but I don’t know every DAW so maybe there are some.
So the above shared information about “what is clipping” is accurate in the general sense but is misinformation in the typical context of audio because 32-bit would commonly refer to a floating-point format.
In summary, 32-bit float audio can’t clip, but rendering your 32-bit project to a 16- or 24-bit file can introduce clipping.
Yes, 32-bit float is the reason DAWs can route signals over 0 dB between plugins or mixer inserts, but in the case of rendering a wav file from SynthV Studio (unless things have changed since I last checked), output over 0 dB will always be clipped, even if the bit depth is set to 32-bit float.
How do you tell whether it clips or not? SynthV tracks can add up to +24 dB. I did a test: while 16- and 24-bit add distortion on bounced tracks at +24 dB volume, I don’t hear any distortion with 32-bit float.
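One way to check besides listening: hard clipping leaves flat-topped runs of consecutive samples stuck at full scale, and you can count those. A rough NumPy sketch (the 0.9999 tolerance and the minimum run length are arbitrary choices of mine):

```python
import numpy as np

def count_clipped_runs(samples, full_scale=1.0, min_run=3):
    """Count runs of consecutive samples stuck at full scale: the
    flat-topped waveform shape that hard clipping leaves behind."""
    at_ceiling = np.abs(samples) >= full_scale * 0.9999
    # Locate run boundaries from the rising/falling edges of the mask.
    edges = np.diff(at_ceiling.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if at_ceiling[0]:
        starts = np.insert(starts, 0, 0)
    if at_ceiling[-1]:
        ends = np.append(ends, len(samples))
    return int(np.sum((ends - starts) >= min_run))

t = np.linspace(0.0, 1.0, 48000, endpoint=False)
clipped = np.clip(2.0 * np.sin(2 * np.pi * 100 * t), -1.0, 1.0)  # +6 dB, clipped
clean = 0.5 * np.sin(2 * np.pi * 100 * t)                        # -6 dB, untouched
print(count_clipped_runs(clipped), count_clipped_runs(clean))
```

For a real bounced file you’d load the samples (normalized to ±1.0) and run the same check; a genuinely unclipped 32-bit float render should report zero runs.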