[v1.10.0 now available!] Reinforcement Learning with Human Feedback (RLHF) and UX improvements

Perry · July 23, 2023, 2:09 PM

A few days ago, Dreamtonics uploaded a video to their Facebook page introducing a new technology they are developing, which they call “Reinforcement Learning with Human Feedback” (RLHF).
According to the video, it is meant to overcome a limitation of the AI: it cannot tell by itself which of the possible ways to sing a line will sound more natural or preferable to human listeners. To address this, they are feeding the AI human feedback on the different interpretations it produces from the same input.
They say it will be implemented in the next update (I guess for Synth V 2.0).

claire · July 23, 2023, 2:48 PM

There’s also a video demonstration on their YouTube channel (linked below).

To summarize, it looks like Dreamtonics surveyed randomly selected listeners, asking them to choose which outputs they found more natural-sounding, and then fed that data back into the models to improve reliability and consistency.

For users, this means little aside from less time spent generating retakes and fewer notes that require manual correction. That is to say, the user doesn’t get any new tools to interact with directly; it’s something we’ll all benefit from passively, without adjustments to workflow or technique.

Since the software has been given feedback about listener preferences, it knows which sort of patterns to avoid, or which to be biased toward, meaning it’s more likely to produce a desirable result with each retake generated, and it’ll be less likely you’ll feel the need to generate more takes in the first place.
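To make the idea concrete, here is a toy sketch in Python of how pairwise listener votes could be turned into a bias toward preferred takes. To be clear, this is not Dreamtonics' actual method; I'm only assuming a simple Bradley-Terry preference model, and the function names (`fit_preference_scores`, `sample_take`) and the survey data are made up for illustration.

```python
import math
import random

def fit_preference_scores(n_items, comparisons, lr=1.0, epochs=500):
    """Fit one score per candidate take from (winner, loser) pairs.

    Assumes a Bradley-Terry model: P(a preferred over b) = sigmoid(score_a - score_b).
    Uses plain batch gradient ascent on the average log-likelihood.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for winner, loser in comparisons:
            # Probability the model currently assigns to the observed vote.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for i in range(n_items):
            scores[i] += lr * grad[i] / len(comparisons)
    return scores

def sample_take(scores, rng):
    """Pick a take index with probability proportional to exp(score)."""
    weights = [math.exp(s) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

# Hypothetical survey: each tuple is (preferred take, rejected take).
comparisons = (
    [(0, 1)] * 8 + [(1, 0)] * 2
    + [(0, 2)] * 9 + [(2, 0)] * 1
    + [(1, 2)] * 6 + [(2, 1)] * 4
)
scores = fit_preference_scores(3, comparisons)

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [sample_take(scores, rng) for _ in range(1000)]
print("scores:", [round(s, 2) for s in scores])
print("how often each take was generated:", [picks.count(i) for i in range(3)])
```

The takes listeners preferred end up with higher scores and get generated more often, while the less popular ones still show up occasionally. That matches the "fewer retakes needed" effect described above, at least in spirit.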

This is a very good thing for voice databases like Kevin that currently take a lot of retakes to get desirable results, or a lot of manual correction. I’m curious how much of a difference it’ll make for those like Solaria where the generated patterns are already on the better end.

My main concern is about how the surveys were performed. What genres of music were used for the sequences that listeners were asked to pick between? Will this cause the voice databases to all trend toward a specific style? This is presented as a core engine improvement, so it’s not voice-specific, but of course user preferences about Kevin might not apply to Ninezero.

I’m sure Dreamtonics is aware of these pitfalls, but I do wonder if they have plans to address the fact that the generated pitch curves have no way of taking into account genre or key signature. Vocal modes definitely help, but those don’t change the pitch patterns, and as many users have pointed out, certain pitch patterns like vibrato are much more at home in some genres than others. I personally don’t mind this because I’m accustomed to manual editing, but if Dreamtonics’ goal is to generate a desirable result without the need for user correction, they’ll need to tackle that eventually.

This also likely means that projects brought into the new version will not sound the same once a note has its pitch re-generated, since the engine’s rules for pitch generation will have changed. A minor backwards-compatibility problem, but worth mentioning.

I’m curious to see what else they’ll accompany this with if it’s bound for 1.10.0 (software versions don’t necessarily have to go to 2.0 after 1.9) or if it’ll be delivered on its own as 1.9.1.
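For anyone puzzled by the numbering, here is a small Python sketch of why 1.10.0 comes after 1.9.1: each dot-separated part is compared as a whole number, not as decimal digits. The `parse_version` helper is my own illustration, not anything from Synthesizer V, and it only handles plain numeric versions (no beta suffixes).

```python
def parse_version(text):
    """Split a dotted version like '1.10.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))

# Tuples compare part by part, so 10 > 9 decides the second position.
newer_by_parts = parse_version("1.10.0") > parse_version("1.9.1")

# Reading the same numbers as decimals gives the opposite (wrong) order.
older_as_decimal = float("1.10") < float("1.9")

ordered = sorted(["1.10.0", "1.9.0", "1.9.1"], key=parse_version)
print(newer_by_parts, older_as_decimal, ordered)
```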

robint · July 23, 2023, 5:00 PM

I wonder where the dividing line will be between music (which we all love here) and TTS applications (a commercial market with maybe 10^n times the size and monetization potential). Perhaps it will fork as the speech gets so near perfection? I hope some giant player doesn't swoop in and snatch our baby from its independent nest,
so that we music creators don't get marginalised. Sigh.

claire · August 2, 2023, 10:38 AM

Looks like we have our answer!

Dreamtonics has released version 1.10.0b1 today. As usual, you can get the beta from the news post on the website, and only the stable release will be available through the application.

This includes RLHF, an additional set of options in the AI Retakes panel to submit your own feedback (opt-in), as well as a few long-requested UX improvements:

A color picker for tracks in the arrangement
Drag-and-drop MIDI importing when used as a plugin
An option to quickly shift the lyrics forward or backward one note, for those times where things are offset by just a bit

Full info is in the article linked below. Remember to also get the beta version of each voice database to take full advantage of the new features.

I would especially encourage people to try out the drag-and-drop MIDI importing and submit their feedback to [email protected], to hopefully catch as many DAW compatibility issues as possible during the beta phase.

claire · October 19, 2023, 11:40 AM

The stable release of version 1.10.0 came out today. There are no major changes from 1.10.0b2 aside from some stability improvements.

This means we can likely expect to wait a bit longer for other upcoming features (Japanese rap support and ARA DAW integration).

You can update to 1.10.0 from within the application. Most voice databases have updated versions available.

Tokyo6 has stated that they are skipping this version, citing concerns about user feedback affecting their products in unpredictable or undesirable ways. Regarding skipping this minor update for SynthesizerV AI Koharu Rikka (小春六花), Natsuki Karin (夏色花梨), and Hanakuma Chifuyu (花隈千冬) | TOKYO6 ENTERTAINMENT

Cr3ActivityFirst · October 23, 2023, 3:38 AM

WOW, I couldn’t agree more with Tokyo6’s decision
I wish they would go back even before version 1.9
Cause Dreamtonics already laid the path to 1.10 in 1.9
The change of voice character already started in 1.9 - (b4 RLHF – instant > sing > RLHF)
But it’s Dreamtonics’ software and their path
Sadly, not the path I would like to see them take
BUT, I’m already grateful for the 1.8 version they created
I can live with that version for the rest of my life, it’s magic
And who knows which path Dreamtonics will take in the future
ARA is in the pipeline, whoop whoop - already Cubase support:)
But for now, I couldn’t AGREE more with Tokyo6’s decision
----- I wish more third parties had followed their path:\

How-do-you-get-a-pulse on all the news claire !!!
You truly are a singing synth one stop shop:)
Thanks for your Info! -