A guide on using stability and similarity sliders for tailored voice performances in Voice Air.
Learn how to strike a balance between emotive and consistent audio outputs.
Our users have found different workflows that work for them. The one you’ll see most often is setting stability around 50 and similarity near 80, with minimal changes after that. Of course, this all depends on the original voice and the style of performance you’re aiming for.
It’s important to note that the AI is non-deterministic; setting the sliders to specific values
won’t guarantee the same results every time. Instead, the stability slider functions more as a range,
determining how wide the randomization can be between each generation.
Setting stability low means a broader range of randomization, often resulting in a more emotive
performance, but this is also highly dependent on the voice itself.
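To make the "slider as a range" idea concrete, here is a minimal toy sketch in Python. It is not how Voice Air works internally. The linear mapping from stability to spread, the 0 to 100 scale, and the names variation_range and simulate_generations are all assumptions made purely for illustration. It shows only that a lower stability value allows a wider spread of outcomes across repeated generations.

```python
import random
import statistics
from typing import List, Optional


def variation_range(stability: int) -> float:
    """Hypothetical mapping (an assumption, not the real model):
    stability 0..100 maps linearly to an allowed spread of 1.0..0.0."""
    if not 0 <= stability <= 100:
        raise ValueError("stability must be between 0 and 100")
    return 1.0 - stability / 100.0


def simulate_generations(stability: int, count: int,
                         seed: Optional[int] = None) -> List[float]:
    """Draw `count` toy 'expressiveness offsets', each sampled uniformly
    from within plus or minus the allowed spread."""
    rng = random.Random(seed)
    spread = variation_range(stability)
    return [rng.uniform(-spread, spread) for _ in range(count)]


# Compare how much the toy output varies at low and high stability.
low = simulate_generations(stability=20, count=1000, seed=1)
high = simulate_generations(stability=90, count=1000, seed=1)

print("spread at stability 20:", round(statistics.pstdev(low), 3))
print("spread at stability 90:", round(statistics.pstdev(high), 3))
```

In this toy model the low-stability run varies far more from one generation to the next, which mirrors why a low setting usually calls for several generations before you find a take you like.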
Hovering over the icon next to the sliders will provide additional information.
For a more lively and dramatic performance, it is recommended to set the stability slider
lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Because the output is then more consistent and stable, you usually don’t need as many generations to get what you are looking for. Experiment to find what works best for you!