Voice Settings

A guide to using the stability and similarity sliders for tailored voice performances in Voice Air.


Learn how to strike a balance between emotive and consistent audio outputs.


Our users have found different workflows that work for them. The one you’ll see most often
is setting stability around 50 and similarity near 80, with minimal changes thereafter. Of
course, this all depends on the original voice and the style of performance you’re aiming for.
It’s important to note that the AI is non-deterministic; setting the sliders to specific values
won’t guarantee the same results every time. Instead, the sliders function more as a range,
determining how wide the randomization can be between each generation.

 

Setting stability low means a wider range of randomization, often resulting in a more emotive
performance, but this is also highly dependent on the voice itself.
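
The idea of a slider acting as a range rather than a fixed value can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Voice Air works internally: the function simulate_takes, the "expressiveness" score and the linear mapping from stability to spread are all hypothetical, chosen just to show that a lower stability value allows wider variation between generations.

```python
import random


def simulate_takes(stability, n_takes=5, seed=None):
    """Toy model (hypothetical): return one 'expressiveness' score per take.

    Each score deviates from a neutral 0.0 by a random amount. The maximum
    deviation shrinks as stability rises from 0 to 100.
    """
    if not 0 <= stability <= 100:
        raise ValueError("stability must be between 0 and 100")
    rng = random.Random(seed)
    # Assumed linear mapping: stability 0 -> spread 1.0, stability 100 -> 0.0
    max_spread = (100 - stability) / 100
    return [rng.uniform(-max_spread, max_spread) for _ in range(n_takes)]


def spread(takes):
    """Distance between the most and least expressive take."""
    return max(takes) - min(takes)


if __name__ == "__main__":
    low = simulate_takes(stability=20, n_takes=1000, seed=1)
    high = simulate_takes(stability=90, n_takes=1000, seed=1)
    print(f"stability 20: spread between takes = {spread(low):.2f}")
    print(f"stability 90: spread between takes = {spread(high):.2f}")
```

In this toy model, the takes generated at stability 20 vary far more from one another than those generated at stability 90, which mirrors why a low setting tends to need a few more generations before you land on a performance you like.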


Hovering over the icon next to the sliders will provide additional information.


For a more lively and dramatic performance, it is recommended to set the stability slider
lower and generate a few times until you find a performance you like.


On the other hand, if you want a more serious performance, even bordering on monotone at
very high values, it is recommended to set the stability slider higher. Because the output is
more consistent and stable, you usually don’t need as many generations to get what you are
looking for. Experiment to find what works best for you!

