VIP VOICES | Voice Air Knowledge Base

Voice Creation

When cloning a voice, it’s important to consider what the AI has been trained on: which languages and what type of dataset. In this case, the following are available Mult ...

Style Exaggeration

With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume additiona ...

Why is my voice monotonous / too chaotic / doesn't sound similar, etc.?

Try changing your voice settings; you'll find them in the "Voice Settings" tab. Each attempt to generate a voice will bring a different result (especially visible at low stability) ...

What audio formats do you support?

We only deliver audio in MP3 format: 44.1 kHz, 16-bit, at 96 kbps.

Similarity

The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity s ...

How do I create a voice?

As we are always proud to say, using our platform is easy for everyone! Here is a quick tutorial on how to create a voice:

How can I add pauses?

Using the pause feature creates an exact and natural pause in the speech. The AI can handle breaks of up to 3 seconds in length. You can embed the pause where you want the break t ...

Why does the voice start whispering or changing / audio degradation?

We know that the voices tend to degrade or start whispering during longer audio generations, and our team is working hard to develop the technology to improve this. This issue is m ...

Examples

Audio outputs and their corresponding text prompts. In this part, we’re highlighting what the text-to-speech AI can do, particularly in expressing a variety of emotions. Keep ...

Models

Multilingual v2: This model has good stability, great language diversity, and fantastic accuracy in cloning voices and accents. Its speed is rather remarkable considering its si ...

Voice Settings

A guide to using the stability and similarity sliders for tailored voice performances in Voice Air. Learn how to strike a balance between emotive and consistent audio outputs. Our users ...

Why are some numbers and words not properly pronounced in the correct language?

Numbers, acronyms, and foreign words sometimes default to English when prompted in a different language. For instance, the number "11" or the word "radio", typed in a Spanish promp ...

Pacing

Based on varying user feedback and test results, it’s been theorised that using a single long sample for voice cloning has brought more success for some, compared to using ...

How can I force a certain pronunciation of a word or name?

We do not have any integrated solution to force a certain pronunciation. However, we are developing a proper solution and the tools to force and fine-tune pronunciations. But, at t ...

Can I use the same cloned/designed voice across languages?

All created voices are expected to maintain most of their original speech characteristics across all languages, including their original accent.

Stability

The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. As men ...

Pause

There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent way is programmatically using the syntax < break time ...

Overview

A guide on how to generate voiceovers using your voice in Voice Air. Now that you have your voice, it’s time to generate some voiceovers! To convert text to speech, hover ...

What characters are accepted when generating audio?

Non-text characters and punctuation such as {, }, <, >, [, and ] will usually result in low-quality speech generated by the model.
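
As a rough illustration, the characters listed above could be removed from a prompt before it is submitted. This is a minimal Python sketch, not part of Voice Air: the function name `strip_problem_chars` and the choice to delete the characters (rather than replace them) are assumptions made for the example.

```python
# Characters named in the answer above as likely to degrade speech quality.
PROBLEM_CHARS = "{}<>[]"

# Translation table that deletes every character in PROBLEM_CHARS.
_STRIP_TABLE = str.maketrans("", "", PROBLEM_CHARS)


def strip_problem_chars(text: str) -> str:
    """Return `text` without the listed characters (hypothetical helper).

    Runs of whitespace left behind by the removal are collapsed to a
    single space so the prompt still reads naturally.
    """
    cleaned = text.translate(_STRIP_TABLE)
    return " ".join(cleaned.split())


print(strip_problem_chars("Hello {world} [today] <now>"))
```

Note that this would also remove the angle brackets used by the pause syntax, so it should only be run on text that contains no pause markup.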

Can I slow down the pace of the voice?

We are working on features that will allow for speed optimization.

Do you have a list of symbols that have an effect on the output audio?

Unfortunately, we don’t have any such list of symbols. While the model responds to changes in pronunciation, there isn’t a predefined list of symbols that could be ...

How do you make the voice laugh?

We plan on introducing features that allow emotions such as laughter later in the year.

Prompting

Effective techniques to guide Voice Air AI in adding pauses, conveying emotions, and pacing the speech.

Alternatives

These options are inconsistent and might not always work. We recommend using the syntax above for consistency. One trick that seems to provide the most consistent output - sans t ...

How does the AI model work?

The AI has been trained on a vast amount of audio. The type of audio varies, but the most prominent is audiobooks. This is the context it understands the best, and it provides t ...

Volume drops mid-utterance (stability)

When the voice drops in volume, whispers, or distorts, this is most likely a stability issue. How prevalent this is also depends on the voice used and how wide the dynamic range ...

Speaker Boost

This is another setting that was introduced in the new models. The setting itself is quite self-explanatory: it boosts the similarity to the original speaker. However, usin ...

How many characters can I use per export?

You can use up to 15,000 characters per production, and you can add multiple voices and voice segments, with up to 1,500 characters per segment. The reason why there ...
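
To stay within the limits stated above, a long script can be split into segments before it is pasted in. The following is a minimal Python sketch under stated assumptions: the helper `split_into_segments` is hypothetical, it assumes the limits are counted on the raw text including spaces and punctuation, and it breaks on sentence boundaries purely as a design choice.

```python
import re

MAX_SEGMENT_CHARS = 1_500      # per-segment limit stated above
MAX_PRODUCTION_CHARS = 15_000  # per-production limit stated above


def split_into_segments(text: str, limit: int = MAX_SEGMENT_CHARS) -> list[str]:
    """Greedily pack whole sentences into segments of at most `limit` characters.

    Hypothetical helper: assumes character counts are taken on the raw text.
    """
    text = text.strip()
    if len(text) > MAX_PRODUCTION_CHARS:
        raise ValueError(
            f"Script has {len(text)} characters; the production limit is "
            f"{MAX_PRODUCTION_CHARS}."
        )

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


script = "Hello world. " * 200
for number, segment in enumerate(split_into_segments(script), start=1):
    print(number, len(segment))
```

Splitting on sentence boundaries keeps each segment a natural unit of speech; a different boundary (for example, paragraphs) would work the same way.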