When cloning a voice, it’s important to consider what the AI has been trained on: which languages and what type of dataset. In this case, the following are available Mult ...
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume additiona ...
Try changing your voice settings; you'll find them in the "Voice Settings" tab. Each attempt to generate a voice will bring a different result (especially visible at low stability) ...
We only deliver audio in MP3 format (44.1 kHz/16-bit, 96 kbps).
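Because the bitrate is fixed, you can roughly estimate how large a generated file will be from its duration. The sketch below assumes constant-bitrate (CBR) encoding at the 96 kbps stated above and ignores container overhead such as ID3 tags. The helper name is illustrative, not part of any Voice Air tool.

```python
# Estimate MP3 file size from duration, assuming a constant bitrate (CBR)
# of 96 kbps. Real files may differ slightly because of headers and
# metadata (e.g. ID3 tags), which this estimate ignores.

BITRATE_BPS = 96_000  # 96 kbps = 96,000 bits per second


def estimated_mp3_bytes(duration_seconds: float, bitrate_bps: int = BITRATE_BPS) -> int:
    """Return the approximate size in bytes of a CBR MP3 of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_seconds * bitrate_bps / 8)  # 8 bits per byte


if __name__ == "__main__":
    one_minute = estimated_mp3_bytes(60)
    print(f"~{one_minute:,} bytes (~{one_minute / 1_000_000:.2f} MB) per minute")
```

At 96 kbps, one minute of audio works out to about 720,000 bytes, or roughly 0.72 MB.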
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity s ...
As we're always proud to say, our platform is easy for everyone to use! Here is a quick tutorial on how to create a voice:
Using the pause feature creates an exact and natural pause in the speech. The AI can handle breaks of up to 3 seconds in length. You can embed the pause where you want the break t ...
We know that the voices tend to degrade or start whispering during longer audio generations, and our team is working hard to develop the technology to improve this. This issue is m ...
Audio outputs and their corresponding text prompts. In this part, we’re highlighting what the text-to-speech AI can do, particularly in expressing a variety of emotions. Keep ...
Multilingual v2: This model has good stability, great language diversity, and fantastic accuracy in cloning voices and accents. Its speed is rather remarkable considering its si ...
A guide on using the stability and similarity sliders for tailored voice performances in Voice Air. Learn how to strike a balance between emotive and consistent audio outputs. Our users ...
Numbers, acronyms, and foreign words sometimes default to English when prompted in a different language. For instance, the number "11" or the word "radio", typed in a Spanish promp ...
Based on user feedback and test results, it’s been theorised that using a single long sample for voice cloning has brought some users more success than using ...
We do not have a built-in way to force a specific pronunciation. However, we are developing a proper solution and the tools to force and fine-tune pronunciations. But, at t ...
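Until dedicated pronunciation tools exist, one client-side workaround is to respell problem words before submitting the text. This is our own assumption, not a documented Voice Air feature. The sketch below applies a hypothetical respelling dictionary with whole-word matching; the Spanish entries illustrate the case of a number like "11" defaulting to English by writing it out as "once".

```python
import re

# Hypothetical respelling dictionary: maps a word to the spelling you want
# the model to read. Writing numbers out in the target language (here,
# Spanish) is one way to keep them from being read in English.
RESPELLINGS = {
    "11": "once",
    "12": "doce",
}


def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Replace whole-word matches of each key with its respelling."""
    if not respellings:
        return text
    # Try longer keys first so a longer entry wins over a shorter one.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(0)], text)


if __name__ == "__main__":
    print(apply_respellings("Tengo 11 gatos y 12 perros.", RESPELLINGS))
```

Whole-word matching means "111" is left alone, so the dictionary only touches the exact tokens you list.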
All created voices are expected to maintain most of their original speech characteristics across all languages, including their original accent.
The stability slider determines how stable the voice is and how much randomness there is between generations. Lowering this slider introduces a broader emotional range for the voice. As men ...
There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent way is programmatically, using the syntax <break time ...
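If you build scripts in code, a small helper can insert pause tags and keep them within the 3-second limit the AI supports. The exact tag syntax is cut off above, so this sketch assumes an SSML-style `<break time="1.5s" />` form; verify it against the pause documentation before relying on it. The function names are hypothetical.

```python
# Build pause tags for a script.
# ASSUMPTION: the tag follows the SSML-style form <break time="1.5s" />.
MAX_BREAK_SECONDS = 3.0  # the AI handles breaks of up to 3 seconds


def break_tag(seconds: float) -> str:
    """Return a break tag with the duration clamped to 0-3 seconds."""
    clamped = min(max(seconds, 0.0), MAX_BREAK_SECONDS)
    return f'<break time="{clamped:g}s" />'


def join_with_pauses(sentences: list[str], pause_seconds: float) -> str:
    """Join sentences, placing one pause tag between each adjacent pair."""
    return f" {break_tag(pause_seconds)} ".join(s.strip() for s in sentences)


if __name__ == "__main__":
    print(join_with_pauses(["Welcome back.", "Let's begin."], 1.5))
```

Clamping rather than raising an error keeps long scripts generating, at the cost of silently shortening any pause you set above 3 seconds.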
A guide on how to generate voiceovers using your voice in Voice Air. Now that you have your voice, it’s time to generate some voiceovers! To convert text to speech, hover ...
Non-textual characters and punctuation such as {, }, <, >, [, and ] will usually result in low-quality speech from the model.
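A simple way to avoid these characters is to strip them from your text before generating. The sketch below removes exactly the characters listed above and collapses leftover runs of spaces or tabs. Because it also removes < and >, run it before adding any pause tags, or it will break the break-tag syntax. The function name is illustrative.

```python
import re

# Characters that usually degrade output quality, per the note above.
PROBLEM_CHARS = "{}<>[]"
_PROBLEM_RE = re.compile("[" + re.escape(PROBLEM_CHARS) + "]")


def strip_problem_chars(text: str) -> str:
    """Remove braces, angle brackets, and square brackets.

    Runs of spaces/tabs left behind are collapsed to one space; newlines
    are preserved. Leading and trailing whitespace is trimmed.
    """
    cleaned = _PROBLEM_RE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_problem_chars("Chapter [1] {draft} <intro> begins here."))
```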
We are working on features that will allow for speed optimization.
Unfortunately, we don’t have any such list of symbols. While the model responds to changes in pronunciation, there isn’t a predefined list of symbols that could be ...
We plan on introducing features that allow emotions such as laughter later in the year.
Effective techniques to guide Voice Air AI in adding pauses, conveying emotions, and pacing the speech.
These options are inconsistent and might not always work. We recommend using the syntax above for consistency. One trick that seems to provide the most consistent output, sans t ...
The AI has been trained on a vast amount of audio. The type of audio varies, but the most prominent is audiobooks. This is the context it understands the best, and it provides t ...
When the voice drops in volume, whispers, or distorts, this is most likely a stability issue. How prevalent this is also depends on the voice used and how wide the dynamic range ...
This is another setting that was introduced in the new models. The setting itself is quite self-explanatory: it boosts the similarity to the original speaker. However, usin ...
You can use up to 15,000 characters per production, and you can add multiple voices and voice segments, each with a maximum of 1,500 characters. The reason why there ...
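To prepare a long script for these limits, you can split it into segments of at most 1,500 characters while checking the 15,000-character production cap. This is a minimal sketch, assuming characters are counted on the raw text and that splitting between words is acceptable; it does not assign voices to segments, and the function name is hypothetical.

```python
MAX_SEGMENT_CHARS = 1_500
MAX_PRODUCTION_CHARS = 15_000


def split_into_segments(text: str, limit: int = MAX_SEGMENT_CHARS) -> list[str]:
    """Greedily pack whitespace-separated words into segments of <= limit chars.

    Whitespace between words is normalised to single spaces. A single word
    longer than the limit is hard-split into limit-sized pieces.
    """
    if len(text) > MAX_PRODUCTION_CHARS:
        raise ValueError(
            f"text has {len(text):,} characters; "
            f"the production limit is {MAX_PRODUCTION_CHARS:,}"
        )
    segments: list[str] = []
    current = ""
    for word in text.split():
        while len(word) > limit:  # hard-split oversized words
            if current:
                segments.append(current)
                current = ""
            segments.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    script = "word " * 1000  # 5,000 characters
    parts = split_into_segments(script)
    print(len(parts), [len(p) for p in parts])
```

Splitting on whitespace keeps words intact; if you need segments to end on sentence boundaries, split on sentences first and pack those instead.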