Models | Voice Air Knowledge Base

Multilingual v2

This model offers good stability, broad language coverage, and excellent accuracy when cloning voices and accents. It supports 28 languages and is fast for a model of its size, although it is slower than English v1. There are a few important things to note. Because the model is highly accurate, it will try to reproduce everything present in the original samples, with even greater precision than older models. This makes it especially important to use clean, high-quality samples that have the performance, accent, and tone of voice you want the AI to clone.

We’ve seen issues when users provide poor-quality samples, for example recordings with excessive noise, low-frequency rumble, or very sharp “s” sounds (sibilance). In such cases, the output may start to degrade because the AI tries to mimic these artifacts, which can confuse it.

We recommend using fewer, higher-quality samples with the performance and voice you want, rather than many samples that vary widely in quality and performance.

The AI will also try to preserve the accent of the original voice. So if you use a premade voice, a voice created with voice design, or a voice cloned from someone speaking English, you might hear a slight English accent or incorrect pronunciation in other languages. For the best results, clone a voice that speaks the language you intend to use the AI for.

There have been reports of “language switching”, particularly between languages that look similar in writing but differ in pronunciation or accent. This happens when the AI lacks enough context, becomes confused, and switches language partway through a generation. We are actively working on this issue. It appears to occur less often with a well-cloned voice, one originally cloned from someone speaking the correct language with the correct accent.