Sound Effects
Overview
Get the most out of our Sound Effects Generator tool and learn how to create everything from blockbuster sound design for films to everyday sounds for your video game.
It is often said that audio matters more than visuals: most people can accept bad visuals but can’t stand bad audio. Audio also evokes emotions and sets moods for your audience; it can be subtle or bombastic. The type of sound and music you use in your production can completely change the emotional context and meaning of the story you are trying to tell.
However, finding that perfect sound can be challenging. Voice Air makes it a lot easier: our sound effects generator lets you generate any sound imaginable from a text prompt, streamlining the process tremendously. This is not only an excellent tool for independent filmmakers and indie game developers; it is also a fantastic resource for big productions, sound designers, and producers, because you can generate a vast array of sounds.
We will go through some of them in this documentation, but this only scratches the surface. While the feature might seem simple at first glance, the AI’s understanding of natural language and the range of sound effects it can generate open up nearly endless possibilities.
The general layout for sound effects is relatively straightforward. You have a window where you will input a prompt, you have some settings, and you have a generate button.
Each time you press generate, the AI will generate several variations of the prompt that you’ve given. The cost of using the sound effects generator is based on the length of the generated audio. If you let the AI decide the audio length, the cost is 200 characters per generation. If you set the duration yourself, the cost is 40 characters per second.
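The pricing rule above can be sketched as a small calculator. This is only an illustration of the arithmetic: the function name and constants are made up for this example, and it assumes the per-second price scales linearly with no rounding, which the text does not specify.

```python
# Figures taken from the pricing description above.
AUTO_DURATION_COST = 200  # characters per generation when the AI picks the length
COST_PER_SECOND = 40      # characters per second when you set the duration


def generation_cost(duration_seconds=None):
    """Return the character cost of one generation.

    Pass None to model "let the AI decide the length".
    Assumption: no rounding is applied to fractional seconds.
    """
    if duration_seconds is None:
        return AUTO_DURATION_COST
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return COST_PER_SECOND * duration_seconds


print(generation_cost())    # 200 (automatic length)
print(generation_cost(3))   # 120 (3 seconds)
print(generation_cost(11))  # 440 (11 seconds)
```

Under these figures, a manually set duration costs less than an automatic one below 5 seconds (5 x 40 = 200) and more above it.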
Prompting
In this section, we will explain what a prompt is and how to construct a good one.
A prompt is a piece of text or an instruction that tells the AI model what response or output you expect. It serves as the starting point and context that lets the AI understand your intent and generate relevant, coherent output.
We will then split these prompts into simple prompts and complex prompts. In general, simple prompts instruct the AI to generate one sound, while complex prompts instruct the AI to generate a series of sounds.
The AI understands both natural language, which will be discussed more in complex prompts, and a lot of music terminology.
Sound Effects currently works best when prompts are written in English.
Simple Prompts
Simple prompts are just that: short, single-idea prompts where we try to get the AI to generate a single sound effect. This could be, for example, “person walking on grass” or “glass breaking.” These prompts will generate a single sound effect with a few variations in the same or subsequent generations.
There are a few ways to improve these prompts, one of which is by adding more detail. Even with simple prompts, a better prompt can give better output. For example, something that sometimes works is adding details like “high-quality, professionally recorded footsteps on grass, sound effects foley.” Finding a good balance between being descriptive and keeping the prompt brief enough for the AI to understand can require some experimentation.
Complex Prompts
When talking about complex prompts, we don’t mean the length of the prompt or the adjectives and adverbs used in it. Although those can increase the complexity of the prompt, when we say complex prompts, we mean prompts that describe multiple sound effects or a sequence of sound effects happening in a specific order, which the AI can replicate.
For example: “A man walks through a hallway and falls down some stairs.”
Let’s take the prompt above as an example. The AI needs to understand what a man walking through a hallway sounds like and what a man falling down some stairs sounds like. It needs to understand the sequence in which these two things are supposed to happen based on how you wrote it, and then combine these sounds so that the result is both coherent and correct. This is what we mean by a complex prompt: it involves both an understanding of sound and an understanding of the natural language explaining what you want.
The AI can do this; for example, the result for the example prompt above would be accurate.
However, complex prompts are generally harder for the AI to get right, because it has to interpret both the individual sounds and the order in which they occur. For the best results, we recommend generating individual sound effects and then combining them in an audio editor of your choice, or using our timeline to stitch the segments together, as you would in an actual production where individual sound effects are combined.
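If you prefer to script the combining step rather than use an audio editor, the idea can be sketched with Python's standard wave module. This is a minimal sketch, not a description of how the timeline works: the sine tones stand in for generated sound effects, the function names are made up for this example, and it assumes every clip is an uncompressed WAV with the same channel count, sample width, and sample rate.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed sample rate for the placeholder clips


def make_tone(freq_hz, seconds):
    """Placeholder for a generated sound effect: a mono 16-bit sine tone as WAV bytes."""
    n = int(SAMPLE_RATE * seconds)
    samples = [int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{n}h", *samples))
    return buf.getvalue()


def stitch(clips, gap_seconds=0.0):
    """Concatenate WAV clips in order, with optional silence between them."""
    if not clips:
        raise ValueError("at least one clip is required")
    out = io.BytesIO()
    fmt = None  # (channels, sample width, frame rate) of the first clip
    with wave.open(out, "wb") as w:
        for index, clip in enumerate(clips):
            with wave.open(io.BytesIO(clip), "rb") as r:
                clip_fmt = (r.getnchannels(), r.getsampwidth(), r.getframerate())
                if fmt is None:
                    fmt = clip_fmt
                    w.setnchannels(fmt[0])
                    w.setsampwidth(fmt[1])
                    w.setframerate(fmt[2])
                elif clip_fmt != fmt:
                    raise ValueError("clips must share channels, sample width, and rate")
                if index > 0 and gap_seconds > 0:
                    # Zero-valued bytes are silence for 16-bit PCM audio.
                    gap_frames = int(fmt[2] * gap_seconds)
                    w.writeframes(b"\x00" * (gap_frames * fmt[0] * fmt[1]))
                w.writeframes(r.readframes(r.getnframes()))
    return out.getvalue()


# Stand-ins for "man walking through a hallway" and "man falling down stairs".
footsteps = make_tone(220, 0.5)
fall = make_tone(110, 0.25)
sequence = stitch([footsteps, fall], gap_seconds=0.25)

with wave.open(io.BytesIO(sequence), "rb") as result:
    print(result.getnframes() / result.getframerate())  # 1.0 (seconds)
```

Generating each sound separately and joining them this way keeps control over the order and timing in your hands instead of leaving it to the prompt.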
Settings
Once you’ve set your prompt and know what you want to generate, you can jump into the settings. Set how long you want the generated audio to be and how influential the prompt should be to the output.
There are just two settings:
Duration: Determine how long your generation should be. Depending on what you set this to, you can get quite different results. For example, if I write “kick drum” and set the length to 11 seconds, I might get an entire drum loop with a kick drum, but that might not be what I want. On the other hand, if I set the length to 1 second, I might get a one-shot with a single instance of a kick drum.
Prompt Influence: Slide the scale to make your generation perfectly adhere to your prompt or allow for some creativity. This setting ranges from giving the AI more creativity in how it interprets the prompt to telling the AI to be more strict in following the exact prompt that you’ve given.
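To keep track of which settings produced which result while you experiment, the two settings can be modeled as a small record. This is a hypothetical sketch: the class and field names are invented for this example, and the 0.0 to 1.0 scale for prompt influence is an assumption made purely for illustration, since the slider's actual range is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoundEffectSettings:
    """Hypothetical container for one generation's prompt and settings."""
    prompt: str
    duration_seconds: Optional[float] = None  # None = let the AI decide the length
    prompt_influence: float = 0.5             # assumed scale: 0.0 creative, 1.0 strict

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration_seconds is not None and self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        if not 0.0 <= self.prompt_influence <= 1.0:
            raise ValueError("prompt_influence must be between 0.0 and 1.0 (assumed range)")


# Same prompt, different durations: the kick drum example from the text.
one_shot = SoundEffectSettings("kick drum", duration_seconds=1, prompt_influence=0.9)
drum_loop = SoundEffectSettings("kick drum", duration_seconds=11, prompt_influence=0.3)
print(one_shot)
print(drum_loop)
```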
Sound Effects
Now that we are dealing with prompts, it is important to learn some terminology related to audio to get the most out of the feature. You will have to prompt the AI with words and sentences in a way that it understands, and in this case, it understands both natural language and audio terminology.
There are many words that people working with audio know very well and use in their daily vocabulary. To people outside the field, however, those words can be completely foreign and might not mean anything. I will give a short and far from comprehensive list of some of the words that might be helpful to know and that you might want to test.
Foley: Recreating and recording everyday sound effects like footsteps, movement, and object sounds in sync with the visuals of a film, TV show, or video game to enhance the audio quality and realism.
Whoosh: An effect that underscores movement, like a fist flying or a camera move. It’s versatile and can be fast, ghostly, slow-spinning, rhythmic, noisy, or tense.
Impact: The sound of an object making contact with another object or structure, like a book falling, a car crashing, or a mug shattering.
Braam: A big, brassy cinematic hit that conveys something epic and grand about to happen, commonly used in movie trailers.
Glitch: The sound of a malfunction, jittering, scratching, skipping, or moving erratically, used for transitions, logo reveals, or sci-fi soundscapes.
Drone: A continuous, textured sound that adds atmosphere and suspense, often used to underscore exploration or horror scenes.
Onomatopoeias like “oink”, “meow”, “roar”, and “chirp” are also critical sound effects that imitate natural sounds.
Beyond Sound Effects
Even though the feature is called “Sound Effects,” don’t let that fool you. This is the perfect tool for sound designers, Foley artists, game developers, as well as producers and composers.
If you’re a hip-hop producer looking for samples, new or more old school, and are tired of digging in crates or reusing the same overused samples that everyone else uses, this is the perfect tool for you. If you are an EDM producer looking for one-shots or other samples, it’s perfect for you as well.
You can generate everything from individual one-shots to drum loops, instrumental loops, and unique new samples from big band sections and brass stabs: pretty much anything you can imagine.
I will go through a little bit of how to prompt this, but it is a lot of trial and error to get what you want.
Stem: An individual track from a multitrack recording, such as isolated vocals, drums, or guitar.
BPM: Beats per minute, indicating the tempo of a piece of music.
Key: The scale in which a piece of music is set, such as C major or A minor.
Loop: A repeating section of sound material, commonly used in electronic music.
Sample: A portion of sound, typically a recording, used in musical compositions.
One-shot: A single, non-repeating sound or sample, often used in percussion.
These terms are, of course, just scratching the surface, as there are things such as synth pads, basslines, chord progressions, arpeggios, and many, many other musical terms that can be good to learn. However, the terms above are enough to get started with generating musical material.
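One way to experiment with these terms systematically is to assemble prompts from the same building blocks each time. The helper below is a hypothetical sketch: the function name and the phrasing it produces (such as “90 BPM” or “in A minor”) are illustrative choices, not a required prompt format, and how closely the AI follows each term still takes trial and error.

```python
def build_music_prompt(sound, bpm=None, key=None, descriptors=()):
    """Compose a prompt string from a sound type and optional musical details."""
    # Descriptors go first, followed by the sound itself, e.g. "old-school hip-hop drum loop".
    words = [part.strip() for part in (*descriptors, sound) if part.strip()]
    if not words:
        raise ValueError("a sound description is required")
    pieces = [" ".join(words)]
    if bpm is not None:
        if bpm <= 0:
            raise ValueError("bpm must be positive")
        pieces.append(f"{bpm} BPM")
    if key:
        pieces.append(f"in {key}")
    return ", ".join(pieces)


print(build_music_prompt("drum loop", bpm=90, descriptors=["old-school hip-hop"]))
# old-school hip-hop drum loop, 90 BPM
print(build_music_prompt("brass stab one-shot", key="A minor"))
# brass stab one-shot, in A minor
```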