AudioGAN: A Compact and Fast Text-to-Audio Generation Model with Superior Performance

Authors: Hae Chun Chung, Jae Hoon Jung

Affiliation: KT Corporation, Republic of Korea

Email: hc.chung@kt.com

Abstract

Sound is crucial in audio-visual media. While text-to-audio generation has potential, existing models are often large and slow. We introduce AudioGAN, a compact and fast text-to-audio generation model. AudioGAN achieves state-of-the-art performance while reducing the number of parameters in the training model by 90% and speeding up generation by a factor of 20 over existing methods. These advancements make AudioGAN a practical and powerful solution for text-to-audio generation.

Results

Text prompts (audio columns on the original page: Ground Truth, AudioGAN, AudioLDM2, Tango2):
Whistling with wind blowing.
Several church bells ringing.
Pigeons coo and flap their wings.
A horse neighs followed by horse trotting and snorting.
A short horn followed by a car approaching with a longer horn.