Authors: Hae Chun Chung, Jae Hoon Jung
Affiliation: KT Corporation, Republic of Korea
Email: hc.chung@kt.com
Sound is crucial in audio-visual media. While text-to-audio generation has potential, existing models are often large and slow. We introduce AudioGAN, a compact and fast text-to-audio generation model. AudioGAN achieves state-of-the-art performance while reducing the number of model parameters by 90% and speeding up generation by a factor of 20 relative to existing methods. These advancements make AudioGAN a practical and powerful solution for text-to-audio generation.
| Text Prompt | Ground Truth | AudioGAN | AudioLDM2 | Tango2 |
|---|---|---|---|---|
| Whistling with wind blowing. | ||||
| Several church bells ringing. | ||||
| Pigeons coo and flap their wings. | ||||
| A horse neighs followed by horse trotting and snorting. | ||||
| A short horn followed by a car approaching with a longer horn. | ||||