Nvidia unveils 'Fugatto' AI model for music and audio generation


Nvidia, the world’s largest supplier of AI chips and software, has unveiled Fugatto, a new artificial intelligence model that can generate and modify music, sound effects, and other audio.

According to Reuters, the tool is designed for professionals in the music, film, and video game industries.

However, Nvidia has stated there are no immediate plans to make Fugatto publicly available.

Fugatto, an acronym for Foundational Generative Audio Transformer Opus 1, is presented by Nvidia as a leap forward in audio AI. Unlike many AI models, it can both create new sounds from text descriptions and modify existing audio recordings.

Bryan Catanzaro, Nvidia’s Vice President of Applied Deep Learning Research, placed the model in the context of how computers and synthesizers have reshaped music over the past 50 years.

“If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers. I think that generative AI is going to bring new capabilities to music, video games, and to ordinary folks that want to create things,” Catanzaro stated.

Features of Fugatto

Fugatto offers several features that Nvidia says distinguish it from existing generative AI tools.

The model can:

Create sound effects and music from text descriptions, including novel sounds such as a trumpet that mimics a barking dog.
Modify existing audio by converting piano notes into vocal melodies or changing accents and emotional tones in recorded speech.

Comparison with other AI models

Nvidia’s Fugatto joins a growing list of AI technologies developed by companies such as Meta Platforms and startups like Runway, which also generate audio or video content from text prompts.

Nvidia’s approach stands out, however, for its focus on modifying existing audio in addition to generating entirely new content.

Nvidia has also acknowledged the ethical risks associated with generative AI. The model was trained on open-source data, and the company is weighing whether and how to release it publicly.

“Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don’t. We need to be careful about that, which is why we don’t have immediate plans to release this,” Catanzaro said.

Developers of generative AI, including Nvidia, face significant challenges in preventing misuse. Risks include spreading misinformation, infringing on copyrights, and imitating protected content.
These concerns highlight the broader need for effective safeguards and industry-wide standards.

What you should know

In October, Nvidia briefly surpassed Apple to become the world’s most valuable company, with its market valuation peaking at $3.53 trillion compared to Apple’s $3.52 trillion.
This achievement underscores Nvidia’s dominance in the artificial intelligence sector, driven by growing demand for its advanced chips.
Among Nvidia’s major customers is OpenAI, the creator of ChatGPT, which relies on Nvidia’s GPUs to train its language models and raised $6.6 billion in a funding round in October.