The Verge · 2 min read

The Atlantic created a searchable database of the music used to train AI

On May 22, 2024, The Atlantic published a searchable database of music datasets used to train AI, according to reporter Alex Reisner. The project aims to make visible how much copyrighted material AI models are trained on. It covers four datasets: two very large ones, containing 12 million and 9 million tracks respectively, and two smaller ones that each still exceed 1 million tracks. Users can search for specific songs, artists, and albums within these training sets to see how much popular music has been incorporated into machine learning models. Reisner's investigation reflects growing concern over the use of copyrighted music in AI development, and the tool gives researchers and the public a way to examine the data pipelines behind generative AI, particularly audio and music generation.

Original source — read the full reporting at the publisher:

Read on The Verge