12 Million Songs Trained AI on Australia’s Biggest Names
Australian musicians are inside AI datasets of 12 million tracks, but copyright law gives them weaker tools than the Anthropic settlement gave authors.
A pair of datasets quietly circulating inside AI labs now lists millions of songs scraped from YouTube, and a long roster of Australian artists appears inside them. The Atlantic’s AI Watchdog, a search tool released in mid-2026, lets musicians type in a name and see whether their work shows up. Australian acts from Kylie Minogue to Nick Cave and Tame Impala have already turned up, with Kylie Minogue alone credited with 182 songs in one of the two main datasets.
The discovery has triggered an open fight between the Australian music industry and the AI companies training on its work. Songwriters describe their catalogues being hoovered up without permission. Australian copyright law was supposed to require consent and payment before reuse. AI training is using that work in a way the law was never built to police.
The 12 Million Song Database, and What’s Inside It
Two datasets assembled by AI researchers sit at the centre of the dispute, both now searchable through The Atlantic’s AI Watchdog. LAION-DISCO-12M, built by the Germany-based group LAION, lists 12.3 million YouTube tracks. Sleeping-DISCO-9M, compiled by a research collective calling itself Sleeping AI, adds another 9.7 million plus lyrics scraped from Genius.com.
Among the Australian artists showing up across one or both datasets:
- Kylie Minogue (182 songs in one dataset)
- John Farnham
- INXS
- Midnight Oil
- AC/DC
- Nick Cave
- Tame Impala
- Gotye
- Tones and I
- Parkway Drive
- The Living End
- Vance Joy
- Ben Frost
- Bernard Fanning
- Powderfinger
- Jimmy Barnes
- Something For Kate (Paul Dempsey)
- Savage Garden (Darren Hayes)
- The Bushwackers (Dobe Newton)
- The Delta Riggs (Jesse Pattinson)
Checks of artist names in late June 2026 showed that most major Australian acts had already turned up in the Watchdog’s data. The Atlantic warns that AI companies can strip songs from training runs, so presence in a dataset is not proof a model learned from any specific track. Even so, the datasets themselves were downloaded thousands of times inside the AI development community before the Watchdog made them searchable, and most Australian acts found themselves inside without ever being asked.
Paul Dempsey of Something For Kate found his band’s complete catalogue, plus his solo material, in the data and called the experience frustrating in comments to AAP. “Every negotiated agreement and contract I’ve ever gone into in my career with whatever entity or record label is all just rendered useless,” Dempsey said the day the tool went public. Screen composer and APRA board member Caitlin Yeo, told she had decades of film and television scoring inside the data, called the discovery a violation. She described the situation as decades of work “hoovered up in a second” to “feed companies offshore that pay no taxes.”
I absolutely feel violated that all of the hundreds and hundreds and hundreds of hours, blood, sweat and tears that I’ve put into my music, along with every other musician, has been stolen and served up like french fries to a piece of software that spits out shit.
Darren Hayes, the Sydney-born songwriter who fronted Savage Garden in the 1990s, posted that line on Instagram on the day the Watchdog was released. APRA AMCOS, the rights body representing 128,000 members in Australia and New Zealand, titled its response “PROOF OF THEFT” and announced an audit of Australian and New Zealand works inside the data.

The Drain Coming Back the Other Way
The scale of what flows back out of those models is what hits hardest. Suno, the most popular AI music generator, told investors that users on its platform create the equivalent of Spotify’s entire 100-million-song catalogue every two weeks.
That output problem in numbers:
- Spotify’s full catalogue: roughly 100 million tracks
- Suno’s user-generated output: a Spotify-sized catalogue every 14 days
- Tracks removed by Spotify as “spammy” AI music since September 2025: 75 million
- Deezer’s reported share of incoming tracks now AI-generated: close to half
Suno’s appetite is set out in a 2024 court filing in which the company said it had trained its models on “essentially all music files of reasonable quality” that it could download from the internet. Suno and rival Udio now operate as listening platforms in their own right, prompting users to type a description and pushing out a finished song in seconds. The four largest AI music datasets The Atlantic found hold at least 21 million tracks between them, with the 12-million-track dataset alone running to 91 years of continuous listening.
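A rough back-of-envelope check puts those figures in everyday terms. The sketch below takes the reported numbers at face value and assumes the "12-million-track dataset" is LAION-DISCO-12M at its reported 12.3 million tracks; the function names are illustrative only.

```python
# Back-of-envelope check of the figures quoted above.
# All inputs are the reported numbers, taken at face value.

SPOTIFY_CATALOGUE_TRACKS = 100_000_000  # "roughly 100 million tracks"
SUNO_CYCLE_DAYS = 14                    # one Spotify-sized catalogue every 14 days
DATASET_TRACKS = 12_300_000             # assumption: the 12.3M count reported for LAION-DISCO-12M
DATASET_LISTENING_YEARS = 91            # "91 years of continuous listening"

MINUTES_PER_YEAR = 365.25 * 24 * 60     # average year, including leap days


def suno_tracks_per_day() -> float:
    """Daily output implied by a Spotify-sized catalogue every 14 days."""
    return SPOTIFY_CATALOGUE_TRACKS / SUNO_CYCLE_DAYS


def implied_average_track_minutes() -> float:
    """Average track length implied by the 91-year listening figure."""
    return DATASET_LISTENING_YEARS * MINUTES_PER_YEAR / DATASET_TRACKS


if __name__ == "__main__":
    print(f"Suno output: about {suno_tracks_per_day() / 1e6:.1f} million tracks per day")
    print(f"Implied average track length: about {implied_average_track_minutes():.1f} minutes")
```

On those assumptions, Suno's claim works out to about 7.1 million tracks a day, and the 91-year figure implies an average track of just under four minutes, which is in line with typical song lengths.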
Streaming services are already showing the strain. Deezer reports that close to half of all tracks delivered to its platform daily are AI-generated and has begun excluding those tracks from its algorithmic recommendations. Sony found 135,000 AI-generated tracks attributed to its artists on streaming services earlier this year. The drain now flows in both directions, with copyrighted work in and AI-made imitations out, and the mismatch between the two sides is where the rest of the fight sits.
Why Musicians Have a Weaker Legal Hand
Australian artists face a copyright fight whose structure works against them, and the gap is technical before it is ethical. The books-based lawsuits that produced the largest AI copyright settlement in US history turned on whether AI companies had reproduced copyrighted text in training their models. Music copyright draws a different line, between an expression (a specific melody or recording) and a style (a general feel), and AI developers have built their products to stay on the safe side of that line. To the creator, the output feels like theft; under current law, it is often classified as imitation.
Rather than copying note-for-note, the models extract underlying patterns, chord progressions, and vocal textures to generate tracks that resemble the originals without reproducing any one of them. The way the datasets are packaged makes proving infringement harder still: each is a list of links to YouTube, alongside metadata such as titles and lyric text, rather than copies of the songs themselves. Previous lawsuits have established that pointing at copyrighted material, without taking it, is not infringement.
The law only catches the act when an AI company downloads the audio and trains on it, and the AI companies do not disclose which songs went into any given model. Australian law adds its own wrinkle: the Copyright Act requires permission before a work is reproduced or adapted and lists no text-and-data mining (TDM) exception that AI training could rely on.
In August 2025 the Productivity Commission floated such an exception. In October 2025 the federal government ruled the proposal out and held the line on permission and payment. What Australia has not yet built is the machinery that would let its artists enforce that line against offshore AI labs, and the choice now is whether to build that machinery locally or to adopt the EU’s.
The Lawsuits Already Drawing Lines
While Australian artists lobby in Canberra, courts in Munich, Boston, and San Francisco are testing what AI training actually costs. The biggest signal so far came from authors, not musicians. In August 2025 Anthropic agreed to pay US$1.5 billion to settle Bartz v. Anthropic, a class action over its use of books scraped from pirate libraries LibGen and PiLiMi to train its Claude model.
| Case | Status | Scale | What it tested |
|---|---|---|---|
| Bartz v. Anthropic | Settled August 2025 | US$1.5 billion; ~500,000 covered works from a 7M-copy dataset | Whether training a language model on pirated books is fair use |
| GEMA v. OpenAI | Ruling against OpenAI, Munich Regional Court, 11 November 2025 | 9 German hits, including Herbert Grönemeyer’s Männer | Whether reproducing song lyrics inside ChatGPT outputs is infringement |
| Sony Music and UMG v. Suno | Amended complaint filed May 2026, awaiting approval | 61,000+ songs alleged to have trained Suno | Whether a generative music model’s use of commercial recordings is fair use |
The Munich ruling against OpenAI is the closest parallel to a fight Australian musicians could mount. Germany’s collecting society GEMA sued on behalf of songwriters whose lyrics had been reproduced inside ChatGPT outputs. The presiding judge in the Munich Regional Court rejected OpenAI’s defence that its outputs were written by its users, ruling that the model itself had been trained on protected lyrics and was reproducing them. It is now the strongest favourable precedent any Australian action could cite.
The Suno litigation tests the central question for AI music at scale. Sony Music and Universal Music Group filed their original suits in June 2024, then an amended complaint in May 2026 alleging that Suno trained on more than 61,000 songs, identified through Audible Magic audio fingerprinting. Suno’s founder and CEO Mikey Shulman has held to the line his company used from the start, telling the court that “learning is not infringing.” The amended complaint has not yet produced a ruling.
What Australia Has Done So Far
The political fight at home is moving fast. More than 4,000 Australian songwriters, authors, and composers signed an open letter organised by APRA AMCOS in June 2026, pressing the federal government to enforce existing copyright law against AI training. A delegation including novelist Thomas Keneally and rapper Briggs met MPs in Canberra the same week. The confrontation sits inside a wider week of AI accountability moves in June 2026, spanning the Five Eyes, a Munich court, and Illinois.
APRA AMCOS chief executive Dean Ormston framed the industry’s position bluntly in announcing the audit: “Major tech platforms have not come to the table. Not once. Instead, they have lobbied governments, circulated policy papers, and proposed solutions designed to extinguish any obligation to pay.” Yeo separately called for a domestic disclosure regime that lets artists see which models used their work, with payments following. The Productivity Commission’s August 2025 proposal for a text-and-data mining exception died when the federal government ruled it out in October 2025. That leaves the fight in Canberra squarely over whether Australia sets up its own disclosure rules, copies the EU’s, or holds out and waits for AI companies to comply voluntarily.
The EU Path That Could Reach Australia
The clearest policy alternative sits in Brussels. The EU AI Act, passed in 2024, sets a 2 August 2026 deadline for new transparency rules under Article 50, requiring AI providers to publish summaries of the data used to train general-purpose models. From that date any AI system accessed from inside the EU must declare the source of its training data and show it complies with local copyright law, no matter where the model was built.
The rule Australian policymakers are closest to copying is the GPAI training-data disclosure template, already in force for new models since 2 August 2025. Under that template, providers must publish a structured summary of the data sources behind their model. Suno, Udio, and other music-focused AI tools would have to open their books, and musicians and rights holders could then check whether their work was among the sources and pursue licensing or damages through normal copyright law. Until now, the veil of secrecy around training data has been the AI companies’ strongest defence, which makes forced disclosure the single biggest shift in leverage since the lawsuits began.
Australian copyright law already requires permission and payment. The August 2026 EU AI Act transparency deadline is the lever that turns the requirement into action, and a domestic equivalent, or formal recognition of EU-compliant disclosures for AI services available in Australia, would give APRA AMCOS and the artists it represents a way to know which models to challenge. Caitlin Yeo’s framing is the one the policy machinery has to satisfy: artists “should see a slice of the pie too.”
Frequently Asked Questions
What are LAION-DISCO-12M and Sleeping-DISCO-9M?
Two music datasets assembled for AI training, now searchable through The Atlantic’s AI Watchdog tool. LAION-DISCO-12M lists 12.3 million YouTube tracks and was compiled by the Germany-based group LAION. Sleeping-DISCO-9M lists 9.7 million YouTube tracks with lyrics scraped from Genius.com and was compiled by a research collective calling itself Sleeping AI. Together the two datasets hold at least 21 million tracks, downloaded thousands of times by AI developers.
How can Australian musicians check if their work was used?
Any artist can search The Atlantic’s AI Watchdog database by name and see whether their work appears in either of the two main datasets. APRA AMCOS has launched its own audit of Australian and New Zealand works inside the data. A song’s appearance in the database confirms only that it was listed, not that any specific model was trained on it, since AI companies do not disclose which songs go into any given training run.
What was the $1.5 billion Anthropic settlement?
In August 2025 Anthropic agreed to pay US$1.5 billion to settle Bartz v. Anthropic, a class action brought by authors whose books Anthropic had downloaded from pirate libraries LibGen and PiLiMi to train its Claude model. The settlement covers roughly 500,000 works drawn from a 7-million-copy dataset and was finalised in court in September 2025, making it the largest AI copyright settlement in US history to date.
Does the EU AI Act actually help musicians?
Article 50 of the EU AI Act requires providers of general-purpose AI models to publish training-data summaries from 2 August 2026, with a similar disclosure template already in force for new models since August 2025. Disclosure does not pay artists by itself, but it gives rights holders a list of what to license or sue over, and that is the lever that closes the imitation loophole the AI music industry has used so far.
