The Training-Data Reckoning: Anthropic's $1.5 Billion Settlement and What Fair Use Now Means

Michael Benavides • June 19, 2026

QIM Score: 88/100 — published under the house rule: no post goes live unscored.

Routes: Blue Data Law · Digital Rights

The Data Hook

Every AI model that writes, draws, or talks was trained on something. Usually that something was the work of human authors, artists, and journalists who never agreed to it and never got paid. For three years the courts have been deciding whether that is theft or fair use. In 2025 and 2026, the answers finally started arriving — and they are more nuanced than either side wanted.

Two Rulings That Drew the Line

Start with the case that gave the AI companies their best news. In Bartz v. Anthropic, a federal court held that training an AI model on copyrighted books can be fair use, because the learning itself was transformative. That sounds like a clean win for AI. It was not. The same court held that storing pirated copies of those books was not protected, and Anthropic settled the piracy exposure for roughly $1.5 billion, with an estimated payout near $3,000 per work, which implies on the order of 500,000 covered works. Read those two halves together and the actual rule emerges: how the model learns may be fair use; how the company got the training data can still be infringement.

Then look at Thomson Reuters v. Ross Intelligence, where a court found that using copyrighted legal headnotes to train a competing legal-research AI was not fair use. The tell was competition: the AI was built to replace the very product whose content trained it. That case is on appeal at the Third Circuit, but the signal is loud.

The Pattern: Lawful Acquisition and Competition

Put the cases side by side and a logic appears. Courts in Northern California have been more receptive to fair use where the content was lawfully acquired and the model is broadly transformative. Courts in Delaware have been far more skeptical when AI is trained on proprietary, curated content to build a direct competitor. The two variables that keep deciding cases are: did you get the data legally, and are you competing with the people you trained on? That framework matters far beyond AI companies — it tells any business deploying AI what diligence looks like.

The Case Everyone Is Watching

The biggest test is still pending. New York Times v. OpenAI and Microsoft alleges the copying of millions of articles to train ChatGPT, with the Times seeking statutory damages in the billions. A win for the Times would reshape the licensing economics of the entire industry; a win for OpenAI would entrench the transformative-use defense. Either way, the ground will move again. Anyone who tells you the law here is settled is selling something — there are more than 160 AI copyright cases moving through the courts, with conflicting district-level rulings, and the Supreme Court has not weighed in.

What This Means If You Create

If you are a writer, artist, photographer, or publisher, the practical takeaways are concrete. Register your copyrights — statutory damages and attorney fees, the leverage that produced the Anthropic settlement, generally require timely registration. Watch for the licensing market that these cases are forcing into existence; settlements and licensing deals are now a real revenue path, not a fantasy. And keep records of where and how your work has appeared, because proving inclusion in a dataset is half the battle.

What This Means If You Deploy AI

If you run a business using AI tools, the diligence question is no longer optional. Where did your vendor's training data come from? Is the tool competing with the sources it learned from? "We did not know" is not aging well as a defense. The clean-acquisition principle from Bartz v. Anthropic and the anti-competition principle from Thomson Reuters v. Ross Intelligence are becoming the baseline expectations.

What to Do

The "move fast and scrape everything" era of AI has a price tag now, measured in billions. The courts are not banning AI training — they are disciplining it, rewarding companies that acquired their data honestly and punishing the ones that pirated. For creators, that shift is turning an uncompensated taking into a licensable asset. For deployers, it is turning "where did this come from" into the most important question you can ask your vendor. A free Blue Data Law consult helps creators position their work for the emerging licensing market and helps businesses vet AI-vendor training-data risk before it becomes liability.

Blue Data Law — free consult | Michael Benavides, Esq., CA Bar No. 270714 | 707-362-4166 | attorneymichaelbenavides.com

ATTORNEY ADVERTISING. Blue Data Law is a trade name of the law practice of Michael Benavides, Esq., California State Bar No. 270714. General information only — not legal advice; no attorney-client relationship is formed by reading this. Authority cited is as of mid-2026 (17 U.S.C. § 107 fair use; Bartz v. Anthropic; Thomson Reuters v. Ross Intelligence, on appeal; New York Times v. OpenAI & Microsoft, pending) — this area is rapidly evolving; verify current rulings before relying on them. Prior results do not guarantee a similar outcome.
