An AI Maker Was Just Found Liable for Copyright Infringement. What Does This Portend for Content Creators and AI Makers?

The makers of generative AI (GenAI), such as ChatGPT, just lost the first legal battle in the war over whether they commit copyright infringement by using the material of others as training data without permission. If other courts follow this ruling, the cost of building GenAI will dramatically increase.

First, some tech background. To learn how to generate output, a GenAI is trained on a large volume of data. Much of that training data is material for which the GenAI maker did not purchase a license, often scraped from the Internet. Most expressive material is someone’s copyright property, even material that is free to view on the Internet.

In GenAI training, a copy is made of each piece of training data. But a GenAI’s neural network does not store a literal copy. All it retains is information it learns about the relationships between small units of training data. Thus, in theory, the public can’t use a GenAI to retrieve copies of the training data.

The big legal issue is whether using someone else’s copyright property without permission to train a GenAI is copyright infringement. Powerful content-creator plaintiffs are suing, including the New York Times, Getty Images, and author John Grisham.

A federal trial court in Delaware just issued the first court decision on the issue. It held that this use without permission constitutes copyright infringement and is not protected as fair use.

Yet, this case has unusual facts. The plaintiff, Thomson Reuters, makes the popular Westlaw legal research tool. As part of it, Thomson Reuters creates “headnotes,” which summarize key legal points in case opinions.

The defendant, Ross Intelligence, wanted to produce a case-finding tool to compete with Westlaw’s headnotes. Ross approached Thomson Reuters about licensing use of its headnotes to create Ross’ new product, but Thomson Reuters declined because doing so would create a competitor.

Spurned, Ross made a deal with a licensee of Westlaw to effectively get access to the headnotes. Ross then used the headnotes to train its AI-powered legal research tool, which competed with Westlaw. Importantly, the Westlaw headnotes do not appear in Ross’ product’s output. Also, Ross’ technology is not GenAI: it outputs something like headnotes rather than custom text generated in response to a user prompt, as GenAI does.

Thomson Reuters sued Ross for copyright infringement. Ross shut down due to the litigation, but the case continued.

Ross argued that using the headnotes to train its AI was a fair use. Generally speaking, copying someone else’s copyright property without permission is copyright infringement unless it constitutes a fair use. Fair use mainly asks whether the unauthorized copying and use of copyright property harmed the copyright owner.

Here, the court found that Ross makes a commercial product that competes with Westlaw’s headnotes and consequently hurts the headnotes’ market value. For that reason, it rejected Ross’ fair use defense and found Ross liable for copyright infringement. The court made this ruling while noting that Ross was using copies of the headnotes only in training its AI and that the headnotes don’t appear in its final product.

The court was careful to say that it was not considering a GenAI situation because Ross’ product doesn’t generate a custom output. Still, the logic of this court’s opinion addresses the crucial legal question facing GenAI makers – whether duplicating someone else’s copyright property without permission to train an AI is fair use.

Other courts could reach different conclusions. Still, it is blood in the water for content owners and will encourage more lawsuits.

Perhaps this case embodies the legal axiom that “bad facts make bad law.” Ross’ conduct smells bad.

GenAI makers still have a strong fair use argument. Training a GenAI resembles how a human being learns – everything you read or otherwise sense educates your brain, enabling you to generate outputs, such as writing. Unless you memorize and repeat a specific item you have read, you are synthesizing the various ideas and information that you took in.

This human brain training might enable you to compete with whoever wrote the material. Perhaps if you read all of John Grisham’s novels, you could write like him and compete with him for book sales. But doing that would not make you a copyright infringer unless you steal blocks of his prose or details of his stories. Copyright protects only expressions, not facts and ideas.

Much is at stake here. Many content creators feel their livelihoods are threatened by GenAI. Many technologists argue that GenAI creates huge societal opportunities but fear it might not be economically viable if all training data must be licensed. This issue will likely make it to the Supreme Court in a few years unless changes in AI technology make the issue moot.

NOTE: A longer, more detailed version of this column is or will be available on John Farmer’s Substack.

Written on February 19, 2025

by John B. Farmer

© 2025 Leading-Edge Law Group, PLC. All rights reserved.