The Copyright Office Issues Its Long-Awaited Report on AI Training Material and Fair Use. Will It Stymie the U.S. AI Industry?

On May 9, the U.S. Copyright Office (USCO) finally released its report analyzing whether training an AI on the copyright property of others without their permission is fair use. The analysis makes some key calls in favor of content creators and against generative AI (GenAI) companies.

What does this portend for U.S. AI businesses, content creators, and the U.S. economy?

Training a general-purpose, world-class AI (a “foundation model”) requires a lot of training data. While the makers of foundation models are tight-lipped about this, it’s widely understood that much of that training data was obtained by scraping material from the Internet without obtaining licenses from content owners.

Almost all material on the Internet is someone’s copyright property. In some cases, the training data is believed to have been obtained from pirate sources, such as shadow libraries of material taken from behind paywalls.

Content owners are suing AI makers, claiming that using their material without permission is copyright infringement. AI makers claim fair use.

Fair use is a defense to copyright infringement. Copying someone else’s copyright property without permission generally is copyright infringement unless the user proves fair use. In practice, the biggest consideration in judging fair use is whether the defendant’s conduct substantially harms the market value of the plaintiff’s copyright property.

Except for one relatively minor case that doesn’t concern generative AI, no court has yet decided whether it’s fair use to train an AI using someone’s copyright-protected content without permission.

On two crucial issues, the USCO report controversially sided with content creators over AI companies.

First, it opined that when GenAI outputs material in the style of a content creator without reproducing any specific copyright-protected elements, that may not be fair use, particularly if the training material was obtained from a pirate source, and if the type of content at issue could be licensed for training.

Second, it opined that the fair use analysis must consider AI training and its outputs holistically rather than judging training and outputs separately.

I realize that’s confusing. Let’s unpack it.

Start with outputs. As the report admitted, creating content “in the style of” an author or artist is widely considered not to be copyright infringement provided you don’t use any of that person’s specific creative expression. For example, it’s not copyright infringement to produce a book written in the style of Tom Wolfe, provided you don’t take his character names, plot lines, or material passages of text.

But, surprisingly, the report opined that such style mimicry may not be legal if done by AI because of its awesome ability to create such content quickly and in large volume. The USCO admitted it is in “uncharted territory” in reaching this conclusion.

This relates to the USCO’s decision to combine AI training and output into a single fair use analysis rather than treating the two steps separately. That’s controversial because, if you consider each stage separately, AI providers would probably escape liability except in what are likely rare situations.

Viewed separately, as the USCO admits, training would almost certainly be fair use (it just analyzes data), and only a small fraction of outputs would infringe (e.g., reproducing verbatim text). By merging the steps, the USCO concludes that training may not be fair use when the resulting outputs compete with or erode the market for the originals, even if a human could have produced those outputs lawfully.

It’s possible the Trump Administration will withdraw this report. The Trump Administration is friendlier to the U.S. AI industry than the Biden Administration was. Shortly after taking office, it rescinded a restrictive and burdensome Biden Administration executive order on the development and use of AI.

The day before the report was released, the Trump Administration fired the head of the Library of Congress, which oversees the USCO. The day after the report was issued, it fired the head of the USCO. The administration didn’t comment on whether these firings were related to the report.

The USCO may have rushed out the report to prevent the Trump Administration from meddling with it. The version released was labeled a “pre-publication version.” It’s unusual to release a non-final version.

This report is not the law. Courts will decide this fair use issue. They’ll certainly consider this report, but they aren’t bound to follow it.

This fight ultimately comes down to money. Content creators want to force AI companies to pay for licenses to use their material to train AI. AI companies argue that it’s impractical to license all the training material they need to build foundation models, that collections of some types of training data don’t exist, and that licensing would be financially prohibitive.

The result of this fight will have a profound impact on the fortunes of the U.S. tech and creative industries, and perhaps on the economic future of the U.S. itself.

NOTE: A longer version of this column, with more information, is available on John Farmer’s Substack. You can view and subscribe to that Substack here: https://johnfarmer.substack.com/

Written on May 22, 2025

by John B. Farmer

© 2025 Leading-Edge Law Group, PLC. All rights reserved.