How Creators Can Protect Their Work from AI Training
As artificial intelligence continues to reshape the creative and digital economy, creators and rights holders are facing a fundamental challenge: how to maintain control over their work in the face of large-scale data mining by AI developers.
The UK Government’s recent consultation on AI and copyright provides a timely backdrop for this discussion. While the government seeks to strike a balance between protecting creators and supporting AI innovation, many creators feel left behind—especially as the current proposals place the burden of protection on them.
In this article, we explain what’s changing in UK copyright law and what creators can do today to protect their work from unauthorised use in AI training datasets.
What’s Happening in UK Copyright Law?
In response to the rapid rise of generative AI, the UK Government proposed several options for how copyright law could evolve:
- Status quo (Option 0) – No change.
- Strengthen copyright (Option 1) – Require explicit licensing for all AI training.
- Broad data mining exception (Option 2) – Allow AI firms to train on any lawfully accessed content.
- Opt-out model (Option 3) – Allow training unless creators actively reserve their rights.
The government favours Option 3: a “text and data mining” exception, similar to EU law, where creators must explicitly opt out to prevent AI firms from using their work. This means AI firms don’t need to ask first—they can use publicly accessible content unless told otherwise.
What This Means for Creators
Unless you take action, your publicly available content—blog posts, images, videos, music, or datasets—can legally be scraped and used to train AI models. Here's what you can do:
5 Practical Ways to Protect Your Work
1. Use robots.txt to Block AI Crawlers
Place a robots.txt file on your website to tell AI crawlers not to scrape your content. Example:
User-agent: GPTBot
Disallow: /books/fiction/contemporary/
⚠️ Note: robots.txt is a request, not a legally binding block. It relies on crawler compliance and may be ignored by bad actors.
In the example above, the GPTBot crawler is told not to crawl anything under /books/fiction/contemporary/; other crawlers are unaffected unless they are listed with their own rules.
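The same mechanism scales to multiple crawlers. As a sketch, the following robots.txt asks several crawlers whose operators have published AI-related user-agent tokens to stay off the whole site (token names change over time, so verify each one against the vendor's current documentation before relying on it):

```txt
# Block several known AI training crawlers site-wide.
# Lines beginning with # are comments in robots.txt.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
```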
To learn more, see Google's introduction to robots.txt.
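If you want to sanity-check your rules before deploying them, Python's standard library ships a robots.txt parser. This short sketch (the domain and paths are illustrative) confirms that the example rule above behaves as intended for a compliant crawler:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above, one line per entry
rules = [
    "User-agent: GPTBot",
    "Disallow: /books/fiction/contemporary/",
]

parser = RobotFileParser()
parser.parse(rules)

# GPTBot is blocked from the disallowed path...
print(parser.can_fetch("GPTBot", "https://example.com/books/fiction/contemporary/novel.html"))
# ...but is still permitted to fetch other paths
print(parser.can_fetch("GPTBot", "https://example.com/about/"))
```

Remember that this only tells you what a rule says, not whether a given crawler will honour it.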
What’s needed going forwards:
- Government-backed enforcement or penalties for ignoring robots.txt.
- Support for more granular controls (e.g. per-asset opt-out, image-level tags).
2. Register with an Opt-Out Registry
Centralised registries allow creators to declare that their content should not be used for AI training. Examples include:
- Spawning.ai
- Government-endorsed collective licensing bodies
These registries can be complex or fragmented, and there’s currently no single official UK-wide opt-out database.
What’s needed going forwards:
- A simple, government-supported registry.
- Automatic integration with content platforms like YouTube, Medium, or Squarespace.
3. License Your Work Directly
If you want to allow AI firms to train on your content—for a fee—set up direct licensing agreements. This is especially relevant for:
- Large datasets (e.g. medical, scientific)
- Creative portfolios
- Music and audio samples
Tools:
- Use licensing platforms or legal templates (like Genie AI’s document library) to formalise these terms.
4. Monitor How Your Work Is Being Used
Transparency requirements are in discussion, but creators can already:
- Query AI models and compare their outputs against your own work for close similarities.
- Use tools that track plagiarism or derivative content.
- Demand audit trails from AI firms where feasible.
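As a rough illustration of the first point above, even a simple character-level comparison can flag suspiciously close text. This is only a sketch using Python's standard difflib; the sample texts and the 0.8 threshold are hypothetical, and a similarity score on its own carries no legal weight:

```python
from difflib import SequenceMatcher

def similarity(original: str, candidate: str) -> float:
    """Return a rough 0-1 similarity ratio between two texts."""
    return SequenceMatcher(None, original.lower(), candidate.lower()).ratio()

# Hypothetical example: your paragraph vs. a model's output
my_paragraph = "The quick brown fox jumps over the lazy dog."
model_output = "The quick brown fox jumped over a lazy dog."

score = similarity(my_paragraph, model_output)
print(f"Similarity: {score:.2f}")
if score > 0.8:  # illustrative threshold, not a legal standard
    print("High overlap - worth investigating further.")
```

Dedicated plagiarism-detection tools use far more robust methods (token-level fingerprinting, semantic embeddings), but the workflow is the same: compare, score, and escalate anything above your threshold.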
What’s needed going forwards:
- Legally mandated AI model audit trails.
- Reasonable transparency laws that balance creator rights with commercial confidentiality.
5. Use Blockchain to Prove Ownership
Register your content on blockchain platforms like OpenSea or Arweave. This provides:
- Immutable proof of creation and ownership.
- A timestamped trail for future disputes or licensing.
Blockchain benefits:
- Clear attribution.
- Decentralised and tamper-proof records.
- Potential for smart contracts and royalty splits.
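The on-chain registration step depends on the platform you choose, but the fingerprint you would register can be produced locally first. This sketch (the content and field names are illustrative, not any platform's API) computes a SHA-256 digest plus a UTC timestamp, the two ingredients of a proof-of-existence record:

```python
import hashlib
from datetime import datetime, timezone

def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest uniquely identifying this content."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical work to be registered
work = b"My original short story, final draft."

record = {
    "sha256": content_fingerprint(work),
    "registered_at": datetime.now(timezone.utc).isoformat(),
}
print(record)
```

Anchoring that digest on a public chain later lets you prove the exact bytes existed at (or before) the recorded time, without publishing the work itself.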
What Should the UK Government Do Next?
To support a fair system that works for both creators and AI developers, Genie AI recommends the following:
- Publish stronger enforcement guidelines for robots.txt - Turn opt-out requests into enforceable protections.
- Create a national opt-out registry - Easy-to-use, well-publicised, and connected to major platforms.
- Support realistic transparency laws - Require AI developers to disclose their direct sources—not an impossible chain of data provenance.
- Encourage industry standards for fair use and licensing - Collaborative licensing schemes could offer creators income while ensuring lawful AI development.
Final Thoughts
For creators, the shift from “opt-in” to “opt-out” is more than a legal technicality—it places the burden of protection squarely on individuals and small businesses. The current UK proposals risk normalising data scraping unless clear, accessible protections are established.
At Genie AI, we believe that innovation and copyright can coexist—if creators are empowered with tools, education, and legal frameworks to make informed decisions about how their work is used.
✅ Want help protecting your content?
Sign up to explore our free legal templates or contact us about AI-specific licensing agreements at app.genieai.co.