
15 April 2026

Can AI training result in copyright infringement? The state of play in the UK, EU and China

It’s no secret that generative AI models can generate high-quality text, image, audio and video content at rapid speed; it’s for this reason that more businesses and consumers than ever before have procured AI models for enterprise and personal use. The generative ability of these models depends largely on the quality of the materials on which they have been trained. However, the data used to train many of the most powerful AI models on the market include works in which copyright subsists. Naturally, the owners of such works might expect control over, and compensation for, their use. 


When copyright-protected works are used to train an AI model without the permission of the copyright owner, does this training result in copyright infringement in relation to those works?  

This issue remains unresolved and hotly contested in most worldwide jurisdictions, not just the UK, EU and China.  

Of the three territories, the EU is the closest to an answer. In a recent German decision, which has already been appealed, the court held that where a copyrighted work, or part of such a work, is technically reproducible by the model in reply to a simple prompt, this constitutes an act of reproduction under German and EU copyright law. The court considered that, in such a case, the work is embedded in the model parameters and can be made indirectly perceptible by technical means, having been “memorised”. It therefore deemed the mere possibility of displaying song lyrics upon entry of a prompt sufficient evidence that not only information derived from the works, but the works themselves, were stored in the model. On that basis, it ruled that the relevant text and data mining exceptions, which the court considered generally applicable to certain preparatory measures prior to the actual AI training, did not apply (more on these exceptions below). An appeal has already been filed, so we will be monitoring developments. Other cases will also be influential,1 in particular the reference to the CJEU from a Hungarian court, in which (among other things) the court will address the question of reproduction at the AI training phase.2 In the meantime, providers of general-purpose AI models in the EU should comply with copyright-relevant obligations under the EU AI Act, including the obligation to make publicly available detailed summaries of the works used for training (though doing so, particularly in relation to the largest models, may entail administrative and technical challenges).  

In the UK, an opportunity for the courts to address this issue was missed: in Getty Images v Stability AI,3 the claimant alleged that the defendant downloaded and stored images from the claimant’s image database on servers or computers in the UK during the development and training of the defendant’s AI model, constituting infringement by copying under UK copyright law. However, the claimant encountered evidential difficulties and dropped the claim during trial. Several other suits on this issue have been threatened in the UK; however, it will take some time for these to play out, and whether they will provide clear legal guidance remains to be seen.

In China, we are also playing a waiting game, with numerous cases currently on foot. In one, the Beijing Internet Court will decide a suit brought by a group of artists against social media platform Xiaohongshu (RedNote), in which the artists claim that the platform used their copyrighted works without authorisation to train its AI-painting model. In another, the Xuhui District People’s Court in Shanghai will decide a suit brought by streaming platform iQIYI against AI company MiniMax, in which iQIYI claims that its movies, TV shows and other content were copied without authorisation to train MiniMax’s AI models. The outcomes of these suits are awaited; however, we expect courts to take into account whether the defendants’ training was aimed at using the original expression of the plaintiffs’ works, as well as whether the plaintiffs’ normal use of their copyright works has been prejudiced.  


If there is copyright infringement, who bears liability? 

There are various potential answers to this question: it could be the entity that develops the model, the entity that trains the model, the entity that procures the developed and trained model for use in its organisation, or someone else entirely.  

Judicial guidance in the UK, EU and China is yet to emerge. Liability (both direct and secondary) is most likely to fall on the entity responsible for training the model, most often (but not always) the AI model developer. For an AI provider that merely supplies a model it did not train or develop, direct and/or secondary liability may still apply, depending on the circumstances and jurisdiction. Users who merely prompt an AI model are relatively unlikely to bear liability for infringement related to training (as opposed to outputs), unless they are involved in the training process (which can be the case for models specifically made available for users to train or fine-tune with data of their choice). As always, financial exposure for any of these parties is likely to depend on the contractual arrangements between them, including terms and conditions with end users.  


In establishing copyright infringement, does it matter where the training occurred? 

In short, yes, although the position is far from straightforward.  

In the UK, a fundamental point in the Getty Images case was whether the defendant’s model had been trained in the UK, which the defendant denied. There was evidence that some Stability employees and contractors were based in the UK and/or employed by a Stability entity in the UK. Ultimately, the claimant was unable to establish that this equated to the training having taken place in the UK, rendering its primary copyright infringement claim unviable. Witnesses gave evidence that all work was undertaken on non-UK cloud-based servers and that no training datasets had been downloaded onto UK computers or servers. As for the secondary copyright infringement claim, the claimant argued that, even if training had taken place outside the UK, importing the model into the UK amounted to infringement. However, the court ultimately dismissed this claim, holding that the pre-trained model did not constitute an infringing copy in the required sense because it did not store or reproduce any copyright works. It is possible that this point will be appealed. 

In the EU, many member states have rules in place to attribute jurisdiction to their courts based on different criteria, for example if one of the parties is a national of that member state. Further, provisions of the EU AI Act may apply, including the obligation on providers of general-purpose AI models to put in place a policy to comply with EU copyright law, an obligation which applies regardless of where training occurs.  

In China, the location of training may be less relevant, putting plaintiffs in a relatively strong position. Even if training took place outside China, a plaintiff may still be able to claim for infringement in China so long as the training materials are protected by Chinese copyright law. Further, Chinese law allows cyberspace administration authorities to take action against generative AI providers who make available in China an AI model that does not comply with Chinese law, including copyright law. 


What exemptions or exceptions may apply (such as those relating to text and data mining (TDM))? 

In the EU, attention has focused on the two TDM exceptions in the Digital Single Market Directive4: the first for TDM carried out for scientific research purposes, in relation to works to which the user has lawful access (such as through licence or subscription), conducted for non-commercial purposes5; the second for TDM for any purpose, including commercial purposes, in relation to works to which the user has lawful access, but in respect of which the rightsholders have not opted out.6 Cases are beginning to emerge on how these exceptions will apply in the AI context.  

In December 2025, a German appellate court upheld a first-instance judgment concerning a claim from a photographer whose images were included in a dataset compiled and made available free of charge by the defendant (a non-profit organisation). In creating the dataset, the defendant extracted the images and automatically analysed them to prepare the dataset for subsequent AI training.7 The court found that, in downloading and analysing the plaintiff’s images for the purpose of creating the dataset, the defendant had reproduced them. However, this reproduction was covered by the first TDM exception, given that the defendant conducted the analysis for scientific research and not for commercial purposes. Further, the court held that the reproduction was also covered by the second TDM exception, as the analysis of the images for subsequent AI training qualified as use for the purpose of obtaining information. The website on which the plaintiff’s images were hosted included a usage restriction in natural language (an ‘opt-out’); however, the court held that this opt-out was not in the form required by the second TDM exception, as for works published online the opt-out must be expressed in a machine-readable format, which the plaintiff could not prove was the case at the time the defendant obtained the images. Importantly, the court expressly limited the applicability of the second TDM exception to the defendant’s preparatory measures prior to the actual AI training, namely the analysis of the images in question. As such, the case leaves open the important question of whether the second TDM exception applies to the actual training of AI models, not merely to measures preceding such training. 

In the UK, the existing TDM exception applies only for non-commercial purposes, where the user has lawful access, where there is sufficient acknowledgement, and subject to other conditions; it does not permit TDM for commercial purposes. In the government’s consultation on proposed changes to the UK’s copyright regime, one proposal was an EU-style TDM exception that would allow TDM for commercial purposes, subject to rightsholders’ right to opt out and other transparency measures. However, as of March 2026, it is not clear that UK law will evolve in that direction anytime soon.  

Chinese copyright law contains no express exception for TDM activity. While various “fair dealing” exceptions exist, none expressly covers the use of copyright works to train an AI model. In one decided case, the court commented that such use could be considered “fair dealing” where there is no evidence that the use is aimed at exploiting the copyright works’ original expression, nor that it prejudices the normal use of the works or unreasonably damages the legitimate interests of the copyright owner.8 However, as China is not a case law jurisdiction, it remains to be seen how influential these comments will be. 


Licensing and authorisation 

The debate on AI and copyright issues has turned to how licensing and other formal authorisation might help enable AI training while managing risk and liability issues. Making content available for AI training via a formal licence has numerous benefits for rightsholders: besides allowing the content to be monetised, it enables negotiation on how such content can and cannot be used, and better control over brand image. For the entities doing AI training, it provides legal certainty, as well as certainty as to data quality, compliance and provenance.  

We are beginning to see licensing agreements struck between major players to allow AI training on a licensed basis. The rightsholders/licensors in question include news and media publishers, music publishers and record labels, academic and scholarly publishers, social media platforms, as well as owners of image, video and media databases.  

The licence terms naturally vary, but terms we have observed include the following (their legality and appropriateness will, of course, depend on the circumstances and jurisdiction):   

  • explicit licence language allowing use of content for model training, fine-tuning and evaluation, as well as for text and data mining where appropriate;  
  • remuneration in the form of a one-time payment, recurring payment, revenue-sharing, or royalties on an agreed basis;  
  • limits on how outputs can be used, published and distributed;  
  • agreement as to ownership of the trained models, outputs, and derivative datasets;  
  • limits on training to self-hosted or closed environments, especially where the licensor seeks to prevent public release of trained models;  
  • prohibitions on, or conditions governing, the involvement of third-party AI models;  
  • prohibitions on creating products which would compete with the licensor; and  
  • licensor warranties regarding ownership of provided content, and indemnities for infringement of third-party IP.  

In addition to licensing negotiated between private entities, discussion has commenced on whether the use of content for AI training should be subject to some kind of mandatory royalty or levy under law, and whether collective management organisations (CMOs) should be established on a sectoral basis, similar to those in the music and publishing industries. On this, France has emerged as a thought leader: the CSPLA (Conseil supérieur de la propriété littéraire et artistique) has published various reports on the legal and economic issues, which may prove influential in the EU and beyond. In India, authorities have proposed a “mandatory blanket license” framework under which AI companies would have access to copyright works for training in exchange for royalties managed by a dedicated new collecting body. No global market standard has yet emerged. 


Practical measures to mitigate copyright infringement risks in AI training

It may take years for the legal position on AI and copyright issues to solidify, and there is no guarantee that jurisdictions will take a consistent approach. In the meantime, some practical measures can be taken:

  • When using existing content to train an AI model, consider who owns the content and whether copyright or other rights may apply. Investigate what legal basis you have for accessing the content (eg via licence, subscription, or legal exemption), whether any existing terms and conditions apply, and what they say about text and data mining and AI model training. Consider whether a licence is necessary or advisable, as well as what terms a licence should cover (see some of the comments above). In an enterprise context where employees and consultants are engaged in AI model training, consider implementing policies and training to help alert personnel to the copyright risks and mitigate potential liability.
  • In agreements in which one party will provide services and/or produce content using AI, ensure the agreement contains appropriate representations, warranties and indemnities regarding infringement of third-party IP.
  • If you are a content owner, or the owner of a database of content, consider whether this content or database is susceptible to text and data mining or other kinds of scraping. Consider whether you need to apply copyright or other notices, and even technical measures, to limit such activities. Where opt-out language is used, ensure it can be read and understood by crawling bots. Where you license your content, consider imposing relevant guardrails to protect your content’s value.
  • Keep abreast of regulatory obligations, which may evolve ahead of decisions in major cases. In particular, the EU’s obligations on general-purpose AI model providers are beginning to crystallise, including significant obligations regarding copyright. One of the EU AI Act’s most relevant obligations requires providers of general-purpose AI models to make available a sufficiently detailed summary (generally comprehensive in scope rather than technically detailed, for example listing the main data collections or sets) of the content used to train the model, the purpose of which is to enable parties such as copyright holders to exercise and enforce their rights under Union law.9 In January 2026, Legal Affairs Committee MEPs adopted a series of proposals, including demands for fair remuneration for the use of copyright content by AI. In March 2026, the European Parliament adopted a resolution arguing that existing copyright law is insufficient, and called for further laws to clarify licensing, infringement and subsistence issues. The ongoing debate regarding these issues may eventually result in legislative change in the EU and beyond.
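On the machine-readable opt-out point above, one widely used technical measure is a robots.txt file naming the crawlers associated with AI training. The sketch below, using only Python’s standard library, shows how a compliant crawler would interpret such a file; the bot names, paths and URLs are illustrative assumptions only, robots.txt is just one of several opt-out mechanisms under discussion, and nothing here is legal advice on what a valid opt-out requires in any jurisdiction.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder wishing to exclude
# AI-training crawlers site-wide while leaving other crawlers unaffected.
# The user-agent tokens below are illustrative examples, not a complete list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler honouring the file would check permission before fetching:
print(parser.can_fetch("GPTBot", "https://example.com/images/photo.jpg"))        # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/images/photo.jpg"))  # True
```

Note that such signals only bind crawlers that choose to honour them; the contractual and legislative measures discussed above remain the primary protections.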

This article focused on copyright infringement risks arising out of the use of copyrighted content to train AI models. The next article in our series will explore the use of AI to generate new content, and the risk that these outputs will infringe third-party copyright and other rights.

If you have questions about anything raised in this article, please get in touch with the authors or your regular DLA Piper contact.


1 Including a suit in Germany brought by GEMA against Suno, as well as a suit in France against Meta brought by The Société des Gens de Lettres, the Syndicat National des Auteurs et des Compositeurs and the Syndicat National de l’Édition.
2 Like Company v Google Ireland Limited (C-250/25, preliminary reference dated 3 April 2025 from the Budapest Környéki Törvényszék)
3 Getty Images (US) Inc and others v Stability AI Ltd [2025] EWHC 2863 (Ch)
4 Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC (DSM Directive)
5 Article 3, DSM Directive
6 Article 4, DSM Directive
7 Robert Kneschke v LAION (Higher Regional Court of Hamburg decision, 10 December 2025, 5 U 104/24; appeal from Regional Court of Hamburg decision, 27 September 2024, 310 O 227/23)
8 Hangzhou Intermediate People’s Court December 2024 ruling (2024) Zhe 01 Min Zhong No. 10332 regarding Shanghai Character License Administrative Co., Ltd., and “Ultraman”, on appeal from Hangzhou Internet Court ruling (2024) Zhe 0192 Min Chu No. 1587
9 Article 53, Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) (EU AI Act)