
Is it safe to say that artificial intelligence violates copyright?


Generative AI, like ChatGPT, produces results based on training with existing text and images. Proving that what it produces is a copy, and therefore infringes copyright, is however very slippery and complex terrain. An in-depth analysis by Laura Turini, lawyer, for Stefano Feltri's Appunti newsletter

On February 12, the federal court in San Francisco issued a first, partial decision in favor of OpenAI.

Although it has been presented as a ruling in favor of OpenAI, and in part it is, it is in reality an interlocutory decision, issued in response to the motion to dismiss filed by OpenAI in two parallel lawsuits brought against it by a group of authors, also in the form of class actions.

On 28 June 2023 the writers Paul Tremblay and Mona Awad (the latter later withdrew from the case), and on 7 July 2023 the authors Sarah Silverman, Christopher Golden, and Richard Kadrey, represented by the same lawyers and relying on the same arguments, sued OpenAI on behalf of all authors, alleging that its artificial intelligence system ChatGPT used their books for training and that the texts it generates are derivative works of the training data.

The authors claim that, during training, the system copied every single text in order to extract the information needed for its operation and to generate text in response to users' prompts.

They also argue that all AI outputs are works derived from the training data and therefore infringe the copyright of authors who were never asked for consent and never paid royalties.

OpenAI allegedly copied millions of books, including those of the plaintiff authors, but their lawyers provided no direct evidence of this, since OpenAI has never disclosed what its training material is; they therefore relied on clues and logical inferences.

This is precisely the crucial point of Judge Araceli Martínez-Olguín's recent decision.

While the Court, when ruling on a motion to dismiss, must accept the facts alleged by the parties as true, the Judge notes that it is not required to accept mere assertions, unsupported deductions, or unreasonable inferences.

On the question of whether a work generated by artificial intelligence can be a derivative work, the Court rejects the authors' thesis that all AI-generated works are derived from the training works, and states that it must be proven, in concrete terms, that there is an original work from which they drew and which they modified.

The Court therefore requires a showing of substantial similarity between the work generated by artificial intelligence and the training work allegedly infringed, proof that was not provided in the proceedings.

The authors countered that proof of similarity is unnecessary because the output would be an "as is" copy of the work, which needs no demonstration, citing an entirely different precedent concerning a song performed, without consent, in a bar.

The Judge observes that in that case proof of substantial similarity was not required because the lyrics of the performed song fully and obviously overlapped with the original, whereas here there is no proof that such copying occurred, and the burden of proving it falls on the authors.

The Court therefore rejected the authors' claim on this point, but granted them leave to amend and submit new evidence which, given the state of the art, will be very difficult for them to produce.

The decision is very important because it establishes that works generated by artificial intelligence cannot be considered unlawful "in themselves"; a case-by-case assessment is required.

However, it is not a final decision and many other open questions remain, including one of the most important.

OpenAI, in fact, moved to dismiss five of the six claims brought by the plaintiffs, but not the most complex one: the accusation of direct copyright infringement for having copied the authors' books during the training phase.

I believe this was a matter of procedural strategy: at this summary stage OpenAI would hardly have prevailed, as happened to the defendants in Sarah Andersen, Kelly McKernan, and Karla Ortiz v. Stability AI, Midjourney, and DeviantArt, filed in January 2023, in which the motion to dismiss was rejected.

In that case the decision rested on the premise that an artificial intelligence system, during the training phase, makes "compressed copies" of the original works, even though Stability AI denied this.

In its filing, Stability AI wrote:

«To be clear, training a model does not mean copying or storing images for later distribution.

In fact, Stable Diffusion does not "memorize" any images. Rather, training involves developing and refining millions of parameters that collectively define how things look. (…).

The purpose of this is not to enable models to reproduce copies of the training images. If someone wanted to copy images entirely from the Internet, there are much simpler ways to do so.»

This is the issue that remains to be resolved, and addressing it will require a full trial on the merits, with an in-depth inquiry in which, in this case, OpenAI can explain and demonstrate how its system is trained, something that in many respects still remains a mystery.

It is that trial, and that decision, that will change the fate of artificial intelligence.

(Excerpt from the Appunti newsletter by Stefano Feltri)


This is a machine translation from Italian language of a post published on Start Magazine at the URL https://www.startmag.it/innovazione/e-lecito-dire-che-lintelligenza-artificiale-viola-il-copyright/ on Sat, 24 Feb 2024 06:29:44 +0000.