Canadian creators and publishers are calling on the government to do something about the unauthorized, and usually unreported, use of their content to train generative artificial intelligence systems.
But AI companies argue that using the material to train their systems does not infringe copyright, and that restricting its use would hinder the development of AI in Canada.
The two sides made their arguments in recently published submissions to the federal government’s Copyright and AI consultation, which is considering how Canadian copyright law should address the emergence of generative AI systems like OpenAI’s ChatGPT.
Generative AI can create text, images, videos and computer code based on simple prompts, but to do so, the system must first learn from vast amounts of existing content.
In its filing with the government, Access Copyright argued that most, and possibly all, large language models “currently profit from the unauthorized use and copying of copyrighted works.”
Copyright ‘black box’ infuriates creators
According to Access Copyright, which represents writers, visual artists and publishers, this happens inside a “black box.”
“Rightsholders know it’s happening, but because of information asymmetries between rightsholders and AI platforms, they are unable to determine who is using whose work to carry out that activity, and they have no mechanism to stop it.”
“It has become abundantly clear that AI models and systems are already ingesting large amounts of proprietary data sets without permission from the sources or rights holders of that data,” Music Canada, which represents Canada’s major record labels, said last year in a statement about AI-generated fake songs that imitated the voices of artists like Drake and The Weeknd.
The Writers Guild of Canada called on the government to start by implementing basic disclosure and reporting requirements, saying developers know exactly which works are being mined and how they are being used, while creators are left in the dark.
Some organizations have signed licensing agreements with AI companies, but the Canadian Authors Guild said rights holders face “significant obstacles” in licensing their content because they “don’t know which works are being used by which companies.”
The group called on Canada to clarify that text and data mining is subject to copyright law.
There are a number of ongoing lawsuits in the U.S. over the use of copyrighted material by generative AI systems, including one filed this week by the world’s largest record companies against two AI music generation systems.
Artists say information gap is a problem
The Canadian Association of Media Producers said court cases illustrate the problems that lack of transparency creates, citing one case in which an AI company argued that rights holders could not move forward with infringement claims unless they could identify the exact works used in training.
“Rights holders will undoubtedly face similar evidentiary challenges, as many of the datasets used to train generative AI systems are allegedly destroyed after the initial training is completed,” the association said in its submission.
The group said it was an issue that “demands urgent attention” and called on the government to implement transparency requirements.
But AI companies argue that the kind of transparency rights holders want is not realistic.
Microsoft told the government that training large-scale AI systems requires “vast amounts” of data and that companies do not have to keep records of it or disclose the content used in training.
“Recording such information is impractical and such a requirement would stifle AI development,” the company said.
The company argued that “analysing a work to learn concepts and facts is not copyright infringement.”
Google said that AI training is already exempt from existing copyright law, but that governments should adopt an exemption to make it clearer.
Google said that requiring permission to use content for training purposes would expose competitively sensitive information and “effectively hinder the development and use of large-scale language models and other cutting-edge AI.”
The company also said that AI developers do not have access to accurate information about copyright status.
Canadian AI company Cohere said using content to train AI systems has a similar effect to an individual reading a book to increase their knowledge.
The company argued that the process does not infringe copyright and that this needs to be made clear in law, or it could undermine “Canada’s ambitions to be home to the world’s leading AI companies and ecosystem.”
The Council of Canadian Innovators, which represents Canada’s tech industry, said disclosure requirements would hurt small businesses rather than their big tech rivals, and warned that this would “severely hinder the ability of Canadian companies to scale significantly.”