Microsoft’s GitHub Copilot sued over “software piracy on an unprecedented scale”
Lawyers behind the suit say it is the first class-action case in the US challenging the training and output of AI systems
Microsoft’s GitHub Copilot is being sued in a class action lawsuit that claims the artificial intelligence product is committing software piracy on an unprecedented scale.
Matthew Butterick, a designer and programmer, teamed up with the Joseph Saveri Law Firm to investigate GitHub Copilot, and on 3 November the pair filed a class action lawsuit in San Francisco federal court on behalf of potentially millions of GitHub users.
The lawsuit seeks to challenge the legality of GitHub Copilot, as well as OpenAI Codex which powers the AI tool, and has been filed against GitHub, its owner Microsoft, and OpenAI.
GitHub and OpenAI launched Copilot, an AI-based product that helps software developers by suggesting or completing blocks of code, in June 2021. GitHub charges $10 per month or $100 a year for the service.
“By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more), we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licences on GitHub,” said Butterick.
The suit points to a set of 11 popular open source licences that all require attribution of the author’s name and copyright notice, including the MIT licence, the GNU General Public Licence, and the Apache licence.
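By way of illustration (the author name here is hypothetical), attribution under these licences typically takes the form of a short copyright header kept with the code, which, according to the suit, Copilot’s output omits:

```text
# Copyright (c) 2021 Jane Developer
# This file is licensed under the MIT Licence.
# The MIT Licence requires that this copyright notice and the accompanying
# permission notice be included in all copies or substantial portions
# of the software.
```

Reproducing such code without this notice is what the plaintiffs argue amounts to a licence violation.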
The case claimed that Copilot violates and removes these licences offered by thousands, possibly millions, of software developers, and is therefore committing software piracy on an unprecedented scale.
Copilot, which runs entirely on Microsoft Azure, often simply reproduces code that can be traced back to open-source repositories or licensed code, according to the lawsuit. These suggestions never include attribution to the underlying authors, in violation of the licences.
“It is not fair, permitted, or justified. On the contrary, Copilot’s goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall. It violates the licences that open-source programmers chose and monetises their code despite GitHub’s pledge never to do so,” detailed the class-action complaint.
Moreover, the case stated that the defendants have also violated GitHub’s own terms of service and privacy policies, section 1202 of the Digital Millennium Copyright Act (DMCA), which forbids the removal of copyright-management information, and the California Consumer Privacy Act.
“As far as we know, this is the first class-action case in the US challenging the training and output of AI systems,” said Butterick. “It will not be the last. AI systems are not exempt from the law. Those who create and operate these systems must remain accountable. If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still.
“AI needs to be fair and ethical for everyone. If it’s not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many,” he added.
When asked for comment, GitHub pointed to new Copilot features it announced on 1 November, which are due to arrive in 2023.
When the tool suggests a code fragment, it will provide developers with an inventory of similar code found in GitHub public repositories, along with the ability to sort that inventory by filters such as commit date and repository licence.
IT Pro has contacted Microsoft and OpenAI for further comment.
In October 2022, developer Tim Davis, professor of computer science at Texas A&M University, wrote on Twitter that GitHub Copilot had emitted large chunks of his copyrighted code, with no attribution to him.
Davis added that he could probably reproduce his entire sparse matrix libraries from simple prompts, aiming to underline the similarity between his work and what the AI tool produced.
“The code in question is different from the example given. Similar, but different. If you can find a way to automatically identify one as being derivative of the other, patent it,” Alex Graveley, creator of GitHub Copilot, responded on Twitter.
This comes at a time when Microsoft is looking at developing Copilot technology for use in similar programmes for other job categories, like office work, cyber security, or video game design, according to a Bloomberg report.
Microsoft's chief technology officer revealed that the tech giant will build some of the tools itself, while others will be provided by its customers, partners, and rivals.
Examples of what the technology could do include helping video game creators write dialogue for non-playable characters, while the tech giant’s cyber security teams are investigating how the tool could help combat hackers.
GitHub has admitted that Copilot can, in some cases, reproduce copied code, and says the current version of the tool aims to prevent suggestions that match existing code in public repositories.
Zach Marzouk is a former ITPro, CloudPro, and ChannelPro staff writer, covering topics like security, privacy, worker rights, and startups, primarily in the Asia Pacific and the US regions. Zach joined ITPro in 2017 where he was introduced to the world of B2B technology as a junior staff writer, before he returned to Argentina in 2018, working in communications and as a copywriter. In 2021, he made his way back to ITPro as a staff writer during the pandemic, before joining the world of freelance in 2022.