The future of generative AI lies in open source
The tenets of open source could support AI in areas such as security and ethical development, but major roadblocks might impede progress


The open source ecosystem has long been the backbone of the global technology industry, and in the age of generative AI the situation is no different. Some of the most impressive models out there are open source, such as Mistral AI’s models and Meta’s Llama.
With the AI industry growing at an astounding pace, the open source community is well placed to contribute to - and guide - whatever this next generation of technology brings.
Speaking at a press briefing at KubeCon 2024, Jim Zemlin, executive director of the Linux Foundation, touted the wide range of areas in which open source may be able to assist in the development of AI.
“It might be easier to think about the goal of open source more broadly in generative AI by looking at it from a full stack,” Zemlin said.
Zemlin worked his way up the stack, from the CPU level to the data level, pointing out the notable headway open source is making at each layer.
At the baseline computing level, the Linux Foundation has created the ‘Unified Acceleration Foundation’ to look more closely at the role of open source in GPU computing, and Zemlin also pointed to forward momentum for open source at the foundation model level.
Perhaps most notably, Zemlin said he believes that open source might be the answer to some of AI’s most pertinent problems, such as hallucinations, security risks, and distinguishing between real and AI-generated content.
“Sometimes the answer to problems in tech is more tech, and a lot of people are skeptical of that,” Zemlin said. “But in this case, I think it's true.”
“If you look at some of the things around large language models, around AI safety and security, I think this is an area where we're seeing some good starts,” he continued.
Speaking to the specific areas in which the open source community could help develop tools to track problems, Zemlin said projects are already underway to assist developers with ‘unlearning’ in a bid to fine-tune AI models.
“We're already seeing some of these tools in our Linux Foundation AI big data project,” he added.
Zemlin also drew attention to open source’s commitment to the Coalition for Content Provenance and Authenticity (C2PA), a project which builds on the efforts of the Content Authenticity Initiative (CAI) in establishing a framework for identifying AI-generated content.
The open source ecosystem can do more to support AI development
Zemlin warned, however, that the open source community can be more proactive with regard to AI development. The ecosystem should be more vocal about the role it can play in underpinning safe and responsible development, he suggested.
“[There’s] a real opportunity for open source to do more,” he said.
Speaking to ITPro at the conference, Oleksandr Matvitskyy, senior director analyst at Gartner, echoed Zemlin’s comments regarding the role of open source in the future of generative AI development.
Closer collaboration with the ecosystem and ensuring open source development is prioritized should be a key focus for enterprises, regulators, and governments alike moving forward, Matvitskyy said.
“I think anything can be done with open source,” he told ITPro.
“I think [it] has to be every government's, every regulator's priority to make sure that AI remains open source,” Matvitskyy added.
Prevalence of proprietary data could hamper progress
There are roadblocks standing in the way of open source approaches to AI development, however, particularly around AI training. The last 18 months have been fraught with instances of hallucinations and security issues.
Matvitskyy pointed out that these issues are particularly visible in the operation of AI models.
“They still hallucinate in their outputs,” Matvitskyy said, “they have no data to learn on - everything is private, everything is protected.”
Companies often hoard their data, limiting the amount of open data available for training AI models, which is, fundamentally, the only way those models will develop beyond their current level of capability.
Matvitskyy said that around 60% of the data companies are holding on to is probably “not really important” and could be released into the public domain to train AI models.
“They should be open and the companies should get money … for innovation, for what they actually do, not for what they created thirty years ago,” Matvitskyy said.

George Fitzmaurice is a former Staff Writer at ITPro and ChannelPro, with a particular interest in AI regulation, data legislation, and market development. After graduating from the University of Oxford with a degree in English Language and Literature, he undertook an internship at the New Statesman before starting at ITPro. Outside of the office, George is both an aspiring musician and an avid reader.