An open source challenger to GitHub Copilot? StarCoder2, a code generation tool backed by Nvidia, Hugging Face, and ServiceNow, is free to use and offers support for over 600 programming languages
StarCoder2 offers code generation support for over 600 programming languages, and it’s free to use


Rory Bathgate
The StarCoder code generation tool has received a massive update that could position it as a leading open source alternative to services such as GitHub Copilot.
StarCoder initially launched in May 2023 as a collaboration between Hugging Face and ServiceNow. The latest iteration, StarCoder2, now also has major industry backing in the form of Nvidia.
The code generation tool supports developers by automating code completion, similar to GitHub Copilot or Amazon CodeWhisperer. It’s also capable of summarizing existing code and generating original snippets.
StarCoder2 is available in three different model sizes, each trained by a different member of the partnership.
The smallest version is a three billion-parameter model trained by ServiceNow, while the mid-sized, seven billion-parameter model was trained by Hugging Face.
Nvidia was responsible for the largest iteration of StarCoder2 with a 15 billion-parameter model built using its NeMo generative AI platform and trained on Nvidia’s accelerated AI infrastructure.
Each of the StarCoder2 models supports a significantly expanded range of programming languages.
The original StarCoder tool was trained on over 80 different programming languages, whereas StarCoder2 boasts the ability to generate code in 619 languages.
StarCoder2 is underpinned by the Stack v2 dataset, the largest open code dataset suitable for LLM pretraining, according to Hugging Face. The AI company said this latest dataset is seven times larger than the original Stack v1.
Paired with new training techniques, the trio believe this will help the models understand low-resource programming languages, mathematics, and program source code discussions.
The performance of each of the new LLMs is vastly enhanced too, with the three billion-parameter StarCoder2 matching the performance of Hugging Face’s original 15 billion-parameter StarCoder model.
StarCoder2 could be a game changer for devs

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
StarCoder2 is a huge step forward for open source AI code generation. In opening the door to competition within the open source community for the title of ‘best AI pair programmer’ and putting the heat on Meta’s Code Llama, it has ensured that developers have a future of solid, open options to look forward to.
Within the paper accompanying the launch, the team behind StarCoder2 presented evidence that the model can go toe-to-toe with Code Llama even in its largest, 34-billion parameter size.
In MBPP, a benchmark that pits a coding model against approximately 1,000 entry-level Python programming problems, StarCoder2’s 15-billion parameter model scored 66.2 against Code Llama 34B’s 65.4.
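To give a feel for how a benchmark of this kind turns test results into a single number, here is a minimal sketch. It assumes the score is a simple pass rate: the percentage of problems for which the generated solution passes every test case. The toy problems, the stand-in "generated" functions, and the helper names `passes_all` and `pass_rate` are all hypothetical; this is not the MBPP harness or the evaluation code used in the StarCoder2 paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy benchmark: each problem is a list of (arguments, expected result) pairs.
TestCases = List[Tuple[tuple, object]]


def passes_all(candidate: Callable, tests: TestCases) -> bool:
    """Return True only if the candidate returns the expected value on every test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed problem.
            return False
    return True


def pass_rate(candidates: Dict[str, Callable], problems: Dict[str, TestCases]) -> float:
    """Percentage of problems whose candidate solution passes all of its tests."""
    solved = sum(passes_all(candidates[name], tests) for name, tests in problems.items())
    return 100.0 * solved / len(problems)


problems = {
    "add": [((1, 2), 3), ((0, 0), 0)],
    "is_even": [((4,), True), ((7,), False)],
    "first": [(([1, 2, 3],), 1), (([],), None)],
    "square": [((3,), 9), ((-2,), 4)],
}

# Stand-ins for model output; "is_even" is deliberately wrong.
candidates = {
    "add": lambda a, b: a + b,
    "is_even": lambda n: n % 2 == 1,
    "first": lambda xs: xs[0] if xs else None,
    "square": lambda n: n * n,
}

score = pass_rate(candidates, problems)
print(f"{score:.1f}")  # 3 of 4 problems solved -> 75.0
```

A real harness would run generated code in a sandbox rather than calling it directly, since model output is untrusted.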
The fact that the training data for StarCoder2 is openly available through the Stack v2 will also be a relief to many organizations.
Future legal battles will be fought over who owns the data used to train AI, and any company that discovers its source code was generated by a model trained on scraped proprietary data could be in for a very difficult and costly replacement process down the line.
In contrast, the openness of StarCoder2 is a crowning achievement. In the interest of crediting the developers whose code formed the basis of StarCoder2, users can enter outputs into a dataset search on Hugging Face to check whether the code the tool has produced is ‘original’ or a verbatim copy from its immense training data.
Alternatively, teams can freely search the dataset themselves.
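The idea of checking an output for verbatim copies can be illustrated with a small sketch. It assumes a tiny in-memory corpus and a plain substring match after whitespace normalization; the function names `normalize` and `find_verbatim_matches` and the sample corpus are hypothetical, and this is not how the Hugging Face dataset search is implemented.

```python
import re
from typing import List


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def find_verbatim_matches(output: str, corpus: List[str]) -> List[int]:
    """Return the indices of corpus documents that contain the output verbatim."""
    needle = normalize(output)
    if not needle:
        return []
    return [i for i, doc in enumerate(corpus) if needle in normalize(doc)]


# Hypothetical stand-in for a training dataset.
corpus = [
    "def add(a, b):\n    return a + b\n",
    "def greet(name):\n    print('hello', name)\n",
]

copied = find_verbatim_matches("def add(a, b):  return a + b", corpus)
original = find_verbatim_matches("def mul(a, b): return a * b", corpus)
print(copied, original)  # [0] []
```

A search over a dataset the size of the Stack v2 would need an index rather than a linear scan, but the question being asked is the same: does this exact snippet already exist in the training data?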
It’s in the interest of all developers to have strong options like this on the market, as innovation and competition in the sector will only drive models to become more accurate. But the precedent StarCoder2 sets in terms of responsible AI model creation through open source may be its lasting legacy.

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.