Three open source large language models you can use today
Enterprises are flocking to open source large language models, many of which have become highly popular - here are three you might want to try out
Ever since OpenAI fired the starting gun in the generative AI race in November 2022 with the release of ChatGPT, large language models (LLMs) have soared in popularity.
These systems can offer capabilities like text generation, code completion, translation, article summaries, and more, and businesses across the board are finding new use cases for generative AI.
But what if you want to leverage the power of LLMs without the hefty licensing fees or the restrictions of using a closed model?
Well in that case, it might be worth considering an open source large language model - and there’s certainly no shortage of options now.
What is an open source large language model?
An open source LLM differs from a proprietary - or closed source - model in that it's publicly available for organizations to use, although some models come with usage restrictions, so it pays to check the license terms.
There are a number of benefits to open source LLMs aside from the fact they're publicly available to use right now. Cost savings, in particular, are frequently touted as a key factor in enterprises' decisions to adopt these models.
Flexibility is another advantage. Rather than tying users to a single provider, open source models allow enterprises to choose between, and switch among, multiple providers.
Maintenance, updates, and development can also be handled by in-house teams, providing you have the internal capabilities to do so.
This article will delve into three of the most notable choices for open source LLMs on the market right now, spanning models launched by major providers such as Meta and Mistral, and a relative newcomer to the scene.
Llama 2 7B (Meta)
Meta’s first foray into a public AI model came with LLaMA (Large Language Model Meta AI) in February 2023, an initial family of LLMs with up to 65 billion parameters.
In July 2023, in partnership with Microsoft, Meta announced the second iteration of its flagship open source model, Llama 2, with three model sizes boasting 7, 13, and 70-billion parameters.
This included both the underlying foundation models and fine-tuned versions of each size tailored for conversational use, billed as Llama 2 Chat.
Each version of Llama 2 was trained using an offline dataset between January and July 2023.
Llama 2 was billed as the first free ChatGPT competitor and is available via an application programming interface (API), allowing organizations to quickly integrate the model into their IT infrastructure.
Llama 2 is also available through the AWS and Hugging Face platforms to meet a wider range of use cases in the open source community.
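For teams that want to experiment straight away, the snippet below is a minimal sketch of how the Llama 2 7B Chat model can be pulled from Hugging Face using the Transformers library. It assumes the transformers package is installed and that access to Meta's gated model repository has already been granted; the prompt is purely illustrative.

```python
# Minimal sketch: loading Llama 2 7B Chat from Hugging Face (assumes the
# transformers package is installed and access to Meta's gated repo is approved).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repository on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarise the benefits of open source large language models."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Bear in mind that running a 7-billion parameter model locally still requires a capable GPU or plenty of system memory, which is part of the appeal of the hosted AWS and Hugging Face options.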
Some experts questioned the open source credentials of Meta’s model, however, as organizations whose products or services have over 700 million monthly active users must approach Meta directly for permission to use it.
Open source data science and machine learning platform Hugging Face stated that the Llama 2 7B Chat model outperformed other open source chat models on most of the benchmarks it tested.
Mixtral 8x7B (Mistral AI)
French artificial intelligence startup Mistral AI has been a hot property in the AI space since it was launched in April 2023 by Arthur Mensch, a former researcher at Google DeepMind.
Mistral AI closed a major Series A funding round in December 2023, securing $415 million and pushing the company’s value to nearly $2 billion just eight months after its inception.
The company released Mistral 7B, a 7-billion parameter large language model, in September 2023 via Hugging Face.
The model was released under the Apache 2.0 license, a permissive software license that places only minimal restrictions on how third parties can use the software.
The release of Mistral 7B was accompanied by some rather bold claims by Mistral AI, stating it could outperform the 13-billion parameter version of Meta’s Llama 2 on all benchmarks.
Mistral 7B uses grouped-query attention (GQA) to achieve faster inference, as well as sliding window attention (SWA) to handle longer sequences at lower cost.
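To give a flavour of the GQA idea, the sketch below shows several query heads sharing a small number of key/value heads, which shrinks the key/value cache and speeds up inference. It is a simplified NumPy illustration, not Mistral's actual implementation, and it omits sliding window attention entirely; the head counts and dimensions are arbitrary.

```python
# Illustrative grouped-query attention: eight query heads share two key/value heads.
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (n_query_heads, seq, d); k, v: (n_kv_heads, seq, d)
    n_query_heads, n_kv_heads = q.shape[0], k.shape[0]
    group_size = n_query_heads // n_kv_heads
    d = q.shape[-1]
    outputs = []
    for h in range(n_query_heads):
        kv = h // group_size  # each group of query heads maps to one shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        outputs.append(weights @ v[kv])
    return np.stack(outputs)

seq_len, d_head = 16, 64
q = np.random.randn(8, seq_len, d_head)  # eight query heads...
k = np.random.randn(2, seq_len, d_head)  # ...share just two key/value heads
v = np.random.randn(2, seq_len, d_head)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```

Because only two sets of keys and values need to be cached instead of eight, the memory required per generated token drops considerably.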
In a head-to-head with Meta’s 13-billion parameter Llama 2 model, platform-as-a-service (PaaS) company E2E Cloud said the comparable performance of Mistral 7B relative to Meta's larger models indicates better memory efficiency and improved throughput from the French model.
Soon after the release of Mistral 7B, the French AI startup announced its second model, Mixtral 8x7B.
Mixtral 8x7B uses a sparse mixture-of-experts (MoE) architecture composed of eight expert layers, or neural networks.
The advantage of this approach is that MoE models can be pre-trained much faster than their dense feed-forward network (FFN) counterparts.
Mixtral 8x7B is also a step up from its predecessor in terms of inference, running faster than a traditional dense model with the same total number of parameters.
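The sketch below illustrates the routing trick behind that speed-up: a small router scores every expert for each token and only the top two actually run, so most of the network's weights sit idle for any given token. It is a toy NumPy example rather than Mixtral's real code, and the dimensions are arbitrary.

```python
# Toy sparse mixture-of-experts layer: eight experts, top-2 routing per token.
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

# Each "expert" is reduced to a single weight matrix for illustration.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(token):
    logits = token @ router                 # score all eight experts
    chosen = np.argsort(logits)[-top_k:]    # keep only the two highest-scoring
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                    # normalise the gate weights
    # Weighted sum of the chosen experts' outputs; the other six never run.
    return sum(g * np.tanh(token @ experts[i]) for g, i in zip(gates, chosen))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,)
```

The catch is visible in the sketch too: all eight expert matrices have to exist in memory even though only two are used per token, which is exactly the memory issue described below.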
This architecture does suffer from a few drawbacks, however, with all eight expert networks needing to be loaded into memory. This means Mixtral 8x7B requires a large amount of VRAM to operate.
Fine-tuning models with an MoE approach can also be difficult, as the architecture tends toward overfitting. This means it can struggle to generalize beyond its training data when presented with new inputs.
Despite this, there is cause for optimism as a new fine-tuning method known as instruction-tuning could address some of these concerns.
Smaug-72B (Abacus AI)
US startup Abacus.AI released its 72-billion parameter model, Smaug-72B, in February 2024, with its impressive benchmark performance prompting much excitement across the machine learning community.
A fine-tuned version of the Qwen-72B model, Smaug-72B was the first and only open source model to post an average score of more than 80 across the major LLM evaluations.
Smaug-72B outperformed a host of the most powerful proprietary models in Hugging Face’s tests for massive multitask language understanding (MMLU), mathematical reasoning, and common sense reasoning.
This included OpenAI’s GPT-3.5 and Mistral’s closed-source Medium model.
As it stands, Smaug-72B rules the roost at the top spot of Hugging Face’s open LLM leaderboard.
Interestingly, Smaug-72B outperforms the model it was based on, Qwen-72B. According to Abacus.AI, this is due to a new fine-tuning technique, DPO-Positive (DPOP), which addresses a number of weaknesses in the way previous LLMs were fine-tuned.
In a research paper published in February 2024, engineers at Abacus.AI outlined how they designed a new loss function and training procedure that avoids a common failure mode associated with the Direct Preference Optimization (DPO) training method.
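As a rough guide to what that looks like, the sketch below pairs the standard DPO objective with a DPO-Positive-style penalty that discourages the model from lowering the probability of the preferred answer. The exact form of the penalty and the lambda weighting are assumptions based on the paper's description, not Abacus.AI's actual training code.

```python
# Sketch of a DPO loss with an assumed DPO-Positive-style penalty term.
import torch
import torch.nn.functional as F

def dpo_positive_loss(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp,
                      beta=0.1, lam=5.0):
    # Standard DPO: implicit rewards are log-prob ratios against the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Assumed DPOP-style penalty: punish the policy if the preferred completion
    # has become less likely than it was under the reference model.
    penalty = torch.clamp(ref_chosen_logp - policy_chosen_logp, min=0.0)
    return -F.logsigmoid(margin - lam * penalty).mean()

# Toy example with per-sequence log-probabilities for one preference pair.
loss = dpo_positive_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                         torch.tensor([-11.0]), torch.tensor([-14.0]))
print(loss.item())
```

In plain DPO, the model can satisfy the objective by making both answers less likely, so long as the rejected one falls faster; the extra term is intended to block that failure mode.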