The best large language models (LLMs) in 2024

March 21, 2024

No Comments

The best large language models (LLMs) in 2024

These are the most significant, interesting, and popular LLMs you can use right now.

By Gibbs · January 30, 2024

Hero image with an icon representing an AI agent

Large language models (LLMs) are the main kind of text-handling AIs, and they’re popping up everywhere. ChatGPT is by far the most famous tool that uses an LLM—it’s powered by a specially tuned version of OpenAI’s GPT models. But there are lots of other chatbots and text generators—including everything from Google Bard and Anthropic’s Claude to Writesonic and Jasper—that are built on top of LLMs.

Pairing AI with automation will change how you work

Learn more

LLMs have been simmering in research labs since the late 2010s, but after the release of ChatGPT (which showcased the power of GPT), they’ve burst out of the lab and into the real world.

Some LLMs have been in development for years. Others have been quickly spun up to catch the latest hype cycle. And yet more are open source research tools. Here, I’ll break down the most important LLMs on the scene right now.

The best LLMs in 2024

There are dozens of major LLMs, and hundreds that are arguably significant for some reason or other. Listing them all would be nearly impossible, and in any case, it would be out of date within days because of how quickly LLMs are being developed.

Take the word “best” with a grain of salt there: I’ve tried to narrow things down by offering a list of the most significant, interesting, and popular LLMs (and LMMs), not necessarily the ones that outperform on benchmarks (though most of these do). I’ve also mostly focused on LLMs that you can use—rather than ones that are the subjects of super interesting research papers—since we like to keep things practical around here.

One last thing before diving in: a lot of AI-powered apps don’t list what LLMs they rely on. Some we can guess at or it’s clear from their marketing materials, but for lots of them, we just don’t know. That’s why you’ll see “Undisclosed” in the table below a bit—it just means we don’t know of any major apps that use the LLM, though it’s possible some do.

Click on any app in the list below to learn more about it.

LLM	Developer	Popular apps that use it	# of parameters	Access
GPT	OpenAI	Microsoft, Duolingo, Stripe, Zapier, Dropbox, ChatGPT	175 billion+	API
Gemini	Google	Some queries on Bard	Nano: 1.8 & 3.25 billion; others unknown	API
PaLM 2	Google	Google Bard, Docs, Gmail, and other Google apps	340 billion	API
Llama 2	Meta	Undisclosed	7, 13, and 70 billion	Open source
Vicuna	LMSYS Org	Chatbot Arena	7, 13, and 33 billion	Open source
Claude 2	Anthropic	Slack, Notion, Zoom	Unknown	API
Stable Beluga	Stability AI	Undisclosed	7, 13, and 70 billion	Open source
StableLM	Stability AI	Undisclosed	7, 13, and 70 billion	Open source
Coral	Cohere	HyperWrite, Jasper, Notion, LongShot	Unknown	API
Falcon	Technology Innovation Institute	Undisclosed	1.3, 7.5, 40, and 180 billion	Open source
MPT	Mosaic	Undisclosed	7 and 30 billion	Open source
Mixtral 8x7B	Mistral AI	Undisclosed	46.7 billion	Open source
XGen-7B	Salesforce	Undisclosed	7 billion	Open source
Grok	xAI	Grok Chatbot	Unknown	Chatbot

What is an LLM?

An LLM, or large language model, is a general-purpose AI text generator. It’s what’s behind the scenes of all AI chatbots and AI writing generators.

LLMs are supercharged auto-complete. Stripped of fancy interfaces and other workarounds, what they do is take a prompt and generate an answer using a string of plausible follow-on text. The chatbots built on top of LLMs aren’t looking for keywords so they can answer with a canned response—instead, they’re doing their best to understand what’s being asked and reply appropriately.

This is why LLMs have really taken off: the same models (with or without a bit of extra training) can be used to respond to customer queries, write marketing materials, summarize meeting notes, and do a whole lot more.

How do LLMs work?

Early LLMs, like GPT-1, would fall apart and start to generate nonsense after a few sentences, but today’s LLMs, like GPT-4, can generate thousands of words that all make sense.

To get to this point, LLMs were trained on huge corpuses of data. The specifics vary a little bit between the different LLMs—depending on how careful the developers are to fully acquire the rights to the materials they’re using—but as a general rule, you can assume that they’ve been trained on something equivalent to the entire public internet and every major book that’s ever been published. This is why LLMs can generate text that sounds so authoritative on such a wide variety of subjects.

From this training data, LLMs are able to model the relationship between different words (or really, fractions of words called tokens) using high-dimensional vectors. This is all where things get very complicated and mathy, but the basics are that every individual token ends up with a unique ID and that similar concepts are grouped together. This is then used to generate a neural network, a kind of multi-layered algorithm based on how the human brain works—and that’s at the core of every LLM.

The neural network has an input layer, an output layer, and multiple hidden layers, each with multiple nodes. It’s these nodes that compute what words should follow on from the input, and different nodes have different weights. For example, if the input string contains the word “Apple,” the neural network will have to decide to follow up with something like “Mac” or “iPad,” something like “pie” or “crumble,” or something else entirely. When we talk about how many parameters an LLM has, we’re basically comparing how many layers and nodes there are in the underlying neural network. In general, the more nodes, the more complex the text a model is able to understand and generate.

Of course, an AI model trained on the open internet with little to no direction sounds like the stuff of nightmares. And it probably wouldn’t be very useful either, so at this point, LLMs undergo further training and fine-tuning to guide them toward generating safe and useful responses. One of the major ways this works is by adjusting the weights of the different nodes, though there are other aspects of it too.

Infographic showing how natural language processing works

All this is to say that while LLMs are black boxes, what’s going on inside them isn’t magic. Once you understand a little about how they work, it’s easy to see why they’re so good at answering certain kinds of questions. It’s also easy to understand why they tend to make up (or hallucinate) random things.

For example, take questions like these:

What bones does the femur connect to?
What currency does the USA use?
What is the tallest mountain in the world?

These are easy for LLMs, because the text they were trained on is highly likely to have generated a neural network that’s predisposed to respond correctly.

Then look at questions like these:

What year did Margot Robbie win an Oscar for Barbie?
What weighs more, a ton of feathers or a ton of feathers?
Why did China join the European Union?

You’re far more likely to get something weird with these. The neural network will still generate follow-on text, but because the questions are tricky or incorrect, it’s less likely to be correct.

What can LLMs be used for?

LLMs are powerful mostly because they’re able to be generalized to so many different situations and uses. The same core LLM (sometimes with a bit of fine-tuning) can be used to do dozens of different tasks. While everything they do is based around generating text, the specific ways they’re prompted to do it changes what features they appear to have.

Here are some of the tasks LLMs are commonly used for:

General-purpose chatbots (like ChatGPT and Google Bard)
Customer service chatbots that are trained on your business’s docs and data
Translating text from one language to another
Converting text into computer code, or one language into another
Generating social media posts, blog posts, and other marketing copy
Sentiment analysis
Moderating content
Correcting and editing writing
Data analysis

And hundreds of other things. We’re only in the early days of the current AI revolution.

But there are also plenty of things that LLMs can’t do, but that other kinds of AI models can. A few examples:

Interpret images
Generate images
Convert files between different formats
Search the web
Perform math and other logical operations

Of course, some LLMs and chatbots appear to do some of these things. But in most cases, there’s another AI service stepping in to assist. When one model is handling a few different kinds of inputs, it actually stops being considered a large language model and becomes something called a large multimodal model (though, to a certain degree, it’s just semantics).

With all that context, let’s move on to the LLMs themselves.

The best LLMs in 2024

GPT

OpenAI Playground with a modified system prompt.

Developer: OpenAI
Parameters: More than 175 billion
Access: API

OpenAI’s Generative Pre-trained Transformer (GPT) models kickstarted the latest AI hype cycle. There are two main models currently available: GPT-3.5-turbo and GPT-4. GPT is a general-purpose LLM with an API, and it’s used by a diverse range of companies—including Microsoft, Duolingo, Stripe, Descript, Dropbox, and Zapier—to power countless different tools. Still, ChatGPT is probably the most popular demo of its powers.

You can also connect Zapier to GPT or ChatGPT, so you can use GPT straight from the other apps in your tech stack. Here’s more about how to automate ChatGPT, or you can get started with one of these pre-made workflows.

Create blog outlines with ChatGPT from submitted Airtable forms

Try it

Airtable, ChatGPT, Google Docs

Airtable + ChatGPT + Google DocsMore details

Create a response to incoming forms with ChatGPT

Try it

Typeform, ChatGPT, Email by Zapier

Typeform + ChatGPT + Email by ZapierMore details

Update Jira issues with product requirements from ChatGPT

Try it

Jira Software Cloud, ChatGPT

Jira Software Cloud + ChatGPTMore details

Gemini

Developer: Google
Parameters: Nano available in 1.8 billion and 3.25 billion versions; others unknown
Access: API

Google Gemini is a family of AI models from Google. The three models—Gemini Nano, Gemini Pro, and Gemini Ultra—are designed to operate on different devices, from smartphones to dedicated servers. While capable of generating text like an LLM, the Gemini models are also natively able to handle images, audio, video, code, and other kinds of information.

Gemini Pro now powers some queries on Google’s chatbot, Bard, and is available to developers through Google AI Studio or Vertex AI. Gemini Nano and Ultra are due out in 2024.

PaLM 2

Bard, the best ChatGPT alternative for connecting to Google apps

Developer: Google
Parameters: 340 billion
Access: API

PaLM 2 is an LLM from Google. It’s designed for natural language tasks and powers most queries on Google Bard, as well as many of Google’s other AI features throughout its apps, like Docs and Gmail. It’s also available as an API for developers.

Llama 2

Developer: Meta
Parameters: 7 billion, 13 billion, and 70 billion
Access: Open source

Llama 2 is a family of open source LLMs from Meta, the parent company of Facebook and Instagram. It’s one of the most popular and powerful open source LLMs, and you can download the source code yourself from Github. Because it’s free for research and commercial uses, a lot of other LLMs use Llama 2 as a base.

Vicuna

Developer: LMSYS Org
Parameters: 7 billion, 13 billion, and 33 billion
Access: Open source

Vicuna is an open source chatbot built off Meta’s Llama LLM. It’s widely used in AI research and as part of Chatbot Arena, a chatbot benchmark operated by LMSYS.

Claude 2

Claude, the best AI chatbot with a long conversation history

Developer: Anthropic
Parameters: Unknown
Access: API

Claude 2 is arguably one of the most important competitors to GPT. It’s designed to be helpful, honest, harmless, and—crucially—safe for enterprise customers to use. As a result, companies like Slack, Notion, and Zoom have all partnered with Anthropic.

Like all the other proprietary LLMs, Claude 2 is only available as an API, though it can be further trained on your data and fine-tuned to respond how you need. You can also connect Claude to Zapier so you can automate Claude from all your other apps. Here are some pre-made workflows to get you started.

Create blog posts based on keywords with Claude and save in Google Sheets

Try it

Google Sheets, Anthropic (Claude)

Google Sheets + Anthropic (Claude)More details

Write AI-generated email responses with Claude and store in Gmail

Try it

Gmail, Anthropic (Claude)

Gmail + Anthropic (Claude)More details

Create AI-generated social media posts with Claude

Try it

Google Sheets, Anthropic (Claude), Facebook Pages

Google Sheets + Anthropic (Claude) + Facebook PagesMore details

Stable Beluga and StableLM

Developer: Stability AI
Parameters: 7 billion, 13 billion, and 70 billion
Access: Open source

Stability AI is the group behind Stable Diffusion, one of the best AI image generators. They’ve also released a handful of open source LLMs based on Llama, including Stable Beluga and StableLM, although they’re nowhere near as popular as the image generator.

Coral

Developer: Cohere
Parameters: Unknown
Access: API

Like Claude 2, Cohere’s Coral LLM is designed for enterprise users. It similarly offers an API and allows organizations to train versions of its model on their own data, so it can accurately respond to customer queries.

Falcon

Developer: Technology Innovation Institute
Parameters: 1.3 billion, 7.5 billion, 40 billion, and 180 billion
Access: Open source

Falcon is a family of open source LLMs that have consistently performed well in the various AI benchmarks. It has models with up to 180 billion parameters and can outperform PaLM 2, Llama 2, and GPT-3.5 in some tasks. It’s released under a permissive Apache 2.0 license, so it’s suitable for commercial and research use.

MPT

Developer: Mosaic
Parameters: 7 billion, 30 billion
Access: Open source

Mosaic’s MPT-7B and MPT-30B LLMs are two more powerful, popular, commercially available LLMs. Interestingly, they’re not built on top of Meta’s Llama model, unlike a lot of other open source models. MPT-30B outperforms the original GPT-3 and is released under an Apache 2.0 license, like Falcon. There are a few different versions available, fine-tuning for things like chat, and most interestingly, a 7B version fine-tuned for generating long works of fiction.

Mixtral 8x7B

Developer: Mistral
Parameters: 46.7 billion
Access: Open source

Mistral’s Mixtral 8x7B uses a series of sub-systems to efficiently outperform larger models. Despite having significantly fewer parameters (and thus being capable of running faster or on less powerful hardware), it’s able to outperform Llama-70B and match or beat GPT-3.5. It’s also released under an Apache 2.0 license.

XGen-7B

Developer: Salesforce
Parameters: 7 billion
Access: Open source

Salesforce’s XGen-7B isn’t an especially powerful or popular open source model—it performs about as well as other open source models with seven billion parameters. But I still think it’s worth including because it highlights how many large tech companies have AI and machine learning departments that can just develop and launch their own LLMs.

Grok

Developer: xAI
Parameters: Unknown
Access: Chatbot

Grok, a chatbot trained on data from X (formerly Twitter) doesn’t really warrant a place on this list on its own merits as it’s not widely available nor particularly good. Still, I’m listing it here because it was developed by xAI, the AI company founded by Elon Musk. While it might not be making waves in the AI scene, it’s still getting plenty of media coverage, so it’s worth knowing it exists.

Why are there so many LLMs?

Until a year or two back, LLMs were limited to research labs and tech demos at AI conferences. Now, they’re powering countless apps and chatbots, and there are hundreds of different models available that you can run yourself (if you have the computer skills). How did we get here?

Well, there are a few factors in play. Some of the big ones are:

With GPT-3 and ChatGPT, OpenAI demonstrated that AI research had reached the point where it could be used to build practical tools—so lots of other companies started doing the same.
LLMs take a lot of computing power to train, but it can be done in a matter of weeks or months.
There are lots of open source models that can be retrained or adapted into new models without the need to develop a whole new model.
There’s a lot of money being thrown at AI companies, so there are big incentives for anyone with the skills and knowledge to develop any kind of LLM to do so.

What to expect from LLMs in the future

I think we’re going to see a lot more LLMs in the near future, especially from major tech companies. Amazon, IBM, Intel, and NVIDIA all have LLMs under development, in testing, or available for customers to use. They’re not as buzzy as the models I listed above, nor are regular people ever likely to use them directly, but I think it’s reasonable to expect large enterprises to start deploying them widely.

I also think we’re going to see a lot more efficient LLMs tailored to run on smartphones and other lightweight devices. Google has already hinted at this with Gemini Nano, which runs some features on the Google Pixel Pro 8. Developments like Mistral’s Mixtral 8x7B demonstrate techniques that enable smaller LLMs to compete with larger ones efficiently.

The other big thing that’s coming is large multimodal models or LMMs. These combine text generation with other modalities, like images and audio, so you can ask a chatbot what’s going on in an image or have it respond with audio. GPT-4 Vision (GPT-4V) and Google’s Gemini models are two of the first LMMs that are likely to be widely deployed, but we’re definitely going to see more.

Other than that, who can tell? Three years ago, I definitely didn’t think we’d have powerful AIs like ChatGPT available for free. Maybe in a few years, we’ll have artificial general intelligence (AGI).

Related reading: