Best 39 Large Language Models (LLMs) in 2025

by Anthony Cardillo
Last Updated: March 13, 2025

Large language models are pre-trained on large datasets and use natural language processing to perform linguistic tasks such as text generation, code completion, paraphrasing, and more.

The initial release of ChatGPT sparked the rapid adoption of generative AI, which has led to large language model innovations and industry growth.


Top LLMs in February 2025

This table lists the leading large language models in early 2025.

| LLM Name | Developer | Release Date | Access | Parameters |
|---|---|---|---|---|
| GPT-4.5 | OpenAI | Feb 27, 2025 | API | Unknown |
| Claude 3.7 Sonnet | Anthropic | Feb 24, 2025 | API | Unknown (est. 200B+) |
| Grok-3 | xAI | Feb 17, 2025 | API | Unknown |
| Gemini 2.0 Flash-Lite | Google DeepMind | Feb 5, 2025 | API | Unknown |
| Gemini 2.0 Pro | Google DeepMind | Feb 5, 2025 | API | Unknown |
| GPT-o3-mini | OpenAI | Jan 31, 2025 | API | Unknown |
| Qwen 2.5-Max | Alibaba | Jan 29, 2025 | API | Unknown |
| DeepSeek R1 | DeepSeek | Jan 20, 2025 | API, Open Source | 671B (37B active) |
| DeepSeek-V3 | DeepSeek | Dec 26, 2024 | API, Open Source | 671B (37B active) |
| Gemini 2.0 Flash | Google DeepMind | Dec 11, 2024 | API | Unknown |
| Sora | OpenAI | Dec 9, 2024 | API | Unknown |
| Nova | Amazon | Dec 3, 2024 | API | Unknown |
| Claude 3.5 Sonnet (New) | Anthropic | Oct 22, 2024 | API | Unknown |
| GPT-o1 | OpenAI | Sept 12, 2024 | API | Unknown (o1-mini est. ~100B) |
| DeepSeek-V2.5 | DeepSeek | Sept 5, 2024 | API, Open Source | Unknown |
| Grok-2 | xAI | Aug 13, 2024 | API | Unknown |
| Mistral Large 2 | Mistral AI | July 24, 2024 | API | 123B |
| Llama 3.1 | Meta AI | July 23, 2024 | Open Source | 405B |
| GPT-4o mini | OpenAI | July 18, 2024 | API | ~8B (est.) |
| Claude 3.5 Sonnet | Anthropic | June 20, 2024 | API | ~175-200B (est.) |
| Nemotron-4 | Nvidia | June 14, 2024 | Open Source | 340B |
| GPT-4o | OpenAI | May 13, 2024 | API | ~1.8T (est.) |
| DeepSeek-V2 | DeepSeek | May 6, 2024 | API, Open Source | Unknown |
| Phi-3 | Microsoft | April 23, 2024 | API, Open Source | Mini 3.8B, Small 7B, Medium 14B |
| Mixtral 8x22B | Mistral AI | April 10, 2024 | Open Source | 141B (39B active) |
| Jamba | AI21 Labs | Mar 29, 2024 | Open Source | 52B (12B active) |
| DBRX | Databricks' Mosaic ML | Mar 27, 2024 | Open Source | 132B |
| Command R | Cohere | Mar 11, 2024 | API, Open Source | 35B |
| Inflection-2.5 | Inflection AI | Mar 7, 2024 | Proprietary | Unknown (predecessor ~400B) |
| Gemma | Google DeepMind | Feb 21, 2024 | API, Open Source | 2B, 7B |
| Gemini 1.5 | Google DeepMind | Feb 15, 2024 | API | ~1.5T Pro, ~8B Flash (est.) |
| Stable LM 2 | Stability AI | Jan 19, 2024 | Open Source | 1.6B, 12B |
| Grok-1 | xAI | Nov 4, 2023 | API, Open Source | 314B |
| Mistral 7B | Mistral AI | Sept 27, 2023 | Open Source | 7.3B |
| Falcon 180B | Technology Innovation Institute | Sept 6, 2023 | Open Source | 180B |
| XGen-7B | Salesforce | July 3, 2023 | Open Source | 7B |
| PaLM 2 | Google | May 10, 2023 | API | 340B |
| Alpaca 7B | Stanford CRFM | Mar 13, 2023 | Open Source | 7B |
| Pythia | EleutherAI | Mar 13, 2023 | Open Source | 70M to 12B |

Context Windows and Knowledge Boundaries

LLMs with a larger context window can handle longer inputs and outputs. The context window determines how much information an LLM can process at once before its performance starts to degrade, while the knowledge cutoff date marks the end date of the data used in training.
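To make this concrete, here's a minimal sketch of checking whether a prompt fits a model's context window before sending it. It assumes the tiktoken library and uses GPT-4o's 128,000-token limit from the table below; the prompt is a placeholder.

```python
# Count tokens with tiktoken to check a prompt against a context window.
# 128,000 matches GPT-4o in the table below; exact counts vary by tokenizer.
import tiktoken

CONTEXT_WINDOW = 128_000  # e.g., GPT-4o

enc = tiktoken.encoding_for_model("gpt-4o")
prompt = "Summarize the following report: ..."  # placeholder input
n_tokens = len(enc.encode(prompt))

if n_tokens > CONTEXT_WINDOW:
    print(f"Over the limit by {n_tokens - CONTEXT_WINDOW} tokens")
else:
    print(f"Prompt uses {n_tokens} of {CONTEXT_WINDOW:,} tokens")
```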

| LLM Name | Context Window (Tokens) | Knowledge Cutoff Date |
|---|---|---|
| Gemini 2.0 Pro | 2,000,000 | August 2024 |
| Gemini 1.5 Pro | 2,000,000 | November 2023 |
| Gemini 2.0 Flash | 1,000,000 | August 2024 |
| Jamba | 256,000 | Mid-2023 |
| Claude 3.7 Sonnet | 200,000 | October 2024 |
| Claude 3.5 Sonnet (New) | 200,000 | April 2024 |
| Claude 3.5 Sonnet | 200,000 | April 2024 |
| GPT-o3-mini | 200,000 | October 2023 |
| GPT-o1 | 200,000 | October 2023 |
| DeepSeek R1 | 131,072 | July 2024 |
| Grok-3 | 128,000 | N/A - Real time |
| Gemini 2.0 Flash-Lite | 128,000 | August 2024 |
| Grok-2 | 128,000 | July 2024 |
| DeepSeek V3 | 128,000 | July 2024 |
| Llama 3.1 | 128,000 | December 2023 |
| GPT-4.5 | 128,000 | October 2023 |
| GPT-4o mini | 128,000 | October 2023 |
| GPT-4o | 128,000 | October 2023 |
| Phi-3 | 128,000 | October 2023 |
| Mistral Large 2 | 128,000 | Unknown (pre-Jul 24) |
| Command R | 128,000 | November 2023 |
| Mixtral 8x22B | 65,536 | Unknown (pre-Apr 24) |
| DeepSeek-V2.5 | 32,768 | July 2024 |
| DBRX | 32,768 | December 2023 |
| DeepSeek-V2 | 32,768 | November 2023 |
| Mistral 7B | 32,768 | Unknown (pre-Sept 23) |
| Nemotron-4 | 32,768 | June 2023 |
| Inflection-2.5 | 32,768 | Mid-2023 |
| Nova | 32,000 | Unknown (pre-Dec 24) |
| Qwen 2.5-Max | 32,000 | October 2023 |
| Gemini 1.0 Pro | 32,000 | February 2023 |
| Gemma | 8,192 | Mid-2023 |
| Grok-1 | 8,192 | Mid-2023 |
| PaLM 2 | 8,192 | February 2023 |
| Falcon 180B | 8,192 | Late 2022 |
| Stable LM 2 | 4,096 | December 2023 |
| XGen-7B | 4,096 | Mid-2023 |
| Alpaca 7B | 2,048 | June 2022 |
| Pythia | 2,048 | Mid-2022 |

As adoption continues to grow, so does the LLM industry.

Here's a deeper dive into some of the most important models of the last three years.

1. GPT-4.5

Developer: OpenAI

Release date: February 2025

Number of Parameters: Unknown

Context Window (Tokens): 128,000

Knowledge Cutoff Date: October 2023

What is it? GPT-4.5 "Orion" is the largest OpenAI model to date. In benchmarking, it outperforms GPT-4o in most tests. However, GPT-4.5 is not a reasoning model like the OpenAI "o" lineup, so it does not "think" before responding. Instead, it was trained using unsupervised learning, which is designed to give it broad knowledge from a massive set of training data. In its announcement, OpenAI said that GPT-4.5 is not a replacement for 4o, which is less expensive to run.

Pro users were first to get access to GPT-4.5 in late February 2025, with Plus, Team, Enterprise, and Education users following in early March.
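For developers, GPT-4.5 is also exposed through the OpenAI API. Here's a minimal sketch with the official Python SDK; the "gpt-4.5-preview" model identifier is the preview name used at launch and may change, so treat it as an assumption to verify.

```python
# Hypothetical call to GPT-4.5 via the OpenAI Python SDK; the model id
# "gpt-4.5-preview" was the launch-era preview name and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[{"role": "user", "content": "In one paragraph, what is unsupervised learning?"}],
)
print(response.choices[0].message.content)
```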

GPT-4.5 is expected to be OpenAI's last non-reasoning release before the highly anticipated GPT-5.

2. DeepSeek R1

Developer: DeepSeek

Release date: January 2025

Number of Parameters: 671B total, 37B active

Context Window (Tokens): 131,072

Knowledge Cutoff Date: July 2024

What is it? DeepSeek R1 is a reasoning model that excels in math and coding. It beats or matches OpenAI o1 in several benchmarks, including MATH-500 and AIME 2024.

On its release, DeepSeek immediately hit headlines due to the low cost of training compared to most major LLMs. Traffic to the DeepSeek website exploded in early 2025.

DeepSeek R1 is free to use and open-source. It's accessible via the API, the DeepSeek website, and mobile apps.
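Because DeepSeek's API is OpenAI-compatible, the OpenAI SDK can be pointed at it directly. Here's a minimal sketch; the base URL and the "deepseek-reasoner" model identifier are taken from DeepSeek's documentation at the time of writing and should be verified.

```python
# Call DeepSeek R1 through its OpenAI-compatible API. Base URL and model
# id ("deepseek-reasoner") per DeepSeek's docs at the time of writing.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek R1
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],
)
print(response.choices[0].message.content)
```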

3. GPT-o3-mini

Developer: OpenAI

Release date: January 31, 2025

Number of Parameters: Unknown (estimated ~100B for o1-mini, likely higher for o3-mini)

Context Window (Tokens): 200,000

Knowledge Cutoff Date: October 2023

What is it? GPT-o3-mini is a small, fast model with reasoning capabilities. It's a smaller version of the forthcoming o3 model and follows GPT-o1, the first model in OpenAI's "o" series.

Compared to o1, o3-mini is cost-efficient but does not support vision. Developers can choose from three levels of "effort", and similarly, users on the web can switch to o3-mini-high. OpenAI says o3-mini-high is more intelligent, but slower to reason.
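In the API, those effort levels map to a reasoning_effort parameter. Here's a minimal sketch, assuming the "o3-mini" identifier from OpenAI's launch documentation:

```python
# Select a reasoning effort level for o3-mini; "high" is roughly what
# the ChatGPT UI exposes as o3-mini-high.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```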

OpenAI has since released GPT-4.5 "Orion" and is expected to ship the full version of o3 and GPT-5 later in 2025. The ChatGPT website continues to be one of the world's most popular sites, receiving more than 75 million visitors from organic search in February 2025.

4. Claude 3.7 Sonnet

Developer: Anthropic

Release date: February 24, 2025

Number of Parameters: Unknown (estimated to be 200B+)

Context Window (Tokens): 200,000

Knowledge Cutoff Date: October 2024

What is it? Claude 3.7 Sonnet is the successor to Claude 3.5 Sonnet "New", and Anthropic bills it as its most intelligent model so far.

Claude remains a popular model for creative tasks and coding. On the web, it receives 3M+ visits from organic search each month.

The company describes 3.7 Sonnet as a hybrid reasoning model, meaning that it functions as a regular LLM, but is also capable of "thinking". In this extended thinking mode, 3.7 Sonnet publishes its "thought process" before providing an answer. The amount of "thinking" time is also user-customizable.
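In Anthropic's API, extended thinking is enabled per request with a user-set token budget. Here's a minimal sketch with the anthropic Python SDK; the dated model identifier is the one Anthropic published at launch and may change.

```python
# Enable Claude 3.7 Sonnet's extended thinking with a token budget.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},  # "thinking" allowance
    messages=[{"role": "user", "content": "Design a schema for a library catalog."}],
)

# The response interleaves "thinking" blocks with the final "text" answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```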

As part of the release, Claude's knowledge cutoff date was extended to October 2024. The context window remains unchanged at 200,000 tokens.

5. Grok-3

Developer: xAI

Release date: February 17, 2025

Number of Parameters: Unknown (Grok-1: 314 billion)

Context Window (Tokens): 128,000

Knowledge Cutoff Date: None (uses real-time information)

What is it? Grok-3 (announced as Grok-3 Beta) is the latest xAI model. It's directly integrated into X (formerly Twitter) on the Premium+ plan and excels at processing real-time information. In pre-training, Grok-3 used ten times the compute of the previous model, Grok-2.

xAI says Grok-3 supports a context window of up to 1 million tokens, far larger than its predecessors, though the API initially exposed 128,000. Users can click the "Think" button to run Grok in reasoning mode. It also has "Big Brain" and "Deep Search" modes and was released alongside a mini version.

6. Mistral Large 2

Developer: Mistral AI

Release date: July 24, 2024

Number of Parameters: 123 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: Unknown

What is it? Mistral Large 2 is a major upgrade over its predecessor, Mistral Large, designed to bring Mistral up to speed with the latest competing LLMs from Meta and OpenAI. It was also fine-tuned to reduce hallucinations. Large 2 is multilingual and has a 128k-token context window.

Benchmarking suggests that Mistral Large 2 outperforms Llama 3.1 405B in Python, C++, Java, PHP, and C# coding tasks.

7. PaLM 2

Developer: Google

Release date: May 10, 2023

Number of Parameters: 340 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: February 2023

What is it? PaLM 2 is an advanced large language model developed by Google. As the successor to the original Pathways Language Model (PaLM), it's trained on 3.6 trillion tokens (up from 780 billion) and has 340 billion parameters (down from PaLM's 540 billion). PaLM 2 originally powered Google's first generative AI chatbot, Bard (rebranded as Gemini in February 2024).

8. Falcon 180B

Developer: Technology Innovation Institute (TII)

Release date: September 6, 2023

Number of Parameters: 180 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: Late 2022

What is it? Developed and funded by the Technology Innovation Institute, Falcon 180B is an upgraded version of the earlier Falcon 40B LLM. It has 180 billion parameters, which is 4.5 times larger than the 40 billion parameters of Falcon 40B.

It outperforms not only Falcon 40B but also other large language models like GPT-3.5 and LLaMA 2 on tasks such as reasoning, question answering, and coding. In February 2024, the UAE-based TII committed $300 million in funding to the Falcon Foundation.

9. Stable LM 2

Developer: Stability AI

Release date: January 19, 2024

Number of Parameters: 1.6 billion and 12 billion

Context Window (Tokens): 4,096

Knowledge Cutoff Date: December 2023

What is it? Stability AI, the creators of the Stable Diffusion text-to-image model, are the developers behind Stable LM 2. This series of large language models includes Stable LM 2 12B (12 billion parameters) and Stable LM 2 1.6B (1.6 billion parameters). Released in April 2024, the larger 12B model outperforms models like LLaMA 2 70B on key benchmarks despite being much smaller.

10. Gemini 1.5 Pro

Developer: Google DeepMind

Release date: February 15, 2024

Number of Parameters: Unknown

Context Window (Tokens): 2,000,000

Knowledge Cutoff Date: November 2023

What is it? Gemini 1.5 offered a significant upgrade over its predecessor, Gemini 1.0. Gemini 1.5 Pro provided a one million-token context window at release (about 1 hour of video, 700,000 words, or 30,000 lines of code), the largest available at the time: more than 30 times the 32,000-token window of Gemini 1.0 Pro, surpassing the previous record of 200,000 tokens set by Anthropic's Claude 2.1.

In June 2024, Gemini 1.5 Pro's context window was increased to 2,000,000 tokens.

11. Llama 3.1

Developer: Meta AI

Release date: July 23, 2024

Number of Parameters: 405 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: December 2023

What is it? Llama 3, the predecessor to Llama 3.1, was available in 70B and 8B versions that outperformed other open models like Mistral 7B and Google's Gemma 7B on MMLU, reasoning, coding, and math benchmarks. Llama 3.1 brings major upgrades, including 405 billion parameters and an expanded context length of 128,000 tokens.

Users will also notice improved accuracy thanks to a knowledge base trained on over 15 trillion tokens, and Meta added support for eight additional languages. At release, this made Llama 3.1 405B the largest open-source model to date.

Customers can still access the earlier Llama 2, which is available in three versions: 7 billion, 13 billion, and 70 billion parameters.

12. Mixtral 8x22B

Developer: Mistral AI

Release date: April 10, 2024

Number of Parameters: 141 billion

Context Window (Tokens): 65,536

Knowledge Cutoff Date: Unknown

What is it? Mixtral 8x22B is a sparse Mixture-of-Experts (SMoE) model that has 141 billion total parameters but uses only 39 billion active parameters per token, improving the model's performance-to-cost ratio.
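To illustrate the idea (a toy sketch, not Mistral's implementation): in a sparse MoE layer, a small router scores every expert, but each token is processed by only the top-k of them, so most parameters stay idle on any given forward pass.

```python
# Toy sparse mixture-of-experts layer: route each token to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, dim)
        scores = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)        # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens whose slot went to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(TinySparseMoE()(x).shape)  # torch.Size([16, 64])
```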

13. Inflection-2.5

Developer: Inflection AI

Release date: March 7, 2024

Number of Parameters: Unknown

Context Window (Tokens): 32,768

Knowledge Cutoff Date: Mid 2023

What is it? Inflection-2.5 was developed by Inflection AI to power its conversational AI assistant, Pi. It's a significant upgrade: the model achieves over 94% of GPT-4's average performance while using only 40% of the training FLOPs. In March 2024, the Microsoft-backed startup reached more than one million daily active users on Pi.

In October 2024, Inflection released the Inflection 3.0 family of models designed for enterprise use.

14. Jamba

Developer: AI21 Labs

Release date: March 29, 2024

Number of Parameters: 52 billion

Context Window (Tokens): 256,000

Knowledge Cutoff Date: Mid 2023

What is it? AI21 Labs created Jamba, the world's first production-grade Mamba-style large language model. It combines structured state space model (SSM) technology with elements of a traditional transformer to create a hybrid architecture. The model is efficient and highly scalable, with a 256K-token context window, and can fit 140K tokens of context on a single GPU.
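Since the weights are open, Jamba can be loaded like any Hugging Face causal LM (support landed in transformers v4.40+). A hedged sketch follows; note the full 52B model needs substantial GPU memory, and device_map="auto" requires the accelerate package.

```python
# Load AI21's open Jamba checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ai21labs/Jamba-v0.1"  # the checkpoint AI21 published
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hybrid SSM-transformer models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```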

15. Command R

Developer: Cohere

Release date: March 11, 2024

Number of Parameters: 35 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: November 2023

What is it? Command R is a series of scalable LLMs from Cohere that support ten languages and a 128,000-token context window (around 100 pages of text). The model primarily excels at retrieval-augmented generation, code-related tasks like explanations or rewrites, and reasoning. In April 2024, Command R+ was released to support larger workloads and provide real-world enterprise support.

16. Gemma

Developer: Google DeepMind

Release date: February 21, 2024

Number of Parameters: 2 billion and 7 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: Mid 2023

What is it? Gemma is a series of lightweight open-source language models developed and released by Google DeepMind. The Gemma models are built with similar technology to the Gemini models, but Gemma is limited to text inputs and outputs only. The models have a context window of 8,192 tokens and come in 2 billion and 7 billion parameter sizes.

17. Phi-3

Developer: Microsoft

Release date: April 23, 2024

Number of Parameters: 3.8 billion (Mini), 7 billion (Small), 14 billion (Medium)

Context Window (Tokens): Up to 128,000 (varies by variant)

Knowledge Cutoff Date: Mid 2023

What is it? Classified as a small language model (SLM) family, Phi-3 launched with the 3.8 billion-parameter Phi-3-mini. Despite its small size, Phi-3-mini was trained on 3.3 trillion tokens of data and competes with Mixtral 8x7B and GPT-3.5 on MT-bench and MMLU benchmarks.

Phi-3-mini was released first; Microsoft followed with the Phi-3-small (7B) and Phi-3-medium (14B) models in May 2024.

18. XGen-7B

Developer: Salesforce

Release date: July 3, 2023

Number of Parameters: 7 billion

Context Window (Tokens): 4,096 or 8,192 (two versions)

Knowledge Cutoff Date: Mid 2023

What is it? XGen-7B is a large language model from Salesforce with 7 billion parameters and an 8k context window. The model was trained on 1.37 trillion tokens from various sources, such as RedPajama, Wikipedia, and Salesforce's own Starcoder dataset.

Salesforce released two open-source base versions, with 4,000- and 8,000-token context windows, under an Apache 2.0 license.

19. DBRX

Developer: Databricks' Mosaic ML

Release date: March 27, 2024

Number of Parameters: 132 billion

Context Window (Tokens): 32,768

Knowledge Cutoff Date: December 2023

What is it? DBRX is an open-source LLM built by Databricks and the Mosaic ML research team. Its mixture-of-experts architecture activates 36 billion of its 132 billion total parameters for any given input. DBRX has 16 experts and chooses 4 of them during inference, providing 65 times more expert combinations than similar models like Mixtral and Grok-1.
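The "65 times" figure is plain combinatorics: DBRX routes each input to 4 of 16 experts, versus 2 of 8 for Mixtral and Grok-1, and the ratio of possible expert subsets works out to exactly 65.

```python
# Verify DBRX's "65x more expert combinations" claim.
import math

dbrx = math.comb(16, 4)    # 1820 ways to choose 4 of 16 experts
mixtral = math.comb(8, 2)  # 28 ways to choose 2 of 8 experts
print(dbrx, mixtral, dbrx // mixtral)  # 1820 28 65
```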

20. Pythia

Developer: EleutherAI

Release date: March 13, 2023

Number of Parameters: 70 million to 12 billion

Context Window (Tokens): 2,048

Knowledge Cutoff Date: Mid 2022

What is it? Pythia is a series of 16 large language models developed and released by EleutherAI, a non-profit AI research lab. There are eight different model sizes: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. Because of Pythia's open-source license, these LLMs serve as a base model for fine-tuned, instruction-following LLMs like Dolly 2.0 by Databricks.
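Part of what makes Pythia research-friendly is that each model repo on Hugging Face also publishes intermediate training checkpoints as revisions (e.g., "step3000"), so training dynamics can be studied across scales. A minimal sketch:

```python
# Load the smallest Pythia model at an intermediate training checkpoint.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

repo = "EleutherAI/pythia-70m"
model = GPTNeoXForCausalLM.from_pretrained(repo, revision="step3000")
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Language models are", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```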

21. Alpaca 7B

Developer: Stanford CRFM

Release date: March 13, 2023

Number of Parameters: 7 billion

Context Window (Tokens): 2,048

Knowledge Cutoff Date: June 2022

What is it? Alpaca is a 7 billion-parameter language model developed by a Stanford research team and fine-tuned from Meta's LLaMA 7B model. Despite being much smaller, Alpaca performs similarly to OpenAI's text-davinci-003 (a GPT-3.5 model). Alpaca 7B is available for research purposes only; no commercial licenses are available.

22. Nemotron-4 340B

Developer: NVIDIA

Release date: June 14, 2024

Number of Parameters: 340 billion

Context Window (Tokens): 32,768

Knowledge Cutoff Date: June 2023

What is it? Nemotron-4 340B is a family of large language models for synthetic data generation and AI model training. These models help businesses create new LLMs without assembling larger, more expensive datasets: Nemotron-4 generates high-quality synthetic data for training other AI models, reducing the need for extensive human-annotated data.

The model family includes Nemotron-4-340B-Base (foundation model), Nemotron-4-340B-Instruct (fine-tuned chatbot), and Nemotron-4-340B-Reward (quality assessment and preference ranking). Trained on 9 trillion tokens of English, multilingual, and coding data, Nemotron-4 matches GPT-4's high-quality synthetic data generation capabilities.

Conclusion

The landscape of large language models is rapidly evolving, with new breakthroughs and innovations emerging at an unprecedented pace.

From compact models like Phi-3 and Alpaca 7B to cutting-edge architectures like Jamba and DBRX, the field of LLMs is pushing the boundaries of what's possible in natural language processing (NLP).

We will keep this list regularly updated with new models. If you liked learning about these LLMs, check out our lists of generative AI startups and AI startups.
