Best 39 Large Language Models (LLMs) in 2025

by Anthony Cardillo
Last Updated: March 13, 2025

Large language models are pre-trained on large datasets and use natural language processing to perform linguistic tasks such as text generation, code completion, paraphrasing, and more.

The initial release of ChatGPT sparked the rapid adoption of generative AI, which has led to large language model innovations and industry growth.


Top LLMs in February 2025

This table lists the leading large language models in early 2025.

| LLM Name | Developer | Release Date | Access | Parameters |
|---|---|---|---|---|
| GPT-4.5 | OpenAI | Feb 27, 2025 | API | Unknown |
| Claude 3.7 Sonnet | Anthropic | Feb 24, 2025 | API | Unknown (est. 200B+) |
| Grok-3 | xAI | Feb 17, 2025 | API | Unknown |
| Gemini 2.0 Flash-Lite | Google DeepMind | Feb 5, 2025 | API | Unknown |
| Gemini 2.0 Pro | Google DeepMind | Feb 5, 2025 | API | Unknown |
| GPT-o3-mini | OpenAI | Jan 31, 2025 | API | Unknown |
| Qwen 2.5-Max | Alibaba | Jan 29, 2025 | API | Unknown |
| DeepSeek R1 | DeepSeek | Jan 20, 2025 | API, Open Source | 671B (37B active) |
| DeepSeek-V3 | DeepSeek | Dec 26, 2024 | API, Open Source | 671B (37B active) |
| Gemini 2.0 Flash | Google DeepMind | Dec 11, 2024 | API | Unknown |
| Sora | OpenAI | Dec 9, 2024 | API | Unknown |
| Nova | Amazon | Dec 3, 2024 | API | Unknown |
| Claude 3.5 Sonnet (New) | Anthropic | Oct 22, 2024 | API | Unknown |
| GPT-o1 | OpenAI | Sept 12, 2024 | API | Unknown (o1-mini est. ~100B) |
| DeepSeek-V2.5 | DeepSeek | Sept 5, 2024 | API, Open Source | Unknown |
| Grok-2 | xAI | Aug 13, 2024 | API | Unknown |
| Mistral Large 2 | Mistral AI | July 24, 2024 | API | 123B |
| Llama 3.1 | Meta AI | July 23, 2024 | Open Source | 405B |
| GPT-4o mini | OpenAI | July 18, 2024 | API | ~8B (est.) |
| Claude 3.5 Sonnet | Anthropic | June 20, 2024 | API | ~175-200B (est.) |
| Nemotron-4 | Nvidia | June 14, 2024 | Open Source | 340B |
| GPT-4o | OpenAI | May 13, 2024 | API | ~1.8T (est.) |
| DeepSeek-V2 | DeepSeek | May 6, 2024 | API, Open Source | Unknown |
| Phi-3 | Microsoft | April 23, 2024 | API, Open Source | Mini 3.8B, Small 7B, Medium 14B |
| Mixtral 8x22B | Mistral AI | April 10, 2024 | Open Source | 141B (39B active) |
| Jamba | AI21 Labs | Mar 29, 2024 | Open Source | 52B (12B active) |
| DBRX | Databricks' Mosaic ML | Mar 27, 2024 | Open Source | 132B |
| Command R | Cohere | Mar 11, 2024 | API, Open Source | 35B |
| Inflection-2.5 | Inflection AI | Mar 7, 2024 | Proprietary | Unknown (predecessor ~400B) |
| Gemma | Google DeepMind | Feb 21, 2024 | API, Open Source | 2B, 7B |
| Gemini 1.5 | Google DeepMind | Feb 15, 2024 | API | ~1.5T Pro, ~8B Flash (est.) |
| Stable LM 2 | Stability AI | Jan 19, 2024 | Open Source | 1.6B, 12B |
| Grok-1 | xAI | Nov 4, 2023 | API, Open Source | 314B |
| Mistral 7B | Mistral AI | Sept 27, 2023 | Open Source | 7.3B |
| Falcon 180B | Technology Innovation Institute | Sept 6, 2023 | Open Source | 180B |
| XGen-7B | Salesforce | July 3, 2023 | Open Source | 7B |
| PaLM 2 | Google | May 10, 2023 | API | 340B |
| Alpaca 7B | Stanford CRFM | Mar 13, 2023 | Open Source | 7B |
| Pythia | EleutherAI | Mar 13, 2023 | Open Source | 70M to 12B |

Context Windows and Knowledge Boundaries

LLMs with a larger context window can handle longer inputs and outputs. The context window determines how much information an LLM can process at once before its performance starts to degrade, while the knowledge cutoff date marks the end date of the data used in training.
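To make this concrete, here's a minimal sketch of checking whether a prompt fits a model's context window before sending it. It assumes the tiktoken library and uses GPT-4o's 128,000-token limit from the table below; the prompt is a placeholder.

```python
# Count tokens with tiktoken to check a prompt against a context window.
# 128,000 matches GPT-4o in the table below; exact counts vary by tokenizer.
import tiktoken

CONTEXT_WINDOW = 128_000  # e.g., GPT-4o

enc = tiktoken.encoding_for_model("gpt-4o")
prompt = "Summarize the following report: ..."  # placeholder input
n_tokens = len(enc.encode(prompt))

if n_tokens > CONTEXT_WINDOW:
    print(f"Over the limit by {n_tokens - CONTEXT_WINDOW} tokens")
else:
    print(f"Prompt uses {n_tokens} of {CONTEXT_WINDOW:,} tokens")
```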

| LLM Name | Context Window (Tokens) | Knowledge Cutoff Date |
|---|---|---|
| Gemini 2.0 Pro | 2,000,000 | August 2024 |
| Gemini 1.5 Pro | 2,000,000 | November 2023 |
| Gemini 2.0 Flash | 1,000,000 | August 2024 |
| Jamba | 256,000 | Mid-2023 |
| Claude 3.7 Sonnet | 200,000 | October 2024 |
| Claude 3.5 Sonnet (New) | 200,000 | April 2024 |
| Claude 3.5 Sonnet | 200,000 | April 2024 |
| GPT-o3-mini | 200,000 | October 2023 |
| GPT-o1 | 200,000 | October 2023 |
| DeepSeek R1 | 131,072 | July 2024 |
| Grok-3 | 128,000 | N/A - Real time |
| Gemini 2.0 Flash-Lite | 128,000 | August 2024 |
| Grok-2 | 128,000 | July 2024 |
| DeepSeek V3 | 128,000 | July 2024 |
| Llama 3.1 | 128,000 | December 2023 |
| GPT-4.5 | 128,000 | October 2023 |
| GPT-4o mini | 128,000 | October 2023 |
| GPT-4o | 128,000 | October 2023 |
| Phi-3 | 128,000 | October 2023 |
| Mistral Large 2 | 128,000 | Unknown (pre-Jul 24) |
| Command R | 128,000 | November 2023 |
| Mixtral 8x22B | 65,536 | Unknown (pre-Apr 24) |
| DeepSeek-V2.5 | 32,768 | July 2024 |
| DBRX | 32,768 | December 2023 |
| DeepSeek-V2 | 32,768 | November 2023 |
| Mistral 7B | 32,768 | Unknown (pre-Sept 23) |
| Nemotron-4 | 32,768 | June 2023 |
| Inflection-2.5 | 32,768 | Mid-2023 |
| Nova | 32,000 | Unknown (pre-Dec 24) |
| Qwen 2.5-Max | 32,000 | October 2023 |
| Gemini 1.0 Pro | 32,000 | February 2023 |
| Gemma | 8,192 | Mid-2023 |
| Grok-1 | 8,192 | Mid-2023 |
| PaLM 2 | 8,192 | February 2023 |
| Falcon 180B | 8,192 | Late 2022 |
| Stable LM 2 | 4,096 | December 2023 |
| XGen-7B | 4,096 | Mid-2023 |
| Alpaca 7B | 2,048 | June 2022 |
| Pythia | 2,048 | Mid-2022 |

As adoption continues to grow, so does the LLM industry.

Here's a deeper dive into some of the most important models of the last three years.

1. GPT-4.5

Developer: OpenAI

Release date: February 2025

Number of Parameters: Unknown

Context Window (Tokens): 128,000

Knowledge Cutoff Date: October 2023

What is it? GPT-4.5 "Orion" is the largest OpenAI model to date. In benchmarking, it outperforms GPT-4o in most tests. However, GPT-4.5 is not a reasoning model like the OpenAI "o" lineup, so it does not "think" before responding. Instead, it was trained using unsupervised learning, which is designed to give it broad knowledge from a massive set of training data. In its announcement, OpenAI said that GPT-4.5 is not a replacement for 4o, which is less expensive to run.

Pro users were first to get access to GPT-4.5 in late February 2025, with Plus, Team, Enterprise, and Education users following in early March.
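For developers, GPT-4.5 is also exposed through the OpenAI API. Here's a minimal sketch with the official Python SDK; the "gpt-4.5-preview" model identifier is the preview name used at launch and may change, so treat it as an assumption to verify.

```python
# Hypothetical call to GPT-4.5 via the OpenAI Python SDK; the model id
# "gpt-4.5-preview" was the launch-era preview name and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[{"role": "user", "content": "In one paragraph, what is unsupervised learning?"}],
)
print(response.choices[0].message.content)
```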

GPT-4.5 is expected to be OpenAI's last non-reasoning release before the highly anticipated GPT-5.

2. DeepSeek R1

Developer: DeepSeek

Release date: January 2025

Number of Parameters: 671B total, 37B active

Context Window (Tokens): 131,072

Knowledge Cutoff Date: July 2024

What is it? DeepSeek R1 is a reasoning model that excels in math and coding. It beats or matches OpenAI o1 in several benchmarks, including MATH-500 and AIME 2024.

On its release, DeepSeek immediately hit headlines due to the low cost of training compared to most major LLMs. Traffic to the DeepSeek website exploded in early 2025.

DeepSeek R1 is free to use and open-source. It's accessible via the API, the DeepSeek website, and mobile apps.
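Because DeepSeek's API is OpenAI-compatible, the OpenAI SDK can be pointed at it directly. Here's a minimal sketch; the base URL and the "deepseek-reasoner" model identifier are taken from DeepSeek's documentation at the time of writing and should be verified.

```python
# Call DeepSeek R1 through its OpenAI-compatible API. Base URL and model
# id ("deepseek-reasoner") per DeepSeek's docs at the time of writing.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek R1
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],
)
print(response.choices[0].message.content)
```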

3. GPT-o3-mini

Developer: OpenAI

Release date: January 31, 2025

Number of Parameters: Unknown (estimated ~100B for o1-mini, likely higher for o3-mini)

Context Window (Tokens): 200,000

Knowledge Cutoff Date: October 2023

What is it? GPT-o3-mini is a small, fast model with reasoning capabilities. It's a smaller version of the forthcoming o3 model and follows GPT-o1, the first model in OpenAI's "o" series.

Compared to o1, o3-mini is cost-efficient but does not support vision. Developers can choose from three levels of "effort", and similarly, users on the web can switch to o3-mini-high. OpenAI says o3-mini-high is more intelligent, but slower to reason.
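In the API, those effort levels map to a reasoning_effort parameter. Here's a minimal sketch, assuming the "o3-mini" identifier from OpenAI's launch documentation:

```python
# Select a reasoning effort level for o3-mini; "high" is roughly what
# the ChatGPT UI exposes as o3-mini-high.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```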

OpenAI has since released GPT-4.5 "Orion" and is expected to ship the full version of o3 and GPT-5 later in 2025. The ChatGPT website continues to be one of the world's most popular sites, receiving more than 75 million visitors from organic search in February 2025.

4. Claude 3.7 Sonnet

Developer: Anthropic

Release date: February 24, 2025

Number of Parameters: Unknown (estimated to be 200B+)

Context Window (Tokens): 200,000

Knowledge Cutoff Date: October 2024

What is it? Claude 3.7 Sonnet is the successor to Claude 3.5 Sonnet "New", and Anthropic bills it as its most intelligent model so far.

Claude remains a popular model for creative tasks and coding. On the web, it receives 3M+ visits from organic search each month.

The company describes 3.7 Sonnet as a hybrid reasoning model, meaning that it functions as a regular LLM, but is also capable of "thinking". In this extended thinking mode, 3.7 Sonnet publishes its "thought process" before providing an answer. The amount of "thinking" time is also user-customizable.
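In Anthropic's API, extended thinking is enabled per request with a user-set token budget. Here's a minimal sketch with the anthropic Python SDK; the dated model identifier is the one Anthropic published at launch and may change.

```python
# Enable Claude 3.7 Sonnet's extended thinking with a token budget.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},  # "thinking" allowance
    messages=[{"role": "user", "content": "Design a schema for a library catalog."}],
)

# The response interleaves "thinking" blocks with the final "text" answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```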

As part of the release, Claude's knowledge cutoff date was extended to October 2024. The context window remains unchanged at 200,000 tokens.

5. Grok-3

Developer: xAI

Release date: February 17, 2025

Number of Parameters: Unknown (Grok-1: 314 billion)

Context Window (Tokens): 128,000

Knowledge Cutoff Date: None (uses real-time information)

What is it? Grok-3 (announced as Grok-3 Beta) is the latest xAI model. It's directly integrated into X (formerly Twitter) on the Premium+ plan and excels at processing real-time information. In pre-training, Grok-3 used ten times the compute of the previous model, Grok-2.

xAI says Grok-3 supports a context window of up to 1 million tokens, far larger than its predecessors, though the API initially exposed 128,000. Users can click the "Think" button to run Grok in reasoning mode. It also has "Big Brain" and "Deep Search" modes and was released alongside a mini version.

6. Mistral Large 2

Developer: Mistral AI

Release date: July 24, 2024

Number of Parameters: 123 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: Unknown

What is it? Mistral Large 2 is a major upgrade over its predecessor, Mistral Large, designed to bring Mistral up to speed with the latest competing LLMs from Meta and OpenAI. It was also fine-tuned to reduce hallucinations. Large 2 is multilingual and has a 128k-token context window.

Benchmarking suggests that Mistral Large 2 outperforms Llama 3.1 405B in Python, C++, Java, PHP, and C# coding tasks.

7. PaLM 2

Developer: Google

Release date: May 10, 2023

Number of Parameters: 340 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: February 2023

What is it? PaLM 2 is an advanced large language model developed by Google. As the successor to the original Pathways Language Model (PaLM), it's trained on 3.6 trillion tokens (up from 780 billion) and has 340 billion parameters (down from PaLM's 540 billion). PaLM 2 originally powered Google's first generative AI chatbot, Bard (rebranded as Gemini in February 2024).

8. Falcon 180B

Developer: Technology Innovation Institute (TII)

Release date: September 6, 2023

Number of Parameters: 180 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: Late 2022

What is it? Developed and funded by the Technology Innovation Institute, Falcon 180B is an upgraded version of the earlier Falcon 40B LLM. It has 180 billion parameters, which is 4.5 times larger than the 40 billion parameters of Falcon 40B.

It outperforms not only Falcon 40B but also other large language models like GPT-3.5 and LLaMA 2 on tasks such as reasoning, question answering, and coding. In February 2024, the UAE-based TII committed $300 million in funding to the Falcon Foundation.

9. Stable LM 2

Developer: Stability AI

Release date: January 19, 2024

Number of Parameters: 1.6 billion and 12 billion

Context Window (Tokens): 4,096

Knowledge Cutoff Date: December 2023

What is it? Stability AI, the creators of the Stable Diffusion text-to-image model, are the developers behind Stable LM 2. This series of large language models includes Stable LM 2 12B (12 billion parameters) and Stable LM 2 1.6B (1.6 billion parameters). Released in April 2024, the larger 12B model outperforms models like LLaMA 2 70B on key benchmarks despite being much smaller.

10. Gemini 1.5 Pro

Developer: Google DeepMind

Release date: February 15, 2024

Number of Parameters: Unknown

Context Window (Tokens): 2,000,000

Knowledge Cutoff Date: November 2023

What is it? Gemini 1.5 offered a significant upgrade over its predecessor, Gemini 1.0. Gemini 1.5 Pro provided a one million-token context window at release (about 1 hour of video, 700,000 words, or 30,000 lines of code), the largest available at the time: more than 30 times the 32,000-token window of Gemini 1.0 Pro, surpassing the previous record of 200,000 tokens set by Anthropic's Claude 2.1.

In June 2024, Gemini 1.5 Pro's context window was increased to 2,000,000 tokens.

11. Llama 3.1

Developer: Meta AI

Release date: July 23, 2024

Number of Parameters: 405 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: December 2023

What is it? Llama 3, the predecessor to Llama 3.1, was available in 70B and 8B versions that outperformed other open models like Mistral 7B and Google's Gemma 7B on MMLU, reasoning, coding, and math benchmarks. Llama 3.1 brings major upgrades, including 405 billion parameters and an expanded context length of 128,000 tokens.

Users will also notice improved accuracy thanks to a knowledge base trained on over 15 trillion tokens, and Meta added support for eight additional languages. At release, this made Llama 3.1 405B the largest open-source model to date.

Customers can still access the earlier Llama 2, which is available in three versions: 7 billion, 13 billion, and 70 billion parameters.

12. Mixtral 8x22B

Developer: Mistral AI

Release date: April 10, 2024

Number of Parameters: 141 billion

Context Window (Tokens): 65,536

Knowledge Cutoff Date: Unknown

What is it? Mixtral 8x22B is a sparse Mixture-of-Experts (SMoE) model that has 141 billion total parameters but uses only 39 billion active parameters per token, improving the model's performance-to-cost ratio.
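To illustrate the idea (a toy sketch, not Mistral's implementation): in a sparse MoE layer, a small router scores every expert, but each token is processed by only the top-k of them, so most parameters stay idle on any given forward pass.

```python
# Toy sparse mixture-of-experts layer: route each token to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, dim)
        scores = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)        # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens whose slot went to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(TinySparseMoE()(x).shape)  # torch.Size([16, 64])
```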

13. Inflection-2.5

Developer: Inflection AI

Release date: March 7, 2024

Number of Parameters: Unknown

Context Window (Tokens): 32,768

Knowledge Cutoff Date: Mid 2023

What is it? Inflection-2.5 was developed by Inflection AI to power its conversational AI assistant, Pi. It's a significant upgrade: the model achieves over 94% of GPT-4's average performance while using only 40% of the training FLOPs. In March 2024, the Microsoft-backed startup reached more than one million daily active users on Pi.

In October 2024, Inflection released the Inflection 3.0 family of models designed for enterprise use.

14. Jamba

Developer: AI21 Labs

Release date: March 29, 2024

Number of Parameters: 52 billion

Context Window (Tokens): 256,000

Knowledge Cutoff Date: Mid 2023

What is it? AI21 Labs created Jamba, the world's first production-grade Mamba-style large language model. It combines structured state space model (SSM) technology with elements of a traditional transformer to create a hybrid architecture. The model is efficient and highly scalable, with a 256K-token context window, and can fit 140K tokens of context on a single GPU.
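Since the weights are open, Jamba can be loaded like any Hugging Face causal LM (support landed in transformers v4.40+). A hedged sketch follows; note the full 52B model needs substantial GPU memory, and device_map="auto" requires the accelerate package.

```python
# Load AI21's open Jamba checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ai21labs/Jamba-v0.1"  # the checkpoint AI21 published
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hybrid SSM-transformer models are", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```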

15. Command R

Developer: Cohere

Release date: March 11, 2024

Number of Parameters: 35 billion

Context Window (Tokens): 128,000

Knowledge Cutoff Date: November 2023

What is it? Command R is a series of scalable LLMs from Cohere that support ten languages and a 128,000-token context window (around 100 pages of text). The model primarily excels at retrieval-augmented generation, code-related tasks like explanations or rewrites, and reasoning. In April 2024, Command R+ was released to support larger workloads and provide real-world enterprise support.

16. Gemma

Developer: Google DeepMind

Release date: February 21, 2024

Number of Parameters: 2 billion and 7 billion

Context Window (Tokens): 8,192

Knowledge Cutoff Date: Mid 2023

What is it? Gemma is a series of lightweight open-source language models developed and released by Google DeepMind. The Gemma models are built with similar technology to the Gemini models, but Gemma is limited to text inputs and outputs only. The models have a context window of 8,192 tokens and come in 2 billion and 7 billion parameter sizes.

17. Phi-3

Developer: Microsoft

Release date: April 23, 2024

Number of Parameters: 3.8 billion (Mini), 7 billion (Small), 14 billion (Medium)

Context Window (Tokens): Up to 128,000 (varies by variant)

Knowledge Cutoff Date: Mid 2023

What is it? Classified as a small language model (SLM) family, Phi-3 launched with the 3.8 billion-parameter Phi-3-mini. Despite its small size, Phi-3-mini was trained on 3.3 trillion tokens of data and competes with Mixtral 8x7B and GPT-3.5 on MT-bench and MMLU benchmarks.

Phi-3-mini was released first; Microsoft followed with the Phi-3-small (7B) and Phi-3-medium (14B) models in May 2024.

18. XGen-7B

Developer: Salesforce

Release date: July 3, 2023

Number of Parameters: 7 billion

Context Window (Tokens): 4,096 or 8,192 (two versions)

Knowledge Cutoff Date: Mid 2023

What is it? XGen-7B is a large language model from Salesforce with 7 billion parameters and an 8k context window. The model was trained on 1.37 trillion tokens from various sources, such as RedPajama, Wikipedia, and Salesforce's own Starcoder dataset.

Salesforce released two open-source base versions, with 4,000- and 8,000-token context windows, under an Apache 2.0 license.

19. DBRX

Developer: Databricks' Mosaic ML

Release date: March 27, 2024

Number of Parameters: 132 billion

Context Window (Tokens): 32,768

Knowledge Cutoff Date: December 2023

What is it? DBRX is an open-source LLM built by Databricks and the Mosaic ML research team. Its mixture-of-experts architecture activates 36 billion of its 132 billion total parameters for any given input. DBRX has 16 experts and chooses 4 of them during inference, providing 65 times more expert combinations than similar models like Mixtral and Grok-1.
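The "65 times" figure is plain combinatorics: DBRX routes each input to 4 of 16 experts, versus 2 of 8 for Mixtral and Grok-1, and the ratio of possible expert subsets works out to exactly 65.

```python
# Verify DBRX's "65x more expert combinations" claim.
import math

dbrx = math.comb(16, 4)    # 1820 ways to choose 4 of 16 experts
mixtral = math.comb(8, 2)  # 28 ways to choose 2 of 8 experts
print(dbrx, mixtral, dbrx // mixtral)  # 1820 28 65
```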

20. Pythia

Developer: EleutherAI

Release date: March 13, 2023

Number of Parameters: 70 million to 12 billion

Context Window (Tokens): 2,048

Knowledge Cutoff Date: Mid 2022

What is it? Pythia is a series of 16 large language models developed and released by EleutherAI, a non-profit AI research lab. There are eight different model sizes: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. Because of Pythia's open-source license, these LLMs serve as a base model for fine-tuned, instruction-following LLMs like Dolly 2.0 by Databricks.
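Part of what makes Pythia research-friendly is that each model repo on Hugging Face also publishes intermediate training checkpoints as revisions (e.g., "step3000"), so training dynamics can be studied across scales. A minimal sketch:

```python
# Load the smallest Pythia model at an intermediate training checkpoint.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

repo = "EleutherAI/pythia-70m"
model = GPTNeoXForCausalLM.from_pretrained(repo, revision="step3000")
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Language models are", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```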

21. Alpaca 7B

Developer: Stanford CRFM

Release date: March 13, 2023

Number of Parameters: 7 billion

Context Window (Tokens): 2,048

Knowledge Cutoff Date: June 2022

What is it? Alpaca is a 7 billion-parameter language model developed by a Stanford research team and fine-tuned from Meta's LLaMA 7B model. Despite being much smaller, Alpaca performs similarly to OpenAI's text-davinci-003 (a GPT-3.5 model). Alpaca 7B is available for research purposes only; no commercial licenses are available.

22. Nemotron-4 340B

Developer: NVIDIA

Release date: June 14, 2024

Number of Parameters: 340 billion

Context Window (Tokens): 32,768

Knowledge Cutoff Date: June 2023

What is it? Nemotron-4 340B is a family of large language models for synthetic data generation and AI model training. These models help businesses create new LLMs without assembling larger, more expensive datasets: Nemotron-4 generates high-quality synthetic data for training other AI models, reducing the need for extensive human-annotated data.

The model family includes Nemotron-4-340B-Base (foundation model), Nemotron-4-340B-Instruct (fine-tuned chatbot), and Nemotron-4-340B-Reward (quality assessment and preference ranking). Trained on 9 trillion tokens of English, multilingual, and coding data, Nemotron-4 matches GPT-4's high-quality synthetic data generation capabilities.

Conclusion

The landscape of large language models is rapidly evolving, with new breakthroughs and innovations emerging at an unprecedented pace.

From compact models like Phi-3 and Alpaca 7B to cutting-edge architectures like Jamba and DBRX, the field of LLMs is pushing the boundaries of what's possible in natural language processing (NLP).

We will keep this list regularly updated with new models. If you liked learning about these LLMs, check out our lists of generative AI startups and AI startups.
