grid-line

Languages Supported by ChatGPT

by Josh Howarth
June 11, 2024

ChatGPT has over 180 million active users and has been adopted by companies around the world. Does that mean it supports other languages?

The short answer is yes.

The long answer is yes — but…

ChatGPT does officially support dozens of languages. And it has been found to have varying levels of ability in many more, including several programming and coding languages.

However, ChatGPT’s abilities vary wildly when using non-English languages.

So, which languages does ChatGPT know? And what's the best language to use for prompting?

In this article, we’ll investigate ChatGPT’s language abilities. We’ll also break down what ChatGPT does well in other languages and where it struggles, and the reasons why.

Full list of languages supported and spoken by ChatGPT

Language Officially supported Limited support or some ability demonstrated Countries
Albanian ✔️ Albania, Kosovo, North Macedonia
Amharic ✔️ Ethiopia
Arabic ✔️ Algeria, Bahrain, Chad, Comoros, Djibouti, Egypt, Eritrea, Iraq, Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tunisia, United Arab Emirates, Yemen
Armenian ✔️ Armenia
Awadhi ✔️ India
Azerbaijani ✔️ Azerbaijan
Bashkir ✔️ Russia
Basque ✔️ Spain
Belarusian ✔️ Belarus
Bengali ✔️ Bangladesh, India
Bhojpuri ✔️ India, Nepal
Bosnian ✔️ Bosnia and Herzegovina
Brazilian Portuguese ✔️ Brazil
Bulgarian ✔️ Bulgaria
Burmese ✔️ Myanmar
Cantonese (Yue) ✔️ China
Catalan ✔️ Spain, Andorra, Italy
Chhattisgarhi ✔️ India
Croatian ✔️ Croatia, Bosnia and Herzegovina
Czech ✔️ Czech Republic
Danish ✔️ Denmark, Greenland, Faroe Islands
Dogri ✔️ India
Dutch ✔️ Netherlands, Belgium, Suriname, Aruba, Curaçao, Sint Maarten
English ✔️ Australia, Barbados, Belize, Botswana, Canada, Eswatini, Fiji, Ghana, India, Ireland, Jamaica, Kenya, Lesotho, Liberia, Malawi, Malta, Marshall Islands, Mauritius, Micronesia, Namibia, New Zealand, Nigeria, Pakistan, Palau, Papua New Guinea, Philippines, Rwanda, Samoa, Seychelles, Sierra Leone, Singapore, Solomon Islands, South Africa, South Sudan, Sri Lanka, Tanzania, Uganda, Vanuatu, Zambia, Zimbabwe
Estonian ✔️ Estonia
Faroese ✔️ Faroe Islands
Finnish ✔️ Finland
French ✔️ Belgium, Benin, Burkina Faso, Burundi, Cameroon, Canada, Central African Republic, Chad, Comoros, Congo (Republic of the), Democratic Republic of the Congo, Djibouti, Equatorial Guinea, France, Gabon, Guinea, Haiti, Ivory Coast, Luxembourg, Madagascar, Mali, Monaco, Niger, Rwanda, Senegal, Seychelles, Switzerland, Togo, Vanuatu
Galician ✔️ Spain
Georgian ✔️ Georgia
German ✔️ Austria, Belgium, Germany, Liechtenstein, Luxembourg, Switzerland
Greek ✔️ Greece, Cyprus
Gujarati ✔️ India
Haryanvi ✔️ India
Hindi ✔️ India
Hungarian ✔️ Hungary
Icelandic ✔️ Iceland
Indonesian ✔️ Indonesia
Irish ✔️ Ireland
Italian ✔️ Italy, San Marino, Switzerland, Vatican City
Japanese ✔️ Japan
Javanese ✔️ Indonesia
Kannada ✔️ India
Kashmiri ✔️ India
Kazakh ✔️ Kazakhstan
Konkani ✔️ India
Korean ✔️ South Korea, North Korea
Kyrgyz ✔️ Kyrgyzstan
Latvian ✔️ Latvia
Lithuanian ✔️ Lithuania
Macedonian ✔️ North Macedonia
Maithili ✔️ India
Malay ✔️ Malaysia, Brunei, Singapore
Malayalam ✔️ India
Maltese ✔️ Malta
Mandarin Chinese ✔️ China, Taiwan, Singapore
Marathi ✔️ India
Marwari ✔️ India
Min Nan ✔️ Taiwan, China
Moldovan ✔️ Moldova
Mongolian ✔️ Mongolia
Montenegrin ✔️ Montenegro
Nepali ✔️ Nepal, India
Norwegian ✔️ Norway
Oriya ✔️ India
Pashto ✔️ Afghanistan, Pakistan
Persian (Farsi) ✔️ Iran, Afghanistan, Tajikistan
Polish ✔️ Poland
Portuguese ✔️ Portugal, Brazil, Mozambique, Angola, Cape Verde, Guinea-Bissau, East Timor, Equatorial Guinea, São Tomé and Príncipe
Punjabi ✔️ India, Pakistan
Rajasthani ✔️ India
Romanian ✔️ Romania
Russian ✔️ Russia, Belarus, Kazakhstan, Kyrgyzstan
Sanskrit ✔️ India
Santali ✔️ India
Serbian ✔️ Serbia, Bosnia and Herzegovina, Montenegro
Sindhi ✔️ Pakistan, India
Sinhala ✔️ Sri Lanka
Slovak ✔️ Slovakia
Slovenian ✔️ Slovenia
Somali ✔️ Somalia, Djibouti, Ethiopia
Spanish ✔️ Argentina, Bolivia, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Equatorial Guinea, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Spain, Uruguay, Venezuela, Andorra
Swahili ✔️ Tanzania, Kenya, Uganda
Swedish ✔️ Sweden, Finland
Tagalog ✔️ Philippines
Tajik ✔️ Tajikistan
Tamil ✔️ India, Sri Lanka, Singapore
Tatar ✔️ Russia
Telugu ✔️ India
Thai ✔️ Thailand
Turkish ✔️ Turkey, Northern Cyprus
Turkmen ✔️ Turkmenistan
Ukrainian ✔️ Ukraine
Urdu ✔️ Pakistan, India
Uzbek ✔️ Uzbekistan
Vietnamese ✔️ Vietnam
Welsh ✔️ Wales, United Kingdom
Wu ✔️ China

How many languages does ChatGPT support?

Officially, ChatGPT supports 58 languages. That’s according to OpenAI, the company that develops ChatGPT.

On the one hand, that’s less than 1% of all the languages spoken in the world.

On the other hand, these 50-plus languages account for the vast majority of the world’s population. The list of supported languages includes English, Spanish, French, Portuguese, Russian, Japanese, Chinese, Arabic, and practically all other major languages.

undefined
Most countries have an official language supported by ChatGPT

Roughly 4.5 billion people natively speak at least one of the languages officially supported by ChatGPT. Many more speak at least one of them as a second language.

Can you change ChatGPT’s language?

Yes. However, you might not have to. ChatGPT automatically detects the language used by your browser or mobile device. It then switches to that language, as long as it is one of the languages officially supported.

You can manually change the language used by ChatGPT in the settings.

What languages does ChatGPT know?

This might seem like the same question as “Which languages are supported,” but it’s very different.

ChatGPT might officially support over fifty languages, but it’s demonstrated some level of ability in almost twice as many natural languages.

ChatGPT can even process and respond to queries submitted in different scripts, from cyrillic, arabic, and chinese, to hieroglyphs.

How to generate a different language in ChatGPT

You can get ChatGPT to speak in a different language in two ways.

The first is to simply ask. Submit your query in one language, and at the end, specify the language you would like ChatGPT to answer with.

undefined
ChatGPT responds to a question submitted in English in Spanish, as requested

The second method is to submit your request in the language you wish to receive as a response.

undefined
When asked a question in Spanish, ChatGPT correctly responds in Spanish

You might notice that the two responses are different, despite the question being the same. We’ll get back to that later.

The exact number of languages that ChatGPT knows is somewhat unclear. Sources provide different estimates, some a little under 100, some slightly over.

The reason for this uncertainty is the same reason ChatGPT knows so many languages in the first place. It’s based on how large language models learn new information.

How does ChatGPT learn new languages?

ChatGPT is a natural language processor. In other words, it’s great at processing and producing information that a human might say. That’s why it can generate human-like responses, which allow you to have a conversation with it.

But how does ChatGPT know what to say in response to a query?

Very simply, ChatGPT is an extremely powerful algorithm that has been trained on vast amounts of data to identify certain patterns and respond appropriately. Each language is composed of different patterns, and ChatGPT is capable of learning, processing, and responding in any of them.

When you submit a query, ChatGPT processes the information it has received. It recognizes certain patterns — the words you used, their order, and so on — and generates an appropriate response. Or at least, that’s the aim. Because ChatGPT doesn’t truly understand things, its responses are not always appropriate.

Hold on. ChatGPT doesn’t understand things?

Is ChatGPT multilingual?

It’s a mistake to conceive of ChatGPT “understanding” a language. That’s because it doesn’t truly understand things the way a human does.

For example, ChatGPT is sometimes compared to a baby. A human baby doesn’t learn language by studying a textbook, memorizing vocab, learning grammar rules, and so on.

Instead, babies learn language organically. Over time, with enough input, they begin to identify what words mean, how those words are put together to form sentences, and so on. Babies can even learn multiple languages simultaneously.

On the surface, ChatGPT does the same, albeit on a much larger scale. By studying its training data, ChatGPT learns the patterns and structures that define a language. It can process queries in a language and respond to them.

However, unlike a human, ChatGPT doesn’t truly understand what it is processing or producing. It identifies certain patterns in the information it receives, recognizes those patterns based on its training, and generates appropriate patterns in response.

In short, then, ChatGPT isn’t truly multilingual. ChatGPT can speak Spanish, for example, but it doesn’t really understand Spanish the same way a native speaker or even a translator does.

This might seem like a minor difference. However, it explains why lists of the number of languages that ChatGPT speaks are generally incomplete. ChatGPT’s abilities are not fixed.

It also means you probably shouldn’t try to use ChatGPT to learn a language. ChatGPT could be an excellent tool to help you learn a language, but it can’t really teach you a whole new language by itself.

Finally, and most importantly, this seemingly small difference between recognizing patterns and truly understanding helps explain why ChatGPT struggles with languages that aren’t English.

Why ChatGPT struggles in other languages

The corpus of data used to train ChatGPT contained mostly English. In part, that’s because OpenAI is an American company. It’s also because the vast majority of digitized content is in English.

Put simply, the more information ChatGPT has in a certain language, the more capable it is in that language. That’s because it’s had more opportunities to learn the patterns and structures that define that language.

The amount of data available to ChatGPT — or available at all, to any AI tool — is a resource. A high-resource language, like English, is one with plenty of material available for ChatGPT to learn from. A low-resource language is one with few materials available.

English is by far the language with the most resources. Chinese is another high-resource language, thanks to the sheer number of people who speak it. Spanish is another fairly high-resource language for the same reason.

undefined

A selection of languages ranked by their prevalence in the CommonCrawl corpus (“CC Size”) and grouped into categories based on whether they are high, medium, low, or extremely low resource

Studies have repeatedly shown that ChatGPT performs worse in languages with fewer resources available.

Of course, new resources are constantly being generated. However, they aren’t being generated at an equal rate, nor are they necessarily all being included in ChatGPT’s training.

What is the knowledge cutoff of ChatGPT?

ChatGPT-4o, which as of June 2024 is the most advanced version of ChatGPT available, has a knowledge cutoff of October 2023.

That means that any new materials produced in the last several months are not included in ChatGPT’s training.

For one thing, that means ChatGPT won’t be aware of new parts of a language — a new slang term or idiom invented after October 2023, for example.

Can ChatGPT speak Spanish?

Speaking of idioms, ChatGPT often struggles with these, even in relatively high-resource languages (that aren’t English).

For example, in one study, researchers asked GPT-4 to list some colloquial terms for sneakers. It successfully noted trainers (used in England) and joggers (used in Australia). However, when the same question was posed in Spanish, GPT-4 struggled. It failed to identify equivalent slang terms in Spanish, such as zapatillas deportivas (Spain) or championes (Uruguay).

These difficulties can even creep into responses given in the same language. Remember how GPT-4 responded slightly differently to my question about the most popular Spanish dish? Studies have shown that ChatGPT generates better answers when asked in English, even if the response is always in another language.

In general, the same principles of approaching ChatGPT apply in any language. Refining your query can help produce better answers. Sticking to subjects that are widely discussed online also helps.

Beyond all that, though, asking ChatGPT something in English can still yield better results, even if it is technically proficient in the desired language.

Ultimately, ChatGPT is still focused on English. In December 2022, an OpenAI staff member explicitly said this in a forum discussion. They said that “any good Spanish results are a bonus.”

Can ChatGPT translate things?

Once again, the short answer is yes, and the longer answer is yes — but…

What ChatGPT can translate

ChatGPT can struggle with idioms and similarly complex features of language. Nevertheless, it’s still often better at translating these sorts of phrases than standard translation tools.

That’s because translation tools convert words and phrases in isolation, while ChatGPT views things in the context of all the data it has been trained on. It has more knowledge of a language’s culture, regional differences, slang, and other patterns that a traditional translation tool might not pick up when given a phrase in isolation.

Take one common Spanish idiom, “como la copa de un pino.” Let’s ask ChatGPT to translate that.

undefined
ChatGPT successfully recognizes and translates a Spanish idiom

ChatGPT successfully recognized the idiom. It provided a few different translations in English, and even gave an example phrase that makes the real meaning of the idiom clear.

Now, let’s ask a translation tool the same question.

undefined
Google Translate translates the Spanish idiom literally, which isn’t particularly helpful

Here’s what Google Translate generates. The result is technically correct, but doesn’t really make sense. It certainly doesn’t convey the actual meaning of the Spanish phrase. “Like the top of a pine” could mean something is tall, or leafy, or, well, just about anything.

undefined
DeepL is a little better, but still produces a translation that would likely confuse someone unfamiliar with the Spanish idiom

The translation tool DeepL is a little better, but still quite literal. “As good as the top of a pine tree” probably won’t make sense to an English speaker with no knowledge of the Spanish idiom. What makes the top of a pine tree good? Again, perhaps its height?

This is just one case where ChatGPT outperforms translation tools.

Another is when you’re dealing with language filled with mistakes or slang. Sam Quillen, for example, found that ChatGPT figured out the translation of a Spanish sentence filled with errors, while Google Translate spat out nonsense.

In general, though, ChatGPT is not ready to replace translation tools — not yet, anyway.

What ChatGPT can’t translate

ChatGPT might perform better than translation tools when it comes to certain idioms, but its performance varies wildly based on the language, prompt, and other factors that might be outside of the user’s control.

Microsoft, which has invested some $10 billion into OpenAI, conducted an extensive study of the translation abilities of large language models like ChatGPT. Researchers found that LLMs performed well with common or relatively high-resource languages, like Spanish. However, the less common the language, the worse LLMs performed.

Even in high-resource languages, ChatGPT performs worse than English. It hallucinates more, getting facts wrong that it managed to get when asked in English.

undefined
One 2023 study found that large language models are more often incorrect, inconsistent, and incorrect when operating in non-English languages.

Again, this might change in the future. Researchers around the world are working on data corpora to help improve ChatGPT’s abilities in lower-resource languages. Last March, OpenAI CEO Sam Altman told a congressional hearing that he was aiming to partner with governments and organizations to acquire more data to help improve ChatGPT’s non-English language skills.

As of now, however, ChatGPT is still very much an English-focused tool. Remember what the OpenAI employee said. Other languages are just a bonus.

ChatGPT gets translations from English wrong

There’s an additional problem with ChatGPT’s translation efforts. More than a dozen studies in 2023 alone concluded that it performs far worse when translating from English into another language.

It also seriously struggles when multiple languages are mixed together. That’s something that millions of people around the world do quite naturally every day.

This seriously undermines the ability of ChatGPT to function as a translator. For example, ChatGPT is often used as a multilingual support bot, or to facilitate trade between two people who don’t speak the same language. However, consider the problems that would naturally arise if one-half of any conversation — the half translated from English — is riddled with errors.

Which coding language is used in ChatGPT?

We’ve discussed what languages ChatGPT knows and speaks, but what language is it built with?

The main structure of ChatGPT was created with Python. About half of all software developers use Python, making it one of the most popular languages out there. It’s used to develop websites, analyze data, and build artificial intelligence tools like ChatGPT.

ChatGPT was also built by connecting this base Python code to two powerful frameworks called TensorFlow and PyTorch.

Various other frameworks were layered on top of this core structure to help improve ChatGPT’s performance. These include Gensim, NLTK, and spaCy.

You can also connect ChatGPT to other libraries or APIs to further enhance its abilities. Currently, OpenAI supports access from over 180 countries and territories.

Which coding languages does ChatGPT know?

ChatGPT might be built in Python, but it’s demonstrated impressive abilities in several other programming and coding languages. These include all of the most popular languages, as well as a few others:

  • Python
  • JavaScript
  • Java
  • Ruby
  • HTML
  • R
  • C++
  • C#

Is ChatGPT worth it for coding?

Remember, ChatGPT doesn’t truly understand coding and programming languages, in the exact same way it doesn’t understand natural languages.

Therefore, while it can be incredibly helpful, it shouldn’t be relied on entirely. Instead, use it as one of the tools in your toolkit when building or editing code.

Conclusion

In the eighteen months since its launch, ChatGPT has demonstrated ability in over 100 languages. It and other large language models have helped people around the world overcome language barriers that previously may have seemed insurmountable.

That being said, ChatGPT is still very much English-centric. It was built by an American company and its focus is on American English. Despite showing some amazing skills in other languages, it still performs best in English.

Nevertheless, as time goes on, ChatGPT’s skills in non-English languages will almost certainly improve.