Comparing the Web’s Languages to the World’s Languages

Wordcloud of the top 10 languages used on websites around the world,
showing the disproportionate predominance of English

A recent article in Quartz cites research by Facebook showing that English-language websites account for 56% of the global total, and that a basket of merely 10 languages accounts for 89% of the sites in the world. This leads to a phenomenon of “linguistic exclusion” that shuts out many of those who speak any of the more than 7,000 other living languages in the world.

These top 10 languages, collectively, do not reflect global population. Let’s compare the top 10 languages for websites with the top 10 most widely spoken languages in the world. The data cited in the article was from w3techs’ table of “Usage of content language for websites” and is a bit outdated. w3techs’ most recent statistics show a bit of broadening, with English now representing only 53.6% of global websites. Here’s the current top 10 language list, which accounts for 88.2% of global websites:

Source: w3techs

Wordcloud showing the top 10 languages spoken around the world,
proportionally showing the predominance of Chinese

Here are the most popular first languages in the world, taken from Ethnologue. Ethnologue combines 13 different variants of Chinese into one broad category, does the same with 19 different varieties of Arabic, and likewise combines multiple Western Punjabi languages into Lahnda. According to its listing, here are the top spoken languages in the world:

Source: Ethnologue

Native speakers of these ten languages cumulatively account for 52.1% of the global population, little more than half.

English is thus over-represented on the web by roughly a factor of 10 relative to its share of native speakers (53.6% of websites compared to 5.2% of the world’s population). Russian, German and Japanese are also over-represented on the web, each by more than twice its share of global population.

French, Italian, and Polish appear on the top 10 list of website languages, but are not among the top 10 spoken languages of the world. French is in 14th place on the Ethnologue list, Italian is in 21st place, and Polish is much further down, around 30th place, with around 40 million speakers.

The reasons for over-representation are partly historical and partly economic. Many of the over-represented languages reflect historically wealthy regions and larger economies of the world such as Germany, Japan, France and Italy.

Portuguese is represented on the web (2.6%) nearly in proportion to its global population (3.1%). Two other popular languages, Chinese and Spanish, are in both top 10s, but are severely underrepresented on the web compared to their populations.
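The comparisons above come down to one ratio: a language’s share of websites divided by its share of native speakers. Here is a minimal Python sketch of that arithmetic, using only the two pairs of figures quoted in this article. The function name `representation_ratio` and the `SHARES` table are illustrative assumptions, not part of w3techs’ or Ethnologue’s data.

```python
# Shares quoted in this article, in percent:
# (share of websites, share of native speakers).
# Other languages are left out because their exact figures are not given in the text.
SHARES = {
    "English": (53.6, 5.2),
    "Portuguese": (2.6, 3.1),
}


def representation_ratio(web_share, speaker_share):
    """Return web share divided by native-speaker share.

    A value above 1 means over-represented on the web,
    a value below 1 means under-represented.
    """
    if speaker_share <= 0:
        raise ValueError("speaker_share must be positive")
    return web_share / speaker_share


for language, (web, speakers) in SHARES.items():
    ratio = representation_ratio(web, speakers)
    print(f"{language}: {ratio:.1f}x")
```

A ratio near 1, as with Portuguese, means the language’s web presence roughly tracks its speaker population. A ratio near 10, as with English, is the over-representation described above.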

Now consider Arabic, with over 267 million speakers, or Hindi, Bengali and Lahnda (Western Punjabi), each with more than 100 million speakers, yet none of them shows up in the ranks of top languages used on the web. Arabic comes in at 14th place on w3techs’ list. The other three languages, all from South Asia, are each used by less than 0.1% of the world’s websites.

This can be attributed to the use of English as an official language in both India and Pakistan, where it serves as a lingua franca for these multilingual nations. Yet it also shows the gap between what is seen and heard on the web and what is spoken on the street and in the homes of hundreds of millions of people.

What thoughts do you have on the difference between the languages prevalent on the web, and those used broadly in your target markets and communities of interest? We’d love to hear! Send us your comments at projects@e2f.com.
