Wikipedia: Translation En Masse
Wikipedia is a truly global phenomenon of public, open-source information. As of this writing, it is written in over 280 different languages, and "written" is the operative word: simple machine translation is prohibited. While the prevalence of many languages in the Wikipedia community is to be expected, there are also some genuine surprises. Let’s take a look at a profile of Wikipedia across its many languages.
The opportunity Wikipedia presents is self-evident. More than 280 language communities have taken to Wikipedia as a way to document their culture and heritage. The downside is that the quality of the pages varies widely.
There are 12 languages that offer over a million Wikipedia pages.
There are 42 more Wikipedia editions that offer 100,000+ pages.
Another 77 offer more than 10,000 pages.
109 have more than 1,000 pages.
38 have 100 pages or more.
The English-language version, launched first in 2001, is unsurprisingly the predominant version both in traffic, with over 142 million page views per day, and in pages available (over 5 million). For the most part, its editors have provided clear guidance as to when to use American English or British English. Wikipedia has long been open and welcoming to non-native English speakers. This has many positives and some drawbacks. Global contributors can offer perspectives on their own homelands and cultures, broadening Wikipedia’s perspective beyond a historically Western-culture lens. Though some pages may contain awkward translations into English, native English speakers can clean up the pages after initial contributions are made. At times you may also see Wikipedia edit wars driven by cultural conflicts, such as ethnic or international political controversies.
Wikipedia traffic for the “top ten” most-visited versions for the month of November 2015 is as follows. As a kicker, let’s compare this to the total number of L1 and L2 speakers of each language to see how popular Wikipedia is within each community. This leads to a very rough statistic of “views per month” (vpm) per person fluent in that language.
English (4.267 Billion views / 840 Million speakers = 5.08 views per month per fluent person)
Spanish (649 Million views / 490 Million speakers = 1.32 vpm)
Russian (637 Million views / 260 Million speakers = 2.45 vpm)
German (630 Million / 88 Million = 7.16 vpm)
Japanese (536 Million / 130 Million = 4.12 vpm)
French (451 Million / 160 Million = 2.82 vpm)
Italian (229 Million / 64 Million = 3.58 vpm)
Chinese [Mandarin] (217 Million / 1,030 Million = 0.21 vpm)
Portuguese (184 Million / 200 Million = 0.92 vpm)
Polish (179 Million / 39 Million = 4.59 vpm)
So you have two top languages where Wikipedia predominates with 5+ views per month per person: German and English. Next, you have a cluster of languages where the average fluent person visits between 2 and 5 times per month: Polish, Japanese, Italian, French and Russian. Spanish hovers at just over one visit per month per fluent speaker, and Portuguese just below that.
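The views-per-speaker figure is a single division, so it is easy to recompute and re-rank. What follows is only a minimal Python sketch: the numbers are copied from the list above (in millions), and the names TRAFFIC, views_per_speaker and rank_by_vpm are invented for this illustration.

```python
# Monthly page views and total L1 + L2 speakers, both in millions,
# copied from the November 2015 list in this article.
TRAFFIC = {
    "English": (4267, 840),
    "Spanish": (649, 490),
    "Russian": (637, 260),
    "German": (630, 88),
    "Japanese": (536, 130),
    "French": (451, 160),
    "Italian": (229, 64),
    "Chinese (Mandarin)": (217, 1030),
    "Portuguese": (184, 200),
    "Polish": (179, 39),
}


def views_per_speaker(views_millions, speakers_millions):
    """Return monthly views per fluent speaker (the article's "vpm")."""
    return views_millions / speakers_millions


def rank_by_vpm(traffic):
    """Return (language, vpm) pairs sorted from highest to lowest vpm."""
    rows = [
        (lang, views_per_speaker(views, speakers))
        for lang, (views, speakers) in traffic.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for lang, vpm in rank_by_vpm(TRAFFIC):
        print(f"{lang:<20}{vpm:5.2f} vpm")
```

Because the speaker counts are rough estimates, the ranking is more meaningful than any single value.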
Chinese, surprisingly, brings up the rear of the top languages: on average, only about one Mandarin speaker in five makes a single Wikipedia page lookup per month. The reason here is the popularity of other wiki sites, such as the for-profit Hudong Baike (baike.com). In fact, Wikipedia is only the fourth-most-popular wiki in China.
There is also a huge difference between traffic (measured in visitors) and content (measured in articles). Ranked by number of encyclopedia pages, here’s the list of the top ten languages on Wikipedia:
English (5 Million articles)
Swedish (2.3 Million articles)
German (1.8 Million articles)
Dutch (1.8 Million articles)
French (1.7 Million articles)
Cebuano (1.6 Million articles)
Russian (1.27 Million articles)
Waray-Waray (1.25 Million articles)
Italian (1.24 Million articles)
Spanish (1.22 Million articles)
English leads again. Five other languages from the top ten by traffic reappear here as well: German, French, Russian, Italian and Spanish.
It is also not too surprising to see Swedish or Dutch make their way onto the list. At 89 million visits per month, Dutch falls just outside the top ten. Swedish, with 45.5 million visits per month, is the 15th most-visited Wikipedia, but has the second-most content available. Two more languages with more than a million pages that are right behind the “top ten” are Polish and Vietnamese.
The true outliers are the Wikipedias in two Philippine languages, Cebuano and Waray-Waray. Cebuano is the second-most spoken language in the Philippines (behind Tagalog), and has a community of about 16 million people. Traffic to its Wikipedia ranks 68th, with 468,000 visitors per month. Waray-Waray, spoken by about 2.5 million, ranks 88th, with 405,000 visitors per month. It is a testament to the dedication of volunteer translators that these wikis compete with those of far more widely spoken languages.
One lesson to take away from Wikipedia is that sheer quantity of content will not necessarily convert directly into traffic. You should look at other factors in reaching a market, including overall population size, economy, and access to Internet-enabled computers or smartphones. Another lesson is this: there are passionate and vibrant communities around the world who have vested interests in their native languages, and none should be taken for granted.
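To put the gap between content and traffic in numbers, one can divide monthly traffic by article count for a few editions. This is a rough sketch only: the figures are the ones quoted in this article, it assumes that "views", "visits" and "visitors" all describe the same monthly traffic measure, and the names EDITIONS and monthly_traffic_per_article are hypothetical.

```python
# (monthly traffic, article count) as quoted in this article.
# Assumption: "views", "visits" and "visitors" are treated as one measure.
EDITIONS = {
    "English": (4_267_000_000, 5_000_000),
    "Swedish": (45_500_000, 2_300_000),
    "Cebuano": (468_000, 1_600_000),
    "Waray-Waray": (405_000, 1_250_000),
}


def monthly_traffic_per_article(traffic, articles):
    """Return the average monthly traffic received by one article."""
    return traffic / articles


if __name__ == "__main__":
    for name, (traffic, articles) in EDITIONS.items():
        ratio = monthly_traffic_per_article(traffic, articles)
        print(f"{name:<12}{ratio:10.2f} per article per month")
```

On these figures, an average English article draws hundreds of visits a month, while an average Cebuano or Waray-Waray article draws less than one.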