Chinese, Arabic, and Romance Languages: The Politics of Language Family Ties

“A language is a dialect with an army and a navy.” – attributed to Max Weinreich

The boundaries between languages and dialects can be fuzzy. Two languages can be mutually intelligible or even interchangeable, and two dialects of the same language can be so different that two people speaking to each other wouldn’t understand a word of what the other was saying. There doesn’t seem to be a hard-and-fast rule for when a dialect graduates to language status, and it varies across different schools of thought, regions, and political conceptions of linguistic identity.

To drive home how confusing this can get, I want to give you three concrete examples using the languages or language groups I mentioned in the title: Chinese, Arabic, and the Romance languages.

Let’s start with the Chinese language(s), since, as has likely become painfully obvious by now, that’s my specialty. These days, Chinese is often used as a synonym for Mandarin Chinese, a Sinitic/Chinese language that originated from languages spoken by government officials (mandarins) in what is now Beijing. However, travel further south to the Hong Kong Special Administrative Region (SAR), and you will hear Chinese used as a synonym for Cantonese, the language spoken in the southern province of Guangdong (Canton) and parts of neighboring Guangxi, as well as by many members of the global Chinese diaspora (as is Mandarin, for that matter).

The problem with this semantic arrangement is that Mandarin and Cantonese are mutually unintelligible in their spoken form. If Li Xiaopeng from Beijing and Lam Lai-ying from Hong Kong wanted to speak to each other in their respective definitions of Chinese, they would not understand a word that the other was saying. Although Mandarin Chinese and Cantonese are related to one another and share both a writing system and certain characteristics (namely, they are both tonal languages), their phonology and grammar are markedly different from each other. For instance, per an article by Hong Kong Baptist University professor Gisela Bruche-Schultz (1997), when making a comparison between two items, Mandarin Chinese generally follows this pattern: Thing A + Comparative Marker + Thing B + Adjective. Let’s translate the English phrase “Shaquille O’Neal is taller than Peter Dinklage” into Mandarin and Cantonese:

Shaquille O’Neal 比 Peter Dinklage 高.

Literally: Shaquille O’Neal COMPARISON MARKER Peter Dinklage tall.

But in Cantonese, the order’s a bit different. Let’s use the same example sentence, though this time, I’ll use just English, since regrettably, I don’t know how to type in Cantonese (it uses a different Romanization system from Mandarin), or how to speak Cantonese at all:

Shaquille O’Neal tall COMPARISON MARKER Peter Dinklage. (It’s interesting that the Cantonese way of constructing comparisons is much closer to English.)

This is just one difference that I gave as an example, but there are many, many more differences between Cantonese and Mandarin that make them mutually unintelligible. But if Cantonese and Mandarin are so different that speakers can’t understand each other, and even the shared written form can reveal markedly different grammar and syntax, why are they considered dialects of one Chinese language?

An article from The Economist theorizes that it may be a matter of nationalism. After all, Danish and Norwegian are so closely related that they are mutually intelligible, but don’t get caught saying that they’re the same language on your next trip to Oslo or Copenhagen. By the same token, since the handover of Hong Kong from Britain to China in 1997, national linguistic cohesion has become more crucial for Beijing. Creating a narrative in which the language of Hong Kong is just as Chinese as the language of Beijing (which it is, in fact; recall that it is spoken in parts of Mainland China and has had a long history as a part of China’s rich linguistic landscape) is part of the overall narrative of a Greater China of which Hong Kong is an inalienable part. Thus, Cantonese, despite being by many accounts (per The Economist) a separate language, is considered a “dialect” of a Chinese “macrolanguage” that includes many mutually unintelligible varieties.

Arabic is similar to Chinese in this regard. While Modern Standard Arabic (MSA) and Classical Arabic (used in the Quran) are powerful unifiers for Arabs in MENA and Muslims around the world, the vernacular varieties of Arabic are, like the varieties included under the “Chinese” language umbrella, often mutually unintelligible. If Rami from Cairo and Ahmed from Beirut wished to converse, unless they chose to write to each other in MSA or use Classical Arabic, clear communication would likely be extremely difficult or impossible. And yet, they would likely technically be speaking the same language.

As before, political agendas may help explain why two mutually unintelligible “varieties” or “dialects” are considered the same language. From the global influence of Islam to the Pan-Arabism movements of the 20th century, there are potent political and cultural reasons to see the many language varieties and dialects spoken across the Middle East, North Africa, and parts of Sub-Saharan Africa as one macrolanguage, Arabic. Yet again, this is despite the fact that, according to the “proximity” criterion mentioned in the Economist article, many of these varieties of Arabic could arguably be considered separate languages.

We can contrast Arabic and Chinese with the Romance languages (French, Spanish, Italian, Portuguese, Romanian, Catalan, and Romansch, among others, all descended from Latin). Like the many varieties under the Chinese and Arabic language umbrellas we previously discussed, these languages share many grammatical, lexical, and syntactic similarities, and share a writing system. The languages may be intelligible or unintelligible to varying degrees (for instance, I can get the gist of a text written in Italian or Portuguese because of my years of studying Spanish), but unlike the two macrolanguages and their sub-varieties we just discussed, the Romance languages are considered discrete, separate languages!

Why is that? Yet again, let’s look to politics. I would argue that these languages are confined to distinctive geographic and cultural regions (like, say, Cantonese and Levantine Arabic), but these geographic and cultural regions either resisted being subsumed by greater powers…or were doing plenty of subsuming on their own (Weinreich’s bon mot holds true here). The European nationalism that arose in the early modern era only intensified the sense that, for instance, Portuguese isn’t a subset of Spanish. The same sentiment explains why Danes may take umbrage at the suggestion that their mother tongue is indistinguishable from Norwegian (or at the suggestion that Danish and Norwegian are Romance languages rather than Nordic ones). Unlike “Greater China” or the Muslim Ummah/Pan-Arab sphere, there is no cultural sphere of influence that encompasses all of the Romance languages together. They tend to have separate spheres of influence: the Francophonie, the Lusophone community, the Hispanophone community, et cetera. A simple twist in the plot threads of history might have made Romansch a dialect of French, or left two “Spanish” speakers from Naples and Madrid scratching their heads at what their interlocutor was on about.

It might start to get a bit boring if I just kept listing more and more examples of the problems of language vs. dialect, but here’s one more quick one: Hindi and Urdu, spoken in India and Pakistan and by members of South Asian diaspora communities, are very, very close to each other in terms of grammar, syntax, and vocabulary. Some have said that they’re about as linguistically distant from each other as British and American English. A Hindi speaker and an Urdu speaker would be able to understand each other with little trouble. The main difference is that mostly Muslims speak Urdu (thus the vocabulary is more influenced by Arabic and Persian, and the script is Arabic-based), and mostly Hindus speak Hindi (thus the vocabulary is more influenced by Sanskrit and the script is Devanagari). Obviously this is a generalization rather than a hard-and-fast rule, but that is essentially the division between Hindi and Urdu. They are considered two languages despite the fact that they’re linguistically closer to each other than Cantonese and Mandarin are, or than Hassaniya Arabic and Levantine Arabic are.

I wish that there were a politically or semantically neutral word for language/dialect/variety to clear up a little bit of this confusion and inconsistency. Speechway? Meaningful sounds? Tongue? But for now, the best way to tell whether something is a language or a dialect is to look at its proximity to other, related languages, or at how politically and culturally powerful its speakers are.



Bruche-Schultz, G. (1997). ‘Fuzzy’ Chinese: The status of Cantonese in Hong Kong. Journal of Pragmatics, 27, pp. 295-314.

Carreiro, H. (2010). Why Hindi-Urdu is One Language and Arabic is Several. The Matador Network.

R.L.G. (2014). The Economist Explains: How a dialect differs from a language. The Economist.



