
1 Introduction

The world we live in is constantly changing, creating, and responding to newly acquired features, such as technological advances, climate change, societal upheavals, pandemics, and shifting boundaries between cultures, religions, and languages. It took Columbus 3 months to get from Spain (Castile) to the Americas (the Bahamas); now, a flight from Madrid to Nassau is only about 13 h. Years ago, the professional occupation of an explorer existed; today there are no more unexplored areas left on the planet. Ever-increasing global interconnectedness has led to an acceleration of advancements in science and technology and enabled a dramatic expansion of educational and economic opportunity in many developing countries. However, it has also intensified environmental destruction, the consolidation of wealth and power in the hands of fewer and fewer corporations (often transnational and not accountable to any nation state), and a widening of the gap between economically and politically powerful individuals and societies and those that lack power.

Along with such changes brought about by globalization, there have also been changes in the expectations of what a modern member of this world should be capable of doing, including what skills he or she should master to enter the global labor market and succeed in it. Knowing how to read is, without a doubt, one of the most crucial of such skills, and any difficulties encountered during the acquisition of reading may jeopardize an individual’s potential for success in today’s global society. In recognition of the importance of literacy as a survival skill and empowerment tool, UNESCO designated 2003–2012 as the United Nations Literacy Decade and sponsored the Literacy Initiative For Empowerment (LIFE) to improve literacy in countries with high literacy needs (Richmond et al., 2008). Although much progress in improving global literacy rates has been made, substantial challenges remain, particularly in some regions (e.g., sub-Saharan Africa, South and West Asia) that are the most linguistically diverse on the planet.

There are important ways in which societal multilingualism affects the language choices of individuals and their educational outcomes. For example, globally, there are 50–57 million marginalized children not enrolled in school (Ball, 2015). One of the factors that drives these numbers is linguistic: children whose mother tongue is not the language of instruction are at a higher risk of early school failure or dropout (Bühmann & Trudell, 2008). A better understanding of the linguistic landscapes in which children around the world learn to read is thus necessary for improving the lives of millions of children and further advancing the scientific study of reading.

While cognitive science has been quite successful in mapping out the neuro-cognitive architecture of the reading brain and its genetic underpinnings, most reading acquisition research does not address the issue of linguistic diversity. The majority of studies tacitly assume a scenario in which children learn to read in one of the world’s mega languages, their mother tongue, spoken in a linguistically fairly homogeneous community. In reality, for many children around the world, the situation is quite different. While only 12 languages account for almost half of the world’s population, there are 7117 languages spoken in the world today (Simons & Fennig, 2018). Most of these are subnational, co-existing with multiple other languages within a country’s borders and often competing with one or more languages that are dominant in terms of demography and political clout. Furthermore, only roughly half of the world’s languages have developed a writing system (Simons & Fennig, 2018). For example, 80% of African languages lack a writing system (https://livingtongues.org). Even if a writing system for a language has been developed, it may not be widely used.

For speakers of such languages, literacy instruction is provided in a language that is not their mother tongue. By some estimates, this applies to 40% of the world population (UNESCO, 2016a), and the share is substantially higher in regions of the highest linguistic diversity, such as sub-Saharan Africa. Evidently, literacy acquisition under this condition is a process quite different from the one encountered by children learning to read in their native English or one of a handful of its European relatives, the languages on which almost all reading research focuses. Understanding how linguistic diversity, and language ecology in general, influence literacy acquisition is thus an important research goal.

In this chapter, we take a look at literacy acquisition through the multifaceted lens of linguistic diversity. We will discuss some of the key issues that arise in multilingual societies and provide various examples illustrating some of these issues, including forms of societal multilingualism, dialect continua, power dynamics in multilingual societies, and diglossia. We also take a look at the effects of linguistic diversity on literacy by examining correlations between countries’ level of linguistic diversity and literacy rate. We conclude by discussing implications of this relationship for educational policy.

2 Forms of Societal Multilingualism

Shifts in the global economic and political order have dramatic effects on the world’s linguistic landscape. One stark consequence of such shifts has been the rise of a few “big” languages that compete for global influence. Some of them (e.g., English, French, Spanish, Russian, Arabic, and Chinese) have acquired international status: they are used for international transactions and are taught and/or used as a medium of instruction in schools outside their countries of origin. At the same time, a massive number of “small” languages are threatened with extinction, being crowded out and replaced by bigger languages (Harrison, 2008). In fact, up to 90% of the languages that exist today face disappearance (Graddol, 2004), leading to an erosion of the human knowledge base, dents in human civilization, and a permanent loss of the unique actualizations of human nature encoded in these languages.

Because a small number of mega languages account for the majority of all speakers, most of the several thousand languages that still exist today are spoken by small groups of people, often in remote rural areas, making them hard to study. Under some estimates, the median size for a language spoken today is only 7500 speakers (Pereltsvaig, 2017). Many of these languages have rich oral traditions but no written language.

Thus, on the one hand, we are witnessing a trend towards a dramatic reduction of linguistic diversity – many languages, especially small tribal languages spoken in rural areas, are being marginalized and gradually replaced by a small number of dominant mega languages. On the other hand, the growing influence of “big languages” has made bilingualism and multilingualism a norm across the world. For example, English, once a small tribal language limited to parts of an island and dominated by a more prestigious language of the conquering elite (Norman French), is now spoken in some form in 165 countries by over a billion people, including 753,000,000 people using English as a second or third language (Simons & Fennig, 2018).

Although linguistic diversity is common throughout the world, it is unevenly distributed. Furthermore, there seems to be a link between high levels of linguistic diversity and language endangerment. Just as there are biodiversity “hot spots,” areas with highly concentrated levels of the diversity of species, many of which face extinction, there are linguistic hotspots, areas with a high concentration of diverse language families and high level of language endangerment (https://livingtongues.org). What’s more, there is a high degree of overlap between biodiversity and linguistic diversity (Gorenflo et al., 2012), a relationship still poorly understood.

Europe, whose languages are the most thoroughly researched, is among the least linguistically diverse regions, as it is home to only 4% of the world’s languages, compared to Asia’s 32% and Africa’s 30% (Simons & Fennig, 2018). The most linguistically diverse country, Papua New Guinea, alone has 840 languages, more than twice the number of languages spoken in Europe. It is followed by Indonesia (710 languages), Nigeria (524 languages), and India (453 languages) (Simons & Fennig, 2018).

Multilingual societies also vary with respect to the form of linguistic diversity. Linguistically diverse societies in which all members are balanced bi- or multilinguals, with native or native-like proficiency in the languages spoken in their community, are rather uncommon, as are countries that are strictly monolingual. For example, countries that officially recognize multiple languages within their boundaries contain sizable monolingual populations (e.g., Russia, Brazil, Canada), and countries considered monolingual may have sizable multilingual populations (e.g., Turkey, France). Thus, most societies fall somewhere along a spectrum between “territorial” and “individual” multilingualism (Grosjean, 1982), defined below.

On one end of the spectrum of societal multilingualism are situations of territorial multilingualism, i.e., when multiple language groups coexist within a national boundary, and the country, on the whole, is linguistically diverse but composed of individuals who, to a large extent, are not multilingual (i.e., the majority of individuals are native speakers of one of the languages). These situations may be characterized by ethnic and linguistic fractionalization, which is particularly severe in many parts of sub-Saharan Africa, one of the most linguistically diverse parts of the world, where a population of about 1 billion people speaks over 2100 languages (Pereltsvaig, 2017). Only about 75 of these languages have more than one million speakers. The rest are spoken by populations ranging from a few hundred to several hundred thousand speakers. Sixteen countries in the region have 50 or more languages each, including the most linguistically diverse Nigeria (520 languages), Cameroon (279), Tanzania (125), Kenya (67), Mali (66), Congo (62), and Benin (55 languages). In addition, 17 countries with smaller populations where 10–20 languages are spoken can also be classified as highly linguistically diverse, based on the criterion of having 200,000 or fewer speakers per language (Pereltsvaig, 2017).

Such an amount of linguistic diversity has been argued to lead to ethno-linguistic fractionalization, which presents an obstacle to economic and social development in that region (Easterly & Levine, 1997) and is detrimental to efforts to improve literacy. A difficult issue that such societies must grapple with is whether to try to bridge the linguistic divide and strengthen the national identity by imposing primary and secondary education in a national language or to provide (at least primary) education to all children in their mother tongue, as the optimal form of literacy education (Ball, 2010).

This issue also exists in situations of linguistic diversity arising from international migration, when minority languages of immigrant groups co-exist with the socially and culturally dominant majority language in a society of predominantly monolingual speakers. As an example, one can cite New York City (USA) public schools, where nearly half of all students come from homes where a language other than English is spoken, representing more than 150 languages, in contrast to ≈75% of the New York City population, who identify as speakers of English. In this situation, as in the situation of territorial multilingualism, students may have only limited experience with the language of literacy instruction prior to starting school, and their plight with literacy acquisition is similar to that of students from linguistically diverse regions where mother tongue education is unavailable. These students also have the added issue of potentially limited community support for their home language, resulting in an erosion of home language competence while they acquire the majority language (McBride, 2016).

The issue may be ameliorated in linguistically diverse societies structured on the “personality principle,” i.e., multilingualism at the level of individuals (Grosjean, 1982). Personal bi- and multilingualism has been characterized as a positive force, with cognitive, social, educational and psychological benefits, particularly for minority groups (Bialystok, 2011; Mehisto & Marsh, 2011; Mohanty & Perregaux, 1997; Mohanty, 2019). Bilingual education has been found to provide substantial benefits for disadvantaged indigenous populations, as investment in human capital (Benson, 2002; García, 2011; Patrinos & Velez, 2009). Given the complex and interconnected state of our world, monolingual education was pronounced to be “utterly inappropriate” (p.16; García, 2011).

India is usually given as the quintessential example of this form of linguistic diversity (Bhatia & Ritchie, 2004; Sridhar, 1988), where at least three languages are used by many speakers: a mother tongue – a language (or languages) of the region, the official language of the country, Hindi, and English, each associated with different functions (Sridhar, 1996). Large urban centers, attracting residents and commercial activity from diverse regions, are particularly linguistically diverse, as “a Gujarati spice merchant in Bombay uses Kathiawadi (his dialect of Gujarati) with his family, Marathi (the local language) in the vegetable market, Kacchi and Konkani in trading circles, Hindi with the milkman and at the train station, and even English on formal occasions” (Pandit, 1972, p. 79). Tri-lingualism (with a majority local language plus Hindi and English taught as the first, second, and third languages) is the official policy adopted for India’s education system (Three-language Formula) (Mohanty, 2019; Vaish, 2008). However, this implies more uniformity than actually exists in a country with 22 constitutionally recognized (“scheduled”) languages and hundreds of “unscheduled” languages (representing 6 language families). Thus, only 53.6% of the population listed Hindi and 10% English as their first, second, or third language in the 2011 national survey of India (https://censusindia.gov).

Indian educational programs have been criticized for not sufficiently supporting multilingualism and for ignoring children’s need for mother tongue education, instead furthering soft assimilation (Mohanty, 2006). In the colorful linguistic mosaic that is Indian society, where so many languages coexist, there is, unfortunately, entrenched linguistic inequality: some languages are privileged in terms of prestige and access to resources, while others are neglected, and their speakers are discriminated against. Many (potentially 80% of) Indian languages are endangered (Mohanty, 2010).

Linguistic power imbalance is an important issue to consider. In many (or even most) multilingual communities, languages command differential levels of power and prestige and are not equally valued, depending on the domains in which each language is used and the status of its native speakers. Language attitudes in society generally reflect economic stratification: the linguistic codes of groups associated with more central and profitable economic sectors enjoy prestige, which translates into a gradual expansion of their functional range over time, whereas the codes of groups associated with less central and less profitable economic spheres lack prestige, often face stigma and discrimination, and are likely to lose functional range and weaken over time (Philips, 2004). Thus, languages associated with international commerce, the legal system and administration, religious institutions, higher education, science and technology, as well as pop culture (often former colonial languages) are typically perceived as more important and are more highly valued than languages associated with other, non-official domains because they are associated with upward mobility.

However, issues associated with tribal, ethnic, cultural, or religious identity complicate the linguistic power dynamic in multilingual societies, and many countries have seen a push to increase the recognition of indigenous languages and expand the domains of their use while restricting the use of the culturally dominant language competing with them. Support for mother tongue-based bilingual (or multilingual) education has been expressed and reaffirmed since 1951 by UNESCO, which acknowledged that it is most beneficial for children to receive initial education, continued for as long as possible, in their mother tongue (Ball, 2010; Bühmann & Trudell, 2008). Research has confirmed that the best practice for children in linguistically diverse societies is to start their education in their mother tongue, which provides a foundation to which a second (and third) language should be added (Benson, 2005; Cummins, 2001; Dutcher, 2003; Foley, 2001; Orekan, 2011; UNESCO, 2016b).

The argument for “mother tongue education” is that teaching literacy in a familiar language facilitates learning of the correspondences between the orthographic symbols and the corresponding linguistic units because the newly learned symbols are mapped onto elements that are already familiar, and children can use psycholinguistic strategies for “self-teaching,” proposed as the sine qua non of reading acquisition (Share, 1995). Learning to read in the mother tongue also allows students to discover meaning in what they are reading and to communicate through writing much earlier than in submersion programs, which teach decoding in an unfamiliar language (Benson, 2005; Cummins & Swain, 2014). Unfortunately, there are factors (outside of the political and economic ones) that may work together against the implementation of mother tongue education (Benrabah, 2007; Gupta, 1997). One such factor is the difficulty of defining the mother tongue, particularly in situations of a dialect continuum. Another is diglossia, a situation in which the language variety spoken in the community and the written language are used in largely complementary domains. We will discuss these two phenomena in the following sections.

3 Dialect Continua

Even for “big” national languages, i.e., those used at the national level for education, mass media, and government, there is a considerable amount of variation. The regional language varieties spoken within a national border that are uncodified, unwritten, and typically associated with the speech of lower classes are commonly referred to by the public as “dialects,” a label connoting a lower status and divergence from what is viewed as the standard. However, in reality, “language” versus “dialect” is largely a political construct.

A “language” is generally defined as a collection of mutually intelligible dialects (Chambers & Trudgill, 1998). However, the linguistic criterion of mutual intelligibility for delimiting dialects from languages is generally overruled by political geography and political economy: even closely related language varieties are considered distinct languages if they are associated with a separate political entity (an independent state or an autonomous region), but are viewed as dialects otherwise, even if they are considerably distinct from each other. Small subnational language varieties are more likely to lack a distinct writing system, a written literary tradition, and dictionaries codifying their linguistic norms, and are less likely to be a medium of education. This further reinforces their substandard status as merely “dialects” of the standard language variety, while the standard variety is thought of as a “language” rather than, as linguists view it, one of the existing dialects.

For example, there are at least 10 distinct language varieties spoken in France that are listed as “languages” by Ethnologue (e.g., Gascon, Provençal, Breton, Piedmontese) but are commonly perceived as “dialects” of French because of the French national identity of their speakers; these languages are perceived as rural and “lower class” (Blanchet & Armstrong, 2006). Their speakers face stigma and discrimination by French institutions, especially in schools.

The language varieties that make up the Chinese branch of the Sino-Tibetan language family (the main varieties, e.g., Cantonese, Hakka, Gan, Min, Xiang, Wu, and hundreds of smaller ones) are as distinct from each other as the Romance languages are (e.g., French, Italian, Spanish, and Romanian), but they are considered “dialects” of Modern Standard Mandarin (Chappell, 2001). On the other hand, closely related language varieties once considered to be dialects become codified as distinct languages if the territory where they are spoken acquires independent statehood (as was the case with Serbo-Croatian splitting into Serbian, Croatian, Bosnian and Montenegrin after the dissolution of Yugoslavia, Czechoslovakian splitting into Czech and Slovak after the breakup of Czechoslovakia, or Hindustani splitting into Hindi and Urdu after the partition of India).

Another issue that makes it difficult to differentiate language from dialect is the phenomenon known as a dialect continuum: linguistic differences accumulate gradually over geographic distance, which often makes it difficult to establish sharp boundaries between language varieties. One common example is the Continental West Germanic continuum, encompassing the territory of Austria, the German-speaking part of Switzerland, Liechtenstein, Germany, Luxembourg, the Netherlands, the northern half of Belgium (Flanders), and South Tyrol (northern Italy). The language varieties in this region have substantial distinctions in phonological, morphological, and lexical features. However, it was observed that “one could start from the far south of the German-speaking area and move to the far west of the Dutch-speaking area without encountering any sharp boundary across which mutual intelligibility is broken; but the two end points of this chain are speech varieties so different from one another that there is no mutual intelligibility possible” (p. 3; Comrie, 2009). The German dialects spoken in the North are thus closer to the Dutch dialects than they are to some German dialects, but speakers of these dialects are nevertheless educated in Standard German, a dialect that would be considered a different language if the borders were drawn differently. Similar dialect continua are found in many parts of the world, including countries where Southern and Eastern Slavic languages are spoken, many parts of Africa, the Arab countries, Western Australia, China, and other locales, further complicating the picture.

4 Diglossia

Diglossia has been defined as a relatively stable linguistic situation in which the primary dialect(s) of a language coexist with a substantially divergent, highly codified (often grammatically more complex) superimposed variety (Ferguson, 1959). The superimposed (or High; H) variety usually comes from a pre-existing, highly respected body of written texts, is learned via formal schooling, and is reserved for most written and formal spoken purposes. The vernacular (Low; L) form is used by all members of the community for everyday communication, including in child-directed speech, and is acquired by children naturally as their L1, with the H variety being, essentially, an L2.

Ferguson illustrated the concept of diglossia with four examples: Classical (H) and Colloquial Arabic (L) used in parallel throughout the Arab world, Standard (H) and Swiss German (L) in Switzerland, Katharevousa (H) and Dhimotiki (L) in Greece, and French (H) and Haitian Creole (L) in Haiti. The concept of diglossia has since been extended and applied to multilingual communities (Fishman, 1972) where there exists a dichotomy between H and L languages, analogous to the classic diglossia described by Ferguson. For example, in Zaire, French functions as an H form (used in prestige domains, such as higher education, law, and administration), while indigenous languages function as L forms, used in everyday communication.

Such functional specialization between H and L varieties, with only a slight overlap between them, is the key characteristic of diglossia. For example, the H variety of Arabic (Classical Arabic or its modern form, Modern Standard Arabic) is used in religious sermons, government communiques, academia, news media, and literature (especially poetry). On the other hand, the L variety is used for speaking to service providers (waiters, servants), in personal conversations with friends and family, “low brow” TV programming, such as soap operas, captions on political cartoons, and folklore.

Among the important features associated with diglossia, as noted by Ferguson (1959), was the restriction of literacy to a small elite group, a situation antithetical to the goals of modern societies. Although literacy is no longer limited to the elites in most of the world, historical associations between the literary language and social and cultural elites, and between the vernacular language and commoners, have led to a situation that is still widespread in diglossic communities, where H varieties are invariably regarded as superior to L varieties. This was illustrated by a study that used a survey of language attitudes (Benrabah, 2007; Zeggagh, 2017) among young adults in Algeria (n = 1051), in which they were asked to identify their language preferences among Algerian Arabic, Literary Arabic, French, and Tamazight (Berber). Preference was strongly expressed for Literary Arabic and French over the two vernacular languages. For example, a majority or plurality of the respondents chose Literary Arabic as the “richest language” (75%), “most beautiful language” (45%), “language of religious and moral values” (80%), and the “language that allows me to understand the past” (51%). A majority or plurality of respondents chose French as the “language I like the most” (44%), “language that I like to learn in” (55.3%), “language of science and technology” (85%), “most modern” (82%) and “most useful language” (58%). Negative attitudes were reserved for Tamazight (with over 70% of respondents choosing it as the “language I like the least,” “the language incapable of progress,” “most difficult,” and “the least pure language”).

Because of the entrenched cultural preference for the H variety in diglossic societies, speakers of L varieties often fail to acknowledge their own variety as a legitimate language. Instead, it is often regarded as “slang” or as improper or incorrect speech. Such an attitude presents an obstacle to the principle of mother tongue education; the obstacle increases with greater linguistic distance between H and L varieties. In Arabic, the linguistic distance between H and L varieties is particularly large in phonology and morpho-syntax, including phonemic inventories, syllabic structure, phonotactic constraints, stress patterns, and inflectional categories (Aoun et al., 2010; Aoun et al., 1994). Compared with most spoken varieties, Modern Standard Arabic has a richer system of agreement, Verb Subject Object word order (rather than the more flexible word order of spoken varieties, whose pragmatically neutral order is Subject Verb Object), and a different distribution and frequency of verbal patterns, passivization, and nominal constructions. Lexically, Modern Standard Arabic and the spoken variety overlap only partially, with 80% of the lexicon of young children consisting of words with divergent forms in spoken and formal Arabic (Saiegh-Haddad, 2018).

Because of the linguistic distance between H and L varieties of Arabic, diglossia creates challenges to literacy acquisition that go beyond what children in linguistically homogeneous societies may experience. Studies of literacy acquisition in Arabic have confirmed that reading in Arabic presents special challenges, manifested as both lower reading speeds among skilled readers and slower reading acquisition among children (Eviatar & Ibrahim, 2014; Ibrahim et al., 2002; Saiegh-Haddad, 2017). Because of the linguistic distance between Modern Standard Arabic and spoken Arabic, learning Modern Standard Arabic in school may be akin to learning a second language. For example, one study (Ibrahim & Aharon-Peretz, 2005) demonstrated that Arabic-speaking students faced with a lexical decision task in spoken Arabic exhibited priming effects only when primed with a spoken Arabic word and not with Modern Standard Arabic words or Hebrew, a language the children did not speak. This suggests that for Arabic-speaking children, literacy acquisition is complicated by having to learn it in a language in which they are not yet proficient.

5 Global Literacy Rates and Territorial Linguistic Diversity

Countries with the highest levels of linguistic diversity have persistent and often seemingly intractable problems in meeting educational and literacy needs of their people, in comparison with linguistically more homogeneous societies. As linguistic diversity and economic disadvantage (personal and societal) so often overlap, one can ask to what extent linguistic diversity, independently from economic factors, contributes to educational disadvantage. If linguistic diversity is shown to be an independent source of educational disadvantage, it is important to consider what solutions are most appropriate to mitigate the challenges presented by societal multilingualism.

To test the hypothesis that societal linguistic diversity is a source of variance in literacy achievement across countries that is independent of economic wealth, we used three indicators: global adult literacy data (age 15 and above) reported by the UNESCO Institute of Statistics, as the measure of societal literacy achievement; Gross National Income (GNI) per Capita data reported by the World Bank, as the measure of economic wealth; and countries’ territorial language diversity data, measured by Greenberg’s Language Diversity Index (LDI) (Greenberg, 1956) as reported by the Ethnologue (Simons & Fennig, 2018). LDI ranges between 0 and 1, where 0 indicates perfect monolingualism and 1 indicates a hypothetical situation in which no two randomly selected individuals in the country would speak the same language. The LDI in our sample varied from 0.000 to 0.988 (M = 0.48; SD = 0.31). On one end of the diversity spectrum is Haiti with an LDI of 0.000, and on the opposite end is Papua New Guinea with an LDI of 0.988. GNI per Capita ranged from 280 to 78,320 USD (M = 8754.53; SD = 11,955.07). Global literacy rate varied widely, from 22.31% to 99.99% (M = 83.54; SD = 19.22). Only the countries for which all three indicators were available were included in the analyses (n = 151).
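The definition of the index can be made concrete with a small sketch. It assumes the commonly cited form of Greenberg's index, one minus the sum of squared population shares, with each person counted once under a single first language. The function name and the speaker counts are hypothetical illustrations, not Ethnologue data.

```python
def language_diversity_index(speaker_counts):
    """Probability that two randomly chosen people have different first languages.

    Assumed form: LDI = 1 - sum(p_i ** 2), where p_i is the share of the
    population speaking language i as a first language.
    """
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((count / total) ** 2 for count in speaker_counts)


# HYPOTHETICAL speaker counts, for illustration only.
monolingual = language_diversity_index([1_000_000])   # one language only
two_equal = language_diversity_index([500, 500])      # two equally sized groups
hundred_equal = language_diversity_index([10] * 100)  # 100 equally sized groups
```

Under this form, a country approaches an LDI of 1 only when its population is split across very many languages of comparable size, which is why values such as 0.988 are rare.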

First, we examined bivariate correlations between literacy rates, GNI per Capita, and LDI. In this analysis we also included population size (as of 2018), as one may suppose that it is an additional factor related to economic development, linguistic diversity, or literacy achievement. The results showed that global literacy rates were significantly, but only moderately, positively correlated with GNI per Capita (r = 0.421; p < .001) and negatively correlated with LDI (r = −0.509; p < .001). No other significant correlations were found, suggesting that linguistic diversity, economic development, and the size of the population were not significantly related to each other (see Table 1 for these results).
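As a rough illustration of the bivariate step, the sketch below computes Pearson's r and the t statistic used to test it against zero. The function names and the toy values are hypothetical and are not the chapter's data; only r = 0.421 and n = 151 are taken from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


# HYPOTHETICAL toy values (LDI, literacy %), for illustration only.
toy_ldi = [0.05, 0.20, 0.45, 0.70, 0.95]
toy_literacy = [98.0, 95.0, 85.0, 70.0, 60.0]
toy_r = pearson_r(toy_ldi, toy_literacy)

# Values reported in the text: r = 0.421 (literacy and GNI), n = 151.
t_reported = t_statistic(0.421, 151)
```

With a sample of this size, even a moderate correlation yields a large t statistic, which is consistent with the small p values reported.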

Table 1 Correlations between global literacy rates, economic wealth, linguistic diversity and population size

Next, we attempted to tease apart the respective contributions of the countries’ economic development level and their linguistic diversity to literacy rate. To this end, we computed partial correlations between literacy rate and GNI per Capita while controlling for LDI and vice versa. Here, we found that when controlling for GNI per Capita, the correlation between literacy rate and LDI remained significant and its strength was essentially unchanged (r = −.500; p < .001). Similarly, when controlling for linguistic diversity, the correlation between literacy rates and GNI per Capita also remained significant (r = .409; p < .001). Thus, it appears that the economic wealth of a country and its level of linguistic diversity are independently related to its literacy rate.
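The partial correlation step can be sketched with the standard first-order formula, which needs only the three pairwise correlations. The two correlations with literacy are the values reported above; the correlation between GNI per Capita and LDI is not given in the text, so the value used here is a hypothetical placeholder, as are the function and variable names.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


R_LIT_GNI = 0.421   # reported in the text
R_LIT_LDI = -0.509  # reported in the text
R_GNI_LDI = -0.10   # HYPOTHETICAL: this correlation is not reported in the text

# Literacy and LDI, controlling for GNI per Capita.
lit_ldi_given_gni = partial_corr(R_LIT_LDI, R_LIT_GNI, R_GNI_LDI)
# Literacy and GNI per Capita, controlling for LDI.
lit_gni_given_ldi = partial_corr(R_LIT_GNI, R_LIT_LDI, R_GNI_LDI)
```

In this sketch, a weak correlation between the two predictors leaves both partial correlations close to the corresponding bivariate values, which is the pattern one would expect if the two predictors are largely unrelated.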

To further examine the relationship between linguistic diversity and literacy rate, we separated the countries into four linguistic diversity groups: high LDI, with an LDI at least 1 SD above the mean (0.79 or above); moderately high LDI, with an LDI within 1 SD above the mean (0.48 to 0.78); moderately low LDI, with an LDI within 1 SD below the mean (0.18 to 0.47); and low LDI, with an LDI more than 1 SD below the mean (0.17 or below). We compared these groups on GNI per Capita and literacy rates using one-way ANOVAs (see Table 2 for the results). We found a significant effect of group on both GNI per Capita [F(3, 146) = 3.24, p < .05] and literacy rate [F(3, 146) = 19.61, p < .001]. Post hoc comparisons showed that the groups with low and moderately low levels of linguistic diversity did not differ from each other, but both had significantly higher literacy rates than countries with moderately high and high LDI, which also differed significantly from each other. The countries with the highest level of LDI were significantly below all other groups on literacy levels, p’s < .05 (see Fig. 1). The comparisons on the economic indicator showed that the group with the highest level of linguistic diversity was significantly different from the other groups (p < .05; see Fig. 1b), with no other significant pair-wise comparisons.
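The grouping rule described above can be sketched as a small function that assigns a country to a band according to how far its LDI lies from the sample mean. The mean (0.48) and SD (0.31) come from the text; the function name `ldi_group` and the handling of values that fall exactly on a boundary are our assumptions.

```python
def ldi_group(ldi: float, mean: float = 0.48, sd: float = 0.31) -> str:
    """Assign an LDI value to one of four bands defined by the sample mean and SD."""
    if not 0.0 <= ldi <= 1.0:
        raise ValueError("LDI must lie between 0 and 1")
    if ldi >= mean + sd:
        return "high"
    if ldi >= mean:
        return "moderately high"
    if ldi > mean - sd:
        return "moderately low"
    return "low"


# LDI values quoted elsewhere in this chapter.
for country, ldi in [
    ("Papua New Guinea", 0.988),
    ("Tanzania", 0.871),
    ("Guatemala", 0.518),
    ("Tajikistan", 0.276),
    ("Burundi", 0.007),
]:
    print(country, ldi_group(ldi))
```

Defining the bands relative to the sample's own mean and SD means that the cut-offs would shift if the set of countries changed, so the labels are best read as relative rather than absolute levels of diversity.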

Table 2 Mean literacy rates and GNI per Capita in groups of countries with various levels of linguistic diversity
Fig. 1 (a, b) Comparison of countries with various levels of linguistic diversity on income and literacy levels

Note. X-axis indicates linguistic diversity groups: 0 – low LDI, 1 – moderately low LDI, 2 – moderately high LDI, 3 – high LDI

Finally, we grouped the countries by income level, based on the World Bank designation of Low, Low-Middle, Upper-Middle, and High Income, and compared the groups on literacy rates and linguistic diversity levels (see Table 3 for the results). Not surprisingly, we found a significant effect of group on both indicators: F(3, 146) = 11.391, p < .001 for linguistic diversity, and F(3, 146) = 74.267, p < .001 for literacy. Post hoc pair-wise comparisons showed that, with respect to LDI, the two lower income groups did not differ significantly from each other, and neither did the two higher income groups. However, the two lower income groups had significantly higher LDIs than the two upper income groups (p < .05). With respect to literacy, the two upper income groups did not differ significantly from each other, but the two lower income groups differed significantly from each other and from the two upper income groups (p < .05). Figure 2 shows the differences between income groups on territorial linguistic diversity and literacy rates.
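As a reminder of what the reported F statistics and their degrees of freedom represent, the sketch below computes a one-way ANOVA F ratio from scratch. It is a minimal illustration, not the software used for the analyses: the function name `one_way_anova_f` and the literacy values in the example are hypothetical.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more cases than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# HYPOTHETICAL literacy rates (%) for three small groups of countries.
f_ratio, df1, df2 = one_way_anova_f([[95.0, 97.0, 99.0], [80.0, 85.0, 90.0], [50.0, 60.0, 70.0]])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

With four groups, the between-groups degrees of freedom are 3, and the within-groups degrees of freedom equal the number of countries minus 4, which is how a statistic of the form F(3, 146) arises.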

Table 3 Mean literacy rates and linguistic diversity in groups of countries with various income levels
Fig. 2 (a, b) Comparison of countries with various income levels (GNI per Capita) on linguistic diversity and literacy rate

Note. X-axis indicates countries’ economic development level: 0 – low income, 1 – low middle income, 2 – upper middle income, 3 – high income

To gain a better idea of why higher linguistic diversity seems to be associated with lower societal literacy levels, one can look more closely at the countries comprising each of the income groups. Not surprisingly, we found that High Income countries had uniformly high literacy rates (M = 97.7, SD = 2.30) and low levels of territorial linguistic diversity (Mean LDI = 0.033; SD = .24), i.e., about 1.5 SD below the mean for the total sample (Mean LDI = 0.479; SD = 0.30). Upper Middle Income countries also had high literacy rates (M = 93.67, SD = 8.18) and a relatively low average LDI (M = 0.361, SD = 0.25). However, several countries in this group underperformed relative to their economic peers and had literacy rates below 85% (i.e., over 1 SD below this group’s mean): Algeria (80.2%), Guatemala (81.5%), Iraq (79.7%), Namibia (81.9%), and Gabon (83.2%). Three of these countries share the characteristic of having a high level of territorial linguistic diversity (approaching 0.77 or higher): Iraq (LDI = .761), Gabon (LDI = 0.846), and Namibia (LDI over 0.779), making them some of the most linguistically diverse countries in the world. For example, Gabon (population ≈2 million) has 42 spoken languages. Namibia (population ≈2.5 million) has 23 indigenous spoken languages (plus the former colonial languages: English, German, and Afrikaans). Guatemala has a somewhat lower level of linguistic diversity (LDI = .518); however, it is still very substantial, with 40% of its population (the largest indigenous population in Central America) speaking 23 indigenous languages and many speakers not fully proficient in the official language, Spanish. The indigenous languages were not officially recognized in Guatemala until 2003, when, for the first time in the history of that country, the Language Law decreed that no restrictions should be placed on the use of 22 indigenous languages, including in the educational and academic spheres. However, as the poorest country in Central America, Guatemala has faced major economic challenges in implementing tangible measures toward elevating the status of the indigenous languages, such as providing mother tongue education for children who speak these languages, and this has limited literacy achievement in its population.

Algeria, although it has a relatively low LDI (0.360), represents a linguistically complex situation of moderate language diversity paired with diglossia (or rather multi-glossia). In the past, French, the language of the colonial era, dominated the educational system, and it is still widely regarded as a vehicle for upward mobility. Literary Arabic was strictly imposed in schools (and bilingual education eventually discontinued at all levels) during the post-colonial period of Arabization, between the 1960s and the late 1990s, in a country where only a small proportion of the people (300,000 out of 1,300,000 literate people) could read Classical Arabic. Subsequently, a majority of the population came to favor a return to the French-Arabic bilingual model of education, as many consider the existing educational system a failure (Benrabah, 2007), but plans for such reforms were scrapped for political reasons, and a reintroduction of French into the school system did not begin until 2008.

One reason for the inadequacy of the Algerian educational system after the strict Arabization is that diglossia, although present throughout the Arab world, has a more pronounced effect on literacy in countries like Algeria, where the spoken variety, Algerian Arabic, diverges from Literary Arabic more than other spoken Arabic varieties do. Algerian Arabic belongs to the Maghrebi group (which also includes the Arabic varieties spoken in Morocco, Tunisia, Libya, Western Sahara, and Mauritania), a group of varieties heavily influenced by Berber, Turkish, and French. The issue is summed up as follows: “Today the linguistic situation in Algeria is dominated by multiple discourses and positions. The language spoken at home and in the street remains a mixture of Algerian dialect and French words. In this case, every language has become a source of frustration: classical Arabic is still not mastered even at higher educational levels; dialectical Arabic cannot express things in writing. Contact with the French culture has left the Algerians with a vitiated language and resulted in a profound linguistic alienation. This situation condemned many Algerian writers either to silence or to exile” (Maamri, 2009, p. 87).

What the quote above does not mention is that a large proportion of the Algerian population are speakers of Tamazight, a member of the Berber branch of the Afro-Asiatic language family, distinct from the languages of the Semitic branch, which includes Arabic. The reported proportion of Berber speakers varies anywhere from 25% (Brett, 2019) to 40%–60% (Saib, 2001). Such a large discrepancy is due to the suppression, for political reasons, of census data regarding the Berber-speaking population since the country’s independence in 1962. If the estimate of 60% Berber speakers turns out to be correct, this would mean that a majority language became “minorized” (Saib, 2001) and, as the discussion of language attitudes in Algeria in the previous section showed, stigmatized. Most Tamazight publications in Algiers use the Berber Latin or Arabic scripts, even though there is an ancient traditional Berber script, Tifinagh, which was suppressed until recently. The repression of Berber languages led to political unrest at different points in history (e.g., the “Berber Spring” in 1980 and the “Black Spring” in 2001). Political activism led to the recognition of Tamazight as a national language in 2002, and teaching in this language was allowed in 2003. However, Modern Standard Arabic continues to be the most common language of instruction.

Next, we look in more detail at the two lower income groups of countries. Low Middle Income countries had a lower average literacy rate (M = 79.79%, SD = 14.57) and higher linguistic diversity (M = 0.47; SD = 0.318) than the two upper income groups. Countries with low literacy rates (65% or below) included Mauritania (LDI = .228), Sudan (LDI = .307), Comoros (LDI = .551), Angola (LDI = .748), Pakistan (LDI = .752), Senegal (LDI = .778), Nigeria (LDI = .890), Ivory Coast (LDI = .900), and Papua New Guinea (LDI = .988). The country with the lowest LDI on this list is Mauritania, where two thirds of the population are Moors who speak Ḥassāniyyah Arabic, while the remaining third is composed of members of other ethnic groups who speak Niger-Congo languages (Fula, Wolof, and Soninke are recognized as official). However, (Literary) Arabic is the official language, and since 1980 it has been the language of instruction in schools. Thus, for the majority, it represents a situation of diglossia. The relatively low LDI of Sudan obscures the fact that it is a country of approximately 70 indigenous languages that come from diverse language families (multiple branches of the Afro-Asiatic, Nilo-Saharan, and Niger-Congo families). Sudanese Arabic, the most widely spoken language, has been heavily influenced by the indigenous languages of the area. Despite this diversity, Arabic and English are the only official languages. Thus, it appears that for all of the countries in this group, economic disadvantage is paired with a complex linguistic situation, with each factor likely contributing to the low literacy rates.

Finally, Low Income countries have lower literacy rates than countries in the higher income groups, ranging from 22.31% (Chad) to 77.89% (Tanzania; M = 54.68, SD = 18.45). These countries also have the highest levels of territorial linguistic diversity (Mean LDI = .72, SD = .28) among all income groups. Looking at exceptions here may also be instructive. One notable exception to low literacy in this income group is Tajikistan, a country with a GNI per Capita of merely 1010 USD but a reported literacy rate of 99.8%. This is the highest literacy rate among Low Income countries and is similar to what is observed for High Income countries (defined as having a GNI per Capita above 12,375 USD). The country that directly follows Tajikistan in literacy in this income group is Tanzania (literacy rate of 77.89%). One may ask what can account for the disparity in literacy between the two countries, since they have a similar level of national income (1010 and 1020 USD, respectively). Although there may be various explanations for this disparity, differences between Tajikistan and Tanzania in territorial linguistic diversity and in educational language policy may offer a potential (at least partial) explanation.

According to Ethnologue, Tanzania (LDI = .871) is home to 126 languages, representing multiple language families, including the Bantu branch of Niger-Congo, the Cushitic branch of Afro-Asiatic, the Nilotic branch of Nilo-Saharan, and Khoisan (the latter are the click languages spoken in Botswana and Namibia, which may come from the same stock as two Tanzanian languages, Hadza and Sandawe). None of these is spoken natively by the majority or even a plurality of the population. Swahili, a Bantu language used as a lingua franca, is the language in which primary education is administered and literacy is taught (English is also used for this purpose), but only 10% of the population speak Swahili as their native language, and fluency in Swahili as a second language varies among the adult population (Ammon et al., 2006). Thus, many children may not have sufficient exposure to this language prior to the start of schooling. In contrast, 84% of the population of Tajikistan (LDI = .276) are speakers of Tajik, an Iranian language with a writing system based on the Cyrillic alphabet (introduced in 1940), which has the status of a national language and in which the majority of children are educated. Tajikistan does have a substantial language minority population (≈15% of the population are an Uzbek-speaking minority, in addition to small numbers of Russian speakers and speakers of other languages of the former Soviet Union, who communicate in Russian). However, members of these minorities have the option of attending schools where education is administered in their home language (Uzbek- or Russian-language schools), where they can acquire literacy in their home language while learning Tajik and Tajik writing as a second language. Thus, not only is the level of linguistic diversity in Tajikistan significantly lower than in Tanzania, but literacy instruction in the native language is also available to the vast majority of individuals.

Another Low Income country that overperforms on literacy rate relative to its economic peers is Burundi, with a literacy rate of 68.38% (similar to Tanzania), despite having the world’s lowest GNI per Capita of only 280 USD, compared to Tanzania’s 1020 USD. Burundi is reported to have a higher literacy rate than Nigeria, a country with a much larger economy (GNI per Capita of 1960 USD; literacy rate of 62.02%). Burundi is an outlier among the countries of Sub-Saharan Africa in having a single indigenous language, Kirundi, shared by 98% of the population and recognized as an official language. While Nigeria (LDI = .890) is a global leader in linguistic diversity, with over 500 spoken languages, Burundi is the only African country with an LDI below .01 (LDI = .007). Once again, we can see how the complexity of the linguistic topography, quite independently from economic factors, is associated with depressed levels of literacy.

6 Concluding Thoughts

The issues and examples we have discussed throughout the chapter all illustrate the idea that linguistic landscapes in multilingual societies are complex and often fraught with challenges. These challenges inevitably arise out of an intricate web of interrelationships between languages, their speakers, and institutions, cutting across economic, political, and social lines and reflecting long and often difficult histories. The above analyses are undoubtedly quite simplistic and are not meant to represent the full complexity of factors that influence literacy development across continents and countries with an array of diverse peoples, cultures, economies, and political systems. However, one striking observation emerges too consistently to be ignored: for countries with linguistically diverse populations, attaining universal literacy and expanding educational and economic opportunity is a more complicated proposition than for linguistically homogeneous societies. In the face of these challenges, maintenance of local languages all too often is not seen as a priority by the institutions in charge. Multilingual countries often resort to policies of homogenization and subtractive bilingualism for minority language speakers. Their languages are devalued, denigrated, and excluded from education, making it more likely that speakers of these languages drop out of school or fail to learn. The reasons for this may be political (e.g., nationalism, institutional racism), economic (lack of resources for language development and teacher training for speakers of diverse languages), demographic (many indigenous languages are spoken by small rural populations), and linguistic (e.g., difficulty establishing which local linguistic varieties constitute “languages,” or a lack of language documentation, standardization, and/or a writing system).

The value of bilingualism or multilingualism is now widely acknowledged, at least for middle class inhabitants of economically advanced countries and for a small number of the world’s mega languages. Languages that lack power and overt prestige (i.e., most of the world’s 7117 languages) deserve to be equally valued. Elevating the status of these languages and promoting additive bilingualism (or active bilingualism, in a more recent conceptualization) and biliteracy (or multilingualism and multi-literacy) represents the best practice for the language education of linguistically and culturally diverse students (Cummins, 2001, 2017). Children learn best when their home languages are maintained and strengthened as the foundation for learning a second language, and when a national language is added as a second language instead of replacing the first (Alvear, 2019; Cenoz & Valencia, 1994; Cummins, 2017; Koch et al., 2009; Landry & Allard, 1992; Orekan, 2011). Unfortunately, implementing these forms of education requires radically changing the coercive power relations entrenched in many societies across the globe, a task that goes far beyond the academic, but without which closing the proverbial achievement gap will continue to be elusive.