Thursday, January 11, 2018

Tokenistic Tifinagh #fail 2

The Algerian government recently decided to make the Amazigh New Year (really the Julian New Year) - coming up tomorrow - an official holiday. This holiday is actually traditional in a lot of Arabic-speaking areas too, in Algeria and across North Africa - and its origins are of course Roman - but over the past few decades it has been reinterpreted as an Amazigh holiday rather than a North African one, and the government made it official specifically as a gesture towards Amazigh identity. In non-Amazigh areas, this creates some quandaries, as illustrated by the announcement below by the government of the wilaya (province) of Blida...


The Algerian flag in the middle is flanked on all sides by easily recognizable signs of Amazigh identity - the letter aza, the abzim pins, etc. - none of which are particularly associated with Blida (even though there are still small Berber communities in the mountains above Blida, not to mention Kabyle migrants). The main text is in Arabic, but there is one line of Berber in Arabic script - تفاسكا ن يناير tfaska n Yennayer "holiday of Yennayer", using a word for "holiday" that in a Kabyle context amounts to a modern neologism - and two lines written in Tifinagh, whose geometric shapes add yet another easily recognizable symbol of Berber identity. If you try to read those lines, though, they turn out in each case to be simple transcriptions (not translations) of the line of Arabic above them:

"Celebration of the Amazigh New Year"
احتفالية رأس السنة الأمازيغية iḥtifāliyyat ra's as-sanah al-'amāzīɣiyyah
ⴰⵃⵜⴼⴰⵍⵉⴰ ⵔⴰⵙ ⴰⵍⵙⵏⴰ ⴰⵍⴰⵎⴰⵣⵉⵖⵉⴰ aḥtfalia ras alsna alamaziɣia

"Algerian and proud of my Amazigh identity"
جزائري وبأمازيغيتي أفتخر jazā'irī wabi'amāzīɣiyyatī 'aftaxir
ⵊⵣⴰⵉⵔⵉ ⵡⴱⴰⵎⴰⵣⵉⵖⵉⵜⵉ ⴰⴼⵜⵅⵔ jzairi wbamaziɣiti aftxr
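
To make the "transcription, not translation" point concrete, here is a minimal Python sketch. The letter table is an assumption inferred purely from the two poster lines quoted above; it is not an official Arabic-to-Tifinagh standard, and the names `ARABIC_TO_TIFINAGH` and `transcribe` are made up for illustration. Swapping each Arabic letter for one Tifinagh letter, with no knowledge of Berber at all, appears to be enough to reproduce both lines.

```python
import unicodedata

# Letter-for-letter table inferred from the poster (an assumption, not a standard).
# Alif, hamza-on-alif and ta marbuta all come out as the same Tifinagh "a".
ARABIC_TO_TIFINAGH = {
    "ا": "ⴰ", "أ": "ⴰ", "ة": "ⴰ",
    "ب": "ⴱ", "ت": "ⵜ", "ج": "ⵊ", "ح": "ⵃ", "خ": "ⵅ",
    "ر": "ⵔ", "ز": "ⵣ", "س": "ⵙ", "غ": "ⵖ", "ف": "ⴼ",
    "ل": "ⵍ", "م": "ⵎ", "ن": "ⵏ", "و": "ⵡ",
    "ي": "ⵉ", "ئ": "ⵉ",
}
# Normalise keys so precomposed and decomposed hamza forms compare equal.
ARABIC_TO_TIFINAGH = {
    unicodedata.normalize("NFC", k): v for k, v in ARABIC_TO_TIFINAGH.items()
}


def transcribe(arabic: str) -> str:
    """Replace each Arabic letter with its Tifinagh counterpart.

    Characters missing from the table (such as spaces) pass through unchanged.
    """
    text = unicodedata.normalize("NFC", arabic)
    return "".join(ARABIC_TO_TIFINAGH.get(ch, ch) for ch in text)


line1 = transcribe("احتفالية رأس السنة الأمازيغية")
line2 = transcribe("جزائري وبأمازيغيتي أفتخر")
print(line1)
print(line2)
```

Since Arabic short vowels are normally unwritten, a letter-for-letter swap like this also leaves them out of the Tifinagh, which is why the result reads aḥtfalia or aftxr rather than anything a Berber speaker would write.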

It's arguably not quite as bad as the Oran case we saw last time; at least this transcription doesn't randomly discard letters.  Nevertheless, the message it sends is once again clear: nobody involved in the making of this official, centralized celebration of Amazigh identity speaks Berber, or thought it would be worthwhile to get someone who does speak it to help them out.  If the Algerian government seriously wants to make Tamazight official throughout the country, it's got a long way to go...

Thursday, January 04, 2018

Taleb unintentionally proves Lebanese comes from Arabic

So Taleb has jumped back on his hobbyhorse with yet another post on Lebanese not being Arabic; see my previous posts Why "Levantine" is Arabic, not Aramaic: Part 1, Part 2, Part 3, Zombie hypotheses and the Zeitgeist, On finding the sources of shared items. The funniest thing about this one is that he's been helpful enough to provide a wordlist (for his dialect, I presume) that - despite a number of typos, almost all of which increase the apparent similarity between Levantine and non-Arabic Semitic languages - should be enough all by itself to prove to anyone in doubt that Lebanese is clearly descended primarily from Arabic, with very little Aramaic influence and even less from Canaanite/Phoenician. Unfortunately, he wasn't as helpful on the grammar, not bothering to include equivalents from other Semitic languages for the pronouns and verbal conjugations...
But I don't have all day to spend beating this dead horse, and doing etymology properly takes time. So let's just have a quick look at the first page of his wordlist (well, probably the second one - the real first one seems to be missing), and leave the other pages as an exercise for the reader.

Out of these 39 words, 18 seem to be unambiguously Arabic in origin - either they share specific sound changes with Arabic to the exclusion of the rest of Semitic, or they use a root not used in the appropriate meaning elsewhere in Semitic. Only two look like being Aramaic rather than Arabic in origin (and the evidence in both cases is fairly weak): "hand" and the patently non-basic vocabulary word "image". (Taleb would add a third, zalame "man", but this word has an at least equally plausible Arabic etymology, making it ambiguous at best.) The remaining 19 words are ambiguous, and could in principle derive from more than one Semitic language - but even there, the situation is not symmetrical; all 19 could derive from Arabic, whereas no more than 11 of them could derive from Aramaic. The unambiguous cases give the following ratio: 18 Arabic : 2 Aramaic : 0 everything else. On that basis, we should therefore expect 90% of the ones ambiguous between Arabic and Aramaic (ie all but one) to derive from Arabic, not from Aramaic, and all of the ones ambiguous between Arabic and another Semitic language but not Aramaic to derive from Arabic. For details, see the following table:

| No. | Word | Origin | Notes |
|---|---|---|---|
| 1 | goat | Arabic | does not share Canaanite+Aramaic+Ugaritic *nC > CC; does not share Akkadian *ʕa > e |
| 2 | god | Arabic / Aramaic | shows innovative gemination of the l, attested only in Arabic and some dialects of Syriac |
| 3 | good | innovative | the Arabic etymology is obvious, but the root is pan-Semitic so we may generously assume that it could in principle have derived from some other branch |
| 4 | grass | Arabic | does not share Aramaic and Phoenician *ś > s ; does share Arabic *ś > š |
| 5 | grind | Arabic / Canaanite | does not share Akkadian *aħa > ê ; does not share Aramaic CaCVC > CCVC |
| 6 | hair | Arabic / Ugaritic | does not share Aramaic and Phoenician *ś > s ; does share Arabic *ś > š ; does not share Akkadian loss of *ʕ |
| 7 | hand | Aramaic | although a change of *yad > *īd is natural enough that it could easily have happened independently in Arabic... |
| 8 | hare | Arabic / Canaanite / Aramaic / Akkadian | no distinctive innovations |
| 9 | he-goat | Arabic / Canaanite / Aramaic | no distinctive innovations |
| 10 | head | Arabic / Ugaritic | does not share Canaanite *aʔ > *ā > ō nor Aramaic *aʔ > ī nor Akkadian *aʔ > ē ; the form rās (with loss of the glottal stop) is well-attested in early Arabic dialects |
| 11 | hear | Arabic | does not share Aramaic and Phoenician *s > š (I'm going with Huehnergard's reconstruction of proto-Semitic sibilants here). Note that the correct Syriac form is šmaʕ, not sma3 ; likewise the Hebrew |
| 12 | heart | Arabic | The initial glottal stop (still pronounced q in, for example, Alawite dialects) can only be explained from the Arabic form, which is a lexical innovation replacing original *libb |
| 13 | honey | Arabic | 3asal is clearly Arabic, and – as I've pointed out before – dabs is attested in Classical Arabic as well as in Hebrew and Aramaic |
| 14 | horn | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | no distinctive innovations |
| 15 | horse | Arabic | Syriac ḥsan 'strong' has s, not ṣ, but even if it were cognate, the Classical Arabic and Levantine form still share a semantic shift unattested in Aramaic |
| 16 | house | Arabic / Canaanite / Aramaic / Ugaritic | Akkadian can be ruled out, since it shows a shift *ay > ī which never happened in Levantine. |
| 17 | hundred | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, ʔ > y, is not shared with any of the ancient languages in question |
| 18 | hunger | Arabic | Even assuming jūʕ has cognates elsewhere in Semitic, the change g > j is specific to Arabic |
| 19 | hunt | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, use of the D-stem, is not shared with any of the ancient languages |
| 20 | image | Aramaic | Since when is 'image' basic vocabulary? But yes, assuming we can trust the transcription, it shares the aw with Aramaic |
| 21 | inside | Arabic / Aramaic | Mixed signal here: the meaning looks like Aramaic, but the sound shift g > j is Arabic not Aramaic. In reality, the word *jaww must originally have meant 'inside' in Arabic too; it lost this meaning in Classical Arabic, but kept it in many of the dialects |
| 22 | iron | Arabic | |
| 23 | kidney | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, *y > w, is not shared with any of the ancient languages (but _is_ shared with many other modern Arabic dialects...) |
| 24 | kill | Arabic / Canaanite | Does not share Aramaic CaCVC > CCVC |
| 25 | king | Arabic / Canaanite / Aramaic / Ugaritic | Since when is 'king' basic vocabulary? |
| 26 | knee | Arabic | Shares a unique innovation with Arabic – the metathesis brk > rkb |
| 27 | know | Arabic | |
| 28 | laugh | Arabic | Shares a unique innovation with Arabic – the sound shift *ɬ' > ḍ (which came relatively late in Arabic – later than Sibawayh, even – and never happened in any other Semitic language). I can't speak for Amioun, but in general Levantine has ḍaḥak; if Amioun does have ḍaḥaq, the fact that it didn't become *ḍaḥaʔ suggests that the *k > q happened there only after the regular shift *q > ʔ, and hence has nothing to do with the Canaanite or Ugaritic forms. |
| 29 | leg | innovative | The alleged Ugaritic form is nonsense – Ugaritic had no j sound, and the dictionary of Del Olmo Lete and Sanmartin reveals no appropriate Ugaritic form. It is true that the Levantine form seems to be shared with Ethiopic and some Yemeni dialects, but not with any ancient language of the Fertile Crescent. |
| 30 | lion | Arabic | A very problematic choice as 'basic vocabulary'. |
| 31 | live | Arabic / Canaanite / Aramaic | Except that the Levantine form is clearly 'alive', not 'live', making the whole comparison problematic.... |
| 32 | love | Arabic | The Arabic is of course mistranscribed - in his terms, it should be 2a7abba, whereas the Hebrew and Aramaic forms really do have a h. |
| 33 | make | Arabic | |
| 34 | man | innovative | 'zalame' is etymologically problematic – both Arabic and Aramaic etymologies have been proposed. 'rejjel' is of course from Arabic. dakar is 'male', not 'man'. |
| 35 | many | Arabic | |
| 36 | meat | Arabic | This shares a specific semantic shift with Arabic to the exclusion of the rest of Semitic : « staple food » > « meat » |
| 37 | milk | Arabic / Ugaritic | The root is common to several Semitic languages, but the use of the passive pattern fa3īl in this word is unique to Arabic |
| 38 | month | Arabic | Pretty sure the normal Levantine form is shahr, not sha7r, not that it makes any difference to the etymology – and for sure Syriac 'moon' below is sahrā, not šahrā. |
| 39 | moon | Arabic | |
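
For anyone who wants to check the arithmetic behind the 90% figure, here is a small Python sketch. It only restates the counts given above (18 unambiguously Arabic, 2 unambiguously Aramaic, 11 ambiguous words that could be Aramaic); the variable names are illustrative, and treating the unambiguous ratio as a prior for the ambiguous words is a rough rule of thumb rather than a rigorous statistical method.

```python
from fractions import Fraction

# Counts taken from the text and table above.
unambiguous_arabic = 18
unambiguous_aramaic = 2
ambiguous_with_aramaic = 11  # ambiguous words that could in principle be Aramaic

# Share of Arabic among the words whose origin is clear.
arabic_share = Fraction(unambiguous_arabic, unambiguous_arabic + unambiguous_aramaic)

# Apply that share to the words ambiguous between Arabic and Aramaic.
expected_arabic = arabic_share * ambiguous_with_aramaic
expected_aramaic = ambiguous_with_aramaic - expected_arabic

print(f"Arabic share among unambiguous words: {float(arabic_share):.0%}")
print(f"Expected Arabic among the {ambiguous_with_aramaic}: {float(expected_arabic):.1f}")
print(f"Expected Aramaic among the {ambiguous_with_aramaic}: {float(expected_aramaic):.1f}")
```

The expected 9.9 Arabic words out of 11 rounds to 10, which is where "all but one" comes from.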

Saturday, December 23, 2017

Tokenistic Tifinagh #fail

In Oran (Algeria) when I was there a few days ago, political party posters were everywhere, advertising the recent local elections. Oran is nowhere near any major Berber-speaking region (though it has attracted a significant Kabyle Berber minority), and such posters – along with a few telecom ads – were almost the only publicly visible mark of Berber on its linguistic landscape. Their bilingualism is a token gesture towards the government's pious aspiration to make Tamazight (Berber) a national language, emanating from the centre rather than from the regions where it's actually spoken.

Among these, the FLN posters in particular caught my attention. Right under the Arabic name of the party, they included a line in Tifinagh (the Berber “heritage” script) that I couldn’t make head or tail of: ⵔⴵⵏⵜⴷⴻ ⵉⴱⵔⴰⵜⵉⴵⵏ ⴰⵜⵉⴵⵏⴰⵍⴻ. Transcribed, this reads rğntde ibratiğn atiğnale – which makes no sense; it’s not even possible in Berber to have e (schwa) at the end of a word.

It wasn’t until I started looking at my pictures on the flight back that the penny dropped. Just substitute o for ğ, and you get rontde ibration ationale. Restoring the capital and accented letters (neither of which Tifinagh has), you get Front de Libération Nationale. When the order came from on high to add Tamazight to the poster, some supremely indifferent functionary in the local FLN office must have literally downloaded a Tifinagh keyboard, typed in the French name of the party, and stuck it on the poster.
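
The decoding step can be sketched in a few lines of Python. Both tables below are assumptions reconstructed from the poster text itself: `STANDARD` gives the usual Latin values of the ten letters involved, and `KEYBOARD` is a hypothetical layout that differs only in having ⴵ on the o key. I have not checked which keyboard layout was actually used.

```python
import unicodedata

# Usual Latin values of the Tifinagh letters that appear on the poster.
STANDARD = {
    "ⵔ": "r", "ⴵ": "ğ", "ⵏ": "n", "ⵜ": "t", "ⴷ": "d",
    "ⴻ": "e", "ⵉ": "i", "ⴱ": "b", "ⴰ": "a", "ⵍ": "l",
}

# Hypothetical keyboard reading: identical, except that the key producing ⴵ
# is assumed to be the Latin "o" key.
KEYBOARD = {**STANDARD, "ⴵ": "o"}


def read(text: str, table: dict) -> str:
    """Map each Tifinagh letter to Latin; leave everything else (spaces) as is."""
    latin = "".join(table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", latin)


POSTER = "ⵔⴵⵏⵜⴷⴻ ⵉⴱⵔⴰⵜⵉⴵⵏ ⴰⵜⵉⴵⵏⴰⵍⴻ"

standard_reading = read(POSTER, STANDARD)
keyboard_reading = read(POSTER, KEYBOARD)
print(standard_reading)
print(keyboard_reading)
```

Read with the standard values the line is gibberish; read as keystrokes it is the French party name minus the characters a Tifinagh layout has no key for.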

Most likely, this functionary was an Arabic speaker. In fairness, though, plenty of Kabyle speakers would have little idea how to render “National Liberation Front” into Kabyle. The officially acceptable way of doing so relies on neologisms developed by activists and familiar mainly to other activists, despite the gradually expanding efforts of teachers and broadcasters – and such activists are especially unlikely to be members of the FLN, given its general reluctance to promote Tamazight. What everyone actually calls it in practice (in Kabyle and in Arabic alike) is “FLN”.

I didn’t notice any similarly clearcut fails on other parties’ posters – though some didn’t bother with Tamazight at all, and at least one, the PT, opted for Latin characters instead. I did see a similar case on a jewelry shop, though, which prominently advertises ⴰⵔⴳⴻⵏⵜ argent, next to a picture of a recognizably Kabyle earring:

It's striking that both cases are based on French, rather than Arabic - even though the normal Kabyle word for "silver" is actually an Arabic loan, lfeṭṭa from الفضة. For some, apparently, the only really important thing about Tamazight is that it's not Arabic...

Sunday, December 10, 2017

Jerusalem's suppletive gentilic

Jerusalem stands out among Arab cities today not only culturally and religiously, but morphologically as well. In Modern Standard Arabic, the city of Jerusalem is al-Quds القدس, and the gentilic suffix is -ī (properly -iyy), but "Jerusalemite" is Maqdisī مقدسي rather than the expected *Qudsī (though the latter is attested as a personal name). As a general cross-linguistic rule of thumb, morphological irregularities are most likely with older, more basic words. Yet this type of irregularity is rather unusual, even among the region's oldest and most prominent cities: Dimashq (Damascus) yields Dimashqī (Damascene), Baghdād yields Baghdādī, Makkah (Mecca) yields Makkī... How did it arise?

It turns out that, in the early Muslim era, it was formed in a perfectly regular way. In his masterwork, the medieval geographer Al-Maqdisī (d. 991) calls his hometown Bayt al-Maqdis بيت المقدس ("house of holiness"), a title now largely supplanted by al-Quds ("the holy"). It survives to the present in certain religious contexts or as a poetic synonym, not only in Arabic but in Kabyle Berber as well: H. Genevois ("Croyances") notes a traditional popular belief that the souls of the dead gather in Bit Elmeqdes, corresponding exactly to Al-Maqdisī's boast that Jerusalem is "the site of the Day of Judgement, and from it is the Resurrection, and to it is the Gathering" (عرصة القيامة ومنها النشر وإليها الحشر).

A quick search of Alwaraq's heritage library suggests that the shorter name "al-Quds" became popular around the period of the Crusades, when Jerusalem was as much a subject of dispute as now. The earliest attestation I can spot on a cursory search (excluding a work falsely attributed to al-Wāqidī) is a mention by the Andalusi traveller Ibn Jubayr (1185), who notes that "between [Kerak] and al-Quds is a day's march or so, and it is the best location in Palestine" (بينه وبين القدس مسيرة يوم أو اشف قليلاً، وهو سرارة أرض فلسطين). Very likely a longer search would yield slightly older attestations. By the time of the next major Palestinian writer I notice in the collection - Al-Ṣafadī (d. 1363) - al-Quds had clearly become the unmarked term for the town; it recurs constantly in his work.

The name Bayt al-Maqdis was thus replaced in practice by the shorter and catchier name al-Quds a good 800 years ago, yet the corresponding gentilic continues to preserve the older name. Since 1967, the Israeli government has imposed a third name as its official term for the city in Arabic: Ūrshalīm, a transcription of the Syriac name used in Christian liturgical contexts which provoked "furious ridicule" from residents (Segev 2007:492). Since this usage remains entirely unknown to most Arabic speakers, it is unlikely to have much impact on Arabic usage. Yet the timing of the shift from Bayt al-Maqdis to al-Quds reminds us that political upheaval impacts placenames as well as people's lives.

Monday, December 04, 2017

Tifinagh and place of articulation

The order of the Latin alphabet we use is a matter of historical chance; if it ever made sense, the reasons behind it were lost millennia ago. Many other writing systems, however, have tried to order their letters in a less arbitrary fashion. The most prominent successes for this approach are found in and around India, where scripts are usually ordered by place of articulation - ie, by how far back in the mouth the sounds are pronounced - as in Devanagari: a..., ka kha ga gha ŋa, ca cha ja jha ña, ṭa ṭha ḍa ḍha ṇa... (After a couple of sound changes, this order ultimately also yields that of the Japanese kana: a, ka, sa (< ca), ta na, ha (< pa) ma, ya ra wa n.) In Arabic, the normal order of letters reflects a partial reordering by shape rather than by sound (thus ب ت ث are all grouped together, whereas in the older order they were far apart from one another). However, for technical purposes such as traditional phonetics and Qur'an recitation, one occasionally also finds the place-of-articulation order: indeed, the earliest Arabic dictionary (Kitāb al-`Ayn) used it (ع ح هـ خ غ ق ك ج ش ض ص س ز ط ت د ظ ذ ث ر ل ن ف ب م و ي ا ء).

Tifinagh, the traditional script of the Tuareg people of the Sahara, seems not to have any established traditional ordering. However, if you organize its letters by place of articulation, an obvious pattern emerges:

This table represents Tifinagh as used at Imi-n-Taborăq in Mali, as recorded by Elghamis (2011:64-65). (Note that w is a labio-velar sound; for obvious reasons, I've chosen to place it in the velar column rather than the labial one. Also, the letter put in the laryngeal plosive slot actually just indicates the presence of a final vowel, although there are reasons to suspect that it once represented a glottal stop.) There is a lot of regional variation in Tifinagh, but one thing stands out: in every variety, everything on the right side of the thick line - ie, everything velar or further back - is consistently formed exclusively out of dots, except for g - and even that is often composed of a combination of dots and lines. Throughout much of Tuareg, original g tends to be palatalized to [ɟ], and some dialects - like this one - have lost the distinction altogether.

How this distribution emerged is unclear for the moment. It is noteworthy, however, that dot letters did not exist in Tifinagh's ancestor, Libyco-Berber as used in the pre-Roman and early Roman periods (with rare, doubtful exceptions). Two of the dot letters have clear Libyco-Berber origins; ⴾ (k, three dots in a triangle) was originally ⥤ (k, a rightwards open arrow), while : (w) was originally =. Based on these two alone, one might suppose a sort of regular form shift of = to :, in which case the development might simply be coincidental. ⵗ (ɣ) may derive from the rarely attested ÷, whose value (q?) is speculative, while ... (x) is simply a rotation of ɣ. :: (q) had no Libyco-Berber equivalent, and is perhaps historically a visual "ligature" of ɣ and + (t) - the word-final cluster *ɣt becomes qq in Tuareg. The final vowel sign · might derive from classical ☰, which had the same function; alternatively, one might derive it from or the dot occasionally used to separate words, and suppose that classical ☰ actually yielded ⵂ (h), in which case the extra dot needs to be explained.

It's not impossible that Tifinagh users at some stage made a conscious link between back consonants and dots. But even if the distribution is just a coincidence, it should still be useful for anyone seeking to memorise the script.

Sunday, October 29, 2017

Butterfly-collecting: the history of an insult

Chomsky's barb about butterfly-collecting has echoed in the ears of descriptive linguists for decades, and is sometimes blamed for the withering away of field linguistics over the late 20th century. The earliest published version I could track down via Google is:
"You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so." (Chomsky 1979:57)
So I was surprised to find a similar statement attributed to the eminent early 20th century physicist Ernest Rutherford, quoted by Dyson (2006:179) as saying "Physics is the only real science; the rest are butterfly-collecting." How did this metaphor make its way into linguistics?

For a start, it appears that Dyson's version is somewhat inexact. The Rutherford quote appears to belong to the oral tradition of physics, rather than deriving from any publication of his; the earliest version that I can find on Google Books is from Baker (1942:96):

"These ideas are crystallized in the statement, attributed to Rutherford, that science consists of physics and stamp-collecting. This is an epigram intended to mean that particular objects are uninteresting : it is the extreme view-point of a general analytical scientist."
The shift from stamps to butterflies came decades later, first attested only in 1974. In fact, the derisive comparison to butterfly collecting seems likely to have seeped into linguistics not from physics but from, of all subjects, anthropology. Edmund Leach (1961:2) makes it the central metaphor of his assault on Radcliffe-Brown:
"Radcliffe-Brown maintained that the objective of social anthropology was the 'comparison of social structures'. [...] Comparison is a matter of butterfly collecting — of classification, of the arrangement of things according to their types and subtypes. The followers of Radcliffe-Brown are anthropological butterfly collectors and their approach to their data has certain consequences."
Anthropologists would reuse the metaphor in debates over the distinction between different types of comparison in linguistics itself, whether endorsing it like Lehman (1964:387) or rebutting the criticism like Sarana (1965:29). From there it seems to have been taken up by Chomskyan linguists as an argument against Bloomfield's "discovery procedures", if I am correctly interpreting the incomplete fragment of Ferber and Lynd (1971) that I can find on Google Books:
"These procedures, which are largely a matter of classification, have been uncharitably called "butterfly-collecting" in the manner of pre-Darwinian biology: they account for a detailed "external" description of each language (what Chomsky [...]"
Geoffrey Leech (1969:4) deploys the same metaphor against rhetoric:
"Connected to this is a second weakness of traditional rhetoric - what I am tempted to call its 'train-spotting' or 'butterfly-collecting' attitude to style. This is the frame of mind in which the identification, classification and labelling of specimens of given stylistic devices becomes an end in itself [...]"
The redeployment of this argument to belittle descriptive work in general, rather than particular approaches, seems to be attributable to David DeCamp (1971:158), criticizing sociolinguistics from a Chomskyan perspective:
"The weakest theory is a 'functional' model, which only relates outputs from the black box to inputs, e. g. a grammar which would generate all and only the sentences of a language; the goal of much scientific research is to replace such a functional model with a 'structural' model, one that makes the stronger claim of describing what is actually in the black box. Mendel's 'genes' were only a functional model of genetics; the research on the DNA and RNA molecules has yielded a model that is much more nearly structural. Thus one branch of biology has at last become a true science; general linguistics is approaching that status; sociolinguistics is still in the pre-theoretical, butterfly-collecting stage, with no theory of its own and uncertain whether it has any place in general linguistic theory."
He then clarifies (ibid:170) that:
"'Butterfly collecting' is simply the collection of a whole lot of information toward the day when somebody can produce a formal theory. Now this is valuable, this is useful. We need a lot of empirical data collection also. I certainly would not want to imply by this that in this I'm saying that there is not an importance to the kinds of things that the Urban Language Survey is doing at CAL, or Bill Labov's work in New York. This is immensely important. What I am saying is that although it is necessary, it is not sufficient. We've got enough data now; it is about time to guide further research by means of some sort of a theory."
So, if we have to blame one person for reducing descriptive linguistics to butterfly collecting, it looks like it would be David DeCamp, at least until someone tracks down an earlier citation. But that misses a broader point: the disparaging comparison of data gathering to butterfly collecting seems to have become rather pervasive across a variety of disciplines in the late 20th century - including biology itself, which may well be part of where DeCamp got it from. All the way back in 1964, Theodosius Dobzhansky - who had been an ardent butterfly collector before becoming a prominent evolutionary biologist - comments sarcastically that:
"The notion has gained some currency that the only worthwhile biology is molecular biology. All else is "bird watching" or "butterfly collecting." Bird watching and butterfly collecting are occupations manifestly unworthy of serious scientists!" (Dobzhansky 1964:443)
Had he lived to see molecular biology turn to such quintessentially descriptive, list-making pursuits as the Human Genome Project, he would surely have enjoyed having the last laugh.

(If you have any earlier citations bearing on the history of this metaphor in linguistics, please tell me below!)

Tuesday, October 24, 2017

Siwi on Wikipedia

I am not a big fan of Wikipedia, despite its usefulness. To contribute good material to it - and there is a lot of wonderful material there - is to make an article look reassuringly reliable. That appearance of reliability then makes the article prime prey for anybody with an ideological or even commercial agenda to push: one little edit, and their propaganda is integrated into the same text, gaining credibility from its context, and getting copied over and over and over. Nevertheless, the insistent niggling itch of knowing that "someone is wrong on the internet" eventually got to me, and last month I ended up massively expanding the article Siwi language - including a fairly extensive section on Siwi oral literature. Suggestions or comments are welcome, although I make no promises.