Friday, March 17, 2017

An index & a count of Fulfulde words used in Kaïdara

Last year I dusted off an old sub-project idea: indexing the words used in Amadou Hampâté Bâ's Kaïdara, a Fulani initiation tale originally published in parallel Fulfulde and French text. I've now brought that to a level of completion with a list of occurrences of all Fulfulde words in Kaïdara ("Kaydara" in Fulfulde), each one tagged with the "stanza" (actually just a set of 10 numbered lines) in which it appears. This is complemented by a word frequency count made with an online utility designed for such work.

The original idea goes back to a project proposal in the early 1990s for a follow-on phase to an original US Department of Education materials grant to produce a lexicon for the Maasina variety of Fula. That phase would have included, on the one hand, field research and, on the other, "mining" of various Fulfulde texts for vocabulary and word forms. The Kaïdara idea fit under the latter.

At the time, ASCII versions (with markup for accents and extended characters) of this and a few other texts were available from an FTP site. The plan was to use a series of macros in WordPerfect to substitute characters as needed in such text, then to tag each word with the number of the line in which it appeared - tag meaning simply to affix the number in a manner similar to what I have just done with the index I'm making available. The resulting index could then be used to identify terms missing from the lexicon, and to look up how they and other words were used along with their translations in context. (Kaïdara of course is in verse, so the usage is stylized but still of interest.)

Ultimately the follow-on project was not funded, so the Fulfulde lexicon completed for the original project was further edited and slightly expanded for publication in 1993. And the idea of indexing Fulfulde texts in the manner described was shelved. In the intervening quarter century, a considerable amount of work has been done on corpus development for many languages, but not to my knowledge including Kaïdara (or other bilingual works in the "Classiques africaines" series).

In January 2016 I decided to make an index, using the digital copy of Kaïdara from WebPulaaku.net. That resource is very helpful, but I did find a number of small errors, which to me looked like scanos (errors introduced in scanning); these were most easily identifiable at the stage when the words were sorted alphabetically. Building the index was a manual process, with some search & replaces: a set of 10 lines was copied (lines ending in 0-9, so the numbering indicates 10s), and the spaces were searched and replaced with the appropriate number and a hard return. A difference between this and the original concept is that the words are not tagged with their exact line, but rather with the set of ten lines within which they occur (still more exact than a page number would be).
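For anyone wanting to automate this tagging step rather than do it by search & replace, here is a minimal Python sketch. It is not how the index was actually made. The function name tag_words, the word pattern, and the choice to label each set by the line number rounded down to a multiple of ten are all assumptions made for illustration; the input is assumed to be a plain list of verse lines in order.

```python
import re
import unicodedata

# Assumed word pattern: runs of Unicode letters (including hooked letters
# such as ɓ and ɗ), optionally joined by an apostrophe inside the word.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def tag_words(lines, first_line_number=1):
    """Return (word, set_tag) pairs for every word in the given verse lines.

    Assumption: a "set" covers the lines whose numbers end in 0-9, so the
    tag is the line number rounded down to a multiple of ten.
    """
    tagged = []
    for offset, line in enumerate(lines):
        line_number = first_line_number + offset
        set_tag = (line_number // 10) * 10
        # NFC normalization keeps accented letters as single characters.
        text = unicodedata.normalize("NFC", line)
        for word in WORD_RE.findall(text):
            tagged.append((word.lower(), set_tag))
    return tagged
```

Because the function tags words by set rather than by exact line, it mirrors the manual result; returning line_number instead of set_tag would give the line-level tagging of the original concept.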

At the end of that process, punctuation was stripped out of the complete list, again by search & replace, and then the list was sorted. It was at that point that the whole list had to be scanned visually for anomalies - for example a word that recurs several times but appears once with a regular d instead of hooked ɗ, or what looks like a plural ending in -be when -ɓe is intended. And for words occurring only once, occasionally something didn't look right and needed to be checked against what was printed in the book.
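Part of that visual scan could in principle be assisted by a script. The sketch below is an illustration only: it groups word forms that differ solely by a plain versus hooked letter, which is the kind of anomaly described above. The function name suspicious_variants and the particular set of letter pairs are assumptions, and a flagged group is only a candidate for checking against the book, since two such forms can both be legitimate words.

```python
from collections import defaultdict

# Assumed confusable pairs: hooked or extended letters and the plain
# letters a scan might substitute for them.
FOLD = str.maketrans({"ɓ": "b", "ɗ": "d", "ƴ": "y", "ŋ": "n"})


def suspicious_variants(words):
    """Group word forms that become identical once hooked letters are folded.

    Returns a dict mapping the folded form to the sorted list of distinct
    spellings found, keeping only groups with more than one spelling.
    """
    groups = defaultdict(set)
    for word in words:
        lowered = word.lower()
        groups[lowered.translate(FOLD)].add(lowered)
    return {
        folded: sorted(forms)
        for folded, forms in groups.items()
        if len(forms) > 1
    }
```

Words that occur only once with an odd spelling would not be caught this way, so this complements rather than replaces reading through the sorted list.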

It is entirely possible that I (1) missed errors, or (2) introduced errors. Ideally an automated process (that could be run more than once) could do such work. But for the moment, here is a way of searching Fulfulde words in Kaïdara, and a different way to look at its contents.
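As a rough idea of what a repeatable version of the last stage might look like, the following Python sketch builds both an index and a frequency count from a list of (word, set number) pairs. The input format and the name build_index are assumptions for illustration, not a description of the files or the online utility actually used.

```python
from collections import Counter, defaultdict


def build_index(tagged_words):
    """Build an index and a frequency count from (word, set_tag) pairs.

    Returns a dict mapping each word to the sorted set numbers it occurs
    in (keys in sorted order), and a Counter of total occurrences.
    """
    occurrences = defaultdict(set)
    counts = Counter()
    for word, set_tag in tagged_words:
        occurrences[word].add(set_tag)
        counts[word] += 1
    index = {word: sorted(tags) for word, tags in sorted(occurrences.items())}
    return index, counts
```

Note that the default sort is by Unicode code point, which places words beginning with ɓ or ɗ after z; a Fulfulde-specific alphabetical order would need a custom sort key.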
