The story began last week with a Canadian comic book publisher, Joseph John, who was astonished to discover there was no Google Translate option for Cree. Perhaps it is because the request came from a genuine cultural “outsider” (a real Indian from India!) that the resulting petition to Google continues to draw major media attention. Simon Bird (#CreeSimonSays) did a great job speaking up for the language, and raised intriguing possibilities – which also raise questions that need answers.
[Also: Hats off to Sienna Deschambault in the photo above, already taking a leadership role in Cree language revitalization!]
Question One: Which Cree?
Numerous pieces here at Cree Literacy Network (including this one) hint at the depth and diversity of Cree language variation. “Cree” may seem like a single language to a cultural outsider, but even insiders are not necessarily fully aware that what Western Cree people call “Cree” and what Eastern Cree people call “Cree” are actually different languages (with some mutual intelligibility). And within each group, variation is significant.
(For another view of the cross-Canada picture of Cree linguistic variation, you can look at our popular Google map, “Cree Names of Cree-speaking Communities,” which is still a work in progress. Incredibly, it’s been viewed nearly 83,000 times since posting.)
Question Two: What does Google need to make this happen?
A petition attempts to pressure those in power to make something happen. But even if Google is willing, their translation tools can’t simply pull Cree (of any variety) out of thin air. What Google needs, in this case, is an enormous collection of good, written examples of Cree language (perhaps a million examples – not a million different words), and a “concerted effort” by the language community.
For other languages around the world, Google’s tools use computer programs to assemble and then analyze a database of hundreds of thousands of language samples, scanned in from published books and newspapers, or harvested from the internet. For most languages (take Hindi, for example), enormous, usable samples are generated every day through online news publications and simple communication between fully fluent speakers, reading and writing that language.
Google Translate cannot teach people how to speak their own language, or invent new ways of spelling: it builds its tools on established language use. When we look at existing Facebook Cree language groups, we still see regular declarations that spelling doesn’t matter. This may be true for one or two words at a time exchanged between fully fluent speakers. But you can’t read or write a book that way (at least, not one that anyone else can read). Computer programs are built on rules and predictability.
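To see why spelling matters to a computer, here is a minimal sketch in Python. It is only an illustration, not how Google's tools actually work: the three "posts" are hypothetical, and the variant spellings (the word typed with the circumflex, without it, and capitalized) are assumptions made up for the example rather than samples from a real collection.

```python
from collections import Counter

# Hypothetical posts: the same word typed three different ways.
# These variants are illustrative assumptions, not real corpus data.
inconsistent_posts = ["itwêwina", "itwewina", "Itwêwina"]

# The same three posts, if everyone used one standard spelling.
standard_posts = ["itwêwina", "itwêwina", "itwêwina"]


def count_forms(tokens):
    """Count each distinct written form exactly as it was typed."""
    return Counter(tokens)


inconsistent_counts = count_forms(inconsistent_posts)
standard_counts = count_forms(standard_posts)

# With mixed spellings, the program sees three unrelated "words",
# each with a single example to learn from.
print(len(inconsistent_counts), dict(inconsistent_counts))

# With one standard spelling, it sees one word with three examples.
print(len(standard_counts), dict(standard_counts))
```

A fluent reader recognizes all three spellings at a glance. The program does not: every variant divides the evidence, so a collection that looks large to us can look very thin to the computer.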
For those who think the language simply shouldn’t be shared at all: sadly, linguistic research worldwide shows that there is no more effective way of ensuring and accelerating language death.
Question Three: Who can be part of this “Concerted Effort”?
In the news story, Simon mentions his own recent involvement with the 21st Century Tools for Indigenous Languages project at the University of Alberta (funded by the Social Sciences and Humanities Research Council of Canada). In particular, he mentions the itwêwina dictionary project (you can also find a link on the main menu bar above, highlighted in red). This dictionary is built using the same kind of computer tools that Google would create, and it’s based on a database of all the well-written samples of Cree that can be found. Over the last year, its functionality and user-friendliness have grown by leaps and bounds.
Here in our tenth year at the Cree Literacy Network, we’ve been on board with the UofA project since it began. Every one of our Cree language posts becomes part of the Cree language database that helps to fuel the project. Each one of our contributors uses the same foundation of Cree language literacy and standard spelling laid by our honorary founders Dr Jean Okimâsis, and the late Dr Freda Ahenakew, whose work follows on that of the late Ida McLeod, and the late Rev. Canon Edward Ahenakew, going back a century or more.
The initial focus may be on y-dialect, because it has the largest existing collection of written examples, but the deep similarities among the Western dialects open the door to relatively rapid inclusion of th- and n-dialects (if they are built on the same principles of standard spelling).
There’s an old saying that the best time to plant a tree is thirty years ago. Maybe that’s the really good news in this story. The real work began ages ago, and continues to make significant progress through good faith contributions of speakers, teachers, linguists and advocates from across the Cree language continuum. And many of the CLN’s closest friends and colleagues and readers (that means you!) continue to contribute informally every day to the language databases needed to support the computer tools that will help to revitalize Cree as the fully functional 21st-century language that future generations deserve.
5 Responses
What an awesome post. That’s so good to hear. Keep up the good content.
Blaine
Flying Dust, ohci niya
Thanks, Blaine: ay-hay. CLN always hopes to be part of the solution!
Thank you for this. The part on what is required for building well-functioning machine-learnt language technological tools (that Google typically creates) presents the core challenge – having large amounts of similarly spelled texts, although recent research suggests that one might make do with smaller amounts, at the cost of reduced functionality. Combining data from multiple dialects and spelling conventions would increase the overall extent of training materials, but the resultant computational model would present confusing results (imagine Google translating from English into a random mixture of French, Spanish, Portuguese and Romanian) and be of little real practical use, if not quite the contrary. A partial alternative to needing large amounts of texts is using rule-based techniques such as we use for our computational model of Cree word-structure, but that is largely based on Dr. Wolvengrey’s lexical database and inflectional paradigms, and other information from e.g. the Maskwacîs Cree Dictionary and the Alberta Elders’ Cree Dictionary (which all again are based on extended study of language use). But though such rule-based computational models can represent word-structure quite well (recognize myriad inflected forms, generate inflectional paradigms, allow for spell-checking), they do not easily extend beyond the vocabulary they contain, nor do they by themselves deliver machine-translation. However, some limited objectives in translating might be achieved, e.g. simple English phrases matching some core set of complex Cree word forms.
Thanks, Antti for clarification of what we might be able to manage with less. (But I love the idea of gathering all possible data!)
I am looking for Creative and Interesting methods in Teaching Cree Vocabulary? Thank You