The web, a multilingual encyclopedia

Chapter 8

Prior to this date, Google used a Systran-based translator, like Yahoo!'s Babel Fish, whose language options were added in several stages:

First stage: English to French, German, and Spanish, and vice versa.

Second stage: English to Portuguese and Dutch, and vice versa.

Third stage: English to Italian, and vice versa.

Fourth stage: English to simplified Chinese, Japanese and Korean, and vice versa.

Fifth stage (April 2006): English to Arabic, and vice versa.

Sixth stage (December 2006): English to Russian, and vice versa.

Seventh stage (February 2007): English to traditional Chinese, and simplified Chinese to traditional Chinese, and vice versa.

Here are the successive language options for Google's own translation system:

First stage (October 2007): All language pairs previously available were available in any language combination.

Second stage: English to Hindi, and vice versa.

Third stage (May 2008): Bulgarian, Croatian, Czech, Danish, Finnish, Greek, Norwegian, Polish, Romanian, Swedish, with any combination.

Fourth stage (September 2008): Catalan, Filipino, Hebrew, Indonesian, Latvian, Lithuanian, Serbian, Slovak, Slovene, Ukrainian, Vietnamese.

Fifth stage (January 2009): Albanian, Estonian, Galician, Hungarian, Maltese, Thai, Turkish.

Sixth stage (June 2009): Persian.

Seventh stage (August 2009): Afrikaans, Belarussian, Icelandic, Irish, Macedonian, Malay, Swahili, Welsh, Yiddish.

Eighth stage (January 2010): Haitian Creole.

Ninth stage (May 2010): Armenian, Azeri, Basque, Georgian, Urdu.

Tenth stage (October 2010): Latin.

Etc.

A speech program was launched in 2009 to read the translated text aloud, with new languages added over the following months. Since January 2011, users can choose among alternative translations for a word in Google Translate.

Google Translator Toolkit is a web service allowing (human) translators to edit the translations automatically generated by Google Translate. Translators can also use shared translations, glossaries and translation memories. Starting in June 2009 with English as a source language and 47 target languages, Google Translator Toolkit supported 100,000 language pairs in May 2011, with 345 source languages into 345 target languages.
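As a rough check on these figures, assume (this counting rule is not stated in the source) that a "language pair" is an ordered source-to-target combination of two different languages. Then n languages available on both sides yield n(n - 1) pairs:

```latex
P(n) = n(n-1), \qquad P(345) = 345 \times 344 = 118{,}680
```

Under that assumption, the figure of 100,000 language pairs reads as a round lower bound rather than an exact count.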

2009 > 6,909 LIVING LANGUAGES IN THE ETHNOLOGUE

6,909 living languages were cataloged in the 16th edition (2009) of "The Ethnologue: Languages of the World", an encyclopedic reference work freely available on the web since 1996, with a print book for sale.

As stated by Barbara Grimes, its editor from 1971 to 2000, the Ethnologue is "a catalog of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other socio-linguistic and demographic information, dates of published Bibles, a name index, a language family index, and language maps."

A core team of researchers in Dallas, Texas, has been helped by thousands of linguists gathering and checking information worldwide. A new edition of the Ethnologue is published approximately every four years.

The Ethnologue has been an active research project since 1950. It was founded by Richard Pittman as a catalog of minority languages, to share information on language development needs around the world with his colleagues at SIL International and other language researchers.

Richard Pittman was the editor of the 1st to 7th editions (1951-1969).

Barbara Grimes was the editor of the 8th to 14th editions (1971-2000). In 1971, coverage was expanded from primarily minority languages to encompass all known languages of the world. Between 1967 and 1973, Grimes completed an in-depth revision of the information on Africa, the Americas, the Pacific, and a few countries of Asia. During her years as editor, the number of identified languages grew from 4,493 to 6,809. The information recorded on each language expanded so that the published work more than tripled in size.

In 2000, Raymond Gordon Jr. became the third editor of the Ethnologue and produced the 15th edition (2005).

In 2005, Paul Lewis became the editor, responsible for general oversight and research policy, with Conrad Hurd as managing editor, responsible for operations and database management, and Raymond Gordon as senior research editor, leading a team of regional and language-family focused research editors.

In the introduction to the 16th edition (2009), the Ethnologue explains how it defines a language: "How one chooses to define a language depends on the purposes one has in identifying that language as distinct from another. Some base their definition on purely linguistic grounds. Others recognize that social, cultural, or political factors must also be taken into account. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. Those are frequently related to issues of heritage and identity much more than to the linguistic features of the language(s) in question."

As explained in the introduction, one feature of the database since its inception in 1971 has been a system of three-letter language identifiers (for example "fra" for French), which have been printed in the publication itself since the 10th edition (1984).

At the invitation of the International Organization for Standardization (ISO) in 2002, SIL International prepared a new standard that reconciled the complete set of codes used in the Ethnologue with the codes already in use in the ISO 639-2 standard (1998), which identified only 400 languages, and with codes developed by Linguist List to handle ancient and constructed languages. Published in 2007, the ISO 639-3 standard provided three-letter codes for identifying nearly 7,500 languages. SIL International was named the registration authority for the inventory of language identifiers, and administers the annual cycle of changes and updates.
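Because every identifier has the same fixed shape, three-letter codes are easy to handle in software. The Python sketch below checks that a string has the ISO 639-3 shape (three lowercase Latin letters) and looks it up in a tiny table. The table `SAMPLE_CODES` and the helpers `is_well_formed` and `language_name` are hypothetical illustrations holding only a few well-known codes; a real application would instead load the full code tables published by SIL International as registration authority.

```python
import re
from typing import Optional

# Hypothetical sample table: a few ISO 639-3 identifiers, not the full registry.
SAMPLE_CODES = {
    "eng": "English",
    "fra": "French",
    "deu": "German",
    "spa": "Spanish",
    "ita": "Italian",
    "rus": "Russian",
    "jpn": "Japanese",
}

# ISO 639-3 identifiers consist of exactly three lowercase Latin letters.
CODE_SHAPE = re.compile(r"[a-z]{3}")


def is_well_formed(code: str) -> bool:
    """Check only the shape of an identifier, not whether it has been assigned."""
    return CODE_SHAPE.fullmatch(code) is not None


def language_name(code: str) -> Optional[str]:
    """Return the name for `code` from the sample table, or None if it is absent."""
    # Normalize user input such as " FRA " before checking the shape.
    normalized = code.strip().lower()
    if not is_well_formed(normalized):
        raise ValueError(f"not a three-letter identifier: {code!r}")
    return SAMPLE_CODES.get(normalized)


print(language_name("FRA"))  # French
```

Keeping the shape check separate from the lookup reflects the fact that a well-formed code is not necessarily an assigned one: only the registry can say which codes are in use in a given year.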

2010 > A UNESCO ATLAS FOR ENDANGERED LANGUAGES

In 2010, UNESCO (United Nations Educational, Scientific and Cultural Organization) launched a free Interactive Atlas of the World's Languages in Danger.

The online edition complements the print edition (3rd edition, 2010), edited by Christopher Moseley and available in English, French and Spanish, with previous editions in 1996 and 2001.

As of 4 June 2011, the atlas listed 2,473 languages, with a search engine by country and area, language name, number of speakers (from/to), vitality, and ISO 639-3 code.

The language names have been indicated in English, French and Spanish transcriptions. Alternate names (spelling variants, dialects or names in non-Roman scripts) are also provided.

# About language vitality

UNESCO's Language Vitality and Endangerment framework has established six degrees of vitality/endangerment: safe, vulnerable, definitely endangered, severely endangered, critically endangered, extinct.

"Safe" (not included in the atlas) means that the language is spoken by all generations and that intergenerational transmission is uninterrupted.

"Vulnerable" means that most children speak the language, but it may be restricted to certain domains, for example at home.

"Definitely endangered" means that children no longer learn the language as a mother tongue in the home.

"Severely endangered" means that the language is spoken by grandparents and older generations. While the parent generation may understand it, they don't speak it to children or among themselves.

"Critically endangered" means that the youngest speakers are grandparents and older, and they speak the language partially and infrequently.

"Extinct" means there are no speakers left. The atlas includes languages presumed extinct since the 1950s.
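The six degrees form an ordered scale, and the atlas search engine lets users filter by vitality and by number of speakers (from/to), among other fields. The Python sketch below is a minimal illustration under stated assumptions: `AtlasEntry` and `search` are hypothetical and are not the atlas's actual data model or interface. An `IntEnum` encodes the ordering, so a query such as "at least severely endangered" becomes a simple comparison; the integer values themselves are an assumption, and only their order comes from the framework.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Optional


class Vitality(IntEnum):
    """UNESCO's six degrees, ordered from least to most endangered.

    Only the ordering comes from the framework; the integer values are
    an illustrative assumption.
    """
    SAFE = 0
    VULNERABLE = 1
    DEFINITELY_ENDANGERED = 2
    SEVERELY_ENDANGERED = 3
    CRITICALLY_ENDANGERED = 4
    EXTINCT = 5


def listed_in_atlas(vitality: Vitality) -> bool:
    # "Safe" languages are left out of the atlas; all other degrees are listed.
    return vitality != Vitality.SAFE


@dataclass
class AtlasEntry:
    # Hypothetical record shape, loosely modeled on the atlas search fields.
    name: str
    country: str
    speakers: Optional[int]  # None when no speaker estimate is recorded
    vitality: Vitality
    iso_code: str


def search(
    entries: Iterable[AtlasEntry],
    country: Optional[str] = None,
    speakers_from: Optional[int] = None,
    speakers_to: Optional[int] = None,
    at_least: Optional[Vitality] = None,
) -> List[AtlasEntry]:
    """Return the entries that match every criterion supplied."""
    results = []
    for entry in entries:
        if country is not None and entry.country != country:
            continue
        # Thanks to IntEnum ordering, "at least X" is a plain comparison.
        if at_least is not None and entry.vitality < at_least:
            continue
        # A speaker-count bound excludes entries with no estimate.
        if speakers_from is not None and (
            entry.speakers is None or entry.speakers < speakers_from
        ):
            continue
        if speakers_to is not None and (
            entry.speakers is None or entry.speakers > speakers_to
        ):
            continue
        results.append(entry)
    return results
```

Criteria left as `None` are simply ignored, so calling `search(entries)` with no filters returns every entry; treating a missing speaker estimate as "no match" for a speaker range is a design choice of this sketch, not a documented behavior of the atlas.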

# How to define an endangered language

When exactly is a language considered endangered? As explained by UNESCO on the interactive atlas's website: "A language is endangered when its speakers cease to use it, use it in fewer and fewer domains, use fewer of its registers and speaking styles, and/or stop passing it on to the next generation. No single factor determines whether a language is endangered."

UNESCO experts have identified nine factors that should be considered together: (1) intergenerational language transmission; (2) absolute number of speakers; (3) proportion of speakers within the total population; (4) shifts in domains of language use; (5) response to new domains and media; (6) availability of materials for language education and literacy; (7) governmental and institutional language attitudes and policies including official status and use; (8) community members' attitudes towards their own language; (9) amount and quality of documentation.