The web, a multilingual encyclopedia

Chapter 6

Indeed, yourDictionary.com has lots of new ideas. We plan to work with the Endangered Language Fund in the U.S. and Britain to raise money for the Foundation's work and publish the results on our site. We will have language chat rooms and bulletin boards. There will be language games designed to entertain and teach fundamentals of linguistics. The Linguistic Fun page will become an online journal for short, interesting, yes, even entertaining, pieces on language that are based on sound linguistics by experts from all over the world."

As the portal for all languages without any exception, yourDictionary.com offered a section for endangered languages called the Endangered Language Repository.

As explained by Robert Beard: "Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world's 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression."

How about the future of the web? "The web will be an encyclopedia of the world by the world for the world. There will be no information or knowledge that anyone needs that will not be available. The major hindrance to international and interpersonal understanding, personal and institutional enhancement, will be removed. It would take a wilder imagination than mine to predict the effect of this development on the nature of humankind."

2000 > PROJECT GUTENBERG AND LANGUAGES

[Summary]

Project Gutenberg is a visionary project launched by Michael Hart in July 1971 to create free electronic versions of literary works and disseminate them worldwide. In 2010, Project Gutenberg offered more than 33,000 high-quality ebooks being downloaded by the tens of thousands every day, and websites in the United States, Australia, Europe and Canada, with 40 mirror sites worldwide.

Project Gutenberg mainly offers ebooks in English, but multilingualism has been one of its priorities since the late 1990s. French is the second language of the project. There were ebooks in 60 languages in December 2010, thanks to the patient work of Distributed Proofreaders, a website created in 2000 to share the proofreading of ebooks between hundreds of volunteers in many countries.

Project Gutenberg is a visionary project launched in July 1971 by Michael Hart to create free electronic versions of literary works and disseminate them worldwide. In the 15th century, Gutenberg allowed anyone to have printed books for a small cost. In the 21st century, Project Gutenberg would allow anyone to have a digital library at no cost.

Michael worked from Illinois, typing in books from the public domain, for example the Bible and the complete works of Shakespeare, first alone, then with the help of a few volunteers.

His project got a major boost with the invention of the web in 1990. 95% of internet users were native English speakers in the mid-1990s, so most books were in English.

Project Gutenberg also inspired other digital libraries in Europe. Projekt Runeberg was launched in Sweden in 1994 to digitize Nordic (Scandinavian) literature from the public domain.

Projekt Gutenberg-DE was launched in Germany in 1994 to digitize German literature from the public domain.

French was the second language of Project Gutenberg, and still is now. The first ebooks released in French were six works by Stendhal and two works by Jules Verne, all released in early 1997.

Three novels by Jules Verne were already available in English in 1994. Since then, Jules Verne has consistently remained among the most downloaded authors.

In October 1997, Michael Hart wrote in the Project Gutenberg newsletter about producing more works in languages other than English.

In early 1998, on top of ten French ebooks, there were a few ebooks in German, Italian, Spanish and Latin. Released in May 1999, eBook #2000 was "Don Quijote" (1605), by Cervantes, in Spanish, its original language. In July 1999, Michael wrote in an email interview: "I am publishing in one new language per month right now, and will continue as long as possible."

The project got a new boost with the launch of Distributed Proofreaders, a website created in October 2000 by Charles Franks to share the proofreading of ebooks among hundreds of volunteers living in many countries.

Released in April 2002, eBook #5000 was "The Notebooks of Leonardo da Vinci" (written in the early 16th century), as an English translation from Italian, its original language. Since its release, it has regularly stayed in the top 100 downloaded ebooks.

There were works in 25 languages in early 2004, in 42 languages in July 2005, including Sanskrit and the Mayan languages, and in 59 languages in October 2010. The ten main languages were English (with 28,441 ebooks on 7 October 2010), French (1,659 ebooks), German (709 ebooks), Finnish (536 ebooks), Dutch (496 ebooks), Portuguese (473 ebooks), Chinese (405 ebooks), Spanish (295 ebooks), Italian (250 ebooks), and Greek (101 ebooks). The next languages were Latin, Esperanto, Swedish and Tagalog.

When machine translation is judged 99% satisfactory, we may be able to read literary classics in a choice of many languages.

Machine-translated ebooks won't compete with the work of literary translators and their labor of love over days, months or even years, but they will allow readers to get the gist of some literary works that have never been translated so far, or only translated into a few languages for commercial reasons.

The output of translation software could then be proofread by human translators, much as the output of OCR software is proofread by the volunteers of Distributed Proofreaders. So maybe we will see the creation of Distributed Translators one day, as a partner or sister project of Distributed Proofreaders and Project Gutenberg.

2001 > WIKIPEDIA, A COLLABORATIVE ENCYCLOPEDIA

[Summary]

Wikipedia was launched in January 2001 by Jimmy Wales and Larry Sanger (Larry resigned later on) as a global free collaborative online encyclopedia, financed by donations, with no advertising.

Its website is a wiki, which means that anyone can write, edit, correct and improve information throughout the encyclopedia, with people contributing under a pseudonym. The articles stay the property of their authors, and can be freely used according to Creative Commons or GFDL (GNU Free Documentation License).

Wikipedia quickly became the largest reference website. It was in the top ten websites in December 2006, and in the top five websites in 2008. In May 2007, Wikipedia had 7 million articles in 192 languages, including 1.8 million articles in English, 589,000 articles in German, 500,000 articles in French, 260,000 articles in Portuguese, and 236,000 articles in Spanish. Wikipedia celebrated its tenth anniversary in January 2011 with 17 million articles in 270 languages and 400 million individual visits per month for all websites.

Wikipedia was launched in January 2001 by Jimmy Wales and Larry Sanger (Larry resigned later on) as a global free collaborative online encyclopedia.

Wikipedia was financed by donations, with no advertising. Its website is a wiki, which means that anyone can write, edit, correct and improve information throughout the encyclopedia, with people contributing under a pseudonym. The articles stay the property of their authors, and can be freely used according to Creative Commons or GFDL (GNU Free Documentation License).

Wikipedia is hosted by the Wikimedia Foundation, founded in June 2003, which has run a number of other projects, beginning with Wiktionary (launched in December 2002) and Wikibooks (launched in June 2003), followed by Wikiquote, Wikisource (texts from the public domain), Wikimedia Commons (multimedia), Wikispecies (animals and plants), Wikinews and Wikiversity (textbooks).

Wikipedia quickly became the largest reference website, with thousands of people contributing worldwide. In December 2004, Wikipedia had 1.3 million articles by 13,000 contributors in 100 languages. In December 2006, Wikipedia was among the top ten sites on the web, with 6 million articles. In May 2007, Wikipedia had 7 million articles in 192 languages, including 1.8 million articles in English, 589,000 articles in German, 500,000 articles in French, 260,000 articles in Portuguese, and 236,000 articles in Spanish.

In 2008, Wikipedia was in the top five websites. In September 2010, Wikipedia had 14 million articles in 272 languages, including 3.4 million articles in English, 1.1 million articles in German and 1 million articles in French. Wikipedia celebrated its tenth anniversary in January 2011 with 17 million articles in 270 languages and 400 million individual visits per month for all websites.

Wikipedia also inspired many other projects over the years, for example Citizendium, launched in 2007 as a pilot project to build a new encyclopedia.

Citizendium, an acronym for "The Citizen's Compendium", was launched in March 2007 at the initiative of Larry Sanger, who co-founded Wikipedia with Jimmy Wales in January 2001, but resigned later on over policy and content quality issues, as well as the use of anonymous pseudonyms.

Citizendium is a wiki project open to public collaboration, but combining "public participation with gentle expert guidance". The project is expert-led, not expert-only. Contributors use their own names, and they are guided by expert editors. As explained by Larry in his essay "Toward a New Compendium of Knowledge", posted in September 2006 and updated in March 2007: "Editors will be able to make content decisions in their areas of specialization, but otherwise working shoulder-to-shoulder with ordinary authors."

There are also constables who make sure the rules are respected.

There were 1,100 high-quality articles, 820 authors, and 180 editors in March 2007, 11,800 articles in August 2009, and 15,000 articles in September 2010. Citizendium wants to act as a prototype for upcoming large scale knowledge-building projects that would deliver reliable reference, scholarly and educational content.

2001 > UNL, A DIGITAL METALANGUAGE PROJECT

[Summary]

The UNDL Foundation (UNDL: Universal Networking Digital Language) was founded in January 2001 to develop and promote the UNL (Universal Networking Language) project. The UNL project was launched in 1996 as a major digital metalanguage project by the Institute of Advanced Studies (IAS) of the United Nations University (UNU) in Tokyo, Japan. As explained in 1998 on the bilingual English-Japanese website: "UNL is a language that -- with its companion 'enconverter' and 'deconverter' software -- enables communication among peoples of differing native languages. It will reside, as a plug-in for popular web browsers, on the internet, and will be compatible with standard network servers."

At the time, 120 researchers worldwide were working on a multilingual project in 16 languages (Arabic, Brazilian, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Latvian, Mongolian, Russian, Spanish, Swahili, Thai).

The UNDL Foundation (UNDL: Universal Networking Digital Language) was founded in January 2001 to develop and promote the UNL (Universal Networking Language) project.

The UNL project was launched in 1996 as a major digital metalanguage project by the Institute of Advanced Studies (IAS) of the United Nations University (UNU) in Tokyo, Japan.

As explained in 1998 on the bilingual English-Japanese website: "UNL is a language that -- with its companion 'enconverter' and 'deconverter' software -- enables communication among peoples of differing native languages. It will reside, as a plug-in for popular web browsers, on the internet, and will be compatible with standard network servers. The technology will be shared among the member states of the United Nations. Any person with access to the internet will be able to 'enconvert' text from any native language of a member state into UNL. Just as easily, any UNL text can be 'deconverted' from UNL into native languages. United Nations University's UNL Center will work with its partners to create and promote the UNL software, which will be compatible with popular network servers and computing platforms."

At the time, 120 researchers worldwide were working on a multilingual project in 16 languages (Arabic, Brazilian, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Latvian, Mongolian, Russian, Spanish, Swahili, Thai). Once the system worked with these 16 languages, other UN languages would be included in 2000.

UNL was meant to become the HTML of linguistic content. Possible applications would be multilingual email, multilingual information, active dictionaries for reading foreign languages online, and machine translation for navigating the web and monitoring websites.

The project was also important from a political and cultural point of view, as the first project building up tools for all languages on the internet, i.e. main languages as well as minority languages.