The web, a multilingual encyclopedia

Chapter 7

The UNDL Foundation (UNDL: Universal Networking Digital Language) was founded in January 2001 to develop and promote the UNL project, and became a partner of the United Nations.

The definition of UNL has evolved over the years. According to the UNDL Foundation's website in 2010: "UNL is a computer language that enables computers to process information and knowledge. It is designed to replicate the functions of natural languages. Using UNL, people can describe all information and knowledge conveyed by natural languages for computers. As a result, computers can intercommunicate through UNL and process information and knowledge using UNL, thus providing people with a Linguistic Infrastructure (LI) in computers and the internet for distributing, receiving and understanding multilingual information. Such multilingual information can be accessed by natural languages through the UNL System. UNL, as a language for expressing information and knowledge described in natural languages, has all the components corresponding to that of a natural language."

2001 > A MARKET FOR LANGUAGE TRANSLATION SOFTWARE

[Summary]

The development of electronic commerce boosted language translation software, products and services targeting the general public, language professionals, and companies localizing their websites. The software, products and services were developed by companies such as Alis Technologies, Globalink, Lernout & Hauspie, Softissimo and IBM. In March 2001, IBM entered the growing translation market with a high-end professional product, the WebSphere Translation Server. The software could instantly translate webpages, emails and chats from/into several languages (Chinese, English, French, German, Italian, Japanese, Korean, Spanish). It could process 500 words per second, and users could add their own terminology to it. Computer-assisted translation (CAT) software was developed for professional translators, based on "translation memory" with terminology processing in real time, for example Wordfast, created in 1999 by Yves Champollion. Wordfast could be used on any platform (Windows, Mac, Linux), and was compatible with the software of other key players like IBM and SDL Trados.

The development of electronic commerce boosted language translation software, products and services targeting the general public, language professionals, and companies localizing their websites.

The software, products and services were developed by companies such as Alis Technologies, Globalink, Lernout & Hauspie, Softissimo and IBM.

In March 2001, IBM entered the growing translation market with a high-end professional product, the WebSphere Translation Server.

The software could instantly translate webpages, emails and chats from/into several languages (Chinese, English, French, German, Italian, Japanese, Korean, Spanish). It could process 500 words per second, and users could add their own terminology to it.

Machine translation can be defined as the automated process of translating a text from one language to another. MT analyzes the text in the source language and automatically generates the corresponding text in the target language. Because there is no human intervention during the translation process, machine translation differs from computer-assisted translation (CAT), which is based on interaction between the translator and the computer.

Computer-assisted translation (CAT) software was developed for professional translators, based on "translation memory" with terminology processing in real time, for example Wordfast, created in 1999 by Yves Champollion. Wordfast was compatible with the software of other key players like IBM and SDL Trados. Available for any platform (Windows, Mac, Linux), Wordfast had 14,000 customers worldwide in 2010, including the United Nations, Coca-Cola and Sony.
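The "translation memory" idea behind tools like Wordfast can be sketched in a few lines of Python: store previously translated segments, and when a new source segment arrives, look for an exact or near match and propose the stored translation for reuse. The segment pairs and the similarity threshold below are illustrative assumptions, not the behavior of Wordfast or any other actual product.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segments already translated, with their targets.
# The pairs and the 0.8 threshold are invented for illustration.
memory = {
    "The file could not be opened.": "Le fichier n'a pas pu être ouvert.",
    "Save your changes before closing.": "Enregistrez vos modifications avant de fermer.",
}

def suggest(segment, threshold=0.8):
    """Return (stored translation, similarity) if a close enough match exists, else (None, score)."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return (best_target, best_score) if best_score >= threshold else (None, best_score)

print(suggest("The file could not be opened"))  # near-exact match: the stored translation is reused
print(suggest("Restart the application."))      # below the threshold: the translator starts from scratch
```

The point of such a memory is not to translate by itself but to spare the human translator from retranslating segments that have already been handled, which is exactly what distinguishes CAT from machine translation.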

According to Tim McKenna, a writer and philosopher interviewed in October 2000: "When software gets good enough for people to chat or talk on the web in real time in different languages, then we will see a whole new world appear before us. Scientists, political activists, businesses and many more groups will be able to communicate immediately without having to go through mediators or translators."

A further step could be "transcultural, transnational transparency", as stated in September 1998 by Randy Hobler, a consultant in internet marketing of translation software and services: "We are rapidly reaching the point where highly accurate machine translation of text and speech will be so common as to be embedded in computer platforms, and even in chips in various ways.

At that point, and as the growth of the web slows, the accuracy of language translation hits 98% plus, and the saturation of language pairs has covered the vast majority of the market, language transparency (any-language-to-any-language communication) will be too limiting a vision for those selling this technology. The next development will be "transcultural, transnational transparency", in which other aspects of human communication, commerce and transactions beyond language alone will come into play. For example, gesture has meaning, facial movement has meaning and this varies among societies. (...)

There are thousands of ways in which cultures and countries differ, and most of these are computerizable to change as one goes from one culture to the other. They include laws, customs, business practices, ethics, currency conversions, clothing size differences, metric versus English system differences, etc. Enterprising companies will be capturing and programming these differences and selling products and services to help the peoples of the world communicate better. Once this kind of thing is widespread, it will truly contribute to international understanding."

2004 > THE WEB 2.0, COMMUNITY AND SHARING

[Summary]

The term "web 2.0" was invented in 2004 by Tim O"Reilly, founder of O"Reilly Media, a publisher of computer books, as a t.i.tle for a series of conferences he was organizing. The web 2.0 was based on community and sharing, with a wealth of websites whose content was supplied by users, such as blogs, wikis, social networks and collaborative encyclopedias. Wikipedia, Facebook and Twitter, of course, but also tens of thousands of others. The web 2.0 may begin to fulfill the dream of Tim Berners-Lee, who invented the web in 1990, and wrote in an essay dated April 1998: "The dream behind the web is of a common information s.p.a.ce in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished. ("The World Wide Web: A very short personal history", available on his webpage on the W3C website)

The term "web 2.0" was invented in 2004 by Tim O"Reilly, founder of O"Reilly Media, a publisher of computer books, as a t.i.tle for a series of conferences he was organizing.

The web 2.0 was based on community and sharing, with a wealth of websites whose content was supplied by users, such as blogs, wikis, social networks and collaborative encyclopedias. Wikipedia, Facebook and Twitter, of course, but also tens of thousands of others.

The web 2.0 may begin to fulfill the dream of Tim Berners-Lee, who invented the web in 1990, and wrote in an essay dated April 1998: "The dream behind the web is of a common information space in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished."

("The World Wide Web: A very short personal history", available on his webpage on the W3C website)

The first blog was launched in 1997. A blog is an online diary kept by a person or a group, usually in reverse chronological order, and can be updated every minute or once a month. There were 14 million blogs worldwide in July 2005, with 80,000 new blogs per day. According to Technorati, the first blog search engine, there were 65 million blogs in December 2006, with 175,000 new blogs per day. Some blogs are devoted to photos (photoblogs), music (audioblogs or podcasts), and videos (vlogs or videoblogs).

The wiki concept became quite popular in 2000. Deriving from the Hawaiian term "wiki" ("fast"), a wiki is a website allowing multiple users to collaborate online on the same project. Users can contribute to drafting content, editing it, improving it, and updating it. The software can be simple or more elaborate. A simple program handles text and hyperlinks. With a more elaborate program, one can embed images, charts, tables, etc. The most famous wiki is Wikipedia.

Facebook was founded in February 2004 by Mark Zuckerberg and his fellow students as a social network. Originally created for the students of Harvard University, it was then available to students from any university in the U.S. before being open to anyone worldwide in September 2006, to connect with relatives, friends and strangers. Facebook was the second most visited website after Google, with 500 million users in June 2010, while sparking debates on privacy issues.

Founded in 2006 by Jack Dorsey and Biz Stone, Twitter is a social networking and micro-blogging tool to send free short messages of 140 characters maximum, called tweets, via the internet, IM or SMS.

Sometimes described as the SMS of the internet, Twitter gained worldwide popularity, with 106 million users in April 2010, and 300,000 new users per day. As for tweets, there were 5,000 per day in 2007, 300,000 in 2008, 2.5 million in 2009, 50 million in January 2010, and 55 million in April 2010, with the archiving of public tweets by the Library of Congress as a reflection of the trends of our time.

We now try to fulfill the second part of Tim Berners-Lee's dream, according to his essay dated April 1998: "There was a second part of the dream, too, dependent on the web being so generally used that it became a realistic mirror (or in fact the primary embodiment) of the ways in which we work and play and socialize.

That was that once the state of our interactions was online, we could then use computers to help us analyze it, make sense of what we are doing, where we individually fit in, and how we can better work together."

2007 > THE ISO 639-3 STANDARD TO IDENTIFY LANGUAGES

[Summary]

The first standard to identify languages was ISO 639-1, adopted by the International Organization for Standardization (ISO) in 1988 as a set of two-letter identifiers. The ISO 639-2 standard followed in 1998 as a set of three-letter codes identifying 400 languages. Published by SIL International, the Ethnologue, an encyclopedic catalog of living languages, had also developed its own three-letter codes in its database since 1971, with their inclusion in the publication itself since 1984 (10th edition). ISO 639-2 quickly became outdated. In 2002, at the invitation of the International Organization for Standardization, SIL International prepared a new standard that reconciled the complete set of identifiers used in the Ethnologue with the identifiers already in use in ISO 639-2, as well as identifiers developed by the Linguist List to handle ancient and constructed languages. Published in 2007, the ISO 639-3 standard provided three-letter codes for identifying 7,589 languages. SIL International was named the registration authority for the inventory of language identifiers.

Published in 2007, the ISO 639-3 standard provided three-letter codes for identifying 7,589 languages.

The first standard to identify languages was ISO 639-1, adopted by the International Organization for Standardization (ISO) in 1988 as a set of two-letter language identifiers.

The ISO 639-2 standard followed in 1998 as a set of three-letter codes identifying 400 languages. The standard was a convergence of ISO 639-1 and the ANSI Z39.53 standard (ANSI: American National Standards Institute). The ANSI standard corresponded to the MARC (Machine Readable Cataloging) language codes, a set of three-letter identifiers developed by the library community and adopted as an American National Standard in 1987.

Published by SIL International, the Ethnologue, an encyclopedic catalog of living languages, had also developed its own three-letter codes in its database since 1971, with their inclusion in the encyclopedia itself from the 10th edition (1984) onwards.

ISO 639-2 quickly became insufficient because of the small number of languages it could handle. In 2002, at the invitation of the International Organization for Standardization, SIL International prepared a new standard that reconciled the complete set of codes used in the Ethnologue with the codes already in use in ISO 639-2, as well as codes developed by the Linguist List -- a main distribution list for linguists -- to handle ancient and constructed languages.

Approved in 2006 and published in 2007, the ISO 639-3 standard provided three-letter codes for identifying 7,589 languages, with a list of languages as complete as possible, living and extinct, ancient and reconstructed, major and minor, and written and unwritten. SIL International was named the registration authority for the inventory of language identifiers, and administers the annual cycle for changes and updates.
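A small illustration of the difference between the standards: ISO 639-1 assigns two-letter identifiers to a limited set of major languages, while ISO 639-3 assigns three-letter identifiers to thousands of languages, many of which have no two-letter code at all. The handful of entries below are real codes, but the lookup table is only a sketch; in practice one would consult the full registry maintained by SIL International.

```python
# A few ISO 639-1 (two-letter) and ISO 639-3 (three-letter) identifiers side by side.
# The real ISO 639-3 inventory covers thousands of languages.
ISO_639 = {
    "English":  {"639-1": "en", "639-3": "eng"},
    "French":   {"639-1": "fr", "639-3": "fra"},
    "German":   {"639-1": "de", "639-3": "deu"},
    "Hawaiian": {"639-1": None, "639-3": "haw"},  # no two-letter code exists for Hawaiian
}

def iso_639_3(language):
    """Return the three-letter identifier for a language name, using the toy table above."""
    entry = ISO_639.get(language)
    return entry["639-3"] if entry else None

print(iso_639_3("French"))    # fra
print(iso_639_3("Hawaiian"))  # haw
```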

2007 > GOOGLE TRANSLATE

[Summary]

Launched by Google in October 2007, Google Translate is a free online language translation service that instantly translates a section of text, document or webpage into another language. Users paste texts into the web interface or supply a hyperlink. The automatic translations are produced by statistical analysis rather than traditional rule-based analysis. Before then, Google used a SYSTRAN-based translator, like Babel Fish on Yahoo! As an automatic translation tool, Google Translate can help the reader understand the general content of a foreign language text, but doesn't deliver accurate translations. In 2009, the text could be read by a speech program, with new languages added over the months.

Released in June 2009, Google Translator Toolkit is a web service allowing (human) translators to edit the translations automatically generated by Google Translate. In January 2011, people could choose different translations for a word in Google Translate.

Launched by Google in October 2007, Google Translate is a free online language translation service that instantly translates a section of text, document or webpage into another language.

Users paste texts into the web interface or supply a hyperlink. The automatic translations are produced by statistical analysis rather than traditional rule-based analysis.

As an automatic translation tool, Google Translate can help the reader understand the general content of a foreign language text, but doesn't deliver accurate translations.
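Statistical translation, in the sense used here, chooses for each source expression the target expression that is most probable according to counts gathered from large bilingual corpora, instead of applying hand-written grammar rules. The toy word-by-word model below, with invented probabilities, only illustrates that idea of picking the most likely candidate; it bears no resemblance to Google's actual, far larger models, and its output also shows why such rough translations convey the gist without being accurate.

```python
# Toy illustration of the statistical idea: among candidate translations learned
# from data, keep the most probable one. The probabilities are invented.
translation_probs = {
    "maison": {"house": 0.7, "home": 0.25, "household": 0.05},
    "bleue":  {"blue": 0.9, "sad": 0.1},
}

def translate_word_by_word(words):
    """Pick, for each source word, the target word with the highest learned probability."""
    output = []
    for word in words:
        candidates = translation_probs.get(word, {word: 1.0})  # unknown words pass through unchanged
        best = max(candidates, key=candidates.get)
        output.append(best)
    return " ".join(output)

print(translate_word_by_word(["maison", "bleue"]))  # "house blue": the gist is there, the word order is not
```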