Technology and Books for All

Chapter 1

by Marie Lebert.


Michael Hart, who founded Project Gutenberg in 1971, wrote: "We consider eText to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to eTexts, especially in schools." (excerpt from a NEF interview, August 1998)

Tim Berners-Lee, who invented the web in 1989-90, wrote: "The dream behind the web is of a common information space in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished. There was a second part of the dream, too, dependent on the web being so generally used that it became a realistic mirror (or in fact the primary embodiment) of the ways in which we work and play and socialize. That was that once the state of our interactions was on line, we could then use computers to help us analyse it, make sense of what we are doing, where we individually fit in, and how we can better work together." (excerpt from: The World Wide Web: A Very Short Personal History, May 1998)

John Mark Ockerbloom, who created The Online Books Page in 1993, wrote: "I've gotten very interested in the great potential the net had for making literature available to a wide audience. (...) I am very excited about the potential of the internet as a mass communication medium in the coming years. I'd also like to stay involved, one way or another, in making books available to a wide audience for free via the net, whether I make this explicitly part of my professional career, or whether I just do it as a spare-time volunteer." (excerpt from a NEF interview, September 1998)

Here is the journey we are going to follow:

1968: ASCII is a 7-bit coded character set.

1971: Project Gutenberg is the first digital library.

1974: The internet takes off.

1977: UNIMARC is set up as a common bibliographic format.

1984: Copyleft is a new license for computer software.

1990: The web takes off.

1991: Unicode is a universal double-byte character set.

1993: The Online Books Page is a list of free eBooks.

1993: The PDF format is launched by Adobe.

1994: The first library website goes online.

1994: Publishers put some of their books online for free.

1995: is the first main online bookstore.

1995: The mainstream press goes online.

1996: The Palm Pilot is the first PDA.

1996: The Internet Archive is founded to archive the web.

1996: Teachers explore new ways of teaching.

1997: Online publishing begins spreading.

1997: The Logos Dictionary goes online for free.

1997: Multimedia convergence is the topic of an international symposium.

1998: Library treasures like Beowulf go online.

1999: Librarians become webmasters.

1998: The web becomes multilingual.

1999: The Open eBook format is a standard for eBooks.

1999: Authors go digital.

2000: is a language portal.

2000: The Bible of Gutenberg goes online.

2000: Distributed Proofreaders digitizes books from the public domain.

2000: The Public Library of Science (PLoS) works on free online journals.

2001: Wikipedia is the first main online cooperative encyclopedia.

2001: Creative Commons works on new ways to respect authors' rights on the web.

2003: MIT offers its course materials for free in its OpenCourseWare.

2004: Project Gutenberg Europe is launched as a multilingual project.

2004: Google launches Google Print, later renamed Google Books.

2005: The Open Content Alliance (OCA) launches a world public digital library.

2006: Microsoft launches Live Search Books as its own digital library.

2006: The union catalog WorldCat goes online for free.

2007: Citizendium is a main online "reliable" cooperative encyclopedia.

2007: The Encyclopedia of Life will document all species of animals and plants.

[Unless specified otherwise, all quotations are excerpts from NEF interviews. These interviews are available online at .]

1968: ASCII


Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1968 by ANSI (American National Standards Institute), with updates in 1977 and 1986. The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), i.e. the ones that are available on the English/American keyboard. Plain Vanilla ASCII can be read, written, copied and printed by any simple text editor or word processor. It is the only format compatible with 99% of all hardware and software. It can be used as it is or to create versions in many other formats. Extensions of ASCII (also called ISO-8859 or ISO-Latin) are sets of 256 characters that include accented characters as found in French, Spanish and German, for example ISO 8859-1 (Latin-1) for French.
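
The character counts above can be checked with a short Python sketch. It is only an illustration: the variable names and the sample word are assumptions made for this example and do not come from any ASCII standard document.

```python
# All 7-bit code points: 0 through 127.
ascii_chars = [chr(code) for code in range(128)]

# The printable characters run from space (32) through tilde (126);
# the remaining code points are control characters.
printable = [c for c in ascii_chars if 32 <= ord(c) <= 126]

print(len(ascii_chars))  # 128
print(len(printable))    # 95

# An accented letter lies outside 7-bit ASCII, but fits in a single
# byte of the 256-character ISO 8859-1 (Latin-1) set.
word = "été"  # sample word chosen for this illustration
try:
    word.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

latin1_bytes = word.encode("iso-8859-1")
print(fits_in_ascii)      # False
print(len(latin1_bytes))  # 3 (one byte per character)
```

The 95 printable characters in this count include the space character.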

[In Depth (published in 2005)]

Whether digitized years ago or now, all Project Gutenberg books are created in 7-bit plain ASCII, called Plain Vanilla ASCII. When 8-bit ASCII (also called ISO-8859 or ISO-Latin) is used for books with accented characters like French or German, Project Gutenberg also produces a 7-bit ASCII version with the accents stripped. (This doesn't apply for languages that are not "convertible" in ASCII, like Chinese, encoded in Big-5.)
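
As a rough illustration of what "accents stripped" can mean in practice, here is a minimal Python sketch. It is not Project Gutenberg's actual tooling, which is not described here: the function names `strip_accents` and `to_plain_ascii` are hypothetical, and the approach (Unicode decomposition, then dropping combining marks) is one common technique among several.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Split accented letters into base letter + combining mark, then drop the marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_plain_ascii(text: str) -> str:
    """Hypothetical helper: return a 7-bit ASCII version of the text.

    Characters with no ASCII base letter are replaced by "?".
    """
    stripped = strip_accents(text)
    return stripped.encode("ascii", errors="replace").decode("ascii")


print(to_plain_ascii("Les Misérables"))  # Les Miserables
print(to_plain_ascii("Über naïve"))      # Uber naive
print(to_plain_ascii("中文"))             # ??
```

The last line shows why such a conversion makes no sense for Chinese: no ASCII base letters remain once the original characters are removed. Letters such as the German "ß" do not decompose either, so a real conversion would need an explicit substitution table rather than this sketch.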

Project Gutenberg sees Plain Vanilla ASCII as the best format by far.

It is "the lowest common denominator." It can be read, written, copied and printed by any simple text editor or word processor on any electronic device. It is the only format compatible with 99% of hardware and software. It can be used as it is or to create versions in many other formats. It will still be used when other formats are obsolete (or are already obsolete, like formats of a few short-lived reading devices launched since 1999). It is the assurance that collections will never become obsolete, and will survive future technological changes.

The goal is to preserve the texts not only over decades but over centuries. There is no other standard as widely used as ASCII right now, not even Unicode, a universal double-byte character encoding launched in 1991 to support any language and any platform.

1971: Project Gutenberg



In July 1971, Michael Hart created Project Gutenberg with the goal of making available for free, and electronically, literary works belonging to the public domain. A pioneer site in a number of ways, Project Gutenberg was the first information provider on the internet and is the oldest digital library. When the internet became popular in the mid-1990s, the project got a boost and gained an international dimension. The number of electronic books rose from 1,000 (in August 1997) to 5,000 (in April 2002), 10,000 (in October 2003), 15,000 (in January 2005), 20,000 (in December 2006) and 25,000 (in April 2008), with a current production rate of around 340 new books each month. With 55 languages and 40 mirror sites around the world, books are being downloaded by the tens of thousands every day. Project Gutenberg promotes digitization in "text format", meaning that a book can be copied, indexed, searched, analyzed and compared with other books. Unlike other formats, the files are accessible for low-bandwidth use. The main source of new Project Gutenberg eBooks is Distributed Proofreaders, conceived in October 2000 by Charles Franks to help in the digitizing of books from the public domain.

[In Depth (published in 2005, updated in 2008)]

The electronic book (eBook) is now 37 years old, which is still a short life compared to the five-and-a-half-century-old print book. eBooks were born with Project Gutenberg, created by Michael Hart in July 1971 to make available for free electronic versions of literary books belonging to the public domain. A pioneer site in a number of ways, Project Gutenberg was the first information provider on an embryonic internet and is the oldest digital library. Long considered by its critics as impossible on a large scale, Project Gutenberg had 25,000 books in April 2008, with tens of thousands of downloads daily. To this day, nobody has done a better job of putting the world's literature at everyone's disposal, while creating a vast network of volunteers all over the world, without wasting people's skills or energy.

During the first twenty years, Michael Hart himself keyed in the first hundred books, with the occasional help of others. When the internet became popular, in the mid-1990s, the project got a boost and gained an international dimension. Michael still typed and scanned in books, but now coordinated the work of dozens and then hundreds of volunteers across many countries. The number of electronic books rose from 1,000 (in August 1997) to 2,000 (in May 1999), 3,000 (in December 2000) and 4,000 (in October 2001).

37 years after its birth, Project Gutenberg is running at full capacity. It had 5,000 books online in April 2002, 10,000 books in October 2003, 15,000 books in January 2005, 20,000 books in December 2006 and 25,000 books in April 2008, with 340 new books available per month, 40 mirror sites worldwide, and books downloaded by the tens of thousands every day.

Whether they were digitized 30 years ago or digitized now, all the books are captured in Plain Vanilla ASCII (the original 7-bit ASCII), with the same formatting rules, so they can be read easily by any machine, operating system or software, including on a PDA, a cellphone or an eBook reader. Any individual or organization is free to convert them to different formats, without any restriction except respect for copyright laws in the country involved.

In January 2004, Project Gutenberg spread across the Atlantic with the creation of Project Gutenberg Europe. On top of its original mission, it also became a bridge between languages and cultures, with a number of national and linguistic sections, while adhering to the same principle: books for all and for free, through electronic versions that can be used and reproduced indefinitely. The second step is the digitization of images and sound, in the same spirit.