Progress in Informatics, No.2, pp. 1-23, (2005)

Research Paper

Access, Claims and Quality on the Internet: Future Challenges

                                                                                      Kim H. VELTMAN

European University of Culture, Paris


The vision of access to human knowledge has existed explicitly at least since the time of Aristotle. In 1934, Otlet outlined a vision of comprehensive access to knowledge. Progress towards this vision entailed initial visions of hypertext, markup languages, the semantic web, Wikipedia and more recently a series of developments with respect to Open Source. A brief survey of these developments is provided.

The rhetoric of the Internet insists that everything should be accessible by everyone at any time. This poses obvious technical challenges and serious philosophical problems of method. If everything is accessible, then how do we separate the chaff from the grain and how do we identify quality? Following a survey of important developments, this essay suggests five dimensions that need to be included in a future web: 1) variants and multiple claims; 2) levels of certainty in making a claim; 3) levels of authority in defending a claim; 4) levels of significance in assessing a claim; 5) levels of thoroughness in dealing with a claim.



Internet Access, Quality, Distributed Repositories, Networks 




2. Hypertext

2.1 Freely Available

2.2 Intranets

2.3 Commercial

2.3.1 Reference

2.3.2 Journals

2.3.3 Books and Publications

2.4 Commercial Infrastructures

2.5 Government Initiatives

3. Semantic Web

4. Wikipedia

5. Open Source

6. Variants

6.1 Names

6.2 Associations

6.3 Attributions

6.4 Claims

7. Certainty

7.1 Direct and Indirect Links

7.2 Degrees of Identity

7.3 Levels of Certainty

8. Authority

9. Significance

9.1 Peer Review

9.2 Citation Indexes

9.3 Automatic Citation Indexes

10. Thoroughness

11. New Criteria for Scholarship

12. Conclusions



The vision of access to the whole of human knowledge is probably as old as mankind. It inspired Aristotle, the Natural History of Pliny, the Summas of Thomas Aquinas, and the Encyclopédie of Diderot and d'Alembert. The 19th and early 20th centuries saw a quest to gain universal access to both primary and secondary literature. In 1934, this vision inspired Paul Otlet to outline the idea of comprehensive access to human knowledge:

"a technology will be created acting at a distance and combining radio, X-rays, cinema and microscopic photography. Everything in the universe, and everything of man, would be registered at a distance as it was produced. In this way a moving image of the world will be established, a true mirror of his memory. From a distance, everyone will be able to read text, enlarged and limited to the desired subject, projected on an individual screen. In this way, everyone from his armchair will be able to contemplate creation, as a whole or in certain of its parts."[1]

By 1943, Otlet had sketched how such a machine to imagine the world might look (machine à penser le monde). In 1945, Vannevar Bush published a similar idea. In 1948, Claude Shannon published a theory of information, which became one of the cornerstones of the Internet. In practical terms, the Internet began in the United Kingdom in 1968 and served as a basis for the U.S. Internet, which began in 1969. In the course of two decades the Internet became a tool for c. 100,000 academic users. The innovations of Tim Berners-Lee and Robert Cailliau (CERN) transformed the Internet into a World Wide Web (WWW). By March 2005, Google provided access to 8,058,044,651 web pages and to 1,187,630,000 images.[2] By the end of 2005, the Internet is predicted to reach 1 billion fixed-line users.

Problems remain with respect to finding meaningful hits, knowing whether they are reliable, and creating new tools and frameworks for searching and filtering that save us from a state of sheer chaos. The good news is that the past decades have brought many initiatives which point to new solutions. Hypertext, the semantic web, Wikipedia and Open Source have brought many positive steps forward. This paper surveys these developments and outlines some of the challenges that lie ahead.

2. Hypertext Developments

One impulse in this direction has come from the computer science community. The article by Vannevar Bush was a direct inspiration for Douglas Engelbart's ideas about collaborative and augmented knowledge. This inspired Ted Nelson to coin the term hypertext, which then made its way into the scholarly community in the 1980s.[3] Slowly the computer science community extended its interests beyond linking to the idea of annotating[4] and a vision of self-annotation.[5]

Much less publicized has been another impulse, which came from the heart of the scholarly community. In the years 1942-1946, Father Roberto Busa, while preparing his doctorate at the Pontificia Università Gregoriana (Rome), conceived the idea of linguistic analysis of Thomas Aquinas using computers. In 1949, the young Jesuit approached the president of IBM about an electronic concordance to the collected works of Thomas Aquinas. In the next two decades he produced an Index Thomisticus of 10 million words, containing every word and every expression of Aquinas. It amounted to 70,000 pages in 52 volumes in published form, and was made available on a single 12" disc.[6] In 1992, Father Busa went on to found the School of Lexicography and Hermeneutics (Scuola di Lessicografia ed Ermeneutica) at the Pontificia Università Gregoriana in Rome.[7]

In the 1960s and 1970s, computers began to have a serious impact in the study of the classics.[8] In 1972, Marianne McDonald set out to transform the Thesaurus Linguae Graecae (TLG) into an electronic data bank of ancient Greek literature. The project was publicly released in 1982, was then made available in CD-ROM format (1985) and in April 2001, it "became available online to subscribing institutions and individuals. The web version currently provides access to 3,700 authors and 12,000 works, approximately 91 million words" (i.e. virtually every surviving ancient Greek text from 800 B.C. to 600 A.D.).[9]

During the 1980s, Standard Generalized Markup Language (SGML) was applied to a number of other major projects such as the Dictionary of Old English (DOE)[10], and the Records of Early English Drama (REED).[11] Yuri Rubinsky,[12] one of the pioneers involved in these projects, founded SoftQuad,[13] partly to explore implications of these developments for publishing. By 1994, his ideas helped inspire new approaches to metadata that led to the Dublin Core initiative and subsequently to the vision of a semantic web. The work on SGML also inspired a group of scholars to found the Text Encoding Initiative (TEI).[14] A quest for simpler versions inspired the evolution of eXtensible Markup Language (XML) and TEI Lite. As a result, SGML, XML and variants are now used in a wide range of academic projects.[15] In Europe, one of the most famous of these is the Thesaurus Linguae Latinae[16] (TLL, founded in 1894 by Theodor Mommsen, the published version of which covers three feet of shelf space).[17] Other important projects are found around the world, such as a complete version of the Buddhist Pali Canon[18] in Korea, or the Emperor's library[19] and all the Classics (800 million characters in Unicode)[20] in China.

2.1 Freely Available Online

Some organisations and individuals, often with funding from national and other bodies, have managed to make their resources available online without cost or for a minimal use fee. Significant examples are the Oxford Text Archive with 2,500 texts online;[21] the Marburg Archive with over 1.5 million photographs[22]; the Perseus Digital Library (Gregory Crane, Tufts)[23]; 900,000 pages of the Max Planck-Institut für Europäische Rechtsgeschichte (Frankfurt, Manfred Thaller)[24]; 130,000 very high resolution pages of Codices Electronici Ecclesiae Coloniensis (CEEC, Manfred Thaller)[25]; the New Media Encyclopedia (Christine van Assche, Centre Pompidou)[26] and Netzspannung (Monika Fleischmann, Fraunhofer).[27] In Germany, the Prometheus[28] project, which provides online access to distributed slide collections for art history, entails a personal subscription fee of €20 annually.

Meanwhile thousands of reference works are becoming available online in a haphazard manner. There have been some attempts at aggregation. For instance, One Look Dictionary Search[29] provides access to over 100 dictionaries including the Merriam Webster English Dictionary. There is a genuine need to make such resources more systematically available in the form of a virtual reference room and to link these resources to emerging digital libraries and virtual agoras for collaborative research and creativity. Eventually such virtual reference rooms will have various levels: namely, a section for reference materials which are freely available, and other sections which are available via different subscription options.

2.2 Available Online via Intranets

In the sciences and particularly in physics, astronomy and chemistry, networks in the form of intranets assure a ready exchange of research (cf. § 2.3.2 below). In the realms of culture, humanities, and social sciences an enormous amount of research is still inaccessible beyond the intranets of the institutions where it is being produced. The well known problem of the last mile, which has often been reduced to a challenge of the last 500 meters or even the last building, remains a stumbling block. A second obstacle remains the practical challenge of interoperability. A third, more insidious barrier is psychological, whereby institutions with terabytes of information in their databases are worried that their materials will be misused or stolen. A major challenge of the next decades lies in ensuring that these important results of research are shared more widely at least within the scholarly community.

2.3 Available Online Commercially

Meanwhile, there has been a trend for the results of scholarly research in terms of reference works, journals and books to come increasingly into the realm of commercial interests.

2.3.1 Reference 

Traditionally scholarly research often led to reference works, which benefited the scholarly community as part of the public good. Such reference works are now increasingly being acquired by commercial companies and made available by subscriptions with an eye on profit. A five year individual subscription to the Thesaurus Linguae Graecae (TLG) costs $400. The Oxford English Dictionary[30] is available online with an annual subscription for individuals at £195+VAT.[31] The Allgemeine Künstler Lexikon of Thieme Becker, a seminal reference work for art history, costs €298.90 annually. Hence, while the good news is that standard reference works are now available online, the bad news remains that they come at costs that are prohibitive for persons wishing to have the equivalent of reference rooms in an online context.

Increasingly such reference materials are being bundled by publishers and private companies. Here, one of the best known examples is Dialog (now owned by Thomson), which offers access to a wide range of databases including "coverage of scientific and technical research reports and publications from more than 150,000 journals. Abstracts to 1.2 million dissertations and more than 2 million conference papers. Pharmaceutical drug pipeline data from conception to launch."[32]

2.3.2 Journals

During the Renaissance, the world of learning was literally a world of letters whereby scholars shared ideas via scholarly correspondence. This world of letters gradually became transformed into scholarly journals. By the mid-20th century it was generally assumed that only a few great libraries would be able to collect the entire range of scholarly journals. The past 50 years have seen both a great diversification of titles and an enormous consolidation, whereby a very small number of major publishers now dominate the field. Rhetorically this was for reasons of efficiency. In practice, prices have continued to rise so dramatically that today not a single library can afford to collect all the journals that exist. The for-profit attitude that was supposed to enable more effective publication is crippling journals as a medium for scholarly communication. As the Association of Research Libraries has noted, in the period 1986-2001, "The typical library spent 3 times as much but purchased 5% fewer titles."[33]
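The ARL figure quoted above implies a steep rise in the average cost per journal title. The following is a back-of-envelope check only, assuming that expenditure rose by a factor of exactly 3 and that the number of titles fell by exactly 5%; the function name is hypothetical and the result is no more precise than the rounded figures in the quotation.

```python
def implied_cost_per_title_factor(spend_factor: float, titles_factor: float) -> float:
    """Return the factor by which the average cost per title changed.

    spend_factor:  total expenditure relative to the start of the period
    titles_factor: number of titles purchased relative to the start
    """
    return spend_factor / titles_factor


# Assumed inputs, read off the ARL quotation: 3 times the spending,
# 5% fewer titles (i.e. 0.95 of the original number of titles).
factor = implied_cost_per_title_factor(3.0, 0.95)
print(f"Average cost per title rose by a factor of about {factor:.2f}")
```

On these assumptions the average cost per title roughly tripled (a factor of a little over 3.1), which is the sense in which libraries paid far more for slightly less.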

The scientific community has begun to take steps towards a new approach. In the 1970s and 1980s Douglas Engelbart and Bruce Schatz envisioned new approaches. By 1990, Stevan Harnad (Princeton University and Université d'Aix-Marseille II) had outlined the potential of making the preprint process accessible electronically.[34] In 1996, Paul Ginsparg, Los Alamos, gave a lecture on electronic publishing in science at UNESCO. In 2002, again at UNESCO, he outlined his vision of Creating a Global Knowledge Network.[35] Here he traced the history of an e-print arXiv (where "e-print" denotes self-archiving by the author), which began in 1991. As of June 2005, this has some 4,000 new submissions monthly and includes 323,889 preprints.[36] "arXiv is an e-print service in the fields of physics, mathematics, non-linear science, computer science, and quantitative biology."[37] Los Alamos has also been experimenting with distributed, open source, search and access methods.[38]

By 2003, the German Max Planck Gesellschaft made a more dramatic announcement: that they would make freely available all the pre-prints of research results from their 83 institutes.[39] In 2003, the Max Planck group also organized an important conference on Open Access to Knowledge in the Sciences and the Humanities that led to a Berlin Declaration:

"to promote the Internet as a functional instrument for a global scientific knowledge base and human reflection and to specify measures which research policy makers, research institutions, funding agencies, libraries, archives and museums need to consider."[40]


Max Planck is also exploring how this online approach to science might be expanded into a European Cultural Heritage Online (ECHO),[41] which it sees as an Open Access Infrastructure for a Future Web of Culture and Science. At the 19th International Codata Conference, Adama Samassékou gave a keynote on Open access for All. A Required Step towards a Society of Shared Knowledge.[42] Directly and indirectly such visions and efforts are inspiring other initiatives to make scholarly content readily accessible. The Scholarly Publishing and Academic Resources Coalition (SPARC)[43] and the Public Library of Science (PLOS)[44] are two examples. In the Netherlands, SURF is sponsoring the Digital Academic Repositories (DARE) project to make Dutch research available online.[45] This also entails six related projects including 'P-Web: a tool for online publication of proceedings' (at the Erasmus Universiteit, Rotterdam).[46] In the U.S., the National Institutes of Health (NIH) have made open access part of their policy.[47] In the United Kingdom, there are plans for open access to results from work supported by the Research Councils UK (RCUK).[48] The Association of Research Libraries (ARL) has a useful site on the issue of open access.[49]


2.3.3 Dissertations and Books

Traditionally knowledge of dissertations was spread via dissertation abstracts and outstanding dissertations were published in a simple format by University Microfilms (1938), and subsequently University Microfilms International (UMI). In 1985, Bell and Howell acquired University Microfilms and the company was renamed ProQuest. In the United Kingdom, Chadwyck Healey, founded in 1973, amassed a number of standard reference works in electronic form including the 221 volumes of Migne's Patrologia Latina.[50] In 1999, ProQuest acquired Chadwyck Healey. By 2000, ProQuest's "Dissertation Abstracts database archived over 1.6 million dissertations and master's theses. Some one million of them are available in full text in print, microform, and digital format."[51]


The trend towards commercialization, which began with reference works and journals, has by now spread to the whole of scholarly production. Small scholarly presses and even university presses are increasingly being incorporated into a handful of large multinational media companies such as Elsevier. In addition to its dissertations, ProQuest's collection now has 5.5 billion page images and adds "37 million images of contemporary information"[52] annually. The jewel in this collection is the Early English Books Online (EEBO), which "contains about 100,000 of over 125,000 titles listed in Pollard & Redgrave's Short-Title Catalogue (1475-1640) and Wing's Short-Title Catalogue (1641-1700) and their revised editions, as well as the Thomason Tracts (1640-1661) collection and the Early English Books Tract Supplement".[53] Early English Books Online is operated by ProQuest in conjunction with the Text Creation Partnership, for which membership costs range from $15,000 for a small undergraduate institution to $60,000 for an Advanced Research Library (ARL).[54] This subscription merely assures participation. Access to copies of individual texts costs members $6 per text. Meanwhile, ProQuest continues to acquire other companies. On 1 March, 2005, ProQuest acquired Explore Learning, "producers of the world's largest online simulation library for math and science education."[55]


These developments are significant for a number of reasons. First, at a very simple level they mean that reference works, which are essential for research, have become a very profitable business for some. In 2004, ProQuest had a gross profit of $232.5 million.[56] An important corollary is that if one's institution is not a subscriber to the Text Creation Partnership and ProQuest's various resources, a scholar is effectively deprived of the entire corpus of Early English printed books, and many of the key reference works, which the past two hundred years of scholarship have painstakingly created. Ultimately this convergence of reference tools, content and educational tools creates a digital divide throughout the developed world as well as so-called developing countries: between those with enough money for expensive subscriptions and those who fall outside this charmed circle.


In the past year, there has been a dramatic new player in this arena. Google's "mission is to organize the world's information, but much of that information isn't yet online. Google Print aims to get it there by putting book content where you can find it most easily – right in your Google search results." On 14 December 2004, Google announced that they were working with the "University of Michigan, Harvard University, Stanford University, The New York Public Library, and Oxford University to scan all or portions of their collections and make those texts searchable on Google."[57] These plans, which entail over 16 million texts in full text, will require at least ten years to be achieved.[58] A spectre of paid access to the Internet looms.[59] As will be noted below (§ 5) this has inspired dramatic reactions in the first half of 2005.


2.4 Commercial Infrastructure

These trends towards convergence are the more disturbing because they are becoming ever more linked with infrastructure developments. In the United States, the plans of the Next Generation Internet (NGI) Initiative[60], Internet 2[61] and the National LambdaRail[62] entail a relatively small number of institutions, which are increasingly intent on acquiring ownership or at least control over network infrastructures as well as the contents used for higher education. The quest for a Next Generation Internet, for Dublin Core Metadata, for Digital Object Identifiers (DOIs); the IEEE's quest to create Learning Object Metadata (LOM)[63]; the Army's efforts at a Sharable Content Object Reference Model (SCORM)[64] are all related and in the eyes of some are part of a single coherent vision.[65]


Critical observers such as John Perry Barlow (Electronic Frontier Foundation),[66] Lawrence Lessig (Creative Commons),[67] and Clifford Lynch (Coalition for Networked Information) have warned that the spectre of trying to control the whole of education and all access to knowledge goes much deeper: e.g. plans to create books that self-destruct after a few readings; to forbid reviews and even to forbid reading books out loud.[68] The same technologies that could provide us with universal access could be used to limit our access more than ever before. Learning, which was once a challenge of ability, is increasingly becoming limited to those who are wealthy enough to afford it, in a world where an ever smaller number can afford more, and the overwhelming majority can afford ever less.


These dangers go far beyond inconveniences. Michael Giesecke (1994), in his standard book on the history of printing, noted that Gutenberg's real contribution lay not in the technology but rather in a decision to use printing for the common good.[69] Giesecke suggested that it was ultimately this attitude towards sharing knowledge that was a key to the modern world as we know it today. Independently, Jean-Claude Guédon (2001)[70] made a related claim when he noted that the breakthroughs of early modern science resulted from a spirit of sharing knowledge through learned societies and academies and that trends towards commercialization of knowledge now threaten the advancement of learning. Others such as Philippe Quéau have gone further still to insist on knowledge as a Public Good and speak of a global common good (le bien commun mondial).[71] To neglect that public good endangers progress and indeed the very survival of civilization.

2.5 Government Initiatives

Governments have begun to recognize these dangers and to take action. The most conspicuous example thus far has been the United Kingdom, where the JISC (Joint Information Systems Committee) has pioneered collective licences that give its member institutions access to projects such as the Early English Books Online (EEBO) mentioned earlier. This means that at least those in universities and a number of Higher Education (HE) institutions again have normal access to basic reference works and content. It is important to recognize, however, that this step does not solve the problem. The problem of access by the majority of citizens who are not connected to these university intranets remains.

In addition, the JISC is sponsoring some 135 current projects including: Building a Virtual Research Environment for the Humanities (BVREH)[72]; Collaborative Stereoscopic Access Grid Environment (CSAGE); Digital Libraries for Global Distributed Innovative Design (DIDET) and a Virtual Research Environment for the History of Political Discourse 1500-1800. These are part of a larger vision in the direction of grids and an e-Science[73] strategy, whereby the UK hopes to provide a model for next generation information access for Europe and beyond. 

By comparison, in France, the Maisons des Sciences de l'Homme (MSH)[74] are making contributions in bridging fields such as archaeology, anthropology, ethnography and social sciences, but on a much more limited scale. Meanwhile, former President Mitterrand's plans for the new Bibliothèque Nationale de France (BNF, 1988-1994)[75] included a vision of access to full-text contents. As a result the BNF's Gallica project has made 76,000 books and 80,000 images available online.[76] Google's announcement in December 2004 radically altered the dimensions of this vision.

In early March 2005, the Director of the BNF urged "European governments to join forces and set up a digitization plan that would be a European response to Google Print."[77] By 17 March, 2005 President Chirac had given the go-ahead for a French project.[78] By 22 April, 2005, 19 National Libraries had signed an agreement that they were willing in principle to work together.[79]

"On 28 April 2005 6 EU countries sent an open letter to the European Commission and the Luxembourg Presidency of the Council asking for a European digital library. Inspired by the French president Jacques Chirac, the presidents or prime ministers of Poland, Germany, Italy, Spain and Hungary have signed the letter. On 3 May 2005 the European Commission responded with an announcement that it will boost its policy of preserving and exploiting Europe's written and audiovisual heritage. The Commission plans to issue a communication by July outlining the stakes involved and identifying the obstacles to using written and audiovisual archives in the European Union. The communication will be accompanied by a proposal for a Recommendation aimed at enlisting all the public players concerned and facilitating public-private partnerships in the task of digitising the European heritage."[80]


As preliminary steps the efforts of the G7 pilot project, Bibliotheca Universalis, the Gateway to European National Libraries (GABRIEL) and the EC project The European Library (TEL)[81] are being co-ordinated within the BNF. EU Commissioner Viviane Reding's i2010 vision is supporting these trends.[82] The vision of a European Digital Library has now become one of three flagship projects for the next five years.[83]

The plans thus far foresee a massive project that entails scanning four billion pages of text. The great open question remains whether these materials collected with the aid of public monies will be made available freely for the use of all citizens. If they are made readily accessible, then the way will be open for a notebook for mankind on a scale that dwarfs previous efforts towards the semantic web, towards a Wikipedia and towards Open Source. Even so it will be fruitful to survey briefly these existing initiatives and outline their limitations before considering requirements for new authoring and search tools that overcome these limitations.

3. Semantic Web

As noted earlier, the WWW has made wonderful contributions in the domain of sharing knowledge. Thanks to its markup languages and protocols over 8 billion pages are now accessible online. If the quest for a semantic web were committed to semantics in the sense of "meaning," it could theoretically also address a) challenges of separating significant grains amidst the chaff of loose comments and b) elusive problems of quality. The W3 Consortium has, however, set itself a narrower goal. It is concerned primarily with machine-machine communication. As a result, it is focussed specifically on logical statements which can be verified within the binary logic of machines as either true or false. This is of enormous value in business, where the validity of orders, accounts and transactions needs to be verified and certified. Hence, the goal of the W3 might more accurately be described as a quest for a transaction web.[84]

While both very useful and profitable, the present goals of the WWW focus only on meaning insomuch as it entails logical claims which are not open to ambiguity. Hence their quest does not address directly the needs of scholarship, where multiple meanings and ambiguities play a central role in interpretation and hermeneutics.


This is not ultimately a limitation of technology, but a conscious decision of the shapers of the technology to limit its applications.[85] Underlying this decision are deeper problems that face the computer science community as a whole. There is a fundamental assumption that the quest must be to reduce all meaning to operations which can be dealt with by machines in the absence of humans rather than a quest to use machines to record and communicate the complexities of meanings that have been developed and used by humans.


A handful of computer scientists have acknowledged this problem. Hence, Joseph Weizenbaum warned of the dangers of believing that machine-machine communication could replace human decision making,[86] as did Grant Fjermedal,[87] and Fred Brooks spoke of a need for Intelligence Augmentation rather than Artificial Intelligence (IA not AI), but for the most part this approach has been ignored. Machines and software which can be used to extend the range of man's meanings risk becoming limited systems caught in the tautologies of their own logic systems.


Indirectly, the quest of the W3 to create frameworks that verify and authenticate users, confirming that they are who they say they are, can be as useful in the world of scholarship as in the world of business. We need, at times, to be certain that the person who sends a claim is indeed identical with the originator of the claim, or else be informed whether and how the message has remained intact in going via intermediaries. Ultimately, linking and hyper-linking can only be truly fruitful if they can bring us back, if necessary, to the original sources of content and claims.
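One part of this requirement, knowing whether a claim has remained intact on its way through intermediaries, can be illustrated with a cryptographic digest. What follows is a minimal sketch only, not a description of any W3 framework: it assumes that the originator publishes a SHA-256 digest of the claim through some trusted channel, and the function names and sample claim are hypothetical. It checks integrity alone; establishing that the sender is the originator would additionally require digital signatures.

```python
import hashlib
import hmac


def claim_digest(claim: str) -> str:
    """Return the SHA-256 digest (hex) of a claim's UTF-8 text."""
    return hashlib.sha256(claim.encode("utf-8")).hexdigest()


def is_intact(received_claim: str, published_digest: str) -> bool:
    """Check a received claim against the digest published by its originator."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(claim_digest(received_claim), published_digest)


# Hypothetical example: the originator publishes the digest of a claim.
original = "Manuscript A is attributed to author B."
published = claim_digest(original)

# A copy altered by an intermediary no longer matches the published digest.
altered = "Manuscript A is attributed to author C."
print(is_intact(original, published))
print(is_intact(altered, published))
```

The unaltered copy matches the published digest and the altered copy does not, so a reader can at least detect that a change occurred, though not how or by whom.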

Some members of the semantic web community, or more precisely, some communities concerned with the semantic web, are indeed concerned with meaning in a broader sense. For instance, those concerned with digital libraries are interested in creating standardized and interoperable thesauri. This is very important. Without clarity with respect to definitions of words and terms, there can be no certainty that we are even speaking about the same topic. To arrive at a full range of meanings, however, we need to have access to the equivalents of etymological dictionaries such as Oxford and Grimm, and these need to be linked such that we can seamlessly compare changing meanings across languages. We have classification systems, dictionaries and encyclopaedias. We need to link these such that we can go from a term to its definition and explanation. We need virtual reference rooms that provide much more than access to individual texts: we need to provide a new network of links between and among terms in reference works.
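The idea of going from a term to its definition and explanation, across reference works and languages, can be pictured as a small data structure. The sketch below is purely illustrative: the class names, the three kinds of reference work and the placeholder entries are assumptions introduced here, not an existing system or real dictionary content.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    work: str      # label of a reference work (placeholder, not a real citation)
    kind: str      # "classification", "dictionary" or "encyclopaedia"
    language: str  # language code of the entry
    text: str      # the definition or explanation (placeholder text here)


class ReferenceRoom:
    """Toy model of a virtual reference room linking terms to entries."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)     # term -> entries about it
        self._equivalents = defaultdict(set)  # term -> equivalent terms

    def add(self, term: str, entry: Entry) -> None:
        self._entries[term].append(entry)

    def link(self, term_a: str, term_b: str) -> None:
        """Record that two terms (e.g. in different languages) correspond."""
        self._equivalents[term_a].add(term_b)
        self._equivalents[term_b].add(term_a)

    def lookup(self, term: str, kind: Optional[str] = None) -> List[Entry]:
        """Return entries for a term and its directly linked equivalents."""
        terms = sorted({term} | self._equivalents.get(term, set()))
        found = []
        for t in terms:
            for entry in self._entries.get(t, []):
                if kind is None or entry.kind == kind:
                    found.append(entry)
        return found


room = ReferenceRoom()
room.add("perspective", Entry("English dictionary (placeholder)", "dictionary", "en", "placeholder definition"))
room.add("perspective", Entry("Encyclopaedia (placeholder)", "encyclopaedia", "en", "placeholder explanation"))
room.add("Perspektive", Entry("German dictionary (placeholder)", "dictionary", "de", "Platzhalter-Definition"))
room.link("perspective", "Perspektive")

for entry in room.lookup("perspective", kind="dictionary"):
    print(entry.language, entry.work)
```

Here a single lookup returns the dictionary entries in both languages, while the encyclopaedia entry remains one step away. A real reference room would also need to record changing meanings over time, which this sketch omits.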

4. Wikipedia

The contributors to the Wikipedia[88] have also made great contributions to content on the web. Their commitment selflessly to add content for the greater good recalls a time-honoured mediaeval tradition that contributed greatly to the transmission of existing knowledge and considerably to the introduction and growth of new knowledge. 


Traditionally, encyclopaedias attempted to summarize the state of knowledge at the time: Pliny, Vincent of Beauvais, Saint Thomas Aquinas and the Encyclopédie of Diderot and d'Alembert are notable examples.[89] The Encyclopaedia Britannica continued this tradition until its 1911 edition. Thereafter it abandoned the quest for completeness. In 1992, Encyclopaedia Britannica introduced a distinction between a general Micropaedia and a Macropaedia for more detailed knowledge. These terms were patented and by 1995 the editors of a Free Internet Encyclopedia were advised that they could not use them.[90] Accordingly the terms were changed to "MicroReference" and "MacroReference" respectively.


As long as encyclopaedias were committed to recording the state of the art, it could be assumed that they covered the major literature or at least indicated the major surveys and reviews in a given field. Pauly-Wissowa's Realencyclopädie der classischen Altertumswissenschaft is a wonderful example of both the value and the dilemmas of such a quest. The two original authors died before the first edition was finished in 1852. A second edition began in 1861 and 1866 but remained unfinished. A third edition began in 1890 but took until 1980 to produce 84 volumes plus indexes and by then was too expensive for most individual scholars.[91]


One of the fundamental problems with the Wikipedia at present is that there is no way of knowing how thoroughly a given article covers the topic in question. Some topics provide bibliographies, some do not. Some topics acknowledge using the 1911 edition of the Encyclopaedia Britannica, others do not. Critical tools concerning variants, certainty, authority, and significance are lacking.


5. Open Source and Open Content


In Europe, there has been increasing attention to the possibility of open software,[92] which is made freely available without direct cost to individual users. This vision is linked with an open vision of intellectual property.[93] By contrast, in the United States, Open Source has been linked with the notion of free software but free in the sense of "freedom" more than free in the sense of "without payment". The influence of Richard Stallman has been seminal in this context.


Domain                   Open Source Solution
Office                   Open Office[94]
Photoshop                Gimp[95]
Illustrator              Inkscape[96]
Maya                     Blender[97]
Premiere                 Jahshaka[98]
Map software             Worldwind[99]
Bibliographies           Sourceforge[100]


Figure 1. Examples of emerging open source solutions.

Stallman's GNU project officially began in 1984 although he has traced its roots back to 1971.[101] With the introduction of Linus Torvalds' Linux in 1992,[102] the Open Source Software movement began to take on new dimensions. The past five years have seen three fundamental developments: a) a dramatic increase in the range of open source tools (figure 1)[103] (for instance, the Framasoft list now includes 905 examples of free software[104]); b) a related shift to include open source content through projects such as the Creative Commons,[105] the Open Content[106] and the Open Archives Initiative (OAI);[107] and c) discussions re: the future of scholarly communication[108] and the rise of new models for scholarly communication.[109]


Open Source, which was once seen as an interesting peripheral phenomenon, is increasingly being adopted for crucial functions such as government administration[110] and even key commercial software. In June 2005, for instance, Nokia and Apple announced that they would use open source for their new mobile web browser.[111] The concept of open source software is being extended to include open content, open theory[112] and even open design.[113] The profound advantage of Open Source is that it offers new bridges across previously closed, proprietary solutions and thus potentially ushers in new levels of interoperability across applications and systems.


The Wikipedia and Open Source are making enormous contributions and reflect the latest contributions of the World Wide Web, and a vision that goes back to the first half of the 20th century. These achievements are magnificent. Yet since the early days of Artificial Intelligence in the 1950s and throughout the emerging vision of a semantic web, there has been an underlying assumption in the computer science community that everything is either true or false; that the either/or approach of simple logic offers a sufficient model; and that ultimately the quest is simply to document true statements.


Truth is the ideal and the quest to achieve it must remain paramount. Nonetheless, the realities of the physical world and especially of the human world are more complex. Even in the world of engineering the limits of truth are recognized through the notion of tolerances, which may be in terms of millimeters, sometimes much less and frequently much more.


In order to extend the potential usefulness of the semantic web to the semantic meanings of humans we need more than the logical propositions of either/or statements. Pioneers in computer science have rightly pointed to the need for sense-making tools through machines, but we also need to use machines to provide us with access to the senses of meanings that have already been provided by humans. To this end, we consider five new kinds of challenges, which have hitherto not been practical: 1) methods for integrating variants; 2) levels of certainty in making a claim; 3) levels of authority in defending a claim; 4) levels of significance in assessing a claim and 5) levels of thoroughness in supporting claims re: extant knowledge in a field. All five of these are important ingredients in a quest for discerning quality.


In some fields of science, where the emphasis is only on the latest results and discoveries, the methods here proposed may readily seem like an unnecessary amount of baggage. In the humanities, or more accurately, the human sciences (scienze umane, les sciences humaines), which include social sciences, ethnology, anthropology and archaeology, the situation is much more complex. 


First and foremost, the crucial insights are about views which may not be universally accepted and yet remain fundamentally important. The writings of Dante or Milton, which are effectively commentaries on the Bible in the form of the Divine Comedy (La Divina Commedia) and Paradise Lost, are expressions with an intrinsic value which is independent of whether this be the "right" or "true" interpretation of the Bible. They are an essential part of the literary tradition of Europe, for reasons similar to why the Tale of Genji is central to the literary traditions of Japan. A significant part of the richness of these texts lies not in the texts themselves, but in the enormous amount of further editions, texts, commentaries and other expression that they have generated over the ages.


This has two fundamental consequences. First, we cannot understand Dante's or any author's importance simply through reading that author. We need access to the editions, commentaries and the rest. Second, it follows that the latest edition is not always the best in the way we assume that the latest finding is always the best in the scientific world. There is a cumulative dimension to knowledge, especially in the human sciences. This helps to explain why even the supremely ambitious head of Google who wants to gain access to all knowledge acknowledged in a recent interview that he estimated it would take 300 years.


Hence, this paper suggests adapting critical instruments which the scholarly and especially the library community has developed, in order to make the cumulative dimensions of knowledge part of our research programmes. We are concerned with vast new areas of human knowledge that need to be included within the semantic web, if we wish to have something that is truly useful for scholars as opposed to simply a tool for transactions which are necessary for business.


The representatives of memory institutions, especially librarians, rightly insist that they have been working in this direction for millennia. It is true that they have created invaluable tools for bibliographic control. Yet the vision of a library has traditionally been to record the books, documents and materials it possesses, rather than indicating to what extent its collection reflects what is known about a person, a field or a discipline. To take a simple example: the Library of Congress catalogues tell us how many editions of Shakespeare or Goethe they have, but nothing about the extent to which their collection is a comprehensive one. Similarly Google and other search engines tell us how many hits there are, but give us not the slightest hint as to what percentage this represents of what there is to be known on that topic.


So one challenge lies in using the tools of bibliographic control from memory institutions and especially the library world and applying them to visions of the semantic web. A more subtle challenge lies in creating new frameworks that attempt to map not just isolated collections, but rather the extent to which any given collection or any given claim represents the state of knowledge about that person, field or discipline.    


The five elements considered in this paper do not solve in a single stroke a problem that will take centuries to resolve. Even so they represent ingredients for going from a mentality of simply describing what we have in our collections to frameworks whereby we can see to what extent these collections represent what is known in the field. In the human sciences this means access to more than the standard versions of names and standard versions of the facts. It necessarily means including those claims which are almost certainly true, probably true, or uncertain. At present these materials exist in secondary literature in our collections, but our bibliographic tools are about finding titles rather than contents. In the long term, we need new kinds of bibliographic instruments that will provide access not just to contents, but also to knowledge about the claims made in those contents.


6. Variants and Attributions 


In an ideal world, scholarship is limited to eternal truths. In everyday life, many items are straightforward questions of true or false. Obviously in citing another work, the name of the author, the title and the date need to be precisely correct, otherwise they are wrong and misleading. In many cases, however, the situation is not so straightforward. We need to incorporate variant names, associations, attributions and claims. In many cases this knowledge/information is already being recorded in databases. The innovation that is being discussed here is how this knowledge/information is integrated into our search, retrieval and other tools.


                   6.1 Names




The most obvious of these entails different spellings of a given name. For much of the 20th century there was a conviction that if one could establish a standard version this could serve as an authority file and be adopted by or simply imposed on others. Library systems have complex systems for Machine Readable Cataloging (MARC and now MARC 21), which duly reflect standard and variant names. Ironically the potentials of this information are often not exploited fully even by the libraries themselves. In terms of everyday users such systems are not available and many experts would argue that even if they were available they would be much too complex to be used by "the man on the street," the non-expert.


Gradually there has been a recognition that these alternative names are effectively access points to earlier documents which were unaware of the current accepted spelling. So there is a new challenge to create online authority files with all possible variants built in. These lists can be online and freely available to users.


Here a first step lies in making alternatives visible to the user. Libraries have provided a partial solution through see also references, but although their internal catalogues and databases record all the variants used by an institution, this material is typically not available to users. In search engines the situation is worse. For instance, Google sometimes offers a guess but typically does not deal with variants. Hence a user who types in Martianus Cappella gets 100 hits with no clue that this is a variant spelling of Martianus Capella which gets 17,900 hits. In the Google approach as it is today these are two separate searches. By integrating lists of variants into search engines, entering a variant name effectively becomes part of the same search as a search using the standard name. Doing so would not require the user to do something more complex, but the results would be much richer.[114] 
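The integration of variant lists into a search can be sketched in a few lines. The following is only a minimal illustration under stated assumptions: the names `AUTHORITY_FILE`, `build_variant_index`, `expand_query` and `search` are hypothetical, the authority file is a toy dictionary rather than a real MARC authority record, and matching is plain case-insensitive substring matching.

```python
# Hypothetical authority file: accepted name -> known variant spellings.
# The two entries reuse names mentioned in this paper; the structure is an assumption.
AUTHORITY_FILE = {
    "Martianus Capella": ["Martianus Cappella"],
    "Jean Pélerin": ["Viator"],
}


def build_variant_index(authority_file):
    """Map every spelling (accepted or variant) to its accepted name."""
    index = {}
    for accepted, variants in authority_file.items():
        for name in [accepted, *variants]:
            index[name.casefold()] = accepted
    return index


def expand_query(name, authority_file, index):
    """Return all spellings that should be searched together for `name`."""
    accepted = index.get(name.casefold())
    if accepted is None:
        return [name]  # unknown name: search it exactly as typed
    return [accepted, *authority_file[accepted]]


def search(name, documents, authority_file, index):
    """Return the ids of documents that mention any spelling of the name."""
    spellings = [s.casefold() for s in expand_query(name, authority_file, index)]
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(s in text.casefold() for s in spellings)
    )
```

With such an index in place, a query for the variant and a query for the accepted name return the same result set, and the user types nothing more than before.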


A prototype of the SUMS system illustrates the possibilities.[115] A user can type in a variant name such as Viator, arrive at the acknowledged modern name, Jean Pélerin, and see the other known variants.


A second step in this development would be to provide users with tools to add further variant names as additional alternatives to the accepted authority names. Users could do so on a simple proviso: that they provide at least one historical document that uses the variant in question. This variant and its source would then become a regular part of the system. In using this method, non-expert users would be spared the deliberations of which variant to use. The system provides it for them. The variants remain accessible at the database level. Hence, even if the user forgets the official spelling next time round, the variants bring him/her back to the currently accepted version. Persons working in specific fields can work with subsets of the master lists to ensure that their tools match the complexity required, while saving them from unnecessary complexity.
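The proviso that every contributed variant must come with a source can be enforced at the level of the data structure. The sketch below is an assumption-laden illustration, not a description of the SUMS prototype: the class `AuthorityRecord` and its methods are hypothetical names, and a source is represented as a free-text citation string.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One accepted name plus user-contributed variants (hypothetical structure)."""

    accepted_name: str
    # variant spelling -> list of citations of documents that use it
    variants: dict = field(default_factory=dict)

    def add_variant(self, variant, source):
        """Accept a variant only if at least one attesting document is cited."""
        if not source or not source.strip():
            raise ValueError("a variant needs at least one historical source")
        self.variants.setdefault(variant, []).append(source.strip())

    def resolve(self, name):
        """Return the accepted name if `name` is it or a known variant, else None."""
        known = {self.accepted_name.casefold()}
        known.update(v.casefold() for v in self.variants)
        return self.accepted_name if name.casefold() in known else None


# Usage: the citation text here is a placeholder, not a real reference.
record = AuthorityRecord("Jean Pélerin")
record.add_variant("Viator", "placeholder citation of a document using this form")
```

Because the source is stored alongside the variant, the list of variants doubles as a list of access points to the earlier documents that used them.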


Traditionally the quest for authority files was the domain of a very small group of librarians and terminology experts. Much of their effort lay in trying to impose a given version and spelling of a term and trying to eradicate other forms, versions and variants. This was partly a necessity imposed by the limitations of the print medium. With electronic media, a new challenge looms: using one accepted version and linking this with all known versions. Variants now become tools for access rather than threats to the norm, and finding new variants can in fact become a task to which all users can contribute.


                   6.2 Classifications and Associations


Connected with this theme of variant names are the variant classifications, thesauri and associations provided by previous attempts at systematic organization. Some links between thesauri, classification systems and titles exist already. The challenge is to deal with these much more systematically. There are a number of efforts which point in this direction such as multilingual classifications in the medical field[116] and classifications in the legal field.[117] Ideally one would be able to move systematically between subsumptive, determinative and ordinal relations.[118] One can imagine a system that allows users to choose whether they wish to deal only with physical instances (particulars) or also include various kinds of metaphysical entities (universals). In the simplest case this would entail an option between seeing a general universal version of, say, an elephant and seeing particular examples of elephants. In more complex cases there would be a possibility to distinguish among different levels of metaphysical existence: e.g. belief, phantasy, play, scenario and fiction at different levels of subsumption, some of which may not have direct correspondences in the physical world. Needed eventually is a classification of knowledge whereby we can see which subsumptive classes have a direct physical-metaphysical link and which do not. Ultimately we need new tools that allow us to go from subsumptive to determinative and ordinal relations. 3-D visualizations such as that provided by Spectasia can help in providing overviews of such classes.


These variants are often linked with simple numbers: e.g. the 4 seasons, 7 days, 8 winds, 9 muses, 12 months etc. A system which allows us to move seamlessly from any one term to other terms in a given set would be extremely useful by way of orientation. By providing visual overviews of these associations, one can use their basic spatial positions as entry points into thought systems which have different names and associations as one moves from one culture to another. The 3 cardinal virtues and 4 points of the compass are the simplest examples. The 10-12 signs of the zodiac are one step more complex. The 44 constellations of the Northern Hemisphere are a more complex example. One can imagine an interface that uses fundamental images such as the world tree, which can be viewed as the Milky Way in astronomy, a physical tree of life in botany and the spine in anatomy. By choosing levels in the microcosm-macrocosm analogies one could move through these levels by a simple pointing metaphor. From this point of departure various expressions in different cultures could then be visualised in alternation.


Some of these variant associations are not spatial. To take an example from religion, which has been the source of most great cultural expressions until the past century: the Virgin Mary is universally known in the West. Some call her Star of the Sea. There are over seventy such alternative names,[119] many of which are potentially useful in increasing the range of our search. Such an approach becomes essential when we are searching in other cultures. For instance, the Indian Mother Goddess, Durga, has 108 names.[120] Such lists are effectively like mini-specialized thesauri applied to a given deity, person, idea or concept. The unfamiliar associations of a name may seem strange and eccentric to us, but if treated systematically these again offer new entry points into knowledge. Instead of seeing them as aberrations from what we know we need to discover their potentials in helping us discover what we do not know. Else the legitimate quest for standards in computer science risks becoming a closed community of the familiar and the known rather than a tool towards discovery of the unfamiliar and the unknown.


Genealogical lists are another example of such contextualizing instruments. Hereby, a single name provides access to a range of related persons. Needed are frameworks, whereby these existing lists are made available to us as we embark on more serious searches. Such lists are again like classification systems and thesauri. They need to be linked with definitions (dictionaries), explanations (encyclopaedias), and titles (catalogues, bibliographies) elsewhere. All this belongs to the domain of virtual reference rooms (levels 1-5 in figure 2). 


Librarians, especially cataloguers, indexers and classification experts, are, of course, fully aware of the existence of such lists. As among their colleagues in the realms of computer science and artificial intelligence, there have been enormous disagreements and wranglings about which system is ontologically true. Like all quests for truth, this quest has its uses and the search for a new and better system should continue.


In the meantime, our pragmatic suggestion is that, if we leave aside debates about which system is best, and focus rather on links between existing systems, there is enormous insight to be gained using the associations of the past as a tool for searches in the future. Instead of aiming at machines that will replace the rich ambiguities of human associations with unequivocal commands, perhaps we should aim to create machines that use these rich ambiguities of historical documents as a source for new access to our past and our present.  


Present systems such as Google assume that we know the detailed words necessary for the search, and indeed if we happen to know these then Google works surprisingly well. The problem with real research is that we usually do not know the important terms when we embark on our study. Being able to call on existing associations of earlier experts offers a way to go further. Implicit here is a notion that interfaces are something much more than physical screens. They should include the mental screens of earlier and existing experts. Their ways of organizing knowledge can serve as orientation tools in our own voyages of discovery.

6.3 Attributions


In the exact sciences only the latest version of an attribution is usually important. By contrast, in the humanities the cumulative history of attributions is potentially important. The latest claim is not always the best. Even if the latest is the best it is frequently still not definitive. For instance, in the case of  a painting, one scholar may claim a) that the painting is by Leonardo, another may claim b) that it is by his pupil while a third claims c) that it belongs to his workshop.


An either/or mentality from computer science which creates a single category for creator/author in the Dublin Core Framework provides space for only one of these claims and in the process obscures the truth that in this case there exist debates on the question of precise attribution for this painting. This almost banal example illustrates how an overzealous quest for precision can be as misleading as it is meant to be helpful. Needed are tools a) to distinguish between these various claims re: attributions and b) to aggregate automatically the cumulative claims of the research literature such that we can see at a glance, for instance, that 3 scholars believe a given painting to be by the master himself, 15 claim that it was done by pupils; 7 claim that it belongs to his workshop and 2 feel that it merely belongs to his school.         
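The two tools called for here, recording competing attributions and aggregating them at a glance, can be sketched as follows. This is a minimal sketch under assumptions: `AttributionClaim` and `summarize` are hypothetical names, the category labels are free text, and the rule that a scholar's most recent claim is the one counted is a design choice of this sketch rather than something established above. All individual claims remain in the list, so the cumulative history is not lost.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributionClaim:
    work: str      # the object being attributed
    scholar: str   # who makes the claim
    category: str  # e.g. "master", "pupil", "workshop", "school"
    year: int      # year of the publication making the claim


def summarize(claims, work):
    """Count how many scholars currently hold each attribution for `work`."""
    latest = {}
    for claim in sorted(claims, key=lambda c: c.year):
        if claim.work == work:
            # a later publication by the same scholar supersedes an earlier one
            latest[claim.scholar] = claim.category
    return Counter(latest.values())
```

A catalogue record could then display the counts per category beside the creator field instead of a single name.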


One reviewer has reasonably suggested that such a criticism of Dublin Core is excessive: that this is an implementation issue and that it "can be handled by incorporating some mechanism in using RDF, for example." Of course there are ways that the matter could be handled using new tools. But everyday humans will never want to write all their essays in RDF, nor do they have time to explain how to convey the subtleties of 4,000 years of commentaries in a form that is made for machines. A real challenge for the computer science community is to create solutions that reflect what users do rather than what computer scientists assume they need to do.


                   6.4 Alternative Claims


This principle extends also to alternative claims and interpretations. One scholar may claim that Galileo invented the telescope; or that Newton was the founder of early modern science. Hence, our records concerning an individual should provide systematic access both to an author's writings (primary literature) and to the (secondary) literature about those writings. We should be able to trace such publications alphabetically, chronologically and geographically. Ideally we would in addition be able to trace such publications in terms of both their attributions and claims.


As the 19th century made an increasing distinction between primary and secondary literature, especially in fields such as theology, philosophy and (English) literature, the initial emphasis was to focus on studies of texts and objects in isolation (das Ding an sich). This led to new levels of analysis in the form of interpretation, hermeneutics, close reading and criticism. In the course of the 20th century so much attention was given to these problems by the de-, re- and post-constructivist schools that the cumulative, contextual dimensions of knowledge often faded into the background. Meanwhile, three further levels of analysis came into focus, which amounted to new layers within the notion of secondary literature (levels 8-10 in figure 2). A first of these (level 8) related to comparisons (Comparative Studies, Parallels, Similarities); a second (level 9) entailed interventions in extant objects (Conservation, Restorations); a third (level 10) entailed studies of non-extant objects (Reconstructions).


Virtual Reference Room
1. Terms                              Classifications, Thesauri
2. Definitions                        Dictionaries
3. Explanations                       Encyclopaedias
4. Titles                             Bibliographies, Catalogues
5. Partial Contents                   Abstracts, Reviews

Primary Literature in Digital Library
6. Full Contents

Secondary Literature in Digital Library
7. Texts, Objects                     Analyses, Interpretation
8. Comparisons                        Comparative Studies, Parallels, Similarities
9. Interventions in Extant Object     Conservation, Restorations
10. Studies of Non-Extant Object      Reconstructions

Future Secondary Literature (Virtual Agora)
11. Collaborative Discussions of Contents, Texts, Comparisons, Interventions, Studies
12. E-Preprints of Primary and Secondary Literature in Collaborative Contexts

Figure 2. Virtual Reference Room, Distributed Digital Libraries and Virtual Agoras with different levels of reference and secondary literature.


Needed are systems that allow us to distinguish between these different kinds of reference and secondary literature. At present library catalogues provide us with titles of books and classification systems provide access to the subjects and concepts, but most of these systems still reflect an approach to knowledge via disciplines.


As a result studies of the Parthenon as a building are classed under history of architecture; studies of the Parthenon's location are classed under geography; studies of restorations are usually classed under conservation; studies of reconstructions as to how it once looked are classed under art history, architecture or history. But aside from titles which happen to contain the word Parthenon, there is nothing to help find everything known about the Parthenon. The approach here suggested overcomes that limitation.


With the rise of new collaborative environments, virtual agoras can serve as a drafting ground for future secondary literature (levels 11-12 in figure 2).

An emerging challenge lies in integrating new personal and collaborative knowledge with the frameworks of enduring knowledge of memory institutions while acknowledging that they are not identical. Digital libraries will thus entail much more than scanning in printed texts: they will have at least three closely coupled features: virtual reference rooms; distributed digital libraries of primary and secondary literature and virtual agoras for collaborative research and creativity. The different levels (1-12 in figure 2) can also be seen as a knowledge life cycle: i.e. reference works point to primary literature, which inspires secondary literature, which prompts collaborative discussions (virtual agoras including discussion groups, blogs, Really Simple Syndication (RSS)[121] etc.), which in turn lead to new primary and secondary literature.


This model implies that a collective notebook for mankind can effectively be an extension of the systems from collective memory institutions[122] (libraries, museums and archives), a way of adding commentaries on a cumulative body of knowledge. This means that the critical apparatus of authority files (standard names and variants) and official terminology established by these institutions can be linked directly with personal variant names and personal terms of users at different levels of professional activity. By implication, the frameworks and search mechanisms already in place for enduring knowledge in memory institutions can be extended to the realms of new personal and collaborative knowledge in the Internet. To be effective this approach needs to include levels of certainty in making a claim (§ 7), levels of authority in defending a claim (§ 8), levels of significance in assessing a claim (§ 9) and levels of thoroughness in supporting a claim (§ 10).


7. Levels of Certainty in Making a Claim


Ever since the advent of hypertext with Douglas Engelbart and Ted Nelson the emphasis has been on linking. Like all important ideas this built on earlier traditions. Footnotes and references were also concerned with linking. Electronic hypertext links introduced two fundamental steps forward: a) the link was only a click away; b) that click could potentially lead to a source outside the document being used at the moment. This is important because, traditionally a footnote in a scholarly book might conscientiously cite another article, book or manuscript in some remote library, to receive a copy of which took weeks or even months.


With electronic hypertext such a source is potentially only a click away. Google has filed patents in this domain and aims "to develop technologies that factor in the amount of important coverage produced by a source, the amount of traffic it attracts, circulation statistics, staff size, breadth of coverage and number of global operations" and is searching for methods to determine the truth value of claims.[123] It is important to recall that many aspects of this quest are already reflected in our memory institutions. Instead of spending billions in creating entirely new models it is advisable to invest in linking the new instruments with existing frameworks. Some classification systems have some means of dealing with certainty of attribution in their categories.[124] Once again the challenge lies in making more of the enormous critical apparatus which memory institutions already possess visible to users. Of course, not every user will want to use the 800+ fields of the most complex systems; but the option to use them in various combinations should be there.


                   7.1 Direct and Indirect Links


Not all links are equally effective. A link from a reference concerning Mona Lisa in the Louvre to any of the dozens of sites containing a poor replica of the painting is less effective than a direct link to the Louvre website. One might distinguish between a) materials that are shown live, b) that come from the original location, c) via an agency, or d) via an official publication. In future, the extent to which scholarly books and articles link directly to original sources rather than to vague sites can become a new criterion for the quality of scholarship.
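One way to make the directness of links measurable is to label each link with the kind of source it reaches and to compute what share of a publication's links are direct. The sketch below is illustrative only: `LinkSource` and `directness_score` are hypothetical names, the numeric ordering of the four kinds is an assumption (the distinction above lists them without ranking them), and treating only the first two kinds as direct is likewise an assumption.

```python
from enum import IntEnum


class LinkSource(IntEnum):
    """Kind of source a link reaches. Lower value = more direct (assumed order)."""

    LIVE = 1                  # materials shown live
    ORIGINAL_LOCATION = 2     # materials from the original location
    AGENCY = 3                # materials provided via an agency
    OFFICIAL_PUBLICATION = 4  # materials from an official publication
    UNSPECIFIED = 5           # any other site, e.g. a page with a poor replica


def directness_score(links):
    """Share of links that reach live or original-location sources."""
    if not links:
        return 0.0
    direct = sum(1 for kind in links if kind <= LinkSource.ORIGINAL_LOCATION)
    return direct / len(links)
```

A score of this kind is only a rough indicator; it says nothing about whether the linked source actually supports the claim being made.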


                   7.2 Degree of Identity


Today, when we type in a word or term, search engines such as Google assume that we are looking for something that is identical to that word. They may also offer materials that are similar to that word but there are no tools in place to define the parameters of a match. Hence, typing in Last Supper (on 15 April 2005) produced 16,200 hits but there are no functions in place to search for cases that are identical in size, shape or colour. Over two millennia ago, Aristotle discussed the importance of attributes in defining objects. Adding attributes to our search parameters will mean that we can find things with the same name and then find subsets which are the same size, shape, colour etc. Eventually this could be extended to include attributes entailing all five senses and thus be able to discover surfaces, which look the same but literally feel different.
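Searching by name first and by attributes second can be illustrated with a small filter. This is a sketch under assumptions: `Record` and `find` are hypothetical names, attributes are stored as a simple dictionary, and the attribute keys and values in the usage lines are invented for illustration rather than taken from any catalogue.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    attributes: dict  # e.g. {"shape": "rectangular", "medium": "mural"}


def find(records, name, **required):
    """Return records with the given name whose attributes match all required values."""
    return [
        r
        for r in records
        if r.name.casefold() == name.casefold()
        and all(r.attributes.get(key) == value for key, value in required.items())
    ]


# Usage with invented attribute values:
records = [
    Record("Last Supper", {"shape": "rectangular", "medium": "mural"}),
    Record("Last Supper", {"shape": "rectangular", "medium": "engraving"}),
    Record("Last Supper", {"shape": "round", "medium": "engraving"}),
]
same_name = find(records, "Last Supper")
subset = find(records, "Last Supper", shape="rectangular", medium="engraving")
```

Exact equality is the simplest possible definition of a match; tolerances or ranges per attribute would be a natural refinement.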


                   7.3 Levels of Certainty


Needed also are new tools that allow authors to indicate the level of certainty behind their claims. In the sciences, such certainty is often continuous, or at least numerical, such that we can speak of parameters or tolerances within which a technique or process will function. In the human sciences, these levels of certainty are often not continuous or quantitatively measurable. Even to attempt claims such as Shakespeare was x% (e.g. 98%) a genius as an author, would be a category mistake.


Such levels of certainty can be built in to cover claims about who, what, where and when? (Appendix 1). The precision with which one covers claims, including the detail with which one indicates the extent to which certainty is possible then becomes a further criterion for defining scholarship. 


For the moment we shall focus on the problem of degree of certainty with respect to the question How? For instance, we are studying a painting of a woman's face. We encounter a related image on the web that suggests the painting is in fact a portrait of Madame X. On one occasion we may find additional evidence which is conclusive. On other occasions the link to be made might be very certain, quite certain, very probably, quite probably or only possibly. Ideally an editing tool makes available a small popup list of choices ranging from Authoritative to Possibly (Appendix 1).
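Such a popup list corresponds to an ordinal scale: the levels can be ordered but not measured. The following sketch is an illustration under assumptions. The six labels are taken from the paragraph above, but their numbering is assumed here and may not match Appendix 1; `Certainty`, `Link` and `at_least` are hypothetical names. The check that an authoritative link must carry documentation follows the requirement stated later in this section.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """Ordinal scale: only the order of the levels carries meaning (assumed numbering)."""

    AUTHORITATIVE = 1
    VERY_CERTAIN = 2
    QUITE_CERTAIN = 3
    VERY_PROBABLY = 4
    QUITE_PROBABLY = 5
    POSSIBLY = 6


@dataclass
class Link:
    source: str           # e.g. the painting under study
    target: str           # e.g. the related image found on the web
    certainty: Certainty
    evidence: str = ""    # citation or note documenting the link

    def __post_init__(self):
        if self.certainty == Certainty.AUTHORITATIVE and not self.evidence.strip():
            raise ValueError("an authoritative link requires documentation")


def at_least(links, threshold):
    """Keep only links that are at least as certain as `threshold`."""
    return [link for link in links if link.certainty <= threshold]
```

Because the scale is ordinal, the code compares levels but never averages or multiplies them, which avoids the category mistake of treating certainty in the human sciences as a percentage.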


Of course memory institutions and especially libraries have been struggling with such problems of bibliographic control for centuries and the results of their painstaking efforts are included in their internal catalogues and databases. Significantly, however, such results are often not accessible in usual queries in the Internet or even in libraries, museums and archives themselves. Hence if a person searches for Leonardo da Vinci, some systems give only titles definitely by Leonardo; others include items attributed to, students of, school of, followers of, copiers of. However, no system today provides us with a systematic overview of how these categories relate to each other, let alone how the numbers in these categories have changed over time.


Comprehensive lists of artists are a first step. However, search and retrieval systems that find a name such as Leonardo da Vinci will not solve this problem. Search engines must have as part of their system not just variant names but also the names of students, members of the school, names of followers etc. An expert on Leonardo knows that Bernardino Luini and Andrea Solario were important students. But search engines and non-expert users do not know this. So this knowledge needs to be built into our search engines.
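The idea of building such knowledge into a search engine can be illustrated with a minimal sketch of query expansion. It assumes a small authority file that stores, for each artist, variant names and the names of students and followers. The record layout, the function name `expand_query` and the variant spelling are assumptions made for illustration; only the two students come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ArtistRecord:
    """One authority record: a preferred name plus related names by relation."""
    preferred: str
    variants: list[str] = field(default_factory=list)
    students: list[str] = field(default_factory=list)
    followers: list[str] = field(default_factory=list)


# Illustrative data only. The two students are those named in the text;
# the variant spelling is an assumed example entry.
AUTHORITY = {
    "leonardo da vinci": ArtistRecord(
        preferred="Leonardo da Vinci",
        variants=["Lionardo da Vinci"],
        students=["Bernardino Luini", "Andrea Solario"],
    )
}


def expand_query(name: str, include: tuple[str, ...] = ("variants", "students")) -> list[str]:
    """Return the search terms for a name, widened by the chosen relations."""
    record = AUTHORITY.get(name.strip().lower())
    if record is None:
        return [name]  # unknown name: search for it unchanged
    terms = [record.preferred]
    for relation in include:
        terms.extend(getattr(record, relation))
    return terms


print(expand_query("Leonardo da Vinci"))
```

A searcher who wants only works definitely by the master would pass an empty `include`, while a survey of the school would add `"followers"`. Because attributions are debated and change over time, a real authority file would also need dated and sourced entries, which this sketch omits.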


This matter is complicated by the reality that it is not just a question of scanning in some list of students and followers. There is debate on precisely which paintings were done by Leonardo, which by his students; who his students were etc. and this debate changes over time.   


Indeed, scholars have traditionally used a whole range of vocabulary to indicate a spectrum of certainty ranging from simple assertions via conditionals to subjunctives. Hence phrases range from: “as we all know”; “it is generally agreed”; to “it is likely that”; “it would be possible to conclude”; “it is not to be excluded that.”


A real challenge thus lies in incorporating such features into our authoring tools such that authors can record the state of their certainty as they are doing their work in order that we shall someday be able to trace systematically not only facts but also differing levels of certainty concerning these.    


The system that we envisage would allow us to choose the level of commitment. Levels 3-6 would require us to commit our name to the claim and invite documentation. Level 1, a claim that something is authoritative, requires documentation. Sceptics will rightly object that such a system will never be universally accepted. Many persons will prefer simply to dump their unsubstantiated claims on the web. In the interests of freedom of the spirit persons must be free to do so and free to state whatever they wish or not. Failure to permit these options takes one down a path where what a person writes, speaks or even what a person thinks could be seen as a threat to decision makers and the state. Science fiction movies such as Minority Report have warned us of the dilemmas of seeking mind and thought control.


Our approach rejects such basic censorship as a dead end. At the same time, by including rules and frameworks for levels of certainty, we have new possibilities of introducing search parameters which can sometimes choose to ignore unsubstantiated claims. Five centuries of experience with printing have led to similar solutions. We allow sensationalist newspapers such as the Daily Mirror, or the Bild Zeitung to publish many amazing, undocumented claims, but when we are writing a scholarly piece we usually ignore them as evidence. In future, learning how to use sources critically and being required to use sources with a given level of certainty can become new domains for learning in schools and universities.


Authorities, decision makers and sceptics generally will fear that all this assumes honesty in the system and will remain worried about dangers of the system being subverted by dishonest imposters. Fortunately, the simple rules of the game have some built-in safety mechanisms. Anyone is free to state something, but anyone who claims to provide levels of certainty must also provide the supporting evidence. Hence, those who wish to use the cover of anonymity are free to do so, but thereby eliminate themselves from the certainty process. Those who add a source must provide a link to that source. If the link is false or does not confirm the claim the system can reject it. If they refer to themselves they implicate their own reputation. If they include their organization, then their organization implicitly becomes liable for defending the claim. For this reason, new authoring tools for levels of certainty in making a claim need to become linked with levels of authority in defending a claim.
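The rules of the game described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a specification: the six levels follow the Claim (How) list in Appendix 1, while the field names (`author`, `source_link`) and the functions `check_claim` and `substantiated` are hypothetical. The sketch checks only that a link is present; verifying that the link resolves and actually confirms the claim is left out.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Certainty(IntEnum):
    """The six levels listed under Claim (How) in Appendix 1."""
    AUTHORITATIVE = 1
    VERY_CERTAIN = 2
    QUITE_CERTAIN = 3
    VERY_PROBABLY = 4
    QUITE_PROBABLY = 5
    POSSIBLY = 6


@dataclass
class Claim:
    statement: str
    author: Optional[str] = None           # None means anonymous
    certainty: Optional[Certainty] = None  # None means no level is asserted
    source_link: Optional[str] = None      # hypothetical field: link to evidence


def check_claim(claim: Claim) -> list[str]:
    """Return the rule violations; an empty list means the claim is accepted."""
    problems = []
    if claim.certainty is None:
        return problems  # anyone is free to state something without a level
    if claim.author is None:
        problems.append("anonymous claims cannot carry a certainty level")
    if claim.source_link is None:
        problems.append("a certainty level requires a link to supporting evidence")
    return problems


def substantiated(claims: list[Claim], weakest: Certainty = Certainty.POSSIBLY) -> list[Claim]:
    """Keep accepted claims whose level is at least as strong as `weakest`."""
    return [
        c for c in claims
        if c.certainty is not None and not check_claim(c) and c.certainty <= weakest
    ]
```

Unsubstantiated statements remain valid objects in this sketch and are never deleted; `substantiated` merely lets a search choose to ignore them, which mirrors the distinction between censorship and critical filtering.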


8. Levels of Authority in Defending a Claim 


This could at first sound like overkill. On reflection, this approach simply formalizes an approach that has been in place informally for centuries. Whenever we meet someone we expect a business card to tell us their affiliation. If they come from a world famous university or company we implicitly give them more respect and trust than if they come from an unknown organization. The purpose of a more systematic approach is not to check all the details of each source at every turn, but rather to have in place a framework that permits us to check these sources if necessary or desired. Hence, scholarly authors wishing to document their claim, might be prompted to indicate the source for this claim that something is authoritative in a further list, i.e. whether it originates in: 1) a memory institution; 2) an organization, usually a professional body, or 3) an individual.


Source: Library 

  1. Book
  2. Article

Peer Reviewed Journal


  1. Magazine
  2. Newspaper 
  3. Television 



Figure 3. Examples of different kinds of documentation that one might wish to access.


Hereby, searchers will in future be able to use these parameters in their search criteria. For instance, within a library one might be searching for everything under a given name or subject or limit the search to specific forms of documentation (figure 3). The complexity of these lists will depend on the situation at hand. Sometimes, a simple distinction between scholarly and popular press might suffice. At other times a more detailed set of distinctions will be appropriate. 
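A search limited to specific forms of documentation might look as follows. This is a hedged sketch: the documentation types are taken from Figure 3, whereas the `SCHOLARLY` grouping, the `search` function and the catalogue entries are hypothetical and serve only to show the mechanism.

```python
from enum import Enum


class DocType(Enum):
    """Kinds of documentation named in Figure 3."""
    BOOK = "book"
    ARTICLE = "article"
    PEER_REVIEWED_JOURNAL = "peer reviewed journal"
    MAGAZINE = "magazine"
    NEWSPAPER = "newspaper"
    TELEVISION = "television"


# Assumed coarse grouping for the simple scholarly/popular distinction.
SCHOLARLY = {DocType.BOOK, DocType.ARTICLE, DocType.PEER_REVIEWED_JOURNAL}


def search(records, subject, doc_types=None):
    """Return titles on `subject`, optionally limited to given documentation types."""
    return [
        title
        for title, rec_subject, doc_type in records
        if rec_subject == subject and (doc_types is None or doc_type in doc_types)
    ]


# Hypothetical catalogue entries: (title, subject, documentation type).
CATALOGUE = [
    ("Leonardo: Collected Studies", "Leonardo da Vinci", DocType.BOOK),
    ("The Secret of the Smile", "Leonardo da Vinci", DocType.MAGAZINE),
    ("Workshop Practice in Milan", "Leonardo da Vinci", DocType.PEER_REVIEWED_JOURNAL),
]

print(search(CATALOGUE, "Leonardo da Vinci", SCHOLARLY))
```

Passing no `doc_types` returns everything under the subject; passing a finer set, such as `{DocType.NEWSPAPER}`, gives the more detailed distinctions that some situations require.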


From such examples, we begin to see how the modules and lists for inputting knowledge and the lists to search for knowledge can gradually converge. Again we see that while anyone can make links, only those links which take us back to their sources are truly helpful. The need to cite sources was recognized by Renaissance humanists, who called for a return ad fontes. But whereas the Renaissance quest was limited to pointing to sources beyond the manuscript or book at hand, the new media allow a direct link with such sources. Hence, proper use of electronic equivalents of such sources can improve our success in accessing true and meaningful knowledge and at the same time provide new criteria for judging the quality of humanists and scholars in future.[125]


9. Levels of Significance in Assessing a Claim


History has taught us that significance is one of the most elusive characteristics to assess. Confucius had fewer than 30 followers when he died, yet his ideas have profoundly affected more than two and a half millennia. Boethius spent the last year of his life in jail before being beheaded on account of a false accusation, and yet his Consolation of Philosophy, written in his prison cell, became the most widely read book in the West, second only to the Bible, for nearly a millennium. Milton wrote the most famous written defence of freedom in the English language, the Areopagitica, while he was jailed for belonging to the wrong political party.


Some assure us that given the phrase “publish or perish,” quantity of publications is the prime criterion for significance. Here caution is advised. Andrew Lang (1844-1912) was undoubtedly a significant writer. The Wikipedia records more than 140 books that he published.[126] Of Lao Tse only 81 paragraphs are extant. Yet many would rightly insist that those 81 paragraphs in a slender book called the Tao te Ching that inspired Taoism had considerably greater significance than the writings of one of the most productive scholars and journalists of 19th century Britain. Meanwhile, peer review, citation indexes and the emerging field of automated citation indexes, also termed dynamic contextualization, offer further ways of assessing significance. These tools must be truly international and multilingual. Judging a European scholar in terms of how often they are cited in American publications can lead to distortions.

9.1 Peer Review

In the 19th and early 20th centuries, when basic fields of study such as physics and chemistry had one standard journal the question of publication was fairly straightforward. Major scientists in the domain published their work in the standard journals, those at other levels published elsewhere. The enormous proliferation of disciplines and specialized applications means that no one continues to have a clear view of all that is written, even in fairly “narrow” disciplines. This has led to insistence on the importance of peer-reviewed journals and arguments that learned societies should have a greater role in the peer review process.[127]

Paul Ginsparg (Los Alamos, now Cornell University) has argued for a two-tiered approach whereby more articles are accepted almost automatically in the short term. The full peer review process is then applied to a considerably smaller subset in the longer term.[128] Here, once again the science community is suggesting new models that could potentially be used by the entire scholarly community. In terms of our model, the first tier would make personal and collaborative knowledge available at the level of e-preprints (level 12 in figure 2) and the second tier would act as a filter in deciding what subset of this flux enters into the category of enduring knowledge (levels 1-10).

9.2 Citation Indexes


In the 1970s, Derek de Solla Price developed the fields of bibliometrics and scientometrics, to address the problem of significance.[129] Over the past decades these fields have blossomed into a fashion for Citation Indexes. Such indexes are undoubtedly useful. Some would have us believe that they offer a chief criterion for judging scholarship, quietly overlooking that these indexes published in the United States focus mainly on Anglo Saxon publications. Others suggest that search engines such as Google are the new equivalents of, or even replacements for citation indexes. Like the citation indexes, the number of hits on Google is undoubtedly a useful indication. A simple example can quickly show why caution is needed with this approach. If we take ten leaders whose impact on the world (for better or worse) is universally recognized, we find that their ranking in Google is rather different than we might have expected (figure 4).


 1. Charles V              29,600,000
 2. George W. Bush         27,800,000
 3. Alexander the Great    26,500,000
 4. Hitler                  8,560,000
 5. Napoleon                8,410,000
 6. Charlemagne             1,350,000
 7. Mahatma Ghandi          1,120,000
 8. Genghis Khan              374,000
 9. Mao Tse Tung              361,000
10. Tamurlane                     886


Figure 4. Ten Political Leaders and their hits on Google (15.04.2005)


Taken literally this list would indicate that George W. Bush is about 31,377 times more significant than Tamurlane, a dictator who once ruled over large parts of Asia, Russia and the Middle East and is said to be second only to Alexander the Great in terms of land ruled. Of course, to assume that the number of hits on Google alone constitutes serious proof in isolation would be simple-minded and ridiculous. But when we recall that careers of scholars are to some extent being determined by the number of times they are cited in citation indexes, the deceptive ways in which this kind of quantitative popularity contest is affecting our views of the world should give pause for concern.
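The arithmetic behind such a comparison is a simple quotient of hit counts. The short sketch below reproduces it from the figures in Figure 4 (names are spelled as in the figure); the helper `hit_ratio` is introduced here for illustration. Dividing the count for George W. Bush by the count for Tamurlane gives roughly 31,377.

```python
# Hit counts copied from Figure 4 (Google, 15.04.2005).
HITS = {
    "Charles V": 29_600_000,
    "George W. Bush": 27_800_000,
    "Alexander the Great": 26_500_000,
    "Hitler": 8_560_000,
    "Napoleon": 8_410_000,
    "Charlemagne": 1_350_000,
    "Mahatma Ghandi": 1_120_000,
    "Genghis Khan": 374_000,
    "Mao Tse Tung": 361_000,
    "Tamurlane": 886,
}


def hit_ratio(a: str, b: str) -> float:
    """How many times more hits name `a` has than name `b`."""
    return HITS[a] / HITS[b]


ratio = hit_ratio("George W. Bush", "Tamurlane")
print(f"{ratio:,.0f}")  # prints 31,377
```

The calculation is trivial, which is precisely the point: a ratio of raw counts is easy to compute and easy to mistake for a measure of significance.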


Although it is not popular to say so, there are fashions in scholarship, just as there are fashions in clothes. Today, there is almost universal agreement that Rabelais was a great author of literature and that Leonardo da Vinci was a universal genius. It is sobering to note, however, that there were whole generations in the course of the past centuries when these figures were not at all appreciated. A generation after his death Leonardo’s manuscripts were dispersed and many have never been found again. Almost five centuries after his death we still have no complete works of Leonardo and books such as the Da Vinci Code, which are excellent novels that have virtually nothing to do with Leonardo, sell much better than the real thing. Such provocative examples serve simply to make a fundamental point that no simple criterion can solve the elusive question of significance.


9.2.1 Automatic Citation Indexes


A major breakthrough of the past few years is a trend whereby the process of citation indexes is becoming automated such that it can be integrated seamlessly into scholarly works and potentially reflect all citations rather than a sample as hitherto provided in American citation indexes. Michele Barbera and Nicolo D'Ercole (Pisa) and their team have developed Hyperjournal,[130] which includes[131] Dynamic Contextualization, a P2P tool:


“which allows readers to visualize, while reading an article, all the articles quoted by and all those quoting the one they are reading. Dynamic Contextualization also enables you to easily carry out bibliometrical calculations such as: the number of quotations received by an article or by an author, citation source groupings by journal, by topic, by period.”
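The kind of calculation described in this quotation can be sketched generically. The following is not Hyperjournal's implementation, whose internals are not described here; it is a minimal illustration, on hypothetical records, of how a list of outgoing quotations can be inverted into a "quoted by" index and how received citations can then be grouped by journal.

```python
from collections import Counter, defaultdict

# Hypothetical records: article id -> (journal, year, ids of the articles it quotes).
ARTICLES = {
    "a1": ("Journal A", 2001, []),
    "a2": ("Journal A", 2003, ["a1"]),
    "a3": ("Journal B", 2004, ["a1", "a2"]),
}


def build_cited_by(articles):
    """Invert the 'quotes' lists into a 'quoted by' index."""
    cited_by = defaultdict(list)
    for article_id, (_journal, _year, quotes) in articles.items():
        for quoted in quotes:
            cited_by[quoted].append(article_id)
    return cited_by


def context(article_id, articles, cited_by):
    """Articles quoted by, and quoting, the one being read."""
    return {
        "quotes": articles[article_id][2],
        "quoted_by": cited_by.get(article_id, []),
    }


def citations_by_journal(article_id, articles, cited_by):
    """Group the citations an article has received by the citing journal."""
    return Counter(articles[citing][0] for citing in cited_by.get(article_id, []))


CITED_BY = build_cited_by(ARTICLES)
print(context("a2", ARTICLES, CITED_BY))
print(citations_by_journal("a1", ARTICLES, CITED_BY))
```

Grouping by topic or by period would follow the same pattern, using the year or an assumed topic field as the grouping key instead of the journal.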


If this approach were combined with our knowledge concerning kinds of journals (e.g. official journals in a field, journals published by key societies or Special Interest Groups (SIGs) of experts) and/or linked with standard collections of reviews, this could lead to new insights concerning the influence of a given scholar. Meanwhile, this approach to dynamic contextualization is the more significant because Paolo D’Iorio, the author of Nietzsche Open Source[132] and Open Source Models in the Humanities. From Hyper Nietzsche to Hyper Learning (April 2004),[133] has integrated this into his project on Hyper Learning (Hypermedia Platform for Electronic Research and Learning):

“The overall objective of the Hyper-Learning project is to create an advanced e-learning system for the Humanities that will develop and enhance critical thinking skills. Hyper-Learning consists of four integrated components: 1) Research on functional programming for complex interactive web sites; 2) Development of a distributed web platform; 3) Establishment of Virtual Collaborative Learning Communities based around 13 representative European authors and 4) Creation of an appropriate pedagogical and legal framework.”[134]

Ultimately we need some combination of quantity of output, quantity of citations and preferably also an indication of the extent to which authors are cited by experts in their own fields. Some authors establish fields, some authors contribute to accepted fields and some distinguish themselves by demonstrating that the boundaries of strictly defined fields are too narrow to address the larger questions of scholarship. We need tools that will help us to recognize the contributions of all three of these types and not just the narrow experts for which German has a precise term (Fachidioten). We need to maintain access to both generalist and specialized knowledge; the ability to provide surveys as well as the capacity to focus on details (minutiae and quisquilia). Efforts in hierarchical classification might be useful in this context.[135]


10. Levels of Thoroughness in Supporting a Claim


The above cautionary examples concerning significance may seem more evasive than incisive, but their combined thrust is that no single method offers a magic solution. Implicitly this suggests that thoroughness is the only way we can hope to achieve a balanced view. While attractive in theory this poses deep philosophical problems and challenges.


When a world expert gives a brilliant speech, attentive members of the audience are able to judge the points made in the speech. It would take another world expert of equal standing to have some sense of how much the brilliant speech omitted. The problems of brilliant speeches are also the problems of scholarship, which all too often is viewed as a series of brilliant books and articles or as a catalogue of those areas which are known and settled. Knowledge is presented as if it were a map of land conquered. All too often, however, we have no equivalent of a world map for knowledge, we have no clues as to how much has been covered so far. Roadmaps, a buzzword from the political arena, have become a fashionable term within the knowledge landscape. Alas they typically show us a few (possible information) highways and provide little indication of everyday roads, streets, paths and trails.


Even so, we know from history that such knowledge maps have frequently proved essential in the advancement of science and knowledge. In the 19th century, once there was a periodic table, once one understood the scope and limits of chemical compounds, one could start a process of looking for them systematically and filling in the gaps. It took a century, and even then there were a few bits to add, but it worked because there was a clear outline of what was not yet known (a map of ignorance in the true sense), which helped to guide explorers of new knowledge.


In spite of all the billions of printed and online pages today, we have remarkably little by way of serious tools to map our ignorance, to provide us some indication of level of thoroughness in dealing with a claim. Intuitively we recognize the problem perfectly well. If someone wanted to make great claims about Leonardo or any author, our first advice would be that they must study what Leonardo wrote and painted. Bibliographies exist but an updated list of all drawings and paintings of Leonardo and his school, a catalogue raisonné in the traditional sense, does not yet exist.


Hence, while making knowledge accessible is obviously important, laudable and vital, it needs to be complemented by new kinds of cartography that map both our knowledge and our ignorance; the territory covered and the areas left uncharted. In some cases this is asking too much. There are always frontiers where we have no idea where to find even the next step. But if we make maps of accomplishments and dead ends there will be more hope of finding live ends and especially live non-ends.


Even in the absence of a clear programme, a number of intuitive steps have already been made in this direction.[136] We have international bibliographies, cumulative indexes of books, reviews, dissertations[137] and many other sources. We have reference rooms in the great libraries of the world with hundreds of thousands of reference works. Needed are virtual reference rooms where systematic connections between these resources can be created.  


11. New Criteria for Scholarship


There was a time when scholars were a minority who could read and write amidst a majority who were illiterate. In a world where theoretically everyone can read and where rhetorically everyone is an author, new criteria are needed to identify scholarship, and new criteria are needed to judge its quality. Everyday rhetoric points to the importance of multimedia and yet Gregory Crane, the author of the Perseus Project, was denied tenure at Harvard on the grounds that his work did not amount to publication. Even today some tenure committees overlook electronic contributions even if they entail peer review and major publishing houses such as Oxford University Press. Ultimately some combination of printed and electronic publication is likely to remain important in future. In both cases quantity alone is not a sufficient criterion. With respect to elusive questions of quality we have suggested five further ingredients that will prove useful.


12. Conclusions


The vision of access to the whole of knowledge goes back at least to Aristotle. The 19th century transformed this vision of individual thinkers into a more programmatic quest. By 1934, Paul Otlet (Mundaneum, Brussels now Mons) had a vision of electronic access to the whole of knowledge. The past seventy years have been paradoxical. On the positive side technology has advanced greatly. Hundreds of thousands of books have already been scanned in. Soon there will be many millions of full-text books and other objects. At the same time there has been such an explosion of new knowledge and information that the possibility of systematic treatment seems more elusive today than a century ago. This is partly because scholars have focussed so much on studying texts and objects in isolation that the larger context and cumulative dimensions of knowledge have faded into the background. The growing commercialization of reference works, scholarly journals and even scholarly dissertations and books has cast a further shadow over the vision of free access to knowledge and information. This has led to a situation whereby even those with deep financial pockets cannot afford to see the big picture.


Meanwhile, the open source movement, impulses from science, and more recently initiatives from governments have re-introduced the feasibility of universal access to human knowledge. The quest for (distributed) digital libraries needs to be complemented by virtual reference rooms and virtual agoras.[138] Hereby, the ideal of a collective notebook can become an extension of existing systems for cataloguing and searching the cumulative knowledge of collective memory institutions.  


The quest for full freedom of expression and open access in terms of quantity need not exclude the existence of criteria that highlight the central importance of quality. To this end, we have suggested five new features that need to be added to such systems: 1) variants and multiple claims; 2) levels of certainty in making a claim; 3) levels of authority in defending a claim; 4) levels of significance in assessing a claim; 5) levels of thoroughness in supporting a claim. If these dimensions are integrated into an open source model there is reason for optimism about the potentials of the emerging technologies. The vision of open source knowledge on a fully semantic web may well take at least another century to achieve, but this only confirms that the goal is a noble one. In this context, if patience is a virtue, endurance and energy are a necessity.


Maastricht, 4 July 2005.




I am very grateful to my colleague Professor Frederic Andres for kindly reading the text and providing suggestions for further references. I am grateful also to the reviewers whose positive comments and concrete suggestions have improved this paper.


An abridged 8 page version of this paper focussing on five new criteria was published in Open Culture: accessing and sharing Knowledge. Scholarly production and education in the digital age, Milan: Università Statale, June 2005.


The websites cited in this paper were checked anew on 28 August 2005.
















Appendix 1. Different levels of certainty in making a claim in terms of basic questions.

Claim (How)

  1. Authoritative
  2. Very Certain
  3. Quite Certain
  4. Very Probably
  5. Quite Probably
  6. Possibly


Claim (Who)

  1. Author
  2. Student
  3. Workshop
  4. Follower of
  5. Copier of
  6. After


Claim (What)

  1. Object
  2. Class
  3. Species
  4. Genus


Claim (Where)

  1. House
  2. Street
  3. City
  4. Province
  5. Country
  6. Continent


Claim (When)

  1. Date            
  2. c.
  3. c.-c.
  4. c.?-c.?
  5. fl.
  6. Century


[1] Paul Otlet, Monde: essai d'universalisme -- connaissance du monde; sentiment du monde; action organisée et plan du monde (Brussels: Editions du Mundaneum, 1935).


[2]  From 1990-1999 the WWW grew from 100 thousand to 200 million users. In spite of the bust, since 2000, the WWW has grown to 888,681,131 users of the fixed Internet (as of 24 March 2005). Earlier concerns about finding enough hits have faded into the background.

[3] For an introduction see: Willard McCarty, “A serious beginner's guide to hypertext research”: <HLINK></HLINK>; Willard McCarty, Home Page: <HLINK></HLINK>.

For a survey of the state of the art in bibliographic control, see: Arlene G. Taylor, The Organization of Information, Westport, CT: Libraries Unlimited, 2003 (Libraries and Information Sciences Text Series). 

[4] Cf. the WWWfs Annotea project.

[5] Philipp Cimiano, Siegfried Handschuh, Steffen Staab, "Towards the Self-Annotating Web," Proceedings of the Thirteenth World Wide Web Conference (WWW 2004), New York City, May 17-22, 2004, pp. 462-471. This paper proposes PANKOW Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology.

[6] Roberto Busa, Index Thomisticus: Sancti Thomae Aquinatis operum omnium indices et concordantiae (Stuttgart-Bad Cannstatt: Friedrich Frommann Verlag Günther Holzboog KG, 1974). Thomae Aquinatis opera omnia, cum hypertextibus in CD-ROM (Milano: Editoria Elettronica Editel, 1992). Cf. Thomism Today: Europe Video (transcript online below): <HLINK></HLINK>.

John Tomarchio, “Computer Linguistics and Philosophical Interpretation,” Paideia:


[7] Corso Interuniversitario di Lessicologia ed Ermeneutica Tomistiche Computerizzate, Anno Accademico 2003-2004, Rome:


[8] Shane Houdek, Classics and the Electronic Medium, Department of English, University of Minnesota, English 3960, Junior-Senior Seminar: Electronic Text, Spring 1996


[9] Thesaurus Linguae Graecae: <HLINK></HLINK>. Another of the pioneers in the TLG was Theodore Brunner (1913-1994).

[10] The Dictionary of Old English, Centre for Medieval Studies, University of Toronto:


[11] Records of Early English Drama: <HLINK></HLINK>

[12] (Systèmes) Grafnetix Systems Inc., “In Memory of Yuri Rubinsky”: <HLINK></HLINK>;

Jonathan Seybold, “Yuri Rubinsky” (1952-1996): <HLINK></HLINK>

[13] SoftQuad (Toronto) was founded in 1984, sold to Corel (Ottawa) in 2001, which was bought by Vector Capital (San Francisco) in 2003.


This now has the buzz-phrase “yesterday’s information tomorrow”. For an insight into early challenges see a report (1994) by the TEI’s editor in chief, Michael Sperberg-McQueen (1988-2000), “Trip Report Berkeley and Irvine, California 8-13 March 1994”:


[15] OASIS. Cover Pages, “Academic Applications, SGML/XML: Academic Applications. Contents”:

<HLINK></HLINK>. Pioneers in the field include Ian Lancashire, Homepage:


Lou Burnard, Homepage:


Manfred Thaller, Homepage : 


[16] Consortium for Latin Lexicography, “The electronic version of the Thesaurus Linguae Latinae,” 1997: <HLINK></HLINK>

[17] Ann DeVito, “Developing an Electronic Thesaurus Linguae Latinae,” Consortium for Latin Lexicography, July 1995:


[18] Mike Madin, “Buddhist Studies Digital Library,” Academic Info Inc., 2005: <HLINK></HLINK>

[19] Ching-chih Chen, James Z. Wang, “Large Scale Emperor Digital Library and Semantics-Sensitive region Based Material”:


[20] EVA2002 Beijing Draft Outline Programme:


[21] The Oxford Text Archive: <HLINK></HLINK>

[22] Foto Marburg, “Bildindex der Kunst und Architektur”: <HLINK>{63c57939-0373-49da-a0cb-84cc87466e7b}&cnt=84020&%3Asysprotocol=http%3A&%3Asysbrowser=ie6&%3Alang=de&</HLINK>

[23] Gregory Crane, “The Perseus Digital Library”, Tufts University:


[24] Max Planck Institut für europäische Rechtsgeschichte, „Bibliothek“:


Cf. Manfred Thaller, “From the Digitized to the Digital Library”, D-Lib Magazine, February 2001, Volume 7, Number 2.


[25] CEEC (Codices Electronici Ecclesiae Coloniensis):


[26] Christine van Assche, “New Media Encyclopaedia,” Centre Georges Pompidou, Paris, 2005: <HLINK></HLINK>

[27] <HLINK></HLINK>

[28] Prometheus: <HLINK></HLINK>

[29] One Look Dictionary Search: <HLINK></HLINK>

[30] The electronic Oxford English Dictionary was developed at Waterloo University and led to the development of the Open Text Corporation, which has since developed “the Livelink ECM Platform, an integrated framework that combines a shared content repository with user, content, and process services in a Service-Oriented Architecture (SOA).” Open Text Corporation:


[31] Oxford English Dictionary:


[32] Thomson Dialog, “Sources”: <HLINK></HLINK>

[33] Association of Research Libraries (ARL), “Framing the Issue: Open Access”. <HLINK></HLINK>

[34] Stevan Harnad, “Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry,” Psychological Science 1: 342 - 343 (reprinted in Current Contents 45: 9-13, November 11 1991). <HLINK></HLINK>.

[35] Paul Ginsparg, “Creating a global knowledge network,” Invited contribution for Conference held at UNESCO HQ, Paris, 19-23 Feb 2001, Second Joint ICSU Press - UNESCO Expert Conference on Electronic Publishing in Science, during session Responses from the scientific community, Tue 20 Feb 2001.


[36] arXiv monthly submission rate statistics: <HLINK></HLINK>

[37] e-Print archive:  <HLINK></HLINK>

Front for the Mathematics ArXiv , University of California (UC) Davis:


[38] Ashlee Vance, “Los Alamos lends open source hand to life sciences,” The Register, 23 June 2003:


[39] MPI, “Preprints of the MPI”:


[40] Max Planck Gesellschaft, “Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities,” Conference on Open Access to Knowledge in the Sciences and Humanities, Berlin, 20 - 22 Oct 2003:


Ibid, Conference Synopsis: <HLINK></HLINK>

[41] European Cultural Heritage Online (ECHO). Open Access Infrastructure for a Future Web of Culture and Science:


[42] CODATA XIX, International Conference, The Information Society: New Horizons for Science, Berlin, 7-10 November 2004:


[43] SPARC, Europe: <HLINK></HLINK>

[44] Public Library of Science : <HLINK></HLINK>.

[45] DARE  (Digital Academic Repositories):


Leo Waaijers, “By them going, pathways are growing, DARE; a work in progress,” SURF, 7 March 2005:


[46] Ibid: 6 projects 

'CoMa: Copyright Management', Universiteit van Tilburg

'DARC: Distributed Africana Repositories Community', Universiteit Leiden

'P-Web: een tool voor het online publiceren van proceedings', Erasmus Universiteit Rotterdam

'Scripties Online', Universiteit Twente; Erasmus Universiteit Rotterdam; Rijksuniversiteit Groningen

'Stroomlijning en digitalisering van het review proces', Wageningen Universiteit

'Universitair Wetenschappelijk Archief (UWA)', Universiteit van Amsterdam

[47] National Institutes of Health, Public Access.<HLINK></HLINK>

[48] Stephen Pincock, “RCUK draft mandates open access,” The Scientist, 23 June, 2005. <HLINK></HLINK>

[49] Association of Research Libraries (ARL), “Framing the Issue: Open Access”. <HLINK></HLINK>

[50] ProQuest. Information and Learning Company:<HLINK></HLINK>

[51] Ibid., About UMI: <HLINK></HLINK>

[52] Ibid: <HLINK></HLINK>

[53] Early English Books Online, “About EEBO”:


[54] Early English Books Online, “Text Creation Partnership”: <HLINK></HLINK>

Pricing for the Text Creation partnership



ARL or equivalent institution


Non-ARL graduate degree granting institution with more than 15,000 FTE


Non-ARL graduate degree granting institution with fewer than 15,000 FTE


Undergraduate institution only with more than 2,500 FTE


Undergraduate institution only with fewer than 2,500 FTE



[55] PR Newswire, “ProQuest Acquires ExploreLearning,” Cold Fusion Developer’s Journal, 1 March 2005: <HLINK></HLINK>

[56] ProQuest, Annual Report, 2004: <HLINK></HLINK>

[57] Google, “Library Project”: <HLINK></HLINK>;

Google, “Google Checks Out Library Books”:


Even within the US there is also opposition to the Google vision. Kimberley A. Kicheniuk, “Google Begins Digitalization”: <HLINK></HLINK>

[58] Deutsche Welle, European Libraries Fight Google-ization, 27 April, 2005:


[59] Yahoofs announcement of paid subscriptions for deep search on 17 June 2005 is seen by some as a sign of things to come. gYahoo 'deep search', g The Bosh, 17 June, 2005: 


[61] Internet2.

[62] Jacqueline Brown, "Pacific Wave, Pacific Light Rail and National Light Rail," CANS2002, Shanghai, 22 August, 2002: <HLINK> cans/2002/Presentations/Brown.ppt</HLINK>

[63] IEEE, gWG12: Learning Object Metadatah:



[65] See for instance the author's "American Visions of the Internet".


[66]


[68]CNI, gClifford A. Lynch, CNI's Executive Director, Publicationsh:


See especially: Clifford A. Lynch, "The Battle to Define the Future of the Book in the Digital World," First Monday 6: 6 (June 2001); Clifford A. Lynch, "The New Context for Bibliographic Control in the New Millennium," Bicentennial Conference for the New Millennium: Confronting the Challenge of Networked resources and the Web, Washington, D.C, November 15-17, 2000.

[69] Michael Giesecke, Der Buchdruck in der frühen Neuzeit - Eine historische Fallstudie über die Durchsetzung neuer Informations- und Kommunikationstechnologien. Frankfurt/Main (Suhrkamp) 1991, 2. Aufl. 1994; durchgesehene und mit einem Nachwort versehene Ausgabe 1998. Cf. Michael Giesecke, Homepage.


[70] Jean-Claude Guédon (Université de Montréal), "In Oldenburg's Long Shadow: Librarians, Research Scientists, Publishers, and the Control of Scientific Publishing."


[71] "Le savoir, un bien public mondial".

This idea has also been explored in the present author's Augmented Knowledge and Culture, University of Calgary Press, 2005 (in press).

[72] JISC, gBuilding a Virtual Research Environment for the Humanities (BVREH),h



[73] Geoffrey Fox, David Walker, "e-Science Gap Analysis," UKeS-2003-01.


[74] Maisons Science de l'Homme, "Centres, réseaux, associations de recherche hébergés ou domiciliés".

[75] BNF, gA Major Project. 1988 - 1994 : From a major project to the new Bibliothèque nationale de France:


[76]BNF, gGallica: about the Projecth: <HLINK></HLINK>

[77] Valerie Khanna, "French cry havoc over Google's library plans," Library Staff Blog, 28 March 2005.


[78] "French To Provide Alternative To Google Library Project", Web Rank Info, 17 March 2005.

[79] Deutsche Welle, "European Libraries Fight Google-ization," 27 April, 2005.


[80] EDRI-gram, "Initiative European Libraries to Digitise Books," Number 3.9, 4 May 2005.


[81] "Over 11m Digital Records Now Available At European Library", Managing Information News, 31 May 2005.


[82] Europa, gViviane Reding Member of the European Commission responsible for Information Society and Media i2010: Europe Must Seize the Opportunities of the Digital Economy. Press Conference on the occasion of the launch of the initiative European Information Society 2010
Brussels, 1 June 2005h:

<HLINK></HLINK>; gNew 'i2010' programme to unleash digital services in the EU,h

[83] To close the gap between the information society "haves and have nots", the Commission will propose: an Action Plan on e-Government for citizen-centred services (2006); three "quality of life" ICT flagship initiatives (technologies for an ageing society, intelligent vehicles that are smarter, safer and cleaner, and digital libraries making multimedia and multilingual European culture available to all) (2007); and actions to overcome the geographic and social "digital divide", culminating in a European Initiative on e-Inclusion (2008).

UNI Global, gEU: Launches new EuropeanCommunications strategy "i2010":


[84] For a discussion of the problems involved see the author's "Towards a Semantic Web for Culture," JoDI (Journal of Digital Information), Volume 4, Issue 4, Article No. 255, 2004-03-15. Special issue on New Applications of Knowledge Organization Systems.


[85] Recently Sir Tim Berners-Lee, in an interview with Andrew Updegrove, explicitly stated that this direction was not the goal of the W3C.


[86] Joseph Weizenbaum, Computer Power and Human Reason: From Judgment To Computation, San Francisco: W. H. Freeman, 1976.

[87] See: Grant Fjermedal, The Tomorrow Makers, A Brave New World of Living Brain Machines, Redmond: Tempus Books, 1986, p. 188

[88] For other work in the direction of free encyclopaedias see the work of Torsten Wöllert, "Offene Enzyklopädie".


[89] Encyclopaedias are but one expression of a much more complex tradition. For an introduction see: Frances Yates, The Art of Memory, Chicago: Chicago University Press, 1966. For details see: Giorgio Tonelli, A Short-Title List of Subject Dictionaries of the Sixteenth, Seventeenth and Eighteenth Centuries as Aids to the History of Ideas, London 1971.

[90] Free Internet Encyclopedia, "Your Comments", last updated 8/22/95:

"Please be advised that Encyclopaedia Britannica is the owner of federal trademark registrations 1,672,590 for the mark 'Macropaedia' and 1,672,591 for the mark 'Micropaedia'".

[91] The abridged Kleine Pauly in 12 volumes initially cost DM 268/volume. See: "Hubert Cancik and Helmuth Schneider (edd.), Der Neue Pauly. Enzyklopaedie der Antike. Altertum, Band 1 (A-Ari). Stuttgart: J.B. Metzler, 1996. Pp. liii, 577. DM 268/volume (subscription price)." Bryn Mawr Classical Review 97.3.15.


[92] Perline Noisette, Thierry Noisette, La bataille du logiciel libre, dix clés pour comprendre, Paris: éditions La Découverte, collection Sur le Vif, octobre 2004.


[93] James Boyle, "A Manifesto on WIPO and the Future of Intellectual Property," posted online Wednesday 29 September 2004.


[94] Free office suite, Open <HLINK></HLINK>

[95] The GIMP.


[96] Inkscape:


[97] Blender.




[99] NASA, WorldWind 1.03


[100] SourceForgeNet, "Open standards and software for bibliographies and cataloging".


[101] Richard Stallman, "The GNU Project". Cf. Richard Stallman's personal site.


[102] Linux International, "Linux History"; Linux Online.


[103] For a serious list of such alternatives see: Jama Poulsen, DebianLinux.Net, "Freedom".


[104] Framasoft, "939 logiciels libres dans l'annuaire."

[105] Creative Commons.


[106] Open Content:


[107] Cover Pages, "Open Archives Initiative Releases Specification for Conveying Rights Expressions".


[108] Proceedings of the New Ways, New Technologies Conference. University of Calgary. Calgary, Alberta, Canada. University of Calgary Press. October 15, 2004

[109] Towards an Integrated Knowledge Ecosystem: A Canadian Research Strategy


[110] Robin Bloor, "The government open source dynamic", The Register, 7 January 2005.


[111] Keith Regan, "Nokia, Apple Develop Open-Source, Mobile Web Browser," Linux Insider, 13 June 2005.

[112] Open theory.

[113] CODES: Collaborative Open Design System for Integration of Information Webs with Design and Manufacturing Tools.


"Open" has many meanings. The Open Design Alliance is open only to a small group of industry partners. Cf. Open Design Alliance.

[114] In a refinement of this approach the user can be offered a choice of seeing only a subset of the complete list which uses one of the variant names.


[116] There is for instance the Galen Classification Workbench (ClaW).

There are also the efforts of the Dutch WHO Collaborating Centre for the ICIDH to act as an intermediary for international classifications between WHO and the Netherlands.


[117] Oliver Streiter and Leonhard Voltmer, "Document Classification for Corpus-based Legal Terminology", European Academy, Viale Druso 1, Bolzano, Italy.


[118] For a more detailed discussion see the author's "Towards a Semantic Web for Culture", JoDI (Journal of Digital Information), Volume 4, Issue 4, Article No. 255, 2004-03-15. Special issue on New Applications of Knowledge Organization Systems.


[119] Dan Corner, "Some Revealing Catholic Names, Titles and Prayers To Mary":


Holy Mary
Holy Mother of God;
Most honored of virgins;
Chosen daughter of the Father
Mother of Christ;
Glory of the Holy Spirit
Virgin daughter of Zion,
Virgin poor and humble,
Virgin gentle and obedient,
Handmaid of the Lord,
Mother of the Lord,
Helper of the Redeemed,
Full of grace,
Fountain of beauty,
Model of virtue,
Finest fruit of the redemption,
Perfect disciple of Christ,
Untarnished image of the Church,
Woman transformed,
Woman clothed with the sun,
Woman crowned with stars,
Gentile Lady,
Gracious Lady,
Our Lady,
Joy of Israel,
Splendor of the Church,
Pride of the human race,
Advocate of grace,
Minister of holiness,
Champion of Godfs people,
Queen of love,
Queen of mercy,
Queen of peace,
Queen of angels,
Queen of patriarchs and prophets,
Queen of apostles and martyrs,
Queen of confessors and virgins,
Queen of all saints,
Queen conceived without original sin,
Queen assumed into heaven,
Queen of all earth,
Queen of heaven,
Queen of the universe (pp. 190,191)

From the gLitany of Loretoh

Mother of the Church,
Mother of Divine grace,
Mother most pure;
Mother of chaste love;
Mother and virgin,
Sinless Mother,
Dearest of Mothers,
Model of motherhood,
Mother of good counsel;
Mother of our Creator;
Mother of our Savior;
Virgin most wise;
Virgin rightly praised;
Virgin rightly renowned;
Virgin most powerful;
Virgin gentle in mercy;
Faithful Virgin;
Mirror of justice;
Throne of wisdom;
Cause of our joy;
Shrine of the Spirit;
Glory of Israel,
Vessel of selfless devotion;
Mystical rose;
Tower of David;
Tower of ivory;
House of gold;
Ark of the covenant;
Gate of heaven;
Morning star;
Health of the sick;
Refuge of sinners;
Comfort of the troubled;
Help of Christians;
Queen of the rosary;
Queen of peace (pp. 191,192)

[120] Mark Pilgrim, "What is RSS?"



Webref, gIntroduction to RSSh:


[122] The term "memory institutions", describing the combination of museums, libraries and archives, was introduced within the European Commission in the early 1990s and has only gradually moved towards international recognition. Some use the term "cultural institutions"; others speak of the ALM (archives, libraries, museums) sector. Cf. Lorcan Dempsey, "Scientific, Industrial, and Cultural Heritage: a shared approach," Ariadne, Issue 22, 21 December, 1999.

[123] Owen Gibson, "Coming soon: Googling the truth", The Guardian, 18 June 2005. <HLINK>,3604,1509281,00.html</HLINK>

[124] Art Libraries Society of North America, Cataloging Advisory, "Anonymous Artist Relationships in the MARC 21 Bibliographic Format", Discussion Paper 115, 14 May, 1999. For an example from geology, see: POSC Specifications Version 2.2.2, "Classification System".

[125] For an example of current approaches see: Tim DiLauro, "Choosing the components of a digital infrastructure," First Monday, Issue 9.


[126]


[127] E.g. Charles Phelps (Provost, University of Rochester):

"The central idea would have, the learned societies expand their role to undertake a certification process for articles, independently of whether they are submitted for, or are eventually published in the standard paper journal system. Under such a system, scholars could submit articles for review (with an agreed-upon submission fee), and the normal refereeing process of the learned society would determine whether the article qualified for their 'seal of approval,' which, if received, could be affixed to any electronic version of the article as retrieved by others. With such a certification, if appropriately 'honored' in processes that rely upon such certifications, including tenure and promotion in colleges and universities and grant applications from governments and foundations, the necessity to carry on with paper publication (which serves only the certification and editing processes in addition to the distribution, indexing and archiving that the computer file-server system can serve) could diminish or vanish at least in some settings. But until such refereeing, and in some settings, editorial functions are provided, the paper journal system will persist in parallel with whatever electronic system emerges."

Charles E. Phelps, "The Future of Scholarly Publication. A Proposal for Change".

[128] arXiv monthly submission rate statistics


[129] Derek DeSolla, 1922-1983:


[130] HyperJournal Website: Core Team.

[131] HyperJournal Website: Features:


It also includes 1) Open Archive OAI-PMH protocol compliance; 2) unlimited number of scientific and editorial committees; 3) on-line anonymous peer-review.

[132]Paolo DfOrio, gNietzsche Open Sourceh: <HLINK></HLINK>

[133] Paolo D'Iorio, "Sharing Knowledge in Web Communities," Open Source Models in the Humanities, University of Chicago, 26 April 2004. Cf. Scuola Normale, Conference: Progettare su web. Digital Libraries di parole e immagini (centri di ricerca e grandi biblioteche), Pisa, April 2005.


For other views on the present state of scholarly communication see: Rob Kling, Ewa Callahan, "Electronic journals, the Internet and scholarly publishing," Annual Review of Information Science and Technology, vol. 37, 2003, pp. 127-177; Rob Kling, "The Internet and unrefereed scholarly publishing," Annual Review of Information Science and Technology, vol. 38, 2004, pp. 591-631.


[134] Hyperlearning: HyperMedia Platform for Electronic Research and Learning: 


[135] Ashwin K Pulijala, Susan Gauch, "Hierarchical Text Classification".

[136] The challenges of linking the two national central libraries of Italy in Rome and Florence offer a case in point. Maria Patrizia Calabresi, "Two national central libraries in Italy: bibliographic co-operation or competition?", 66th IFLA Council and General Conference, Jerusalem, Israel, 13-18 August, 2000.


[137] E.g. Texas Woman's University, "TWU Libraries".


[138] Cf. Suzanne Keene, Francesca Monti, "The DEER: Distributed European Electronic Resource." Final Report, 2003. In: E-Culture Net: Work Package 6, Deliverable 11a. IST-2001-37491.