
Research and Development in Digital Libraries

Gary Marchionini

I. Digital Library Perspective

Digital library is a concept that has different meanings in different communities. To the engineering and computer science community, digital library is a metaphor for the new kinds of distributed database services that manage unstructured multimedia data. To the political and business communities, the term represents a new marketplace for the world's information resources and services. To futurist communities, digital libraries represent the manifestation of Wells' World Brain. The perspective taken here is rooted in an information science tradition.

Digital libraries are the logical extensions and augmentations of physical libraries in the electronic information society. Extensions amplify existing resources and services and augmentations enable new kinds of human problem solving and expression. As such, digital libraries offer new levels of access to broader audiences of users and new opportunities for the library and information science field to advance both theory and practice.

High levels of attention and funding were first given to digital libraries in the early and mid 1990s, leading to a plethora of visions and projects invariably driven first by finding ways to apply the many technologies developed in the 1980s and second by desires to create new technologies for managing distributed information resources. This perspective is best illustrated by the mission statement of the Digital Library Initiative Interagency Coordinating Committee, charged with monitoring the progress of six large-scale projects funded by the US government: "The broad goal of the Digital Libraries Initiative is to dramatically advance the means to collect, store, organize and use widely distributed knowledge resources containing diverse types of information and content stored in a variety of electronic forms." This technical emphasis stands in contrast to the mission statement of a typical large public library: "The mission of the Carnegie Library of Pittsburgh is to be a force for education, information, recreation, and inspiration in the communities it serves." Thus, much of the early attention related to digital libraries was technology-centered and content-centered rather than people-centered and community-centered. In some cases, notably the efforts of national libraries or large academic libraries, work focused on extending access to existing collections through digitization and network access.

It is surely the case that all libraries will have some digital collections or finding aids and that there will be some libraries that offer digital collections exclusively. At present, use of the term digital library has focused on digital collections and limited access services. Depending on the source, digital libraries include anything from simple repositories of huge volumes of homogeneous electronic data with primitive access services to the electronic extensions of the world’s most prominent libraries (see CACM, April 1995 for briefings on plans by the Library of Congress and the British Library; see Representations, Spring 1993 for several commentaries on the Bibliothèque de France [Jamet & Waysbord, 1993]; see CACM, April 1998 for briefings on various digital library projects internationally). To be called a library, an entity must be rooted in one or more communities of practice and be guided by a service mission that is manifested in policies of acquisition (collection development), organization, and access. Libraries offer both content and services guided by such policies and exist in a social-political context that influences policies and operations. To be modified by the term digital, a library must have some electronic content and services. In practice, a digital library makes its digital objects and services accessible remotely through networks such as the Internet or limited-access intranets. In some cases, the digital library objects and services may be distributed transparently to users from a variety of machines and locations. Thus, digital libraries are defined by mutually dependent attributes, which include content, services, technology, and socio-political culture.

II. Content

Much digital library research, especially in the private sector, has been driven by the aphorism "Content is king." Theoretically, any object from a text fragment to an animal in a zoo may be rendered digitally. Thus, there is no limit to the types of content that may be held by a digital library; however, there is a wide range of practical difficulty in rendering different objects and in the efficacy such renderings offer to people. All content shares intellectual, technical, and cultural challenges, and each type also offers specific challenges. Authority, surrogate creation, formats, intellectual property rights, and costs of acquisition and maintenance are issues for all digital objects, but different types of content present special challenges.

A. Types of Content

Most digital libraries provide renderings for textual objects. Whenever possible, text is scanned and optical character recognition (OCR) technology is used to create digitally coded (e.g., ASCII, Unicode) renderings. Having text in digital form allows easy character string search as well as more sophisticated search and linguistic pattern matching analysis. Manuscripts that are not easily character recognized or that have inherent value in the actual script, as well as the figures and images from typeset documents, are scanned and provided as bit mapped image renderings. OCR accuracy and scanning resolution are bound by costs for textual documents and manuscripts respectively, i.e., perfect OCR requires costly human validation, and very high resolution bitmaps are more expensive to store and transmit. An additional systems management effort is needed to coordinate the retrieval and display of ASCII/Unicode and bit mapped files for large collections. In some cases, texts are manually marked up using the Standard Generalized Markup Language (SGML) so that structural content as well as formats are provided to users. The edition of the text used is often an issue, especially for translations, as is the inclusion of critical commentaries or a critical apparatus. How surrogates such as bibliographic records, abstracts, keywords/phrases, indexes, or concordances are created (e.g., automatically or manually) and displayed to users also varies across digital libraries. Consortia such as the Text Encoding Initiative address many of these issues, but different digital libraries use a variety of techniques and formats for their textual components.

Specialized digital libraries provide renderings for single medium objects such as images, statistical data, sound recordings, or silent films. Determining which formats to use is one challenge for such collections. The Museum Educational Site Licensing Project worked with images from seven prominent museums that provided images to the project in one of four distinct formats at seven distinct resolutions. Image collections provided by stock photo companies or projects such as the Library of Congress National Digital Library program currently provide multiple formats (e.g., GIF, TIFF, JPEG) to accommodate the wide range of platforms and software people may bring to the collection. For each image, multiple files must be stored, maintained, and linked to indexes and catalogs. Similar redundancies are currently necessary for digitized sound (e.g., AU, WAV, AIFF) and digitized film or video (e.g., Quicktime, AVI, MPEG, Shockwave). Although techniques to store high-quality data and create the required formats on the fly will surely be developed, digital librarians today must often juggle multiple files for the same object. Another decision librarians must make is what resolution to use for digitized objects. Is 300 or 600 dots per inch sufficient for photographs or must higher resolutions be acquired and stored? For example, an 8 bit digital rendering for a color slide of a vase in the Perseus digital library at 640 by 480 pixel resolution may be sufficient for students studying vase shapes and styles but may not be adequate for the art historian examining fine details.
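The storage side of the resolution question can be made concrete with simple arithmetic. The following is a minimal sketch, assuming uncompressed bitmaps with no file header; the function names and the 8 by 10 inch, 24-bit print are hypothetical illustrations, not figures from any project described here.

```python
def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size in bytes of an uncompressed bitmap (no header, no compression)."""
    return width_px * height_px * bits_per_pixel // 8


def scanned_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Size in bytes of an uncompressed scan of a print of the given physical size."""
    return uncompressed_bytes(round(width_in * dpi), round(height_in * dpi), bits_per_pixel)


if __name__ == "__main__":
    # The 640 by 480, 8-bit rendering discussed above.
    print(uncompressed_bytes(640, 480, 8))  # 307200
    # Hypothetical 8 x 10 inch print scanned in 24-bit color at two resolutions.
    for dpi in (300, 600):
        print(dpi, scanned_bytes(8, 10, dpi, 24))  # 21600000, then 86400000
```

Doubling the scanning resolution quadruples the uncompressed size, which is why the choice between 300 and 600 dots per inch is a cost decision as much as a quality decision.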

Indexing for non-textual objects is particularly troublesome. Most digital libraries depend on textual captions or titles for retrieval and these distinct textual objects are themselves stored in different files. Creating and using new surrogates for non-textual objects is an active area of digital library research (see the services section below). Additionally, authority concerns are exacerbated with non-textual content (a mustache on the Mona Lisa is not an issue as it is an obvious alteration) and techniques for digital watermarking or information hiding are becoming commercially available.

Much of the interest in digital libraries stems from the possibilities of providing interactive multimedia content to users. Although video programming is the most obvious type of multimedia content, animated texts, hypermedia corpuses, on-demand video, and collaborative scenarios (MUDs and MOOs) for work or play are possible in digital environments and these dynamic digital events and objects will become part of digital libraries. Increasingly, the fruits of creativity and expression are inherently digital in nature. Computer simulations, games, and virtual worlds are objects collected in digital libraries. Interactive multimedia go beyond combining more than one medium to provide people with control mechanisms for making choices over multiple iterations, i.e., they are interactive. Digital libraries may provide access to a standardized entry point and leave it up to the user to deal with various components (much like libraries index a book rather than chapters or paragraphs). Interactive media allow the possibility of indexing at much finer granularities (e.g., words or video frames). Whether such fine-grained access is actually useful for information seekers remains to be determined. Surely, new kinds of surrogates such as document vectors or color histograms will be useful to the system for searching and text summaries and keyframe video extracts will be useful for user browsing.

All digital libraries must cope with making metadata available to users. Metadata are another level of content to librarians but a means to the content for users. Not only do digital librarians face challenges in standardizing metadata to ensure interoperability across digital libraries, but the range and distinctiveness of metadata are problematic. In some cases, only the metadata are made available digitally. In such cases, users search through pointers and must acquire the primary information physically or through a different (e.g., fee based) system. Such libraries are more properly considered referral services than digital libraries. In more typical cases, metadata for objects of different granularity (e.g., titles for collections and titles for single objects) are mixed together on computer displays with full texts or objects. In physical libraries, the card catalog or OPAC is physically distinct from the items on shelves. These distinctions are difficult to make in electronic environments because everything is displayed on the same physical screen; thus the boundaries between metadata and primary data are often blurred.

Metadata are used primarily as intermediate steps to retrieving content, but creators and digital librarians are creating new types of surrogates for objects to allow users to quickly preview and browse content. Huge challenges remain in creating surrogates for digital content. Today, most retrieval is facilitated through words--titles, captions, manually created descriptions, automatically extracted keywords, etc. There is enormous attention focused on creating non-textual surrogates such as color and shape characterizations for images and speaker identification schemes for audio recordings, but there are more difficult metadata issues looming as more content is not stored at all but created on the fly according to the specifications of the user. For example, today’s web sites create specialized graphs from enormous varieties of statistical data in government repositories such as the Bureau of Labor Statistics. These graphs are generated on the fly according to the variables users specify. These new objects are impossible to uniquely title or index in advance because the permutations of hundreds of variables allow a huge number of variants. As more digital libraries support sophisticated user profiles or agents, customized, original information objects will be provided to users from the library’s "collection of possibilities." Physical libraries do not generally save and index results of reference activities except to create tickler files to help reference librarians the next time a question is asked or to use in creating pathfinders for popular topics. Characterizing what content is possible rather than what exists is a much larger challenge in digital libraries.

B. Managing Content

Many digital library efforts devote the bulk of their resources to managing content. The key content management functions in any library are selection and acquisition; indexing, storage, and access; and collection maintenance. Most of the research and development activity in early digital library efforts was devoted to these functions, although it is likely that more attention will be given to user services as digital libraries mature.

Selection and Acquisition.

Libraries select content according to a collection development policy. Such policies manifest the missions of the library and determine how materials budgets are expended. Many digital libraries, especially those in governmental agencies, have arisen out of the need to take an existing body of electronic materials and make them available to users. Some digital libraries are strictly opportunistic, selecting objects to digitize from the existing collection and those for which intellectual property rights are held. For example, the Library of Congress National Digital Library Program selected objects from a variety of reading rooms that were out of copyright (historical collections) or that were produced by US government agencies where copyright is not claimed. Some digital libraries mainly acquire or develop materials according to specific missions and policies and augment the collection opportunistically. For example, the Perseus project texts were mainly acquired opportunistically from the Loeb collection at Harvard University which holds rights. In some cases, new translations were commissioned. On the other hand, the bulk of the images in the Perseus Library are from original photography of museum objects that were selected to meet scholarly and pedagogical goals. Another example of a specialized digital library is the Alexandria Project at the University of California Santa Barbara that focuses on spatial information. This project aims to make existing maps and other spatial material more broadly available by combining digital representations for visual objects such as maps with the text-based attributes of names and geographic features (e.g., Smith, 1996). It is likely that as digital libraries continue to evolve, new, specialized collections will be built according to institutional missions and well-defined collection development policies.

There are two key challenges for content selection: cost and quality. First, librarians consider the costs of acquisition. Intellectual property rights are an important first consideration, but the costs of digitization and maintenance must also be taken into account. Libraries that receive collection gifts often require that donors supply funds for cataloging, shelving, and preservation, and digital gifts bring their own one-time and ongoing costs. Second, librarians consider the quality of the content before acquiring it: if content is king, quality is its lineage. This is a more problematic consideration because issues of authenticity as well as veracity arise. Which of Monet’s lily studies best represents his style at the end of his life? Is a transcribed Latin text from one 15th century Italian monastery superior to a second transcription from a neighboring city? Which of the many best seller lists is most authoritative for adult fiction? Which web site collaborative rating service is most useful for selecting eighth grade science simulations? The issues of cost and quality are addressed in numerous, labor-intensive ways in traditional libraries, and it is unlikely that this will change fundamentally in digital libraries, although collaborative ratings and better communication facilities will certainly augment librarians’ as well as patrons’ abilities to make informed judgments about what they select.

Once decisions about selection are made, content must be acquired. Payments in physical libraries are already mainly computerized, but delivery may be easier in digital libraries. For objects already in digital form, file transfer through networks or mass storage is straightforward as long as file formats are well-specified. In the case of physical objects, digitization must be conducted. Scanners for text and images range in quality, and scanning tasks differ on several dimensions: output resolution, value and condition of the physical objects (e.g., brittle, rare manuscripts must be handled differently than technical reports), and speed (digitizing 100 images is quite a different task than digitizing a million images). In addition to the engineering challenges digitization presents, policy decisions must be made. For example, librarians must decide which resolutions and formats to adopt, which text to OCR and error correct, and how to link different digital representations for multiple media from single collections (e.g., a manuscript collection that includes field notes, photographs, and audio tapes). The complexities and tradeoffs involved in digitization and user access are well-illustrated by the CORE Project, which systematically applied different digitization schemes to published chemistry materials and then conducted multiple user studies (Entlich et al., 1996).

Indexing, Storage, and Access.

Once content has been selected and acquired, it must be added to the collection in such a way that users will be able to retrieve it effectively. Indexing, storage, and access are perhaps the most active areas of research and development in digital libraries. Digital libraries have given new life to work on automatic indexing, as manual indexing of huge volumes of data is beyond the resources of most libraries. In many cases, texts are "indexed" using vector-space or probabilistic information retrieval models that provide access through weighted values for all but a few common words. Thus, the classification system itself is empirically determined from the data as a by-product of the indexing. Perhaps the most successful example of this approach to date is the INQUERY system (Croft, Cook, & Wilder, 1995) used as the retrieval engine in many digital library projects (e.g., the Library of Congress). These approaches stand in stark contrast to the traditional approach of manually assigning objects to a limited number of manually constructed concept classes (classification system) represented in a controlled vocabulary (e.g., Library of Congress Subject Headings or Medical Subject Headings). Other automatic techniques index objects to mathematically abstract concept classes; for example, Latent Semantic Indexing assigns documents to "concepts" defined by the singular vectors of a term-by-document matrix (e.g., Deerwester et al., 1990). Several WWW-based services use a hybrid approach by manually creating a classification system and then using automatic techniques to assign objects. Perhaps the most ambitious effort to automatically index large volumes of documents to date is the work of Schatz and Chen (Schatz et al., 1996; Chen et al., 1997), who have applied supercomputer resources to indexing scientific and engineering documents.

Additionally, as digital libraries become more global, multiple language documents, documents in different languages, and multiple language versions of documents are concurrently available to users who bring queries expressed in different languages. Researchers are actively applying existing text retrieval techniques to the cross-language retrieval problem (e.g., Sheridan & Ballerini, 1996).
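The vector-space idea described above, weighted values for all but a few common words, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the method of any system named here: the stopword list, the tf-idf weighting formula, the whitespace tokenizer, and the three sample documents are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Illustrative stopword list; real systems use longer, tuned lists.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop common words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_index(docs):
    """Return (tf-idf vector per document, idf table) for a dict of id -> text."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # document frequency: one count per document
    idf = {term: math.log(len(docs) / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(words).items()}
        for doc_id, words in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, vectors, idf):
    """Return ids of matching documents, best match first."""
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in vectors.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    docs = {
        "d1": "maps of the California coast",
        "d2": "Greek vase shapes and styles",
        "d3": "digitized maps and aerial photographs of California",
    }
    vectors, idf = build_index(docs)
    print(search("California maps", vectors, idf))  # ['d1', 'd3']
```

No subject heading is assigned by hand at any point: the weights, and therefore the ranking, are derived entirely from word statistics in the collection, which is the contrast with controlled vocabularies drawn above.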

Most retrieval systems for images, video, audio recordings and other non-textual objects have depended on text items such as title, creator name, or manually assigned subject headings for retrieval. Digital libraries have generated enormous research interest in inventing indexing techniques that do not depend on text representations. One line of research is to adapt the statistical techniques used in text retrieval to characterize objects by feature vectors for characteristics such as color (color histograms are commonly used) and brightness. Researchers also have begun to tap the research in robotics (vision systems) and signal processing to automatically extract unique attributes such as shapes, optical flow, and pitch that may be used for retrieval.
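A color histogram used as a feature vector can be sketched as follows. This is a minimal illustration under stated assumptions: images are represented as plain lists of (r, g, b) tuples, the bin count is arbitrary, and histogram intersection is one common similarity measure chosen here for illustration rather than a measure named in the text.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram for an iterable of (r, g, b) values in 0..255.

    Assumes bins_per_channel divides 256 evenly (e.g., 2, 4, 8, 16).
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map the three quantized channel values to a single bin index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red = color_histogram([(250, 10, 10)] * 4)
    mostly_red = color_histogram([(250, 10, 10)] * 3 + [(10, 10, 250)])
    blue = color_histogram([(10, 10, 250)] * 4)
    print(histogram_intersection(red, mostly_red))  # 0.75
    print(histogram_intersection(red, blue))        # 0.0
```

A query image can then be compared against stored histograms exactly as a text query is compared against document vectors, with no caption or title involved.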

For example, image segmentation is a fundamental problem in image processing in general and also in creating surrogates for retrieval and use. Various feature analysis techniques have been exploited to identify images and provide a basis for queries. These include edge and corner detection, foreground/background separation (e.g., Rosenfeld & Smith, 1981), texture analysis (texture energy determined by filters; e.g., Jain, Ratha, & Lakshmanan, 1997), and color (e.g., Jain & Vailaya, 1996). These feature analyses are augmented by measures of optical flow in moving images (e.g., Sim & Park, 1997). The Informedia Digital Library Project at Carnegie Mellon University has applied several of these techniques to support video search and browsing (e.g., Wactlar et al., 1996). Some of the techniques have been integrated into commercial products such as IBM's incorporation of Query by Image Content (Flickner et al., 1995) techniques into its Digital Library Solution. It seems certain that the digital library research and development activity of the 1990s will ensure that considerable progress is made in automatically indexing non-textual objects with non-textual attributes. New indexing challenges will emerge as more dynamic objects (e.g., virtual conference proceedings, active networks) are added to digital libraries. The temporal nature of such objects will require ongoing indexing--consider, for example, how you would index the events of your life as it progresses.

There is a two-fold advantage to electronic content. First, a multiplicity of pointers is economically feasible since many separate cards or other physical devices need not be created. Thus, rather than the four or so catalog cards (author, title, and a few subject headings) in a physical system, dozens or hundreds of index terms may be assigned or many different levels of representation may be created in an electronic system. It is essential to novel and flexible access interfaces that multiple and varied indexes be available. Second, unlike physical objects, which must reside in a single space, electronic objects may exist in many locations. Thus, the logical many-to-many relationships among concepts and information objects may be leveraged for both searching and browsing in electronic environments.

Storage is mainly a technical requirement although new media may complicate storage decisions and costing. When data is to be delivered continuously (e.g., streaming video or audio) rather than as discrete files, then alternative technologies are required (drives and database management software that operate continuously have different engineering requirements than drives optimized for bursts of data). Today's large digital repositories use multiple levels of mass storage media (e.g., disk, tape) and mechanical robots to locate and mount the media. Various supercomputer centers today use tape robots that provide rapid access to many terabytes of data (e.g., the Oak Ridge National Laboratory in 1997 had capacity for 100 terabytes of uncompressed data). Digital libraries will surely apply such technology just as libraries today apply movable shelving and complex conveyer systems to move physical materials.

Ultimately, users must be able to access the content digital librarians have selected, indexed, and stored. During the 1970s, large libraries invested heavily in computerizing cataloging and circulation functions to give users faster and better access and service. Online Public Access Catalogs have evolved to give library patrons remote access to the bibliographic records. Digital libraries offer access to primary content using a variety of access tools. An active area of research is user interfaces for digital collections. Access interfaces depend on the content organization and storage discussed above and serve as the bridge between internal (technical services) and patron services. Access interfaces are considered under search services in the Services section.

Maintenance.

Maintaining buildings and systems and preserving content are important and costly activities in physical libraries. Digital libraries may avoid some of the costs of wear and tear on buildings and books but still have significant maintenance costs, including some unique to electronic environments. System hardware and software upgrades have become accepted expediencies of today's workplace and there is no reason to expect that this will change. New equipment, improved or alternative network solutions (e.g., ISDN, ATM, wireless), and software upgrades will require excellent technical personnel. Archivists have long worried about the persistence of digital media. Magnetic tape life expectancies are typically less than ten years under ideal temperature and humidity conditions. Optical storage offers longer life spans, but digital librarians must plan for copying digital holdings periodically and especially plan for the inevitable obsolescence of different media types and playback devices. These maintenance issues correspond to traditional maintenance requirements, but because they apply across many industries and require rapidly changing technical skills, they tend to be much more expensive.

Just as the computational systems change, digital content may also change. A digital document may have numerous versions, especially given the ease with which electronic documents may be changed. Maintaining the most essential (not necessarily the most recent) document requires that versions be well managed, including updating and deleting the links to those objects. In addition to this version control problem, digital librarians must manage the multiplicity of indexes and file formats. More problematic are link management requirements as hypertext links are created among distinct documents. A policy such as requiring all links to point to the top of a document (e.g., main home page of a web site) aids the librarian in managing links in a database but may not serve the user who expects to go directly to the location of the relevant information.

Although most of the research and development effort in digital libraries has been devoted to building the collections and making them available to users, there is enough experience for the creation of digital librarian's tool kits. Such a tool kit might include tools for selecting, acquiring, indexing, and maintaining digital content. For example, tools would include: library building tools for viewing directory structures, converting formats, checking screen layout consistencies, quickly viewing objects, and encrypting data; interface simulators for testing interfaces on multiple platforms; database tools for property rights, file naming histories, links, and metadata definitions; and maintenance tools for automatically checking links, automating transaction log analyses, maintaining security, updating versions, and backing up the system. Although many of these tools exist, developers will surely undertake systematic efforts to augment the list and bring them all together with a common interface amenable to the widest possible set of digital libraries.
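As one illustration of a maintenance tool for automatically checking links, the sketch below finds links that point to objects no longer in the collection. It assumes a deliberately simplified, hypothetical representation in which the collection is a mapping from object identifiers to the identifiers each object links to; a real tool would read this from the library's link database or by parsing documents.

```python
def find_dangling_links(collection):
    """Return sorted (source_id, target_id) pairs whose target is missing.

    `collection` maps each object id to the list of ids it links to.
    """
    known = set(collection)
    return sorted(
        (source, target)
        for source, targets in collection.items()
        for target in targets
        if target not in known
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    collection = {
        "home": ["vase-12", "essay-3"],
        "vase-12": ["home"],
        "essay-3": ["vase-99"],  # vase-99 was deleted or renamed
    }
    print(find_dangling_links(collection))  # [('essay-3', 'vase-99')]
```

Running such a check after every version update or deletion is one way to keep the link management burden described earlier from accumulating unnoticed.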

For existing libraries, many of the decisions related to managing content are questions of how many resources to divert from existing operations and what levels of redundancy to assume for physical and digital collections and services. For new, exclusively digital libraries, the decisions are driven mainly by resource acquisition.

III. Services

The range and depth of services that a library provides to patrons are driven by its service mission and policies. Policies determine who may use the library, when content and services are available, what types of services are available, and how resources are allocated to patrons and services. Digital technology offers the potential to radically change who may use a library, when they may do so, and what types of services are offered. Digital library services both amplify existing services and augment library service with new possibilities for users. Thus, libraries that offer digital content and services must reconsider their policies in light of new capabilities and patron demands.

A. Who are the patrons?

Given network capabilities, libraries must decide whether to expand their user populations beyond the usual physical limitations of time and space. Patrons can access digital content at any time of day, but human library services may still be restricted to local working hours. Public libraries must decide whether to seriously consider serving the world community rather than the local population that supports the library. Like corporations that provide access through restricted intranets, public libraries now may opt to maintain local community user policies through password access. Until public intranets or some other solution emerges, it is much easier for public libraries to provide universal access to local bibliographic holding data and password access to other databases and services. The recently created Gates Library Foundation aims specifically at public libraries and may have an enormous impact on electronic services in public libraries. National libraries may leverage digital technology to more realistically serve the populations in their service missions, or to expand their policies. The Library of Congress service mission, for example, does not explicitly serve children; however, the Library of Congress National Digital Library Program does have an explicit outreach to K-12 schools. Thus, the digital library effort has effectively broadened the scope of service for this national library. Additionally, libraries often make arrangements to serve users with special needs (e.g., access ramps, Braille books) or users from varied cultures (e.g., languages and customs) and must find ways to extend these services in digital environments.

B. What types and qualities of service to offer?

Even more difficult than decisions about who can use digital library resources are decisions about what types of services to provide digitally. Ultimately, this challenge may define the legacy of digital libraries. Libraries offer different types of reference and referral services (e.g., ready reference, exhaustive search, selective dissemination of information), instructional services (e.g., bibliographic instruction, database searching), added value services (e.g., bibliography preparation, language translation), and promotional services (e.g., literacy, freedom of expression). Although much of the impetus for digital library research and development was content, it is clear that the most used and engaging aspects of the Internet are electronic mail and chat rooms. People want to communicate and collaborate. Libraries that develop service strategies for connecting people together in information-rich environments are most likely to prosper.

Services can be provided at different quality levels according to how resources are allocated. Libraries set policies about how much time a reference librarian may spend on reference questions, how requests are received (verbal in person, written, phone, fax, email, etc.), and what types of special services are offered. Digital technology offers new capabilities as well as different, often greater, expectations on the part of patrons. The changing expectations of service populations demand that digital libraries continue to revise service policies.

C. Search services

The most basic access service is search of the library’s collection. Online catalogs have long provided author, title, and limited subject access to local holdings and more recently to union holdings across multiple libraries. The expectation for digital collections is that catalogs should seamlessly link to the digital collection itself so that remotely located users can not only find and display bibliographic information but also the primary information objects. This expectation yields several challenges to librarians. Distinguishing metadata and primary data is not a trivial problem in rich collections. In homogeneous collections it is possible to define a unit of primary information (e.g., a book rather than a chapter or series) but this is more problematic in heterogeneous collections containing finding aids, manuscript bit maps, videos, and hypertexts. The challenges are first, to extract and provide multiple levels of representation and second, to provide users with control mechanisms to move from high-level surrogates to detailed objects (Marchionini, 1995). This is a basic human-system interface problem. The mechanisms digital librarians provide to users depend on the levels of representation that are available in the collection.

The most common search mechanism is a query line or form that allows users to enter a term or terms as a query. Depending on the type of indexing the library uses, ranked lists or exact-matched sets of results are returned to the user. There is a rich history of query-based searching from the information retrieval research community and online service industry that digital libraries may build upon. Limitations in the architecture of the WWW (statelessness) strongly limited many of the early WWW-based digital library search mechanisms, but server-side caches, client side caches (e.g., "cookies") and the development of Java allow the incorporation of mechanisms known to improve search capabilities such as relevance feedback and user profiles. These advances and growing experience in web-based designs have also led to support for more sophisticated search options (e.g., proximity, scope limits). One of the most pressing needs is for search mechanisms that give users more control over results--most give users simple lists with perhaps some sorting options. Interface prototypes for the Library of Congress (Plaisant et al., 1997) give users information about the level of representation of results (e.g., collection or item, media type) as well as flexible options for sorting and display.
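The difference between exact-matched sets and ranked lists can be made concrete with a minimal sketch. The three-document collection, the function names, and the scoring rule (a plain count of query-term occurrences) are illustrative assumptions, not the indexing scheme of any particular library system.

```python
from collections import Counter

# Hypothetical toy collection: document id -> text.
DOCS = {
    "d1": "digital library search interfaces",
    "d2": "library catalog search and library holdings",
    "d3": "video collections in a digital archive",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def exact_match(query, docs):
    """Boolean AND: ids of documents that contain every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def ranked(query, docs):
    """Order documents by total occurrences of query terms; drop zero scores."""
    terms = tokenize(query)
    scores = {}
    for d, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scores[d] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For the query "digital library", the exact-match function returns only the document containing both terms, while the ranked function also returns partial matches lower in the list, which is the behavior users typically see from ranked-output systems.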

Interactive environments have begun to force designers to accept the user's perspective that browsing is a legitimate information seeking strategy (Marchionini, 1995). Nowhere is this more dramatic than in hypertext environments such as the WWW. Thus, many digital library access interfaces provide users with navigational mechanisms. These are often based on some high-level hierarchical classification that allows users to select categories at increasingly detailed levels of granularity to eventually reach specific information objects. Clearly, useful digital libraries will provide hybrid solutions that allow users to apply both selection and query strategies according to their specific experience and needs.

Although search forms and selection-based navigation are the default access mechanisms in most digital libraries, there is an array of novel interfaces that allow users to manipulate visualizations of collections. Shneiderman’s (1994; Ahlberg & Shneiderman, 1993) dynamic query interfaces allow users to query collections through direct manipulation tools such as sliders and immediately see the results of these actions. Fox et al. (1993) have developed a visual interface for a computer science literature digital library that allows users to manipulate search results represented as an array of icons. Hearst (1997) has merged clustering techniques (scatter gather) for search with visualization of results (tilebars) to help users search and explore digital collections. Lin (1997) creates semantic maps that depict two-dimensional maps for high-dimensional concept spaces. The map region sizes are proportional to the importance of the concept and the juxtaposition of the regions represents the similarity of concepts. Korfhage (1997) has developed several different interfaces that graphically represent users' points of interest within a concept space. Some systems provide zooming mechanisms that allow users to easily shrink or expand information spaces. Bederson's Pad++ system (Bederson & Hollan, 1994) allows continuous zooming that is highly effective for graphical objects such as timelines, hierarchies, or images. Marchionini et al. (1997) have combined dynamic query interface style with video preview techniques in a digital library of instructional resources.

Digital libraries will take advantage of these developments to provide users with usable yet powerful interfaces to control sophisticated computational tools behind the scenes. Informedia, for example, provides users with a variety of interface tools such as spoken queries, video walls and video skims to search and browse with advanced pattern recognition systems in the background (Wactlar et al., 1996).

D. Reference and question answering services

Although digital libraries may provide communication channels (e.g., chat rooms, Internet "News" groups) where people may interact to answer each other's questions, many patrons come to librarians for answers to questions. Librarians may provide answers, references to literature that may contain the answers, or referrals to other people or services. These reference services are an essential part of most libraries’ mission and an important question is how such services will evolve as a result of technology. There are five ways that reference services are provided in digital libraries.

The most basic service is to anticipate questions and provide canned answers. Frequently asked question services (FAQ) anticipate common questions and provide answers so that users can go to the FAQ service before requesting human assistance. These services are particularly popular for system-related questions that new users typically might have. In a more elaborate version of this solution, digital librarians may also create electronic pathfinders for specific topics that they anticipate may be useful to many patrons. These pathfinders or special collections are then featured prominently at the library’s virtual entry point.

A second type of solution leverages asynchronous exchange between patrons and librarians or content experts. Certainly, electronic mail requests allow users to reach reference services more conveniently. These online reference services are logical extensions of traditional reference services that respond to written requests and facilitate multiple iterations over times convenient to users and librarians. Although technology allows digital librarians to serve patrons more conveniently, these solutions still demand substantial human attention. Moreover, the availability of digital assistance tends to increase the volume of requests and the expectations of requesters.

A third approach is to combine automated and human services. If FAQ solutions fail the user, the request is forwarded to an appropriate automated service or human expert. Services such as the Answer Garden (Ackerman, 1993, Ackerman & McDonald, 1996) not only route questions through the FAQ list as they come in, but also capture new requests and the human responses and add them automatically to the FAQ list. Such a system has the added benefit of sharing new questions and responses to the corporate memory of the particular community of practice. Moreover, it can lead to longer queries which can yield better results with today's search engines. We can expect to find many new hybrid solutions to the reference problem that take advantage of both human and machine capabilities.
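The hybrid routing idea can be sketched in a few lines: check the stored answers first, fall back to a human, and capture the human's reply so the next patron gets it automatically. This is a minimal illustration under stated assumptions, not the Answer Garden implementation; the class name `AnswerRouter`, the `ask_expert` callback, and the exact-string matching after normalization are all hypothetical simplifications.

```python
class AnswerRouter:
    """Route questions through a FAQ first, then to a human expert."""

    def __init__(self, ask_expert):
        self.faq = {}                 # normalized question -> stored answer
        self.ask_expert = ask_expert  # callable standing in for a human expert

    @staticmethod
    def _normalize(question):
        # Lowercase, trim spaces and question marks, collapse inner whitespace.
        return " ".join(question.lower().strip(" ?").split())

    def answer(self, question):
        """Return (answer, source), where source is 'faq' or 'expert'."""
        key = self._normalize(question)
        if key in self.faq:
            return self.faq[key], "faq"
        reply = self.ask_expert(question)
        self.faq[key] = reply         # capture the new question/answer pair
        return reply, "expert"
```

A real service would need fuzzier matching than exact normalized strings, but even this sketch shows how each expert reply reduces the future load on human staff.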

A fourth solution is real time dialog with a librarian or content expert augmented by technology. Software customer service hotlines and catalog order centers use databases and telephone management software to speed their work and digital libraries will also leverage such tools to provide human reference service. In the case of system help questions, help desk tools that allow respondents to replicate what users see on their screens remotely (or in intranet environments actually allow information specialists to take over a remote machine for trouble shooting) offer new possibilities for librarians to provide remote reference service. Internet chat or video links may be very effective for specific reference advice but are very expensive since they demand concurrent human attention. As such, they will find applications first in corporate digital libraries and public fee-based services.

The most ambitious solution to the question answering problem is to create software agents that take into account the user's context and act as human surrogates. The Knowledge Navigator video created by Apple Inc is the quintessential example of such an agent. Natural language understanding is necessary but not sufficient for these automatic reference services since reference librarians often help people to clarify and articulate their information needs. As the NLU problem is itself incredibly complex, it is likely that we can expect progress to be made by teaming humans and machines and finding the best allocations of machine and human resources for answering reference requests.

E. Filtering and Selective Dissemination of Information

A service that is particularly important in special libraries is selective dissemination of information--sometimes known as routing, alerting, or filtering. Users develop interest profiles and as new materials are added to the collection or become known to the library staff, they are compared to the profiles and relevant items are passed on to the users. Filtering services are particularly applicable to newswires, Internet "News", and broadcast media abstracting services. Electronic user profiles in conjunction with online database services have long been available and will surely proliferate as more library content becomes available digitally. Automatic filtering services differ from retrieval services in that in filtering the corpus changes dramatically from period to period (e.g., day to day) and the query remains relatively stable (Oard, 1997). This leads to queries (profiles) that are more carefully and fully developed, and to the need to extract the salient regularities and relationships in the corpus anew each period. In some cases, filtering services provide added value by abstracting primary information (e.g., answers to a standing question, hyperlinked threads across documents) to users whereas search services typically bring documents that may contain the primary information.
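The contrast with retrieval, where the query changes and the corpus is stable, can be illustrated with a small sketch in which the profiles persist and each period's batch of new items is matched against them. The function names, the keyword-set profile representation, and the overlap threshold are assumptions made for illustration only.

```python
def matches(profile_terms, item_text, threshold=1):
    """True if the item shares at least `threshold` words with the profile."""
    words = set(item_text.lower().split())
    return len(profile_terms & words) >= threshold

def route(profiles, new_items):
    """Compare one period's new items with every standing profile.

    profiles:  user -> set of interest terms (stable across periods)
    new_items: item id -> text (replaced each period)
    Returns user -> list of item ids to deliver.
    """
    deliveries = {user: [] for user in profiles}
    for item_id, text in new_items.items():
        for user, terms in profiles.items():
            if matches(terms, text):
                deliveries[user].append(item_id)
    return deliveries
```

Calling `route` once per day with the same `profiles` and a fresh `new_items` batch mirrors the stable-query, changing-corpus pattern described above.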

One interesting extension of this concept is to use the connectivity inherent in digital libraries to support collaborative filtering where patrons rate or add value to information objects and these ratings are shared with a large community so that popular items can be easily located or people can search for objects found useful by others with similar profiles (Maes, 1994; Resnick, 1997). Such an approach is analogous to peer review for research papers, but involves many more reviewers. Although there are privacy issues related to personal profiles, the benefits of collaborative filtering may make such services increasingly important for libraries. Eventually, specialized library services will emerge to manage large numbers of profiles. Such profile management systems will not only be able to optimize performance (e.g., by leveraging redundancies in profiles), but also serve as population parameters for social scientists and historians studying group behavior. In addition, digital libraries may provide services that assist users in developing and maintaining profiles.
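One common way to find "objects found useful by others with similar profiles" is user-based collaborative filtering. The sketch below is one simple variant, assuming numeric ratings and cosine similarity; it is not the method of the systems cited above, and the function names and data layout are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Suggest items the user has not rated, weighted by similar users' ratings."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore patrons with no overlapping interests
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

Note that the ratings table is itself a collection of personal profiles, which is where the privacy concerns mentioned above arise.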

F. Instruction

Libraries have always been an essential element of the educational infrastructure. In formal learning settings (e.g., K-12, university) libraries are the center of the school. This is evident from the often cathedral-like architecture to the certification requirements imposed by accreditation bodies. More importantly, libraries are essential in supporting informal and professional learning beyond the formal school system. We have argued (Marchionini & Maurer, 1995) that digital libraries will lead to closer integration among formal, informal, and professional learning. Digital libraries offer new opportunities to break down classroom walls and allow people to learn wherever they are and whenever they want. Many digital library projects seek to bring multimedia resources to teachers and students on demand. For example, the Earth System Science Community (http://www.circles.org/) and the University of Michigan Digital Library Teaching and Learning Project (http://www.umich.edu/~aaps/) aim to provide students with rich, interactive science materials. The Baltimore Learning Community (www.learn.umd.edu) collects and indexes multimedia materials for middle school social studies and science; the Perseus Project (www.perseus.tufts.edu) provides materials and tools for students and teachers of classics; the Museum Site Licensing Project (http://www.ahip.getty.edu/mesl/home.html) brings together seven museums and seven universities to share art resources; the Library of Congress National Digital Library Project includes a Learning Page devoted to supporting K-12 schools (http://lcweb2.loc.gov/ammem/ndlpedu/); and the Informedia Digital Library has also been applied in high school settings (Christel & Pendyala, 1996). Such resources will continue to drive both teacher-led and self-directed learning as more high quality materials are digitized and thoughtful links and pathfinders are created by students, teachers, and librarians. In such digital libraries, all participants are learners and teachers.

In addition to providing the content to enrich learning, librarians help patrons acquire information-seeking skills (traditionally known as bibliographic instruction) which have become more essential in the informated society (many school library media specialists and public librarians collaborate on information literacy courses). Digital libraries have the potential to support collaborative distance learning and to provide intermediation services to aid participants in shaping questions, finding relevant materials, and interpreting and using information. These intermediations will surely require new types of human support services augmented by computational tools. The new learning facilitators who work in such environments will themselves be learners who are part librarian, part teacher, and part debate moderator. Their roles will range from facilitating collaborative learning to assisting individuals in configuring the local area networks carried on their bodies.

IV. Technology Requisites

Digital libraries are dependent on and driven by several general purpose technologies such as computer hardware, high-speed networking, security, and interoperability.

Better computers are needed on both the library and user end. Today’s workstations serve thousands of users per hour but as more information is streamed (e.g., video, real-time collaborative experiences) rather than transferred as discrete files, more powerful machines and storage devices and new intermediary machines will be required. In addition to storing ever-increasing volumes of digital objects, libraries will also need additional computational resources to store billing and transaction log data. Thus, continued progress in digital libraries will benefit from faster, more powerful CPUs and cheaper, higher-density storage devices.

The trend toward new and multiple input and output devices (e.g., Jacob et al., 1993) will also influence how digital libraries evolve and are used. Speech, gestural, and tactile input devices should allow users to more easily control access tools and library resources. Likewise, a richer array of output devices ranging from large, flat-panel displays to digital paper will open new possibilities for digital librarians to share collections more broadly and easily. Software tools for rapid prototyping and testing of interfaces will also help designers improve the quality of digital library interfaces.

High-speed, reliable, ubiquitous networking will allow libraries to become increasingly digital. Research and development in network architectures, low-cost access in homes, and performance metrics will continue to determine how digital library access moves from privileged locations in campuses, businesses, and government offices to homes. Developments that allow library access through a mix of wired and wireless paths will enable this spread of access. Engineering research directed at seamless interoperation of networks ranging from personal body LANs to the WWW is a high priority for the technical community.

Most importantly, networked computational resources are becoming more mobile and special purpose. Users will no longer be strictly tethered to workstations to use digital information. This trend will allow libraries to provide new genres of information services. As networked computational resources are commonly built into buildings, automobiles, appliances, clothing, jewelry, and various prosthetics, libraries will be able to provide users with continuous information streams rather than only discrete information objects. Obviously, SDI services will take on entirely new meanings when information can be streamed continually to users wherever they are, whatever they are doing, and without interrupting whatever activity is underway. Users will be able to choose to receive the latest information from their information service unobtrusively via earpiece or eyepiece during a meeting. Weather, traffic conditions, or other public information may be continually delivered on special-purpose displays or speakers built into homes, offices, or vehicles. Proactive libraries will invent new ways to augment discrete collections and services with accretional collections and ongoing services.

Software developments that support rapid, reliable, and secure transfers support this developing infrastructure. However, new software that supports user search and library services with easy-to-use interfaces demands enormous software engineering efforts. As much as half the code in today’s programs is devoted to the user interface, and the challenges of multiple input/output devices for a wide variety of distributed computational devices will require new paradigms for human-computer interaction and improvements in algorithms for information organization and search.

An important condition for continued development of digital libraries is seamless exchange across different digital libraries. This interoperability problem is addressed on two fronts. First, groups work to create standards for data storage and transmission, for query representation, and for vocabulary control. In this solution, digital libraries adopt standards and change content and services at the local level. The standards solution proceeds based upon shared interests but depends on agreement among vested interests and most often must follow long-term implementations adopted in the marketplace. The second approach is to allow individual digital libraries to be as innovative as necessary but to create public services that map local content and services to other digital libraries (as word processing programs read files created by other systems). The Z39.50 protocol exemplifies such an approach for mapping queries to different databases (Lynch, 1991). An extension of this approach is to publish an abstract view (an ontology) accepted by a federated community that may then be used to facilitate interoperability (Wiederhold, 1992). The Stanford Digital Library Project is addressing the interoperability problem with an architecture called the InfoBus (http://www-db.stanford.edu/~testbed/).
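The second approach, mapping a shared abstract view onto each library's local conventions, can be sketched as a small translation layer. This is only an illustration of the idea: the library names, field names, and the `translate` function are hypothetical and do not represent the Z39.50 protocol or the InfoBus architecture.

```python
# Hypothetical mappings from a shared abstract schema to each library's local fields.
FIELD_MAPS = {
    "library_a": {"author": "au", "title": "ti", "subject": "su"},
    "library_b": {"author": "creator", "title": "name"},
}

def translate(query, library):
    """Rewrite an abstract query (field -> value) into a library's local fields.

    Returns (translated_query, unsupported_fields) so that the caller can see
    which parts of the query the target library cannot express.
    """
    mapping = FIELD_MAPS[library]
    translated, unsupported = {}, []
    for field, value in query.items():
        if field in mapping:
            translated[mapping[field]] = value
        else:
            unsupported.append(field)
    return translated, unsupported
```

Under this design each library stays free to keep its own local schema; only the mapping table must be maintained, which is the trade-off against adopting a common standard at the local level.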

V. Culture

Libraries are keepers of culture. As such, they are subject to and reflect the social, political, and economic forces that shape their constituency. Likewise, libraries as institutions help to shape culture. Public and academic libraries promote values of scholarship and appreciation for culture but are subject to localized beliefs and motives--turf is often defined ideologically. Corporate libraries support the mission of the organization and vie for resources with other cost units such as data processing. Digital collections provide new challenges for the socio-political context. Public library policies must specify who may access collections, which collections are digitized, and what existing resources will be cut to support technology and digitization. Corporate and academic managers must grapple with integration of computing and library functions. Libraries will evolve in different ways in different cultures, and over time successful models for specific environments will be adopted more widely. However, digital technology will not lead immediately to standardized practices but rather to more diversity based on the socio-political forces of the constituent community. Whether the potential of information technology to promote cultural standardization will overshadow its potential for empowering individual expression is a long-term socio-political issue. See the report from the social aspects of digital libraries workshop for views on these issues (http://www.gslis.ucla.edu/DL/).

A. Economic Challenges.

In addition to the challenges of community-based context, two global and interdependent issues influence research and development in digital libraries: intellectual property rights, and information security and authority. Both issues are rooted in a culture that places economic value on scarcity.

Copyright exists to promote intellectual production by providing economic incentives. Security protects against unauthorized access but also must deal with the more subtle problem of ensuring the veracity and authority of digital information objects. The ease with which perfect and unlimited copies of digital products may be made causes many owners of intellectual property to avoid digital distribution. Avoidance is moot for the growing set of products created in and for digital technologies (e.g., software, games, virtual worlds, hypertexts), but owners of the existing base of books, photographs, films, sound recordings, and other intellectual property have begun cautious experiments in digitizing and repurposing their assets to develop new markets. For example, the CORE Project (Entlich et al., 1996) is a collaboration between the American Chemical Society and universities and the Tulip project is a collaboration between Elsevier and universities (http://www1.elsevier.nl/homepage/about/resproj/tulip.htm), both aimed at exploring scientific journal licensing and delivery schemes. Additionally, the JSTOR project aims to create a new non-profit electronic publishing medium (http://www.mellon.org/jstor.html), and the Association for Computing Machinery has developed a digital library policy and site (http://www.acm.org/dl/).

There are two types of questions that shape digital library research and development. The first question is: What does it mean to use intellectual property? Current practice provides for specialized fair use in socially beneficial situations (e.g., education, scholarship). Do these fair uses apply to digital objects? Publishers have developed agreements about derivative works composed from individual pieces of intellectual property but the nature of derivative work may be different in the digital realm. For example, do people have to pay for the right to link to other work? To deal with such questions, there are efforts to change copyright laws to protect digital objects. Some publishers press for elimination of the right of resale that allows people who purchase a book or other object the right to resell it. The extreme interpretation of this approach is that every random access memory representation for an information object requires payment to the copyright holder--a type of payment for the potential of using a digital object. See Pamela Samuelson's column in Communications of the ACM for a series of thoughtful discussions of the issues related to intellectual property in electronic environments.

The second question deals with the technical problems of how to protect intellectual property against illegal use. There are many efforts to develop technical solutions that protect copyright either through copy protection or automatic billing mechanisms. Research on encryption algorithms, digital watermarking, and electronic commerce is leading to the development of trusted systems that protect intellectual property rights by managing the necessary financial transactions while protecting consumers by providing authoritative information securely (Stefik, 1997). Encryption has advanced beyond the point where most government agencies or individuals can monitor or decode personal communications. Techniques to include either visible or hidden watermarks on digital objects have also been developed and incorporated into commercial products. These techniques ensure the veracity of an object and may help prevent copying and distribution in the open marketplace. Digital commerce is an active area of research with different approaches under testing. Systems such as CyberCash use a third party intermediary to mediate transfer of property and payment; systems such as Digicash issue digital money in the form of bit stream tokens that are exchanged for intellectual property and recirculate throughout the network; and systems such as Netbill use a prefunded account to enable intellectual property transfer. These developments are important for digital libraries that may want to offer very low-priced (e.g., one cent) digital objects to huge volumes of customers. Traditional credit card purchase schemes (often used for high cost items or conference registration) are relatively costly to maintain (e.g., thirty cents plus 1.75% of the purchase price) and thus the digital schemes above will continue to be developed until one or a few marketplace winners are determined.
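A short calculation shows why card fees of this kind rule out one-cent sales. The sketch uses the example figures quoted above (thirty cents plus 1.75% of the purchase price); the function names are illustrative, and real fee schedules vary.

```python
def card_fee_cents(price_cents, fixed_cents=30.0, rate=0.0175):
    """Card processing fee in cents for a purchase of `price_cents`."""
    return fixed_cents + rate * price_cents

def fee_share(price_cents):
    """Fee as a multiple of the purchase price (1.0 means the fee equals the price)."""
    return card_fee_cents(price_cents) / price_cents

# A one-cent object incurs a fee of about 30.02 cents, roughly 3,000% of its price,
# while a $100 purchase (10,000 cents) incurs $2.05, about 2% of its price.
```

The fixed component dominates at small prices, which is why micropayment schemes aim to eliminate or amortize per-transaction charges.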

B. Communities of Practice.

Libraries are as much defined by people as by information resources. Libraries come into existence because people wish to preserve and share representations of heritage and wisdom. Libraries serve one or more communities with common interests and culture. In the past, the extent of communities was highly constrained by physical distance and the size or importance of the community. Thus, many kinds of libraries were geographically bound (e.g., local public libraries) and others (e.g., international repositories, large research libraries, corporate information centers) were limited to critical communities of practice (e.g., health, science, commerce). Digital technology facilitates specialized libraries that may serve very diverse and unique world-wide communities with very few or disadvantaged members who would not be able to support or travel to a physical library. This represents a fundamental shift in library services and human culture more generally. Decentralized, specialized, globally dispersed special interests can blossom through shared information and communication resources.

Digital libraries, like any social phenomenon, are not only shaped by the social elements that motivate them, but will eventually influence those elements. It is too soon to tell how institutions will change as a result of the digital libraries they create. Will the global mission of the Library of Congress change as K-12 students (who have traditionally not been part of the service mission) are attracted by the National Digital Library Learning Page? Will the National Library of Medicine's decision to make Medline available free to the public affect its primary mission to serve the medical research and development communities? Will government agencies like the Bureau of Labor Statistics shift resources to serving the increasing numbers of requests from the public that result from making labor statistics available through a digital library? If so, where will the resources come from? Will the service missions and resource allocation formulae of government agencies evolve toward dissemination of information to the detriment of information gathering and creation? How will corporate digital libraries reshape the nature of business in different corporations? What new communities of practice will emerge to amplify and augment the information industry in general? Will increasingly interdependent information resources change the function and form of nation states?

The digital library research and development community is only beginning to address the organizational impacts of digital libraries. It is clear that although the genesis of digital libraries was digit centered, there is increasing attention given to digital libraries as people and organization centered entities that reflect communities of practice. It is sensible to expect that the long-term implications for global culture will be reflected in the evolution of libraries--keepers of culture.

A third of a century ago, Douglas Engelbart provided a vision of electronic technology for augmenting the human intellect. Just as libraries reflect and influence human culture, digital libraries will extend and augment the collective intellect. They aim to make knowledge more equitably and universally accessible and to link people together through their information needs. Research and development in digital libraries may have been initiated by technology, but it ultimately is in the service of extending and augmenting human interactions.

Acknowledgements: The author thanks Doug Oard and Tony Tse for helpful comments on earlier versions of this paper.

References

Ackerman, M. (1993). Answer Garden: A Tool for Growing Organizational Memory. Doctoral Thesis, MIT.

Ackerman, M. & McDonald, D. (1996). Answer Garden 2: merging organizational memory with collaborative help. Proceedings of ACM Conference on Computer-Supported Collaborative Work (November 1996), 97-105.

Ahlberg, C., Williamson, C., & Shneiderman, B. (1993). Dynamic queries for information exploration: An implementation and evaluation. In B. Shneiderman (Ed.), Sparks of innovation in human-computer interaction. Norwood, NJ: Ablex. p. 281-294.

Bederson, B. & Hollan, J. (1994). Pad++: A zooming graphical interface for exploring alternative interface physics. Proceedings of UIST ‘94. Marina del Rey, CA, Nov 2-4. 17-26.

Chen, H., Ng, T., Martinez, J., & Schatz, B. (1997). A concept space approach to addressing the vocabulary problem in scientific information retrieval: An experiment on the Worm Community System. Journal of the American Society for Information Science, 48(1), 17-31.

Christel, M. G., & Pendyala, K. (1996). Informedia goes to school: Early findings from the Digital Video Library Project. D-Lib Magazine, September 1996. http://www.dlib.org/dlib/september96/informedia/09christel.html

Croft, B., Cook, R., & Wilder, D. (1995). Providing government information on the Internet: Experiences with THOMAS. Proceedings of the Digital Libraries Conference DL'95, Austin, TX, June 10-12, 1995, 19-24.

Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

Entlich, R., Garson, L., Lesk, M., Normore, L., Olsen, J., & Weibel, S. Testing a digital library: User response to the CORE Project, Library Hi Tech, 14(4), 99-118, 1996.

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D. and Yanker, P. (1995). Query by image and video content: The QBIC system. Computer, 28(9), 23-32.

Fox, E., Hix, D., Nowell, L., Brueni, D., Wake, W., Heath, L., & Rao, D. (1993). Users, user interfaces, and objects: Envision, a digital library. Journal of the American Society for Information Science, 44(5), 480-491.

Hearst, M. (1997). Interfaces for searching the web. Scientific American, 276(3), 68-72.

Jacob, R., Leggett, J., Myers, B., & Pausch, R. (1993). Interaction styles and input/output devices. Behavior and Information Technology, 12(2), 69-79.

Jain, A. K. & Vailaya, A. (1996). Image retrieval using color and shape, Pattern Recognition, 29(8), 1233-1244.

Jain, A., Ratha, N. & Lakshmanan, S. (1997). Object detection using Gabor filters. Pattern Recognition, 30(2), 295-309.

Jamet, D. & Waysbord, H. (1993). History, philosophy, and ambitions of the Bibliothèque de France. Representations, 42 (Spring). 74-79.

Korfhage, R. (1997). Information storage and retrieval. NY: John Wiley.

Lin, X. Map displays for information retrieval, Journal of the American Society for Information Science, 48(1), 40-54, 1997.

Lynch, C. (1991). The client-server model in information retrieval. In M. Dillon (Ed.), Interfaces for information retrieval and online systems. NY: Greenwood Press. pp 301-322.

Maes, P. (1994). Agents that reduce work and information overload. Communications of the ACM, 37(7), 31-40.

Marchionini, G., Information seeking in electronic environments, Cambridge University Press, NY, 1995.

Marchionini, G. & Maurer, H. The roles of digital libraries in teaching and learning, Communications of the ACM, 38(4), 67-75 (1995).

Marchionini, G., Nolet, V., Williams, H., Ding, W., Beale, J., Rose, A., Gordon, A., Enomoto, E., & Harbinson, L. (1997). Content+Connectivity => Community: Digital resources for a learning community. Proceedings of ACM Digital Libraries '97 (Philadelphia, PA, July 23-26, 1997). 212-220.

Oard, D. (1997). A Conceptual Framework for Text Filtering. UMUAI '97 (to appear). http://www.glue.umd.edu/~oard/research.html.

Plaisant, C., Marchionini, G., Bruns, T., Komlodi, A., & Campbell, L. (1997). Bringing treasures to the surface: Iterative design for the Library of Congress National Digital Library Program. Proceedings of ACM CHI '97 (Atlanta, March 22-27, 1997). NY: ACM Press, 518-525.

Resnick, P. (1997). Filtering information on the Internet. Scientific American, 276(3), 62-64.

Rosenfeld, A. & Smith, R.C. (1981). Thresholding using relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3, 598-606.

Schatz, B., Mischo, W., Cole, T., Hardin, J., Bishop, A., & Chen, H. (1996). Federating diverse collections of scientific literature, Computer, May, 28-35.

Sheridan, P. & Ballerini, J. (1996). Experiments in multilingual information retrieval using the SPIDER system. Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (Zurich, Switzerland). 58-65.

Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 70-77.

Sim, D., & Park, R. (1997). A two-stage algorithm for motion discontinuity-preserving optical flow estimation. Computer Vision and Image Understanding, 65(1), 19-37.

Smith, T. A digital library for geographically referenced materials, Computer, May, 54-60, 1996.

Stefik, M. (1997). Trusted systems. Scientific American, 276(3), 78-81.

Varian, H. (1997). Versioning information goods. Digital Information and Intellectual Property. (Harvard University Workshop, January 23-25, 1997).

Wactlar, H., Kanade, T., Smith, M., & Stevens, S. (1996). Intelligent access to digital video: Informedia Project. Computer, May, 46-52.

Wiederhold, G. (1992). Mediation in the architecture of future information systems. IEEE Computer, March, 38-49.

Bibliography

Scientific American (1997). Special report. The Internet: Fulfilling the promise. 276(3), March, pp. 49-83.

Dailianas, A., Allen, R.B., & England, P. (1995). Comparison of automatic video segmentation algorithms. Proceedings of SPIE Photonics East '95, Philadelphia, Nov. 1995.

Elliott, E. (1993). Watch, grab, arrange, see: Thinking with motion images via streams and collages. MSVS Thesis Document. MIT Media Lab: Cambridge, MA.

England, P., Allen, R.B., Dailianas, A., Sullivan, M., Bianchi, M., & Heybey, A. (1996). The video library toolkit: A system for indexing and browsing digital video libraries. Proceedings of SPIE Photonics West '96, San Jose, Jan. 1996.

Fox, E. & Lunin, L. Perspectives on digital libraries: Introduction and overview, Journal of the American Society for Information Science, 44(8), 441-445, 1993.

Fox, E.A., Akscyn, R., Furuta, R., & Leggett, J. (1995). Digital libraries: Introduction. Communications of the ACM, 38(4), 22-28.

Hearst, M. TextTiling: Segmenting text into multi-paragraph subtopic passages, Computational Linguistics, 23(1), 33-64, 1997.

Lesk, M. Practical digital libraries: Books, bytes, and bucks. Morgan Kaufmann.

Otsuji, K., Tonomura, Y., & Ohba, Y. (1991). Video browsing using brightness data. SPIE Visual Communications and Image Processing '91: Image Processing, 1606, 980-989.

Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. Reading, MA: Addison-Wesley.

Salton, G., & McGill, M. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.

Samuelson, P., & Glushko, R. J. (1991). Intellectual property rights for digital library and hypertext publishing systems: An analysis of Xanadu. Proceedings of Hypertext '91 (San Antonio, December 15-18, 1991), pp. 39-50.

Teodosio, L. & Bender, W. (1993). Salient video stills: Content and context preserved. Proceedings of ACM Multimedia '93 (Anaheim, CA, Aug. 1-6, 1993), NY: ACM Press, pp. 39-46.