JCDL 2006: Opening Information Horizons
Metadata Tools for Digital Resource Repositories Workshop
June 15, 2006, Chapel Hill, NC, USA
A Tiny Retrieval Protocol: THUMP and Kernel Metadata
- John Kunze,
Preservation Technologist for the California Digital Library
- John Kunze is a preservation technologist for the California Digital Library
and has a background in computer science and mathematics. His current work
focuses on archiving websites, creating long-term durable digital references
(ARKs) to information objects, and specifying lightweight (kernel) metadata.
Prior work includes major contributions to the standardization of URLs, Dublin
Core metadata, and the Z39.50 search and retrieval protocol. In an earlier life
he designed, wrote, and ran UC Berkeley's first campus-wide information system,
which was an early rival and client of the World Wide Web. Before that he was a
BSD Unix hacker whose work survives in today's Linux and Apple systems.
- Kevin A. Gamiel,
Research Programmer, Renaissance Computing Institute
- Kevin Gamiel is a research software developer with the Renaissance
Computing Institute (RENCI), a collaborative venture of Duke University, North
Carolina State University, the University of North Carolina at Chapel Hill and
the state of North Carolina. His current work includes an NIH-funded
multidisciplinary approach to exploratory genetic analysis, leadership of the
NC and TeraGrid Bioportal projects, performance analysis of high-performance
codes on hundreds of compute nodes, and a number of other RENCI efforts. His
past work includes contributions to the Dublin Core metadata effort and the
Z39.50 standard. He was co-chair of the Networked Information Retrieval (NIR)
and Integration of Internet Information Resources (IIIR) IETF working groups.
- Abstract
-
Web information retrieval designs cycle naturally between periods of
expanding functionality and contracting complexity. This talk presents a
contraction-phase design that tries to retain the best features of modern
retrieval designs while being very easy to implement. Leveraging existing
search systems, it calls for an extra external interface but otherwise
requires no internal system changes.
The new interface is specified by THUMP -- The HTTP URL Mapping Protocol
-- a very lightweight protocol that can be used for focused, known-item
retrievals and broad search engine queries. To keep implementation
barriers low, the interface can be thoroughly tested with ubiquitous tools
such as web browsers and the telnet remote login software. The talk will
address implementation experiences in a scientific computing context.
THUMP returns information in the form of an Electronic Resource Citation
(ERC), a simple, compact, and printable record designed to hold data
associated with information objects. By design, the ERC is a metadata
format that balances the needs for expressive power, very simple machine
processing, and direct human manipulation. The ERC uses a "kernel" subset
of four required metadata elements defined by a working group of the
Dublin Core Metadata Initiative.
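To make the ERC description above concrete, here is a minimal sketch of reading such a record. It is not taken from the talk. It assumes that an ERC is a series of "label: value" lines, that indented lines continue the previous value, and that the four kernel elements are labeled who, what, when, and where. The function names and the sample record (including its example.org URL) are hypothetical, and details should be checked against the ERC specification.

```python
# Sketch only: assumes "label: value" lines with indented continuation lines.
# The kernel labels below are an assumption about the four required elements.
KERNEL_ELEMENTS = ("who", "what", "when", "where")


def parse_erc(text):
    """Parse a simple ERC-style record into a dict of label -> value."""
    record = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last_label is not None:
            # Indented line: treat it as a continuation of the previous value.
            record[last_label] += " " + line.strip()
            continue
        # Split on the first colon only, so values may contain colons (URLs).
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a "label: value" line; ignore it in this sketch
        last_label = label.strip().lower()
        record[last_label] = value.strip()
    return record


def missing_kernel_elements(record):
    """Return the kernel labels that are absent or empty in the record."""
    return [name for name in KERNEL_ELEMENTS if not record.get(name)]


# Hypothetical sample record, made up for illustration.
SAMPLE = """erc:
who: Kunze, John
what: A Tiny Retrieval Protocol: THUMP and
  Kernel Metadata
when: 2006-06-15
where: http://example.org/hypothetical-talk
"""

if __name__ == "__main__":
    rec = parse_erc(SAMPLE)
    print(rec["what"])
    print(missing_kernel_elements(rec))
```

Splitting on the first colon only and keeping everything in a flat dictionary reflects the stated goal of very simple machine processing, while the record itself stays readable and editable by hand.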
- Slides
- Presentation by Kunze
- Presentation by Gamiel