The VidArch Project

The VidArch project built on earlier work with digital video files and their surrogates, seeking ways to preserve a video work's context and highlight its essence, making the work more understandable and accessible to future generations. The project developed a preservation framework for digital video context and applied it to two important digital video collections: the complete series of NASA broadcast educational videos and the complete set of juried ACM SIGCHI videos presented at annual conferences from 1983 to the present.

The project addressed context, an important aspect of digital preservation, on both theoretical and practical fronts. The aim was to improve archival decision-making and finding-aid creation, and to identify ways of using technology to make both more efficient and effective.

Part of the project focused on the 2008 US presidential election, for which the project maintained one list of YouTube queries and one list of blogosphere queries for active collection. This collection has since ended.

Project Participants

Project Papers and Reports

Gary Marchionini, Helen Tibbo, Christopher A. Lee, Paul Jones, Robert Capra, Gary Geisler, Terrell Russell, Laura Sheble, Sarah Jorda, Yaxiao Song, Dawne E. Howard, Rachael Clemens, Brenn Hill (2009). VidArch: Preserving Video Objects and Context Final Report. (Project Report, 6.7MB pdf)
Chirag Shah (2009). What Do You Look Like on YouTube? Politics Magazine. July 2009. [Online]
Chirag Shah (2009). Supporting Research Data Collection from YouTube with TubeKit. In the Proceedings of YouTube and the 2008 Election Cycle in the United States. April 16-17, 2009. Amherst, MA. (3rd prize)
Gary Marchionini, Chirag Shah, Christopher A. Lee, Robert Capra (2009). Query Parameters for Harvesting Digital Video and Associated Contextual Information. In the Proceedings of JCDL 2009. June 15-19, 2009. Austin, Texas.
Christopher A. Lee, Richard Marciano, Chien-yi Hou, Chirag Shah (2009). From harvesting to cultivating: transformation of a Web collecting system into a robust curation environment. Poster in the Proceedings of JCDL 2009. June 15-19, 2009. Austin, Texas.
Chirag Shah (2009). ContextMiner - explore globally, aggregate locally. Demo in the Proceedings of JCDL 2009. June 15-19, 2009. Austin, Texas.
Chirag Shah (2009). ContextMiner - Collect Different. Demo at DigCCurr 2009. April 1-3, 2009. Chapel Hill, NC.
Chirag Shah (2009). ContextMiner: Explore Globally, Aggregate Locally. In IEEE Computer. March 2009.
Chirag Shah (2009). Mining Contextual Information for Ephemeral Digital Video Preservation. International Journal of Digital Curation, 4(1) (2009). [PDF]
Christopher A. Lee, Chirag Shah, and Terrell Russell (2008). ContextMiner: A toolkit for creating, managing, and monitoring web collection campaigns. Demo at the 4th International Digital Curation Conference. December 2008. Edinburgh, Scotland.
Chirag Shah (2008). YouTube Crawling: A VidArch Year in Retrospect. (Project Report, 352KB pdf)
Robert Capra, Christopher A. Lee, Gary Marchionini, Terrell Russell, Chirag Shah, and Fred Stutzman (2008). Selection and Context Scoping for Digital Video Collections: An Investigation of YouTube and Blogs. JCDL 2008.
Chirag Shah and Gary Marchionini (2008). Hunting for Hip, Hipsters, and Happenings on YouTube. ASIST 2008.
Chirag Shah (2008). TubeKit - A Query-based YouTube Crawling Toolkit. Demo appeared at JCDL 2008.
Gary Marchionini, Helen Tibbo, Chirag Shah, Christopher A. Lee (2007). Telling the Whole Story: Selecting and Collecting Web-Based Videos for Archival Collections. Poster in the proceedings of Digital Curation Conference (DCC). Washington DC, USA. December 11-13, 2007.
Chirag Shah and Gary Marchionini (2007). Capturing Relevant Information for Digital Curation. JCDL 2007 Conference Poster. In Proceedings of the 2007 Conference on Digital Libraries (Vancouver, BC, Canada, June 18 - 23, 2007). JCDL '07. ACM Press, New York, NY, 496-496. (Poster, 118KB pdf)
Chirag Shah and Gary Marchionini (2007). ContextMiner: A Tool for Digital Library Curators. JCDL 2007 Conference Demo. In Proceedings of the 2007 Conference on Digital Libraries (Vancouver, BC, Canada, June 18 - 23, 2007). JCDL '07. ACM Press, New York, NY, 514-514. (Demo, 512KB pdf)
Chirag Shah and Gary Marchionini (2007). Preserving 2008 US Presidential Election Videos. Paper at the 7th International Workshop on Web Archiving and Digital Preservation (IWAW'07). (Paper, 214KB pdf)
Chirag Shah and Gary Marchionini (2007). DiscoverInfo: A Tool for Discovering Information with Relevance and Novelty. Demo at SIGIR 2007.
Helen R. Tibbo, Christopher A. Lee, Gary Marchionini, Dawne Howard (2006). VidArch: Preserving Meaning of Digital Video over Time through Creating and Capture of Contextual Documentation. IS&T Archiving 2006. (Paper, 360KB pdf)
Helen R. Tibbo (2006). Preserving Video Objects and Context: A Demonstration Project. IS&T Archiving 2006. (Slides, 1.3MB ppt)
Christopher A. Lee, Helen R. Tibbo, Dawne Howard, Yaxiao Song, Terrell Russell (2006). Keeping the Context: An Investigation in Preserving Collections of Digital Video. IEEE ACM Joint Conference on Digital Libraries (JCDL 2006). (Paper, 136KB pdf)
Helen R. Tibbo (2006). Preserving Video Objects and Context: A Demonstration Project. IEEE ACM Joint Conference on Digital Libraries (JCDL 2006). (Slides, 1.0MB ppt)
Finding Aid - Videos from Conference Proceedings, Association for Computing Machinery (ACM), 1983-2003
Finding Aid - NASA K-16 Science Education Programs Videos, 1998-2005

Demos

ContextMiner - Demo

ContextMiner is a simple, intuitive interface for digital library curators. It helps a curator collect metadata and contextual information for a digital object that is to be preserved.
DiscoverInfo - Demo

DiscoverInfo is a tool for exploring a document collection in three ways: searching through a typical search-engine-style interface, browsing with term clouds, and discovering new information with the help of novelty visualizations of the documents.
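
This page does not describe how DiscoverInfo builds its term clouds or measures novelty. The following is a minimal Python sketch of the two ideas under stated assumptions: term-cloud weights are plain term counts, and novelty is a swapped-in measure, 1 minus a document's highest cosine similarity to any earlier document. The functions `term_cloud` and `novelty_scores` and the stopword list are hypothetical, not DiscoverInfo's code.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; DiscoverInfo's actual preprocessing is unknown.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_cloud(docs: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent terms in the collection with their counts.

    In a rendered term cloud, each term's font size would scale with its count.
    """
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts.most_common(top_n)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def novelty_scores(docs: list[str]) -> list[float]:
    """Assumed novelty measure: 1 - max similarity to any earlier document.

    The first document is fully novel (1.0); a near-duplicate scores close to 0.0.
    """
    seen: list[Counter] = []
    scores: list[float] = []
    for doc in docs:
        vec = Counter(tokenize(doc))
        max_sim = max((cosine(vec, prev) for prev in seen), default=0.0)
        scores.append(max(0.0, 1.0 - max_sim))  # clamp tiny negative float error
        seen.append(vec)
    return scores


if __name__ == "__main__":
    docs = ["election video on youtube", "youtube election video", "nasa science education"]
    print(term_cloud(docs, 3))
    print(novelty_scores(docs))
```

In this example the second document repeats the first one's terms, so its novelty is near zero, while the third shares no terms with earlier documents and scores 1.0. A novelty grid could color each document cell by this kind of score.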
DIToolkit - Demo

DIToolkit automates the creation of interfaces like the one shown in the DiscoverInfo demo. With DIToolkit, one can point the tool at a website, crawl it, index the documents (text, HTML, PDF), and provide searching and browsing capabilities that include relevance ranking and a novelty grid.
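
The indexing and relevance-ranking steps can be sketched as an inverted index with TF-IDF scoring. This is a minimal sketch, assuming text has already been extracted from the HTML and PDF files. TF-IDF is a stand-in chosen for illustration, since DIToolkit's actual ranking function is not described here, and the `InvertedIndex` class is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical index: maps each term to the documents that contain it."""

    def __init__(self) -> None:
        # term -> {doc_id: frequency of the term in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        """Index one document's extracted text (assumes each doc_id is added once)."""
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> list[tuple[str, float]]:
        """Rank documents by summed TF-IDF over the query terms, best first."""
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a higher weight; +1 keeps ubiquitous terms nonzero.
            idf = math.log(self.doc_count / len(docs)) + 1.0
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("d1", "NASA science video")
    index.add("d2", "election video video")
    index.add("d3", "ACM SIGCHI video")
    for doc_id, score in index.search("election video"):
        print(doc_id, round(score, 3))
```

Here "d2" ranks first because it is the only document containing the rarer term "election" and it also mentions "video" twice. The other two match only the common term "video".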
TubeKit - Demo

TubeKit is a toolkit for creating YouTube crawlers. It lets users build their own crawler that searches YouTube from a set of seed queries and collects up to 24 different attributes per video. TubeKit assists with every phase of this process, from creating the database to providing access to the collected data through browsing and searching interfaces.
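
The three phases named above (database creation, query-based crawling, and access to the collected data) can be sketched in Python with the standard `sqlite3` module. This is a minimal sketch under stated assumptions, not TubeKit's implementation. The `fetch` callable stands in for a real YouTube search, `fake_fetch` returns canned hypothetical results, and only two of the up to 24 attributes (an ID and a title) are modeled.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical record shape: only a video ID and title are modeled here.
VideoRecord = dict[str, str]
Fetcher = Callable[[str], Iterable[VideoRecord]]


def create_database(conn: sqlite3.Connection) -> None:
    """Phase 1: create a table for the collected video attributes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "video_id TEXT PRIMARY KEY, query TEXT, title TEXT)"
    )


def crawl(conn: sqlite3.Connection, seed_queries: list[str], fetch: Fetcher) -> int:
    """Phase 2: run each seed query through `fetch` and store unseen videos.

    Returns the total number of distinct videos stored so far.
    """
    for query in seed_queries:
        for record in fetch(query):
            # A video returned by several queries is stored only once,
            # credited to the first query that found it.
            conn.execute(
                "INSERT OR IGNORE INTO videos (video_id, query, title) VALUES (?, ?, ?)",
                (record["video_id"], query, record["title"]),
            )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]


def search(conn: sqlite3.Connection, keyword: str) -> list[str]:
    """Phase 3: give access to the collected data through a simple title search."""
    rows = conn.execute(
        "SELECT video_id FROM videos WHERE title LIKE ? ORDER BY video_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def fake_fetch(query: str) -> list[VideoRecord]:
    """Hypothetical stand-in for a real YouTube search; returns canned results."""
    catalog = {
        "election 2008": [
            {"video_id": "v1", "title": "Debate highlights"},
            {"video_id": "v2", "title": "Election night"},
        ],
        "campaign ad": [
            {"video_id": "v2", "title": "Election night"},
            {"video_id": "v3", "title": "Campaign ad"},
        ],
    }
    return catalog.get(query, [])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_database(conn)
    print(crawl(conn, ["election 2008", "campaign ad"], fake_fetch))  # 3
    print(search(conn, "election"))  # ['v2']
```

Passing the fetcher in as a parameter keeps the storage and access logic independent of how results are obtained. Video "v2" is returned by both seed queries but stored once.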