idl.tmt.documentsource.webcrawl
Interface WebCrawlContext

All Known Implementing Classes:
WgetWebCrawlContext

public interface WebCrawlContext

This interface provides access to the data associated with a web crawl that was executed to mirror a web site.

Created on Jan 23, 2004

Author:
jelsas
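
The following is a minimal sketch of how the interface might be stubbed for tests, not the library's own code. Only the four method signatures come from this page. The empty URLMapper, DocumentIDMapper and FilesystemDocumentProvider types are placeholders for the real ones, and WebCrawlContextSketch, FixedWebCrawlContext, url and example are hypothetical names introduced for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Sketch only: everything except the four method signatures is a hypothetical stand-in. */
final class WebCrawlContextSketch {

    // Placeholder types. The real URLMapper, DocumentIDMapper and
    // FilesystemDocumentProvider have members that this page does not document.
    interface URLMapper {}
    interface DocumentIDMapper {}
    interface FilesystemDocumentProvider {}

    // Mirrors the four methods listed in the method summary.
    interface WebCrawlContext {
        URLMapper getURLMapper();
        DocumentIDMapper getDocumentIDMapper();
        URL[] getRemoteCrawlRoots();
        FilesystemDocumentProvider getDocumentProvider();
    }

    /** Hypothetical fixed-value implementation, e.g. for unit tests. */
    static final class FixedWebCrawlContext implements WebCrawlContext {
        private final URLMapper urlMapper;
        private final DocumentIDMapper idMapper;
        private final URL[] roots;
        private final FilesystemDocumentProvider provider;

        FixedWebCrawlContext(URLMapper urlMapper, DocumentIDMapper idMapper,
                             URL[] roots, FilesystemDocumentProvider provider) {
            this.urlMapper = urlMapper;
            this.idMapper = idMapper;
            this.roots = roots.clone(); // keep a private copy of the caller's array
            this.provider = provider;
        }

        @Override public URLMapper getURLMapper() { return urlMapper; }
        @Override public DocumentIDMapper getDocumentIDMapper() { return idMapper; }
        // Return a copy so callers cannot modify the stored roots.
        @Override public URL[] getRemoteCrawlRoots() { return roots.clone(); }
        @Override public FilesystemDocumentProvider getDocumentProvider() { return provider; }
    }

    /** Parses a URL, converting the checked exception into an unchecked one. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + spec, e);
        }
    }

    /** Builds a context with one assumed crawl root (example.org is illustrative). */
    static WebCrawlContext example() {
        return new FixedWebCrawlContext(
                new URLMapper() {},
                new DocumentIDMapper() {},
                new URL[] { url("http://example.org/docs/") },
                new FilesystemDocumentProvider() {});
    }

    public static void main(String[] args) {
        WebCrawlContext ctx = example();
        for (URL root : ctx.getRemoteCrawlRoots()) {
            System.out.println(root);
        }
    }
}
```

Returning a copy of the array from getRemoteCrawlRoots is a design choice of this sketch. The page does not say whether implementations such as WgetWebCrawlContext do the same.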

Method Summary
 DocumentIDMapper getDocumentIDMapper()
          Retrieves the DocumentIDMapper object associated with this web crawl.
 FilesystemDocumentProvider getDocumentProvider()
          Returns the document provider that traverses the local filesystem and retrieves all documents mirrored by this web crawl.
 java.net.URL[] getRemoteCrawlRoots()
          Retrieves the URLs used as the remote roots of the crawl.
 URLMapper getURLMapper()
          Retrieves the URLMapper object associated with this web crawl.
 

Method Detail

getURLMapper

public URLMapper getURLMapper()
Retrieves the URLMapper object associated with this web crawl.
Returns:
the URLMapper object.

getDocumentIDMapper

public DocumentIDMapper getDocumentIDMapper()
Retrieves the DocumentIDMapper object associated with this web crawl.
Returns:
the DocumentIDMapper object.

getRemoteCrawlRoots

public java.net.URL[] getRemoteCrawlRoots()
Retrieves the URLs used as the remote roots of the crawl.
Returns:
an array of the URLs that were used as roots for this web crawl.
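
One plausible use of the returned roots is to test whether a given remote URL lies inside the crawled area. The sketch below is an illustration under stated assumptions, not part of this API. CrawlRootCheck, isUnderCrawlRoot and url are hypothetical names. The matching rule (same scheme, host and effective port, with the root path as a prefix) is an assumed convention that the crawler itself may not follow.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical helper: checks a URL against an array of crawl roots. */
final class CrawlRootCheck {

    /** Returns true if candidate shares an origin with some root and lies under its path. */
    static boolean isUnderCrawlRoot(URL[] roots, URL candidate) {
        for (URL root : roots) {
            if (sameOrigin(root, candidate)
                    && pathWithin(root.getPath(), candidate.getPath())) {
                return true;
            }
        }
        return false;
    }

    // Scheme and host compare case-insensitively; a missing port means the default port.
    private static boolean sameOrigin(URL a, URL b) {
        return a.getProtocol().equalsIgnoreCase(b.getProtocol())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    private static int effectivePort(URL u) {
        return u.getPort() == -1 ? u.getDefaultPort() : u.getPort();
    }

    // Match on whole path segments so that "/docs" does not match "/docsets/x".
    private static boolean pathWithin(String rootPath, String path) {
        if (rootPath.isEmpty() || rootPath.endsWith("/")) {
            return path.startsWith(rootPath);
        }
        return path.equals(rootPath) || path.startsWith(rootPath + "/");
    }

    /** Parses a URL, converting the checked exception into an unchecked one. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // In real use the roots would come from WebCrawlContext.getRemoteCrawlRoots().
        URL[] roots = { url("http://example.org/docs/") };
        System.out.println(isUnderCrawlRoot(roots, url("http://example.org/docs/a.html")));
        System.out.println(isUnderCrawlRoot(roots, url("http://example.org/other/a.html")));
    }
}
```

The comparison is purely textual and does not resolve host names, so aliases of the same server are treated as different origins.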

getDocumentProvider

public FilesystemDocumentProvider getDocumentProvider()
Returns the document provider that traverses the local filesystem and retrieves all documents mirrored by this web crawl.
Returns:
the document provider.