java.lang.Object
  |
  +--idl.tmt.documentsource.webcrawl.WgetWebCrawlContext
This class provides the context for a web crawl performed with wget. The crawl must have been run with the options '-nv -o [logfilename]' so that wget writes a log file in the format this class expects. See the wget manual for detailed documentation of these and other options.

Created on Feb 27, 2004
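As a rough illustration of the invocation requirement in the class description, the sketch below assembles a wget command line containing the two required options. Only `-nv` and `-o` come from this documentation. The `-r` (recursive) and `-P` (directory prefix) options, the class name `WgetCrawlCommandSketch`, the method `buildCommand`, and the example URL and paths are assumptions added for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WgetCrawlCommandSketch {

    // Builds the argument list for a wget run whose log file could later be
    // passed to WgetWebCrawlContext. Only -nv and -o are taken from the class
    // documentation; -r and -P are assumptions for this example.
    static List<String> buildCommand(String startUrl, File logFile, File localRootDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wget");
        cmd.add("-nv");                    // required: non-verbose log lines
        cmd.add("-o");                     // required: write the log to a file
        cmd.add(logFile.getPath());
        cmd.add("-r");                     // assumption: recursive retrieval
        cmd.add("-P");                     // assumption: save files under this directory
        cmd.add(localRootDir.getPath());
        cmd.add(startUrl);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "http://example.org/", new File("crawl.log"), new File("mirror"));
        System.out.println(String.join(" ", cmd));
        // To actually run the crawl (requires wget on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The resulting log file and the directory holding the downloaded files would then be the two constructor arguments. Whether `localRootDir` should be the `-P` directory itself or the host subdirectory wget creates beneath it is not stated in this documentation.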
| Field Summary | |
| private static java.lang.String | DEFAULT_INDEX |
| private int | docCount |
| private MultiMapDocumentIDMap | docIDMap |
| private java.util.Iterator | localDocIterator |
| private java.io.File | localRootDir |
| private MultiMapURLMapper | urlMapper |
| Constructor Summary | |
| WgetWebCrawlContext(java.io.File wgetLogFile, java.io.File localRootDir) | Creates a new WgetWebCrawlContext object. |
| Method Summary | | |
| int | documentCount() | Returns the number of documents. |
| DocumentIDMapper | getDocumentIDMapper() | Returns the DocumentIDMapper for this web crawl. |
| FilesystemDocumentProvider | getDocumentProvider() | Returns a reference to the FilesystemDocumentProvider. |
| java.io.File | getNextDocument() | Returns the next document as a File. |
| java.net.URL[] | getRemoteCrawlRoots() | Unsupported in this implementation; returns null. |
| java.io.File | getRoot() | Returns the local root directory where the mirrored documents are located. |
| URLMapper | getURLMapper() | Returns the URLMapper object. |
| boolean | hasMoreDocuments() | Checks whether there are more documents to be returned. |
| Methods inherited from class java.lang.Object |
|
| Field Detail |
private java.io.File localRootDir
private MultiMapURLMapper urlMapper
private MultiMapDocumentIDMap docIDMap
private java.util.Iterator localDocIterator
private int docCount
private static final java.lang.String DEFAULT_INDEX
| Constructor Detail |
public WgetWebCrawlContext(java.io.File wgetLogFile,
java.io.File localRootDir)
throws java.io.IOException
| Method Detail |
public URLMapper getURLMapper()
    Specified by: getURLMapper in interface WebCrawlContext
    See Also: WebCrawlContext.getURLMapper()

public java.net.URL[] getRemoteCrawlRoots()
    Specified by: getRemoteCrawlRoots in interface WebCrawlContext
    See Also: WebCrawlContext.getRemoteCrawlRoots()

public FilesystemDocumentProvider getDocumentProvider()
    Specified by: getDocumentProvider in interface WebCrawlContext
    See Also: WebCrawlContext.getDocumentProvider()

public int documentCount()
    Specified by: documentCount in interface FilesystemDocumentProvider
    See Also: FilesystemDocumentProvider.documentCount()

public java.io.File getRoot()
    Specified by: getRoot in interface FilesystemDocumentProvider
    See Also: FilesystemDocumentProvider.getRoot()

public java.io.File getNextDocument()
    Specified by: getNextDocument in interface DocumentProvider
    See Also: DocumentProvider.getNextDocument()

public boolean hasMoreDocuments()
    Specified by: hasMoreDocuments in interface DocumentProvider
    See Also: DocumentProvider.hasMoreDocuments()

public DocumentIDMapper getDocumentIDMapper()
    Specified by: getDocumentIDMapper in interface WebCrawlContext
    See Also: WebCrawlContext.getDocumentIDMapper()
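
The hasMoreDocuments() and getNextDocument() pair suggests an iterator-style loop over the mirrored files. The sketch below shows that loop against a hypothetical stand-in interface, `DocumentSource`, which copies only those two documented signatures. The real DocumentProvider interface is not reproduced here, and the list-backed implementation, the helper names, and the file names are assumptions for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DocumentIterationSketch {

    // Hypothetical stand-in: mirrors only the two iteration methods
    // documented for WgetWebCrawlContext.
    interface DocumentSource {
        boolean hasMoreDocuments();
        File getNextDocument();
    }

    // List-backed stand-in so the loop can run without the real class.
    static DocumentSource fromList(List<File> files) {
        final Iterator<File> it = files.iterator();
        return new DocumentSource() {
            public boolean hasMoreDocuments() { return it.hasNext(); }
            public File getNextDocument() { return it.next(); }
        };
    }

    // Visits every remaining document once and returns how many were seen.
    static int drain(DocumentSource source) {
        int seen = 0;
        while (source.hasMoreDocuments()) {
            File doc = source.getNextDocument();
            System.out.println(doc.getPath());
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        DocumentSource source = fromList(Arrays.asList(
                new File("mirror/index.html"), new File("mirror/about.html")));
        System.out.println(drain(source) + " documents");
    }
}
```

With the real class, the same loop would presumably be written against the object returned by getDocumentProvider(), checking hasMoreDocuments() before each call to getNextDocument().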