idl.tmt.documentsource.webcrawl
Class MultiMapDocumentIDMap
java.lang.Object
|
+--idl.tmt.documentsource.webcrawl.MultiMapDocumentIDMap
- All Implemented Interfaces:
- DocumentIDMapper, java.io.Serializable
- public class MultiMapDocumentIDMap
- extends java.lang.Object
- implements DocumentIDMapper
Provides the mapping between documents (local Files) and document
IDs. Also responsible for resolving document references from within
a document being parsed. The document references supported by
this class are relative URLs (relative file paths, full file paths,
or full URLs).
Created on Mar 18, 2004
- Author:
- jelsas
- See Also:
- Serialized Form
Constructor Summary
MultiMapDocumentIDMap(java.util.HashMap idToLocalFile,
java.util.HashMap localFileToID,
URLMapper urlMapper)
Creates a new MultiMapDocumentIDMap.
Method Summary
void applyFilter(TmtMatrix filterMatrix)
Filters this documentIDMapper based on the given matrix.
int getDocID(java.io.File localDoc)
Returns the Document ID referred to by the File of the local
document provided.
java.io.File getLocalDoc(int docID)
Returns the local File of the document corresponding to
the passed-in Document ID.
int resolveDocumentReference(int fromDocID,
java.lang.String documentReference)
Resolves the String document reference relative to the document
referred to by the fromDocID.
java.lang.String toString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
urlMapper
private URLMapper urlMapper
idToLocalFile
private java.util.HashMap idToLocalFile
localFileToID
private java.util.HashMap localFileToID
DOCUMENT_ID_UNKNOWN
private static final int DOCUMENT_ID_UNKNOWN
MultiMapDocumentIDMap
public MultiMapDocumentIDMap(java.util.HashMap idToLocalFile,
java.util.HashMap localFileToID,
URLMapper urlMapper)
- Creates a new MultiMapDocumentIDMap.
- Parameters:
idToLocalFile - HashMap containing Integers as keys and URLs as values.
The URLs should be local "file://" URLs.
localFileToID - HashMap containing the reverse of the idToLocalFile
mapping.
urlMapper - URLMapper helper object to aid in document reference
resolution.
resolveDocumentReference
public int resolveDocumentReference(int fromDocID,
java.lang.String documentReference)
- Resolves the String document reference relative to the document
referred to by the fromDocID.
- Specified by:
resolveDocumentReference in interface DocumentIDMapper
- See Also:
DocumentIDMapper.resolveDocumentReference(int, java.lang.String)
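The resolution step described above can be illustrated with a minimal sketch. It is not this class's implementation: the real method works with document IDs and delegates to a URLMapper whose API is not shown on this page. The class name ReferenceResolutionSketch and its resolve method are hypothetical, and java.net.URI is used as a stand-in to show how a relative path, a full path, or a full URL might be resolved against the referring document.

```java
import java.net.URI;

// Hypothetical illustration only; not part of idl.tmt.documentsource.webcrawl.
final class ReferenceResolutionSketch {

    /**
     * Resolves a document reference against the location of the document
     * that contains it. Relative paths are merged with the base path,
     * absolute paths replace it, and full URLs are returned unchanged.
     */
    static URI resolve(URI fromDocLocation, String documentReference) {
        // URI.resolve(String) parses the reference and applies the standard
        // relative-reference resolution rules; normalize() removes any
        // remaining "." and ".." segments.
        return fromDocLocation.resolve(documentReference).normalize();
    }

    public static void main(String[] args) {
        URI from = URI.create("file:/crawl/site/docs/index.html");
        System.out.println(resolve(from, "../images/logo.html"));       // relative file path
        System.out.println(resolve(from, "/other/page.html"));          // full file path
        System.out.println(resolve(from, "http://example.org/a.html")); // full URL
    }
}
```

In the real class the resolved location would then have to be looked up to obtain a document ID. References containing characters that are illegal in a URI (such as unescaped spaces) make URI.resolve(String) throw IllegalArgumentException, so a production resolver would need to escape or reject them first.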
getDocID
public int getDocID(java.io.File localDoc)
- Returns the Document ID referred to by the File of the local
document provided. If the File has not been mapped to an ID,
-1 is returned. This can happen if the document referred to
was not downloaded in a web crawl.
- Specified by:
getDocID in interface DocumentIDMapper
- See Also:
DocumentIDMapper.getDocID(java.io.File)
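The lookup behaviour described for getDocID can be sketched with a pair of maps, one per direction. This is an assumption-laden illustration, not the class's source: DocIdLookupSketch and its put method are hypothetical, generics are used where the original uses raw HashMaps, and the value of DOCUMENT_ID_UNKNOWN is assumed to be -1 only because the description says -1 is returned for unmapped files.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of idl.tmt.documentsource.webcrawl.
final class DocIdLookupSketch {

    // Assumed value, inferred from the "-1 is returned" note above.
    static final int DOCUMENT_ID_UNKNOWN = -1;

    private final Map<Integer, File> idToLocalFile = new HashMap<>();
    private final Map<File, Integer> localFileToID = new HashMap<>();

    /** Registers a document in both directions so the two maps stay in sync. */
    void put(int docID, File localDoc) {
        idToLocalFile.put(docID, localDoc);
        localFileToID.put(localDoc, docID);
    }

    /** Returns the ID mapped to the file, or DOCUMENT_ID_UNKNOWN if there is none. */
    int getDocID(File localDoc) {
        Integer id = localFileToID.get(localDoc);
        return id == null ? DOCUMENT_ID_UNKNOWN : id;
    }

    /** Returns the file mapped to the ID, or null if the ID is not known. */
    File getLocalDoc(int docID) {
        return idToLocalFile.get(docID);
    }

    public static void main(String[] args) {
        DocIdLookupSketch map = new DocIdLookupSketch();
        map.put(0, new File("crawl/index.html"));
        System.out.println(map.getDocID(new File("crawl/index.html")));
        System.out.println(map.getDocID(new File("crawl/not-downloaded.html")));
    }
}
```

Callers that follow links out of a crawled page should check for the sentinel before using the returned ID, since a link to a page that was never downloaded is an expected case rather than an error.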
getLocalDoc
public java.io.File getLocalDoc(int docID)
- Returns the local File of the document corresponding to
the passed-in Document ID.
- Specified by:
getLocalDoc in interface DocumentIDMapper
- See Also:
DocumentIDMapper.getLocalDoc(int)
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
applyFilter
public void applyFilter(TmtMatrix filterMatrix)
throws BadDimensionException
- Filters this documentIDMapper based on the given matrix. The matrix
must be an M x N matrix where N is the number of documents existing
in the DocIDMapping, M is the number of documents in the filtered
DocIDMapping, and M <= N. The columns of this matrix represent
the existing document IDs, and the rows represent the new document
IDs. An entry is non-zero where the old doc ID (its column) is mapped
to the new doc ID (its row).
- Specified by:
applyFilter in interface DocumentIDMapper
- See Also:
DocumentIDMapper.applyFilter(idl.tmt.representation.matrix.TmtMatrix)
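The filter semantics described for applyFilter can be sketched as follows. This is a hedged illustration rather than the actual implementation: the TmtMatrix API is not shown on this page, so a plain double[][] stands in for it, IllegalArgumentException stands in for BadDimensionException, and document IDs are assumed to be the contiguous range 0..N-1. The class FilterSketch is hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of idl.tmt.documentsource.webcrawl.
final class FilterSketch {

    /**
     * Builds a filtered ID-to-file map from an M x N filter, where rows are
     * new document IDs and columns are existing document IDs. A non-zero
     * entry at [newId][oldId] maps the old document to the new ID.
     */
    static Map<Integer, File> applyFilter(Map<Integer, File> idToLocalFile,
                                          double[][] filter) {
        int n = idToLocalFile.size(); // assumes IDs are exactly 0..n-1
        if (filter.length > n) {
            throw new IllegalArgumentException("filter has more rows than documents");
        }
        Map<Integer, File> filtered = new HashMap<>();
        for (int newId = 0; newId < filter.length; newId++) {
            if (filter[newId].length != n) {
                throw new IllegalArgumentException("row " + newId + " must have " + n + " columns");
            }
            for (int oldId = 0; oldId < n; oldId++) {
                if (filter[newId][oldId] != 0.0) {
                    filtered.put(newId, idToLocalFile.get(oldId));
                }
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<Integer, File> docs = new HashMap<>();
        docs.put(0, new File("a.html"));
        docs.put(1, new File("b.html"));
        docs.put(2, new File("c.html"));
        // Keep documents 0 and 2; old ID 2 becomes new ID 1.
        double[][] filter = { {1, 0, 0}, {0, 0, 1} };
        System.out.println(applyFilter(docs, filter));
    }
}
```

A real implementation would also rebuild the reverse (file to ID) map from the filtered result so that both directions stay consistent.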