idl.tmt.documentsource.webcrawl
Class MultiMapDocumentIDMap

java.lang.Object
  |
  +--idl.tmt.documentsource.webcrawl.MultiMapDocumentIDMap
All Implemented Interfaces:
DocumentIDMapper, java.io.Serializable

public class MultiMapDocumentIDMap
extends java.lang.Object
implements DocumentIDMapper

Provides the mapping between documents (local Files) and document IDs. Also responsible for resolving document references found within a document being parsed. The document references supported by this class are relative file paths, full file paths, and full URLs.

Created on Mar 18, 2004

Author:
jelsas
See Also:
Serialized Form

Field Summary
private static int DOCUMENT_ID_UNKNOWN
           
private  java.util.HashMap idToLocalFile
           
private  java.util.HashMap localFileToID
           
private  URLMapper urlMapper
           
 
Constructor Summary
MultiMapDocumentIDMap(java.util.HashMap idToLocalFile, java.util.HashMap localFileToID, URLMapper urlMapper)
          Creates a new MultiMapDocumentIDMap.
 
Method Summary
 void applyFilter(TmtMatrix filterMatrix)
          Filters this documentIDMapper based on the given matrix.
 int getDocID(java.io.File localDoc)
          Returns the Document ID referred to by the File of the local document provided.
 java.io.File getLocalDoc(int docID)
          Returns the local File of the document corresponding to the given Document ID.
 int resolveDocumentReference(int fromDocID, java.lang.String documentReference)
          Resolves the String document reference relative to the document identified by fromDocID.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 

Field Detail

urlMapper

private URLMapper urlMapper

idToLocalFile

private java.util.HashMap idToLocalFile

localFileToID

private java.util.HashMap localFileToID

DOCUMENT_ID_UNKNOWN

private static final int DOCUMENT_ID_UNKNOWN
Constructor Detail

MultiMapDocumentIDMap

public MultiMapDocumentIDMap(java.util.HashMap idToLocalFile,
                             java.util.HashMap localFileToID,
                             URLMapper urlMapper)
Creates a new MultiMapDocumentIDMap.
Parameters:
idToLocalFile - HashMap containing Integers as keys and URLs as values. The URLs should be local "file://" URLs.
localFileToID - HashMap containing the reverse of the idToLocalFile mapping.
urlMapper - URLMapper helper object to aid in document reference resolution.
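
The two maps passed to the constructor must mirror each other. The following is a minimal sketch of building such a pair, not the project's own code. It assumes the map values are java.io.File objects (as the field names and getDocID suggest, although the parameter description speaks of URLs) and that IDs are assigned 0, 1, 2, ... in list order. The class DocumentMapBuilder and its methods are hypothetical names introduced only for illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of idl.tmt: builds a consistent pair of maps.
class DocumentMapBuilder {

    /** Forward map: Integer ID -> local File. IDs follow list order (an assumption). */
    static HashMap<Integer, File> buildIdToLocalFile(List<File> files) {
        HashMap<Integer, File> idToLocalFile = new HashMap<>();
        for (int id = 0; id < files.size(); id++) {
            idToLocalFile.put(id, files.get(id));
        }
        return idToLocalFile;
    }

    /** Reverse map: local File -> Integer ID. Rejects a File that appears under two IDs. */
    static HashMap<File, Integer> invert(Map<Integer, File> idToLocalFile) {
        HashMap<File, Integer> localFileToID = new HashMap<>();
        for (Map.Entry<Integer, File> entry : idToLocalFile.entrySet()) {
            Integer previous = localFileToID.put(entry.getValue(), entry.getKey());
            if (previous != null) {
                throw new IllegalArgumentException(
                        "File mapped to two IDs: " + entry.getValue());
            }
        }
        return localFileToID;
    }
}
```

Deriving the reverse map from the forward map, rather than filling both by hand, keeps the two from drifting apart. Both results could then be handed to the constructor together with a URLMapper instance.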
Method Detail

resolveDocumentReference

public int resolveDocumentReference(int fromDocID,
                                    java.lang.String documentReference)
Resolves the String document reference relative to the document identified by fromDocID.
Specified by:
resolveDocumentReference in interface DocumentIDMapper
See Also:
DocumentIDMapper.resolveDocumentReference(int, java.lang.String)
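
How a reference might be resolved against the referring document can be sketched with java.net.URI. This is an illustration under stated assumptions, not this class's implementation: the real class delegates to a URLMapper whose behaviour is not documented here, so the sketch handles only references that end up on the local file system and treats every other scheme as unknown. The class name ReferenceResolutionSketch, the -1 sentinel, and the requirement that map keys be absolute Files are all assumptions.

```java
import java.io.File;
import java.net.URI;
import java.util.Map;

// Hypothetical sketch, not the idl.tmt implementation.
class ReferenceResolutionSketch {
    // Sentinel for "no such document"; -1 is assumed from the getDocID description.
    static final int UNKNOWN = -1;

    static int resolve(Map<Integer, File> idToFile, Map<File, Integer> fileToId,
                       int fromDocID, String reference) {
        File from = idToFile.get(fromDocID);
        if (from == null) return UNKNOWN;

        // Drop any "#fragment": it points inside a document, not at another one.
        int hash = reference.indexOf('#');
        if (hash >= 0) reference = reference.substring(0, hash);
        if (reference.isEmpty()) return fromDocID; // reference to the same document

        try {
            // Relative paths are merged with the referring file's location;
            // absolute paths and full URLs replace it.
            URI target = from.toURI().resolve(reference);
            if (!"file".equals(target.getScheme())) return UNKNOWN;
            Integer id = fileToId.get(new File(target)); // keys assumed absolute
            return id == null ? UNKNOWN : id;
        } catch (IllegalArgumentException e) {
            // Malformed reference, or a file URI that java.io.File cannot represent.
            return UNKNOWN;
        }
    }
}
```

With documents /crawl/site/index.html (ID 0) and /crawl/site/docs/about.html (ID 1), the reference "docs/about.html" seen in document 0 resolves to 1, and "../index.html#top" seen in document 1 resolves to 0.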

getDocID

public int getDocID(java.io.File localDoc)
Returns the Document ID referred to by the File of the local document provided. If the File has not been mapped to an ID, -1 is returned. This can happen if the document referred to was not downloaded in a web crawl.
Specified by:
getDocID in interface DocumentIDMapper
See Also:
DocumentIDMapper.getDocID(java.io.File)
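
Because an unmapped File yields -1 rather than an exception, callers need to check the result before using it as an ID. The sketch below shows one way to do that. It is an assumption-laden illustration: DocIdLookupSketch and its methods are hypothetical, the lookup is modelled as a plain map access, and the value -1 for the sentinel is taken from the description above (the actual value of DOCUMENT_ID_UNKNOWN is not shown on this page).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lookup and of caller-side sentinel handling.
class DocIdLookupSketch {
    static final int UNKNOWN = -1; // assumed sentinel value

    /** Returns the ID mapped to localDoc, or UNKNOWN if the file was never mapped. */
    static int getDocID(Map<File, Integer> localFileToID, File localDoc) {
        Integer id = localFileToID.get(localDoc);
        return id == null ? UNKNOWN : id;
    }

    /** Keeps only the IDs of files that were actually mapped, e.g. downloaded in the crawl. */
    static List<Integer> knownIds(Map<File, Integer> localFileToID, List<File> candidates) {
        List<Integer> ids = new ArrayList<>();
        for (File candidate : candidates) {
            int id = getDocID(localFileToID, candidate);
            if (id != UNKNOWN) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```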

getLocalDoc

public java.io.File getLocalDoc(int docID)
Returns the local File of the document corresponding to the given Document ID.
Specified by:
getLocalDoc in interface DocumentIDMapper
See Also:
idl.tmt.documentsource.DocumentIDMapper#getDocURL(int)

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

applyFilter

public void applyFilter(TmtMatrix filterMatrix)
                 throws BadDimensionException
Filters this DocumentIDMapper based on the given matrix. The matrix must be M x N, where N is the number of documents in the existing mapping, M is the number of documents in the filtered mapping, and M <= N. Columns correspond to existing document IDs and rows to new document IDs. An entry is non-zero where the old document ID (column) maps to the new document ID (row).
Specified by:
applyFilter in interface DocumentIDMapper
See Also:
DocumentIDMapper.applyFilter(idl.tmt.representation.matrix.TmtMatrix)
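
The row/column convention can be made concrete with a small sketch. Since the TmtMatrix API is not shown on this page, the filter is modelled here as a plain double[][] with filter[newId][oldId]. Document IDs are assumed to be 0-based and contiguous, each row is assumed to select one old ID (the first non-zero entry wins), and IllegalArgumentException stands in for BadDimensionException. FilterSketch is a hypothetical name.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the renumbering described above; not the idl.tmt code.
class FilterSketch {

    /** filter is M x N: rows are new IDs, columns are old IDs. */
    static Map<Integer, File> applyFilter(Map<Integer, File> idToLocalFile, double[][] filter) {
        int n = idToLocalFile.size();
        if (filter.length > n) {
            throw new IllegalArgumentException("filter has more rows (M) than documents (N)");
        }
        Map<Integer, File> filtered = new HashMap<>();
        for (int newId = 0; newId < filter.length; newId++) {
            if (filter[newId].length != n) {
                throw new IllegalArgumentException("row " + newId + " does not have N columns");
            }
            for (int oldId = 0; oldId < n; oldId++) {
                if (filter[newId][oldId] != 0.0) {
                    // The old document keeps its File but receives the row index as its new ID.
                    filtered.put(newId, idToLocalFile.get(oldId));
                    break; // first non-zero entry in the row wins
                }
            }
        }
        return filtered;
    }
}
```

For three documents a.html, b.html, c.html with IDs 0, 1, 2, the filter {{1, 0, 0}, {0, 0, 1}} drops b.html and renumbers c.html from 2 to 1. A full implementation would rebuild the reverse File-to-ID map from the filtered result as well.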