idl.tmt.representation
Class LinkTextRepresentationBuilder

java.lang.Object
  |
  +--idl.tmt.representation.BagOfWordsRepresentationBuilder
        |
        +--idl.tmt.representation.LinkTextRepresentationBuilder
All Implemented Interfaces:
HypertextParsingListener, ParsingListener, RepresentationBuilder, WordParsingListener

public class LinkTextRepresentationBuilder
extends BagOfWordsRepresentationBuilder
implements WordParsingListener, HypertextParsingListener

Created on Mar 22, 2004

Author:
jelsas

Field Summary
private  boolean inLink
           
private  int linkDocID
           
 
Fields inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
binarize, debug, myMatrix, numDocs, rep, shareTermlist, termList, textParser, weight
 
Fields inherited from interface idl.tmt.documentparsing.WordParsingListener
ALLOWWORD_LIST, DELIMITER, STOPWORD_LIST
 
Constructor Summary
LinkTextRepresentationBuilder()
           
LinkTextRepresentationBuilder(boolean binarize, TermList termList)
           
LinkTextRepresentationBuilder(int numDocs)
          Creates a new LinkTextRepresentationBuilder.
LinkTextRepresentationBuilder(int numDocs, TermList termList, boolean binarize)
          Creates a LinkTextRepresentationBuilder with the specified term list.
 
Method Summary
 void documentCollectionComplete()
          Indicates that parsing of the collection is complete; this object then builds the representation matrix.
 void documentComplete()
          Ignored by this representation builder, which is interested only in the documents linked to, not the current document.
 void endLink()
          Indicates that the end of a link has been reached.
 void newDocument(int docID)
          Ignored by this representation builder, which is interested only in the documents linked to, not the current document.
 void startLink(int linkDocID)
          Indicates that a link has been started.
 void word(java.lang.String word, int pos)
          Indicates that a new word has been encountered.
 
Methods inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
addTermToDocRepresentation, buildRepresentation, cleanup, getRepresentation, getTermList, getWeight, isBinarize, isDebug, isShareTermlist, setBinarize, setDebug, setNumDocuments, setShareTermlist, setTermList, setTextParser, setWeight, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 

Field Detail

inLink

private boolean inLink

linkDocID

private int linkDocID
Constructor Detail

LinkTextRepresentationBuilder

public LinkTextRepresentationBuilder(int numDocs)
Creates a new LinkTextRepresentationBuilder. This class builds each document's representation from the link text of the documents that link to it.

LinkTextRepresentationBuilder

public LinkTextRepresentationBuilder()

LinkTextRepresentationBuilder

public LinkTextRepresentationBuilder(int numDocs,
                                     TermList termList,
                                     boolean binarize)
Creates a LinkTextRepresentationBuilder with the specified term list.
Parameters:
numDocs - number of documents in this collection
termList - shared term list to use
binarize - indicates whether the representation should be binary instead of term counts
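
The effect of the binarize flag can be illustrated with a small, self-contained sketch. It does not use the idl.tmt classes: BinarizeSketch and weigh are hypothetical names, and the mapping of any positive count to 1 is an assumption about what "binary instead of term counts" means here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of idl.tmt.
class BinarizeSketch {

    /**
     * Returns the term weights for one document: the raw counts, or
     * 0/1 presence flags when binarize is true (assumed semantics).
     */
    static Map<String, Integer> weigh(Map<String, Integer> counts, boolean binarize) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            out.put(e.getKey(), binarize ? (count > 0 ? 1 : 0) : count);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("java", 3);
        counts.put("parser", 1);
        System.out.println(weigh(counts, false)); // term counts
        System.out.println(weigh(counts, true));  // binary representation
    }
}
```

With binarize set to false the map keeps the count of 3 for "java"; with binarize set to true every term that occurs at least once is recorded as 1.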

LinkTextRepresentationBuilder

public LinkTextRepresentationBuilder(boolean binarize,
                                     TermList termList)
Method Detail

word

public void word(java.lang.String word,
                 int pos)
Indicates that a new word has been encountered. Only words within links are stored.
Specified by:
word in interface WordParsingListener
See Also:
WordParsingListener.word(java.lang.String, int)
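
As a rough sketch of how "only words within links are stored" could work, the class below mirrors the documented callback names and the inLink and linkDocID fields. It is a hypothetical stand-in rather than the actual implementation: LinkTextSketch, wordsFor, and the -1 sentinel are assumptions, and it does not implement the real idl.tmt interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of idl.tmt.
class LinkTextSketch {
    private boolean inLink = false;
    private int linkDocID = -1; // assumed sentinel for "no link open"
    private final Map<Integer, List<String>> linkText = new HashMap<>();

    void startLink(int linkDocID) {
        this.inLink = true;
        this.linkDocID = linkDocID;
    }

    void endLink() {
        this.inLink = false;
    }

    void word(String word, int pos) {
        if (!inLink) {
            return; // words outside a link are dropped
        }
        // Credit the word to the document the link points to.
        linkText.computeIfAbsent(linkDocID, k -> new ArrayList<>()).add(word);
    }

    /** Hypothetical accessor: link words collected for the given target document. */
    List<String> wordsFor(int docID) {
        return linkText.getOrDefault(docID, new ArrayList<>());
    }

    public static void main(String[] args) {
        LinkTextSketch s = new LinkTextSketch();
        s.word("outside", 0);   // ignored: no link is open
        s.startLink(7);
        s.word("anchor", 1);    // stored for document 7
        s.word("text", 2);      // stored for document 7
        s.endLink();
        s.word("after", 3);     // ignored again
        System.out.println(s.wordsFor(7));
    }
}
```

The pos argument is accepted but unused in this sketch; whether the real builder uses it is not stated in this page.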

startLink

public void startLink(int linkDocID)
Indicates that a link has been started.
Specified by:
startLink in interface HypertextParsingListener
See Also:
HypertextParsingListener.startLink(int)

endLink

public void endLink()
Indicates that the end of a link has been reached.
Specified by:
endLink in interface HypertextParsingListener
See Also:
HypertextParsingListener.endLink()

newDocument

public void newDocument(int docID)
Ignored by this representation builder, which is interested only in the documents linked to, not the current document.
Specified by:
newDocument in interface ParsingListener
See Also:
ParsingListener.newDocument(int)

documentComplete

public void documentComplete()
Ignored by this representation builder, which is interested only in the documents linked to, not the current document.
Specified by:
documentComplete in interface ParsingListener
See Also:
ParsingListener.documentComplete()

documentCollectionComplete

public void documentCollectionComplete()
Indicates that parsing of the collection is complete; this object then builds the representation matrix.
Specified by:
documentCollectionComplete in interface ParsingListener
See Also:
ParsingListener.documentCollectionComplete()
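
The callbacks above suggest an event sequence of newDocument, word, startLink, endLink, documentComplete, and finally documentCollectionComplete. The sketch below walks through one such sequence and builds a count matrix at the end. It is a minimal illustration under stated assumptions, not the library's code: CollectionSketch, the int[][] matrix layout (rows are documents, columns are terms in first-seen order), and the requirement that every linkDocID is smaller than numDocs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of idl.tmt.
class CollectionSketch {
    private final int numDocs;
    private final List<String> termList = new ArrayList<>();
    private final Map<Integer, Map<String, Integer>> counts = new HashMap<>();
    private boolean inLink;
    private int linkDocID;
    int[][] matrix; // filled in by documentCollectionComplete()

    CollectionSketch(int numDocs) {
        this.numDocs = numDocs;
    }

    void newDocument(int docID) { /* ignored: only link targets matter */ }

    void documentComplete() { /* ignored: only link targets matter */ }

    void startLink(int linkDocID) {
        this.inLink = true;
        this.linkDocID = linkDocID;
    }

    void endLink() {
        this.inLink = false;
    }

    void word(String word, int pos) {
        if (!inLink) {
            return;
        }
        if (!termList.contains(word)) {
            termList.add(word);
        }
        counts.computeIfAbsent(linkDocID, k -> new HashMap<>()).merge(word, 1, Integer::sum);
    }

    void documentCollectionComplete() {
        // Assumes every link target ID lies in [0, numDocs).
        matrix = new int[numDocs][termList.size()];
        for (Map.Entry<Integer, Map<String, Integer>> doc : counts.entrySet()) {
            for (Map.Entry<String, Integer> term : doc.getValue().entrySet()) {
                matrix[doc.getKey()][termList.indexOf(term.getKey())] = term.getValue();
            }
        }
    }

    public static void main(String[] args) {
        CollectionSketch b = new CollectionSketch(2);
        b.newDocument(0);
        b.word("welcome", 0);   // outside a link, so not stored
        b.startLink(1);         // document 0 links to document 1
        b.word("parser", 1);
        b.word("docs", 2);
        b.endLink();
        b.documentComplete();
        b.newDocument(1);
        b.startLink(0);         // document 1 links to document 0
        b.word("home", 0);
        b.endLink();
        b.documentComplete();
        b.documentCollectionComplete();
        System.out.println(Arrays.deepToString(b.matrix)); // prints [[0, 0, 1], [1, 1, 0]]
    }
}
```

In this run the term order is parser, docs, home, so document 0 is described only by "home" (from the link in document 1) and document 1 by "parser" and "docs" (from the link in document 0).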