idl.tmt.representation
Class TitleTextRepresentationBuilder

java.lang.Object
  |
  +--idl.tmt.representation.BagOfWordsRepresentationBuilder
        |
        +--idl.tmt.representation.TitleTextRepresentationBuilder
All Implemented Interfaces:
HTMLParsingListener, ParsingListener, RepresentationBuilder, WordParsingListener

public class TitleTextRepresentationBuilder
extends BagOfWordsRepresentationBuilder
implements WordParsingListener, HTMLParsingListener

Created on Mar 19, 2004

Author:
jelsas

Field Summary
private  int currentDocID
           
private  boolean inTitle
           
 
Fields inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
binarize, debug, myMatrix, numDocs, rep, shareTermlist, termList, textParser, weight
 
Fields inherited from interface idl.tmt.documentparsing.WordParsingListener
ALLOWWORD_LIST, DELIMITER, STOPWORD_LIST
 
Constructor Summary
TitleTextRepresentationBuilder()
           
TitleTextRepresentationBuilder(boolean binarize, TermList termList)
           
TitleTextRepresentationBuilder(int numDocs, TermList termList)
          Creates a new TitleTextRepresentationBuilder with the specified term list.
TitleTextRepresentationBuilder(TermList termList)
           
 
Method Summary
 void documentCollectionComplete()
          Builds the document representation matrix.
 void documentComplete()
          Indicates that the parsing of the current document has completed.
 void endTag(javax.swing.text.html.HTML.Tag tag, int pos)
          Indicates that an HTML end tag has been reached.
 void newDocument(int docID)
          Indicates that parsing of a new document has begun.
 void startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
          Indicates that a new HTML start tag has been entered.
 void word(java.lang.String word, int pos)
          Adds the word to the current document's representation if we are in the process of parsing the title of a document.
 
Methods inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
addTermToDocRepresentation, buildRepresentation, cleanup, getRepresentation, getTermList, getWeight, isBinarize, isDebug, isShareTermlist, setBinarize, setDebug, setNumDocuments, setShareTermlist, setTermList, setTextParser, setWeight, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 

Field Detail

inTitle

private boolean inTitle

currentDocID

private int currentDocID

Constructor Detail

TitleTextRepresentationBuilder

public TitleTextRepresentationBuilder(int numDocs,
                                      TermList termList)
Creates a new TitleTextRepresentationBuilder with the specified term list. This class builds a document representation matrix from the title text of HTML documents.

TitleTextRepresentationBuilder

public TitleTextRepresentationBuilder(TermList termList)

TitleTextRepresentationBuilder

public TitleTextRepresentationBuilder(boolean binarize,
                                      TermList termList)

TitleTextRepresentationBuilder

public TitleTextRepresentationBuilder()

Method Detail

word

public void word(java.lang.String word,
                 int pos)
Adds the word to the current document's representation if we are in the process of parsing the title of a document.
Specified by:
word in interface WordParsingListener
Following copied from interface: idl.tmt.documentparsing.WordParsingListener
Parameters:
word - The word encountered
pos - The character position of the word in the document
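
The title gating described above can be illustrated with a minimal sketch. It assumes that the private inTitle field is set when a TITLE start tag arrives and cleared at the matching end tag; this page does not show the actual implementation. TitleWordCollector and its methods are hypothetical names and are not part of idl.tmt; only javax.swing.text.html.HTML.Tag comes from the JDK.

```java
import javax.swing.text.html.HTML;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the listener; NOT the idl.tmt implementation. */
final class TitleWordCollector {
    // Assumed to play the same role as the private inTitle field listed above.
    private boolean inTitle = false;
    private final List<String> titleWords = new ArrayList<>();

    /** Enter "title mode" when a TITLE start tag is seen. */
    void startTag(HTML.Tag tag) {
        if (HTML.Tag.TITLE.equals(tag)) {
            inTitle = true;
        }
    }

    /** Leave "title mode" when the TITLE end tag is seen. */
    void endTag(HTML.Tag tag) {
        if (HTML.Tag.TITLE.equals(tag)) {
            inTitle = false;
        }
    }

    /** Keep the word only while inside the title; drop it otherwise. */
    void word(String word) {
        if (inTitle) {
            titleWords.add(word);
        }
    }

    List<String> titleWords() {
        return titleWords;
    }

    public static void main(String[] args) {
        TitleWordCollector c = new TitleWordCollector();
        c.startTag(HTML.Tag.TITLE);
        c.word("text");
        c.word("mining");
        c.endTag(HTML.Tag.TITLE);
        c.startTag(HTML.Tag.BODY);
        c.word("ignored");   // outside the title, so not collected
        System.out.println(c.titleWords());
    }
}
```

In this sketch, words seen before the title opens or after it closes are silently discarded, which matches the "if we are in the process of parsing the title" condition in the description.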

startTag

public void startTag(javax.swing.text.html.HTML.Tag tag,
                     javax.swing.text.MutableAttributeSet atts,
                     int pos)
Description copied from interface: HTMLParsingListener
Indicates that a new HTML start tag has been entered.
Specified by:
startTag in interface HTMLParsingListener
Following copied from interface: idl.tmt.documentparsing.HTMLParsingListener
Parameters:
tag - The tag
atts - The tag's attributes
pos - The character position of this tag in the document

endTag

public void endTag(javax.swing.text.html.HTML.Tag tag,
                   int pos)
Description copied from interface: HTMLParsingListener
Indicates that an HTML end tag has been reached.
Specified by:
endTag in interface HTMLParsingListener
Following copied from interface: idl.tmt.documentparsing.HTMLParsingListener
Parameters:
tag - The tag
pos - The character position of this tag in the document
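
The parameter types of startTag and endTag match the callbacks of the JDK's Swing HTML parser, which suggests (but this page does not confirm) that tag events originate there. The sketch below uses only JDK classes to show what such start and end tag events look like for a small document. TagEventDemo and tagEvents are hypothetical names; how idl.tmt actually drives its listeners is not documented on this page.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; records tag events reported by the JDK parser. */
final class TagEventDemo {

    /** Returns "start:<tag>" / "end:<tag>" entries in the order the parser reports them. */
    static List<String> tagEvents(String html) {
        final List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
                events.add("start:" + tag);   // same argument shape as startTag(tag, atts, pos)
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                events.add("end:" + tag);     // same argument shape as endTag(tag, pos)
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Text Mining</title></head>"
                + "<body><p>Body text</p></body></html>";
        for (String event : tagEvents(html)) {
            System.out.println(event);
        }
    }
}
```

A listener that only cares about the title would compare the reported tag against HTML.Tag.TITLE in both callbacks and ignore everything else.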

newDocument

public void newDocument(int docID)
Description copied from interface: ParsingListener
Indicates that parsing of a new document has begun. Invoking this method implies that parsing of the preceding document has completed.
Specified by:
newDocument in interface ParsingListener
Following copied from interface: idl.tmt.documentparsing.ParsingListener
Parameters:
docID - the numeric ID of the new document to be parsed

documentComplete

public void documentComplete()
Description copied from interface: ParsingListener
Indicates that the parsing of the current document has completed.
Specified by:
documentComplete in interface ParsingListener

documentCollectionComplete

public void documentCollectionComplete()
Builds the document representation matrix.
Specified by:
documentCollectionComplete in interface ParsingListener
See Also:
ParsingListener.documentCollectionComplete()
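
As a rough illustration of the listener lifecycle (newDocument, word, documentComplete, documentCollectionComplete), the sketch below accumulates per-document term counts and assembles a dense matrix only once the whole collection has been parsed. It is an assumption about how a bag-of-words builder could work, not the idl.tmt code: TitleMatrixSketch is a hypothetical class, the term list is modeled as a plain List<String>, the matrix as int[][], and word is assumed to receive only title words.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bag-of-words builder; NOT the idl.tmt implementation. */
final class TitleMatrixSketch {
    private final Map<String, Integer> termIndex = new HashMap<>(); // term -> column
    private final boolean binarize;                                  // 0/1 instead of counts
    private final Map<Integer, int[]> rowsByDocId = new TreeMap<>(); // sorted by docID
    private int currentDocID = -1;
    private int[] currentRow;
    private int[][] matrix;

    TitleMatrixSketch(boolean binarize, List<String> termList) {
        this.binarize = binarize;
        for (int i = 0; i < termList.size(); i++) {
            termIndex.put(termList.get(i), i);
        }
    }

    /** Starts a fresh, all-zero row for the new document. */
    void newDocument(int docID) {
        currentDocID = docID;
        currentRow = new int[termIndex.size()];
    }

    /** Counts the word if it is in the term list; other words are ignored. */
    void word(String word) {
        Integer col = termIndex.get(word);
        if (currentRow != null && col != null) {
            currentRow[col] = binarize ? 1 : currentRow[col] + 1;
        }
    }

    /** Stores the finished row under the current document ID. */
    void documentComplete() {
        if (currentRow != null) {
            rowsByDocId.put(currentDocID, currentRow);
            currentRow = null;
        }
    }

    /** Assembles the documents-by-terms matrix, rows ordered by document ID. */
    void documentCollectionComplete() {
        matrix = rowsByDocId.values().toArray(new int[0][]);
    }

    int[][] matrix() {
        return matrix;
    }
}
```

Deferring matrix assembly to documentCollectionComplete means the number of rows does not have to be known up front. A constructor that takes the number of documents, like the numDocs variant above, could instead preallocate the matrix.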