idl.tmt.representation
Class BodyTextRepresentationBuilder

java.lang.Object
  |
  +--idl.tmt.representation.BagOfWordsRepresentationBuilder
        |
        +--idl.tmt.representation.BodyTextRepresentationBuilder
All Implemented Interfaces:
HTMLParsingListener, ParsingListener, RepresentationBuilder, WordParsingListener

public class BodyTextRepresentationBuilder
extends BagOfWordsRepresentationBuilder
implements WordParsingListener, HTMLParsingListener

Representation builder that creates a document representation from the body text of a document. Created on Apr 7, 2004.

Author:
jelsas

Field Summary
private  int currentDocID
           
private  boolean inBody
           
 
Fields inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
binarize, debug, myMatrix, numDocs, rep, shareTermlist, termList, textParser, weight
 
Fields inherited from interface idl.tmt.documentparsing.WordParsingListener
ALLOWWORD_LIST, DELIMITER, STOPWORD_LIST
 
Constructor Summary
BodyTextRepresentationBuilder()
           
BodyTextRepresentationBuilder(boolean binarize, TermList termList)
           
BodyTextRepresentationBuilder(int numDocs)
           
BodyTextRepresentationBuilder(int numDocs, TermList termList)
           
 
Method Summary
 void documentCollectionComplete()
          Indicates the completion of the document collection and builds the matrix representation.
 void documentComplete()
          Indicates the completion of a document.
 void endTag(javax.swing.text.html.HTML.Tag tag, int pos)
          Indicates the end of an HTML tag.
 void newDocument(int docID)
          Indicates the start of a new document.
 void startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
          Indicates the start of an HTML tag.
 void word(java.lang.String word, int pos)
          Indicates that a word has been encountered.
 
Methods inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
addTermToDocRepresentation, buildRepresentation, cleanup, getRepresentation, getTermList, getWeight, isBinarize, isDebug, isShareTermlist, setBinarize, setDebug, setNumDocuments, setShareTermlist, setTermList, setTextParser, setWeight, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 

Field Detail

currentDocID

private int currentDocID

inBody

private boolean inBody

Constructor Detail

BodyTextRepresentationBuilder

public BodyTextRepresentationBuilder(int numDocs)

BodyTextRepresentationBuilder

public BodyTextRepresentationBuilder(int numDocs,
                                     TermList termList)

BodyTextRepresentationBuilder

public BodyTextRepresentationBuilder(boolean binarize,
                                     TermList termList)

BodyTextRepresentationBuilder

public BodyTextRepresentationBuilder()

Method Detail

word

public void word(java.lang.String word,
                 int pos)
Indicates that a word has been encountered.
Specified by:
word in interface WordParsingListener
See Also:
WordParsingListener.word(java.lang.String, int)

newDocument

public void newDocument(int docID)
Indicates the start of a new document.
Specified by:
newDocument in interface ParsingListener
See Also:
ParsingListener.newDocument(int)

documentComplete

public void documentComplete()
Indicates the completion of a document.
Specified by:
documentComplete in interface ParsingListener
See Also:
ParsingListener.documentComplete()

documentCollectionComplete

public void documentCollectionComplete()
Indicates the completion of the document collection and builds the matrix representation.
Specified by:
documentCollectionComplete in interface ParsingListener
See Also:
ParsingListener.documentCollectionComplete()
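
This page does not show how the matrix representation is built, so the following is only a minimal sketch of one plausible approach: accumulate per-document term counts during parsing, then fill a dense document-by-term matrix once the collection is complete. The class TermMatrixSketch and all of its members are hypothetical stand-ins, not part of idl.tmt. The int[][] result is an assumption standing in for whatever type the inherited myMatrix field actually has, and the binarize flag is assumed to reduce counts to 0/1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of idl.tmt: collects term counts per document
// and builds a dense document-by-term matrix when the collection is complete.
class TermMatrixSketch {
    private final List<String> terms = new ArrayList<>();                 // column order
    private final Map<String, Integer> termIndex = new LinkedHashMap<>(); // term -> column
    private final List<Map<Integer, Integer>> docs = new ArrayList<>();   // one entry per row
    private final boolean binarize;                                       // assumed meaning: counts become 0/1
    private Map<Integer, Integer> current;                                // counts for the open document

    TermMatrixSketch(boolean binarize) { this.binarize = binarize; }

    void newDocument() { current = new LinkedHashMap<>(); }

    void word(String w) {
        if (current == null) return;          // ignore words outside a document
        Integer col = termIndex.get(w);
        if (col == null) {                    // first occurrence: assign the next column
            col = terms.size();
            terms.add(w);
            termIndex.put(w, col);
        }
        current.merge(col, 1, Integer::sum);
    }

    void documentComplete() {
        docs.add(current);
        current = null;
    }

    // Called once after the last document: rows are documents, columns are terms.
    int[][] documentCollectionComplete() {
        int[][] matrix = new int[docs.size()][terms.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (Map.Entry<Integer, Integer> e : docs.get(d).entrySet()) {
                matrix[d][e.getKey()] = binarize ? 1 : e.getValue();
            }
        }
        return matrix;
    }

    List<String> terms() { return terms; }

    public static void main(String[] args) {
        TermMatrixSketch b = new TermMatrixSketch(false);
        b.newDocument(); b.word("cat"); b.word("sat"); b.word("cat"); b.documentComplete();
        b.newDocument(); b.word("sat"); b.word("mat"); b.documentComplete();
        int[][] m = b.documentCollectionComplete();
        // prints "[cat, sat, mat] [[2, 1, 0], [0, 1, 1]]"
        System.out.println(b.terms() + " " + Arrays.deepToString(m));
    }
}
```

The matrix is filled only at the end because the number of columns is not known until every document has been seen. A builder that receives a fixed term list up front could size its rows earlier.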

startTag

public void startTag(javax.swing.text.html.HTML.Tag tag,
                     javax.swing.text.MutableAttributeSet atts,
                     int pos)
Indicates the start of an HTML tag.
Specified by:
startTag in interface HTMLParsingListener
See Also:
HTMLParsingListener.startTag(javax.swing.text.html.HTML.Tag, javax.swing.text.MutableAttributeSet, int)

endTag

public void endTag(javax.swing.text.html.HTML.Tag tag,
                   int pos)
Indicates the end of an HTML tag.
Specified by:
endTag in interface HTMLParsingListener
See Also:
HTMLParsingListener.endTag(javax.swing.text.html.HTML.Tag, int)
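
The callbacks above, together with the private inBody and currentDocID fields, suggest that the builder tracks whether the parser is currently inside the <body> element and keeps only the words seen there. The following is a minimal sketch of that idea under that assumption, not the class's actual source. BodyWordFilterSketch and its kept() accessor are hypothetical. The method signatures mirror the ones documented here, and HTML.Tag, MutableAttributeSet, and SimpleAttributeSet are the standard javax.swing.text types.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical stand-in, not part of idl.tmt: keeps only words seen inside <body>.
class BodyWordFilterSketch {
    private boolean inBody;            // assumed role: true between <body> and </body>
    private int currentDocID = -1;     // assumed role: ID passed to newDocument
    private final List<String> kept = new ArrayList<>();

    void newDocument(int docID) {
        currentDocID = docID;
        inBody = false;                // every document starts outside the body
    }

    void startTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
        if (tag == HTML.Tag.BODY) inBody = true;
    }

    void endTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.BODY) inBody = false;
    }

    void word(String word, int pos) {
        if (inBody) kept.add(currentDocID + ":" + word);   // drop head and title words
    }

    List<String> kept() { return kept; }

    public static void main(String[] args) {
        BodyWordFilterSketch f = new BodyWordFilterSketch();
        MutableAttributeSet none = new SimpleAttributeSet();
        f.newDocument(0);
        f.startTag(HTML.Tag.HEAD, none, 0);
        f.startTag(HTML.Tag.TITLE, none, 1);
        f.word("Greeting", 2);                 // inside <title>, so it is ignored
        f.endTag(HTML.Tag.TITLE, 3);
        f.endTag(HTML.Tag.HEAD, 4);
        f.startTag(HTML.Tag.BODY, none, 5);
        f.word("hello", 6);
        f.word("world", 7);
        f.endTag(HTML.Tag.BODY, 8);
        // prints "[0:hello, 0:world]"
        System.out.println(f.kept());
    }
}
```

Resetting the flag in newDocument keeps a document with a missing </body> tag from leaking its state into the next one.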