idl.tmt.representation
Class BodyTextRepresentationBuilder
java.lang.Object
  |
  +--idl.tmt.representation.BagOfWordsRepresentationBuilder
        |
        +--idl.tmt.representation.BodyTextRepresentationBuilder
- All Implemented Interfaces:
- HTMLParsingListener, ParsingListener, RepresentationBuilder, WordParsingListener
public class BodyTextRepresentationBuilder
extends BagOfWordsRepresentationBuilder
implements WordParsingListener, HTMLParsingListener
A representation builder that creates a bag-of-words document representation
from the body text of each document, that is, from the words that appear
inside the HTML <body> element.
Created on Apr 7, 2004
- Author:
- jelsas
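A minimal sketch of the body-text filtering, assuming the builder keeps a flag like the private inBody field (see the field details) that is set on a BODY start tag, cleared on the matching end tag, and consulted whenever word() is called. BodyFilterSketch is a hypothetical stand-in written against the JDK's javax.swing.text.html.HTML.Tag type; it is not the idl.tmt implementation and does not add anything to a term list.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical sketch of body-only word collection; not the idl.tmt class.
class BodyFilterSketch {
    private boolean inBody = false;                 // true between <body> and </body>
    private final List<String> bodyWords = new ArrayList<>();

    void startTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
        if (tag == HTML.Tag.BODY) {
            inBody = true;
        }
    }

    void endTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.BODY) {
            inBody = false;
        }
    }

    void word(String word, int pos) {
        if (inBody) {                               // words outside <body> are dropped
            bodyWords.add(word);
        }
    }

    List<String> getBodyWords() {
        return bodyWords;
    }

    public static void main(String[] args) {
        BodyFilterSketch b = new BodyFilterSketch();
        b.word("Title", 0);                         // before <body>: ignored
        b.startTag(HTML.Tag.BODY, new SimpleAttributeSet(), 10);
        b.word("Hello", 16);
        b.word("World", 22);
        b.endTag(HTML.Tag.BODY, 28);
        System.out.println(b.getBodyWords());       // prints [Hello, World]
    }
}
```

Words reported before the BODY start tag, such as title text, never reach the list.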
Method Summary
 void  documentCollectionComplete()
          Indicates the completion of the document collection and builds the matrix representation.
 void  documentComplete()
          Indicates the completion of a document.
 void  endTag(javax.swing.text.html.HTML.Tag tag, int pos)
          Indicates the end of an HTML tag.
 void  newDocument(int docID)
          Indicates the start of a new document.
 void  startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
          Indicates the start of an HTML tag.
 void  word(java.lang.String word, int pos)
          Indicates that a word has been encountered.
Methods inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
addTermToDocRepresentation, buildRepresentation, cleanup, getRepresentation, getTermList, getWeight, isBinarize, isDebug, isShareTermlist, setBinarize, setDebug, setNumDocuments, setShareTermlist, setTermList, setTextParser, setWeight, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
Field Detail
currentDocID
private int currentDocID
inBody
private boolean inBody
Constructor Detail
BodyTextRepresentationBuilder
public BodyTextRepresentationBuilder(int numDocs)
BodyTextRepresentationBuilder
public BodyTextRepresentationBuilder(int numDocs,
TermList termList)
BodyTextRepresentationBuilder
public BodyTextRepresentationBuilder(boolean binarize,
TermList termList)
BodyTextRepresentationBuilder
public BodyTextRepresentationBuilder()
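The constructors are not documented. One of them accepts a binarize flag, which presumably corresponds to the inherited setBinarize and isBinarize methods. Assuming it follows the usual bag-of-words convention (record whether a term occurs in a document rather than how often it occurs), the difference can be sketched as below. termWeights is a hypothetical helper, not part of BagOfWordsRepresentationBuilder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BinarizeSketch {
    // Hypothetical helper: weight each term by its count, or by 1 if binarize is set.
    static Map<String, Integer> termWeights(List<String> words, boolean binarize) {
        Map<String, Integer> weights = new LinkedHashMap<>();  // keeps first-seen term order
        for (String w : words) {
            if (binarize) {
                weights.put(w, 1);                  // presence only
            } else {
                weights.merge(w, 1, Integer::sum);  // raw term frequency
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> words = List.of("data", "mining", "data");
        System.out.println(termWeights(words, false)); // prints {data=2, mining=1}
        System.out.println(termWeights(words, true));  // prints {data=1, mining=1}
    }
}
```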
Method Detail
word
public void word(java.lang.String word,
int pos)
- Indicates that a word has been encountered.
- Specified by:
word in interface WordParsingListener
- See Also:
WordParsingListener.word(java.lang.String, int)
newDocument
public void newDocument(int docID)
- Indicates the start of a new document.
- Specified by:
newDocument in interface ParsingListener
- See Also:
ParsingListener.newDocument(int)
documentComplete
public void documentComplete()
- Indicates the completion of a document.
- Specified by:
documentComplete in interface ParsingListener
- See Also:
ParsingListener.documentComplete()
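Assuming the builder keeps per-document state keyed by the private currentDocID field, the newDocument() and documentComplete() pair can be pictured as opening and closing a per-document term-count buffer. DocumentLifecycleSketch is hypothetical and leaves out the body-text filtering to focus on the document boundaries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-document bookkeeping; not the idl.tmt class.
class DocumentLifecycleSketch {
    private int currentDocID = -1;                  // -1 means "no open document"
    private Map<String, Integer> currentCounts;
    private final Map<Integer, Map<String, Integer>> countsByDoc = new TreeMap<>();

    void newDocument(int docID) {
        currentDocID = docID;
        currentCounts = new HashMap<>();            // fresh buffer for this document
    }

    void word(String word, int pos) {
        if (currentCounts == null) {
            throw new IllegalStateException("word() called outside a document");
        }
        currentCounts.merge(word, 1, Integer::sum);
    }

    void documentComplete() {
        countsByDoc.put(currentDocID, currentCounts); // store the finished document
        currentDocID = -1;
        currentCounts = null;
    }

    Map<Integer, Map<String, Integer>> getCountsByDoc() {
        return countsByDoc;
    }

    public static void main(String[] args) {
        DocumentLifecycleSketch s = new DocumentLifecycleSketch();
        s.newDocument(0);
        s.word("apple", 0);
        s.word("apple", 6);
        s.documentComplete();
        s.newDocument(1);
        s.word("pear", 0);
        s.documentComplete();
        System.out.println(s.getCountsByDoc());     // prints {0={apple=2}, 1={pear=1}}
    }
}
```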
documentCollectionComplete
public void documentCollectionComplete()
- Indicates the completion of the document collection and builds the matrix representation.
- Specified by:
documentCollectionComplete in interface ParsingListener
- See Also:
ParsingListener.documentCollectionComplete()
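documentCollectionComplete() is described as building the matrix representation. One way to picture that step, assuming a dense document-by-term count matrix whose rows are document IDs 0 to numDocs - 1 and whose columns follow a fixed term list, is sketched below. The actual representation type and term ordering are not documented on this page, and buildMatrix is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MatrixSketch {
    // Hypothetical: row d holds document d (assumes 0 <= d < numDocs),
    // column j holds the count of terms.get(j).
    static int[][] buildMatrix(int numDocs, List<String> terms,
                               Map<Integer, Map<String, Integer>> countsByDoc) {
        Map<String, Integer> column = new HashMap<>();
        for (int j = 0; j < terms.size(); j++) {
            column.put(terms.get(j), j);
        }
        int[][] matrix = new int[numDocs][terms.size()]; // documents without words stay all-zero
        for (Map.Entry<Integer, Map<String, Integer>> doc : countsByDoc.entrySet()) {
            for (Map.Entry<String, Integer> term : doc.getValue().entrySet()) {
                Integer j = column.get(term.getKey());
                if (j != null) {                        // words missing from the term list are skipped
                    matrix[doc.getKey()][j] = term.getValue();
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        counts.put(0, Map.of("apple", 2));
        counts.put(1, Map.of("pear", 1, "plum", 3));
        int[][] m = buildMatrix(3, List.of("apple", "pear"), counts);
        System.out.println(Arrays.deepToString(m));     // prints [[2, 0], [0, 1], [0, 0]]
    }
}
```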
startTag
public void startTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet atts,
int pos)
- Indicates the start of an HTML tag.
- Specified by:
startTag in interface HTMLParsingListener
- See Also:
HTMLParsingListener.startTag(javax.swing.text.html.HTML.Tag, javax.swing.text.MutableAttributeSet, int)
endTag
public void endTag(javax.swing.text.html.HTML.Tag tag,
int pos)
- Indicates the end of an HTML tag.
- Specified by:
endTag in interface HTMLParsingListener
- See Also:
HTMLParsingListener.endTag(javax.swing.text.html.HTML.Tag, int)
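The startTag and endTag signatures match handleStartTag and handleEndTag of the JDK's javax.swing.text.html.HTMLEditorKit.ParserCallback. Assuming the toolkit's HTML parser is built on that callback (this page does not say so), an adapter could feed parser events to a body-text builder roughly as follows. BodyTextAdapterSketch, its whitespace tokenizer, and the inline BODY check are hypothetical; comments mark where a real adapter would call startTag, endTag, and word instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical adapter from the JDK HTML parser callback to body-only words.
class BodyTextAdapterSketch extends HTMLEditorKit.ParserCallback {
    private boolean inBody = false;
    private final List<String> words = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
        if (tag == HTML.Tag.BODY) inBody = true;    // a real adapter might call startTag(tag, atts, pos)
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.BODY) inBody = false;   // a real adapter might call endTag(tag, pos)
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (!inBody) return;
        for (String w : new String(data).trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);         // a real adapter might call word(w, pos)
        }
    }

    static List<String> bodyWords(String html) {
        BodyTextAdapterSketch cb = new BodyTextAdapterSketch();
        try {
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb.words;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Ignored title</title></head>"
                + "<body><p>Hello body text</p></body></html>";
        System.out.println(bodyWords(html));
    }
}
```

Keeping the BODY check in the adapter is only for illustration; in the documented class that decision belongs to startTag and endTag, so the adapter would simply forward every event.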