idl.tmt.representation
Class H1TextRepresentationBuilder
java.lang.Object
|
+--idl.tmt.representation.BagOfWordsRepresentationBuilder
|
+--idl.tmt.representation.H1TextRepresentationBuilder
- All Implemented Interfaces:
- HTMLParsingListener, ParsingListener, RepresentationBuilder, WordParsingListener
- public class H1TextRepresentationBuilder
- extends BagOfWordsRepresentationBuilder
- implements HTMLParsingListener, WordParsingListener
Created on Jun 18, 2004
- Author:
- jelsas
Method Summary
void documentCollectionComplete()
Builds the document representation matrix
void documentComplete()
Indicates that the parsing of the current document has completed.
void endTag(javax.swing.text.html.HTML.Tag tag, int pos)
Indicates that an HTML end tag has been reached
void newDocument(int docID)
Indicates that a new document parsing has begun.
void startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
Indicates that a new HTML start tag has been entered.
void word(java.lang.String word, int pos)
Adds the word to the current document's representation if the parser is
currently inside an H1 element of the document.
Methods inherited from class idl.tmt.representation.BagOfWordsRepresentationBuilder
addTermToDocRepresentation, buildRepresentation, cleanup, getRepresentation, getTermList, getWeight, isBinarize, isDebug, isShareTermlist, setBinarize, setDebug, setNumDocuments, setShareTermlist, setTermList, setTextParser, setWeight, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
inH1
private boolean inH1
currentDocID
private int currentDocID
H1TextRepresentationBuilder
public H1TextRepresentationBuilder(int numDocs,
TermList termList)
- Creates a new H1TextRepresentationBuilder with the specified term
list. This class can build a matrix document representation from the
H1 text of HTML documents.
H1TextRepresentationBuilder
public H1TextRepresentationBuilder(TermList termList)
H1TextRepresentationBuilder
public H1TextRepresentationBuilder(boolean binarize,
TermList termList)
H1TextRepresentationBuilder
public H1TextRepresentationBuilder()
word
public void word(java.lang.String word,
int pos)
- Adds the word to the current document's representation if the parser is
currently inside an H1 element of the document.
- Specified by:
word in interface WordParsingListener
- Following copied from interface:
idl.tmt.documentparsing.WordParsingListener
- Parameters:
word - The word encountered
pos - The character position of the word in the document
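Given the private inH1 field listed above, word presumably checks whether the parser is inside an H1 element before recording anything. The following is a minimal sketch of that gate, not the class's actual source. H1WordGateSketch, enterH1, leaveH1 and the counts map are hypothetical names introduced for illustration; this page does not show how the real builder stores terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the builder; it only illustrates the inH1 gate.
class H1WordGateSketch {
    private boolean inH1 = false;  // mirrors the private field listed above
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    void enterH1() { inH1 = true; }   // assumed to happen on an H1 start tag
    void leaveH1() { inH1 = false; }  // assumed to happen on an H1 end tag

    // Same shape as word(String, int): count the word only while inside H1.
    void word(String word, int pos) {
        if (inH1) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() { return counts; }

    public static void main(String[] args) {
        H1WordGateSketch b = new H1WordGateSketch();
        b.word("ignored", 0);   // outside H1, dropped
        b.enterH1();
        b.word("text", 10);
        b.word("mining", 15);
        b.word("text", 22);
        b.leaveH1();
        b.word("body", 40);     // outside H1 again, dropped
        System.out.println(b.counts());  // prints {text=2, mining=1}
    }
}
```

Words seen before the H1 start tag or after its end tag never reach the map, so the representation reflects heading text only.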
startTag
public void startTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet atts,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that a new HTML start tag has been entered.
- Specified by:
startTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
atts - The tag's attributes
pos - The character position of this tag in the document
endTag
public void endTag(javax.swing.text.html.HTML.Tag tag,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that an HTML end tag has been reached
- Specified by:
endTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
pos - The character position of this tag in the document
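The startTag and endTag signatures have the same parameters as the JDK's HTMLEditorKit.ParserCallback.handleStartTag and handleEndTag. The sketch below therefore uses the standard Swing HTML parser to show how such tag events could bracket the H1 text. It is an illustration under that assumption only: H1TagEventsSketch and h1Words are hypothetical names, and this page does not say which parser the idl.tmt toolkit actually uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the words that appear inside H1 elements.
class H1TagEventsSketch {
    static List<String> h1Words(String html) {
        final List<String> words = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inH1 = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
                if (tag == HTML.Tag.H1) inH1 = true;   // entering a heading
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.H1) inH1 = false;  // leaving the heading
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (!inH1) return;                     // ignore text outside H1
                for (String w : new String(data).trim().split("\\s+")) {
                    if (!w.isEmpty()) words.add(w);
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Text Mining Toolkit</h1>"
                    + "<p>Body words here</p></body></html>";
        System.out.println(h1Words(html));
    }
}
```

Only the start and end events for H1 change the flag, so text inside nested inline tags within the heading is still collected.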
newDocument
public void newDocument(int docID)
- Description copied from interface:
ParsingListener
- Indicates that parsing of a new document has begun. Invoking this
method implies that parsing of the current document has completed.
- Specified by:
newDocument in interface ParsingListener
- Following copied from interface:
idl.tmt.documentparsing.ParsingListener
- Parameters:
docID - the numeric ID of the new document to be parsed
documentComplete
public void documentComplete()
- Description copied from interface:
ParsingListener
- Indicates that the parsing of the current document has completed.
- Specified by:
documentComplete in interface ParsingListener
documentCollectionComplete
public void documentCollectionComplete()
- Builds the document representation matrix
- Specified by:
documentCollectionComplete in interface ParsingListener
- See Also:
ParsingListener.documentCollectionComplete()
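As a rough illustration of what building the document representation matrix could involve once the whole collection has been parsed, the sketch below turns per-document word bags into a documents-by-terms count matrix. Everything in it is an assumption made for illustration: MatrixBuildSketch, the List<String> term list and the int[][] result are hypothetical, and this page does not describe how TermList or the real matrix is represented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a documents-by-terms matrix built from word bags.
class MatrixBuildSketch {
    // Rows follow document order; columns follow term-list order.
    static int[][] build(List<Map<String, Integer>> bags, List<String> terms) {
        int[][] matrix = new int[bags.size()][terms.size()];
        for (int d = 0; d < bags.size(); d++) {
            for (int t = 0; t < terms.size(); t++) {
                // Terms absent from a document's bag get a count of 0.
                matrix[d][t] = bags.get(d).getOrDefault(terms.get(t), 0);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("text", "mining", "toolkit");

        Map<String, Integer> doc0 = new HashMap<>();
        doc0.put("text", 2);
        doc0.put("mining", 1);
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("toolkit", 1);

        int[][] m = build(Arrays.asList(doc0, doc1), terms);
        System.out.println(Arrays.deepToString(m));  // prints [[2, 1, 0], [0, 0, 1]]
    }
}
```

Deferring the matrix construction until the collection is complete means the number of rows and the term list are both known before any cell is filled.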