idl.tmt.classification
Class HTMLMetricsClassifier
java.lang.Object
|
+--idl.tmt.classification.HTMLMetricsClassifier
- All Implemented Interfaces:
- CharacterParsingListener, ClassificationBuilder, HTMLParsingListener, ParsingListener
- public class HTMLMetricsClassifier
- extends java.lang.Object
- implements ClassificationBuilder, HTMLParsingListener, CharacterParsingListener
Created on Jun 21, 2004
- Author:
- jelsas
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
classification
private SimpleClassification classification
IS_INDEX
private static final java.lang.String IS_INDEX
IS_TABLE
private static final java.lang.String IS_TABLE
IS_CONTENT
private static final java.lang.String IS_CONTENT
currDocId
private int currDocId
currDocAnchorCount
private int currDocAnchorCount
currDocTagCount
private int currDocTagCount
currDocTDCount
private int currDocTDCount
currDocTDCharCount
private int currDocTDCharCount
inTD
private boolean inTD
indexThreshold
private double indexThreshold
tableThreshold
private double tableThreshold
contentThreshold
private double contentThreshold
HTMLMetricsClassifier
public HTMLMetricsClassifier()
getClassification
public DocumentClassification getClassification()
- Specified by:
getClassification in interface ClassificationBuilder
startTag
public void startTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet atts,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that a new HTML start tag has been entered.
- Specified by:
startTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
atts - The tag's attributes
pos - The character position of this tag in the document
endTag
public void endTag(javax.swing.text.html.HTML.Tag tag,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that an HTML end tag has been reached.
- Specified by:
endTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
pos - The character position of this tag in the document
characters
public void characters(char[] characters,
int pos)
- Description copied from interface:
CharacterParsingListener
- Indicates that a string of characters has been encountered
in the document being parsed.
- Specified by:
characters in interface CharacterParsingListener
- Following copied from interface:
idl.tmt.documentparsing.CharacterParsingListener
- Parameters:
characters - The characters encountered.
pos - The start position of these characters in the document
newDocument
public void newDocument(int docID)
- Description copied from interface:
ParsingListener
- Indicates that parsing of a new document has begun. The
invocation of this method implies that parsing has completed
on the current document.
- Specified by:
newDocument in interface ParsingListener
- Following copied from interface:
idl.tmt.documentparsing.ParsingListener
- Parameters:
docID - the numeric ID of the new document to be parsed
documentComplete
public void documentComplete()
- Description copied from interface:
ParsingListener
- Indicates that the parsing of the current document has completed.
- Specified by:
documentComplete in interface ParsingListener
documentCollectionComplete
public void documentCollectionComplete()
- Description copied from interface:
ParsingListener
- Indicates that the parsing of the entire collection of documents is complete.
- Specified by:
documentCollectionComplete in interface ParsingListener
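The listener callbacks documented above are driven by a parser in document order. The following is a minimal sketch of that call sequence, not the actual implementation: CountingListener is a hypothetical stand-in that mirrors the documented method signatures without implementing the idl.tmt interfaces, and its counting logic is only an assumption inferred from the field names (currDocTagCount, currDocAnchorCount, currDocTDCount, currDocTDCharCount, inTD).

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

public class ListenerLifecycleSketch {

    // Hypothetical stand-in for HTMLMetricsClassifier. The method signatures
    // match the documented ones; the bodies are assumptions, not the real code.
    static class CountingListener {
        int docId = -1;
        int tagCount, anchorCount, tdCount, tdCharCount;
        boolean inTD;

        public void newDocument(int docID) {
            // Reset the per-document counters for the new document.
            docId = docID;
            tagCount = anchorCount = tdCount = tdCharCount = 0;
            inTD = false;
        }

        public void startTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
            tagCount++;
            if (tag == HTML.Tag.A) {
                anchorCount++;
            }
            if (tag == HTML.Tag.TD) {
                tdCount++;
                inTD = true;
            }
        }

        public void endTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TD) {
                inTD = false;
            }
        }

        public void characters(char[] characters, int pos) {
            // Only text inside a table cell contributes to the TD character count.
            if (inTD) {
                tdCharCount += characters.length;
            }
        }

        public void documentComplete() { }

        public void documentCollectionComplete() { }
    }

    // Simulates the order in which a parser would invoke the callbacks
    // for one small document containing an anchor and one table cell.
    static CountingListener run() {
        CountingListener listener = new CountingListener();
        listener.newDocument(1);
        listener.startTag(HTML.Tag.A, new SimpleAttributeSet(), 0);
        listener.endTag(HTML.Tag.A, 10);
        listener.startTag(HTML.Tag.TD, new SimpleAttributeSet(), 11);
        listener.characters("cell".toCharArray(), 15);
        listener.endTag(HTML.Tag.TD, 19);
        listener.characters("outside".toCharArray(), 24);
        listener.documentComplete();
        listener.documentCollectionComplete();
        return listener;
    }

    public static void main(String[] args) {
        CountingListener l = run();
        System.out.println("tags=" + l.tagCount + " anchors=" + l.anchorCount
                + " tds=" + l.tdCount + " tdChars=" + l.tdCharCount);
    }
}
```

In the real class, the per-document results would presumably be recorded in the private classification field when a document completes; that step is omitted here because SimpleClassification is not documented on this page.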
getIndexThreshold
public double getIndexThreshold()
getTableThreshold
public double getTableThreshold()
setIndexThreshold
public void setIndexThreshold(double d)
setTableThreshold
public void setTableThreshold(double d)
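This page does not say how the thresholds are applied. As one plausible reading only, the sketch below compares per-document tag ratios against an index threshold and a table threshold. Everything in it is hypothetical: the Label enum, the label method, the ratio definitions, and the order of the checks are assumptions for illustration and are not taken from HTMLMetricsClassifier.

```java
public class ThresholdSketch {

    // Hypothetical labels; the values of the real IS_INDEX, IS_TABLE and
    // IS_CONTENT constants are not given in the documentation.
    enum Label { INDEX, TABLE, CONTENT }

    // Assumed rule: a document is an index page if anchors make up a large
    // share of its tags, a table page if TD tags do, and content otherwise.
    static Label label(int anchorCount, int tdCount, int tagCount,
                       double indexThreshold, double tableThreshold) {
        if (tagCount == 0) {
            // No tags at all: avoid dividing by zero and treat as content.
            return Label.CONTENT;
        }
        double anchorRatio = (double) anchorCount / tagCount;
        double tdRatio = (double) tdCount / tagCount;
        if (anchorRatio >= indexThreshold) {
            return Label.INDEX;
        }
        if (tdRatio >= tableThreshold) {
            return Label.TABLE;
        }
        return Label.CONTENT;
    }

    public static void main(String[] args) {
        // The threshold values here are arbitrary examples.
        System.out.println(label(6, 0, 10, 0.5, 0.3));
        System.out.println(label(1, 4, 10, 0.5, 0.3));
        System.out.println(label(1, 1, 10, 0.5, 0.3));
    }
}
```

Under this reading, setIndexThreshold and setTableThreshold would tune how aggressively documents are labelled as index or table pages before parsing begins.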