idl.tmt.util
Class TitleSaver
java.lang.Object
|
+--idl.tmt.util.TitleSaver
- All Implemented Interfaces:
- CharacterParsingListener, HTMLParsingListener, ParsingListener
- public class TitleSaver
- extends java.lang.Object
- implements CharacterParsingListener, HTMLParsingListener
A parsing listener that saves only the text of the HTML title tag
for each document.
Each line of the output file has the format:
docID|Title
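The line format above can be illustrated with a small sketch. TitleRecord is a hypothetical helper, not part of idl.tmt.util. This page does not say whether TitleSaver escapes '|' characters inside a title or normalizes whitespace, so the sketch assumes it does neither and splits on the first '|' only.

```java
// Hypothetical helper illustrating the documented "docID|Title" line format.
// It is not part of the idl.tmt.util package.
final class TitleRecord {
    final int docID;
    final String title;

    TitleRecord(int docID, String title) {
        this.docID = docID;
        this.title = title;
    }

    // Builds one output line: the numeric document ID, a '|', then the title.
    String toLine() {
        return docID + "|" + title;
    }

    // Parses one line. Splitting on the FIRST '|' is an assumption that lets
    // titles which themselves contain '|' survive a round trip.
    static TitleRecord parse(String line) {
        int bar = line.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("missing '|' separator: " + line);
        }
        int id = Integer.parseInt(line.substring(0, bar));
        return new TitleRecord(id, line.substring(bar + 1));
    }
}
```

A consumer of the output file could call TitleRecord.parse on each line to recover the document ID and title.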
Created on May 4, 2004
- Author:
- jelsas
Constructor Summary
TitleSaver(java.lang.String fileName)
Method Summary
void characters(char[] characters, int pos)
Indicates that a string of characters has been encountered in the document being parsed.
void documentCollectionComplete()
Indicates that the parsing of the entire collection of documents is complete.
void documentComplete()
Indicates that the parsing of the current document has completed.
void endTag(javax.swing.text.html.HTML.Tag tag, int pos)
Indicates that an HTML end tag has been reached.
void newDocument(int docID)
Indicates that a new document parsing has begun.
void startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
Indicates that a new HTML start tag has been entered.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
inTitle
private boolean inTitle
titleWriter
private java.io.Writer titleWriter
currentDocID
private int currentDocID
TitleSaver
public TitleSaver(java.lang.String fileName)
throws java.io.IOException
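The constructor takes a file name and is declared to throw java.io.IOException, which suggests (but this page does not confirm) that it opens the output file right away. A minimal sketch of that pattern follows. TitleOutput and its open method are hypothetical names, and the choice of a buffered FileWriter is an assumption.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper; not part of idl.tmt.util.
final class TitleOutput {
    private TitleOutput() {}

    // Opens the named file for writing, replacing any existing content.
    // An IOException propagates to the caller if the file cannot be created,
    // mirroring the "throws java.io.IOException" clause on the constructor.
    static Writer open(String fileName) throws IOException {
        return new BufferedWriter(new FileWriter(fileName));
    }
}
```

Callers of a constructor with this signature need to handle or declare IOException, for example when the target directory does not exist.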
characters
public void characters(char[] characters,
int pos)
- Description copied from interface:
CharacterParsingListener
- Indicates that a string of characters has been encountered
in the document being parsed
- Specified by:
characters in interface CharacterParsingListener
- Following copied from interface:
idl.tmt.documentparsing.CharacterParsingListener
- Parameters:
characters - The characters encountered.
pos - The start position of these characters in the document.
newDocument
public void newDocument(int docID)
- Description copied from interface:
ParsingListener
- Indicates that a new document parsing has begun. The
invocation of this method implies that parsing has completed
on the current document.
- Specified by:
newDocument in interface ParsingListener
- Following copied from interface:
idl.tmt.documentparsing.ParsingListener
- Parameters:
docID - the numeric ID of the new document to be parsed
documentComplete
public void documentComplete()
- Description copied from interface:
ParsingListener
- Indicates that the parsing of the current document has completed.
- Specified by:
documentComplete in interface ParsingListener
documentCollectionComplete
public void documentCollectionComplete()
- Description copied from interface:
ParsingListener
- Indicates that the parsing of the entire collection of documents is complete.
- Specified by:
documentCollectionComplete in interface ParsingListener
startTag
public void startTag(javax.swing.text.html.HTML.Tag tag,
javax.swing.text.MutableAttributeSet atts,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that a new HTML start tag has been entered.
- Specified by:
startTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
atts - The tag's attributes
pos - The character position of this tag in the document
endTag
public void endTag(javax.swing.text.html.HTML.Tag tag,
int pos)
- Description copied from interface:
HTMLParsingListener
- Indicates that an HTML end tag has been reached
- Specified by:
endTag in interface HTMLParsingListener
- Following copied from interface:
idl.tmt.documentparsing.HTMLParsingListener
- Parameters:
tag - The tag
pos - The character position of this tag in the document
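The callbacks documented on this page fit a simple state machine: set a flag when a title start tag arrives, collect characters while the flag is set, and clear it at the matching end tag. The sketch below shows one way that could work. TitleSaverSketch is a hypothetical stand-in, not the source of TitleSaver. It reuses the documented field names (inTitle, titleWriter, currentDocID), but the title buffer, the Writer-based constructor, and the choice to write the line in documentComplete are all assumptions. It also does not implement the idl.tmt.documentparsing interfaces, so that it compiles without that library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical stand-in for TitleSaver; not the library's actual source.
class TitleSaverSketch {
    private boolean inTitle;            // true between <title> and </title>
    private final Writer titleWriter;   // destination for "docID|Title" lines
    private int currentDocID;           // ID passed to the last newDocument call
    private final StringBuilder title = new StringBuilder(); // assumed buffer

    // Assumption: takes a Writer rather than a file name, to ease testing.
    TitleSaverSketch(Writer titleWriter) {
        this.titleWriter = titleWriter;
    }

    public void newDocument(int docID) {
        currentDocID = docID;
        inTitle = false;
        title.setLength(0);             // discard any text from the previous document
    }

    public void startTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        }
    }

    public void characters(char[] characters, int pos) {
        if (inTitle) {                  // ignore text outside the title element
            title.append(characters);
        }
    }

    public void endTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        }
    }

    // Assumption: called once per document, after its last parsing event.
    public void documentComplete() {
        try {
            titleWriter.write(currentDocID + "|" + title + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void documentCollectionComplete() {
        try {
            titleWriter.close();        // flush and release the output
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Driving the sketch with newDocument, a title start tag, some characters, a title end tag, and documentComplete appends one docID|Title line to the writer. Characters received outside the title element are dropped.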