idl.tmt.util
Class TitleSaver

java.lang.Object
  |
  +--idl.tmt.util.TitleSaver
All Implemented Interfaces:
CharacterParsingListener, HTMLParsingListener, ParsingListener

public class TitleSaver
extends java.lang.Object
implements CharacterParsingListener, HTMLParsingListener

Class that acts as a parsing listener but saves only the text inside the HTML title tag of each document. The file format for the output is: docID|Title
Created on May 4, 2004

Author:
jelsas
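
The docID|Title output format described above can be illustrated with a minimal sketch. The class name TitleLineFormat and the method formatLine are hypothetical and not part of idl.tmt.util; the whitespace collapsing is also an assumption, added only so that each record stays on a single line.

```java
// Hypothetical helper, not part of the documented API.
final class TitleLineFormat {

    // Builds one output record in the documented "docID|Title" layout.
    static String formatLine(int docID, String title) {
        // Assumption: trim and collapse whitespace runs so a title that spans
        // several lines in the HTML source still yields one output line.
        String clean = title.trim().replaceAll("\\s+", " ");
        return docID + "|" + clean;
    }

    public static void main(String[] args) {
        // Prints: 17|Example Page
        System.out.println(formatLine(17, "  Example   Page\n"));
    }
}
```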

Field Summary
private  int currentDocID
           
private  boolean inTitle
           
private  java.io.Writer titleWriter
           
 
Constructor Summary
TitleSaver(java.lang.String fileName)
           
 
Method Summary
 void characters(char[] characters, int pos)
          Indicates that a string of characters has been encountered in the document being parsed.
 void documentCollectionComplete()
          Indicates that the parsing of the entire collection of documents is complete.
 void documentComplete()
          Indicates that the parsing of the current document has completed.
 void endTag(javax.swing.text.html.HTML.Tag tag, int pos)
          Indicates that an HTML end tag has been reached.
 void newDocument(int docID)
          Indicates that parsing of a new document has begun.
 void startTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet atts, int pos)
          Indicates that a new HTML start tag has been entered.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

inTitle

private boolean inTitle

titleWriter

private java.io.Writer titleWriter

currentDocID

private int currentDocID
Constructor Detail

TitleSaver

public TitleSaver(java.lang.String fileName)
           throws java.io.IOException
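
The constructor signature suggests that the output file is opened when the listener is created, which would explain the declared java.io.IOException. The following is only a sketch of that idea: TitleWriterOpenSketch, open, and demo are hypothetical names, and the use of BufferedWriter over FileWriter is an assumption rather than the documented implementation.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the actual TitleSaver constructor.
final class TitleWriterOpenSketch {

    // Opens a writer on the named file; an unwritable path surfaces as
    // IOException, mirroring the throws clause of the documented constructor.
    static Writer open(String fileName) throws IOException {
        return new BufferedWriter(new FileWriter(fileName));
    }

    // Writes one record to a temporary file and returns the file's content.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("titles", ".txt");
            try (Writer w = open(tmp.toString())) {
                w.write("1|Example Title\n");
            }
            String content = new String(Files.readAllBytes(tmp));
            Files.delete(tmp);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```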
Method Detail

characters

public void characters(char[] characters,
                       int pos)
Description copied from interface: CharacterParsingListener
Indicates that a string of characters has been encountered in the document being parsed.
Specified by:
characters in interface CharacterParsingListener
Following copied from interface: idl.tmt.documentparsing.CharacterParsingListener
Parameters:
characters - The characters encountered.
pos - The start position of these characters in the document

newDocument

public void newDocument(int docID)
Description copied from interface: ParsingListener
Indicates that parsing of a new document has begun. Invoking this method implies that parsing of the previous document has completed.
Specified by:
newDocument in interface ParsingListener
Following copied from interface: idl.tmt.documentparsing.ParsingListener
Parameters:
docID - the numeric ID of the new document to be parsed

documentComplete

public void documentComplete()
Description copied from interface: ParsingListener
Indicates that the parsing of the current document has completed.
Specified by:
documentComplete in interface ParsingListener

documentCollectionComplete

public void documentCollectionComplete()
Description copied from interface: ParsingListener
Indicates that the parsing of the entire collection of documents is complete.
Specified by:
documentCollectionComplete in interface ParsingListener

startTag

public void startTag(javax.swing.text.html.HTML.Tag tag,
                     javax.swing.text.MutableAttributeSet atts,
                     int pos)
Description copied from interface: HTMLParsingListener
Indicates that a new HTML start tag has been entered.
Specified by:
startTag in interface HTMLParsingListener
Following copied from interface: idl.tmt.documentparsing.HTMLParsingListener
Parameters:
tag - The tag
atts - The tag's attributes
pos - The character position of this tag in the document

endTag

public void endTag(javax.swing.text.html.HTML.Tag tag,
                   int pos)
Description copied from interface: HTMLParsingListener
Indicates that an HTML end tag has been reached.
Specified by:
endTag in interface HTMLParsingListener
Following copied from interface: idl.tmt.documentparsing.HTMLParsingListener
Parameters:
tag - The tag
pos - The character position of this tag in the document
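
The callbacks documented above, together with the inTitle and currentDocID fields, suggest a small state machine: set a flag on the TITLE start tag, collect characters while the flag is set, and clear it on the TITLE end tag. The sketch below illustrates that pattern under stated assumptions. TitleSaverSketch is a hypothetical stand-in that does not implement the idl.tmt interfaces, it takes a Writer instead of a file name so it can run in memory, the StringBuilder buffer is not among the documented fields, and writing the record in documentComplete is a guess about when output happens.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical stand-in for TitleSaver; method names follow the summary above.
final class TitleSaverSketch {
    private final Writer titleWriter;
    private final StringBuilder title = new StringBuilder(); // assumption: extra buffer
    private boolean inTitle;
    private int currentDocID;

    TitleSaverSketch(Writer out) { this.titleWriter = out; }

    public void newDocument(int docID) {
        currentDocID = docID;
        inTitle = false;
        title.setLength(0); // discard any title text from the previous document
    }

    public void startTag(HTML.Tag tag, MutableAttributeSet atts, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = true;
    }

    public void characters(char[] characters, int pos) {
        if (inTitle) title.append(characters); // text outside the title is ignored
    }

    public void endTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = false;
    }

    public void documentComplete() {
        try {
            titleWriter.write(currentDocID + "|" + title + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void documentCollectionComplete() {
        try {
            titleWriter.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drives the listener by hand for two small documents.
    static String demo() {
        StringWriter out = new StringWriter();
        TitleSaverSketch saver = new TitleSaverSketch(out);
        MutableAttributeSet none = new SimpleAttributeSet();

        saver.newDocument(1);
        saver.startTag(HTML.Tag.TITLE, none, 0);
        saver.characters("Hello ".toCharArray(), 7);
        saver.characters("World".toCharArray(), 13);
        saver.endTag(HTML.Tag.TITLE, 18);
        saver.startTag(HTML.Tag.BODY, none, 26);
        saver.characters("body text, not saved".toCharArray(), 32);
        saver.endTag(HTML.Tag.BODY, 52);
        saver.documentComplete();

        saver.newDocument(2);
        saver.startTag(HTML.Tag.TITLE, none, 0);
        saver.characters("Second".toCharArray(), 7);
        saver.endTag(HTML.Tag.TITLE, 13);
        saver.documentComplete();

        saver.documentCollectionComplete();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Running main prints the two records 1|Hello World and 2|Second, one per line; the body text of the first document is skipped because inTitle is false when it arrives.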