idl.tmt.documentparsing
Class HTMLDocumentParser

java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--idl.tmt.documentparsing.HTMLDocumentParser
All Implemented Interfaces:
DocumentParser

public class HTMLDocumentParser
extends javax.swing.text.html.HTMLEditorKit.ParserCallback
implements DocumentParser

A DocumentParser that parses HTML documents. This class uses the Java HTML parsing packages in javax.swing.text.html. Created on Feb 25, 2004.
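
As a rough illustration of the callback mechanism this class builds on, the sketch below drives the JDK's ParserDelegator with a small HTMLEditorKit.ParserCallback subclass and records which of the four handle methods fire. The names CallbackSketch and RecordingCallback are made up for this example and are not part of idl.tmt.documentparsing. The third argument to parse() is the JDK's ignore-charset flag, which is presumably what the IGNORE_CHAR_SET field is passed as.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CallbackSketch {

    /** Hypothetical callback: records each event as a short string, in arrival order. */
    public static class RecordingCallback extends HTMLEditorKit.ParserCallback {
        public final List<String> events = new ArrayList<>();

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            events.add("start:" + t);
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            events.add("end:" + t);
        }

        @Override
        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            events.add("simple:" + t);
        }

        @Override
        public void handleText(char[] data, int pos) {
            events.add("text:" + new String(data));
        }
    }

    /** Parses the given HTML string and returns the recorded events. */
    public static List<String> parse(String html) throws IOException {
        RecordingCallback callback = new RecordingCallback();
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.events;
    }

    public static void main(String[] args) throws IOException {
        for (String event : parse("<html><body><p>Hello<br>world</p></body></html>")) {
            System.out.println(event);
        }
    }
}
```

The JDK parser may also report tags it implies on its own (for example a head element), so the recorded list can contain more events than the markup literally shows.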

Author:
jelsas

Field Summary
private  int currentDocumentID
           
private  boolean delegateToTextParser
           
private  DocumentIDMapper docIDMap
           
private static boolean IGNORE_CHAR_SET
           
private  boolean isParsingHTMLTags
           
private  boolean isParsingLinks
           
private  java.util.LinkedList parsingListeners
           
private  javax.swing.text.html.parser.ParserDelegator pd
           
private  TextDocumentParser textParser
           
 
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
HTMLDocumentParser()
          Creates a new HTMLDocumentParser without specifying a DocumentIDMapper.
HTMLDocumentParser(DocumentIDMapper docIDMap)
          Creates a new HTMLDocumentParser with the specified doc ID mapper.
 
Method Summary
 void addParsingListener(ParsingListener listener)
          Adds the specified parsing listener.
 void handleEndTag(javax.swing.text.html.HTML.Tag t, int pos)
          Handles the end of an HTML tag that has some contents (a start, some text or other HTML tags, and then an end) and notifies the HTMLParsingListeners, and also the HypertextParsingListeners if it is an anchor tag.
 void handleSimpleTag(javax.swing.text.html.HTML.Tag t, javax.swing.text.MutableAttributeSet a, int pos)
          Handles a simple HTML tag (just a start & end with no text-node contents) and notifies the HTMLParsingListeners of the tag & position.
 void handleStartTag(javax.swing.text.html.HTML.Tag t, javax.swing.text.MutableAttributeSet a, int pos)
          Handles the start of an HTML tag that has some contents (a start, some text or other HTML tags, and then an end) and notifies the HTMLParsingListeners, and also the HypertextParsingListeners if it is an anchor tag.
 void handleText(char[] data, int pos)
          This method simply delegates the text parsing to the TextDocumentParser.
 void parseDocument(int docID, java.io.Reader documentReader)
          Parses the document backing the given reader.
 void removeParsingListener(ParsingListener listener)
          Removes the specified parsing listener.
 void setParameter(java.lang.String name, java.lang.Object value)
          Sets a named runtime parameter on this parser.
 
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
flush, handleComment, handleEndOfLineString, handleError
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

isParsingHTMLTags

private boolean isParsingHTMLTags

isParsingLinks

private boolean isParsingLinks

delegateToTextParser

private boolean delegateToTextParser

parsingListeners

private java.util.LinkedList parsingListeners

pd

private javax.swing.text.html.parser.ParserDelegator pd

IGNORE_CHAR_SET

private static final boolean IGNORE_CHAR_SET

docIDMap

private DocumentIDMapper docIDMap

textParser

private TextDocumentParser textParser

currentDocumentID

private int currentDocumentID

Constructor Detail

HTMLDocumentParser

public HTMLDocumentParser(DocumentIDMapper docIDMap)
Creates a new HTMLDocumentParser with the specified doc ID mapper. The mapper may be null.

HTMLDocumentParser

public HTMLDocumentParser()
Creates a new HTMLDocumentParser without specifying a DocumentIDMapper. Without this mapper, the parser cannot parse hyperlink information; that is, it cannot resolve linked documents to their IDs.

Method Detail

setParameter

public void setParameter(java.lang.String name,
                         java.lang.Object value)
                  throws InvalidParameterException
Sets a named runtime parameter on this parser.
Specified by:
setParameter in interface DocumentParser
See Also:
DocumentParser.setParameter(java.lang.String, java.lang.Object)

addParsingListener

public void addParsingListener(ParsingListener listener)
Adds the specified parsing listener.
Specified by:
addParsingListener in interface DocumentParser
See Also:
DocumentParser.addParsingListener(idl.tmt.documentparsing.ParsingListener)

removeParsingListener

public void removeParsingListener(ParsingListener listener)
Removes the specified parsing listener.
Specified by:
removeParsingListener in interface DocumentParser
See Also:
DocumentParser.removeParsingListener(idl.tmt.documentparsing.ParsingListener)
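
The add/remove pair above follows the usual listener-registry pattern. The sketch below shows one way such a registry might be kept in a LinkedList, as the parsingListeners field suggests. This page does not show the methods of ParsingListener, so the TokenListener interface, its onToken method, and the fire helper are hypothetical stand-ins, not the library's API.

```java
import java.util.ArrayList;
import java.util.LinkedList;

public class ListenerRegistrySketch {

    /** Hypothetical stand-in for ParsingListener, whose real methods are not shown here. */
    public interface TokenListener {
        void onToken(String token);
    }

    private final LinkedList<TokenListener> listeners = new LinkedList<>();

    /** Registers a listener; null and duplicate registrations are ignored. */
    public void addListener(TokenListener listener) {
        if (listener != null && !listeners.contains(listener)) {
            listeners.add(listener);
        }
    }

    /** Unregisters a listener; removing an unknown listener is a no-op. */
    public void removeListener(TokenListener listener) {
        listeners.remove(listener);
    }

    /** Notifies a snapshot of the listeners, so a listener may unregister itself mid-callback. */
    public void fire(String token) {
        for (TokenListener listener : new ArrayList<>(listeners)) {
            listener.onToken(token);
        }
    }

    public static void main(String[] args) {
        ListenerRegistrySketch registry = new ListenerRegistrySketch();
        TokenListener printer = token -> System.out.println("token: " + token);
        registry.addListener(printer);
        registry.fire("hello");      // delivered to printer
        registry.removeListener(printer);
        registry.fire("ignored");    // no listeners left, nothing printed
    }
}
```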

parseDocument

public void parseDocument(int docID,
                          java.io.Reader documentReader)
                   throws java.io.IOException
Parses the document backing the given reader.
Specified by:
parseDocument in interface DocumentParser
See Also:
DocumentParser.parseDocument(int, java.io.Reader)

handleText

public void handleText(char[] data,
                       int pos)
This method simply delegates the text parsing to the TextDocumentParser.
Overrides:
handleText in class javax.swing.text.html.HTMLEditorKit.ParserCallback
See Also:
TextDocumentParser.parseText(int, char[], int, int)

handleSimpleTag

public void handleSimpleTag(javax.swing.text.html.HTML.Tag t,
                            javax.swing.text.MutableAttributeSet a,
                            int pos)
Handles a simple HTML tag (just a start & end with no text-node contents) and notifies the HTMLParsingListeners of the tag & position.
Overrides:
handleSimpleTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleStartTag

public void handleStartTag(javax.swing.text.html.HTML.Tag t,
                           javax.swing.text.MutableAttributeSet a,
                           int pos)
Handles the start of an HTML tag that has some contents (a start, some text or other HTML tags, and then an end) and notifies the HTMLParsingListeners, and also the HypertextParsingListeners if it is an anchor tag.
Overrides:
handleStartTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleEndTag

public void handleEndTag(javax.swing.text.html.HTML.Tag t,
                         int pos)
Handles the end of an HTML tag that has some contents (a start, some text or other HTML tags, and then an end) and notifies the HTMLParsingListeners, and also the HypertextParsingListeners if it is an anchor tag.
Overrides:
handleEndTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback
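
To make the anchor-tag handling in handleStartTag and handleEndTag more concrete, here is a minimal sketch of how a ParserCallback can pair an anchor's href attribute with its enclosed text. It uses only JDK classes. LinkCollector and its "href -> text" output format are invented for illustration and say nothing about how HTMLDocumentParser actually notifies its HypertextParsingListeners.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AnchorSketch {

    /** Hypothetical callback that collects "href -> anchor text" pairs. */
    public static class LinkCollector extends HTMLEditorKit.ParserCallback {
        public final List<String> links = new ArrayList<>();
        private String href;               // target of the anchor currently open
        private StringBuilder anchorText;  // non-null only while inside an <a> element

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                Object value = a.getAttribute(HTML.Attribute.HREF);
                href = (value == null) ? null : value.toString();
                anchorText = new StringBuilder();
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (anchorText != null) {
                anchorText.append(data);
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (t == HTML.Tag.A && anchorText != null) {
                links.add(href + " -> " + anchorText);
                anchorText = null;
            }
        }
    }

    /** Parses the HTML string and returns the collected links. */
    public static List<String> collect(String html) throws IOException {
        LinkCollector collector = new LinkCollector();
        new ParserDelegator().parse(new StringReader(html), collector, true);
        return collector.links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>See <a href=\"page2.html\">Next</a></p></body></html>";
        for (String link : collect(html)) {
            System.out.println(link);
        }
    }
}
```

In HTMLDocumentParser the href would presumably be resolved to a document ID through the DocumentIDMapper rather than kept as a string.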