idl.tmt.documentparsing.filters
Class Stemmer
java.lang.Object
|
+--idl.tmt.documentparsing.filters.Stemmer
- class Stemmer
- extends java.lang.Object
Stemmer, implementing the Porter Stemming Algorithm
The Stemmer class transforms a word into its root form. The input
word can be provided one character at a time (by calling add(char ch)) or
several characters at once (by calling add(char[] w, int wLen)); calling
stem() then reduces the accumulated word.
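Porter's algorithm reduces a word by applying ordered suffix-rewriting rules. As a rough illustration of that style, and not the code of this class, the sketch below implements only the first rule group of Porter's published algorithm (step 1a, plurals): sses -> ss, ies -> i, ss -> ss, and a final s removed. The class PluralStep and its method stripPlural are hypothetical names; here the rules act on a lower-case String rather than on the internal buffer b.

```java
/**
 * Sketch of Porter's step 1a (plural removal) on a lower-case String.
 * Hypothetical helper for illustration; not the Stemmer's own code.
 */
public class PluralStep {

    // Rules are checked in order; the first matching suffix wins.
    static String stripPlural(String w) {
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // sses -> ss
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ies -> i
        }
        if (w.endsWith("ss")) {
            return w;                              // ss -> ss (unchanged)
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // s -> (removed)
        }
        return w;                                  // no plural suffix
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + stripPlural(w));
        }
    }
}
```

In the full algorithm this step is followed by further rule groups, which is why the class lists several private step methods.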
Field Summary
private char[]       b
private int          i
private int          i_end
private static int   INC
private int          j
private int          k
Method Summary
void                 add(char ch)
                     Add a character to the word being stemmed.
void                 add(char[] w, int wLen)
                     Adds the first wLen characters of the char[] array w to the
                     word being stemmed.
private boolean      cons(int i)
private boolean      cvc(int i)
private boolean      doublec(int j)
private boolean      ends(java.lang.String s)
char[]               getResultBuffer()
                     Returns a reference to a character buffer containing the
                     results of the stemming process.
int                  getResultLength()
                     Returns the length of the word resulting from the stemming
                     process.
private int          m()
static void          main(java.lang.String[] args)
                     Test program for demonstrating the Stemmer.
private void         r(java.lang.String s)
private void         setto(java.lang.String s)
void                 stem()
                     Stem the word placed into the Stemmer buffer through calls
                     to add().
private void         step1()
private void         step2()
private void         step3()
private void         step4()
private void         step5()
private void         step6()
java.lang.String     toString()
                     After a word has been stemmed, it can be retrieved by
                     toString(), or a reference to the internal buffer can be
                     retrieved by getResultBuffer() and getResultLength() (which
                     is generally more efficient).
private boolean      vowelinstem()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
b
private char[] b
i
private int i
i_end
private int i_end
j
private int j
k
private int k
INC
private static final int INC
Stemmer
public Stemmer()
add
public void add(char ch)
- Add a character to the word being stemmed. When you have finished
adding characters, call stem() to stem the word.
add
public void add(char[] w,
int wLen)
- Adds the first wLen characters of the char[] array w to the word being
stemmed. This is equivalent to calling add(char ch) once for each of
those characters, but faster.
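The two add() overloads can be pictured with the hypothetical WordBuffer below, which appends characters to a growable array the way the documented methods describe. The growth increment of 50 is an assumption standing in for the private constant INC, whose value this page does not give, and the class is a sketch rather than the Stemmer's own code.

```java
import java.util.Arrays;

/**
 * Hypothetical growable character buffer with the two documented add()
 * overloads. GROWTH is an assumed stand-in for INC.
 */
public class WordBuffer {
    private static final int GROWTH = 50; // assumed increment
    private char[] buf = new char[GROWTH];
    private int len = 0;

    /** Appends one character, growing the array when it is full. */
    public void add(char ch) {
        if (len == buf.length) {
            buf = Arrays.copyOf(buf, len + GROWTH);
        }
        buf[len++] = ch;
    }

    /** Appends w[0..wLen), growing at most once instead of once per character. */
    public void add(char[] w, int wLen) {
        if (len + wLen > buf.length) {
            buf = Arrays.copyOf(buf, len + wLen + GROWTH);
        }
        System.arraycopy(w, 0, buf, len, wLen);
        len += wLen;
    }

    @Override
    public String toString() {
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        WordBuffer one = new WordBuffer();
        for (char c : "generalizations".toCharArray()) {
            one.add(c); // character-at-a-time input
        }
        WordBuffer bulk = new WordBuffer();
        char[] w = "generalizations and more".toCharArray();
        bulk.add(w, 15); // only the first 15 characters are copied
        System.out.println(one + " / " + bulk); // generalizations / generalizations
    }
}
```

The bulk overload checks capacity once and copies with System.arraycopy, which is one plausible reason it is faster than repeated add(char ch) calls.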
toString
public java.lang.String toString()
- After a word has been stemmed, it can be retrieved by toString(),
or a reference to the internal buffer can be retrieved by getResultBuffer()
and getResultLength() (which is generally more efficient).
- Overrides:
toString in class java.lang.Object
getResultLength
public int getResultLength()
- Returns the length of the word resulting from the stemming process.
getResultBuffer
public char[] getResultBuffer()
- Returns a reference to a character buffer containing the results of
the stemming process. You also need to consult getResultLength()
to determine the length of the result.
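Because the buffer returned by getResultBuffer() can be longer than the stemmed word, only its first getResultLength() characters are meaningful. The sketch below substitutes a hand-built array and length (hypothetical values, with "hopping" assumed to stem to "hop") for a real Stemmer call, and shows both building a String from the valid prefix and appending that prefix directly to a StringBuilder, which avoids allocating an intermediate String. The class and method names are hypothetical.

```java
/** Reads only the valid prefix of a result buffer. */
public class ResultPrefix {

    // Builds a String from the first len characters of buf, as one would
    // with getResultBuffer() and getResultLength().
    static String result(char[] buf, int len) {
        return new String(buf, 0, len);
    }

    // Appends the valid prefix without creating an intermediate String.
    static void appendResult(StringBuilder sb, char[] buf, int len) {
        sb.append(buf, 0, len);
    }

    public static void main(String[] args) {
        // Hypothetical state: the buffer still holds the input word, and the
        // result length covers only the stem.
        char[] buf = "hopping".toCharArray();
        int len = 3;
        System.out.println(result(buf, len)); // hop
        System.out.println(new String(buf));  // hopping (includes leftover characters)
    }
}
```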
cons
private final boolean cons(int i)
m
private final int m()
vowelinstem
private final boolean vowelinstem()
doublec
private final boolean doublec(int j)
cvc
private final boolean cvc(int i)
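The private helpers cons(), m() and cvc() are undocumented on this page. In Porter's published algorithm a letter is a consonant unless it is a, e, i, o, u, or a y preceded by a consonant; the measure m of a word written as [C](VC)^m[V] counts its vowel-consonant sequences; and the cvc condition tests for a consonant-vowel-consonant ending whose last letter is not w, x or y. The standalone sketch below follows those definitions but works on a whole lower-case String, whereas the class's methods presumably operate on the buffer b and its indices; the class name PorterMeasure is hypothetical.

```java
/**
 * Sketch of Porter's consonant test, measure m and cvc condition,
 * written against a lower-case String instead of a private buffer.
 */
public class PorterMeasure {

    // A letter is a consonant unless it is a, e, i, o, u, or a 'y'
    // that follows a consonant. A leading 'y' counts as a consonant.
    static boolean cons(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !cons(w, i - 1);
            default:
                return true;
        }
    }

    // m counts the VC sequences in the form [C](VC)^m[V], i.e. the number
    // of places where a vowel is immediately followed by a consonant.
    static int m(String w) {
        int count = 0;
        for (int i = 1; i < w.length(); i++) {
            if (!cons(w, i - 1) && cons(w, i)) {
                count++;
            }
        }
        return count;
    }

    // True if w[i-2..i] is consonant-vowel-consonant and the final
    // consonant is not w, x or y.
    static boolean cvc(String w, int i) {
        if (i < 2 || !cons(w, i) || cons(w, i - 1) || !cons(w, i - 2)) {
            return false;
        }
        char ch = w.charAt(i);
        return ch != 'w' && ch != 'x' && ch != 'y';
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tree", "trouble", "oaten", "private"}) {
            System.out.println(w + ": m=" + m(w));
        }
        System.out.println(cvc("hop", 2) + " " + cvc("snow", 3)); // true false
    }
}
```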
ends
private final boolean ends(java.lang.String s)
setto
private final void setto(java.lang.String s)
r
private final void r(java.lang.String s)
step1
private final void step1()
step2
private final void step2()
step3
private final void step3()
step4
private final void step4()
step5
private final void step5()
step6
private final void step6()
stem
public void stem()
- Stem the word placed into the Stemmer buffer through calls to add().
The method is declared void, so it does not report whether the stemmed
word differs from the input. You can retrieve the result with
getResultLength()/getResultBuffer() or toString().
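Since stem() is declared void, a caller that wants to know whether stemming changed the word has to compare the input with the result itself. The sketch below assumes the caller kept its own copy of the input characters (in case the internal buffer is rewritten in place, an assumption about internals) and uses hand-built arrays in place of getResultBuffer() and getResultLength(); the class ChangeCheck and its method changed are hypothetical. It relies on the range overload of java.util.Arrays.equals, available since Java 9.

```java
import java.util.Arrays;

/** Compares a word before and after stemming, since stem() returns void. */
public class ChangeCheck {

    // True if the first resLen chars of result differ from the first inLen
    // chars of input (a length difference alone counts as a change).
    static boolean changed(char[] input, int inLen, char[] result, int resLen) {
        return !Arrays.equals(input, 0, inLen, result, 0, resLen);
    }

    public static void main(String[] args) {
        char[] in = "running".toCharArray();
        // Hypothetical values standing in for getResultBuffer()/getResultLength().
        char[] out = "running".toCharArray();
        System.out.println(changed(in, in.length, out, 3)); // true ("run")
        System.out.println(changed(in, in.length, out, 7)); // false
    }
}
```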
main
public static void main(java.lang.String[] args)
- Test program for demonstrating the Stemmer. It reads text from a
list of files, stems each word, and writes the result to standard
output. Note that the word to be stemmed is expected to be in lower case:
forcing lower case must be done outside the Stemmer class.
Usage: Stemmer file-name file-name ...
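The behaviour described for main() can be sketched as a driver loop: runs of letters form words, each word is lower-cased before it reaches the stemmer, and every other character is copied through unchanged. In the sketch, the stemming step is passed in as a java.util.function.UnaryOperator, a hypothetical hook rather than this class's API; wiring in the real class would mean calling add(), stem() and toString() inside that function. The stand-in used in main only drops a trailing s and is not the Porter algorithm.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

/**
 * Sketch of a driver loop in the spirit of main(): letters are grouped into
 * words, lower-cased before stemming, and other characters pass through.
 */
public class StemDriver {

    static String process(Reader in, UnaryOperator<String> stem) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        try {
            int ch;
            while ((ch = in.read()) != -1) {
                char c = (char) ch;
                if (Character.isLetter(c)) {
                    word.append(Character.toLowerCase(c)); // lower-case outside the stemmer
                } else {
                    flush(word, out, stem);
                    out.append(c); // copy separators through unchanged
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flush(word, out, stem); // a word may end at end of input
        return out.toString();
    }

    // Stems the pending word, if any, appends it to out, and clears it.
    private static void flush(StringBuilder word, StringBuilder out, UnaryOperator<String> stem) {
        if (word.length() > 0) {
            out.append(stem.apply(word.toString()));
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Stand-in stemmer: drops a trailing "s" (hypothetical, not Porter).
        UnaryOperator<String> dropS = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(process(new StringReader("Cats chase Dogs."), dropS)); // cat chase dog.
    }
}
```

To mirror the documented usage, each file-name argument would be opened with a java.io.FileReader (ideally wrapped in a BufferedReader) and passed to process in turn.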