Set the minimum allowable standard deviation for normal density calculation.
- Version:
- $Revision: 1.4 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
m_modelNormal
private double[][][] m_modelNormal
- hold the normal estimators for each cluster
m_minStdDev
private double m_minStdDev
- default minimum standard deviation
m_weights
private double[][] m_weights
- hold the weights of each instance for each cluster
m_priors
private double[] m_priors
- the prior probabilities for clusters
m_loglikely
private double m_loglikely
- the loglikelihood of the data
m_theInstances
private Instances m_theInstances
- training instances
m_num_clusters
private int m_num_clusters
- number of clusters selected by the user or cross validation
m_initialNumClusters
private int m_initialNumClusters
- the initial number of clusters requested by the user; -1 if
cross-validation is to be used to find the number of clusters
m_num_attribs
private int m_num_attribs
- number of attributes
m_num_instances
private int m_num_instances
- number of training instances
m_max_iterations
private int m_max_iterations
- maximum iterations to perform
m_max_clusterers_to_build
private int m_max_clusterers_to_build
- maximum number of clusterers to build when seeking the best one. Defaults
to 1, in which case only a single clusterer is built
m_minValues
private double[] m_minValues
- attribute min values
m_maxValues
private double[] m_maxValues
- attribute max values
m_rr
private java.util.Random m_rr
- random numbers and seed
m_rseed
private int m_rseed
m_verbose
private boolean m_verbose
- Verbose?
m_initializer
private EMInitializer m_initializer
- Initializer object responsible for initializing the EM model
m_normConst
private static double m_normConst
- Constant for normal distribution.
EnhancedEM
public EnhancedEM()
- Constructor.
globalInfo
public java.lang.String globalInfo()
- Returns a string describing this clusterer
- Returns:
- a description of the evaluator suitable for
displaying in the explorer/experimenter gui
listOptions
public java.util.Enumeration listOptions()
- Returns an enumeration describing the available options.
Valid options are:
-V
Verbose.
-N
Specify the number of clusters to generate. If omitted,
EM will use cross validation to select the number of clusters
automatically.
-I
Terminate after this many iterations if EM has not converged.
-S
Specify random number seed.
-M
Set the minimum allowable standard deviation for normal density
calculation.
- Specified by:
listOptions in interface OptionHandler
- Returns:
- an enumeration of all the available options.
setOptions
public void setOptions(java.lang.String[] options)
throws java.lang.Exception
- Parses a given list of options.
- Specified by:
setOptions in interface OptionHandler
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
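The flags listed under listOptions could be parsed along the following lines. This is a minimal, self-contained sketch and not the class's actual implementation: the class OptionParsingSketch and its parse method are hypothetical, and the parsed values are returned in a map purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of EnhancedEM.
final class OptionParsingSketch {

    /** Parses the documented flags -V, -N, -I, -S and -M into a map keyed by flag letter. */
    static Map<String, String> parse(String[] options) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-V":
                    // -V is a switch and takes no value.
                    parsed.put("V", "true");
                    break;
                case "-N":
                case "-I":
                case "-S":
                case "-M":
                    // These flags expect a value in the next array slot.
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("Missing value for " + flag);
                    }
                    parsed.put(flag.substring(1), options[++i]);
                    break;
                default:
                    // Mirrors the documented behaviour: unsupported options are rejected.
                    throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return parsed;
    }
}
```

For example, parsing the array {"-N", "3", "-V"} yields N mapped to "3" and V mapped to "true"; omitting -N leaves the key absent, which a caller could treat as "select the number of clusters by cross validation".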
minStdDevTipText
public java.lang.String minStdDevTipText()
- Returns the tip text for this property
- Returns:
- tip text for this property suitable for
displaying in the explorer/experimenter gui
setMinStdDev
public void setMinStdDev(double m)
- Set the minimum value for standard deviation when calculating
normal density. Increasing this value can help prevent arithmetic
overflow resulting from multiplying large densities (arising from small
standard deviations) when there are many singleton or near singleton
values.
- Parameters:
m - minimum value for standard deviation
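The overflow mentioned above can be reproduced in isolation. The peak of a normal density is 1/(sqrt(2*pi)*stdDev), so a tiny standard deviation gives a very large density, and a product of many such densities can exceed the range of a double. The sketch below assumes the floor is applied by clamping the standard deviation from below; the class and method names are hypothetical.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class MinStdDevSketch {

    /** Applies the floor: the standard deviation is never allowed below minStdDev. */
    static double clampStdDev(double stdDev, double minStdDev) {
        return Math.max(stdDev, minStdDev);
    }

    /** Value of the normal density at its mean, which is its maximum. */
    static double peakDensity(double stdDev) {
        return 1.0 / (Math.sqrt(2.0 * Math.PI) * stdDev);
    }

    /** Multiplies n peak densities after clamping the standard deviation. */
    static double productOfPeaks(double stdDev, double minStdDev, int n) {
        double density = peakDensity(clampStdDev(stdDev, minStdDev));
        double product = 1.0;
        for (int i = 0; i < n; i++) {
            product *= density;
        }
        return product;
    }
}
```

With a standard deviation of 1e-6 and an equally small floor, the product of 100 peak densities overflows to infinity; with a floor of 1.0 the same product stays finite.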
getMinStdDev
public double getMinStdDev()
- Get the minimum allowable standard deviation.
- Returns:
- the minimum allowable standard deviation
seedTipText
public java.lang.String seedTipText()
- Returns the tip text for this property
- Returns:
- tip text for this property suitable for
displaying in the explorer/experimenter gui
setSeed
public void setSeed(int s)
- Set the random number seed
- Parameters:
s - the seed
getSeed
public int getSeed()
- Get the random number seed
- Returns:
- the seed
numClustersTipText
public java.lang.String numClustersTipText()
- Returns the tip text for this property
- Returns:
- tip text for this property suitable for
displaying in the explorer/experimenter gui
setNumClusters
public void setNumClusters(int n)
throws java.lang.Exception
- Set the number of clusters (-1 to select by cross validation).
- Specified by:
setNumClusters in interface NumberOfClustersRequestable
- Parameters:
n - the number of clusters
- Throws:
java.lang.Exception - if n is 0
getNumClusters
public int getNumClusters()
- Get the number of clusters
- Returns:
- the number of clusters.
maxIterationsTipText
public java.lang.String maxIterationsTipText()
- Returns the tip text for this property
- Returns:
- tip text for this property suitable for
displaying in the explorer/experimenter gui
setMaxIterations
public void setMaxIterations(int i)
throws java.lang.Exception
- Set the maximum number of iterations to perform
- Parameters:
i - the number of iterations
- Throws:
java.lang.Exception - if i is less than 1
getMaxIterations
public int getMaxIterations()
- Get the maximum number of iterations
- Returns:
- the number of iterations
setMaxClusterersToBuild
public void setMaxClusterersToBuild(int i)
throws java.lang.Exception
- Set the maximum number of clusterers to build when seeking
the best one
- Parameters:
i - the number of clusterers to build
- Throws:
java.lang.Exception - if i is less than 1
getMaxClusterersToBuild
public int getMaxClusterersToBuild()
- Get the maximum number of clusterers to build when seeking
the best one.
- Returns:
- the number of clusterers to build
setDebug
public void setDebug(boolean v)
- Set debug mode - verbose output
- Parameters:
v - true for verbose output
getDebug
public boolean getDebug()
- Get debug mode
- Returns:
- true if debug mode is set
getOptions
public java.lang.String[] getOptions()
- Gets the current settings of EM.
- Specified by:
getOptions in interface OptionHandler
- Returns:
- an array of strings suitable for passing to setOptions()
EM_Init
private void EM_Init(Instances inst)
throws java.lang.Exception
- Initialise estimators and storage.
- Parameters:
inst - the instances
estimate_priors
private void estimate_priors(Instances inst)
throws java.lang.Exception
- calculate prior probabilities for the clusters
- Parameters:
inst - the instances
- Throws:
java.lang.Exception - if priors can't be calculated
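One common way to obtain cluster priors in EM is to sum each cluster's membership weights over all instances and normalise. The sketch below assumes that approach and a weights array indexed as [instance][cluster]; both the layout and the names are assumptions, not taken from the class source.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class PriorEstimationSketch {

    /** Returns one prior per cluster: the cluster's share of the total membership weight. */
    static double[] estimatePriors(double[][] weights, int numClusters) {
        double[] priors = new double[numClusters];
        double total = 0.0;
        for (double[] instanceWeights : weights) {
            for (int c = 0; c < numClusters; c++) {
                priors[c] += instanceWeights[c];
                total += instanceWeights[c];
            }
        }
        if (total <= 0.0) {
            // Corresponds to the documented failure case: priors can't be calculated.
            throw new IllegalArgumentException("Weights sum to zero; priors can't be calculated");
        }
        for (int c = 0; c < numClusters; c++) {
            priors[c] /= total;
        }
        return priors;
    }
}
```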
logNormalDens
private double logNormalDens(double x,
double mean,
double stdDev)
- Log of the density function of the normal distribution.
- Parameters:
x - input value
mean - mean of distribution
stdDev - standard deviation of distribution
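For reference, the log of a normal density is -(x - mean)^2 / (2 * stdDev^2) - log(sqrt(2*pi)) - log(stdDev). A minimal sketch of that formula follows; the class name and the constant LOG_SQRT_TWO_PI are hypothetical, and treating the class's m_normConst as this constant is only an assumption.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class LogNormalDensitySketch {

    /** log(sqrt(2*pi)), the additive constant in the log of a normal density. */
    static final double LOG_SQRT_TWO_PI = 0.5 * Math.log(2.0 * Math.PI);

    /** Log of the normal density with the given mean and standard deviation, evaluated at x. */
    static double logNormalDens(double x, double mean, double stdDev) {
        double diff = x - mean;
        return -(diff * diff) / (2.0 * stdDev * stdDev) - LOG_SQRT_TWO_PI - Math.log(stdDev);
    }
}
```

Working in log space means per-attribute densities are added rather than multiplied, which avoids the overflow and underflow that products of raw densities can cause.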
new_estimators
private void new_estimators()
- New probability estimators for an iteration
- Parameters:
num_cl - the number of clusters
M
private void M(Instances inst)
throws java.lang.Exception
- The M step of the EM algorithm.
- Parameters:
inst - the training instances
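For a numeric attribute, an M step typically re-estimates each cluster's mean and standard deviation from the instances, weighted by their membership probabilities. The sketch below shows this for one attribute and one cluster. It assumes a weighted population variance (dividing by the sum of weights) and a floor on the standard deviation; the names are hypothetical and the class's actual M step may differ.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class MStepSketch {

    /** Returns {mean, stdDev} of the values under the given weights, with stdDev floored at minStdDev. */
    static double[] weightedNormal(double[] values, double[] weights, double minStdDev) {
        double sumWeights = 0.0;
        double weightedSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sumWeights += weights[i];
            weightedSum += weights[i] * values[i];
        }
        if (sumWeights <= 0.0) {
            throw new IllegalArgumentException("Cluster has no weight");
        }
        double mean = weightedSum / sumWeights;

        double weightedSquares = 0.0;
        for (int i = 0; i < values.length; i++) {
            double diff = values[i] - mean;
            weightedSquares += weights[i] * diff * diff;
        }
        double stdDev = Math.sqrt(weightedSquares / sumWeights);

        // Identical values give a zero standard deviation; the floor keeps densities finite.
        return new double[] {mean, Math.max(stdDev, minStdDev)};
    }
}
```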
E
private double E(Instances inst,
boolean change_weights)
throws java.lang.Exception
- The E step of the EM algorithm. Estimate cluster membership
probabilities.
- Parameters:
inst - the training instances
- Returns:
- the average log likelihood
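For a single instance, the membership probabilities of an E step can be computed from the per-cluster log densities and the priors. The sketch below uses the log-sum-exp trick for numerical stability. That trick is a standard technique and is not confirmed to be what this class does; the names are hypothetical, and the code assumes at least one cluster has a non-zero prior and a finite log density.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class EStepSketch {

    /**
     * Fills weightsOut with the posterior cluster probabilities for one instance
     * and returns the log likelihood of that instance under the mixture.
     */
    static double posterior(double[] logDensities, double[] priors, double[] weightsOut) {
        int numClusters = priors.length;
        double[] logJoint = new double[numClusters];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClusters; c++) {
            logJoint[c] = Math.log(priors[c]) + logDensities[c];
            max = Math.max(max, logJoint[c]);
        }
        // Subtracting the maximum before exponentiating avoids underflow to zero.
        double sum = 0.0;
        for (int c = 0; c < numClusters; c++) {
            weightsOut[c] = Math.exp(logJoint[c] - max);
            sum += weightsOut[c];
        }
        for (int c = 0; c < numClusters; c++) {
            weightsOut[c] /= sum;
        }
        return max + Math.log(sum);
    }
}
```

Averaging the returned value over all training instances gives an average log likelihood of the kind the E step is documented to return.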
resetOptions
protected void resetOptions()
- Reset to default options
getClusterModelsNumericAtts
public double[][][] getClusterModelsNumericAtts()
- Return the normal distributions for the cluster models
- Returns:
- a double[][][] value
getClusterPriors
public double[] getClusterPriors()
- Return the priors for the clusters
- Returns:
- a double[] value
toString
public java.lang.String toString()
- Outputs the generated clusters into a string.
- Overrides:
toString in class java.lang.Object
EM_Report
private void EM_Report(Instances inst)
- verbose output for debugging
- Parameters:
inst - the training instances
CVClusters
private void CVClusters()
throws java.lang.Exception
- estimate the number of clusters by cross validation on the training
data.
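One simple way to choose the number of clusters by cross validation is to start at one cluster and keep adding clusters while the cross-validated log likelihood improves. The sketch below assumes that stopping rule, which the documentation above does not spell out. The scoring function is passed in as a hypothetical callback, and maxClusters is an illustrative cap.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration, not part of EnhancedEM.
final class ClusterCountSelectionSketch {

    /**
     * Returns the first cluster count after which the cross-validated
     * log likelihood stops improving.
     */
    static int selectNumClusters(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int best = 1;
        double bestScore = cvLogLikelihood.applyAsDouble(1);
        for (int k = 2; k <= maxClusters; k++) {
            double score = cvLogLikelihood.applyAsDouble(k);
            if (score <= bestScore) {
                break; // No improvement: keep the previous count.
            }
            best = k;
            bestScore = score;
        }
        return best;
    }
}
```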
numberOfClusters
public int numberOfClusters()
throws java.lang.Exception
- Returns the number of clusters.
- Overrides:
numberOfClusters in class Clusterer
- Returns:
- the number of clusters generated for a training dataset.
- Throws:
java.lang.Exception - if number of clusters could not be returned
successfully
updateMinMax
private void updateMinMax(Instance instance)
- Updates the minimum and maximum values for all the attributes
based on a new instance.
- Parameters:
instance - the new instance
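Tracking per-attribute minimum and maximum values incrementally can be sketched as follows. The instance is represented here as a plain double[] with NaN standing in for a missing value, and the bounds start as NaN until a first value is seen; both conventions are assumptions made for this example.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class MinMaxSketch {

    /** Widens minValues and maxValues so that they cover every non-missing value of the instance. */
    static void updateMinMax(double[] instance, double[] minValues, double[] maxValues) {
        for (int j = 0; j < instance.length; j++) {
            double value = instance[j];
            if (Double.isNaN(value)) {
                continue; // Missing value: leave the bounds unchanged.
            }
            if (Double.isNaN(minValues[j])) {
                // First observed value for this attribute.
                minValues[j] = value;
                maxValues[j] = value;
            } else {
                if (value < minValues[j]) {
                    minValues[j] = value;
                }
                if (value > maxValues[j]) {
                    maxValues[j] = value;
                }
            }
        }
    }
}
```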
buildClusterer
public void buildClusterer(Instances data)
throws java.lang.Exception
- Generates a clusterer. Has to initialize all fields of the clusterer
that are not being set via options.
- Overrides:
buildClusterer in class Clusterer
- Parameters:
data - set of instances serving as training data
- Throws:
java.lang.Exception - if the clusterer has not been
generated successfully
clusterPriors
public double[] clusterPriors()
- Returns the cluster priors.
- Overrides:
clusterPriors in class DensityBasedClusterer
logDensityPerClusterForInstance
public double[] logDensityPerClusterForInstance(Instance inst)
throws java.lang.Exception
- Computes the log of the conditional density (per cluster) for a given instance.
- Overrides:
logDensityPerClusterForInstance in class DensityBasedClusterer
- Parameters:
inst - the instance to compute the density for
- Returns:
- the log density for each cluster.
- Throws:
java.lang.Exception - if the density could not be computed
successfully
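With independent normal estimators per attribute, the log density of an instance under a cluster is the sum of the per-attribute log densities. The sketch below assumes a model array laid out as [cluster][attribute][0 = mean, 1 = standard deviation] and NaN for missing values. That layout is an assumption about m_modelNormal, not something this page confirms, and the names are hypothetical.

```java
// Hypothetical illustration, not part of EnhancedEM.
final class LogDensityPerClusterSketch {

    static final double LOG_SQRT_TWO_PI = 0.5 * Math.log(2.0 * Math.PI);

    /** Returns, for each cluster, the sum of per-attribute normal log densities of the instance. */
    static double[] logDensityPerCluster(double[] instance, double[][][] model) {
        double[] logDensities = new double[model.length];
        for (int c = 0; c < model.length; c++) {
            for (int j = 0; j < instance.length; j++) {
                if (Double.isNaN(instance[j])) {
                    continue; // Missing values contribute nothing.
                }
                double mean = model[c][j][0];
                double stdDev = model[c][j][1];
                double diff = instance[j] - mean;
                logDensities[c] += -(diff * diff) / (2.0 * stdDev * stdDev)
                        - LOG_SQRT_TWO_PI - Math.log(stdDev);
            }
        }
        return logDensities;
    }
}
```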
doEM
private void doEM()
throws java.lang.Exception
- Perform the EM algorithm
iterate
private double iterate(Instances inst,
boolean report)
throws java.lang.Exception
- iterates the E and M steps until the log likelihood of the data
converges.
- Parameters:
inst - the training instances.
num_cl - the number of clusters.
report - be verbose.
- Returns:
- the log likelihood of the data
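The control flow of iterating until convergence can be sketched independently of the clustering details. In the example below one combined E and M pass is represented by a hypothetical callback that returns the current log likelihood. The loop stops when the change between passes drops below a tolerance or the iteration limit is reached. The tolerance parameter and all names are assumptions, not values taken from the class.

```java
import java.util.function.DoubleSupplier;

// Hypothetical illustration, not part of EnhancedEM.
final class ConvergenceLoopSketch {

    /**
     * Runs emPass repeatedly and returns the last log likelihood it produced.
     * Stops early once two consecutive passes differ by less than tolerance.
     */
    static double iterate(DoubleSupplier emPass, int maxIterations, double tolerance) {
        double previous = Double.NEGATIVE_INFINITY;
        double current = previous;
        for (int i = 0; i < maxIterations; i++) {
            current = emPass.getAsDouble();
            if (i > 0 && Math.abs(current - previous) < tolerance) {
                break; // Converged.
            }
            previous = current;
        }
        return current;
    }
}
```

The iteration limit plays the role of the -I option: it guarantees termination even when the log likelihood keeps changing by more than the tolerance.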
main
public static void main(java.lang.String[] argv)
- Main method for testing this class.
- Parameters:
argv - should contain the following arguments:
-t training file [-T test file] [-N number of clusters] [-S random seed]
getLogLikely
public double getLogLikely()
setInitializer
public void setInitializer(EMInitializer m_initializer)
setInitializerName
public void setInitializerName(java.lang.String initializerClassName)
throws java.lang.Exception
getInitializer
public EMInitializer getInitializer()