gmcluster

Functions:

`estimate_gm_params`(data[, init_K, final_K, ...])	Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations.
`split_classes`(mixture)	Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses.
`compute_class_likelihood`(mixture, data)	Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture.
`generate_gm_samples`(mixture[, N])	Function to generate samples from a Gaussian mixture model with K clusters, given its parameters and the number of observations.

gmcluster.estimate_gm_params(data, init_K=20, final_K=0, verbose=True, est_kind='full', decorrelate_coordinates=False, alpha=0.1)

Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations.

Parameters:

data (ndarray) – an N x M 2D array of observation vectors, with each row being one M-dimensional observation (N observations in total)
init_K (int,optional) – the initial number of clusters. The order is reduced from this value to either the optimal order selected by MDL or the desired order final_K
final_K (int,optional) – the final number of clusters for the model. If final_K == 0, the optimal order is estimated
verbose (bool,optional) – if True, return clustering information
est_kind (str,optional) –
- est_kind = ‘diag’ constrains the class covariance matrices to be diagonal
- est_kind = ‘full’ allows the class covariance matrices to be full matrices
decorrelate_coordinates (bool,optional) – if True, decorrelate the coordinates to better condition the problem
alpha (float,optional) – a constant (0 < alpha <= 1) that controls the shape of the clusters by regularizing the covariance matrices. alpha = 1 gives each cluster a spherical shape, while values of alpha close to 0 leave the clusters elliptical. The default value is 0.1

Returns:

class object – a structure with the optimal Gaussian mixture parameters, where

opt_mixture.K: order of the mixture
opt_mixture.M: dimension of observation vectors
opt_mixture.cluster: an array of cluster structures with each containing the converged cluster parameters
opt_mixture.rissanen: converged MDL(K)
opt_mixture.loglikelihood: ln( Prob{Y=y|K, theta*} )
opt_mixture.pnk: Prob(Xn=k|Yn=yn, theta)
opt_mixture.D_reg: a diagonal matrix used for regularizing class covariance matrices
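
The `pnk` and `loglikelihood` fields correspond to standard Gaussian mixture quantities: the posterior probability that observation n belongs to cluster k, and the total log-likelihood of the data. The following is a minimal NumPy sketch of how these quantities are usually computed in the E-step of EM. It is not the library's implementation; the function names and the `weights`, `means`, and `covs` arguments are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_logpdf(data, mean, cov):
    """Log-density of each row of `data` under N(mean, cov)."""
    M = data.shape[1]
    diff = data - mean
    # Solve cov @ x = diff.T rather than forming the inverse explicitly.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (M * np.log(2.0 * np.pi) + logdet + maha)

def posterior_and_loglikelihood(data, weights, means, covs):
    """Return (pnk, loglikelihood) for an N x M `data` array (hypothetical helper)."""
    # log_joint[n, k] = log( weight_k * N(y_n; mean_k, cov_k) )
    log_joint = np.column_stack([
        np.log(w) + gaussian_logpdf(data, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    # Log-sum-exp over k, shifted by the row maximum for numerical stability.
    row_max = log_joint.max(axis=1, keepdims=True)
    log_marginal = row_max + np.log(
        np.exp(log_joint - row_max).sum(axis=1, keepdims=True)
    )
    pnk = np.exp(log_joint - log_marginal)  # N x K, each row sums to 1
    return pnk, float(log_marginal.sum())

# Two well-separated clusters in M = 2 dimensions.
data = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, -0.1]])
weights = [0.5, 0.5]
means = [np.zeros(2), np.full(2, 5.0)]
covs = [np.eye(2), np.eye(2)]
pnk, loglik = posterior_and_loglikelihood(data, weights, means, covs)
```

Here `pnk` is an N x K array, so `pnk[n, k]` plays the role of Prob(Xn=k|Yn=yn, theta), and `loglik` is the sum over observations of the log marginal density.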

gmcluster.split_classes(mixture)

Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses.

Parameters: mixture (class) – a structure representing the parameters for a Gaussian mixture of order K (K subclasses)
Returns: list – a list of K structures, each representing the parameters for a Gaussian mixture of order 1 (one of the K original subclasses)
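
The splitting operation can be pictured with a small stand-in data structure. The sketch below is only an illustration of the idea: the `Cluster` and `Mixture` classes, their field names other than `K`, `M`, and `cluster`, and the helper function are all hypothetical, and resetting each subclass weight to 1 is an assumption rather than documented library behavior.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster structure."""
    weight: float      # mixing proportion of the cluster
    mean: np.ndarray   # length-M mean vector
    cov: np.ndarray    # M x M covariance matrix

@dataclass
class Mixture:
    """Hypothetical stand-in for the mixture structure."""
    K: int
    M: int
    cluster: List[Cluster] = field(default_factory=list)

def split_into_single_class_mixtures(mixture):
    """Return K order-1 mixtures, one per subclass of `mixture`."""
    singles = []
    for c in mixture.cluster:
        c = deepcopy(c)   # leave the input mixture untouched
        c.weight = 1.0    # assumption: a lone subclass carries all the mass
        singles.append(Mixture(K=1, M=mixture.M, cluster=[c]))
    return singles

mix = Mixture(
    K=2,
    M=2,
    cluster=[
        Cluster(0.4, np.zeros(2), np.eye(2)),
        Cluster(0.6, np.ones(2), np.eye(2)),
    ],
)
classes = split_into_single_class_mixtures(mix)
```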

gmcluster.compute_class_likelihood(mixture, data)

Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture.

Parameters:

mixture (class) – a structure representing the parameters for a Gaussian mixture of order 1
data (ndarray) – an N x M 2D array of observation vectors, with each row being one M-dimensional observation (N observations in total)

Returns:

ndarray – an N x 1 array whose n-th entry is the log-likelihood of the n-th observation under the given Gaussian mixture of order 1
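
For a mixture of order 1, the per-observation log-likelihood reduces to the log-density of a single multivariate Gaussian. The following is a minimal NumPy sketch of that computation, returning an N x 1 array like the one described above. The function name and the `mean` and `cov` arguments are hypothetical; this is not the library's code, only an illustration of the quantity being returned.

```python
import numpy as np

def single_gaussian_loglikelihood(data, mean, cov):
    """Return an N x 1 array of log N(y_n; mean, cov), one entry per row of `data`."""
    N, M = data.shape
    L = np.linalg.cholesky(cov)              # cov = L @ L.T; cov must be positive definite
    z = np.linalg.solve(L, (data - mean).T)  # whitened residuals, shape M x N
    maha = np.sum(z ** 2, axis=0)            # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    ll = -0.5 * (M * np.log(2.0 * np.pi) + logdet + maha)
    return ll.reshape(N, 1)

data = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
ll = single_gaussian_loglikelihood(data, mean=np.zeros(2), cov=np.eye(2))
```

With a zero mean and identity covariance, each entry equals -ln(2*pi) minus half the squared distance of the observation from the origin.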

gmcluster.generate_gm_samples(mixture, N=500)

Function to generate samples from a Gaussian mixture model with K clusters, given its parameters and the number of observations.

Parameters:

mixture (class) – a structure representing the parameters for a Gaussian mixture of a given order
N (int,optional) – number of observations to generate

Returns:

ndarray – an N x M 2D array of observation vectors, with each row being one M-dimensional observation (N observations in total)
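
Drawing samples from a Gaussian mixture is commonly done in two steps: pick a cluster for each observation according to the mixing weights, then draw the observation from that cluster's Gaussian. The following is a minimal NumPy sketch of this procedure. The function name, its `weights`, `means`, `covs`, and `seed` arguments, and the returned `labels` array are hypothetical and not part of the documented interface.

```python
import numpy as np

def sample_gaussian_mixture(weights, means, covs, N=500, seed=None):
    """Draw N observations from a K-cluster Gaussian mixture (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    K, M = means.shape
    # Step 1: choose a cluster index for every observation.
    labels = rng.choice(K, size=N, p=weights)
    # Step 2: draw each observation from its cluster's Gaussian.
    samples = np.empty((N, M))
    for k in range(K):
        idx = np.flatnonzero(labels == k)
        if idx.size:
            samples[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return samples, labels

samples, labels = sample_gaussian_mixture(
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [10.0, 10.0]],
    covs=[np.eye(2), np.eye(2)],
    N=500,
    seed=0,
)
```

The resulting `samples` array has the same N x M layout that `estimate_gm_params` expects for its `data` argument.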