gmcluster
Functions:
| estimate_gm_params | Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations. |
| split_classes | Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses. |
| compute_class_likelihood | Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture. |
| generate_gm_samples | Function to generate observations from a Gaussian mixture model with K clusters for a given set of parameters and number of observations. |
- gmcluster.estimate_gm_params(data, init_K=20, final_K=0, verbose=True, est_kind='full', decorrelate_coordinates=False, alpha=0.1)
Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations.
- Parameters:
data (ndarray) – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector, for N observations in total
init_K (int,optional) – the initial number of clusters. The order is reduced from this value until it reaches either the optimal order selected by MDL or the desired order final_K
final_K (int,optional) – the final number of clusters for the model. If final_K == 0, the optimal order is estimated
verbose (bool,optional) – true/false, return clustering information if true
est_kind (str,optional) –
est_kind = ‘diag’ constrains the class covariance matrices to be diagonal
est_kind = ‘full’ allows the class covariance matrices to be full matrices
decorrelate_coordinates (bool,optional) – true/false, decorrelate the coordinates to better condition the problem if true
alpha (float,optional) – a constant (0 < alpha <= 1) that controls the shape of the clusters by regularizing the covariance matrices. alpha = 1 gives each cluster a spherical shape, while values close to 0 allow each cluster an elliptical shape. The default value is 0.1
- Returns:
class object – a structure with the optimum Gaussian mixture parameters, where
opt_mixture.K: order of the mixture
opt_mixture.M: dimension of observation vectors
opt_mixture.cluster: an array of cluster structures with each containing the converged cluster parameters
opt_mixture.rissanen: converged MDL(K)
opt_mixture.loglikelihood: ln( Prob{Y=y|K, theta*} )
opt_mixture.pnk: Prob(Xn=k|Yn=yn, theta)
opt_mixture.D_reg: a diagonal matrix used for regularizing class covariance matrices
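The `rissanen` field holds the MDL value that drives order selection. As a rough sketch of how such a criterion trades fit against model size, the code below assumes the penalty form commonly used for MDL order estimation of Gaussian mixtures, MDL(K) = -loglikelihood + (L/2) ln(N M), where L is the number of free parameters. Whether gmcluster uses exactly this penalty is an assumption, and the helper names `num_free_params` and `mdl` are hypothetical, not part of the package.

```python
import math


def num_free_params(K, M, est_kind="full"):
    """Count free parameters of a K-cluster, M-dimensional Gaussian mixture.

    Hypothetical helper for illustration; not part of gmcluster.
    """
    if est_kind == "full":
        cov_params = M * (M + 1) // 2  # symmetric covariance matrix
    elif est_kind == "diag":
        cov_params = M                 # variances only
    else:
        raise ValueError("est_kind must be 'full' or 'diag'")
    # Per cluster: one weight, M mean entries, and the covariance entries.
    # The weights sum to one, which removes one degree of freedom overall.
    return K * (1 + M + cov_params) - 1


def mdl(loglikelihood, K, M, N, est_kind="full"):
    """Assumed MDL form: negative log-likelihood plus a complexity penalty."""
    L = num_free_params(K, M, est_kind)
    return -loglikelihood + 0.5 * L * math.log(N * M)
```

Under this form, a larger K is kept only if the gain in log-likelihood outweighs the extra penalty, which is why the search can start from a generous `init_K` and merge clusters downward.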
- gmcluster.split_classes(mixture)
Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses.
- Parameters:
mixture (class) – a structure representing the parameters for a Gaussian mixture of order K (K subclasses)
- Returns:
list – a list of K structures, each representing the parameters for a Gaussian mixture of order 1 (one of the K original subclasses)
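The splitting described above can be pictured with a small stand-in structure. This is only a sketch: the `Mixture` dataclass and `split_into_order_one` are hypothetical and mirror just the documented `K`, `M`, and `cluster` fields; the library's own class may carry more state and may treat cluster weights differently.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Mixture:
    """Hypothetical stand-in for the mixture structure (not the library's class)."""
    K: int                                   # order of the mixture
    M: int                                   # dimension of observation vectors
    cluster: List[Any] = field(default_factory=list)


def split_into_order_one(mixture):
    """Return one order-1 mixture per subclass of the input mixture."""
    return [Mixture(K=1, M=mixture.M, cluster=[c]) for c in mixture.cluster]
```

Each returned element keeps the observation dimension M and holds exactly one of the original subclasses.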
- gmcluster.compute_class_likelihood(mixture, data)
Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture.
- Parameters:
mixture (class) – a structure representing the parameters for a Gaussian mixture of order 1
data (ndarray) – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector, for N observations in total
- Returns:
ndarray – an N x 1 array with the n-th entry returning the log-likelihood of the n-th observation for the given Gaussian mixture of order 1
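For a mixture of order 1, the per-observation log-likelihood reduces to the log-density of a single multivariate Gaussian. The sketch below computes that quantity directly with NumPy and returns it as an N x 1 array. It is not the library's implementation; the function name `gaussian_loglikelihood` and its `mean`/`cov` arguments are assumptions made for illustration.

```python
import numpy as np


def gaussian_loglikelihood(data, mean, cov):
    """Log-density of each row of `data` under one Gaussian N(mean, cov).

    Hypothetical helper for illustration; returns an N x 1 array.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    M = data.shape[1]
    centered = data - mean
    # The Cholesky factor gives both the log-determinant and the
    # Mahalanobis distance without forming an explicit inverse.
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, centered.T)        # shape (M, N)
    mahalanobis_sq = np.sum(z ** 2, axis=0)      # shape (N,)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    ll = -0.5 * (M * np.log(2.0 * np.pi) + log_det + mahalanobis_sq)
    return ll.reshape(-1, 1)
```

One common use of such per-class values is maximum-likelihood classification: compute one N x 1 column per order-1 mixture (for example, one per element returned by `split_classes`), stack the columns side by side, and take the index of the largest entry in each row.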
- gmcluster.generate_gm_samples(mixture, N=500)
Function to generate observations from a Gaussian mixture model with K clusters for a given set of parameters and number of observations.
- Parameters:
mixture (class) – a structure representing the parameters for a Gaussian mixture of a given order
N (int,optional) – number of observations to generate
- Returns:
ndarray – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector, for N observations in total
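Sampling from a Gaussian mixture is usually done in two stages: pick a cluster for each observation according to the cluster weights, then draw from that cluster's Gaussian. The sketch below shows this with NumPy. It takes weights, means, and covariances as plain arrays rather than a mixture structure, and the name `sample_gaussian_mixture` is hypothetical; how `generate_gm_samples` does this internally is not stated here.

```python
import numpy as np


def sample_gaussian_mixture(weights, means, covs, N=500, seed=None):
    """Draw N samples from a K-cluster Gaussian mixture.

    Hypothetical helper for illustration. `weights` has shape (K,) and sums
    to one, `means` has shape (K, M), and `covs` has shape (K, M, M).
    Returns the N x M samples and the cluster label of each sample.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    K, M = means.shape
    # Stage 1: choose a cluster index for every observation.
    labels = rng.choice(K, size=N, p=weights)
    samples = np.empty((N, M))
    # Stage 2: draw each cluster's observations from its own Gaussian.
    for k in range(K):
        idx = np.flatnonzero(labels == k)
        if idx.size:
            samples[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return samples, labels
```

Data produced this way has the N x M layout expected by the `data` argument of `estimate_gm_params`, so it can serve as a synthetic test set with a known number of clusters.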