gmcluster

Functions:

estimate_gm_params(data[, init_K, final_K, ...])

Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations.

split_classes(mixture)

Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses.

compute_class_likelihood(mixture, data)

Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture.

generate_gm_samples(mixture[, N])

Function to generate samples from a Gaussian mixture model with K clusters, for a given set of parameters and number of observations.

gmcluster.estimate_gm_params(data, init_K=20, final_K=0, verbose=True, est_kind='full', decorrelate_coordinates=False, alpha=0.1)

Function to perform the EM algorithm to estimate the order and parameters of a Gaussian mixture model for a given set of observations.

Parameters:
  • data (ndarray) – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector and there are N observations in total

  • init_K (int,optional) – the initial number of clusters. Starting from this value, the number of clusters is reduced until it reaches either the optimal order according to MDL or the desired order

  • final_K (int,optional) – the final number of clusters for the model. If final_K == 0, the optimal order is estimated

  • verbose (bool,optional) – true/false, return clustering information if true

  • est_kind (str,optional) –

    • est_kind = ‘diag’ constrains the class covariance matrices to be diagonal

    • est_kind = ‘full’ allows the class covariance matrices to be full matrices

  • decorrelate_coordinates (bool,optional) – true/false, decorrelate the coordinates to better condition the problem if true

  • alpha (float,optional) – a constant (0 < alpha <= 1) that controls the shape of the clusters by regularizing the covariance matrices. alpha = 1 gives each cluster a spherical shape; as alpha approaches 0, the clusters are free to take an elliptical shape. The default value is 0.1
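
To make the effect of alpha concrete, here is a minimal NumPy sketch of covariance shrinkage. It is not taken from the gmcluster source: the helper name shrink_covariance and the choice of a scaled identity as the target are assumptions for illustration (the library documents a diagonal matrix D_reg for this purpose, and its exact formula may differ).

```python
import numpy as np

def shrink_covariance(R, alpha):
    """Blend covariance R with a spherical target (hypothetical helper)."""
    M = R.shape[0]
    # Assumed target: a spherical covariance with the same average variance as R.
    target = (np.trace(R) / M) * np.eye(M)
    return (1.0 - alpha) * R + alpha * target

R = np.array([[4.0, 1.5],
              [1.5, 1.0]])

R_default = shrink_covariance(R, 0.1)  # mild shrinkage, still elliptical
R_sphere = shrink_covariance(R, 1.0)   # fully spherical: 2.5 * identity
```

In this sketch, alpha = 1 discards all correlation and equalizes the variances, while small alpha leaves the estimated covariance almost unchanged.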

Returns:

class object

a structure with optimum Gaussian mixture parameters, where
  • opt_mixture.K: order of the mixture

  • opt_mixture.M: dimension of observation vectors

  • opt_mixture.cluster: an array of cluster structures with each containing the converged cluster parameters

  • opt_mixture.rissanen: converged MDL(K)

  • opt_mixture.loglikelihood: ln( Prob{Y=y|K, theta*} )

  • opt_mixture.pnk: Prob(Xn=k|Yn=yn, theta)

  • opt_mixture.D_reg: a diagonal matrix used for regularizing class covariance matrices
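
The rissanen and loglikelihood fields are related through the MDL criterion. The sketch below shows how an MDL score might be computed and used to pick an order. It assumes a common form of the criterion for full covariance matrices, MDL(K) = -loglikelihood + 0.5 * L * ln(N * M), with L the number of free parameters; the exact penalty used by gmcluster is not stated in this reference. The function name mdl_score and the log-likelihood values are made up for illustration.

```python
import math

def mdl_score(loglikelihood, K, M, N):
    """MDL value for a K-component, M-dimensional mixture fit to N rows."""
    # Free parameters: K-1 weights, K*M means, K*M*(M+1)/2 covariance terms.
    n_params = K * (1 + M + M * (M + 1) / 2) - 1
    return -loglikelihood + 0.5 * n_params * math.log(N * M)

# Hypothetical converged log-likelihoods for three candidate orders.
candidates = {1: -1520.0, 2: -1310.0, 3: -1302.0}
scores = {K: mdl_score(ll, K, M=2, N=500) for K, ll in candidates.items()}

# The order with the smallest MDL score is preferred.
best_K = min(scores, key=scores.get)
```

With these illustrative numbers, the third cluster improves the log-likelihood too little to pay for its extra parameters, so the smaller model wins.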

gmcluster.split_classes(mixture)

Function to split a Gaussian mixture with K subclasses into K Gaussian mixtures, each of order 1 and containing one of the subclasses.

Parameters:

mixture (class) – a structure representing the parameters for a Gaussian mixture of order K (K subclasses)

Returns:

list – a list of K structures, each representing the parameters for a Gaussian mixture of order 1 (one of the K original subclasses)
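
The splitting operation can be pictured with a small stand-in. This is an illustrative sketch only, not the gmcluster implementation: it uses SimpleNamespace objects with the documented K, M and cluster fields, and the per-cluster field mu is a hypothetical placeholder.

```python
from types import SimpleNamespace

def split_mixture(mixture):
    """Return one order-1 mixture per cluster (illustrative stand-in)."""
    return [
        SimpleNamespace(K=1, M=mixture.M, cluster=[c])
        for c in mixture.cluster
    ]

# A toy order-3 mixture; the contents of each cluster are placeholders.
mixture = SimpleNamespace(
    K=3,
    M=2,
    cluster=[SimpleNamespace(mu=[float(k), float(k)]) for k in range(3)],
)

parts = split_mixture(mixture)  # list of 3 mixtures, each with K == 1
```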

gmcluster.compute_class_likelihood(mixture, data)

Function to calculate the log-likelihood of data vectors assuming they are generated by a given Gaussian mixture.

Parameters:
  • mixture (class) – a structure representing the parameters for a Gaussian mixture of order 1

  • data (ndarray) – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector and there are N observations in total

Returns:

ndarray – an N x 1 array with the n-th entry returning the log-likelihood of the n-th observation for the given Gaussian mixture of order 1
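
For a mixture of order 1, the per-observation log-likelihood reduces to the log-density of a single multivariate Gaussian. The following NumPy sketch computes that quantity and returns it in the same N x 1 layout. The function name gaussian_loglik and its mu and R arguments are assumptions for illustration; the library takes a mixture structure instead.

```python
import numpy as np

def gaussian_loglik(data, mu, R):
    """Log-density of each row of data under N(mu, R), as an N x 1 array."""
    N, M = data.shape
    diff = data - mu
    # Solve R z = diff^T rather than forming the inverse of R explicitly.
    solved = np.linalg.solve(R, diff.T).T
    maha = np.sum(diff * solved, axis=1)  # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(R)
    ll = -0.5 * (maha + logdet + M * np.log(2.0 * np.pi))
    return ll.reshape(N, 1)

data = np.array([[0.0, 0.0],
                 [1.0, -1.0],
                 [3.0, 3.0]])
ll = gaussian_loglik(data, mu=np.zeros(2), R=np.eye(2))
```

Evaluating such log-likelihoods for each order-1 mixture and taking the largest value per row is one way to assign observations to subclasses.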

gmcluster.generate_gm_samples(mixture, N=500)

Function to generate samples from a Gaussian mixture model with K clusters, for a given set of parameters and number of observations.

Parameters:
  • mixture (class) – a structure representing the parameters for a Gaussian mixture of a given order

  • N (int,optional) – number of observations to generate

Returns:

ndarray – an N x M 2D array of observation vectors, where each row is an M-dimensional observation vector and there are N observations in total
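
Sampling from a Gaussian mixture is usually done in two steps: draw a cluster label for each observation according to the mixture weights, then draw the observation from that cluster's Gaussian. The sketch below follows that standard recipe with NumPy. It is not the gmcluster implementation; the function name sample_mixture and its weights, means, covs and seed arguments are assumptions for illustration.

```python
import numpy as np

def sample_mixture(weights, means, covs, N=500, seed=0):
    """Draw N rows from a Gaussian mixture; returns (samples, labels)."""
    rng = np.random.default_rng(seed)
    K, M = means.shape
    # Step 1: pick a cluster index for every observation.
    labels = rng.choice(K, size=N, p=weights)
    samples = np.empty((N, M))
    # Step 2: fill in the rows of each cluster from its own Gaussian.
    for k in range(K):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue
        samples[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return samples, labels

weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0],
                  [5.0, 5.0]])
covs = np.array([np.eye(2), np.eye(2)])

samples, labels = sample_mixture(weights, means, covs, N=500)
```

The returned samples array has the N x M layout described above and could serve as test input for an estimator.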