Gaussian Mixture Models

Probabilistic model assuming data comes from a mixture of Gaussian distributions

What are Gaussian Mixture Models?

A GMM is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Unlike K-means, which assigns each point to exactly one cluster (hard clustering), a GMM performs soft clustering: each point receives a probability of belonging to each component.
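As a minimal sketch of that difference, assuming scikit-learn's GaussianMixture and synthetic blob data (neither appears in the original text):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data: three roughly Gaussian blobs (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, random_state=42).fit(X)

hard = gmm.predict(X)        # hard labels, comparable to K-means output
soft = gmm.predict_proba(X)  # soft assignments: one probability per component
print(soft[0])               # each row sums to 1, e.g. [0.98 0.01 0.01]
```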

Key Concepts:

  • Components: Individual Gaussian distributions
  • Weights: Probability of selecting each component
  • Mean: Center of each Gaussian
  • Covariance: Shape and orientation of each Gaussian
  • EM Algorithm: Expectation-Maximization for fitting
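After fitting, each of these concepts maps to an attribute of the fitted model. A short sketch, again assuming scikit-learn (the attribute names below are scikit-learn's, not part of this section):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print(gmm.weights_)      # mixing weights: probability of each component, sums to 1
print(gmm.means_)        # one mean vector (cluster center) per component
print(gmm.covariances_)  # one covariance matrix per component (default 'full')
print(gmm.n_iter_)       # EM iterations the fit actually took
```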

Advantages over K-Means:

  • Soft clustering (probabilistic assignments)
  • Can model elliptical clusters
  • Provides uncertainty estimates
  • More flexible cluster shapes
  • Model selection via AIC/BIC
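The last point deserves a sketch: because a GMM is a likelihood-based model, candidate component counts can be compared with information criteria. A hedged example, assuming scikit-learn's aic/bic methods and synthetic data:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# Fit one GMM per candidate component count and keep the lowest BIC
candidates = list(range(1, 9))
bics = [GaussianMixture(n_components=k, random_state=42).fit(X).bic(X)
        for k in candidates]
best_k = candidates[int(np.argmin(bics))]
print("BIC-selected number of components:", best_k)
```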

Parameters

  • Number of components: how many Gaussian distributions to fit
  • Covariance type: how much shape flexibility each cluster gets
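In scikit-learn terms (parameter names assumed, not taken from this section), these two knobs are:

```python
from sklearn.mixture import GaussianMixture

# The two parameters above, under scikit-learn's names
gmm = GaussianMixture(n_components=3, covariance_type="full")
```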


Understanding GMM

When to Use GMM

  • Need probabilistic cluster assignments
  • Clusters have different shapes/sizes
  • Data is approximately Gaussian within each cluster
  • Uncertainty quantification needed
  • Soft boundaries between clusters

Covariance Types

  • Full: Each component has its own unrestricted covariance matrix (arbitrary ellipses)
  • Tied: All components share a single covariance matrix (same shape, different centers)
  • Diagonal: Each covariance matrix is diagonal (axis-aligned ellipses, no feature correlations)
  • Spherical: Each component has a single variance shared across dimensions (circular clusters)
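The choice mostly trades flexibility against parameter count. A quick comparison sketch, assuming scikit-learn (note it spells the diagonal option 'diag'):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit the same data under each covariance constraint and compare
for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type,
                          random_state=0).fit(X)
    print(f"{cov_type:9s} covariances_ shape: {gmm.covariances_.shape}")
```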

EM Algorithm Steps

  1. Expectation (E-step): Using the current parameters, compute the responsibility of each component for each point, i.e. the probability that the point came from that Gaussian
  2. Maximization (M-step): Update the parameters (means, covariances, weights) to maximize the likelihood under those responsibilities
  3. Repeat: Alternate the two steps until the log-likelihood converges or the maximum number of iterations is reached
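As a compact illustration of these steps, here is a from-scratch EM sketch with NumPy/SciPy; the initialization, the regularization constant, and the fixed iteration count are assumptions made for brevity rather than anything specified above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """Fit a k-component GMM to X of shape (n, d) with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Init (assumed): random points as means, identity covs, uniform weights
    means = X[rng.choice(n, size=k, replace=False)].copy()
    covs = np.array([np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = np.column_stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
            covs[j] += 1e-6 * np.eye(d)  # regularize for numerical stability
    return weights, means, covs, resp
```

A production implementation would also track the log-likelihood and stop on convergence, as step 3 describes, rather than always running a fixed number of iterations.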