Gaussian Mixture Models
Probabilistic model assuming data comes from a mixture of Gaussian distributions
What are Gaussian Mixture Models?
GMM is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Unlike K-means, which performs hard clustering (each point belongs to exactly one cluster), GMM performs soft clustering, assigning each point a probability of belonging to every cluster.
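As a minimal sketch of that difference, assuming scikit-learn is available (the two-blob data below is purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Illustrative two-blob data; any 2D points would do.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([4, 4], 1.5, (100, 2))])

# K-means: hard assignments -- each point gets exactly one label.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM: soft assignments -- each point gets a probability per component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_probs = gmm.predict_proba(X)   # shape (n_samples, 2); rows sum to 1

print(hard_labels[:3])   # e.g. [0 0 0]
print(soft_probs[:3])    # e.g. [[0.99 0.01], ...]
```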
Key Concepts:
- Components: Individual Gaussian distributions
- Weights: Probability of selecting each component
- Mean: Center of each Gaussian
- Covariance: Shape and orientation of each Gaussian
- EM Algorithm: Expectation-Maximization for fitting
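Together, these define the mixture density: a point x is modeled as drawn from one of K Gaussians, with nonnegative weights that sum to 1:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1$$

Here π_k are the weights, μ_k the means, and Σ_k the covariances; EM estimates all three from the data.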
Advantages over K-Means:
- Soft clustering (probabilistic assignments)
- Can model elliptical clusters
- Provides uncertainty estimates
- More flexible cluster shapes
- Model selection via AIC/BIC (sketched below)
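For the last point, here is a minimal sketch of choosing the number of components with scikit-learn; the candidate range of 1 to 6 is an arbitrary illustration:

```python
from sklearn.mixture import GaussianMixture

def best_n_components(X, max_k=6):
    """Pick the component count with the lowest BIC (lower is better)."""
    scores = {}
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        scores[k] = gmm.bic(X)   # gmm.aic(X) works the same way
    return min(scores, key=scores.get)
```

BIC penalizes model complexity more heavily than AIC (for all but tiny datasets), so it tends to select fewer components.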
Parameters
- Number of components: how many Gaussian distributions to fit
- Covariance type: controls the shape flexibility of the clusters
Understanding GMM
When to Use GMM
- Need probabilistic cluster assignments
- Clusters have different shapes/sizes
- Data is approximately Gaussian within each cluster
- Uncertainty quantification is needed (see the sketch after this list)
- Soft boundaries between clusters
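For the uncertainty point, a fitted model's per-point membership probabilities make ambiguous points easy to flag; a sketch (the 0.9 confidence threshold is an arbitrary illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ambiguous_points(X, n_components=2, threshold=0.9):
    """Indices of points whose most likely cluster has probability below threshold."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    confidence = gmm.predict_proba(X).max(axis=1)  # best-cluster probability per point
    return np.where(confidence < threshold)[0]
```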
Covariance Types
- Full: Each component has its own unrestricted covariance matrix (arbitrary ellipses)
- Tied: All components share a single covariance matrix (same shape, different centers)
- Diagonal: Each covariance matrix is diagonal (axis-aligned ellipses)
- Spherical: Each component has a single variance (circular clusters)
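These names map directly onto scikit-learn's covariance_type parameter; a quick sketch of how many covariance parameters each option fits (the random data is illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))  # illustrative 2D data

for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type,
                          random_state=0).fit(X)
    print(cov_type, gmm.covariances_.shape)
# full      -> (3, 2, 2): one 2x2 matrix per component
# tied      -> (2, 2):    a single matrix shared by all components
# diag      -> (3, 2):    per-component variances along each axis
# spherical -> (3,):      a single variance per component
```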
EM Algorithm Steps
- Expectation (E-step): Compute each point's responsibility under each Gaussian, i.e. the posterior probability that the component generated the point, given the current parameters
- Maximization (M-step): Update the parameters (means, covariances, weights) to maximize the expected log-likelihood under those responsibilities
- Repeat: Continue until convergence or max iterations reached
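A minimal NumPy sketch of these three steps, using full covariances; it omits the robustness measures (careful initialization, regularization, restarts) a real implementation needs:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm_em(X, k, n_iter=100, tol=1e-6, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, size=k, replace=False)]  # random points as initial means
    covs = np.stack([np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility resp[i, j] = P(component j | point i).
        dens = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
            for j in range(k)])
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: re-estimate weights, means, covariances from responsibilities.
        nk = resp.sum(axis=0)            # effective number of points per component
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
            covs[j] += 1e-6 * np.eye(d)  # small ridge keeps covariances invertible
        # Repeat: stop once the log-likelihood stops improving.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs
```

scikit-learn's GaussianMixture runs this same loop internally, with k-means-based initialization and the covariance options listed above.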