DBSCAN Clustering
Density-Based Spatial Clustering of Applications with Noise
What is DBSCAN?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups closely packed points into clusters and marks points in low-density regions as outliers.
Key Concepts:
- ε (epsilon): Maximum distance between two points for them to count as neighbors
- MinPts: Minimum number of points within ε required to form a dense region
- Core Points: Points with at least MinPts points within ε (the original definition counts the point itself)
- Border Points: Non-core points within ε of a core point
- Noise Points: Neither core nor border points
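The three point types above can be sketched in a few lines of Python. This is a brute-force illustration, not a reference implementation: the function name `classify_points` and the toy coordinates are made up for this example, and it assumes Euclidean distance and the convention that a point counts toward its own ε-neighborhood.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (brute force, O(n^2))."""
    # Neighborhood of each point: indices of all points within eps,
    # including the point itself (assumed convention).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a small group plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.1, min_pts=4)
print(labels)
```

With these values only (1.0, 1.0) has four points in its neighborhood, so it is the single core point. Its three direct neighbors become border points, while (0.0, 0.0) and (8.0, 8.0) are noise because no core point lies within ε of them.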
Advantages:
- Finds arbitrarily shaped clusters
- Robust to outliers
- No need to specify number of clusters
- Can find clusters of different sizes, provided their densities are similar
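A minimal sketch of the clustering procedure itself shows why the number of clusters never has to be given: a new cluster is opened whenever an unvisited core point is found, and it grows through neighboring core points. The function name `dbscan`, the `NOISE` marker, and the sample coordinates are assumptions made for this illustration. It uses brute-force Euclidean neighbor search and counts a point as part of its own neighborhood.

```python
from collections import deque
from math import dist

NOISE = -1  # label used here for outliers

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; NOISE (-1) marks outliers."""
    n = len(points)
    # Brute-force neighborhoods; each point is included in its own.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may be relabeled as a border point later
            continue
        # i is a core point: open a new cluster and grow it.
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is core: keep expanding
        cluster_id += 1
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

On this data the two groups come out as clusters 0 and 1 and the isolated point is labeled -1, without the cluster count being passed in anywhere.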
Understanding DBSCAN
When to Use DBSCAN
- Non-spherical cluster shapes
- Clusters of similar density separated by sparser regions
- Data with noise and outliers
- Unknown number of clusters
- Spatial data clustering
Limitations
- Struggles when clusters have very different densities, because a single ε and MinPts apply to the whole data set
- Sensitive to parameter selection
- Distances become less informative in high-dimensional data
Parameter Selection Tips
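- MinPts: A common rule of thumb is 2 × dimensionality, with a minimum of 3
- Epsilon: Plot the sorted k-distance graph (each point's distance to its k-th nearest neighbor) and pick the value at the “elbow”
- For 2D data: MinPts = 4 is often a good start
- Dense or noisy data: Increase MinPts so that small groups of stray points are labeled as noise rather than clusters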
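The k-distance graph mentioned in the tips can be computed without any plotting library. The sketch below is only an illustration: `k_distances` is a hypothetical helper name and the coordinates are toy data. The choice of k is also an assumption. A common one is k = MinPts − 1 when a point counts toward its own neighborhood, but conventions differ.

```python
from math import dist

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result)

# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
curve = k_distances(points, k=2)
for rank, value in enumerate(curve):
    print(rank, round(value, 3))
```

Here the curve stays flat at 1.0 for the eight grouped points and then jumps to about 6.4 for the isolated one. That jump is the “elbow”, so an ε a little above the flat part (for example 1.5) keeps the groups together while leaving the isolated point as noise.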