DBSCAN Clustering

Density-Based Spatial Clustering of Applications with Noise

What is DBSCAN?

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups points lying in densely packed regions and marks points in low-density regions as outliers (noise).

Key Concepts:

  • ε (epsilon): Maximum distance at which two points count as neighbors
  • MinPts: Minimum number of points within ε required to form a dense region
  • Core Points: Points with at least MinPts points within ε (by the usual convention, the point itself is counted)
  • Border Points: Non-core points that lie within ε of a core point
  • Noise Points: Points that are neither core nor border points
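
The three point types above can be illustrated with a minimal sketch in plain Python. The function name classify_points and the sample coordinates are made up for illustration, and the sketch assumes Euclidean distance and that a point counts as its own neighbor.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (illustrative helper)."""
    # Indices of all points within eps of each point.
    # Assumption: the point itself is included in its own neighborhood.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_pts points in its neighborhood.
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical sample data: a small group, one point on its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.1, min_pts=3)
print(labels)
```

With these values the four points of the square come out as core points, (2.0, 1.0) as a border point, and (8.0, 8.0) as noise.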

Advantages:

  • Finds arbitrary shaped clusters
  • Robust to outliers
  • No need to specify number of clusters
  • Can find clusters of different sizes, provided their densities are similar
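
To make these points concrete, here is a simplified sketch of the clustering loop in plain Python. It is not an optimized or reference implementation: it assumes Euclidean distance, a brute-force neighbor search, and that a point counts as its own neighbor. The function name dbscan and the sample data are illustrative only.

```python
from math import dist

NOISE = -1  # label used for outliers

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Brute-force neighborhood query, including point i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core, keep expanding
        cluster += 1
    return labels

# Hypothetical sample data: two small squares and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 20),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The number of clusters is never passed in. On this sample the sketch finds two clusters (labels 0 and 1) and marks the isolated point as noise (-1).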

Understanding DBSCAN

When to Use DBSCAN

  • Non-spherical cluster shapes
  • Clusters of different sizes but similar density
  • Data with noise and outliers
  • Unknown number of clusters
  • Spatial data clustering

Limitations

  • Struggles when clusters have large density differences, because a single ε and MinPts apply to the whole dataset
  • Sensitive to the choice of ε and MinPts
  • Distances become less informative in high-dimensional data, which makes a suitable ε hard to find

Parameter Selection Tips

  • MinPts: A common heuristic is 2 × the number of dimensions, with a minimum of 3
  • Epsilon: Plot the sorted k-distance graph and pick the distance at the “elbow”
  • For 2D data: MinPts = 4 is often a good starting point
  • Dense data: Increase MinPts for better noise detection
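
The k-distance graph mentioned above can be computed with a short sketch like the following. It assumes Euclidean distance and a brute-force search; the function name k_distances and the sample points are illustrative. Which k to use is itself a convention. Here k is the k-th nearest neighbor other than the point itself, and k = MinPts - 1 is one common choice.

```python
from math import dist

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Hypothetical sample data: a small group, one point on its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
kd = k_distances(points, k=2)
for rank, d in enumerate(kd):
    print(rank, round(d, 3))
```

Plotting these values against their rank gives the k-distance graph. The sharp jump at the end of the sorted list is the elbow, and a value of ε just below the jump separates the dense points from the outlier.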