Histograms & KDE

Purpose

Histograms and kernel density estimation (KDE) show the distribution of a single continuous variable. They help identify:

  • Central tendency (mode, median).
  • Spread and skewness.
  • Presence of multiple modes or outliers.

Example in the Notebook

The notebook plots the distribution of total bill from the tips dataset:

  • Histogram with 30 bins.
  • Overlaid KDE curve.
  • Normalized to density (area under curve = 1).

Key Code Snippet

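import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')  # seaborn's example dataset (downloaded on first use)
fig, ax = plt.subplots()
sns.histplot(data=tips, x='total_bill', bins=30, kde=True, stat='density', ax=ax, color='C2')  # data= and x= are the modern seaborn API
ax.set_title('Distribution of total bill (tips dataset)')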
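
The "area = 1" property of a density-normalized histogram can be checked numerically. The sketch below uses NumPy only and a synthetic, right-skewed sample as a stand-in for total_bill (the gamma parameters and the sample size are assumptions, not values from the notebook). It illustrates the normalization convention rather than seaborn's internals.

```python
import numpy as np

# Synthetic, right-skewed stand-in for a column such as total_bill (not the real data).
rng = np.random.default_rng(0)
values = rng.gamma(shape=4.0, scale=5.0, size=244)

# density=True rescales each bar height to count / (n * bin_width).
density, edges = np.histogram(values, bins=30, density=True)
counts, _ = np.histogram(values, bins=30)
widths = np.diff(edges)

# Summing height * width over all bars gives the total area under the histogram.
area = float(np.sum(density * widths))

# The same heights computed by hand from the raw counts.
manual = counts / (counts.sum() * widths)

print(f"total bar area: {area:.6f}")
```

Because the bar heights are divided by the bin width, the y-axis values depend on the units of the variable and can exceed 1 when bins are narrow; only the total area is fixed.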

Customization Tips

  • Bin count: Raise bins above 30 for more detail, or lower it for a smoother view.
  • Stat type: Use stat='count' for raw counts, stat='density' for normalized density, or stat='percent'.
  • KDE bandwidth: kde=True overlays the curve with default smoothing; to fine-tune it, pass kde_kws={'bw_adjust': 0.5} (values below 1 follow the data more closely, values above 1 smooth more).
  • Multiple groups: Use hue='day' to overlay distributions by category.
  • Stacking: With hue set, use multiple='stack' to stack the groups or multiple='dodge' to place their bars side by side.

When to Use

  • Always: for exploring univariate distributions.
  • Consider: KDE for smooth, continuous estimates.
  • Avoid: histograms for very small samples (< 20 observations), where the apparent shape depends mostly on the choice of bins.
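
To make the "smooth, continuous estimate" concrete, here is a minimal hand-written Gaussian KDE in NumPy. It is a sketch, not seaborn's implementation: the helper gaussian_kde_1d is hypothetical, the bandwidth uses Scott's rule of thumb scaled by a bw_adjust factor (assumed here to play the same role as seaborn's parameter of that name), and the bimodal sample is synthetic.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE on `grid`; bandwidth = Scott's rule * bw_adjust."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = bw_adjust * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, evaluated at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve integrates to 1.
    return kernels.sum(axis=1) / (n * bandwidth)

# Synthetic bimodal sample (illustrative values only).
rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(10, 1.5, 150), rng.normal(25, 3.0, 100)])

# Pad the grid generously so the tails of the widest kernels are included.
grid = np.linspace(samples.min() - 40, samples.max() + 40, 2000)
step = grid[1] - grid[0]

rough = gaussian_kde_1d(samples, grid, bw_adjust=0.3)   # follows the data closely
smooth = gaussian_kde_1d(samples, grid, bw_adjust=3.0)  # heavily smoothed

# Riemann-sum approximation of the area under each curve.
area_rough = float(rough.sum() * step)
area_smooth = float(smooth.sum() * step)
print(f"areas: {area_rough:.3f}, {area_smooth:.3f}")
```

A small bandwidth keeps both modes sharp but adds wiggles, while a large one flattens the curve and can merge the modes, which is why the bandwidth deserves the same attention as the bin count of a histogram.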

See Also