Histograms & KDE
Purpose
Histograms and kernel density estimation (KDE) show the distribution of a single continuous variable. They help identify:
- Central tendency (mode, median).
- Spread and skewness.
- Presence of multiple modes or outliers.
Example in the Notebook
The notebook plots the distribution of total bill from the tips dataset:
- Histogram with 30 bins.
- Overlaid KDE curve.
- Normalized to density, so the bar areas sum to 1 and the histogram shares a scale with the KDE curve.
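To make the density normalization concrete, here is a minimal sketch using NumPy alone. The sample is synthetic (a hypothetical stand-in for total_bill, not the real tips data), and the 30-bin setting simply mirrors the notebook's choice.

```python
import numpy as np

# Synthetic, right-skewed sample standing in for total_bill (assumed values)
rng = np.random.default_rng(42)
sample = rng.gamma(shape=4.0, scale=5.0, size=244)

# density=True rescales bar heights so that sum(height * width) equals 1
heights, edges = np.histogram(sample, bins=30, density=True)
widths = np.diff(edges)
area = float(np.sum(heights * widths))

print(f"total bar area: {area:.6f}")
```

Because the heights are densities rather than counts, individual bars can exceed 1 when the bins are narrower than one unit; only the total area is constrained.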
Key Code Snippet
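import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')  # example dataset; downloaded on first use
fig, ax = plt.subplots()
sns.histplot(data=tips, x='total_bill', bins=30, kde=True, stat='density', ax=ax, color='C2')  # data=/x= is the modern seaborn API
ax.set_title('Distribution of total bill (tips dataset)')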
Customization Tips
- Bin count: Increase bins=30 for more detail, or decrease it for a smoother view.
- Stat type: Use stat='count' for raw counts, stat='density' for normalized density, or stat='percent' for percentages.
- KDE bandwidth: kde=True overlays the curve; tune its smoothing with kde_kws={'bw_adjust': 1.5} (values above 1 smooth more, values below 1 smooth less).
- Multiple groups: Use hue='day' to overlay distributions by category.
- Stacking: Set multiple='stack' or multiple='dodge' for grouped histograms.
When to Use
- Always: for exploring univariate distributions.
- Consider: KDE for smooth, continuous estimates.
- Avoid: histograms for very small samples (fewer than about 20 observations), where bar heights depend heavily on where the bin edges fall.
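To show what a KDE computes and why the bandwidth matters, here is a hand-rolled Gaussian KDE in NumPy. It is a sketch for intuition, not Seaborn's implementation: the function name gaussian_kde_1d is hypothetical, the data are synthetic, and the bandwidth uses Scott's rule of thumb scaled by a bw_adjust factor.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE of `sample` at the points in `grid`."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Scott's rule of thumb for 1-D data, scaled by bw_adjust
    h = bw_adjust * sample.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, averaged over the sample
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Synthetic sample (assumed values, not the tips data)
rng = np.random.default_rng(1)
sample = rng.normal(loc=20.0, scale=8.0, size=200)
grid = np.linspace(sample.min() - 30, sample.max() + 30, 2000)

density = gaussian_kde_1d(sample, grid)
smoother = gaussian_kde_1d(sample, grid, bw_adjust=2.0)

# Riemann-sum check: both curves should integrate to roughly 1
dx = grid[1] - grid[0]
print(density.sum() * dx, smoother.sum() * dx)
# The wider bandwidth flattens the curve, so its peak is lower
print(density.max() > smoother.max())
```

A larger bw_adjust hides small bumps that may be noise, while a smaller one can reveal genuine multiple modes at the cost of a wigglier curve.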
See Also
- Seaborn: histplot()
- Matplotlib: hist()