Scatter Plots & Regression
Purpose
Scatter plots reveal relationships between two continuous variables. Adding a regression line or smoothed curve makes the underlying trend easier to see.
Example in the Notebook
The notebook uses the iris dataset to plot:
- Sepal length vs. sepal width as a scatter plot, colored and styled by species.
- A LOWESS (Locally Weighted Scatterplot Smoothing) regression curve overlay.
Key Code Snippet
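import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')  # assumed source; its column names match those used below
fig, ax = plt.subplots()
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species', style='species', ax=ax, s=60)
sns.regplot(data=iris, x='sepal_length', y='sepal_width', scatter=False, ax=ax, color='gray', lowess=True)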
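To make the LOWESS overlay less of a black box, the curve can also be computed directly with statsmodels, the library seaborn relies on when lowess=True is set. The following is a minimal sketch under stated assumptions: the data are synthetic stand-ins rather than the iris measurements, and frac=2/3 is only an illustrative smoothing span.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic stand-in data (hypothetical, not the iris measurements)
rng = np.random.default_rng(0)
x = rng.uniform(4.0, 8.0, size=150)
y = 3.0 + 0.5 * np.sin(x) + rng.normal(scale=0.3, size=x.size)

# lowess(endog, exog) returns (x, fitted y) pairs sorted by x;
# frac is the share of points used for each local fit (illustrative value)
smoothed = lowess(y, x, frac=2 / 3)

fig, ax = plt.subplots()
ax.scatter(x, y, s=20, alpha=0.6)
ax.plot(smoothed[:, 0], smoothed[:, 1], color="gray")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

A smaller frac follows local wiggles more closely, while a larger one gives a smoother curve.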
Customization Tips
- Point size: Change the s=60 argument to adjust marker size.
- Regression method: Use lowess=True for non-parametric smoothing (requires statsmodels), or omit it for linear regression.
- Color palettes: Customize the hue palette with palette='Set2' or another named palette.
- Transparency: Add alpha=0.6 to the scatter points so overlapping data stays visible.
- Faceting: Use sns.relplot(..., col='species') to create one subplot per category.
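The tips above can be combined in one sketch. It assumes a synthetic DataFrame with iris-like column names (hypothetical values, used so the example runs without downloading data); the specific values s=80, alpha=0.6, and palette='Set2' are illustrative rather than recommendations from the notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in with iris-like column names (hypothetical values)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, size=150),
    "sepal_width": rng.normal(3.0, 0.4, size=150),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Larger, semi-transparent markers with a named palette
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="sepal_length", y="sepal_width",
                hue="species", palette="Set2", s=80, alpha=0.6, ax=ax)

# Omitting lowess=True gives regplot's default linear fit
sns.regplot(data=df, x="sepal_length", y="sepal_width",
            scatter=False, color="gray", ax=ax)

# Figure-level faceting: one subplot per species
g = sns.relplot(data=df, x="sepal_length", y="sepal_width",
                col="species", kind="scatter", alpha=0.6)
```

Note that relplot is a figure-level function: it creates its own figure and returns a FacetGrid, so it does not accept an ax argument.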
When to Use
- Always: for exploring correlations between two continuous variables.
- Consider: adding transparency or jitter when points overlap heavily.
- Avoid: when you have too many points (>10,000) without aggregation; use hexbin or datashader instead.
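For the large-data case, a hexbin plot aggregates points into hexagonal cells and colors each cell by its count. The sketch below uses Matplotlib's hexbin on synthetic data; the sample size, gridsize=40, and colormap are illustrative assumptions, not values from the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic correlated data standing in for a large dataset (hypothetical)
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

fig, ax = plt.subplots()
# Each hexagon is colored by how many points fall inside it;
# mincnt=1 leaves empty hexagons blank
hb = ax.hexbin(x, y, gridsize=40, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per bin")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

A larger gridsize gives finer cells at the cost of noisier counts per cell.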
See Also
- Seaborn: scatterplot(), regplot()
- Matplotlib: scatter()