Introduction
Welcome to the Data Visualization Techniques Demo!
This project showcases a comprehensive set of data visualization techniques using popular Python libraries:
- Matplotlib — static plotting and customization
- Seaborn — statistical visualization with pandas integration
- Plotly — interactive web-based charts
- Datashader — rendering very large datasets efficiently
What's Included
- A Jupyter Notebook (data-visualization-demo.ipynb) with live, runnable examples across all major visualization types.
- Interactive HTML exports for Plotly charts that work without a running Python kernel.
- Automated helpers to export the notebook to PDF using nbconvert or headless Chromium.
- Examples of exporting animations to MP4/GIF and saving static plots as PNG.
- Comprehensive documentation (this site) for quick reference.
Key Features
- Multi-library coverage — Matplotlib for low-level control, Seaborn for quick statistical plots, Plotly for interactivity.
- Large dataset handling — Datashader aggregation, with a Matplotlib hexbin fallback, for rendering hundreds of thousands of points.
- Export flexibility — Three approaches to PDF generation: LaTeX-based nbconvert, browser print, or headless Chromium.
- Animation support — FuncAnimation examples with MP4/GIF export options.
- Reproducible environment — Pinned dependencies in requirements.txt for exact reproduction.
Quick Start
- Clone or download this repository.
- Set up a Python virtual environment and install dependencies (see Installation).
- Launch Jupyter and open the notebook (see Running the Notebook).
- Run cells to explore visualization techniques and export examples.
For detailed instructions, see Getting Started.
Next Steps
- Explore Visualization Techniques for explanations of each chart type.
- Learn about Exporting & PDF Generation to create shareable documents.
- Refer to Troubleshooting if you run into issues.
Happy visualizing!
Getting Started
This section covers how to set up your environment and run the notebook.
Prerequisites
- Python 3.9 or later (3.10+ recommended)
- pip (usually comes with Python; verify with pip --version)
- A terminal or command prompt
- ~2GB of disk space (for venv + dependencies)
System Requirements
The notebook can run on any modern OS (Linux, macOS, Windows). Some optional features require extra system packages (see below).
What You'll Need
- Python packages — listed in requirements.txt (matplotlib, seaborn, plotly, jupyter, datashader, etc.).
- Optional: LaTeX — for direct PDF export via nbconvert --to pdf. Install TeX Live (Linux) or MacTeX (macOS).
- Optional: Chromium/Chrome — for headless HTML->PDF conversion (alternative to LaTeX).
- Optional: ffmpeg — for exporting animations to MP4 format.
Next Steps
- Installation — detailed install instructions by platform.
- Running the Notebook — how to start Jupyter and open the notebook.
Installation
Python Packages (Required)
Using a Virtual Environment (Recommended)
Create a fresh Python environment to isolate this project's dependencies:
cd /path/to/data-visualization-demo
python -m venv .venv
Activate the virtual environment:
- Linux/macOS: source .venv/bin/activate
- Windows (PowerShell): .\.venv\Scripts\Activate.ps1
- Windows (Command Prompt): .venv\Scripts\activate.bat
Install Dependencies
With the virtual environment activated, install all required Python packages:
pip install -r requirements.txt
This installs ~75 packages including:
- Jupyter Lab / Notebook (latest)
- Matplotlib 3.10.7 — static plotting
- Seaborn 0.13.2 — statistical visualization
- Plotly 6.4.0 — interactive web charts
- Pandas 2.3.3 — data manipulation
- NumPy 2.3.4 — numerical computing
- Datashader 0.18.2 — large dataset aggregation
- Statsmodels 0.14.5 — regression and statistics
- And ~50+ supporting libraries (see requirements.txt for full list).
System Packages (Optional)
Depending on which features you want to use, you may need to install additional system-level packages.
LaTeX (for Direct PDF Export)
Required for jupyter nbconvert --to pdf.
Debian/Ubuntu:
sudo apt update
sudo apt install -y texlive-xetex texlive-fonts-recommended texlive-latex-recommended
macOS (with Homebrew):
brew install mactex
Windows: Download and install MiKTeX or TeX Live from their official sites.
Chromium / Google Chrome (for Headless HTML→PDF)
Alternative to LaTeX; recommended for environments where TeX is difficult to install.
Debian/Ubuntu:
sudo apt update
sudo apt install -y chromium-browser
macOS (with Homebrew):
brew install chromium
Windows: Download from Google Chrome or Chromium.
ffmpeg (for Animation Export to MP4)
Debian/Ubuntu:
sudo apt install -y ffmpeg
macOS (with Homebrew):
brew install ffmpeg
Windows: Download from ffmpeg.org or install via Chocolatey:
choco install ffmpeg
Verification
To verify your installation, activate your virtual environment and run:
python -c "import jupyter, matplotlib, seaborn, plotly; print('All core packages imported successfully!')"
jupyter --version
You should see version information for Jupyter and no errors.
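The same check can be scripted so that missing packages are reported by name instead of failing on the first import. The following is a minimal sketch, not part of the notebook; the helper names check_packages and check_tools and the two lists are assumptions chosen for illustration.

```python
import importlib.util
import shutil

# Hypothetical lists for illustration; adjust them to match requirements.txt.
CORE_PACKAGES = ["matplotlib", "seaborn", "plotly", "pandas", "numpy"]
OPTIONAL_TOOLS = ["xelatex", "ffmpeg"]


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_tools(names):
    """Map each executable name to True if it is found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(CORE_PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    for name, found in check_tools(OPTIONAL_TOOLS).items():
        print(f"{name:12s} {'ok' if found else 'not found (optional)'}")
```

Run it with the virtual environment activated; a MISSING entry usually means pip install -r requirements.txt was run in a different environment.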
Next: Running the Notebook
Once installed, see Running the Notebook to get started.
Running the Notebook
Prerequisites
Ensure you have:
- Installed Python packages (see Installation).
- Activated your virtual environment.
Starting Jupyter
From the project directory, activate your environment and launch Jupyter:
Using Jupyter Lab (Recommended)
jupyter lab
Your browser should open automatically to http://localhost:8888 (or a similar URL with a token).
Using Jupyter Notebook
jupyter notebook
Opening the Notebook
- In the Jupyter file browser, navigate to and click on data-visualization-demo.ipynb.
- The notebook will open in a new tab.
Running Cells
- Run a single cell: Click the cell and press Ctrl+Enter (or Cmd+Enter on macOS).
- Run all cells: From the menu, select Run → Run All Cells.
- Run from a specific point: Click a cell and select Run → Run All Below.
Key Sections
The notebook is organized as follows:
- Setup — Import libraries and install missing dependencies (run this first).
- Load Datasets — Load iris, tips, and flights datasets from Seaborn.
- Visualization Examples — Various chart types (line, scatter, histogram, heatmap, etc.).
- Interactive Plots — Plotly examples with HTML export fallbacks.
- Animations — Matplotlib FuncAnimation with MP4/GIF export.
- Exporting to PDF — Multiple approaches for notebook→PDF conversion.
Troubleshooting
"Kernel died" or "Import error"
- Ensure you activated the virtual environment before launching Jupyter.
- Try restarting Jupyter and running the Setup cell again.
Plots not showing inline
- Confirm %matplotlib inline is in the setup cell and has been executed.
Memory or performance issues
- The datashader example creates a large synthetic dataset. Reduce the mult variable in that cell if your system is resource-constrained.
For more help, see Troubleshooting.
Visualization Techniques
This section describes each visualization technique covered in the notebook.
Overview
The notebook includes examples for:
- Line Plots & Time Series — temporal data with rolling averages and confidence bands.
- Scatter Plots & Regression — bivariate relationships with regression overlays.
- Histograms & KDE — univariate distributions and density estimation.
- Box & Violin Plots — comparing distributions across categories.
- Heatmaps & Correlations — showing correlation matrices and multivariate patterns.
- Interactive Plotly Charts — web-based interactive exploration.
- Datashader — efficient rendering of large point clouds.
- Animations — time-based or parameter-sweep animations.
Each subsection below provides details on the purpose, use cases, and customization options.
Quick Links
- Line Plots & Time Series
- Scatter Plots & Regression
- Histograms & KDE
- Box & Violin Plots
- Heatmaps & Correlations
- Interactive Plots with Plotly
- Datashader for Large Datasets
- Animations
Line Plots & Time Series
Purpose
Line plots are ideal for showing trends over time or continuous sequences. They're particularly effective for:
- Time series data (stock prices, temperature, passenger counts).
- Showing changes and trends across ordered categories.
- Comparing multiple time series on the same axes.
Example in the Notebook
The notebook uses the flights dataset to demonstrate:
- Raw monthly passenger counts as a line.
- A 12-month rolling average (smoothed trend).
- Confidence bands (rolling mean ± standard deviation).
Key Code Snippet
ts = fl.set_index('date')['passengers'].sort_index()
# Use .to_numpy() for compatibility with modern matplotlib type hints
ax.plot(ts.index.to_numpy(), ts.to_numpy(), label='passengers', color='C0')
rolling = ts.rolling(window=12).mean()
ax.plot(rolling.index.to_numpy(), rolling.to_numpy(), label='12-month rolling mean', color='C1')
rolling_std = ts.rolling(window=12).std()
ax.fill_between(ts.index.to_numpy(), (rolling - rolling_std).to_numpy(), (rolling + rolling_std).to_numpy(),
color='C1', alpha=0.2)
Customization Tips
- Adjust rolling window: Change window=12 to a different period (e.g., 6 for half-yearly).
- Band width: Widen the band to 2× the rolling standard deviation to cover roughly 95% of values, assuming approximately normal variation around the rolling mean.
- Multiple series: Plot multiple lines on the same axes with different colors and labels.
- Markers: Add marker='o' to ax.plot() to show data points.
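To make these tips concrete, here is a small self-contained sketch that applies a shorter window, a 2× band, and markers. It uses a synthetic monthly series rather than the flights dataset, so the data, the variable names window and k, and the Agg backend choice are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a monthly time series (not the flights data)
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=48, freq="MS")
ts = pd.Series(100 + np.arange(48) + rng.normal(0, 5, 48), index=dates)

window = 6  # half-yearly rolling window
k = 2       # band half-width in rolling standard deviations

rolling = ts.rolling(window=window).mean()
rolling_std = ts.rolling(window=window).std()
lower = rolling - k * rolling_std
upper = rolling + k * rolling_std

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(ts.index.to_numpy(), ts.to_numpy(), marker="o", markersize=3,
        label="synthetic series", color="C0")
ax.plot(rolling.index.to_numpy(), rolling.to_numpy(),
        label=f"{window}-month rolling mean", color="C1")
ax.fill_between(ts.index.to_numpy(), lower.to_numpy(), upper.to_numpy(),
                color="C1", alpha=0.2, label=f"±{k} rolling std")
ax.legend()
```

The first window - 1 values of the rolling statistics are NaN, so the smoothed line and band start a few months after the raw series.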
When to Use
- Always: for time-indexed data.
- Consider: when you have many repeated measurements and want to show a trend.
- Avoid: for categorical data without a natural order (use bar plots instead).
See Also
- Matplotlib documentation: plt.plot()
- Pandas rolling: Series.rolling()
Scatter Plots & Regression
Purpose
Scatter plots reveal relationships between two continuous variables. Adding a regression line or smoothed curve helps identify trends and fit patterns.
Example in the Notebook
The notebook uses the iris dataset to plot:
- Sepal length vs. sepal width as a scatter plot, colored and styled by species.
- A LOWESS (Locally Weighted Scatterplot Smoothing) regression curve overlay.
Key Code Snippet
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species', style='species', ax=ax, s=60)
sns.regplot(data=iris, x='sepal_length', y='sepal_width', scatter=False, ax=ax,
color='gray', lowess=True)
Customization Tips
- Point size: Change s=60 to adjust marker size.
- Regression method: Use lowess=True for non-parametric smoothing or omit for linear regression.
- Color palettes: Customize the hue palette with palette='Set2' or others.
- Transparency: Add alpha=0.6 to scatter points for overlapping data visibility.
- Faceting: Use sns.relplot(..., col='species') to create subplots per category.
When to Use
- Always: for exploring correlations between two continuous variables.
- Consider: adding transparency or jitter when points overlap heavily.
- Avoid: when you have too many points (>10,000) without aggregation; use hexbin or datashader instead.
See Also
- Seaborn: scatterplot(), regplot()
- Matplotlib: scatter()
Histograms & KDE
Purpose
Histograms and kernel density estimation (KDE) show the distribution of a single continuous variable. They help identify:
- Central tendency (mode, median).
- Spread and skewness.
- Presence of multiple modes or outliers.
Example in the Notebook
The notebook plots the distribution of total bill from the tips dataset:
- Histogram with 30 bins.
- Overlaid KDE curve.
- Normalized to density (area under curve = 1).
Key Code Snippet
# Use data= and x= parameters for modern seaborn API
sns.histplot(data=tips, x='total_bill', bins=30, kde=True, stat='density', ax=ax, color='C2')
ax.set_title('Distribution of total bill (tips dataset)')
Customization Tips
- Bin count: Increase bins=30 for more detail or decrease for a smoother view.
- Stat type: Use stat='count' for raw counts, stat='density' for normalized density, or stat='percent'.
- KDE bandwidth: Enable the curve with kde=True and fine-tune its smoothing by passing bw_adjust through kde_kws (e.g., kde_kws={'bw_adjust': 0.5}).
- Multiple groups: Use hue='day' to overlay distributions by category.
- Stacking: Set multiple='stack' or multiple='dodge' for grouped histograms.
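To see what the stat options mean numerically, the sketch below reproduces the density and percent normalizations with NumPy alone. The gamma-distributed sample is a synthetic stand-in for total_bill, and all variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic, right-skewed sample standing in for total_bill
rng = np.random.default_rng(1)
values = rng.gamma(shape=2.0, scale=10.0, size=244)

# Density normalization: bar height * bar width sums to 1
density, edges = np.histogram(values, bins=30, density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

# Percent normalization: bar heights sum to 100
counts, _ = np.histogram(values, bins=30)
percent = 100 * counts / counts.sum()

print(f"area under density histogram: {area:.6f}")
print(f"sum of percent bars: {percent.sum():.1f}")
```

Because a density histogram integrates to 1, it shares a y-axis scale with a KDE curve, which is why the notebook pairs kde=True with stat='density'.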
When to Use
- Always: for exploring univariate distributions.
- Consider: KDE for smooth, continuous estimates.
- Avoid: histograms for very small samples (< 20 observations).
See Also
- Seaborn: histplot()
- Matplotlib: hist()
Box & Violin Plots
Purpose
Box plots and violin plots compare distributions across groups. They highlight:
- Median and quartiles (box plot).
- Full distribution shape (violin plot).
- Outliers and range.
Example in the Notebook
The notebook displays total bill by day using:
- A box plot (left panel) — shows median, IQR, whiskers, and outliers.
- A violin plot (right panel) — shows the full distribution shape.
- An overlaid swarm plot (right panel) — individual data points.
Key Code Snippet
sns.boxplot(data=tips, x='day', y='total_bill', hue='day', dodge=False, ax=axes[0], palette='pastel')
sns.violinplot(data=tips, x='day', y='total_bill', hue='day', split=False, inner=None, ax=axes[1], palette='muted')
sns.swarmplot(data=tips, x='day', y='total_bill', color='k', size=3, ax=axes[1])
Customization Tips
- Boxplot parts: Mark the mean with showmeans=True, and show or hide outlier points with showfliers=True or showfliers=False.
- Violin plot shape: Use split=True to draw two hue groups as the two halves of each violin (the hue variable must have exactly 2 levels).
- Inner representation: Set inner='box', inner='quartile', or inner=None (violin only).
- Point overlay: Add sns.swarmplot() or sns.stripplot() for individual observations.
- Orient: Use orient='h' for horizontal layout.
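The quantities a box plot draws can be computed directly, which helps when reading one. The sketch below assumes the default 1.5 × IQR whisker rule (whis=1.5 in Matplotlib and Seaborn); the function name box_stats and the sample values are hypothetical.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return the median, quartiles, whisker ends, and outliers of a sample."""
    data = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations still inside the fences
    inside = data[(data >= low_fence) & (data <= high_fence)]
    outliers = data[(data < low_fence) | (data > high_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

In this sample the value 100 lies beyond the upper fence (7.75 + 1.5 × 4.5 = 14.5), so it would be drawn as an outlier point while the upper whisker stops at 9.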
When to Use
- Box plots: quick comparison of medians and spreads across many groups.
- Violin plots: when distribution shape matters more than individual points.
- Swarm overlay: when sample size is small (< 50 per group) and you want to see all points.
See Also
- Seaborn: boxplot(), violinplot(), swarmplot()
Heatmaps & Correlations
Purpose
Heatmaps visualize two-dimensional data (matrices) using color intensity. Common uses:
- Correlation matrices (how variables relate).
- Confusion matrices (classification performance).
- Time-series aggregations (e.g., activity by hour and day).
Example in the Notebook
The notebook computes and displays the correlation matrix of iris features:
- Sepal length, sepal width, petal length, petal width.
- Uses a diverging color palette (coolwarm) centered at 0.
- Annotations show exact correlation values.
Key Code Snippet
corr = iris.select_dtypes(include=np.number).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, ax=ax)
Customization Tips
- Color map: Use cmap='viridis' (perceptually uniform), cmap='RdBu_r' (diverging), or cmap='Greens' (sequential).
- Annotation: Set annot=True to show values, or annot=False to hide.
- Centering: Use center=0 for diverging colormaps to anchor at a meaningful value.
- Normalization: Use vmin and vmax to control the color scale range.
- Clustermap: Use sns.clustermap() to hierarchically cluster rows/columns.
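As a rough illustration of the data behind such a heatmap, the sketch below builds a correlation matrix from synthetic columns and a mask that hides the redundant upper triangle. The DataFrame, its column names, and the mask idea are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic numeric columns with known relationships, plus one text column
rng = np.random.default_rng(7)
base = rng.normal(size=150)
df = pd.DataFrame({
    "a": base,
    "b": base * 0.8 + rng.normal(scale=0.3, size=150),  # positively related to a
    "c": -base + rng.normal(scale=0.5, size=150),       # negatively related to a
    "label": ["x"] * 150,                               # non-numeric, excluded below
})

# Same selection pattern as the notebook snippet: numeric columns only
corr = df.select_dtypes(include=np.number).corr()

# True above the diagonal: those cells duplicate the lower triangle
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
lower = corr.mask(mask)

print(lower.round(2))
```

The same boolean array can be passed to sns.heatmap(corr, mask=mask, annot=True, cmap='coolwarm', center=0) to draw only the lower triangle.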
When to Use
- Always: for visualizing correlations and matrices.
- Consider: clustering when you want to identify related variables/samples.
- Avoid: heatmaps for more than ~20×20 cells without annotation clarity.
See Also
- Seaborn: heatmap(), clustermap()
- Matplotlib: imshow()
Interactive Plots with Plotly
Purpose
Interactive plots enable users to explore data dynamically via hover tooltips, zooming, panning, and legend toggling. Plotly is a popular library for building web-based interactive charts.
Example in the Notebook
The notebook creates two Plotly Express charts:
- Scatter plot: total bill vs. tip, colored by day, with hover data (sex, party size).
- Line plot: monthly passenger counts over time.
Both are exported to HTML files that can be opened in any web browser.
Key Code Snippet
import plotly.express as px
fig = px.scatter(tips, x='total_bill', y='tip', color='day',
hover_data=['sex', 'size'], title='Tips: total bill vs tip')
fig.write_html('interactive_tips.html', include_plotlyjs='cdn')
Features
- Hover tooltips: Show data values and custom labels.
- Zoom & pan: Click and drag to zoom; double-click to reset.
- Legend toggling: Click legend items to show/hide traces.
- Download: Camera icon to save as PNG.
- Export: Save as HTML for sharing or embedding in reports.
Customization Tips
- Markers: Map marker size to a column with size='size', or set a fixed size with fig.update_traces(marker=dict(size=10)).
- Colors: Use color='day' for categorical or color_continuous_scale='Viridis' for continuous.
- Faceting: Add facet_row='day' or facet_col='day' for subplots.
- Annotations: Use fig.add_annotation() to add text or arrows.
- Custom styling: Modify fig.update_layout() and fig.update_traces().
When to Use
- Always: for exploratory data analysis and interactive dashboards.
- Consider: exporting to HTML for stakeholder sharing (no Python environment needed).
- Avoid: Plotly for very large datasets (>50k points) without aggregation; consider datashader instead.
See Also
- Plotly Express: scatter(), line()
- Plotly documentation: plotly.com
Datashader for Large Datasets
Purpose
When you have hundreds of thousands or millions of data points, traditional scatter plots become too slow and visually cluttered. Datashader rasterizes and aggregates points into a grid for efficient, artifact-free rendering.
Use Cases
- Stock tick data (millions of prices per second).
- Sensor readings from IoT devices.
- Geographic data points (billions of GPS coordinates).
- Scientific simulations with dense output.
Example in the Notebook
The notebook demonstrates datashader by:
- Creating a synthetic dataset (tips repeated 2000 times with jitter) — ~500k rows.
- Aggregating points into a canvas grid.
- Shading the grid by point count using a colormap.
- Falling back to Matplotlib hexbin if datashader is unavailable.
Key Code Snippet
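import colorcet  # provides the m_fire colormap used below
import datashader as ds
import datashader.transfer_functions as tf
# 'big' is the large DataFrame with numeric 'x' and 'y' columns
cvs = ds.Canvas(plot_width=800, plot_height=400)
agg = cvs.points(big, 'x', 'y', ds.count())  # count points per pixel
img = tf.shade(agg, cmap=colorcet.m_fire, how='eq_hist')  # histogram-equalized shading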
Customization Tips
- Canvas size: Adjust plot_width and plot_height for resolution.
- Aggregation: Use ds.count() (default), ds.mean('col'), ds.sum('col'), or custom reductions.
- Colormaps: Choose from the colorcet library (perceptually uniform): m_fire, m_bgy, etc.
- Normalization: Use how='eq_hist' for histogram equalization or how='linear' for simple scaling.
- Fallback: The notebook includes a Matplotlib hexbin fallback (ax.hexbin()) if datashader is not installed.
Installation
If datashader is not in your environment:
pip install datashader colorcet
When to Use
- Always: for datasets with >100k points.
- Consider: for exploratory analysis of massive datasets.
- Avoid: if you need individual point interactivity (use aggregated tooltips instead).
See Also
- Datashader: datashader.org
- Colorcet: colorcet library
Animations
Purpose
Animations bring data to life by showing changes over time or through parameter space. They're useful for:
- Illustrating temporal evolution.
- Demonstrating algorithm convergence.
- Creating engaging presentations or educational content.
Example in the Notebook
The notebook uses Matplotlib's FuncAnimation to create a simple animated sine wave:
- The wave travels horizontally as its phase shifts on each frame.
- The animation runs for 200 frames at 30 ms per frame.
- It's displayed inline as HTML/JavaScript in the notebook.
- It's also exported to MP4 (with ffmpeg) or GIF (with Pillow).
Key Code Snippet
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 400)
line, = ax.plot(x, np.sin(x))
ax.set_ylim(-1.5, 1.5)

def init():
    # Draw the first frame
    line.set_ydata(np.sin(x))
    return (line,)

def animate(i):
    # Shift the phase so the wave travels along the x-axis
    line.set_ydata(np.sin(x + i/10.0))
    return (line,)

anim = animation.FuncAnimation(fig, animate, init_func=init, frames=200, interval=30, blit=True)
Export Options
- Inline (Jupyter): Display directly with HTML(anim.to_jshtml()) (no external files).
- MP4: Requires ffmpeg. Use anim.save('out.mp4', writer='ffmpeg', fps=30).
- GIF: Requires Pillow. Use anim.save('out.gif', writer='pillow', fps=30).
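Choosing between the two file formats can be automated by asking Matplotlib which writers are usable. A minimal sketch follows; the function name pick_writer and the file stem are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
from matplotlib import animation


def pick_writer(stem="out"):
    """Return (writer_name, filename): MP4 via ffmpeg if usable, else GIF via Pillow."""
    if animation.writers.is_available("ffmpeg"):
        return "ffmpeg", f"{stem}.mp4"
    return "pillow", f"{stem}.gif"


writer, filename = pick_writer("sine_wave")
print(f"would save with writer={writer!r} to {filename}")
```

With an existing FuncAnimation object named anim, the export is then anim.save(filename, writer=writer, fps=30).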
Customization Tips
- Frame count: Increase frames=200 for longer animations.
- Speed: Adjust interval=30 (milliseconds between frames; smaller = faster).
- Blit mode: Set blit=True for faster rendering (only redraws changed elements).
- Complex animations: Write a custom update function that modifies several artists (lines, patches, text) and returns all of them.
- Interactivity: Combine with Jupyter widgets (ipywidgets) for interactive parameter control.
When to Use
- Always: for time-series visualizations and pedagogical content.
- Consider: GIF for social media sharing (MP4 may have compatibility issues on some platforms).
- Avoid: animations for static reports (use stills or summaries instead).
See Also
- Matplotlib: FuncAnimation
- Examples: matplotlib.org/gallery/animation/
Exporting & PDF Generation
Overview
Converting a Jupyter notebook to a shareable PDF is a common task. This documentation covers three approaches:
- Direct PDF via nbconvert (simplest, requires LaTeX).
- HTML export + browser print (universal, manual step).
- Automated headless Chromium (modern, no LaTeX).
Comparison
| Method | Ease | Speed | Requirements | Notes |
|---|---|---|---|---|
| nbconvert PDF | Easy | Fast | LaTeX | Fails if LaTeX missing |
| HTML + Print | Medium | Medium | Browser | Manual step; good UI control |
| Chromium | Medium | Fast | Chromium | Fully automated; modern alternative to LaTeX |
Choosing the Right Method
- If LaTeX is installed: Use nbconvert for simplicity.
- If you prefer a UI: Export to HTML and use browser Print dialog.
- If you want full automation without LaTeX: Use headless Chromium.
All three methods are implemented as helper cells in the notebook for easy access.
Direct PDF via nbconvert
Overview
nbconvert (version 7.16 or later) is Jupyter's official tool for converting notebooks to various formats. The --to pdf option generates a PDF via a LaTeX intermediate, so it requires both Pandoc and LaTeX.
Prerequisites
- Jupyter and nbconvert installed (both in requirements.txt).
- LaTeX installed on your system (see Installation).
CLI Usage
From the project directory:
jupyter nbconvert --to pdf data-visualization-demo.ipynb
This creates data-visualization-demo.pdf in the same directory.
Options
- --output-dir=./output — specify output directory.
- --no-input — exclude input cells (show only output).
- --template=classic — use a different template.
- --execute — re-run all cells before export (slow for large notebooks).
Example:
jupyter nbconvert --to pdf --no-input --output-dir=./pdf-output data-visualization-demo.ipynb
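When the export runs from a script, the command line can be assembled in Python. The sketch below is one possible wrapper, not the notebook's helper cell; the function names and the HTML fallback rule are assumptions, while the flags are the ones listed above.

```python
import shutil
import subprocess


def build_nbconvert_cmd(notebook, to="pdf", no_input=False, output_dir=None):
    """Assemble the jupyter nbconvert argument list without running it."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if no_input:
        cmd.append("--no-input")
    if output_dir:
        cmd.append(f"--output-dir={output_dir}")
    cmd.append(notebook)
    return cmd


def export_notebook(notebook, **options):
    """Export to PDF when xelatex is on the PATH, otherwise fall back to HTML."""
    to = "pdf" if shutil.which("xelatex") else "html"
    cmd = build_nbconvert_cmd(notebook, to=to, **options)
    return subprocess.run(cmd, check=False).returncode


cmd = build_nbconvert_cmd("data-visualization-demo.ipynb",
                          no_input=True, output_dir="./pdf-output")
print(" ".join(cmd))
```

Keeping command construction separate from execution makes the wrapper easy to test without LaTeX installed.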
Troubleshooting
Error: "xelatex not found"
- LaTeX is not installed. See Installation for platform-specific steps.
Error: "PDF was not created"
- Complex plots or Unicode may cause LaTeX issues. Try simplifying or using alternative fonts.
- Use --debug flag for verbose output: jupyter nbconvert --debug --to pdf ...
Performance:
- First run is slow (LaTeX compilation takes ~30s). Subsequent runs are faster.
When to Use
- Simple, static documents with standard plots.
- Archival: guaranteed PDF output format (backward compatible).
- CI/CD pipelines: if LaTeX is already available in your environment.
See Also
- nbconvert: nbconvert.readthedocs.io
HTML Export + Browser Print
Overview
This method exports the notebook to HTML (widely supported) and then uses your web browser's Print dialog to save as PDF. It's universal, works everywhere, and gives you control over page setup.
Step 1: Export to HTML
From the project directory:
jupyter nbconvert --to html data-visualization-demo.ipynb
This creates data-visualization-demo.html.
Step 2: Open in a Browser
Open data-visualization-demo.html in your preferred web browser (Chrome, Firefox, Safari, Edge, etc.).
Step 3: Print to PDF
- Open Print dialog: Ctrl+P (Windows/Linux) or Cmd+P (macOS).
- Configure:
  - Destination: "Save as PDF" (Chrome/Edge) or "Print to File" (Firefox).
  - Margins: Choose "None" to minimize whitespace.
  - Paper size: A4 or Letter as desired.
  - Background graphics: Check this if you want background colors (it is usually off by default in Chrome/Edge).
- Save: Choose a filename and click "Save."
Advantages
- No extra software: Works with any browser.
- Visual control: Adjust margins, headers, footers in the Print dialog.
- Compatibility: HTML is backward-compatible and future-proof.
- Responsive: Modern browsers handle layout well.
Troubleshooting
Interactive plots don't appear
- Plotly and other interactive content render only if JavaScript is enabled in your browser. Check browser console for JS errors.
Headers/footers in printed PDF
- Adjust in the Print dialog. Some browsers allow custom headers/footers.
Page breaks
- Long notebooks may split awkwardly. Use CSS (custom notebook template) to control page breaks if needed.
See Also
- Notebook Print Stylesheet: Jupyter docs
Chromium Headless
Overview
Headless Chromium (or Google Chrome) can render HTML and export to PDF programmatically without a GUI. This is a modern, fast alternative to LaTeX and works reliably across platforms.
Prerequisites
- HTML file exported (see HTML Export + Browser Print).
- Chromium or Google Chrome installed (see Installation).
CLI Usage
Basic Command
chromium --headless --disable-gpu --print-to-pdf=output.pdf data-visualization-demo.html
Or with Google Chrome:
google-chrome --headless --disable-gpu --print-to-pdf=output.pdf data-visualization-demo.html
This creates output.pdf.
Options
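- --print-to-pdf=<path> — output PDF path (required).
- --print-to-pdf-margin-top=<mm> — top margin in millimeters.
- --print-to-pdf-margin-bottom=<mm> — bottom margin.
- --print-to-pdf-margin-left=<mm> — left margin.
- --print-to-pdf-margin-right=<mm> — right margin.
- --print-to-pdf-paper-width=<mm> — paper width (default 210 for A4).
- --print-to-pdf-paper-height=<mm> — paper height (default 297 for A4).
- --print-to-pdf-prefer-css-page-size — respect CSS page size if defined.
Support for the margin and paper-size switches varies between Chromium versions, and unrecognized switches are silently ignored. If they have no effect, set margins and page size with a CSS @page rule in the HTML instead.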
Example with custom margins:
chromium --headless --disable-gpu \
--print-to-pdf=output.pdf \
--print-to-pdf-margin-top=10 \
--print-to-pdf-margin-bottom=10 \
data-visualization-demo.html
Automated in the Notebook
The notebook includes a Python cell that automates this process:
- Looks for Chromium/Chrome on your PATH.
- Detects the HTML file.
- Runs the headless print-to-pdf command.
- Reports success or suggests alternatives.
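The steps above can be sketched as follows. This is an illustrative reconstruction, not the notebook's actual cell; the candidate executable names and all function names are assumptions.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical executable names to search for; extend for your platform.
CANDIDATES = ("chromium", "chromium-browser", "google-chrome", "google-chrome-stable")


def find_browser(candidates=CANDIDATES):
    """Return the path of the first candidate found on the PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_print_cmd(browser, html_path, pdf_path):
    """Assemble the headless print-to-pdf command for a local HTML file."""
    return [browser, "--headless", "--disable-gpu",
            f"--print-to-pdf={pdf_path}",
            Path(html_path).resolve().as_uri()]  # pass the page as a file:// URL


def html_to_pdf(html_path, pdf_path="output.pdf"):
    """Run the conversion; return True only if a PDF was produced."""
    browser = find_browser()
    if browser is None:
        print("No Chromium/Chrome on PATH; try nbconvert or the browser Print dialog.")
        return False
    if not Path(html_path).is_file():
        print(f"HTML file not found: {html_path}")
        return False
    result = subprocess.run(build_print_cmd(browser, html_path, pdf_path), check=False)
    return result.returncode == 0 and Path(pdf_path).is_file()


cmd = build_print_cmd("chromium", "data-visualization-demo.html", "output.pdf")
print(cmd)
```

Calling html_to_pdf('data-visualization-demo.html') performs the search, the checks, and the conversion in one step.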
Advantages
- No LaTeX: Avoids system dependency complexity.
- Fast: PDF generation is quick (seconds).
- Reliable: Chromium's rendering engine is modern and well-tested.
- Portable: Works on Linux, macOS, Windows.
- Customizable: Fine control over margins, page size, and other parameters.
Troubleshooting
Error: "Chromium not found"
- Install Chromium or Chrome (see Installation).
- Verify it's on your PATH: which chromium or which google-chrome.
Sandbox errors (Docker/WSL)
- If sandboxing fails, try adding --no-sandbox (be aware of the security implications): chromium --headless --no-sandbox --disable-gpu --print-to-pdf=output.pdf file.html
PDF is blank
- Ensure the HTML file is valid and self-contained. Plotly charts with CDN URLs should work.
- Try opening the HTML in a browser first to confirm it renders.
See Also
- Chromium docs: headless-mode
- Supported flags: chromium --help | grep print-to-pdf
Troubleshooting
Common Issues & Solutions
Jupyter / Kernel Issues
"Kernel died" or "Connection lost"
Symptoms: Jupyter stops responding or crashes.
Solutions:
- Restart the kernel: Kernel → Restart in Jupyter menu.
- Ensure you activated the virtual environment before launching Jupyter.
- Check available system memory; large datashader computations consume RAM.
- Update Jupyter: pip install --upgrade jupyter ipykernel.
"ImportError: No module named 'X'" or "ModuleNotFoundError"
Symptoms: A cell fails with ImportError or ModuleNotFoundError for a library.
Solutions:
- Install the missing package: pip install <package_name> or pip install -r requirements.txt for full environment.
- Ensure the virtual environment is activated: which python should show .venv/bin/python.
- Restart the kernel after installing: Kernel → Restart.
- Verify the package is in your environment: pip list | grep <package_name>.
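Inside a notebook, which python is less useful because the kernel may use a different interpreter than the shell. A small check that can be run in a cell is sketched below; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If the printed interpreter path is not inside .venv, the kernel is running outside the project environment, and packages installed there will not be importable.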
Plotting Issues
Plots not showing inline
Symptoms: Code runs but no plot appears in the notebook.
Solutions:
- Ensure %matplotlib inline is in the setup cell and has been executed (it's in cell 1).
- Verify import matplotlib.pyplot as plt is run before plotting.
- Confirm matplotlib backend is set: %matplotlib inline should precede all plotting commands.
- For Jupyter Lab, ensure the Lab extensions are up to date: jupyter labextension list.
"No module named 'datashader'"
Symptoms: Datashader cell fails.
Solutions:
- Install datashader: pip install datashader colorcet.
- Restart the kernel.
- The notebook includes a fallback to Matplotlib hexbin.
Plotly charts show as blank boxes
Symptoms: Interactive plots don't render.
Solutions:
- Check browser console (F12) for JavaScript errors.
- Ensure JavaScript is enabled in your browser.
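- HTML exported with include_plotlyjs='cdn' loads plotly.js from the internet, so it needs a connection to render. Use include_plotlyjs=True to embed the library in the file for offline viewing.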
- Try exporting to HTML: the notebook handles this automatically.
Export Issues
"nbconvert: command not found"
Symptoms: jupyter nbconvert --to pdf ... fails.
Solutions:
- Ensure nbconvert is installed: pip install nbconvert.
- Verify you're in the virtual environment.
- Use the full path: /path/to/.venv/bin/jupyter nbconvert ....
PDF export fails with LaTeX errors
Symptoms: jupyter nbconvert --to pdf produces errors like "xelatex not found."
Solutions:
- Install LaTeX (see Installation).
- Use alternative export methods (HTML + browser print, or Chromium headless).
- Try --debug flag for detailed error messages.
Chromium headless produces blank or corrupted PDF
Symptoms: PDF is blank, garbled, or partially rendered.
Solutions:
- Verify the HTML file opens correctly in a browser.
- Check for complex CSS or JavaScript that may not be supported.
- Try with explicit margins: --print-to-pdf-margin-top=10.
- Upgrade Chromium: apt update && apt install chromium-browser (Linux).
Performance Issues
Notebook is very slow
Symptoms: Cells take a long time to execute.
Solutions:
- The datashader example creates a large synthetic dataset. Reduce mult=2000 to a smaller value.
- Disable inline plotting if not needed (no %matplotlib inline).
- Reduce plot resolution or number of points.
- Close unused tabs/applications to free system memory.
Animation export is very slow
Symptoms: anim.save() takes minutes or doesn't complete.
Solutions:
- Reduce number of frames: frames=100 instead of 200.
- Use GIF instead of MP4 (simpler encoding).
- Ensure ffmpeg is properly installed and on PATH.
Environment Issues
Virtual environment not activating
Symptoms: source .venv/bin/activate doesn't seem to work.
Solutions:
- Verify the venv exists: ls -la .venv/.
- Check shell type: echo $SHELL.
- For bash: . .venv/bin/activate (dot-space prefix).
- For PowerShell (Windows): .\.venv\Scripts\Activate.ps1.
- Check permissions: chmod +x .venv/bin/activate.
Different Python versions conflicting
Symptoms: python and python3 point to different versions; notebooks use the wrong interpreter.
Solutions:
- Always use the venv Python: which python should show .venv/bin/python.
- Create venv with explicit version: python3.10 -m venv .venv.
- In Jupyter, select the kernel from the venv: Kernel → Change Kernel.
Getting Help
- Jupyter docs: jupyter.readthedocs.io
- Matplotlib FAQ: matplotlib.org/faq
- Seaborn: seaborn.pydata.org
- Plotly: plotly.com
- GitHub Issues: Report bugs to respective library repositories.
Still Stuck?
- Check error messages carefully; they often suggest the fix.
- Search the library's documentation for your specific error.
- Search GitHub issues for similar problems.
- Consider running a fresh venv and re-installing all packages.
Good luck!