Fading Coder

Creating Density-Encoded Scatter Plots for Large Datasets in Python

Tech May 16 1

Density-encoded scatter plots (also called KDE scatter plots or density point plots) visualize 2D data distributions using color intensity instead of just overlapping points. Unlike standard scatter plots, which suffer from overplotting when handling thousands or more points, these charts use kernel density estimation (KDE) to calculate local data density and map it to a color scale. This makes underlying patterns, clusters, and sparse regions far easier to interpret at a glance.

Key technical concepts include:

  • Scatter plots: The foundational 2D data representation, mapping variable values to (x,y) coordinates. Overplotting occurs when points overlap excessively, obscuring distribution details.
  • Kernel Density Estimation (KDE): A non-parametric method that smooths individual observations using a kernel function (e.g., Gaussian) and a bandwidth parameter to estimate a continuous probability density function (PDF). This transforms discrete points into a continuous density surface.
  • Color mapping: Assigns colors to density values, with darker/richer hues representing higher point concentrations and lighter tones indicating lower density. Matplotlib provides a wide range of colormaps (e.g., viridis, RdBu, Spectral) to suit different visualization goals.
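To make the KDE idea concrete, here is a minimal 1D sketch that sums one Gaussian bump per observation and compares the result with scipy's gaussian_kde. The sample data, the grid, and the variable names are illustrative assumptions; the bandwidth is derived from the estimator's own factor attribute (Scott's rule by default) rather than hard-coded.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data: 200 draws from a standard normal (an assumption for this sketch)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-6.0, 6.0, 241)

# Reference estimate from scipy (uses Scott's rule for the bandwidth by default)
kde = gaussian_kde(samples)

# In 1D the kernel width is the sample standard deviation scaled by kde.factor
h = samples.std(ddof=1) * kde.factor

# Manual KDE: average of Gaussian kernels centred on each observation
diffs = (grid[:, None] - samples[None, :]) / h
manual_density = np.exp(-0.5 * diffs**2).sum(axis=1) / (
    samples.size * h * np.sqrt(2.0 * np.pi)
)

print("matches scipy:", np.allclose(manual_density, kde(grid)))
```

A smaller bandwidth gives a spikier estimate that follows individual points; a larger one gives a smoother surface that can blur separate clusters.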

Why use density scatter plots?

  1. Clarity for large datasets: Mitigates overplotting by encoding density as color, revealing trends hidden in dense point clusters.
  2. Pattern discovery: Intensity variations highlight clusters, gradients, and outliers, aiding exploratory data analysis.
  3. Anomaly and cluster detection: Sparse regions or isolated high-density spots may indicate errors or meaningful subgroups.
  4. Model validation: Compares observed vs. predicted values to spot prediction biases or inconsistencies.
  5. Customizability: Supports adjustments to colormaps, transparency, marker sizes, and integration with other visualization elements like contour lines.
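As a rough illustration of the contour-line integration mentioned in the last item, the sketch below evaluates a 2D KDE on a regular grid. The synthetic data, the 60 x 60 grid resolution, and the variable names are assumptions chosen for this example only.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative correlated 2D data (an assumption for this sketch)
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + rng.normal(size=500)

# Fit the 2D KDE; gaussian_kde expects an array of shape (n_dims, n_points)
kde = gaussian_kde(np.vstack([x, y]))

# Build a regular grid spanning the data range
xx, yy = np.meshgrid(
    np.linspace(x.min(), x.max(), 60),
    np.linspace(y.min(), y.max(), 60),
)

# Evaluate the density at every grid node and restore the grid shape
zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

print("grid shape:", zz.shape)
```

Passing the three arrays to Matplotlib, for example ax.contour(xx, yy, zz, levels=5), would draw iso-density lines over an existing scatter plot.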

Example: Density Scatter Plot with Polynomial Fit

The following code generates a synthetic 2D dataset, computes a KDE-based density value for every point, and overlays a 7th-degree polynomial fit to demonstrate the technique.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors, ticker
from scipy.stats import gaussian_kde

# Set random seed for reproducibility
np.random.seed(2025)

# Generate synthetic dataset: features (X) and targets (Y) with noise
features = np.random.normal(loc=0, scale=1, size=1000)
targets = features + np.random.normal(loc=0.1, scale=1, size=1000)

# Compute 2D kernel density estimate
data = np.vstack([features, targets])
density = gaussian_kde(data)
z_values = density(data)
# Sort by density so the densest points are drawn last and are not hidden by sparse ones
sort_indices = z_values.argsort()
features_sorted, targets_sorted, z_sorted = (
    features[sort_indices],
    targets[sort_indices],
    z_values[sort_indices],
)

# Create figure and axes
fig, ax = plt.subplots(figsize=(8, 5), dpi=120)

# Define color map (viridis_r: reverse of the popular viridis colormap)
cmap = "viridis_r"

# Plot density-encoded scatter points
sc = ax.scatter(
    features_sorted,
    targets_sorted,
    c=z_sorted,
    cmap=cmap,
    alpha=0.8,
)

# Fit 7th-degree polynomial to the data
poly_degree = 7
poly_coeff = np.polyfit(features_sorted, targets_sorted, poly_degree)
poly_func = np.poly1d(poly_coeff)
y_pred = poly_func(features_sorted)

# Calculate R-squared as the squared correlation between fitted and observed values
corr = np.corrcoef(y_pred, targets_sorted)[0, 1]
r_squared = corr ** 2
print(f"R-squared value for polynomial fit: {r_squared:.3f}")

# Plot polynomial fit line
x_range = np.linspace(features_sorted.min(), features_sorted.max(), 1000)
ax.plot(x_range, poly_func(x_range), color="#FF6B6B", linewidth=2.5)

# Configure axes
ax.set_xlabel("Feature Value", fontsize=12, labelpad=10)
ax.set_ylabel("Target Value", fontsize=12, labelpad=10)
ax.set_title("Density Scatter Plot with Polynomial Fit", fontsize=14, pad=15)

# Customize ticks and spines
ax.tick_params(
    axis="both",
    direction="out",
    labelsize=11,
    length=5,
    width=1.2,
)
ax.xaxis.set_minor_locator(ticker.AutoMinorLocator())
ax.yaxis.set_minor_locator(ticker.AutoMinorLocator())
for spine in ax.spines.values():
    spine.set_linewidth(1.2)

# Add density colorbar
norm = colors.Normalize(vmin=z_sorted.min(), vmax=z_sorted.max())
cbar = fig.colorbar(
    cm.ScalarMappable(norm=norm, cmap=cmap),
    ax=ax,
    label="Local Point Density",
    shrink=0.85,
)
cbar.ax.tick_params(labelsize=10)
cbar.locator = ticker.MaxNLocator(nbins=8)
cbar.update_ticks()

# Add R-squared annotation
ax.text(
    0.05,
    0.92,
    f"$R^2 = {r_squared:.3f}$",
    transform=ax.transAxes,  # axes-fraction coordinates keep the label inside the plot
    fontsize=12,
    fontweight="bold",
)

# Add grid and adjust layout
ax.grid(True, alpha=0.4, linestyle="--", color="#808080")
plt.tight_layout()

# Save and display
plt.savefig("density_scatter_plot.png", dpi=300, bbox_inches="tight")
plt.show()

This code generates synthetic data, calculates KDE density, plots the density scatter plot, fits a 7th-degree polynomial, computes R-squared, and customizes the visualization with labels, ticks, a colorbar, and a grid. The resulting plot clearly shows regions of high and low data density, along with the polynomial trend line.
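The example computes R-squared as the squared correlation between fitted and observed values. For a least-squares polynomial fit that includes a constant term and is scored on the same data it was fitted to, this should agree with the more familiar definition 1 - SS_res / SS_tot. The following sketch checks that on similar synthetic data; the variable names and the use of default_rng are assumptions for this illustration, so the numbers will not match the plot above exactly.

```python
import numpy as np

# Synthetic data similar to the example (generator and names are assumptions)
rng = np.random.default_rng(2025)
x = rng.normal(size=1000)
y = x + rng.normal(loc=0.1, scale=1.0, size=1000)

# Least-squares 7th-degree polynomial fit and its in-sample predictions
coeffs = np.polyfit(x, y, deg=7)
y_hat = np.polyval(coeffs, x)

# Definition 1: squared Pearson correlation between fitted and observed values
r2_corr = np.corrcoef(y_hat, y)[0, 1] ** 2

# Definition 2: one minus residual sum of squares over total sum of squares
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_ss = 1.0 - ss_res / ss_tot

print(f"squared correlation: {r2_corr:.6f}")
print(f"1 - SS_res/SS_tot:   {r2_ss:.6f}")
```

The two values can diverge when predictions are scored on held-out data or come from a model without an intercept, so the sum-of-squares form is usually the safer choice for model validation.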
