
Visualizing distributions

updated 2026-05-02 · 2 min read · #statistics #fundamentals #visualization

Most stats intuitions come from being able to picture a distribution. This note collects the half-dozen shapes that come up constantly and what each one looks like at a glance.

The normal distribution

The default. Mean $\mu$, variance $\sigma^2$:

$$
f(x \mid \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right)
$$

*Figure: standard normal distribution with sigma markers.*
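
A minimal sketch for regenerating a figure like this, assuming NumPy and Matplotlib; the styling choices are my own, not the original plot's:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 400)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, pdf)
for k in (1, 2, 3):
    ax.axvline(-k, linestyle="--", alpha=0.4)  # ±kσ markers
    ax.axvline(+k, linestyle="--", alpha=0.4)
ax.set_xlabel("x")
ax.set_ylabel("density")
plt.show()
```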

Useful facts:

  • $\approx 68\%$ of mass within $\pm 1\sigma$
  • $\approx 95\%$ within $\pm 2\sigma$
  • $\approx 99.7\%$ within $\pm 3\sigma$

The Central Limit Theorem says that sums of many independent, finite-variance random variables (suitably centered and scaled) tend toward a normal, which is why it shows up everywhere.
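
Both facts are easy to sanity-check by simulation. A minimal sketch, assuming NumPy; sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# 68–95–99.7 rule, checked empirically on standard normal draws
z = rng.normal(0, 1, 100_000)
for k in (1, 2, 3):
    print(f"within ±{k}σ: {np.mean(np.abs(z) <= k):.3f}")

# CLT in miniature: means of 50 exponential draws already look roughly normal
means = rng.exponential(1.0, size=(100_000, 50)).mean(axis=1)
print(f"mean ≈ {means.mean():.3f}, std ≈ {means.std():.3f}")  # expect ≈ 1 and ≈ 1/√50 ≈ 0.141
```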

Distribution zoo

| Distribution | Shape | Use when… | Parameters |
| --- | --- | --- | --- |
| Normal | bell | sums of many small effects | $\mu, \sigma$ |
| Log-normal | right-skewed | products of positive effects (incomes, file sizes) | $\mu, \sigma$ |
| Exponential | declining | time between independent events | $\lambda$ |
| Poisson | discrete bell | count of rare events in a window | $\lambda$ |
| Binomial | discrete bell | $k$ successes in $n$ trials | $n, p$ |
| Beta | flexible on $[0, 1]$ | Bayesian priors over probabilities | $\alpha, \beta$ |
| Power-law | heavy tail | "rich get richer" processes | $\alpha$ |
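
The discrete and bounded entries (Poisson, Binomial, Beta) aren't covered by the plotting code below; a minimal sketch of drawing from them with NumPy, where the parameter values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)

counts    = rng.poisson(lam=3.0, size=10_000)        # rare-event counts, λ = 3
successes = rng.binomial(n=20, p=0.3, size=10_000)   # k successes in n = 20 trials
probs     = rng.beta(a=2.0, b=5.0, size=10_000)      # values on [0, 1], e.g. a prior over p

print(counts.mean(), successes.mean(), probs.mean())  # ≈ 3, ≈ 6, ≈ 2/7
```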

Generating + plotting

A reasonable default workflow with NumPy and Matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# 10k draws from each of the shapes above
samples = {
    "normal":      rng.normal(0, 1, 10_000),
    "log-normal":  rng.lognormal(0, 0.5, 10_000),
    "exponential": rng.exponential(1.0, 10_000),
    "power-law":   rng.pareto(1.5, 10_000) + 1,
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (name, x) in zip(axes.flat, samples.items()):
    ax.hist(x, bins=60, density=True, alpha=0.7)
    ax.set_title(name)
    ax.set_xlim(np.quantile(x, [0.001, 0.999]))  # trim long tails
plt.tight_layout()
plt.show()
```

Tip: when plotting heavy-tailed distributions, switch to log–log axes (`ax.set_xscale("log")` plus `ax.set_yscale("log")`); a power law then shows up as a straight line.
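
A minimal sketch of that check on the power-law sample from above, assuming NumPy and Matplotlib; it plots the empirical survival function rather than a histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.sort(rng.pareto(1.5, 10_000) + 1)        # same power-law sample as above
survival = 1.0 - np.arange(len(x)) / len(x)     # empirical P(X ≥ x); never hits zero, so log scale is safe

fig, ax = plt.subplots()
ax.plot(x, survival)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x")
ax.set_ylabel("P(X ≥ x)")
plt.show()
```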

Choosing one

```mermaid
flowchart TD
    A{What are you modeling?} --> B[continuous & symmetric]
    A --> C[continuous & positive only]
    A --> D[discrete counts]
    A --> E[probability of success]
    B --> B1[Normal]
    C --> C1{tail behavior?}
    C1 --> C2[exponential / log-normal]
    C1 --> C3[Pareto / power-law]
    D --> D1{rare events?}
    D1 --> D2[Poisson]
    D1 --> D3[Binomial]
    E --> E1[Beta or Binomial]
```
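
The flowchart is qualitative; one quantitative complement is to fit a few candidates and compare log-likelihoods. A rough sketch, assuming SciPy is available and using an illustrative stand-in sample (the parameter count in the AIC is approximate, since fixed `loc` values are still counted):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(0, 0.5, 5_000)   # stand-in for whatever positive sample you have

candidates = {
    "normal":      (stats.norm,    stats.norm.fit(data)),
    "log-normal":  (stats.lognorm, stats.lognorm.fit(data, floc=0)),
    "exponential": (stats.expon,   stats.expon.fit(data, floc=0)),
}
for name, (dist, params) in candidates.items():
    loglik = dist.logpdf(data, *params).sum()
    aic = 2 * len(params) - 2 * loglik    # lower AIC is better
    print(f"{name:12s} AIC = {aic:.1f}")
```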

Tail risk

The most common modeling mistake is assuming normal when the underlying process is heavy-tailed. A few markers that you’re in heavy-tail territory:

  • sample mean keeps drifting as you add data
  • variance estimates are unstable across subsamples
  • log-log plot of survival function is roughly linear
  • one observation moves the mean by more than a percent

If two or more of these are true, a Gaussian model will systematically underestimate risk. See cap-theorem for an analogous “average case hides the worst case” pattern in distributed systems.
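
A minimal sketch of the first marker (the running mean that keeps drifting), assuming NumPy; the Pareto parameter is chosen so the variance is infinite but the mean is finite:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
draws = {
    "normal": rng.normal(0, 1, n),
    "pareto": rng.pareto(1.5, n) + 1,   # finite mean, infinite variance
}
for name, x in draws.items():
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    # the normal running mean settles almost immediately; the Pareto one keeps wandering
    print(name, running_mean[[999, 9_999, 99_999]])
```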

Related

  • attention — softmax is a categorical distribution; cross-entropy is KL-divergence to a one-hot
  • cap-theorem — tail behavior of latency in distributed systems