Here’s a deep, no-fluff summary of the Medium article you shared.
What the article sets out to do
- Give an intuitive and mathematical refresher on bias, variance, and irreducible error (as components of prediction error).
- Show how to empirically decompose error into bias² and variance for real models using bootstrapping and a hands-on Python workflow (via MLxtend’s bias_variance_decomp).
Key concepts (short & precise)
- Bias: systematic error—the gap between average model predictions and the true function. High bias → underfitting.
- Variance: sensitivity to the training sample—predictions swing a lot across data resamples. High variance → overfitting.
- Trade-off: Increasing model complexity usually lowers bias but raises variance; the “sweet spot” is found empirically, not analytically.
How the decomposition is estimated
- Procedure: Repeatedly bootstrap the training set, fit the same estimator each time, and evaluate all models on a fixed test set.
- Computed quantities per the article’s pseudocode:
  - Average expected loss (MSE)
  - Average bias² (squared gap between the ensemble mean prediction and ground truth)
  - Average variance (dispersion of individual predictions around the ensemble mean)
  These satisfy expected loss ≈ bias² + variance (the noise term is omitted on the fixed test set).
- Implementation is built around MLxtend’s routine with a clear loop: resample → fit → predict → aggregate.
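The resample → fit → predict → aggregate loop can be sketched by hand with NumPy and scikit-learn. This is a minimal illustration, not the article's exact code: the synthetic dataset, sizes, and round count are assumptions, and only the decomposition logic mirrors the procedure described above.

```python
# Sketch of the bootstrap decomposition loop: resample -> fit -> predict -> aggregate.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

num_rounds = 50
all_preds = np.zeros((num_rounds, len(y_test)))
for i in range(num_rounds):
    # Bootstrap: resample the training set with replacement, refit the same estimator.
    idx = rng.randint(0, len(X_train), size=len(X_train))
    model = DecisionTreeRegressor(random_state=i).fit(X_train[idx], y_train[idx])
    all_preds[i] = model.predict(X_test)   # evaluate on the fixed test set

mean_pred = all_preds.mean(axis=0)                 # ensemble mean prediction
avg_loss = ((all_preds - y_test) ** 2).mean()      # average expected loss (MSE)
bias_sq = ((mean_pred - y_test) ** 2).mean()       # bias²: mean prediction vs truth
variance = ((all_preds - mean_pred) ** 2).mean()   # spread around the ensemble mean

# On a fixed test set (noise term omitted) the identity holds exactly:
assert np.isclose(avg_loss, bias_sq + variance)
```

Because the test set is held fixed, the decomposition is an exact algebraic identity here; the "≈" in the text accounts for the irreducible noise that a fixed test set cannot separate out.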
Empirical examples & takeaways
Dataset: Boston Housing (regression).
- Decision Tree (single)
- Avg expected loss: 32.419; bias: 14.197; variance: 18.222.
- Bagging of the same tree
- Avg expected loss: 18.693; bias: 15.292; variance: 3.402.
- Insight: Bagging slashes variance (models agree more) while slightly raising bias; overall error improves.
- Neural network (Keras)
- Baseline: loss 25.470; bias 19.927; variance 5.543.
- Higher complexity (more neurons): loss 23.458; bias 17.608; variance 5.850 → lower bias, slightly higher variance, net reduction in total error.
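The bagging comparison can be reproduced qualitatively. Note this is a hedged re-run on synthetic data (the article used Boston Housing, which newer scikit-learn versions no longer ship), so the numbers will differ; only the pattern — variance drops sharply, bias barely moves, total error improves — is expected to match.

```python
# Compare a single decision tree against bagging of the same tree under
# the bootstrap bias/variance decomposition described earlier.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def decompose(make_model, X_tr, y_tr, X_te, y_te, rounds=30, seed=0):
    """Bootstrap the training set, refit, and split test MSE into bias² + variance."""
    rng = np.random.RandomState(seed)
    preds = np.empty((rounds, len(y_te)))
    for i in range(rounds):
        idx = rng.randint(0, len(X_tr), size=len(X_tr))
        preds[i] = make_model().fit(X_tr[idx], y_tr[idx]).predict(X_te)
    mean_pred = preds.mean(axis=0)
    bias_sq = ((mean_pred - y_te) ** 2).mean()
    var = ((preds - mean_pred) ** 2).mean()
    return bias_sq + var, bias_sq, var

X, y = make_regression(n_samples=600, n_features=8, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

loss_tree, bias_tree, var_tree = decompose(DecisionTreeRegressor, X_tr, y_tr, X_te, y_te)
loss_bag, bias_bag, var_bag = decompose(
    lambda: BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0),
    X_tr, y_tr, X_te, y_te, rounds=10)  # fewer rounds: each fit trains 50 trees

print(f"tree:    loss={loss_tree:.1f}  bias²={bias_tree:.1f}  var={var_tree:.1f}")
print(f"bagging: loss={loss_bag:.1f}  bias²={bias_bag:.1f}  var={var_bag:.1f}")
```

Averaging many bootstrapped trees cancels much of their individual, sample-dependent error, which is exactly why the variance column collapses while the bias column stays close to the single tree's.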
Practical guidance distilled
- Don’t chase only bias minimization—balance both components.
- To reduce bias: increase model capacity (e.g., boosting, richer features/feature engineering).
- To reduce variance: use bagging/ensembles or regularization.
- In practice, sweep model complexities and pick the lowest total error; the theoretical “sweet spot” isn’t closed-form.
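A minimal sketch of the "sweep complexities, pick the lowest total error" advice, using tree depth as the complexity knob; the knob, grid, and cross-validated MSE as the error proxy are illustrative choices, not from the article.

```python
# Sweep a complexity parameter and select the value with lowest estimated error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=20.0, random_state=2)

depths = [1, 2, 3, 5, 8, 12, None]  # None = grow until pure (high-variance end)
errors = []
for d in depths:
    # 5-fold cross-validated MSE approximates the expected total error.
    mse = -cross_val_score(DecisionTreeRegressor(max_depth=d, random_state=0),
                           X, y, cv=5, scoring="neg_mean_squared_error").mean()
    errors.append(mse)

best = depths[int(np.argmin(errors))]
print(f"best max_depth by CV error: {best}")
```

Shallow depths sit at the high-bias end of the sweep and unlimited depth at the high-variance end; the empirical minimum of total error is the "sweet spot" the text says has no closed form.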
Why this matters
- The decomposition turns a vague “over/underfit” diagnosis into measurable levers: you see whether you’re dominated by bias or by variance, then choose the corresponding fix (capacity vs. stability). The article’s experiments concretely show bagging → variance↓ and capacity↑ → bias↓ (variance↑)—exactly what the theory predicts.
If you’d like, I can recreate the bias–variance plots and numbers from the article with scikit-learn/MLxtend on your machine and hand you a runnable notebook.