Classical machine learning — theory essentials

Supervised learning from tabular or engineered features: the goal is generalization to unseen data. These ideas apply directly to scikit-learn and XGBoost workflows, and they carry over as intuition for neural nets.

01 Bias–variance tradeoff

High bias (underfitting): the model class is too simple to capture the signal. High variance (overfitting): the model fits noise in the training set. For squared-error loss, expected test error decomposes exactly into bias² + variance + irreducible noise; for other losses the split is conceptual. Tuning model complexity trades off the first two terms, while the noise term sets a floor that no model can beat.
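Written out for squared error, the decomposition looks as follows. The notation is an assumption introduced here, not taken from the text: targets are generated as y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and \hat{f}_D is the model fitted on a random training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The cross terms vanish because ε has zero mean and is independent of the training set D.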

Figure — complexity vs error (schematic)

Training error keeps falling as complexity grows, while test error typically follows a U-shape. Early stopping, regularization, and cross-validation are all ways to locate the region where test error is lowest.
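The tradeoff can be estimated numerically. The sketch below is a toy setup chosen for illustration, not something from the text: a k-nearest-neighbour regressor on a noisy sine curve, refitted on many fresh training sets, with bias² and variance measured at a single query point. Small k plays the role of a complex model and large k the role of a simple one. All names (`knn_predict`, `bias_variance`) and constants are assumptions.

```python
import math
import random


def knn_predict(train, x0, k):
    """Average the targets of the k training points nearest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias_variance(k, x0, n_train=40, n_repeats=300, noise_sd=0.3, seed=0):
    """Estimate bias^2 and variance of k-NN at x0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        # Draw a fresh training set from y = sin(x) + noise.
        xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_train)]
        train = [(x, math.sin(x) + rng.gauss(0.0, noise_sd)) for x in xs]
        preds.append(knn_predict(train, x0, k))
    mean_pred = sum(preds) / len(preds)
    bias_sq = (mean_pred - math.sin(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return bias_sq, variance


x0 = math.pi / 2  # query point; the true value sin(x0) is 1
results = {k: bias_variance(k, x0) for k in (1, 5, 40)}
for k, (b2, var) in results.items():
    print(f"k={k:2d}  bias^2={b2:.4f}  variance={var:.4f}")
```

With k = 1 the prediction tracks a single noisy neighbour (low bias, high variance); with k = 40 it collapses to the global mean of the targets (high bias, low variance).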

02 Train / validation / test

Split the data so that hyperparameters are chosen on validation data and final results are reported on a held-out test set that is touched only once. When data is scarce, k-fold CV trains on k−1 folds and validates on the remaining one, rotating through all k folds and averaging the scores, which reduces the variance of the performance estimate.
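A minimal sketch of k-fold splitting in plain Python may make the rotation concrete. The helper names (`kfold_indices`, `cross_val_mse`) are hypothetical, and the "model" is just a mean predictor standing in for whatever estimator is being tuned.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_val_mse(ys, n_folds=5):
    """Score a mean-predictor baseline with k-fold CV; returns (mean, per-fold)."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(ys), n_folds):
        mean_y = sum(ys[j] for j in train_idx) / len(train_idx)  # "fit" on train folds
        mse = sum((ys[j] - mean_y) ** 2 for j in val_idx) / len(val_idx)
        scores.append(mse)
    return sum(scores) / len(scores), scores


ys = [float(i % 7) for i in range(23)]  # toy targets
mean_score, fold_scores = cross_val_mse(ys)
print(f"CV MSE: {mean_score:.3f} over {len(fold_scores)} folds")
```

In a real workflow the test set would be split off before any of this runs, so that cross-validation only ever sees the training portion.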

Figure — three-way holdout

For time series, train on the past and validate on later chunks; random splits leak future information into training.
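One common way to do this is forward chaining with an expanding training window, sketched below. The function name and the choice of equal-sized chunks are assumptions for illustration; samples are assumed to be already sorted by time.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train_idx, val_idx) where every training index precedes every validation index."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for i in range(1, n_splits + 1):
        train_end = i * fold
        # The last split absorbs any leftover samples at the end of the series.
        val_end = n_samples if i == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, val_end))


ts_splits = list(forward_chaining_splits(10, 3))
for train_idx, val_idx in ts_splits:
    print("train:", train_idx, "validate:", val_idx)
```

For 10 samples and 3 splits, the first split trains on indices 0 and 1 and validates on 2 and 3; each later split extends the training window and validates on the next chunk.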

03 Metrics (what “good” means)

Setting	Common metrics	Notes
Binary classification	ROC-AUC, PR-AUC, F1	Prefer PR-AUC when positives are rare; ROC-AUC can look optimistic because the many true negatives keep the false-positive rate low.
Multi-class	Macro/micro F1, log loss	Macro treats classes equally; micro follows global counts.
Regression	RMSE, MAE, MAPE	MAPE blows up when targets are near zero; MAE is less sensitive to outliers than RMSE.
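The notes in the table can be checked on small hand-made examples. The sketch below computes macro and micro F1 for single-label predictions, plus RMSE, MAE, and MAPE; the function names and the toy data are assumptions, chosen so that the imbalance and near-zero effects are visible.

```python
import math


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class, tp_sum, fp_sum, fn_sum = [], 0, 0, 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class) / len(per_class)  # every class weighs the same
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)  # pooled global counts
    return macro, micro


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, mape); MAPE is undefined if any target is exactly 0."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / len(errors)
    return rmse, mae, mape


# Imbalanced toy labels: the classifier always predicts the majority class.
macro, micro = f1_scores(["a"] * 8 + ["b"] * 2, ["a"] * 10)
print(f"macro F1={macro:.3f}  micro F1={micro:.3f}")

# One target close to zero dominates MAPE even though its absolute error is small.
rmse, mae, mape = regression_metrics([100.0, 50.0, 0.01], [110.0, 45.0, 1.01])
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.1f}")
```

In the classification example micro F1 is 0.8 while macro F1 is about 0.444, because the ignored minority class contributes an F1 of zero to the macro average.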

Figure — ROC space (intuition)

AUC summarizes ranking quality across thresholds; it does not replace calibration when you need accurate probabilities.
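The ranking view can be made concrete: ROC-AUC equals the fraction of (positive, negative) pairs in which the positive receives the higher score. The sketch below uses that pairwise definition directly (quadratic in the number of samples, so only suitable for small examples); the function name and data are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
auc = roc_auc(y_true, scores)

# A monotonic transform changes every probability value but not the ordering.
squashed = [s ** 4 for s in scores]
auc_squashed = roc_auc(y_true, squashed)
print(f"AUC={auc:.3f}  AUC after s**4={auc_squashed:.3f}")
```

Both calls give 6/9 ≈ 0.667, even though the transformed scores are very different numbers. This is why a good AUC says nothing about whether the scores are usable as probabilities.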

04 Ensembles

Bagging (random forests) reduces variance by averaging many high-variance learners trained on bootstrap samples. Boosting (gradient boosting) reduces bias by adding learners sequentially, each fit to the errors of the current ensemble; it is often state-of-the-art on tabular data. Stacking trains a meta-model on the base models' predictions; it is powerful but overfits easily unless those predictions are produced out-of-fold.
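To illustrate the "fitting errors" idea, here is a minimal sketch of gradient boosting for squared loss on a single feature, using depth-1 trees (stumps) as base learners. It is a teaching toy under stated assumptions, not how XGBoost or scikit-learn implement it; the names `fit_stump` and `gradient_boost`, the learning rate, and the synthetic data are all made up for the example.

```python
import math
import random


def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, l_mean, r_mean)
    _, thr, l_mean, r_mean = best
    return lambda x: l_mean if x <= thr else r_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the constant mean prediction
    stumps, history = [], []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For 0.5 * (y - p)^2 the negative gradient is simply the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


rng = random.Random(0)
xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(60)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]
predict, history = gradient_boost(xs, ys)
print(f"train MSE after round 1: {history[0]:.4f}, after round 50: {history[-1]:.4f}")
```

Training error never increases from round to round here, which is exactly why boosting needs a validation set or early stopping: the training curve alone gives no signal about when the ensemble starts fitting noise.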