
Classical machine learning — theory essentials

Supervised learning from tabular or engineered features: the goal is generalization to unseen data. These ideas apply directly to scikit-learn and XGBoost models, and serve as intuition for neural nets.

01 Bias–variance tradeoff

High bias (underfitting): the model class is too simple to capture the signal. High variance (overfitting): the model is flexible enough to fit noise in the training set. For squared-error loss, expected test error decomposes into bias² + variance + irreducible noise; tuning model complexity trades off the first two, while the noise term is a floor that no model can beat.
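The decomposition can be written out explicitly for squared-error loss. The notation is an assumption introduced here, not taken from the text above: the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and \hat f_D is the model fit on a randomly drawn training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat f_D(x)\bigr)^2\right]
= \underbrace{\bigl(\mathbb{E}_D[\hat f_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\bigl(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The cross terms vanish because ε is independent of D and has zero mean. For other losses (for example 0-1 loss) the split is less clean, which is why it is best read as a conceptual guide.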

Figure: complexity vs. error (schematic). x-axis: model complexity; y-axis: error. Curves: train and test, with the sweet spot marked.

Training error keeps falling as complexity grows, while test error is often U-shaped. Early stopping and regularization limit effective complexity, and cross-validation estimates where the minimum of the test-error curve lies.
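The train/test gap can be seen on a toy problem. The sketch below is illustrative only: the data generator, the sample sizes, and the choice of k-nearest-neighbour regression (where smaller k means higher complexity) are all assumptions made for the example.

```python
import math
import random

def make_data(n, rng):
    """Toy 1-D regression problem (hypothetical): y = sin(3x) + Gaussian noise."""
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    return [(x, math.sin(3.0 * x) + rng.gauss(0.0, 0.3)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN (fit on `train`) evaluated on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(60, rng), make_data(200, rng)

# Smaller k means a more flexible (higher-complexity) model.
results = [(k, mse(train, train, k), mse(train, test, k)) for k in (30, 15, 9, 5, 3, 1)]
for k, train_err, test_err in results:
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

At k = 1 the training error is exactly zero because each point is its own nearest neighbour, yet the test error stays positive. Where the test curve bottoms out depends on the noise level and the sample size.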

02 Train / validation / test

Fit models on the training set, choose hyperparameters on validation data, and report final performance on a held-out test set that is touched only once. When data is scarce, k-fold CV rotates the validation role across k folds and averages the scores, which reduces the variance of the performance estimate compared with a single split.
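A minimal sketch of k-fold cross-validation using only the standard library. The helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mse`) and the mean-predictor baseline are hypothetical, chosen for illustration; in practice scikit-learn's `KFold` and `cross_val_score` cover the same ground.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

def cross_validate(X, y, fit, score, k=5, seed=0):
    """Return the per-fold scores and their mean."""
    scores = []
    for train, val in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return scores, sum(scores) / len(scores)

def fit_mean(X, y):
    """Hypothetical baseline 'model': always predict the training mean."""
    return sum(y) / len(y)

def mse(model, X, y):
    """Mean squared error of the constant prediction `model`."""
    return sum((t - model) ** 2 for t in y) / len(y)

X = list(range(20))
y = [2.0 * x + 1.0 for x in X]
scores, mean_score = cross_validate(X, y, fit_mean, mse, k=5)
print(scores, mean_score)
```

The test set plays no part here: cross-validation replaces only the single train/validation split, and the final held-out evaluation still happens once, afterwards.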

Figure: three-way holdout. The data is partitioned into train, validation, and test segments.

For time series, train on the past and validate on later chunks; random splits leak future information into training.
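One way to respect time order is an expanding-window (forward-chaining) split, sketched below. The function name and its parameters are hypothetical; scikit-learn's `TimeSeriesSplit` offers a comparable scheme. The sketch assumes samples are already sorted by time.

```python
def forward_chaining_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, val_idx) where every validation index comes after
    all training indices. Assumes samples are sorted by time."""
    val_size = (n_samples - min_train) // n_splits
    if val_size < 1:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        train_end = min_train + i * val_size
        yield list(range(train_end)), list(range(train_end, train_end + val_size))

splits = list(forward_chaining_splits(n_samples=10, n_splits=3, min_train=4))
for train_idx, val_idx in splits:
    print("train", train_idx, "-> validate", val_idx)
```

Each later split trains on everything the earlier splits saw plus the chunk they validated on, so no model is ever scored on data that precedes its training window.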

03 Metrics (what “good” means)

Setting | Common metrics | Notes
Binary classification | ROC-AUC, PR-AUC, F1 | Use PR when classes are imbalanced; ROC can look optimistic.
Multi-class | Macro/micro F1, log loss | Macro treats classes equally; micro follows global counts.
Regression | RMSE, MAE, MAPE | MAPE breaks near zero targets; robust losses for outliers.
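The macro/micro distinction in the table is easiest to see on an imbalanced example. The following is a small sketch with hypothetical helper names and made-up labels: a classifier that always predicts the majority class.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    macro = sum(f1(*c) for c in counts) / len(labels)      # mean of per-class F1
    micro = f1(*(sum(col) for col in zip(*counts)))         # F1 of pooled counts
    return macro, micro

# Made-up data: 8 samples of class "a", 2 of class "b", everything predicted "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

Here micro F1 is 0.8 (it equals accuracy in the single-label case), while macro F1 drops to about 0.44 because the ignored minority class contributes an F1 of zero with equal weight.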
Figure: ROC space (intuition). x-axis: FPR; y-axis: TPR; higher curves are better.

AUC summarizes ranking quality across thresholds; it does not replace calibration when you need accurate probabilities.
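The ranking interpretation can be made concrete: ROC-AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The sketch below computes it by brute force over all pairs (fine for small data; the function name and the example scores are illustrative).

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
squashed = [s ** 4 for s in scores]  # monotone transform: same order, different values

print(roc_auc(y_true, scores), roc_auc(y_true, squashed))
```

Both calls give 0.75, because AUC depends only on the ordering of the scores. The squashed scores are very different as probabilities, which is exactly why a good AUC says nothing about calibration.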

04 Ensembles

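Bagging (e.g., random forests) reduces variance by averaging many high-variance learners trained on bootstrap resamples. Boosting (e.g., gradient boosting) reduces bias by adding learners sequentially, each fit to the errors of the current ensemble; it is often state-of-the-art on tabular data. Stacking trains a meta-model on the base models' predictions; it is powerful but overfits easily unless those predictions are generated out-of-fold with careful CV.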