01 Why MLOps exists
Shipping a model once is easy; keeping it correct, safe, and cost-effective over time is not. Data drifts, schemas change, dependencies age, and business metrics shift. MLOps borrows from DevOps (CI/CD, IaC, observability) but adds ML-specific concerns: experiment tracking, reproducible training, model artifacts, and evaluation beyond unit tests.
02 Lifecycle loop
Monitoring closes the loop: regressions in latency, error rates, data drift, or business KPIs trigger investigations, which sometimes lead to new training runs or rollbacks.
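As one illustration of a data-drift signal, the sketch below computes a Population Stability Index (PSI) between a reference sample (for example, training data) and a live sample of a single numeric feature. PSI is only one of several possible drift measures and is swapped in here for illustration. The function name `psi`, the equal-width binning, and the `1e-6` floor for empty bins are assumptions of this sketch, not something the text prescribes.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` against `reference` for one feature.

    Hypothetical helper: equal-width bins are derived from the reference range.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference feature is constant.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_frac = fractions(reference)
    live_frac = fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


if __name__ == "__main__":
    reference = [float(x) for x in range(100)]
    shifted = [x + 50.0 for x in reference]
    print("identical:", psi(reference, reference))
    print("shifted:  ", psi(reference, shifted))
```

A monitoring job could compare the result against an alert threshold. Values around 0.2 are a commonly quoted rule of thumb, but the right threshold depends on the feature and should be tuned rather than taken from this sketch.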
03 Typical components
| Area | Examples |
|---|---|
| Versioning | Data snapshots, feature definitions, code, hyperparameters, model weights (often in a model registry). |
| Training pipeline | Scheduled or triggered jobs; reproducible environments (containers); evaluation gates before promotion. |
| Serving | Batch scoring, online APIs, edge deployment; preprocessing must stay consistent between training and serving. |
| Quality | Data validation tests, model cards, bias/fairness analysis where required. |
| Operations | Alerts, dashboards, tracing; avoid logging sensitive payloads in clear text. |
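To make the versioning row concrete, here is a minimal sketch of a run fingerprint that ties together a data snapshot, hyperparameters, and a code version. The function `run_fingerprint`, its parameters, and the manifest layout are hypothetical. A real model registry would typically store much richer metadata, so this only shows the idea of a deterministic identifier.

```python
import hashlib
import json
from typing import Any, Dict


def run_fingerprint(data_bytes: bytes, hyperparams: Dict[str, Any], code_version: str) -> str:
    """Return a deterministic SHA-256 identifier for a training run (illustrative only)."""
    manifest = {
        # Hash the data snapshot instead of embedding it in the manifest.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparams": hyperparams,
        # Assumed to be something like a git commit hash supplied by the caller.
        "code_version": code_version,
    }
    # sort_keys gives a canonical serialization, so dict ordering does not matter.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    snapshot = b"user_id,label\n1,0\n2,1\n"
    a = run_fingerprint(snapshot, {"lr": 0.01, "depth": 6}, "abc123")
    b = run_fingerprint(snapshot, {"depth": 6, "lr": 0.01}, "abc123")
    c = run_fingerprint(snapshot, {"lr": 0.02, "depth": 6}, "abc123")
    print(a == b)  # same inputs in a different key order
    print(a == c)  # one hyperparameter changed
```

Storing such an identifier next to the model weights makes it possible to check whether two artifacts came from the same data, configuration, and code. It is not a substitute for keeping the snapshot and configuration themselves.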
04 CI/CD for ML vs pure software
Promotion often requires both statistical thresholds (AUC, calibration) and operational checks (latency budget, canary success).
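A promotion gate of this kind can be sketched as a function that checks a candidate's report against both statistical and operational limits. Everything named here (`CandidateReport`, `Gate`, `promotion_failures`, and the default threshold values) is a hypothetical illustration. Real thresholds depend on the model and the service-level objectives.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateReport:
    """Metrics collected for a candidate model (hypothetical structure)."""
    auc: float
    calibration_error: float
    p95_latency_ms: float
    canary_error_rate: float


@dataclass(frozen=True)
class Gate:
    """Promotion thresholds; the defaults are placeholders, not recommendations."""
    min_auc: float = 0.80
    max_calibration_error: float = 0.05
    max_p95_latency_ms: float = 200.0
    max_canary_error_rate: float = 0.01


def promotion_failures(report: CandidateReport, gate: Gate = Gate()) -> List[str]:
    """Return the names of failed checks; an empty list means the candidate may be promoted."""
    checks = {
        # Statistical thresholds
        "auc": report.auc >= gate.min_auc,
        "calibration_error": report.calibration_error <= gate.max_calibration_error,
        # Operational checks
        "p95_latency_ms": report.p95_latency_ms <= gate.max_p95_latency_ms,
        "canary_error_rate": report.canary_error_rate <= gate.max_canary_error_rate,
    }
    return [name for name, passed in checks.items() if not passed]


if __name__ == "__main__":
    accurate_but_slow = CandidateReport(
        auc=0.91, calibration_error=0.02, p95_latency_ms=350.0, canary_error_rate=0.002
    )
    print(promotion_failures(accurate_but_slow))
```

Returning the list of failed checks, rather than a single boolean, lets a pipeline report why a candidate was blocked, for example a model that clears the AUC threshold but exceeds the latency budget.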