
MLOps — ML in production

MLOps is the set of practices that automates and governs the end-to-end machine learning lifecycle (data → training → validation → deployment → monitoring → retraining) with the same rigor as software delivery.

01 Why MLOps exists

Shipping a model once is easy; keeping it correct, safe, and cost-effective over time is not. Data drifts, schemas change, dependencies age, and business metrics shift. MLOps borrows from DevOps (CI/CD, IaC, observability) but adds ML-specific concerns: experiment tracking, reproducible training, model artifacts, and evaluation beyond unit tests.
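Reproducible training starts with recording every input that determines a run. The sketch below is a minimal illustration, assuming a hypothetical RunSpec schema and run_fingerprint helper (neither comes from a specific tool): it derives a deterministic ID from the data snapshot, code version, hyperparameters, and seed, so two runs with identical inputs map to the same ID and any change yields a new one. Dedicated experiment trackers record far more (metrics, artifacts, environment details).

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunSpec:
    """Inputs that determine a training run (hypothetical schema)."""
    data_snapshot: str   # e.g. an immutable snapshot ID or content hash
    code_version: str    # e.g. a git commit SHA
    hyperparameters: dict = field(default_factory=dict)
    seed: int = 0


def run_fingerprint(spec: RunSpec) -> str:
    """Deterministic ID: identical inputs always produce the same fingerprint."""
    # sort_keys (applied recursively) makes the JSON independent of dict ordering
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Usage: key order in the hyperparameter dict does not affect the ID
a = RunSpec("snap-2024-01", "3f2c1ab", {"lr": 0.01, "depth": 6}, seed=42)
b = RunSpec("snap-2024-01", "3f2c1ab", {"depth": 6, "lr": 0.01}, seed=42)
print(run_fingerprint(a) == run_fingerprint(b))
```

Hashing a canonical JSON form rather than the Python object keeps the ID stable across processes and machines.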

02 Lifecycle loop

Figure — closed-loop MLOps (conceptual)

Monitoring closes the loop: latency, errors, data drift, and business KPIs trigger investigations—sometimes leading to new training runs or rollbacks.
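One common way to turn "data drift" into an alert is the Population Stability Index (PSI), which compares the binned distribution of a feature in live traffic against a training-time reference: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the reference and live proportions in bin i. The sketch below is one possible choice of drift metric, not one the text prescribes; the equal-width binning and the 0.2 threshold are assumptions (0.2 is a widely used rule of thumb, not a standard).

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Equal-width bins from the reference range; clamp out-of-range values
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        n = len(values)
        # eps avoids log(0) for empty bins
        return [max(c / n, eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_investigate(reference: Sequence[float], live: Sequence[float],
                       threshold: float = 0.2) -> bool:
    # Assumed threshold: a rule of thumb, tune per feature and use case
    return psi(reference, live) > threshold
```

In a closed loop, a True result would open an investigation or queue a retraining run rather than retrain automatically, since drift in an input does not always hurt the business metric.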

03 Typical components

Area	Examples
Versioning	Data snapshots, feature definitions, code, hyperparameters, model weights (often in a model registry).
Training pipeline	Scheduled or triggered jobs; reproducible environments (containers); evaluation gates before promotion.
Serving	Batch scoring, online APIs, edge deployment; preprocessing must be identical between training and serving.
Quality	Data validation tests, model cards, bias/fairness analysis where required.
Operations	Alerts, dashboards, tracing; avoid logging sensitive payloads in clear text.
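Train/serve consistency is easiest to keep when preprocessing is fitted once, serialized, and versioned next to the model weights, so the serving path reloads exactly the same parameters instead of reimplementing them. The sketch below illustrates this with a hypothetical Standardizer class (plain z-score scaling, JSON as the artifact format); both choices are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Standardizer:
    """Z-score scaling fitted at training time and shipped with the model."""
    means: Dict[str, float]
    stds: Dict[str, float]

    @classmethod
    def fit(cls, rows: List[Dict[str, float]]) -> "Standardizer":
        keys = list(rows[0].keys())
        n = len(rows)
        means = {k: sum(r[k] for r in rows) / n for k in keys}
        stds = {}
        for k in keys:
            var = sum((r[k] - means[k]) ** 2 for r in rows) / n
            stds[k] = var ** 0.5 or 1.0  # constant feature: avoid division by zero
        return cls(means, stds)

    def transform(self, row: Dict[str, float]) -> Dict[str, float]:
        # Raises KeyError if a request lacks a feature seen during training
        return {k: (row[k] - self.means[k]) / self.stds[k] for k in self.means}

    def to_json(self) -> str:
        return json.dumps({"means": self.means, "stds": self.stds}, sort_keys=True)

    @classmethod
    def from_json(cls, s: str) -> "Standardizer":
        d = json.loads(s)
        return cls(d["means"], d["stds"])


# Training side: fit and store the artifact alongside the model weights
train = [{"age": 30.0, "income": 50.0}, {"age": 50.0, "income": 70.0}]
artifact = Standardizer.fit(train).to_json()

# Serving side: reload the same parameters rather than recomputing them
serving_prep = Standardizer.from_json(artifact)
print(serving_prep.transform({"age": 50.0, "income": 70.0}))
```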

04 CI/CD for ML vs pure software

Figure — extra gates for ML

Promotion often requires both statistical thresholds (AUC, calibration) and operational checks (latency budget, canary success).
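A promotion gate can be expressed as a small function that checks statistical and operational criteria together and reports every failure, not just the first. The sketch below assumes hypothetical CandidateReport and PromotionGate types, and the threshold values are placeholders; real thresholds come from the product's requirements and latency SLOs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateReport:
    """Metrics gathered for a candidate model (hypothetical schema)."""
    auc: float                # offline ranking quality, higher is better
    calibration_error: float  # e.g. expected calibration error, lower is better
    p99_latency_ms: float     # measured under a load test
    canary_error_rate: float  # fraction of failed requests during the canary


@dataclass
class PromotionGate:
    # Placeholder thresholds, not recommendations
    min_auc: float = 0.80
    max_calibration_error: float = 0.05
    latency_budget_ms: float = 100.0
    max_canary_error_rate: float = 0.01

    def check(self, r: CandidateReport) -> Tuple[bool, List[str]]:
        failures: List[str] = []
        # Statistical gates
        if r.auc < self.min_auc:
            failures.append(f"auc {r.auc:.3f} < {self.min_auc}")
        if r.calibration_error > self.max_calibration_error:
            failures.append(f"calibration error {r.calibration_error:.3f} > {self.max_calibration_error}")
        # Operational gates
        if r.p99_latency_ms > self.latency_budget_ms:
            failures.append(f"p99 latency {r.p99_latency_ms} ms > {self.latency_budget_ms} ms budget")
        if r.canary_error_rate > self.max_canary_error_rate:
            failures.append(f"canary error rate {r.canary_error_rate} > {self.max_canary_error_rate}")
        return (not failures, failures)


gate = PromotionGate()
ok, reasons = gate.check(CandidateReport(auc=0.86, calibration_error=0.03,
                                         p99_latency_ms=140.0, canary_error_rate=0.002))
print(ok, reasons)  # blocked: only the latency budget check fails
```

Here a model that is statistically better still does not ship, because it misses the operational budget; this is the extra gate that distinguishes ML delivery from a pure software pipeline.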

LLMOps adds prompt versioning, RAG index management, LLM evaluation harnesses, and cost controls (see LLMOps). MLOps itself builds on DevOps.