01 What LLMOps adds to MLOps
Classical MLOps centers on numeric metrics and fixed schemas. LLM applications add non-determinism, long prompts, multi-step chains, external retrieval, and subjective quality—so you need eval harnesses (golden sets, LLM-as-judge with caution, human review), prompt regression tests, and tracing per request.
02 Reference architecture (simplified)
Each layer needs versioning, monitoring, and failure modes (timeouts, empty retrieval, policy refusals).
03 Evaluation & feedback
Thumbs-up/down and escalation queues feed improved prompts, few-shot examples, or fine-tuning datasets—closing the product loop.
04 Cost & reliability
Token usage drives spend; caching, smaller models for easy subtasks, and summarization of long contexts help. Set budgets, alert on anomaly spikes, and design degraded modes when the model or retrieval is unavailable.
Treat prompts and retrieved text as sensitive where applicable; redact logs appropriately and document data retention for customer content.