
LLMOps — operating language models

LLMOps extends MLOps for systems built on LLMs: versioning prompts and tools, maintaining retrieval corpora, running task-specific evaluations, controlling cost and latency, and governing safety (PII, toxicity, policy).

01 What LLMOps adds to MLOps

Classical MLOps centers on numeric metrics and fixed schemas. LLM applications add non-determinism, long prompts, multi-step chains, external retrieval, and subjective quality, so they need evaluation harnesses (golden sets, human review, and LLM-as-judge scoring used with caution), prompt regression tests, and per-request tracing.
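A prompt regression test against a golden set can be sketched as follows. This is a minimal illustration, not a full harness: `GoldenCase`, `run_golden_set`, and `fake_generate` are hypothetical names, the substring check stands in for whatever task-specific scoring you use, and `fake_generate` replaces a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenCase:
    question: str
    must_contain: str  # substring a passing answer has to include


def run_golden_set(
    generate: Callable[[str, str], str],
    prompt_template: str,
    cases: List[GoldenCase],
) -> float:
    """Return the fraction of golden cases whose answer passes the check."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = 0
    for case in cases:
        answer = generate(prompt_template, case.question)
        if case.must_contain.lower() in answer.lower():
            passed += 1
    return passed / len(cases)


def fake_generate(prompt_template: str, question: str) -> str:
    # Assumption: stand-in for a real model call; returns canned answers.
    canned = {
        "What is the capital of France?": "Paris is the capital.",
        "2 + 2?": "The answer is 4.",
    }
    return canned.get(question, "I don't know.")


GOLDEN = [
    GoldenCase("What is the capital of France?", "paris"),
    GoldenCase("2 + 2?", "4"),
    GoldenCase("Largest planet?", "jupiter"),
]

pass_rate = run_golden_set(fake_generate, "Answer briefly: {question}", GOLDEN)
print(f"pass rate: {pass_rate:.2f}")
```

Running the same golden set before and after each prompt change, and failing the change when the pass rate drops below the previous baseline, is one way to turn this into a regression gate.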

02 Reference architecture (simplified)

Figure — LLM application stack
Gateway: auth · rate limits · routing
Orchestration / agents: prompt templates · tools
RAG · vector index: chunking · refresh
Model API: routing · fallback models

Each layer needs versioning, monitoring, and explicit handling for its failure modes (timeouts, empty retrieval, policy refusals).
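One way to make those failure modes explicit is to return a named outcome from the request path instead of letting exceptions or empty strings leak through. The sketch below is an assumption-laden illustration: `Outcome`, `Result`, `answer`, the `[REFUSED]` marker, and the fake model functions are all hypothetical and do not correspond to any particular provider's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class Outcome(Enum):
    OK = "ok"
    TIMEOUT = "timeout"
    EMPTY_RETRIEVAL = "empty_retrieval"
    REFUSAL = "refusal"


@dataclass
class Result:
    outcome: Outcome
    text: str
    model: Optional[str] = None


ModelFn = Callable[[str, List[str]], str]


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    models: Sequence[Tuple[str, ModelFn]],
    is_refusal: Callable[[str], bool] = lambda t: t.startswith("[REFUSED]"),
) -> Result:
    """Try each model in order; report which failure mode occurred, if any."""
    docs = retrieve(question)
    if not docs:
        return Result(Outcome.EMPTY_RETRIEVAL, "No supporting documents found.")
    for name, call in models:
        try:
            text = call(question, docs)
        except TimeoutError:
            continue  # fall through to the next (fallback) model
        if is_refusal(text):
            return Result(Outcome.REFUSAL, "Request declined by policy.", name)
        return Result(Outcome.OK, text, name)
    return Result(Outcome.TIMEOUT, "Service is busy; please retry.")


def slow_primary(question: str, docs: List[str]) -> str:
    # Assumption: simulates a primary model that times out.
    raise TimeoutError("primary model timed out")


def small_fallback(question: str, docs: List[str]) -> str:
    return f"Answer based on {len(docs)} document(s)."


result = answer(
    "What is our refund policy?",
    retrieve=lambda q: ["Refunds are accepted within 30 days."],
    models=[("primary", slow_primary), ("fallback", small_fallback)],
)
print(result.outcome.value, result.model)
```

Because every path returns an `Outcome`, each failure mode can be counted and alerted on separately in monitoring.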

03 Evaluation & feedback

Figure — eval + human-in-the-loop
Traffic → Auto eval (task success · safety) → Human review

Thumbs-up/down signals and escalation queues feed back into improved prompts, few-shot examples, or fine-tuning datasets, closing the product loop.
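As a rough sketch of that loop, the snippet below splits feedback events into candidate few-shot examples and a human review queue. The `Feedback` record and `route_feedback` function are hypothetical, and a real pipeline would likely add deduplication, sampling, and reviewer approval before anything reaches a prompt or training set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Feedback:
    request_id: str
    prompt: str
    response: str
    thumbs_up: bool


def route_feedback(
    events: List[Feedback],
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split feedback into few-shot candidates and request IDs to review."""
    few_shot_candidates: List[Dict[str, str]] = []
    review_queue: List[str] = []
    for event in events:
        if event.thumbs_up:
            # Positive examples are only candidates; a human should still vet them.
            few_shot_candidates.append(
                {"input": event.prompt, "output": event.response}
            )
        else:
            # Negative feedback is escalated for human review.
            review_queue.append(event.request_id)
    return few_shot_candidates, review_queue


events = [
    Feedback("r1", "Summarize the ticket", "Customer reports a login loop.", True),
    Feedback("r2", "Summarize the ticket", "I cannot help with that.", False),
]
candidates, queue = route_feedback(events)
print(len(candidates), queue)
```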

04 Cost & reliability

Token usage drives spend; caching, routing easy subtasks to smaller models, and summarizing long contexts all reduce it. Set budgets, alert on anomalous spend spikes, and design degraded modes for when the model or retrieval is unavailable.
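Caching, a budget, and a degraded mode can be combined in a thin wrapper around the model call. The following is a minimal in-memory sketch under stated assumptions: `BudgetedClient`, `DEGRADED_MESSAGE`, and `fake_model` are hypothetical, the budget is counted in tokens rather than currency, and the model function is assumed to return the response text together with the number of tokens it consumed.

```python
import hashlib
from typing import Callable, Dict, Tuple

DEGRADED_MESSAGE = "The assistant is temporarily limited; please try again later."


class BudgetedClient:
    """Wrap a model call with an exact-match cache and a token budget."""

    def __init__(
        self,
        call_model: Callable[[str], Tuple[str, int]],
        daily_token_budget: int,
    ) -> None:
        self._call_model = call_model
        self._budget = daily_token_budget
        self._cache: Dict[str, str] = {}
        self.tokens_used = 0
        self.model_calls = 0

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]  # cache hit: no tokens spent
        if self.tokens_used >= self._budget:
            return DEGRADED_MESSAGE  # degraded mode once the budget is exhausted
        text, tokens = self._call_model(prompt)
        self.tokens_used += tokens
        self.model_calls += 1
        self._cache[key] = text
        return text


def fake_model(prompt: str) -> Tuple[str, int]:
    # Assumption: stand-in for a real API call; every call costs 500 tokens.
    return f"echo: {prompt}", 500


client = BudgetedClient(fake_model, daily_token_budget=1000)
client.complete("a")
client.complete("a")  # served from cache
client.complete("b")
print(client.model_calls, client.tokens_used, client.complete("c"))
```

The check happens before the call, so a single request can overshoot the budget slightly; reserving an estimated token count up front would tighten this at the cost of more bookkeeping.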

05 Governance

Treat prompts and retrieved text as sensitive where applicable: redact PII from logs before they are stored, and document retention periods for customer content.
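A simple form of log redaction replaces obvious identifiers before a prompt or response is written anywhere. The patterns below are illustrative assumptions that cover only email addresses and phone-like digit runs; they will miss many kinds of PII and may over-match, so production systems usually pair this with a dedicated PII detection step.

```python
import re

# Assumption: deliberately simple patterns, not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


raw_prompt = "Contact jane.doe@example.com or +1 (555) 010-9999 today"
safe_prompt = redact(raw_prompt)
print(safe_prompt)
```

Applying `redact` at the logging boundary, rather than inside the application logic, keeps the model input unchanged while ensuring stored traces carry only the placeholders.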