FastAPI basics
App, routes, and automatic OpenAPI
Run with uvicorn for ASGI
FastAPI is built on Starlette + Pydantic. Type hints on parameters drive validation and generated /docs. Use APIRouter to split large services.
    from fastapi import FastAPI

    app = FastAPI(title="Scoring API", version="1.0.0")

    @app.get("/health")
    def health() -> dict[str, str]:
        return {"status": "ok"}
Run locally with uvicorn module:app --reload. In production, run uvicorn workers under gunicorn, or deploy to a managed container platform.
Pydantic models
Request / response schemas
Validation before your handler runs
    from pydantic import BaseModel, Field

    class PredictIn(BaseModel):
        features: list[float] = Field(..., min_length=1, max_length=10_000)

    class PredictOut(BaseModel):
        score: float

    @app.post("/predict", response_model=PredictOut)
    def predict(body: PredictIn) -> PredictOut:
        # body.features is already validated
        ...
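To see what "validated before your handler runs" means in practice, the same schema can be exercised outside FastAPI. The sketch below assumes Pydantic v2 (model_validate, and min_length applied to a list); try_parse is a hypothetical helper written for this example. Inside FastAPI, the same failure is returned to the client as a 422 response without your handler being called.

```python
from pydantic import BaseModel, Field, ValidationError


class PredictIn(BaseModel):
    features: list[float] = Field(..., min_length=1, max_length=10_000)


def try_parse(payload: dict) -> str:
    """Hypothetical helper: return "valid" or the type of the first validation error."""
    try:
        PredictIn.model_validate(payload)
        return "valid"
    except ValidationError as exc:
        return exc.errors()[0]["type"]


print(try_parse({"features": [0.1, 2]}))      # integers are coerced to float
print(try_parse({"features": []}))            # violates min_length=1
print(try_parse({"features": ["not-a-number"]}))  # not parseable as float
```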
Serving ML models
Load once, infer many times
Lifespan hooks & thread safety
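Load model weights once when the process starts (in the FastAPI lifespan context), not on every request. For PyTorch, call model.eval() so layers such as dropout and batch norm switch to inference behavior, and wrap inference in torch.no_grad() to skip gradient tracking. Batch inputs when possible to amortize GPU kernel-launch overhead.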
    from contextlib import asynccontextmanager

    from fastapi import FastAPI

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # load model, connect pools
        app.state.model = load_model("weights.pt")
        yield
        # cleanup

    app = FastAPI(lifespan=lifespan)
Data engineering touchpoints
Beyond the API layer
Where features and labels come from
Scheduled ETL/ELT (Airflow, Dagster, dbt) materializes tables your API or batch scorer reads. Polars and PySpark handle large extracts before they touch FastAPI.
Kafka or Redpanda consumers supply near-real-time features, though these still often land in a store that the API queries. Design consumers to be idempotent and monitor consumer lag.
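One common way to make a consumer idempotent is to key every event by a unique ID and skip IDs that were already applied, so a redelivered message does not change the result. The sketch below uses plain in-memory structures; Event, FeatureStore, and the field names are hypothetical, and no Kafka client is involved. In a real system the "seen" record and the feature update would need to be written atomically in the same store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # unique per logical event, assumed to be stable across redeliveries
    user_id: str
    amount: float


@dataclass
class FeatureStore:
    totals: dict[str, float] = field(default_factory=dict)
    seen: set[str] = field(default_factory=set)

    def apply(self, event: Event) -> bool:
        """Apply the event once; return False if it was already processed."""
        if event.event_id in self.seen:
            return False
        self.totals[event.user_id] = self.totals.get(event.user_id, 0.0) + event.amount
        self.seen.add(event.event_id)
        return True


store = FeatureStore()
first = Event(event_id="e-1", user_id="u-42", amount=10.0)
store.apply(first)
store.apply(first)  # simulated redelivery, ignored
store.apply(Event(event_id="e-2", user_id="u-42", amount=5.0))
print(store.totals)
```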
Operationally, production ML services need structured logging (no secrets or PII in clear text), metrics (latency, error rate), traces across services, and health checks, all in line with your platform's rules.
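As a rough illustration of structured logging with redaction, the sketch below uses only the standard library. The JsonFormatter class, the "fields" key passed through extra, and the SENSITIVE_KEYS list are assumptions made for this example; a real service would follow its platform's logging schema and redaction policy.

```python
import json
import logging

# Illustrative list; the real set of sensitive keys depends on your data.
SENSITIVE_KEYS = {"password", "token", "email"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record, masking sensitive fields."""

    def format(self, record: logging.LogRecord) -> str:
        fields = getattr(record, "fields", {})
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in fields.items():
            payload[key] = "[REDACTED]" if key in SENSITIVE_KEYS else value
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("scoring")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Structured fields travel in `extra`; the email value never reaches the output.
logger.info("scored", extra={"fields": {"latency_ms": 12.5, "email": "user@example.com"}})
```

Masking by key name only catches fields you anticipated; free-text messages still need care not to embed sensitive values.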