
FastAPI & engineering

Build HTTP APIs with automatic OpenAPI docs, validated payloads, and async I/O. Patterns for loading PyTorch or sklearn models, batch prediction, and how this connects to batch/stream data pipelines in production.

FastAPI · Pydantic v2 · uvicorn · async
4 topics
01

FastAPI basics

App, routes, and automatic OpenAPI

Run with uvicorn for ASGI

FastAPI is built on Starlette and Pydantic. Type hints on handler parameters drive both request validation and the generated OpenAPI docs served at /docs. Use APIRouter to split large services into modules.

python
from fastapi import FastAPI

app = FastAPI(title="Scoring API", version="1.0.0")

@app.get("/health")
def health() -> dict[str, str]:
    return {"status": "ok"}
Run locally: uvicorn module:app --reload. In production, run uvicorn workers under gunicorn, or deploy to a managed container platform.
02

Pydantic models

Request / response schemas

Validation before your handler runs

python
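from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Scoring API", version="1.0.0")  # same app as in FastAPI basics

class PredictIn(BaseModel):
    # Empty or oversized feature lists are rejected before the handler runs
    features: list[float] = Field(..., min_length=1, max_length=10_000)

class PredictOut(BaseModel):
    score: float

@app.post("/predict", response_model=PredictOut)
def predict(body: PredictIn) -> PredictOut:
    # body.features is already validated.
    # Placeholder score (mean of the features); replace with a real model call.
    return PredictOut(score=sum(body.features) / len(body.features))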
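
The validation step can also be exercised without starting a server. The sketch below assumes Pydantic v2 and reuses the PredictIn schema from this section; the sample payloads are made up.

```python
from pydantic import BaseModel, Field, ValidationError


class PredictIn(BaseModel):
    features: list[float] = Field(..., min_length=1, max_length=10_000)


# A well-formed payload parses; in the default (lax) mode, ints and
# numeric strings are coerced to float.
ok = PredictIn.model_validate({"features": [0.5, "1.5", 2]})
print(ok.features)

# An empty list violates min_length=1 and raises ValidationError.
try:
    PredictIn.model_validate({"features": []})
except ValidationError as exc:
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["msg"])
```

Inside FastAPI the same failure never reaches the handler: the framework turns it into a 422 response that lists the offending fields.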
03

Serving ML models

🎯

Load once, infer many times

Lifespan hooks & thread safety

Load model weights once when the process starts (in FastAPI's lifespan context), not on every request. For PyTorch, call model.eval() and wrap inference in torch.no_grad(). Batch inputs when possible to amortize GPU kernel-launch overhead.

python
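from contextlib import asynccontextmanager

from fastapi import FastAPI

# load_model is your own loader (not a FastAPI function),
# e.g. a wrapper around torch.load or joblib.load.

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load the model once per process and open connection pools
    app.state.model = load_model("weights.pt")
    yield
    # Shutdown: cleanup (close pools, release the model)

app = FastAPI(lifespan=lifespan)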
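
For the PyTorch side, a minimal sketch of batched inference under eval() and no_grad() might look like the following. The predict_batch helper and the tiny nn.Sequential model are stand-ins invented for illustration; a real service would load trained weights in the lifespan hook instead.

```python
import torch
from torch import nn


def predict_batch(model: nn.Module, rows: list[list[float]]) -> list[float]:
    """Score several feature rows in a single forward pass."""
    batch = torch.tensor(rows, dtype=torch.float32)  # shape: (n_rows, n_features)
    with torch.no_grad():  # skip autograd bookkeeping during inference
        scores = model(batch)  # shape: (n_rows, 1)
    return scores.squeeze(-1).tolist()


# Stand-in model (assumption): 3 input features, one sigmoid score.
model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
model.eval()  # put dropout / batch-norm layers into inference mode

scores = predict_batch(model, [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]])
print(len(scores))
```

Stacking rows into one tensor means one forward pass for the whole batch rather than one per row, which is where the amortization comes from.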
If multiple worker processes share the same GPU, coordinate memory use; a common setup is one model replica per GPU. For sklearn, joblib-loaded pipelines are typical, but pickles are not guaranteed to load correctly across scikit-learn versions, so pin the version used for training.
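
One way to guard against version drift is to store the training-time scikit-learn version next to the pipeline and check it at load time. This is a sketch under assumptions: the bundle layout (a dict with sklearn_version and pipeline keys), the toy data, and the file name are all illustrative.

```python
import tempfile
from pathlib import Path

import joblib
import sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data, purely illustrative.
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
y = [0, 1, 0, 1]
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pipeline.joblib"
    # Hypothetical bundle layout: version metadata stored with the model.
    joblib.dump({"sklearn_version": sklearn.__version__, "pipeline": pipeline}, path)

    bundle = joblib.load(path)
    if bundle["sklearn_version"] != sklearn.__version__:
        raise RuntimeError("model was trained with a different scikit-learn version")
    loaded = bundle["pipeline"]

predictions = loaded.predict([[0.1, 0.95], [0.95, 0.05]]).tolist()
print(predictions)
```

In a service, the load and the version check would run once in the lifespan hook, so a mismatch stops the process at startup instead of surfacing later as odd predictions.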
04

Data engineering touchpoints

🔗

Beyond the API layer

Where features and labels come from

Batch

Scheduled ETL/ELT (Airflow, Dagster, dbt) materializes tables your API or batch scorer reads. Polars and PySpark handle large extracts before they touch FastAPI.

Streaming

Kafka or Redpanda with consumers supplies near-real-time features; the results still often land in a store that the API queries. Design idempotent consumers and monitor consumer lag.
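
Idempotency here means that redelivering the same event leaves the stored features unchanged. The sketch below shows the idea with an in-memory stand-in for the feature store and no broker client; the Event fields and the FeatureStore class are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # unique per logical event (hypothetical field)
    user_id: str
    amount: float


@dataclass
class FeatureStore:
    """In-memory stand-in for the store the API queries."""

    totals: dict[str, float] = field(default_factory=dict)
    seen: set[str] = field(default_factory=set)

    def apply(self, event: Event) -> bool:
        """Apply an event once; return False if it was already applied."""
        if event.event_id in self.seen:
            return False
        self.totals[event.user_id] = self.totals.get(event.user_id, 0.0) + event.amount
        self.seen.add(event.event_id)
        return True


store = FeatureStore()
events = [
    Event("e1", "u1", 10.0),
    Event("e2", "u1", 5.0),
    Event("e1", "u1", 10.0),  # redelivery of e1, e.g. after a consumer restart
]
applied = [store.apply(e) for e in events]
print(applied, store.totals)
```

In a real store, the duplicate check and the feature update would need to happen in one transaction (or via an upsert keyed on the event ID); otherwise a crash between the two steps can still double-count.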

Operationally, production ML services need structured logging (no secrets or PII in clear text), metrics (latency, error rate), traces across services, and health checks, all in line with your platform's rules.
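
As one possible shape for structured logging with redaction, the sketch below uses only the standard library. The SENSITIVE_KEYS set, the fields convention passed through extra, and the logger name are assumptions, not a platform standard.

```python
import json
import logging

# Hypothetical list of keys to mask; adapt it to your own data.
SENSITIVE_KEYS = {"password", "token", "email"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking sensitive fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed as extra={"fields": {...}} end up on the record.
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = "[REDACTED]" if key in SENSITIVE_KEYS else value
        return json.dumps(payload)


logger = logging.getLogger("scoring_api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "prediction served",
    extra={"fields": {"latency_ms": 12.5, "email": "a@example.com"}},
)
```

The emitted line carries latency_ms as a number that a log pipeline can aggregate, while the email value is replaced by a fixed marker before it leaves the process.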