Quality-gated ETL
Glue PySpark: Laplacian blur scoring, perceptual-hash dedup, 640×640 normalize, Great Expectations checks before any write.
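The blur-scoring and dedup steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual Glue job: it works on a grayscale image given as nested lists, uses a difference hash (dHash) as one common kind of perceptual hash, and samples pixels by nearest neighbour instead of a proper resize. The function names are hypothetical; a real job would more likely call an image library.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0-255 ints (assumed input form)


def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbour Laplacian response; low values suggest blur."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontal neighbour comparison on a sampled grid."""
    h, w = len(img), len(img[0])
    bits = 0
    for row in range(hash_size):
        y = row * h // hash_size
        # hash_size + 1 samples per row give hash_size comparisons
        samples = [img[y][col * w // (hash_size + 1)] for col in range(hash_size + 1)]
        for left, right in zip(samples, samples[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances indicate near-duplicate images."""
    return bin(a ^ b).count("1")
```

Images whose Laplacian variance falls below a chosen threshold would be rejected as blurry, and pairs whose hashes differ by only a few bits would be treated as duplicates. Both thresholds need tuning on real claim photos and are not specified here.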
A vehicle-damage assessment pipeline: raw claim photo in, schema-valid damage JSON out — routed by severity and confidence to repair workflows, ERP, or a human adjuster.
The traditional human-adjuster cycle for a damage claim takes 2–5 days and costs $45–$120 per assessment, with inconsistent damage descriptions and manual routing.
The goal: a sub-90-second automated path that emits typed damage records — class, severity, zones, repair estimate, routing action — directly consumable by repair workflows, SAP ERP and claims platforms, while feeding a warehouse for trend analytics.
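One way to picture a typed damage record is a small dataclass carrying the five fields listed above. The field names, the USD unit, and the example values are assumptions for illustration; the project's actual record layout is not given here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DamageRecord:
    """Hypothetical typed record; field names are illustrative only."""
    damage_class: str           # e.g. "dent" (example value, not from the source)
    severity: str               # e.g. "moderate"
    zones: List[str]            # affected vehicle zones
    repair_estimate_usd: float  # assumes estimates are expressed in US dollars
    routing_action: str         # e.g. "queue-for-repair"

    def to_json(self) -> str:
        """Serialize with stable key order for downstream consumers."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record built this way serializes to a single JSON object that a repair workflow, an ERP connector, or a warehouse loader could each parse the same way.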
SageMaker-hosted VLM; every output validated against a draft-07 JSON Schema — invalid responses caught and escalated, never silently written.
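The validate-or-escalate behaviour can be sketched as follows. The schema and its field names are hypothetical, and the checker is a stdlib-only stand-in that handles just a handful of keywords (`required`, `type`, `enum`, `minimum`, `maximum`); it is not a complete draft-07 validator, and a production pipeline would presumably rely on a full validator library instead.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema; the real damage schema is not shown in this document.
DAMAGE_SCHEMA: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["damage_class", "severity", "confidence"],
    "properties": {
        "damage_class": {"type": "string"},
        "severity": {"enum": ["minor", "moderate", "severe", "total_loss"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}

_TYPES = {"string": str, "number": (int, float)}


def schema_errors(doc: Any, schema: Dict[str, Any]) -> List[str]:
    """Check a small subset of JSON Schema keywords and return error messages."""
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required field: {k}"
              for k in schema.get("required", []) if k not in doc]
    for key, rule in schema.get("properties", {}).items():
        if key not in doc:
            continue
        value = doc[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if "type" in rule:
            ok = is_number if rule["type"] == "number" else isinstance(value, _TYPES[rule["type"]])
            if not ok:
                errors.append(f"{key}: expected {rule['type']}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: {value!r} not in enum")
        if is_number and "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{key}: below minimum")
        if is_number and "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{key}: above maximum")
    return errors


def handle_model_output(raw: str) -> Dict[str, Any]:
    """Accept only schema-valid output; everything else is escalated, never written."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"status": "escalated", "errors": [f"not JSON: {exc.msg}"]}
    errors = schema_errors(doc, DAMAGE_SCHEMA)
    if errors:
        return {"status": "escalated", "errors": errors}
    return {"status": "accepted", "record": doc}
```

The key design point is that there is no code path from an invalid response to a write: malformed JSON and schema violations both return an escalation result carrying the reasons.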
Severity × confidence → auto-approve, queue-for-repair, flag-review or route-to-adjuster, with total-loss override.
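A routing rule of this shape might look like the sketch below. The confidence thresholds (0.60 and 0.85), the severity labels, and the choice to send total-loss cases straight to an adjuster are all assumptions made for illustration; the actual decision table is not given here.

```python
SEVERITIES = {"minor", "moderate", "severe"}  # assumed label set


def route(severity: str, confidence: float, total_loss: bool = False) -> str:
    """Map severity and model confidence to one of four routing actions."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    # Assumed override: a total loss always goes to a human adjuster.
    if total_loss:
        return "route-to-adjuster"
    # Hypothetical thresholds: low confidence means a human decides.
    if confidence < 0.60:
        return "route-to-adjuster"
    if confidence < 0.85:
        return "flag-review"
    # High confidence: only minor damage is approved without repair scheduling.
    if severity == "minor":
        return "auto-approve"
    return "queue-for-repair"
```

For example, under these assumed thresholds `route("moderate", 0.7)` yields `"flag-review"`, while `route("minor", 0.99, total_loss=True)` still goes to an adjuster because the override is checked first.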
Open-source ingestion + label normalization across five datasets (CompCars, CarDD, Stanford Cars, Kaggle, HuggingFace) is implemented; the PaliGemma 3B fine-tune is the next milestone.