NLP · field narrative analytics

ClaimLens

Free-text warranty claims and field notes in. Structured failure trends and overcycle-anomaly classes out — so the next 5-Why / 8D starts from data, not a spreadsheet search.

Warranty-narrative NLP · macro-F1 0.90 (measured) · source-typed intake · 66 tests · CI-gated · feeds QualityMind 8D
The problem

The signal is buried in prose

In commercial-vehicle field quality, the slowest step of root-cause analysis is reading thousands of free-text warranty claims, field-service narratives and equipment logs to figure out what is actually failing, and how often.

A large share of those returns are overcycle anomalies — repeated abnormal device cycling (soft resets, cloud-sync failures, power cycles) — that look like hardware faults but aren't. Telling them apart by hand is slow and inconsistent.

How it works

Extract → classify → trend → hand off

Claim, tagged by source: customer complaint · dealer RO · field log
  │
  ├─► Extract    component · failure mode · symptom · action · part # (rules)
  │              source-aware: dealer RO → action · field log → overcycle
  ├─► Classify   overcycle anomaly: Soft Reset · Cloud Sync · … (TF-IDF + LogReg)
  └─► Aggregate  Pareto by label / component / failure mode / source
        │
        ▼
  /handoff → QualityMind-RAG 5-Why / 8D problem_statement
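
The Aggregate step can be pictured as a plain frequency count with a running cumulative share. The following is a minimal sketch under assumed names: the `pareto` helper and the record fields `label` and `source` are illustrative and are not taken from the ClaimLens codebase.

```python
from collections import Counter

def pareto(records, key):
    """Return (value, count, cumulative_share) rows, most frequent first."""
    counts = Counter(r[key] for r in records if r.get(key))
    total = sum(counts.values())
    rows, running = [], 0
    for value, count in counts.most_common():
        running += count
        rows.append((value, count, round(running / total, 3)))
    return rows

# Hypothetical classified claims, for illustration only.
claims = [
    {"label": "Soft Reset", "source": "field log"},
    {"label": "Soft Reset", "source": "dealer RO"},
    {"label": "Cloud Sync", "source": "customer complaint"},
    {"label": "Soft Reset", "source": "field log"},
    {"label": "Power Cycle", "source": "field log"},
]

for row in pareto(claims, "label"):
    print(row)
```

Calling `pareto(claims, "source")` on the same records would give the by-source view; the first row of either result is the dominant trend a hand-off would start from.
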
Intake

Source-typed

Each narrative is tagged customer complaint, dealer RO or field log — driving a by-source Pareto and per-stream extraction emphasis. Regex + gazetteers, zero model download.
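
A rules-only extractor of this kind might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the gazetteer terms, the part-number pattern, the action verbs, the repeat-count cue and the output field names are hypothetical, not the project's actual rules.

```python
import re

# Hypothetical gazetteers; a real deployment would load curated term lists.
COMPONENTS = {"infotainment", "key fob", "usb port", "battery"}
FAILURE_MODES = {"soft reset", "sync failure", "power cycle"}

# Assumed part-number shape: 2-4 capitals, a dash, 4-6 digits (e.g. "HU-20417").
PART_NO = re.compile(r"\b[A-Z]{2,4}-\d{4,6}\b")
# Assumed repair verbs that tend to appear in dealer repair orders.
ACTION = re.compile(r"\b(replaced|reflashed|updated|inspected)\b", re.I)
# Assumed repetition cue in field logs, such as "12 times" or "8 cycles".
REPEAT = re.compile(r"\b(\d+)\s*(?:times|cycles)\b", re.I)

def find_terms(text, gazetteer):
    """Return gazetteer terms that occur as whole words in the text."""
    lowered = text.lower()
    return sorted(t for t in gazetteer
                  if re.search(rf"\b{re.escape(t)}\b", lowered))

def extract(narrative, source):
    """Extract structured fields, with extra emphasis chosen by source type."""
    fields = {
        "source": source,
        "component": find_terms(narrative, COMPONENTS),
        "failure_mode": find_terms(narrative, FAILURE_MODES),
        "part_number": PART_NO.findall(narrative),
    }
    if source == "dealer RO":
        # Dealer repair orders: emphasize the action taken.
        fields["action"] = [m.lower() for m in ACTION.findall(narrative)]
    elif source == "field log":
        # Field logs: emphasize repetition, a cue for overcycle behavior.
        counts = [int(n) for n in REPEAT.findall(narrative)]
        fields["repeat_count"] = max(counts, default=0)
    return fields

record = extract(
    "Infotainment soft reset on start; replaced head unit HU-20417.",
    "dealer RO",
)
print(record)
```

Keeping the source-specific branches inside one function means each narrative still yields the same core fields, so a by-source Pareto can be built without per-source schemas.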

Classify

Overcycle anomaly

TF-IDF + balanced logistic regression over a locked 5-label taxonomy, reproducible (seed 42); low-confidence predictions (<0.55) are flagged needs_review.
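
The review rule can be sketched independently of the model. The example below assumes that "confidence" means the highest class probability (for instance, one row of a scikit-learn `predict_proba` output mapped to label names); that reading, the `triage` helper and the example probabilities are all assumptions made for illustration.

```python
REVIEW_THRESHOLD = 0.55  # threshold stated in the text above

def triage(probabilities, threshold=REVIEW_THRESHOLD):
    """Pick the most probable label and flag it when confidence is low."""
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    return {
        "label": label,
        "confidence": confidence,
        "needs_review": confidence < threshold,
    }

# Made-up class probabilities over the five labels.
confident = triage({
    "Soft Reset": 0.81, "Cloud Sync": 0.07, "Connectivity Loss": 0.05,
    "Power Cycle": 0.04, "No Fault Found": 0.03,
})
blended = triage({
    "Soft Reset": 0.41, "Cloud Sync": 0.38, "Connectivity Loss": 0.09,
    "Power Cycle": 0.07, "No Fault Found": 0.05,
})

print(confident)
print(blended)
```

The second case is the kind of blended narrative the threshold is meant to catch: the top label still wins, but it is routed to a person rather than counted silently.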

Hand off

RCA bridge

Dominant trend becomes a QualityMind-ready 8D / 5-Why payload, POSTed through an SSRF-guarded client — narrative to corrective action.
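
One common way to guard an outbound POST against SSRF is to validate the target URL before any request is made. The sketch below assumes an allowlist approach; the host name, the `is_safe_target` and `build_payload` helpers, and every payload field other than `problem_statement` are hypothetical. A complete guard would also need to handle redirects and check the resolved IP address at connect time, which this sketch does not attempt.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the hand-off client may reach.
ALLOWED_HOSTS = {"qualitymind.internal.example"}

def is_safe_target(url, allowed_hosts=ALLOWED_HOSTS):
    """Accept only https URLs whose host is a named, allowlisted host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        ipaddress.ip_address(parts.hostname)
        return False  # IP literals (loopback, metadata, private) are rejected
    except ValueError:
        pass  # not an IP literal, fall through to the allowlist check
    return parts.hostname in allowed_hosts

def build_payload(label, count, total, component):
    """Turn the dominant trend into an 8D-style problem statement."""
    share = count / total
    return {
        "method": "8D",  # hypothetical field
        "problem_statement": (
            f"{label} accounts for {count} of {total} claims "
            f"({share:.0%}), concentrated in {component}."
        ),
    }

payload = build_payload("Soft Reset", 3, 5, "infotainment")
target = "https://qualitymind.internal.example/handoff"
if is_safe_target(target):
    print("would POST:", payload)
```

Rejecting IP literals outright is a deliberately strict choice for a sketch; it keeps requests away from loopback and cloud-metadata addresses without maintaining a list of blocked ranges.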

Measured, not asserted

macro-F1 0.90 on 1,200 narratives

Class               Precision   Recall   F1
Soft Reset          0.83        0.84     0.83
Cloud Sync          0.91        0.87     0.89
Connectivity Loss   0.88        0.95     0.91
Power Cycle         0.96        0.94     0.95
No Fault Found      0.90        0.88     0.89
macro avg           0.90        0.90     0.90
[Figure: per-class F1 bar chart (Soft Reset, Cloud Sync, Connectivity Loss, Power Cycle, No Fault Found, macro avg); scale 0–1.0; dashed line = 0.88 macro-F1 gate]

Stratified 75/25 split, produced by evaluate.py — which also writes a confusion matrix, per-label recall and an overcycle-recall summary to metrics.json. The synthetic corpus mirrors published automotive-warranty theme frequencies (infotainment / OTA / cloud-sync dominant, with battery-range and key-fob / USB edge cases) and injects ~18% ambiguous / blended hard cases on purpose, so the score reflects real field-note messiness — not a separable toy set.
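
As a sanity check on the reported figure, macro-F1 is the unweighted mean of the per-class F1 scores, and each F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded per-class precision and recall values listed above and compares it with the 0.88 gate. It is an approximation from rounded inputs, not a reproduction of evaluate.py, which works from the raw predictions; the helper names are illustrative.

```python
# Rounded (precision, recall) pairs as reported in the per-class results.
PER_CLASS = {
    "Soft Reset": (0.83, 0.84),
    "Cloud Sync": (0.91, 0.87),
    "Connectivity Loss": (0.88, 0.95),
    "Power Cycle": (0.96, 0.94),
    "No Fault Found": (0.90, 0.88),
}
MACRO_F1_GATE = 0.88  # gate value shown on the per-class chart

def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = [f1(p, r) for p, r in per_class.values()]
    return sum(scores) / len(scores)

score = macro_f1(PER_CLASS)
print(f"macro-F1 = {score:.2f}, gate passed: {score >= MACRO_F1_GATE}")
```

Because every class carries equal weight, a weak class such as Soft Reset pulls the macro average down regardless of how many narratives it covers, which is why a macro-F1 gate is a stricter CI check than accuracy on an imbalanced corpus.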

Stack

Built with

Python 3.10+ · scikit-learn · TF-IDF + LogReg · FastAPI · Pydantic · pytest · ruff · GitHub Actions