High
Claude API latency for real-time triage
Async job queue prevents blocking. Triage runs on 15-min cycle, not real-time. SLA: results within 2 min of ingestion window close.
High
Multi-tenant data isolation failure
tenant_id enforced at ORM layer with required() validator. "Tenant bleed" integration test checks 5 cross-tenant query patterns before every deploy.
Medium
Retailer portal schema changes break P1
MCP adapters versioned per retailer. Schema validation on ingest returns UNCLASSIFIED rather than crashing. Monthly schema monitoring job.
Medium
Claude hallucination in dispute drafts
Mandatory evidence_chain field. Pydantic rejects output without cited chunks. Human review gate before any submission. No auto-submit path.
Low
pgvector RAG retrieval quality
Namespace by retailer_id + effective_date. Cosine similarity threshold configured per retailer. Monthly embedding quality audit on test queries.
Low
Claude API cost overrun at scale
Prompt caching for routing guide context (stable). Token limits enforced per call. Cost alerting at $50/client/month threshold. Budget caps per tenant.