The initial temptation is to use the most capable model for everything. It works: quality is uniform. So is cost, which came out roughly 3x higher than it needed to be.
The spreadsheet that opened my eyes
Mapping each app feature against average call cost and decision value made it obvious: 'operational Q&A chat' and 'yearly strategic analysis' don't deserve the same model. One is high volume × low value; the other is low volume × high value.
Feature → tier router
An `ai_config` table maps feature → tier per workspace. There is a global default, and Enterprise clients can override it. Switching providers tomorrow only requires updating the mapping.
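A minimal sketch of that resolution order, not the production code. The names `resolveTier`, `resolveModel`, and `TIER_TO_MODEL`, the default tiers, the placeholder model IDs, and the final fallback to "fast" are all assumptions for illustration. The per-workspace row it reads has the same shape as the `ai_config` snippet that follows.

```typescript
type Tier = "fast" | "balanced" | "premium";

interface AiConfig {
  workspace_id: string;
  // Per-workspace overrides; a feature may be absent.
  feature_models: Partial<Record<string, Tier>>;
}

// Assumed global defaults, used when a workspace has no override.
const DEFAULT_MODELS: Partial<Record<string, Tier>> = {
  chat_consultor: "fast",
  voice_curral: "fast",
  ndvi_insight: "balanced",
  pdf_generation: "balanced",
  strategic_analysis: "premium",
};

// Hypothetical tier -> provider model mapping. The IDs are placeholders;
// switching providers means editing only this table.
const TIER_TO_MODEL: Record<Tier, string> = {
  fast: "provider-x/small",
  balanced: "provider-x/medium",
  premium: "provider-x/large",
};

// Workspace override first, then the global default,
// then "fast" for unknown features (an assumption, chosen to fail cheap).
function resolveTier(feature: string, config?: AiConfig): Tier {
  return config?.feature_models[feature] ?? DEFAULT_MODELS[feature] ?? "fast";
}

function resolveModel(feature: string, config?: AiConfig): string {
  return TIER_TO_MODEL[resolveTier(feature, config)];
}
```

Keeping feature → tier and tier → model as two separate lookups is what makes a provider switch a one-table change: features never name a concrete model.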
// ai_config table: one row per workspace
const config = {
  workspace_id: "wks_abc",
  feature_models: {
    "chat_consultor": "fast", // dirt cheap, low latency
    "voice_curral": "fast",
    "ndvi_insight": "balanced", // reasoning + multimodal
    "pdf_generation": "balanced",
    "strategic_analysis": "premium", // 1-2 calls per month
  },
};

// Workspace override first, then the global default for that feature
const model = config.feature_models[feature] ?? DEFAULT_MODELS[feature];

The number that matters
The pilot month saw 1,247 calls for a total cost of $18.42. Strategic analysis (2 calls) ate $4.90, about 27% of the cost. Chat (187 calls) spent $1.12. If I'd used the premium tier for everything, the bill would have been ~$55. Routing cut the bill by about 67% in the first month alone.
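A quick sanity check of those figures, using only the numbers quoted above. The ~$55 all-premium bill is the rough estimate from the text, not a measured value, and the helper `pct1` is just for illustration.

```typescript
const totalCost = 18.42;        // USD, routed pilot month
const strategicCost = 4.9;      // USD, 2 strategic analysis calls
const chatCost = 1.12;          // USD, 187 chat calls
const allPremiumEstimate = 55;  // USD, rough estimate from the text

// Percentage of `whole`, rounded to one decimal place.
const pct1 = (part: number, whole: number): number =>
  Math.round((part / whole) * 1000) / 10;

const strategicShare = pct1(strategicCost, totalCost);                        // 26.6
const chatShare = pct1(chatCost, totalCost);                                  // 6.1
const savingsPct = pct1(allPremiumEstimate - totalCost, allPremiumEstimate);  // 66.5

// Per-call cost shows the volume × value split directly.
const costPerStrategicCall = strategicCost / 2;  // 2.45
const costPerChatCall = chatCost / 187;          // about 0.006
const perCallRatio = costPerStrategicCall / costPerChatCall;

console.log({ strategicShare, chatShare, savingsPct, perCallRatio });
```

A single strategic analysis call costs roughly 400 times what a chat call does, which is why two calls can account for about a quarter of the bill.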
The humble side: still learning how to measure
Today I measure cost well: every call saves its input/output token counts to Postgres. Answer quality per tier is another story. I still don't have a formal framework for it, so I compare manually on canonical cases. That's the next step: an eval framework that runs against all 3 tiers in parallel and flags regressions without me looking.
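One possible shape for that eval framework, as a sketch under stated assumptions rather than something that exists yet. The `CallModel` signature, the per-case `score` function, the stored `baselines`, and the 0.05 tolerance are all hypothetical.

```typescript
type Tier = "fast" | "balanced" | "premium";
const TIERS: Tier[] = ["fast", "balanced", "premium"];

interface EvalCase {
  id: string;
  prompt: string;
  // Hypothetical scorer: maps an answer to a quality score in [0, 1].
  score: (answer: string) => number;
}

// Hypothetical signature for whatever actually calls the provider.
type CallModel = (tier: Tier, prompt: string) => Promise<string>;

interface Regression {
  caseId: string;
  tier: Tier;
  score: number;
  baseline: number;
}

// A score counts as a regression when it drops below baseline minus tolerance.
function isRegression(score: number, baseline: number, tolerance: number): boolean {
  return score < baseline - tolerance;
}

async function runEval(
  cases: EvalCase[],
  callModel: CallModel,
  // Last accepted score per case and tier (assumed to be stored somewhere).
  baselines: Record<string, Partial<Record<Tier, number>>>,
  tolerance = 0.05,
): Promise<Regression[]> {
  const jobs: Promise<Regression | null>[] = [];
  for (const c of cases) {
    for (const tier of TIERS) {
      jobs.push(
        callModel(tier, c.prompt).then((answer) => {
          const score = c.score(answer);
          // No stored baseline means nothing to regress against.
          const baseline = baselines[c.id]?.[tier] ?? 0;
          return isRegression(score, baseline, tolerance)
            ? { caseId: c.id, tier, score, baseline }
            : null;
        }),
      );
    }
  }
  // All case × tier calls are in flight at once; wait for every one.
  const results = await Promise.all(jobs);
  return results.filter((r): r is Regression => r !== null);
}
```

The hard part this sketch skips is the `score` function itself. For canonical cases it could start as simple checks on the answer and grow from there.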