AI · Cost · April 22, 2026 · ~5 min

Per-feature model routing: 70% savings without losing quality

General chat doesn't need the most expensive model; strategic analysis does. Per-feature routing with a per-workspace override saves ~70% without downgrading the model behind the critical decisions, as measured in the AgroTrack pilot.

The initial temptation is to use the most capable model for everything. It works, and quality is uniform. So is the cost: ~3x more than it needed to be.

The spreadsheet that opened my eyes

Mapping each app feature against its average cost per call and the value of the decision it supports made it obvious: 'operational Q&A chat' and 'yearly strategic analysis' don't deserve the same model. One is high volume × low value; the other is low volume × high value.
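To make the contrast concrete, here is a minimal sketch of the spreadsheet's core column, using only the two features whose pilot-month figures are quoted in this note (chat: 187 calls, $1.12; strategic analysis: 2 calls, $4.90). The `FeatureUsage` type and the `avgCostPerCall` helper are illustrative names, not part of the app.

```typescript
// Illustrative only: FeatureUsage and avgCostPerCall are hypothetical names.
interface FeatureUsage {
  feature: string;
  calls: number;        // calls in the pilot month
  totalCostUsd: number; // total spend for the feature in that month
}

// Pilot-month figures quoted in this note.
const usage: FeatureUsage[] = [
  { feature: "chat", calls: 187, totalCostUsd: 1.12 },
  { feature: "strategic_analysis", calls: 2, totalCostUsd: 4.9 },
];

// Average cost of a single call for one feature.
function avgCostPerCall(u: FeatureUsage): number {
  return u.totalCostUsd / u.calls;
}

for (const u of usage) {
  console.log(`${u.feature}: $${avgCostPerCall(u).toFixed(4)} per call over ${u.calls} calls`);
}
```

A strategic analysis call comes out at $2.45 on average, roughly 400 times the cost of a chat call, which is why a handful of them can outweigh all the chat traffic.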

Feature → tier router

An `ai_config` table maps feature → tier per workspace. There is a global default, and Enterprise clients can override it per feature. Because features point at tiers rather than concrete models, switching providers tomorrow only requires updating the mapping.

typescript

// ai_config table (one row, shown as an object)
{
  workspace_id: "wks_abc",
  feature_models: {
    "chat_consultor": "fast",        // very cheap, low latency
    "voice_curral": "fast",
    "ndvi_insight": "balanced",      // reasoning + multimodal
    "pdf_generation": "balanced",
    "strategic_analysis": "premium", // 1-2 calls per month
  },
}

// Workspace override if present, otherwise the global default for the feature
const model = config.feature_models[feature] ?? DEFAULT_MODELS[feature];
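The lookup can be fleshed out into a small resolver. This is a sketch under assumptions, not the app's actual code: the tier names and feature keys come from the `ai_config` example, while `WorkspaceAiConfig`, `TIER_TO_MODEL`, `resolveModel`, the `provider-x/...` model IDs, the `wks_enterprise` workspace, and the fallback to the cheapest tier for unknown features are all hypothetical.

```typescript
type Tier = "fast" | "balanced" | "premium";

// Hypothetical shape of one ai_config row.
interface WorkspaceAiConfig {
  workspace_id: string;
  feature_models: Partial<Record<string, Tier>>;
}

// Global defaults, used when a workspace has no override for a feature.
const DEFAULT_MODELS: Partial<Record<string, Tier>> = {
  chat_consultor: "fast",
  voice_curral: "fast",
  ndvi_insight: "balanced",
  pdf_generation: "balanced",
  strategic_analysis: "premium",
};

// Tier -> concrete provider model. Placeholder IDs: switching providers
// means editing only this map, not the per-feature configuration.
const TIER_TO_MODEL: Record<Tier, string> = {
  fast: "provider-x/small",
  balanced: "provider-x/medium",
  premium: "provider-x/large",
};

function resolveModel(feature: string, config?: WorkspaceAiConfig): string {
  // Workspace override first, then the global default, then (as an assumed
  // policy) the cheapest tier for features nobody has configured.
  const tier: Tier =
    config?.feature_models[feature] ?? DEFAULT_MODELS[feature] ?? "fast";
  return TIER_TO_MODEL[tier];
}

// An Enterprise workspace that upgrades chat but leaves everything else alone.
const enterprise: WorkspaceAiConfig = {
  workspace_id: "wks_enterprise",
  feature_models: { chat_consultor: "premium" },
};

console.log(resolveModel("chat_consultor"));             // global default tier
console.log(resolveModel("chat_consultor", enterprise)); // workspace override tier
```

Keeping the tier as the only thing stored per feature is what makes the provider swap cheap: the per-workspace rows never mention a concrete model.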

The number that matters

The pilot month had 1,247 calls at a total cost of $18.42. Strategic analysis (2 calls) ate $4.90, about 27% of the cost. Chat (187 calls) cost $1.12. Had I used the premium tier for everything, the bill would have been ~$55. Routing cut the bill by about 67% in the first month alone.
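The arithmetic behind those percentages is short enough to check directly. The sketch below uses only the figures quoted above; note that the ~$55 all-premium bill is an estimate, so the savings figure inherits that uncertainty. Variable names are illustrative.

```typescript
// Figures from the pilot month; allPremiumUsd is the estimated counterfactual.
const actualUsd = 18.42;
const allPremiumUsd = 55;
const strategicUsd = 4.9;

const savings = 1 - actualUsd / allPremiumUsd;   // fraction of the bill saved by routing
const strategicShare = strategicUsd / actualUsd; // share of spend from the 2 strategic calls
const premiumMultiple = allPremiumUsd / actualUsd; // how much pricier all-premium would be

console.log(`savings: ${(savings * 100).toFixed(1)}%`);
console.log(`strategic analysis share: ${(strategicShare * 100).toFixed(1)}%`);
console.log(`all-premium multiple: ${premiumMultiple.toFixed(2)}x`);
```

This gives savings of about 66.5%, a strategic analysis share of about 26.6%, and an all-premium bill just under 3 times the routed one.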

The humble side: still learning how to measure

Today I measure cost well (every call saves its input/output token counts to Postgres). But I still don't have a formal framework for answer quality per tier; I compare manually on canonical cases. That's the next step: an eval framework that runs against all 3 tiers in parallel and flags regressions without me having to look.
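One possible shape for that eval loop, as a sketch only: nothing here exists in the app yet. `EvalCase`, `CallTier`, `Score`, `findRegressions`, the stored `baseline` scores, and the `tolerance` threshold are all hypothetical, and the scoring function is deliberately left to the caller because how to score answers is exactly the open question.

```typescript
type Tier = "fast" | "balanced" | "premium";
const TIERS: Tier[] = ["fast", "balanced", "premium"];

interface EvalCase {
  id: string;
  prompt: string;
}

// Caller-supplied: how to call a tier, and how to score an answer in [0, 1].
type CallTier = (tier: Tier, prompt: string) => Promise<string>;
type Score = (c: EvalCase, answer: string) => number;

interface Regression {
  caseId: string;
  tier: Tier;
  score: number;
  baseline: number;
}

async function findRegressions(
  cases: EvalCase[],
  callTier: CallTier,
  score: Score,
  // caseId -> tier -> last accepted score for that canonical case
  baseline: Record<string, Partial<Record<Tier, number>>>,
  tolerance = 0.05,
): Promise<Regression[]> {
  const regressions: Regression[] = [];
  for (const c of cases) {
    // Call the three tiers concurrently for this case.
    const results = await Promise.all(
      TIERS.map(async (tier) => ({ tier, answer: await callTier(tier, c.prompt) })),
    );
    for (const { tier, answer } of results) {
      const s = score(c, answer);
      const b = baseline[c.id]?.[tier];
      // Flag only when a baseline exists and the score dropped beyond tolerance.
      if (b !== undefined && s < b - tolerance) {
        regressions.push({ caseId: c.id, tier, score: s, baseline: b });
      }
    }
  }
  return regressions;
}
```

Comparing each tier against its own previous score, rather than against the premium tier, keeps the check about regressions over time; whether that is the right comparison depends on the quality framework still to be defined.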
