Model Routing Is the New Unit Economics

Agentic Systems · Product Economics · AI Infrastructure
AI products that don’t implement model routing will have 30-50% worse margins than competitors by 2027.
B. Talvinder · March 22, 2026

Most teams are paying frontier model prices for commodity model work.

They default to GPT-4 or Claude Opus for tasks that a $0.10 per million token model could handle at 95% accuracy. The gap between what these models cost and what they’re actually needed for is the arbitrage opportunity of the next 18 months.

At Ostronaut, we generate training content at scale: presentations, quizzes, video scripts. We started with GPT-4 for everything. Cost per generation: $0.03. We moved structured extraction and template filling to GPT-4o-mini. Cost dropped to $0.015. Same user satisfaction scores. Half the cost.

The arbitrage isn’t about being cheap. It’s about understanding where model capability stops mattering to the outcome.

Inference CAC Compounds Like Customer Acquisition Cost

Call this Inference CAC: the cost to acquire value from each model call.

Just like customer acquisition cost, it’s a unit cost that compounds. If you’re running 10M inferences a month at, say, $0.0025 per call, a 50% reduction in per-call cost is $150K in annual savings. That’s not a rounding error. That’s headcount.
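The compounding is easy to sanity-check. A minimal sketch, where the $0.0025 per-call cost is an assumed figure for illustration, not a quoted price:

```python
# Annual savings from cutting per-call inference cost.
# The per-call cost below is an assumed figure; substitute your own.
def annual_savings(calls_per_month: float, cost_per_call: float,
                   reduction: float) -> float:
    """Dollars saved per year from a fractional per-call cost reduction."""
    return calls_per_month * 12 * cost_per_call * reduction

# 10M calls/month at $0.0025 per call, cut 50%:
savings = annual_savings(10_000_000, 0.0025, 0.50)
print(f"${savings:,.0f}/year")  # $150,000/year
```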

The shift happening now: AI products are moving from “can we do this?” to “can we do this profitably?” The companies that figure out model selection as a core competency will have better margins than competitors running everything through Opus.

This is not about performance. It’s about matching performance to the value threshold of the task.

The Default-to-Frontier Habit Is a Margin Killer

The default behavior in 2024-2025 was to use the best available model. GPT-4, Claude Opus, whatever scored highest on benchmarks. The logic made sense early: you’re prototyping, you want maximum capability, cost is secondary to learning if the feature works.

But that logic breaks once you’re in production. Once you’re processing thousands or millions of requests. Once the feature is validated and you’re optimizing for margin.

Here’s the pattern I see across teams: 80% of their AI tasks don’t need frontier model reasoning. They need reliable extraction, simple classification, template completion, or pattern matching. Tasks where a 90% accurate model and a 95% accurate model produce the same user outcome.

The performance plateau is real. If you’re extracting structured data from invoices, GPT-4’s reasoning capability is overkill. If you’re triaging support tickets into five categories, you don’t need multi-step reasoning. If you’re generating quiz questions from a content outline, you need consistency and format compliance, not creativity.

The companies that will win the next phase are the ones building model portfolios, not model lock-in. They route requests to the cheapest model that clears the quality bar for that specific task. Frontier models for complex reasoning. Mid-tier models for structured tasks. Small models for high-volume, low-complexity work.

This requires a different kind of product thinking. You need to know:

  • What is the minimum acceptable quality for this feature?
  • What does quality mean here — accuracy, consistency, format compliance, creativity?
  • What’s the cost per request at different model tiers?
  • What’s the volume, and how does that change unit economics?

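Those four answers can be made operational as a per-task spec: each task declares its quality bar, and the router picks the cheapest tier that clears it. A minimal sketch, where the tier names, costs, and quality scores are hypothetical placeholders (in practice the quality numbers come from your own evals):

```python
# Hypothetical model tiers: (name, cost per 1M tokens, measured quality).
# Quality scores are placeholders; real ones come from task-level evals.
TIERS = [
    ("small",    0.10, 0.90),
    ("mid",      0.60, 0.95),
    ("frontier", 5.00, 0.99),
]

def cheapest_model(min_quality: float) -> str:
    """Return the cheapest tier whose measured quality clears the bar."""
    for name, cost, quality in sorted(TIERS, key=lambda t: t[1]):
        if quality >= min_quality:
            return name
    raise ValueError(f"no tier meets quality bar {min_quality}")

print(cheapest_model(0.90))  # small — extraction, classification
print(cheapest_model(0.97))  # frontier — multi-step reasoning
```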
Most teams can’t answer these questions. They pick a model, ship the feature, and never revisit the decision.

Here’s a claim: By 2027, any AI product doing more than 1M inferences/month that hasn’t implemented model routing will have 30-50% worse margins than competitors who have. The gap will be structural. It won’t be about better features. It will be about better cost discipline.

The Math Is Already Visible

Cursor’s pricing page tells you something: “Claude Opus is extremely expensive, so my recommendation not to use it, unless the company pays for it.” They’re already pushing users toward cost-aware model selection. The tool that’s supposed to make you more productive is teaching you to ration the expensive model.

That’s the canary. When dev tools start warning you about model costs, it means the unit economics are real enough to matter.

Look at SaaS unit economics. If your CAC is $1,800 and your annual contract value is $1,500, you’re underwater. You optimize CAC or increase ACV. Same logic applies to inference costs. If your cost per inference is $0.05 and your revenue per user per month is $20, you need to drive down inference cost or increase revenue.
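Worked through with the numbers above: at $0.05 per inference and $20 per user per month, gross margin hits zero at 400 inferences per user per month. A sketch:

```python
def inference_margin(revenue_per_user: float, cost_per_inference: float,
                     inferences_per_user: int) -> float:
    """Monthly gross margin per user after inference costs."""
    return revenue_per_user - cost_per_inference * inferences_per_user

print(inference_margin(20.0, 0.05, 100))  # 15.0 — healthy
print(inference_margin(20.0, 0.05, 400))  # 0.0  — break-even
```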

Most teams will find it easier to optimize the cost side first.

The math is simple:

Volume               Frontier model   Mid-tier model   Annual difference
1M requests/month    $50K/month       $20K/month       $360K/year
5M requests/month    $250K/month      $100K/month      $1.8M/year
10M requests/month   $500K/month      $200K/month      $3.6M/year

(That’s $0.05 per request on the frontier tier versus $0.02 mid-tier, with all traffic routed.)

That’s the arbitrage. Find the 60-80% of your requests that don’t need frontier models. Route them to cheaper models. Bank the difference.
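The same table as a blended-cost calculation: route a fraction of traffic to the mid-tier model and bank the per-request difference. The rates below ($0.05 frontier, $0.02 mid-tier) are the ones implied by the table:

```python
def annual_savings(requests_per_month: float, routed_fraction: float,
                   frontier_cost: float = 0.05,
                   midtier_cost: float = 0.02) -> float:
    """Annual savings from routing a fraction of requests to the mid-tier."""
    per_request_savings = frontier_cost - midtier_cost
    return requests_per_month * 12 * routed_fraction * per_request_savings

# 10M requests/month, 70% routed (the middle of the 60-80% range):
print(f"${annual_savings(10_000_000, 0.70):,.0f}/year")  # $2,520,000/year
```

Routing 100% of traffic reproduces the table’s $3.6M figure; the point of the calculation is that even a partial routing fraction keeps most of the savings.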

Model Selection Is a Feature-Level Decision

At Ostronaut, we built a multi-agent system for content generation. Initially, every agent used GPT-4. The cost per generation was $0.03. Acceptable for early customers, unsustainable at scale.

We audited every agent. Which tasks required reasoning? Which were template-filling? Which were format validation?

We moved structured extraction, template population, and rule-based validation to GPT-4o-mini. We kept GPT-4 for content composition and quality evaluation — the tasks where reasoning and creativity mattered.

Cost per generation dropped 50%. Quality scores stayed flat. We didn’t lose customers. We didn’t get more complaints. The cheaper model was good enough for those tasks.

The lesson: model selection is a feature-level decision, not a product-level decision. You don’t pick one model for your product. You pick the right model for each task within your product.
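At the feature level, the decision can be as simple as a dispatch table per task, with unknown tasks defaulting to the frontier model as the safe fallback. The mapping mirrors the split described above; treat it as a sketch, not our production code:

```python
# Feature-level model selection: each task in the pipeline names its model.
TASK_MODELS = {
    "structured_extraction": "gpt-4o-mini",
    "template_population":   "gpt-4o-mini",
    "rule_validation":       "gpt-4o-mini",
    "content_composition":   "gpt-4",
    "quality_evaluation":    "gpt-4",
}

def model_for(task: str) -> str:
    """Pick the model for a task; fail safe to the frontier model."""
    return TASK_MODELS.get(task, "gpt-4")

print(model_for("structured_extraction"))  # gpt-4o-mini
print(model_for("new_unaudited_task"))     # gpt-4
```

The fail-safe default matters: a new task that hasn’t been audited yet should pay the frontier price until someone proves the cheaper model clears the bar.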

India Needs This More Than Anyone

This matters disproportionately for Indian AI product companies.

The ARPU constraints are real. When your customers are paying Rs 500-1,500/month, not Rs 5,000-20,000/month, your inference cost per user eats a bigger share of revenue. You can’t afford to run everything through Opus. You need to be surgical about where you spend on model capability.

The arbitrage is bigger here. Indian engineering teams are already good at cost optimization. Cloud cost management, infrastructure efficiency, resource utilization — these are native skills. Model routing is the same discipline applied to AI.

The companies building AI products in India that figure out model portfolios early will have a structural advantage. Not because they’re smarter. Because their margin constraints forced them to solve the problem first.

What I Don’t Know Yet

I don’t have a clean answer for how to build the routing logic itself. Do you hardcode rules? Do you train a classifier? Do you use an LLM to route to other LLMs? Each approach has tradeoffs.

Hardcoded rules are brittle but predictable. A classifier adds complexity but scales better. Using an LLM as a router adds latency and cost but might handle edge cases better.

We’re still experimenting. The right answer probably depends on your volume, your task diversity, and how much you’re willing to invest in routing infrastructure.
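For what it’s worth, one middle ground between hardcoded rules and a trained classifier is a scored heuristic: cheap signals (input length, whether a strict output schema is attached, reasoning-flavored wording) vote on complexity, and the score picks the tier. Entirely a sketch, with made-up thresholds:

```python
def route(prompt: str, has_output_schema: bool) -> str:
    """Score request complexity from cheap signals; thresholds are made up."""
    score = 0
    if len(prompt) > 2000:          # long inputs tend to need more reasoning
        score += 1
    if not has_output_schema:       # free-form output is harder to get right
        score += 1
    if any(w in prompt.lower() for w in ("why", "explain", "compare")):
        score += 1                  # reasoning-flavored requests
    if score == 0:
        return "small"
    return "mid" if score == 1 else "frontier"

print(route("Extract the invoice total as JSON.", has_output_schema=True))   # small
print(route("Compare these two strategies and explain the tradeoffs.",
            has_output_schema=False))                                        # frontier
```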

The other open question: how do you measure quality degradation when you switch models? User complaints are a lagging indicator. You need leading indicators — accuracy on test sets, consistency scores, format compliance rates. Building that instrumentation is non-trivial.
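One leading indicator is cheap to build: a format-compliance rate over sampled outputs. For JSON-producing tasks, parse each output and check the required keys; a drop after a model switch shows up here long before user complaints do. A sketch:

```python
import json

def format_compliance(outputs: list[str], required_keys: set[str]) -> float:
    """Fraction of outputs that parse as JSON and contain all required keys."""
    ok = 0
    for raw in outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and required_keys <= obj.keys():
            ok += 1
    return ok / len(outputs) if outputs else 0.0

sampled = ['{"question": "Q1", "answer": "A"}', 'not json', '{"question": "Q2"}']
rate = format_compliance(sampled, {"question", "answer"})
print(f"{rate:.0%}")  # 33%
```

Run it on a fixed sample before and after a model switch, and alert when the rate drops below the old model’s baseline.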

The Question Worth Asking

The companies that win the next phase of AI products won’t be the ones with the best models. They’ll be the ones with the best model selection strategy.

The question isn’t “which model should we use?” The question is “which model should we use for this specific task, at this volume, at this quality threshold?”

Most teams aren’t asking that question yet. They will be.