The Plumbing Premium
Modal Labs raised $87 million in September 2025 at a $1.1 billion valuation, on annualized revenue near $50 million. Baseten raised $150 million in its Series D at $2.15 billion. Fireworks AI raised $254 million at $4 billion.
These are infrastructure companies most application-layer founders haven’t heard of. They’re already worth more than the AI apps they serve.
Every compute paradigm produces the same investing mistake. The first wave of money chases applications. The second wave, the one that compounds, goes to infrastructure. AWS was plumbing for web apps. Stripe was plumbing for payments. The plumbing layer for AI workloads is being built right now, and it’s where the value will concentrate.
I’m calling this the Plumbing Premium: the systematic underpricing of infrastructure relative to applications in the early phase of any paradigm shift, followed by the systematic outperformance of infrastructure companies once the application layer commoditizes.
The physics are different
Web infrastructure follows Newtonian mechanics: linear, predictable, stackable. Double the users, double the servers. AI workloads follow thermodynamics. They’re non-linear, state-dependent, and entropy-prone.
You can’t “add more boxes” when the bottleneck is a 70-billion parameter model that takes 45 seconds to cold-start. Traditional auto-scaling doesn’t help when your cost driver is API calls to external AI services, not CPU utilization.
At Ostronaut, a single content generation request in production calls an LLM provider, a vector database, a text-to-speech service, a video renderer, and blob storage. Five external services, each with different latency profiles and failure modes. The cost surface is unrecognizable compared to a standard web app.
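A minimal sketch of what tracing one such request can look like. The service names and per-call costs below are hypothetical illustrations, not Ostronaut's actual stack or pricing — the point is that latency and cost attach to external spans, not to CPU time:

```python
import time
from dataclasses import dataclass, field


@dataclass
class RequestTrace:
    """Per-request trace across external AI services: latency and cost per span."""
    spans: list = field(default_factory=list)

    def record(self, service: str, fn, cost_usd: float):
        # Wrap one external call, timing it and attributing its cost.
        start = time.perf_counter()
        try:
            return fn()
        finally:
            self.spans.append({
                "service": service,
                "latency_s": time.perf_counter() - start,
                "cost_usd": cost_usd,
            })

    @property
    def total_cost(self) -> float:
        return sum(s["cost_usd"] for s in self.spans)


# Hypothetical per-call costs; a real pipeline would invoke live SDKs here.
trace = RequestTrace()
script = trace.record("llm", lambda: "draft script", 0.004)
audio = trace.record("tts", lambda: "audio bytes", 0.012)
print(f"{len(trace.spans)} spans, ${trace.total_cost:.3f} total")
```

Five spans with five different failure modes is why the cost surface looks nothing like a web app's: a single slow or failed span dominates the request, and none of it registers as CPU utilization.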
The tooling built for web infrastructure assumes the wrong physics. That gap is the market.
The unbundling has already happened
The AI infrastructure stack is splitting into three layers, each with a direct analogue in the mature cloud stack:
| AI Infrastructure Layer | Cloud Equivalent | Companies & Funding |
|---|---|---|
| Deployment (GPU scheduling, cold-start, inference routing) | Compute (EC2) | Modal ($111M total), Baseten ($285M+), Fireworks ($331M) |
| Orchestration (agent loops, retrieval chains, workflows) | Container orchestration (K8s) | LangChain, LlamaIndex, CrewAI (all open source) |
| Observability (prompt logging, cost tracking, hallucination detection) | Monitoring (Datadog) | Helicone, Langfuse, Arize ($70M Series C, Feb 2025) |
Modal’s entire bet: AI inference is bursty. You need massive compute for minutes, then nothing. Paying for reserved GPU instances at 15% utilization is like paying for a bus because you occasionally move 50 people. Serverless GPU aligns cost with usage.
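The economics are easy to sketch. The hourly rate and the 2.5x serverless markup below are illustrative assumptions, not Modal's actual pricing:

```python
HOURS_PER_MONTH = 730


def reserved_cost(hourly_rate: float) -> float:
    # Reserved instance: you pay for every hour, busy or idle.
    return hourly_rate * HOURS_PER_MONTH


def serverless_cost(hourly_rate: float, utilization: float,
                    premium: float = 2.5) -> float:
    # Serverless GPU: pay only for busy hours, at a per-hour markup.
    return hourly_rate * premium * HOURS_PER_MONTH * utilization


rate = 2.00  # $/GPU-hour, illustrative only
print(reserved_cost(rate))          # 1460.0 per month
print(serverless_cost(rate, 0.15))  # 547.5 per month at 15% utilization
```

Even paying a 2.5x per-hour markup, serverless costs roughly a third as much at 15% utilization. The crossover in this toy model comes at 40% utilization, which is why steady high-volume workloads still reserve capacity while bursty inference doesn't.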
Simple insight. Modal went from seed to unicorn in under two years on it.
The observability layer is equally underserved. Datadog can tell you your p99 latency. It can’t tell you your model started hallucinating at 3 AM because your vector index got corrupted. You need the full prompt, the completion, token counts, latency, cost per request, and the quality score of the output.
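A sketch of the per-request record such a tool has to capture — fields that web-era monitoring never sees. The field names, the quality-score source, and the degradation threshold are all assumptions for illustration, not any vendor's schema:

```python
from dataclasses import dataclass


@dataclass
class LLMCallRecord:
    # One record per model call: the AI-native dimensions of observability.
    prompt: str
    completion: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    quality_score: float  # assumed to come from an eval model, 0.0-1.0

    def is_degraded(self, min_quality: float = 0.5) -> bool:
        # A corrupted vector index shows up here, not in a CPU or p99 dashboard.
        return self.quality_score < min_quality


rec = LLMCallRecord(
    prompt="Summarize the Q3 report",
    completion="The Q3 report covers...",
    prompt_tokens=812, completion_tokens=145,
    latency_ms=1840.0, cost_usd=0.0031,
    quality_score=0.22,  # low score flagged by the (assumed) evaluator
)
print(rec.is_degraded())  # True
```

Latency and cost look healthy on that record; only the quality score reveals the 3 AM failure. That dimension is the one Datadog-style tooling structurally lacks.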
Galileo raised $45 million in October 2024 specifically for AI evaluation intelligence. Revenue grew 834% that year. Six Fortune 50 companies signed up.
The Indian cost constraint advantage
Indian AI startups face an extra constraint that turns out to be an advantage. When your ARPU is a tenth of a US SaaS product’s, you can’t afford a sloppy infrastructure stack.
Sarvam AI is the clearest example. Building multilingual foundation models across all 22 official Indian languages, funded with $53.8 million plus ₹246 crore in government compute support under the IndiaAI Mission. Their Sarvam-105B model uses mixture-of-experts architecture — not because it’s elegant, but because serving 22 languages at Indian price points on standard dense models is economically impossible.
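The arithmetic behind that claim is simple. A common rule of thumb puts forward-pass inference at roughly 2 FLOPs per active parameter per token; the 18B active-parameter figure below is a hypothetical MoE configuration for illustration, not a published Sarvam number:

```python
def flops_per_token(active_params: float) -> float:
    # Rule of thumb: ~2 FLOPs per active parameter per forward-pass token.
    return 2.0 * active_params


dense_105b = flops_per_token(105e9)
# Hypothetical MoE configuration: 105B total parameters, ~18B active per token.
moe_18b_active = flops_per_token(18e9)
print(dense_105b / moe_18b_active)  # ≈ 5.83x fewer FLOPs per token
```

A model that activates a fraction of its weights per token serves each request at a fraction of the compute. At Indian price points, that ratio is the difference between a viable unit economy and none.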
Infrastructure efficiency isn’t a feature. It’s survival.
The Indian companies that get infrastructure right will have a structural cost advantage that well-funded but wasteful competitors can’t easily replicate. The same pattern that made Indian IT services globally competitive, applied to a new domain.
Tricog and Niramai both built AI diagnostic tools for healthcare. Both had to solve for deployment in tier-2 and tier-3 cities with unreliable connectivity and low per-scan economics. The infrastructure constraints forced architectural decisions that made the products more robust and cheaper to operate than comparable Western products. That’s not a story about scrappiness. It’s a story about economic physics shaping technical architecture.
What I got wrong
The Plumbing Premium thesis assumes the infrastructure layer consolidates before the application layer does. That’s how it played out with cloud. But AI might be different.
If OpenAI or Anthropic vertically integrate deployment, orchestration, and observability into their platforms, the standalone infrastructure companies get squeezed. The foundation model providers have historically been bad at building infrastructure products. But they have the distribution. OpenAI’s Assistants API is a direct attempt to own the orchestration layer. It hasn’t worked yet, but the threat is real.
The other open question: where does the Plumbing Premium accrue when the plumbing itself is open source? LangChain and LlamaIndex are both open source. The deployment layer is where the money is concentrating — Modal, Baseten, Fireworks have raised over $700 million combined.
If orchestration commoditizes, the premium shifts to deployment and observability. Or it shifts to managed services built on open source, which is how Red Hat worked. But the dynamics are still being sorted.
| Scenario | What happens | Premium accrues to | Current signal |
|---|---|---|---|
| Vertical integration | OpenAI/Anthropic own deployment + orchestration | Model providers | OpenAI Assistants API attempting orchestration layer |
| Horizontal layering | Deployment consolidates, orchestration goes OSS | Deployment + observability | $700M+ into deployment, $115M+ into observability |
| Managed OSS | OSS orchestration wins, managed services layer emerges | Managed service providers (Red Hat model) | LangChain/LlamaIndex both open source |
Model providers will commoditize. Applications will churn. The question is whether infrastructure compounds in AI the way it compounded in cloud, or whether this paradigm has different economics.
The funding data says the market is betting on compounding. I think the market is right. But I’m watching the vertical integration moves closely.