The Context Window Pricing Collapse
Claude Opus 4.6 and Sonnet 4.6 now ship with 1M-token context windows at standard pricing. No premium tier. No waitlist. Just 1M tokens, available to everyone.
This single change killed the moat that every RAG startup built in 2023-24.
The competitive advantage those companies sold was never retrieval quality. It was working around small context windows. That constraint just disappeared.
The retrieval layer was a workaround
Between 2023 and early 2025, hundreds of startups raised money on the same pitch: “LLMs have small context windows, so you need our retrieval layer to feed them the right chunks.” Document Q&A companies. Legal AI startups. Enterprise search tools. Internal knowledge bases. All built on the same assumption: context windows are expensive and scarce, so smart retrieval is the product.
That was a reasonable bet when GPT-4 had 8K tokens and Claude had 100K at a premium. Chunking, embedding, reranking, and retrieval pipelines were genuine engineering problems. The companies that solved them well could charge for it.
But the economics just shifted. When you can drop an entire codebase, a full legal contract set, or a year of customer support tickets into a single prompt at standard pricing, the retrieval layer stops being a product and starts being a feature. A feature that the model provider gives away for free.
Why this hits Indian AI startups hardest
The Indian AI ecosystem produced a disproportionate number of RAG-focused companies. The problem was well-defined. The engineering talent was available. The capital requirements were low. Document processing for Indian enterprises. Multilingual knowledge retrieval. Compliance document analysis.
These were real businesses solving real problems. But the solutions were always workarounds for a temporary constraint.
Look at the YC batches from 2023-24. Count the Indian-founded companies whose pitch decks said “RAG” or “retrieval” or “knowledge base.” Now ask how many of them have a moat beyond the retrieval pipeline itself.
Three categories of products just got commoditized
Document Q&A. If your product takes PDFs, chunks them, embeds them, and lets users ask questions, you now compete with “paste the PDF into Claude.” The entire retrieval pipeline becomes overhead. A user with a 1M context window doesn’t need your pipeline. They need a text box.
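The shift is easy to see with arithmetic. Instead of chunking and retrieving, you can simply check whether the whole document fits the window and send it as-is. A minimal sketch, assuming the common rough heuristic of ~4 characters per token (the helper names and the reserve figure are illustrative, not any provider's API):

```python
# Rough heuristic: ~4 characters per token for English text.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 1_000_000  # 1M-token window


def estimate_tokens(text: str) -> int:
    """Crude token estimate; real tokenizers vary by model."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_window(document: str, question: str, reserve: int = 8_000) -> bool:
    """True if the whole document plus the question fits the window,
    keeping `reserve` tokens free for the model's answer."""
    needed = estimate_tokens(document) + estimate_tokens(question) + reserve
    return needed <= CONTEXT_WINDOW


# A 500-page contract set at ~3,000 characters per page easily fits:
doc = "x" * (500 * 3_000)  # ~375K estimated tokens
print(fits_in_window(doc, "What are the termination clauses?"))
```

By this estimate, even a 500-page contract set uses less than 40% of the window, which is why the chunk-embed-retrieve pipeline becomes overhead for this class of product.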
Enterprise search over internal docs. Companies like Glean built serious products here, but they also built moats beyond retrieval: connectors, permissions, personalization, usage analytics. The startups that only built the search layer are exposed.
Legal and compliance AI. Contract review, regulatory analysis, due diligence. These were perfect RAG use cases because the source documents were long and the queries were specific. Now you can feed entire contract sets into a single prompt.
What actually survives infinite context
The interesting question isn’t what dies. It’s what survives when context windows go functionally infinite.
Proprietary data pipelines. Getting data into a prompt is easy. Getting the right data, cleaned, structured, and current, from messy enterprise systems is hard. Connectors to SAP, Salesforce, government databases, legacy ERPs. That’s plumbing work that doesn’t get commoditized by larger context windows.
Orchestration and multi-step reasoning. RAG was a single-hop pattern: retrieve, then generate. The interesting AI products are multi-step: search, reason, act, verify, iterate. At Ostronaut, we learned that the hard problem in content generation isn’t feeding the model enough context. It’s coordinating multiple generation steps where each step depends on the output of the previous one, with validation gates between them. That coordination layer survives because it’s orthogonal to context window size.
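That coordination pattern can be sketched as a chain of steps with validation gates between them. Everything below is illustrative: the step functions are stubs standing in for model calls, and retry-then-fail is just one possible gate policy:

```python
class GateFailure(Exception):
    """Raised when a step's output fails validation after all retries."""


def run_pipeline(steps, max_retries: int = 2):
    """Run (generate, validate) pairs in order; each step receives the
    previous step's output. Retry a step if its gate rejects the output."""
    result = None
    for generate, validate in steps:
        for _ in range(max_retries + 1):
            candidate = generate(result)
            if validate(candidate):
                result = candidate
                break
        else:
            raise GateFailure(f"step {generate.__name__} failed validation")
    return result


# Stub steps standing in for model calls (hypothetical):
def draft(_):
    return "draft: outline of the article"

def expand(prev):
    return prev + " -> expanded body"

def polish(prev):
    return prev + " -> polished copy"

def non_empty(text):
    return bool(text and text.strip())


final = run_pipeline([(draft, non_empty), (expand, non_empty), (polish, non_empty)])
print(final)
```

The point of the sketch: each gate sits between steps, so a bad intermediate output is caught and retried before it poisons everything downstream. None of that logic depends on how large the context window is.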
Domain-specific reliability. In regulated industries, the value isn’t in retrieval. It’s in auditability, compliance, and deterministic behavior around non-deterministic models. A hospital doesn’t care that you can now fit all patient records into one prompt. They care that your system can prove why it made a specific recommendation and that it handles edge cases without hallucinating.
Cost optimization at scale. Here’s the part nobody’s talking about: 1M context windows are available, but they’re not free. Sending 1M tokens per request at enterprise scale gets expensive fast. The companies that build intelligent routing – small context for simple queries, large context only when needed, with caching and deduplication – will create real value. The constraint moved from “can’t fit enough context” to “can’t afford to use full context on every request.”
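One shape that routing layer might take is sketched below. The word-count classifier, the ~4-chars-per-token truncation, and the response cache are all stand-in heuristics for illustration, not a production policy:

```python
import hashlib

CACHE: dict[str, str] = {}


def route(query: str, corpus: str, small_budget: int = 8_000) -> tuple[str, str]:
    """Return (tier, context). Short lookup-style queries get a truncated
    context; anything else gets the full corpus."""
    if len(query.split()) <= 8:  # crude classifier: stand-in heuristic
        return "small", corpus[: small_budget * 4]  # ~4 chars per token
    return "large", corpus


def answer(query: str, corpus: str) -> str:
    """Route the query, then dedupe repeated requests via a cache keyed
    on query + context, so identical calls never pay twice."""
    tier, context = route(query, corpus)
    key = hashlib.sha256((query + context).encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]
    # Stub for the actual model call (hypothetical):
    response = f"[{tier}] answered from {len(context)} chars of context"
    CACHE[key] = response
    return response
```

The design choice worth noting: the router decides cost per request, while the cache eliminates repeat spend entirely. Both levers matter more, not less, as context windows grow.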
The test for your AI product
If you’re building an AI product in India right now, here’s the test: remove the retrieval layer from your architecture. Does your product still have a reason to exist?
If the answer is no, you have six months to find one. Not because 1M context windows will replace everything overnight. Adoption takes time. Enterprise procurement cycles are slow. But the pricing signal is clear: context windows are becoming a commodity.
Build on what stays scarce. Proprietary data access. Multi-step orchestration. Domain-specific reliability. Cost optimization at scale.
The RAG era isn’t over. Retrieval still matters for keeping costs down and for real-time data. But retrieval as a product is over. It’s a feature now. Build accordingly.