Why We Built an Agentic Rightsizing System Instead of Using Existing FinOps Tools

Zopdev’s Agentic Rightsizing Pattern delivers 25-35% cloud cost reduction in 90 days — by replacing FinOps dashboards with autonomous action. Here’s the architecture.
Author: B. Talvinder
Published: February 10, 2025

The Problem With Dashboards

Here’s a dirty secret about cloud FinOps: 80% of rightsizing recommendations never get implemented.

I’m B. Talvinder, CEO at Zopdev — we’ve spent 18+ months building agentic infrastructure systems, working directly with engineering teams across enterprise cloud environments. What I’ve learned is that the FinOps tooling market has solved the wrong problem. The industry has optimized for identifying waste. No one has built a system that reliably eliminates it.

Recommendations go unimplemented, but not because they are wrong. They’re usually right. The average cloud environment is 30-40% overprovisioned. The tools correctly identify this. They generate beautiful reports. They surface actionable recommendations.

And then nothing happens.

Because the recommendations land in a dashboard that a human has to check, evaluate, prioritize against a hundred other tasks, get approval for, and then manually execute. The friction between “recommendation” and “action” is where value goes to die.

What We Built Instead

At Zopdev, we decided to skip the dashboard entirely. Our system doesn’t generate recommendations. It generates actions.

The Agentic Rightsizing Pattern

This is the architecture we call the Agentic Rightsizing Pattern — four components in a closed loop:

┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│  Observation │────▶│   Reasoning  │────▶│   Action    │
│    Layer     │     │    Engine    │     │   Engine    │
│              │     │              │     │             │
│ - Metrics    │     │ - Pattern    │     │ - Resize    │
│ - Usage      │     │   matching   │     │ - Scale     │
│ - Costs      │     │ - Anomaly    │     │ - Schedule  │
│ - Patterns   │     │   detection  │     │ - Alert     │
└──────────────┘     └──────────────┘     └─────────────┘
       ▲                                        │
       │            ┌──────────────┐            │
       └────────────│   Learning   │◀───────────┘
                    │    Loop      │
                    │              │
                    │ - Did it     │
                    │   save $?    │
                    │ - Any perf   │
                    │   issues?    │
                    └──────────────┘

The key insight: the learning loop is the product. Every action the system takes generates data about whether that action was correct. Over time, the system develops domain-specific judgment about your infrastructure — not generic recommendations from a training set, but learned intuitions about your specific workload patterns.

This is the property that makes this Agentware rather than automation: it’s not executing a fixed rule set. It’s developing judgment.
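To make the closed loop concrete, here is a minimal Python sketch of one pass through observe, reason, act, and learn. It is an illustration under stated assumptions, not Zopdev's implementation: every name (Observation, Action, Outcome, LearningLoop, reason, act), the 0.6 target utilization, the "never cut more than half" floor, and the linear cost model are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    workload: str
    cpu_utilization: float  # fraction of provisioned CPU in use, 0.0 to 1.0
    monthly_cost: float     # current spend in dollars


@dataclass
class Action:
    workload: str
    kind: str            # e.g. "resize"
    scale_factor: float  # 0.5 means "halve the provisioned capacity"


@dataclass
class Outcome:
    action: Action
    saved_dollars: float
    perf_regression: bool


@dataclass
class LearningLoop:
    """Keeps the record of what each action actually did."""
    history: list = field(default_factory=list)

    def record(self, outcome: Outcome) -> None:
        self.history.append(outcome)

    def success_rate(self) -> float:
        # "Did it save $?" and "Any perf issues?" from the diagram.
        if not self.history:
            return 0.0
        good = sum(
            1 for o in self.history
            if o.saved_dollars > 0 and not o.perf_regression
        )
        return good / len(self.history)


def reason(obs: Observation, target_utilization: float = 0.6) -> Optional[Action]:
    """Propose a downsize when utilization is below target, else None."""
    if obs.cpu_utilization >= target_utilization:
        return None
    # Hypothetical safety floor: never cut more than half in one step.
    factor = max(obs.cpu_utilization / target_utilization, 0.5)
    return Action(obs.workload, "resize", factor)


def act(obs: Observation, action: Action) -> Outcome:
    """Stand-in for a real resize; assumes cost scales linearly with capacity."""
    saved = obs.monthly_cost * (1 - action.scale_factor)
    return Outcome(action, saved, perf_regression=False)


loop = LearningLoop()
obs = Observation("batch-worker", cpu_utilization=0.3, monthly_cost=1000.0)
action = reason(obs)
if action is not None:
    loop.record(act(obs, action))
print(loop.success_rate())
```

In a real system the outcome would come from metrics observed after the change rather than from a formula. The sketch only shows where that feedback enters the loop.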

The Technical Decisions That Mattered

Decision 1: We chose gradual over aggressive. The system starts conservative — small resizes, well-observed workloads, plenty of safety margins. As it accumulates confidence (measured by a rolling score of correct predictions), it becomes more aggressive. This mirrors how you’d onboard a new team member: you don’t hand them production access on day one.
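One way to picture "gradual over aggressive" is a rolling score that gates how large a single resize may be. The sketch below is an assumption-heavy illustration: the ConfidenceTracker class, the 20-result window, and the 0.80 / 0.95 thresholds with their 5% / 15% / 30% step sizes are all hypothetical values chosen for the example, not figures from our system.

```python
from collections import deque


class ConfidenceTracker:
    """Rolling share of recent predictions that turned out correct."""

    def __init__(self, window: int = 20) -> None:
        # A bounded deque drops the oldest result once the window is full.
        self.results = deque(maxlen=window)

    def record(self, prediction_correct: bool) -> None:
        self.results.append(prediction_correct)

    def score(self) -> float:
        if not self.results:
            return 0.0  # no track record yet, so no earned confidence
        return sum(self.results) / len(self.results)


def max_resize_fraction(score: float) -> float:
    """Map the rolling score to the largest capacity cut allowed in one step."""
    if score >= 0.95:
        return 0.30
    if score >= 0.80:
        return 0.15
    return 0.05  # conservative default for a new or shaky track record


tracker = ConfidenceTracker(window=20)
print(max_resize_fraction(tracker.score()))  # brand-new deployment

for _ in range(18):
    tracker.record(True)
tracker.record(False)
tracker.record(False)
print(tracker.score(), max_resize_fraction(tracker.score()))
```

Because the window is bounded, a run of recent mistakes pulls the score down and automatically shrinks the allowed step again.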

Decision 2: We built in explicit uncertainty. When the system doesn’t have enough data to be confident, it says so. It doesn’t guess. It flags the workload for human review and explains what additional observation time it needs. This was critical for customer trust — early customers wanted to understand why the system was deferring, not just that it was.
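A deferral that explains itself can be as simple as a decision object that carries a reason and the extra observation time requested. This is a hedged sketch: the Decision type, the decide function, the 14-day minimum, the 0.8 confidence floor, and the 7-day follow-up are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    workload: str
    proceed: bool
    reason: str                      # human-readable explanation
    extra_observation_days: int = 0  # how much longer to watch before retrying


def decide(
    workload: str,
    observed_days: int,
    confidence: float,
    min_days: int = 14,
    min_confidence: float = 0.8,
) -> Decision:
    """Act only with enough data and confidence; otherwise defer and say why."""
    if observed_days < min_days:
        return Decision(
            workload,
            False,
            f"only {observed_days} of {min_days} days observed",
            min_days - observed_days,
        )
    if confidence < min_confidence:
        return Decision(
            workload,
            False,
            f"confidence {confidence:.2f} is below {min_confidence:.2f}",
            7,  # hypothetical follow-up window
        )
    return Decision(workload, True, "enough data and confidence to act")


decision = decide("checkout-api", observed_days=5, confidence=0.9)
print(decision.proceed, decision.reason, decision.extra_observation_days)
```

The point of returning a structured reason instead of a bare "no" is that a reviewer can see both why the system deferred and what would change its mind.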

Decision 3: We separated the observation and action planes. The system can observe everything, but it can only act within explicitly granted permission boundaries. A customer can say “observe all my infrastructure, but only auto-resize development environments.” This graduated trust model was the single most important product decision we made. It directly addresses the question of how you build organizational trust in autonomous systems — you start with a narrow permission boundary and expand it as the system earns confidence.
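The "observe everything, act only where allowed" boundary can be sketched as a policy object with two separate checks. The PermissionPolicy class, its field names, and the environment and action labels are assumptions made for illustration; the example policy mirrors the customer statement quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionPolicy:
    # Environments the system may read metrics from.
    observe: set = field(default_factory=set)
    # Environment -> action kinds the system may execute without approval.
    act: dict = field(default_factory=dict)

    def can_observe(self, environment: str) -> bool:
        return environment in self.observe

    def can_act(self, environment: str, action_kind: str) -> bool:
        # Anything not explicitly granted is denied.
        return action_kind in self.act.get(environment, set())


# "Observe all my infrastructure, but only auto-resize development environments."
policy = PermissionPolicy(
    observe={"development", "staging", "production"},
    act={"development": {"resize"}},
)

print(policy.can_observe("production"))         # observation is broad
print(policy.can_act("production", "resize"))   # action is not granted here
print(policy.can_act("development", "resize"))  # action is granted here
```

Expanding trust then means adding entries to the act mapping, while the observation scope stays unchanged.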

What We Got Wrong

We initially tried to build one universal reasoning engine for all cloud providers. That was a mistake. AWS, GCP, and Azure have fundamentally different pricing models, instance families, and scaling behaviors. The reasoning layer needs to be provider-specific with a shared abstraction layer on top.

We lost about 6 weeks on that wrong turn. The lesson: in agentic system design, the reasoning layer is not portable. You can share the observation schema and the learning infrastructure, but the judgment needs to be trained against the specific environment it will act in.
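The split described here, a shared observation schema with provider-specific judgment behind a common interface, might look roughly like the following. Treat it as a sketch under assumptions: UsageSample, Reasoner, the two reasoner classes, the downsize thresholds, and the short size ladders are illustrative, and real instance selection involves far more than stepping down a list.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class UsageSample:
    """Shared observation schema: the same fields for every provider."""
    provider: str
    instance_type: str
    cpu_utilization: float  # 0.0 to 1.0


class Reasoner(Protocol):
    """Shared abstraction that each provider-specific reasoner satisfies."""
    def recommend(self, sample: UsageSample) -> Optional[str]: ...


def step_down(ladder: list, current: str) -> Optional[str]:
    """Return the next smaller size on the ladder, or None if there is none."""
    if current not in ladder:
        return None
    index = ladder.index(current)
    return ladder[index - 1] if index > 0 else None


class AwsReasoner:
    ladder = ["m5.large", "m5.xlarge", "m5.2xlarge"]  # illustrative subset
    downsize_below = 0.30                              # hypothetical threshold

    def recommend(self, sample: UsageSample) -> Optional[str]:
        if sample.cpu_utilization < self.downsize_below:
            return step_down(self.ladder, sample.instance_type)
        return None


class GcpReasoner:
    ladder = ["n2-standard-2", "n2-standard-4", "n2-standard-8"]  # illustrative subset
    downsize_below = 0.25                                          # hypothetical threshold

    def recommend(self, sample: UsageSample) -> Optional[str]:
        if sample.cpu_utilization < self.downsize_below:
            return step_down(self.ladder, sample.instance_type)
        return None


REASONERS = {"aws": AwsReasoner(), "gcp": GcpReasoner()}


def recommend(sample: UsageSample) -> Optional[str]:
    """Dispatch to the provider's own reasoner; unknown providers get no advice."""
    reasoner = REASONERS.get(sample.provider)
    return reasoner.recommend(sample) if reasoner else None


print(recommend(UsageSample("aws", "m5.xlarge", 0.10)))
print(recommend(UsageSample("gcp", "n2-standard-8", 0.27)))
```

The two reasoners look similar only because the sketch is small. The design point is that each one owns its thresholds and instance knowledge, so they can diverge without touching the shared schema.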

Results So Far

Early customers running the Agentic Rightsizing Pattern are seeing 25-35% reduction in cloud spend within the first 90 days, with zero performance degradation. The system’s confidence score — our measure of “how good is its judgment for this specific environment” — typically reaches actionable levels within 2-3 weeks of observation.

More importantly, the changes actually get implemented. Because there’s no human in the loop for routine decisions, the implementation rate is effectively 100% within the granted permission boundary. This is the number that matters. A 10% savings recommendation with a 100% implementation rate beats a 35% savings recommendation with a 10% implementation rate every time.
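The comparison is simple arithmetic once you treat realized savings as identified savings multiplied by implementation rate. That product model is a simplifying assumption (it treats the implemented recommendations as representative of the whole set), and the function name is hypothetical.

```python
def realized_savings(identified_fraction: float, implementation_rate: float) -> float:
    """Share of total spend actually saved under a simple product model."""
    return identified_fraction * implementation_rate


agentic = realized_savings(0.10, 1.00)    # modest finding, fully implemented
dashboard = realized_savings(0.35, 0.10)  # large finding, mostly ignored

print(f"agentic: {agentic:.1%}, dashboard: {dashboard:.1%}")
```

Under this model the fully implemented 10% yields 10% of spend saved, while the 35% recommendation at a 10% implementation rate yields 3.5%.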

How This Connects to Zopdev’s Broader Work

The Agentic Rightsizing Pattern is one instantiation of a broader thesis: that Agentware — systems that observe, reason, act, and learn — will replace most of what we currently call enterprise software over the next 5-10 years. Cloud infrastructure happens to be an ideal proving ground for this thesis because the feedback signals are fast (cost data is near-real-time), the stakes are concrete (dollar savings are measurable), and the domain is complex enough to require real judgment rather than simple rule execution.

Everything we’re learning about how to build trustworthy, auditable autonomous systems in the infrastructure domain is informing how Zopdev thinks about the next generation of enterprise agentic products.


If you’re running a startup or enterprise burning $50k+/month on cloud and want to see if this applies to your infrastructure, reach out.
