This blog post is Human-Centered Content: Written by humans for humans.
Most AI initiatives stall between pilot and production. The path through starts with your experts, not your architecture.
A client messaged us recently after building a full data transformation with a single prompt using an AI coding assistant. He was fired up, and rightfully so. His next message laid out the vision: A layer of agents that build AI-ready data assets, enable self-serve analytics and test their own output. Convert hype into concrete value.
Then he added the honest part: “I don’t have a strategy yet about how to get there. I feel that at the moment, we’re limited by our own creativity.”
He’s not alone. I’m having this conversation with clients every week now. The instinct is right. The technology works. The gap is everything between “I built something cool with a prompt” and “Agents handle this stuff reliably at scale.”
That gap is the messy middle. And the organizations getting through it fastest aren’t the ones with the biggest budgets or the most ambitious agent architectures. They’re the ones putting AI tools in their best people’s hands first.
Why Coding Got There First
Software engineering is where the most dramatic AI productivity gains are happening. Developers using tools like Claude Code aren’t just writing code faster. They’re delegating entire tasks: Building pipelines, writing tests, refactoring codebases, scaffolding features from specs.
It’s tempting to chalk this up to code being more structured than other work. But that’s not the reason. Coding agents are ahead because the infrastructure for validating and deploying work already existed.
Before AI showed up, software teams already had version control, code review processes, automated testing, CI/CD pipelines, staging environments and production monitoring. A system for producing work, checking it, testing it and deploying it safely was already in place. AI slotted into an existing quality system. The workflow didn’t change. The throughput did.
The People Are the Quality System
Now look at data teams. Tools for version-controlled metric definitions, semantic layer management and transformation testing do exist. dbt, Snowflake’s semantic layer, Databricks’ Unity Catalog and others have made progress here. But adoption is uneven at best. Most organizations haven’t implemented standardized review processes for AI-generated transformations, don’t have CI/CD for their semantic layer and have no staging environment for testing whether an AI agent answers business questions accurately before putting it in front of a VP.
And data teams are actually ahead of the curve. Step outside of engineering and analytics into the broader business (operations, HR, finance, marketing) and the infrastructure gap is even wider. There’s almost nothing resembling a quality system for AI-generated work in most business functions. The people are the quality system.
That’s the core of the messy middle. Until validation infrastructure catches up, your experts are the review layer, the quality control, the institutional memory. They’re the ones who know what “correct” looks like for a given transformation, who can spot when the AI confidently produces something wrong, who carry the business rules that haven’t been written down yet.
This is already creating real strain. I’m seeing it across multiple clients: Teams are producing more code and more data assets than ever, but their senior people are spending an increasing share of their time reviewing AI-generated output rather than doing their own work. The review bottleneck is not a future concern. It’s happening now. When your best people are both the biggest producers (because they’re the most effective with AI tools) and the only qualified reviewers, something has to give.
The failure mode that keeps experienced practitioners up at night isn’t the obvious kind, where the AI produces something broken. It’s the quiet kind: Output that looks correct, passes review and is subtly wrong in production. A transformation that compiles. A dashboard that renders. An answer that sounds right. But the underlying logic missed a business rule the AI didn’t know about. This happens with humans too, which is why we have reviews in the first place. But AI produces at a volume that outpaces human review capacity, and it does so with a confidence that doesn’t signal uncertainty. The discipline is the same one good engineering teams already practice: Fail fast, be transparent and, when something goes wrong, improve the system so that class of error gets caught next time. When you find a bug, you add a test. The same applies to your AI workflows.
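To make that habit concrete, here’s a minimal sketch in Python. The transformation, the field names and the business rule (internal test orders must never count toward revenue) are all hypothetical; the point is that a reviewer’s finding turns into a permanent, automated check that every future AI-generated revision has to clear.

```python
# A minimal sketch of "found a bug, add a test" applied to AI-generated
# transformations. All field names and the business rule are hypothetical.

def transform_orders(raw_orders):
    """Stand-in for an AI-generated transformation under review."""
    return [
        {"order_id": o["order_id"], "net_revenue": o["amount"] - o["refund"]}
        for o in raw_orders
        if not o["is_internal"]  # the business rule the AI originally missed
    ]

def test_internal_orders_excluded():
    # Regression test added after a reviewer caught internal test orders
    # leaking into revenue; it now guards every future revision.
    raw = [
        {"order_id": 1, "amount": 100.0, "refund": 0.0, "is_internal": False},
        {"order_id": 2, "amount": 50.0, "refund": 0.0, "is_internal": True},
    ]
    assert [r["order_id"] for r in transform_orders(raw)] == [1]

def test_refunds_reduce_net_revenue():
    raw = [{"order_id": 3, "amount": 80.0, "refund": 30.0, "is_internal": False}]
    assert transform_orders(raw)[0]["net_revenue"] == 50.0
```

Run it with pytest and the finding is no longer tribal knowledge. It’s part of the quality system.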
I’ve been writing about this progression for a while now, from agentic maturity curves to the real cost of finishing what AI starts. The throughline across all of it: The technology is ready, but the organizational infrastructure around it isn’t. And that infrastructure doesn’t get purchased or installed. It gets built, by the people doing the work.
The Temptation to Skip Ahead
Here’s the mistake I see organizations make, and one I’m constantly tempted by: Jumping straight to the grand solution.
A CEO asks to chat with all the company’s data. A VP wants self-serve analytics powered by AI. A client envisions a layer of orchestrated agents handling their data pipeline end to end. These visions are compelling and they feel like the right destination. The temptation is to start building toward them directly.
But the landscape shifts under your feet. Two years ago, everyone needed a vector database and a RAG pipeline to do anything meaningful with AI. Last year the answer was custom agent frameworks. These tools have real value, but new capabilities keep reshuffling what’s possible. The grand solution you architect today may not be the right architecture in six months.
What has actually had staying power are the processes and the knowledge capture. The shared prompts that encode how your team works. The documented business rules that make AI output reviewable. The review workflows that keep quality consistent as you scale. The tools change. The institutional knowledge compounds.
That’s the case for starting with your experts and a single use case rather than a platform strategy. Not because thinking big is wrong, but because starting big means betting on a specific technical approach lasting long enough to pay off. Starting small builds something durable regardless of which tools win.
What’s Actually Working
This is playing out across our client base and within our own teams.
Building the semantic layer before the agent. A client wanted to use AI to handle the flood of ad-hoc analytics questions their team drowns in. The instinct was right, but jumping straight to an AI-powered Q&A tool meant the AI would be guessing at data sources based on names, then guessing at columns. That’s a lot to ask without the context it needs. We focused first on building a semantic layer in Snowflake, which gives them flexibility beyond any single BI tool. Then we scoped to a single topic area: Get an agent answering questions well for the analytics team in one domain, roll that out to users and expand from there. Along the way, arm that team with tools like Cortex Code or Claude Code so the work of managing all of it goes faster.
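As a rough illustration of the “context first, agent second” idea, here’s a sketch in Python. It is not Snowflake’s semantic layer or any specific product’s API; every table, column, metric and rule below is invented. The point is that the agent works from reviewed definitions for one domain instead of guessing from names.

```python
# Illustrative only: a reviewed, single-domain set of definitions the agent
# must work from. All names are hypothetical.

MARKETING_SEMANTICS = {
    "domain": "marketing",
    "tables": {
        "fct_campaign_spend": {
            "description": "Daily spend per campaign, one row per campaign per day.",
            "columns": {
                "campaign_id": "Join key to dim_campaigns.",
                "spend_usd": "Gross spend in USD, before agency fees.",
                "spend_date": "Calendar date of the spend.",
            },
        },
    },
    "metrics": {
        "cost_per_lead": "SUM(spend_usd) / COUNT(DISTINCT lead_id), by campaign.",
    },
    "business_rules": [
        "Exclude campaigns tagged 'internal-test' from all reporting.",
    ],
}

def build_agent_context(question: str, semantics: dict = MARKETING_SEMANTICS) -> str:
    """Assemble the scoped context an analytics agent sees before answering."""
    tables = "\n".join(
        f"- {name}: {info['description']} Columns: {', '.join(info['columns'])}"
        for name, info in semantics["tables"].items()
    )
    rules = "\n".join(f"- {r}" for r in semantics["business_rules"])
    return (
        f"You answer questions about the {semantics['domain']} domain only.\n"
        f"Tables you may use:\n{tables}\n"
        f"Defined metrics: {semantics['metrics']}\n"
        f"Business rules that must always hold:\n{rules}\n\n"
        f"Question: {question}"
    )

print(build_agent_context("What was cost per lead last quarter?"))
```

Whatever tool ultimately serves these definitions, the reviewed content itself is what carries forward.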
Scaling governance alongside productivity. The clients I mentioned earlier with the review bottleneck also need standards: Shared prompts, consistent review workflows, agreed-upon quality checks. Once work is submitted, it needs expert review, and that review process needs to scale. We’re helping them use AI to handle the first-pass checks (best practices, security, functionality) so the experts can focus on the judgment calls that actually require their experience. The same pattern applies to teams adopting general-purpose AI tools like Claude Enterprise: Start at the user level, identify friction points, use existing solutions where they fit rather than building custom, and expand from there.
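Here’s a hedged sketch of what that first pass could look like, with the checks stubbed out. The check names are invented; in practice each would call a linter, the project’s test suite, a secret scanner and so on, so that the only things reaching an expert are the judgment calls.

```python
# A sketch of a first-pass review gate. Each check is a stub standing in for a
# real tool (linter, test runner, secret scanner); names are illustrative.

FIRST_PASS_CHECKS = {
    "style_and_naming": lambda artifact: True,          # stub: run a linter
    "tests_present_and_green": lambda artifact: True,   # stub: run the test suite
    "no_hardcoded_credentials": lambda artifact: True,  # stub: run a secret scanner
}

def first_pass(artifact: str) -> str:
    """Run the mechanical checks; only clean submissions reach an expert."""
    failures = [name for name, check in FIRST_PASS_CHECKS.items() if not check(artifact)]
    if failures:
        return "Returned to author. Fix first: " + ", ".join(failures)
    return "Queued for expert review: judgment-level questions only."

print(first_pass("models/orders_ai_generated.sql"))
```

The gate doesn’t replace the expert. It protects their time for the review only they can do.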
We’re living this ourselves. We haven’t built an agent that automatically does everything. Instead, we keep finding friction points and smoothing them out. We have agents that trigger automatically when someone can’t find documentation, suggesting additions. Agents that do preliminary research on new issues so engineers can hit the ground running. Agents that run for hours on complex tasks without human prompting. None of this is a fully autonomous lifecycle. But it is significant work happening without someone typing a prompt, and each piece was built iteratively by the people closest to the work. The process knowledge and standards we built along the way are what made each next step possible.
The Path Through
The path from “experts with tools” to reliable automation isn’t a single leap. Each step builds on the last.
Arm the experts. Your most experienced people should be the first ones using AI tools like Claude Code, Cortex Code, or Cursor. Not because they need the most help, but because they’re the ones who can actually evaluate the output. They’ll immediately find what’s missing: the business logic that lives in someone’s head but not in the semantic layer, the review criteria that were never written down, the edge cases that only they know about. This is valuable discovery work that only experts can do.
Build standards from practice, and governance in parallel. Every time an expert reviews AI-generated output, that’s a data point for your quality process. Every shared prompt is a step toward standardization. Every documented business rule is a brick in your semantic layer. The practical standards emerge from doing the work, not from planning the work.
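One way to picture that loop: every review finding lands in a shared, version-controlled rules file that both people and agents read from. The sketch below is illustrative; the file path, structure and example rule are assumptions, not a recommended format.

```python
# A minimal sketch of standards emerging from practice: each expert review
# finding becomes a documented rule that future prompts and checks reuse.
# Paths, structure and the example rule are illustrative.

import json
from datetime import date
from pathlib import Path

RULES_FILE = Path("standards/business_rules.json")

def record_review_finding(rule: str, found_by: str, example: str) -> None:
    """Append a rule caught during review to the shared, version-controlled list."""
    RULES_FILE.parent.mkdir(parents=True, exist_ok=True)
    rules = json.loads(RULES_FILE.read_text()) if RULES_FILE.exists() else []
    rules.append({"rule": rule, "found_by": found_by, "example": example,
                  "added": date.today().isoformat()})
    RULES_FILE.write_text(json.dumps(rules, indent=2))

def shared_prompt_preamble() -> str:
    """Every teammate and every agent starts from the same documented rules."""
    rules = json.loads(RULES_FILE.read_text()) if RULES_FILE.exists() else []
    return "Always respect these documented rules:\n" + "\n".join(
        f"- {r['rule']}" for r in rules
    )

record_review_finding(
    rule="Fiscal year starts in February, not January.",
    found_by="senior_analyst",
    example="Q1 revenue was overstated by one month in an AI-generated report.",
)
print(shared_prompt_preamble())
```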
But governance is bigger than code standards. Data access policies, audit trails, compliance requirements, usage policies, shadow AI visibility: These need to develop alongside your expert-led work, not after it. Most AI spending is theater when governance isn’t part of the equation. We published a detailed account of how we rolled out AI across our own organization, from the usage policy we signed before anyone touched a tool, to the governance committee that still meets weekly, to the connector-by-connector expansion based on blast radius. Starting with experts doesn’t mean skipping governance. It means building both from real practice rather than theoretical frameworks.
Expand deliberately. Once a use case works reliably, you can start widening access. A rough heuristic: When members across the team are getting answers right the majority of the time without having to manually add extra context or correct the output, you’re getting close. The exact threshold depends on the subject matter and the consequences of getting it wrong. A low-stakes internal report has a different bar than a client-facing financial summary. Nobody has a universal formula here because this is genuinely new territory. But you’ll know the process is working when the experts trust it enough to let others use it. This is how pilots become production.
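If it helps to see that heuristic written down, here’s a toy version in Python. The thresholds, the stakes tiers and the 30-review minimum are assumptions chosen to show the shape of the decision, not numbers to adopt.

```python
# A rough sketch of the expansion heuristic: how often were reviewed answers
# right without extra context or correction, judged against a bar set by the
# stakes? All thresholds are illustrative.

STAKES_BAR = {"low": 0.80, "medium": 0.90, "high": 0.97}  # assumed, tune per domain

def ready_to_widen_access(review_log: list, stakes: str) -> bool:
    """review_log entries: {'correct_without_help': bool}, recorded by expert reviewers."""
    if len(review_log) < 30:  # not enough evidence to decide yet
        return False
    rate = sum(r["correct_without_help"] for r in review_log) / len(review_log)
    return rate >= STAKES_BAR[stakes]

log = [{"correct_without_help": i % 10 != 0} for i in range(50)]  # 90% clean reviews
print(ready_to_widen_access(log, stakes="medium"))  # True
print(ready_to_widen_access(log, stakes="high"))    # False
```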
Where to Start
When I talk to clients about finding their first AI use cases, I look for three types of work:
- Tasks that are repetitive and high volume. The builds, the manual tests, the reports that get regenerated every week. These are the clearest wins and the easiest to measure.
- Tasks that require synthesizing information. Pulling context from multiple sources to make a recommendation or answer a question. AI is exceptionally good at this, and it’s where humans spend disproportionate time.
- Tasks where people are the glue between two or more systems. Any time someone’s job is to take output from one system, transform it and put it into another, there’s automation potential. These are often invisible workflows that nobody thinks of as “processes” but consume enormous amounts of time. A small sketch of this pattern follows the list.
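Here’s a hypothetical example of that third category in Python: a weekly export that someone currently downloads, reshapes in a spreadsheet and re-enters somewhere else. Both systems and all field names are invented.

```python
# A sketch of the "human glue" pattern: output from one system is reshaped
# and pushed into another. Systems, fields and payloads are all hypothetical.

import csv
import json
from io import StringIO

def weekly_ticket_export() -> str:
    """Stand-in for an export someone currently downloads by hand."""
    return "ticket_id,team,hours\nT-101,data,3.5\nT-102,platform,1.0\n"

def to_reporting_payload(export_csv: str) -> dict:
    """The transformation a person would otherwise do in a spreadsheet."""
    by_team: dict = {}
    for row in csv.DictReader(StringIO(export_csv)):
        by_team[row["team"]] = by_team.get(row["team"], 0.0) + float(row["hours"])
    return {"period": "last_week", "hours_by_team": by_team}

def push_to_reporting_tool(payload: dict) -> None:
    """Stand-in for the second system's API call."""
    print(json.dumps(payload, indent=2))

push_to_reporting_tool(to_reporting_payload(weekly_ticket_export()))
```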
Find those tasks. Put your best people on them with AI tools. Let them build the first version of the process. Then iterate.
The vision of AI handling more of the routine work is already becoming real in narrow, well-scoped domains. Extending that to broader operations gets built piece by piece, by the people who understand the work deeply enough to teach the machines how to do it right.
That’s not as exciting as a keynote about agent orchestration. But it’s how it actually works.
If you want help navigating the messy middle, a great place to start is an AI & Data Landscape Review to understand where you are today, followed by hands-on workshops with your team and the governance foundations to scale what works. If your AI initiatives have stalled between “promising pilot” and “reliable at scale,” let’s talk.
