This blog post is Human-Centered Content: Written by humans for humans.
What Happened?
Last week brought a burst of AI announcements from the major players, each positioning their latest capabilities as breakthrough advances:
- OpenAI launched Codex research preview for parallel agent orchestration in coding.
- Microsoft Build delivered MCP Registry support for Windows, GitHub’s evolution from “pair programming” to “peer programmer” agents, and the open-sourcing of Copilot in Visual Studio Code.
- Google I/O unveiled AI Mode for search with data visualization, Gemini 2.5 Flash pricing, Jules asynchronous coding agent and MCP integration across their platform.
- Anthropic’s Code with Claude Event introduced Claude 4 (Opus and Sonnet), comprehensive agent tooling, enhanced prompt caching and deeper Claude Code integrations.
Amid the typical AI hype that defines our industry, some genuinely useful developments emerged. The common thread wasn’t revolutionary AI; it was practical improvements to how AI tools integrate into existing workflows, particularly around code and data tasks.
The Agent Revolution Accelerates
What stood out wasn’t any single announcement, but the convergence around similar capabilities. OpenAI’s Codex research preview showed parallel agent orchestration for coding. Google’s Jules introduced asynchronous coding agents that can juggle multiple tasks simultaneously. Anthropic positioned their new Claude 4 models as the foundation for “true agents” capable of “hours of tasks” without losing context, adding memory handling via their new Files API and further improvements to their leading Claude Code tool. Microsoft’s GitHub Copilot now includes an agent they describe as an evolution from “pair programming” to “peer programming,” meaning you can assign tasks directly to agents within GitHub workflows.
Strip away the marketing language, and you see companies solving similar problems: How to make AI assistance less conversational and more task-oriented. For data leaders, the practical question is whether these tools can actually handle the messy, context-heavy work that dominates data operations. Early indicators suggest some can, though closing the gap between demo and daily reality will take significant engineering work from companies. Given clean datasets, AI can successfully build dashboards in languages like Python, but our businesses are often missing those clean datasets and the surrounding context. Investing in your knowledge repositories, not just your data warehouses, is more important than ever.
Search and Analytics
Google’s I/O event included AI Mode for search that generates data visualizations and tables to answer questions directly. While the demos looked polished, and I love seeing data visualizations in more places, the real test is whether this works reliably with complex data questions or just simple chart generation. We’ve seen (and still see) Google’s AI summaries at the top of search provide completely wrong information. AI hallucination is not a solved problem, so double-check anything that comes out of AI Mode.
Anthropic’s Code Execution Tool demonstration showed Claude loading datasets, generating charts and analyzing results in real time. In my own testing, I was able to ask very general questions of Claude Code and it would perform the analysis in Python, then create a separate app (Streamlit in this case) to present the data to me. The first time I did it, I was blown away. With additional requests and time, I still encounter the occasional weird error, like the inability to change a specific font color, even though it handled more complex layout changes with no problem. Claude Code still fails in interesting ways, but it also seems like it will save hours and hours of work for any analyst working in code. There’s a strong argument that we should be doing more and more of our analytics work in code as AI continues to compound the efficiency gains in this area.
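To give a flavor of what that looks like, here’s a minimal sketch of the kind of Streamlit presentation app Claude Code generated for me. The file name and column names are hypothetical stand-ins, and the real output was considerably more elaborate:

```python
# Minimal sketch of a Claude-Code-style Streamlit summary app.
# "sales.csv" and its columns (order_date, region, sales) are hypothetical stand-ins.
import pandas as pd
import streamlit as st

st.title("Sales Overview")

df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Let the viewer filter to one region, then show a headline number and a monthly trend
region = st.selectbox("Region", sorted(df["region"].unique()))
filtered = df[df["region"] == region]

st.metric("Total sales", f"${filtered['sales'].sum():,.0f}")
monthly = filtered.groupby(filtered["order_date"].dt.to_period("M"))["sales"].sum()
st.bar_chart(monthly.to_timestamp())
```

Run it with `streamlit run app.py`; the point is less the code itself than that the model produced a working version of it from a very general question.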
While the capabilities are impressive on clean data, the practical question for data teams is whether this works consistently with real datasets that have missing values, schema inconsistencies and other common messiness. In practice, I’ve found that AI can help with these tasks too, but I often still have to break down the tasks the way a senior data worker would. If I ask questions that assume the data is good, the model will assume it is good. If I guide it towards checking the data, it’ll run the appropriate checks.
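When I do guide it, the checks it runs look a lot like what a careful analyst would write by hand. A rough sketch of that kind of pre-flight check, reusing the hypothetical dataset from above:

```python
# Rough sketch of data-quality checks worth asking an agent (or yourself) to run
# before trusting any analysis. The expected schema below is a hypothetical example.
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical dataset from the sketch above
expected = {"order_date", "region", "sales"}

# 1. Schema: are the expected columns present?
print("Missing columns:", expected - set(df.columns) or "none")

# 2. Completeness: how many nulls per column?
print(df.isna().sum())

# 3. Duplicates: repeated rows quietly inflate aggregates
print("Duplicate rows:", df.duplicated().sum())

# 4. Sanity ranges: negative sales usually mean returns or bad data
print("Negative sales rows:", (df["sales"] < 0).sum())
```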
One last warning on using these agents: I’ve seen several cases now where the AI will hard-code data. This is sometimes called “reward hacking,” where the AI cheats a bit to get the answer it needs. The problem disappears if you are very specific about using a dataset, and only that dataset, to generate numbers, but it’s a precaution well worth mentioning.
Here’s one of my test runs using the SuperStore dataset:
The Rise of MCP
Behind the demos lies a more practical development: The emerging standardization of AI agent infrastructure. MCP (Model Context Protocol) is a standard first proposed by Anthropic that has since been adopted across the ecosystem. In short, MCP is a standard way to give AI models access to tools and other resources. Microsoft’s addition of MCP support to Windows and GitHub, Google’s integration of MCP into the Gemini SDK and Chrome, and Anthropic’s API-level MCP support all point to the same reality: MCP appears to be gaining traction as a standard for AI-tool integration.
For data teams, this standardization could mean agents eventually integrate more seamlessly with existing tools — your lakehouse, orchestration platform, monitoring systems. The operative word is “could.” It’s still early days for this standard and things like row-level security don’t have standard solutions yet.
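Still, to make the idea concrete, here’s a sketch of what a small MCP server might look like using the MCP Python SDK’s FastMCP helper (verify the exact names against the current SDK docs); the warehouse function is a hypothetical stand-in for your own, permission-checked client:

```python
# Sketch of an MCP server exposing one read-only tool to an agent.
# Assumes the MCP Python SDK (pip install mcp); verify names against current docs.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("warehouse-tools")

def query_warehouse(sql: str) -> int:
    # Hypothetical stand-in for your real warehouse client. Replace with a
    # permission-checked connector; note that row-level security is still on you.
    return 0

@mcp.tool()
def row_count(table: str) -> int:
    """Return the number of rows in a warehouse table."""
    return query_warehouse(f"SELECT COUNT(*) FROM {table}")

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so a local agent can attach to it
```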
The Economics of Intelligence
Pricing LLMs is hard. There are tables that describe the per-million-token cost of each model, but that often doesn’t give you the full picture. Features like prompt caching can change the economics, especially for agent workflows, and so can how many tokens each model typically generates. For instance, Gemini 2.5 Pro tends to be extremely wordy, even if it is cheaper than comparable models.
All that said, Google’s pricing strategy with Gemini 2.5 Flash offers concrete value. At roughly 25% of the cost of comparable models while delivering competitive performance, it makes sense as a general workhorse when top-of-the-line performance isn’t required.
On the more expensive end, but with clear strengths in programming and agentic flows, Anthropic has further enhanced prompt caching by extending the cache lifetime from 5 minutes to 1 hour. This should address a real limitation and save real money for solutions that use it. Long-running data processes often require maintaining context across extended operations, and this improvement makes it more practical for agents to manage complex, multi-step workflows without losing track of what they’re doing.
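A back-of-the-envelope example of why caching matters for agent workflows; all the prices below are placeholder assumptions, not published rates, so substitute your own:

```python
# Back-of-the-envelope savings from prompt caching on a long-running agent job.
# Prices are placeholder assumptions per million tokens; substitute real rates.
INPUT_PRICE = 3.00        # $/1M uncached input tokens (assumed)
CACHE_WRITE_PRICE = 3.75  # $/1M tokens written to cache (assumed premium over input)
CACHE_READ_PRICE = 0.30   # $/1M cached input tokens re-read (assumed discount)

context_tokens = 50_000   # shared context (schemas, docs, instructions) reused each step
steps = 40                # agent steps spread over an hour-long workflow

without_cache = steps * context_tokens / 1e6 * INPUT_PRICE
with_cache = (context_tokens / 1e6 * CACHE_WRITE_PRICE                   # write once
              + (steps - 1) * context_tokens / 1e6 * CACHE_READ_PRICE)   # read after

print(f"Without caching: ${without_cache:.2f}")
print(f"With caching:    ${with_cache:.2f}")
```

With only a 5-minute lifetime, a workflow with long pauses between steps keeps paying the write price; the 1-hour window is what lets the “write once, read many” math hold for these longer jobs.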
From Programmer to Manager
Perhaps the most intriguing development came in the form of the asynchronous agents that everyone showed off. There is a clear message that the industry sees programmers moving into management. In Anthropic’s words, their own programmers are “moving from individual contributors to managing multiple concurrent agents running tasks.” A similar shift was on display in OpenAI’s Codex preview and Microsoft’s GitHub Copilot agent.
Not a headline feature, but a point I found interesting, was the onboarding improvement Anthropic mentioned. They claim new-employee onboarding went from two to three weeks down to two to three days. This provides a glimpse at one of the underappreciated aspects of LLMs: They are a tool for understanding and breaking down complexity in our language, programming or not. With the right knowledge infrastructure, we should be able to help new team members understand our complex data architectures through AI assistance.
The Diffusion Experiment
Lastly, this one doesn’t have direct data implications, but Google’s announcement of Gemini Diffusion is cool. Unlike today’s autoregressive LLMs, which predict the next word (or token) one at a time, diffusion models generate and refine whole blocks of text at once.
The benefit? It’s fast: roughly 5x faster than our fastest current models. That has real implications if the approach gets adopted, allowing more work to be done and iterated upon. While it’s unclear whether it will gain traction, it suggests the industry isn’t settling into a single architectural pattern and that there’s plenty of innovation left for us to create.
What This Means for Data Leaders
The convergence around agent capabilities, standardized protocols and code-centric AI suggests we’re approaching an inflection point. The question isn’t whether AI will transform data work, but how quickly and in what form. Your best and brightest should be working with agent tools, such as Claude Code, to see how far they can push them and to understand their current limitations. You should be building out the surrounding infrastructure, such as MCP Servers, to allow these tools to impact your business. Most of all, you should be working on your knowledge repositories.
The rise of data visualization tools like Tableau accelerated companies getting their data infrastructure in order, and we saw the rise of new database giants like Snowflake and Databricks. AI benefits enormously from these investments, but it also needs context about our business. Just as those BI tools gave us insight into the gaps in our data, AI is making it clear that we have gaps in our knowledge. Undocumented systems, stale information, broken processes and tribal knowledge are all barriers that will keep these AI systems from thriving at your company.
The most successful data organizations over the next 12 months will likely be those that experiment thoughtfully with these emerging capabilities while investing in the cultural, technical and informational foundations required. The tools are rapidly becoming capable enough for production use — the limiting factor is increasingly organizational readiness rather than technical maturity.