Building a Fleet of Autonomous Agents to Run Your Software's Upkeep

Please note that Portals for Tableau are now officially known as Curator by InterWorks. You can learn more at the official Curator website.

This blog post is AI-Assisted Content: Written by humans with a helping hand.

This blog is featured in our AI Use Case Library of real AI solutions.

The Problem

Curator by InterWorks is a complex web platform developed by a small but talented developer team. A year ago, the team was easily stretched thin across client features, security upgrades, infrastructure maintenance and support requests. Scheduling was a constant problem. Important but not urgent work stacked up, and less obvious issues were easy to miss. Every documentation gap had to be noticed before we could write it up. Every flaky test was something someone had to chase. Every pull request was something someone had to review, and every release was something someone had to shepherd through the merge window.

None of it was hard. All of it competed for the same human attention as actually building the product, so it slipped. Things fell through the cracks or piled up in a backlog labeled “later.” The work that needed judgment and the work that just needed doing were fighting over the same hours, and the work that just needed doing usually lost.

What We Built

We built six autonomous agents, named for Greek Titans, that have run around the clock since the start of 2026. They review code, write documentation, triage production errors, shepherd releases, monitor their own health and ship pull requests while the team sleeps. They run on Claude across a few different deployment models, and they work on Curator, our own product. The agents are doing this work in production today.

How It Works

From the team’s side, the work just happens. You come in to pull requests that were opened overnight, reviews already waiting on them, a standup email summarizing what moved and production errors already triaged and ranked. We gave each of the six agents a specific job and a personality:

Mnemosyne, documentation: She reads docs-assistant chats, support tickets and CI data to find documentation gaps, then opens issues and fix PRs, reviews other PRs for documentation drift and sends a daily standup.
Prometheus, code: He scans open issues for easy wins, ships the fixes as pull requests, then watches his own PRs for review comments and CI failures and sends a daily standup of his own.
Iapetus, self-healing: He monitors the other agents’ logs, diagnoses root causes with Claude and opens fix PRs, often before anyone notices something broke, and he stays silent when everything is healthy.
Themis, review: She reviews every PR opened, weighing the full diff for bugs, security issues and standards violations, and she signs each review as the Titan of Divine Law and Order. She also inherits her siblings’ lessons, so she gets sharper as they learn.
Epimetheus, production triage: He ranks unresolved production errors by impact, reads the codebase to understand the affected paths and files diagnostic issues for the worst offenders.
Hyperion, release shepherd: He works the weekly merge window, merging approved, green PRs in priority order, syncing branches, retrying flaky CI and nudging missing reviewers, and he never self-approves.

Each runs on its own schedule, from every two hours to once a week, so the upkeep is continuous instead of a thing someone has to remember to do.
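
To make one of the jobs above concrete, here is a minimal sketch of how an impact ranking like Epimetheus's might look. The field names (event_count, users_affected) and the weighting in impact() are assumptions made for illustration, not the scoring the fleet actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionError:
    """One unresolved error. The fields are hypothetical, chosen for illustration."""
    title: str
    event_count: int      # how many times the error fired
    users_affected: int   # how many distinct users hit it


def impact(err: ProductionError) -> int:
    # Assumed weighting: an affected user counts for more than a repeated event.
    return err.users_affected * 10 + err.event_count


def worst_offenders(errors: list[ProductionError], limit: int = 3) -> list[ProductionError]:
    """Return the highest-impact errors first, capped at `limit`."""
    return sorted(errors, key=impact, reverse=True)[:limit]
```

Only the errors this returns would get a diagnostic issue. The rest stay ranked but unfiled, which keeps the issue tracker from turning into a second wall of noise.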

Above: Epimetheus creating a pull request

Above: Our bots have their own Slack channel.

How It Was Built

The six agents are a team but work independently. They’ve also evolved over time. The original agents are cron-scheduled Python scripts that use Claude Code in headless mode. Their heavy Claude work runs through a three-stage pipeline: Sonnet plans the change as structured JSON, Opus applies it and runs the linters in a retry loop, and Sonnet writes the final PR comment. Splitting the work by model is deliberate. Planning and explaining are cheaper, faster work, so Sonnet does them. The careful application and linting is where Opus is worth the extra cost.
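
The three stages can be sketched roughly as follows. This is an illustration under stated assumptions, not the fleet's actual script: run_model and run_linters are hypothetical callables standing in for a headless Claude Code call and the project's linters, and the "sonnet" and "opus" strings are only labels passed through to that wrapper.

```python
import json
from typing import Callable

# Hypothetical signatures, not a real API:
#   run_model(model_label, prompt) returns the model's text reply.
#   run_linters() returns a list of lint error strings (empty when clean).
RunModel = Callable[[str, str], str]
RunLinters = Callable[[], list[str]]


def run_pipeline(task: str, run_model: RunModel, run_linters: RunLinters,
                 max_attempts: int = 3) -> dict:
    """Plan with the cheaper model, apply with the stronger one, explain with the cheaper one."""
    # Stage 1: the cheaper model plans the change as structured JSON.
    plan = json.loads(run_model("sonnet", f"Plan this change as JSON: {task}"))

    # Stage 2: the stronger model applies the plan, retrying until the linters pass.
    lint_errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        prompt = f"Apply this plan: {json.dumps(plan)}"
        if lint_errors:
            prompt += f"\nFix these linter errors: {lint_errors}"
        run_model("opus", prompt)
        lint_errors = run_linters()
        if not lint_errors:
            break
    else:
        # The loop never hit `break`, so the linters were still failing.
        raise RuntimeError(f"Linters still failing after {max_attempts} attempts: {lint_errors}")

    # Stage 3: the cheaper model writes the final PR comment.
    comment = run_model("sonnet", f"Write a PR comment for this plan: {json.dumps(plan)}")
    return {"plan": plan, "attempts": attempt, "comment": comment}
```

Passing the model and linter calls in as functions keeps the control flow testable without spending tokens, and the retry cap means a change that cannot be made lint-clean fails loudly instead of looping.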

Epimetheus was built differently. It runs on the TypeScript Agent SDK, using the SDK’s streaming query loop with a custom in-process MCP tool that pulls production errors from Sentry. It is the same underlying Claude with a different orchestration model, and it is a deliberate test of the SDK as the pattern for future agents.

Hyperion runs as a Claude Managed Agent, and it is the clearest example of a principle we hold in production: Reach for deterministic first. Hyperion launched doing work that was almost entirely scriptable, so we scripted it. We moved the deterministic parts into plain code and left Claude exactly one job: the value judgment of which approved, green PR to merge next. Everything else is a script. Its run cost is minimal, because we do not pay a model to do what a line of bash could.
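
A minimal sketch of the deterministic-first split, under stated assumptions: the PullRequest fields and the choose callable are hypothetical, with choose standing in for the single model call. Everything around it is plain code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical shape of a PR record, for illustration only."""
    number: int
    approved: bool
    ci_green: bool


def mergeable(prs: list[PullRequest]) -> list[PullRequest]:
    """Deterministic filter: only approved PRs with green CI are candidates."""
    return [pr for pr in prs if pr.approved and pr.ci_green]


def next_to_merge(prs: list[PullRequest],
                  choose: Callable[[list[PullRequest]], int]) -> Optional[PullRequest]:
    """Pick the next PR to merge, calling the model only when a judgment is needed."""
    candidates = sorted(mergeable(prs), key=lambda pr: pr.number)
    if not candidates:
        return None              # nothing to merge, so no model call
    if len(candidates) == 1:
        return candidates[0]     # only one option, so no model call
    picked = choose(candidates)  # the one value judgment left to the model
    by_number = {pr.number: pr for pr in candidates}
    if picked not in by_number:
        # Guard: the model may only pick from the deterministic candidate list.
        raise ValueError(f"PR #{picked} is not an approved, green candidate")
    return by_number[picked]
```

The guard at the end is the point of the design: even when the model is consulted, it can only choose among PRs that plain code has already confirmed are safe to merge.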

Two design choices tie the fleet together. The agents share a lessons file: When one learns something from a review, the others inherit it, so Themis reviews better because Prometheus and Mnemosyne made mistakes first. And Iapetus watches the watchers, monitoring every agent’s logs, diagnosing failures with Claude and opening fix PRs, often before a human notices. Every script also has a failure trap that emails an alert with the script name, line number and exit code, because autonomous agents that fail silently are worse than no agents.
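
Both ideas are small enough to sketch. What follows is an illustration under assumptions, not the fleet's actual code: the JSON-lines layout of the lessons file, the function names and the alert subject format are all hypothetical, and sending the message is left to whatever mail setup the caller passes in as send.

```python
import json
import sys
import traceback
from email.message import EmailMessage
from pathlib import Path
from typing import Callable


def append_lesson(path: Path, agent: str, lesson: str) -> None:
    """Append one lesson as a JSON line so every agent can read it later."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"agent": agent, "lesson": lesson}) + "\n")


def load_lessons(path: Path) -> list[dict]:
    """Read every lesson recorded so far, by any agent."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def build_failure_alert(script: str, exc: BaseException, exit_code: int = 1) -> EmailMessage:
    """Build an alert naming the script, the failing line number and the exit code."""
    frames = traceback.extract_tb(exc.__traceback__)
    line_no = frames[-1].lineno if frames else None
    msg = EmailMessage()
    msg["Subject"] = f"[agent failure] {script} line {line_no} exit {exit_code}"
    msg.set_content("".join(traceback.format_exception(type(exc), exc, exc.__traceback__)))
    return msg


def install_failure_trap(script: str, send: Callable[[EmailMessage], None]) -> None:
    """Route any uncaught exception through `send` before the script exits."""
    def hook(exc_type, exc, tb):
        send(build_failure_alert(script, exc))
        sys.__excepthook__(exc_type, exc, tb)  # still print the traceback

    sys.excepthook = hook
```

A reviewing agent would call load_lessons at the start of each run and fold the results into its prompt, so a mistake recorded by one agent shapes every later review. Calling install_failure_trap first thing in each script means a crash always produces an alert, even when the failure happens before any other logging is set up.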

The stack is ordinary on purpose: Ubuntu, Python, Node and PHP toolchains, the Claude Code CLI, the GitHub CLI and standard linters, with each agent running as its own GitHub App. There is nothing exotic here, which is part of why six agents can run unattended.

Above: Mnemosyne keeping our documentation updated

Why It Matters

Above all, this is a change the team wanted. This is important work that no one likes to do and is easy to let slip to the dreaded “later.” Now it all happens: The documentation, the easy fixes, the reviews, the triage, the merges, whether someone schedules it or not. The things that used to fall through the cracks get caught at two in the morning by an agent, and the team’s hours go to the work that actually needs human judgment.

It also raised the standard quietly. Every PR gets a real review, because Themis never gets too busy. Documentation drift gets noticed, because noticing it is Mnemosyne’s whole job. Production errors arrive already ranked and investigated instead of as a wall of Sentry noise. None of that depended on the team getting bigger.

There is a credibility dimension too. This is our own product, maintained by our own agents, in production. When we talk to a client about putting autonomous agents into their engineering org, we are describing something we run on ourselves every day, with the lessons, the failure traps and the deterministic-first discipline that only come from doing it for real.

In the end, the results are clear: Our support ticket volume is down 50%, our resolution time has improved, and we have freed developer resources to invest in higher value tasks.

Above: Themis doing code review

Where This Could Go

The fleet is built to grow, in two directions: Deeper on Curator, and outward to everything else we run.

Other internal systems: The agent pattern is not specific to Curator. Pointing it at other software we run, our internal timekeeping system among them, would put the same constant-upkeep agents to work across the company instead of on one product.
Acting on product analytics: Agents that watch a product analytics tool like PostHog and take action on what they see, turning an insight or an emerging issue into an investigation or a fix instead of a dashboard nobody checks.

The pattern is the asset. Each new target is a new set of jobs for agents we already know how to build, run and keep honest.

Takeaway

A year ago, every documentation gap, flaky test and code review was a human scheduling problem, and the upkeep lost the fight for attention to the building. Now six agents do the constant work around the clock, and the team spends its hours on what actually needs judgment. If maintenance is winning the fight for your team’s attention, it does not have to be a person who does it.

Ben Bausili
