February 12, 2026

How We Used AI to Clean Up 200+ Feature Flags (And Open Sourced It)

Isaac Hagoel

Staff AI engineer

Feature flags are like scaffolding. You need them while building, and you should remove them once the work is done. In reality, most teams don't.

At Relevance AI, we ended up with 200+ flags across our codebase. Some had been fully rolled out for months. Others were effectively dead. Every leftover flag added more conditional code paths, more edge cases, and more cognitive load during incidents.

Removing them manually is pure grunt work: find every reference, keep the right path, clean up unreachable code paths and stale type definitions, run tests, and open a PR. Do that across multiple repos and you can burn weeks.

The trickiest part is confidence: knowing when a flag is truly safe to remove. Many teams rely on memory, ad hoc notes, or reminders, which does not scale.

So we automated it with bye-bye-flag, and we're open sourcing it.

TL;DR

bye-bye-flag connects to your feature-flag provider (PostHog today, with more coming), identifies stale flags, and dispatches AI coding agents to remove them in parallel in isolated worktrees. It cleans up conditionals, dead code paths, unused imports, and orphaned files across multiple repositories, then opens draft PRs ready for review.

How It Works

  1. Fetch stale flags from your provider (e.g. flags pinned at 0% or 100% rollout for 30+ days, with no recent changes)
  2. Validate references across target repos
  3. Spawn isolated AI agents in git worktrees, one per flag
  4. Create reviewable PRs with cleanup + tests
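
To make step 1 concrete, here is a minimal sketch of a staleness check. The `FlagSnapshot` shape and both function names are hypothetical and are not taken from bye-bye-flag or the PostHog API; the rule itself ("pinned at 0% or 100% and unchanged for N days") is an assumption based on the example in the list above.

```typescript
// Hypothetical snapshot of a flag; real provider payloads (e.g. PostHog's) will differ.
interface FlagSnapshot {
  key: string;
  rolloutPercentage: number; // 0 to 100
  lastModifiedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed rule: stale = pinned at 0% or 100% with no change for `staleDays` days.
function isStale(flag: FlagSnapshot, staleDays: number, now: Date): boolean {
  const pinned = flag.rolloutPercentage === 0 || flag.rolloutPercentage === 100;
  const daysUnchanged = (now.getTime() - flag.lastModifiedAt.getTime()) / MS_PER_DAY;
  return pinned && daysUnchanged >= staleDays;
}

// A flag pinned at 100% keeps its "enabled" branch; one pinned at 0% keeps "disabled".
// Only meaningful for flags that already passed isStale.
function branchToKeep(flag: FlagSnapshot): "enabled" | "disabled" {
  return flag.rolloutPercentage === 100 ? "enabled" : "disabled";
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test against fixed dates.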

The agent does more than delete conditionals. It keeps the correct path, removes unreachable code paths, cleans up imports and orphaned files, runs tests, and commits with useful messages.
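
As an illustration of what "keeping the correct path" means, here is a small before/after sketch for a flag rolled out to 100%. The flag name, the `FlagLookup` type, and the banner functions are all made up for this example; they are not from our codebase.

```typescript
// Hypothetical flag lookup; stands in for whatever SDK call your code uses.
type FlagLookup = (key: string) => boolean;

function renderLegacyBanner(name: string): string {
  return `Hello ${name}`;
}

function renderNewBanner(name: string): string {
  return `Welcome back, ${name}!`;
}

// Before: both paths exist and the flag picks one at runtime.
function renderBannerBefore(name: string, isEnabled: FlagLookup): string {
  if (isEnabled("new-welcome-banner")) {
    return renderNewBanner(name);
  }
  return renderLegacyBanner(name);
}

// After: the enabled path is kept, and the conditional and flag lookup are gone.
// renderLegacyBanner would now have no callers, so a full cleanup deletes it too
// (it stays in this sketch only so the "before" version still compiles).
function renderBannerAfter(name: string): string {
  return renderNewBanner(name);
}
```

The cleaned-up function should behave exactly like the original did with the flag on, which is what the tests in a cleanup PR are there to confirm.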

Why This Is Hard in Practice

A core design goal was extensibility: support multiple flag providers and multiple coding-agent backends without rewriting the core cleanup workflow.

  • Provider extensibility: each feature-flag provider exposes stale-state signals and APIs differently, so detection uses adapter-style integrations instead of provider-specific one-offs
  • Agent backend flexibility: Codex and Claude have different invocation models, setup requirements, and failure modes, so backend-specific execution is isolated behind a common orchestration layer
  • Execution orchestration: safe parallel cleanup requires isolated worktrees, deterministic setup, and per-repo bootstrapping
  • Reliability and hygiene: reruns must be idempotent, temporary resources cleaned up, and open/merged/declined PR states reconciled
  • Observability: you need clear logs and run summaries to see what changed, what failed, and what needs review
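
The adapter idea from the first bullet can be sketched roughly as follows. Every name here (`FlagProvider`, `StaleFlag`, `registerProvider`, `createProvider`) is hypothetical and chosen for illustration; this is not bye-bye-flag's actual interface. The "in-memory" adapter is a stand-in for a real provider integration.

```typescript
// All names here are hypothetical; they are not bye-bye-flag's real types.
interface StaleFlag {
  key: string;
  keep: "enabled" | "disabled";
}

// The core workflow depends only on this interface, never on a specific provider.
interface FlagProvider {
  readonly type: string;
  fetchStaleFlags(staleDays: number): Promise<StaleFlag[]>;
}

type ProviderFactory = () => FlagProvider;

const providerRegistry = new Map<string, ProviderFactory>();

function registerProvider(type: string, factory: ProviderFactory): void {
  providerRegistry.set(type, factory);
}

function createProvider(type: string): FlagProvider {
  const factory = providerRegistry.get(type);
  if (!factory) {
    throw new Error(`Unknown flag provider: ${type}`);
  }
  return factory();
}

// Stand-in adapter backed by a fixed list; a real adapter would call the provider's API.
registerProvider("in-memory", () => ({
  type: "in-memory",
  fetchStaleFlags: async () => [{ key: "old-onboarding", keep: "enabled" }],
}));
```

With this shape, adding a provider means writing one adapter and registering it under a new `type` string; the orchestration code that consumes `StaleFlag` values stays unchanged.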

bye-bye-flag handles this orchestration layer so teams can focus on reviewing code changes instead of wiring providers, agents, and execution plumbing together.
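
For the worktree isolation mentioned above, one way to plan a per-flag worktree looks like this. The branch prefix, directory layout, and `planWorktree` helper are assumptions made for this sketch; only the `git worktree add -b` and `git worktree remove --force` commands themselves are standard git.

```typescript
// Hypothetical naming scheme; bye-bye-flag's actual branch and directory names may differ.
interface WorktreePlan {
  branch: string;
  dir: string;
  addArgs: string[]; // arguments for `git worktree add`
  removeArgs: string[]; // arguments for `git worktree remove`
}

function planWorktree(flagKey: string, baseBranch: string, worktreeRoot: string): WorktreePlan {
  // Keep only characters that are safe in both branch names and directory names.
  const slug = flagKey
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  const branch = `remove-flag/${slug}`;
  const dir = `${worktreeRoot}/${slug}`;
  return {
    branch,
    dir,
    // git worktree add -b <new-branch> <path> <commit-ish>
    addArgs: ["worktree", "add", "-b", branch, dir, baseBranch],
    // git worktree remove --force <path>
    removeArgs: ["worktree", "remove", "--force", dir],
  };
}
```

The argument arrays can be handed to something like Node's `execFile("git", plan.addArgs, { cwd: repoPath })`. Because each flag gets its own branch and directory, agents working in parallel never touch each other's files.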

How We're Running It

We currently run it periodically, and we plan to move it to a scheduled run in the cloud.

pnpm start run --target-repos=~/relevance-repos

A run fetches stale flags, skips work that already has PRs, processes flags in parallel up to configured limits, and produces a clear summary of what was created, skipped, or failed.
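
The run loop described above can be sketched as a small bounded-concurrency worker pool. This is an illustration under stated assumptions, not the tool's real code: `runCleanup`, `RunSummary`, and the callback parameters are hypothetical, and we assume `maxPrs` caps how many flags a single run attempts.

```typescript
// Hypothetical summary shape, mirroring the "created, skipped, or failed" report.
interface RunSummary {
  created: string[];
  skipped: string[];
  failed: string[];
}

async function runCleanup(
  flagKeys: string[],
  hasExistingPr: (key: string) => boolean,
  removeFlag: (key: string) => Promise<void>,
  concurrency: number,
  maxPrs: number,
): Promise<RunSummary> {
  const summary: RunSummary = { created: [], skipped: [], failed: [] };
  const pending: string[] = [];
  for (const key of flagKeys) {
    if (hasExistingPr(key)) summary.skipped.push(key);
    else pending.push(key);
  }
  // Assumption: maxPrs caps how many flags one run attempts; the rest wait for a later run.
  const queue = pending.slice(0, maxPrs);
  const succeeded: boolean[] = new Array(queue.length).fill(false);
  let next = 0;

  // Each lane pulls the next unclaimed flag until the queue is empty.
  async function lane(): Promise<void> {
    while (next < queue.length) {
      const index = next++;
      try {
        await removeFlag(queue[index]);
        succeeded[index] = true;
      } catch {
        succeeded[index] = false; // one failed flag must not stop the others
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, queue.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));

  queue.forEach((key, i) => (succeeded[i] ? summary.created : summary.failed).push(key));
  return summary;
}
```

Running a fixed number of lanes over a shared index keeps at most `concurrency` removals in flight at once, and catching errors per flag means one bad cleanup shows up in `failed` without aborting the rest of the run.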

Configuration

Everything is controlled through one bye-bye-flag-config.json file:

{
  "fetcher": {
    "type": "posthog",
    "projectIds": ["12345"],
    "staleDays": 30
  },
  "orchestrator": {
    "concurrency": 2,
    "maxPrs": 10
  },
  "agent": {
    "type": "claude"
  }
}

You can also define repo-level setup, defaults, and context files (CONTEXT.md, CLAUDE.md) so agents follow your standards.
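
If you generate or edit the config programmatically, a typed loader can catch mistakes before a run starts. The sketch below covers only the fields shown in the example config; the `ByeByeFlagConfig` type, the `parseConfig` helper, and the validation rules (such as requiring positive integers) are assumptions, and the real schema may accept more options.

```typescript
// Only the fields shown in the example config; the real schema may accept more.
interface ByeByeFlagConfig {
  fetcher: { type: string; projectIds: string[]; staleDays: number };
  orchestrator: { concurrency: number; maxPrs: number };
  agent: { type: string };
}

function isPositiveInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Assumed validation rules; collects every problem instead of stopping at the first.
function parseConfig(json: string): ByeByeFlagConfig {
  const raw = JSON.parse(json);
  const problems: string[] = [];

  if (typeof raw?.fetcher?.type !== "string") problems.push("fetcher.type must be a string");
  const ids = raw?.fetcher?.projectIds;
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === "string")) {
    problems.push("fetcher.projectIds must be an array of strings");
  }
  if (!isPositiveInt(raw?.fetcher?.staleDays)) problems.push("fetcher.staleDays must be a positive integer");
  if (!isPositiveInt(raw?.orchestrator?.concurrency)) {
    problems.push("orchestrator.concurrency must be a positive integer");
  }
  if (!isPositiveInt(raw?.orchestrator?.maxPrs)) problems.push("orchestrator.maxPrs must be a positive integer");
  if (typeof raw?.agent?.type !== "string") problems.push("agent.type must be a string");

  if (problems.length > 0) {
    throw new Error(`Invalid config: ${problems.join("; ")}`);
  }
  return raw as ByeByeFlagConfig;
}
```

Reporting all problems in one error saves a round trip when several fields are wrong at once.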

What's Next

  • Support more feature-flag providers (LaunchDarkly, Split, etc.)
  • Improve handling for complex rollout and multivariate flags
  • Cloud-native scheduled execution
  • More agent backends
  • Auto-respond to PR comments during cleanup review

If flag debt is slowing your team down, try bye-bye-flag and tell us what breaks. PRs are welcome, especially provider integrations.
