Feature flags are like scaffolding. You need them while building, and you should remove them once the work is done. In reality, most teams don't.
At Relevance AI, we ended up with 200+ flags across our codebase. Some had been fully rolled out for months. Others were effectively dead. Every leftover flag added more conditional code paths, more edge cases, and more cognitive load during incidents.
Removing them manually is pure grunt work: find every reference, keep the right path, clean up unreachable code paths and stale type definitions, run tests, and open a PR. Do that across multiple repos and you can burn weeks.
The trickiest part is confidence: knowing when a flag is truly safe to remove. Many teams rely on memory, ad hoc notes, or reminders, none of which scales.
So we automated it with bye-bye-flag, and we're open sourcing it.
TL;DR
bye-bye-flag connects to your feature-flag provider (PostHog today, with more coming), identifies stale flags, and dispatches AI coding agents to remove them in parallel in isolated worktrees. It cleans up conditionals, dead code paths, unused imports, and orphaned files across multiple repositories, then opens draft PRs ready for review.
How It Works
- Fetch stale flags from your provider (e.g. flags held at 0% or 100% rollout for 30+ days, with no recent changes)
- Validate references across target repos
- Spawn isolated AI agents in git worktrees, one per flag
- Create reviewable PRs with cleanup + tests
The agent does more than delete conditionals. It keeps the correct path, removes unreachable code paths, cleans up imports and orphaned files, runs tests, and commits with useful messages.
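To make that concrete, here is a hypothetical before/after in TypeScript. The `isEnabled` helper, the `new-checkout` flag key, and the label functions are invented for illustration and are not part of bye-bye-flag. The sketch assumes the flag has sat at 100%, so the enabled path is the one to keep.

```typescript
// Hypothetical application code; none of these names come from bye-bye-flag.
type Flags = Record<string, boolean>;

const isEnabled = (flags: Flags, key: string): boolean => flags[key] === true;

function newCheckoutLabel(): string {
  return "Pay now";
}

// Only reachable when the flag is off; a cleanup would delete this helper too.
function legacyCheckoutLabel(): string {
  return "Proceed to payment";
}

// Before: the flag is fully rolled out, so the fallback branch never runs.
function checkoutLabelBefore(flags: Flags): string {
  if (isEnabled(flags, "new-checkout")) {
    return newCheckoutLabel();
  }
  return legacyCheckoutLabel();
}

// After: the enabled path is kept, and the conditional, the flag lookup,
// and the now-unused `flags` parameter are gone.
function checkoutLabelAfter(): string {
  return newCheckoutLabel();
}
```

With the flag on, both versions return the same label. That equivalence is the property a reviewer would check in the draft PR.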
Why This Is Hard in Practice
A core design goal was extensibility: support multiple flag providers and multiple coding-agent backends without rewriting the core cleanup workflow.
- Provider extensibility: each feature-flag provider exposes stale-state signals and APIs differently, so detection uses adapter-style integrations instead of provider-specific one-offs
- Agent backend flexibility: Codex and Claude have different invocation models, setup requirements, and failure modes, so backend-specific execution is isolated behind a common orchestration layer
- Execution orchestration: safe parallel cleanup requires isolated worktrees, deterministic setup, and per-repo bootstrapping
- Reliability and hygiene: reruns must be idempotent, temporary resources cleaned up, and open/merged/declined PR states reconciled
- Observability: you need clear logs and run summaries to see what changed, what failed, and what needs review
bye-bye-flag handles this orchestration layer so teams can focus on reviewing code changes instead of wiring providers, agents, and execution plumbing together.
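One way to picture the adapter idea is the minimal sketch below. Every name in it (`FlagSnapshot`, `FlagProvider`, `AgentBackend`, `isStale`, `pathToKeep`) is an assumption made for illustration, not bye-bye-flag's actual interface. The staleness rule mirrors the "0% or 100% for N days" heuristic described earlier.

```typescript
// Hypothetical shapes for illustration; not bye-bye-flag's actual interfaces.
type KeepPath = "enabled" | "disabled";

interface FlagSnapshot {
  key: string;
  rolloutPercent: number; // 0 to 100
  lastModified: Date;
}

// A provider adapter only has to translate its own API into FlagSnapshot.
interface FlagProvider {
  name: string;
  listFlags(): Promise<FlagSnapshot[]>;
}

// An agent backend hides its invocation details behind a single method.
interface AgentBackend {
  name: string;
  removeFlag(flagKey: string, keep: KeepPath, worktreePath: string): Promise<void>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Stale means fully off or fully on, and unchanged for at least `staleDays`.
function isStale(flag: FlagSnapshot, staleDays: number, now: Date): boolean {
  const decided = flag.rolloutPercent === 0 || flag.rolloutPercent === 100;
  const ageDays = (now.getTime() - flag.lastModified.getTime()) / DAY_MS;
  return decided && ageDays >= staleDays;
}

// A flag at 100% keeps the enabled branch; a flag at 0% keeps the fallback.
function pathToKeep(flag: FlagSnapshot): KeepPath {
  return flag.rolloutPercent === 100 ? "enabled" : "disabled";
}

// Provider-agnostic detection: works with any adapter that lists flags.
async function findStaleFlags(
  provider: FlagProvider,
  staleDays: number,
  now: Date = new Date(),
): Promise<FlagSnapshot[]> {
  const flags = await provider.listFlags();
  return flags.filter((flag) => isStale(flag, staleDays, now));
}

// Stand-in adapter for local experiments; a real one would call a provider API.
function inMemoryProvider(flags: FlagSnapshot[]): FlagProvider {
  return { name: "in-memory", listFlags: async () => flags };
}
```

Because detection depends only on `FlagProvider`, adding a provider in this sketch means writing one `listFlags` implementation. The staleness logic and the agent side stay untouched.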
How We're Running It
We currently kick off runs periodically and plan to move to scheduled runs in the cloud.
pnpm start run --target-repos=~/relevance-repos
A run fetches stale flags, skips work that already has PRs, processes flags in parallel up to configured limits, and produces a clear summary of what was created, skipped, or failed.
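The run loop described above can be sketched roughly as follows. This is not the tool's real implementation: `runCleanup`, `hasExistingPr`, and `cleanUp` are hypothetical names, and treating `maxPrs` as a cap on how many flags one run attempts is an assumption about what that setting means.

```typescript
// Hypothetical run loop; names and the meaning of `maxPrs` are assumptions.
interface RunSummary {
  created: string[];
  skipped: string[];
  failed: string[];
}

interface RunOptions {
  concurrency: number; // how many flags are processed at once
  maxPrs: number; // assumed: upper bound on flags attempted per run
  hasExistingPr: (flagKey: string) => boolean;
  cleanUp: (flagKey: string) => Promise<void>; // e.g. agent run plus PR creation
}

async function runCleanup(flagKeys: string[], opts: RunOptions): Promise<RunSummary> {
  const summary: RunSummary = { created: [], skipped: [], failed: [] };

  // Skipping flags that already have a PR keeps reruns idempotent.
  const pending: string[] = [];
  for (const key of flagKeys) {
    if (opts.hasExistingPr(key)) {
      summary.skipped.push(key);
    } else {
      pending.push(key);
    }
  }

  // Flags beyond the cap are simply left for a later run.
  const queue = pending.slice(0, opts.maxPrs);

  // Each worker pulls the next flag off the shared queue until it is empty.
  async function worker(): Promise<void> {
    for (let key = queue.shift(); key !== undefined; key = queue.shift()) {
      try {
        await opts.cleanUp(key);
        summary.created.push(key);
      } catch {
        // One failing flag should not abort the rest of the run.
        summary.failed.push(key);
      }
    }
  }

  const workerCount = Math.max(1, Math.min(opts.concurrency, queue.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return summary;
}
```

A fixed pool of workers draining a shared queue keeps at most `concurrency` cleanups in flight. Catching errors per flag is what lets the summary report partial success.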
Configuration
Everything is controlled through one bye-bye-flag-config.json file:
{
  "fetcher": {
    "type": "posthog",
    "projectIds": ["12345"],
    "staleDays": 30
  },
  "orchestrator": {
    "concurrency": 2,
    "maxPrs": 10
  },
  "agent": {
    "type": "claude"
  }
}
You can also define repo-level setup, defaults, and context files (CONTEXT.md, CLAUDE.md) so agents follow your standards.
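If you generate or template this file, a small typed check can catch mistakes before a run starts. The sketch below covers only the fields shown in the example above. The `ByeByeFlagConfig` type, the `parseConfig` function, and the validation rules are assumptions for illustration, and the tool's own schema may accept more fields or validate differently.

```typescript
// Hypothetical validator for the example config; not the tool's real schema.
interface ByeByeFlagConfig {
  fetcher: { type: string; projectIds: string[]; staleDays: number };
  orchestrator: { concurrency: number; maxPrs: number };
  agent: { type: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Treat a missing or malformed section as empty so every field gets reported.
function section(value: unknown): Record<string, unknown> {
  return isRecord(value) ? value : {};
}

function isPositiveInt(value: unknown): boolean {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

function isNonEmptyString(value: unknown): boolean {
  return typeof value === "string" && value.length > 0;
}

function parseConfig(json: string): ByeByeFlagConfig {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) {
    throw new Error("Invalid config: expected a JSON object");
  }

  const fetcher = section(raw.fetcher);
  const orchestrator = section(raw.orchestrator);
  const agent = section(raw.agent);
  const problems: string[] = [];

  if (!isNonEmptyString(fetcher.type)) problems.push("fetcher.type");
  const ids = fetcher.projectIds;
  if (!Array.isArray(ids) || ids.length === 0 || !ids.every(isNonEmptyString)) {
    problems.push("fetcher.projectIds");
  }
  if (!isPositiveInt(fetcher.staleDays)) problems.push("fetcher.staleDays");
  if (!isPositiveInt(orchestrator.concurrency)) problems.push("orchestrator.concurrency");
  if (!isPositiveInt(orchestrator.maxPrs)) problems.push("orchestrator.maxPrs");
  if (!isNonEmptyString(agent.type)) problems.push("agent.type");

  if (problems.length > 0) {
    throw new Error(`Invalid config: ${problems.join(", ")}`);
  }
  // Every field was checked above, so the cast is safe for this shape.
  return raw as unknown as ByeByeFlagConfig;
}
```

Collecting all problems before throwing means a single failed run reports every bad field at once, rather than one per attempt.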
What's Next
- Support more feature-flag providers (LaunchDarkly, Split, etc.)
- Improve handling for complex rollout and multivariate flags
- Cloud-native scheduled execution
- More agent backends
- Auto-respond to PR comments during cleanup review
If flag debt is slowing your team down, try bye-bye-flag and tell us what breaks. PRs are welcome, especially provider integrations.

