Building self-improving AI systems has long been a persistent challenge: traditional approaches rarely deliver truly adaptive solutions. Our implementation of DSPy for production-ready, self-improving agents, however, has yielded promising results. Through testing and real-world deployment, DSPy-powered systems in production generated emails matching human-written quality 80% of the time, and in 6% of cases they exceeded human performance.
These results demonstrate not only exceptional output quality but also dramatic improvements in development efficiency. Our implementation has cut production agent build time by 50% by eliminating the need for constant manual adjustments and prompt-pipeline fine-tuning.
The success of our DSPy implementation extends beyond the statistics. It represents a fundamental shift in AI system development: moving from static, manually tuned systems to dynamic, self-improving agents that adapt and evolve based on real-world feedback.
Step-by-Step Guide to Building Self-Improving Agents on Relevance
Building a self-improving agent system on the Relevance platform is straightforward. Here's how to automate your workflow in a few simple steps:
1. Design Your Agent System
Before implementing a self-improving agent system, you need a clear understanding of your business requirements and target use cases. Let's explore a practical example.
In our case study of outbound sales development automation, we chose email composition as our primary optimization target. This process was ideal because it required extensive human oversight and complex decision-making. Our system integrates seamlessly into the workflow—operating after CRM data gathering and prospect research but before email delivery—creating an ideal opportunity for feedback-based learning.
2. Create Your Agent in Relevance
Setting up your agent is straightforward:
- Create your agent through Relevance
- Attach the necessary tools to your agent
- Define your workflow in the agent's Core Instructions
- Configure integration points with your existing systems
See Relevance Academy for easy tutorials on how to create your own agentic systems.
3. Implement the Feedback Mechanism
The feedback mechanism lies at the heart of self-improvement—it transforms a static AI system into a truly adaptive solution:
- Identify the steps that rely on LLM prompting or message generation; these are typically the processes that benefit most from self-optimizing programs
- Set the output tools to 'Approval Required' in agent settings
- This enables human oversight and refinement of outputs, creating a learning loop for the agent
4. Configure DSPy Integration
Follow these steps to configure the Prompt Optimizer tool in Relevance:
- Create a new tool (e.g., "Compose Email Sequence") using DSPy
- Use Prompt Optimizer - Get Data to collect and store training examples automatically
- Choose the appropriate optimizer based on your training data size (see the sketch after this list):
  - BootstrapFewShot for fewer than 20 samples
  - BootstrapFewShot with Random Search for around 50 samples
  - MIPROv2 for 200+ samples
- Train your program
- See DSPy: Programming - not prompting - Language Models for more information on optimizer methods
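To make the optimizer choice concrete, here is a minimal sketch in plain DSPy, outside the Relevance tooling. The pick_optimizer helper and its thresholds are illustrative and mirror the rule of thumb above; program stands for any dspy.Module and metric for your scoring function:

```python
import dspy

def pick_optimizer(trainset, metric):
    # Thresholds mirror the rule of thumb above; tune them for your data.
    if len(trainset) < 20:
        return dspy.BootstrapFewShot(metric=metric)
    if len(trainset) < 200:
        return dspy.BootstrapFewShotWithRandomSearch(metric=metric)
    return dspy.MIPROv2(metric=metric)

# `trainset` is a list of dspy.Example objects; `program` is any dspy.Module.
optimizer = pick_optimizer(trainset, metric)
compiled_program = optimizer.compile(program, trainset=trainset)
```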
5. Implement the Self-Improvement Loop
Once your agentic self-improvement system is set up, here's how it works:
- Agent executes its assigned task
- System pauses at a key checkpoint for human approval
- Human reviews and corrects the output
- System adds this feedback to its training data
- DSPy uses the updated training data to improve future performance
This continuous learning cycle steadily enhances the system's capabilities. Feedback on the agent's emails automatically flows back into the system, enabling the agent to become more effective.
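Sketched in plain Python, one pass through that loop might look like this. The review stub stands in for Relevance's 'Approval Required' checkpoint, and the field names are illustrative, not a Relevance API:

```python
import dspy

def review(draft_email: str) -> str:
    # Stand-in for the human-approval checkpoint: the reviewer either
    # approves the draft as-is or returns a corrected version.
    corrected = input(f"Edit, or press Enter to approve:\n{draft_email}\n> ")
    return corrected or draft_email

def self_improvement_step(program, trainset, prospect_research: str):
    draft = program(prospect_research=prospect_research)   # agent executes its task
    final_email = review(draft.email)                      # human reviews and corrects
    trainset.append(                                       # feedback joins the training data
        dspy.Example(prospect_research=prospect_research, email=final_email)
        .with_inputs("prospect_research")
    )
    return trainset  # periodically recompile the program on the grown trainset
```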
Understanding DSPy's Architecture: A Deep Dive
DSPy's architecture represents a paradigm shift in language model programming. Moving beyond simple prompt engineering, it introduces a systematic framework with four core pillars: training data acquisition, program training, inference, and evaluation. Let's examine each component in detail.
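To ground the discussion, here is a minimal DSPy program. The ComposeEmail signature is a hypothetical stand-in for our email-composition tool, and the model name is just an example; the dspy calls themselves are the real API:

```python
import dspy

# Point DSPy at any supported model (the model name here is an example).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class ComposeEmail(dspy.Signature):
    """Compose an outbound sales email from prospect research."""
    prospect_research: str = dspy.InputField()
    email: str = dspy.OutputField()

program = dspy.ChainOfThought(ComposeEmail)
draft = program(prospect_research="CTO at a 50-person fintech, hiring ML engineers")
print(draft.email)
```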
Training Data: The Foundation of Success
The most critical—and often challenging—aspect of DSPy's architecture is acquiring high-quality training data. This component offers the greatest potential for system improvement, as training data quality directly impacts the entire pipeline's performance. The old adage "garbage in, garbage out" holds especially true here: better data and refined gold sets consistently produce superior system outputs.
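In DSPy terms, a gold set is just a list of dspy.Example objects pairing the inputs a program will see with the human-approved outputs it should learn from (the field names and content below are illustrative):

```python
import dspy

trainset = [
    dspy.Example(
        prospect_research="CTO at a 50-person fintech, hiring ML engineers",
        email="Hi Dana, noticed your team is scaling its ML hiring...",
    ).with_inputs("prospect_research"),  # marks which fields are inputs
    # ...more human-approved examples
]
```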
Program Training: The Optimization Engine
DSPy provides three optimizers, each suited for different data scales:
- BootstrapFewShot: An entry-level optimizer that identifies the most effective training examples for few-shot demonstrations
- BootstrapFewShot with Random Search: An enhanced version that searches across larger sample sets to find the best combinations of few-shot examples
- MIPROv2: The most sophisticated optimizer, which both selects optimal examples and generates and tests candidate prompt instructions
These optimized programs are automatically cached in the knowledge base under "_dspy_programs" for efficient reuse.
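In plain DSPy, the analog of this cache is persisting the compiled program to disk so later runs can skip re-optimization. Continuing the earlier sketch (the filename is arbitrary):

```python
# Save the optimized program after optimizer.compile(...) finishes.
compiled_program.save("compose_email_optimized.json")
```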
Inference: Putting It All Together
During inference, DSPy excels at operational efficiency. It leverages cached optimized programs to run inference steps with minimal computational overhead. The system then intelligently feeds inputs into these optimized programs, generating outputs that form the basis for our evaluation metrics.
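With a cached program on disk, inference reduces to a load-and-call, as in this sketch (reusing the hypothetical ComposeEmail signature from above):

```python
# Rebuild the module skeleton, load the optimized state, and run it.
program = dspy.ChainOfThought(ComposeEmail)
program.load("compose_email_optimized.json")
prediction = program(prospect_research="VP of Sales at Acme, just raised a Series B")
print(prediction.email)
```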
Evaluation: Measuring Success
DSPy features a sophisticated evaluation framework built on comparative analysis. The system conducts parallel tests between DSPy-powered agents and control agents (non-DSPy variants), using identical inputs, tools, and workflows. At the heart of our evaluation process is the SemanticF1 score, a metric that uses LLMs to measure the semantic precision and recall of responses and combines them into a single performance indicator.
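A minimal sketch of such an evaluation in plain DSPy, assuming devset is a held-out list of dspy.Example objects; running the same devset through the control agent gives the comparison baseline:

```python
import dspy
from dspy.evaluate import SemanticF1

# SemanticF1 uses an LM judge to estimate semantic precision and recall.
metric = SemanticF1()
evaluate = dspy.Evaluate(devset=devset, metric=metric,
                         num_threads=4, display_progress=True)
score = evaluate(program)  # repeat with the control agent to compare
```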
This architecture marks a significant advance in LLM programming, providing a systematic approach to building, optimizing, and evaluating language model applications. By separating the process into distinct components, DSPy gives developers clear points for system improvement while maintaining the flexibility needed for diverse applications.
Development Timeline: From Concept to Production
Building and deploying a production-ready agentic system with DSPy integration is straightforward and efficient. The implementation involves several key phases, and our experience shows that organizations can achieve a working system in a short timeframe.
Initial Setup Phase
The initial development of the agentic system takes about one week. During this phase, the team focuses on:
- Understanding and mapping business requirements
- Setting up the basic infrastructure
- Configuring initial workflows
- Testing basic functionalities
DSPy Integration
A key advantage of DSPy integration is its straightforward implementation. The process involves:
- Creating a single additional tool within the existing system
- Implementing pre-built DSPy tool steps
- Configuring optimization settings
This phase is highly efficient since it uses pre-existing components, minimizing development time and effort.
Training Data Collection
The training data acquisition process follows two main approaches:
- Using the built-in tool for automated data collection, enabling rapid deployment
- Developing custom training datasets for specialized applications
While automation delivers quick results, organizations that need custom datasets should allocate extra time for thorough data preparation and validation.
Ongoing Improvements
After deployment, the system continues to evolve through:
- Integration of human-approved responses into the training set
- Ongoing refinement based on real-world feedback
- Periodic optimization of the training pipeline
This continuous improvement cycle ensures the system maintains peak performance while adapting to changing business needs.
Key Considerations When Building Production Systems with DSPy
When implementing DSPy-powered systems in production, several critical factors can significantly impact system effectiveness and reliability. Let's examine the key aspects:
Real-time Feedback Integration
Real-time feedback handling is essential for continuous improvement. Through an approval mode mechanism, the agent pauses at critical points for human input. This feedback flows directly into the DSPy training set, creating a dynamic learning environment that evolves with each interaction.
Edge Case Management
While our system currently optimizes for positive examples, we see great potential in including negative examples in the learning framework. This would help the system not only learn effective patterns but also identify and avoid problematic responses.
Brand Voice Consistency
DSPy excels at learning brand voice naturally through human-approved examples. Instead of following rigid rules, the system adapts to brand messaging patterns through exposure to approved content. Custom message rules can further enhance this brand alignment.
Content Safety and Compliance
A robust system of safeguards ensures content quality and brand alignment:
- Inclusion of customizable message rules for content modification
- Multi-layer content filtering systems
- Mandatory approval workflows for sensitive content
- Automated flagging of prohibited terms and topics (a minimal sketch follows this list)
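As one illustration, the term-flagging layer can be as simple as a phrase blocklist. The phrases below are made up, and in practice this check sits underneath the mandatory approval workflow:

```python
# Illustrative guardrail: flag drafts containing blocklisted phrases.
PROHIBITED = {"guaranteed returns", "risk-free", "act now"}

def flag_prohibited(text: str) -> list[str]:
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED if phrase in lowered]

hits = flag_prohibited("This is a risk-free offer, act now!")
print(hits)  # a non-empty list means the draft is routed to human review
```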
Technical Performance
The system's cloud-based architecture delivers optimal performance without local processing overhead. With consistent response times of 1–2 seconds even under heavy load, the platform maintains efficiency through:
- Cloud-based processing on Relevance's platform
- Sophisticated caching mechanisms
- Parallel processing capabilities
Future Improvements
Looking ahead, we are exploring two key enhancements:
- AI-driven feedback interpretation for more autonomous self-improvement
- Streamlined selection of gold set examples for optimization
These elements create the foundation for a robust, scalable, and effective production system that evolves while maintaining high performance standards.
Conclusion
The integration of DSPy-powered agentic systems marks a breakthrough in self-improving AI solutions. Through its combination of robust architecture, efficient training mechanisms, and real-time feedback loops, organizations can deploy sophisticated AI systems that evolve and adapt to their needs. Our implementation and testing have shown that these systems maintain the flexibility and reliability needed for production environments—while not only matching but frequently surpassing human performance.