Building self-improving AI systems has long been a persistent challenge: traditional approaches rarely deliver truly adaptive solutions. Our implementation of DSPy for production-ready, self-improving agents, however, has yielded promising results. Through testing and real-world deployment, DSPy-powered systems in production generated emails matching human-written quality 80% of the time, and in 6% of cases they exceeded human performance.
These results demonstrate not only exceptional output quality but also dramatic improvements in development efficiency. Our implementation has cut production agent build time by 50% by eliminating the need for constant manual adjustments and prompt-pipeline fine-tuning.
The success of our DSPy implementation extends beyond the statistics. It represents a fundamental shift in AI system development: moving from static, manually tuned systems to dynamic, self-improving agents that adapt and evolve based on real-world feedback.
Step-by-Step Guide to Building Self-Improving Agents on Relevance
Building a self-improving agent system on the Relevance platform is straightforward. Here's how to automate your workflow in a few simple steps:
1. Design Your Agent System
Before implementing a self-improving agent system, you need a clear understanding of your business requirements and target use cases. Let's explore a practical example.
In our case study of outbound sales development automation, we chose email composition as our primary optimization target. This process was ideal because it required extensive human oversight and complex decision-making. Our system integrates seamlessly into the workflow—operating after CRM data gathering and prospect research but before email delivery—creating an ideal opportunity for feedback-based learning.
2. Create Your Agent in Relevance
Setting up your agent is straightforward:
- Create your agent through Relevance
- Attach the necessary tools to your agent
- Define your workflow in the agent's Core Instructions
- Configure integration points with your existing systems
See Relevance Academy for easy tutorials on how to create your own agentic systems.
3. Implement the Feedback Mechanism
The feedback mechanism lies at the heart of self-improvement—it transforms a static AI system into a truly adaptive solution:
- Identify the steps that rely on LLM prompting or message generation; these are typically the processes that benefit most from self-optimizing programs
- Set the output tools to 'Approval Required' in agent settings
- This enables human oversight and refinement of outputs, creating a learning loop for the agent
4. Configure DSPy Integration
Follow these steps to configure the Prompt Optimizer tool in Relevance:
- Create a new tool (e.g., "Compose Email Sequence") using DSPy
- Use Prompt Optimizer - Get Data to collect and store training examples automatically
- Choose the appropriate optimizer based on your training data size (see the sketch after this list):
  - BootstrapFewShot for fewer than 20 samples
  - BootstrapFewShot with Random Search for around 50 samples
  - MIPROv2 for 200+ samples
- Train your program
- See DSPy: Programming - not prompting - Language Models for more information on optimizer methods
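To make the optimizer choice concrete, here is a minimal sketch in plain DSPy, outside the Relevance tooling. The pick_optimizer helper and its thresholds are illustrative and mirror the rule of thumb above; program stands for any dspy.Module and metric for your scoring function:

```python
import dspy

def pick_optimizer(trainset, metric):
    # Thresholds mirror the rule of thumb above; tune them for your data.
    if len(trainset) < 20:
        return dspy.BootstrapFewShot(metric=metric)
    if len(trainset) < 200:
        return dspy.BootstrapFewShotWithRandomSearch(metric=metric)
    return dspy.MIPROv2(metric=metric)

# `trainset` is a list of dspy.Example objects; `program` is any dspy.Module.
optimizer = pick_optimizer(trainset, metric)
compiled_program = optimizer.compile(program, trainset=trainset)
```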
5. Implement the Self-Improvement Loop
Once your agentic self-improvement system is set up, here's how it works:
- Agent executes its assigned task
- System pauses at a key checkpoint for human approval
- Human reviews and corrects the output
- System adds this feedback to its training data
- DSPy uses the updated training data to improve future performance
This continuous learning cycle steadily enhances the system's capabilities. Feedback on the agent's emails automatically flows back into the system, enabling the agent to become more effective.
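Sketched in plain Python, one pass through that loop might look like this. The review stub stands in for Relevance's 'Approval Required' checkpoint, and the field names are illustrative, not a Relevance API:

```python
import dspy

def review(draft_email: str) -> str:
    # Stand-in for the human-approval checkpoint: the reviewer either
    # approves the draft as-is or returns a corrected version.
    corrected = input(f"Edit, or press Enter to approve:\n{draft_email}\n> ")
    return corrected or draft_email

def self_improvement_step(program, trainset, prospect_research: str):
    draft = program(prospect_research=prospect_research)   # agent executes its task
    final_email = review(draft.email)                      # human reviews and corrects
    trainset.append(                                       # feedback joins the training data
        dspy.Example(prospect_research=prospect_research, email=final_email)
        .with_inputs("prospect_research")
    )
    return trainset  # periodically recompile the program on the grown trainset
```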
Understanding DSPy's Architecture: A Deep Dive
DSPy's architecture represents a paradigm shift in language model programming. Moving beyond simple prompt engineering, it introduces a systematic framework with four core pillars: training data acquisition, program training, inference, and evaluation. Let's examine each component in detail.
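To ground the discussion, here is a minimal DSPy program. The ComposeEmail signature is a hypothetical stand-in for our email-composition tool, and the model name is just an example; the dspy calls themselves are the real API:

```python
import dspy

# Point DSPy at any supported model (the model name here is an example).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class ComposeEmail(dspy.Signature):
    """Compose an outbound sales email from prospect research."""
    prospect_research: str = dspy.InputField()
    email: str = dspy.OutputField()

program = dspy.ChainOfThought(ComposeEmail)
draft = program(prospect_research="CTO at a 50-person fintech, hiring ML engineers")
print(draft.email)
```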
Training Data: The Foundation of Success
The most critical—and often challenging—aspect of DSPy's architecture is acquiring high-quality training data. This component offers the greatest potential for system improvement, as training data quality directly impacts the entire pipeline's performance. The old adage "garbage in, garbage out" holds especially true here: better data and refined gold sets consistently produce superior system outputs.
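In DSPy terms, a gold set is just a list of dspy.Example objects pairing the inputs a program will see with the human-approved outputs it should learn from (the field names and content below are illustrative):

```python
import dspy

trainset = [
    dspy.Example(
        prospect_research="CTO at a 50-person fintech, hiring ML engineers",
        email="Hi Dana, noticed your team is scaling its ML hiring...",
    ).with_inputs("prospect_research"),  # marks which fields are inputs
    # ...more human-approved examples
]
```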
Program Training: The Optimization Engine
DSPy provides three optimizers, each suited for different data scales:
- BootstrapFewShot: An entry-level optimizer that identifies the most effective training examples for few-shot demonstrations
- BootstrapFewShot with Random Search: An enhanced version that searches across larger sample sets to find the best combinations of few-shot examples
- MIPROv2: The most sophisticated optimizer, which both selects optimal examples and generates and tests candidate prompt instructions
These optimized programs are automatically cached in the knowledge base under "_dspy_programs" for efficient reuse.
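In plain DSPy, the analog of this cache is persisting the compiled program to disk so later runs can skip re-optimization. Continuing the earlier sketch (the filename is arbitrary):

```python
# Save the optimized program after optimizer.compile(...) finishes.
compiled_program.save("compose_email_optimized.json")
```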
Inference: Putting It All Together
During inference, DSPy excels at operational efficiency. It leverages cached optimized programs to run inference steps with minimal computational overhead. The system then intelligently feeds inputs into these optimized programs, generating outputs that form the basis for our evaluation metrics.
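With a cached program on disk, inference reduces to a load-and-call, as in this sketch (reusing the hypothetical ComposeEmail signature from above):

```python
# Rebuild the module skeleton, load the optimized state, and run it.
program = dspy.ChainOfThought(ComposeEmail)
program.load("compose_email_optimized.json")
prediction = program(prospect_research="VP of Sales at Acme, just raised a Series B")
print(prediction.email)
```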
Evaluation: Measuring Success
DSPy features a sophisticated evaluation framework built on comparative analysis. The system conducts parallel tests between DSPy-powered agents and control agents (non-DSPy variants), using identical inputs, tools, and workflows. At the heart of our evaluation process is the SemanticF1 score, a metric that uses LLMs to measure the semantic precision and recall of responses and combines them into a single performance indicator.
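A minimal sketch of such an evaluation in plain DSPy, assuming devset is a held-out list of dspy.Example objects; running the same devset through the control agent gives the comparison baseline:

```python
import dspy
from dspy.evaluate import SemanticF1

# SemanticF1 uses an LM judge to estimate semantic precision and recall.
metric = SemanticF1()
evaluate = dspy.Evaluate(devset=devset, metric=metric,
                         num_threads=4, display_progress=True)
score = evaluate(program)  # repeat with the control agent to compare
```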
This architecture marks a significant advance in LLM programming, providing a systematic approach to building, optimizing, and evaluating language model applications. By separating the process into distinct components, DSPy gives developers clear points for system improvement while maintaining the flexibility needed for diverse applications.
Development Timeline: From Concept to Production
Building and deploying a production-ready agentic system with DSPy integration is straightforward and efficient. The implementation involves several key phases, and our experience shows that organizations can achieve a working system in a short timeframe.
Initial Setup Phase
The initial development of the agentic system takes about one week. During this phase, the team focuses on:
- Understanding and mapping business requirements
- Setting up the basic infrastructure
- Configuring initial workflows
- Testing basic functionalities
DSPy Integration
A key advantage of DSPy integration is its straightforward implementation. The process involves:
- Creating a single additional tool within the existing system
- Implementing pre-built DSPy tool steps
- Configuring optimization settings
This phase is highly efficient since it uses pre-existing components, minimizing development time and effort.
Training Data Collection
The training data acquisition process follows two main approaches:
- Using the built-in tool for automated data collection, enabling rapid deployment
- Developing custom training datasets for specialized applications
While automation delivers quick results, organizations that need custom datasets should allocate extra time for thorough data preparation and validation.
Ongoing Improvements
After deployment, the system continues to evolve through:
- Integration of human-approved responses into the training set
- Ongoing refinement based on real-world feedback
- Periodic optimization of the training pipeline
This continuous improvement cycle ensures the system maintains peak performance while adapting to changing business needs.
Key Considerations When Building Production Systems with DSPy
When implementing DSPy-powered systems in production, several critical factors can significantly impact system effectiveness and reliability. Let's examine the key aspects:
Real-time Feedback Integration
Real-time feedback handling is essential for continuous improvement. Through an approval mode mechanism, the agent pauses at critical points for human input. This feedback flows directly into the DSPy training set, creating a dynamic learning environment that evolves with each interaction.
Edge Case Management
While our system currently optimizes for positive examples, we see great potential in including negative examples in the learning framework. This would help the system not only learn effective patterns but also identify and avoid problematic responses.
Brand Voice Consistency
DSPy excels at learning brand voice naturally through human-approved examples. Instead of following rigid rules, the system adapts to brand messaging patterns through exposure to approved content. Custom message rules can further enhance this brand alignment.
Content Safety and Compliance
A robust system of safeguards ensures content quality and brand alignment:
- Inclusion of customizable message rules for content modification
- Multi-layer content filtering systems
- Mandatory approval workflows for sensitive content
- Automated flagging of prohibited terms and topics (a minimal sketch follows this list)
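As one illustration, the term-flagging layer can be as simple as a phrase blocklist. The phrases below are made up, and in practice this check sits underneath the mandatory approval workflow:

```python
# Illustrative guardrail: flag drafts containing blocklisted phrases.
PROHIBITED = {"guaranteed returns", "risk-free", "act now"}

def flag_prohibited(text: str) -> list[str]:
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED if phrase in lowered]

hits = flag_prohibited("This is a risk-free offer, act now!")
print(hits)  # a non-empty list means the draft is routed to human review
```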
Technical Performance
The system's cloud-based architecture delivers optimal performance without local processing overhead. With consistent response times of 1–2 seconds even under heavy load, the platform maintains efficiency through:
- Cloud-based processing on Relevance's platform
- Sophisticated caching mechanisms
- Parallel processing capabilities
Future Improvements
Looking ahead, we are exploring two key enhancements:
- AI-driven feedback interpretation for more autonomous self-improvement
- Streamlined selection of gold set examples for optimization
These elements create the foundation for a robust, scalable, and effective production system that evolves while maintaining high performance standards.
Conclusion
The integration of DSPy-powered agentic systems marks a breakthrough in self-improving AI solutions. Through its combination of robust architecture, efficient training mechanisms, and real-time feedback loops, organizations can deploy sophisticated AI systems that evolve and adapt to their needs. Our implementation and testing have shown that these systems maintain the flexibility and reliability needed for production environments—while not only matching but frequently surpassing human performance.