23. Anthropic Tool Design Best Practices
Date: 2025-10-17
Status: Accepted
Category: Core Architecture
Context
Our MCP server exposes tools for AI agents to interact with the LangGraph agent system. To ensure optimal agent performance and user experience, we need to align our tool design with industry best practices for writing tools for AI agents, specifically those published by Anthropic. The original tool implementation had several areas for improvement:
- Generic tool names without namespacing (chat, get_conversation, list_conversations)
- List-all approach for conversations (not search-focused)
- No response format control
- Unlimited response sizes (potential context overflow)
- Minimal usage guidance in tool descriptions
Decision
We will adopt Anthropic’s best practices for writing tools for agents as outlined in their engineering blog post. This includes:
1. Tool Namespacing
Implementation: Prefix tools with their domain for clarity and scalability.
Before:
- chat
- get_conversation
- list_conversations
After:
- agent_chat - Agent interaction namespace
- conversation_get - Conversation management namespace
- conversation_search - Conversation discovery namespace
Benefits:
- Clear categorization as more tools are added
- Prevents naming conflicts
- Helps agents understand tool relationships
- Backward compatibility maintained with old names
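The routing from old to new names could be sketched as below. This is a minimal illustration, not the server's actual code: the alias table and the resolve_tool_name helper are hypothetical names, and only the three name pairs come from this ADR.

```python
import warnings

# Hypothetical alias table: legacy tool name -> namespaced tool name.
LEGACY_TOOL_ALIASES = {
    "chat": "agent_chat",
    "get_conversation": "conversation_get",
    "list_conversations": "conversation_search",
}


def resolve_tool_name(name: str) -> str:
    """Map a legacy tool name to its namespaced equivalent.

    Names that are not in the alias table are returned unchanged, so
    current names pass through and unknown names can be rejected later.
    """
    new_name = LEGACY_TOOL_ALIASES.get(name)
    if new_name is None:
        return name
    # Keep old callers working, but tell them which name to move to.
    warnings.warn(
        f"Tool '{name}' is deprecated; use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

Resolving the name once at the dispatch boundary keeps each handler registered under a single namespaced name.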
2. Search-Focused Tools (Not List-All)
Anthropic Guidance: “Implement search-focused tools (like search_contacts) rather than list-all tools (list_contacts)”
Implementation: Replace list_conversations with conversation_search:
- Prevents context overflow with large conversation lists
- Forces agents to be specific in requests
- More token-efficient (agents have limited context)
- Better UX for users with many conversations
- Truncation guidance when results exceed limit
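A search-focused handler along these lines might look like the following sketch. The query and limit parameters and the thread_id field appear in this ADR; the Conversation type, its title field, the default limit of 10, and the message wording are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    thread_id: str
    title: str  # hypothetical field used for matching


def conversation_search(conversations, query: str = "", limit: int = 10) -> str:
    """Return up to `limit` conversations whose title contains `query`."""
    needle = query.lower()
    # Filter first, then cap the result so the response size stays bounded.
    matches = [c for c in conversations if needle in c.title.lower()]
    shown = matches[:limit]
    lines = [f"{c.thread_id}: {c.title}" for c in shown]
    if len(matches) > limit:
        # Truncation guidance: say how much was cut and what to do next.
        lines.append(
            f"Showing {limit} of {len(matches)} matches. "
            "Refine the query or raise the limit to see more."
        )
    if not matches:
        lines.append(f"No conversations matched '{query}'. Try a broader query.")
    return "\n".join(lines)
```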
3. Response Format Control
Anthropic Guidance: “Expose a response_format enum parameter allowing agents to request ‘concise’ or ‘detailed’ responses”
Implementation: Tool inputs accept a response_format parameter with the values concise and detailed.
Benefits:
- Balances token efficiency with information needs
- Agents can optimize for speed vs comprehensiveness
- Reduces unnecessary context consumption
- Clear expectations on response size
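One way to expose the parameter is sketched below. The ADR's references suggest the real schemas use Pydantic; this sketch uses only the standard library so it runs without extra dependencies. ChatInput, message, thread_id, and response_format are names from this ADR, while the ResponseFormat enum, the concise default, and the render helper are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseFormat(str, Enum):
    CONCISE = "concise"
    DETAILED = "detailed"


@dataclass
class ChatInput:
    message: str
    thread_id: str
    # Defaulting to concise is an assumption, not stated in the ADR.
    response_format: ResponseFormat = ResponseFormat.CONCISE

    def __post_init__(self):
        # Accept plain strings from JSON payloads; unknown values raise ValueError.
        self.response_format = ResponseFormat(self.response_format)


def render(summary: str, details: str, fmt: ResponseFormat) -> str:
    """Return only the summary for concise requests, summary plus details otherwise."""
    if fmt is ResponseFormat.CONCISE:
        return summary
    return f"{summary}\n\n{details}"
```

Giving the parameter a default means callers that predate the parameter keep working unchanged.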
4. Token Limits and Response Optimization
Anthropic Guidance: “Restrict responses to ~25,000 tokens. Implement pagination, filtering, and truncation with sensible defaults.”
Implementation: Created ResponseOptimizer utility with:
- Token counting using tiktoken
- Automatic truncation with helpful messages
- Format-aware limits (concise: 500, detailed: 2000)
- High-signal information extraction
- Prevents context overflow
- Predictable response sizes
- Helpful truncation messages guide agents
- Token counts are exact for OpenAI encodings (tiktoken) and approximate for other LLM tokenizers
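The truncation behaviour described above can be sketched as follows. This is not the project's ResponseOptimizer: it swaps tiktoken for a rough four-characters-per-token estimate so the example has no dependencies, and it assumes the 500 and 2000 limits are token counts. The class and method layout is illustrative; only the names ResponseOptimizer and format_response appear in this ADR.

```python
class ResponseOptimizer:
    """Illustrative format-aware truncation, not the actual implementation."""

    # Limits from the list above, assumed to be token counts.
    FORMAT_LIMITS = {"concise": 500, "detailed": 2000}
    # Rough stand-in for a real tokenizer such as tiktoken.
    CHARS_PER_TOKEN = 4

    def count_tokens(self, text: str) -> int:
        # Ceiling division: any non-empty text counts as at least one token.
        return -(-len(text) // self.CHARS_PER_TOKEN)

    def format_response(self, text: str, response_format: str = "concise") -> str:
        limit = self.FORMAT_LIMITS[response_format]
        total = self.count_tokens(text)
        if total <= limit:
            return text
        kept = text[: limit * self.CHARS_PER_TOKEN]
        # Tell the agent what was cut and how to get the rest.
        notice = (
            f"\n\n[Truncated: showing about {limit} of {total} tokens. "
            "Narrow the request to see the remaining content.]"
        )
        return kept + notice
```

With a real tokenizer, truncation would cut on token boundaries rather than character offsets; the character slice here only keeps the sketch short.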
5. Enhanced Tool Descriptions
Anthropic Guidance: Provide clear usage guidance including when NOT to use tools, token limits, and expected response times.
Benefits:
- Clear expectations on performance
- Guidance on tool selection
- Token and rate limit transparency
- Helps agents make better decisions
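A description written to this guidance might read like the sketch below. The name, description, and inputSchema keys follow the MCP tool definition; the description wording, the response-time note, and the choice of required fields are placeholders, not the project's actual text.

```python
# Illustrative tool definition; all description text is a placeholder.
AGENT_CHAT_TOOL = {
    "name": "agent_chat",
    "description": (
        "Send a message to the agent and get its reply.\n"
        "Use when: the user asks something that needs the agent's reasoning.\n"
        "Do NOT use when: you only need past messages; use conversation_get "
        "or conversation_search instead.\n"
        "Response size: about 500 tokens (concise) or 2000 tokens (detailed).\n"
        "Response time: depends on the request; document measured values here."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "message": {"type": "string", "description": "User message to send"},
            "thread_id": {"type": "string", "description": "Conversation thread identifier"},
            "response_format": {"type": "string", "enum": ["concise", "detailed"]},
        },
        # Which fields are required is an assumption.
        "required": ["message"],
    },
}
```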
6. Unambiguous Parameter Names
Anthropic Guidance: “Replace generic names like user with specific ones like user_id”
Implementation: Already well-implemented with specific names like:
- message (not query or input)
- thread_id (not id or conversation)
- username (not user)
- limit (not max or count)
7. Actionable Error Messages
Anthropic Guidance: “Replace opaque error codes with specific, actionable guidance”
Benefits:
- Clear next steps for agents
- Reduces retry failures
- Better debugging
- Improved user experience
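As a rough illustration of the difference, an error can carry both the problem and the next step. ToolError, the store argument, and the message wording are hypothetical; conversation_get, conversation_search, and thread_id are names from this ADR.

```python
class ToolError(Exception):
    """Error that states the problem and the next step for the calling agent."""

    def __init__(self, problem: str, suggestion: str):
        self.problem = problem
        self.suggestion = suggestion
        super().__init__(f"{problem} {suggestion}")


def conversation_get(store: dict, thread_id: str) -> str:
    """Look up a conversation, failing with guidance instead of a bare code."""
    if not thread_id:
        raise ToolError(
            "thread_id is empty.",
            "Pass a thread_id returned by conversation_search.",
        )
    if thread_id not in store:
        raise ToolError(
            f"No conversation found for thread_id '{thread_id}'.",
            "Call conversation_search to find valid thread IDs, then retry.",
        )
    return store[thread_id]
```

An agent that receives the second message knows which tool to call next, which is what reduces blind retries.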
Technical Implementation
File Structure
Key Components
ResponseOptimizer Class
Updated Tool Schemas
Backward Compatibility
All old tool names are supported via routing.
Consequences
Positive
- Better Agent Performance
  - Search-focused tools reduce context waste
  - Format control allows optimization
  - Clear descriptions improve tool selection
- Scalability
  - Namespacing prevents conflicts as tools grow
  - Token limits prevent runaway responses
  - Search approach works with large datasets
- Improved UX
  - Actionable error messages
  - Clear expectations (response times, token counts)
  - Helpful truncation guidance
- Industry Alignment
  - Follows Anthropic’s published best practices
  - Positions codebase as reference implementation
  - Easier onboarding for developers familiar with guidelines
- Observability
  - Format type tracked in metrics
  - Token counts logged
  - Search patterns monitored
Negative
- Breaking Changes (Mitigated)
  - Tool names changed (backward compatibility added)
  - New parameters added (defaults provided, so existing calls keep working)
  - Migration path: old names work, deprecation warnings logged
- Increased Complexity
  - More code for response optimization
  - Additional parameters to document
  - Token counting overhead (minimal: ~1-5ms)
- Development Overhead
  - Developers must understand format control
  - Tool descriptions require more thought
  - Testing complexity increases
Mitigation Strategies
- Backward Compatibility
  - Support old tool names indefinitely
  - Default values for new parameters
  - Gradual deprecation warnings
- Documentation
  - Update tool documentation with examples
  - Create migration guide
  - Document performance characteristics
- Testing
  - Add tests for response optimization
  - Validate token counting accuracy
  - Test backward compatibility
- Monitoring
  - Track tool usage by name (old vs new)
  - Monitor truncation frequency
  - Alert on token limit breaches
Metrics
Track the following to measure success:
Performance Metrics
- tool_response_tokens_total{tool, format} - Response sizes
- tool_truncation_rate{tool} - Truncation frequency
- tool_call_duration_seconds{tool, format} - Performance impact
Usage Metrics
- tool_calls_total{tool_name, version} - Adoption of new names
- tool_format_usage{format} - Concise vs detailed preference
- search_result_count{has_query} - Search vs browse patterns
Quality Metrics
- tool_error_rate{tool, error_type} - Error patterns
- tool_retry_rate{tool} - Clarity of error messages
- agent_satisfaction_score - Overall agent success rate
Migration Guide
For Tool Developers
- Add Response Format Control: add a response_format parameter to the tool's input schema, as was done for ChatInput
- Apply Response Formatting: pass the handler's output through format_response before returning it
- Use Search-Focused Patterns:
  - Add query and limit parameters
  - Filter before returning
  - Provide helpful truncation messages
- Enhance Descriptions:
  - Include token limits
  - Document response times
  - Specify when NOT to use
  - Provide usage examples
For Tool Users (Agents)
Old code continues to work: calls that use the old tool names are routed to their new equivalents.
Related ADRs
References
- Anthropic: Writing Tools for Agents
- Model Context Protocol Specification
- tiktoken Documentation
- Pydantic Field Validation
Implementation Checklist
- Create ResponseOptimizer utility module
- Update ChatInput with response_format parameter
- Create SearchConversationsInput schema
- Rename tools with namespacing (with backward compat)
- Enhance tool descriptions
- Implement conversation_search handler
- Update agent_chat to use format_response
- Apply changes to both server_stdio and server_streamable
- Update tool documentation (docs/api-reference/mcp/tools.mdx)
- Add tests for ResponseOptimizer
- Add integration tests for new tool parameters
- Update examples to use new tool names
- Add metrics dashboards for new tracking
- Create migration guide for existing integrations