OpenWebUI Discord Bot - Upgrade Project
Project Overview
This Discord bot currently interfaces with OpenWebUI to provide AI-powered responses. The goal is to upgrade it to:
- Switch from OpenWebUI to LiteLLM Proxy as the backend
- Add MCP (Model Context Protocol) Tool Support
- Implement system prompt management within the application
Current Architecture
Files Structure
- Main bot: v2/bot.py - Current implementation
- Legacy bot: scripts/discordbot.py - Older version with slightly different approach
- Dependencies: v2/requirements.txt
- Config: v2/.env.example
Current Implementation Details
Bot Features (v2/bot.py)
- Discord Integration: Uses discord.py with message intents
- Trigger Methods:
- Bot mentions (@bot)
- Direct messages (DMs)
- Message History: Retrieves last 100 messages for context using get_chat_history()
- Image Support: Downloads and encodes images as base64, sends to API
- API Client: Uses OpenAI Python SDK pointing to OpenWebUI endpoint
- Message Format: Embeds chat history in user message context
Current Message Flow
- User mentions bot or DMs it
- Bot fetches channel history (last 100 messages)
- Formats history as: "AuthorName: message content"
- Sends to OpenWebUI with format:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "##CONTEXT##\n{history}\n##ENDCONTEXT##\n\n{user_message}"},
    {"type": "image_url", "image_url": {...}}  # if images present
  ]
}
- Returns AI response and replies to user
Current Limitations
- No system prompt: Context is embedded in user messages
- No tool calling: Cannot execute functions or use MCPs
- OpenWebUI dependency: Tightly coupled to OpenWebUI API structure
- Simple history: Just text concatenation, no proper conversation threading
- Synchronous image download: Uses requests.get() in async context (should use aiohttp)
Target Architecture: LiteLLM + MCP Tools
Why LiteLLM?
LiteLLM is a unified proxy that provides:
- Standardized API calls across 100+ LLM providers (OpenAI, Anthropic, Google, etc.)
- Native tool/function calling support via OpenAI-compatible API
- Built-in MCP support for Model Context Protocol tools
- Load balancing and fallback between models
- Cost tracking and usage analytics
- Streaming support for real-time responses
LiteLLM Tool Calling
LiteLLM supports the OpenAI tools format:
response = client.chat.completions.create(
model="gpt-4",
messages=[...],
tools=[{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {...}
}
}],
tool_choice="auto"
)
MCP (Model Context Protocol) Overview
MCP is a standard protocol for:
- Exposing tools to LLMs (functions they can call)
- Providing resources (files, APIs, databases)
- Prompts/templates for consistent interactions
- Sampling for multi-step agentic behavior
MCP Server Examples:
- filesystem: Read/write files
- github: Access repos, create PRs
- postgres: Query databases
- brave-search: Web search
- slack: Send messages, read channels
Upgrade Plan
Phase 1: Switch to LiteLLM Proxy
Configuration Changes
- Update environment variables:
DISCORD_TOKEN=your_discord_bot_token
LITELLM_API_KEY=your_litellm_api_key
LITELLM_API_BASE=http://localhost:4000  # or your LiteLLM proxy URL
MODEL_NAME=gpt-4-turbo-preview  # or any LiteLLM-supported model
SYSTEM_PROMPT=your_default_system_prompt  # New!
- Keep using the OpenAI SDK (LiteLLM is OpenAI-compatible):
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv('LITELLM_API_KEY'),
    base_url=os.getenv('LITELLM_API_BASE')
)
Message Format Refactor
Current approach (embedding context in user message):
text_content = f"##CONTEXT##\n{context}\n##ENDCONTEXT##\n\n{user_message}"
messages = [{"role": "user", "content": text_content}]
New approach (proper conversation history):
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
# ... previous conversation messages with proper roles ...
{"role": "user", "content": user_message}
]
Benefits
- Better model understanding of conversation structure
- Separate system instructions from conversation
- Proper role attribution (user vs assistant)
- More efficient token usage
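To make this concrete, here is a minimal sketch of how the bot could assemble this structure from fetched Discord history (build_messages and its arguments are illustrative names, not existing code in v2/bot.py):
def build_messages(history, bot_user, user_message, system_prompt):
    """Convert discord.Message objects (oldest first) into role-attributed chat messages."""
    messages = [{"role": "system", "content": system_prompt}]
    for msg in history:
        if msg.author == bot_user:
            messages.append({"role": "assistant", "content": msg.content})
        else:
            messages.append({"role": "user", "content": f"{msg.author.display_name}: {msg.content}"})
    messages.append({"role": "user", "content": user_message})
    return messages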
Phase 2: Add System Prompt Management
Implementation Options
Option A: Simple Environment Variable
- Store in .env file
- Good for: Single, static system prompt
- Example: SYSTEM_PROMPT="You are a helpful Discord assistant..."
Option B: File-Based System Prompt
- Store in separate file (e.g., system_prompt.txt)
- Good for: Long, complex prompts that need version control
- Hot-reload capability
Option C: Per-Channel/Per-Guild Prompts
- Store in JSON/database mapping channel_id → system_prompt
- Good for: Multi-tenant bot with different personalities per server
- Example:
{
  "123456789": "You are a coding assistant...",
  "987654321": "You are a gaming buddy..."
}
Option D: User-Configurable Prompts
- Discord slash commands to set/view system prompt
- Store in SQLite/JSON
- Commands: /setprompt, /viewprompt, /resetprompt
Recommended: Start with Option B (file-based), add Option D later for flexibility.
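A minimal sketch of Option B, assuming a system_prompt.txt file alongside the bot and a simple mtime check for hot reload (names are illustrative, not existing code):
import os

PROMPT_PATH = "system_prompt.txt"
_prompt_cache = {"mtime": None, "text": ""}

def get_system_prompt() -> str:
    """Reload the prompt file only when it has changed on disk."""
    mtime = os.path.getmtime(PROMPT_PATH)
    if mtime != _prompt_cache["mtime"]:
        with open(PROMPT_PATH, "r", encoding="utf-8") as f:
            _prompt_cache["text"] = f.read().strip()
        _prompt_cache["mtime"] = mtime
    return _prompt_cache["text"]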
System Prompt Best Practices
- Define bot personality: Tone, style, formality
- Set boundaries: What bot should/shouldn't do
- Provide context: "You are in a Discord server, users will mention you"
- Handle images: "When users attach images, describe them..."
- Tool usage guidance: "Use available tools when appropriate"
Example system prompt:
You are a helpful AI assistant integrated into Discord. Users will interact with you by mentioning you or sending direct messages.
Key behaviors:
- Be concise and friendly
- Use Discord markdown formatting when helpful (code blocks, bold, etc.)
- When users attach images, analyze them and provide relevant insights
- You have access to various tools - use them when they would help answer the user's question
- If you're unsure about something, say so
- Keep track of conversation context
You are not a human, and you should not pretend to be one. Be honest about your capabilities and limitations.
Phase 3: Implement MCP Tool Support
LiteLLM MCP Integration
LiteLLM can connect to MCP servers in two ways:
1. Via LiteLLM Proxy Configuration
Configure in litellm_config.yaml:
model_list:
  - model_name: gpt-4-with-tools
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

mcp_servers:
  filesystem:
    command: npx
    args: [-y, @modelcontextprotocol/server-filesystem, /allowed/path]
  github:
    command: npx
    args: [-y, @modelcontextprotocol/server-github]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}
2. Via Direct Tool Definitions in Bot
Define tools manually in the bot code:
tools = [
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
}
},
"required": ["query"]
}
}
}
]
response = client.chat.completions.create(
model=MODEL_NAME,
messages=messages,
tools=tools,
tool_choice="auto"
)
Tool Execution Flow
- Send message with tools available:
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=available_tools
)
- Check if model wants to use a tool:
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        # Execute the function
        result = execute_tool(function_name, arguments)
- Send tool results back to model:
messages.append({
    "role": "assistant",
    "content": None,
    "tool_calls": response.choices[0].message.tool_calls
})
messages.append({
    "role": "tool",
    "content": json.dumps(result),
    "tool_call_id": tool_call.id
})

# Get final response
final_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=available_tools
)
- Return final response to user
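Putting the steps together, a minimal sketch of the whole loop (MODEL_NAME, execute_tool, and available_tools are assumed to be defined elsewhere in the bot; this is an illustration, not the final implementation):
import json

async def run_with_tools(client, messages, available_tools):
    # First pass: let the model decide whether to call a tool
    response = client.chat.completions.create(
        model=MODEL_NAME, messages=messages, tools=available_tools, tool_choice="auto"
    )
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content
    # Record the assistant's tool request, execute each call, then ask for a final answer
    messages.append({"role": "assistant", "content": None, "tool_calls": message.tool_calls})
    for tool_call in message.tool_calls:
        result = await execute_tool(tool_call.function.name, json.loads(tool_call.function.arguments))
        messages.append({"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id})
    final = client.chat.completions.create(model=MODEL_NAME, messages=messages, tools=available_tools)
    return final.choices[0].message.content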
Tool Implementation Patterns
Pattern 1: Bot-Managed Tools
Implement tools directly in the bot:
async def search_web(query: str) -> str:
"""Execute web search"""
# Use requests/aiohttp to call search API
pass
async def get_weather(location: str) -> str:
"""Get weather for location"""
# Call weather API
pass
AVAILABLE_TOOLS = {
"search_web": search_web,
"get_weather": get_weather,
}
async def execute_tool(name: str, arguments: dict) -> str:
if name in AVAILABLE_TOOLS:
return await AVAILABLE_TOOLS[name](**arguments)
return "Tool not found"
Pattern 2: MCP Server Proxy
Let LiteLLM proxy handle MCP servers (recommended):
- Configure MCP servers in LiteLLM config
- LiteLLM automatically exposes them as tools
- Bot just passes tool calls through
- Simpler bot code, more scalable
Pattern 3: Hybrid
- Common tools via LiteLLM proxy MCP
- Discord-specific tools in bot (e.g., "get_server_info", "list_channels")
Recommended Starter Tools
- Web Search (via Brave/Google MCP server)
  - Let bot search for current information
- File Operations (via filesystem MCP server - with restrictions!)
  - Read documentation, configs
  - Useful in developer-focused servers
- Wikipedia (via wikipedia MCP server)
  - Factual information lookup
- Time/Date (custom function)
  - Simple, no external dependency
- Discord Server Info (custom function)
  - Get channel list, member count, server info
  - Discord-specific utility
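For the two custom functions, a hedged sketch of possible implementations (guild would be the discord.Guild resolved from the triggering message; names are illustrative):
from datetime import datetime, timezone

async def get_time() -> str:
    """Return the current UTC date and time."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

async def get_server_info(guild) -> str:
    """Summarize basic info about the server the message came from."""
    channels = ", ".join(c.name for c in guild.text_channels[:20])
    return f"Server: {guild.name} | Members: {guild.member_count} | Text channels: {channels}"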
Phase 4: Improve Message History Management
Current Issues
- Fetches all messages every time (inefficient)
- No conversation threading (treats all channel messages as one context)
- No token limit awareness
- Channel history might contain irrelevant conversations
Improvements
1. Per-Conversation Threading
# Track conversations by thread or by user
conversation_storage = {
"channel_id:user_id": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
]
}
2. Token-Aware History Truncation
def trim_history(messages, max_tokens=4000):
"""Keep only recent messages that fit in token budget"""
# Use tiktoken to count tokens
# Remove oldest messages until under limit
pass
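One possible implementation of the stub above using tiktoken (cl100k_base is an assumption about the target model's tokenizer):
import tiktoken

def trim_history(messages, max_tokens=4000):
    """Keep only the most recent messages that fit in the token budget."""
    encoding = tiktoken.get_encoding("cl100k_base")
    trimmed, total = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        tokens = len(encoding.encode(message.get("content") or ""))
        if total + tokens > max_tokens:
            break
        total += tokens
        trimmed.insert(0, message)  # re-insert in chronological order
    return trimmed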
3. Message Deduplication
Only include messages directly related to bot conversations:
- Messages mentioning bot
- Bot's responses
- Optionally: X messages before each bot mention for context
4. Caching & Persistence
- Cache conversation history in memory
- Optional: Persist to SQLite/Redis for bot restarts
- Clear old conversations after inactivity
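A small sketch of the in-memory variant with inactivity expiry (the one-hour TTL is an arbitrary choice for illustration):
import time

CONVERSATION_TTL = 3600  # seconds of inactivity before a conversation is dropped
conversations = {}  # "channel_id:user_id" -> {"messages": [...], "last_active": ts}

def touch_conversation(key):
    convo = conversations.setdefault(key, {"messages": [], "last_active": time.time()})
    convo["last_active"] = time.time()
    return convo

def expire_conversations():
    now = time.time()
    stale = [k for k, v in conversations.items() if now - v["last_active"] > CONVERSATION_TTL]
    for key in stale:
        del conversations[key]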
Implementation Checklist
Preparation
- Set up LiteLLM proxy locally or remotely
- Configure LiteLLM with desired model(s)
- Decide on MCP servers to enable
- Design system prompt strategy
- Review token limits for target models
Code Changes
File: v2/bot.py
- Update imports (add json, improve aiohttp usage)
- Change environment variables: OPENWEBUI_API_BASE → LITELLM_API_BASE
- Add SYSTEM_PROMPT or SYSTEM_PROMPT_FILE
- Update OpenAI client initialization
- Refactor get_ai_response():
  - Add system message
  - Convert history to proper message format (alternating user/assistant)
  - Add tool support parameters
  - Implement tool execution loop
- Refactor get_chat_history():
  - Return structured messages instead of text concatenation
  - Filter for bot-relevant messages
  - Add token counting/truncation
- Fix download_image() to use aiohttp instead of requests
- Add tool definition functions
- Add tool execution handler
- Add error handling for tool failures
New File: v2/tools.py (optional)
- Define tool schemas
- Implement tool execution functions
- Export tool registry
New File: v2/system_prompt.txt or system_prompts.json
- Write default system prompt
- Optional: Add per-guild prompts
File: v2/requirements.txt
- Keep: discord.py, openai, python-dotenv
- Add: aiohttp (if not using requests), tiktoken (for token counting)
- Optional: anthropic (if using Claude directly), litellm (if using SDK directly)
File: v2/.env.example
- Update variable names
- Add system prompt variables
- Document new configuration options
Testing
- Test basic message responses (no tools)
- Test with images attached
- Test tool calling with simple tool (e.g., get_time)
- Test tool calling with external MCP server
- Test conversation threading
- Test token limit handling
- Test error scenarios (API down, tool failure, etc.)
- Test in multiple Discord servers/channels
Documentation
- Update README.md with new setup instructions
- Document LiteLLM proxy setup
- Document MCP server configuration
- Add example system prompts
- Document available tools
- Add troubleshooting section
Technical Considerations
Token Management
- Most models have 4k-128k token context windows
- Message history can quickly consume tokens
- Reserve tokens for:
- System prompt: ~500-1000 tokens
- Tool definitions: ~100-500 tokens per tool
- Response: ~1000-2000 tokens
- History: remaining tokens
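As a rough worked example, on an 8k-context model: 1,000 tokens for the system prompt, five tools at ~300 tokens each (~1,500), and 2,000 reserved for the response leaves roughly 3,500 tokens for history.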
Rate Limiting
- Discord: 5 requests per 5 seconds per channel
- LLM APIs: Varies by provider (OpenAI: ~3500 RPM for GPT-4)
- Implement queuing if needed
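One simple way to serialize work per channel (a sketch; the one-second spacing is an arbitrary cushion, not Discord's exact limit):
import asyncio
from collections import defaultdict

# One lock per channel so replies to the same channel are sent sequentially
channel_locks = defaultdict(asyncio.Lock)

async def send_rate_limited(channel, text):
    async with channel_locks[channel.id]:
        await channel.send(text)
        await asyncio.sleep(1)  # crude spacing to stay under the per-channel limit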
Error Handling
- API timeouts: Retry with exponential backoff
- Tool execution failures: Return error message to model
- Discord API errors: Log and notify user
- Invalid tool calls: Validate before execution
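For the retry case, a hedged sketch of exponential backoff around a synchronous client call (call_with_retry is an illustrative helper, not part of the current code):
import asyncio
import random

async def call_with_retry(func, *args, retries=3, base_delay=1.0, **kwargs):
    """Retry a flaky synchronous call with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return await asyncio.to_thread(func, *args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt) + random.random())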
Security Considerations
- Tool access control: Don't expose dangerous tools (file delete, system commands)
- Input validation: Sanitize tool arguments
- Rate limiting: Prevent abuse of expensive tools (web search)
- API key security: Never log or expose API keys
- MCP filesystem access: Restrict to safe directories only
Cost Optimization
- Use smaller models for simple queries (gpt-3.5-turbo)
- Implement streaming for better UX
- Cache common queries
- Trim history aggressively
- Consider LiteLLM's caching features
Future Enhancements
Short Term
- Add slash commands for bot configuration
- Implement conversation reset command
- Add support for Discord threads
- Stream responses for long outputs
- Add reaction-based tool approval (user confirms before execution)
Medium Term
- Multi-modal support (voice, more image formats)
- Per-user conversation isolation
- Tool usage analytics and logging
- Custom MCP server for Discord-specific tools
- Web dashboard for bot management
Long Term
- Agentic workflows (multi-step tool usage)
- Memory/RAG for long-term context
- Multiple bot personalities per server
- Integration with Discord's scheduled events
- Voice channel integration (TTS/STT)
Resources
Documentation
- LiteLLM Docs: https://docs.litellm.ai/
- LiteLLM Tools/Functions: https://docs.litellm.ai/docs/completion/function_call
- MCP Specification: https://modelcontextprotocol.io/
- MCP Server Examples: https://github.com/modelcontextprotocol/servers
- Discord.py Docs: https://discordpy.readthedocs.io/
- OpenAI API Docs: https://platform.openai.com/docs/guides/function-calling
Example MCP Servers
- @modelcontextprotocol/server-filesystem: File operations
- @modelcontextprotocol/server-github: GitHub integration
- @modelcontextprotocol/server-postgres: Database queries
- @modelcontextprotocol/server-brave-search: Web search
- @modelcontextprotocol/server-slack: Slack integration
- @modelcontextprotocol/server-memory: Persistent memory
Tools for Development
- tiktoken: Token counting (OpenAI tokenizer)
- litellm CLI: litellm --model gpt-4 --drop_params for testing
- Postman: Test LiteLLM API endpoints
- Docker: Containerize LiteLLM proxy
Questions to Resolve
- Which LiteLLM deployment?
  - Self-hosted proxy (more control, more maintenance)
  - Hosted service (easier, potential cost)
- Which models to support?
  - Single model (simpler)
  - Multiple models with fallback (more robust)
  - User-selectable models (more flexible)
- MCP server hosting?
  - Same machine as bot
  - Separate server
  - Cloud functions
- System prompt strategy?
  - Single global prompt
  - Per-guild prompts
  - User-configurable
- Tool approval flow?
  - Automatic execution (faster but riskier)
  - User confirmation for sensitive tools (safer but slower)
- Conversation persistence?
  - In-memory only (simple, lost on restart)
  - SQLite (persistent, moderate complexity)
  - Redis (distributed, more setup)
Current Code Analysis
v2/bot.py Strengths
- Clean, simple structure
- Proper async/await usage
- Good image handling
- Type hints in newer version
v2/bot.py Issues to Fix
- Line 44: Using synchronous requests.get() in async function
- Lines 62-77: Embedding history in user message instead of proper conversation format
- Line 41: channel_history dict declared but never used
- No error handling for OpenAI API errors besides generic try/catch
- No rate limiting
- No conversation threading
- History includes ALL channel messages, not just bot-relevant ones
- No system prompt support
scripts/discordbot.py Differences
- Has system message (line 67) - better approach!
- Slightly different message structure
- Otherwise similar implementation
Recommended Migration Path
Step 1: Quick wins (minimal changes)
- Add system prompt support using scripts/discordbot.py pattern
- Fix async image download (use aiohttp; see the sketch after this list)
- Update env vars and client to point to LiteLLM
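For the image download fix, a possible aiohttp-based replacement for the synchronous version (the function name mirrors the existing one; returning a base64 string matches the behavior described above, but verify against v2/bot.py):
import base64
import aiohttp

async def download_image(url: str) -> str:
    """Fetch an image attachment and return it base64-encoded, without blocking the event loop."""
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            data = await resp.read()
    return base64.b64encode(data).decode("utf-8")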
Step 2: Core refactor (moderate changes)
- Refactor message history to proper conversation format
- Implement token-aware history truncation
- Add basic tool support infrastructure
Step 3: Tool integration (significant changes)
- Define initial tool set
- Implement tool execution loop
- Add error handling for tool failures
Step 4: Polish (incremental improvements)
- Add slash commands for configuration
- Improve conversation management
- Add monitoring and logging
This approach allows you to test at each step and provides incremental value.
Getting Started
When you're ready to begin implementation:
- Set up LiteLLM proxy:
pip install litellm
litellm --model gpt-4 --drop_params
# Or use Docker:
docker run -p 4000:4000 ghcr.io/berriai/litellm:main
- Test the LiteLLM endpoint:
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
- Start with the system prompt: Implement system prompt support first as a low-risk improvement
- Iterate on tools: Start with one simple tool, then expand
Let me know which phase you'd like to tackle first!