Add MCP tools integration for Discord bot
Major improvements to LiteLLM Discord bot with MCP (Model Context Protocol) tools support:

Features added:
- MCP tools discovery and integration with LiteLLM proxy
- Fetch and convert 40+ GitHub MCP tools to OpenAI format
- Tool calling flow with placeholder execution (pending MCP endpoint confirmation)
- Dynamic tool injection based on LiteLLM MCP server configuration
- Enhanced system prompt with tool usage guidance
- Added ENABLE_TOOLS environment variable for easy toggle
- Comprehensive debug logging for troubleshooting

Technical changes:
- Added httpx>=0.25.0 dependency for async MCP API calls
- Implemented get_available_mcp_tools() to query /v1/mcp/server and /v1/mcp/tools endpoints
- Convert MCP tool schemas to OpenAI function calling format
- Detect and handle tool_calls in model responses
- Added system_prompt.txt for customizable bot behavior
- Updated README with better documentation and setup instructions
- Created claude.md with detailed development notes and upgrade roadmap

Configuration:
- New ENABLE_TOOLS flag in .env to control MCP integration
- DEBUG_LOGGING for detailed execution logs
- System prompt file support for easy customization

Known limitations:
- Tool execution currently uses placeholders (MCP execution endpoint needs verification)
- Limited to 50 tools to avoid overwhelming the model
- Requires LiteLLM proxy with MCP server configured

Next steps:
- Verify correct LiteLLM MCP tool execution endpoint
- Implement actual tool execution via MCP proxy
- Test end-to-end GitHub operations through Discord


OpenWebUI Discord Bot - Upgrade Project

Project Overview

This Discord bot currently interfaces with OpenWebUI to provide AI-powered responses. The goal is to upgrade it to:

  1. Switch from OpenWebUI to LiteLLM Proxy as the backend
  2. Add MCP (Model Context Protocol) Tool Support
  3. Implement system prompt management within the application

Current Architecture

Files Structure

Current Implementation Details

Bot Features (v2/bot.py)

  • Discord Integration: Uses discord.py with message intents
  • Trigger Methods:
    • Bot mentions (@bot)
    • Direct messages (DMs)
  • Message History: Retrieves last 100 messages for context using get_chat_history()
  • Image Support: Downloads and encodes images as base64, sends to API
  • API Client: Uses OpenAI Python SDK pointing to OpenWebUI endpoint
  • Message Format: Embeds chat history in user message context

Current Message Flow

  1. User mentions bot or DMs it
  2. Bot fetches channel history (last 100 messages)
  3. Formats history as: "AuthorName: message content"
  4. Sends to OpenWebUI with format:
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "##CONTEXT##\n{history}\n##ENDCONTEXT##\n\n{user_message}"},
        {"type": "image_url", "image_url": {...}}  # if images present
      ]
    }
    
  5. Returns AI response and replies to user

Current Limitations

  • No system prompt: Context is embedded in user messages
  • No tool calling: Cannot execute functions or use MCPs
  • OpenWebUI dependency: Tightly coupled to OpenWebUI API structure
  • Simple history: Just text concatenation, no proper conversation threading
  • Synchronous image download: Uses requests.get() in async context (should use aiohttp)

Target Architecture: LiteLLM + MCP Tools

Why LiteLLM?

LiteLLM is a unified proxy that:

  • Standardizes API calls across 100+ LLM providers (OpenAI, Anthropic, Google, etc.)
  • Supports native tool/function calling via an OpenAI-compatible API
  • Offers built-in MCP support for Model Context Protocol tools
  • Provides load balancing and fallback between models
  • Tracks cost and usage analytics
  • Streams responses for real-time output

LiteLLM Tool Calling

LiteLLM supports the OpenAI tools format:

response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {...}
        }
    }],
    tool_choice="auto"
)

MCP (Model Context Protocol) Overview

MCP is a standard protocol for:

  • Exposing tools to LLMs (functions they can call)
  • Providing resources (files, APIs, databases)
  • Supplying prompts/templates for consistent interactions
  • Supporting sampling for multi-step agentic behavior

MCP Server Examples:

  • filesystem: Read/write files
  • github: Access repos, create PRs
  • postgres: Query databases
  • brave-search: Web search
  • slack: Send messages, read channels

Upgrade Plan

Phase 1: Switch to LiteLLM Proxy

Configuration Changes

  1. Update environment variables:

    DISCORD_TOKEN=your_discord_bot_token
    LITELLM_API_KEY=your_litellm_api_key
    LITELLM_API_BASE=http://localhost:4000  # or your LiteLLM proxy URL
    MODEL_NAME=gpt-4-turbo-preview  # or any LiteLLM-supported model
    SYSTEM_PROMPT=your_default_system_prompt  # New!
    
  2. Keep using OpenAI SDK (LiteLLM is OpenAI-compatible):

    from openai import OpenAI
    
    client = OpenAI(
        api_key=os.getenv('LITELLM_API_KEY'),
        base_url=os.getenv('LITELLM_API_BASE')
    )
    

Message Format Refactor

Current approach (embedding context in user message):

text_content = f"##CONTEXT##\n{context}\n##ENDCONTEXT##\n\n{user_message}"
messages = [{"role": "user", "content": text_content}]

New approach (proper conversation history):

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # ... previous conversation messages with proper roles ...
    {"role": "user", "content": user_message}
]

Benefits

  • Better model understanding of conversation structure
  • Separate system instructions from conversation
  • Proper role attribution (user vs assistant)
  • More efficient token usage
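
To make the refactor concrete, here is a minimal sketch of converting fetched Discord history into role-attributed messages. The helper name history_to_messages is illustrative; it assumes discord.py Message objects ordered oldest to newest and access to the bot's own user object.

def history_to_messages(history, bot_user, system_prompt):
    """Convert Discord messages into a role-attributed chat transcript."""
    messages = [{"role": "system", "content": system_prompt}]
    for msg in history:  # oldest -> newest
        if msg.author == bot_user:
            messages.append({"role": "assistant", "content": msg.clean_content})
        else:
            # Keep the author's name so the model can distinguish speakers
            messages.append({"role": "user", "content": f"{msg.author.display_name}: {msg.clean_content}"})
    return messages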

Phase 2: Add System Prompt Management

Implementation Options

Option A: Simple Environment Variable

  • Store in .env file
  • Good for: Single, static system prompt
  • Example: SYSTEM_PROMPT="You are a helpful Discord assistant..."

Option B: File-Based System Prompt

  • Store in separate file (e.g., system_prompt.txt)
  • Good for: Long, complex prompts that need version control
  • Hot-reload capability

Option C: Per-Channel/Per-Guild Prompts

  • Store in JSON/database mapping channel_id → system_prompt
  • Good for: Multi-tenant bot with different personalities per server
  • Example:
    {
      "123456789": "You are a coding assistant...",
      "987654321": "You are a gaming buddy..."
    }
    

Option D: User-Configurable Prompts

  • Discord slash commands to set/view system prompt
  • Store in SQLite/JSON
  • Commands: /setprompt, /viewprompt, /resetprompt
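
For Option D, discord.py's application commands cover this with little code. The sketch below uses an in-memory store; names like guild_prompts are illustrative, and persistence (SQLite/JSON) would replace the dict.

import discord
from discord.ext import commands

bot = commands.Bot(command_prefix="!", intents=discord.Intents.default())
guild_prompts: dict[int, str] = {}  # in-memory; back with SQLite/JSON for persistence

@bot.tree.command(name="setprompt", description="Set the system prompt for this server")
async def setprompt(interaction: discord.Interaction, prompt: str):
    guild_prompts[interaction.guild_id] = prompt
    await interaction.response.send_message("System prompt updated.", ephemeral=True)

# Remember to sync application commands once, e.g. await bot.tree.sync() in setup_hook()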

Recommended: Start with Option B (file-based), add Option D later for flexibility.
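
For Option B, a minimal loader might look like the sketch below. The file name system_prompt.txt matches the file mentioned in the change notes; the fallback text is an assumption. Re-reading the file on each request gives a simple form of hot reload.

from pathlib import Path

SYSTEM_PROMPT_FILE = Path(__file__).parent / "system_prompt.txt"  # assumed location
DEFAULT_PROMPT = "You are a helpful Discord assistant."

def load_system_prompt() -> str:
    """Re-read the prompt file on every call so edits apply without a restart."""
    try:
        text = SYSTEM_PROMPT_FILE.read_text(encoding="utf-8").strip()
        return text or DEFAULT_PROMPT
    except FileNotFoundError:
        return DEFAULT_PROMPT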

System Prompt Best Practices

  1. Define bot personality: Tone, style, formality
  2. Set boundaries: What bot should/shouldn't do
  3. Provide context: "You are in a Discord server, users will mention you"
  4. Handle images: "When users attach images, describe them..."
  5. Tool usage guidance: "Use available tools when appropriate"

Example system prompt:

You are a helpful AI assistant integrated into Discord. Users will interact with you by mentioning you or sending direct messages.

Key behaviors:
- Be concise and friendly
- Use Discord markdown formatting when helpful (code blocks, bold, etc.)
- When users attach images, analyze them and provide relevant insights
- You have access to various tools - use them when they would help answer the user's question
- If you're unsure about something, say so
- Keep track of conversation context

You are not a human, and you should not pretend to be one. Be honest about your capabilities and limitations.

Phase 3: Implement MCP Tool Support

LiteLLM MCP Integration

LiteLLM can connect to MCP servers in two ways:

1. Via LiteLLM Proxy Configuration. Configure MCP servers in litellm_config.yaml:

model_list:
  - model_name: gpt-4-with-tools
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

mcp_servers:
  filesystem:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

2. Via Direct Tool Definitions in the Bot. Define tools manually in the bot code:

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

Tool Execution Flow

  1. Send message with tools available:

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        tools=available_tools
    )
    
  2. Check if model wants to use a tool:

    if response.choices[0].message.tool_calls:
        for tool_call in response.choices[0].message.tool_calls:
            function_name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)
            # Execute the function
            result = execute_tool(function_name, arguments)
    
  3. Send tool results back to model:

    # Echo the assistant's tool request back into the conversation history
    messages.append({
        "role": "assistant",
        "content": None,
        "tool_calls": response.choices[0].message.tool_calls
    })
    # Append one "tool" message per tool_call if the model requested several
    messages.append({
        "role": "tool",
        "content": json.dumps(result),
        "tool_call_id": tool_call.id
    })
    
    # Get final response
    final_response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        tools=available_tools
    )
    
  4. Return final response to user

Tool Implementation Patterns

Pattern 1: Bot-Managed Tools. Implement tools directly in the bot:

async def search_web(query: str) -> str:
    """Execute web search"""
    # Use requests/aiohttp to call search API
    pass

async def get_weather(location: str) -> str:
    """Get weather for location"""
    # Call weather API
    pass

AVAILABLE_TOOLS = {
    "search_web": search_web,
    "get_weather": get_weather,
}

async def execute_tool(name: str, arguments: dict) -> str:
    if name in AVAILABLE_TOOLS:
        return await AVAILABLE_TOOLS[name](**arguments)
    return "Tool not found"

Pattern 2: MCP Server Proxy. Let the LiteLLM proxy handle MCP servers (recommended):

  • Configure MCP servers in LiteLLM config
  • LiteLLM automatically exposes them as tools
  • Bot just passes tool calls through
  • Simpler bot code, more scalable
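
The change notes at the top of this file describe discovering these proxy-exposed tools by querying the LiteLLM endpoints /v1/mcp/server and /v1/mcp/tools, with the execution endpoint still pending verification. A hedged sketch of that discovery step, assuming /v1/mcp/tools returns a JSON list of MCP tool definitions:

import os
import httpx

LITELLM_API_BASE = os.getenv("LITELLM_API_BASE", "http://localhost:4000")
LITELLM_API_KEY = os.getenv("LITELLM_API_KEY", "")

async def get_available_mcp_tools(limit: int = 50) -> list:
    """Fetch MCP tool definitions from the LiteLLM proxy and convert them to OpenAI format.
    The /v1/mcp/tools endpoint and its response shape are assumptions pending verification."""
    headers = {"Authorization": f"Bearer {LITELLM_API_KEY}"}
    async with httpx.AsyncClient(base_url=LITELLM_API_BASE, headers=headers) as client:
        resp = await client.get("/v1/mcp/tools")
        resp.raise_for_status()
        mcp_tools = resp.json().get("tools", [])  # assumed response shape
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object", "properties": {}}),
            },
        }
        for t in mcp_tools[:limit]  # cap tool count, per the 50-tool limit in the change notes
    ]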

Pattern 3: Hybrid

  • Common tools via LiteLLM proxy MCP
  • Discord-specific tools in bot (e.g., "get_server_info", "list_channels")

Recommended Initial Tools

  1. Web Search (via Brave/Google MCP server)

    • Let bot search for current information
  2. File Operations (via filesystem MCP server - with restrictions!)

    • Read documentation, configs
    • Useful in developer-focused servers
  3. Wikipedia (via wikipedia MCP server)

    • Factual information lookup
  4. Time/Date (custom function)

    • Simple, no external dependency (see the sketch after this list)
  5. Discord Server Info (custom function)

    • Get channel list, member count, server info
    • Discord-specific utility
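
As an example of the simplest case, the Time/Date tool from item 4 needs only a schema and a matching function. The name get_current_time is illustrative, not part of the current code.

from datetime import datetime, timezone

TIME_TOOL_SCHEMA = {
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current date and time in UTC",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}

async def get_current_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()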

Phase 4: Improve Message History Management

Current Issues

  • Fetches all messages every time (inefficient)
  • No conversation threading (treats all channel messages as one context)
  • No token limit awareness
  • Channel history might contain irrelevant conversations

Improvements

1. Per-Conversation Threading

# Track conversations by thread or by user
conversation_storage = {
    "channel_id:user_id": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}

2. Token-Aware History Truncation

def trim_history(messages, max_tokens=4000):
    """Keep only the most recent messages that fit in the token budget (requires tiktoken)."""
    enc = tiktoken.get_encoding("cl100k_base")
    kept, used = [], 0
    for msg in reversed(messages):  # newest first; stop once over budget
        used += len(enc.encode(str(msg.get("content") or "")))
        if used > max_tokens:
            break
        kept.append(msg)
    return list(reversed(kept))

3. Message Deduplication. Only include messages directly related to the bot's conversations:

  • Messages mentioning bot
  • Bot's responses
  • Optionally: X messages before each bot mention for context
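
A minimal filter along these lines, assuming discord.py Message objects and the bot's user available as bot.user:

def is_bot_relevant(message, bot_user) -> bool:
    """Keep only the bot's own messages and messages that mention the bot."""
    return message.author == bot_user or bot_user in message.mentions

# Inside an async handler:
# relevant = [m async for m in channel.history(limit=100) if is_bot_relevant(m, bot.user)]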

4. Caching & Persistence

  • Cache conversation history in memory
  • Optional: Persist to SQLite/Redis for bot restarts
  • Clear old conversations after inactivity
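
An in-memory version with inactivity-based eviction can be as small as the sketch below; the one-hour TTL is an arbitrary choice.

import time

CONVERSATION_TTL = 3600  # seconds of inactivity before a conversation is dropped (arbitrary)
_conversations: dict[str, dict] = {}

def get_conversation(channel_id: int, user_id: int) -> list:
    """Return (and refresh) the cached message list for this channel/user pair."""
    now = time.time()
    # Evict conversations that have been idle past the TTL
    for key in [k for k, v in _conversations.items() if now - v["last_used"] > CONVERSATION_TTL]:
        del _conversations[key]
    entry = _conversations.setdefault(f"{channel_id}:{user_id}", {"messages": [], "last_used": now})
    entry["last_used"] = now
    return entry["messages"]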

Implementation Checklist

Preparation

  • Set up LiteLLM proxy locally or remotely
  • Configure LiteLLM with desired model(s)
  • Decide on MCP servers to enable
  • Design system prompt strategy
  • Review token limits for target models

Code Changes

File: v2/bot.py

  • Update imports (add json, improve aiohttp usage)
  • Change environment variables:
    • OPENWEBUI_API_BASE → LITELLM_API_BASE
    • Add SYSTEM_PROMPT or SYSTEM_PROMPT_FILE
  • Update OpenAI client initialization
  • Refactor get_ai_response():
    • Add system message
    • Convert history to proper message format (alternating user/assistant)
    • Add tool support parameters
    • Implement tool execution loop
  • Refactor get_chat_history():
    • Return structured messages instead of text concatenation
    • Filter for bot-relevant messages
    • Add token counting/truncation
  • Fix download_image() to use aiohttp instead of requests (see the sketch after this list)
  • Add tool definition functions
  • Add tool execution handler
  • Add error handling for tool failures
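
For the download_image() fix referenced above, a non-blocking version using aiohttp might look like this sketch; the function name mirrors the existing helper, and the per-call session is an assumption.

import base64
import aiohttp

async def download_image(url: str) -> str:
    """Download an image and return it as a base64 string without blocking the event loop."""
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            data = await resp.read()
    return base64.b64encode(data).decode("utf-8")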

New File: v2/tools.py (optional)

  • Define tool schemas
  • Implement tool execution functions
  • Export tool registry

New File: v2/system_prompt.txt or system_prompts.json

  • Write default system prompt
  • Optional: Add per-guild prompts

File: v2/requirements.txt

  • Keep: discord.py, openai, python-dotenv
  • Add: aiohttp (if not using requests), tiktoken (for token counting)
  • Optional: anthropic (if using Claude directly), litellm (if using SDK directly)

File: v2/.env.example

  • Update variable names
  • Add system prompt variables
  • Document new configuration options

Testing

  • Test basic message responses (no tools)
  • Test with images attached
  • Test tool calling with simple tool (e.g., get_time)
  • Test tool calling with external MCP server
  • Test conversation threading
  • Test token limit handling
  • Test error scenarios (API down, tool failure, etc.)
  • Test in multiple Discord servers/channels

Documentation

  • Update README.md with new setup instructions
  • Document LiteLLM proxy setup
  • Document MCP server configuration
  • Add example system prompts
  • Document available tools
  • Add troubleshooting section

Technical Considerations

Token Management

  • Most models have 4k-128k token context windows
  • Message history can quickly consume tokens
  • Reserve tokens for:
    • System prompt: ~500-1000 tokens
    • Tool definitions: ~100-500 tokens per tool
    • Response: ~1000-2000 tokens
    • History: remaining tokens

Rate Limiting

  • Discord: 5 requests per 5 seconds per channel
  • LLM APIs: Varies by provider (OpenAI: ~3500 RPM for GPT-4)
  • Implement queuing if needed
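
One simple queuing approach is a per-channel lock with light pacing, sketched below under the assumption that all replies go through a single helper:

import asyncio
from collections import defaultdict

# One lock per channel so replies are sent one at a time and paced
_channel_locks: dict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

async def send_paced_reply(message, content: str) -> None:
    """Serialize sends per channel and pace them to stay under Discord's limits."""
    async with _channel_locks[message.channel.id]:
        await message.reply(content)
        await asyncio.sleep(1)  # crude pacing: at most ~5 sends per 5 seconds per channel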

Error Handling

  • API timeouts: Retry with exponential backoff
  • Tool execution failures: Return error message to model
  • Discord API errors: Log and notify user
  • Invalid tool calls: Validate before execution
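
For the timeout case, a small retry helper with exponential backoff and jitter is usually enough; this is a sketch, with retries and delays to be tuned.

import asyncio
import random

async def with_backoff(coro_fn, *args, retries: int = 3, base_delay: float = 1.0, **kwargs):
    """Call an async function, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(retries):
        try:
            return await coro_fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt) + random.random())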

Security Considerations

  • Tool access control: Don't expose dangerous tools (file delete, system commands)
  • Input validation: Sanitize tool arguments
  • Rate limiting: Prevent abuse of expensive tools (web search)
  • API key security: Never log or expose API keys
  • MCP filesystem access: Restrict to safe directories only

Cost Optimization

  • Use smaller models for simple queries (gpt-3.5-turbo)
  • Implement streaming for better UX
  • Cache common queries
  • Trim history aggressively
  • Consider LiteLLM's caching features

Future Enhancements

Short Term

  • Add slash commands for bot configuration
  • Implement conversation reset command
  • Add support for Discord threads
  • Stream responses for long outputs
  • Add reaction-based tool approval (user confirms before execution)

Medium Term

  • Multi-modal support (voice, more image formats)
  • Per-user conversation isolation
  • Tool usage analytics and logging
  • Custom MCP server for Discord-specific tools
  • Web dashboard for bot management

Long Term

  • Agentic workflows (multi-step tool usage)
  • Memory/RAG for long-term context
  • Multiple bot personalities per server
  • Integration with Discord's scheduled events
  • Voice channel integration (TTS/STT)

Resources

Documentation

  • LiteLLM: https://docs.litellm.ai
  • Model Context Protocol: https://modelcontextprotocol.io
  • discord.py: https://discordpy.readthedocs.io
  • OpenAI function calling: https://platform.openai.com/docs/guides/function-calling

Example MCP Servers

  • @modelcontextprotocol/server-filesystem: File operations
  • @modelcontextprotocol/server-github: GitHub integration
  • @modelcontextprotocol/server-postgres: Database queries
  • @modelcontextprotocol/server-brave-search: Web search
  • @modelcontextprotocol/server-slack: Slack integration
  • @modelcontextprotocol/server-memory: Persistent memory

Tools for Development

  • tiktoken: Token counting (OpenAI tokenizer)
  • litellm CLI: litellm --model gpt-4 --drop_params for testing
  • Postman: Test LiteLLM API endpoints
  • Docker: Containerize LiteLLM proxy

Questions to Resolve

  1. Which LiteLLM deployment?

    • Self-hosted proxy (more control, more maintenance)
    • Hosted service (easier, potential cost)
  2. Which models to support?

    • Single model (simpler)
    • Multiple models with fallback (more robust)
    • User-selectable models (more flexible)
  3. MCP server hosting?

    • Same machine as bot
    • Separate server
    • Cloud functions
  4. System prompt strategy?

    • Single global prompt
    • Per-guild prompts
    • User-configurable
  5. Tool approval flow?

    • Automatic execution (faster but riskier)
    • User confirmation for sensitive tools (safer but slower)
  6. Conversation persistence?

    • In-memory only (simple, lost on restart)
    • SQLite (persistent, moderate complexity)
    • Redis (distributed, more setup)

Current Code Analysis

v2/bot.py Strengths

  • Clean, simple structure
  • Proper async/await usage
  • Good image handling
  • Type hints in newer version

v2/bot.py Issues to Fix

  • Line 44: Using synchronous requests.get() in async function
  • Lines 62-77: Embedding history in user message instead of proper conversation format
  • Line 41: channel_history dict declared but never used
  • No error handling for OpenAI API errors besides generic try/catch
  • No rate limiting
  • No conversation threading
  • History includes ALL channel messages, not just bot-relevant ones
  • No system prompt support

scripts/discordbot.py Differences

  • Has system message (line 67) - better approach!
  • Slightly different message structure
  • Otherwise similar implementation

Recommended Upgrade Path

Step 1: Quick wins (minimal changes)

  1. Add system prompt support using scripts/discordbot.py pattern
  2. Fix async image download (use aiohttp)
  3. Update env vars and client to point to LiteLLM

Step 2: Core refactor (moderate changes)

  1. Refactor message history to proper conversation format
  2. Implement token-aware history truncation
  3. Add basic tool support infrastructure

Step 3: Tool integration (significant changes)

  1. Define initial tool set
  2. Implement tool execution loop
  3. Add error handling for tool failures

Step 4: Polish (incremental improvements)

  1. Add slash commands for configuration
  2. Improve conversation management
  3. Add monitoring and logging

This approach allows you to test at each step and provides incremental value.


Getting Started

When you're ready to begin implementation:

  1. Set up LiteLLM proxy:

    pip install litellm
    litellm --model gpt-4 --drop_params
    # Or use Docker: docker run -p 4000:4000 ghcr.io/berriai/litellm:main
    
  2. Test LiteLLM endpoint:

    curl -X POST http://localhost:4000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
    
  3. Start with the system prompt: Implement system prompt support first as a low-risk improvement

  4. Iterate on tools: Start with one simple tool, then expand

Let me know which phase you'd like to tackle first!