# OpenWebUI Discord Bot - Upgrade Project
## Project Overview
This Discord bot currently interfaces with OpenWebUI to provide AI-powered responses. The goal is to upgrade it to:
1. **Switch from OpenWebUI to LiteLLM Proxy** as the backend
2. **Add MCP (Model Context Protocol) Tool Support**
3. **Implement system prompt management within the application**
## Current Architecture
### Files Structure
- **Main bot**: [v2/bot.py](v2/bot.py) - Current implementation
- **Legacy bot**: [scripts/discordbot.py](scripts/discordbot.py) - Older version with slightly different approach
- **Dependencies**: [v2/requirements.txt](v2/requirements.txt)
- **Config**: [v2/.env.example](v2/.env.example)
### Current Implementation Details
#### Bot Features (v2/bot.py)
- **Discord Integration**: Uses discord.py with message intents
- **Trigger Methods**:
- Bot mentions (@bot)
- Direct messages (DMs)
- **Message History**: Retrieves last 100 messages for context using `get_chat_history()`
- **Image Support**: Downloads and encodes images as base64, sends to API
- **API Client**: Uses OpenAI Python SDK pointing to OpenWebUI endpoint
- **Message Format**: Embeds chat history in user message context
#### Current Message Flow
1. User mentions bot or DMs it
2. Bot fetches channel history (last 100 messages)
3. Formats history as: `"AuthorName: message content"`
4. Sends to OpenWebUI with format:
```python
{
    "role": "user",
    "content": [
        {"type": "text", "text": "##CONTEXT##\n{history}\n##ENDCONTEXT##\n\n{user_message}"},
        {"type": "image_url", "image_url": {...}}  # if images present
    ]
}
```
5. Returns AI response and replies to user
#### Current Limitations
- **No system prompt**: Context is embedded in user messages
- **No tool calling**: Cannot execute functions or use MCPs
- **OpenWebUI dependency**: Tightly coupled to OpenWebUI API structure
- **Simple history**: Just text concatenation, no proper conversation threading
- **Synchronous image download**: Uses `requests.get()` in async context (should use aiohttp)
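A minimal async replacement for the image download, sketched with aiohttp (the function name and base64 return format mirror the current bot's behavior, as assumed here):

```python
import base64
import aiohttp

async def download_image(url: str) -> str:
    """Download an image without blocking the event loop; return base64."""
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            data = await resp.read()
    return base64.b64encode(data).decode("utf-8")
```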
## Target Architecture: LiteLLM + MCP Tools
### Why LiteLLM?
LiteLLM is a unified proxy that:
- **Standardizes API calls** across 100+ LLM providers (OpenAI, Anthropic, Google, etc.)
- **Native tool/function calling support** via OpenAI-compatible API
- **Built-in MCP support** for Model Context Protocol tools
- **Load balancing** and fallback between models
- **Cost tracking** and usage analytics
- **Streaming support** for real-time responses
### LiteLLM Tool Calling
LiteLLM supports the OpenAI tools format:
```python
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {...}
        }
    }],
    tool_choice="auto"
)
```
### MCP (Model Context Protocol) Overview
MCP is a standard protocol for:
- **Exposing tools** to LLMs (functions they can call)
- **Providing resources** (files, APIs, databases)
- **Prompts/templates** for consistent interactions
- **Sampling** for multi-step agentic behavior
**MCP Server Examples**:
- `filesystem`: Read/write files
- `github`: Access repos, create PRs
- `postgres`: Query databases
- `brave-search`: Web search
- `slack`: Send messages, read channels
## Upgrade Plan
### Phase 1: Switch to LiteLLM Proxy
#### Configuration Changes
1. Update environment variables:
```env
DISCORD_TOKEN=your_discord_bot_token
LITELLM_API_KEY=your_litellm_api_key
LITELLM_API_BASE=http://localhost:4000 # or your LiteLLM proxy URL
MODEL_NAME=gpt-4-turbo-preview # or any LiteLLM-supported model
SYSTEM_PROMPT=your_default_system_prompt # New!
```
2. Keep using OpenAI SDK (LiteLLM is OpenAI-compatible):
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv('LITELLM_API_KEY'),
    base_url=os.getenv('LITELLM_API_BASE')
)
```
#### Message Format Refactor
**Current approach** (embedding context in user message):
```python
text_content = f"##CONTEXT##\n{context}\n##ENDCONTEXT##\n\n{user_message}"
messages = [{"role": "user", "content": text_content}]
```
**New approach** (proper conversation history):
```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # ... previous conversation messages with proper roles ...
    {"role": "user", "content": user_message}
]
```
#### Benefits
- Better model understanding of conversation structure
- Separate system instructions from conversation
- Proper role attribution (user vs assistant)
- More efficient token usage
### Phase 2: Add System Prompt Management
#### Implementation Options
**Option A: Simple Environment Variable**
- Store in `.env` file
- Good for: Single, static system prompt
- Example: `SYSTEM_PROMPT="You are a helpful Discord assistant..."`
**Option B: File-Based System Prompt**
- Store in separate file (e.g., `system_prompt.txt`)
- Good for: Long, complex prompts that need version control
- Hot-reload capability
**Option C: Per-Channel/Per-Guild Prompts**
- Store in JSON/database mapping channel_id → system_prompt
- Good for: Multi-tenant bot with different personalities per server
- Example:
```json
{
    "123456789": "You are a coding assistant...",
    "987654321": "You are a gaming buddy..."
}
```
**Option D: User-Configurable Prompts**
- Discord slash commands to set/view system prompt
- Store in SQLite/JSON
- Commands: `/setprompt`, `/viewprompt`, `/resetprompt`
**Recommended**: Start with Option B (file-based), add Option D later for flexibility.
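A minimal sketch of Option B's hot-reload behavior; the file name and mtime-based cache are assumptions, not part of the current code:

```python
import os

PROMPT_FILE = "system_prompt.txt"  # assumed location
_cache = {"mtime": 0.0, "text": ""}

def get_system_prompt() -> str:
    """Re-read the prompt file only when it changes on disk (hot reload)."""
    mtime = os.path.getmtime(PROMPT_FILE)
    if mtime != _cache["mtime"]:
        with open(PROMPT_FILE, encoding="utf-8") as f:
            _cache["text"] = f.read().strip()
        _cache["mtime"] = mtime
    return _cache["text"]
```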
#### System Prompt Best Practices
1. **Define bot personality**: Tone, style, formality
2. **Set boundaries**: What bot should/shouldn't do
3. **Provide context**: "You are in a Discord server, users will mention you"
4. **Handle images**: "When users attach images, describe them..."
5. **Tool usage guidance**: "Use available tools when appropriate"
Example system prompt:
```
You are a helpful AI assistant integrated into Discord. Users will interact with you by mentioning you or sending direct messages.
Key behaviors:
- Be concise and friendly
- Use Discord markdown formatting when helpful (code blocks, bold, etc.)
- When users attach images, analyze them and provide relevant insights
- You have access to various tools - use them when they would help answer the user's question
- If you're unsure about something, say so
- Keep track of conversation context
You are not a human, and you should not pretend to be one. Be honest about your capabilities and limitations.
```
### Phase 3: Implement MCP Tool Support
#### LiteLLM MCP Integration
LiteLLM can connect to MCP servers in two ways:
**1. Via LiteLLM Proxy Configuration**
Configure in `litellm_config.yaml`:
```yaml
model_list:
  - model_name: gpt-4-with-tools
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

mcp_servers:
  filesystem:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}
```
**2. Via Direct Tool Definitions in Bot**
Define tools manually in the bot code:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
```
#### Tool Execution Flow
1. **Send message with tools available**:
```python
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=available_tools
)
```
2. **Check if model wants to use a tool**:
```python
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        # Execute the function
        result = execute_tool(function_name, arguments)
```
3. **Send tool results back to model**:
```python
messages.append({
    "role": "assistant",
    "content": None,
    "tool_calls": response.choices[0].message.tool_calls
})
messages.append({
    "role": "tool",
    "content": json.dumps(result),
    "tool_call_id": tool_call.id
})

# Get final response
final_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=available_tools
)
```
4. **Return final response to user**
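Putting steps 1-4 together, a hedged sketch of the full loop (`run_with_tools` and `max_rounds` are illustrative names; `execute_tool` is the dispatcher shown under Pattern 1 below):

```python
import json

async def run_with_tools(messages: list, tools: list, max_rounds: int = 5) -> str:
    """Alternate between model calls and tool execution until no tools are requested."""
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model=MODEL_NAME, messages=messages, tools=tools
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # model answered directly
        messages.append({"role": "assistant", "content": msg.content,
                         "tool_calls": msg.tool_calls})
        for tc in msg.tool_calls:
            result = await execute_tool(tc.function.name,
                                        json.loads(tc.function.arguments))
            messages.append({"role": "tool", "content": json.dumps(result),
                             "tool_call_id": tc.id})
    return "Stopped after reaching the tool-call round limit."
```

The round limit guards against a model that keeps requesting tools indefinitely.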
#### Tool Implementation Patterns
**Pattern 1: Bot-Managed Tools**
Implement tools directly in the bot:
```python
async def search_web(query: str) -> str:
    """Execute web search"""
    # Use requests/aiohttp to call search API
    pass

async def get_weather(location: str) -> str:
    """Get weather for location"""
    # Call weather API
    pass

AVAILABLE_TOOLS = {
    "search_web": search_web,
    "get_weather": get_weather,
}

async def execute_tool(name: str, arguments: dict) -> str:
    if name in AVAILABLE_TOOLS:
        return await AVAILABLE_TOOLS[name](**arguments)
    return "Tool not found"
```
**Pattern 2: MCP Server Proxy**
Let LiteLLM proxy handle MCP servers (recommended):
- Configure MCP servers in LiteLLM config
- LiteLLM automatically exposes them as tools
- Bot just passes tool calls through
- Simpler bot code, more scalable
**Pattern 3: Hybrid**
- Common tools via LiteLLM proxy MCP
- Discord-specific tools in bot (e.g., "get_server_info", "list_channels")
#### Recommended Starter Tools
1. **Web Search** (via Brave/Google MCP server)
- Let bot search for current information
2. **File Operations** (via filesystem MCP server - with restrictions!)
- Read documentation, configs
- Useful in developer-focused servers
3. **Wikipedia** (via wikipedia MCP server)
- Factual information lookup
4. **Time/Date** (custom function)
- Simple, no external dependency
5. **Discord Server Info** (custom function)
- Get channel list, member count, server info
- Discord-specific utility
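As a concrete example, the time/date tool from the list above is small enough to show in full (names and schema layout are illustrative):

```python
from datetime import datetime, timezone

async def get_time() -> str:
    """Return the current UTC date and time; no external dependencies."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

GET_TIME_SCHEMA = {
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current UTC date and time",
        "parameters": {"type": "object", "properties": {}},
    },
}
```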
### Phase 4: Improve Message History Management
#### Current Issues
- Fetches all messages every time (inefficient)
- No conversation threading (treats all channel messages as one context)
- No token limit awareness
- Channel history might contain irrelevant conversations
#### Improvements
**1. Per-Conversation Threading**
```python
# Track conversations by thread or by user
conversation_storage = {
    "channel_id:user_id": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}
```
**2. Token-Aware History Truncation**
```python
import tiktoken

def trim_history(messages, max_tokens=4000):
    """Keep only recent messages that fit in the token budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    kept, total = [], 0
    # Walk newest-to-oldest; stop once the budget is exhausted
    for msg in reversed(messages):
        total += len(enc.encode(msg.get("content") or ""))
        if total > max_tokens:
            break
        kept.insert(0, msg)
    return kept
```
**3. Message Deduplication**
Only include messages directly related to bot conversations:
- Messages mentioning bot
- Bot's responses
- Optionally: X messages before each bot mention for context
**4. Caching & Persistence**
- Cache conversation history in memory
- Optional: Persist to SQLite/Redis for bot restarts
- Clear old conversations after inactivity
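For improvement 3 above, a minimal discord.py relevance filter might look like this (sketch; the surrounding history-fetch loop is assumed):

```python
def is_bot_relevant(message, bot_user) -> bool:
    """Keep only messages the bot wrote or was mentioned in."""
    return message.author == bot_user or bot_user in message.mentions
```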
## Implementation Checklist
### Preparation
- [ ] Set up LiteLLM proxy locally or remotely
- [ ] Configure LiteLLM with desired model(s)
- [ ] Decide on MCP servers to enable
- [ ] Design system prompt strategy
- [ ] Review token limits for target models
### Code Changes
#### File: v2/bot.py
- [ ] Update imports (add `json`, improve `aiohttp` usage)
- [ ] Change environment variables:
- [ ] `OPENWEBUI_API_BASE` → `LITELLM_API_BASE`
- [ ] Add `SYSTEM_PROMPT` or `SYSTEM_PROMPT_FILE`
- [ ] Update OpenAI client initialization
- [ ] Refactor `get_ai_response()`:
- [ ] Add system message
- [ ] Convert history to proper message format (alternating user/assistant)
- [ ] Add tool support parameters
- [ ] Implement tool execution loop
- [ ] Refactor `get_chat_history()`:
- [ ] Return structured messages instead of text concatenation
- [ ] Filter for bot-relevant messages
- [ ] Add token counting/truncation
- [ ] Fix `download_image()` to use aiohttp instead of requests
- [ ] Add tool definition functions
- [ ] Add tool execution handler
- [ ] Add error handling for tool failures
#### New File: v2/tools.py (optional)
- [ ] Define tool schemas
- [ ] Implement tool execution functions
- [ ] Export tool registry
#### New File: v2/system_prompt.txt or system_prompts.json
- [ ] Write default system prompt
- [ ] Optional: Add per-guild prompts
#### File: v2/requirements.txt
- [ ] Keep: `discord.py`, `openai`, `python-dotenv`
- [ ] Add: `aiohttp` (for async HTTP, replacing `requests`), `tiktoken` (for token counting)
- [ ] Optional: `anthropic` (if using Claude directly), `litellm` (if using SDK directly)
#### File: v2/.env.example
- [ ] Update variable names
- [ ] Add system prompt variables
- [ ] Document new configuration options
### Testing
- [ ] Test basic message responses (no tools)
- [ ] Test with images attached
- [ ] Test tool calling with simple tool (e.g., get_time)
- [ ] Test tool calling with external MCP server
- [ ] Test conversation threading
- [ ] Test token limit handling
- [ ] Test error scenarios (API down, tool failure, etc.)
- [ ] Test in multiple Discord servers/channels
### Documentation
- [ ] Update README.md with new setup instructions
- [ ] Document LiteLLM proxy setup
- [ ] Document MCP server configuration
- [ ] Add example system prompts
- [ ] Document available tools
- [ ] Add troubleshooting section
## Technical Considerations
### Token Management
- Most models have 4k-128k token context windows
- Message history can quickly consume tokens
- Reserve tokens for:
- System prompt: ~500-1000 tokens
- Tool definitions: ~100-500 tokens per tool
- Response: ~1000-2000 tokens
- History: remaining tokens
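For example, with an 8k-context model, reserving 1,000 tokens for the system prompt, ~300 for three tool definitions, and 1,500 for the response leaves roughly 5,400 tokens for history.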
### Rate Limiting
- Discord: 5 requests per 5 seconds per channel
- LLM APIs: Varies by provider (OpenAI: ~3500 RPM for GPT-4)
- Implement queuing if needed
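A simple per-channel cooldown is enough for a first pass (sketch; the one-second window is an assumption, not Discord's exact limit):

```python
import time
from collections import defaultdict

_last_reply: dict[int, float] = defaultdict(float)
COOLDOWN_SECONDS = 1.0  # assumed window

def allow_reply(channel_id: int) -> bool:
    """Drop replies that would arrive within the cooldown window."""
    now = time.monotonic()
    if now - _last_reply[channel_id] < COOLDOWN_SECONDS:
        return False
    _last_reply[channel_id] = now
    return True
```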
### Error Handling
- API timeouts: Retry with exponential backoff
- Tool execution failures: Return error message to model
- Discord API errors: Log and notify user
- Invalid tool calls: Validate before execution
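A hedged sketch of the retry-with-backoff idea (names are illustrative; tune `retries` and `base_delay` to your API limits):

```python
import asyncio
import random

async def with_backoff(call, retries: int = 3, base_delay: float = 1.0):
    """Retry an async callable with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return await call()
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt + random.random())
```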
### Security Considerations
- **Tool access control**: Don't expose dangerous tools (file delete, system commands)
- **Input validation**: Sanitize tool arguments
- **Rate limiting**: Prevent abuse of expensive tools (web search)
- **API key security**: Never log or expose API keys
- **MCP filesystem access**: Restrict to safe directories only
### Cost Optimization
- Use smaller models for simple queries (gpt-3.5-turbo)
- Implement streaming for better UX
- Cache common queries
- Trim history aggressively
- Consider LiteLLM's caching features
## Future Enhancements
### Short Term
- [ ] Add slash commands for bot configuration
- [ ] Implement conversation reset command
- [ ] Add support for Discord threads
- [ ] Stream responses for long outputs
- [ ] Add reaction-based tool approval (user confirms before execution)
### Medium Term
- [ ] Multi-modal support (voice, more image formats)
- [ ] Per-user conversation isolation
- [ ] Tool usage analytics and logging
- [ ] Custom MCP server for Discord-specific tools
- [ ] Web dashboard for bot management
### Long Term
- [ ] Agentic workflows (multi-step tool usage)
- [ ] Memory/RAG for long-term context
- [ ] Multiple bot personalities per server
- [ ] Integration with Discord's scheduled events
- [ ] Voice channel integration (TTS/STT)
## Resources
### Documentation
- **LiteLLM Docs**: https://docs.litellm.ai/
- **LiteLLM Tools/Functions**: https://docs.litellm.ai/docs/completion/function_call
- **MCP Specification**: https://modelcontextprotocol.io/
- **MCP Server Examples**: https://github.com/modelcontextprotocol/servers
- **Discord.py Docs**: https://discordpy.readthedocs.io/
- **OpenAI API Docs**: https://platform.openai.com/docs/guides/function-calling
### Example MCP Servers
- `@modelcontextprotocol/server-filesystem`: File operations
- `@modelcontextprotocol/server-github`: GitHub integration
- `@modelcontextprotocol/server-postgres`: Database queries
- `@modelcontextprotocol/server-brave-search`: Web search
- `@modelcontextprotocol/server-slack`: Slack integration
- `@modelcontextprotocol/server-memory`: Persistent memory
### Tools for Development
- **tiktoken**: Token counting (OpenAI tokenizer)
- **litellm CLI**: `litellm --model gpt-4 --drop_params` for testing
- **Postman**: Test LiteLLM API endpoints
- **Docker**: Containerize LiteLLM proxy
## Questions to Resolve
1. **Which LiteLLM deployment?**
- Self-hosted proxy (more control, more maintenance)
- Hosted service (easier, potential cost)
2. **Which models to support?**
- Single model (simpler)
- Multiple models with fallback (more robust)
- User-selectable models (more flexible)
3. **MCP server hosting?**
- Same machine as bot
- Separate server
- Cloud functions
4. **System prompt strategy?**
- Single global prompt
- Per-guild prompts
- User-configurable
5. **Tool approval flow?**
- Automatic execution (faster but riskier)
- User confirmation for sensitive tools (safer but slower)
6. **Conversation persistence?**
- In-memory only (simple, lost on restart)
- SQLite (persistent, moderate complexity)
- Redis (distributed, more setup)
## Current Code Analysis
### v2/bot.py Strengths
- Clean, simple structure
- Proper async/await usage
- Good image handling
- Type hints in newer version
### v2/bot.py Issues to Fix
- Line 44: Using synchronous `requests.get()` in async function
- Lines 62-77: Embedding history in user message instead of proper conversation format
- Line 41: `channel_history` dict declared but never used
- No error handling for OpenAI API errors besides generic try/catch
- No rate limiting
- No conversation threading
- History includes ALL channel messages, not just bot-relevant ones
- No system prompt support
### scripts/discordbot.py Differences
- Has system message (line 67) - better approach!
- Slightly different message structure
- Otherwise similar implementation
## Recommended Migration Path
**Step 1**: Quick wins (minimal changes)
1. Add system prompt support using `scripts/discordbot.py` pattern
2. Fix async image download (use aiohttp)
3. Update env vars and client to point to LiteLLM
**Step 2**: Core refactor (moderate changes)
1. Refactor message history to proper conversation format
2. Implement token-aware history truncation
3. Add basic tool support infrastructure
**Step 3**: Tool integration (significant changes)
1. Define initial tool set
2. Implement tool execution loop
3. Add error handling for tool failures
**Step 4**: Polish (incremental improvements)
1. Add slash commands for configuration
2. Improve conversation management
3. Add monitoring and logging
This approach allows you to test at each step and provides incremental value.
---
## Getting Started
When you're ready to begin implementation:
1. **Set up LiteLLM proxy**:
```bash
pip install litellm
litellm --model gpt-4 --drop_params
# Or use Docker: docker run -p 4000:4000 ghcr.io/berriai/litellm:main
```
2. **Test LiteLLM endpoint**:
```bash
curl -X POST http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
3. **Start with system prompt**: Implement system prompt support first as low-risk improvement
4. **Iterate on tools**: Start with one simple tool, then expand
Let me know which phase you'd like to tackle first!