342 lines
9.6 KiB
Markdown
342 lines
9.6 KiB
Markdown
# Hacker Public Radio Knowledge Base MCP Server
|
|
|
|
An MCP (Model Context Protocol) server providing access to the Hacker Public Radio (HPR) knowledge base, including episodes, transcripts, hosts, series, and community comments.
|
|
|
|
## About HPR
|
|
|
|
Hacker Public Radio is a community-driven podcast where hosts contribute content on topics of interest to hackers. All content is released under Creative Commons licenses, making it freely available for learning and sharing.
|
|
|
|
## Features
|
|
|
|
This MCP server provides:
|
|
|
|
- **Episode Search**: Search through thousands of HPR episodes by title, summary, tags, or host notes
|
|
- **Transcript Search**: Full-text search across all episode transcripts
|
|
- **Episode Details**: Get complete information about any episode including transcript and comments
|
|
- **Host Information**: Look up hosts and see all their contributions
|
|
- **Series Browsing**: Explore mini-series of related episodes
|
|
- **Statistics**: View overall HPR statistics and recent episodes
|
|
|
|
## Installation
|
|
|
|
### Prerequisites
|
|
|
|
- Node.js 18 or higher
|
|
- The HPR data files:
|
|
- `hpr_metadata/` directory containing JSON files
|
|
- `hpr_transcripts/` directory containing transcript files
|
|
|
|
### Setup
|
|
|
|
1. Install dependencies:
|
|
|
|
```bash
|
|
npm install
|
|
```
|
|
|
|
2. Make the server executable:
|
|
|
|
```bash
|
|
chmod +x index.js
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Running Locally (Stdio Mode)
|
|
|
|
You can test the stdio server directly (for local MCP clients like Claude Desktop):
|
|
|
|
```bash
|
|
npm start
|
|
```
|
|
|
|
### Running as HTTP Server (Network Access)
|
|
|
|
For network access and public deployment, use the HTTP/SSE server:
|
|
|
|
```bash
|
|
npm run start:http
|
|
```
|
|
|
|
This starts an HTTP server on port 3000 (configurable via `PORT` environment variable) with:
|
|
- **SSE endpoint**: `http://localhost:3000/sse`
|
|
- **Health check**: `http://localhost:3000/health`
|
|
- Built-in rate limiting, compression, and graceful degradation
|
|
|
|
### Using with AI Tools
|
|
|
|
**Claude Desktop** (and other MCP-compatible clients):
|
|
|
|
See **[CONFIGURATION.md](CONFIGURATION.md)** for detailed setup instructions for:
|
|
- ✅ **Claude Desktop** (stdio - fully supported)
|
|
- ⚠️ **Other MCP Clients** (varies by client)
|
|
- ❌ **ChatGPT** (not supported - workarounds included)
|
|
- ❌ **GitHub Copilot** (not supported - alternatives included)
|
|
- ❌ **Google Gemini** (not supported - integration options)
|
|
- 🔧 **Custom Integration** (Python/Node.js examples)
|
|
|
|
**Quick Start (Claude Desktop)**:
|
|
|
|
Add this to your Claude Desktop configuration file:
|
|
|
|
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
|
|
**Windows**: `%APPDATA%/Claude/claude_desktop_config.json`
|
|
**Linux**: `~/.config/Claude/claude_desktop_config.json`
|
|
|
|
```json
|
|
{
|
|
"mcpServers": {
|
|
"hpr-knowledge-base": {
|
|
"command": "node",
|
|
"args": ["/absolute/path/to/knowledge_base/index.js"]
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Replace `/absolute/path/to/knowledge_base/` with the actual path to this directory.
|
|
|
|
**Note**: Claude Desktop currently only supports local (stdio) connections. Remote HTTP/SSE support coming in future versions.
|
|
|
|
### Using with Other MCP Clients
|
|
|
|
Any MCP-compatible client can connect to this server via stdio. The server will load all HPR data on startup and make it available through tools and resources.
|
|
|
|
## Available Tools
|
|
|
|
### 1. `search_episodes`
|
|
|
|
Search for episodes by keywords in title, summary, tags, or notes.
|
|
|
|
**Parameters:**
|
|
- `query` (string): Search query
|
|
- `limit` (number, optional): Maximum results (default: 20)
|
|
- `hostId` (number, optional): Filter by specific host
|
|
- `seriesId` (number, optional): Filter by specific series
|
|
- `tag` (string, optional): Filter by tag
|
|
- `fromDate` (string, optional): Filter from date (YYYY-MM-DD)
|
|
- `toDate` (string, optional): Filter to date (YYYY-MM-DD)
|
|
|
|
**Example:**
|
|
```
|
|
Search for episodes about "linux kernel" from 2020 onwards
|
|
```
|
|
|
|
### 2. `get_episode`
|
|
|
|
Get detailed information about a specific episode.
|
|
|
|
**Parameters:**
|
|
- `episodeId` (number, required): Episode ID
|
|
- `includeTranscript` (boolean, optional): Include transcript (default: true)
|
|
- `includeComments` (boolean, optional): Include comments (default: true)
|
|
|
|
**Example:**
|
|
```
|
|
Get details for episode 16 including transcript and comments
|
|
```
|
|
|
|
### 3. `search_transcripts`
|
|
|
|
Search through episode transcripts for phrases or multiple terms with flexible matching.
|
|
|
|
**Parameters:**
|
|
- `query` (string, optional): Phrase to search for. Useful for exact-phrase lookups.
|
|
- `terms` (string[], optional): Explicit list of terms to search for; combine with `matchMode` for logical AND/OR searches.
|
|
- `matchMode` (`'phrase' | 'any' | 'all'`, optional): How to combine `query`/`terms`. Defaults to `'phrase'`. Use `'any'` to match if any term is present, `'all'` to require every term somewhere in the transcript.
|
|
- `limit` (number, optional): Maximum episodes to return (default: 20).
|
|
- `contextLines` (number, optional): Lines of context to include around each match (default: 3).
|
|
- `hostId` (number, optional): Only return matches for this host ID.
|
|
- `hostName` (string, optional): Only return matches for hosts whose name includes this value.
|
|
- `caseSensitive` (boolean, optional): Treat terms as case-sensitive (default: false).
|
|
- `wholeWord` (boolean, optional): Match whole words only (default: false).
|
|
- `maxMatchesPerEpisode` (number, optional): Maximum number of excerpts per episode (default: 5).
|
|
|
|
**Example queries:**
|
|
```
|
|
Find transcripts mentioning "virtual machine"
|
|
```
|
|
```
|
|
Find transcripts where klaatu talks about bash or python
|
|
```
|
|
```
|
|
List episodes where host ID 123 mentions "encryption" and "privacy" (require all terms)
|
|
```
|
|
|
|
### 4. `get_host_info`
|
|
|
|
Get information about a host and their episodes.
|
|
|
|
**Parameters:**
|
|
- `hostId` (number, optional): Host ID
|
|
- `hostName` (string, optional): Host name to search for
|
|
- `includeEpisodes` (boolean, optional): Include episode list (default: true)
|
|
|
|
**Example:**
|
|
```
|
|
Get information about host "klaatu" including all their episodes
|
|
```
|
|
|
|
### 5. `get_series_info`
|
|
|
|
Get information about a series and all its episodes.
|
|
|
|
**Parameters:**
|
|
- `seriesId` (number, required): Series ID
|
|
|
|
**Example:**
|
|
```
|
|
Get information about series 4 (Databases series)
|
|
```
|
|
|
|
## Available Resources
|
|
|
|
### `hpr://stats`
|
|
Overall statistics about the HPR knowledge base
|
|
|
|
### `hpr://episodes/recent`
|
|
List of 50 most recent episodes
|
|
|
|
### `hpr://hosts/all`
|
|
List of all HPR hosts with episode counts
|
|
|
|
### `hpr://series/all`
|
|
List of all HPR series with descriptions
|
|
|
|
## Data Structure
|
|
|
|
The server expects the following directory structure:
|
|
|
|
```
|
|
knowledge_base/
|
|
├── index.js
|
|
├── data-loader.js
|
|
├── package.json
|
|
├── hpr_metadata/
|
|
│ ├── episodes.json
|
|
│ ├── hosts.json
|
|
│ ├── comments.json
|
|
│ └── series.json
|
|
└── hpr_transcripts/
|
|
├── hpr0001.txt
|
|
├── hpr0002.txt
|
|
└── ...
|
|
```
|
|
|
|
## Deployment
|
|
|
|
The HTTP/SSE server (`server-http.js`) is designed for public deployment with graceful degradation features:
|
|
|
|
### Features
|
|
|
|
- **Rate Limiting**: 50 requests per minute per IP address
|
|
- **Request Timeouts**: 30-second timeout per request
|
|
- **Concurrent Request Limiting**: Maximum 10 concurrent requests
|
|
- **Circuit Breaker**: Automatically stops accepting requests if failure rate is too high
|
|
- **Memory Monitoring**: Rejects requests if memory usage exceeds 450MB
|
|
- **Compression**: Gzip compression for all responses
|
|
- **CORS**: Enabled for cross-origin requests
|
|
|
|
### Recommended Hosting Options
|
|
|
|
#### Render.com (Recommended)
|
|
```bash
|
|
# Free tier available, $7/mo for always-on
|
|
# Auto-scaling and health checks built-in
|
|
```
|
|
|
|
#### Railway.app
|
|
```bash
|
|
# $5 free credit/month, pay-per-usage
|
|
# Scales to zero when idle
|
|
```
|
|
|
|
#### Fly.io
|
|
```bash
|
|
# Free tier: 256MB RAM
|
|
# Global edge deployment
|
|
```
|
|
|
|
### Environment Variables
|
|
|
|
- `PORT`: Server port (default: 3000)
|
|
|
|
### Health Check
|
|
|
|
The server provides a health check endpoint at `/health` for monitoring:
|
|
|
|
```bash
|
|
curl http://localhost:3000/health
|
|
```
|
|
|
|
Returns:
|
|
```json
|
|
{
|
|
"status": "ok",
|
|
"memory": {
|
|
"used": "45.23MB",
|
|
"threshold": "450MB"
|
|
},
|
|
"activeRequests": 2,
|
|
"circuitBreaker": "CLOSED"
|
|
}
|
|
```
|
|
|
|
## Development
|
|
|
|
### Project Structure
|
|
|
|
- `index.js` - Stdio MCP server (for local use)
|
|
- `server-http.js` - HTTP/SSE MCP server (for network deployment)
|
|
- `data-loader.js` - Data loading and searching functionality
|
|
- `package.json` - Node.js package configuration
|
|
|
|
### Extending the Server
|
|
|
|
You can add new tools or resources by:
|
|
|
|
1. Adding new methods to `HPRDataLoader` in `data-loader.js`
|
|
2. Registering new tools in the `ListToolsRequestSchema` handler
|
|
3. Implementing tool logic in the `CallToolRequestSchema` handler
|
|
|
|
## License
|
|
|
|
This MCP server code is released under CC-BY-SA to match the HPR content license.
|
|
|
|
The Hacker Public Radio content itself is released under various Creative Commons licenses as specified in each episode's metadata.
|
|
|
|
## Credits
|
|
|
|
- **Hacker Public Radio**: https://hackerpublicradio.org
|
|
- **MCP SDK**: https://modelcontextprotocol.io
|
|
|
|
## Contributing
|
|
|
|
Contributions are welcome! This server can be extended with:
|
|
|
|
- Advanced search features (fuzzy matching, relevance ranking)
|
|
- Tag cloud generation
|
|
- Episode recommendations
|
|
- Audio file access
|
|
- Web interface for browsing
|
|
|
|
## Support
|
|
|
|
For issues related to:
|
|
- **This MCP server**: Open an issue in this repository
|
|
- **HPR content**: Visit https://hackerpublicradio.org
|
|
- **MCP protocol**: Visit https://modelcontextprotocol.io
|
|
|
|
## Example Queries
|
|
|
|
Here are some example queries you can try with an MCP client:
|
|
|
|
1. "Find episodes about Python programming from 2023"
|
|
2. "Show me all episodes by Ken Fallon"
|
|
3. "Search transcripts for discussions about encryption"
|
|
4. "What is the Database 101 series about?"
|
|
5. "Show me recent episodes about Linux"
|
|
6. "Find episodes tagged with 'security'"
|
|
|
|
Enjoy exploring the Hacker Public Radio knowledge base!
|