Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
47  hpr_transcripts/hpr0022.txt  Normal file
@@ -0,0 +1,47 @@
Episode: 22
Title: HPR0022: Chunk Parsing
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0022/hpr0022.mp3
Transcribed: 2025-10-07 10:23:11

---

...

Hello everyone, welcome to another episode of Hacker Public Radio. I am your host, Plexi, and I'll be talking about parsing, specifically about chunk parsing. I will describe classical chunk parsing, then I will tell you about a modification suggested to make it more efficient. Chunk parsing is also called partial parsing. It's a robust parsing strategy proposed for natural language processing. An example of a chunked sentence is "I like foreign languages", where "I like" is taken as one chunk and "foreign languages" as the second chunk. The chunking corresponds to the phonology of the sentence: a chunk edge is created where there is a pause, and each chunk carries one major stress burst. The stress pattern and pause pattern of spoken language is called prosody.
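
A minimal sketch (not from the episode) of that example in plain Python, just to make the chunk boundaries concrete:

```python
# Hypothetical illustration of the chunking described above: the sentence is
# split into two non-overlapping chunks, and tokens keep their order inside
# each chunk.
sentence = ["I", "like", "foreign", "languages"]
chunks = [
    ["I", "like"],              # first chunk: function word + head verb
    ["foreign", "languages"],   # second chunk: modifier + head noun
]

# A chunk edge falls between "like" and "foreign", where a pause would occur
# in speech; each chunk carries one major stress (on "like" and "languages").
```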

Syntactically, a chunk contains one head or content word, like a verb or a noun, and the function words surrounding it. The form of chunks follows given fixed templates; the relationships between chunks are not templated, but are governed more by lexical interactions. Chunks can move around each other as a whole, but items within a chunk cannot move around within the chunk. The chunking process is made up of two main tasks. The first is segmentation: in segmentation, tokens are identified and chunks are created based on the criteria described earlier for chunks. The second main task is labeling: labeling identifies the types of the words, and then the types of the chunks, such as noun phrase and so on. A chunk parser groups related tokens into chunks, then combines the chunks with the dominant tokens, forming a two-level tree that covers the whole sentence. This tree is called a chunk structure.
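
As one concrete illustration of these two tasks, here is a sketch using NLTK's RegexpParser; the episode does not name a specific tool, and the part-of-speech tags, chunk labels, and grammar below are assumptions chosen for the example sentence.

```python
# A sketch (not from the episode) of the two tasks: labeling word types
# (part-of-speech tags, supplied here by hand) and grouping labeled tokens
# into chunks, which yields a two-level chunk structure.
import nltk

# Tokens with hand-assigned part-of-speech tags (the "labeling" of word types).
tagged = [("I", "PRP"), ("like", "VBP"), ("foreign", "JJ"), ("languages", "NNS")]

# Chunking rules written as fixed templates over tag sequences.
grammar = r"""
  VC: {<PRP><VBP>}   # "I like" as one chunk around the head verb
  NP: {<JJ>*<NNS>}   # "foreign languages" as a noun-phrase chunk
"""
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)
print(tree)
# roughly: (S (VC I/PRP like/VBP) (NP foreign/JJ languages/NNS))
```

The printed result is a two-level tree: the sentence node on top, with the chunks (plus any tokens left outside a chunk) directly beneath it.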

Chunking rules are applied in turn until all the rules have been applied, and the resulting structure is returned. When a chunking rule is applied to a hypothesis, it only creates new chunks that don't overlap with any previous ones. So if we apply two non-identical rules in the reverse order, we get two different results. There are rules for chunking, rules for unchunking, rules for merging, rules for splitting, and so on.
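
A small sketch of the order dependence described above, again assuming NLTK's RegexpParser rather than anything named in the episode:

```python
# The same two rules applied in different orders give different chunk
# structures, because a rule never creates a chunk that overlaps an
# existing one.
import nltk

tagged = [("foreign", "JJ"), ("languages", "NNS")]

wide_first = nltk.RegexpParser(r"""
  NP:
    {<JJ><NNS>}   # rule 1: adjective + plural noun
    {<NNS>}       # rule 2: bare plural noun
""")
narrow_first = nltk.RegexpParser(r"""
  NP:
    {<NNS>}       # rule 2 first: chunks "languages" alone...
    {<JJ><NNS>}   # ...so rule 1 can no longer chunk "foreign languages"
""")

print(wide_first.parse(tagged))    # (S (NP foreign/JJ languages/NNS))
print(narrow_first.parse(tagged))  # (S foreign/JJ (NP languages/NNS))

# The same rule notation also covers the other rule types mentioned:
#   }<...>{         unchunking ("chinking")
#   <...>{}<...>    merging two adjacent chunks
#   <...>}{<...>    splitting one chunk in two
```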

So, as I said earlier, chunk parsing is also called partial parsing. What's the difference between chunk parsing and full parsing? Well, each one has its benefits and its downsides. Full parsing is a polynomial algorithm of degree three, whereas chunk parsing is a linear algorithm. Chunk parsing has a hierarchy of limited depth, whereas full parsing doesn't. But chunk parsing is not as awesome as it sounds, because it can have less than perfect results.

Two researchers from Tokyo suggested an alteration to chunk parsing to make it more efficient. They suggested using a classical sliding-window technique instead of tagging, so as to consider all subsequences rather than avoiding overlapping completely. They also suggested using a machine learning algorithm to filter the sequences that are in the context-free grammar; specifically, they suggested using a maximum entropy classifier for the filtering.
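
A rough, hypothetical sketch of that sliding-window filtering idea: the candidate enumeration follows the description above, while the scoring function is a made-up stand-in for the trained maximum entropy classifier the researchers used.

```python
# Enumerate every contiguous subsequence of the sentence as a candidate chunk
# (overlaps allowed), then let a classifier keep or discard each candidate.
# toy_score below is a placeholder; a real system would use a maximum entropy
# model (e.g. multinomial logistic regression) trained on labeled data.
from typing import Callable, List, Tuple

Token = Tuple[str, str]        # (word, part-of-speech tag)
Candidate = Tuple[int, int]    # (start, end) indices into the sentence

def sliding_window_candidates(tokens: List[Token], max_len: int = 4) -> List[Candidate]:
    """Every contiguous subsequence up to max_len tokens is a candidate chunk."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans

def filter_candidates(tokens: List[Token],
                      spans: List[Candidate],
                      score: Callable[[List[Token]], float],
                      threshold: float = 0.5) -> List[Candidate]:
    """Keep the candidates the classifier scores above the threshold."""
    return [s for s in spans if score(tokens[s[0]:s[1]]) >= threshold]

def toy_score(chunk: List[Token]) -> float:
    """Stand-in scorer: pretend candidates ending in a noun are likely chunks."""
    return 1.0 if chunk[-1][1].startswith("NN") else 0.0

tagged = [("I", "PRP"), ("like", "VBP"), ("foreign", "JJ"), ("languages", "NNS")]
kept = filter_candidates(tagged, sliding_window_candidates(tagged), toy_score)
print(kept)   # surviving spans: [(0, 4), (1, 4), (2, 4), (3, 4)]
```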

For more detail, look at the paper; it's one of the links in the show notes. That's all for tonight. Thank you for listening. This was Plexi with Hacker Public Radio.

Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head on over to C-A-R-O dot N-E-T for all of the team. Thank you very much.