Episode: 22
Title: HPR0022: Chunk Parsing
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0022/hpr0022.mp3
Transcribed: 2025-10-07 10:23:11
---
...
Hello everyone, welcome to another episode of Hacker Public Radio. I am your host,
Flexi, and I'll be talking about parsing, specifically about chunk parsing. I will
describe classical chunk parsing, then I will tell you about a modification suggested
to make it more efficient.

Chunk parsing is also called partial parsing. It's a robust
parsing strategy proposed for natural language processing. An example of a chunk sentence
is, I like foreign languages, where I like is taken as a chunk and foreign languages
as the second chunk. The chunking corresponds to the phonology of the sentence: a chunk
edge is created where there is a pause, and there is one major stress burst per chunk. The stress
pattern and pause pattern of spoken language is called prosody.
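A minimal sketch of the two-chunk example in pure Python (the list layout and helper name are my own, not from the episode):

```python
# The episode's example sentence, split into its two chunks:
# "I like" and "foreign languages".
sentence = ["I", "like", "foreign", "languages"]
chunks = [["I", "like"], ["foreign", "languages"]]

def flatten(chunks):
    """Concatenate the chunks back into a flat token list."""
    return [tok for chunk in chunks for tok in chunk]

# Chunks are contiguous and non-overlapping, so flattening them
# recovers the original sentence exactly.
assert flatten(chunks) == sentence
for chunk in chunks:
    print(" ".join(chunk))
```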
Syntactically, a chunk contains one head or content word, such as a verb or noun, and the function
words surrounding it. The form of chunks follows fixed templates. The relationships
between chunks are not templated, but are governed by lexical interactions. Chunks
can move around each other as a whole, but items within a chunk cannot move within the
chunk.

The chunking process is made of two main tasks. The first is segmentation, in which
tokens are identified and chunks are created based on the criteria described
earlier. The second main task is labeling, which identifies the types of the words
and then the types of the chunks, such as noun phrase and so on.

A chunk parser groups related tokens into chunks, then combines the chunks with
the dominant tokens, forming a two-level tree that covers the whole sentence. This tree
is called a chunk structure. Chunking rules are applied in turn until all the
rules have been applied, and the resulting structure is returned. When a chunking rule is applied
to a hypothesis, it only creates new chunks that don't overlap with any previous ones.
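That order-sensitivity can be sketched in a few lines of Python. The rule format here, regexes over a string of one-letter part-of-speech tags, is my own simplification, not from the episode:

```python
import re

def apply_rules(tags, rules):
    """Apply each (pattern, label) rule in turn, keeping only chunks
    that do not overlap chunks created by earlier rules."""
    chunks, covered = [], set()
    for pattern, label in rules:
        for m in re.finditer(pattern, tags):
            span = set(range(m.start(), m.end()))
            if not span & covered:  # new chunks must not overlap old ones
                chunks.append((m.start(), m.end(), label))
                covered |= span
    return sorted(chunks)

# "NVN" = Noun Verb Noun, e.g. "dogs chase cats".  Applying the same two
# rules in opposite orders yields two different chunkings.
print(apply_rules("NVN", [("NV", "A"), ("VN", "B")]))  # [(0, 2, 'A')]
print(apply_rules("NVN", [("VN", "B"), ("NV", "A")]))  # [(1, 3, 'B')]
```

Whichever rule fires first claims its tokens, and the later rule is blocked by the overlap check, which is exactly why rule order changes the result.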
So if we apply two non-identical rules in reverse order, we can get two different results. There
are rules for chunking, rules for unchunking, rules for merging, rules for splitting, and so on.

So as
I said earlier, chunk parsing is also called partial parsing. What's the difference between
chunk parsing and full parsing? Well, each one has its benefits and its downsides. Full
parsing is a cubic-time, O(n³), algorithm, whereas chunk parsing is a linear, O(n), algorithm. Chunk
parsing produces a hierarchy of limited depth, whereas full parsing doesn't. But chunk parsing
is not as awesome as it sounds, because it can produce less than perfect results.

Two researchers
from Tokyo suggested an alteration to chunk parsing to make it more efficient. They suggested
using a classical sliding-window technique instead of tagging, considering all subsequences
rather than avoiding overlaps completely. They also suggested using a machine learning
algorithm to filter the sequences that are in a context-free grammar, using
a maximum entropy classifier for the filtering. For more detail, look at that paper; it's
one of the links in the show notes.

That's all for tonight. Thank you for listening. This
was Flexi with Hacker Public Radio.
Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head on
over to C-A-R-O dot N-E-T for all of your hosting needs.
Thank you very much.
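The sliding-window idea mentioned in the episode can be sketched as follows. Everything here is my own illustration: the paper's approach uses a trained maximum entropy classifier where the stub below uses a trivial hand-written rule:

```python
def candidates(tokens, max_len=3):
    """All contiguous subsequences (sliding windows) up to max_len tokens,
    yielded as (start, end, window) triples."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end, tokens[start:end]

def looks_like_np(window):
    """Stand-in for a trained max-ent filter: accept windows whose last
    word is in a toy set of nouns."""
    return window[-1] in {"languages"}

# Unlike left-to-right chunking, every window is generated, including
# overlapping ones, and the filter decides which candidates survive.
tokens = ["I", "like", "foreign", "languages"]
kept = [(s, e) for s, e, w in candidates(tokens) if looks_like_np(w)]
print(kept)  # [(1, 4), (2, 4), (3, 4)]
```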