Episode: 22
Title: HPR0022: Chunk Parsing
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0022/hpr0022.mp3
Transcribed: 2025-10-07 10:23:11

---

Hello everyone, welcome to another episode of Hacker Public Radio. I am your host, Plexi, and I'll be talking about parsing, specifically about chunk parsing. I will describe classical chunk parsing, and then I will tell you about a modification suggested to make it more efficient.

Chunk parsing is also called partial parsing. It's a robust parsing strategy proposed for natural language processing. An example of a chunked sentence is "I like foreign languages", where "I like" is taken as one chunk and "foreign languages" as the second chunk. The chunking corresponds to the phonology of the sentence: a chunk edge is created where there is a pause, and each chunk carries one major stress burst. The stress pattern and pause pattern of spoken language is called prosody. Syntactically, a chunk contains one head, or content word, like a verb or a noun, and the function words surrounding it. The form of chunks follows fixed templates. The relationships between chunks are not templated, but are governed more by lexical interactions. Chunks can move around each other as a whole, but items within a chunk cannot move within the chunk.

The chunking process is made of two main tasks. The first is segmentation: in segmentation, tokens are identified and chunks are created based on the criteria described earlier. The second main task is labeling: labeling identifies the types of the words, and then the types of the chunks, such as noun phrase, and so on.

A chunk parser groups related tokens into chunks, then combines the chunks and the remaining tokens under a dominating node, forming a two-level tree that covers the whole sentence. This tree is called a chunk structure. Chunking rules are applied in turn until all the rules have been applied, and the resulting structure is returned. When a chunking rule is applied to a hypothesis, it only creates new chunks that don't overlap with any previous ones. So if we apply two non-identical rules in the reverse order, we can get two different results. There are also rules for chunking, rules for unchunking, rules for merging, rules for splitting, and so on.

So, as I said earlier, chunk parsing is also called partial parsing. What's the difference between chunk parsing and full parsing? Well, each one has its benefits and its downsides. Full parsing is a polynomial algorithm of degree three, whereas chunk parsing is a linear algorithm. Chunk parsing has a hierarchy of limited depth, whereas full parsing doesn't. But chunk parsing is not as awesome as it sounds, because it can have less-than-perfect results.

Two researchers from Tokyo suggested an alteration to chunk parsing to make it more efficient. They suggested using a classical sliding-window technique instead of tagging, so that all subsequences are considered rather than avoiding overlaps completely. They also suggested using a machine learning algorithm to filter the sequences that are in a context-free grammar, and they suggested a maximum entropy classifier for the filtering. For more detail, look at the paper; it's one of the links in the show notes.

That's all for tonight. Thank you for listening. This was Plexi with Hacker Public Radio.

Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head on over to C-A-R-O dot N-E-T for all of your hosting needs.
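
As a supplement to the show notes: the episode describes chunking rules being applied in turn to build a two-level chunk structure. Below is a minimal sketch of that idea in Python using NLTK's RegexpParser. The grammar, the tag set, and the hand-tagged example sentence are my own illustration and are not taken from the episode or from the paper it mentions.

    # Minimal rule-based chunking sketch using NLTK's RegexpParser
    # (one common tool for this; not something named in the episode).
    import nltk

    # Hand-tagged example sentence from the episode: "I like foreign languages".
    tagged = [("I", "PRP"), ("like", "VBP"),
              ("foreign", "JJ"), ("languages", "NNS")]

    # One chunking rule per chunk type: a noun phrase is an optional determiner,
    # any adjectives, and one or more nouns; the verb-phrase rule here is a
    # purely illustrative pronoun-plus-verb pattern.
    grammar = r"""
      NP: {<DT>?<JJ>*<NN.*>+}
      VP: {<PRP><VB.*>}
    """

    parser = nltk.RegexpParser(grammar)
    chunk_structure = parser.parse(tagged)  # two-level tree: S over chunks and tokens
    print(chunk_structure)
    # Roughly: (S (VP I/PRP like/VBP) (NP foreign/JJ languages/NNS))

The rules are applied in the order they are written, and a rule only creates chunks that do not overlap chunks created earlier, which is the order-dependence mentioned in the episode.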
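The sliding-window modification mentioned at the end of the episode can also be sketched. The snippet below enumerates every contiguous subsequence up to a fixed length, so candidate chunks are allowed to overlap, and then filters them. The paper's proposal uses a trained maximum entropy classifier for this filtering; the looks_like_np function here is only a hypothetical stand-in so the sketch stays self-contained.

    # Sliding-window candidate generation: consider all subsequences instead of
    # committing to one non-overlapping segmentation, then filter the candidates.
    from typing import Iterator, List, Tuple

    Token = Tuple[str, str]  # (word, part-of-speech tag)

    def looks_like_np(window: List[Token]) -> bool:
        """Hypothetical filter standing in for a trained classifier:
        accept windows that end in a noun and contain no verbs."""
        tags = [tag for _, tag in window]
        return tags[-1].startswith("NN") and not any(t.startswith("VB") for t in tags)

    def candidate_chunks(tagged: List[Token], max_len: int = 4) -> Iterator[Tuple[int, int, List[Token]]]:
        """Slide windows of every length up to max_len over the sentence,
        yielding all accepted candidate spans, overlaps included."""
        for start in range(len(tagged)):
            for end in range(start + 1, min(start + max_len, len(tagged)) + 1):
                window = tagged[start:end]
                if looks_like_np(window):
                    yield (start, end, window)

    tagged = [("I", "PRP"), ("like", "VBP"),
              ("foreign", "JJ"), ("languages", "NNS")]
    for start, end, window in candidate_chunks(tagged):
        print(start, end, [word for word, _ in window])

On the example sentence this keeps both "foreign languages" and "languages" as overlapping candidates, which illustrates the difference from classical chunking, where overlapping chunks are ruled out from the start.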