Episode: 1128
Title: HPR1128: Compilers part4
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1128/hpr1128.mp3
Transcribed: 2025-10-17 19:28:57
---
Hello everybody, and welcome to another riveting, riveting edition of About Compilers. I guess that's the name of it, About Compilers. This is part four in the series, on Miscellaneous Radio Theater 4096.
And I know, I know, I said in the last episode that we were done talking about the front end of a compiler, and that now we were going to talk about the glorious world of code generation.
Well, I've got news for you: we're still going to talk about the front end of a compiler.
As a review, we'll make a nice calculator using a lexical analyzer generator called lex and a parser generator called yacc.
Lex and yacc are standard on Unix. If you don't have them, check your distro's repo; the GNU versions are called flex and bison.
Building this calculator will exercise our knowledge of lexical analysis, parsing and synthesized attributes.
Lex takes a file containing token descriptions and outputs a complete lexical analyzer that matches one token at a time from its input and returns a token number.
It returns a token number from an API called yylex that is meant to be called by yacc's parser.
Yacc takes a file containing a complete grammar in something called Backus-Naur Form. Well, actually not really Backus-Naur Form exactly, but pretty darn close.
I'll just call it Backus-Naur from now on. It's pretty darn similar to the production rules we've talked about in the past.
I may also refer to this as yacc grammar. Yacc, given a proper yacc grammar file, outputs a complete parser for our language.
Token descriptions and Backus-Naur Form grammars are not the only things these files contain, of course, but the two files are actually really similar in their format.
We'll start with lex. At the start of lex's input we have percent open-bracket, percent close-bracket.
In these brackets you have C code that's pretty much copied directly to the beginning of lex's output.
Then you have macros and options, and then you have percent percent followed by another percent percent.
In between these percent percents you have token descriptions and corresponding C code. At the bottom of the file you have C code that's copied directly to the end of lex's output.
Okay, now for yacc's input file. At the start you have the same percent open-bracket, C code, percent close-bracket.
After that you have token names, the start symbol (that's the symbol at the top of our parse tree), and the left, right, or non-associative precedence of tokens.
I should explain that. If you have an input source code of A plus B plus C there's enough ambiguity in that to have multiple parse trees to describe it.
That's really bad; we call this an ambiguous grammar. To resolve the conflict we give left or right precedence to particular tokens.
If a token has left precedence, the symbol to its left gets resolved first. If it has right precedence, the symbol to its right gets resolved first.
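As a sketch (not from the show notes), these declarations in a yacc file look something like this; the token names here are just for illustration:

    %left '+' '-'      /* left-associative: a+b+c parses as (a+b)+c */
    %right POW         /* right-associative: a^b^c parses as a^(b^c) */
    %nonassoc EQ       /* non-associative: a==b==c is a syntax error */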
After that, in the yacc file, we have the same percent percent followed by percent percent.
In between the percent percents we have Backus-Naur Form grammar, its corresponding C code, and the synthesized attributes.
At the bottom of the file you have C code that's copied directly to the end of yacc's output, much like how it is in lex.
So let's get at that description. Let's define some tokens.
The calculator needs numbers so let's have a number token. We'll call that num. The calculator also needs to do stuff.
Let's have operation tokens: plus, sub, mult, and div.
Since we're describing tokens first, we have to write a minimal yacc grammar, since that's where the token names come from.
I refer you to figure A in the show notes. I highly recommend you look at figure A, but if you're not looking at it, figure A looks like this.
Percent open-bracket, percent close-bracket; percent token num plus sub mult div; percent start line; percent percent; line colon num semicolon; percent percent.
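The show-notes figures aren't reproduced in this transcript, so here is a sketch of what figure A plausibly looks like, with the token names from the audio uppercased per the usual convention:

    %{
    %}
    %token NUM PLUS SUB MULT DIV
    %start line
    %%
    line : NUM ;
    %%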
Now I know exactly what you're thinking: sigflup, what's with the start and line grammar?
It's to produce a minimal file. Yacc won't compile without a start symbol, and if we have a start symbol we need some grammar, so we insert some junk grammar.
Let's save this whole thing to gram.y. We generate our parser like this: at the command prompt, type yacc space dash d space gram.y.
What this produces is y.tab.c, which is our parser, and y.tab.h, which contains C macro definitions of our tokens.
Now let's write a minimal lex token description. See figure B in the show notes. Figure B looks like this: percent open-bracket, include y.tab.h, percent close-bracket; percent option noyywrap; percent percent; percent percent.
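A sketch of what figure B plausibly looks like, reconstructed from that description:

    %{
    #include "y.tab.h"
    %}
    %option noyywrap
    %%
    %%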
Lex is a lot less picky, and we don't need token descriptions to compile. Percent option noyywrap tells lex that once we encounter end of file, that's it, there's no more input.
We are not going to wrap to another file; that's what noyywrap indicates. Let's save all of this to lex.l. We generate our lexical analyzer like this: at the command prompt, type lex space lex.l.
What that produces is lex.yy.c, which is the lexical analyzer. You'll notice that we've included y.tab.h at the beginning of the C code; that's so we can refer to our token macros.
If you straight up try to compile our parser and lexical analyzer together, like this at the command prompt: gcc space lex.yy.c space y.tab.c,
you'll find it errors out, telling us that there's no main or yyerror. It's pretty simple to resolve this problem: we simply put main and yyerror in our gram.y file.
At the bottom, where C code gets copied to the bottom of our parser, our main looks like this:
main, open-paren, void, close-paren, open-bracket, yyparse, open-paren, close-paren, semicolon, close-bracket.
That starts our parser, which drives our lexical analyzer with yylex. Our function yyerror looks like this: void space yyerror, open-paren, char space
asterisk s, close-paren, open-bracket, printf syntax error, close-bracket. This gets called whenever the parser encounters an error or an unexpected token stream.
For now, we'll just output syntax error when this happens.
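A sketch of that epilogue as it might sit at the bottom of gram.y, assuming stdio.h is included in the percent-bracket section up top; the int return type and return 0 are my additions:

    int main(void)
    {
        yyparse();                 /* start the parser; it calls yylex() for tokens */
        return 0;
    }

    void yyerror(char *s)
    {
        printf("syntax error\n");  /* s carries yacc's message; we ignore it */
    }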
Now if at the command prompt we type gcc space lex.yy.c space y.tab.c, we produce a.out. That's our calculator.
Well, not yet; first we need to describe our tokens. Lex's token description language is pretty easy to pick up.
Whatever character you type is directly matched, with the exception of some rule characters, like asterisk.
R asterisk matches zero or more Rs, for instance. You can also nest these rules with parens, and you have brackets to describe character classes, like open-bracket 0 dash 9 close-bracket.
Since our main token is number let's describe that first. Take a look at figure c.
I'm not going to say figure C, just take a look at figure C, seriously, because it's a long one.
Zero or more 0-through-9s, followed by zero or one dot, followed by one or more 0-through-9s.
There's some redundancy there, so let's replace the 0-through-9s with a macro, D.
So now our lex input file looks something like figure D. You can see we've added brackets after the token description; that's where our corresponding C code goes, and that's what gets run once our description is matched.
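A sketch of that rule with the D macro, roughly what figure D describes; the exact regular expression is my guess at figure C:

    D    [0-9]
    %%
    {D}*\.?{D}+    { /* corresponding C code goes here */ }
    %%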
There are two ways the lexical analyzer communicates with the parser. One is by returning a token number. The other is through a value, or a union or structure. For our example, it communicates through a value.
This value is stored in a variable called yylval, and its type gets defined in our gram.y file by a C macro, define YYSTYPE.
We'll define this as a float. The way the lexical analyzer communicates with its corresponding C code is through a character array called yytext.
yytext contains the actual text of the matched string.
Given all of this, the C code inserted into the brackets looks something like this:
sscanf, open-paren, yytext, comma, quote percent f end quote, comma, address-of yylval, close-paren, semicolon; return num.
This converts yytext into a floating-point number using sscanf from libc, stores it in yylval, and returns the token number num.
Our complete lex.l file looks like figure E.
Notice that sometimes we return characters. That's because a token number is just an integer that can be really anything we want.
Characters work just fine for token numbers.
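A sketch of what figure E, the complete lex.l, might plausibly look like, putting all of the above together; the whitespace rule and the choice of which tokens come back as literal characters are my assumptions:

    %{
    #define YYSTYPE float    /* must match the definition in gram.y */
    #include "y.tab.h"
    %}
    %option noyywrap
    D    [0-9]
    %%
    {D}*\.?{D}+   { sscanf(yytext, "%f", &yylval); return NUM; }
    "+"           { return PLUS; }
    "-"           { return SUB; }
    "*"           { return MULT; }
    "/"           { return DIV; }
    [()\n]        { return yytext[0]; }  /* characters work fine as token numbers */
    [ \t]         { /* skip whitespace */ }
    %%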
So now let's complete our gram.y file.
Most of it is already there. We just need to give left associativity to the operators, so as not to create an ambiguous grammar.
We do this with percent left plus sub, then percent left mult div on the next line, so that mult and div bind tighter.
Then we define line as producing expression plus newline:
line, colon, expression, newline, open-bracket, close-bracket.
Notice the brackets. This is where the C code, our synthesized attributes, goes.
Then we define expression as producing all manner of things.
The first thing it produces is num. It can also produce expression plus, minus, times, or divided-by expression.
Lastly, so we can have nested expressions, it also produces open-paren, expression, close-paren.
Now we complete the grammar, which looks like figure F. I'm not going to say figure F; you just need to look at figure F.
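A sketch of what figure F plausibly looks like; the rule names and the use of literal characters for parens and newline are my reading of the audio:

    %{
    #define YYSTYPE float
    %}
    %token NUM PLUS SUB MULT DIV
    %start line
    %left PLUS SUB
    %left MULT DIV
    %%
    line : expr '\n'  { } ;
    expr : NUM
         | expr PLUS expr
         | expr SUB expr
         | expr MULT expr
         | expr DIV expr
         | '(' expr ')'
         ;
    %%
    /* main and yyerror from earlier go down here */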
Now to fill the brackets. This is where yylval comes in.
Each symbol in a production rule has a value associated with it.
This value is referenced by dollar sign and some number.
This number is the symbol's position, from left to right, in the production rule.
If a symbol is a terminal, that value is gotten from its yylval.
If it's a nonterminal, it's gotten from what that symbol's own production assigned, through dollar-sign dollar-sign.
For example, take the production rule expression produces num.
Num is of course a terminal, so its yylval is referenced by dollar one.
What the expression is worth is referenced by dollar-sign dollar-sign.
So when expression produces num, our synthesized attribute is dollar-sign dollar-sign equals dollar one.
If we follow these rules for every production rule where expression produces, we get figure G.
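Following that recipe for each expression rule, a sketch of figure G:

    expr : NUM            { $$ = $1; }
         | expr PLUS expr { $$ = $1 + $3; }
         | expr SUB expr  { $$ = $1 - $3; }
         | expr MULT expr { $$ = $1 * $3; }
         | expr DIV expr  { $$ = $1 / $3; }
         | '(' expr ')'   { $$ = $2; }
         ;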
Our last production rule is line produces expression newline.
If in our parsing we come to this rule, we're pretty much finished with all computation.
So the point of this rule is to print out the final product; see figure F.
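That print action might look like this; the printf format string is my guess:

    line : expr '\n'   { printf("%f\n", $1); } ;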
And that is our complete calculator.
To compile this whole thing, type this at the command prompt:
yacc space dash d space gram.y, return.
lex space lex.l, return. gcc space dash o calc space lex.yy.c space y.tab.c, return.
This compiles our calculator.
To perform calculations, echo them into standard-in of the program calc, which we just generated.
For instance: echo, quote, 1 plus 2, end quote, pipe, calc.
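As a shell session, that might look like this; the exact output depends on the printf format in the line rule:

    $ echo "1+2" | ./calc
    3.000000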
And there you go, and that's it. That's our calculator.
Thanks for listening everyone.
In the next episode we're going to talk about the back end of a compiler.
Take care everyone. Bye-bye.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club.
HPR is funded by the Binary Revolution at binrev.com.
All binrev projects are proudly sponsored by Lunar Pages.
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting needs.
Unless otherwise stated, today's show is released under a Creative Commons
Attribution-ShareAlike license.