Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr2944.txt

Episode: 2944
Title: HPR2944: ONICS Basics Part 4: Network Flows and Connections
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2944/hpr2944.mp3
Transcribed: 2025-10-24 13:40:00
---
This is HPR Episode 2944 for Thursday the 14th of November 2019. Today's show is entitled
ONICS Basics Part 4: Network Flows and Connections. It's part of the series
Networking and it's the tenth show from Gabriel Evenfire. It's about 16 minutes
long and carries a clean flag. The summary is: I try to add a bit more network basic info
while writing a script for Dave Morris. This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello Hacker Public Radio, this is Gabriel Evenfire. I decided to do a quick episode in this
series on command-line networking, based on a comment Dave Morris made in the Community News
a month or so back. Dave mentioned that he was doing some monitoring of his home network using
the ONICS tool suite. So I thought I'd whip up a quick script to show another way to use the tools
to monitor traffic. Unfortunately, my enthusiasm kind of spiraled out of control. But before we get
to that, in the spirit of interweaving network basics in this series, let's pause for a second
and talk about two new concepts in networking, namely connections and flows. I mentioned in the
previous episode that modern networks are packet switched and showed how we can send an individual
packet from one machine to another. But actual programs don't want to worry about the details
of breaking the files or data they exchange into limited-size packets. Furthermore, they often
need to exchange data with multiple remote programs at the same time or sometimes communicate
with the same remote program using multiple channels. In order to support these needs,
programs use another layer of abstraction on top of the protocols for basic data exchange.
They create connections. A connection is a bi-directional channel between two programs.
The programs linked by a connection may be running on the same computer but are more commonly
located on different computers connected to the internet. In simpler terms, you can think of opening
a connection like making a phone call. The program initiating the call is called the client and the
program receiving the call is called the server. In order for the client to open a connection,
it must first decide which protocol it will use to talk to the server. Next, the client needs to
identify the address of the computer running the server program. As mentioned in the last episode,
a network address is a string of bytes that uniquely identifies a node in the network or more
precisely the interface that a node uses to connect to the network. Besides the computer's
address, the client also needs some way of identifying the specific program on the remote computer
that it wants to talk to. After all, a computer may be running multiple programs that are all using
networking. The most commonly used network protocols use another kind of address specific to programs
on the computer. These special program addresses are usually called ports. Like a network address,
a port is just a number but it identifies a specific program or service on a given computer.
Once the client knows the address and port of a server program, the client also allocates its own
local port on the client's computer so that the server knows how to send its own data back to the
client. Once it has all of this, it's ready to start sending packets, formatted according to the
selected protocol, to initiate the connection with the server.
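To make that concrete, here's a minimal sketch using the stock nc and ss utilities, nothing ONICS-specific, and port 9000 is just an arbitrary choice: a listening "server", a "client" connecting to it, and a look at the ephemeral local port the client was allocated.

    # terminal 1: a trivial "server" listening on TCP port 9000
    # (traditional netcat variants may need: nc -l -p 9000)
    nc -l 9000

    # terminal 2: a "client" opening a connection to that server
    nc 127.0.0.1 9000

    # terminal 3: show established TCP connections touching port 9000;
    # the client end displays its automatically allocated local port
    ss -tn state established '( dport = :9000 or sport = :9000 )'
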
Now, there are multiple protocols out there that run on top of the Internet Protocol that
establish connections. The most well-known is TCP, but there are others like SCTP, DCCP, and so on. There are also protocols like UDP that
don't specifically establish connections but nevertheless often use connection-like mechanisms
such as ports for communication. Even though so-called datagram protocols like UDP don't have
connections per se, programs using UDP still often work by having a client send a request to a server
listening on a port and the server responding back with its own data to the client's port.
And some folks in the networking world would call that request response transaction a connection
even though the programs didn't actually set up any long-term association between each other.
Okay, so those are connections and ports. Now, let's talk about the concept of flows.
A flow is a set of network traffic that should all be treated in a common way.
A flow can entail a large set of traffic between many computers and many programs or
it can just contain the traffic traveling from a program on one machine to a program on another machine.
Sometimes a flow defined at that fine level of granularity is called a microflow.
Note that the direction may actually matter. If we are saying that a particular flow is just the
data sent from one program to another program, then a connection actually may be made up of two
microflows, one for traffic from client to server and another for traffic from server to client.
I guess the most important thing to remember about the term flow in networking is that it can
encompass whatever a network administrator or a program wants it to mean.
The key idea is that it's a way of grouping data together to treat it in the same way.
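As a rough illustration of how arbitrary a flow can be, a packet filter expression is one common way to carve one out: every packet matching the filter belongs to the flow. A sketch with tcpdump, using made-up addresses:

    # one coarse "flow": all traffic between this host and 192.0.2.10 on TCP port 443
    tcpdump -n host 192.0.2.10 and tcp port 443

    # a one-direction microflow: client to server only
    tcpdump -n src 192.0.2.10 and dst 198.51.100.7 and tcp dst port 443
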
Okay, so much for terminology. Now, let's bring all of this back to the little network-monitoring
exercise that I was doing for Dave. So I started with a simple script that goes like this:
it runs the command pktin eth0, piped to nftrk -d.
Now, what that command does is read in packets from the network and pass them to the network flow
tracker program, which is nftrk. With the -d option, this program drops all incoming packets,
but prints out events occurring on network flows. In the case of nftrk, it treats each individual
connection as its own flow. It also treats unanswered streams of packets from a given source as a flow.
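In shell form, that starting point is just:

    # read packets from eth0 and hand them to the ONICS network flow tracker;
    # -d drops the packets themselves and prints flow events instead
    pktin eth0 | nftrk -d
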
Now, the events that nftrk prints look something like this: a connection from my laptop to Google
just started; or a connection from my smartphone to Facebook just ended and it used 30 megabytes of
bandwidth total; or a connection from my workstation to my network-attached storage is still
running, it's been running for 30 minutes and it's used 50 megabytes of bandwidth. So we get flow
starts, flow ends, and updates on the status of existing flows. Each flow event from nftrk
goes on its own line, which is handy when you are working with command-line text-manipulation
tools on Linux, or Unix in general. So I thought that this would be a good way, instead of
looking at individual packets and going, oh, I wonder what this packet is, and I wonder what that
packet is. I thought this might be more useful to some interested networking explorer
than the raw stream of packets, because it gives a little higher-level sense of what's going on in the
network. And I should have probably stopped there, but I thought to myself, okay, that's cute and
all, but it isn't pretty to look at, because you just get this continuous stream of events,
many of them pertaining to the same flow. So I added a bit more shell scripting. I took the same
command and redirected its output to a file. And then I used the
sort command to sort the events by the combination of protocol, network
addresses, and ports that were used in the connection. Sometimes you will hear networking folks refer
to this as examining the connection's five-tuple. The five elements of the tuple are the network
protocol, the client address, the server address, the client port, and the server port.
And those uniquely identify a given connection. Okay, so if I've sorted all of the events based on
the five-tuple, then all of the events for a given connection show up on adjacent lines in the file.
So now, with a small little awk script, we can take this set of events and compress it down to
one event per connection. And what we really want is just the last event seen for any given
connection, because it has all the information from all the previous events. Okay, so we took
the set of events, we sorted it by the five-tuple, and we squashed all of the events down so
that we see only the last event for each connection. Instead of using awk for finding the unique
events, I wanted to use the uniq command, since that's a nice little command-line tool,
but it only works on whole lines, and I needed it to work on only part of the line, the part that
had the five-tuple. I thought about using the -u option to sort, which does much the same thing,
but I could only get that to save the first event that would show up
instead of the last. So I ended up with a little awk magic instead. Okay, so what comes
out of this little script is one line of text per connection. Sounds good.
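A sketch of that sort-and-squash step; the assumption that the five-tuple occupies the first five whitespace-separated fields is mine, not the actual nftrk output format:

    # group each connection's events together, then keep only the last
    # event seen for each five-tuple key
    sort -k1,5 flow_events.txt |
        awk '{ key = $1 FS $2 FS $3 FS $4 FS $5; last[key] = $0 }
             END { for (k in last) print last[k] }'
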
But the information is still pretty dense, and it looks like a pile of numbers. So,
okay, I can use another little awk script to start pretty-printing the data and converting the
numbers to common names. For example, instead of saying that this connection uses IP protocol
6, it would say this connection uses TCP, which is IP protocol number 6, right?
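That kind of conversion is a one-liner in awk; a sketch, again assuming the protocol number is the first field:

    # replace well-known IP protocol numbers with their names
    awk 'BEGIN { name[6] = "TCP"; name[17] = "UDP"; name[132] = "SCTP" }
         { if ($1 in name) $1 = name[$1]; print }'
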
Now the data looks a bit better, but when we think of sites, we don't usually think of them
as random strings of numbers like 200.40.82.157. We think of them in terms of names like hackerpublicradio.org.
Fortunately, there's a little command line utility called dig that can convert addresses to names
or vice versa. So, with a little more scripting, now the pretty printer can change addresses to names
if it can find them. And to make sure that I'm not spamming the network with these
reverse lookups from addresses to names, I added a bit of caching into the script as well.
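The reverse lookup itself is dig -x; the caching might look roughly like this, where the cache file name and fallback behaviour are my own illustration, not the actual script:

    # resolve an address to a name, remembering answers in a small cache file
    lookup() {
        addr=$1
        # (dots in $addr are treated loosely by grep here; fine for a sketch)
        hit=$(grep "^$addr " dns_cache.txt 2>/dev/null | cut -d' ' -f2)
        if [ -n "$hit" ]; then
            echo "$hit"
        else
            name=$(dig +short -x "$addr")
            echo "$addr ${name:-$addr}" >> dns_cache.txt
            echo "${name:-$addr}"
        fi
    }
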
Okay, cool. But hey, instead of just looking at flows that were captured over some period of time,
how about we look at what's happening in real time? Actually, that's not much of a modification.
We can do that. So, all right, first, we'll leave the flow event capture running in the background,
and next we'll make the shell script keep running over the file of flow events in a loop once per second.
But just to make it easier, we'll have the script only look at the last 100 events that it has
seen, using the tail command. Then, after we've sorted and squashed the events by five-tuple
and pretty-printed them, we can use the head command to look at only the first 20,
so it fits in our nice 80-by-24-character terminal. Yeah, I'm kind of old-fashioned that way.
And while we're at it, since every second we're going to get, say, the top 20 flows, let's just clear the
screen using the clear command, so that it looks like we have one continuous list of active
flows that changes every second or so. Let's add one more feature: let's sort the flows by
how much data they've each consumed on the network. To do that, we just have to pipe the flow
output that we have so far through another sort command, and this time we'll base it on the field
indicating the number of bytes the flow has used, and tell it to sort in descending order.
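Put together, the whole thing has roughly this shape; the file name and field positions are illustrative, not the actual topflows.sh from the repository:

    # capture flow events in the background
    pktin eth0 | nftrk -d > flow_events.txt &

    # once a second: take the last 100 events, squash to one line per
    # connection, sort by byte count (assumed here to be field 6), top 20
    while sleep 1; do
        clear
        tail -n 100 flow_events.txt |
            sort -k1,5 |
            awk '{ k = $1 FS $2 FS $3 FS $4 FS $5; last[k] = $0 }
                 END { for (k in last) print last[k] }' |
            sort -rn -k6 |
            head -n 20
    done
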
Okay, so what do we end up with? Well, a small set of scripts that basically build up something
that looks like the top program, but for network connections instead of processes.
I decided to name it topflows.sh. Every second, it shows the top 20 flows that we've captured
and their data usage, and how long they've been running. Okay, I finally decided I could stop there.
Now, a few quick self-criticisms. First, there are already programs that do this sort of thing,
like ntop on Linux or pftop on OpenBSD, and they probably do it better than these scripts.
Second, I probably could have done a better job with the data manipulation in fewer lines of code
using a language like Perl, Python, or Go. But in my defense, this was just a quick proof of
concept done on a Saturday afternoon to see if I could. Also, the OpenBSD firewall that I wanted to
test it on is memory-constrained, and it doesn't have a Perl or Python interpreter or a Go compiler.
Besides, it was fun. Okay, so that was the ride. I clearly need to stop taking up challenges based
on random comments in response to my episodes. Or maybe I don't. If you think not, then
throw out some random comments in an episode or in the comments section, and let's see what happens.
Anyway, I hope you enjoyed hearing a little more about networking and the concepts behind it,
and I hope this podcast will inspire folks to dig a little deeper into the technology we use every day.
You can find all of the code that I mentioned in this podcast on GitLab at
https://gitlab.com/onics/onics-examples. And remember, Onyx is spelled
ONICS: Open Network Inspection Command Suite. And this code is under the topflows subdirectory
in that onics-examples repository. Now, let's just see what this script shows is going on
on our network here. Hey, wait a second. My daughter should be asleep. Why are there connections
from her tablet going to Netflix? Okay, okay, gotta cut this off now. Bye folks, have a good one.
Gabriel Evenfire, signing off.
You've been listening to Hacker Public Radio at Hacker Public Radio dot org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our Contribute link
to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound
and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website,
or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a
Creative Commons Attribution-ShareAlike 3.0 license.