Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use - Search episodes, transcripts, hosts, and series - 4,511 episodes with metadata and transcripts - Data loader with in-memory JSON storage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
389
hpr_transcripts/hpr0014.txt
Normal file
389
hpr_transcripts/hpr0014.txt
Normal file
@@ -0,0 +1,389 @@
|
||||
Episode: 14
|
||||
Title: HPR0014: Databases 101 Part 2
|
||||
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0014/hpr0014.mp3
|
||||
Transcribed: 2025-10-07 10:19:26
|
||||
|
||||
---
|
||||
|
||||
In the next video...
|
||||
It's the time
|
||||
Another companionship
|
||||
Hello world and welcome to After Public Radio.
|
||||
I am Stank Doug and on today's episode we are going to continue our in-depth series, our
|
||||
many series on databases, talking about from the ground up how they work, the fundamentals
|
||||
and basics and working our way up to some more advanced topics.
|
||||
This is episode two in the series so if you are just joining us I highly suggest you
|
||||
go back and find episode one.
|
||||
You can find that at hackerpublicradio.org and you can check to see that and a lot of
|
||||
other great episodes on great topics but today we are going to continue with some relational
|
||||
database management system topics and the first episode we did talk about databases in
|
||||
general and touched on the idea of relational databases but we didn't get too much in
|
||||
the terminology.
|
||||
I wanted to take a break and let the first part of that kind of sink in and make sure we
|
||||
are all on the same page and today we are going to talk about specifically our DBMS systems
|
||||
or relational database management systems which is the most common and most prevalent type
|
||||
of database management system out there.
|
||||
The relational database management system is based on something called a relational model.
|
||||
Obviously that is where the name comes from but what is a relational model?
|
||||
One of the things we are going to do in this series is try to break down a lot of this
|
||||
terminology.
|
||||
I do think sometimes it is silly and overkill to make up all these puzzle words and terminology
|
||||
to try to explain something that really can be much simpler.
|
||||
What they mean by a relational database management system and a relational model is simply that
|
||||
your data inside of this management system has relationships between each other.
|
||||
Now if you have just a text file as we talked about in the first episode of this series,
|
||||
if you just have a text file you don't really need a database system.
|
||||
If you just have a list of names and phone numbers or something very basic but as soon
|
||||
you add any level of complexity in there that you think that you are going to need something
|
||||
to maintain large volumes and numbers of data or a large amount of data, that is when
|
||||
you are going to want some sort of database management system and again relational database
|
||||
management system being the most common.
|
||||
The easiest way probably to visualize how data is stored in a relational database management
|
||||
system which we are going to refer to as RDBMS from now on, envision a spreadsheet.
|
||||
Any data stored in a relational or an RDBMS is stored in a two-dimensional array, aka
|
||||
a spreadsheet.
|
||||
Now, not a spreadsheet like Microsoft Office Excel or OpenOffice.org or any of those
|
||||
type things.
|
||||
It is easy to visualize that way but it is not an application.
|
||||
It is a two-dimensional array.
|
||||
That is to say that there are a certain number of rows and a certain number of columns and
|
||||
they make up something called a table.
|
||||
A table is the primary way that data is stored in an RDBMS so you can have a table that
|
||||
is a list of people's name and phone number.
|
||||
For example, again, you probably wouldn't use it if you are using a simple example like
|
||||
that but if you are using something a little bit more complex like say you had an online
|
||||
website that was taking orders.
|
||||
Well, if you think about that, you need a lot of information to take orders.
|
||||
You need people's name and address.
|
||||
You need a lot of their billing information.
|
||||
You need a catalog with what you are selling and the prices of those items, maybe descriptions,
|
||||
pictures, all kinds of data.
|
||||
If we go back to what we talked about in episode one, you could theoretically put all that
|
||||
in one big line in a text file somewhere but as soon as you reach any quantity of records,
|
||||
it is going to get confusing.
|
||||
There is going to be a lot of repetition.
|
||||
If somebody comes in and places five orders, they are going to have five records with the
|
||||
same information all of it.
|
||||
It is a lot of wasted space, inefficient accessing of the data.
|
||||
That is when an RDBMS is going to come in.
|
||||
What you do in an RDBMS is break down your data.
|
||||
There is something called normal form, normalization and what you are going to do in the normalization
|
||||
process is you are going to sit down and analyze the data that you have and you are going
|
||||
to break it down into a series of tables or spreadsheets if you want to think about that
|
||||
anyway.
|
||||
You can even sit down and I do this all the time for a lot of the projects that we work
|
||||
on in Binrib, sit down even on a napkin or a piece of paper or there are plenty of tools,
|
||||
gooey tools that you can use to do this on a computer.
|
||||
Sit down and kind of design and think about all the different data that you are going
|
||||
to have and going to be tracking.
|
||||
You have to do a lot of planning and preparation, it is not something you kind of do off the
|
||||
cup.
|
||||
You actually have to sit down and do some analysis and decide what data you are going to
|
||||
need in your particular system and then how can you break that data up logically and
|
||||
that process is called normalization.
|
||||
Normalization in layman's terms is to break down your data so that there is no repetition
|
||||
that data all works together and cooperates and is reliable and makes it easy to maintain
|
||||
and read.
|
||||
Now there are different types of normalization, different levels I should say of normalization
|
||||
that goes up to fifth and sometimes people say six normal form.
|
||||
We are not going to emphasize that now we might come back to that later on in another episode
|
||||
but for now just think that you want to get your data spread out in such a way that let's
|
||||
go back to the orders example that I used earlier.
|
||||
You don't want one big text file with all of that data stored on there.
|
||||
What you want to do is break that up into smaller logical segments so that it is easier
|
||||
to maintain.
|
||||
In one table or one text file or however you is easier for you to envision it, you want
|
||||
to break that data into one entity.
|
||||
You do not want to put all that in one entity.
|
||||
You want to break that up so that it is in several different tables or entities logically
|
||||
broken down.
|
||||
For example, if you are running an online store, one bit of data that you are going to
|
||||
have to have is your product line and information about it.
|
||||
You are going to have a product name, maybe the size, the color, the price, things like
|
||||
that.
|
||||
Anything that has to do with that product, you probably want to separate that off into
|
||||
its own database or excuse me into its own table inside of your database.
|
||||
That way as you add new products to it, they only get added to this one table.
|
||||
It is easily updated.
|
||||
There is no confusion.
|
||||
It is all in one place and one logical format.
|
||||
If you ever wanted to do a report on a list of all the items that you have for sale, you
|
||||
could do that very easily by just putting the data out of this table.
|
||||
That data really is somewhat irrelevant of the other parts of your system.
|
||||
Another part of your system of course would be the names of people who place orders.
|
||||
Your customer list would probably be a separate entity because the customers are independent
|
||||
of the products.
|
||||
You probably would have a separate table when someone comes to your website and decides
|
||||
to order something, they are going to have to register and give you their name, address,
|
||||
phone number possibly, maybe credit card number could go in there.
|
||||
Again, I am just making this up as we go along.
|
||||
You would have to do an analysis on your own system to decide what best fits for you.
|
||||
Then you might have a third one, a third table for the actual order of some sales.
|
||||
This table one and table two is a list of products and a list of customers respectively,
|
||||
but how do those two go together?
|
||||
This customer may order items one, five, and seven, but this next customer may order
|
||||
items one, two, and four, and how do you make those things work together?
|
||||
Well you probably would have a third table when a customer places an order that ties those
|
||||
two things together.
|
||||
You see that these tables while independent in nature by the nature of their data, customer
|
||||
table, product table, you can see that those are independent by the logical breakdown of
|
||||
their data.
|
||||
You see that they actually do have relationships and there is where that word comes back
|
||||
into play.
|
||||
They do have relationships between those tables and that is where you are also going to sit
|
||||
down in your analysis and decide how you are going to store that data.
|
||||
I guess to put it in a question form to make you think about it, what would you do or
|
||||
how would you find out if customer number four, what items did he order?
|
||||
Well it is not quite so simple as pulling only the ones where that customer's name matches
|
||||
the order.
|
||||
He may have multiple orders.
|
||||
You got to be careful that someone else in there may not have ordered the same thing
|
||||
that you are pulling the right data.
|
||||
So all those relationships are important in the design and layout of your database.
|
||||
I know this is kind of tricky to visualize via audio only on a radio show, but I am
|
||||
trying to break this down into the most simplistic terms that I can.
|
||||
So what you should be envisioning now in front of you are three different spreadsheets in
|
||||
the example I gave you.
|
||||
A three different tables is the proper term, table of customers, table of products, and
|
||||
then a table, third table that ties those together called orders.
|
||||
So when a customer comes to the site, the site is going to pull from the products table
|
||||
and give them a list of products as they page through them.
|
||||
You can imagine Amazon and all the products that they have is pulling out of those products
|
||||
tables and displaying them on the screen.
|
||||
If you register, it's going to create a record for you in the customer's table.
|
||||
And if you ever place an order, it's going to insert a record then into your orders table
|
||||
or their orders table.
|
||||
Now, that's very simplistic because a whole lot going on besides that, but you can see
|
||||
that process.
|
||||
And when you write that order in there, it's going to have your name, or probably some
|
||||
sort of numeric reference to your name to make it a little bit quicker.
|
||||
For ease of thought here, it's going to have your name, and then it's going to have the
|
||||
product name that you bought.
|
||||
Now it doesn't have to have the price, or the color, or the size, or any of that necessarily,
|
||||
depending on how you lay it out of size as an option, you might have to put that in there.
|
||||
But actually, by putting in just the name of the customer and the name of the product,
|
||||
it's tying the data from those two together and it can read those tables to pull the rest
|
||||
of the data.
|
||||
So it knows what color you ordered.
|
||||
It knows what size you ordered because it pulls it from that table.
|
||||
And those are the relationships in a relational database management system.
|
||||
So I've used the word table.
|
||||
Let's break down and analyze what exactly is a table.
|
||||
I described as a spreadsheet earlier.
|
||||
So if you've used Excel or any kind of spreadsheet application, you know across the top, you
|
||||
have several columns.
|
||||
And each one of those columns, I think by default, most spreadsheet applications will
|
||||
say A, B, C, D, E, F, et cetera, et cetera.
|
||||
Well, you have all the columns at the top and on the left, you have rows.
|
||||
And that's just simply how many rows of data that you have.
|
||||
So you may have a list of five customers.
|
||||
If you want to visualize, imagine across the top, what are you going to put in column A?
|
||||
Customer name, for example.
|
||||
And column B might be street address.
|
||||
Column three might be state four, city five, zip code, et cetera, et cetera.
|
||||
So each one of those columns has a different type of data in it.
|
||||
And imagine it, or even if you want to while we're listening to this type then, so you
|
||||
can visualize it.
|
||||
But you see that each one of those columns has a different type of data in it.
|
||||
You don't want to mix in that.
|
||||
You don't want to on your next row.
|
||||
Oh, well, I'll put the address in column A this time.
|
||||
And the name over in column C, logically, who does that when they put in a spreadsheet?
|
||||
You have it all nice and neat, so it's readable and easy to understand the data that's
|
||||
in there.
|
||||
Well, the same holds true when you're putting data into a database.
|
||||
So as a matter of fact, that's one of the benefits of a database to keep everything
|
||||
nice and organized.
|
||||
So you'll have a customer table.
|
||||
And each one of those columns that we refer to in a spreadsheet, the proper name and
|
||||
database terminology for that, those are your field names.
|
||||
You're going to call a unique field, for example, customer name, or C name, or whatever
|
||||
you want to call it.
|
||||
Just plain old name, whatever you call it.
|
||||
As long as it's unique, you can call those fields, whatever you want.
|
||||
You can have as many fields as you deem necessary.
|
||||
Again, the relational database management systems have, there is some theoretical limit, but
|
||||
it's so ridiculous that nobody probably listening to this show is ever going to run into a situation
|
||||
where they don't have enough room or the database management system doesn't allow them to
|
||||
have enough fields.
|
||||
So you're going to create a bunch of fields and you're going to define those field data
|
||||
types.
|
||||
For example, if the field you decide to call it customer name, or customer or underscore
|
||||
name, or something like that, you're going to define it as a data type of what kind of
|
||||
data is alphanumeric, which is, and I'm not going to go into the terms, but there's different
|
||||
ways that you define that.
|
||||
You say, okay, well, this is going to be an alphanumeric field, and it's going to be a
|
||||
length of, I don't know, 50, or however long you think the longest name you might run into.
|
||||
Always error on the side of caution, and there are some database tricks and data types
|
||||
that you can use that are variable length.
|
||||
We'll come back to that in a later episode, but for now, just understand you're going
|
||||
to define the data type.
|
||||
One of the other fields we mentioned is price, for example, well, price is a numeric
|
||||
data type.
|
||||
You're never going to put in a BC for a price, it doesn't make sense.
|
||||
So you probably define that field as a numeric type.
|
||||
So you put all that in there, just like you would in the spreadsheet, but you're going
|
||||
to put these into a table table, you're going to define all those data types and the names
|
||||
of them, and then every time you put a record into that data, a record is equivalent to saying
|
||||
a row in your spreadsheet visualization.
|
||||
So you're thinking of a spreadsheet, and let's say you're manually taking orders over
|
||||
the phone and in a spreadsheet, and in Microsoft Excel, you type in all that information every
|
||||
time someone takes an order, what a nightmare that could be.
|
||||
You type in one row of information about somebody that is a record in database terms.
|
||||
That is one row across filling in all, or most of the fields that you define at the top
|
||||
of the name, the address, the phone number, et cetera.
|
||||
And certain information, of course, is required.
|
||||
You can't process someone's order if they don't give you a credit card number, for example,
|
||||
or if they don't give you a shipping address.
|
||||
So all these things factor in there as well.
|
||||
We're going to come back to some database design, probably in a later episode as well.
|
||||
But right now, you should have, visually, in your mind, several spreadsheets, aka tables,
|
||||
and we're going to, from now on, get in the habit of calling them tables, because that
|
||||
is the proper name.
|
||||
You can visually think a spreadsheet, but there are two dimensional arrays called tables.
|
||||
The tables have all of your fields defined in them, which again is across the top for
|
||||
a visualization aspect.
|
||||
And then every time you have a complete row of data in there, a rows are referred to as
|
||||
records.
|
||||
So this is all your data stored in there.
|
||||
So let's imagine you've been a business for a month, and you have a database filled
|
||||
with at least those three tables I mentioned, a customer's table, a product's table, and
|
||||
an order's table.
|
||||
So as you add new products to your store, all you have to do is add that product into
|
||||
your product's table.
|
||||
As new customers come in, it's going to add a new row to your customer table, and it's
|
||||
going to continually grow, and this is where relational database management system shine
|
||||
is the more data you put in there, they can still read and write that data very efficiently.
|
||||
So that's one of the good benefits of our GVMS.
|
||||
The other thing we probably want to talk about on this show is we mentioned these tables
|
||||
and having relationships together, well, I'm going to leave, this is kind of a very deep
|
||||
topic, but it's critical, it is the most important thing in understanding databases and relational
|
||||
databases is the concept of keys.
|
||||
We talked about these tables having relationships between each other, but how we kind of glossed
|
||||
over that and intentionally so, well, that database has to relate to each other somehow,
|
||||
and it has to be in design in such a way that you can look data up between the two of them.
|
||||
If the data does not have relations between them, if they do not have relationships between
|
||||
them, then again, you might not need a database management system or you might not need these
|
||||
all together in the same database itself.
|
||||
The whole point is to have relationships between that data.
|
||||
So again, we go back to the products table and a customer's table, well, how do you tie
|
||||
those two together?
|
||||
Well each time you insert a table into a database, you want to, you don't have to, but you
|
||||
want to, to take advantage and properly use databases, you want to define some unique
|
||||
identifier for every table.
|
||||
And that's what normalization that we refer to at the beginning, normalization, you don't
|
||||
want to repeat any data anywhere.
|
||||
You want to try to make every piece of data only once in the database because ASSafe space
|
||||
be, you don't want to have to update multiple locations anytime something changes.
|
||||
What if the color of one of your products changes?
|
||||
If you've got it just in your products table, there's only one place that you can change
|
||||
it.
|
||||
But if it's also over here in the order table or in some other descriptions tables, you've
|
||||
got this independent and not tied back to that, you can be updating data in a bunch of
|
||||
different places and you don't have data integrity anymore, you don't have your data maintained,
|
||||
you may forget to change it in a bunch of other places.
|
||||
So what you want to do in these tables is logically break them down in such a way where you
|
||||
define something called a primary key.
|
||||
And a primary key on a, on let's say our customer database, for example, it could be a name
|
||||
and we're going to go with name for right now, a name, some unique way to look up information
|
||||
in that table.
|
||||
In that table, if I've got a table with 500 people in there, I only want to find one particular
|
||||
customer.
|
||||
How do I do that?
|
||||
Well, in the last, in the first episode, we talked about if you had a text file and all
|
||||
that data in there and you have to go searching through it to find one unique name, one person,
|
||||
it's going to be difficult.
|
||||
You don't have to parse through that entire thing and all this extraneous data that you really
|
||||
don't need.
|
||||
Well, in an RDBMS, the data is stored in such a way that you can search much quicker and
|
||||
much more efficiently, you define a primary key, which is indexed so that when you go looking
|
||||
for something, it's going to go right to that data infinitely so, so much faster than
|
||||
writing directly to a text file or some other format.
|
||||
So if you only wanted to look up, I don't know, David Letterman, just see what he's ordered
|
||||
from your site.
|
||||
I don't know why that name popped into my head, but it did.
|
||||
So you go in there and you search for that name.
|
||||
It's a unique ID and the example that we're using here.
|
||||
And you can say, okay, there he is.
|
||||
Here's his address, phone number, so you went straight to him.
|
||||
Believe it or not, that's actually a very bad idea.
|
||||
But if your actual user that you were looking up was John Smith or some common name, you
|
||||
could have multiple John Smiths with different addresses, they're completely different people.
|
||||
So when you define your primary key, you have to be very careful, you have to come up with
|
||||
some unique identifier to be able to tell that.
|
||||
Back to your products table, for example, you might not be able to use a product name
|
||||
as your primary key.
|
||||
But if the name product name was antivirus, well, there's several different antivirus
|
||||
software packages.
|
||||
How are you going to tell one from another?
|
||||
You're going to have to come up with either a longer, more unique name every time, or
|
||||
more than likely what's going to happen and what most places do is they'll use a numeric
|
||||
system, whether it's auto numbers, starting at one, to uniquely identify every product
|
||||
in there.
|
||||
Maybe assign some sort of code number, maybe something that's stored in a third-party
|
||||
database, so AV1 for the first antivirus product, AV2 for the second antivirus product.
|
||||
Over in the customer's table, think of some unique identifier that you could use to uniquely
|
||||
identify a person.
|
||||
The most common one that we all know is probably social security number, and there's a lot
|
||||
of controversy over using that number.
|
||||
It is used very frequently as unique identifier because theoretically, it's unique to every
|
||||
person in the United States.
|
||||
They all have a social security number, but that may not be safe to use, first of all.
|
||||
If you're dealing with international, you might not have a social security number, so that
|
||||
becomes a pitfall.
|
||||
Really, what I'm getting at here is part of the analysis that you're going to do when
|
||||
you lay your tables out is to come up with something unique that you can identify and
|
||||
individual row.
|
||||
You have to be able to do that in our DVMS.
|
||||
You have to somehow come up with a way where you can find one row, one record of data in
|
||||
there.
|
||||
I think that's probably a good place to leave you thinking until we come back to another
|
||||
episode and we'll follow up on this, but how would you break something down?
|
||||
How would you uniquely identify a person?
|
||||
That's probably the question I'm going to leave you with with this episode.
|
||||
How would you uniquely identify with a person in any scenario or in any database?
|
||||
How do you think and use it in reference to the application?
|
||||
How does the United States government uniquely identify a person's social security number?
|
||||
One of the first things that jumps to mind.
|
||||
How does your state uniquely identify you in their database?
|
||||
They could use the social security number, but technically, they should not.
|
||||
What do you think they might use?
|
||||
What about your school or your college or university or even high school?
|
||||
How do you think they uniquely identify you?
|
||||
They again could use the social security number, which is universally unique in the United
|
||||
States, again, theoretically, but they shouldn't be using that either.
|
||||
So what might they use?
|
||||
What's different that could be unique for them to identify one individual person?
|
||||
So that's a good question to leave you with this time.
|
||||
I'll do a quick recap on some of the important terms and things that we brought up this episode.
|
||||
We talked about RGBMS, relational database management system, and how they have relationships.
|
||||
The key word in that is relationships.
|
||||
Your data has to have relationships with the other data in your database.
|
||||
Otherwise, there's no point in having it or no point in using it.
|
||||
If you've just got one list of data, then there's no need in using an RGBMS.
|
||||
But if you do decide that you're going that direction and you are going to need that,
|
||||
then you're going to break your data down into logical clusters, remove repetition, don't
|
||||
use reuse data.
|
||||
Make sure the data exists only in one place as best you can, and this process is called
|
||||
normalization.
|
||||
And there are different levels of normalization that we did not go into that we'll hold
|
||||
for another episode.
|
||||
And the design of your database logically, think ahead of how you're going to uniquely
|
||||
identify all of that data.
|
||||
Sometimes you break down all of your data, but you're still going to have to add a little
|
||||
bit more data to uniquely identify it.
|
||||
In the example we talked about tonight, we said products.
|
||||
Anti-virus may not be a good enough example.
|
||||
Even, honestly, even Norton antivirus might not be good enough because there's different
|
||||
versions of it.
|
||||
So how would you uniquely identify it?
|
||||
Would it be NAV1 for Norton antivirus version 1, NAV2 for version 2, et cetera, et cetera?
|
||||
That's the kind of stuff you need to think about when you're designing your databases.
|
||||
So I'm going to plant that in your head for this week and let you all think about it.
|
||||
The question you should be having your head and think about when we start the next episode
|
||||
in this many series will be how to uniquely identify a person, just as an exercise to
|
||||
get you thinking for the next episode.
|
||||
So thank you for your time this week, and we will see you again on an upcoming episode
|
||||
of Hacker Public Radio.
|
||||
Thank you for listening to Hacker Public Radio, HPR, sponsored by carrow.net, so head on
|
||||
over to C-A-R-O-DOT-E-T for all your hopes for new Hacker Public Radio.
|
||||
Reference in New Issue
Block a user