220 lines
14 KiB
Plaintext
220 lines
14 KiB
Plaintext
|
|
Episode: 1569
|
||
|
|
Title: HPR1569: Many-to-many data relationship howto
|
||
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1569/hpr1569.mp3
|
||
|
|
Transcribed: 2025-10-18 05:11:15
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
This episode of HBR is brought to you by AnanasThost.com.
|
||
|
|
Get 15% discount on all shared hosting with the offer code HBR15.
|
||
|
|
That's HBR15.
|
||
|
|
Better web hosting that's Honest and Fair at AnanasThost.com.
|
||
|
|
Hello and welcome to Hacker Public Radio.
|
||
|
|
My name is Mike Ray.
|
||
|
|
In this show I'm going to explain the best
|
||
|
|
why think is the best and possibly the only way
|
||
|
|
to implement a many-to-many relationship
|
||
|
|
in a database.
|
||
|
|
This has been triggered by some discussion between Ken and Dave
|
||
|
|
about many-to-many relationships in databases which
|
||
|
|
has been seen and heard on community podcasts and emails.
|
||
|
|
I believe there was some discussion, some good nature
|
||
|
|
ribbing between them about the way in which Ken
|
||
|
|
implemented a many-to-many relationship in some mechanism
|
||
|
|
to do with the website.
|
||
|
|
One of the first tasks in designing a database
|
||
|
|
is the identification of the entities which will be contained
|
||
|
|
and managed by that database.
|
||
|
|
And more often than not an entity will have its own table.
|
||
|
|
There will be a one-to-one relationship between an entity
|
||
|
|
and a table in a database.
|
||
|
|
Exceptions to this rule will be where the software demands otherwise.
|
||
|
|
Sometimes for example there are e-commerce systems
|
||
|
|
which are designed using object oriented technology and
|
||
|
|
they demand a greater degree of granularity in the data
|
||
|
|
and that might give rise to a situation where a single entity
|
||
|
|
could occupy more than one table.
|
||
|
|
What do I mean by entity?
|
||
|
|
If you can imagine designing a database for an e-commerce
|
||
|
|
or a sales ledger, some kind of mechanism by which you're selling
|
||
|
|
products either via a website or by more traditional means
|
||
|
|
your database will probably contain such things as
|
||
|
|
customer data, invoice data,
|
||
|
|
invoice line data because an invoice can contain more than one entry.
|
||
|
|
Products and other related things, each of these is an entity.
|
||
|
|
When you've identified the entities in a database
|
||
|
|
they are often represented on a diagram by a diagram which contains
|
||
|
|
rectangles, one for each entity, they contain a label.
|
||
|
|
For example there will be a rectangle with the word customer
|
||
|
|
inside it which represents the customer entity.
|
||
|
|
Perhaps another box which represents the invoice entity
|
||
|
|
and whatever entities exist in a database will be joined together
|
||
|
|
by lines which join the boxes.
|
||
|
|
These lines indicate the nature of the join between the tables.
|
||
|
|
For example a customer, a single customer can obviously make more than one
|
||
|
|
visit to a website or a shop and make more than one purchase.
|
||
|
|
One customer could give rise to many invoices.
|
||
|
|
The relationship between a customer entity and an invoice entity is a one to many
|
||
|
|
relationship and that is represented on the diagram by
|
||
|
|
line drawn between customer and invoice and at the invoice end which is the
|
||
|
|
many end. The line splits into what is generally known as a
|
||
|
|
crow's foot which is a little three-pronged
|
||
|
|
shape just like the foot of a crow or chicken
|
||
|
|
and that touches the box which contains the entity
|
||
|
|
of which there are many. In fact there could be just one row in the invoice
|
||
|
|
table because of course a customer at some point will make his first
|
||
|
|
purchase so a one to many could be more
|
||
|
|
correctly described as a one to one or more
|
||
|
|
relationship. You could also describe a one to many
|
||
|
|
relationship as a many to one if you're looking from the other end if you're
|
||
|
|
standing if you like at the invoice and looking towards the customer then
|
||
|
|
what you have is a many to one many invoices
|
||
|
|
a collection of invoices can have or will have the same
|
||
|
|
customer ID. If you find that in your database you have
|
||
|
|
a one to one relationship then there is probably an error
|
||
|
|
with your data analysis. The rules of database normalization
|
||
|
|
state that all of the entries all of the columns
|
||
|
|
in a single row in a database relate to the primary key
|
||
|
|
the primary identifier of that entity so a customer
|
||
|
|
obviously has a name but the name is often not unique so the customer is or is
|
||
|
|
issued with a customer ID and then the customer name
|
||
|
|
customers gender age. Not a number of other things related to the customer will
|
||
|
|
be stored against or keyed by that index and if you have another table which
|
||
|
|
is a one to one relationship with a customer it probably belongs
|
||
|
|
in the customer table and combine two tables into one.
|
||
|
|
So what about many to many relationships? Well these are not
|
||
|
|
as common far less common than a one to many relationship
|
||
|
|
one example and an actually example I use here is in the implementation of
|
||
|
|
a music database. In this case the database contains
|
||
|
|
two entities an artist and a genre. Now it's
|
||
|
|
obviously there's going to be multiple artists and multiple genres
|
||
|
|
and it's conceivable that a single artist may appear
|
||
|
|
in more than one genre and it's absolutely certain
|
||
|
|
that a single genre will probably have more than one artist so this gives rise
|
||
|
|
to many to many relationship but in defining a many to many relationship
|
||
|
|
in the database we do not and cannot simply join the artist
|
||
|
|
and the genre table and have a crow's foot at both ends of the line which
|
||
|
|
joins the tables and the reason for this should become clear
|
||
|
|
when I explain about foreign keys and
|
||
|
|
trying very hard not to violate the rules of data normalization.
|
||
|
|
A foreign key is a column contained in what is often called the child
|
||
|
|
in the relationship so in this in the case again for a customer
|
||
|
|
and the invoice the customer it can be sort of as the parent
|
||
|
|
entity and the invoice as a child entity
|
||
|
|
and an invoice row will contain a column which corresponds to the
|
||
|
|
customer ID and this is indexed not uniquely indexed but indexed
|
||
|
|
and contains the customer ID of the customer making the purchase
|
||
|
|
which is recorded by the invoice. This column the custom ID column in the
|
||
|
|
invoice table is known as a foreign key to the customer
|
||
|
|
table. Now if we were to try to join two
|
||
|
|
tables together and have a crow's foot at both ends of the line
|
||
|
|
and join artist and genre in a many to many then
|
||
|
|
this would suggest that the genre table will contain a foreign key
|
||
|
|
for the artist ID and the artist ID and the artist
|
||
|
|
table will contain a column which is a foreign key
|
||
|
|
into the genre table. This is nonsense and it's a mess
|
||
|
|
because it absolutely busts wide open the rules about data normalization.
|
||
|
|
Because now the artist entity the artist table contains a column
|
||
|
|
a data item which does not relate to the artist
|
||
|
|
doesn't identify the artist okay identifies a genre to which the artist can
|
||
|
|
belong but it does not identify the artist
|
||
|
|
and that breaks kind of creating this kind of circular
|
||
|
|
relationship it just breaks the rules of data
|
||
|
|
normalization so hard to explain but it does so how do we
|
||
|
|
generate or represent a many to many relationship in a database.
|
||
|
|
Well if you can visualize for a minute the two tables
|
||
|
|
artist and genre joined together by a straight line
|
||
|
|
with a crow's foot at each end now cut that line in the middle
|
||
|
|
and plop another table there and then spin the two halves of the lines
|
||
|
|
through 180 degrees so now you've got
|
||
|
|
custom at the central table the table in the middle is called a cross
|
||
|
|
reference table and I always suffix these tables with
|
||
|
|
x ref shortfall cross reference so in a worked example
|
||
|
|
there would be an artist table a genre table
|
||
|
|
and an artist extra artist genre x ref table
|
||
|
|
so the artist now has a one-to-many relationship
|
||
|
|
with the artist genre x ref table
|
||
|
|
and the genre table has a one-to-many relationship
|
||
|
|
with the artist genre x ref table
|
||
|
|
what does the artist genre x ref table contain
|
||
|
|
It simply contains two columns, an artist ID and a genre ID, and these two columns are not
|
||
|
|
permitted to contain null, so they're defined as not null constraints in the data definition.
|
||
|
|
And there is a single index which is a compound unique index using both of those columns,
|
||
|
|
so you would create an index, give it a unique constraint and use both of the columns
|
||
|
|
in the creation of that index, and what that does is to ensure that there can only be a single row
|
||
|
|
where an artist and a genre are contained. Now this is all getting a bit
|
||
|
|
esoteric and a bit abstract and confusing, so in the show notes you will find all of the code
|
||
|
|
which implement this real world example of artist and genre in a partial music database system,
|
||
|
|
implementing many to many to many to many relationship, and what I have used as an RDBMS
|
||
|
|
in this case is SQLite, that's SQL ITE, which is the world's widest most used RDBMS.
|
||
|
|
That's a very grand claim, but it's a claim which is perfectly justified because if you have a
|
||
|
|
smartphone, not a Blackberry Android phone or an iDevice in your pocket, or if you have
|
||
|
|
a satellite TV receiver or a TV over at home, then chances are each of those
|
||
|
|
devices containing an embedded operating system, big at some flavor of Linux or
|
||
|
|
whatever the Apple operating system is or Windows CE or whatever Windows phones now contain,
|
||
|
|
they will probably use SQLite, in fact they almost definitely do, so that's why SQLite can
|
||
|
|
and does live up to the claim that it's the most widely used RDBMS in the world.
|
||
|
|
Now SQLite is very easy to install on Linux. In fact a lot of packages are contained or installed
|
||
|
|
on a Linux platform will already make use of SQLite, so you probably have SQLite libraries and
|
||
|
|
development libraries, but you may need to install the interactive prompt. On Arch Linux it's called
|
||
|
|
SQLite 3. I can't remember exactly what it is on Debian or Ubuntu, but it's in the show notes anyway.
|
||
|
|
So once you've installed SQLite 3 you then have an interactive prompt which you can enter,
|
||
|
|
create the database and merely create and populate tables and run queries.
|
||
|
|
The show notes contain all of the files which I created to simulate the many-to-money
|
||
|
|
relationship, data definition language, CSV comma separated values data to load into the three
|
||
|
|
tables that we're going to create, load scripts, SQL scripts to select data to demonstrate how to select
|
||
|
|
artists in given genres or supersets of genres and scripts for dumping those results sets to
|
||
|
|
other CSV files. So I have a good look at the show notes, it's very, very comprehensive.
|
||
|
|
As I say all of the files that I used in generating the example are there.
|
||
|
|
In each case each file is topped and tailed with the string dash, dash,
|
||
|
|
snip dash, dash. This is because in a SQLite SQL command or query file,
|
||
|
|
two dashes begins a line comment, so dash, dash, dash will not affect the contents of the file.
|
||
|
|
The only example where you don't need to take out the dash, dash, dash, dash will be in the
|
||
|
|
CSV data files because of course you need the data to be there on its own.
|
||
|
|
I've included a couple of artists which appear in more than one genre.
|
||
|
|
An Irish folk rock band called Horselips which crosses from the description,
|
||
|
|
both into folk and rock, a Scottish band called Runrig that also belong in folk and rock.
|
||
|
|
There are some artists there which belong only in rock, a couple which belong only in folk
|
||
|
|
and a couple of classical artists just for good measure.
|
||
|
|
And all of the mechanisms that you'll need to test or demonstrate the workings of a many to many
|
||
|
|
relationships. But to emphasise again, I've written or I've been involved in the design and
|
||
|
|
implementation of some quite large client server, Oracle and MySQL database systems and it's been
|
||
|
|
my task to implement many to many relationships on several occasions and I have never found a better
|
||
|
|
solution than this, a better solution that does not compromise normalisation or lead to an
|
||
|
|
even worse situation which is where you need to drag big record sets back across the network,
|
||
|
|
probably containing recorders that you will ultimately drop because that's wasteful in bandwidth
|
||
|
|
and it's wasteful in client side processing cycles and as for doing things like
|
||
|
|
putting comma separated lists of fields into a single column in a database.
|
||
|
|
Oh no no no no no don't do it, it's bad, you know who you are.
|
||
|
|
So there we are, I sort of concluded it's been a probably a difficult explanation to follow but
|
||
|
|
look at the show notes because the show notes are very very comprehensive and as I said they contain
|
||
|
|
a complete worked example implemented with SQLite3 which despite being small and compact and fast
|
||
|
|
is an exceptionally powerful RDBMS for its size and for its
|
||
|
|
it's not even, I don't think it's even GPL, it's just free, it's just in the public domain, it's
|
||
|
|
free as a bird, it's a very realistic, very realistic till don't use it if you intend to
|
||
|
|
write a relational database system which is going to have multiple concurrent users if you're
|
||
|
|
going to start an airline and have booking clerks all around the world who need to access the
|
||
|
|
database at the same time. SQLite ain't going to do it, SQLite really is more suited
|
||
|
|
to single user embedded systems or small one-off programs embedded inside a software program
|
||
|
|
for use by only the programmer or by only by use for use only by the person who is using the
|
||
|
|
software at the time. In fact there's one really good feature of SQLite which is the ability to
|
||
|
|
create a database in memory. That's a database that does not exist on disk. If you name the database
|
||
|
|
colon memory, colon, it exists only in memory and then it's possible to define tables and columns
|
||
|
|
and relationships, perhaps into which to load the contents of an XML file or a bunch of XML files
|
||
|
|
or JSON files or whatever in order to make blisteringly fast queries of configuration data
|
||
|
|
which if it's changed in any way can then be dumped back out to an XML file at a later time.
|
||
|
|
I think that's it. Long and long-winded boring technical difficult to grasp just by listening to
|
||
|
|
what I'm saying. It's really one of those things where you're going to need to look at a diagram or
|
||
|
|
Peru is at the very least a textual diagram in order to be able to understand exactly what
|
||
|
|
these relationships are and exactly how the mechanism works but trust me this is as far as I can
|
||
|
|
as far as my experience tells me the only way in which to implement many to many relationship
|
||
|
|
in a relational database system.
|
||
|
|
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
|
||
|
|
network that releases shows every weekday Monday through Friday. Today's show, like all our shows,
|
||
|
|
was contributed by an HBR listener like yourself. If you ever thought of recording a podcast
|
||
|
|
and click on our contributing to find out how easy it really is. Hacker Public Radio was founded
|
||
|
|
by the Digital Dove Pound and the Infonomicon Computer Club and it's part of the binary revolution
|
||
|
|
at binrev.com. If you have comments on today's show, please email the host directly, leave a comment
|
||
|
|
on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is
|
||
|
|
released on the Creative Commons Attribution ShareLight 3.0 license.
|