Episode: 1569
Title: HPR1569: Many-to-many data relationship howto
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1569/hpr1569.mp3
Transcribed: 2025-10-18 05:11:15
---
This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.
Hello and welcome to Hacker Public Radio. My name is Mike Ray. In this show I'm going to explain what I think is the best, and possibly the only, way to implement a many-to-many relationship in a database. This has been triggered by some discussion between Ken and Dave about many-to-many relationships in databases, which has been seen and heard on the community podcasts and emails. I believe there was some discussion, some good-natured ribbing between them, about the way in which Ken implemented a many-to-many relationship in some mechanism to do with the website.
One of the first tasks in designing a database is the identification of the entities which will be contained and managed by that database. More often than not an entity will have its own table, so there will be a one-to-one relationship between an entity and a table in the database. Exceptions to this rule will be where the software demands otherwise. Sometimes, for example, there are e-commerce systems which are designed using object-oriented technology and demand a greater degree of granularity in the data, and that might give rise to a situation where a single entity could occupy more than one table.
What do I mean by entity? If you can imagine designing a database for an e-commerce system or a sales ledger, some kind of mechanism by which you're selling products either via a website or by more traditional means, your database will probably contain such things as customer data, invoice data, invoice line data (because an invoice can contain more than one entry), products and other related things. Each of these is an entity.
When you've identified the entities in a database, they are often represented on a diagram which contains rectangles, one for each entity, each containing a label. For example, there will be a rectangle with the word customer inside it, which represents the customer entity, and perhaps another box which represents the invoice entity. Whatever entities exist in the database will be joined together by lines which join the boxes. These lines indicate the nature of the join between the tables.
For example, a single customer can obviously make more than one visit to a website or a shop and make more than one purchase. One customer could give rise to many invoices. The relationship between a customer entity and an invoice entity is a one-to-many relationship, and that is represented on the diagram by a line drawn between customer and invoice. At the invoice end, which is the many end, the line splits into what is generally known as a crow's foot, which is a little three-pronged shape just like the foot of a crow or chicken, and that touches the box which contains the entity of which there are many.
In fact there could be just one row in the invoice table, because of course a customer at some point will make his first purchase, so a one-to-many could be more correctly described as a one-to-one-or-more relationship. You could also describe a one-to-many relationship as a many-to-one if you're looking from the other end. If you're standing, if you like, at the invoice and looking towards the customer, then what you have is a many-to-one: a collection of invoices can have, or will have, the same customer ID.
If you find that in your database you have a one-to-one relationship, then there is probably an error with your data analysis.
The rules of database normalization state that all of the entries, all of the columns, in a single row in a database relate to the primary key, the primary identifier of that entity. So a customer obviously has a name, but the name is often not unique, so the customer is issued with a customer ID, and then the customer's name, gender, age and any number of other things related to the customer will be stored against, or keyed by, that index. If you have another table which is in a one-to-one relationship with customer, its contents probably belong in the customer table, and you should combine the two tables into one.
So what about many-to-many relationships? Well, these are far less common than a one-to-many relationship. One example, and the actual example I use here, is in the implementation of a music database. In this case the database contains two entities, an artist and a genre. Now obviously there are going to be multiple artists and multiple genres, and it's conceivable that a single artist may appear in more than one genre, and it's absolutely certain that a single genre will have more than one artist. So this gives rise to a many-to-many relationship. But in defining a many-to-many relationship in the database we do not, and cannot, simply join the artist and the genre table and have a crow's foot at both ends of the line which joins the tables. The reason for this should become clear when I explain about foreign keys and trying very hard not to violate the rules of data normalization.
A foreign key is a column contained in what is often called the child in the relationship. So again, in the case of a customer and an invoice, the customer can be thought of as the parent entity and the invoice as a child entity. An invoice row will contain a column which corresponds to the customer ID. This is indexed, not uniquely indexed but indexed, and contains the customer ID of the customer making the purchase which is recorded by the invoice.
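As a rough illustration of the parent and child arrangement just described, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from the show notes; the table and column names (customer, invoice, customer_id, total) and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: names are illustrative, not from the show notes.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    -- Foreign key: the child row points back at its parent customer.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
-- Indexed, but not uniquely: one customer may have many invoices.
CREATE INDEX invoice_customer_idx ON invoice (customer_id);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Example Customer')")
conn.executemany(
    "INSERT INTO invoice (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (1, 25.5)],
)

# One customer, many invoices.
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE customer_id = 1"
).fetchone()[0]

# An invoice for a customer that does not exist should be refused.
try:
    conn.execute("INSERT INTO invoice (customer_id, total) VALUES (99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(invoice_count, orphan_rejected)
```

The non-unique index on the foreign key column is what allows many invoice rows to share one customer ID while still being quick to look up.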
The customer ID column in the invoice table is known as a foreign key to the customer table. Now if we were to try to join two tables together with a crow's foot at both ends of the line, and join artist and genre in a many-to-many, then this would suggest that the genre table will contain a foreign key for the artist ID, and the artist table will contain a column which is a foreign key into the genre table. This is nonsense and it's a mess, because it absolutely busts wide open the rules about data normalization. Now the artist table contains a column, a data item, which does not relate to the artist and doesn't identify the artist. It identifies a genre to which the artist can belong, but it does not identify the artist. Creating this kind of circular relationship just breaks the rules of data normalization. It's hard to explain, but it does.
So how do we represent a many-to-many relationship in a database? Well, if you can visualize for a minute the two tables, artist and genre, joined together by a straight line with a crow's foot at each end. Now cut that line in the middle and plop another table there, and then spin the two halves of the line through 180 degrees, so now you've got the crow's feet at the central table. The table in the middle is called a cross-reference table, and I always suffix these tables with xref, short for cross reference. So in a worked example there would be an artist table, a genre table and an artist genre xref table. The artist table now has a one-to-many relationship with the artist genre xref table, and the genre table has a one-to-many relationship with the artist genre xref table.
What does the artist genre xref table contain? It simply contains two columns, an artist ID and a genre ID, and these two columns are not permitted to contain null, so they're defined with not null constraints in the data definition.
And there is a single index, which is a compound unique index using both of those columns. So you would create an index, give it a unique constraint and use both of the columns in the creation of that index. What that does is ensure that there can only be a single row for any given pairing of an artist and a genre.
Now this is all getting a bit esoteric and a bit abstract and confusing, so in the show notes you will find all of the code which implements this real-world example of artist and genre in a partial music database system, implementing a many-to-many relationship. What I have used as an RDBMS in this case is SQLite, that's S-Q-L-I-T-E, which is the world's most widely used RDBMS. That's a very grand claim, but it's a claim which is perfectly justified, because if you have a smartphone, be it a BlackBerry, an Android phone or an iDevice, in your pocket, or if you have a satellite TV receiver or a TiVo at home, then chances are each of those devices contains an embedded operating system, be it some flavor of Linux or whatever the Apple operating system is, or Windows CE or whatever Windows phones now contain, and they will probably use SQLite. In fact they almost definitely do. So that's why SQLite can and does live up to the claim that it's the most widely used RDBMS in the world.
Now SQLite is very easy to install on Linux. In fact a lot of packages which are installed on a Linux platform will already make use of SQLite, so you probably have the SQLite libraries and development libraries, but you may need to install the interactive prompt. On Arch Linux it's called sqlite3. I can't remember exactly what it is on Debian or Ubuntu, but it's in the show notes anyway. So once you've installed sqlite3 you then have an interactive prompt which you can enter, create the database and merrily create and populate tables and run queries.
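The three-table arrangement described above, with its two not-null columns and single compound unique index, might be sketched as follows. This is not the code from the show notes: it drives SQLite through Python's built-in sqlite3 module rather than the interactive prompt, and the identifiers (artist_genre_xref, artist_id, genre_id, the index name) are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Assumed names; the show notes may spell these differently.
conn.executescript("""
CREATE TABLE artist (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE genre (
    genre_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- The cross-reference table sits between artist and genre.
-- It holds nothing but the two foreign keys, neither of which may be null.
CREATE TABLE artist_genre_xref (
    artist_id INTEGER NOT NULL REFERENCES artist(artist_id),
    genre_id  INTEGER NOT NULL REFERENCES genre(genre_id)
);
-- Compound unique index: each (artist, genre) pairing may appear once only.
CREATE UNIQUE INDEX artist_genre_xref_idx
    ON artist_genre_xref (artist_id, genre_id);
""")

conn.execute("INSERT INTO artist VALUES (1, 'Horslips')")
conn.executemany("INSERT INTO genre VALUES (?, ?)", [(1, "Folk"), (2, "Rock")])

# One artist in two genres: two rows in the cross-reference table.
conn.executemany("INSERT INTO artist_genre_xref VALUES (?, ?)", [(1, 1), (1, 2)])

# Repeating an existing pairing violates the unique index.
try:
    conn.execute("INSERT INTO artist_genre_xref VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM artist_genre_xref").fetchone()[0]
print(duplicate_rejected, pair_count)
```

Neither the artist table nor the genre table carries a column belonging to the other entity, which is the point of putting the cross-reference table in the middle.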
The show notes contain all of the files which I created to simulate the many-to-many relationship: data definition language, CSV (comma-separated values) data to load into the three tables that we're going to create, load scripts, SQL scripts to select data to demonstrate how to select artists in given genres or supersets of genres, and scripts for dumping those result sets to other CSV files. So have a good look at the show notes; they're very, very comprehensive. As I say, all of the files that I used in generating the example are there.
In each case each file is topped and tailed with the string --snip--. This is because in a SQLite SQL command or query file two dashes begin a line comment, so --snip-- will not affect the contents of the file. The only case where you do need to take out the --snip-- lines will be in the CSV data files, because of course you need the data to be there on its own.
I've included a couple of artists which appear in more than one genre: an Irish folk rock band called Horslips, which crosses, from the description, into both folk and rock, and a Scottish band called Runrig that also belongs in folk and rock. There are some artists there which belong only in rock, a couple which belong only in folk and a couple of classical artists just for good measure. And there are all of the mechanisms that you'll need to test or demonstrate the workings of a many-to-many relationship.
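To give a flavour of the kind of selection described here, the sketch below loads a handful of rows and pulls out the artists in one genre and the artists that appear in every one of a set of genres. It is an independent sketch, not the show-notes scripts. Horslips and Runrig come from the episode; the other artist names are placeholders, and the schema names and the two helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genre  (genre_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE artist_genre_xref (
    artist_id INTEGER NOT NULL REFERENCES artist(artist_id),
    genre_id  INTEGER NOT NULL REFERENCES genre(genre_id)
);
CREATE UNIQUE INDEX artist_genre_xref_idx
    ON artist_genre_xref (artist_id, genre_id);
""")

# Horslips and Runrig are from the episode; the rest are placeholder names.
conn.executemany("INSERT INTO artist VALUES (?, ?)", [
    (1, "Horslips"), (2, "Runrig"), (3, "Placeholder Rock Band"),
    (4, "Placeholder Folk Singer"), (5, "Placeholder Orchestra"),
])
conn.executemany("INSERT INTO genre VALUES (?, ?)",
                 [(1, "Folk"), (2, "Rock"), (3, "Classical")])
conn.executemany("INSERT INTO artist_genre_xref VALUES (?, ?)", [
    (1, 1), (1, 2),   # Horslips: folk and rock
    (2, 1), (2, 2),   # Runrig: folk and rock
    (3, 2), (4, 1), (5, 3),
])

def artists_in_genre(genre_name):
    """Names of artists linked to one genre, via the cross-reference table."""
    rows = conn.execute("""
        SELECT a.name
        FROM artist a
        JOIN artist_genre_xref x ON x.artist_id = a.artist_id
        JOIN genre g ON g.genre_id = x.genre_id
        WHERE g.name = ?
        ORDER BY a.name
    """, (genre_name,))
    return [row[0] for row in rows]

def artists_in_all_genres(genre_names):
    """Names of artists linked to every genre listed (names must be distinct)."""
    marks = ",".join("?" for _ in genre_names)
    rows = conn.execute(f"""
        SELECT a.name
        FROM artist a
        JOIN artist_genre_xref x ON x.artist_id = a.artist_id
        JOIN genre g ON g.genre_id = x.genre_id
        WHERE g.name IN ({marks})
        GROUP BY a.artist_id, a.name
        HAVING COUNT(DISTINCT g.genre_id) = ?
        ORDER BY a.name
    """, (*genre_names, len(genre_names)))
    return [row[0] for row in rows]

folk = artists_in_genre("Folk")
folk_and_rock = artists_in_all_genres(["Folk", "Rock"])
print(folk)            # ['Horslips', 'Placeholder Folk Singer', 'Runrig']
print(folk_and_rock)   # ['Horslips', 'Runrig']
```

Both queries do their filtering inside the database, so only the matching rows come back to the caller.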
But to emphasise again, I've been involved in the design and implementation of some quite large client-server Oracle and MySQL database systems, and it's been my task to implement many-to-many relationships on several occasions. I have never found a better solution than this, a better solution that does not compromise normalisation or lead to an even worse situation, which is where you need to drag big record sets back across the network, probably containing records that you will ultimately drop. That's wasteful in bandwidth and it's wasteful in client-side processing cycles. And as for doing things like putting comma-separated lists of fields into a single column in a database: oh no, no, no, no, no, don't do it, it's bad. You know who you are.
So there we are, I've sort of concluded. It's probably been a difficult explanation to follow, but look at the show notes, because the show notes are very, very comprehensive and, as I said, they contain a complete worked example implemented with SQLite 3, which despite being small and compact and fast is an exceptionally powerful RDBMS for its size. I don't think it's even GPL; it's just free, it's in the public domain, it's free as a bird. It's a very, very realistic tool.
Don't use it if you intend to write a relational database system which is going to have multiple concurrent users. If you're going to start an airline and have booking clerks all around the world who need to access the database at the same time, SQLite ain't going to do it. SQLite really is more suited to single-user embedded systems, or small one-off databases embedded inside a software program for use only by the programmer, or only by the person who is using the software at the time.
In fact there's one really good feature of SQLite, which is the ability to create a database in memory. That's a database that does not exist on disk.
If you name the database :memory: (colon, memory, colon), it exists only in memory, and then it's possible to define tables and columns and relationships, perhaps into which to load the contents of an XML file or a bunch of XML files or JSON files or whatever, in order to make blisteringly fast queries of configuration data which, if it's changed in any way, can then be dumped back out to an XML file at a later time.
I think that's it. Long and long-winded, boring, technical, and difficult to grasp just by listening to what I'm saying. It's really one of those things where you're going to need to look at a diagram, or peruse at the very least a textual diagram, in order to be able to understand exactly what these relationships are and exactly how the mechanism works. But trust me, this is, as far as my experience tells me, the only way in which to implement a many-to-many relationship in a relational database system.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution ShareAlike 3.0 license.