hpr_transcripts/hpr3469.txt

Episode: 3469
Title: HPR3469: Linux Inlaws S01E43: The Great Battle or not
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3469/hpr3469.mp3
Transcribed: 2025-10-25 00:06:02

---

This is Hacker Public Radio Episode 3469 for Thursday the 18th of November 2021.
Today's show is entitled, Linux Inlaws S01E43: The Great Battle or not and is part of the series,
Linux Inlaws, it is hosted by Monochrome and is about 69 minutes long and carries an explicit flag.
The summary is, The Great Battle or not.
The Great Battle or not.
This is Linux Inlaws, a podcast on topics around free and open-source software, any associated
contraband, communism, the revolution in general and whatever fancies your tickle.
Please note that this and other episodes may contain strong language, offensive humor and
other certainly not politically correct language. You have been warned.
Our parents insisted on this disclaimer.
Happy, Mum?
Thus, the content is not suitable for consumption in the workplace, especially when played back
in an open plan office or similar environments.
Any minors under the age of 35 or any pets including fluffy little killer bunnies,
your trusty guide dog unless on speed, and cute T-Rexes or other associated dinosaurs.
Welcome to Linux Inlaws Season 1 Episode 43, The Great Battle or not.
Martin, how are things?
Oh, very much looking forward to tonight.
Excellent!
We have our dear friend from Redis Days with us.
Speaking of which, yes.
Interesting discussion around.
Indeed, but before we go into the battle mode, maybe David should introduce himself.
David, over to you.
Yeah, I mean, I'm an old friend, right?
Me and Thomas.
We are already kind of part of this podcast as guests.
One of the great old chorus.
Exactly.
Yeah, exactly.
Yeah.
Introducing myself, I guess the reason why Christoph asked me to join this is because I worked
for some database companies in the past.
So, for instance, Ingres as a relational database system vendor.
Then for a small graph database startup, then for Couchbase as a document database vendor.
And now I'm working quite a lot with Redis Labs as the vendor of, let's say,
a real-time data platform, whatever.
And I'm also having a little consultancy company in Germany,
which is called NoSQL Geeks.
Right.
And full disclosure, Redis Labs is the place where we all met initially before Martin
defected to something called.
Martin, you could be.
Oh, sorry.
Yes, you could.
Right, right, right, right.
Yes.
Sorry.
Okay.
Richard, if you're listening, the address is sponsor at linuxinlaws.eu.
Okay, guys.
The idea behind tonight is actually to shed a bit of light on the history of both camps in terms
of where SQL is coming from, where NoSQL is coming from. Full disclosure: Martin,
before joining Redis Labs, was kind of old school and worked for a company called
EnterpriseDB.
But maybe Martin, you want to shed some light on this
stint in your life too.
Well, it's, it's more than it is.
It is.
It is.
Oracle period.
Yes.
Stint.
Yes.
I started working, in fact, after uni days.
So mainly, well, as you mentioned, Oracle.
And then in later years, Postgres and associated friends.
So why did you actually then consider a NoSQL database employer?
Well, there are obviously advantages to.
Lots of different technologies, right?
So, and I think that I had not forgotten for going any conclusions.
I think the conclusion.
Of course, money.
Be either.
There is a database for everything.
And money didn't have anything to do with it or with it, of course.
No, no, we don't get paid for this.
Of course, of course.
Okay, guys, now after the interrupt, maybe each and every one of you could.
Could give a little bit of an historic breakdown of actually where the particular technology
came from.
Let's start with.
Let's start with you Martin.
Where does the SQL come from?
Three feet to go back to.
Sometimes, if you have to.
Okay, so SQL is actually a language, the Structured Query Language.
So.
If you talk about.
SQL versus NoSQL.
I mean, there is also.
There are two different definitions of NoSQL, which probably David will pick up on.
But.
Where do we seem to come from?
Well, there was a.
For those of you who studied science, you're probably familiar with Mike Stonebraker.
He did many things.
One of his.
For instance, indeed.
Yes.
One of his name.
One step at a time.
Gentlemen.
The guns haven't been drawn yet.
So rest assured.
Yeah, so that whole kind of.
Ingres was the origin of many different relational databases at the time.
Or being one of them and so on.
So it.
It all comes from that same.
Same background.
And.
Yeah.
I mean, relational.
They have obviously progressed over the years.
They are not just only relational data stores anymore.
But again, we'll come to that later.
So.
And they are probably the majority of all.
Let's say transactional systems in the world run on a relational database, I would say.
Interesting.
Over to you, David.
I mean, what is the question?
What is NoSQL?
Well, where the NoSQL movement is coming from.
I'm not sure if we can piece this together, but.
Let's say this way, right?
So originally there, there were.
Traditionally already more than relational database systems in the past, right?
So I think even in the late 80s, we had hierarchical database systems and stuff like that, right?
And then I would say late 90s, maybe, right?
Or maybe I'm wrong.
Maybe even earlier, object databases arose.
So it was not a new idea.
Yeah.
I think earlier, right?
It was not a new idea to actually, let's say, have
alternative models, right?
And even if relational databases are based on relations, which kind of makes sense,
and so on,
there's always this kind of strange thing that you need to kind of track the,
the view on your data by using tables and columns and rows and whatever, right?
So meaning at some point, I would say maybe around 2008, right,
there was the NoSQL movement happening where the basic idea was to say, hey.
Instead of using a database system, which aims to be a general purpose database system,
which is more or less the case for a relational database system.
At least they are promising it.
We will maybe see during this discussion here that this promise is not entirely fulfilled always.
Let's have database systems that are more multipurpose systems, let's say, right?
And such multipurpose systems are, as the name is indicating, good for specific purposes,
but maybe not for other purposes in the first step, right?
And with this kind of intention, we would basically also have different methods of accessing them.
So, for instance, CouchDB is something I worked with in the past a bit in the context of Couchbase.
It's not the same, by the way, right?
But CouchDB is a document database system where you kind of index your data by defining views, let's say, right?
Those views are accessed by a REST API and they are kind of implemented by using MapReduce, let's say, right?
So I would say this is quite a different way of accessing data versus having the Structured Query Language, let's say, right?
And there are other systems as well that are falling into this category like key value stores, for instance, right?
Which is much more than a key value store we can talk about later.
But again, the idea was, yeah, have specific data stores that are fulfilling specific purposes.
But better than the relational database system, whatever better means in this context, right?
For those purposes, if this makes sense.
Just to summarize, Structured Query Language, as in SQL, implies that you have a predefined structure consisting of tables.
And these tables are subdivided into rows and columns.
So the usual table, whereas NoSQL doesn't have this constraint.
The NoSQL, this is actually funny, right?
Because initially, historically, as far as I remember, when I got in touch with NoSQL the first time, it actually really meant that we are doing something not relational, let's say.
So the no was basically saying there is no SQL, but.
But nowadays, this is different, right?
So the no was kind of reinterpreted in between, which means it is now not only SQL.
And a NoSQL system can actually have a query language, and it can even be a query language which reminds very much
of the Structured Query Language, right?
So the summary was not entirely true in this context, but.
Where do you see these two ecosystems today in the light of
challenges like the Internet of Things, hipster languages implementing microservices and
application monoliths, the monoliths
being rapidly deconstructed.
So you have your ecosystem of applications, all having different,
each application having different persistence requirements and also different expectations
of the persistence layer, which is essentially the database. Martin, maybe you want to go first.
Yeah, I think there was a couple of questions there, right?
Backstats have gotten Martin by all means.
But if we touch upon I guess because you mean your question comes down to.
For which applications would you choose which technology in short.
Modern applications.
I don't want to talk about.
Modern application.
Yeah, sorry.
What's the modern application?
And like anything.
Anything not being a general that has been running on.
Not.
Not you.
Not being implemented in COBOL.
Okay.
Yeah.
Yeah.
Now, I mean, things like microservices, that is the concept.
So, whichever data store you choose. But you have a point about the persistence, right?
If you.
If data is temporary, then.
You may choose to go with memory storage only, which again doesn't make any difference to the underlying database technology.
Whether it's a.
Postgres or a Redis, you can decide not to use persistence with either, right?
So it's then because.
It becomes temporary data because it's in memory, not persisted, not durable, whatever you want to call it.
So.
I don't think there is a.
There are obviously typical use cases for database, right?
Which is.
Like with, you know, a really.
Color relational database, but they are typically used for transactional systems because of the durability requirement.
Because people don't like to lose their transactions.
They like to have them in as many places as possible and ideally on something that is not a volatile storage.
Depends on the use case, right?
Yeah, exactly.
Depends on or.
If the session management, you might just live.
Yes.
But temporary data, right?
Yeah.
That your data is gone or whatever.
And there's also.
However, however, however, you can also just use persistence in an in-memory mode only for example.
So.
Yeah.
I'm not a Postgres expert.
It's not really related to saying, well, I have to have a key value store.
I have to have NoSQL to store something in memory, right?
It doesn't matter which.
But it's all about efficient access, right?
So meaning, let's assume you have an in-memory database system, which is a relational database system, right?
And let's assume that you stick with the relational paradigm by not being able to bypass it, right?
Because you're using a relational database system.
So what you need to do is, if someone is querying the data, even if your data is in memory, right?
You need to kind of parse this query, right?
You need to maybe
do a query execution plan, right?
By doing some enumeration steps behind the scenes, right?
Or maybe you involve some statistics about the data in order to find the best plan, blah blah, right?
And then... Just a quick question for the two listeners who are not database experts.
What's a query execution plan, before we get too technical?
So basically there are multiple steps when you execute a query, right?
You give the query, which is basically the query string, and this query string is in the first step parsed, let's say, right?
And then based on the parsed SQL, you transform it into something more low level.
And then you say, okay, fine.
Within the steps which I need to do, right, there is maybe some selection.
There's some projection.
There are some index scans in order to retrieve some data.
There may be some low level operations like merge joins or whatever, right?
And in order to efficiently fetch the data you're interested in, you need to plan how to do this, right?
And the plan how to do that is the query execution plan.
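
As a rough illustration of the query execution plan just described, the following sketch asks SQLite (through Python's standard sqlite3 module) which plan it would choose for two queries. The table, column names and sample row are assumptions made up for this example; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical in-memory table; all names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES ('dmaier', 'David')")


def show_plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]


# Lookup by primary key: the planner can use the primary key index.
lookup_plan = show_plan("SELECT * FROM users WHERE user_id = ?", ("dmaier",))

# Filter on a column without an index: the planner has to scan the table.
scan_plan = show_plan("SELECT * FROM users WHERE name = ?", ("David",))

print(lookup_plan)
print(scan_plan)
```

The point is only that the database decides how to fetch the rows (index search versus full scan) before it touches any data, and that this decision is itself a step the engine has to perform.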
Wow, okay.
And that's something you build up front as part of the query execution.
And if you have SQL, you usually get, even if it is in memory, all this overhead of doing that, right?
And then at some point, as soon as you have the query execution plan, you would basically say, okay, fine.
I'm now executing the query by scanning maybe some indexes that are in memory or whatever, right?
And then finally, fetch the data.
In this case, Martin mentioned, okay, if it is in memory, it's in memory, so it's fast.
But it's not entirely true.
The question is how big the time complexity is in order to be able to fetch the data from memory.
And with the relational model, there's much more overhead involved than with a key value store.
Because in a key value store, you don't have a query you need to parse and you don't need to build a query execution plan.
The only thing you need to do is one O(1) operation that is basically fetching the data from a kind of giant hash map within memory.
Let's see.
Let's see a program.
Yeah, but I mean, that's the point, right?
It's simple.
And the other thing is, however, that I was just going to pick up on your parsing point, because clearly, if you use a key value store,
you also execute a certain command, whether that's a GET or whatever it is, right?
It still has to be parsed.
Yeah, but it's not like a SQL query, man.
No, this is it.
I only retrieve one value.
So in a SQL query, one operation as well.
Yeah, but Martin, there's no join involved.
There's no merge join.
But if you would use your SQL.
So it's, you're right if you use your relational database exactly like a key value store.
All right.
So if you use a relational database system and you do something like select star from my table where this column equals exactly that value.
Yeah.
Then there is anyway more overhead than just doing a get key.
Let's say.
Yeah.
If you give me the workflow, it's not the same.
It's not the same operation.
Yeah, it's the same.
Now, one speaker at a time, if possible.
Thank you.
Well, I'm basically disagreeing with that.
No, but okay.
I guess it's a single operation, right?
Maybe just a single-key operation.
So therefore, a select star is a multi-row fetch.
Maybe I'm going to order right now.
But the thing is the following.
Okay, let me let me kind of explain it.
Okay.
There's the following.
As long as you say.
If you have a pure key-value store, everything, an item, is a key-value pair, which means the only way you can access the data in a pure key-value store is basically to fetch the entire data you're interested in by knowing the key already.
So which means that your object, let it be an object maybe, or something which was basically serialized and then stored in the key-value store, is something like completely there as you need it, right?
And the underlying data model, let's say, right?
It's so simple that you just need to do a get key and you get your value back, or whatever this is, let's say, right?
So now the equivalent in a relational database system would be, okay, hey, maybe your type is reflected by a table, let's say, right?
And maybe you would like to have all the data which is associated with this.
Maybe there's a user table and you would like to have all the user data,
the same way as you would maybe fetch the entire user object if you basically do this get user dmaier, whatever, right?
Now in relational database systems, just in the example, you would do something like select star from users, right?
Where my user ID equals dmaier, right?
So now processing this query is much more expensive than just doing this get, right?
And under the hood, processing this query basically means that you need to parse it.
There is anyway a query execution plan because it's a general-purpose system, right?
So you go through the entire stack, let's say, and finally, you fetch the data and give it back to the client, but it's more complicated.
And yeah, in this example, you would do a select star in order to fetch the entire user details the same way as you would fetch the entire user by fetching this serialized object, which is in your key-value store.
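
The two access paths compared above can be sketched side by side. This is a minimal illustration under stated assumptions, not a benchmark: a plain Python dict stands in for the key-value store, SQLite stands in for the relational system, and the key format, table layout and user record are all made up for the example.

```python
import json
import sqlite3

# Hypothetical user record; the field values are invented for illustration.
user = {"user_id": "dmaier", "name": "David", "role": "guest"}

# Key-value style: the whole object is serialized and stored under one key.
kv_store = {}
kv_store["user:dmaier"] = json.dumps(user)


def kv_get(key):
    """Fetch by key: one hash lookup, then deserialize the stored value."""
    return json.loads(kv_store[key])


# Relational style: the type is reflected by a table with typed columns.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (:user_id, :name, :role)", user)


def sql_get(user_id):
    """Fetch by query: the statement is parsed, planned and then executed."""
    row = conn.execute(
        "SELECT * FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return dict(row)


from_kv = kv_get("user:dmaier")
from_sql = sql_get("dmaier")
print(from_kv == from_sql)
```

Both calls return the same user, but the second one goes through a SQL parser and a planner on the way. How much that costs in practice depends on the engine, on statement caching and on the workload, which the sketch does not measure.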
Now, the thing is as well, right?
If you use a relational database system, then relational database systems usually come, again, with the entire stack of processing data, right?
There are multiple components that are involved.
There are transaction managers, for instance, right?
That are detecting deadlocks of transactions.
They are doing maybe locking.
They are maybe doing lock escalations or blah blah, right?
There are index scans and so on.
The whole thing of executing a query.
So in a key value store, the model is so simple that you basically just fetch it by a specific key, right?
All of this stuff doesn't play a role in this context, right?
The stack is basically not as deep as with the relational database system, I would say, right?
And this needs to be anticipated.
It makes a difference if you store your data in memory in a relational database system versus a key value store in this sense.
And the other sense is, no, just the last two sentences regarding this before you talk.
It is basically that the underlying data model has an impact on, let's say, the scalability of your system, right?
Indeed, if you use a key value store, we are only interested in fetching data by its primary key, right?
But this kind of limitation in the first step is also giving us the benefit of being able to scale this stuff, right?
Because what we can easily do is apply something like a hash function on this primary key and distribute the data based on the hash function across many nodes, right?
By that, being able to scale reads and writes.
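
The idea of hashing the primary key to pick a node can be sketched as follows. This is a simplified illustration, assuming a fixed list of hypothetical node names and plain modulo placement; production systems typically use fixed hash slots or consistent hashing so that adding a node does not reshuffle most keys.

```python
import hashlib

NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def node_for(key, nodes=NODES):
    """Map a key to a node by hashing the key and taking it modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class ShardedStore:
    """Toy key-value store that keeps one dict per node."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.shards = {name: {} for name in self.nodes}

    def set(self, key, value):
        self.shards[node_for(key, self.nodes)][key] = value

    def get(self, key):
        # Only the single node that owns the key is consulted.
        return self.shards[node_for(key, self.nodes)].get(key)


store = ShardedStore()
for i in range(1000):
    store.set(f"user:{i}", i)

# How many keys ended up on each node.
sizes = {name: len(shard) for name, shard in store.shards.items()}
print(sizes)
```

Because every read and write touches exactly one node, adding nodes spreads both kinds of load, which is the scaling benefit described above.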
Now with the relational database system, even if it is in memory, you can't do this that easily.
You can partition data, right?
In specific ways and so on, by basically finding logical partitions, but it's much more complex.
And again, if you would like to keep the guarantees of your relational database system, like ACID compliance and so on, right?
Which is something you compromise with the key value store in this sense.
There's again additional overhead involved for something that simple, like fetching a specific key or changing a specific key, so it makes a difference.
It's not just the in-memory aspect, right? It's about the entire characteristics of the system and not just about it being in memory, so it does matter if it's this system which is in memory or that system.
And before we go even further, maybe Martin could explain, because he's old school, what is ACID compliance exactly, before we lose our last listener or something?
Yeah, first I'd like to kind of agree with David, because the thing about the memory piece was really about, because Chris mentioned the persistence piece really.
And you make a good point that key value stores are simpler and relational databases are generally more complex, more feature rich, and they both come with trade-offs, which is fair.
So if we then go on to the ACID piece, these are really a set of properties for database transactions or databases that give certain guarantees, right?
Because we talk about, say, isolation, right? Isolation: if you have multiple users using the same database, they shouldn't be able to see the others' in-flight transactions, things like that.
There is obviously the durability guarantee, which is what the D stands for. So if someone does a database transaction, then it's guaranteed to be durable, meaning that if you have a power outage, then it is possible to retrieve this data.
And the other two are atomicity and consistency, which again are really associated with relational databases. If we go on with consistency,
it's really a state transition that is always the same, right? So if we're going from one state to the other, then that consistency is the guarantee that it has happened as a single unit, that there aren't any separate steps in between that can be interrupted, right?
So if someone writes, you know, or yes, specifically because with writes this is relevant, right? If we are saying, I've deducted 100 pounds from your credit card balance, then we want to know that that happened in one step rather than in two steps, where it could have failed halfway through and you end up with.
This is actually atomicity, right? Just saying, well, what you just described is basically atomicity, right?
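
The "one step rather than two" guarantee (atomicity) can be sketched with a small SQLite transaction. The account names, the starting balances and the idea of modelling the deduction as a transfer to a second account are assumptions added for illustration; the simulated failure stands in for a crash between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("card", 150), ("shop", 0)])
conn.commit()


def transfer(amount, fail_halfway=False):
    """Move money from 'card' to 'shop' as one atomic unit."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'card'",
                (amount,),
            )
            if fail_halfway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'shop'",
                (amount,),
            )
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(100, fail_halfway=True)
after_failure = balances()   # the deduction was rolled back
succeeded = transfer(100)
after_success = balances()   # both updates were applied together
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure the card balance is unchanged, because the first update was undone together with the rest of the transaction, so nobody ever observes the half-finished state.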
Yes, I mean, it moves on a little bit, but so these are the main characteristics. And yes, I mean, with a key value store you can have durability as well.
Which we know from Couchbase's or CouchDB's model.
Yeah, because Couchbase was originally more disk based, right? So it's not exactly the same as Redis, where everything needs to be in RAM, but Couchbase, initially, I think they introduced in between another feature, which was a diskless bucket, or whatever.
But initially, it was quite typical to evict data down to disk, right? So persistence was actually kind of big then, right?
Going back to the controversy, to the absolutely controversial issue of an execution plan, may I interject that actually not every NoSQL database is a key value store.
Because you have graph databases, you have time series databases, and the rest of it, right.
Indeed, indeed, indeed, indeed, indeed, but this was not the point, right? We didn't talk about it.
This is only new sequence, not new sequence, but the point or controversial discussion was about it.
It doesn't matter which in-memory data store you're using; if it is in memory, it's always as fast as any other in-memory database.
This was what I interpreted from Martin's statement, and this is a wrong statement.
No, no, no.
It doesn't mean that there aren't other kinds of database systems, NoSQL database systems, out there, right?
But they have their own specific characteristics beyond
key value stores. Maybe one last thing about the consistency, right? So consistency in
relational database systems does not just refer to, let's say, physical consistency or state
consistency or whatever, right? It is especially also about logical consistency. What you do
is you start from a consistent state, or your transaction starts from a consistent state,
right? And it ends in a consistent state, whereby what this consistent state is is actually
kind of well defined by having constraints. So for instance, referential integrity
or not null or whatever, right? As constraints. Interesting perspective statement. But given
the fact that now, with the latest addition of the mobile craze and other interesting, fascinating
use cases, databases actually have to scale planet wide. Just wondering, how do both
camps scale up to that challenge?
Shall I go first? Shall I go first? Well, let me go ahead. I think what we
have seen in the whole database landscape is that people keep adding to it. And when
you specifically talk about planet scale databases, then the likes of Google and
Microsoft have built their own tech, right? And yes, you can argue that other databases may
be able to fill that role as well. Some better than others, clearly. Partly due to heritage,
partly due to their premise of what they are trying to achieve. But if you're talking
about planet scale, then there's a number of factors involved, right? There's latency,
there's consistency, there's partitioning, or the CAP theorem. Is that right, David? Very
much so. But before we go any further, what exactly does the CAP theorem mean for the people
who do not know? I think he's Professor Brewer these days, right? Or am I wrong? At least
Doctor anyway. Maybe. So should I take this? Please David, go ahead.
So let's assume that. So C is consistency, A is availability and P is partition tolerance,
right? Let's assume that the P is kind of a given, right? So if you have a network partitioning
event, then our system is basically tolerating this by not falling into pieces, let's say,
right? So meaning usually we are talking about AP systems versus CP systems. So available
and partition tolerant, and consistent and partition tolerant, right? So now, it's
actually quite interesting that the CAP theorem is often a little bit misinterpreted, right?
So at least I have seen this with customers a lot, right? And I hope that I'm not misinterpreting
it now, but I'm pretty confident that this is not the case. If I am, then you or the
listeners might correct me. But so let's say the availability is actually not just the availability
in the context of if nodes are up or down or whatever, or if the system is highly available,
this is kind of overlapping with this, but at the end availability means the availability to
perform requests, right? A system which would basically always return an error is not available,
even if all the nodes are up and running, right? So availability is the availability to be
able to execute commands at the end, right? Consistency, it's kind of... Sorry to interrupt.
On that subject, then, I'm just sort of clarifying for listeners.
Are we saying that if a system is up, then it's available, no matter whether it can...
No, no, no, no. What I'm saying is if a system is up and running,
but if it returns you an error message for your request, it's not available to execute the
commands, right? Which means it's not available if it returns error messages, right? Even if the nodes
are up and running, that's the statement. Okay, so let's clarify that this is the availability
we mean, right? The availability to execute commands, actually, to respond to requests, right?
This is the availability we mean here. The consistency refers to, let's say, state consistency or
let's say the observed state by clients, and what helps me at least is to imagine a distributed
system in a sense as a black box in the first step, right? So kind of don't be interested in the
first step in how it is looking inside, right? Let's just look at it as a black box, right? And then
there are multiple clients connected to this black box. And inside, one client might be
connected to node one, another one to node three, whatever, right? So let's say
multiple clients are connected to this black box. Now, what's important is that if multiple clients
would request a specific data item at exactly the same point of time, right? A system would be
consistently behaving, right? If all the clients are getting exactly the same result, right?
Whereby in an AP system, it would be tolerated that some of the clients get another result than
other clients, right? So now, why is the CAP theorem easy to prove in a sense, right? Let's imagine
you have just two nodes in your distributed system, right? And so let's say node one and
node two, and let's assume that you have a single client here in this example, right? And this
single client is performing a single write operation to node number one, right?
In addition, node number one is aiming to replicate the data to node number two, right? So now,
again, the setup is: the client writes to node number one, node number one replicates to node number two
under normal circumstances. Now, what we need to have in order to make sense out of this
is a network partitioning event, right? Let's say someone cuts the wire between node one and
node number two, right? So the client is performing the write operation to node number one, right?
Node number one is now basically not able to reach node number two, right? Node number one
is not able to reach number two, but it aims for consistency, and aiming for consistency basically means
that... Yeah, yeah. So aiming for consistency basically means that the state
on node number one and the state on node number two should be the same, right? So meaning,
but here under this network partitioning event, node number one is not able to reach node number two,
which means that node number one, if it aims for consistency under the circumstance of this
network partitioning event, right? It needs to communicate an error to the client, right? It
will need to tell the client, hey, I can't unfortunately reach the desired consistency level,
right? So error, right? So meaning if it is a CP system under the circumstance of this network
partitioning, it would basically be not available, right? So this is why you can only have
an AP or a CP system. Let's assume the system is more available than consistent, right? Or behaves
available instead of consistent, then it would actually not care about the fact that it can't
replicate the data over to node number two. It would basically just accept the write operation by
acknowledging the write as successful to the client and would eventually resolve the
replication of the data, right? In this case, node number one, or the system, behaves
available, right? Because it responds to the request, but it's not consistent, right? Because it
can't guarantee consistency without ensuring that the data is basically the same on all the nodes,
let's say, right? Hope this makes sense. Hopefully not a too screwed-up explanation of the CAP theorem.
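
The two-node thought experiment above can be written down as a tiny simulation. This is only a sketch of the argument, assuming an in-process model where the "network" is a boolean flag; the class names, the "OK"/"ERROR" return values and the heal step are invented for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeSystem:
    """Client writes go to node1, which tries to replicate to node2."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node1 = Node("node1")
        self.node2 = Node("node2")
        self.partitioned = False  # True means the wire between the nodes is cut
        self.pending = []         # writes accepted but not yet replicated (AP only)

    def write(self, key, value):
        if not self.partitioned:
            # Normal circumstances: write locally and replicate before acknowledging.
            self.node1.data[key] = value
            self.node2.data[key] = value
            return "OK"
        if self.mode == "CP":
            # Consistency preferred: refuse the write rather than let the nodes diverge.
            return "ERROR"
        # Availability preferred: accept locally and replicate later.
        self.node1.data[key] = value
        self.pending.append((key, value))
        return "OK"

    def heal(self):
        """Reconnect the nodes and replay the writes node2 has missed."""
        self.partitioned = False
        for key, value in self.pending:
            self.node2.data[key] = value
        self.pending.clear()


cp = TwoNodeSystem("CP")
cp.partitioned = True
cp_result = cp.write("x", 1)   # rejected: consistent but not available

ap = TwoNodeSystem("AP")
ap.partitioned = True
ap_result = ap.write("x", 1)   # accepted: available but temporarily inconsistent
diverged = ap.node1.data != ap.node2.data
ap.heal()
converged = ap.node1.data == ap.node2.data
print(cp_result, ap_result, diverged, converged)
```

During the partition the CP variant answers with an error (not available), while the AP variant acknowledges the write and lets the two nodes disagree until the link is restored.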
It does to me. What about you? Excellent. That's all good. I'm not sure if we answered your question.
What was the question? Yeah, I think that was a pretty good description of the
CAP theorem. No, there was a question before that. Yes, indeed. What was the question before that?
I'm lacking. Are you talking about your database?
It's kind of a planet-wide database. It's been a long day on the database road.
I think Martin said something very interesting and useful there, which was basically that
latency is playing a role across geos and so on, right? And the thing is that especially if you
have a high network latency between two regions, let's say, right? Because of physics, just because of
physics, at least until quantum computing maybe changes that, I'm not sure if it solves it,
but it's not likely. And that's next week, right? Yeah, maybe. Then it takes a while to transfer
state from A to B, right? And let's say if you have a planet-scale database and you aim for
consistency, then as just explained, you would typically need to have synchronous replication
from one side to the other, right? And this synchronous replication would basically have an
impact on your performance, your workload, right? Because then your client would basically need to,
or yeah, let's say, the source side or target side, if it is active-active, it's better actually,
but you basically need to replicate it over before you acknowledge the client, right? And
given that, the performance from the client's point of view would be impacted, which is the reason
why most planet-scale database systems, I would say, the ones that want to operate at scale, though it
might be a question of requirements, are voting more for availability over consistency by doing
asynchronous replication across the geos, let's say. Mm-hmm. And just for the record, active-active
of course meaning that if you have distributed clusters across the planet, each and every
application accessing its own cluster instance can write to said cluster instance
independent of the other ones. Exactly. You can write and read from each site. Exactly.
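
The trade-off described in this exchange, acknowledging a write only after it has reached the other
region versus acknowledging it right away and replicating later, can be sketched with a toy model.
This is only an illustration under stated assumptions: the class `TwoSiteStore`, its methods and the
latency figures (1 ms local write, 150 ms cross-region round trip) are hypothetical and do not
describe any particular database.

```python
class TwoSiteStore:
    """Toy model of a source site plus one remote replica (not a real database)."""

    def __init__(self, local_write_ms=1.0, cross_region_rtt_ms=150.0):
        # Both latency figures are made-up, illustrative numbers.
        self.local_write_ms = local_write_ms
        self.cross_region_rtt_ms = cross_region_rtt_ms
        self.source = {}
        self.replica = {}
        self.pending = []  # writes acknowledged but not yet replicated

    def write(self, key, value, synchronous):
        """Apply a write and return the latency the client sees, in ms."""
        self.source[key] = value
        if synchronous:
            # Replicate first, acknowledge afterwards: consistent but slow.
            self.replica[key] = value
            return self.local_write_ms + self.cross_region_rtt_ms
        # Acknowledge first, replicate later: fast, but the replica lags behind.
        self.pending.append((key, value))
        return self.local_write_ms

    def drain(self):
        """Ship all pending writes to the replica (the asynchronous catch-up)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = TwoSiteStore()
sync_ms = store.write("balance", 100, synchronous=True)
async_ms = store.write("last_seen", "ep43", synchronous=False)
stale_read = store.replica.get("last_seen")  # replica has not caught up yet
store.drain()
fresh_read = store.replica.get("last_seen")
print(sync_ms, async_ms, stale_read, fresh_read)
```

In this sketch the synchronous write pays the full cross-region round trip before the client is
acknowledged, while the asynchronous write returns quickly but leaves a window in which the remote
site serves stale data.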
Whereas in active-passive, you can only write to, let's say, the source side, but
optionally maybe read from the target side, right? So given the latest craze about
Netflix, streaming services and all the rest of it, which clearly demand a kind of
planet-wide capability, and a streaming service is probably not the best example, but you get the idea,
what do you think, which approach would scale better? The old SQL stuff that has been around for
the last 50 years, or this newfangled NoSQL thingy used by the hipsters with the long beards
and the Rust and Python programming languages or whatever. Yeah, as mentioned before, right?
So we are voting, in this case, for scalability and availability over
consistency, often, right? Doesn't need to be the case. There are also NoSQL systems that are
behaving consistently. Again, they are multi-purpose systems. It's not the case that you can say
"the NoSQL system" or whatever, right? But you can say there are different flavors for different
purposes, and at least for something like document database systems at planet scale, let's say,
or key-value stores at planet scale, as mentioned by you, right? I would say it's most common to do
asynchronous replication across sites, which means that we kind of favor
availability and scalability over consistency, and relational database systems
traditionally vote more for consistency in this context, right? Because they want to be
ACID compliant. So Martin, does this mean that SQL has had its day, as in, it's dead?
I think, as David described with the use cases, the Netflix use case, for example, is:
if you are connected to your local Netflix instance, wherever that may be, in Europe or the US,
you don't really care if your US instance also has the same data, unless the Europe instance
goes down and you get redirected to that. So this is a slightly different scenario from,
there's always the financial application, right, where consistency is more important.
So you can solve the first problem with lots of different technologies,
depending on which consistency model you choose to use and which replication model.
So yeah, it's not, yes, there are more, I think, but what you're kind of aiming at, what you're trying
to get to, is that the scalability of NoSQL, well, not "no SQL", yeah, it's called NoSQL, is easier
because of the way the data can be partitioned. But maybe Martin, do you really think that after the
imminent zombie apocalypse, credit card transactions will matter anymore?
What matters then? I mean, Netflix, of course, Netflix, how do you pay Netflix? How do you pay for Netflix?
Okay, good, you heard it here first: Netflix will keep running after the zombie apocalypse.
Maybe, maybe, I mean, we did have a little fight here, right? So initially, and maybe for no reason.
It's a discussion that comes to my table. It was fun anyway, but let's maybe state that it is not a
direct competition anymore anyway, right? Because what was already mentioned by Chris at the very
beginning is microservices. And if you talk about microservices and having your application as
a distributed system by itself, right? You anyway need to kind of take some trade-offs into
account regarding scalability versus consistency and so on, right? Or availability; you have to
take that into account for your entire system, right? And the data store is one puzzle piece
there, an important one, right? And a common pattern in microservices is to say, okay,
each service, or each microservice, uses its own data store, right? And the second idea is,
if each service uses its own data store, right? Then it should use the data store which is the best
fit for the service, right? And sometimes the best fit is a relational database system. And maybe
this is true for something like financial transactions, because relational database systems are
good at that, right? So they have something like money data types, for instance, so they are not
dealing with something like floating point numbers with a specific precision, but with exact values;
they have ACID-compliant transactions, are completely reliable, and so on, right? So meaning,
for something like a money transaction, to use a relational database system is maybe not the
worst idea, right? So, but in your distributed system, which is, let's say, having a bunch of
services in order to offer you the application, there are other services with different kinds of
requirements, right? Another service might have much, much higher scalability requirements, right?
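
The point about exact values versus floating point numbers for money can be shown in a few lines.
This is a minimal sketch using Python's standard `decimal` module as a stand-in for the exact
numeric or money column types that relational systems offer; it does not touch a database at all,
and the amounts are made up.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so repeated sums drift.
float_total = sum([0.10] * 3)

# Decimal keeps exact base-10 values, similar in spirit to a NUMERIC/money column.
exact_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(float_total == 0.30)             # False
print(exact_total == Decimal("0.30"))  # True
```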
What is the benefit of having a service which can scale tremendously, right? But the data store
can't scale together with it, right? Because it becomes a bottleneck because it needs to sit on
a single machine, or there is, across a few machines, a lot of overhead to do the transaction
management, and so on, right? So you get into trouble regarding the scalability criteria,
which is, in this case, maybe more than just a non-functional requirement. So what you need to do
is use the data store which makes most sense for that, and for instance,
if you manage sessions, you would end up with a key-value store. Maybe if you have a product
catalog, you would maybe end up with something like a document database system. If you do network
analysis, right? So finding the flows, or shortest paths, or dealing with social networks,
or whatever, maybe a graph database is the best choice, right? Because it allows you efficient
traversals without scanning indexes, right? Which is not guaranteed by relational database systems,
right? So meaning our idea is, choose the right data store for the right kind of service based
on your requirements, and in this context, it's not actually a competition, right? They are
living kind of as creatures together in the same kind of microservice architecture, right?
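
The kind of traversal mentioned for network analysis, finding a shortest path by hopping from node
to neighbour, can be sketched as a breadth-first search over adjacency lists. This is a plain
in-memory illustration, not how any specific graph database is implemented; the `FRIENDS` data and
the `shortest_path` helper are hypothetical, and a Python dict merely stands in for the direct
neighbour references a graph engine would keep.

```python
from collections import deque

# Hypothetical social graph stored as adjacency lists: each node points
# directly at its neighbours, so a traversal hops from node to node.
FRIENDS = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann", "dee"],
    "dee": ["bob", "cid", "eve"],
    "eve": ["dee"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(FRIENDS, "ann", "eve")
print(path)
```

Expressing the same query in a relational schema would typically mean one self-join on an edge
table per hop, which is the overhead the discussion alludes to.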
Reminds me of the land-based dinosaurs, and the things, kind of, that we are now looking at,
called birds. One of them survived, one didn't, so, I'm joking. Martin, anything you want to plug
in the NoSQL world? By the way, are we talking about NewSQL as well, or just NoSQL and
relational, right? Just asking, let's say, right? David, David, anything goes, just go for it.
No, I entirely agree, there are better fits for certain technologies, certain
technologies come with better functionality, like a graph database, as David mentioned, for example,
or document databases. Again, these are the application-type databases; we haven't even
discussed the OLAP databases at all so far. Martin, you just go for it.
Yeah, so I think what we've concluded so far is that in general, NoSQL databases are a bit more
scalable. They have, in the case of a key-value store, lower latency because of the reduced amount of
transactions they need to do, whereas your relational databases are more multi-purpose and have,
therefore, some overhead in various places, and some benefits in terms of ACID, etc.
But yeah, if we talk about, so these are all, let's call them transactional databases,
right, when you're talking about applications, Netflix, planet scale, by the by.
People also use databases for analytical processes and reporting and all sorts of fun stuff,
right? And yeah, in that case, I would say, the majority of all the tools out there
are still very much SQL based, yeah, relational, the, you know, the BI, yeah, that's it.
Long story short, as long as the mainframe is around, SQL has a place. Is that what I'm hearing?
Well, nobody uses the mainframe. No, in a few minutes. The mainframes are clearly transactional,
right? This is, so you could argue that they're in the, yeah. Okay, if you talk about
anything, I think, monolithic in build, right, where it's one single great big
computer, scalability doesn't work apart from making it bigger. So from that point of view,
they're in the same camp as most of the relational databases. Martin has a point regarding
that, because, right, if you look at analytical tools right now, and they're also more
traditional, let's say, right, there's a transitioning happening from, let's say, something like
that, or it's not actually true anymore, right? It did swing a bit back. But anyway,
so if you think about analytical stuff, right, the tools you're using in order to
access this, in the past, in the 2000s or whatever, they were called business intelligence tools,
they were reporting tools, and everything there was talking SQL, right? Because SQL
is perfect for doing some expressive queries on your data, right? So GROUP BY, aggregations,
blah, blah, right? And there was actually even a trend to do the same on, let's say,
more modern analytical systems, but this is also not fair, right? Because they're,
they, it's a kind of coexistence. But let's say, let's take something like Hadoop, for instance,
right? Hadoop initially was: okay, fine, we need to do some long-running analytics, and we
just kill the problem with iron by having tons of compute nodes, right? Distributing
the workload by using a pattern called MapReduce, right? So mapping,
basically filtering, extracting, or mapping data to other data, and then basically reduce,
applying the aggregation in the next step, and then maybe re-reduce across nodes, and so on,
right? So meaning this kind of was a pattern used for a while to do analytics,
right? Which was beyond SQL. But as far as I remember, the trend was actually like:
now we have Hadoop, we materialize analytical results back into the Hadoop Distributed
File System, right? Or we store them somewhere else, but in order to have expressive queries
on the data, it would be cool to have something similar to SQL, right? And then
there were basically frameworks rising that translated SQL queries into Hadoop, so we could do
the job. Yeah, they just used the storage layer. The Hadoop Distributed File System is the storage;
Hadoop itself is basically also the compute layer on top of it, right? But yeah,
they just ran the transactions and bypassed the whole compute layer, that's dispensable,
just, yeah. Or even that, right? Or there were basically then also query engines on top of
that, Hive, for instance; I think Hive translated into, basically,
MapReduce jobs that were running, right? But then, Apache again, Spark has
something similar, right? So as a kind of maybe successor of Hadoop in a specific context,
right? They also had a query language in order to be able to do more dynamic querying on the
data and so on. Yeah, but there were things like Apache HAWQ and Impala that would just
bypass all MapReduce processes, and just use the HDFS file system, or something like that,
right? Anyway, so the thing is, SQL is useful, right? It's not completely useless,
and it sometimes makes sense to express stuff as SQL, and even NoSQL database systems, as mentioned
before, have SQL-like languages. Cassandra has CQL, I think, which looks like SQL without
joins, or Couchbase, as a document database system, has N1QL, which is like SQL for JSON,
which has joins, and it's also allowing you some subqueries and so on. Redis has,
as a module, RediSearch, right? Where you can basically define indexes, it supports automatic
indexing; the query language doesn't look exactly like SQL, but it also looks like an expressive
query language, a little bit more like Lucene, let's say, right? So meaning a lot of NoSQL
database systems actually have some query languages in addition as well, right, for being
able to do some analytical stuff, but Martin is right that, let's say, the more traditional BI
stuff is using, yeah, SQL, in a sense, right, to access it. There are some abstractions
there. So you could, for instance, use Presto, a project by Facebook, right? Which is a kind of
distributed query engine, which you can put on any kind of data store in order to
bridge this gap of SQL for analytical purposes, for instance, right? By referring to whatever
data store you have a connector for, right? And query the data out of it the SQL way, just as
an example, right? Yeah. And there is indeed another aspect that this traditional approach,
let's say, again, it's a coexistence, is also kind of, yeah, there are other approaches as well,
Hadoop was one, which I mentioned before, but there's also now this whole field of, let's say,
artificial intelligence, machine learning, right? That's about processing data in real time,
building models, right? And you can say this is also analytics, right? And the toolset is
completely different from the traditional BI toolset, right? So you're using something like
Jupyter notebooks in order to get insights into your data instead of doing some reports in whatever,
Tableau or whatever you used with SQL in the past, right? So it's not, I'd say the statement
is in general true, but not entirely true, because there are a lot of facets regarding it.
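
The relationship sketched in this part of the discussion, the same aggregation written once as a SQL
GROUP BY and once as a map, shuffle and reduce pipeline, can be illustrated as follows. This is a
single-process toy under stated assumptions: it uses Python's built-in `sqlite3` module and plain
functions, not Hadoop, Hive or Spark, and the `EVENTS` data and helper names are made up.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

# Hypothetical viewing events: (region, minutes watched).
EVENTS = [("eu", 30), ("us", 45), ("eu", 20), ("us", 5), ("apac", 60)]

# 1) The SQL way: one declarative GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, minutes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)
sql_result = dict(
    conn.execute("SELECT region, SUM(minutes) FROM events GROUP BY region")
)
conn.close()


# 2) The MapReduce way: map each record to a (key, value) pair,
#    shuffle the pairs by key, then reduce each group to one value.
def map_phase(record):
    region, minutes = record
    return region, minutes


def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    return reduce(lambda a, b: a + b, values)


mapped = map(map_phase, EVENTS)
mr_result = {key: reduce_phase(vals) for key, vals in shuffle(mapped).items()}

print(sql_result == mr_result)
```

A SQL-on-Hadoop layer, in this simplified picture, takes the declarative query from step 1 and
plans something like step 2 across many nodes.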
Now that our opponents have completely switched sides, apparently,
maybe? No, no, it's just a balanced view from both of us.
It's just a fair statement. And by the way, I would not say that the one is opposed
to the other, right? So it's kind of common nowadays
to not just say, hey, I have my analytics system there and I have my real-time database system
there. It's actually often a combination of everything, right? So meaning what you have is,
you do your speed processing or real-time processing and you do your analytical processing, and
applications often basically incorporate both aspects today, right? So they combine
the analytical aspects with the real-time data, or they basically even do predictive analytics
based on models that were derived from, let's say, deeper analytics, on current real-time
data flying in, right? So it's not like, yeah, there's this analytics world and there's the
operational world. Actually, modern systems are leveraging everything to a specific degree, right?
Yeah, as we are rapidly approaching our mark with regards to the length of podcasts,
maybe it's time, sorry, I'm of course only joking. Maybe now is the time for some final
thoughts on the subject? Really, I mean, already done? Maybe I talk too much. No, no, no,
not at all, David, not at all. No, we should keep the duration of the podcast to, I think, four
hours, given the fact that we're not stretching to five. We can touch on it in the future,
right? Yeah, the future, exactly. I'm a mainframe fool, I hate to say. So IMS, DB2 and the like,
yeah, those databases have their place and thank you for listening. Joke, go ahead.
Martin, go ahead. Should I go first? All right. So I mean, anybody following the database
world will see that there are constantly people innovating. Mainly, well, I would say both on
the analytical side and the planet-scale side, as you call it, but there are innovations in,
let's say, the way they are architected happening, there are innovations happening in the way they
use hardware for scalability. And I think, yeah, so in short, it's a constantly evolving
scenario, but as you can see, the relational databases are here to stay, NoSQL databases are here
to stay, the graph databases, yes, they have a very good piece of functionality and use, but then
I wouldn't call them mainstream as such, or maybe I'm doing them some injustice. Martin, if it's any
consolation, Richard has just deposited an undisclosed amount of Bitcoins in our account, so you're
free to plug your employer if you want to. Yeah, so we are in the analytical space, so that's
clearly our domain. And we use GPUs to accelerate databases. Well, so that's clearly a hardware
play, that's making use of hardware innovations, right? So as you mentioned, when
quantum technology comes along, I'm sure someone will come up with a quantum technology database
after that. Yes, the address is sponsor at linuxinlaws.eu, and we do take Bitcoins. Yes.
David, go ahead. I'm not sure. Maybe I should mention that the Grumpy Old Coders
are taking Bitcoins, too. Yeah, I mean, GPU stuff and so on is actually a good point, or sounds like
something which is really useful in the future. But yeah, I personally think that there is a
convergence happening in the market. And what I have seen over the last years is that
the relational database vendors desperately tried to add additional functionality to kind of
keep up with the NoSQL vendors because they are disrupted by the NoSQL vendors.
But it's not exactly the same, right? As mentioned before, you don't just turn a
relational database system into a graph database system because you offer a graph API; in the same
way, you don't turn it into a document database system by just allowing to do some SQL which involves
some JSON path queries, right? That's not how it works, right? Because in the end, it's all about
efficient access for specific purposes and so on. And I think the convergence is not happening
just from the relational side, let's say, but also from the NoSQL side. Let's say,
right? NoSQL vendors are adding additional functionality as well, right? For instance, new
consistency models are addressing something like ACID transaction compliance better than before,
right? But the difference is that the approach of a relational database system is more: I'm coming
from a system which is promising you to be general purpose by kind of having all this overhead and
this huge package I'm carrying with me, whereas on the other side, the idea was: I'm coming
basically with a much lighter package and I'm now basically allowing others in a controlled way
to add additional functionality, right? And this additional functionality is then optional and
not mandatory in a lot of cases. A good example, for instance, is Redis. In Redis, there are modules,
right? There's a module which is adding querying, indexing and full-text search. There's a module
which is adding a graph engine, with Cypher as a graph query language. There is a module which
is adding time series functionality, right? And you're not forced to actually use it in any way
or even to carry it around, right? You can decide to deploy this module or not to deploy this module,
gaining specific functionality or not gaining specific functionality. And at some point,
there will also be, I think RedisRaft was already announced during RedisConf, right?
There will also be the possibility to do, let's say, CP-system-like stuff and maybe at some
point even different kinds of transaction management and so on, right? So meaning there is
a kind of race happening between the NoSQL systems that are kind of becoming more
flexible by aiming to fulfill more and more purposes without being general purpose
systems, whereas you have on the other side this relational database system, which is kind of
saying, hey, I'm a general purpose system, right? And I'm just promising whatever you need by
actually carrying a lot of heavy weight around, well, for no reason, right? Yes, yes and no.
Some databases have that carrying-around principle, but databases like Postgres have a similar module
approach, like the pluggable column stores and multiple. Yeah, I get it, but the API, right? So
the difference is that with the Redis module, you get basically a different API for dealing with
time series or for doing search or whatever, right? The module comes with the data structure
and, let's say, an API which gives you efficient access to this data structure, whereas in a
relational database system, and I would say this is the case in Postgres, I didn't double check this,
I would say whatever you plug in is on a level which involves anyway some
going through the stack, right? So it will have more overhead than just basically going
to the low-level API which is exposed by Redis. Yeah, correct. They are plugins, it's not
completely new pieces of code, as you say. But yeah, so from that point of view, but you're
right, they're converging in both directions. And my bet is that NoSQL will long term
win this, right? That's just my bet and it's really just a guess. But my opinion, and maybe we
can close it with this, my bet is NoSQL will win this convergence battle and I guess your bet
is that SQL will still be around in 50 years or whatever. Definitely.
I mean, how do you think of the type of use cases? I mean, not everybody needs a planet-scale
database. So therefore, a multipurpose database is far more suited so they don't have to
learn 10 different technologies, for example. I mean, on this convergence thing, regarding the
last statement, there is actually something called ToroDB, right? That takes the Mongo wire protocol
and maps it onto a Postgres engine, which is probably the best example for your case
in point. The details, of course, will be in the show notes. ToroDB, if you're listening,
if you're looking for a sponsor slot, contact us at sponsor at linuxinlaws.eu.
I just saved this man. So, David, while we're at it, don't forget to listen to Grumpy Old
Coders. Yes. David, our email address is sponsor.
That has been more than a wonderful two hours, maybe three. Any closing remarks for the next
60 minutes before we close this podcast off? Yeah, I'm shutting up now, right? Otherwise,
there's nothing to say anymore. I just want to say thank you for inviting me,
right? And listen to Grumpy Old Coders. Thank you, David. Thank you for your time,
David. And hopefully, we'll be having you both back at some stage.
Martin, any final remarks from the dark side in terms of the old dinosaur world
of SQL? Well, I mean, talking about old dinosaurs, you keep mentioning the mainframe;
I should probably not mention dinosaurs here. No, it was a good discussion
and I think David and I agree on many things on this. Well, I mean, people, don't get confused.
I mean, if you just take a look at the commit history of Postgres,
which is probably one of the most popular SQL databases, it's still up and running, in terms
of it is still alive. Yeah, indeed, still, right? I'm not giving you a point. Yeah, it can
sure be more popular than better. And I want to apologize to all my friends at Ingres, by the way,
right? You have been at Ingres? Yeah, yeah. Yeah, but maybe one last thing, right? One last thing,
right? So actually, there's a new, and I'm not kidding, I mean, it's a promotion, but anyway,
I'm doing it, you can cut it out if you want. But there's a new episode of Grumpy Old Coders coming up,
and the only reason I'm mentioning this is because Chris reminded me when he said the dark
side, because the episode title is, this time, The Dark Side. And this is about, let's say,
or maybe this scares people away, but the topic is this time a little bit less technical.
The topic is actually a serious one. The topic is more about, yeah, let's say, mental health issues
or struggles and challenges, in the context of the dark side, right? In the IT and
software development business, right? So not a funny topic anyway, but I had the feeling maybe
this is something which can be discussed in the podcast together with Thomas. So if you want
to hear our opinions, Thomas's, the other grumpy old coder's, and mine, right? Then yeah, stay tuned,
it will be published soon. Absolutely. David, thank you so much for participating.
And it has been a pleasure as usual. And as I said, we'll get in touch one way or the other,
and I'm really looking forward to having you on the show with, together with Thomas at some stage.
Yeah, sure, you're welcome. Thank you for having me. Bye. Bye. This is Linux Inlaws.
You come for the knowledge, but stay for the madness. Thank you for listening.
This podcast is licensed under the latest version of the Creative Commons license, type
Attribution-ShareAlike. Credits for the intro music go to Bluesy Roosters for the song
Salut Margot, to Twin Flames for their piece called The Flow, used for the segment intros,
and finally to The Lesser Ground for the song Sweet Justice, used by the dark side.
You find these and other ditties licensed under Creative Commons at Jamendo, the website
dedicated to liberate the music industry from choking corporate legislation and other crap concepts.
Any questions before we start the podcast?
Now, what is it about, just database systems? I guess you have been,
yeah, you have been volunteering, David, to represent the NoSQL side of the house,
whereas Martin, given his age, is actually representing the SQL part and I'm just doing
the moderation. That's all. Okay, fine. No, no, no, it's not a battle, very important,
but just a discussion. Yes, yes, you see, David, this is something that we do want to avoid.
If that makes sense. Oh, Martin, oh, I forgot, there's a ground rule:
you cannot mention Postgres. Okay, don't mention Redis or Couchbase,
but it's already, the show is marked as explicit, so you can swear if you want to.
Martin, if this whole IT thing doesn't work out, just take a couple of
Udemy courses and you'll be fine as a politician.
I don't need a course for that. I'm talking about the fine tuning,
no, I'm not talking about the basics.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is. Hosting for HPR is kindly provided by
AnHonestHost.com, the Internet Archive and rsync.net.
Unless otherwise stated, today's show is released under a Creative Commons
Attribution-ShareAlike 3.0 license.