Episode: 1089
Title: HPR1089: Max Mether of SkySQL talks about MariaDB
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1089/hpr1089.mp3
Transcribed: 2025-10-17 18:47:16

---

That's it.

This is Max Mether from SkySQL.

Thank you. All right, so I'm here to talk about MariaDB. Before we start, who knows what MariaDB is? All right, who knows what MySQL is, guys? Okay, that's good, that's good. So I wanted to do this, I wanted to make a non-technical talk. Unfortunately I wasn't able to, so it's going to be quite technical, so bear with me.
Yeah, so I was asked to do this talk about MariaDB, so my first question was: okay, so what is MariaDB? I thought, actually, it's fairly simple: you take something old and you replace it with something new, like this. That's a fairly simple story and it looks good. The problem is that it's not really true. MariaDB is not really a replacement of MySQL, it's actually a combination, right? MariaDB is a branch of MySQL, so 98% of everything you find in MariaDB is actually MySQL.

And if anyone wonders who these girls are here, they're actually the founder's girls. He had this idea of naming his products after his daughters. That's actually why MySQL is called MySQL, because his first daughter is called My. No one ever says "Mew-SQL", but the correct Swedish pronunciation would be "Mew-SQL". His next daughter is called Maria, and that's why we have MariaDB as the product.

All right, so what is MariaDB? It's a branch of MySQL. It's a completely backward compatible drop-in replacement: if you're running MySQL right now, you can stop it, put in the MariaDB binary, restart it, and it will work out of the box. So it's a drop-in replacement, and as I said, around 96% of the code is the same, so it's very, very, very close. Basically, in this talk I'm going to talk about the differences: what does MariaDB add to MySQL?
Next. Okay, so MariaDB is completely, fully open source. All the development is done on Launchpad, so anyone can download the source code and try adding stuff to it; it's all open. Of course the whole MySQL source code has been developed for quite a while, so there's a lot of code there. It's all under the GPL, so anyone can add stuff or remove stuff as long as they keep to the GPL. Everything is open source, and MariaDB is committed to staying open source. There are no closed source modules, as opposed to the vanilla MySQL branch, where there are now some closed source modules.

Another big thing: the bugs database is open. MySQL traditionally had an open bugs database, so whenever anyone submitted a bug it was put into this open database and everyone could see the bugs. If you were running into a bug, you could check whether it was already submitted, and so forth. Unfortunately that changed recently. Oracle, who now own MySQL, decided not to have all the bugs out in the open, so when customers file bugs they file them to a different system. The old bugs database still exists, however it's not always maintained. A lot of bugs have been open for five or six years and nothing has happened in those five or six years, so no one knows if they're actually doing stuff on them or not. Furthermore, when they do releases of MySQL now they refer to bug fixes, but the bug fixes are from the internal closed bugs database, so you don't actually know what they're talking about. They say "fixed internal bug 743" and no one knows what this bug means. This is actually why there now are certain distributions who are thinking about switching from MySQL to MariaDB: because they don't know what goes on in MySQL.

All right, so let's have a brief history here. MySQL was developed by one guy in the beginning, called Monty Widenius. The first version of MySQL, well, you can't really call it MySQL, but the first thing he developed that would eventually become MySQL was done in '83. It was a database developed in BASIC, if anyone still remembers BASIC. '83 is a long time ago. It eventually got ported to C and got a lot of changes, the biggest one being in the beginning of the 90s. The database, which was then called net base or Unireg and stuff, only had an ISAM interface, right? There was no SQL; you couldn't use SQL, you had to use an ISAM interface. At the time the company developing the ISAM interface said, well, we're actually going to stop developing this, so your database won't work anymore. So they decided: all right, let's get an SQL interface to the database so that people can access it.

At the same time the guys, Monty who had created it and David Axmark, one of his co-workers, had the idea: why don't we create the SQL interface and make it open source? For them it was a big decision, because they had like three or four customers who were paying them a lot of money, and they were scared that if you make this open source and free, a lot of potential customers might just use it and not pay us anything. But because they had been using a lot of GNU tools before, they felt that they'd had a lot of benefit from open source and they wanted to make sure that they gave something back. So they decided to release MySQL as open source, and this was done in '95. At the same time they created a new company called MySQL AB. At the time it had two employees, those two guys.

MySQL AB slowly grew. As you know, MySQL happened to be released at a very fortunate time, in '95, about the same time the internet was exploding. So LAMP became something very known and very used: everyone had their own LAMP server running with Linux, Apache, MySQL and PHP or Perl. Because it was free, because it was open source, MySQL usage exploded. The company didn't follow; it grew very slowly, because they were only selling support. But eventually they had some more business-oriented people come aboard. Monty, being a great developer, perhaps wasn't the best CEO of a commercial company, so they got another CEO, Mårten Mickos, and MySQL started growing quite rapidly.

And of course in 2008 MySQL was acquired by Sun for one billion, which is still the most profitable open source acquisition in history. So if anyone ever says that open source is not profitable, here's a good counterexample. They paid one billion for MySQL; two years later Oracle paid four billion for Sun. So you could argue that MySQL was one fourth of Sun, which is of course not true, but you could argue that, because it was two years later.

Anyway, the acquisition of MySQL by Sun changed a lot of things. MySQL had always been a small company: two people at the beginning, but even at the acquisition point there were fewer than 500. We were a small company; there were 50 developers. Sun had around 40,000 employees, so quite a different structure, with a lot of processes and stuff that you have in big companies. A lot of people decided that, well, this is not a place for me, so they left, and the company called Monty Program was founded in late 2008. It was founded by the same guy who had created MySQL, and he took a lot of core developers with him. The first purpose of the company was to create add-ons to MySQL, just to make sure that there's an open source version of MySQL out there. Then of course Oracle acquired Sun, and with Sun MySQL, and the purpose of Monty Program changed a bit. Now they saw a real threat to MySQL and the open source nature of MySQL, so they said, well, we have to make sure that there's always an open source version. So they created the branch MariaDB. The first version of MariaDB was launched in February 2010, and the latest release was launched a few weeks ago. So who's behind MariaDB?
Well, Monty Program, so the company created by the founder and creator of MySQL, guys who know MySQL really well. He took the guys he wanted with him, so there are around 30 developers, and you can say that they are core developers. There was quite a long time in MySQL where, if the founder went on vacation, pretty much everything stopped, because he was the only one who knew everything in MySQL. Later we hired two or three people who also knew everything, so he could go on vacation and there were some other guys who could take over. Well, all of these two or three people are now in MariaDB.

But it's not only Monty Program; there's also a community behind MariaDB. The goal of MariaDB is to provide a community-developed, stable and free database, and in order to do this MariaDB is very open to the community. It takes contributions, it's developed on Launchpad, and MariaDB has captains who basically steward the development of MariaDB: they look at the patches and stuff like that and have to approve everything. Of the MariaDB captains, only 64% are in Monty Program; the rest are outside community people, so it's in large part developed by the community. MariaDB also has an open development model, meaning that when they have new features and stuff they have an open worklog: everyone can look at the items and anyone can suggest enhancements.
Monty Program is a distributed company; people work from home all over the place. Twice a year they have company meetings, and in the company meetings 85% of all their sessions are open to the public, so anyone can actually join them. Customers do join them. Facebook is a big MySQL user and customer, and every time Monty Program has a company meeting Facebook is there: they have at least two or three guys just sitting in on the sessions and making sure that their voice is heard as well in future development. But anyone could join. The last meeting they had was in Athens a few months ago, and there were like three or four companies that had some people there just to make sure that their voice is heard in future development. So it's very open, everything they do. All right, let's take a look at the differences between MariaDB and MySQL.
So first of all, the first version of MariaDB was 5.1, and this was based on MySQL 5.1. That's easy enough to follow, and basically the version numbering follows MySQL 5.1 exactly: if you had MySQL 5.1.30 you would have MariaDB 5.1.30, which was based on MySQL 5.1.30. Very simple. However, they did add quite a few features.

Who knows what a storage engine is? One guy, okay, excellent. So MySQL has this concept of storage engines for storing the data. If you really simplify a database, you could say you have three main pieces: you have a parser, which parses the queries, you have an optimizer, and you have a storage engine, which stores the data and retrieves the data. In MySQL that storage engine part is modular, so you can choose how you want to store the data, which also has a lot of implications, like indexing types, data types, whether it is stored on disk or in memory, and so forth. So there's an API between the optimizer and the storage engine, and all you have to do is implement the functions in the API and you can create your own storage engine, and many have. In the MySQL distribution there are like five or six different storage engines, but there are a lot of open source storage engines out there and so forth, so anyone can create their own storage engine.
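As a rough illustration of the pluggable storage engine idea described above, here is a minimal Python sketch. It is only an analogy: MySQL's real storage engine API is a C++ interface, and every name below (`StorageEngine`, `HashEngine`, `SortedEngine`, `select_where`) is hypothetical and invented for this example. The point is that the upper layer talks to one small interface, and each engine is free to store the rows however it likes.

```python
import bisect
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]


class StorageEngine(ABC):
    """Hypothetical engine API: the layers above only ever call these methods."""

    @abstractmethod
    def write_row(self, key: int, row: Row) -> None: ...

    @abstractmethod
    def read_row(self, key: int) -> Optional[Row]: ...

    @abstractmethod
    def scan(self) -> Iterator[Tuple[int, Row]]: ...


class HashEngine(StorageEngine):
    """Stores rows in a dict; scans return rows in insertion order."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}

    def write_row(self, key: int, row: Row) -> None:
        self._rows[key] = row

    def read_row(self, key: int) -> Optional[Row]:
        return self._rows.get(key)

    def scan(self) -> Iterator[Tuple[int, Row]]:
        return iter(list(self._rows.items()))


class SortedEngine(StorageEngine):
    """Keeps the keys sorted, so scans return rows in key order."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._rows: Dict[int, Row] = {}

    def write_row(self, key: int, row: Row) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # insert while keeping the list sorted
        self._rows[key] = row

    def read_row(self, key: int) -> Optional[Row]:
        return self._rows.get(key)

    def scan(self) -> Iterator[Tuple[int, Row]]:
        return iter([(k, self._rows[k]) for k in self._keys])


def select_where(engine: StorageEngine, predicate: Callable[[Row], bool]) -> List[Row]:
    """Stand-in for the layers above the engine: it works with any engine."""
    return [row for _, row in engine.scan() if predicate(row)]
```

Swapping `HashEngine` for `SortedEngine` changes how and in what order rows are stored, but `select_where` does not need to change, which is the property the talk attributes to the engine API.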
So the first thing they did in MariaDB was to include third-party storage engines. Again, this is something that Oracle doesn't want to do, mainly because Oracle does dual licensing. MySQL is GPL, but they also sell a commercial license for people who want to embed MySQL in their software and don't want to be bound by the GPL. There's a lot of FUD about this stuff, but basically they have two types of licenses, and of course they can't put GPL stuff in the commercial license. So any open source GPL storage engine Oracle can't touch, because then they can't sell commercial licenses.

So MariaDB introduced a few storage engines. PBXT is a transactional one, and actually the main goodness with it is that it has really fast crash recovery, so if you crash a lot this is a good storage engine. XtraDB: who has heard of XtraDB? No one. All right, who has heard of InnoDB? Okay, quite a few. So InnoDB is the main storage engine in MySQL. It's a transactional storage engine, similar to traditional databases like Oracle and Sybase and so forth. XtraDB is actually very close to InnoDB: it's a branch of InnoDB, so it's the InnoDB storage engine with a few enhancements.

Again there's a very complicated story as to why this came about. InnoDB was created in the 90s, and back then there weren't that many multi-core, multi-CPU machines. It turned out that it didn't perform that well on multi-core, multi-CPU machines, and when people started having multi-core machines InnoDB started running into a lot of mutexes and stuff which didn't scale well. But a lot of these mutexes were quite easy to overcome, and Google, who was using MySQL heavily at the time, figured it out. They made some patches that fixed this and they released those patches. The problem, again, was that MySQL couldn't take in the patches, because they were released under the GPL and we were selling commercial licenses, so at MySQL we had to own the whole thing at the time. We couldn't take the Google patches because they were GPL. So another company, called Percona, took the Google patches, added them to InnoDB and created a patched version of InnoDB called XtraDB, and that's basically XtraDB. Eventually MySQL added the same features to InnoDB, not of course the same code, because they can't, and now InnoDB and XtraDB are very close. But at the time XtraDB scaled a lot better than InnoDB. Anyway, that's a storage engine, and there are some other storage engines.

Oh yeah, who knows what MyISAM is? Two guys. MyISAM is the traditional, kind of old-school storage engine in MySQL. It's very good for storing data in an efficient way, however it's not transactional, it's not crash safe; it's pretty much bad in many senses, except that the data is very compact on disk. Back in the day this was the only storage engine in MySQL, and it used to be the default, so a lot of customers actually used MyISAM without knowing it, because they didn't know that you could choose. They were saying, well, we use transactions and we're crash safe, and we were like: actually, you're not. If your system crashes during a write, well, your tables will be corrupt. And they were like: really? And that's still the case with MyISAM.

Aria is basically an improved version of MyISAM. It has the same storage, however it's crash safe. It has a small log, so if the server crashes during your operation it can actually roll back that operation. MyISAM can't: if you're doing an update changing 100 rows and the system crashes, well, your table is somewhere in between, and that's where the table stays; there's no way of rolling back. With Aria you can roll back. Then there were some bug fixes.
One of the things was that when you wanted to upgrade from MySQL 5.0 to MySQL 5.1, the MySQL development team actually forgot that people might want to upgrade, so you weren't able to upgrade without taking a total dump of your data and reinserting all the data. This was actually just due to a simple bug, and MariaDB fixed it. So going from MySQL 5.0 to MariaDB 5.1 you could do it without taking a backup, but from MySQL 5.0 to MySQL 5.1 you couldn't, and that was a big seller to many customers back in the day. Please.

Then they added microsecond support. Can anyone see this red stuff? No? I don't know where to turn on the lights. Basically, MySQL traditionally has only second granularity on everything, so something takes zero seconds, one second or two seconds, and you don't get lower than that. In MariaDB you get to microsecond level, so you can see if your query took, I don't know, 0.5 seconds or 0.51 seconds, instead of having just a zero there or a one. So that's something that was added in MariaDB, and this goes both for the slow query log and the process list, so you can see how long a statement has been running in microseconds and not only in seconds.

Then they added a thread pool, a very basic thread pool. Traditionally in MySQL every connection to the database is a thread, so if you have a hundred connections you have a hundred threads plus a few extra threads. This can be great, but it can also be not so great. For example, if you have a lot of connections doing small transactions, you end up with a lot of context switches between the different threads, because every connection is a thread, and if they do stuff all the time they have to switch between the threads all the time. So for these types of scenarios a thread pool makes sense: you get less context switching because you queue all the operations into a pool of threads.
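The contrast between one thread per connection and a thread pool can be sketched in a few lines of Python. This is a generic illustration using the standard library, not MariaDB's implementation; `handle_request`, `thread_per_connection` and `pooled` are hypothetical names, and the pool size of 4 in the usage note is an arbitrary assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def handle_request(conn_id: int) -> int:
    """Stand-in for one small transaction arriving on one connection."""
    return conn_id * 2


def thread_per_connection(n: int) -> Tuple[List[int], int]:
    """Classic model: every connection gets its own thread."""
    results = [0] * n

    def worker(i: int) -> None:
        results[i] = handle_request(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(threads)  # number of threads that were created


def pooled(n: int, pool_size: int) -> Tuple[List[int], int]:
    """Pool model: requests are queued and served by a fixed set of threads."""
    seen = set()
    lock = threading.Lock()

    def worker(i: int) -> int:
        with lock:
            seen.add(threading.current_thread().name)
        return handle_request(i)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(worker, range(n)))
    return results, len(seen)  # number of distinct worker threads used
```

Calling `thread_per_connection(100)` creates 100 threads, while `pooled(100, 4)` produces the same results with at most 4 worker threads, so the operating system has far fewer threads to switch between.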
Another thing that Monty Program did: MySQL had an optimizer team that was basically working on the MySQL optimizer. It had four people in it, or five at the end, so five people working on the MySQL optimizer. Well, Monty Program hired five of these five, which left how many at Oracle? Zero. So all the guys working on the optimizer moved over to Monty Program, and they added features to the optimizer.

One of the first was this table elimination feature, which, to be honest, is not that great a feature. It's mainly useful if you use machine-created SQL statements, so if you have some kind of middle layer that creates the SQL for you, that's when you might run into this. Basically, if you have this kind of anchor schema, it means that you have each attribute as a table. So instead of having a table for, like, a person, where you have as columns his ID, his name, his phone number and stuff, each attribute will be in its own table. So the birth date will be one table, the name will be one table. Here we have actors, and the rating of the actor is in another table. The idea with this is that it's very, very modular: you don't do an ALTER TABLE to add columns, you just add more tables to add columns. So it becomes very modular.

All right, if you have this type of schema, normally you would create a view, because you don't want to write like nine-table joins every time you want to see something. So you would have a view to access all this, which means you have a view with all the nine tables, and then when you do a simple query on this view it will have to join in all the nine tables every single time. That's what happens in standard MySQL: every time you use this view, even if you're looking at just one column of one of the tables, it will join all nine tables. This is where table elimination comes into the picture: the optimizer actually eliminates the tables from the join. Again, you are not likely to do this by yourself, but if you use machine-created code it can actually create these types of schemas for you and you don't even know it.

Yes, that was MariaDB 5.1. Because Oracle was very slow at coming out with new versions of MySQL, and MariaDB, well, you know, they were developers and they were developing new features, they decided to create a new version, MariaDB 5.2, which is built on MariaDB 5.1, which is built on MySQL 5.1. So MariaDB 5.2 is still based on MySQL 5.1, and here they added a few features.

One is pluggable authentication. Traditionally in MySQL there's one authentication protocol, and that's built in, and that's what you have to use. However, in MariaDB 5.2 you can choose a different authentication protocol; you can plug it in, so to say. There's already a PAM plugin included, but you can create your own plugin for the authentication, so you don't have to use the MySQL built-in authentication anymore. That's one feature.

Another very useful feature is user statistics. In MySQL there are a lot of statistics, but they're all on the server level, so you see what's going on on the server but you have no clue what's going on for an individual client, individual user, or index or table. This is something that MariaDB 5.2 added. You get statistics per IP, so you see, okay, all connections from this IP address: what have they done, how much memory have they used, how many queries and everything. Same with a specific user account: what has this user account done? And then the last ones are indexes: for every single index you see how many times this index has been read and so forth, and same with the tables. Before MariaDB 5.2 this doesn't exist, so if you use MySQL 5.1 you don't have statistics on this level; you have them for the whole server.
Then there's something called Sphinx. Does anyone know what Sphinx is? Okay. Sphinx is an open source full-text search engine; it's very much used for the web. MySQL has full-text indexing on MyISAM tables, but the full-text indexes are not very good. I mean, they're better than no full-text indexing, but if you're talking about high-performance websites it's not good enough. So instead of using the MyISAM built-in full-text indexing you can use Sphinx, which is a program for full-text indexing: you put data in it and it creates indexes on text, based on articles or whatnot. Normally you run Sphinx as a separate process and you query it through your code, so you send a request to Sphinx saying, okay, I want to look for this, in which articles is it, and it gives you back the key values for the articles that match your search. With MariaDB you can do the same, except that you can do it from within the MySQL server. You can access the Sphinx engine from within the server, so you don't have to use the native C or whatever calls to Sphinx; you can use it through the MySQL engine, which means that it's easier to combine the results of Sphinx and MySQL.

Another thing is the segmented MyISAM key cache, which basically allows better concurrency. MyISAM tables have a joint key cache where every single index of every table is stored, which doesn't work too well when you have a lot of concurrent users, because they all have to use the same key cache, so you run into concurrency issues. By segmenting it up you can have a lot better concurrency. I have a benchmark image to show you, to give you a bit of an idea. So here's one benchmark, and as you see the number of threads growing you see the benefit of running a segmented key cache. Here you have the threads, and here's how many queries you can do per second. If you don't have a segmented key cache you basically max out around 5,000 queries per second with MyISAM. If you have a segmented key cache you go up to 20,000. Well, that depends on what type of benchmark you run, but this is about the worst performance gain in the benchmarks.
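The idea behind segmenting a shared cache can be sketched as follows. This is a generic lock-striping example in Python, not MariaDB's key cache code; the class name `SegmentedCache` and the choice of hashing the key to pick a segment are assumptions made for illustration, since the talk does not say how blocks are assigned to segments.

```python
import threading
from typing import Hashable, List, Optional


class SegmentedCache:
    """A cache split into independent segments, each with its own lock.

    With segments=1 every caller contends for the same lock, like a single
    joint cache. With more segments, callers touching different segments
    no longer block each other.
    """

    def __init__(self, segments: int = 1) -> None:
        self._locks = [threading.Lock() for _ in range(segments)]
        self._data = [dict() for _ in range(segments)]

    def _segment(self, key: Hashable) -> int:
        # Assumption for this sketch: pick the segment by hashing the key.
        return hash(key) % len(self._locks)

    def put(self, key: Hashable, block: bytes) -> None:
        i = self._segment(key)
        with self._locks[i]:  # only this segment is locked
            self._data[i][key] = block

    def get(self, key: Hashable) -> Optional[bytes]:
        i = self._segment(key)
        with self._locks[i]:
            return self._data[i].get(key)

    def segment_sizes(self) -> List[int]:
        return [len(d) for d in self._data]
```

For example, after putting integer keys 0 to 7 into `SegmentedCache(segments=4)`, each segment holds two entries, so up to four threads can work on the cache at once without waiting on one another.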
And that was MariaDB 5.2. But the big release for MariaDB is MariaDB 5.3, which went GA in December, so not so long ago, and this has a lot of new features. Again, it's built on MariaDB 5.2, so it's built on MariaDB 5.1, which is built on MySQL 5.1, so we're still on MySQL 5.1.

It has quite a few features. There's something called HandlerSocket, which basically gives you a kind of NoSQL interface to the InnoDB engine. You can directly access the InnoDB engine without passing through the SQL layer or even the optimizer layer; you just send simple CRUD operations. CRUD is, what is it, create, read, update, delete, something like that. So you can do direct operations on the data without passing through the optimizer. This is very useful if you do simple, batch-type operations, basic things where you normally wouldn't use an SQL engine; you would use some kind of NoSQL stuff like MongoDB or something. So you can get kind of similar stuff with InnoDB.

Then there is something called dynamic columns, which is again kind of a NoSQL feature, because it does stuff that you're not supposed to do with SQL. Basically it allows you to create columns with dynamic content: you store stuff in a blob and you have some functions for handling what's in the blob. This is useful for some type of data warehouse or something where you sell products and, depending on what type of product it is, it has different attributes, and you don't want to have different tables where, okay, you have all the phones here, you have all your memory in that table. So you can store all your products in this one table, and you just store the attributes in this dynamic column.

I have an example for you so you get the idea of what we're doing here, if you can see anything. Well, not the red. So where do you turn off the light?

So the way to do this is we have a blob column here, and then we use the COLUMN_CREATE function, and we just give an index, which says which attribute it is. There must be a switch somewhere. I'll check over there by the door. Sorry for [inaudible]. I guess not.

Yeah, so there's the COLUMN_CREATE function, and you basically give an index to the attribute you want to have and then you give a value to the attribute. Here of course, as the guy programming this, you have to keep track of which index corresponds to which attribute. So for example 1 here corresponds to the color: we have one that is blue, one is black, one is black and so forth, and we have other things like that. We can store any number of attributes, so one product could have one attribute, another product could have five, the third could have seven and so forth. Then we can use the COLUMN_LIST function to basically get this data back out. For example, here I just get a list of all the attributes for each product. So for example the Nokia is a phone; it has attributes 1 and 2, 1 being color and 2 being something else, and so forth, so we get the list of the attributes for each product.

And we can also do searches on these products. So for example, I don't know if you can see that there, but we're looking for all products that have attribute 1 as black. Basically we're doing a normal search: we just want everything that is black, and we get a few phones and [inaudible] and whatnot; we get all products that are black and so forth. Here we do a different thing: we're just looking at what color these things are. So again, it's a feature that's fairly useful for different things.
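Since the slides with the example are not part of the transcript, here is a small Python analogy of the dynamic columns idea described above. The helper names mirror MariaDB's `COLUMN_CREATE` and `COLUMN_LIST` functions, plus a `column_get` helper for reading one value, but everything here is a sketch: the blob is plain JSON, which is an assumption for illustration and not MariaDB's storage format, and the product data is made up.

```python
import json
from typing import Dict, List, Optional

# As in the talk, the application has to remember what each number means.
COLOR, SIZE, NETWORK = 1, 2, 3


def column_create(attrs: Dict[int, object]) -> bytes:
    """Pack numbered attributes into a single blob value."""
    return json.dumps({str(k): v for k, v in attrs.items()}, sort_keys=True).encode("utf-8")


def column_list(blob: bytes) -> List[int]:
    """Return the attribute numbers stored in the blob."""
    return sorted(int(k) for k in json.loads(blob.decode("utf-8")))


def column_get(blob: bytes, number: int) -> Optional[object]:
    """Return one attribute value, or None if the product does not have it."""
    return json.loads(blob.decode("utf-8")).get(str(number))


# One "table" for all products; each row carries a different set of attributes.
products = [
    {"name": "phone A", "type": "phone", "attrs": column_create({COLOR: "black", SIZE: "small"})},
    {"name": "laptop B", "type": "laptop", "attrs": column_create({COLOR: "blue"})},
    {"name": "phone C", "type": "phone",
     "attrs": column_create({COLOR: "black", SIZE: "large", NETWORK: "3G"})},
]

# List the attributes of each product, then search on attribute 1 (the color).
attributes_per_product = {p["name"]: column_list(p["attrs"]) for p in products}
black_products = [p["name"] for p in products if column_get(p["attrs"], COLOR) == "black"]
```

Here `attributes_per_product` maps "phone A" to `[1, 2]` and `black_products` contains the two black phones, which corresponds to the listing and the search the speaker walks through.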
Then MariaDB also added a lot of replication enhancements. Who here has used MySQL replication? No one? Okay, then you probably don't care about this. Basically, yeah, if you're using MySQL replication, well, some things are better, but I'm going to skip this if you don't care.

There's a user feedback plugin, which means that MariaDB can get data sent to them, so they can log what people are doing with the server. Okay, not what type of statements, but just where they are, what version they're using and stuff, to get a better idea of what's being used out there.

And this is my favorite feature, actually: they added a progress report for ALTER TABLE. It's a very simple feature to code, or it was a very easy feature to code. But before this, if you do an ALTER TABLE on MySQL, you type ALTER TABLE, do this and change that, press Enter, and then you wait. After 10 minutes you're like: it's been 10 minutes, I guess it's going to be done soon. Then you wait some more, and it's been half an hour, and you're like: well, I can't stop it now, it's going to be done really soon now. Then it's been an hour, and you're like: oh, it's been an hour, if I kill it now then I have to redo the hour. And thank you, whoever did that. Thanks. So it's a very simple feature, but a lot of DBAs complain about this: they do an ALTER TABLE and then they're like, I don't know how long it's going to take, should I just kill it or should I wait, or what should I do? Now you know. There's a progress bar: when you do the ALTER TABLE you actually get a progress bar, and if you do a SHOW PROCESSLIST in the server you get a progress bar as well. So the guy doing it gets a progress report, and everyone else, the DBAs, can get a progress report as well. So you know: okay, should I kill this or should I just let it run, is it close to being done or not? And this is my favorite feature. There are a lot of optimizer enhancements.
MySQL added subqueries in version 4.1, so somewhere around 2004 or 2003 or something, and this was great: there was a check box, "do you have subqueries?", and we could tick it and say yes. The bad thing was that they sucked. It was common knowledge that MySQL has subqueries but please never use them, because they're really slow. But we still had them. The good thing is that everyone learned that if you want performance you shouldn't use subqueries, and in most cases you can do workarounds: you can use joins or you can do other things instead of subqueries. The bad thing is that there are, again, a lot of people using middle layers, application layers, where the SQL is created by the middle layer, and there they use subqueries, and you can't control it, because you're using this application layer and you have to use it. And they had really bad performance because, well, subqueries weren't performing well. Well, mainly because they hired the whole optimizer team it was easy for them, but mainly, we fixed it, so subqueries work a lot faster. Basically, subqueries are usable now; they weren't before.

There are a lot of optimizations there, but mainly what they do is: if a subquery can be rewritten to a join, the optimizer will actually do it. Previously the optimizer didn't even try. It was like: okay, a subquery, let's assume it's dependent, correlated, and do the worst-case scenario; that will always work. So subqueries were always done in the slowest possible way before. Not anymore. And there are some other things, doing materialization and stuff like that.
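To make the subquery-to-join rewrite concrete, the sketch below runs the same question in both forms and checks that the answers match. It uses Python's built-in `sqlite3` module purely as a convenient stand-in database; it says nothing about how MariaDB's optimizer performs the rewrite internally, and the `customers` and `orders` tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES (10, 1, 500), (11, 1, 700), (12, 2, 50), (13, 3, 900);
""")

# Form 1: an IN subquery, as a middle layer might generate it.
subquery_form = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE total > 100)
    ORDER BY name
""").fetchall()

# Form 2: the same question as a join. DISTINCT is needed because a
# customer with several matching orders would otherwise appear twice.
join_form = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE o.total > 100
    ORDER BY c.name
""").fetchall()

conn.close()
```

Both `subquery_form` and `join_form` come back as `[('ann',), ('cy',)]`. The rewrite described in the talk is the optimizer recognizing this kind of equivalence on its own, so that generated SQL with subqueries can be executed with a join strategy.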
traditionally myoscope only had one type of join
|
|
there's a lot of ways you can join tables when you do a join in a database kind of the simplest
|
|
is the one MySQL used it's called nested loop join and it's basically like for loops nested
|
|
for loops you have a for loop for the first table for the second third and so forth and that's
|
|
what you do you take the first row of the first then the second then the third and blah blah and this is great
|
|
in many cases but in some cases it sucks and well if you only have this you only have this and
|
|
if it sucks it sucks and you you can't do anything most database systems other database systems
|
|
commercial database systems have multiple join algorithms and this is something that was now added
|
|
to MariaDB I guess the main difference is like the hash joins which have to do with indexing
|
|
and the batched key access well batched key access is actually easy to understand instead of doing a for loop
|
|
row by row you do a batch so you do a batch of the first table and then you do a batch of the
|
|
second table and so forth depending on your memory structures and messaging and this can be a
|
|
lot faster in some cases as an example there's MySQL Cluster which is a product where
|
|
where the data is stored on the network on different nodes and there when you use nested loop joins
|
|
it means that every time you want a row you have to send a network message all right I want this
|
|
row send it back okay then next step then next step so the latency will kill you if you do a nested
|
|
loop join it means that for every single row combination you have to send n messages on the net
|
|
so normally with MySQL Cluster if you have joins with more than two tables you should go for a
|
|
coffee break while you're doing the join well with batched key access you don't have to do this anymore
|
|
because you can actually get in one send you can send hundreds of rows so you have like a batch you say
|
|
uh one kilobyte or 500 kilobytes of rows in one go instead of having to sequentially send
|
|
every single row and wait for the network latency so that's great and some other
|
|
optimizations I'm not going to go into however another thing is that there's now
|
|
optimizer control so you can control which of these optimizations are on and off for your queries
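The difference between the nested loop join and batched key access described above can be sketched as a count of network round trips. The RemoteTable class, the batch size, and the data are all hypothetical; this is a toy model of the idea, not MariaDB's or MySQL Cluster's implementation.

```python
class RemoteTable:
    """Hypothetical table stored on another node; each call costs one round trip."""

    def __init__(self, rows_by_key):
        self.rows_by_key = rows_by_key
        self.round_trips = 0

    def fetch_one(self, key):
        self.round_trips += 1
        return self.rows_by_key.get(key, [])

    def fetch_many(self, keys):
        self.round_trips += 1
        return {key: self.rows_by_key.get(key, []) for key in keys}


def nested_loop_join(outer_rows, remote):
    """One lookup message per outer row, like nested for loops."""
    result = []
    for outer_id, key in outer_rows:
        for inner in remote.fetch_one(key):
            result.append((outer_id, inner))
    return result


def batched_key_join(outer_rows, remote, batch_size):
    """Collect a batch of keys first, then send them in a single message."""
    result = []
    for start in range(0, len(outer_rows), batch_size):
        batch = outer_rows[start:start + batch_size]
        found = remote.fetch_many({key for _, key in batch})
        for outer_id, key in batch:
            for inner in found[key]:
                result.append((outer_id, inner))
    return result


outer = [(i, i % 50) for i in range(1000)]
inner_data = {key: [f"row-{key}"] for key in range(50)}

remote_a = RemoteTable(inner_data)
remote_b = RemoteTable(inner_data)
rows_a = nested_loop_join(outer, remote_a)
rows_b = batched_key_join(outer, remote_b, batch_size=100)
print(remote_a.round_trips, remote_b.round_trips)
```

In this toy model both joins produce the same rows, but the batched version pays the network latency once per batch instead of once per row.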
|
|
so you can say well I want to use the old way and I don't want to try these new things and you
|
|
can control it with this optimizer switch and then perhaps the biggest or best
|
|
feature of MariaDB 5.3 is something called group commit and in order to explain group commit I
|
|
have to explain a bit about how the server works so basically on the server this is in InnoDB or
|
|
XtraDB the transactional engines with something like MyISAM you don't care about group commit because
|
|
nothing is crash safe anyway so you don't have to worry about being crash safe however InnoDB
|
|
and XtraDB they're supposed to be ACID compliant which means that their transactions are durable
|
|
if the system crashes and you restart everything that's been committed is committed right you can't
|
|
lose anything that's what crash safe means the problem is of course in general disk IO is very slow
|
|
so what you don't want to do is commit every time you do a change you don't want to write this
|
|
change on to the disk because it's going to be very very slow so instead what you do is you have
|
|
a buffer pool all your all your data blocks index blocks and stuff are in this buffer pool you do
|
|
changes to them well instead of well when you do the changes you have something called a log buffer
|
|
where you log these changes as operations not the actual changes when you're done you have to
|
|
you commit your transaction you don't write the data pages to the disk which are large you just
|
|
write this this log to disk so you have a log of the changes where you just say whatever update
|
|
or something and you write that to the disk because that's small so when you commit when you commit
|
|
a transaction you have to sync your redo log to disk to make sure that this change is now written
|
|
somewhere so in case the system crashes I can get this state back somehow and what happens when
|
|
your system restarts is that well it looks at the data on disk and starts going through the
|
|
redo log okay which of the which of the transactions in the redo log have been applied on this disk state
|
|
and which haven't so that's basically what it does that's the crash recovery it goes through the redo log
|
|
and applies the changes to the disk all right so when you commit you have to sync you do an fsync
|
|
your redo log is now flushed it's synced to disk and you're safe
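The commit-and-recover cycle described here can be sketched in a few lines. TinyEngine is a hypothetical toy, not InnoDB or XtraDB: it keeps pages only in memory, assumes no data page is ever written to disk, and stores the redo log as JSON lines, all of which are simplifications of my own.

```python
import json
import os
import tempfile


class TinyEngine:
    """Hypothetical toy engine: in-memory pages plus an fsynced redo log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.pages = {}        # stands in for the buffer pool; lost on a crash
        self.log_buffer = []   # change records not yet written to disk

    def update(self, key, value):
        # Change the in-memory page and remember the change in the log buffer.
        self.pages[key] = value
        self.log_buffer.append({"key": key, "value": value})

    def commit(self):
        # Only the small log records go to disk, followed by an fsync.
        with open(self.log_path, "a") as log:
            for record in self.log_buffer:
                log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.log_buffer.clear()

    @classmethod
    def recover(cls, log_path):
        # Crash recovery: replay every record found in the redo log.
        engine = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    record = json.loads(line)
                    engine.pages[record["key"]] = record["value"]
        return engine


log_path = os.path.join(tempfile.mkdtemp(), "redo.log")
engine = TinyEngine(log_path)
engine.update("balance", 100)
engine.commit()
engine.update("balance", 999)  # changed in memory but never committed
del engine                      # simulated crash: memory contents are gone

recovered = TinyEngine.recover(log_path)
print(recovered.pages)
```

The committed value survives the simulated crash because it reached the log before commit returned, while the uncommitted change is lost with the memory.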
|
|
there's a few problems with this one is during concurrency you have hundreds of transactions or
|
|
thousands of transactions and every one of the transactions commits at the same time well you can't do fsync
|
|
at the same time so you do them sequentially in a queue and you fsync one transaction then the
|
|
second the third and so forth again if you have lots of transactions this is not a good thing to
|
|
do so what you try to do instead is you group your transactions together and do a group commit
|
|
so you fsync all of them at the same time instead of doing it in sequence and that's the basic
|
|
idea of group commit and that you do get in the newest version of MySQL you get this however
|
|
what MariaDB added is something else MySQL also has something called a binary log who knows
|
|
what a binary log is no one excellent so the redo log is specific to the InnoDB engine okay
|
|
so it's only transactions in this engine then there's something called a binary log which is for the whole
|
|
MySQL server and why is this important it's because this is basically an incremental backup of
|
|
your system so when you do changes all the changes are stored in this binary log and this is
|
|
also used for replication if you have replicating MySQL servers they will use the binary log to replicate
|
|
stuff so the problem now is well when you do a commit you not only have to sync your redo logs
|
|
you have to sync your binary log as well because they have to match when you crash
|
|
when you restart they have to match you can't have a transaction that's written here but not
|
|
there because then your incremental backup well it's useless because some
|
|
transactions are missing from it and so forth so you have to make sure that these are synced together
|
|
and this is actually done through three fsyncs so first you do a prepare on the redo log
|
|
files then you fsync the binary log and then you fsync the final commit on the redo log
|
|
so you have three fsyncs for every transaction again if you have a lot of
|
|
concurrency your system will fairly soon become IO bound because of this and this is the great feature
|
|
of MariaDB they actually made a group commit that has the binary log included in it so I mean any
|
|
system where you want to have a backup you will need to use the binary log in MySQL so anyone using
|
|
backups will also have the binary log which means that they will need this group commit feature well
|
|
they will need this group commit feature if their load is IO bound and this is the great thing so
|
|
we have group commit where the commits on the binary log are grouped together and it's dynamic you
|
|
don't have to turn it on basically if you have one commit and then there's more than two in the
|
|
queue they will be grouped together so as soon as you have more than three commits at the same time
|
|
you will use the group commit feature and I have a benchmark here and this is this benchmark
|
|
is really cool because it's actually not done by MariaDB or anyone promoting this this is actually
|
|
done by Facebook because they also needed a group commit because they had an IO bound
|
|
load so they decided to code a group commit feature so they coded it it was okay but it wasn't
|
|
very good compared to MariaDB so they created a new version of it and this is the benchmark with
|
|
original MySQL the second version of the Facebook patch and MariaDB so here you see as your
|
|
concurrent transactions grow how many commits can you do per second basically
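The effect of grouping on the three-fsync commit path explained earlier can be sketched as simple arithmetic. This assumes, as a simplification of my own, that a group pays each of the three syncs once; the function names and the group size of 50 are hypothetical and the numbers are not benchmark results.

```python
import math

# The three syncs named in the talk: prepare in the redo log,
# the binary log, and the final commit in the redo log.
FSYNCS_PER_COMMIT = 3


def fsyncs_one_by_one(transactions):
    """Every transaction pays all three fsyncs on its own."""
    return transactions * FSYNCS_PER_COMMIT


def fsyncs_grouped(transactions, group_size):
    """Transactions waiting in the queue share the three fsyncs of their group."""
    groups = math.ceil(transactions / group_size)
    return groups * FSYNCS_PER_COMMIT


waiting = 1000
sequential_cost = fsyncs_one_by_one(waiting)
grouped_cost = fsyncs_grouped(waiting, group_size=50)
print(sequential_cost, grouped_cost)
```

Since each fsync waits on the disk, fewer fsyncs per committed transaction is what lets the commit rate keep growing with concurrency instead of flattening out.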
|
|
standard MySQL maxes out at around 5,000 commits per second you can't do more than that because
|
|
well you're IO bound and you can't do more than that it of course depends on your disk but this is
|
|
on a fast disk look at MariaDB where you get up to almost 28,000 commits per second which is huge
|
|
Facebook gets to around 18,000 commits per second and actually after doing this benchmark
|
|
Facebook officially decided to stop this project because the MariaDB patch was so much better
|
|
so they they're not doing it anymore but you can actually this you can find on the internet if you
|
|
Google group commit Facebook MariaDB you will find this article on Facebook pages it's done by
|
|
Mark Callaghan who's the MySQL guru at Facebook who actually used to work for Google so he was the
|
|
MySQL guru at Google and now he's the MySQL guru at Facebook and this is one of the best features of
|
|
MariaDB 5.3 well while the MariaDB team was having fun creating new versions based off MySQL 5.1
|
|
Oracle actually produced a new version called MySQL 5.5 and the MariaDB team were like well you know
|
|
we'll just merge our MariaDB versions into this new Oracle version turns out
|
|
it wasn't that easy because MariaDB had been adding so many features all the features you saw
|
|
until now putting them back into MySQL 5.5 which had some features that were similar
|
|
and they had to kind of make sure that they didn't do the same thing twice or they didn't overlap
|
|
and so forth took a long time MariaDB 5.5 was
|
|
promised to be released in April 2011 it was released two weeks ago the first alpha version
|
|
but so basically what is MariaDB 5.5 it's MySQL 5.5 so the newest version of MySQL plus
|
|
all the features from up to MariaDB 5.3 so they've taken all the new features merged them into
|
|
MySQL 5.5 and well they had a lot of work because of the conflicts but they managed to finally
|
|
do it and this was what it was supposed to be but at the same time they couldn't help themselves
|
|
but adding a few more features so there's a few more features here and this also has to do with the
|
|
fact that in MySQL 5.5 the community version and the enterprise version now differ so this
|
|
is when Oracle added these commercial extensions that you can only get if you're a paying customer
|
|
and the MariaDB team they decided that well we cannot have Oracle giving commercial extensions
|
|
that there are no open source alternatives to so they wanted to actually code the same thing in
|
|
MariaDB and one of the extensions I think there's only two extensions out there but one of the
|
|
extensions was a thread pool and it turns out the Oracle thread pool was better than the one MariaDB
|
|
had implemented previously and the MariaDB guys were furious because this is not good so they had
|
|
to re-code their thread pool and that's what they did so the thread pool added in 5.1 is now
|
|
and they added a new thread pool in 5.5 the big difference is that it's dynamic the previous
|
|
thread pool was not you had a fixed set of threads here it's dynamic and it's also optimized
|
|
differently for different operating systems and you can tune it quite well on Linux machines and so
|
|
forth so again this should be on Linux this is pretty much the same as the one Oracle did on windows
|
|
know they use MariaDB use something else I don't know and then there's some more features
|
|
one interesting feature is MySQL has I put this here because it's actually quite cool MySQL
|
|
has this feature called limit so you can when you do any type of select statement you can add
|
|
a limit clause and it limits the output to a certain number of rows you say select huge select
|
|
limit 10 and you will only get 10 rows in your output and that's cool the bad thing is that if
|
|
you have a 407 table join it will actually do the join before it limits your 10 rows to the
|
|
output so it might last forever and now they extended this limit with rows examined so you
|
|
put a max number of how many rows should be examined so for example here I'm doing a Cartesian
|
|
product between two tables which means I take the n times n so if I have one million rows and one
|
|
million rows this Cartesian product will actually create what is that a thousand a trillion I don't know
|
|
one million times a million it's something weird 10 to the 12 so a trillion but whatever it's a lot of rows
|
|
anyway but if I add this rows examined a thousand it will stop once it's examined the first thousand rows
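The behaviour of a rows-examined cap on a Cartesian product can be sketched as a loop that stops on whichever limit is hit first. In MariaDB the clause is written LIMIT 10 ROWS EXAMINED 1000; everything below (the function names, the filter, the table sizes) is a hypothetical stand-in and not how the server implements it.

```python
def cartesian(left, right):
    """Lazily yield every combination of a left row with a right row."""
    for left_row in left:
        for right_row in right:
            yield left_row, right_row


def select_with_limits(pairs, predicate, limit, rows_examined):
    """Stop when enough rows are returned OR enough rows have been examined."""
    output, examined = [], 0
    for pair in pairs:
        if examined >= rows_examined:
            break
        examined += 1
        if predicate(pair):
            output.append(pair)
            if len(output) >= limit:
                break
    return output, examined


table_size = 1_000_000
total_combinations = table_size * table_size  # 10**12, one trillion

pairs = cartesian(range(table_size), range(table_size))
rows, examined = select_with_limits(
    pairs,
    predicate=lambda pair: (pair[0] + pair[1]) % 400 == 0,  # hypothetical WHERE
    limit=10,
    rows_examined=1000,
)
print(total_combinations, len(rows), examined)
```

With only a row limit, a selective filter could force the loop through a large part of the trillion combinations; the examined cap bounds the work even when fewer than ten rows have been found.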
|
|
so that's a silly feature but it's cool MariaDB 5.5 so the alpha or I think it's beta version is
|
|
out now and it's supposed to be GA fairly soon I mean there's not a lot of new features it's mostly the
|
|
old features so it should it should stabilize earlier than the others and the previous version
|
|
of MariaDB then for the future MariaDB 5.6 well all of this is of course guesswork
|
|
because they're not working on it yet they're waiting for 5.5 to stabilize before but here are
|
|
some of the features that they're thinking of adding and you can actually vote for features
|
|
so we work together with Monty Program so we have a vote where there's a few potential features
|
|
listed and you can go and vote okay I want this feature and great and that will go to Monty Program
|
|
all right we're on your side so is MariaDB a branch or a fork
|
|
with MySQL 5.1 it's very simple so Oracle has MySQL now the community edition
|
|
for 5.1 the community edition equals the enterprise edition and MariaDB follows these without a
|
|
problem so it's definitely a branch same thing for MariaDB 5.2 and 5.3 because they're based on
|
|
MySQL 5.1 with MySQL 5.5 it gets a bit hairier why well because the community edition
|
|
and the enterprise edition are no longer the same because they added these commercial extensions
|
|
and because MariaDB of course had a feature fest in the previous versions and had to backport
|
|
all these features but they did manage to merge MySQL 5.5 and MariaDB 5.5 together so it is
|
|
still just a branch it's still merging back so when there's a new version of MySQL 5.5 the
|
|
community edition MariaDB will merge into that version and there will be a new version of MariaDB
|
|
so it's still only a branch however for 5.6 who knows it depends on what Oracle does the more commercial
|
|
Oracle goes the closer MariaDB comes to being a real fork and no longer a branch but at the moment
|
|
it's a branch meaning that it merges back into the main tree a fork no longer merges back that's
|
|
a big difference and also a fork is made of metal and this is tree wood all right so now all of
|
|
you want to get MariaDB right after this who wants to get MariaDB now that's great so how can you
|
|
get it well there's MariaDB.org that's the main site you can download stuff you can also look at
|
|
documentation they have a knowledge base and so forth you can also get it from the openSUSE
|
|
build service so if you're using openSUSE well they have packages ready made for you and a few
|
|
others and Monty Program have their own yum and apt repos available so you can get it if you add
|
|
their repos to your distribution you can actually use yum or apt-get to get them.
|
|
What is MP? What is MP? Monty Program sorry it's the company behind it
|
|
and what about support and services well Monty Program they do what they call NRE which is
|
|
non-recurring engineering so if a customer comes and says well we want to have this feature in
|
|
MariaDB Monty Program says okay great we'll put it on the list of features and we'll
|
|
get to it eventually the customer says no no I really want this feature so all right how much
|
|
are you paying and then if they pay well they will do this feature sooner and that's basically
|
|
what they do. Monty Program is a company with only developers so they don't really do anything
|
|
else than develop the problem is that their product is free right it's GPL no one pays them anything
|
|
so they don't earn any money and that's where kind of we come in SkySQL we do services
|
|
we do support training and consulting and basically every time we sell support contracts
|
|
a part of the price goes to Monty Program because they are kind of the back end support so we
|
|
do a level one and level two and they do all the bug fixing so if there's a bug if one of our
|
|
customers finds a bug or well they find they have a problem our support engineers try to
|
|
well look at the problem and say well this is a bug we send it to Monty Program Monty Program
|
|
delivers a patch and of course this works with MariaDB but also with standard MySQL because
|
|
they know MySQL pretty well strangely enough and of course there are others providing MariaDB
|
|
so we're not the only ones I don't know what contracts the others have but there's the list at
|
|
MariaDB org service providers but basically if you buy stuff from us remember the money the
|
|
part of the money goes to MariaDB that's how they earn their living because they don't get that much
|
|
NRE and for even more information you can go to this thing there's a mailing list on Launchpad
|
|
and of course the source code is on Launchpad there's a Maria channel on Freenode
|
|
that's basically how they communicate at SkySQL we use Skype and we have an IRC channel
|
|
but it's internal closed and these guys use Freenode's IRC channel as the only means of
|
|
communicating well and email but so you will have them discuss features and development stuff on
|
|
this Freenode channel anyone can log in and see what they're talking about you might not
|
|
always understand it because you know like me they're not native English speakers they're Russian
|
|
and stuff so sometimes you don't have a clue what they're talking about anyway and there's actually
|
|
a MariaDB book MariaDB crash course which you can get from Amazon or whatever if you're interested
|
|
that's it yes I'm wondering about the MariaDB source code how big it is in my
|
|
so it's fairly big because I mean it's MySQL plus the extra features so I can't
|
|
say how big it is but it's big hundreds of megabytes yeah I think so yeah it's written in C plus
|
|
plus well it's basically written in C plus plus but it's kind of procedural so
|
|
it just uses features of C plus plus but it's not really object oriented
|
|
in a real way any other questions let's see what is the real life scope of MySQL Workbench
|
|
so Workbench works yeah it's compatible with MariaDB
|
|
it's like a design call it an ER tool entity whatever it's a design tool you can
|
|
design tables and stuff but it communicates through SQL so it works fine with both
|
|
what yeah one thing that is nice about like NoSQL is embedded objects can the dynamic columns
|
|
have embedded objects some kind of a table no so you can you can you can just embed data but not
|
|
more complicated structures you can embed characters or integers and stuff like that but you can't
|
|
I mean I guess you could have multiple but I mean it's not an object no okay
|
|
but if you have more questions we have a booth up here so you can come
|
|
and ask questions at the booth we'll be there thank you
|
|
you have been listening to Hacker Public Radio at hackerpublicradio.org
|
|
we are a community podcast network that releases shows every weekday Monday through Friday
|
|
today's show like all our shows was contributed by an HPR listener like yourself
|
|
if you ever considered recording a podcast then visit our website to find out how easy it really is
|
|
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club
|
|
HPR is funded by the Binary Revolution at binrev.com all binrev projects are proudly
|
|
sponsored by Lunarpages from shared hosting to custom private clouds go to lunarpages.com
|
|
for all your hosting needs unless otherwise stated today's show is released under a creative
|
|
Commons Attribution ShareAlike 3.0 license
|