Files

545 lines
49 KiB
Plaintext
Raw Permalink Normal View History

Episode: 1089
Title: HPR1089: Max Mether of SkySQL talks about MariaDB
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1089/hpr1089.mp3
Transcribed: 2025-10-17 18:47:16
---
That's it.
This is Max Meather from SkySQL.
Thank you all right so I'm here to talk about Maria DB.
Before we start who knows what Maria DB is.
All right who knows what Maya is guys.
Okay that's good that's good.
So I wanted to do this I wanted to make a non-technical talk.
Unfortunately I wasn't able to so it's going to be quite technical so bear with me.
Yeah so I was asked to do this talk about Maria DB so my first question was okay so
what is Maria DB so I thought actually it's fairly simple it takes something old and you
replace it with something new like this and that's a fairly simple story and it looks good
the problem is that it's not really true so Maria DB is not really a replacement of Maya
it's actually a combination right so Maria DB is a branch of Maya SQL so 98% of everything
you find in Maria DB is actually Maya SQL and if anyone who these girls are on weather here
they're actually the founders girls he had this idea of naming his products after his daughters
so that's actually why Maya SQL is called Maya SQL because his first daughter is called
Mew and no one ever says Mew SQL but the Swedish correct pronunciation would be Mew SQL
and his next daughter is called Maria and that's why we have Maria DB
ask the product all right so what is Maria DB to branch of Maya SQL it's completed backward compatible
drop-in replacement if you're running Maya SQL right now you can stop it put Maria DB binary
restart it and it will work out of the box so it's a drop-in replacement and as I said
well around 96% of the code is the same so it's very very very close and basically this talk
I'm going to talk about the differences so what does Maria DB add to Maya SQL don't
next okay so Maria DB is complete fully open source all the development is done on launchpad so
anyone can download the source code try adding stuff to the source code it's all open but of course
the whole Maya SQL source code has been developed for quite a while so there's a lot of code there
it's all on the GPL so well anyone could add stuff or remove stuff as long as he's taking
the GPL everything is open source and well Maria DB is committed to staying open source
there are no closed source modules as opposed to the Vanilla Maya SQL branch where there are now
some closed source modules and another big thing the bugs database is open Maya SQL traditionally
had an open bugs database so everyone ever submitting a bug it was put into this open database
everyone could see the bugs you could check if you're running to a bug you check is it already
submitted and so forth unfortunately that change recently Oracle who not want Maya SQL
they decided not to have all the bugs out in the open so customers when they file bugs they
fight it to different system the old bugs database still exists however it's not always maintained
a lot of bugs are open since five six years nothing has happened five six years so no one knows
if they're actually doing stuff on them or not and furthermore when they do releases in Maya SQL
now they refer to bug fixes but the bug fixes are from the internal closed bugs database so you
don't actually know what they're talking about so you don't say fix internal bugs 743 and no one
knows what this bug means and this is actually why there now are certain distributions who are
thinking about switching from Maya SQL to Maria Libby because they don't they don't know what
goes on in Maya SQL all right so let's have a brief history here so Maya SQL was developed by one
guy in the beginning called Montevidanius the first version of Maya SQL well you can't really
call it Maya SQL but the first thing he developed that would eventually become Maya SQL was done in
83 it was a database it was developed in basic if anyone still remembers basic in 83 it's all
long time ago it eventually got poor it to see and got a lot of changes the biggest one being in
in the beginning of the 90s so the database which was then called net base or union reg and stuff
it only had an isam interface right there was no SQL you couldn't use SQL you had to use an isam
interface and at the time the company developing the isam interface said well we're actually going to
start developing this so your database won't be one who work anymore so they decided to all right
let's get an SQL interface to the database so that people can access it and at the same time the
guys Monti who had created it and David Oxmark one of his co-workers they had the idea of why don't
we just we create the SQL interface and we make it open source for them it's a big decision
because they had like three four customers who were paying them a lot of money and they were
scalable if you make this open source and free you know a lot of potential customers might just
use it and not pay us anything but because they had been using a lot of new tools before so they
felt that they've they've had a lot of benefit from open source they wanted to make sure that they
gave something back so they decided to release mySQL as open source and this was done in in
95 at the same time they created a new company called mySQL AB at the time it had two employees
those two guys mySQL AB slowly grew as you know mySQL happened to be released in a very fortunate
time in 95 about the same time the internet was exploding so lamp became a
became something very known and very used everyone had their own lamp lamp a server running
with Linux Apache mySQL and PHP RPR and the mySQL usage because it was free because it's open source
my mySQL usage exploded the company didn't follow it's very slowly grew because they were only
selling selling support but eventually they had some more business oriented people come about
monthly being a great developer developer he perhaps wasn't the best CEO of commercial
company so they got another CEO more to me because and mySQL started growing quite rapidly
and of course in 2008 mySQL was acquired by Sun for one billion which is still the most
profitable open source acquisition in history so if anyone ever says that open source is not
profitable here's the good counter example they paid one billion for mySQL two years later or
paid four billion for Sun so you could argue that mySQL was one fourth of Sun which is of course not
true which you could argue that because it was two years later anyway the acquisition of Sun
or the acquisition of mySQL by Sun changed a lot of things mySQL had always been a small company
two people at the beginning but even at the acquisition point there was less than 500 we were a small
company there were 50 developers so small company Sun had around 40,000 employees so quite a different
structure a lot of processes and stuff that you have in big companies so a lot of people decided
that well this is not a place for me so they left and the company called Monty Program was founded
in late 2008 they was founded by the same guy who had created mySQL and he took a lot of core
developers with him and the first purpose of the company was to create add-ons to mySQL just make
sure that there's an open source version of mySQL out there then of course Oracle acquired Sun
and with Sun mySQL and the purpose of Monty Program changed a bit now they saw a real threat to mySQL
and the open source needs of mySQL so they said well we have to make sure that there's always an
open source version so they created the branch MariaDB and the first version of MariaDB was launched
in February 2010 and well the latest release was launched a few weeks ago so who's behind MariaDB
well Monty Program so the company created by the founder of mySQL and the creator of mySQL so
guys who know mySQL really well he took the guys he wanted with him so there's around 30 developers
and you can say that they are core developers there was quite a long time in mySQL where
where if the founder went on vacation pretty much everything stopped because he was the only one who
knew everything in mySQL later we hired two or three people who also knew everything so he could
go on vacation and there was there was some other guys who could take over well all of these two
three people are now in MariaDB but it's not only Monty Program there's also a community behind MariaDB
the goal of MariaDB is to provide a community that developed stable and free database and in order to
do this MariaDB is very open to the community it takes contributions it's developed on launchpad
and many of MariaDB has captains that basically
Stuart the development of MariaDB is to look at the patches and stuff like that has to approve
everything and off the MariaDB captains only 64% are in Monty Program the rest are outside community
people so it's in a large part developed by the community MariaDB they also have an open
development model meaning that when they have new features and stuff they have an open workloads
everyone can look at them anyone can suggest enhancements
Monty Program is a distributed company people work from home all over the place they have meetings
twice a year they have like company meetings and in the company meetings 85% of all their sessions
are open to the public so anyone can actually join them customers do join them Facebook being a
good big MySQL user and customer every time Monty Program has a company meeting Facebook is there
they have at least two three guys just sitting on the sessions and make sure that their voices
heard as well in that future development but anyone could join last meeting they had was in Athens
almost that few months ago and there was there were like three or four companies some people
there just to make sure that their voices heard in future development so it's very open everything
they do all right let's take a look at the differences between MariaDB and MySQL
so first of all the first version of MariaDB was 5.1 and this was based on MySQL 5.1
that's easy enough to follow and basically the version numbering follows exactly MySQL 5.1
so you had MySQL 5.1 30 you would have MariaDB 5.1 30 which was based on MySQL 5.1 30
very simple however they did add quite a few features
who knows what a storage engine is one guy okay excellent so MySQL has this concept of storage
engines which means that for storing the data so in a database you could really simplify
you could say you have three main pieces you have a parser parser the queries you have an optimizer
and you have a storage engine which stores the data and retrieves the data and in MySQL that's
storage engine part is modular so you can choose how you want to store the data which also has
lot of implications like indexing types data types all of that it is stored on this commemorative
and so forth so there's an API between the optimizer and the storage engine and all you have to
do is fulfill these functions in the API and you can create your own storage engine and many have
in MySQL the MySQL distribution there's like five six different storage engines but there's a lot
of open storage storage engines and so forth so anyone can create their own storage engine
so the first thing they did that MariaDB was to include third-party storage engines
again this is something that Oracle doesn't want to do mainly because Oracle does dual licensing
so MySQL is GPL but they also sell a commercial license for people who want to embed MySQL
in their software who don't want to get bound by the GPL and there's a lot of fun about this stuff but
basically they have two two types of licenses and of course they can't put GPL stuff in the commercial
license so any open source GPL storage engine Oracle can't touch because they can't sell commercial
licenses so MariaDB introduced a few storage engines PBXC a transactional one which actually the main
goodness with this it has really fast crash recovery so if you crash a lot this is a good storage
engine X3DB who has heard of X3DB no one all right who has heard of innerDB
okay quite a few so innerDB is the main storage in MySQL it's a transactional storage engine
similar to traditional databases like Oracle and Cybase and and so forth and X3DB is actually
very close to innerDB it's a branch of innerDB so it's the innerDB storage engine with a few
enhancements and again there's a very complicated story to why this this this was about innerDB
traditionally was created for well it was created in the in the 90s so it was back then there weren't
that many multi core multi CPU machines and it turns out that it didn't perform that well on
multi core multi CPU machines and when people started having multi core machines innerDB started
running into all of mutics and stuff which didn't scale well but a lot of these mutics were quite
easy to overcome and and Google who was using MySQL heavily at the time they figured it out they made
some patches that fixed this and they released those patches problem being again MySQL couldn't
take in the patches because they were released on the GPL and we were selling commercial licenses so
we had to record the whole thing at the time out of the MySQL so we couldn't take the Google patches
because they were GPL so another company called Percona took the Google patches and added them to
innerDB and created a patched version of innerDB called xrdb and that's basically xrdb's eventually
MySQL added these the same features not of course the same code because they can't they added the
same features into innerDB's and now innerDB and xrdb are very close but at the time xrdb was
scaled a lot better in innerDB anyways that's a storage engine and there are some other storage
engines oh yeah who knows what MySQL is two guys MySQL is the traditional kind of the old school
storage engine in MySQL it's very good for storing data in an efficient way however it's not
transaction it's not scratch safe it's pretty much bad in many senses except that data is very
compact on desk and back in the days this was the only storage in MySQL it used to be the default
so a lot of customers actually used MySQL without knowing it because they didn't know that you could
choose so they were saying well we use transactions and we crash that we're like actually you're not
if your system crashes during a ride while your tables will be corrupt and they were like really
and that's still the case with MySQL and RIA is basically an improved version of MySQL so it has
the same storage however it's crash safe it has a small log so if the server crash is doing
your operation it can actually roll back that over a operation myasm can't if you're doing an
update changing 100 rows system crashes well your table is somewhere in between and that's where
the table is there's no way of falling back with RIA you can roll back then there it's in bug fixes
one of the things was that MySQL 5 when you wanted to upgrade from MySQL 5-0 to MySQL 5-1 the MySQL
development team actually forgot that people might want to upgrade so you weren't able to upgrade
without taking a total total dump of your data and reinserting all the data and this was actually
just due to a simple bug and MariaDB fixed it so going from MySQL 5-0 to MariaDB 5-1 you could do it
without taking it back up but MySQL 5-0 to MySQL 5-1 you couldn't and that was a big
seller to many customers back in the days please then they added microsecond support can anyone
see this red stuff no I don't know where to turn on the lights basically MySQL traditionally has
only second granularity on everything so something takes zero seconds one second or two seconds
and well you don't get lower than that in MariaDB you get to microsecond level so you can see
you can see if your query took I don't know 0.5 seconds or 0.51 seconds instead of having just
a zero there or one so that's something that added in MariaDB and this goes both for the
slow query log and the process list so you can see a statement how long has it been running
in microseconds and not only in seconds then they added a thread pool a very basic thread pool
which basically traditionally MySQL every thread every connection to the every connection to the
database is a thread so if you have a hundred connections you have a hundred threads plus a few
extra threads and this can be great but it can also be not so great for example if you have a lot
of small connections you have a lot of connections doing small transactions this means you end up
have a lot of a lot of context switches between between a different threads because well every
connection is a thread and if they do stuff all the time they have to switch between the threads
all the time so for these types of scenarios a thread pool makes sense you get less context
switching because you queue the stuff you queue all the operations into a pool of threads
another thing another thing that Monte program did was MySQL had an optimizer team that was
basically working on the MySQL optimizer it had four people in it or five at the end so five
people working on the MySQL optimizer well Monte program hired five of these five which left
how many in Oracle zero so all the guys working on the optimizer moved out to Monte program so
they added features to the optimizer one of the first was this table elimination feature
which is to be honest it's not that great a feature it's mainly mainly
useful if you use a machine created SQL statement so if you have some kind of middle layer that
creates the SQL for you that's when you might run into this so basically if you have this kind
of anchor schema which basically means that you you have attribute as a table so if you instead
of having a table with like a person and you have as a column you have it's his ID his name is
phone number and stuff you have each attribute will be in its own table so the first date will be
one table the whatever name will be one table here we have actors and the rating of the actor
and another table the idea with this is that it's very very modular so you don't you don't do an
alter table to add columns you just add more tables to add columns so it becomes very modular all right
if you have this type of schema normally you would create a view because you don't want to have like
nine table joins every time you want to see some things so you could have viewed at
for it to access all this which means to have a view with all the nine tables and then when you do
a simple query on this view it will have to join in all the nine tables every single time and that's
what happens in standard MySQL every time you use this view even if you're looking at just one
column of one of the tables it will join all nine tables and that's where this is where table elimination
elimination comes into the picture it actually eliminates the table the tables the optimizer
eliminates the tables from the join algorithm again you are not likely to do is by yourself but if
you use machine-created code they can actually create these types of schemas for you and you don't
even know it yes that was MariaDB 5.1 because Oracle was very slow at coming with new versions of
MySQL and MariaDB well you know they were developers and they were developing new features so that is
decided to create a new version of MariaDB 5.2 which is built on MariaDB 5.1 which is built on
MySQL 5.1 so it's still based on MySQL 5.1 MariaDB 5.2 and here they added a few features one is
plugable plugable authentication so traditionally MySQL there's one authentication protocol and that's
built in and well that's what you have to use however in MariaDB 5.2 you can choose a different
authentication protocol you can plug it in so to say there's already a plan plan plug-in included
but you can use any type of you can create your own plugin for this for the authentication so you
don't have to use the MySQL built-in authentication anymore that's one feature another very useful feature
is user statistics in MySQL there's a lot of statistics but they're all on the server level so you
see what's going on the server but you have no clue on what's going on for an individual client
individual user or index or tables so this is something that MariaDB 5.2 so you get statistics per
IP so you see okay all connections from this IP address what have they done how much memory
have they use how many queries and everything same with a specific user account what's this user
account done done and then the last ones are index so for every single index you see how many times
this index been read and so forth and same with the tables this before MariaDB 5.2 this doesn't
exist so if you use MySQL 5.1 you don't have statistics on this level you have server for the whole server
then there's something called things does anyone know what's things is
okay things is an open source it's a full text full text search engine it's very much used
for the web MySQL has some has full text indexing on my island tables but the full text indexing
indexes are not very good I mean they're better than no full text indexing but if you're talking about
high-performance websites it's not good enough so instead of using the Maya some built-in full
text indexing you can use Sphinx which is a program where you have full text indexing so you put
data in it and it creates indexes on text based on articles or whatnot and normally you run Sphinx
as a separate process and you query it through your code so you send a request through Sphinx saying
okay I want to look for this and which articles and it gives you back the key values for the
articles and match your search. While you MariaDB you can do the same except that you can do it
from within the MySQL server so you can access the Sphinx engine from within the MySQL servers you don't
have to use the native C or whatever calls to Sphinx you can use it through the MySQL engine which
means that it's easier to combine the results of Sphinx and MySQL. Another thing is the segment
ison key cache which basically allows better concurrency. My ison tables have a joint key cache for
every single index of every table is stored which doesn't work too well when you have a lot of
concurrent users because they all have to use the same key cache so you run into concurrency issues
and by segmenting it up you can have a lot better concurrency. I have a benchmark image that
Jamie will show you to give you a bit of an idea so again if you run with here's one benchmark
and if you see the number of threads growing you see the benefit of running segmental key cache so
this is basically here you have the threads here's how many queries you can do per second if you
don't have a segmental key cache you basically max out around 5,000 queries per second with my ison
if you have a segmental key cache you grab to 20,000 and this well that depends on what type of
benchmark you run but this is about the worst performance game in the benchmarks
and that was MariaDB 5.2 but the big release for MariaDB is MariaDB 5.3 which
went GA in December so not so long ago and this has a lot of new features again it's built on
MariaDB 5.2 so it's built on MariaDB 5.1 which is built on MySQL 5.1 so we're still on MySQL 5.1
it has quite a few features something called handler socket which basically gives you a
kind of a no SQL or no SQL interface to InnoDB engine so you can directly access the InnoDB
engine without passing through the SQL layer or the or even the optimizer layer just send
a simple crude operator so a crude is what is it creates read update delete something like that
create read update delete so you can do direct operations on the data without passing through
the optimizer so this is very useful if you do simple kind of batch type of operations
basic things where you normally wouldn't use a SQL engine you would use some kind of no
SQL stuff like MongoDB or something so you can kind of get similar stuff with
InnoDB then there is something called dynamic columns which is again kind of a no SQL feature because
it does stuff that you're not supposed to do with SQL so basically it allows you to create
columns with dynamic content so basically you store stuff in a blob and you just have some
functions for treating what's in the blob so basically this is useful for some type of data warehouse
or something where you sell products and well depending on what type of product it is it has
different attributes and you don't want to have different tables because okay you have all
their phones here you have all your memory in that table you have everything there so you can store
all your products in this one table and you just store the attributes in this dynamic column
and I have an example for you so you get the idea of what we're doing here if you can see anything
well not the red so where is turn off the light
so the best way to do this is we have a blob column here and then we use a column create function
and we just give an index which means which attributes is there must be a switch somewhere
I'll check it out there I have door
sorry for you in the news
I guess not
yeah so there's a column create function and you just basically give
even index to the attribute you want to have and then you give a value to the attribute
so here of course as as the guy programming this you have to keep track of okay which
which index corresponds to which start with attributes so for example one here corresponds to
the color we have one is blue you one is black you one is black and so forth and we have other
other things like that and we can store any number of attributes so one product could have one
attribute another product could have five and the third could have seven and so forth and then
we can use this column list function to basically get this data back out
the for example here I just had a list of all the attributes for each product so for example
the Nokia is a phone it has attributes one and two one being color and two was something else
that she says to me something else and so forth to get the list of the attributes wage product
and we can also do searches of these products so for example
yeah I don't know if you can see that there but we're looking for all products that have
attribute one as black so basically we're doing a normal search we just want everything
has black and we get a few phones and we're going to not be thinking about and what not we get all
product that has black and so forth here we do a different thing we're just looking at
and what color are these things up well so again it's feature at fairly useful
for different things
then we're going to be also added a lot of replication enhancement who here has used
mySQL replication no one okay then you probably don't care about this
basically it's yeah if you're using mySQL replication all of well some things are better
but I'm going to skip this if you don't care there's a user feedback plugin which means that
we can mySQL can already be can get data sent through so they can log what people are doing
with the server okay not what type statements but just where they are what version they're using
and stuff to better get an idea of what's just been used out there and this is my favorite feature
actually they added a progress report for auto table it's a very simple feature to code
but or it was a way where's the feature to code but before this if you do an auto table on mySQL
you do auto table do use and change presenter and then you wait and after 10 minutes you're like
it's been 10 minutes I mean I guess it's going to be done soon then you wait some more and it's
been half an hour you're like well I can't stop it now it's going to be done really soon now
then it doesn't hour and you're like oh it's been an hour I mean if I kill it now then have to
redo the hour and thank you whoever did that and so forth thanks so it's a very simple feature
but a lot of DBAs complain about this they do an auto table and then they're like I don't know
how long it's going to take should I just kill it or should I wait or watch that do now you know
there's a progress progress bar you get when you do the auto table you actually get a progress bar
and you can do a show process list in the server you get a progress bar as well so the guy doing
it gets a progress report and everyone else can get a whatever the DBAs can get a progress report as
well so you know okay should I kill this or should I just let it run is it close to being done
or not and this is my favorite feature there's a lot of optimizer enhancements
um my school added sub queries in version four that one so somewhere around
2004 or three or something and this is great we could there was a check box do you have sub queries
we could click it and say yes the bad thing was that they sucked it was a common
common knowledge that my school has sub queries but please never use them because they're really slow
but we still had them but so the bad thing with this is that well the good thing is that everyone
learned that if you want performance you shouldn't use sub queries and in most cases you can do
work runs you can use joints or you can do other things instead of sub queries the bad thing is
that there's a lot of again a lot of people using middle layers application layers where where the
school is created by the middle layer and there they use sub queries and and you can't control it
because you're using this application layer and you have to use it and they had really bad performance
because well sub queries weren't performing well well mainly because they hired all the
all the optimizer team it was easy for them but mainly we fixed it so sub queries work a lot faster
just basically sub queries are usable now they weren't before
there's a lot of optimizations there but mainly what they do is if a sub query can be rewritten
to a join the optimizer will actually do it previously the optimizer didn't even try it was like
okay a sub query let's assume it's a dependent correlated and do worst case scenario it will be
always work so the sub queries were always done in the slowest possible way before not anymore
and some other doing materialization and stuff like that
traditionally myoscope only had one type of join
there's a lot of ways you can join tables when you do a join in a database kind of the simplest
is to one myoscope used it's called nested loop join and it's basically like for loops nested
for loops you have a for loop for the first table for the second third and so forth and that's
to do you take the first row first first the second versus third and blah blah and this is great
in many cases but in some cases it sucks and well if you only have this you only have this and
if it sucks it sucks and you you can't do anything most database systems other database systems
commercial database systems have multiple join algorithms and this is something that was now added
to my gdb i guess the main main difference is like the hash joins is what it has to do with indexing
and the batch key well batch acts is actually it's easy to understand instead of doing it a for loop
one row per row you do a batch so you do a batch of the first table and then you do a batch of the
second table and so forth depending on your memory structures and messaging and this can be a
lot faster in some cases as an example there's an end there's my school cluster is a product where
where the data is stored on the network on different nodes and there when you use nested loop joins
it means that every time you want a row you have to send a network message all right I want this
row send it back okay then next step then next step so the latency will kill you if you do a nested
loop join it means that for every single row combination you have to send end messages on the net
so normally with my school cluster if you have joins with more than two tables you should go for a
coffee break while you're doing the join well with batch key access you don't have to do this anymore
because you can actually get in one send you can send hundreds of rows so you have like a batch you say
uh one kilobiter one 500 kilobytes of rows in one go instead of having to sequentially send
every single row and wait for the network latency network so that's great and some other
optimizations I'm not going to go into this however another another thing is that there's not
optimizer control so you can control which of these optimizations are on and off for your queries
so you can say well I want to use the old way and I don't want to try these new things and you
can control it with this optimizer switch and then perhaps the biggest or best
feature of my marydb53 is something called group commit and in order to explain group commit I
have to explain a bit about how the server works so basically on the server this is in inodb or
xodb the transactional engines with something like my eyes and you don't care about group commit because
nothing is is crash safe anyway so you don't have to worry about currently in crash safe however inodb
and xodb they're supposed to be acid compliant which means that their transactions are durable
if the system crashes and you restart everything that's been committed is committed right you can't
lose anything that's what crash safe means the problem is of course in general disk IO is very slow
so what you don't want to do is commit every time you do a change you don't want to write this
change on to the disk because it's going to be very very slow so instead what you do is you have
a buffer pool all your all your data blocks index blocks and stuff are in this buffer pool you do
changes to them well instead of well when you do the changes you have something called a log buffer
where you log these changes as operations not the actual changes when you're done you have to
you commit your transaction you don't write the data pages to the disk which are large you just
write this this log to disk so you have a log of the changes where you just say whatever update
or something and you write that to the disk because that's small so when you commit when you commit
a transaction you have to sync your redo log to disk to make sure that this change is not written
somewhere so in case the system crashes I can get this state back somehow and what happens when
your system restarts is that well it looks at the data on disk and looks at start going through the
redo log okay which of the which of the transactions in the redo log have been applied on this disk state
and which haven't so that's basically what it does that's the crash recovery it goes to the redo log
and applies the changes to the disk all right so when you commit you have to sync you do an F sync
your redo log is now flushed it sync to disk and you're safe
there's a few problems with this one is during concurrency you have hundreds of transactions or
thousands of transactions everything of transactions commits at the same time well you can't do F sync
at the same time so you do the sequentially in a queue and you F sync one transaction then the
second the third and so forth again if you have lots of transactions this is not a good thing to
do so what you try to do instead is you group your transactions together and do a group commit
so you F sync all of them at the same time instead of doing it in sequence and that's the basic
idea of group commit and that you do get in the newest version of mySQL you get this however
what really added is something else mySQL also has something called a binary log who knows
what a binary log is no one excellent so the redo log is specific to the innerDB engine okay
so it's only two actions in this engine then there's something called a binary log which is the whole
mySQL server and why is this important it's because this is basically an incremental backup of
your system so when you do changes all the changes are stored in this binary log and this is
also used for replication if you have replicating mySQL service they will use the binary log to replicate
stuff so the problem now is well when you do a commit you not only have to sync your redo logs
you have to sync your binary log as well because you have to match when you crash when you crash
when you restart have to match you can have a transaction that's written to this here but not
here because then your incremental backup well it's useless because it's some transactions are
transactions are missing from it and so forth so you have to make sure that these are synced together
and this is actually done through three fsinks so first you do a prepare on the redo log
files then you have sync the binary log and then you are fsink the final commit on the redo log
so you have to three fsinks for every transaction again if your load is if you have a lot of
concurrency your system will fairly soon become IO bound because of this and this is the great feature
of MariaDB they actually made a group commit that has the binary log included in it so I mean any
system where you want to have a backup you will need to use the binary log in mySQL so anyone using
backups will also have the binary log which means that they will need this group commit feature well
they will need this group commit feature if their load is IO bound and this is the great thing so
we have group commit where the commits on the binary log are grouped together and it's dynamic you
don't have to turn it on basically if you have one commit and then there's more than two in the
queue they will be grouped together so as soon as you have more than three commits at the same time
you will use the group commit feature and I have a benchmark here and this is this benchmark
is really cool because actually not done by MariaDB or anyone promoting this this is actually
done by Facebook because they also needed like a group commit because they were had an IO bound
load so they decided to code a group commit feature so they coded it it was okay but it wasn't
very good compared to MariaDB so they created a new version of it and this is the benchmark with
original mySQL the second version of the Facebook patch and MariaDB so here you see as your
concurrent transactions grow how many commits can you do per second basically
standard mySQL max is out around 5,000 commits per second you can't do more than that because
well your IO bound and you can't do more than that it of course depends on your disk but this is
on a fast disk look at MariaDB what you have to almost 28,000 commits per second which is huge
Facebook gets to around 18,000 commits per second and actually after doing this benchmark
Facebook officially decided to stop this project because the MariaDB patch was so much better
so they they're not doing it anymore but you can actually this you can find on the internet if you
if your Google group commit Facebook MariaDB you will find this article on Facebook pages it's done by
Mark Culligan who's the mySQL guru at Facebook who actually used to work for Google so he was the
mySQL guru at Google and now he's the mySQL guru at Facebook and this is one of the best features of
mySQL 5 MariaDB 5.3 well well MariaDB team was having fun creating new versions best of mySQL 5.1
Oracle actually produced a new version called mySQL 5.5 and the MariaDB team were like well you know
we'll just we'll just merge our MariaDB versions into this new Oracle versions version turns out
it wasn't that easy because MariaDB had been adding so many features all the features you saw
until now putting them back into mySQL 5.5 which has had some of some features that were similar
and they had to kind of make sure that they didn't do the same thing twice or they didn't overlap
and so forth took a long time MariaDB originally promised to be to be released MariaDB 5.5 was
promised to be released in April 2011 it was released two weeks ago the first alpha version
but so basically what is MariaDB 5.5 it's mySQL 5.5 so the newest version of mySQL plus
all the features from up to MariaDB 5.3 so they've taken all the new features merged them into
mySQL 5.5 and well they had to a lot of work because of the the the conflicts but they managed to finally
do it and this was what it was supposed to be but at the same time they couldn't help themselves
but adding a few more features so there's a few more features here and this also has to do with the
fact that in mySQL 5.5 there's now the community version and the enterprise version differ so this
is when Oracle added these commercial extensions that you can only get if you're paying customer
and the MariaDB team they decided that well we cannot have Oracle giving commercial extensions
that there are no open source alternatives to so they wanted to actually code the same thing in
MariaDB and one of the extensions I think there's only two extensions out there but one of the
extensions was a thread pool and it turns out the Oracle thread pool was better than the one MariaDB
had implemented previously and the MariaDB guys were furious because this is not good so they had
to re-code their thread pool and that's what they did so the thread pool added in 5.1 is now
and they added a new thread pool in 5.5 the big difference is that it's dynamic the previous
thread pool was not you had a fixed set of threads here it's dynamic and it's also optimized
differently for different operating systems and you can tune it quite well on Linux machines and so
forth so again this should be on Linux this is pretty much the same as the one Oracle did on windows
know they use MariaDB use something else I don't know and then there's some more features
one interesting feature is my MySQL has I put this here because it's actually quite cool MySQL
has this feature called limit so you can when you do any type of select statement you can add
limit clause and it limits the output to a certain amount of rows you say select huge select
limit 10 and you will only get 10 rows in your output and that's cool the bad thing is that if
you have a 407 table join it will actually do the join before it limits your 10 rows to the
output so it might last forever and now they extended this limit with rows examine so you
put a max number of how many rows should be examined so for example here I'm doing Cartesian
product between two tables which means I take the n times n so if I have one million rows and one
million rows this Cartesian product will actually create what is that a thousand a trillion I don't know
one million times a million it's something weird 12 trillion but whatever it's a lot of rows
anyway but if I had this rows examined a thousand it will stop once it's done a thousand first rows
so that's a city feature but it's cool I'm going to be 5.5 so the alpha or I think it's beta version is
out now and it's supposed to be GA fairly so I mean there's not a lot of features it's mostly the
old features so it should it should stabilize earlier than the others and the previous version
I'm going to be then for the future I'm going to be 5.6 well all of this is of course guesswork
because they're not working on it yet they're waiting for 5.5 to stabilize me for but here are
some of the features that they're thinking of adding and you can actually vote for features
so we work together with multi-programmed so we have a vote where there's a few potential features
list and you can go and vote okay I want this feature and great and that will go into multi-programmed
all right we're on your side so is Maria MariaDB a branch or a fork
with my score 5.1 it's very simple my score so Oracle has my score now the community edition
in 4.5.1 the community edition equals the inter-presentation and MariaDB follows these without the
problem so it's definitely a branch same thing for MariaDB 5.2 and 5.3 because they're based on on
my score 5.1 with my score 5.5 it gets a bit hairier why well because the community edition
and the inter-presentation is no longer the same because they added these commercial extensions
and because MariaDB of course had a feature fest in the previous version that had to backport
all these features but they did manage to merge my score 5.5 and MariaDB 5.5 together so it is
still just a branch it's still merging back to when there's a new version of of my score 5.5 the
community edition MariaDB will merge into that version and there will be a new version of MariaDB
so it's still only a branch however 4.5.6 who knows it depends on what Oracle does the more commercial
Oracle goes the closer MariaDB comes to being a real fork and a no longer branch but at the moment
it's a branch meaning that it merges back into the main tree a fork no longer merges back that's
a big difference and also fork is made of metal and this is tree would all right so now all of
you want to get MariaDB right after this who wants to get MariaDB now that's great so how can you
get it well this MariaDB.org that's the main site you can download stuff you can also look at
documentation to have a knowledge base and so forth you can also get it from the open source
build servers so if you're using open source well the have package is ready made for you and a few
others and multi-programmed have their own yumm and app repos available so you can get it if you add
their repos to your distribution you can actually use yum or app gets to get them.
What is empty? What is empty? Multi-programmed sorry it's a company behind it
and what about support and services well multi-programmed they do what they call NRE which is
non-recurrent engineering so if a customer comes and says well we want to have this feature in
MariaDB. Multi-programmed is okay great we'll put it on the on the list of features and we'll
get to it eventually the customer says no no I really want this feature so all right how much
are you paying and then if they pay well they will put this feature sooner and that's basically
what they do. Multi-programmed is a company with only developers so they don't really do anything
else than develop the problem is that their product is free right it's gpl no one pays them anything
so they don't earn any money and that's where kind of we come in SkySQL we do services
we do support training and consulting and basically every time we sell support contracts
a part of what the price goes to multi-programmed because they are kind of the back end support so we
do a level one and level two and they do all the bug fixing so if there's a bug if one of our
customers finds a bug or well they find they have a problem our support in jeans try to try to
well look at the problem and say well this is a bug we send it to multi-programmed multi-programmed
delivers a pledge and of course this works with MariaDB but also it's standard MariaSQL because
they know mySQL pretty well it's strange enough and of course there are others providing MariaDB
so we're not the only ones I don't know what contracts the others have but there's the list at
MariaDB org service providers but basically if you buy stuff from us remember the money the
part of the money goes to MariaDB that's how they are in the living because they don't get that much
NRE and even more information you can go to this thing there's a mailing list on launch pad
and of course the source code is on launch pad there's Maria channel on free node
that's basically how they communicate at SkySQL we use Skype and we have our we have an IAC channel
but it's internal closed and these guys use free nodes IAC channel as the only means of
communicating well and email but so you will have them discuss features and development stuff on
this free node you can just anyone can log in and see what they're talking about you might not
always understand it because you know like me they're not native English speakers they're Russian
and stuff so sometimes you don't tell a clue what they're talking about anyway and there's actually
a MariaDB book MariaDB crash course which you can get from Amazon or whatever if you're interested
that's it yes I'm wondering about the MariaDB source code how big it is in my
so it's fairly big because it's I mean it's it's my SQL plus the extra features so it's I can't
say how big it is but it's big hundreds of megabytes yeah I think so yeah it's written in C plus
well we could see plus it's basically written in C plus plus but it's kind of procedural so
it's just it's just like features of C plus plus but it's not really it's not really object oriented
in a real way any other questions let's see what is the real life scope of my SQL workbench
so workbench is works yeah it's compatible with MariaDB and yeah it's compatible with MariaDB
it's like like a like a design call it a a a-d tool entity whatever it's a design tool you can
design tables and stuff and but it communicates through SQL so it's works finding both
what yeah one day one is nice about like no SQL it's an embedded objects turns the dynamic columns
have embedded objects some kind of a table no so you can you can you can just embed data but not
more complicated structures you can embed characters or integers and stuff like that but you can't
I mean I guess you could have multiple but I mean it's not an object now okay
but if you have more common questions we'll be we have a booth up here so you can come
and ask questions at the booth we'll be there thank you
you have been listening to Hacker Public Radio at Hacker Public Radio does our
we are a community podcast network that releases shows every weekday Monday through Friday
today's show like all our shows was contributed by a HPR listener like yourself
if you ever considered recording a podcast then visit our website to find out how easy it really is
Hacker Public Radio was founded by the Digital Dog Pound and the Infanomicon Computer Club
HPR is funded by the binary revolution at binref.com all binref projects are proud
sponsored by lunar pages from shared hosting to custom private clouds go to lunarpages.com
for all your hosting needs unless otherwise stated today's show is released under a creative
comments, attribution, share alike, free dose of license