Episode: 1089 Title: HPR1089: Max Mether of SkySQL talks about MariaDB Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1089/hpr1089.mp3 Transcribed: 2025-10-17 18:47:16 --- That's it. This is Max Meather from SkySQL. Thank you all right so I'm here to talk about Maria DB. Before we start who knows what Maria DB is. All right who knows what Maya is guys. Okay that's good that's good. So I wanted to do this I wanted to make a non-technical talk. Unfortunately I wasn't able to so it's going to be quite technical so bear with me. Yeah so I was asked to do this talk about Maria DB so my first question was okay so what is Maria DB so I thought actually it's fairly simple it takes something old and you replace it with something new like this and that's a fairly simple story and it looks good the problem is that it's not really true so Maria DB is not really a replacement of Maya it's actually a combination right so Maria DB is a branch of Maya SQL so 98% of everything you find in Maria DB is actually Maya SQL and if anyone who these girls are on weather here they're actually the founders girls he had this idea of naming his products after his daughters so that's actually why Maya SQL is called Maya SQL because his first daughter is called Mew and no one ever says Mew SQL but the Swedish correct pronunciation would be Mew SQL and his next daughter is called Maria and that's why we have Maria DB ask the product all right so what is Maria DB to branch of Maya SQL it's completed backward compatible drop-in replacement if you're running Maya SQL right now you can stop it put Maria DB binary restart it and it will work out of the box so it's a drop-in replacement and as I said well around 96% of the code is the same so it's very very very close and basically this talk I'm going to talk about the differences so what does Maria DB add to Maya SQL don't next okay so Maria DB is complete fully open source all the development is done on launchpad so anyone can download the source code try adding stuff to the source code it's all open but of course the whole Maya SQL source code has been developed for quite a while so there's a lot of code there it's all on the GPL so well anyone could add stuff or remove stuff as long as he's taking the GPL everything is open source and well Maria DB is committed to staying open source there are no closed source modules as opposed to the Vanilla Maya SQL branch where there are now some closed source modules and another big thing the bugs database is open Maya SQL traditionally had an open bugs database so everyone ever submitting a bug it was put into this open database everyone could see the bugs you could check if you're running to a bug you check is it already submitted and so forth unfortunately that change recently Oracle who not want Maya SQL they decided not to have all the bugs out in the open so customers when they file bugs they fight it to different system the old bugs database still exists however it's not always maintained a lot of bugs are open since five six years nothing has happened five six years so no one knows if they're actually doing stuff on them or not and furthermore when they do releases in Maya SQL now they refer to bug fixes but the bug fixes are from the internal closed bugs database so you don't actually know what they're talking about so you don't say fix internal bugs 743 and no one knows what this bug means and this is actually why there now are certain distributions who are thinking about switching from Maya SQL to Maria Libby because they don't they don't know what goes on in Maya SQL all right so let's have a brief history here so Maya SQL was developed by one guy in the beginning called Montevidanius the first version of Maya SQL well you can't really call it Maya SQL but the first thing he developed that would eventually become Maya SQL was done in 83 it was a database it was developed in basic if anyone still remembers basic in 83 it's all long time ago it eventually got poor it to see and got a lot of changes the biggest one being in in the beginning of the 90s so the database which was then called net base or union reg and stuff it only had an isam interface right there was no SQL you couldn't use SQL you had to use an isam interface and at the time the company developing the isam interface said well we're actually going to start developing this so your database won't be one who work anymore so they decided to all right let's get an SQL interface to the database so that people can access it and at the same time the guys Monti who had created it and David Oxmark one of his co-workers they had the idea of why don't we just we create the SQL interface and we make it open source for them it's a big decision because they had like three four customers who were paying them a lot of money and they were scalable if you make this open source and free you know a lot of potential customers might just use it and not pay us anything but because they had been using a lot of new tools before so they felt that they've they've had a lot of benefit from open source they wanted to make sure that they gave something back so they decided to release mySQL as open source and this was done in in 95 at the same time they created a new company called mySQL AB at the time it had two employees those two guys mySQL AB slowly grew as you know mySQL happened to be released in a very fortunate time in 95 about the same time the internet was exploding so lamp became a became something very known and very used everyone had their own lamp lamp a server running with Linux Apache mySQL and PHP RPR and the mySQL usage because it was free because it's open source my mySQL usage exploded the company didn't follow it's very slowly grew because they were only selling selling support but eventually they had some more business oriented people come about monthly being a great developer developer he perhaps wasn't the best CEO of commercial company so they got another CEO more to me because and mySQL started growing quite rapidly and of course in 2008 mySQL was acquired by Sun for one billion which is still the most profitable open source acquisition in history so if anyone ever says that open source is not profitable here's the good counter example they paid one billion for mySQL two years later or paid four billion for Sun so you could argue that mySQL was one fourth of Sun which is of course not true which you could argue that because it was two years later anyway the acquisition of Sun or the acquisition of mySQL by Sun changed a lot of things mySQL had always been a small company two people at the beginning but even at the acquisition point there was less than 500 we were a small company there were 50 developers so small company Sun had around 40,000 employees so quite a different structure a lot of processes and stuff that you have in big companies so a lot of people decided that well this is not a place for me so they left and the company called Monty Program was founded in late 2008 they was founded by the same guy who had created mySQL and he took a lot of core developers with him and the first purpose of the company was to create add-ons to mySQL just make sure that there's an open source version of mySQL out there then of course Oracle acquired Sun and with Sun mySQL and the purpose of Monty Program changed a bit now they saw a real threat to mySQL and the open source needs of mySQL so they said well we have to make sure that there's always an open source version so they created the branch MariaDB and the first version of MariaDB was launched in February 2010 and well the latest release was launched a few weeks ago so who's behind MariaDB well Monty Program so the company created by the founder of mySQL and the creator of mySQL so guys who know mySQL really well he took the guys he wanted with him so there's around 30 developers and you can say that they are core developers there was quite a long time in mySQL where where if the founder went on vacation pretty much everything stopped because he was the only one who knew everything in mySQL later we hired two or three people who also knew everything so he could go on vacation and there was there was some other guys who could take over well all of these two three people are now in MariaDB but it's not only Monty Program there's also a community behind MariaDB the goal of MariaDB is to provide a community that developed stable and free database and in order to do this MariaDB is very open to the community it takes contributions it's developed on launchpad and many of MariaDB has captains that basically Stuart the development of MariaDB is to look at the patches and stuff like that has to approve everything and off the MariaDB captains only 64% are in Monty Program the rest are outside community people so it's in a large part developed by the community MariaDB they also have an open development model meaning that when they have new features and stuff they have an open workloads everyone can look at them anyone can suggest enhancements Monty Program is a distributed company people work from home all over the place they have meetings twice a year they have like company meetings and in the company meetings 85% of all their sessions are open to the public so anyone can actually join them customers do join them Facebook being a good big MySQL user and customer every time Monty Program has a company meeting Facebook is there they have at least two three guys just sitting on the sessions and make sure that their voices heard as well in that future development but anyone could join last meeting they had was in Athens almost that few months ago and there was there were like three or four companies some people there just to make sure that their voices heard in future development so it's very open everything they do all right let's take a look at the differences between MariaDB and MySQL so first of all the first version of MariaDB was 5.1 and this was based on MySQL 5.1 that's easy enough to follow and basically the version numbering follows exactly MySQL 5.1 so you had MySQL 5.1 30 you would have MariaDB 5.1 30 which was based on MySQL 5.1 30 very simple however they did add quite a few features who knows what a storage engine is one guy okay excellent so MySQL has this concept of storage engines which means that for storing the data so in a database you could really simplify you could say you have three main pieces you have a parser parser the queries you have an optimizer and you have a storage engine which stores the data and retrieves the data and in MySQL that's storage engine part is modular so you can choose how you want to store the data which also has lot of implications like indexing types data types all of that it is stored on this commemorative and so forth so there's an API between the optimizer and the storage engine and all you have to do is fulfill these functions in the API and you can create your own storage engine and many have in MySQL the MySQL distribution there's like five six different storage engines but there's a lot of open storage storage engines and so forth so anyone can create their own storage engine so the first thing they did that MariaDB was to include third-party storage engines again this is something that Oracle doesn't want to do mainly because Oracle does dual licensing so MySQL is GPL but they also sell a commercial license for people who want to embed MySQL in their software who don't want to get bound by the GPL and there's a lot of fun about this stuff but basically they have two two types of licenses and of course they can't put GPL stuff in the commercial license so any open source GPL storage engine Oracle can't touch because they can't sell commercial licenses so MariaDB introduced a few storage engines PBXC a transactional one which actually the main goodness with this it has really fast crash recovery so if you crash a lot this is a good storage engine X3DB who has heard of X3DB no one all right who has heard of innerDB okay quite a few so innerDB is the main storage in MySQL it's a transactional storage engine similar to traditional databases like Oracle and Cybase and and so forth and X3DB is actually very close to innerDB it's a branch of innerDB so it's the innerDB storage engine with a few enhancements and again there's a very complicated story to why this this this was about innerDB traditionally was created for well it was created in the in the 90s so it was back then there weren't that many multi core multi CPU machines and it turns out that it didn't perform that well on multi core multi CPU machines and when people started having multi core machines innerDB started running into all of mutics and stuff which didn't scale well but a lot of these mutics were quite easy to overcome and and Google who was using MySQL heavily at the time they figured it out they made some patches that fixed this and they released those patches problem being again MySQL couldn't take in the patches because they were released on the GPL and we were selling commercial licenses so we had to record the whole thing at the time out of the MySQL so we couldn't take the Google patches because they were GPL so another company called Percona took the Google patches and added them to innerDB and created a patched version of innerDB called xrdb and that's basically xrdb's eventually MySQL added these the same features not of course the same code because they can't they added the same features into innerDB's and now innerDB and xrdb are very close but at the time xrdb was scaled a lot better in innerDB anyways that's a storage engine and there are some other storage engines oh yeah who knows what MySQL is two guys MySQL is the traditional kind of the old school storage engine in MySQL it's very good for storing data in an efficient way however it's not transaction it's not scratch safe it's pretty much bad in many senses except that data is very compact on desk and back in the days this was the only storage in MySQL it used to be the default so a lot of customers actually used MySQL without knowing it because they didn't know that you could choose so they were saying well we use transactions and we crash that we're like actually you're not if your system crashes during a ride while your tables will be corrupt and they were like really and that's still the case with MySQL and RIA is basically an improved version of MySQL so it has the same storage however it's crash safe it has a small log so if the server crash is doing your operation it can actually roll back that over a operation myasm can't if you're doing an update changing 100 rows system crashes well your table is somewhere in between and that's where the table is there's no way of falling back with RIA you can roll back then there it's in bug fixes one of the things was that MySQL 5 when you wanted to upgrade from MySQL 5-0 to MySQL 5-1 the MySQL development team actually forgot that people might want to upgrade so you weren't able to upgrade without taking a total total dump of your data and reinserting all the data and this was actually just due to a simple bug and MariaDB fixed it so going from MySQL 5-0 to MariaDB 5-1 you could do it without taking it back up but MySQL 5-0 to MySQL 5-1 you couldn't and that was a big seller to many customers back in the days please then they added microsecond support can anyone see this red stuff no I don't know where to turn on the lights basically MySQL traditionally has only second granularity on everything so something takes zero seconds one second or two seconds and well you don't get lower than that in MariaDB you get to microsecond level so you can see you can see if your query took I don't know 0.5 seconds or 0.51 seconds instead of having just a zero there or one so that's something that added in MariaDB and this goes both for the slow query log and the process list so you can see a statement how long has it been running in microseconds and not only in seconds then they added a thread pool a very basic thread pool which basically traditionally MySQL every thread every connection to the every connection to the database is a thread so if you have a hundred connections you have a hundred threads plus a few extra threads and this can be great but it can also be not so great for example if you have a lot of small connections you have a lot of connections doing small transactions this means you end up have a lot of a lot of context switches between between a different threads because well every connection is a thread and if they do stuff all the time they have to switch between the threads all the time so for these types of scenarios a thread pool makes sense you get less context switching because you queue the stuff you queue all the operations into a pool of threads another thing another thing that Monte program did was MySQL had an optimizer team that was basically working on the MySQL optimizer it had four people in it or five at the end so five people working on the MySQL optimizer well Monte program hired five of these five which left how many in Oracle zero so all the guys working on the optimizer moved out to Monte program so they added features to the optimizer one of the first was this table elimination feature which is to be honest it's not that great a feature it's mainly mainly useful if you use a machine created SQL statement so if you have some kind of middle layer that creates the SQL for you that's when you might run into this so basically if you have this kind of anchor schema which basically means that you you have attribute as a table so if you instead of having a table with like a person and you have as a column you have it's his ID his name is phone number and stuff you have each attribute will be in its own table so the first date will be one table the whatever name will be one table here we have actors and the rating of the actor and another table the idea with this is that it's very very modular so you don't you don't do an alter table to add columns you just add more tables to add columns so it becomes very modular all right if you have this type of schema normally you would create a view because you don't want to have like nine table joins every time you want to see some things so you could have viewed at for it to access all this which means to have a view with all the nine tables and then when you do a simple query on this view it will have to join in all the nine tables every single time and that's what happens in standard MySQL every time you use this view even if you're looking at just one column of one of the tables it will join all nine tables and that's where this is where table elimination elimination comes into the picture it actually eliminates the table the tables the optimizer eliminates the tables from the join algorithm again you are not likely to do is by yourself but if you use machine-created code they can actually create these types of schemas for you and you don't even know it yes that was MariaDB 5.1 because Oracle was very slow at coming with new versions of MySQL and MariaDB well you know they were developers and they were developing new features so that is decided to create a new version of MariaDB 5.2 which is built on MariaDB 5.1 which is built on MySQL 5.1 so it's still based on MySQL 5.1 MariaDB 5.2 and here they added a few features one is plugable plugable authentication so traditionally MySQL there's one authentication protocol and that's built in and well that's what you have to use however in MariaDB 5.2 you can choose a different authentication protocol you can plug it in so to say there's already a plan plan plug-in included but you can use any type of you can create your own plugin for this for the authentication so you don't have to use the MySQL built-in authentication anymore that's one feature another very useful feature is user statistics in MySQL there's a lot of statistics but they're all on the server level so you see what's going on the server but you have no clue on what's going on for an individual client individual user or index or tables so this is something that MariaDB 5.2 so you get statistics per IP so you see okay all connections from this IP address what have they done how much memory have they use how many queries and everything same with a specific user account what's this user account done done and then the last ones are index so for every single index you see how many times this index been read and so forth and same with the tables this before MariaDB 5.2 this doesn't exist so if you use MySQL 5.1 you don't have statistics on this level you have server for the whole server then there's something called things does anyone know what's things is okay things is an open source it's a full text full text search engine it's very much used for the web MySQL has some has full text indexing on my island tables but the full text indexing indexes are not very good I mean they're better than no full text indexing but if you're talking about high-performance websites it's not good enough so instead of using the Maya some built-in full text indexing you can use Sphinx which is a program where you have full text indexing so you put data in it and it creates indexes on text based on articles or whatnot and normally you run Sphinx as a separate process and you query it through your code so you send a request through Sphinx saying okay I want to look for this and which articles and it gives you back the key values for the articles and match your search. While you MariaDB you can do the same except that you can do it from within the MySQL server so you can access the Sphinx engine from within the MySQL servers you don't have to use the native C or whatever calls to Sphinx you can use it through the MySQL engine which means that it's easier to combine the results of Sphinx and MySQL. Another thing is the segment ison key cache which basically allows better concurrency. My ison tables have a joint key cache for every single index of every table is stored which doesn't work too well when you have a lot of concurrent users because they all have to use the same key cache so you run into concurrency issues and by segmenting it up you can have a lot better concurrency. I have a benchmark image that Jamie will show you to give you a bit of an idea so again if you run with here's one benchmark and if you see the number of threads growing you see the benefit of running segmental key cache so this is basically here you have the threads here's how many queries you can do per second if you don't have a segmental key cache you basically max out around 5,000 queries per second with my ison if you have a segmental key cache you grab to 20,000 and this well that depends on what type of benchmark you run but this is about the worst performance game in the benchmarks and that was MariaDB 5.2 but the big release for MariaDB is MariaDB 5.3 which went GA in December so not so long ago and this has a lot of new features again it's built on MariaDB 5.2 so it's built on MariaDB 5.1 which is built on MySQL 5.1 so we're still on MySQL 5.1 it has quite a few features something called handler socket which basically gives you a kind of a no SQL or no SQL interface to InnoDB engine so you can directly access the InnoDB engine without passing through the SQL layer or the or even the optimizer layer just send a simple crude operator so a crude is what is it creates read update delete something like that create read update delete so you can do direct operations on the data without passing through the optimizer so this is very useful if you do simple kind of batch type of operations basic things where you normally wouldn't use a SQL engine you would use some kind of no SQL stuff like MongoDB or something so you can kind of get similar stuff with InnoDB then there is something called dynamic columns which is again kind of a no SQL feature because it does stuff that you're not supposed to do with SQL so basically it allows you to create columns with dynamic content so basically you store stuff in a blob and you just have some functions for treating what's in the blob so basically this is useful for some type of data warehouse or something where you sell products and well depending on what type of product it is it has different attributes and you don't want to have different tables because okay you have all their phones here you have all your memory in that table you have everything there so you can store all your products in this one table and you just store the attributes in this dynamic column and I have an example for you so you get the idea of what we're doing here if you can see anything well not the red so where is turn off the light so the best way to do this is we have a blob column here and then we use a column create function and we just give an index which means which attributes is there must be a switch somewhere I'll check it out there I have door sorry for you in the news I guess not yeah so there's a column create function and you just basically give even index to the attribute you want to have and then you give a value to the attribute so here of course as as the guy programming this you have to keep track of okay which which index corresponds to which start with attributes so for example one here corresponds to the color we have one is blue you one is black you one is black and so forth and we have other other things like that and we can store any number of attributes so one product could have one attribute another product could have five and the third could have seven and so forth and then we can use this column list function to basically get this data back out the for example here I just had a list of all the attributes for each product so for example the Nokia is a phone it has attributes one and two one being color and two was something else that she says to me something else and so forth to get the list of the attributes wage product and we can also do searches of these products so for example yeah I don't know if you can see that there but we're looking for all products that have attribute one as black so basically we're doing a normal search we just want everything has black and we get a few phones and we're going to not be thinking about and what not we get all product that has black and so forth here we do a different thing we're just looking at and what color are these things up well so again it's feature at fairly useful for different things then we're going to be also added a lot of replication enhancement who here has used mySQL replication no one okay then you probably don't care about this basically it's yeah if you're using mySQL replication all of well some things are better but I'm going to skip this if you don't care there's a user feedback plugin which means that we can mySQL can already be can get data sent through so they can log what people are doing with the server okay not what type statements but just where they are what version they're using and stuff to better get an idea of what's just been used out there and this is my favorite feature actually they added a progress report for auto table it's a very simple feature to code but or it was a way where's the feature to code but before this if you do an auto table on mySQL you do auto table do use and change presenter and then you wait and after 10 minutes you're like it's been 10 minutes I mean I guess it's going to be done soon then you wait some more and it's been half an hour you're like well I can't stop it now it's going to be done really soon now then it doesn't hour and you're like oh it's been an hour I mean if I kill it now then have to redo the hour and thank you whoever did that and so forth thanks so it's a very simple feature but a lot of DBAs complain about this they do an auto table and then they're like I don't know how long it's going to take should I just kill it or should I wait or watch that do now you know there's a progress progress bar you get when you do the auto table you actually get a progress bar and you can do a show process list in the server you get a progress bar as well so the guy doing it gets a progress report and everyone else can get a whatever the DBAs can get a progress report as well so you know okay should I kill this or should I just let it run is it close to being done or not and this is my favorite feature there's a lot of optimizer enhancements um my school added sub queries in version four that one so somewhere around 2004 or three or something and this is great we could there was a check box do you have sub queries we could click it and say yes the bad thing was that they sucked it was a common common knowledge that my school has sub queries but please never use them because they're really slow but we still had them but so the bad thing with this is that well the good thing is that everyone learned that if you want performance you shouldn't use sub queries and in most cases you can do work runs you can use joints or you can do other things instead of sub queries the bad thing is that there's a lot of again a lot of people using middle layers application layers where where the school is created by the middle layer and there they use sub queries and and you can't control it because you're using this application layer and you have to use it and they had really bad performance because well sub queries weren't performing well well mainly because they hired all the all the optimizer team it was easy for them but mainly we fixed it so sub queries work a lot faster just basically sub queries are usable now they weren't before there's a lot of optimizations there but mainly what they do is if a sub query can be rewritten to a join the optimizer will actually do it previously the optimizer didn't even try it was like okay a sub query let's assume it's a dependent correlated and do worst case scenario it will be always work so the sub queries were always done in the slowest possible way before not anymore and some other doing materialization and stuff like that traditionally myoscope only had one type of join there's a lot of ways you can join tables when you do a join in a database kind of the simplest is to one myoscope used it's called nested loop join and it's basically like for loops nested for loops you have a for loop for the first table for the second third and so forth and that's to do you take the first row first first the second versus third and blah blah and this is great in many cases but in some cases it sucks and well if you only have this you only have this and if it sucks it sucks and you you can't do anything most database systems other database systems commercial database systems have multiple join algorithms and this is something that was now added to my gdb i guess the main main difference is like the hash joins is what it has to do with indexing and the batch key well batch acts is actually it's easy to understand instead of doing it a for loop one row per row you do a batch so you do a batch of the first table and then you do a batch of the second table and so forth depending on your memory structures and messaging and this can be a lot faster in some cases as an example there's an end there's my school cluster is a product where where the data is stored on the network on different nodes and there when you use nested loop joins it means that every time you want a row you have to send a network message all right I want this row send it back okay then next step then next step so the latency will kill you if you do a nested loop join it means that for every single row combination you have to send end messages on the net so normally with my school cluster if you have joins with more than two tables you should go for a coffee break while you're doing the join well with batch key access you don't have to do this anymore because you can actually get in one send you can send hundreds of rows so you have like a batch you say uh one kilobiter one 500 kilobytes of rows in one go instead of having to sequentially send every single row and wait for the network latency network so that's great and some other optimizations I'm not going to go into this however another another thing is that there's not optimizer control so you can control which of these optimizations are on and off for your queries so you can say well I want to use the old way and I don't want to try these new things and you can control it with this optimizer switch and then perhaps the biggest or best feature of my marydb53 is something called group commit and in order to explain group commit I have to explain a bit about how the server works so basically on the server this is in inodb or xodb the transactional engines with something like my eyes and you don't care about group commit because nothing is is crash safe anyway so you don't have to worry about currently in crash safe however inodb and xodb they're supposed to be acid compliant which means that their transactions are durable if the system crashes and you restart everything that's been committed is committed right you can't lose anything that's what crash safe means the problem is of course in general disk IO is very slow so what you don't want to do is commit every time you do a change you don't want to write this change on to the disk because it's going to be very very slow so instead what you do is you have a buffer pool all your all your data blocks index blocks and stuff are in this buffer pool you do changes to them well instead of well when you do the changes you have something called a log buffer where you log these changes as operations not the actual changes when you're done you have to you commit your transaction you don't write the data pages to the disk which are large you just write this this log to disk so you have a log of the changes where you just say whatever update or something and you write that to the disk because that's small so when you commit when you commit a transaction you have to sync your redo log to disk to make sure that this change is not written somewhere so in case the system crashes I can get this state back somehow and what happens when your system restarts is that well it looks at the data on disk and looks at start going through the redo log okay which of the which of the transactions in the redo log have been applied on this disk state and which haven't so that's basically what it does that's the crash recovery it goes to the redo log and applies the changes to the disk all right so when you commit you have to sync you do an F sync your redo log is now flushed it sync to disk and you're safe there's a few problems with this one is during concurrency you have hundreds of transactions or thousands of transactions everything of transactions commits at the same time well you can't do F sync at the same time so you do the sequentially in a queue and you F sync one transaction then the second the third and so forth again if you have lots of transactions this is not a good thing to do so what you try to do instead is you group your transactions together and do a group commit so you F sync all of them at the same time instead of doing it in sequence and that's the basic idea of group commit and that you do get in the newest version of mySQL you get this however what really added is something else mySQL also has something called a binary log who knows what a binary log is no one excellent so the redo log is specific to the innerDB engine okay so it's only two actions in this engine then there's something called a binary log which is the whole mySQL server and why is this important it's because this is basically an incremental backup of your system so when you do changes all the changes are stored in this binary log and this is also used for replication if you have replicating mySQL service they will use the binary log to replicate stuff so the problem now is well when you do a commit you not only have to sync your redo logs you have to sync your binary log as well because you have to match when you crash when you crash when you restart have to match you can have a transaction that's written to this here but not here because then your incremental backup well it's useless because it's some transactions are transactions are missing from it and so forth so you have to make sure that these are synced together and this is actually done through three fsinks so first you do a prepare on the redo log files then you have sync the binary log and then you are fsink the final commit on the redo log so you have to three fsinks for every transaction again if your load is if you have a lot of concurrency your system will fairly soon become IO bound because of this and this is the great feature of MariaDB they actually made a group commit that has the binary log included in it so I mean any system where you want to have a backup you will need to use the binary log in mySQL so anyone using backups will also have the binary log which means that they will need this group commit feature well they will need this group commit feature if their load is IO bound and this is the great thing so we have group commit where the commits on the binary log are grouped together and it's dynamic you don't have to turn it on basically if you have one commit and then there's more than two in the queue they will be grouped together so as soon as you have more than three commits at the same time you will use the group commit feature and I have a benchmark here and this is this benchmark is really cool because actually not done by MariaDB or anyone promoting this this is actually done by Facebook because they also needed like a group commit because they were had an IO bound load so they decided to code a group commit feature so they coded it it was okay but it wasn't very good compared to MariaDB so they created a new version of it and this is the benchmark with original mySQL the second version of the Facebook patch and MariaDB so here you see as your concurrent transactions grow how many commits can you do per second basically standard mySQL max is out around 5,000 commits per second you can't do more than that because well your IO bound and you can't do more than that it of course depends on your disk but this is on a fast disk look at MariaDB what you have to almost 28,000 commits per second which is huge Facebook gets to around 18,000 commits per second and actually after doing this benchmark Facebook officially decided to stop this project because the MariaDB patch was so much better so they they're not doing it anymore but you can actually this you can find on the internet if you if your Google group commit Facebook MariaDB you will find this article on Facebook pages it's done by Mark Culligan who's the mySQL guru at Facebook who actually used to work for Google so he was the mySQL guru at Google and now he's the mySQL guru at Facebook and this is one of the best features of mySQL 5 MariaDB 5.3 well well MariaDB team was having fun creating new versions best of mySQL 5.1 Oracle actually produced a new version called mySQL 5.5 and the MariaDB team were like well you know we'll just we'll just merge our MariaDB versions into this new Oracle versions version turns out it wasn't that easy because MariaDB had been adding so many features all the features you saw until now putting them back into mySQL 5.5 which has had some of some features that were similar and they had to kind of make sure that they didn't do the same thing twice or they didn't overlap and so forth took a long time MariaDB originally promised to be to be released MariaDB 5.5 was promised to be released in April 2011 it was released two weeks ago the first alpha version but so basically what is MariaDB 5.5 it's mySQL 5.5 so the newest version of mySQL plus all the features from up to MariaDB 5.3 so they've taken all the new features merged them into mySQL 5.5 and well they had to a lot of work because of the the the conflicts but they managed to finally do it and this was what it was supposed to be but at the same time they couldn't help themselves but adding a few more features so there's a few more features here and this also has to do with the fact that in mySQL 5.5 there's now the community version and the enterprise version differ so this is when Oracle added these commercial extensions that you can only get if you're paying customer and the MariaDB team they decided that well we cannot have Oracle giving commercial extensions that there are no open source alternatives to so they wanted to actually code the same thing in MariaDB and one of the extensions I think there's only two extensions out there but one of the extensions was a thread pool and it turns out the Oracle thread pool was better than the one MariaDB had implemented previously and the MariaDB guys were furious because this is not good so they had to re-code their thread pool and that's what they did so the thread pool added in 5.1 is now and they added a new thread pool in 5.5 the big difference is that it's dynamic the previous thread pool was not you had a fixed set of threads here it's dynamic and it's also optimized differently for different operating systems and you can tune it quite well on Linux machines and so forth so again this should be on Linux this is pretty much the same as the one Oracle did on windows know they use MariaDB use something else I don't know and then there's some more features one interesting feature is my MySQL has I put this here because it's actually quite cool MySQL has this feature called limit so you can when you do any type of select statement you can add limit clause and it limits the output to a certain amount of rows you say select huge select limit 10 and you will only get 10 rows in your output and that's cool the bad thing is that if you have a 407 table join it will actually do the join before it limits your 10 rows to the output so it might last forever and now they extended this limit with rows examine so you put a max number of how many rows should be examined so for example here I'm doing Cartesian product between two tables which means I take the n times n so if I have one million rows and one million rows this Cartesian product will actually create what is that a thousand a trillion I don't know one million times a million it's something weird 12 trillion but whatever it's a lot of rows anyway but if I had this rows examined a thousand it will stop once it's done a thousand first rows so that's a city feature but it's cool I'm going to be 5.5 so the alpha or I think it's beta version is out now and it's supposed to be GA fairly so I mean there's not a lot of features it's mostly the old features so it should it should stabilize earlier than the others and the previous version I'm going to be then for the future I'm going to be 5.6 well all of this is of course guesswork because they're not working on it yet they're waiting for 5.5 to stabilize me for but here are some of the features that they're thinking of adding and you can actually vote for features so we work together with multi-programmed so we have a vote where there's a few potential features list and you can go and vote okay I want this feature and great and that will go into multi-programmed all right we're on your side so is Maria MariaDB a branch or a fork with my score 5.1 it's very simple my score so Oracle has my score now the community edition in 4.5.1 the community edition equals the inter-presentation and MariaDB follows these without the problem so it's definitely a branch same thing for MariaDB 5.2 and 5.3 because they're based on on my score 5.1 with my score 5.5 it gets a bit hairier why well because the community edition and the inter-presentation is no longer the same because they added these commercial extensions and because MariaDB of course had a feature fest in the previous version that had to backport all these features but they did manage to merge my score 5.5 and MariaDB 5.5 together so it is still just a branch it's still merging back to when there's a new version of of my score 5.5 the community edition MariaDB will merge into that version and there will be a new version of MariaDB so it's still only a branch however 4.5.6 who knows it depends on what Oracle does the more commercial Oracle goes the closer MariaDB comes to being a real fork and a no longer branch but at the moment it's a branch meaning that it merges back into the main tree a fork no longer merges back that's a big difference and also fork is made of metal and this is tree would all right so now all of you want to get MariaDB right after this who wants to get MariaDB now that's great so how can you get it well this MariaDB.org that's the main site you can download stuff you can also look at documentation to have a knowledge base and so forth you can also get it from the open source build servers so if you're using open source well the have package is ready made for you and a few others and multi-programmed have their own yumm and app repos available so you can get it if you add their repos to your distribution you can actually use yum or app gets to get them. What is empty? What is empty? Multi-programmed sorry it's a company behind it and what about support and services well multi-programmed they do what they call NRE which is non-recurrent engineering so if a customer comes and says well we want to have this feature in MariaDB. Multi-programmed is okay great we'll put it on the on the list of features and we'll get to it eventually the customer says no no I really want this feature so all right how much are you paying and then if they pay well they will put this feature sooner and that's basically what they do. Multi-programmed is a company with only developers so they don't really do anything else than develop the problem is that their product is free right it's gpl no one pays them anything so they don't earn any money and that's where kind of we come in SkySQL we do services we do support training and consulting and basically every time we sell support contracts a part of what the price goes to multi-programmed because they are kind of the back end support so we do a level one and level two and they do all the bug fixing so if there's a bug if one of our customers finds a bug or well they find they have a problem our support in jeans try to try to well look at the problem and say well this is a bug we send it to multi-programmed multi-programmed delivers a pledge and of course this works with MariaDB but also it's standard MariaSQL because they know mySQL pretty well it's strange enough and of course there are others providing MariaDB so we're not the only ones I don't know what contracts the others have but there's the list at MariaDB org service providers but basically if you buy stuff from us remember the money the part of the money goes to MariaDB that's how they are in the living because they don't get that much NRE and even more information you can go to this thing there's a mailing list on launch pad and of course the source code is on launch pad there's Maria channel on free node that's basically how they communicate at SkySQL we use Skype and we have our we have an IAC channel but it's internal closed and these guys use free nodes IAC channel as the only means of communicating well and email but so you will have them discuss features and development stuff on this free node you can just anyone can log in and see what they're talking about you might not always understand it because you know like me they're not native English speakers they're Russian and stuff so sometimes you don't tell a clue what they're talking about anyway and there's actually a MariaDB book MariaDB crash course which you can get from Amazon or whatever if you're interested that's it yes I'm wondering about the MariaDB source code how big it is in my so it's fairly big because it's I mean it's it's my SQL plus the extra features so it's I can't say how big it is but it's big hundreds of megabytes yeah I think so yeah it's written in C plus well we could see plus it's basically written in C plus plus but it's kind of procedural so it's just it's just like features of C plus plus but it's not really it's not really object oriented in a real way any other questions let's see what is the real life scope of my SQL workbench so workbench is works yeah it's compatible with MariaDB and yeah it's compatible with MariaDB it's like like a like a design call it a a a-d tool entity whatever it's a design tool you can design tables and stuff and but it communicates through SQL so it's works finding both what yeah one day one is nice about like no SQL it's an embedded objects turns the dynamic columns have embedded objects some kind of a table no so you can you can you can just embed data but not more complicated structures you can embed characters or integers and stuff like that but you can't I mean I guess you could have multiple but I mean it's not an object now okay but if you have more common questions we'll be we have a booth up here so you can come and ask questions at the booth we'll be there thank you you have been listening to Hacker Public Radio at Hacker Public Radio does our we are a community podcast network that releases shows every weekday Monday through Friday today's show like all our shows was contributed by a HPR listener like yourself if you ever considered recording a podcast then visit our website to find out how easy it really is Hacker Public Radio was founded by the Digital Dog Pound and the Infanomicon Computer Club HPR is funded by the binary revolution at binref.com all binref projects are proud sponsored by lunar pages from shared hosting to custom private clouds go to lunarpages.com for all your hosting needs unless otherwise stated today's show is released under a creative comments, attribution, share alike, free dose of license