Episode: 3689
Title: HPR3689: Linux Inlaws S01E65: TerminusDB
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3689/hpr3689.mp3
Transcribed: 2025-10-25 04:08:00
---
This is Hacker Public Radio Episode 3,689 for Thursday, the 22nd of September 2022.
Today's show is entitled, Linux Inlaws Season 1 Episode 65: TerminusDB.
It is hosted by monochromec and is about 68 minutes long.
It carries an explicit flag.
The summary is, TerminusDB: a NoSQL database.
This is Linux Inlaws,
a podcast on topics around free and open-source software, any associated contraband, communism,
the revolution in general, and whatever else takes our fancy.
Please note that this and other episodes may contain strong language, offensive humor,
and other certainly not politically correct language.
You have been warned.
Our parents insisted on this disclaimer.
Happy mum?
Thus the content is not suitable for consumption in the workplace, especially when played
back on a speaker in an open-plan office or similar environments.
Any minors under the age of 35, or any pets including fluffy little killer bunnies,
a trusted guide dog unless on speed, and cute T-Rexes or other associated dinosaurs.
Due to the use of a closed-source operating system from the northwest of the US of A
on the TerminusDB side, the audio quality suffered a bit and may not meet the expectations
of our listenership with regards to the usual audio quality of the podcast.
We apologize for any inconvenience caused.
On the flip side, we are happy to report that no animals were harmed or used for testing
during the production of this episode, at least as far as we know.
Welcome to Linux Inlaws, season 1, episode... I can't even remember.
Unfortunately Martin is not here tonight because he's feeling a bit under the weather, but
I'm more than happy to host this alone for a change, and I'm more than happy to introduce
our guests for tonight: it's a project called TerminusDB, and with me are Gavin Mendel-Gleason
and Luke Feeney.
But without further ado, guys, why don't you introduce yourself?
I'm Gavin Mendel-Gleason, and I'm the CTO for TerminusDB and one of the co-founders. My background
was in maths and physics, and then I went into computer science, got a doctorate in computer
science, and then I was working on a project at Trinity College Dublin on long range historical
research, and we needed a database, and strangely, we decided to actually write one.
So that's sort of the origin of Terminus DB, and a bit about my background.
Luke, over to you.
So hi, thanks very much for having me. I'm Luke, Luke Feeney, I am also one of the co-founders
of Terminus DB, I joined the gang after the technology and the company spun out of Trinity
College in Dublin, and my background is not in technology, but rather in diplomacy,
so I work more on the relationship side of the organization, though I now have some
input into the tech side, as much as I can, at least, to give some direction there.
Now, fun fact, Trinity College, for the benefit of other people: it's the place where I
did my PhD, about 25 years ago. Apparently neither Luke nor Gavin were around at the time,
but it's still the happening place that it used to be. I mean, I did my PhD in the
very research group that IONA was a spin-off from, or by, or whatever we would call
it, and you're looking at kind of the early 90s, and I reckon the rest is history. So maybe
before we go into the depths of the project, maybe you can shed some light on actually
how you met and how you founded TerminusDB, it being a spin-off project, or a spin-off
of a research project at Trinity College. Because TCD, or Trinity College as
it's also known, is still close to my heart because, as I said, I did my PhD there
ages ago. So why don't you give it a whirl?
So, yeah, so when I was at Trinity College, Dublin, I was invited there, actually, by Kevin
Feeney, who is Luke Feeney's brother, and Kevin Feeney suggested that I work with him on a project
that he was putting together, a European project, in which he was trying to involve large-scale
historical research, some industry partners, and others that were interested, basically,
in big data management projects.
Okay, and then we spun out, we needed somebody who was more on the commercial side, and so
then as soon as we spun out, essentially, we went and we approached Luke to see if he
would be keen on doing it.
Now, Luke, that must have been quite a transition, coming from the diplomatic side to a
software company, or a startup for that matter?
Yeah, it's a huge transition, you feel like you're, you know, when you get off a ship
after going on a long ferry, you feel like the ground is still moving around underneath
you.
Okay, it feels a little bit like that, and so far, there's just so much information to
take on board, and so much the secret language of technology that needs to be ingested before
you can even sound credible.
So yeah, it's definitely a big shift, but it's been very exciting, very interesting.
It's great to do something very different, so that you're, you know, you're fresh, the
world from an entirely different angle, you get to meet new challenges, meet new challenges
in a different way.
One of the nice things about working in a diplomatic service is that there's always
something new to do, you know, you go off on an overseas posting to an embassy and it's
quite different from working in different bits of government, and that's definitely the
same with startups, you know, you have to kind of hit the ground running and do a million different things
at the same time.
Okay.
Now, when I hear diplomatic services, for some reason, and maybe I've just read too many
spy novels, the intelligence angle rings a bell, but I reckon you were nowhere near that
sort of focal point being in the diplomatic services; of course, you can't
speak about it, fair enough.
Yeah, but Ireland is somewhat unique insofar as we have no intelligence service.
This is the official version, of course.
We have the police, and we have the Special Branch of the police, who do intelligence for
policing.
We don't have a specific agency that's all situated in the military, or an independent agency
like the CIA or MI6, or anything like that, in Ireland.
There has been some suggestion through the years that we found such an agency, nothing
has ever come of that.
And so, uniquely, in Ireland at least, I can say that I can be 100% sure that I was not involved.
Unless I was recruited by a foreign intelligence agency, and I can assure you I was not.
So MI6, if you're listening: these rumors of an Irish intelligence service are just
that, rumors, no truth behind this. Moving on to much safer ground, maybe you
can tell us a little bit about the project itself, basically why you decided to spin this
up into a commercial company, and then also talk a little bit about the history of Terminus
after you spun it off from TCD.
Yeah, so I guess in the early period we were evaluating different storage possibilities
for a collaborative knowledge graph.
So essentially we knew we wanted something that was extremely rich in terms of the schema that
it could store.
So the problems that we were looking at, for instance, were looking at human polities
throughout human history, and we were trying to figure out... you have to be able to
say things about them that are both geographically and time-scoped. So you might say, what is
the population of Rome, and that population exists within some kind of range of error
bounds, and then the period over which you think it's valid also exists over some
kinds of ranges that have error bounds temporally, and then you'll have lots of these
sorts of data points.
These data points are quite rich themselves, they're not just like a number; a
lot of information is bundled up into what they call a variable in Seshat.
So Seshat was the global historical data bank that was trying to build these big complex
historical databases or knowledge graphs, and it extended far beyond just the population,
so it's also talking about the likely carrying capacity, it's talking about what kinds of
rituals they practiced, whether or not they had human sacrifice, whether or not they had
a doctrinal religion, the number of levels of hierarchy in the states, all kinds of things
like this, various social complexity variables.
Those are very complex, rich databases that had to be managed over time; things would change,
the ontologies would change, researchers would discuss the problems, and we had to somehow
be able to allow curation by a large international group of people in some kind of coherent way.
So it was a hard sort of data management problem that is actually pretty similar to a lot
of things that exist in the industry.
Okay, before we move on, maybe there are two or three people in the audience who do not
know what an ontology is, and how the sort of database that you were just describing,
a semantically based graph database, connects to the ontology aspect.
Maybe you can share a little bit on that, for the non-technical people in the audience.
Yeah, that's a very good question. I mean, it's actually kind of a complicated philosophical problem,
so ontology is trying to find out...
We do have time.
We do have about four hours, so knock yourself out.
Okay, so ontology is really just the logic of the meanings of things and how to categorize
them and how they relate to each other.
So semantic meaning and the sort of structure that you want to put on knowledge.
So it's not really a schema per se, but you can imagine creating a data model from an ontology
or having a data model that is supposed to in some way reflect the ontological structure
of something.
So essentially what you're implying, of course, TerminusDB being a NoSQL database, as
in a somewhat unstructured database, meaning, if I have my assumptions correct, essentially
you have an unstructured knowledge base that you're pouring into a NoSQL schema, for
want of a better explanation.
Yeah, so I mean that's not too far from the reality, so you have something that describes
the sorts of things that we want, but we wanted like much richer control over what was going
into the database to make sure that for instance when people were putting in the population
of Rome, they weren't putting in something random.
So one of the problems we had initially, we were trying to source information from public
sources, and one of those sources we used was DBpedia, and the DBpedia had, for instance,
we wanted to import battles and wars into Seshat from the information in DBpedia.
So DBpedia is a data bank that was developed from the info boxes in Wikipedia.
So it's got a lot of information, a lot of very useful information, however there's not
that much in the way of quality controls on some of these things.
One interesting problem that we ran into, we imported all of these wars, and we were
searching through, and like there was one about all of these birds, there was all these
connections to birds, and we were like, why are there birds in our database, all of a sudden,
it turned out that there was an ornithology war of 1865 or something like that, which
is completely not a war in any sense, but somehow it had been imported into the war section
of the ontology.
So that kind of looseness is quite dangerous, and if you're trying to build the knowledge
graph, you need to have, if you want to use it for analysis, for scientific analysis,
you have to have a lot of quality control over what's going in.
So a lot of what we were developing in the early stage was really data quality assurance
technologies, trying to keep things so that they met a schema structure that was guided
by an ontology.
Okay, this bird thing is really interesting.
Did you by chance also tackle the co-war, as in between Mexico and Colombia, for that matter?
No, we didn't get those ones in.
I'm just asking.
Okay, it was probably only because we didn't search those dates, so it was avoided, fortunately.
So when did you actually... you said that you scraped Wikipedia, but also DBpedia, did you?
DBpedia is a scrape.
Oh, okay, it is a scrape from the info boxes in Wikipedia.
And they take that data, and then they also do some curation on that data to try to improve
its quality.
Okay, and when I take a look at the Wikipedia page of that, of the Seshat project,
it says that the global history databank stores the data in a single large database that can be used
to test scientific hypotheses.
Now, that's an interesting approach, isn't it?
So essentially, you use historical data to confirm or reject scientific hypotheses.
Is it beyond history, or, generally speaking, is it just historical hypotheses?
Well, I mean, it could, I mean, some of the implications are present today, right?
But it is meant to test past hypotheses.
It has also been used to predict future events.
So for instance, some of the information that was used, one of the hypotheses was used
to try to predict the sort of Trump era instabilities 10 years prior.
So essentially, there's a bunch of social variables that predicted that there was going to be
a likely crisis in the United States due to various different convergent elements,
which was to happen around 2022, so.
What about the Klingon Romulan war?
Does that feature on your map?
Well, I mean, if we put it in, then we should be able to just infer that it's going to happen, yeah.
Okay, a very interesting background.
So the idea was actually to monetize this in terms of spinning it out of this project.
And I reckon that TCD was part of Seshat, as in the project itself?
So we were part of the Seshat project, but Trinity itself wasn't a partner.
Ah, okay.
Yeah.
It was like some people from Oxford; Peter Turchin, who was sort of leading the thing,
and Harvey Whitehouse as well, actually.
And going back to the spin-off, what exactly drove you to the conclusion that there might be commercial market for this?
Why did you take this concept from a research angle and put it into a start-up?
So we were also in this...
So the Seshat project was part of a larger European project on data management.
And one of the other partners in that was Wolters Kluwer, and Wolters Kluwer is a big Dutch information services company.
And they were interested in the technology, and so we had a project with them.
And that's sort of what the genesis of this spin-out was.
Wow.
And if I may ask, why the name Terminus, or TerminusDB for that matter?
So it's TerminusDB because Terminus was the place in which there was a repository of all of the important human knowledge in the Foundation trilogy by Asimov.
The famous author, okay.
Wow.
But we're also thinking about terminus, the god, the Roman god, the god of boundaries.
At the time, we were doing a lot of data quality work, trying to import information from places and put it into a knowledge graph
that would have high-quality data, and so you need boundaries.
And so that's terminus is the god of boundaries.
And if I'm not being too nosy, who came up with this name?
The Feeney brothers, or yourself?
I believe it was the Feeney brothers.
Wasn't it you, though? I can't remember where it came from.
It was you, it was you.
I'm trying to pass off the credit.
Okay.
We had an interim title in between. So when we spun out of the university, we were called DataChemist.
And DataChemist still lives on.
It's the consultancy wing of the organization.
So when we do consultancy, we do chemistry on your data to make it all fit together nice and neatly.
But then we were transitioning across to being named after our principal product,
and we were naming our principal product at the same time.
So we were RegularDB for a while.
And RegularDB sounds like a laxative, though.
Teacherless.
We still own the copyright to RegularDB, in case there are any budding database developers
out there that want to buy it from us.
We will sell it for a reasonable price.
And we were a little bit worried about Terminus because there were some French chaps
who had a company called Terminus Media.
And we were worried that we wouldn't get a trademark or copyright on the name for the EU
because it was somewhat similar.
But actually they were very reasonable.
And after a little bit of communication back and forth, they agreed
that nobody would actually get confused by a movie company and a database company,
and formally said that we could go ahead and call it TerminusDB
after both the Asimov planet and the Roman god. And, Chris, as you know,
the joke is obviously that the hard things in computer science
are naming things.
And that's definitely the case when it comes to naming databases.
Cache coherency and off-by-one errors.
So those are the two hard things in computer science.
Okay.
Changing tack a little bit, and given the fact that we have the CTO on
this episode of the recording,
well, I might as well ask some technical questions.
If I take a look at the GitHub repo,
there are two things that come to mind, Gavin:
a Git-like engine, apparently written in Rust as the underlying technology.
Given the fact that this, as in Linux Inlaws, is almost half a Rust podcast,
because quite a few episodes have been spent on tackling Rust
from different angles: community, marketing and all the rest of it.
And yes, we will have Rebecca Rumbul on an upcoming episode.
That can be disclosed.
So no worries.
And maybe you can shed some light on the technology itself, why you went for this technology,
and especially what sets the storage engine apart from other approaches.
Redis comes to mind, Mongo comes to mind, and other kinds of databases
that claim to be pretty performant, for example Redis,
which is basically in-memory only.
Maybe you can shed some light on your approach and the architecture,
and also spill the beans on why you went down that route.
Yeah, absolutely.
Maybe it's easiest if I start why we went on that route.
So originally we were trying to figure out how to get extremely large graphs
to function well for queries.
And we did some experiments.
And it looked like for Seshat we were going to be able to work with
a sort of conventional graph database.
And it was going to be fairly manageable because you were talking about tens of millions of
edges, and not that many.
But we started doing some loading of... well, with a subsidiary of Wolters Kluwer,
we did an analysis of the Polish economy basically doing all of the,
like storing all of the companies, their boards of directors,
all of the shareholders, all of the subsidiaries and all of the
people who are involved in bankruptcy proceedings, et cetera.
So all of that information, essentially, since the fall of the Warsaw Pact.
So going back to the early 1990s.
And that ended up being over a billion edges.
And that just wasn't manageable using the conventional techniques.
So we searched around and we actually found there was this open source project
called HDT, which is a sort of static RDF database.
It's load-once.
What's RDF, for those people in the audience who do not know what that means?
Right.
Okay, so RDF was a semantic web standard that was developed.
And it's basically just a format for storing triples.
The basic idea is that you can have a node, a named node, a named edge,
arriving at some target node or a data point.
So you have sort of two different types of triples:
one that is node cross property cross node,
and one that is node cross property cross data.
And that sort of framework is a very general framework for storing labeled graphs.
And you can put a lot of things quite conveniently into a sort of RDF framework.
It's in contra distinction, I guess, to property graphs,
which have a slightly different approach to modeling.
But I mean, they're sort of isomorphic in a sense.
It is not a huge difference between them.
In any case, HDT... so it came out of the semantic web originally, I suppose,
the RDF structure.
So we were trying to figure out how... so we decided to use this HDT as an experiment.
We got a really big, massive machine with a lot of memory.
And we just created a bunch of worker nodes to make lots of these little HDT files
and then merge them all together into a gigantic plane that would then run in memory
as a read-only database.
And that worked quite well, actually, so we were able to go up to very huge databases
with very good performance for recovering long chains, especially.
So if you're looking up a single node, then you might be faster with some other type of database.
But if you start looking down chains of nodes, this actually turns out to be a very good way of structuring the data.
Now, the thing that's interesting about HDT is they were using something called succinct data structures
to store the information.
And succinct data structures are queryable data structures that approach the information-theoretic minimum size for representation,
while also being able to allow these queries in reasonable amounts of time, like logarithmic time,
or something along those lines, depending on what the precise operation is.
Like a bitmap, essentially, for example.
Yeah, so bitmaps, exactly.
Yeah, so you have these sort of bit arrays, bitmaps. Those can be used.
You can do logarithmic searches on it.
Things like that.
There's other techniques. It's sort of a family of techniques for representing data.
And this was really cool.
So these succinct data structures are very cool, but they're hard to write.
So they like to be written once. They don't like to be updated.
There are dynamic structures and dynamic ways of dealing with this, persistence and things like that, but they tend to be write-once.
So naturally, we started developing the system where we would store deltas and then occasionally roll them up into a big flat plane when necessary.
And if you do this in a clever way, you can sort of cleverly create a sort of log-like shape to the way that you have these planes as you update them.
So that idea sort of naturally created a database which was, A, immutable, and B, had a commit structure, somewhat like Git.
So it sort of developed quite naturally out of this need to go up to very large graphs.
So that gives you version control pretty much out of the box now?
Yeah, so that sort of came baked in.
So I mean, there's some orchestration on top. This is actually a fair bit of extra work. We thought it would be less extra work than it was.
But yeah, no, so we could see, oh, this is very much like Git. All we'd need is branches, and then suddenly we'd have something a lot like Git, just based on this structure.
And why Rust as the primary implementation language? Why not C or C++? Tried and tested.
Yeah, no, we started with C++. So we were using C++, using HDT's libraries and then modifying HDT's libraries to improve some of the problems that they had.
And we really needed multi-threading in order to make it so that it was performant to do large assemblages like when we're really creating a large graph.
And that ended up in a lot of crashes. So there were a lot of like race conditions and there was a lot of non-reentrant functions that were supposed to be reentrant and they were incredibly difficult to track down.
Okay.
And so one of our engineers, Matthias Finanity, was sort of learning Rust in his spare time, and he thought, you know what, I'm just going to rewrite this library in Rust, in my spare time, not in my work time.
And then he came in and he was like, here, I have a working prototype. What do you think? And I was like, wow, that's amazing.
And now, we almost never suffer from those sorts of crashes or non-reentrant behavior.
Before we discuss Rust a little bit more: didn't you take a look at something called libboost or something to eliminate the complexity of multi-threaded C++ code?
Because normally Boost and that remaining ecosystem of supporting frameworks exists and has kind of tackled the hard work already. Or wasn't libboost and friends a fit for the project?
Just curious.
It would have meant sort of going through systematically everything in the HDT library and trying to tighten it up.
So it turned out that a rewrite in Rust was probably on a similar scale of complexity to doing that.
So I think it was probably, I feel like it was the wiser decision overall.
It saved us a lot of time in adding various kinds of multi-threading and avoiding a lot of different kinds of bounds-checking errors.
But yeah, it did improve things significantly over the old code.
I mean, yeah.
The beauty with Rust is, of course... full disclosure, the marketing portion is starting right now; this isn't paid marketing for Rust.
Rust comes with a lot of components already in the standard library that you would have to tack onto your C++ code before you get it to work on the same level; Rust basically supports it with the standard library already.
For example, multi-threading, channels and all the rest of it, where you have to resort to extras in vanilla C++.
Rust basically has it as part of the standard library, which makes it easier, I reckon, to implement, because it's part of the core library set of functionality of the language already.
Yeah, I mean, the compiler's ability to help you is really not to be underestimated as well.
I mean, so you can take something like the GMP library which is a multi-precision large number library essentially that's widely used.
And if you look at the rust libraries that use it, they basically just have a shim on top, they use a bunch of unsafe calls, etc.
But they make it so that you can't shoot yourself in the foot very easily, and that kind of shim above it is not the same as the way that Boost would be able to help you.
So boost can help you a little bit but it really doesn't provide this ability to get the compiler to help you not shoot yourself in the foot in the same way.
Whenever I talk to people that have mastered the learning curve, they tend to tell me that Rust is probably the easiest language they've learned.
What was the experience for you and your team when you decided to take on that learning curve?
I wouldn't say that; it is kind of significant. The borrow checker is a weird way of thinking about things.
It does take us longer to write code, I think. I don't know, there are some aspects of Rust that are so convenient that it may be almost neck and neck with C++, but it's kind of similar.
It's not like a high-level language like Python where you can just go really quickly, you know; you do spend some time, even once you understand what's going on, fighting the compiler, trying to get the type checker to pass.
But it's worth it, you know, and eventually there is a point at which you're like, ah, okay, and the basic ideas of the borrow checker kind of sink in.
But I think other people pick it up quicker.
Full disclaimer: Gavin just talked about advanced... I wouldn't say advanced, but rather basic to intermediate concepts in Rust.
There is a previous episode of something called Linux Inlaws where we tackle these concepts; details will be in the show notes, and of course there's also the language project homepage where these concepts are explained in depth.
So we won't tackle them now. But, so, if I understand this correctly, Gavin, the old Rust adage still applies: if you can convince the compiler to generate code, you're almost there.
As in, the amount of time that you have to spend on debugging your code is really cut down, because much of the effort that is normally spent after you get a build, on debugging that build, is actually done prior to generating the executable.
Because the compiler basically takes the code apart and puts it back together again.
Yeah, I mean, like obviously you can use types in more or less sophisticated ways.
But the kinds of bugs that it kills with the type checker, which I think are really the most important ones, are those sorts of race conditions, segfaults, things that end up really being difficult to track down.
So your bugs tend to be more like logical errors and more like the kinds of things that you'd mess up in something like Python, you know, where you have a simpler language, you just messed up the idea.
And you don't have this mysterious sudden crashing that is like if crashes happen in a very irregular or unrepeatable way, it's really hard to get rid of them.
So it's nice to never have those sorts of sudden bugs that you have no idea where they came from and can't repeat on a second go.
Oh, interesting. Changing tack just a little bit here: the Wikipedia page about TerminusDB tells me that you changed licensing halfway through, from a GNU license to Apache, which is a very interesting move.
Maybe you or Luke can shed more light on the background, and especially why you did this, because normally projects tend to go the other way, moving away from a more permissive license to a more copyleft slash restrictive license like the GNUs of this world,
like the GPL, or the AGPL for that matter.
We would have done that if we'd been overwhelmed with popularity on our initial release; then we might be becoming more restrictive rather than more permissive.
But we had a bunch of users coming into the community and chatting to us who, you know, wanted to build an application with the software but felt,
even if it was unjustified, had a general feeling that they were going to be caught up by the licensing, and that they lacked the liberty to build with the licensing.
And so we decided to go open source; we'd open-sourced the software with a GPL 3.
And we, you know, we didn't feel that there was any real issue for those people in just building with the software, that there weren't going to be any copyleft provisions that were going to get in their way, but they had the feeling that there was.
They had the feeling that their organizations weren't going to be in a position, or a willing position, to use GNU-style licenses.
As you probably know, like Google restricts the use of certain GPL licenses entirely.
So none of the AGPL licenses are allowed in any Google software at all and are not allowed to be used within the Google organization.
Also, unfortunately, you know, for better or for worse, there's a chilling effect that takes place then.
And when we wanted to go for the spread as broadly as possible, we thought that the Apache was the better balanced there.
So it was with a little bit of regret and the slightly heavy heart that we made that decision.
But I think it's the right one for databases because it does allow people to build as permissively as they need to.
Sorry, yes, AGPL, of course, referring to the Affero GPL, which is probably one of the stricter GNU public license versions out there.
Essentially, basically, it's viral communism at heart: if you talk to an AGPL-licensed component, even through an API, you have to essentially disclose, as in reveal publicly, your source code; details may vary.
But, of course, there's also a previous episode of something called Linux Inlaws where we actually discuss open-source licenses. But back to the issue at hand.
So essentially, if I understand this correctly, you change that licensing model to increase adoption, which is a rather interesting move, if I may say so.
Yeah, that's exactly it. We changed it in order to increase adoption and increase the ability for people to use it to build within companies, within enterprises.
And without the fear that they might be in some way restricted by some legal provision within the license at some later date.
And this is the feedback that you're getting from the community too, after this change?
Before and after the change, yeah.
Interesting. Okay.
Yeah, so definitely we got that feedback a bunch of times within the financial services sector.
I mean, problem is one of the things that we observed when we were out talking to companies, talking to people about their problems is that people were using GPL licenses in ways or GPL software in ways they probably shouldn't anyway.
They were embedding it. They weren't telling the companies they were doing all sorts of shady stuff.
And we kind of felt, well, you know, if people are doing that sort of stuff anyway, we should probably just say that the Apache 2.0 is good enough, and it's definitely been somewhere we've been comfortable to land.
Interesting. So it's interesting to see what happens if you change that license.
And changing tack once again: if I take a look at your website, terminusdb.com — of course, links will be in the show notes —
I see quite a few offerings apparently built on top of TerminusDB. I see the core open-source technology, TerminusDB, but I also see TerminusX, Schema Service, VersionXL and some other stuff.
Maybe — and this is without basically going too much into marketing — you can shed some light on the efforts to monetize on top of your open-source core technology.
Yeah, so only too happy to discuss that. We.
TerminusX, which is a database service, and we're actually going to transition that across into a slightly different model, into being more like what we're calling a beta builder.
So a service where people can come along and they can test out building a beta using our cloud platform.
It gets anybody over the hump of having to get their own software, you know, running and doing their own deployments.
So they can come, they can use our API, they can get started with their project, they can then see if it's successful, see if the software works for them, see the graph capabilities, use the query language, you know, build knowledge graphs and have them running as production applications.
But then the idea is that people will transition across to their own deployments, because as things become more successful, people will want better governance and performance, and we'll help with the deployment to any cloud that they may have and the rolling out of any enterprise features.
And so TerminusX will become more like a beta builder, a cloud environment for people to go get started — it's, you know, it's like a sandbox on steroids.
So it's like a, you know, people often offer sandboxes, but this is one where you can actually build a production app very easily in the cloud and then transition across to having that in your environment, with our support and our enterprise packages.
Really, that's where we see ourselves making more commercial deals in the future — on that transition from the thing that you're rolling out as a beta into something that you
want to deploy in one of the hyperscalers, in Azure, in AWS or wherever.
And we're obviously ready to do that at short notice, and then we offer support around that and all those sorts of things as well.
We have some supplementary products that we've built on the back of the software for various different reasons. One of those is VersionXL, which you mentioned, which
is an Excel-like front end, a Git-like interface for the database. And then another one is the critical asset tracker
for understanding climate resilience, which we built with the United Nations so that developing-world communities can better understand
cascading failures of critical assets in
climate-stressed environments.
Given that we're a graph database, you can do those sorts of queries very simply, to see
how assets might be linked together.
And now he's gone.
Yeah.
Okay, maybe you could, maybe you can chip in because this is not working with the audio.
Yeah, so just on that — people building applications on TerminusX.
So one of the features that we have is a very Git-like structure, and it's a distributed database in the same sense that Git is a distributed database.
So you can sort of manually move around changes from place to place.
So if you build something on TerminusX, it's actually quite simple to just clone it and pull it into some other TerminusDB installation.
And you can just do it over the wire.
So.
And in addition, you can actually do incremental updates, and incremental backups can also be done this way.
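The Git-like clone-and-pull model described here can be sketched in miniature. This is not the TerminusDB API — just a hypothetical toy with made-up names, showing how content-addressed commits make incremental pulls cheap: a pull only copies the commits the local copy is missing.

```python
import hashlib
import json

def commit_id(commit):
    """Content-address a commit by hashing its contents."""
    return hashlib.sha256(json.dumps(commit, sort_keys=True).encode()).hexdigest()[:12]

class Repo:
    """A toy append-only commit log: one branch, fast-forward only, no merges."""
    def __init__(self):
        self.commits = {}   # commit id -> commit dict
        self.head = None

    def commit(self, changes):
        c = {"parent": self.head, "changes": changes}
        cid = commit_id(c)
        self.commits[cid] = c
        self.head = cid
        return cid

    def ancestry(self):
        """Walk from head back to the root, newest first."""
        cid, out = self.head, []
        while cid is not None:
            out.append(cid)
            cid = self.commits[cid]["parent"]
        return out

def pull(local, remote):
    """Copy only the commits local is missing -- an incremental update."""
    missing = [cid for cid in remote.ancestry() if cid not in local.commits]
    for cid in reversed(missing):  # apply oldest first
        local.commits[cid] = remote.commits[cid]
    local.head = remote.head
    return missing
```

Cloning is then just pulling into an empty repository, and a later pull transfers only what is new — the incremental-update behaviour being described.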
Interesting.
So, taking a look at the offerings on your web page:
TerminusDB as such, as an open-source code base, can be found on GitHub; TerminusX is your managed service offering.
Of course, you're probably charging money for this.
But what about Schema Service and VersionXL?
These would be open-core products, as in relying on TerminusDB,
but essentially turning the company into an open-core company, as in an open-core approach,
where — exactly — where the core technology, similar to the MongoDB code base, is on GitHub,
but the other stuff, especially the managed services, are paid-for commercial products.
That's right.
And I mean, I think the big focus for the Schema as a Service is to offer additional functionality to MongoDB users.
MongoDB users that want to do path queries, that want to have any of those Git-like version control features.
And to make that very, very simple for them to do, and obviously to give them the ability to define a strong schema for their MongoDB data.
One of the nice things about terminus in general is that it's both a graph database and a document database.
So what we call a document graph.
It can interact then very nicely with MongoDB and its JSON document structures in order to do that.
So really, some of those deficiencies that exist within the Mongo ecosystem
we're trying to pick up on, and help users that are there to do some of that stuff.
And yeah, and I mean, I think the VersionXL is exactly the same insofar as it's a layer that's built on top of the open-source database — an open-core model.
Interesting perspective.
And now for the really big questions — in true explicit-podcast fashion, you know. Changing tack a little bit, moving on to community marketing.
If I take a look at a website called db-engines.com, I see Neo4j at position 18 or 19, maybe TigerGraph at 122, Redis at 7, I think.
And ArangoDB at 75. Now, these would all be multi-model databases or native graph databases, which you may or may not consider as competition.
But if I check out TerminusDB, it scores at position 309.
Yeah, yeah, that's that's a fair point.
I think that ArangoDB we definitely see as competition. I mean, they also do that crossover between document and graph, and they're a great database.
And they also changed their name — they used to be something else.
So I think that databases that have changed their name and that are talking about graph are our competition. But we'd be very negative about DB-Engines and the way that their algorithm works for the ranking.
A lot of it is about how many times the database has been mentioned in job advertisements and other things like that, so Oracle remains on top always and MySQL comes second.
So I don't think it really reflects that well on younger databases and their ability to make an impact on the market.
Okay — apart from DDoSing, what do you intend to do about this DB-Engines ranking, or not? Given the fact that, as you said, DB-Engines takes not one but multiple factors into account, which at times may not be that accurate.
But as a matter of fact, to some extent they reflect the community echo of your code base, of your adoption in the community.
Any thoughts on this one? What do you do with regards to community marketing? And of course, deep links will be in the show notes as applicable.
Yeah, so the things that they take into account — we're not really sure exactly how they're balanced across their algorithm, but obviously we do try to become more relevant on social networks and professional networks.
And we spend a lot of time and effort on trying to build a community that is inclusive and helps users to better understand the technology.
We run a Discord server, where we try to respond to users' queries as quickly as possible.
And we have a Twitch channel where we do a bunch of live coding, for anybody that's interested in coming along and seeing some Rust or Prolog coding.
And so yeah, I think like you'll know Chris it's a it's a slow process building a community and especially if you're trying to build it in such a way as it's durable.
It's a community of people that are genuinely interested in the technology and genuinely interested in building and we have a kind of you know an active user base within our community who are building really remarkable things in different areas.
So yeah, I think we will climb up the DB-Engines ranking because we'll get better there, but it'll just take quite a while for us to catch up with some of those more established databases.
Even if of course our performance, our technology, our people and everything about us is better than all of the other databases that you mentioned.
Absolutely, you see otherwise look, I wouldn't be talking to you in the first place.
I'm joking.
No, jokes aside people — where do you go from here, what are the future plans now?
Where do you see this going, not just in terms of the open-source code base, but also in terms of maybe the commercial offering or maybe the community in general?
What's next for TerminusDB?
Well, we're great believers in the ideas around data mesh.
That have been so thoughtfully expanded by people at ThoughtWorks and elsewhere over the last couple of years.
And Zhamak, who's kind of the founder of the modern socio-technical data mesh distributed architecture.
So what we'd like to do, where we'd like to grow over the next few years, is to build out a series of features that allow us to be a drop-in distributed data mesh for data-oriented organizations that want to push data management out to domains.
So that this incredible monolith of the modern data stack at the center of a lot of organizations can be broken up to give more flexibility, give more durability and give better results for users.
So that people sitting in marketing or sales aren't subject to the whims of data engineers at the core, but can actually build and create data products that can be surfaced to the rest of the organization by themselves and can be modeled there.
So we're very much on that left-hand turn of moving back towards a more heavily modelled world of data.
And what we'll do, kind of probably in the near term, is launch what we'll call a knowledge management system.
So like a CMS, but for knowledge rather than just content — and that's kind of where a lot of our technical effort is going now, to build out the features around that.
But we'll slowly over the next kind of year put together all of the features so that we can be a drop in data mesh type environment for the enterprise.
Oh, I'm sure now I didn't discuss this with Gavin in advance.
Nice one people nice.
Now jokes aside — that dangerously moves you close to something called machine learning, deep learning.
If I take a look at ArangoDB and the strides they're making, especially on the machine learning side — with regards to, for example, something called ArangoPipe, where they explicitly position themselves as the graph database of choice for machine and deep learning, with the architecture on top of the existing
ArangoDB core — where do you, and as I said, I see this moving towards machine learning very quickly, where do you see this going?
I mean, do you actually want to enter that space, or do you keep it on a fairly high semantic level, without going down that artificial intelligence route?
Oh, no, we definitely want to go down that artificial intelligence route, very definitely.
Because we think we're, you know, a lot better choice than even ArangoDB, who are very, very good, and we think the revision control features of TerminusDB make a lot of sense.
We think our approach to data modeling is the better one, and we also fundamentally think that our query language is better.
And we think that the world will move towards Datalog, if you look at it.
I don't know if you've ever come across them, Chris — there's a company called RelationalAI, and the former CEO of Snowflake, Bob Muglia,
he joined their board recently, and they are like: hey, we're going to take graph databases and we're going to make them work with SQL.
And then when you look under the hood, it's like: yeah, okay, you can do a few things with SQL,
but if you want to do anything exciting or interesting or next-generation with this new way of looking at data, you have to go into using their variant of Datalog.
And so we think that the databases and data stores that have those sorts of abilities built close to the core are going to be the ones that win out in the end.
And TerminusDB has already been deployed in a whole bunch of machine learning and AI infrastructure projects — we're members of the AI Infrastructure Alliance, which is building a new canonical stack for AI.
So we really do see ourselves, and we see version-controlled data, as being fundamental to tomorrow's machine learning and AI.
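The Datalog point is worth a small illustration. This is not TerminusDB's actual query engine — just a hedged, hypothetical sketch of the classic recursive path query as two Datalog rules, evaluated bottom-up to a fixed point:

```python
def path_query(edges):
    """Naive bottom-up evaluation of the Datalog program:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
       Repeatedly derives new facts until nothing changes (a fixed point)."""
    paths = set(edges)
    while True:
        derived = {(x, z)
                   for (x, y) in paths
                   for (y2, z) in edges
                   if y == y2} - paths
        if not derived:
            return paths
        paths |= derived

# edge facts: a -> b -> c -> d
edges = {("a", "b"), ("b", "c"), ("c", "d")}
reachable = path_query(edges)
```

Expressing the same transitive closure in plain SQL needs recursive CTEs; in Datalog it is the natural primitive — which is the point being made about RelationalAI and why a Datalog-flavoured query language suits graph data.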
Interesting, okay, any thoughts on on something called quantum artificial intelligence.
I mean, this is the strides that D-Wave and friends are making into a much more commercial adoption of their technology.
So when can we expect the next version of TerminusDB actually talking to the likes of D-Wave, IBM and some other quantum companies out there — anytime soon?
Yeah, no, we'll have a version in a couple of weeks, just...
People, you heard it here first.
No jokes aside — I mean, it's a niche, but it's a very quickly growing niche.
I'm running this software for this podcast on my quantum computer right now.
Excellent. So what could possibly go wrong in that case?
Okay, no jokes aside.
Anything that we should touch upon before we close this off, as we move on to the boxes?
I think that really gave a good overview of where we're going and the way we see the world at the moment.
Do you have something to add, maybe on the machine learning side, and kind of, you know, why you think Terminus...
Yeah, so why it's the top, exactly?
Yeah, exactly. So I guess so the we talked a little bit about succinct data structures and a little bit about the data mesh.
So essentially in order to leverage this sort of knowledge repositories that we have, you want to build knowledge graphs, but you can't build a knowledge graph of everything all at once because then it becomes totally unmanageable.
And you don't know the ontology that you need to use for all of these different segments.
So you need sort of domain control and really what you want to do is create a network of knowledge graphs that creates an uber knowledge graph out of combining individual knowledge graphs.
In order to do that, you need some kind of mesh approach.
So you need to be able to create these individual data products and you need to mix and match them.
And you need some sort of way to perform queries over a union of these in some sort of in a fairly effective manner.
And terminus' use of succinct data structures is really the right, it really forms a good sort of backbone for doing this.
And we were looking at some people in industry that were spinning up hundreds of SQLite databases, merging them together into a giant sort of uber-database and doing the queries over that.
And it turned out that that was much faster than, say, Amazon's Redshift.
So this is actually a very effective way to approach the problem if you can do these sorts of piecemeal merges of the problem domains that you need to create the network that's of interest for an analysis across maybe many different kinds of data products.
And I do think that the combination of data mesh and the combination of sort of succinct data structure scaling is really where we see ourselves trying to hit in a year.
We're not really there yet, but we intend to be there in about a year.
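That mesh idea — domain-owned graphs queried over a union — can be sketched in a few lines. Nothing here is TerminusDB-specific; the data products and triple names are made up for illustration:

```python
def query(triples, pattern):
    """Match a (subject, predicate, object) pattern against a set of triples.
       None acts as a wildcard in any position."""
    s, p, o = pattern
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Two domain-owned "data products", each a small graph in its own right.
sales = {("acme", "customer_of", "us"), ("acme", "based_in", "berlin")}
support = {("acme", "filed_ticket", "T-17"), ("T-17", "status", "open")}

# A mesh-style query runs over the union of whichever products it picks.
mesh = sales | support
```

The piecemeal merge described above is just this union performed lazily and at scale — which is where the succinct data structures come in, making the combined graph cheap enough to query as one.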
Okay, Luke — anything to add?
So I think that captures it pretty well that.
No — there's a combination of factors that really work well with TerminusDB on data mesh and machine learning, and really it's about trying
to both finish off some of the features that will enable that — like querying across data products effectively, picking and choosing from data products effectively if you're a downstream user, like a
data scientist that wants to run some experiment — and then just interfacing with existing enterprise architecture, as we all know.
Once you try to do deployments into enterprise, you run into all sorts of strange beasts and have to engage in a whole bunch of data archaeology, and there's a lot of potential for integration work.
And we're trying to short-circuit as many of those problems as possible,
so that when people do make the decision to implement the data mesh — which of course is not just a technological decision,
it's often more an organizational or a social decision than anything else, because you need to redeploy teams out to the domains to make sure that the expertise is where it needs to be,
rather than building these very dense cores within organizations —
once you're making those decisions, the tooling is available to make that easier.
And again, that means that you've got to think about how enterprises actually use software and how we could make that a little bit easier.
That has been more than an interesting hour of technical and non-technical discussion.
One of the things that we do to close the show off is actually to discuss boxes — boxes stands for the pick of the week.
Essentially something that has crossed your path over the last one or two weeks which you think is worth mentioning.
It can be a book, it can be a piece of software, it can be a movie, it can even be a TV series, as in whatever is shown on Netflix these days.
Or it can be, of course, if you choose to do so, a piece of politics.
So what's the pick of the week in your box?
Well, I think it's going to have to be Carbon. So Google's Carbon — they just launched it, I don't know if it was last week or the week before, but it's very new.
The 25th of July or something like that.
And it's their answer to C++. And it's interesting — it looks like it's angling to be a Rust killer.
Really? Oh wow, okay.
That's how I kind of see it.
You know, even the syntax has a Rust kind of feel to it.
So it looks to me like Google feels that this thing is going to happen — C++ is going to die —
and they'd rather replace it with their own than get stuck with Rust.
But that's sort of my naive read of it. Maybe other people looking into it have a better read.
Of course, links maybe in the show notes.
What happened to Golang in that case?
Yeah, screw Go then, right?
It's fair enough. A cool invention to marvel at.
Okay, check this out people.
As I said, links in the show notes. It's on GitHub, as are all of these.
Yeah, it's okay.
Perfect. Luke, what's your box?
It was a great choice. Gavin — I thought he was going to say something about carbon as in people fighting about carbon, but it's the language, that's all.
Okay.
Fair enough. Obviously, it's something else.
No, what I'm going to mention is: I was on holidays last week in the south of France and I read a book that I really enjoyed, by a Peruvian author —
no, no, he's not Peruvian, he's Chilean — which is called When We Cease to Understand the World, which is a nonfiction novel,
if such a thing makes sense, which is about mathematics and physics, and about people losing themselves when they get close to the core of mathematical ideas.
And it threads a beautiful thread of modern physicists and mathematicians from the 19th century through to quantum mechanics, and about their psychological states and how they approach the world.
And it's a fantastic, short read — very, very enjoyable for anybody that's interested in physics and in human psychology and the crossover of those things. I'd recommend it.
The other thing that I'm a regular reader of, which I can only recommend to the audience, is Benn Stancil's Substack.
So that's Benn, B-E-N-N, dot Substack. And he's a great writer about data in general. He had a couple of great recent blog posts, one called The Data Config, about the humble YAML file with ambitions for more,
and another one about data-driven companies and whether they actually win or not,
which kicked off a bunch of conversations within Terminus.
Okay, isn't humble YAML a contradiction in itself?
I'm just asking sorry.
Okay — and a plain-English version, as in a translation, of the book you mentioned first is available too?
Yeah, yeah, yeah.
Perfect.
Absolutely.
Absolutely.
When We Cease to Understand the World.
Fair enough.
I read it.
Of course, I'm not.
That sounds bad.
Fair enough.
Fair enough.
That sounds very interesting.
And my box — and just to keep this short, because we're almost approaching the two-hour mark... no, I'm joking.
We're just running over an hour.
And mine is a movie called Chirac.
It's from 2015, directed by Spike Lee, if I'm not completely mistaken.
And it's a modern-day adaptation of the ancient Greek play called Lysistrata.
Essentially, it's about two conflicting parties —
let's put it this way — where the women simply get fed up with the violence.
As in the ancient Greek original, too.
Sorry.
And they tell the men:
you have two choices — either you stop this war, this conflict,
or you basically will have to live without physical encounters, as in sex.
Now, this one is actually set in modern day Chicago.
And it's very funny to watch because what actually Spike Lee did.
He took portions of the ancient Greek play and put them into the mouths of the likes of
Wesley Snipes and Samuel L. Jackson.
So if you're game for that sort of thing, it's a movie not to be missed.
It's probably available on Netflix and other streaming services.
As I said, it's from 2015.
Links, of course, within the show notes.
Luke Gavin or Gavin and Luke.
No particular order.
Thank you very much.
That has been more than interesting.
And hoping to have you on the podcast in a couple of years time.
When you have moved to position number one on something called DB-Engines —
if it's still relevant by then.
Thank you.
This is the Linux in laws.
You come for the knowledge.
But stay for the madness.
Thank you for listening.
This podcast is licensed under the latest version of the creative comments license.
Type attribution share like credits for the entry music go to blue zero stars for the song summit market.
To twin flames for their peace called the flow used for the second intros.
Finally, to select your ground for the songs we just use by the dark side.
You'll find these and other details licensed under CC at Jamendo,
a website dedicated to liberating the music industry from choking copyright legislation and other crap concepts.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, Internet Archive and rsync.net.
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.