Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Lee Hanken
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions

hpr_transcripts/hpr3758.txt Normal file

@@ -0,0 +1,328 @@
Episode: 3758
Title: HPR3758: First sysadmin job - war story
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3758/hpr3758.mp3
Transcribed: 2025-10-25 05:02:59
---
This is Hacker Public Radio Episode 3758 for Wednesday, the 28th of December 2022.
Today's show is entitled, First Sysadmin Job - War Story.
It is hosted by Norrist, and is about 28 minutes long.
It carries a clean flag.
The summary is, how I got my first job as a sysadmin, and a story about NFS.
Okay, so I thought I'd record a quick holiday episode for HPR, and I'll do kind of a combo
story about how I got my first job in tech (I haven't always worked in tech; I'm currently
a Linux admin), and then I'll combine that with a bit of a war story about my first week.
I have been a Linux user for a long time, since 2000, but I didn't have a Linux job
that far back.
But I was working for a place that had a contract with the government, and the contract was
going to end.
We didn't know the specific date it was going to end, but we knew the job itself that
we were there doing was only going to take about 10 years.
So we all knew, when you took a job there, you knew that at some point you were going
to get laid off, and if you made it until the end, everyone was going to get laid off
at the end.
So even though it kind of sucks having a job where you can't work there forever, it gives
you sort of a unique opportunity to sort of plan for changing careers.
So since you can look ahead and know, on or about this year, that you'll have to do something
different, it gives you time to prep for it.
So since I had been sort of a Linux-on-the-desktop hobbyist for a long time, I thought,
well, now's my chance to do what I can, and then maybe when I do get laid off, I can
land a job as a Linux admin or something.
So I started just kind of adding to the things that I would normally do around the house with
Linux.
So instead of, you know, printers and playing music and configuring X11, I would, you know,
do things like trying to set up web servers or file servers, or maybe even an LDAP server
or stuff like that, and, you know, doing virtualization and whatever I could think of that
might be things a Linux admin would do.
The other thing I started working on was getting some certifications.
So I started with the Red Hat certifications. At the time, their entry-level cert was
called Red Hat Certified Technician, and they've since changed it to Red Hat Certified
System Administrator. I started with that, and then a few years later,
I got the Red Hat Certified Engineer cert.
Eventually, I got laid off just like I knew I would, and I started kind of slowly
looking for tech jobs.
So one of the jobs I applied for, they called me back pretty quick, like the next day.
It turns out the company that called me was a small web development shop, and
they had, well, they were staffed to have three Linux admins.
And earlier in the year, two of them had left, not at the same time, and for different
reasons.
But one had left and they were kind of dragging their feet a little bit on replacing them.
And another one left and they started getting serious about replacing them.
And then there was a third guy who was kind of a junior admin.
He was kind of a mix of an admin and a developer.
So he was sort of running the shop by himself for a little while, and eventually he
decided to leave too.
And so at this point, they were desperate to get some new people in because they had,
I guess they were staffed with three people, and they were just a few weeks away from
having zero.
So they were able to hire a Linux admin from, like, a temp IT agency, but he couldn't work
there forever.
And they had found another kind of senior admin, but he wasn't going to be able to start
right away because he had a job and he had some big projects and stuff he wanted to finish.
But they needed, they needed someone to start immediately.
And since I was laid off, and even though I could tell they weren't really sure if I could
do the job, since I could start immediately, it really got their attention.
Like I said, it was a small web development shop: they had about 10 developers, a few project
managers and designers, and a support desk so people could call in with support stuff.
Most of their applications were PHP applications that ran on Linux, and they were
kind of all over the place with Linux versions.
Whoever was in charge at the time would deploy whatever Linux version happened
to be their favorite at the time.
So there was SUSE, there was Ubuntu, there was Debian, there was Red Hat; it was a big,
big mix of things.
And there was also some Java and a little bit of Windows.
Like I said, they were desperate and I could start right away.
So they started interviewing.
So I got to interview with the guy who was leaving, the last of the three that was still
there.
And it was basically his last week.
So I interviewed with him and some of the senior developers who knew a bit
about Linux, and they were really careful with me, given the role.
So I came in and did an interview with, like, the person who was going
to be my boss's boss, and the developers. And then the guy who was going to be my boss
but hadn't started yet, he wanted to meet me.
So we kind of met for a quick lunch interview, because he wanted to make sure, you know,
we could at least get along and that the things I said made sense
to him.
Then they wanted to do something a little more technical.
So they had someone set up a laptop with a Linux VM on it.
They sort of wrote out a list of tasks for me to do.
So I mean, it was anywhere from simple stuff, like adding users and making sure they could
sudo, on up.
For some reason, they wanted me to compile a specific version of Apache and PHP
from source, and they just had all these kind of crazy things that they wanted me
to do. The list was long; they gave me a big long list and, like, two hours to do
it.
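For anyone curious, tasks like those boil down to a handful of commands. A minimal sketch, with the username, the Apache version, and the paths invented for illustration, not taken from the episode:

    # Add a user and make sure they can sudo (username is hypothetical).
    useradd -m jdoe
    passwd jdoe
    usermod -aG wheel jdoe        # wheel on Red Hat-family; the 'sudo' group on Debian-family
    su - jdoe -c 'sudo -l'        # verify sudo access

    # Compile a specific Apache httpd version from source (version picked arbitrarily).
    curl -LO https://archive.apache.org/dist/httpd/httpd-2.2.34.tar.gz
    tar xzf httpd-2.2.34.tar.gz && cd httpd-2.2.34
    ./configure --prefix=/usr/local/apache2
    make && make install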
I didn't, I didn't finish.
The list was too long, I didn't finish. But the other thing they wanted to do, after
that kind of technical interview, was have me meet with all the managers.
So again, it was my boss's boss and his boss and his boss's boss, all just kind of
sitting in.
They asked me a few technical questions in that interview, but I think it
was mostly just trying to figure out: am I for real? You know, is it really possible
that someone who's never worked in IT before can do the job?
So obviously, since I'm telling the story, they did hire me. My first week there,
it was just me and the guy from the temp agency; the third guy who had helped
with the interviews and stuff was gone.
His last day was like the Friday before my first day, but there was, you know,
some minimal turnover, maybe a 20-page document.
And then the two or three weeks that the temp admin had before that was
really the extent of the training and turnover.
So a little bit about kind of the infrastructure there.
All of their servers were in a data center that wasn't too far from the office, so we could
go visit the data center when we needed to. It was
like three racks' worth of equipment, mostly virtualized.
There were a few physical machines for heavy loads; databases, for example,
would be on physical servers.
There were a lot of ESX hosts (we virtualized on VMware), some storage,
and stuff like that.
The applications were mostly virtual machines.
For the PHP applications, they would all kind of share a directory to get their PHP
code from.
And when I say get their code from, I don't mean they would copy it whenever new
code was available.
I mean, they would literally mount this NFS share at, like, /var/www or wherever.
That way, every application server had the exact same code all the time.
And then there were a few other things they would keep on this NFS server:
config files for some of the load balancers would be on there, application logs would be
on there.
It was just kind of a generic place to put things, anything that needed to be available to
more than one server was probably on this NFS server.
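As a rough illustration of that setup, a shared webroot like this usually comes down to one export on the NFS server and one fstab line on every application server. The hostname and paths below are invented, not from the episode:

    # /etc/exports on the NFS server (names hypothetical)
    /export/webroot  appserver*(rw,sync,no_subtree_check)

    # /etc/fstab on each PHP application server
    nfs01:/export/webroot  /var/www  nfs  defaults,hard  0 0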
You know, the NFS server was a virtual machine also.
You could tell it had kind of grown over time. You know, there are a few strategies
for adding disk space to a virtual machine while it's running, and kind of the easiest one
is to just add another disk.
So this NFS server, which was a virtual machine, had like five disks attached to it, because
every time they would add a new kind of project or something for it to do and they
didn't have enough space for it, they would just add another virtual disk to it.
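Grown-by-adding-disks usually means LVM underneath. A sketch of what each expansion probably looked like, with the device and volume names assumed rather than known:

    # The new virtual disk appears as, say, /dev/sdf (hypothetical; disk number five)
    pvcreate /dev/sdf                              # turn it into an LVM physical volume
    vgextend vg_data /dev/sdf                      # add it to the volume group
    lvextend -l +100%FREE /dev/vg_data/lv_export   # grow the logical volume
    resize2fs /dev/vg_data/lv_export               # grow the ext filesystem online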
The storage for the VMware cluster was kind of an oldish SAN. It was branded Sun, but this
was after Oracle had bought Sun.
So it was all Sun-branded stuff, supported by Oracle, and to kind of maximize the available
space, most of the SAN was RAID 5.
So they would, you know, take a group of disks, put them together in RAID 5, and then export
those RAID 5 disk bundles to VMware, and that's where VMware would
store the virtual disks for the machines, including all of the application servers and this
NFS server.
So even before I started, going back at least a year,
there was a history of really poor performance with the PHP applications, and no one really
understood why. I mean, all of the troubleshooting done by the previous admins had
just kind of led to dead ends.
But one thing we would notice when the applications were running slow was that the load average
on the NFS server would climb. It wouldn't get high, like it wouldn't get into the hundreds
or anything; it would just go from where it would normally run, at one or one and
a half, up to like four. And we could look at the load
average on the NFS server and, based on that, tell how well or poorly the PHP applications
were running.
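That check is nothing fancier than watching the standard load-average numbers. A sketch, with the hostname invented and the sample outputs only echoing the figures mentioned above:

    # Poll the NFS server's load average every 30 seconds
    watch -n 30 'ssh nfs01 uptime'
    # healthy:   load average: 1.02, 1.15, 1.10
    # slowdown:  load average: 4.05, 3.80, 3.42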
One of our sort of first indicators that things were going poorly was the
office staff who processed the payments that people would make.
You know, a lot of our applications would take payments, and the sort of
accounting personnel had a kind of homegrown tool, also a PHP application,
that ran on the same infrastructure. They were usually the first to notice that things
were going south, and they would try to say, can you check the load average on the NFS server,
but they would usually come in screaming about the load balancer instead.
We did have a load balancer, but that wasn't actually the problem. It was clear
to us that everyone was sort of frustrated with how things were performing, and frustrated
with the fact that despite all of us looking at it, no one could really figure it out. You
know, we tried a lot of different things, PHP settings and NFS settings, but nothing
helped.
So we had this one application; it was basically the company's kind of flagship
application, its biggest, most popular.
If anyone asked, you know, what does this company do, they would list off things,
and this would always be on the list of things that they had made.
The application took payments, and for whatever it was taking payments for,
there was an annual deadline, and the deadline was the same for everybody.
So you could pay any time during the year, but people being people, everyone would
wait until the very last day to make the payment.
So this particular application ran okay most of the time, but, you know, once a year,
on sort of the big day, things would get slow; things would always get slow.
And it was sort of known that there were going to be some slowdowns and some performance
problems, and we would all be, you know, kind of geared up and ready for it.
This particular year, I'm about 10 days into the job when, you
know, the big day arrives, and it's terrible.
It's awful.
Like nothing I'd ever seen. You know, I didn't have a lot of experience there yet, but in my two
weeks, I had seen some poor performance.
This was absolutely, positively unusable.
I mean, you would bring up the website.
If you could even log in, as soon as you tried to do anything, you would just hit stall after
stall after stall.
So it was pretty bad.
So we were all kind of desperate to figure out a solution just to get us through
the day.
So remember I said that the PHP application servers all had an NFS mount where
they kept their code; that way they could all have the same code.
And the developers were pretty insistent that it be that way,
so they could ensure that every application server was running exactly the same code.
Well, we talked our managers into, you know, for today only, letting us build some application
servers that are exactly the same, except that instead of mounting
the NFS share, we just copy all the files over and let these applications run
totally off local disk. You know, in reality it's a virtual disk on that
SAN we mentioned before, but it's not touching the NFS server.
That quick fix got us some pretty good results.
We went from unusable to actually pretty good.
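The emergency cutover described here amounts to replacing the mount with a one-time copy. A sketch using the same invented names from the earlier examples:

    # On each stand-in application server:
    umount /var/www                              # stop using the NFS share
    rsync -a nfs01:/export/webroot/ /var/www/    # copy the code to local disk
    # The app now serves entirely from local (virtual) disk, never touching NFS.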
Now, at the time, we didn't understand why. In our heads we're
thinking, okay, all it's doing is reading the PHP, which isn't that big, off the NFS server;
why would separating the NFS server from the application fix the problem?
We didn't understand it.
One of the things we thought might be an issue was the SAN performance, but it's the same
storage either way, and the applications reading their content directly from the SAN versus
reading it from an NFS server that's on the SAN was a night and day difference.
So a few days after the big day, once we could kind of collect
our thoughts and calm down and breathe a little bit, we started trying to figure out:
okay, what is it about this NFS server?
Whenever that server is in the mix, performance tanks.
So as we're digging in, we start trying to involve the developers
a little bit.
And one thing this application was doing that we didn't know about is logging. When
I say logging, I mean, obviously we would look at, you know, the PHP logs and Apache logs.
Those are things we were always looking at, trying to figure out why it was slow, and
they didn't lead us anywhere.
We didn't know that the application had another log that recorded every SQL query the
application ran.
So if you did a select, I mean, if you just logged in and searched for
your name, that query would be written to the logs.
And if you made a payment, that query would be written to the logs.
Every query was written to the logs, and when I say "the logs", that's wrong:
all of those queries went to the same log file.
That sounds sort of okay.
No, it's not; that's really a bad idea.
So NFS doesn't allow multiple clients to write to the same file at the same time.
If a client says, hey, I need to write to this log file, the NFS server will lock the file,
let the client write to it, and then unlock the file.
So because we had multiple application servers trying to write to the exact same file,
the NFS server was slowing down the applications so it could queue up the writes.
That was the reason we saw such big performance gains when we moved off the NFS server:
the application didn't have to wait anymore before it could write to the query log.
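A common way to avoid that kind of contention, sketched here as a general pattern rather than what this shop actually did, is one log file per host on local disk, merged centrally later. All paths and names are invented:

    # Each server writes its own query log locally...
    mkdir -p /var/log/app
    QUERY_LOG="/var/log/app/query-$(hostname -s).log"
    # ...and a nightly cron job ships the logs somewhere central:
    rsync -a /var/log/app/ nfs01:/export/logs/$(hostname -s)/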
Now, writing every query to a log is a bad idea for a lot of reasons,
and eventually we were able to talk the developers out of logging this information. But it was
a clear win for us, because we were finally able to figure out what it is about this
NFS server that makes these applications so slow.
And this particular application wasn't the only one writing to a common
log file, but like I said, it was the biggest one, and it was the one that caused the most
problems and got the most attention.
So after that, we were still kind of interested in why the NFS performance was so bad, and
why it had gotten worse. Because the application itself, you know, with it writing to this
kind of common log file, had been like that for years, and there was some growth in
the application, but not enough growth to explain the performance drop year over year.
So even though we fixed the problem, we knew there had to be something else
kind of underlying, because the problem had been getting worse and worse and worse.
So we had some pretty decent monitoring. Remember I said the load
average on the NFS server would go up when performance was bad? Our monitoring kept
graphs of load average, and you could see big spikes on busy days and drop-offs on
weekends and stuff like that.
And we could zoom the graphs out to, like, a year, and then we could see big days and
small days. It was interesting: when you zoom out to a year, you see, like, a month at
a time, and the load would be pretty steady, month to month to month, and then you would
see kind of a drop, or a stair-step rise, month to month to month. A lot of times we would
look at those and try to investigate: okay, what happened on this day that caused
this stair step? And one thing we really noticed was, when we finally got rid of that
crappy old Sun/Oracle SAN and upgraded to something considerably better, you
could definitely see it in the load average on that NFS server. When I said it used
to average one and maybe go up to four, now it was down at, like, 0.2 to
0.3 and might go up to 0.8. So that was a huge difference for the application, just changing
the SAN.
But there was another place in the annual graph where we could see a drop in load average,
a pretty significant drop, maybe about 30%, and we couldn't figure that one out.
A lot of times we could go back and look at these stair steps and say,
oh, that was the day we changed this application, or that was the day we did whatever. But
this one we couldn't explain. There was one particular day, and it happened to be
a few months after this big day where everything went south, where we saw,
like, a 30% drop in load average, steady week over week and month over month, and we
couldn't figure out why.
So I ended up working at this shop for about five years, and I was always still kind of
the new guy. You know, just about anywhere you work, if you're working in IT, if you're just a
sysadmin, you're always kind of an afterthought; no one really thinks about IT unless
something's broken. So I was on a team that no one ever thought about, and I was
the junior guy on the team that no one ever thought about. I say all that to
tell you: I had to change offices a lot. It was kind of a cube-farm
kind of place, where there were cubes and desks and offices, and it was always nice
to be able to move up, you know, from a cube into an office. But someone else would show
up, you know, and they'd want my office, so I'd have to move out. And because I was
sort of the last person ever really considered when thinking about who
was going to work in what office, I had to move offices a lot.
One time I was getting ready to move offices again, and I was cleaning out a file cabinet.
The folder I was looking through was just all kinds of random receipts
and hardware things and stuff like that. I picked up a receipt, and I was looking at it,
trying to figure out what it was, and it was a receipt for returning a disk to
Sun, or to Oracle, and I'm trying to figure out, why did we do that?
And then I remembered: the guy who was my boss when I first started, the guy
who started on the same day as me and didn't really have any good turnover either, who was
supposed to be my senior, he had done an RMA. One day he was in the
data center, and he saw on this Sun storage system that one of the disks had a yellow light
instead of a green light. So he reported it to Sun, they sent him a replacement disk,
and he sent the old bad disk back to Sun. When I was staring at the piece of paperwork
that documented that change, I thought to myself, I wonder if this has anything to do
with that unexpected load average drop, the unexpected performance boost on that NFS
server. And I looked at the date, and it was within, like, a few days of that drop. So finally
I was able to piece it together: that NFS server had some of its disks on the portion of
the storage system that was built using RAID 5, and the disk he replaced was part
of that array.
So the reason that the NFS performance had gotten worse year over year was that at some
point during the year, a drive that was part of a RAID 5 array had failed, and no one
noticed.
If you don't know anything about RAID and RAID 5, what you do need
to know is that RAID 5 is fine, but if you lose a single disk out of a RAID 5 array,
all of your data will still be there, but the performance will be terrible.
It no longer has an extra disk to write the parity information to. So because this
RAID 5 array was running with a bad disk, the performance was terrible, and when he swapped
the disk out, that's when (we didn't notice it at the time)
we could see the performance increase on the NFS server.
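The monitoring lesson generalizes: a degraded array should page somebody. Their storage was a Sun/Oracle SAN, so checks would have gone through the vendor's tools, but on Linux software RAID a minimal check looks like this, with the device name hypothetical:

    # A degraded RAID 5 set shows a missing member, e.g. [UU_]
    cat /proc/mdstat
    # Or query a specific array directly:
    mdadm --detail /dev/md0 | grep -E 'State|Failed Devices'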
So, a long rambling story. I don't know if you can learn any lessons from that, except
maybe: if you want to change careers, one key is to plan ahead if you can,
but the real key is that you have to find someone who's desperate, desperate enough
to hire someone with no experience. Always be careful when you're logging or writing
to a network share. And never, ever, ever run RAID 5 in production, period. I'll see you
guys next time.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by a HPR listener like yourself.
If you ever thought of recording podcasts, then click on our contribute link to find
out how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and
rsync.net.
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0
International license.