Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions
--- a/hpr_transcripts/hpr3758.txt
+++ b/hpr_transcripts/hpr3758.txt
@@ -0,0 +1,328 @@
+Episode: 3758
+Title: HPR3758: First sysadmin job - war story
+Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3758/hpr3758.mp3
+Transcribed: 2025-10-25 05:02:59
+
+---
+
+This is Hacker Public Radio Episode 3758 for Wednesday, the 28th of December 2022.
+Today's show is entitled, First Sysadmin Job - War Story.
+It is hosted by Norrist, and is about 28 minutes long.
+It carries a clean flag.
+The summary is, how I got my first job as a sysadmin, and a story about NFS.
+Okay, so I thought I'd record a quick holiday episode for HPR, and I'll do kind of a combo
+story about how I got my first job in tech. I haven't always worked in tech. I'm currently
+a Linux admin, and then I'll combine that with a bit of a war story about my first week.
+I have for a long time, since 2000, been a Linux user, and I didn't have a Linux job
+that far back.
+But I was working for a place that had a contract with the government, and the contract was
+going to end.
+We didn't know, like, the specific date it was going to end, but we knew the job itself that
+we were there doing was only going to take about 10 years.
+So we all knew, when you took a job there, you knew that at some point you were going
+to get laid off, and if you made it until the end, everyone was going to get laid off
+at the end.
+So even though it kind of sucks having a job where you can't work there forever, it gives
+you sort of a unique opportunity to sort of plan for changing careers.
+So since you can look ahead and know, on or about this year, I'll have to do something
+different, it gives you time to prep for it.
+So since I had been sort of a Linux-on-the-desktop hobbyist for a long time, I thought
+well, now's my chance to do what I can, and then maybe when I do get laid off, I can
+find a job as a Linux admin or something.
+So I started just kind of adding to the things that I would normally do around the house with
+Linux.
+So instead of, you know, printers and playing music and configuring X11, I would, you know,
+do things like trying to set up web servers or file servers, or maybe even an LDAP server
+or stuff like that and, you know, doing virtualization and whatever I could that I thought maybe
+things that a Linux admin might do.
+The other thing I started working on was getting some certifications.
+So I started with the Red Hat certifications. I went and got the Red Hat, at the time
+it was Red Hat Certified System Administrator, or no, at the time it was Red Hat Certified
+Technician and they've since changed it to Red Hat Certified System Administrator.
+But I started with that, that's kind of their entry level, and then a few years later,
+I got the Red Hat Certified Engineer cert.
+Eventually, I got laid off just like I knew I would, and I started kind of slowly
+looking for tech jobs.
+So one of the jobs I applied for, they called me back pretty quick like the next day,
+and it turns out the company that called me, they were a small web development shop, and
+they had some, they had three Linux admins, well they were staffed to have three Linux admins.
+And earlier in the year, two of them had left, not at the same time, and for different
+reasons.
+But one had left and they were kind of dragging their feet a little bit on replacing them.
+And another one left and they started getting serious about replacing them.
+And then there was a third guy who was kind of a junior admin.
+He was kind of a mix of an admin and a developer.
+So he was sort of running the show by himself for a little while, and eventually he had
+left, he had decided to leave.
+And so at this point, they were desperate to get some new people in because they had,
+I guess they were staffed with three people, and they were just a few weeks away from
+having zero.
+So they were able to hire from like a temp IT agency, a Linux admin, but he couldn't work
+there forever.
+And they had found another kind of senior admin, but he wasn't going to be able to start
+right away because he had a job and he had some big projects and stuff he wanted to finish.
+But they needed, they needed someone to start immediately.
+And since I was laid off, and even though I could tell they weren't really sure if I could
+do the job, since I could start immediately, it really got their attention.
+So the, like I said, it was a small web development shop. They had about 10 developers, a few project
+managers and designers, and support desk so people can call in with support stuff.
+Most of their applications were PHP applications that ran on Linux, and they were
+kind of, they were all over the place with Linux, there were all sorts of Linux versions.
+It was kind of whoever was in charge at the time would deploy whatever Linux version happened
+to be their favorite at the time.
+So there was SUSE, there was Ubuntu, there was Debian, there was Red Hat, there was
+Leras, it was, it was a big, big mix of things.
+And there was also some Java and a little bit of Windows.
+Like I said, they were desperate and I could start right away.
+So they started interviewing.
+So I got to interview with the guy who was leaving of the three, the last one that was
+there.
+And it was basically his last week.
+So I interviewed with him and some of the kind of senior developers who knew a bit
+about Linux, and they did, they were really careful with me, or thorough.
+So I did, you know, I came in and I did an interview with like the person who was going
+to be my boss's boss and the developers, and then the guy who was going to be my boss
+but hadn't started yet, he wanted to meet me.
+So we kind of met for a quick lunch interview, because he wanted to make sure, you know,
+I would do, or we could at least get along and that, you know, things I said made sense
+to him.
+Then they wanted to do something a little more technical.
+So they had someone set up a laptop with a Linux VM on it.
+They sort of wrote out a list of tasks for me to do.
+So I mean, it was anywhere from simple stuff like adding users and making sure they could
+sudo.
+They wanted me, for some reason, they wanted me to compile from source, a specific version
+of Apache and PHP, and they just had all these kind of crazy things that they wanted me
+to do, and the list was long, they gave me a big long list and like two hours to do
+it.
+I didn't, I didn't finish.
+The list was too long, I didn't finish, but the other thing they wanted to do was after
+that kind of technical interview, they wanted me to meet with all the managers.
+So again, it was a boss's boss and his boss and his boss's boss all just kind of sat
+down.
+And it wasn't, they asked me a few technical questions in that interview, but I think it
+was mostly just trying to figure out, am I, am I for real, you know, is it really possible
+that someone who's never worked in IT before can, can, can do the job?
+So obviously, since I'm telling the story, they, they did hire me. My first week there,
+it was just me and the guy from the temp agency. The third guy who had helped
+with the interviews and stuff, he, he was gone.
+So his last day was like the Friday before my, my first day, but there was, you know,
+there was some minimum turnover, some, you know, maybe a 20-pager document.
+And then the, you know, the two or three weeks that the temp admin had before that was
+really the extent of the, the training and turnover.
+So a little bit about kind of the infrastructure there.
+All of their servers were in a data center that wasn't too far from the office so we could
+go visit the data center when we needed it, when we needed to, and it was in, it was
+like three racks worth of equipment, it was mostly virtualized.
+There were a few physical servers, physical machines for heavy loads, like databases,
+it would be physical servers.
+A lot of ESX hosts that we, you know, we virtualized on VMware and a lot of, and some storage
+and stuff like that.
+The applications were mostly virtual machines.
+For the PHP applications, they would all kind of share a directory to get their PHP
+code from.
+And when I say get their code from, I don't mean like they would copy it whenever new
+code was available.
+I mean, they would just literally mount this NFS share in like /var/www or whatever.
+That way every, every application server had the exact same code all the time.
+And then there's a few other things they would have on this NFS server, including, you
+know, config files for some of the load balancers will be on there, application logs will be
+on there.
+It was just kind of a generic place to put things, anything that needed to be available to
+more than one server was probably on this NFS server.
+You know, the NFS server was a virtual machine also.
+You could tell, it had kind of grown over time. You know, there's a, there's a few strategies
+for adding disk space to a virtual machine when it's running, kind of the easiest one
+is to just add another disk.
+So this NFS server that was a virtual machine, had like five disks attached to it because
+it would, every time they would add a new kind of project or something for it to do, they
+didn't have enough space for it, they would just add another virtual disk to it.
+For the VMware cluster there was kind of an oldish SAN, it was, it was branded Sun, but this
+was after Oracle had bought Sun.
+So it was all Sun-branded stuff, it was supported by Oracle and to kind of maximize the available
+space, most of the SAN was RAID 5.
+So they would, you know, take a group of disks, put it together in RAID 5 and then use
+those RAID 5 disk bundles to export that to VMware and then that's where VMware would
+store the virtual disks for the machines, including all of the application servers and this
+NFS server.
+So even before I started there was a history, within the last year, before I started,
+there was a history of really poor performance with the PHP applications and no one really
+understood why, I mean, all of the troubleshooting the previous admins did
+just kind of led to dead ends.
+But one thing we would notice when the applications were running slow, was that the load average
+on the NFS server would climb and it wouldn't get high, like it wouldn't get into the hundreds
+or anything, but it would just go from like where it would normally run at one or one and
+a half, it would go up to like four and we could tell, like we could look at the load
+average on the NFS server and based on that tell how well or poorly the PHP applications
+were running.
+One of our sort of first indicators that things were going poorly was that we had one of
+the office staff who processed payments that people would make.
+So you know, a lot of our applications would take, take payments and then the sort of
+accounting personnel, we had a kind of a homegrown tool that was also a PHP application
+ran on the same infrastructure, but they were usually the first to notice that things
+were going south and they would try to say can you check the load average on the NFS server
+but they would usually come and scream at us about the load balancer instead.
+We did have a load balancer, but that wasn't actually the problem, but it was clear
+to us, you know, everyone was sort of frustrated with how things were performing and frustrated
+with the fact that despite all of us looking at it, no one could really figure out, you
+know, we tried a lot of different things, PHP settings and NFS settings, but nothing
+helped.
+So we had this one of our applications, it was basically the company's kind of flagship
+application, it's biggest, most popular.
+If anyone asked, you know, what does this company do, they would
+list off things and this would always be in the list of things that they had made.
+The application took payments, and for the system that it was taking payments for,
+there was an annual deadline, and the deadline was the same for everybody.
+So you could pay it any time during the year, but people being people, everyone would
+wait until the very last day to make the payment.
+So this particular application ran, okay, most of the time, but you know, once a year
+on sort of the big day, things would get slow, things would always get slow.
+And it was sort of known that there's going to be some slowdown and some performance problems
+and we would all be, you know, kind of geared up and ready for it.
+This particular year, you know, approximately, I'm about 10 days into the job when, you
+know, big day arrives and it's terrible.
+It's awful.
+Like I've never seen, you know, I don't have a ton of experience there, but in my two
+weeks, I saw some poor performance.
+This was absolutely positively unusable.
+I mean, you would bring up the website.
+If you could log in as soon as you try to do anything, you would just stall after stall
+after stall.
+So it was pretty bad.
+So we were all kind of desperate to figure out a solution just to get us through
+the day.
+So remember I said that the PHP application servers, how they all had an NFS mount where
+they kept their code, that way they could all have the same code.
+And the developers, they were pretty insistent that that's how it, the developers wanted
+it to be that way.
+So they could ensure that every application was running exactly the same.
+Well, we talked our managers into, you know, for today only, let us build some application
+servers that are exactly the same except that instead of, you know, instead of mounting
+the NFS server, we just copy all the files over and let these applications run, you
+know, just totally off local disk and, you know, in reality, it's a virtual disk on that
+SAN we mentioned before, but it's not touching the NFS server.
+So that, that quick fix got us some pretty good results.
+So we went from unusable to actually pretty good.
+Now, at the time, we didn't understand why, we didn't know like, in our heads, we're
+thinking, okay, all it's doing is reading the PHP, which isn't that big, off the NFS server
+and separating the NFS server from the application fixed the problem.
+We didn't understand it.
+One of the things we thought might be an issue was the SAN performance, but the same thing,
+you know, the applications reading their content directly from the SAN versus the applications
+reading their content from an NFS server that's on the SAN was a night and day difference.
+So after we all had a minute, a few days after the big day and we could kind of collect
+our thoughts and calm down and breathe a little bit, we started trying to figure out,
+okay, what is it about this NFS server?
+Well, any time that server is in the mix, performance tanks.
+So as we're digging in and as we're digging in, we start trying to involve the developers
+a little bit.
+And one thing that this application is doing that we didn't know about is logging and when
+I say logging, I mean, obviously we would look at, you know, the PHP logs and Apache logs.
+And those are things we were always looking at trying to figure out why is it slow and
+they didn't lead us anywhere.
+We didn't know that the application had another log that would log every SQL query that
+the application ran.
+So if you did a select, I mean, if you just logged in and searched for yourself, searched for
+your name in the application, that query would be written to the logs.
+And if you made a payment, that query would be written to the logs.
+Every query was written to the logs and when I say the logs, that's wrong.
+It was all of those queries went to the same log file.
+That's sort of okay.
+No, that's not, that's really a bad idea.
+So NFS doesn't allow multiple clients to write to the same file at the same time.
+So if a client says, hey, I need to write to this log file, the NFS server will lock the file,
+let the client log to it and then unlock the file.
+So because we had multiple application servers trying to write to the exact same file,
+the NFS server was slowing down the applications so it could queue up the writes.
+So that was the reason we saw such big performance gains when we moved off the NFS server is
+that the application didn't have to wait anymore before it can write to the query log.
+Now, eventually, when we heard about this, that's a bad idea for a lot of reasons,
+writing a query to a log.
+So eventually we were able to talk the developers out of logging this information, but it was
+a clear win for us because we were finally able to figure out like, what is it about this
+NFS server that makes these applications so bad?
+And this particular application wasn't the only one that was doing that writing to a common
+log file, but like I said, it was the biggest one and it was the one that caused the most
+problems and it was the one that got the most attention.
+So after that, we were still kind of interested in why the NFS performance was so bad and
+why it had gotten worse because the application itself, you know, where it's writing to this
+kind of common log file, it had been like that for years and there was some growth in
+the application, but not enough growth to explain the performance drop year over year.
+So we knew, even though we fixed the problem, but we knew there had to be something else
+kind of underlying because the problem was getting worse and worse and worse.
+So we had some pretty decent monitoring and we were able to, remember I said, the load
+average on the NFS server would go up when performance was bad and you could see it,
+you know, in our monitoring we looked at graphs of load average and we could see, you
+know, big spikes whenever on busy days and drop off on weekends and stuff like that.
+And when we could zoom all the way out, we could zoom the graphs out till like a year and
+we could see, you know, then we could see big days and small days, but it was interesting
+to see sometimes, you know, you would go, so when you zoom out to like a year, you could
+see like a month at a time and the loads would be pretty steady, you know, from month to
+month to month and then you would see kind of a drop and then month to month to month
+and you may see a stair step rise, month to month to month, a lot of times we would look
+at those and we would try to investigate, okay, what happened on this day that caused
+this sort of stair step and one thing we really noticed was when we finally got rid of that
+crappy old Sun/Oracle SAN, upgraded to something considerably better, then you
+could definitely see the load average on that NFS server, you know, when I said it used
+to average one and maybe go up to four, you know, now it was down in the, like, 0.2 to
+0.3 and might go up to 0.8, so that was a huge difference in the application just changing
+the SAN, but there was another place when we looked at the annual graph where we could
+see a drop in load average, a pretty significant maybe about 30% drop and we couldn't figure
+out one, a lot of times we could go back and we could see these stair steps and go back
+and oh, that was the day we changed this application or that was the day we got the new SAN, but
+we couldn't figure out one, there was one day, particular day, and it happened to be
+about a few months after this big day of where everything went south, a few months after
+that we saw like a 30% pretty steady month over month, week over week, 30% drop in load
+average and we couldn't figure out one, so I ended up working here, working at this
+shop for about five years, you know, sort of the, I was always still kind of the new guy,
+you know, and just about anywhere you work, if you're working in IT, if you're just a
+sysadmin, you're always kind of the afterthought, like no one really thinks about IT unless
+something's broken, and so I was on a team that no one ever thought about and I was like,
+the junior guy on the team that no one ever thought about, so I had to, I said that to
+tell you, I had to move, I had to change offices a lot, it was kind of like a cube farm
+kind of place where there was, there was cubes and desks and offices and it was always nice
+to be able to move out, you know, from a cube into an office, but someone else would show
+up, you know, and they'd want my office, so I'd have to move out, and you know, because
+I was sort of the last person that was ever really considered whenever, thinking about who
+was going to, who was going to work in what office, I had to move offices a lot.
+One time I was getting ready to move offices again and I was cleaning out a file cabinet
+and it was just the folder I was looking through, it was just all kind of random receipts
+and hardware things and stuff like that, I picked up a receipt and I was looking at it
+and I was trying to figure out what it was and it was a receipt for returning a disk to
+Sun or to Oracle and I'm trying to figure out what it was like, why did we do that?
+And then I remembered that the guy who was my boss whenever I first started, the guy
+who started on the same day as me and didn't really have any good turnover and he was
+supposed to be my senior, he had done an RMA, or like, you know, one day he was in the
+data center and he saw on this Sun storage system that one of the disks had a yellow light
+instead of a green light, so he reported it to Sun, they sent him a replacement disk
+and he sent the old bad disk back to Sun, and I was staring at the piece of paperwork
+that documented that change and I thought to myself, I wonder if this has anything to do
+with that unexpected load average drop or the unexpected performance boost on that NFS
+server and I looked at the date and it was within like a few days of that drop so finally
+I was able to piece together that NFS server, some of its disks were on the portion of
+the storage system that was built using RAID 5 and that disk that he replaced was part
+of that array.
+So the reason that the NFS performance had gotten worse year over year was because at some
+point during the year, no one noticed but a drive failed that was part of a RAID 5
+array.
+If you know anything about RAID and RAID 5, if you don't know anything, what you do need
+to know is that RAID 5 is fine but if you lose a single disk out of a RAID 5 array,
+all of your data will still be there but the performance will be terrible.
+It no longer has an extra disk to write the parity information to so because of this
+RAID 5 array running with a bad disk, the performance was terrible and then when he swapped
+the disk out, that's when we could see, we didn't notice it at the time but that's when
+we could see the performance increase in the NFS server.
+So a long rambling story, I don't know if you can learn any lessons from that, except
+maybe if you want to change careers, one key to doing that is to plan ahead if you can
+but sort of the real key is, you have to find someone who's desperate, desperate enough
+to hire someone with no experience. Always be careful when you're logging or writing
+to a network share and never ever ever run RAID 5 in production, period. I'll see you
+guys next time.
+You have been listening to Hacker Public Radio at HackerPublicRadio.org.
+Today's show was contributed by a HPR listener like yourself.
+If you ever thought of recording podcasts, then click on our contribute link to find
+out how easy it really is.
+Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and
+rsync.net.
+Unless otherwise stated, today's show is released under Creative Commons, Attribution 4.0
+International License.