Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr4337.txt (new file, 371 lines)
Episode: 4337
Title: HPR4337: Open Web UI
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4337/hpr4337.mp3
Transcribed: 2025-10-25 23:17:17

---
This is Hacker Public Radio Episode 4337 for Tuesday, 18 March 2025.

Today's show is entitled "Open Web UI". It is the 120th show of operator and is about 31 minutes long. It carries an explicit flag. The summary is: I go over how I have my local LLM server set up, because I rm -rf'ed it.
Hello, hello, welcome to my LLM server, Hacker Public Radio, with your host operator. This should be pretty short. I am going to record this episode about using Open Web UI, setting up text-to-speech, importing prompts, adding search, and adding RAG, possibly.
And essentially what happened is I ran a script that is supposed to shrink my Docker containers, and then it is supposed to shrink the images in Windows Subsystem for Linux, and something happened and it wrecked all of my Docker containers inside of my WSL images. Like, all of them, which wasn't a lot, and I wasn't really using all of them. But the one that was attached to my main Open Web UI server was completely broken, and the container that was attached to the audio processing, the Kokoro one or whatever it is called, was not working.
So lucky for you, you get an HPR episode while I am on the tail end of doing the importing for the fabric patterns. I have a short video on my YouTube that shows some of the features of what I've got today, but I've added a few more, and added more and more models and things like that.
So it's, you know, always a work in progress with these LLMs. But generally speaking, I've got it set up and nice and happy. So I'll go over my document here; I am pretty sure everything is set up. It took me about an hour to get it all happy again. I feel like I'm at a good spot where I've got all the stuff I need: the search works, and the models, luckily I didn't lose the models.
Ollama is still in there and working, but anything I'm going to do with Docker lives inside one of those Windows Subsystem for Linux images, which was completely broken. Now I have, like, a 120-something-gig image that's really only 80 gigs, so I have to eventually sort that out.
Let's see how big it is. What happens with virtual disk images in general is that they grow. If you set them to a static size, you're fine; that's what's generally preferred in a corporate or enterprise environment. But what people usually do at home is set an initial size and let them grow over time.
I'm clicking buttons, and I'm clicking the wrong buttons; I'm playing music and launching Minecraft.
So the image I have right now is 126 gigabytes, and that is not how big it actually is, so we're going to shrink it. We're going to be very careful about how we shrink it, and use the proper packages, or the proper command lines, to do that without hopefully breaking anything.
So first things first. What you have to do, and I'll put this in the show notes, possibly, I mean, you kind of need screenshots to go through it. But basically, first things first, let me move the auto-updates section here to the bottom, after setup. So we're going to put this after search, or before search; we're going to do auto-updates.
And there's a million ways to install stuff nowadays. You've got Docker and Windows Subsystem for Linux and virtualization and all that. For me, I've been able to get Docker to run on Windows Subsystem for Linux. The reason I don't run Docker natively is because I don't like having Docker installed as a service; I'd rather just use Windows Subsystem for Linux and have it manage all that stuff for me. The problem there is when you start getting into GPU stuff: you have to install a CUDA toolkit for Docker so that you can access the hardware-level stuff. And that's not necessarily great from a security standpoint. So that's one thing to keep in mind.
Anyways, I feel like the first thing to do is go over the installation, and it looks like I've already skipped over all that, assuming that I actually know how to install it. What I usually do is the one-liner. On my scripts site, I have an installer that will install Open Web UI; it's called something like OpenWebUIFAST. Let me see if I can find it. Open... let's see... yeah, OpenWebUIFAST. So basically you run this in any kind of Linux container and it will automatically set everything up for you.
It sets up the NVIDIA container toolkit and adds that to the repo. I install curl and net-tools just for troubleshooting. And then there's a memory thing, where Docker has, like, a maximum amount of memory or whatever, and the script sets up the memory to hold more than the normal amount. I think it's at least six digits; excuse my memory. It's vm.max_map_count equals and then a number. Anyways, basically when you have larger things doing things inside of Docker, you have to set that max_map_count, and it's almost like the CPU limit thing for processes.
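A minimal sketch of that kernel tweak, assuming the commonly suggested value, since I don't give the exact number on-mic:

    # Raise the kernel's limit on memory-map areas per process; heavy
    # Docker workloads often want more than the default 65530.
    sudo sysctl -w vm.max_map_count=262144
    # Persist it across reboots:
    echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-max-map-count.conf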
So a lot of people recommend using the Docker installer script either way. You know, I like using apt, and if it's a few versions behind and I'm not having any problems with my Docker stuff, then I'll use the apt version, but they tell you to use the, you know, Docker install from get.docker.com. This uses the sh version, which is what most people tell you to use.
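For reference, that convenience install looks roughly like this:

    # Docker's convenience installer, the "sh version" from get.docker.com
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh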
Also, you can set up the default DNS inside of Docker to resolve to the host machine, which can cause problems when you're doing networking stuff. So that's another thing.
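A hedged sketch of pinning container DNS instead, via the daemon config; the resolver addresses here are just examples:

    # Write explicit DNS servers into /etc/docker/daemon.json
    sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
    { "dns": ["1.1.1.1", "8.8.8.8"] }
    EOF
    sudo service docker restart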
Then we restart Ollama, or restart Docker, after setting up the DNS stuff and the memory. And then we check for Ollama and run the Ollama web-based installer. Again, I don't even know if you can apt install Ollama; they probably have a Docker build, but I use the sh, the local-install version, and this is all inside of, well, this is just Linux, or inside of a Linux container, whatever.
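That web-based installer is the usual one-liner from the Ollama site:

    # Ollama's official Linux install script, the "sh" local-install version
    curl -fsSL https://ollama.com/install.sh | sh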
And I pull, like, a couple of very default, very default images. A small four-gig one... well, sorry. And let me put this in the show notes too; I don't really have the show notes table yet. Open... And what else?
Once Ollama's already installed, then we install the NVIDIA Container Toolkit, which actually has an apt-get install: nvidia-container-toolkit. They tell you to import the one from NVIDIA and use that one. I would like to keep everything in apt: if it's got an apt package and I don't have any issues, then I stick with that. That way you don't have, like, 10 different repos going on all over the place; then you have to manage the repos over time, and it's kind of annoying.
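The plain-apt route I'm describing, assuming the package is visible in a repo you've already added:

    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit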
So yes, you're supposed to do it the way they tell you to do it, but from an upgrade and patch-management standpoint, you know, you be the judge. From a security standpoint, I don't know. Yeah, it's up to you, but for stability and whatever, I would stick with apt, and you can have apt manage all your security stuff for you.
Then it says "configuring container runtime" and uses nvidia-ctk, and I'm not even sure if any of this NVIDIA container stuff is even needed with Ollama, so I don't know why it's all in there.
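That step is presumably the standard toolkit command:

    # Wire the NVIDIA runtime into Docker, then restart it
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker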
And then Open Web UI, CUDA Docker: this is the big long command line that uses the development version of Open Web UI for CUDA, and it is listening on, yeah, listening on all interfaces, 0.0.0.0.
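A sketch of that big long command line; the image tag is my assumption, since all I say on-mic is "the development version for CUDA":

    docker run -d -p 3000:8080 --gpus all \
      -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:dev-cuda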
And you're pretty much off to the races from here, which I will reference if I haven't already. Yes, I already have the notes, and I will reference my other notes, which I should probably just export as a text file or something, so that way you can at least have the text. There are some notes in here that aren't just screenshots, and a lot of it is user preference.
So once you've got Open Web UI open: I leave user registration open so people can just register to use the platform. Your model names, who cares, whatever your system setup is. I do it like a number, then an underscore, then the model name, then a colon, then the parameters it has, and then maybe there's extra stuff in there. That way, for me, it's about knowing how big the model I'm using is; if I can see, you know, 8B, 72B, 32B, that gives me an idea of which model is which, which is otherwise very confusing.
I also disable the arena models in Open Web UI, and the document is a work in progress.
Oh, the RAG stuff is not really working; I can't get it to work. I've uploaded, you know, reviews and stuff for personal things. The idea is I want to start building my own RAGs for my own data sets, and I have a short video, which I'll link to, of the RAG failure. And if I do get the RAG stuff working, which sounds like it's a nightmare, I will come back and do another episode, promise, because I haven't gotten it to work yet, and I would like to share once I do get it to work. This is on a small set of documents; we're not talking tons. I want to say each document is anywhere from, like, 8,000 to 16,000 tokens. And here are my Discord notes; that's all RAG stuff that I haven't gotten into yet.
Oh, auto-updates. So there's a docker run --rm, and then you use Watchtower to update whichever branch of Open Web UI you use, which was confusing for me, because it says --run-once is the parameter you want to pass. And then that's kind of what it... wait, what it looks like. Let me... what's all this? I don't even know if this is running. Oh, that was something else.
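This is the one-shot Watchtower update; roughly, assuming your container is named open-webui:

    docker run --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      containrrr/watchtower --run-once open-webui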
So what I'll say is that the reason I had to add the auto-updates is because it's externally facing, right, and I don't necessarily want an old version of something hanging around for too long. Cron... fs... no, crontab. Let me make sure this is actually... yeah, containrrr, that's interesting; so I guess that's the name of the user and the repo.
So we use Watchtower to upgrade Open Web UI, and it's open-webui, not the open-webui-dev-cuda whatever; it's just the name of the container, and somehow it magically knows how to update. I don't understand.
So that said, you put it in cron; I have it check every four hours or so to make sure the updates are being run.
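As a crontab sketch, that would be something like:

    # Check for Open Web UI updates every 4 hours via Watchtower
    0 */4 * * * docker run --rm -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui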
And that's been working for me for two, three months now. Let's see, where are we... this is for the search. So the next part is setting up the search.
So in Open Web UI, you have SearXNG support and, like, Google... what are the other ones? They have paid ones. Let me suggest a better one than Google, which I use: Kagi, which I don't have access to; they're still in beta for people that want to do internet search. So I would say web search, Google PSE, which is, like, Programmable Search Engine or something.
SearXNG I have not had any luck with; I've run SearXNG before. Kagi, you have to have an actual professional key, which I don't know how the fuck to get, excuse me, because I've asked them forever and they've never given me one. I think SerpAPI is the one people have talked about, but they support a bunch.
Brave, Kagi, Mojeek, SearchApi, Serpstack, Serper, Serply, SerpAPI, DuckDuckGo, Tavily, Jina, Bing, and Exa, which I've never heard of.
I've heard of, I want to say, S-E-R-P-A-P-I, SerpAPI; like, yeah, the Google search API one is theirs, and they're paid. SearchApi is supposed to be better than the free ones, I don't know.
So that might be something you want to look at, but anyways, we're going to move on from there. How far are we in? Yeah, I think about 13 minutes.
So the search is relatively easy to set up: you get your API key, then your engine ID, and then I doubled the defaults to a search results count of 6 and concurrent requests of 20. And then I copy-pasted in some info on how to set up the APIs, because Google is a nightmare when you're trying to find API keys and set them up.
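If you'd rather bake those settings in at container start, Open Web UI also takes them as environment variables; a hedged sketch, with the variable names as I recall them from the docs, so verify against your version:

    docker run -d -p 3000:8080 --gpus all \
      -e ENABLE_RAG_WEB_SEARCH=true \
      -e RAG_WEB_SEARCH_ENGINE=google_pse \
      -e GOOGLE_PSE_API_KEY=your-api-key \
      -e GOOGLE_PSE_ENGINE_ID=your-engine-id \
      -e RAG_WEB_SEARCH_RESULT_COUNT=6 \
      -e RAG_WEB_SEARCH_CONCURRENT_REQUESTS=20 \
      -v open-webui:/app/backend/data \
      --name open-webui ghcr.io/open-webui/open-webui:dev-cuda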
So you go into the Programmable Search Engine stuff, and you create an account, and you go to the control panel, and you add a search engine. You'll get an ID once you create a new app, or whatever they call it, and then you'll take that and generate a key, and you'll take that key and paste it in along with the...
Here's an ID I used for AutoGPT and SearXNG, and both epically failed; I haven't messed with AutoGPT in quite some time.
So your next step is the audio, and I could probably get her to talk here in a second. I use Kokoro, and it's decent. Anyways, I speed it up to 1.5, and you can search for "af" and there's af_bella, and see some examples of what she sounds like. But there's a bunch of models in there, and they're fast, and it's, like, I want to say one to four gigs, so you can actually run it on a 6-gig card or something.
Of course, to run decent models, you're going to need, you know, at least an 8-gig card, if not a 3090, to do some really decent stuff.
At this point, with some creative prompting, I can get very close to, you know, Kagi with all its assistants. Kagi gives you access to, you know, Anthropic, OpenAI, Google, Meta, Alibaba's, and Amazon's, and DeepSeek; I didn't even know Amazon had one, but you get access to all of those with Kagi Assistant. It's the same price as OpenAI: it's like 300 a year or whatever; they charge 30 bucks a month.
Same price for Kagi, but you also get a search engine, excuse me, that is not terrible, and it's all customizable. It's a little bit like DuckDuckGo, so they use bang stuff, and I think they were the ones to do that first. So you can do, like, a bang for AI, and then your prompt will actually get a response right there.
Like, "what can you tell me about Earth and Curdy," and it's probably going to be, like, a painter or something, and it automatically uses the web and, like, the last model you picked. So this is a photo-realistic painter; that's not me, it's the guy that, whatever. So it's the wrong Curdy, but anyways, you get the point.
So let's go to the last part here, and I'll say, I think it's pretty much all set up. Let me check my document here.
So when you're setting up Kokoro, I use the, what did I use? I used the Docker for that one also, the Kokoro-FastAPI GPU, and it tells you on the site what to do. But basically, first you have to install Docker; well, first you have to apt-get update, and I do an upgrade, and then I add the repo for the NVIDIA Container Toolkit, and then it looks like I installed the NVIDIA Container Toolkit, and then I install Docker, and then I do the docker run.
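That final docker run is roughly the quick-start from the Kokoro-FastAPI project; the image name is from my memory of the repo, so double-check it:

    docker run -d --gpus all -p 8880:8880 \
      ghcr.io/remsky/kokoro-fastapi-gpu:latest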
And sometimes I have to manually restart the Docker container when it doesn't want to start back up.
And I will also provide links to my script, and a link to the XML for the commands, for the, uh, task manager for Windows, the task event manager, what do they call that? Task, task, task... Task Scheduler, Jesus.
So I have an XML file that will run multiple commands inside of a Windows Subsystem for Linux image that we run Docker in. So it's the, anyways, it's complicated. When my computer starts up, there's a startup script that keeps these things alive.
And apparently, to run Kokoro, I have two different scripts. One is wsl -d, then the name of the image, and then exec, and then dbus-launch, space, true. And apparently that's what keeps it alive; if you don't do that, it will fall asleep and go idle every time it's not doing something.
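Spelled out, that keep-alive line looks something like this on the Windows side; the distro name here is hypothetical:

    # Holding a process open stops WSL from idling the distro out
    wsl -d Ubuntu-LLM --exec dbus-launch true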
And then I had it run right after that, with no sleep or anything in between, which I think is what's going to break; that's why I have to manually restart the Docker container sometimes. Then it does the docker run, --gpus all, -p whatever, the regular run command for it.
So that probably needs to be fixed. And then it's the same thing for Open Web UI: you have to do that exec dbus-launch true for things to work.
Can I copy this? I can just put this in the actual... I don't have to provide that. But I'll go ahead and provide the whole script, excuse me. So we're good there.
What else I'll say is, once you get it set up, the URL is http://, then your hostname, then :8880, or whatever port you specified, and then /v1. And then you put that in the OpenAI-compatible list for audio in Open Web UI. And then the TTS voice is af_bella. I don't know how to do multiple ones; if you put a bunch of commas in there, it tries to use, like, all of them at once or whatever, so that doesn't work.
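Since that endpoint speaks the OpenAI audio API, you can sanity-check it with curl before wiring it into Open Web UI; a sketch, assuming it's on localhost:8880:

    curl http://localhost:8880/v1/audio/speech \
      -H "Content-Type: application/json" \
      -d '{"model": "kokoro", "voice": "af_bella", "input": "Hello from HPR"}' \
      -o test.mp3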
And then the last step here is importing fabric prompts. Now, I have a Python script that will create those fabric prompts for you. And I also need to upload them to the website, because I have a newer one here that is sitting in a deleted folder. And that's the reason: I deleted everything in the deleted folder, and that's where I was doing some work.
And it managed to muck everything up. I was transcoding a bunch of stuff, and now it's all gone, and I have to start all over from scratch. So anyways, here are your fabric patterns, under DURP.
I want to go to my... So all in all, even on the 3090, if you can wait for the search to happen, and you can wait for the... it's easy for the model to get loaded up. I think it gets dumped out of VRAM after a while, so it does take, like, five seconds for the model to load back up.
You could avoid that if you were doing this in a more permanent state, but I'm still messing around with things and I'm training models and whatever, so I can't leave, you know, a 32B, you know, 24-gig model in my VRAM, because then I can't play games or anything. That said, the way Ollama is set up, it will automatically kick stuff out of VRAM if it's not being used, and keep the VRAM empty for you.
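That unload behavior is Ollama's keep-alive timer, and it's tunable if you want it stickier or more aggressive; for example:

    # Keep models loaded for 30 minutes of idle time instead of the default
    OLLAMA_KEEP_ALIVE=30m ollama serve
    # Or unload immediately after every request
    OLLAMA_KEEP_ALIVE=0 ollama serve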
That's pretty much it.
So these fabric patterns, right, I've talked about them before; there's probably a link on my YouTube about using them, and they're custom prompts for Open Web UI. So I'll use, like, a small 7B-parameter model, and I'll do no web search, but then I'll say web search, and I'll ask, what can you tell me about... what is the news? Kagi, yeah, video news. And I'll see, like, the first one, a day ago: "Kate Fagan named president." So I can say, you know, "what can you tell me about Kate Fagan named president," and I'll use the web search with a 7B-parameter model.
Search: creating search query, searching, searching details, searched 18 sites. So it pulls back XML and JSON from those 18 sites; it's only a certain amount of data that it can handle, honestly. And then it says it's unable to find anything significant about the person being named president in the given context.
"It would be helpful if you could provide more context or clarify your question," and then references... and, like, literally it gives you a URL that says "Kate Fagan named president of mom-and-pop music" something, something Nashville, blah blah. So, like, that didn't even work.
Now, had it worked... let's try again, I don't know. It probably failed because this is a 7B-parameter model, a very small model. Like, it's doing stuff, coming back with websites, which should have data in them. Yeah: "Kate Fagan is the named president of St. Louis," blah, blah, blah, "her name is something, something, something."
So I'm going to do /fab- and then I can say "find hidden message" or something. There's a bunch of fabric prompts that are for using AI, and you can call these prompts directly in Open Web UI with my giant JSON file. So let me go ahead and update that again, actually: preloads, scripts, miscellaneous, and fabric patterns, Open Web UI; the newest ones are probably in here.
Can I do that? No. Well, not add file; upload files. I dump this in here; let's say "updated." Now we have the updated prompts for... what do you call it... looks like we're frozen. I don't know why we're frozen. There we go, it's just doing something.
Hidden message: advocating for centralized education under ultimate control, discriminating against alternative forms of education and self-directed learning, promoting a one-size-fits-all approach to teaching and learning, encouraging conformity and suppression of decision-making options... oh my gosh, stop talking. Clinical analysis, more balanced analysis, and favorable analysis.
So, like, it gives you all three sides, you know. If the piece is left-leaning, you've got your right side in the clinical analysis; and then the middle is your, you know, middle ground, your, I don't want to be political; and the favorable analysis is on the piece's own side. So if it's right-leaning, the favorable analysis is going to lean right; if it's, you know, left-leaning, it's going to be the clinical analysis. That's the un-hidden story behind how I use "hidden message."
So, you know, that's some of the idea with these prompts, and it takes time to read them; there are, like, 85 of them or something ridiculous. There's a lot, and it takes time to go through the fabric prompts, but I would definitely check them out.
There's also, in that same miscellaneous folder, an old, let's call it, Professor Synapse; it's an older Professor Synapse prompt that works kind of more universally. It's a one-shot prompt, which I'll share, that will help you troubleshoot something, and I swear it has saved me three or four times now.
So normally you copy and paste an error message into the AI, and it kind of tells you how to fix it, but it doesn't really help you troubleshoot it. With this prompt, it will create a prompt for you, and it'll be like, "oh, I'm an expert in Docker containers, and I will help you troubleshoot this issue," and it'll tell you what to type, and you type it, and you give it the output, and it reads that output. I actually had it read the output from nmap; it had told me to use, like, telnet to check whether a port was open, and then I tried to do something else, and it was like, "oh, you ran this on the host machine, and not the guest machine." Like, it detected that and basically called me dumb, so Professor Synapse is actually very powerful to use.
And the newest one, I think, is for ChatGPT or whatever specifically, and it will call different tools. Now, Open Web UI has tools, and it's still kind of in its infancy. I should take a look again; I want a universal transcoder that will take any URL with media on it, rip it, and then transcribe it for me, and then I can, you know, ask it questions, is the idea; or feed it into a GGUF, or feed it into a RAG, so I can, you know, continue to add additional information to that RAG on certain things, when I collect them. Anyways, that's pretty much where I'm at with Open Web UI. Everything's back and happy now.
I'm the only person that uses it; I have it for, kind of, friends and family, and nobody uses it. You can export your chats in the admin settings. I'm going to check the admin settings one more time: audio, images; we're not doing pipelines, we're not doing databases, we're not doing...
So I'm all set up. It wasn't as miserable as I thought. I ended up with DeepSeek-R1, which is a reasoning model, 32B, that will run on the 3090, and then Qwen 2.5 is 32B, but it's not a reasoning model. So if I just want an answer, and I don't want the thinking and the going back and forth, I'll use Qwen 2.5. And then I have an 8B Dolphin Llama 3.1, which is a good 8B model that's quick, and will sometimes lose it, but it's good and fast if you want something quick and easy, you know, to reword something, or fix English, or whatever. So if I'm rewording a thing, I'll use the 8B model: "reword, or add hashtags to, the following thing." I actually lost all my prompts; I had a couple of custom prompts for myself, but I had to rebuild those.
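For reference, pulling that lineup in Ollama looks roughly like this; the exact tags are my guesses, so check the model library for current names:

    ollama pull deepseek-r1:32b
    ollama pull qwen2.5:32b
    ollama pull dolphin-llama3:8b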
And then I have the Minecraft LLM, which runs the AI Andy bot for Minecraft, which is kind of interesting. It's a way to show kids and people, hey, here's something useful that AI does besides, you know, voice cloning or image generation. They're working on image stuff, and I'm doing some fine-tuning for easier models and reasoning models with Andy. Well, we call him Andy; he's the model that we're trying to train to play Minecraft better without having a massive rig, 70B parameters or something.
And then I have an embedding model, which is used with Minecraft to do its magic. And then we have some might-not-be-safe-for-work stuff that's for, like, chatting and roleplay, which doesn't really work, but I'm still kind of playing around with that. That's pretty much where my models are at today.
If you want to try it out, let me know; hit me up, you know, preload101@yahoo.com, if you want an invite to play around. I'm doing training for anything in the Jamboree suite, jamboree.armacardy.com. And, you know, if you're hearing this, it's probably not in the, you know, backup episodes, because it's more timely and relevant.
And I'm all over the place. So if you want me to go and talk about a specific thing and get it to run, or offer you help on getting something to run locally, let me know. But you've got to have at least a six-gig card to do anything useful, and the more useful you want it, it goes up from there. I would like another 3090, but that's not possible.
And even then, that wouldn't put me in the, you know, hundreds-of-billions-of-parameters market that you're working against, but that's why the, you know, DeepSeek stuff is getting better in the reasoning models, or they call them IFEs or something; there's a name for the reasoning models. Anyways, I've taken up too much of your time. If you're still here, record an episode. Take it easy, peace out.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.