Episode: 4337
Title: HPR4337: Open Web UI
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4337/hpr4337.mp3
Transcribed: 2025-10-25 23:17:17
---
This is Hacker Public Radio Episode 4337 for Tuesday, 18 March 2025.
Today's show is entitled Open Web UI.
It is the 120th show of operator and is about 31 minutes long.
It carries an explicit flag.
The summary is, I go over how I have my local LLM server set up because I rm -rf'd it.
Hello, hello, welcome to my LLM server, Hacker Public Radio with your host operator.
This should be pretty short.
I am going to record this episode about using Open Web UI, setting up text-to-speech,
importing prompts, adding search, and possibly adding RAG.
Essentially what I did is I ran a script that is supposed to shrink my Docker containers
and then shrink the images in Windows Subsystem for Linux, and something
happened and it wrecked all of my Docker containers inside of my WSL images, like all of them,
which wasn't a lot, and I wasn't really using all of them.
But the one that was attached to my main Open Web UI server was completely broken, and the
Docker container that was attached to the audio processing, the Kokoro or whatever it is called, was
not working.
So lucky for you, you get a HPR episode while I am on the tail end of doing the importing
for the fabric patterns.
So I have a short video on my YouTube that shows some of the features of what I've got
today, but I've added a few more and added more and more models and things like that.
So it's kind of a, you know, always a work in progress for these LLMs.
But generally speaking, I've got it kind of set up and nice and happy.
So I'll kind of go over my document here, I am pretty sure everything is set up, took
me about an hour to get it all happy again.
So I feel like I'm at a good spot where I've got all the stuff I need, the search works,
the models, luckily didn't lose the models.
Ollama is still in there and working, but anything I'm going to do with Docker is inside
of Windows Subsystem for Linux, which was completely broken.
Now I have like a hundred-and-something gig, 120 gig thing that's really only 80 gigs.
So I have to eventually sort that out.
Let's see how big it is.
So what happens with general virtual images is that they grow.
So if you set them to a static size, you're fine.
That's what's generally preferred in a corporate environment or enterprise.
But and usually what people do at home is they set them as an initial size and they grow
over time.
I'm clicking buttons and I'm clicking the wrong buttons, I'm playing music and launching
Minecraft.
So the image I have right now is 126 gigs, gigabytes, and that is not how big it really is, and
we're going to shrink it too.
So we're going to be very careful of how we shrink it, going to use the proper packages
or the proper command lines to do that without hopefully breaking anything.
So first thing first, what you have to do, and this is kind of, I'll put in the kind
of show notes, possibly, I mean, you need kind of need screenshots to go through it.
But basically, first things first is kind of getting, let me swap the auto updates here
on the bottom after setup.
So we're going to put this after search or before search, we're going to do auto updates.
And there's a million ways to install stuff nowadays.
You've got Docker and Windows Subsystem for Linux and virtualization and all that.
For me, I've been able to get Docker to run on Windows Subsystem for Linux.
The reason I don't run Docker natively is because I don't like having Docker installed
as a service. I'd rather just use Windows Subsystem for Linux and then have it manage
all that stuff for me.
The problem there is when you start getting into GPU stuff, you have to install a CUDA
toolkit for Docker so that you can access the hardware-level stuff.
And it's not necessarily great from a security standpoint.
So that's one thing to kind of keep in mind.
Anyways, so I feel like the first thing to do is kind of go over the installation and
it looks like I've already skipped all over all that and assuming that I actually know
how to install it.
So what I usually do is the one-liner. On my scripts site, I have
an installer that will install Open Web UI. It's like OpenWebUIFAST, let's see if I can
find it, yeah, OpenWebUIFAST.
So this you basically run in any kind of Linux container and it will automatically
set everything up for you.
It sets up libnvidia-container and adds that to the repo.
I install curl and net-tools just for troubleshooting.
And then there's a memory thing where Docker has like a maximum amount of memory or whatever,
and that's setting up the memory to hold more than the normal amount.
I think it's at least six, excuse the memory.
It's vm.max_map_count equals and then a number.
Anyways, basically when you have larger things doing things inside of Docker, you have
to set that, you know, max map count, and it's almost like the CPU limit thing for processes.
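A minimal sketch of the max map count check described above, in Python. The target value 262144 is an assumption (the episode does not give the number), and the helper names are made up for illustration.

```python
from pathlib import Path

# Hypothetical target value; the episode does not state the number used.
WANTED_MAX_MAP_COUNT = 262144


def parse_sysctl_value(text: str) -> int:
    """Parse the integer the kernel exposes in /proc/sys/vm/max_map_count."""
    return int(text.strip())


def sysctl_line(wanted: int) -> str:
    """Return the line that would go in /etc/sysctl.conf to persist the setting."""
    return f"vm.max_map_count={wanted}"


def needs_raise(current: int, wanted: int) -> bool:
    """True when the running kernel value is below the wanted value."""
    return current < wanted


if __name__ == "__main__":
    proc_file = Path("/proc/sys/vm/max_map_count")  # exists on Linux only
    if proc_file.exists():
        current = parse_sysctl_value(proc_file.read_text())
        if needs_raise(current, WANTED_MAX_MAP_COUNT):
            print("add to /etc/sysctl.conf:", sysctl_line(WANTED_MAX_MAP_COUNT))
        else:
            print("vm.max_map_count is already", current)
```

The script only reports what it would change; applying the line still needs root and a `sysctl -p`.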
So a lot of people recommend using the Docker installer script. Either way, you know,
I like using apt, and if it's a few versions behind and I'm not having any problems with
my Docker stuff, then I'll use the apt version, but they tell you to use the, you know,
Docker install from get.docker.com.
This uses the .sh version, which most people tell you to use.
Also you can set up the default DNS inside of Docker to resolve to the host machine,
which can cause problems when you're doing networking stuff.
So that's another thing.
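As a rough illustration of the DNS step, Docker reads a "dns" list from its daemon.json. The sketch below only builds that JSON in Python; the address 192.168.1.1 is a placeholder, not something from the episode, and the helper name is hypothetical.

```python
import json


def with_dns(daemon_config: dict, servers: list[str]) -> dict:
    """Return a copy of a Docker daemon.json config with the "dns" key set."""
    updated = dict(daemon_config)
    updated["dns"] = list(servers)
    return updated


if __name__ == "__main__":
    # 192.168.1.1 is a placeholder for whatever resolver the containers should use.
    config = with_dns({}, ["192.168.1.1"])
    # The printed JSON is what would be written to /etc/docker/daemon.json.
    print(json.dumps(config, indent=2))
```

Docker has to be restarted after the file changes, which matches the restart step mentioned next.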
Then we restart Ollama, or restart Docker, after setting up the DNS stuff and the memory.
And then we check for Ollama and we run the Ollama web-based installer.
Again, I don't even know if you can apt install Ollama.
They probably have a docker board, but I use the .sh, the local install version, and this
is all inside of Windows Subsystem for Linux or inside of a Linux container, whatever.
And I look like a person very default, very default images.
A small four gig, well, sorry.
And let me put this in the show notes to, I don't really have the show notes table yet,
open.
And what else?
Once Ollama is installed, then we install the NVIDIA Container Toolkit,
which actually has an apt-get install nvidia-container-toolkit.
They tell you to import the one from Nvidia and use that one.
I would like to keep everything in apt if it's got an apt package and I don't have any
issues, then stick with that.
That way you don't have like 10 different repos essentially going on all over the place.
Then you have to manage repos over time and it's kind of annoying.
So yes, you're supposed to do it the way they tell you to do it, but from an upgrade
and patch management standpoint, you know, you be the judge from a security standpoint.
I don't know.
Yeah, it's up to you, but for stability and making it a whatever, I would stick with
apt and you can have apt manage all your security stuff for you.
Then we say we're configuring the container runtime using nvidia-ctk, and I'm not even
sure if any of this NVIDIA container stuff is even needed with Ollama, so I don't know why
I even install all that.
And then Open Web UI, CUDA Docker: this is the big long command line that uses the
development version of Open Web UI for CUDA, and it actually is listening on, yeah,
listening on all interfaces, 0.0.0.0.
And you're pretty much off to the races from here, and I will reference this if I
haven't already.
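For reference, the "big long command line" probably looks something like the sketch below, which builds the argument list in Python. It follows the docker run form shown in the Open Web UI documentation; the host port 3000 and the `cuda` tag are assumptions, since the episode only says a development CUDA build is used and does not spell out the tag.

```python
import shlex


def open_webui_run_args(host_port: int = 3000, tag: str = "cuda") -> list[str]:
    """Build a docker run command for Open Web UI with GPU access.

    host_port and tag are illustrative defaults, not values from the episode.
    """
    return [
        "docker", "run", "-d",
        "--gpus", "all",                        # needs the NVIDIA Container Toolkit
        "-p", f"0.0.0.0:{host_port}:8080",      # listen on all interfaces
        "--add-host=host.docker.internal:host-gateway",
        "-v", "open-webui:/app/backend/data",   # named volume for settings and chats
        "--name", "open-webui",
        "--restart", "always",
        f"ghcr.io/open-webui/open-webui:{tag}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(open_webui_run_args()))
```

Binding to 0.0.0.0 is what makes the server reachable from outside the machine, which is also why the auto-update step later matters.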
Yes, I already have the notes and I will reference my other notes, which I should probably
just like export as a text file or something.
So that way you can at least have the text.
There's some notes in here that aren't just screenshots, and a lot of it is user preference.
So once you've got Open Web UI open, I leave the user registration open so people can
just register to use the platform.
Your model names, who cares, whatever your system setup is. I do it like one, underscore,
and then the model name, and then colon, and then the parameters it has, and then maybe there's
extra stuff in there.
For me, knowing how big the model that I'm
using is is easiest, and if I can see, you know, whatever, 8B, 72B, 32B, that gives me an idea
of, you know, which model is which, which is otherwise very confusing.
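The naming scheme described here can be sketched in a couple of lines of Python. The function name and the example model are hypothetical; only the "number, underscore, name, colon, parameters" shape comes from the episode.

```python
def display_name(prefix: int, model: str, params: str) -> str:
    """Format a model label as <prefix>_<model>:<params>, e.g. 1_qwen2.5:32b."""
    return f"{prefix}_{model}:{params}"


if __name__ == "__main__":
    print(display_name(1, "qwen2.5", "32b"))
```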
I also disable the arena models in Open Web UI, and then documents are a work in progress.
Oh, the RAG stuff is not really working.
I can't get it to work.
I've uploaded, you know, in reviews and stuff for personal things, so the idea is I want
to start building my own RAGs for my own data sets, and I have a short video which
I'll link to, link to RAG failure, and if I do get the RAG stuff working, which
sounds like it's a nightmare, I will come back and do another episode, promise,
because I haven't gotten it to work yet and I would like to share once I do get it to
work.
That's on a small set of documents. We're not talking tons, we're talking, I want to say each
document is anywhere from like 8,000 to 16,000 tokens. And here's my Discord notes, so that's
all RAG stuff that I haven't gotten into yet.
Oh, auto updates. So there's a docker run --rm, and then you use Watchtower to update
the branch of Open Web UI you use, which was confusing for me because it says --run-once
is the parameter you want to pass, and then that's kind of what it, wait,
what it looks like, let me, what's all this, I don't even know if this is running, oh that
was something else.
So the reason I had to add the auto updates is because it's externally
facing, right, and I don't necessarily want an old version of something hanging around for
too long. Cron, fs, no, crontab, let me make sure this is actually, yeah, containrrr,
that's interesting, so I guess that's the name of the person and the repo.
So we use Watchtower to upgrade Open Web UI, and it's open-webui, not the
open-webui-dev-cuda, whatever. It's just the name of the platform and somehow it magically
knows how to update. I don't understand.
So that said, I put it in cron, and I have it check every four hours or so to make sure that
the updates are being applied.
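A hedged sketch of the auto-update setup, in Python: it builds the one-shot Watchtower command as given in the Open Web UI docs and a crontab line that runs it every four hours. The container name `open-webui` is the name passed to `--name` when the container was started, which is likely why the plain name is enough; the helper names are made up here.

```python
import shlex


def watchtower_once_args(container: str = "open-webui") -> list[str]:
    """Build a one-shot Watchtower run that updates a single named container."""
    return [
        "docker", "run", "--rm",
        "--volume", "/var/run/docker.sock:/var/run/docker.sock",
        "containrrr/watchtower",
        "--run-once", container,
    ]


def cron_line(every_hours: int, command: str) -> str:
    """Return a crontab entry that runs command at minute 0 every N hours."""
    return f"0 */{every_hours} * * * {command}"


if __name__ == "__main__":
    # Paste the printed line into `crontab -e` for a user that can run docker.
    print(cron_line(4, shlex.join(watchtower_once_args())))
```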
And that's been working for me for two, three months now. Let's see, this
is for the search, so the next part is setting up the web search.
So in Open Web UI, you have SearXNG support, Google, what's the other ones, they have
paid ones. Let me suggest a better one than Google: I use Kagi, which I don't have,
they're still in beta for people that want to do internet search.
So I would say web search: Google PSE is like Programmable Search Engine or something.
SearXNG I've not had any luck with. I've run SearXNG before. Kagi, you have to
have an actual professional key, which I don't know how the fuck to get, excuse me,
because I've asked them forever and they've never given me one.
I think SerpApi is the one people have talked about, but they support a bunch:
Brave, Kagi, Mojeek, Bocha, Serpstack, Serper, Serply, SearchApi, SerpApi,
DuckDuckGo, Tavily, Jina, Bing, and Exa, E-X-A, which I've never heard of.
I've heard, I want to say, the S-E-R-P-A-P-I, SerpApi, is like, yeah, Google
search API is theirs. They're paid search APIs, which are supposed to be better than
the free ones, I don't know.
So that's might be something you want to look at, but anyways, we're going to move on from
there.
How far are we in? Yeah, I think about 13 minutes.
So the search is relatively easy to set up. You get your API key, then your engine ID,
and then I doubled it to search results count 6 and concurrent requests 20, and then I copied
and pasted some info on how to set up the APIs, because Google is a nightmare to try
to find API keys and set them up.
So you go into the Programmable Search Engine stuff, and you go to create account, and you
go to control panel, and then you add, and you'll get a search engine ID once
you create a new app, or whatever you call it, and then you'll take that and generate
a key, and you'll take that key and paste it in along with the...
engine ID. I used it for AutoGPT and SearXNG, and both epically failed. I haven't
messed with AutoGPT in quite some time.
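To sanity-check a Programmable Search Engine key and engine ID outside of Open Web UI, a request URL for Google's Custom Search JSON API can be built like this in Python. The key and engine ID below are placeholders, and the function name is just for illustration; the result count of 6 mirrors the setting mentioned above.

```python
from urllib.parse import urlencode

# Google Custom Search JSON API endpoint used by Programmable Search Engine.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def pse_url(api_key: str, engine_id: str, query: str, count: int = 6) -> str:
    """Build a search URL; key is the API key, cx is the search engine ID."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": count}
    return f"{ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder credentials; substitute real ones before opening the URL.
    print(pse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "open webui"))
```

Opening the printed URL in a browser returns JSON if the key and engine ID are valid, which is a quick way to rule out credential problems.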
So your next step is the audio, and I could probably get her to talk here in a second.
I use Kokoro, and it's decent.
Anyways, I speed it up to 1.5, and you can search for af_bella and see some
examples of what she sounds like, but there's a bunch of models in there, and they're fast,
and it's like, I want to say, one to four gigs, so you can actually run it on a 6-gig card
or something.
Of course, to run decent models, you're going to need
like an 8-gig card at least, if not a 3090 to do some really decent stuff.
At this point, with some creative prompting, I can get very close to, you know, Kagi
with all its assistants.
Kagi gives you access to, you know, Anthropic, OpenAI, Google, Meta, Alibaba, Amazon,
and DeepSeek, and I didn't even know Amazon had one, but you get access to all those with
Kagi Assistant.
It's the same price as OpenAI, it's like 300 a year or whatever, they charge 30 bucks
a month.
Same price for Kagi, but you also get a search engine, excuse me, that is not terrible,
and it's all customized.
It's a little bit like DuckDuckGo, so they use bang stuff, and I think they were the
ones to kind of do that first, so you can do, like, bang AI, and then your prompt will
actually get a response.
Like what can you tell me about Earth and Curdy, and it's probably going to be like a painter
or something, and it automatically uses web in, like, the last model you picked.
So this is photo-realistic painter, that's not me, it's the guy that, whatever.
So it's the wrong part of Curdy, but anyways, you get that point.
So let's go kind of to the last part here, and I'll say, I think it's pretty much all
set up.
Let me check my document here.
So when you're setting up Kokoro, I use the, what did I use?
I used Docker for that one also, the Kokoro FastAPI GPU one, and it tells you
on the site what to do, but basically, first you have to install Docker. Well, first you
have to apt-get update, I do an upgrade, and then I add the repo for the NVIDIA container,
and then it looks like I install the NVIDIA container, and then I install Docker, and then
I do the docker run.
And sometimes I have to manually restart Docker, and it doesn't want to start back
up.
And I will also provide links to my script, a link to the XML for the commands, for the
task manager for Windows, the task event manager, what do they call that?
Task, task, task, Task Scheduler, Jesus.
So I have an XML file that will run multiple commands inside of a Windows Subsystem
for Linux image that will run Docker.
So it's the, anyways, it's complicated.
So when my computer starts up, there's a startup script that keeps these things alive.
And apparently, if you run with Kokoro, I have two different scripts.
One is wsl -d, then the name of the image, and then exec, and then dbus-launch,
space, true.
And certainly that's what keeps it alive.
If you don't do that, it will fall asleep and go idle every time it's not doing something.
And then I have the next command run right after it, with no sleep or anything, which I think
is going to break.
That's why I have to manually restart Docker sometimes.
Then it does the docker run, --gpus all, -p, whatever, the regular run command for
it.
So that probably needs to be fixed.
And then same thing for Open Web UI, you have to do that exec dbus-launch true for
things to work.
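The keep-alive command described above can be written out as an argument list. This is only a sketch in Python: `Ubuntu` is a placeholder distro name, and the `--exec` spelling of the flag is an assumption about what "exec" refers to in the spoken command.

```python
import shlex


def wsl_keepalive_args(distro: str) -> list[str]:
    """Build the WSL command that launches dbus inside a distro to keep it from idling."""
    return ["wsl", "-d", distro, "--exec", "dbus-launch", "true"]


if __name__ == "__main__":
    # "Ubuntu" is a placeholder; `wsl -l` lists the installed distro names.
    print(shlex.join(wsl_keepalive_args("Ubuntu")))
```

A Task Scheduler action would run this first and the docker run afterwards; adding a short delay between the two may avoid the manual Docker restart mentioned above, though that is untested here.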
Can I copy this?
I can just put this in the actual, I don't have to provide that.
But I'll go ahead and provide the whole script, excuse me.
So we're good there.
What else I'll say is once you get it set up, the URL is http, and then your host
name, and then colon 8880, or whatever port you specified, and then forward slash v1.
And then you put that in the OpenAI-compatible setting for audio in Open Web UI.
And then the TTS voice is af_bella.
I don't know how to do multiple ones.
If you put a bunch of commas in there, it tries to use like all of them at once or whatever.
So that doesn't work.
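To check the text-to-speech server independently of Open Web UI, the same base URL can be hit with an OpenAI-style speech request. The Python sketch below only builds the request; the model name `kokoro` is an assumption, while the port 8880, the `/v1` path, the voice af_bella, and the 1.5 speed come from the episode.

```python
import json
import urllib.request


def tts_base_url(host: str, port: int = 8880) -> str:
    """Base URL to paste into the OpenAI-compatible audio setting."""
    return f"http://{host}:{port}/v1"


def speech_request(base_url: str, text: str, voice: str = "af_bella",
                   speed: float = 1.5) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /audio/speech endpoint."""
    body = json.dumps({
        "model": "kokoro",   # assumed model name; check the server's docs
        "input": text,
        "voice": voice,
        "speed": speed,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = speech_request(tts_base_url("localhost"), "Hello from Hacker Public Radio")
    # Nothing is sent here; pass req to urllib.request.urlopen to fetch the audio.
    print(req.get_method(), req.full_url)
```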
And then the last step here is importing fabric prompts.
Now I have a Python script that will create those fabric prompts for you.
And then I also need to upload them to the website because I have a newer one here that
is sitting in a deleted folder.
And this is the reason I deleted everything in the deleted folder and then that's where
I was doing some work.
And it managed to mucky everything out.
I was transcoding a bunch of stuff, and now it's all gone.
And I have to start all over from scratch.
So anyways, here's your fabric patterns under DURP.
I want to go to my...
So all in all, even on the 3090, if you can wait for the search to
happen, and you can wait for the...
It's easy for the model to get loaded up.
I think it gets dumped out of VRAM after a while.
So it does take like five seconds for the model to load back up.
That would be different if you were doing this in a more permanent state, but I'm still messing around with
things and I'm training models and whatever.
So I can't leave, you know, a 32B, you know, 24-gig model in my VRAM because then I can't
play games or anything.
So that said, the way Ollama is set up is it will automatically kick stuff
out if it's not being used, and it'll keep the VRAM empty for you.
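The unload behaviour is controlled by Ollama's `keep_alive` setting, which is not named in the episode but is part of its HTTP API. The Python sketch below only builds the request bodies; the model name is a placeholder.

```python
import json

# Default local Ollama API endpoint.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def keep_alive_payload(model: str, keep_alive) -> dict:
    """Body for /api/generate that only changes how long a model stays loaded.

    keep_alive may be a duration string such as "5m", 0 to unload right away,
    or -1 to keep the model in memory indefinitely.
    """
    return {"model": model, "keep_alive": keep_alive}


if __name__ == "__main__":
    # "qwen2.5:32b" is a placeholder model name.
    print("POST", OLLAMA_GENERATE_URL)
    print(json.dumps(keep_alive_payload("qwen2.5:32b", 0)))       # free the VRAM now
    print(json.dumps(keep_alive_payload("qwen2.5:32b", "30m")))   # stay loaded longer
```

Sending the first body frees the VRAM before a game; the second trades VRAM for skipping the reload delay of a few seconds.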
That's pretty much it.
So these fabric patterns, right, I've talked about them before. There's probably a link
on my YouTube about using them, and they're custom prompts for Open Web UI.
So I'll use like a 7B parameter model, and I'll do no web search, but I'll say web search,
and I'll say, what can you tell me about, what is the news, Kagi, yeah, video news.
And I'll see like the first one, one day ago, Kate Fagan named president.
So I can say, you know, what can you tell me about Kate Fagan named president, and I'll
use the web search using a 7B parameter model.
Creating search query, searching, searching details, searched 18 sites.
So it pulls back XML, JSON from those 18 sites. It's only a certain amount of data that
it can handle, honestly.
And then it says, I'm unable to find what significance is meant by Kate Fagan named president
in the given context.
It would be helpful if you could provide more context or clarify your question better,
whatever, references, and like literally it gives you a URL that says Kate Fagan
named president of Mom plus Pop Music, in Nashville, blah, blah, blah.
So like that didn't even work.
Now had it worked, let's try again, I don't know, had it worked. It's probably because
this is a 7B parameter model, a very small model. Like, it's doing stuff.
It's coming back with websites, which should have data in them.
Yeah, Kate Fagan is the named president of St. Louis, blah, blah, blah, her name is something,
something, something.
So I'm going to do forward slash fab dash, and then I can say find hidden message or something.
So there's a bunch of fabric prompts that are for using AI.
And you can call these prompts directly in Open Web UI with my giant JSON file.
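This is not the script from the episode, just a minimal Python sketch of how such a JSON file could be generated. It assumes the fabric repository layout of one folder per pattern containing a system.md, and it assumes the import file is a list of objects with `command`, `title`, and `content` fields; check an exported prompt from your own instance to confirm the exact shape.

```python
import json
from pathlib import Path


def patterns_to_prompts(patterns_dir: Path) -> list[dict]:
    """Turn each <pattern>/system.md under patterns_dir into a prompt entry."""
    prompts = []
    for system_file in sorted(patterns_dir.glob("*/system.md")):
        name = system_file.parent.name
        prompts.append({
            "command": f"/fab-{name}",   # typed as a slash command in the chat box
            "title": name,
            "content": system_file.read_text(encoding="utf-8"),
        })
    return prompts


if __name__ == "__main__":
    # "patterns" is a placeholder path to a local copy of the fabric patterns folder.
    prompts = patterns_to_prompts(Path("patterns"))
    Path("fabric_prompts.json").write_text(json.dumps(prompts, indent=2), encoding="utf-8")
    print(f"wrote {len(prompts)} prompts")
```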
So let me go ahead and update that again, actually, free loads, scripts, miscellaneous and
fabric patterns, open web UI, there's probably the newest ones in here.
Can I do that?
No, we'll not add file upload files, I dump this in here, let's say updated.
Now we have some updated, the updated prompts for, which call it, looks like we're frozen
because we're, I don't know why we're frozen, there we go, just doing something.
Hidden message advocating for centralized education under the ultimate control, discriminating
alternative forms of education and self-directed learning, promoting one size fits all approach
to teaching and learning, encouraging conformity and suppression of decision-making options,
oh my gosh, stop talking, clinical analysis, more balanced analysis and favorable analysis.
So like it's kind of all three sides, you know. If it's left, you got your right for
the clinical, and then middle is your, you know, middle ground, your, I don't want to be
political, and then favorable analysis is on the side of, so if it's right, then it's
going to be right, favorable analysis. If it's, you know, left, it's going to be clinical
analysis. That's the unhidden story behind how I use hidden message.
So you know, that's some of the idea with these prompts, and it takes time to read. There's
like 85 of them or something ridiculous, there's a lot, and it takes time to go through
the fabric prompts, but I would definitely check them out.
There's also, in that same miscellaneous folder, an old, what's it called,
Professor Synapse. It's an older Professor Synapse that works kind of more universally.
So it's a one-shot prompt, which I'll share, that will help you
troubleshoot something, and I swear it has saved me three or four times now. Normally you copy and
paste an error message into an AI, and it kind of tells you how to fix it, but it
doesn't really help you troubleshoot it. With this prompt, it will create a prompt for
you, and it'll be like, oh, I'm an expert in Docker containers, and I will help you troubleshoot
this issue, and it'll tell you what to type, and you type it, and you give it the output, and it reads
that output. I actually had it read the output from nmap when it told me to use, like, telnet
to check whether a port was open, and then I tried to do something else, and it was like, oh, you ran this
on the host machine, and not the guest machine. Like, it detected that and basically called me dumb.
So it's actually very powerful to use Professor Synapse, and the newest one I think is for
ChatGPT or whatever, specifically, and it will call different tools. Now Open Web UI has tools,
and it's still kind of in its infancy. I should check again. I want a universal transcoder
that will take any URL with media on it, rip it, and then transcribe it for
me, and then I can, you know, ask it questions, is the idea, or feed it into a GGUF, or feed it into
a RAG, so I can, you know, continue to add additional information to that RAG on certain things
while I collect them. Anyways, that's pretty much where I'm at with Open Web UI. Everything's
back and happy now. I'm the only person that uses it. I have it for kind of friends and family.
Nobody uses it. You can export your chats, admin settings. I'm going to check the admin settings
one more time. Audio, images, we're not doing pipelines, we're not doing databases.
So I'm all set up. It wasn't as miserable as I thought. So I ended up with DeepSeek R1,
which is a learning model, 32B, that will run on the 3090, and then Qwen is 32B, but it's not a
learning model. So if I just want an answer, and I don't want it to go thinking and going back
and forth, I'll use Qwen 2.5, and then I have an 8B Dolphin Llama 3.1, which is a good
8B model that's quick, and will sometimes hallucinate, but it's good and fast if you want a
quick, easy, you know, reword of something, or English help, or whatever. So if I'm rewording a thing,
I'll use an 8B model to reword, or add hashtags to the following thing. I actually lost all my
prompts. I had a couple custom prompts for myself, but I had to rebuild those.
And then I have Minecraft, a Minecraft LLM, which does the AI Andy bot for Minecraft, which is
kind of interesting. It's a way to show kids and people, hey, here's something useful that AI
does besides, you know, voice cloning or image generation. They're working
on image stuff, and I'm doing some fine-tuning for easier models and learning models with Andy.
Well, we call him Andy. He's the model that we're trying to train to play Minecraft better,
without having a massive rig, 70B parameters or something. And then an embed, which is used with
Minecraft to do its easier magic. And then we have some not-safe-for-work stuff that's for
like chatting and roleplay, which doesn't really work, but I'm still kind of playing around with
that. That's pretty much where my models are set today. If you want to try it out, let me know,
hit me up. You know, freeload101@yahoo.com, if you want an invite to play around. I'm doing
training for anything in the JAMBOREE suite, jamboree.rmccurdy.com. And you know, if you're hearing
this, it's probably not in the, you know, backup episodes because it's more timely and relevant.
And I'm all over the place. So if you want me to go and specifically talk to about a specific
thing and get it to run or offer you help on getting something to run locally, let me know. But
you got to have at least a six gig card to do anything useful. And then the more useful you want,
it goes up from there. I would like another 3090, but that's not possible.
But even then that wouldn't put me in the, you know, 4 billion parameter market that you're
working against, but that's why the, you know, DeepSeek stuff is getting better with the learning
models, or they call them IFEs or something. There's, there's a name for the learning
models. Anyways, I've taken up too much of your time. If you're still here, record an episode,
take it easy, peace out.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself. If you ever thought of recording
a podcast, then click on our contribute link to find out how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and
rsync.net. Unless otherwise stated, today's show is released under a Creative Commons
Attribution 4.0 International License.