Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
203
hpr_transcripts/hpr2637.txt
Normal file
@@ -0,0 +1,203 @@
Episode: 2637
Title: HPR2637: Convert it to Text
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2637/hpr2637.mp3
Transcribed: 2025-10-19 06:53:05

---

This is HPR episode 2637, entitled Convert It To Text.
It is hosted by me, is about 16 minutes long, and carries a clean flag.
The summary is: this episode will make you want to TXT all the things.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello Hacker Public Radio fans, this is b-yeezi once again bringing you another episode.
This time I'm going to talk about a subject that's near and dear to my heart.
But before I get there, I want to just say that I am recording this on a new microphone
that I got on Amazon that was on sale with a kit that had a whole bunch of stuff. It's an
Audio-Technica ATR2100-USB. It has both a USB connection and an XLR connection.
And it also came with a second XLR connection and an anti-popping screen for like $70.
So I don't know why it was like that, but I got it.
And my wife gave me a little side eye when I got it because it's completely unnecessary,
but I couldn't let the deal pass me up.
Anyway, let's get to the episode.
So this episode is about converting it to text. Whatever it is, convert it to text.
And you might wonder, why would you want text?
You have spent the last, I don't know, 20, 30 years of computer science making all these
different formats for being able to see and visualize information in different ways.
Why would you want it in text, and plain text?
And I have a couple of reasons.
The main reason is portability.
We already know that Microsoft has changed their standard from DOC to DOCX, and they've
changed DOCX several different times.
And ODS is a good standard, but it's not portable everywhere because not everyone supports
it.
But pretty much anything that is at all text-based has some type of ASCII characters associated
with it.
It's also very useful to use text with the basic Unix tool set, because the Unix philosophy is
treating everything as a file, and being plain text means that you can chain tools together and
make really interesting and complex systems out of very simple technologies underneath.
And another reason, and the main reason that I'm going to focus on right now, is
what you can do with the Unix tools: there are tools that are built on top of those
tools for visualizing things.
My favorite one of which is called Ranger.
And Ranger is, from their website, which is on savannah.nongnu.org, a free console
file manager, which is what I use it for.
It's a console file manager that gives you greater flexibility and a good overview of
your files without having to leave your *nix console.
It visualizes the directory tree in two dimensions: the directory hierarchy on one, lists of files
on another, with a preview to the right so that you know where you'll be going.
And so the idea is that the entire file system appears in three panes, and those three panes,
like it describes: the first pane is the context of where you've been, the middle pane is
where you are, and to the right is either where you are going if you go into the next level
of the tree, or a preview of the file that you're on.
So if what you're currently selecting is a file, it's a preview of the file, and if it
is a directory, then it's a list of what is in that directory.
And so usually you would think it would be limited to just being able to look at plain
text files.
And I, maybe I'll include a screenshot of what it looks like.
You just have to see it to believe it.
You can go through your entire file system and see what every file has in it, pretty much
without ever having to open up a GUI file manager and then having to double click on the
file and wait for whatever software it usually takes to load and look at that file.
It's just really amazing.
And then you can always click over or press Enter one more time and actually edit that file
or open it with its native opener, which I think uses xdg-open
to open things.
So let me just talk about how I use it.
So the biggest draw for me, the functionality that is really powerful, is the scope
functionality.
And it's all locked up in a file called scope.sh, which is just Bash.
And it's in your home directory, at ~/.config/ranger/scope.sh, and it comes out of the box
with a bunch of different things.
If you read the documentation, it has a bunch of different things.
If you install additional programs, it works natively.
But since all it really takes is the ability to take
whatever file and convert it to text and to output that text as a dump, like a dump to
standard out, if you have that ability, no matter what you use, besides what they have
given you out of the box, you can build upon those scopes.
And like I said, the scope is basically a big switch statement based on either file extension
or mime type.
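
The "big switch statement" idea can be sketched roughly as follows. This is only an illustration in Python, not Ranger's actual scope.sh (which is Bash); the handler names and the extension table are made up for the example.

```python
import mimetypes
from itertools import islice
from pathlib import Path


def preview_text(path, max_lines=20):
    """Hypothetical handler: return the first lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, max_lines))


def preview_unknown(path):
    """Hypothetical fallback when no converter is registered."""
    return f"(no text preview for {Path(path).name})"


# Assumed extension table, checked first (the "extension" branch of the switch).
BY_EXTENSION = {
    ".md": preview_text,
    ".csv": preview_text,
}


def choose_handler(path):
    """Pick a handler by file extension first, then by guessed MIME type."""
    extension = Path(path).suffix.lower()
    if extension in BY_EXTENSION:
        return BY_EXTENSION[extension]
    mime_type, _encoding = mimetypes.guess_type(str(path))
    if mime_type is not None and mime_type.startswith("text/"):
        return preview_text
    return preview_unknown


def preview(path):
    """Run whichever handler the dispatch step selected."""
    return choose_handler(path)(path)
```

Adding support for a new file type then only means writing one more function that turns the file into text and registering it in the table.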
And if it's a mime type of text, if you have the program called highlight
installed, it'll try to do highlighting of that file based on file extension.
So if it's a .pl file, it'll do Perl highlighting.
If it's a .py file, it'll do Python.
If it's a .sh file, it'll do Bash.
So just for that reason, it's amazing.
But say you don't deal with those types of files all the time; maybe you deal with
tar files and zip files.
Well, if you install atool, atool standing for archive tool, that will make it so that
it will automatically preview the contents inside of a zip file.
So without having to open the zip file or the tar file, you'll be able to see it.
Or if you're like me, and you do like to have plain text files, but they're
really big, and you gzip them, it'll gunzip them and put them on standard out,
right on the screen for you.
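
As a rough sketch of what "previewing an archive as text" amounts to, here is a Python version using only the standard library. It is not what Ranger or atool do internally; the function names are hypothetical.

```python
import gzip
import tarfile
import zipfile
from itertools import islice


def list_archive(path):
    """Return the member names of a zip or tar archive without extracting it."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            return archive.getnames()
    raise ValueError(f"not a zip or tar archive: {path}")


def preview_gzip_text(path, max_lines=20):
    """Decompress a gzipped text file on the fly and return its first lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, max_lines))
```

Both functions return plain text (or a list of names that can be joined into text), which is all a previewer needs.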
If you have another tool called poppler-utils installed, any PDF that you go over, if it's
text inside of it, not a scanned image, but if it's text inside of it, it'll do pdftotext
and put text on the screen for you.
So you can read the contents of PDF files.
If you have caca-utils installed, it can do one of two things.
It can either do ASCII art of any images that you have, which is pretty cool just to see
it do ASCII art of whatever files you have.
But for certain window environments and terminals, it'll also do the actual picture in your
terminal.
It doesn't work on GNOME, but it works on MATE, it works on LXDE,
so I think it's a limitation of Mutter.
And even if I do it on MATE, if I use a fancy one, like if I use Compiz,
I don't think it works.
But if you use Marco, or you don't use any compositing and use the built-in MATE terminal,
you can actually see the pictures, which is pretty cool.
And if you don't have that, it'll still show ASCII art.
And then mediainfo is another thing: if you have anything
that's a media file, it'll use mediainfo to look at things like the size and the encoding
type for the audio and for the video.
And then for HTML files, it'll either use lynx or w3m or elinks to preview that
file in plain text, because those are plain text web browsers.
So that's out of the box.
But like I said, I have added a couple of things.
And I'll include my scope.sh in the show notes as well.
But the things that I've added, and maybe you just want to use these tools separately,
because they're useful tools.
There's one called catdoc, and catdoc will turn any .doc or .xls file to either
TXT or CSV, which is very useful.
There's catppt, which will turn any PowerPoint presentation into text.
There's odt2txt, which will turn ODT files into text.
There's ods2tsv, which will turn ODS files into tab-delimited files.
And then for the newer extensions and file formats for Microsoft
Office products, there's docx2txt and xlsx2csv, which will turn those file types
into text and CSV files.
And, you know, those types of files are pretty much, I don't know, maybe 98% of all the files, 99% of
all the files I have ever opened, ever: they are either plain text or one of those types of files.
And so in the little bit of time where it's not those files, I'll just, you know, have
to open them.
But for the most part, I can't stand opening a GUI-based file manager because
it takes too long to find anything.
And Ranger also has, you know, the ability to bookmark items, and it has Vim key
bindings so that you can go up, down, left, right in the Vim style, and it also has
Vim-style marks so that you can mark a file and then go to a different place and then come
back to that mark.
It also allows you to do tabbed browsing, so you can open up a tab, and you can highlight
multiple items by pressing the spacebar and do dd, which is delete, but it's really cut,
and then you go to another place and type pp and it'll paste it.
Or if you go yy and pp, it'll do yank and paste, like Vim bindings do. Oh my goodness.
So you just have to try it.
If you are into doing things on the console, you really just have to try Ranger.
I introduced it to someone at SCaLE two years ago, and the look on the guy's face when
he started playing with it was just amazing, and it was really great to be able to bring
that to someone.
So that's the bulk of my episode, but I wanted to bring a couple of bonus tools that I use
to process text.
So along with tools like awk and sed, which I use all the time, and things like diff and
vimdiff and things like that that I use all the time, there are three other tools that are very
useful for messing with semi-structured data. And I'm not going to go into what
semi-structured data means, but the idea is things that kind of have a structure but are
not a relational database type deal.
So those three items are XMLStarlet, jq, and q. I think there has been an episode on
XMLStarlet before, which is a way to parse XML files, and that's very useful.
So you can do things like select specific tags and look for specific values and all
types of fun things with XML.
On the limited occasions where I have to deal with XML, it's been very helpful.
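
To give a feel for "select specific tags and look for specific values", here is a small sketch using Python's built-in ElementTree and its limited XPath support rather than XMLStarlet itself. The sample XML and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, not taken from the episode.
SAMPLE_XML = """
<shows>
  <show id="1"><title>Intro to text</title><series>general</series></show>
  <show id="2"><title>Console tips</title><series>console</series></show>
  <show id="3"><title>More text</title><series>general</series></show>
</shows>
"""


def titles_in_series(xml_text, series):
    """Return the <title> text of every <show> whose <series> equals `series`."""
    root = ET.fromstring(xml_text.strip())
    # XPath-style predicate: keep <show> elements with a matching <series> child.
    path = f"./show[series='{series}']/title"
    return [title.text for title in root.findall(path)]
```

Calling `titles_in_series(SAMPLE_XML, "general")` picks out the two shows in that series and ignores the rest.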
jq stands for JSON query, and it is similar, it was influenced by XMLStarlet, but it
works on JSON files, and that is something I do work with pretty often.
So it doesn't use XPath, but it uses a similar type of formatting for querying JSON files.
So you can look for a specific value, you can look for array types, you can do all types
of things, and there's a lot of functionality that I haven't used in it.
So it's a pretty broad tool, but it's very powerful, and I've only really scratched the surface.
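
The kind of query described here, walking into an array and keeping entries with a specific value, looks roughly like this when written out by hand in Python. It is a sketch of the idea only, not jq's syntax; the sample document and function name are made up.

```python
import json

# Made-up sample data, not taken from the episode.
SAMPLE_JSON = """
{
  "episodes": [
    {"id": 1, "title": "Intro to text", "tags": ["text", "unix"]},
    {"id": 2, "title": "Console tips", "tags": ["console"]},
    {"id": 3, "title": "More text", "tags": ["text"]}
  ]
}
"""


def titles_with_tag(json_text, tag):
    """Return the title of every episode whose "tags" array contains `tag`."""
    document = json.loads(json_text)
    return [
        episode["title"]
        for episode in document["episodes"]
        if tag in episode.get("tags", [])
    ]
```

A query tool like jq lets you express the same selection as a one-line filter on the command line instead of a script.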
But another one that I really like is called q. And although I do like to use sed and
awk, when I found q, it just made it so the times I have to use
awk especially are a lot fewer, because q gives you the ability to write SQL against CSV files
or any type of delimited file.
So if you have a file like, I don't know, grocery list.csv or grocery spending.csv, you
can run q with some options to make sure that you have the headers and the separators
right, and then do, inside of double quotes, select item type, comma,
sum of price from the name of that CSV file, group by the category, and it will parse the
CSV file, and it's very fast.
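
The idea of running SQL against a CSV file can be sketched in Python by loading the rows into an in-memory SQLite table. This only illustrates the concept and makes no claim about how q works internally; the column names and sample rows are assumptions.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a grocery spending CSV.
SAMPLE_CSV = """item,category,price
apples,produce,3.50
bread,bakery,2.25
carrots,produce,1.25
"""


def sum_by_category(csv_text):
    """Load CSV rows into an in-memory table and total the price per category."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE spending (item TEXT, category TEXT, price REAL)"
        )
        connection.executemany(
            "INSERT INTO spending VALUES (:item, :category, :price)", rows
        )
        query = (
            "SELECT category, SUM(price) FROM spending "
            "GROUP BY category ORDER BY category"
        )
        return connection.execute(query).fetchall()
    finally:
        connection.close()
```

The header row supplies the column names, which is the same job the header and separator options do when you point a SQL-on-CSV tool at a file.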
So that's the other thing I like about working with plain text: until you get
into really big files, it's fast. I've actually used q just recently to do some aggregation functions
on a file that was a megabyte, and it was like instantaneous. I've used it on files that
were up to 10 megabytes, and it still basically has no lag on a regular i5 laptop
processor. I haven't really tested it on anything really big, but for the majority
of my needs, I will just use q, and if the files are any bigger
than that, I've been trying to move them out of CSV and to HDF5 when
possible, because binary formats load a lot faster in a lot of the data science programs
that I write nowadays. But for the small things, q does great. Like just for doing data
quality checks on a file that someone sends me, I'll look for distinct values in
a column that's supposed to only have a couple of values, and, you know, I'll look for missing
values, I'll look at the length of different things, and I'll see if there are bad characters
in there. So if it's supposed to be comma-delimited and
there are commas inside of the values, it'll expose all that kind of stuff. So I can't say
enough about q. But that being said, I hope you found this episode interesting, and like
I said, in the show notes, I'm going to at least put a snippet from my scope. If you
want to see the entire thing, just put it in the comments. And I encourage you to check
out Ranger and, if not that, at least some of these tools that will
turn different file types into text. So you've been listening to Hacker Public Radio,
and as I say, keep hacking.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community
podcast network that releases shows every weekday, Monday through Friday. Today's show, like all
our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a
podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio
was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary
revolution at binrev.com. If you have comments on today's show, please email the host directly,
leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated,
today's show is released under the Creative Commons, Attribution, ShareAlike 3.0 license.