Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
133
hpr_transcripts/hpr0809.txt
Normal file
@@ -0,0 +1,133 @@
Episode: 809
Title: HPR0809: talk geek to me
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0809/hpr0809.mp3
Transcribed: 2025-10-08 02:52:50

---
This is DeepGeek. Welcome to Talk Geek to Me, a voice of the resistance. You are listening to Talk Geek to Me, number 28, Segmented Downloading. Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us. The web page for this program is at www.talkgeektome.us. You can subscribe to me on Identica as the username DeepGeek, or you could follow me on Twitter. My username there is dgtgtm, as in DeepGeek Talk Geek to Me.

Introduction to segmented downloading. First I have to say that this is an old-fashioned technical Talk Geek to Me, not a newscast. So if you are used to my news podcast, you might find it odd to see me revert to my old genre. Consider this an addition to my regular work. I recently started a pilot project with my podcast to facilitate a way of getting large files more efficiently, and it would be odd for me not to explain what this technique is. I think if you bear with me, you will at least learn a new way of doing things that might be better, even if you are left not thinking that it is particularly appropriate to the podcast community.

What I am talking about is segmented downloading. Segmented downloading is a way of getting your file by getting pieces of it from different web servers, which mirror each other with identical content. If BitTorrent comes to mind, then you are following me. It is essentially using full-fledged web servers as if they were BitTorrent seeds. But in order to understand why you would want to do this, you need to understand some things about old-school downloads and some things about BitTorrent before you can understand the why, and then the how, of segmented downloading.

Why not old-school downloads? The traditional way of getting a download completed on the internet might not always be the best way, particularly for bigger files. We are not talking about the picture file embedded in a blog post, nor the blog post text itself. Those are better served with a traditional download. We are talking about files with a minimum of dozens of megabytes in size, but usually 100 megabytes up to CD and DVD ISO file sizes. Think audio over a half hour, movies, software CDs and DVDs. That is what we are talking about.

Let's suppose something like a music podcast with a 50 megabyte file for the sake of an example. Now, a traditional download is to put the podcast on a well-connected web server, and then people who want the file will find it either in a web page or RSS feed, will right-click the link and choose download file in their web browser, and the web browser will begin transferring the file onto their computer. Your browser's download manager will connect to the web server and begin copying the file onto your system, starting at the beginning and getting piece after piece of the file until it reaches the end. You might ask yourself, what is wrong with this? The answer is that if the file is new and desirable and being downloaded by many people at once, the one web server might not be able to keep up with the load. All of a sudden your three megabit per second down DSL connection to the internet is being used at one megabit. Your one minute download might become a three minute download. Now in this case you might not care about the odd two minutes you lose. But what if you like your files in the FLAC format? Now maybe your four minute FLAC music download becomes a 16 minute download. Your favorite CD ISO of a Linux distribution? Maybe your 20 minute download becomes an hour and 15 minute download.
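To put rough numbers on that, as a back-of-the-envelope check using the 50 megabyte example file: 50 megabytes is about 400 megabits, so

    at 3 megabits per second: 400 / 3 ≈ 133 seconds
    at 1 megabit per second:  400 / 1 = 400 seconds

Whatever the exact figures, losing two thirds of your effective bandwidth triples every download time, and the bigger the file, the more those lost minutes hurt.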
Taking a lesson from the BitTorrent crowd, it is interesting to note that the BitTorrent guys have this covered. For extremely popular files, there is nothing like BitTorrent. This is because the file is divided into chunks, and everybody who is a downloader is also an uploader. If people share as much as they download, there is no problem. So what are the basics of BitTorrent? First, the file is broken into chunks. Let's say that they are one megabyte chunks. Therefore, the file consists of 50 chunks. If you have hundreds of people sharing the file, you can grab a chunk here and there and your file will load quickly and efficiently. The group of computers sharing the file is called the swarm. Each computer that is just donating upload bandwidth is called a seed. As long as people don't close their clients as soon as their download is complete, they keep seeding their file and everything goes smoothly. What can go wrong? Well, a hit-and-run downloader may not really share as much as he takes. There is also the situation where the file is not popular enough to get a big sustained following; swarms work great with hundreds of people, not with dozens of people.

And so to the concept of using web servers as seeds. A web server is connected in a way that is designed to handle many people at once, but not hundreds of thousands of people asking for the same file at once. This idea uses multiple web servers to serve a larger number of media downloaders at once: a number of downloaders that need speed to some extent, and more bandwidth than one web server can handle at peak efficiency, but also handling media objects that are not popular enough to have BitTorrent work for them efficiently.

Our example, reworked for segmented downloading. Let's return to our somewhat popular 50 megabyte music file and its bigger 200 megabyte FLAC cousin. If you have cheap shared hosting available to you on a couple of servers, you can upload the files to several servers at once. They will be identical files hosted on several mirrors. Let's say you have server space on each coast of the USA as well as server space in a European country. Now, if you are close to a server, you can still do a traditional download from your nearest server. Nothing in this system stops that. So, if you are on the west coast of the USA, you can still download a copy from the west coast server with your Firefox and still get a somewhat good download. But if you have a really big pipe to the internet, you are not maxing out your connection unless you use segmented downloading. The way you do this is that you would use a segmented download manager like aria2, Axel, wxDownload Fast, or a Windows or Mac program that would do the same thing. So you could, to give an example, open up a terminal window and type aria2, a space, then get one of the URLs from one of the mirrors, copy and paste that, a space, and repeat until you had the word aria2, which is the command, and a space-separated list of the different locations of the same file. In actuality, the command for Axel would be exactly the same, but I am most familiar with aria2, so I will stick to what I know.
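To give a concrete picture of the command being described, with hypothetical mirror URLs standing in for the real ones (note that the aria2 package actually installs its command-line program as aria2c):

    aria2c http://us-east.example.com/episode.ogg \
           http://us-west.example.com/episode.ogg \
           http://europe.example.com/episode.ogg

aria2c treats multiple URLs given on one command line as mirrors of the same file and pulls different pieces from each of them. An Axel invocation would look the same, with axel in place of aria2c.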
Now those of you who are text savvy know about download managers. They follow the Unix philosophy of having one job, which in this case is downloading, and they do it very well. Most people get these programs when they grow concerned with the idea of a big download being interrupted, because these programs are able to talk to the web server and restart a download in the middle. Thus, in a traditional download, if the download were interrupted halfway through, a download manager would later reconnect to the server and say: start in the middle, I got the first half already.
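The mechanism that lets a download manager restart in the middle is the HTTP range request, where the client asks the server for only a byte range of the file. Resuming the hypothetical 50 megabyte file at the halfway mark looks roughly like this on the wire:

    GET /episode.ogg HTTP/1.1
    Host: us-east.example.com
    Range: bytes=26214400-

A server that supports ranges replies with 206 Partial Content and sends only the remaining half. A segmented downloader simply issues many such requests, for different byte ranges, to different mirrors at the same time.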
But a segmented downloader maxes out the situation. In the aria2 case, it first allocates the disk space needed for the whole file, you know, to get that pesky disk space allocation thing out of the way. Then aria2 looks at the 50 megabyte file and thinks: okay, this is really 50 one-megabyte downloads. Then it connects to the first web server and asks for the first megabyte; simultaneously it connects to the second web server and asks for the second megabyte of the file; simultaneously it connects to the third web server and asks for the third megabyte of the file. So far it has acted exactly like its simpler cousin Axel. But aria2 is more sophisticated than Axel. Axel will keep round-robinning the file until it's done. aria2 is more obsessive about its connection to the file. Since aria2 is also a BitTorrent client, it uses its BitTorrent smarts to max things out. While these three downloads are going on, it is rating the servers' performance from its perspective, and then it will use the less loaded servers more, automatically. This behavior will max out your connection to the internet.
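That allocate-then-split behavior maps onto real aria2c options; a sketch, again with hypothetical mirror URLs:

    aria2c --file-allocation=prealloc --split=3 --min-split-size=1M \
           http://us-east.example.com/episode.ogg \
           http://us-west.example.com/episode.ogg \
           http://europe.example.com/episode.ogg

--file-allocation=prealloc reserves the disk space up front, --split=3 permits three simultaneous connections, and --min-split-size=1M sets the smallest piece aria2c will request, mirroring the one-megabyte chunks in the example.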
This situation gets even better if you have a really fat connection, like a fiber-optic FiOS connection or a corporate office T3 connection to the internet. In that case, the web servers in question may not be able, even under the best of conditions, to max out that connection. In this case, the best outputs of the three servers are added to each other. To give you an idea: when I set up the mirrors for my pilot project of making this available for my news podcast, I drew on two web servers for my last web server. Just the other night, each of the first web servers I set up was functioning at about three megabits per second up to the internet. When I went to set up the third mirror image, where I could use aria2 on the server, up there in the cloud, I achieved a whopping six megabits per second transfer. That FLAC file was moved in seconds, a speed not available to traditional tools such as wget. Not that I'm knocking wget; it's just that they can't do it as quickly. So I end this explanation of segmented downloading with the invitation to you to try it out on my news podcast to see if you like it, and if you do, I hope to hear from you.

Thank you for listening to this episode of Talk Geek to Me. Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to DG at deepgeek.us. The web page for this program is at www.TalkGeekToMe.us. You can subscribe to me on Identica as the username DeepGeek or you could follow me on Twitter. My username there is dgtgtm, as in DeepGeek Talk Geek to Me.

This episode of Talk Geek to Me is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license. This license allows commercial reuse of the work, as well as allowing you to modify the work, so long as you share alike the same rights you have received under this license. Thank you for listening to this episode of Talk Geek to Me.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever consider recording a podcast, then visit our website to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored by Lunar Pages. From shared hosting to custom private clouds, go to LunarPages.com for all your hosting needs. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.