Episode: 809
Title: HPR0809: talk geek to me
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0809/hpr0809.mp3
Transcribed: 2025-10-08 02:52:50
---
This is DeepGeek. Welcome to Talk Geek to Me, a voice of the resistance. You are listening
to Talk Geek to Me, number 28: segmented downloading. Here are the vital statistics
for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us.
The web page for this program is at www.talkgeektome.us. You can subscribe to me on Identica as the
username DeepGeek or you could follow me on Twitter. My username there is dgtgtm, as
in DeepGeek Talk Geek to Me. Introduction to segmented downloading. First I have to say
that this is an old-fashioned technical Talk Geek to Me, not a newscast. So if you are
used to my news podcast, you might find it odd to see me revert to my old genre. Consider
this an addition to my regular work. I recently started a pilot project to facilitate
a way of getting large files more efficiently with my podcast, and it would be odd for me
not to explain what this technique is. I think if you bear with me, you will at least learn
a new way of doing things that might be better, even if you are left not thinking that
it is particularly appropriate to the podcast community.
What I am talking about is segmented downloading. Segmented downloading is a way of getting your
file by getting pieces of your file from different web servers, which mirror each other
with identical content. If BitTorrent comes to mind, then you are following me. It is essentially
using full-fledged web servers as if they were BitTorrent seeds. But in order to understand
why you would want to do this, you need to understand some things about old school downloads
and some things about BitTorrent before you can understand the why and then the how of
segmented downloading. Why not old school downloads? The traditional way of getting a download
completed on the internet might not always be the best way, particularly for bigger files.
We are not talking about the picture file embedded in a blog post nor the blog post text
itself. Those are better served with a traditional download. We are talking about files with a minimum
of dozens of megabytes in size, but usually from 100 megabytes up to CD and DVD ISO file sizes. Think
audio over a half hour, movies, software CDs and DVDs. That is what we are talking about.
Let's suppose something like a music podcast with a 50 megabyte file for the sake of an
example. Now, a traditional download is to put the podcast on a well-connected web server and
then people who want the file will find it either in a web page or RSS feed and will right-click
the link and choose download file in their web browser and the web browser will begin transferring
the file onto their computer. Your browser's download manager will connect to the web server
and begin copying the file onto your system, starting at the beginning and getting piece
after piece of the file until it reaches the end. You might ask yourself, what is wrong
with this? The answer is that if the file is new and desirable and being downloaded by many
people at once, the one web server might not be able to keep up with the load. All
of a sudden your three megabit per second down DSL connection to the internet is only being
used at one megabit per second. Your one minute download might become a three minute download. Now in this
case you might not care about the odd two minutes you lose. What if you like your files
in the FLAC format? Now maybe your four minute FLAC music download becomes a 16 minute
download. Your favorite CD ISO of a Linux distribution, maybe your 20 minute download becomes
an hour and 15 minute download. Taking a lesson from the BitTorrent crowd, it is interesting
to note that the BitTorrent guys have this covered. For extremely popular files, there
is nothing like BitTorrent. This is because the file is divided into chunks and everybody
who is a downloader is also an uploader. If people share as much as they download, there
is no problem. So what are the basics of BitTorrent? First, the file is broken into chunks. Let's
say that they are one megabyte chunks. Therefore, the file consists of 50 chunks. If you have
hundreds of people sharing the file, you can grab a chunk here and there and your file
will load quickly and efficiently. The group of computers sharing the file is called the
swarm. Each computer that is just donating upload bandwidth is called a seed. As long
as people don't close their clients as soon as their download is complete, they keep
seeding their file and everything goes smoothly. What can go wrong? Well, a hit-and-run downloader
may not really share as much as he takes. There is also the situation where the file is not
popular enough to get a big sustained following: swarms work great with hundreds of people,
not with dozens of people. And so to the concept of using web servers as seeds. A web server
is connected in a way that is designed to handle many people at once, but not hundreds
of thousands of people asking for the same file at once. This idea uses multiple web servers
to serve a larger number of media downloaders at once, a number of downloaders that need
speed to some extent and more bandwidth than one web server can handle at peak efficiency,
but also handles media objects that are not popular enough to have BitTorrent work for
them efficiently. Our example, reworked for segmented downloading. Let's return to
our somewhat popular 50 megabyte music file and its bigger 200 megabyte FLAC cousin.
If you have cheap shared hosting available to you on a couple of servers, you can upload
the files to several servers at once. They will be identical files hosted on several mirrors.
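A sketch of that upload step, assuming hypothetical mirror hostnames and paths, with rsync as the copy tool:

    # push the identical file to every mirror so the copies stay in sync
    for host in us-west.example.com us-east.example.com eu.example.com; do
        rsync -av show28.mp3 user@$host:/var/www/podcast/
    done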
Let's say you have server space on each coast of the USA as well as server space in a
European country. Now, if you are close to a server, you can still do a traditional
download at your nearest server. Nothing in this system stops that. So, if you are on
the west coast of the USA, you can still download a copy from the west coast server with
your Firefox and still get a somewhat good download. But if you have a really big pipe
to the internet, you are not maxing out your connection unless you use segmented downloading.
The way you do this is that you would use a segmented download manager like aria2, axel,
wxDownload Fast, or a Windows or Mac program that would do the same thing. So you could,
to give an example, open up a terminal window and type aria2c, then a space. Then you would get one
of the URLs from one of the mirrors, copy and paste that, add a space, and repeat until you
had the word aria2c, which is the command, and a space-separated list of the different locations
of the same file. In actuality, the axel command would be exactly the same, but I am
most familiar with aria2, so I will stick to what I know.
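A minimal sketch of what that finished command line might look like, assuming three hypothetical mirror URLs (aria2c treats multiple URIs given for one file as mirrors of the same download):

    aria2c http://us-west.example.com/podcast/show28.mp3 \
           http://us-east.example.com/podcast/show28.mp3 \
           http://eu.example.com/podcast/show28.mp3

Handing axel the same space-separated list of URLs works the same way.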
Now those of you who are text savvy know about download managers. They follow the Unix
philosophy of having one job, which in this case is downloading, and they do it very well.
Most people get these programs when they grow concerned about the idea of a big download
being interrupted, because these programs are able to talk to the web server and restart a download
in the middle. Thus, in a traditional download, if the download were interrupted halfway
through, a download manager would later reconnect to the server and say: start in the middle,
I got the first half already.
But a segmented downloader takes this same trick to its extreme. In the
aria2 case, it first allocates the disk space needed for the whole file, you know, to get
that pesky disk space allocation thing out of the way. Then aria2 looks at the 50 megabyte
file and thinks: okay, this is really 50 one-megabyte downloads. Then it connects to the first
web server and asks for the first megabyte; simultaneously it connects to the second web server and
asks for the second megabyte of the file; simultaneously it connects to the third web server and
asks for the third megabyte of the file. So far it has acted exactly like its simpler cousin
axel. But aria2 is more sophisticated than axel. Axel will keep round-robinning the file until
it's done. aria2 is more obsessive about its connection to the file. Since aria2 is also
a BitTorrent client, it uses its BitTorrent smarts to max things out. While these three
downloads are going on, it is rating the servers' performance from its perspective, and
it will automatically use the less loaded servers more. This behavior will max out your connection
to the internet. This situation gets even better if you have a really fat connection, like
a fiber-optic FiOS connection, or a corporate office T3 connection to the internet. In
that case, the web servers in question may not be able, even under the best of conditions,
to max out that connection. In this case, the best outputs of the three servers are added
to each other.
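To put rough numbers on that, a sketch with the same hypothetical mirrors: a single-source tool is capped at one server's output, while the segmented fetch adds the servers together.

    # one connection to one server; resumable with -c, but capped at
    # that server's roughly three megabits per second
    wget -c http://us-west.example.com/podcast/show28.flac

    # three servers at roughly three megabits per second each can approach
    # nine megabits per second, provided your own connection can absorb it
    aria2c http://us-west.example.com/podcast/show28.flac \
           http://us-east.example.com/podcast/show28.flac \
           http://eu.example.com/podcast/show28.flac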
To give you an idea, when I set up the mirrors for my pilot project of
making this available for my news podcast, I drew on two web servers to seed my last web server.
Just the other night, each of the first web servers I set up was delivering about
three megabits per second up to the internet. When I went to set up the third mirror,
where I could use aria2 on the server, up there in the cloud, I achieved a whopping six
megabits per second transfer. That FLAC file was moved in seconds, a speed not available
to traditional tools such as wget; not that I'm knocking wget, it just can't do it
as quickly. So I end this explanation of segmented downloading with the invitation
to you to try it out on my news podcast to see if you like it, and if you do, I hope
to hear from you. Thank you for listening to this episode of Talk
Geek To Me. Here are the vital statistics for this program. Your feedback matters to
me. Please send your comments to DG at deepgeek.us. The web page for this program is at www.TalkGeekToMe.us.
You can subscribe to me on Identica as the username DeepGeek or you could follow me on Twitter.
My username there is DGTGTM, as in DeepGeek Talk Geek To Me.
This episode of Talk Geek To Me is licensed under the Creative Commons Attribution-ShareAlike
3.0 Unported license. This license allows commercial reuse of the work as well as allowing
you to modify the work, so long as you share alike the same rights you have received under this
license. Thank you for listening to this episode of Talk Geek To Me.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy it
really is. Hacker Public Radio was founded by the Digital Dog Pound and the
Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com.
All binrev projects are proudly sponsored by Lunarpages.
From shared hosting to custom private clouds, go to LunarPages.com for all your hosting needs.
Unless otherwise stated, today's show is released under a Creative Commons
Attribution-ShareAlike 3.0 Unported license.