Episode: 809
Title: HPR0809: talk geek to me
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0809/hpr0809.mp3
Transcribed: 2025-10-08 02:52:50

---

This is DeepGeek. Welcome to Talk Geek to Me, a voice of the resistance. You are listening to Talk Geek to Me, number two-eight, Segmented Downloading. Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us. The web page for this program is at www.talkgeektome.us. You can subscribe to me on Identica as the username DeepGeek, or you could follow me on Twitter. My username there is dgtgtm, as in DeepGeek Talk Geek to Me.

Introduction to segmented downloading. First I have to say that this is an old-fashioned technical Talk Geek to Me, not a newscast. So if you are used to my news podcast, you might find it odd to see me revert to my old genre. Consider this an addition to my regular work. I recently started a pilot project to facilitate a way of getting large files more efficiently with my podcast, but it would be odd for me not to explain what this technique is. I think if you bear with me, you will at least learn a new way of doing things that might be better, even if you are left not thinking that it is particularly appropriate to the podcast community.
What I am talking about is segmented downloading. Segmented downloading is a way of getting your file by getting pieces of it from different web servers which mirror each other with identical content. If BitTorrent comes to mind, then you are following me. It is essentially using full-fledged web servers as if they were BitTorrent seeds. But in order to understand why you would want to do this, you need to understand some things about old-school downloads and some things about BitTorrent before you can understand the why and then the how of segmented downloading.

Why not old-school downloads? The traditional way of getting a download completed on the internet might not always be the best way, particularly for bigger files. We are not talking about the picture file embedded in a blog post, nor the blog post text itself. Those are better served with a traditional download. We are talking about files with a minimum of dozens of megabytes in size, but usually 100 megabytes up to CD and DVD ISO file sizes. Think audio over a half hour, movies, software CDs and DVDs. That is what we are talking about.

Let's suppose something like a music podcast with a 50 megabyte file for the sake of an example. Now, a traditional download is to put the podcast on a well-connected web server, and then people who want the file will find it either in a web page or an RSS feed, right-click the link, and choose to download the file in their web browser, and the web browser will begin transferring the file onto their computer. Your browser's download manager will connect to the web server and begin copying the file onto your system, starting at the beginning and getting piece after piece of the file until it reaches the end.
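In command-line terms, that plain single-server transfer is exactly what a traditional downloader does. As a minimal sketch, with a stand-in URL rather than a real feed address, it is nothing more than:

    # hypothetical traditional download: one file, one web server, start to finish
    wget https://example.com/podcast/episode28.mp3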
You might ask yourself, what is wrong with this? The answer is that if the file is new and desirable and being downloaded by many people at once, the one web server might not be able to keep up with the load. All of a sudden your three megabit per second down DSL connection to the internet is being used at only one megabit. Your one minute download might become a three minute download. Now in this case you might not care about the odd two minutes you lose. But what if you like your files in the FLAC format? Now maybe your four minute FLAC music download becomes a 16 minute download. Your favorite CD ISO of a Linux distribution? Maybe your 20 minute download becomes an hour and 15 minute download.

Taking a lesson from the BitTorrent crowd, it is interesting to note that the BitTorrent guys have this covered. For extremely popular files, there is nothing like BitTorrent. This is because the file is divided into chunks and everybody who is a downloader is also an uploader. If people share as much as they download, there is no problem.
So what are the basics of BitTorrent? First, the file is broken into chunks. Let's say that they are one megabyte chunks; therefore, our file consists of 50 chunks. If you have hundreds of people sharing the file, you can grab a chunk here and there and your file will load quickly and efficiently. The group of computers sharing the file is called the swarm. Each computer that is just donating upload bandwidth is called a seed. As long as people don't close their clients as soon as their download is complete, they keep seeding the file and everything goes smoothly. What can go wrong? Well, a hit-and-run downloader may not really share as much as he takes. There is also the situation where the file is not popular enough to get a big sustained following; swarms work great with hundreds of people, not with dozens of people.

On to the concept of using web servers as seeds. A web server is connected in a way that is designed to handle many people at once, but not hundreds of thousands of people asking for the same file at once. This idea uses multiple web servers to serve a larger number of media downloaders at once: a number of downloaders that need speed to some extent, and more bandwidth than one web server can handle at peak efficiency, but also media objects that are not popular enough to have BitTorrent work for them efficiently.
Our example, reworked for segmented downloading. Let's return to our somewhat popular 50 megabyte music file and its bigger 200 megabyte FLAC cousin. If you have cheap shared hosting available to you on a couple of servers, you can upload the files to several servers at once. They will be identical files hosted on several mirrors.
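Getting those copies up is nothing exotic; any ordinary copy tool will do. As a rough sketch, with made-up hostnames and paths, pushing one episode to three mirrors could be as simple as:

    # hypothetical mirror hosts; one plain copy per server
    scp episode28.mp3 user@mirror-east.example.com:public_html/eps/
    scp episode28.mp3 user@mirror-west.example.com:public_html/eps/
    scp episode28.mp3 user@mirror-eu.example.com:public_html/eps/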
Let's say you have server space on each coast of the USA, as well as server space in a European country. Now, if you are close to a server, you can still do a traditional download from your nearest server; nothing in this system stops that. So, if you are on the west coast of the USA, you can still download a copy from the west coast server with your Firefox and still get a somewhat good download. But if you have a really big pipe to the internet, you are not maxing out your connection unless you use segmented downloading.

The way you do this is that you would use a segmented download manager like aria2, Axel, wxDownload Fast, or a Windows or Mac program that does the same thing. So you could, to give an example, open up a terminal window and type the aria2 command, then a space. Then you would get one of the URLs from one of the mirrors, copy and paste that, add a space, and repeat until you had the command followed by a space-separated list of the different locations of the same file. In actuality, the Axel command would be exactly the same, but I am most familiar with aria2, so I will stick to what I know.
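As a concrete sketch, with placeholder mirror URLs (and noting that on most systems the aria2 binary is actually invoked as aria2c), the finished command line would look something like this, and Axel takes its list of mirrors the same way:

    # one file, three mirrors; the downloader pulls different pieces from each
    aria2c http://mirror-east.example.com/eps/episode28.mp3 \
           http://mirror-west.example.com/eps/episode28.mp3 \
           http://mirror-eu.example.com/eps/episode28.mp3

    # roughly the same idea with axel
    axel http://mirror-east.example.com/eps/episode28.mp3 \
         http://mirror-west.example.com/eps/episode28.mp3 \
         http://mirror-eu.example.com/eps/episode28.mp3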
Now those of you who are tech savvy know about download managers. They follow the Unix philosophy of having one job, which in this case is downloading, and they do it very well. Most people get these programs when they grow concerned with the idea of a big download being interrupted, because download managers are able to talk to the web server and restart a download in the middle. Thus, in a traditional download, if the download were interrupted halfway through, a download manager would later reconnect to the server and say, start in the middle, I got the first half already.
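Even a plain traditional tool can do that resume trick, assuming the web server supports ranged requests. For instance, with a stand-in URL:

    # -c (--continue) picks up a partial download where it left off instead of starting over
    wget -c https://example.com/podcast/episode28.mp3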
But a segmented downloader makes the most of the situation. In the aria2 case, it first allocates the disk space needed for the whole file, you know, to get that pesky disk space allocation thing out of the way. Then aria2 looks at the 50 megabyte file and thinks, okay, this is really 50 one-megabyte downloads. Then it connects to the first web server and asks for the first megabyte; simultaneously it connects to the second web server and asks for the second megabyte of the file; simultaneously it connects to the third web server and asks for the third megabyte of the file. So far it has acted exactly like its simpler cousin Axel. But aria2 is more sophisticated than Axel. Axel will keep round-robinning the file until it's done. aria2 is more obsessive about its connections to the file. Since aria2 is also a BitTorrent client, it uses its BitTorrent smarts to max things out. While these three downloads are going on, it is rating the servers' performance from its perspective, and then it will automatically use the less loaded servers more. This behavior will max out your connection to the internet.
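If you want to see or steer that behavior, aria2 exposes it through command-line options. A hedged example, with values picked purely for illustration and the same placeholder mirrors as before:

    # pre-allocate the file on disk, split it into pieces of at least 1 MiB,
    # and break the download into up to 8 pieces spread across the listed mirrors
    aria2c --file-allocation=prealloc --min-split-size=1M --split=8 \
           http://mirror-east.example.com/eps/episode28.flac \
           http://mirror-west.example.com/eps/episode28.flac \
           http://mirror-eu.example.com/eps/episode28.flac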
This situation gets even better if you have a really fat connection, like a fiber optic connection or a corporate office T3 connection to the internet. In that case, the web servers in question may not be able, even under the best of conditions, to max out that connection. In this case, the best outputs of the three servers are added to each other.

To give you an idea, when I set up the mirrors for my pilot project of making this available for my news podcast, I drew on two web servers for my last web server. Just the other night, each of the first web servers I set up was functioning at about three megabits per second up to the internet. When I went to set up the third mirror image, where I could use aria2 on the server, up there in the cloud, I achieved a whopping six megabits per second transfer. That FLAC file was moved in seconds, a speed not available to traditional tools such as wget. Not that I'm knocking wget; it just can't do it as quickly.

So I end this explanation of segmented downloading with an invitation to you to try it out on my news podcast to see if you like it, and if you do, I hope to hear from you.
Thank you for listening to this episode of Talk Geek To Me. Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us. The web page for this program is at www.TalkGeekToMe.us. You can subscribe to me on Identica as the username DeepGeek, or you could follow me on Twitter. My username there is dgtgtm, as in DeepGeek Talk Geek To Me.

This episode of Talk Geek To Me is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license. This license allows commercial reuse of the work, as well as allowing you to modify the work, so long as you share alike the same rights you have received under this license. Thank you for listening to this episode of Talk Geek To Me.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever considered recording a podcast, then visit our website to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored by Lunar Pages. From shared hosting to custom private clouds, go to lunarpages.com for all your hosting needs. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.