Episode: 809 Title: HPR0809: talk geek to me Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0809/hpr0809.mp3 Transcribed: 2025-10-08 02:52:50
---

This is DeepGeek. Welcome to Talk Geek To Me, a voice of the resistance. You are listening to Talk Geek To Me, number 28, Segmented Downloading.

Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us. The web page for this program is at www.TalkGeekToMe.us. You can subscribe to me on Identica under the username DeepGeek, or you can follow me on Twitter, where my username is dgtgtm, as in DeepGeek Talk Geek To Me.

Introduction to segmented downloading.

First, I have to say that this is an old-fashioned technical Talk Geek To Me, not a newscast. So if you are used to my news podcast, you might find it odd to see me revert to my old genre. Consider this an addition to my regular work. I recently started a pilot project with my podcast to facilitate a more efficient way of getting large files, and it would be odd for me not to explain what this technique is. I think if you bear with me, you will at least learn a new way of doing things that might be better, even if you are left not thinking that it is particularly appropriate for the podcast community.

What I am talking about is segmented downloading. Segmented downloading is a way of getting your file by fetching pieces of it from different web servers which mirror each other with identical content. If BitTorrent comes to mind, then you are following me. It is essentially using full-fledged web servers as if they were BitTorrent seeds. But in order to understand why you would want to do this, you need to understand some things about old-school downloads and some things about BitTorrent before you can understand the why, and then the how, of segmented downloading.

Why not old-school downloads?
The traditional way of getting a download completed on the internet might not always be the best way, particularly for bigger files. We are not talking about the picture file embedded in a blog post, nor the blog post text itself; those are better served by a traditional download. We are talking about files that are dozens of megabytes at a minimum, but usually 100 megabytes up to CD and DVD ISO file sizes. Think audio over a half hour, movies, software CDs and DVDs. That is what we are talking about.

Let's suppose something like a music podcast with a 50 megabyte file, for the sake of an example. Now, a traditional download means putting the podcast on a well-connected web server; people who want the file will find it either on a web page or in an RSS feed, will right-click the link and choose "download file" in their web browser, and the web browser will begin transferring the file onto their computer. Your browser's download manager will connect to the web server and begin copying the file onto your system, starting at the beginning and getting piece after piece of the file until it reaches the end.

You might ask yourself, what is wrong with this? The answer is that if the file is new and desirable and being downloaded by many people at once, the one web server might not be able to keep up with the load. All of a sudden, your three megabit per second DSL connection to the internet is being fed at only one megabit per second. Your one-minute download might become a three-minute download. Now, in this case you might not care about the odd two minutes you lose. But what if you like your files in the FLAC format? Now maybe your four-minute FLAC music download becomes a 16-minute download. Your favorite CD ISO of a Linux distribution? Maybe your 20-minute download becomes an hour-and-15-minute download.

Taking a lesson from the BitTorrent crowd, it is interesting to note that the BitTorrent guys have this covered. For extremely popular files, there is nothing like BitTorrent.
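As a back-of-envelope check on the slowdown described above, here is the arithmetic sketched in shell. The file size, line speed, and congested speed are my own illustrative assumptions, not figures verified from the episode:

```shell
# Rough arithmetic for a congested traditional download.
# Assumptions (for illustration): a 50 MB file, a 3 Mbit/s DSL line,
# and an overloaded server giving each client only 1 Mbit/s.
FILE_MBIT=$((50 * 8))   # 50 megabytes expressed in megabits
FULL=3                  # Mbit/s when the server keeps up
CONGESTED=1             # Mbit/s when the server is overloaded

echo "uncongested: $((FILE_MBIT / FULL)) seconds"        # about 2 minutes
echo "congested:   $((FILE_MBIT / CONGESTED)) seconds"   # over 6 minutes
```

The exact numbers matter less than the ratio: a server that can only serve you at a third of your line speed triples your wait.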
This is because the file is divided into chunks, and everybody who is a downloader is also an uploader. If people share as much as they download, there is no problem. So what are the basics of BitTorrent? First, the file is broken into chunks. Let's say that they are one megabyte chunks; therefore, our file consists of 50 chunks. If you have hundreds of people sharing the file, you can grab a chunk here and there, and your file will load quickly and efficiently. The group of computers sharing the file is called the swarm. Each computer that is just donating upload bandwidth is called a seed. As long as people don't close their clients as soon as their download is complete, they keep seeding the file and everything goes smoothly.

What can go wrong? Well, a hit-and-run downloader may not really share as much as he takes. There is also the situation where the file is not popular enough to get a big, sustained following; swarms work great with hundreds of people, not with dozens of people.

And so to the concept of using web servers as seeds: a web server is connected in a way that is designed to handle many people at once, but not hundreds of thousands of people asking for the same file at once. This idea uses multiple web servers to serve a larger number of media downloaders at once; a number of downloaders who need speed to some extent, and more bandwidth than one web server can handle at peak efficiency, but also media objects that are not popular enough for BitTorrent to work for them efficiently.

An example setup for segmented downloading. Let's return to our somewhat popular 50 megabyte music file and its bigger 200 megabyte FLAC cousin. If you have cheap shared hosting available to you on a couple of servers, you can upload the files to several servers at once. They will be identical files hosted on several mirrors. Let's say you have server space on each coast of the USA, as well as server space in a European country.
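Populating the mirrors described above could be as simple as a loop over your hosts. The hostnames and paths here are placeholders I made up for illustration; in real use you would substitute your own servers and run scp or rsync directly rather than echoing the command:

```shell
# Sketch of uploading identical copies to several mirrors.
# Hostnames and paths are hypothetical; the scp command is echoed
# rather than executed, since these hosts do not exist.
MIRRORS="east.example.com west.example.com eu.example.com"
for HOST in $MIRRORS; do
  echo "scp episode28.flac $HOST:public_html/podcast/"
done
```

Once the copies are identical on every mirror, any of their URLs can stand in for any other, which is what segmented downloading relies on.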
Now, if you are close to a server, you can still do a traditional download from your nearest server; nothing in this system stops that. So, if you are on the west coast of the USA, you can still download a copy from the west coast server with your Firefox and still get a reasonably good download. But if you have a really big pipe to the internet, you are not maxing out your connection unless you use segmented downloading.

The way you do this is to use a segmented download manager like aria2, Axel, wxDownload Fast, or a Windows or Mac program that does the same thing. So you could, to give an example, open up a terminal window and type aria2c, then a space. Then you would take one of the URLs from one of the mirrors, copy and paste that, add a space, and repeat until you had the word aria2c, which is the command, followed by a space-separated list of the different locations of the same file. In actuality, the axel command would be exactly the same, but I am most familiar with aria2, so I will stick to what I know.

Now, those of you who are tech savvy know about download managers. They follow the Unix philosophy of having one job, which in this case is downloading, and they do it very well. Most people get these programs when they grow concerned about a big download being interrupted, because download managers are able to talk to the web server and restart a download in the middle. Thus, in a traditional download, if the download were interrupted halfway through, a download manager would later reconnect to the server and say, "start in the middle, I got the first half already."

But a segmented downloader takes this further. In the aria2 case, it first allocates the disk space needed for the whole file, you know, to get that pesky disk space allocation thing out of the way. Then aria2 looks at the 50 megabyte file and thinks, okay, this is really 50 one-megabyte downloads.
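The command line built up in the description above looks like this. The mirror URLs are placeholders of my own invention; the command is assembled and echoed rather than run, since the hosts do not exist:

```shell
# Sketch of the aria2c invocation: one command, several URLs that
# all point at identical copies of the same file (hypothetical hosts).
MIRROR1="http://east.example.com/podcast/episode28.flac"
MIRROR2="http://west.example.com/podcast/episode28.flac"
MIRROR3="http://eu.example.com/podcast/episode28.flac"

# The real invocation would simply be this command, run as-is:
CMD="aria2c $MIRROR1 $MIRROR2 $MIRROR3"
echo "$CMD"
```

aria2c treats every URL on the command line as a source for the same file, which is exactly the "space-separated list of the different locations" the episode describes.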
Then it connects to the first web server and asks for the first megabyte; simultaneously it connects to the second web server and asks for the second megabyte of the file; simultaneously it connects to the third web server and asks for the third megabyte of the file. So far it has acted exactly like its simpler cousin Axel. But aria2 is more sophisticated than Axel. Axel will keep round-robinning through the servers until the file is done. aria2 is more obsessive about its connections. Since aria2 is also a BitTorrent client, it uses its BitTorrent smarts to max things out: while these three downloads are going on, it is rating each server's performance from its perspective, and it will automatically use the less loaded servers more. This behavior will max out your connection to the internet.

This situation gets even better if you have a really fat connection, like a fiber optic FiOS connection or a corporate office T3 connection to the internet. In that case, the web servers in question may not be able, even under the best of conditions, to max out your connection. In this case, the best outputs of the three servers are added to each other.

To give you an idea: when I set up the mirrors for my pilot project of making this available for my news podcast, I drew on two web servers to populate my last web server. Just the other night, each of the first web servers I set up was functioning at about three megabits per second up to the internet. When I went to set up the third mirror, where I could use aria2 on the server itself, up there in the cloud, I achieved a whopping six megabits per second transfer. That FLAC file was moved in seconds, a speed not available to traditional tools such as wget. Not that I'm knocking wget; it just can't do it as quickly.

So I end this explanation of segmented downloading with an invitation to you to try it out on my news podcast and see if you like it. And if you do, I hope to hear from you. Thank you for listening to this episode of Talk Geek To Me.
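The aggregation effect in that anecdote can be checked with simple arithmetic. The per-mirror speed is the roughly three megabits per second mentioned in the episode; the file size is the 200 megabyte FLAC example from earlier:

```shell
# How mirror outputs add up when your own pipe is not the bottleneck.
# Assumption: each mirror sustains about 3 Mbit/s, per the anecdote.
FLAC_MBIT=$((200 * 8))          # the 200 MB FLAC file, in megabits
ONE_SERVER=3                    # Mbit/s from a single mirror
TWO_SERVERS=$((ONE_SERVER * 2)) # outputs add: about 6 Mbit/s

echo "one mirror:  $((FLAC_MBIT / ONE_SERVER)) seconds"
echo "two mirrors: $((FLAC_MBIT / TWO_SERVERS)) seconds"
```

Doubling the number of saturated mirrors halves the transfer time, which is the whole point of segmenting across servers.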
Here are the vital statistics for this program. Your feedback matters to me. Please send your comments to dg at deepgeek.us. The web page for this program is at www.TalkGeekToMe.us. You can subscribe to me on Identica under the username DeepGeek, or you could follow me on Twitter, where my username is dgtgtm, as in DeepGeek Talk Geek To Me.

This episode of Talk Geek To Me is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license. This license allows commercial reuse of the work, as well as allowing you to modify the work, so long as you share alike the same rights you have received under this license. Thank you for listening to this episode of Talk Geek To Me.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then visit our website to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com. All BinRev projects are proudly sponsored by Lunar Pages. From shared hosting to custom private clouds, go to LunarPages.com for all your hosting needs. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.