Episode: 3267 Title: HPR3267: Ripping Media 2021 Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3267/hpr3267.mp3 Transcribed: 2025-10-24 19:55:32 --- This is Hacker Public Radio Episode 3267, for Tuesday the 9th of February 2021. Today's show is entitled Ripping Media 2021. It is hosted by operator, is about 17 minutes long, and carries an explicit flag. The summary is: I go over ripping webpage media. This episode of HPR is brought to you by archive.org. Support universal access to all knowledge by heading over to archive.org forward slash donate. Hello and welcome to another episode of Hacker Public Radio with your host, operator. Today is going to be about ripping streaming media, or content, from the internet, from websites. I did an episode on this, episode 126, back in 2008, and I've done some here and there since, maybe quick tips or something. But this is going to be kind of an all-inclusive, all-in-one ripping media episode. At the end of the day, we're talking about websites that have crappy players, or even DRM that you're trying to get around. But we all know that everything comes to you in a digital form anyway, so it's a bit janky to be told, oh, you can't rip this content, even though it's all electronic. DRM is the attempt to keep people from copying copyrighted content. So anyway, this is going to be about pulling mainly videos from the internet, and the different methods to do that. First things first: the easiest by far is youtube-dl. It's youtube-dl; their GitHub got pulled down by the RIAA or something, and I think it's back up now. But the idea is that youtube-dl is not just for YouTube; it will download content from other websites too. It has a generic scraper that will scrape a page and try to pull anything in. So youtube-dl is good.
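To show how simple the youtube-dl case usually is, here's a minimal sketch. The URL is a placeholder, and the command is echoed rather than executed so the sketch is safe to dry-run; the same invocation works on many non-YouTube sites thanks to the generic extractor:

```shell
# Minimal youtube-dl sketch (assumes youtube-dl is on your PATH).
# The URL is a placeholder; youtube-dl's generic extractor means many
# non-YouTube sites work with the exact same invocation.
URL="https://www.youtube.com/watch?v=EXAMPLE"
# -f bestvideo+bestaudio/best : grab the best separate video+audio pair,
#   falling back to the best single file if the site doesn't split streams.
# -o : name the output file after the video title.
YTDL_CMD="youtube-dl -f bestvideo+bestaudio/best -o '%(title)s.%(ext)s' $URL"
echo "$YTDL_CMD"   # echoed instead of run, so this is safe without a real URL
```

Swap in a real URL and run the echoed command directly to actually download.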
I don't know of any browser plugins for youtube-dl, so if you guys know of any plugins that will call youtube-dl, let me know. What I will do in the show notes is put in a link to the youtube-dl downloader script that I use; it will automatically pull all the stuff you need to get going. So let me make sure I've got notes; remind me to go back to that. The original HPR notes are from June of 2019, so it's been a while since I went over this, and I have a bunch of build notes and a bunch of things. The backstory here is that I started out wanting to download a Facebook live stream, and ended up kind of missing it altogether. The idea is that some of these videos are available offline, but if they're live, things can be a little bit tricky to figure out how to pull down. The live video on Facebook looks like it involves MPD files, and it looks like you need ffmpeg compiled with --enable-libxml2. This is where I found out about a wonderful script called media-autobuild_suite. I'll put all this in the show notes, but jb-alvarado has a GitHub repo called media-autobuild_suite, no spaces. On Windows, this will help you compile ffmpeg with a variety of different extra things. Now, it's a little bit of a learning curve if you haven't compiled anything before, and a lot of that has to do with some of the notes that I put in here. I have a really long one-liner to build this out: it downloads the master, extracts it, and then runs the autobuild suite. It took over eight hours to compile on an i7. I disabled some of the features that had warnings that he tagged in the GitHub as useless. You can leave Cygwin; these guys prefer MinGW-w64. I haven't really used it that often — I've used it for a few things, but I'm usually a Cygwin guy. So, Cygwin and MinGW.
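For anyone following along on Linux rather than Windows, a hand-rolled equivalent of what the suite automates looks roughly like this. The flags beyond --enable-libxml2 and the job count are assumptions, not the suite's exact configuration, and the build line is echoed rather than run:

```shell
# Sketch of building ffmpeg with DASH/MPD support via libxml2.
# media-autobuild_suite drives this on Windows; this is the rough
# hand-build shape (flags other than --enable-libxml2 are assumptions).
CONFIGURE_FLAGS="--enable-libxml2 --enable-gpl"
BUILD_CMD="./configure $CONFIGURE_FLAGS && make -j4 && make install"
echo "$BUILD_CMD"
# To check whether an existing ffmpeg binary already has it:
# ffmpeg -version | grep -o -- --enable-libxml2
```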
Well, I apologize while I do some notes here. It's just called MinGW — I don't really know how to pronounce it — and MinGW-w64 is the 64-bit version. What I'll say offhand is that there's actually a nice repo on GitHub for a portable Cygwin, which, let me see if I can find it. Sort of an unrelated-slash-related note. Let me go back to my repositories... cygwin-portable-installer. This is really cool: it installs Cygwin and a bunch of other cool stuff. It's on GitHub by vegardit. So if you're looking for Cygwin and you want it portable, this guy has it all set up magically for you. It's got tabs, it's got a little X-Windows-type terminal thing. Anyway, I digress; getting back on topic. It looks like I had some references here around what flags to use when you're compiling this massive ffmpeg, and then it looks like I renamed some stuff to make some other things work. It's a bit janky. I mean, the basic version of ffmpeg you download — the executable from the internet — has most of what you would need, but in this instance there were things like libxml2 that I needed to add in there, and this suite built everything for me. That's pretty much that. So I've covered youtube-dl, and I've covered that I'm going to share my youtube-dl downloader script. Let's go over the plugins. I use Bulk Media Downloader for Chrome, which pairs nicely with — and I think is developed by the same company as — Turbo Download Manager, 3rd edition. Now, with these two combined — there's also a helper script that I don't know if you still need anymore — Turbo Download Manager is a multi-threaded downloader. Most of the way my youtube-dl downloader script works is that it pulls the file in chunks. What my script does is pull down aria2c, or aria2, which is a multi-threaded command-line downloader.
So you've got wget, for example, or the PowerShell command to download a file. aria2 downloads it in chunks, or threads, and you can send it a list of files and it will download them. You can say, okay, I want a maximum of 500 connections, but only four connections per website, and it will pull those downloads down in parallel, in theory, as long as the server lets you. So with the power of Bulk Media Downloader, youtube-dl, and Turbo Download Manager, you can pull down pretty much everything you need. Outside of that, you can start getting into hairy situations where you just give up and do screen scraping. Or you can Google around for whatever the website is, or if you can figure out what player they use, you can look for other things. If they have DRM in there, there's not really a whole lot you can do. If you're trying to rip NBC or something, they have proprietary codecs where they basically take a regular codec and put a bunch of garbage in the front of the headers and the metadata to mangle it up, so that all your standard scrapers just look past it. And there are encrypted streams and things like that. But at the end of the day, it has to be decrypted and it has to end up rendering in your browser. Now, from what I understand, the Chrome browser has DRM built into it. For example, back in the day — I'm giving you some backstory here — if you had Linux and you wanted to play a movie or a TV show on Amazon, there were instances where it wouldn't work at all, because you did not have this DRM crap in your browser. I don't know if it's the same way today, but if you had Linux and you were running an open-source browser and you tried to watch a movie or whatever, the commercials would load up just fine.
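The aria2c side of that can be sketched like so. The URL list is made up, the connection counts are illustrative (aria2c actually caps connections-per-server at 16, so a "500 connections" total is really spread across many URLs), and the command is echoed rather than run:

```shell
# aria2c sketch: chunked, parallel downloads driven from a URL list.
#   -i urls.txt : read download URLs from a file, one per line
#   -x 4        : max connections per server (the per-website cap)
#   -s 4        : split each file into 4 pieces fetched in parallel
cat > urls.txt <<'EOF'
https://example.com/video1.mp4
https://example.com/video2.mp4
EOF
ARIA_CMD="aria2c -i urls.txt -x 4 -s 4"
echo "$ARIA_CMD"   # echoed, not run, so the placeholder URLs are harmless
rm -f urls.txt
```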
The actual content wouldn't load, because it was missing this proprietary DRM binary that's in Chrome. And this was like five years ago, so maybe things have changed, I don't know. But the idea there is, my approach is normally: I'll use youtube-dl, and feed the URL to my script. It automatically updates youtube-dl and asks if you want to update the binaries and stuff. So — back up first, let me explain what the script does. The script is a batch file that calls a few things. It pulls down wget, I think. It pulls down aria2c, which is the multi-threaded downloader. It pulls down ffmpeg, which is the transcoder, basically, which allows you to pull down a stream and output it to a different format instead of just copying the stream one-to-one, or to combine streams that have multiple parts. For example, a lot of streams have an audio feed and a video feed — you might have a low-quality or high-quality audio feed, and a low-quality or high-quality video feed — and you can specify with ffmpeg: okay, here's where I want you to pull the audio, here's where I want you to pull the video, and mash them together when you're done. That's what ffmpeg will do for you outside of youtube-dl. So, the first thing is I'll get the URL and fire it at my youtube-dl script. It goes and pulls down, I think, wget, ffmpeg, youtube-dl — and makes sure it's up to date — and aria2c for the multi-threaded downloads. Then it brings up a file, and you edit that file with all your URLs in it, save, and escape out. It will continue the script and try to pull down all those URLs, and hopefully you'll end up with a bunch of videos at the end. If youtube-dl won't get it, my next step is to use Chrome's Bulk Media Downloader. This guy will kind of sniff the traffic — if you remember way back, there was a thing called URL Sniffer.
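The audio-plus-video merge described above can be sketched as a plain ffmpeg call. The filenames are placeholders for separately downloaded streams, and the command is only echoed here:

```shell
# ffmpeg merge sketch: copy one video-only stream and one audio-only
# stream into a single container without re-encoding.
VIDEO="video_only.mp4"   # placeholder: the video-only download
AUDIO="audio_only.m4a"   # placeholder: the audio-only download
#   -c copy    : no transcoding, just remux the streams
#   -map 0:v:0 : take the video stream from the first input
#   -map 1:a:0 : take the audio stream from the second input
MERGE_CMD="ffmpeg -i $VIDEO -i $AUDIO -c copy -map 0:v:0 -map 1:a:0 merged.mp4"
echo "$MERGE_CMD"
```

This is the same muxing step youtube-dl performs for you automatically when you ask for `bestvideo+bestaudio`.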
That's essentially what this does: it sniffs media URLs and metadata to pull stuff out. So if ffmpeg or youtube-dl won't pull something down, you can use Bulk Media Downloader to look further in, and maybe you'll see an M3U8 file — which is basically a playlist — that has the content you want to pull down. Other than that, there's not a whole lot you can do outside of running a standard screen scraper. As for the multi-threaded download stuff with Turbo Download Manager: once you've captured a URL or a group of URLs, you can select them all in Bulk Media Downloader and say "copy to" or "send to Turbo Download Manager". It's a bit janky, so I usually copy the URLs individually until I know the pattern, and then I paste them in one by one. Depending on what's going on, sometimes it works, sometimes it doesn't, but in general I keep the URLs just in case I lose them or they get mangled or something. That's pretty much it as far as ripping content; there are a lot of notes about ffmpeg and transcoding there. Really, I don't want to understate the huge amount of work that's been put into youtube-dl, but youtube-dl is basically a scraper that has a bunch of predefined engines to scrape content from websites. It's not doing a whole lot other than pulling down the metadata and the titles and stuff like that. You can probably just use ffmpeg straight up — which I used before youtube-dl, before I knew it could download more than just YouTube files. You can just feed the URL to ffmpeg, with maybe some switches here and there, and maybe a build compiled with some specific encryption types built into it. But at the end of the day, you can pretty much do the same thing with ffmpeg that you can with youtube-dl; youtube-dl just makes it easier. It'll do things like switching based on tags, so you can say, I want the subtitles, and it'll pull down the subtitles.
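Once Bulk Media Downloader surfaces an .m3u8 playlist URL, ffmpeg can usually fetch the playlist and stitch the segments together itself. A sketch, with a placeholder URL, echoed rather than executed:

```shell
# HLS sketch: hand ffmpeg the .m3u8 playlist URL and let it download
# and concatenate the segments. -c copy keeps the original codecs, so
# no re-encoding happens.
PLAYLIST="https://example.com/stream/master.m3u8"
HLS_CMD="ffmpeg -i $PLAYLIST -c copy output.mp4"
echo "$HLS_CMD"   # echoed, not run, since the playlist URL is a placeholder
```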
I want the thumbnail — it'll pull the thumbnail. So yeah, that's some of that. I think that's pretty much it. It's a pretty quick episode, pretty high-level, but really, at the end of the day, my youtube-dl script should get you there. Bulk Media Downloader and Turbo Download Manager should get you there. If not, you're doing screen scraping, and good luck. You can use something like OBS, the open-source broadcasting software, to save stuff like that locally. Other than that, I feel like that's pretty much it. There are a lot of notes about switches I had to flip to get stuff to compile. But anyway, that should pretty much get you guys set. I mean, if you have any questions, or if you have any advanced ways to rip media where youtube-dl, or something like URL Sniffer or Bulk Media Downloader, won't work, please let me know. We're talking DRM-type stuff, like Disney Plus and things like that. I've gotten URLs out of Disney Plus media, but I haven't been able to successfully pull that stuff down, because a lot of the time, when you're pulling this content down with a third-party application, it doesn't have your cookies and browser information built into it. You can send cookies to youtube-dl and ffmpeg, but traditionally it just doesn't work for me, and I haven't had a lot of success in that area. Anyway, if you have advanced stuff on how to rip advanced DRM-type content that's not screen scraping, please let me know, because I'm always interested to find the latest and greatest ways people are pulling content down from the internet. Anyway, take it easy, ladies. You've been listening to Hacker Public Radio at hackerpublicradio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.