Episode: 2787 Title: HPR2787: NodeJS Part 1 Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2787/hpr2787.mp3 Transcribed: 2025-10-19 16:49:34

---

This is HPR episode 2,787 entitled NodeJS Part 1. It is hosted by Operator and is about 10 minutes long and carries a clean flag. The summary is, I don't know non-unuscript 2.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Get your web hosting that's honest and fair at AnHonestHost.com.

Welcome to this episode of Hacker Public Radio with your host, Operator. This is going to be kind of a multi-part series on my adventures in learning a real programming language. I'm going to be talking about Node and my struggles with learning Node on a day-to-day basis, maybe, or a weekly basis. So far, it's been about my second day of hardcore coding.

My background is sed, awk, bash, and curl, and normally I'm able to pull down the content I want using those methods. But more and more, I'm starting to see sites that have complex JavaScript, or they want you to render something in JavaScript, or they have multiple CDNs and you have to go to the CDN and pull that content down, and that has anti-spidering or anti-scraping stuff in it. So I started looking at different things. I've used PhantomJS before, and that's kind of deprecated and has been replaced by some other frameworks. I ended up with Chrome's API, using a thing called Puppeteer. Now, Puppeteer interfaces with Chrome's API, basically, so it's actually rendering the website in the background, and then you can code whatever you need to code and pull out whatever you need to pull out using CSS selectors, basically, or the other kinds of selectors they support. So that's kind of where I'm at.

So I started doing basic stuff in the browser. The great thing about understanding how it all works is that once you understand how the selectors work, it's quite easy to figure out what you want to pull in. That's mainly what I was looking for: a website parser that wasn't impossible to use.

I quickly realized, debugging some things with Puppeteer, that there are delays involved, and there are issues around pulling in content and waiting for that content to come in before you do something with it, obviously. But just inside the browser natively with Chrome, I'm easily able to find something, right click, view the element, see that part of the element, and pull in the parts that I want using the querySelector and querySelectorAll stuff. So that was fairly painless, other than spending half a day figuring out that I had some availability issues with Puppeteer: it wasn't able to select the stuff that it hadn't downloaded yet, essentially.

So I like the idea that I can pull in divs and tables and all that stuff, and pull content in without having to worry about how I got there. I just have to look at the DOM, right click, say "copy selector", and copy the selector out exactly how they get there. So that was cool. That made selecting and pulling down content very easy.

Now the hard part is that I've never used or learned an object-oriented programming language. I did a little bit of Java, spent a couple of weeks doing basic Java stuff, got frustrated, and ended up back at the shell. So with that said, I struggle, and I still struggle, and I'm sure there will be continuing struggles, with JavaScript and Node and Puppeteer, and with how object-oriented programming works, and arrays.
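As a concrete illustration of the in-browser selector experimenting described above, here is a minimal sketch you could run in Chrome's DevTools console. The selectors (#main table, div.price) are hypothetical placeholders, not anything from the episode:

    // Run in Chrome's DevTools console (F12) to try selectors before scripting them.

    // querySelector returns the first matching element, or null if nothing matches.
    document.querySelector('#main table');

    // querySelectorAll returns a NodeList of every match; converting it to an
    // array makes it easy to map the elements down to their text content.
    Array.from(document.querySelectorAll('div.price'))
      .map(el => el.textContent.trim());

Once a selector copied from "copy selector" returns what you expect here, the same string can be reused in a script.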
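And here is a rough sketch of the same idea driven from Node with Puppeteer, including the waiting that the host ran into: querying before the JavaScript-rendered content has arrived is what makes a selector come back empty. The URL and the table tr selector are made-up examples, and waitUntil: 'networkidle2' is just one reasonable way to let the page settle:

    // A minimal Puppeteer sketch: render the page in headless Chrome, wait for
    // the content to exist, then pull it out with a CSS selector.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // Wait until network activity dies down so JavaScript-rendered content is in the DOM.
      await page.goto('https://example.com', { waitUntil: 'networkidle2' });

      // Don't select anything until the element actually exists on the page.
      await page.waitForSelector('table tr');

      // Same querySelectorAll-style selection, but evaluated inside the page.
      const rows = await page.$$eval('table tr', els =>
        els.map(el => el.textContent.trim()));
      console.log(rows);

      await browser.close();
    })();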
So the first thing I'll say is that that part of it was easy. Getting it to work was easy. What's not easy, obviously, is writing the code in this syntax, and if you haven't done any object-oriented programming, then you struggle with that. Another thing I noticed is that, like anything, there are 15 ways to skin a cat. So I go to look at a code repo or whatever website, and there will be 15 different ways to do the same thing, and it takes me ten times as long as anybody else to read that syntax and make it work, because I'm new to programming languages. And then I'll realize that that's not what I actually need or want to do, and I have to backtrack and spend time figuring out what I really need to do and what I really want to do.

So I have some requirements. I really want multithreading. I want to view obfuscated JavaScript and I want to clean up the content of it. I want to test through a proxy server, so I need some kind of proxy support within Node. I'd imagine there's something in there, or something I can call; at the very least I can call proxychains if I have to, if there's no support. My rule is that if it's supported in Node, then I have to use Node, and if I can't find, or ask around for, a way to do whatever I'm trying to do in Node, then I will drop to the shell and do it in shell. But I'm forcing myself to do everything through Node.

So I struggled most recently with kind of taking two steps forward and four steps back. I was able to pull the content down, I was able to regex it out, I was able to replace and find what I needed to find. But on the last step, I have a .search, and that .search function returns an array. I had everything as a string, and then all of a sudden it switches to an array, and now I want to take that array and dump it to a file, and I'm having to switch back and forth between syntaxes. So I went from string, string, string, string, string, and the only way I know how to grep anything out within Node so far is to use this search function. And now that I have this search function, it returns an object, so now I have an array and I have to deal with that mess. And I didn't want an array, I wanted a string; I want to keep everything as strings. There might be another function that I don't know about yet. But I'm struggling with the whole thing: I have my content the way I want it, and then all of a sudden I do a regex or a grep, and I go, OK, I got my content the way I want it, and oh look, now it's an array and it's got all these commas in it, and it's a different kind of variable, a different kind of data set. So that's fairly frustrating for me right now, and I'm working through that.
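On the string-versus-array frustration above, a guess, since the exact code isn't in the episode: the host likely means String.prototype.match, because String.prototype.search only returns the index of the first match, not the matches themselves. A minimal sketch of grepping matches out of a string and getting back to a flat file, with a made-up regex that pulls ip:port pairs:

    const fs = require('fs');

    // Hypothetical input: some blob of text already pulled down from a page.
    const text = 'alpha 10.0.0.1:8080 beta 10.0.0.2:3128 gamma';

    // .search() returns a number (the index of the first match);
    // .match() with the /g flag returns an array of every match, or null.
    const hits = text.match(/\d{1,3}(?:\.\d{1,3}){3}:\d+/g) || [];

    // Join the array back into one newline-separated string before writing it out.
    fs.writeFileSync('proxies.txt', hits.join('\n') + '\n');

The join is the piece that gets you from the array back to the plain string you started with.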
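On the proxy requirement mentioned earlier: this isn't confirmed in the episode, but one way to get proxy support without dropping to proxychains is to pass Chromium's --proxy-server switch when launching Puppeteer. A rough sketch, assuming a plain HTTP proxy with no authentication; the address is a made-up placeholder:

    // A sketch of routing Puppeteer's traffic through a proxy via Chromium's
    // --proxy-server switch (hypothetical proxy address).
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({
        args: ['--proxy-server=127.0.0.1:8080'],
      });
      const page = await browser.newPage();
      await page.goto('https://example.com', { waitUntil: 'networkidle2' });
      console.log(await page.content()); // the rendered HTML, after JavaScript has run
      await browser.close();
    })();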
This is basically to replace my proxy script. I have a script that sets up curl, with some use of proxychains, and js, which is SpiderMonkey from way back. It would process the JavaScript and do things like cookies and referer checking and all that good stuff, to pull down content that wasn't supposed to be pulled down, or wasn't supposed to be scraped, from these proxy websites. Then it would take those proxies, check them, check them again, and output the ones that weren't good. And then after that, it would take the ones that were good, pipe those through proxychains, and see which ones actually worked through proxychains. Then it would export a configuration file for proxychains, for people to use with proxychains.

It's not really relevant today. I mean, you've got to learn other things like that. But there are a lot of places that block Tor, and there are a lot of proxies out there that are slow. But yet, if you need to get to some content that's being blocked when you're coming through a traditional Tor node, then you can use that, right? If you don't have a VPN, you can use that. But mainly it's just something I found interesting: looking at these proxies and seeing where they are, and where they come from, and how they work. So I'm going to be replacing all of this with Node. And the reason I'm able to pull down the content that I'm not supposed to be able to scrape is that we're using the Chrome API and we're pulling down everything we need to pull down, and we're not having to worry about the obfuscated JavaScript stuff.

So that's where I'm at. I'll say that I haven't actually learned a language yet. This is my first kind of attempt, well, one of several attempts at learning a language, but it's the first attempt that I've kind of gone down the rabbit hole on and spent a couple of days on. Eventually I'll get to the point where I'll have those core functions, and I'll understand arrays and I'll understand objects, and I'll be able to move and navigate around and know it a little bit better once I figure all that out. But for now, I'm fairly frustrated, because I know what I want to do, right? And I know how to do it in shell, but I can't do it in Node, because I don't know what the function is, or I don't know how to call it. And then I do end up finding a function, but it only does half of what I wanted to do. I want to add some extra feature or extend that function, whether it's a regex or a grep or whatever, and I can't do that, because it doesn't support that, or I don't know how to get the syntax right for it to do that.

So anyways, that's kind of just more of a rant, if anything. But I'll have some more useful stuff, and I'll have a Google Doc about my trials and tribulations, about what I'm going through, and help along the way for people making that transition from a shell environment to a real programming language like Node, right? Anyways, hope that helps somebody, and we'll get some more helpful stuff as we release the rest of this series or whatever.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.