
Episode: 4112
Title: HPR4112: JSON and VENDORS and AUTH ohh my!
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4112/hpr4112.mp3
Transcribed: 2025-10-25 19:44:06
---
This is Hacker Public Radio episode 4,112 for Tuesday the 7th of May 2024.
Today's show is entitled, JSON and VENDORS and AUTH ohh my!
It is the 110th show of operator, and is about 21 minutes long.
It carries an explicit flag.
The summary is, I talk and rant about JSON and vendors.
Hello everyone, and welcome to another episode of Hacker Public Radio with your host, operator.
So, a pre-dinner pro tip: if you have ADD, the only way to make a meal is
to pre-make it, because I can't seem to get a meal ready exactly when it's supposed to be
ready.
Anyways, that's not what we're talking about today.
We're talking about JSON, in response to a recent episode I just listened to.
And I've worked with it a lot, right?
It's pretty much the de facto standard for web applications and for processing any kind of data.
It's so prevalent that, and this is sort of a rant, all of your web apps are essentially
just JSON apps sitting on top of APIs.
I would say more than half of every web page you're on is probably some
JSON app, and any kind of service-based commercial tool is all JSON.
The problem with that is when you're not a programmer and the API the company
gives you access to is, for lack of a better term, dog shit, purely because companies
don't have interoperability in mind, because that's how capitalism works and where we
are in the current software space.
So if I make a tool, a security tool specifically, or even an IT tool, and it has a feature, and
someone else has a different tool with a different feature, we're going to try
to eat each other at some point.
That's just how software works, or at least how security software works.
So the problem with that is we bought all this security software and then we said, oh my
gosh, we have all these tools and they can't talk to each other.
So we're going to call it SIEM, or I'm sorry, SOAR. SIEM is a different piece,
which I believe in, but I don't really believe in SOAR.
Anyways, this is sort of a rant on applications and web applications and vendors, and also
a call for help on JSON.
What I do is parse maybe 100 megs of JSON at a time, or I'll be working
with something that takes JSON as input.
For example, Splunk's HTTP Event Collector takes a certain type of input.
When you send something to Splunk through the event collector, it expects it in a
certain format.
From what I can tell, you can't just throw anything at it; sometimes you can, I don't know.
But in general, it has to have a specific format with essentially extra headers.
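A minimal sketch of that "specific format", in Python since that's what operator works in. It assumes Splunk's JSON event endpoint (`/services/collector/event`), where each event is wrapped in an envelope with an `event` key plus optional metadata such as `sourcetype` and `host`, and several envelopes can go in one POST by concatenating them rather than wrapping them in a JSON array. The token travels in an `Authorization: Splunk <token>` header. The function name and defaults are illustrative, and the actual POST is left out.

```python
import json


def hec_batch(events, sourcetype="_json", host=None):
    """Wrap each event dict in a HEC-style envelope and concatenate them.

    Assumption: the target is the /services/collector/event endpoint, which
    accepts several envelopes in one body when they are simply concatenated.
    """
    lines = []
    for event in events:
        envelope = {"event": event, "sourcetype": sourcetype}
        if host is not None:
            envelope["host"] = host
        # Compact separators keep each envelope on one line, no stray newlines.
        lines.append(json.dumps(envelope, separators=(",", ":")))
    return "\n".join(lines)


# Hypothetical usage: POST this string with the Splunk token header attached.
payload = hec_batch([{"port": 22, "state": "open"}], host="scanner01")
```

Building the envelope in code, instead of by hand, means every event gets the same wrapper no matter how many there are.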
And what I have a problem with, and maybe somebody can help me, is analyzing JSON
in general: finding and solving syntax errors, key-value pair errors, list
or dictionary errors.
Say there's a big blob of JSON that's improperly formatted: maybe it's
got a newline in it, or there's a missing comma, or maybe it's a dictionary inside
of a dictionary, but the inner dictionary's not closed out properly on each line.
That's hard when a human looks at it, right?
I might be answering my own question here, but large language models
can sometimes help me parse that stuff out.
And other tools, like online JSON beautifiers, can help you parse stuff out too.
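For the "tell me which line is broken" part, Python's standard `json` module already reports where parsing failed: `json.JSONDecodeError` carries `lineno`, `colno` and `msg`. A small sketch that turns that into something readable, with the offending line included (the helper name is made up):

```python
import json


def find_json_error(text):
    """Return None if text parses, else (line, column, message, line_text)."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # lineno counts '\n' characters, so split on '\n' to match it exactly.
        line_text = text.split("\n")[err.lineno - 1]
        return err.lineno, err.colno, err.msg, line_text


broken = '{\n  "a": 1,\n  "b": 2\n  "c": 3\n}'
print(find_json_error(broken))  # points at line 4, where a comma is missing
```

Read a whole exported file into a string and call it once; it stops at the first problem, so fix that and rerun to find the next one.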
But what happens is, when I create JSON output, or a piece of software
creates JSON output, it's consistent for that one time it's created.
But say I'm doing 10,000 requests, or 10,000 posts, and I'm concatenating them all together
and trying to encapsulate them properly: how do I do that
at scale, idiot-proof, without having to manually process or manually
look at each piece of JSON I'm receiving and sending? It's getting annoying
every time I want to work with JSON, and I do, because I make my own APIs for a lot of
security tools, because most of them are awful.
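One way to make the "glue 10,000 responses together" step idiot-proof is to parse every piece before it goes in and write one compact document per line, the convention usually called JSON Lines or NDJSON. This is a sketch of that swapped-in approach, not a fix for any particular tool; the function name is hypothetical.

```python
import json


def merge_to_ndjson(raw_docs):
    """Validate raw JSON strings and emit one compact document per line.

    Returns (ndjson_text, rejects), where rejects lists (index, error) for
    inputs that did not parse, so one bad response can't corrupt the batch.
    """
    good, rejects = [], []
    for i, raw in enumerate(raw_docs):
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError as err:
            rejects.append((i, str(err)))
            continue
        # Re-serializing escapes any newlines inside strings, so each record
        # is guaranteed to occupy exactly one line.
        good.append(json.dumps(doc, separators=(",", ":")))
    return "\n".join(good) + ("\n" if good else ""), rejects
```

If the receiver wants a single JSON document instead, `json.dumps([json.loads(r) for r in raw_docs])` gives a proper array; the line-per-record form is easier to split, sample, and retry.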
The reason they're awful, again, is that interoperability is just, by default, not
a thing.
And people also abuse APIs, because they're dumb.
So they'll do something like a SELECT *, or they'll run a massive
query that will break the back end.
That's where you get the limits on every API: oh, we'll give you the
first 5,000 results, and after that you have to page, and there are probably other tools
to help me with that too.
A lot of them have paging built in, so they'll send you the URL to the next page. For example,
Microsoft Defender will send you the URL, with the cookie you have to pass, to get the next page
if you're using their API.
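Following those next-page URLs is the same loop for every vendor that provides them. A sketch, assuming OData-style responses like Microsoft Graph's, where the page's results sit under `value` and the next URL under `@odata.nextLink`. The caller supplies `fetch_page` (for example, a `requests` call with auth attached); the helper itself is hypothetical.

```python
def fetch_all(first_url, fetch_page, next_key="@odata.nextLink",
              items_key="value", max_pages=10_000):
    """Collect items from every page by following server-provided next links.

    fetch_page(url) must return the decoded JSON body for that URL.
    The default keys assume an OData-style API such as Microsoft Graph.
    """
    items, url, pages = [], first_url, 0
    while url:
        if pages >= max_pages:
            raise RuntimeError("too many pages; is the next link looping?")
        page = fetch_page(url)
        items.extend(page.get(items_key, []))
        url = page.get(next_key)  # absent on the last page, which ends the loop
        pages += 1
    return items
```

The `max_pages` cap is there because a buggy or looping next link would otherwise spin forever.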
But the problem, again, is that Microsoft also has APIs that don't exist in their Defender
API or Graph API, or anywhere else in their stack.
They want you using their platform, they want you in their UI,
not doing automation and not using your own tools.
So there's an API call in the web UI that does exactly what I want, and then I go
to look for that API call in the official documentation, and of course not only is it not there,
but all the useful API queries are beta.
You run into this across the board. CrowdStrike APIs: completely useless, and it costs $30,000 just
to get your data out in some form you can push somewhere else to be filtered.
So if you're like us, you get 3 million events every two days of people opening PDF documents,
and I don't need that.
I don't need 50 million heartbeats and 70 million PDF documents opening today.
Maybe I need a sample of that, or whatever.
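Once the events are in hand, the "keep a sample of the noise" idea is only a few lines. A sketch, assuming each event is a dict with an `event_type` field (a hypothetical name; use whatever the vendor's schema actually calls it): normal events pass through, and noisy types like heartbeats keep only a random fraction.

```python
import random


def thin_events(events, noisy_types, sample_rate=0.01, seed=None):
    """Yield every normal event, but only a random sample of noisy ones.

    Assumes each event is a dict with an 'event_type' key (hypothetical).
    """
    rng = random.Random(seed)  # seedable, so a run can be reproduced
    for event in events:
        if event.get("event_type") in noisy_types:
            if rng.random() < sample_rate:
                yield event
        else:
            yield event


# Hypothetical usage: keep 1% of heartbeats and PDF opens, everything else.
# kept = list(thin_events(raw, {"Heartbeat", "PdfOpened"}, sample_rate=0.01))
```

Because it's a generator, it can sit between the API reader and the uploader without holding millions of events in memory.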
My ability to filter that out is lacking, because I have to create my own APIs, bypassing
or including MFA, or some kind of... my rice, my rice is still hard. So I have brown
rice, but it's not quite brown rice, anyways. So I make my own APIs, and oftentimes they
are hacky, but they get the job done. And then when the vendor changes
the UI, of course, everything stops working,
which, you know, isn't great.
So the real question is, even if we did cough up the 30 grand to get the data into another platform,
wouldn't we have the same problem in a different platform?
So, things like single sign-on, right?
I want to create a Ping authentication plugin, or a central module
for Python, so that I can authenticate to any of our internal apps and create APIs that have
MFA, or Ping auth, attached, and create APIs for every single thing that I want to.
For example, we have hoteling, right?
That'd be the first use case; it'd be nice to solve.
Hoteling is when you book a cube, and, you know, sometimes a... well, I have a proof-of-concept
API that I built that will let me book my chair, broadly or automatically. I have an array
of every single Tuesday and Wednesday of every week, and I send that along with my cookie.
When I test cookies and their lifetimes, what I like to do is create a cookie with a
non-expiring date, if that's possible, using the web UI and something like
works week.
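Building that "every Tuesday and Wednesday" array is plain date arithmetic. A sketch with the standard library; the booking request itself is vendor-specific and not shown, and the helper name is made up.

```python
from datetime import date, timedelta


def booking_days(start, weeks, weekdays=(1, 2)):
    """List every date in the next `weeks` weeks whose weekday is in `weekdays`.

    date.weekday() numbers Monday as 0, so (1, 2) means Tuesday and Wednesday.
    """
    end = start + timedelta(weeks=weeks)
    day, out = start, []
    while day < end:
        if day.weekday() in weekdays:
            out.append(day)
        day += timedelta(days=1)
    return out


# Starting Monday 6 May 2024, two weeks gives May 7, 8, 14 and 15.
days = booking_days(date(2024, 5, 6), 2)
```

Each date would then go into whatever request body the hoteling system expects, usually formatted with `day.isoformat()`.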
So I'll create that cookie, and maybe it is a non-expiring cookie: it never
expires, or I don't get an expiration date.
Well, what happens after 30 days, or two weeks, or a couple of weeks, whatever?
That cookie tends to die at some point, anywhere from, I want to say, a couple
of weeks to three weeks, even if I just hammer away at it every minute, because I don't
care. Even at that speed, every 59 seconds, I still lose my session.
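Something worth knowing while testing this: the `Expires` and `Max-Age` attributes only tell the browser how long to keep the cookie. The server usually tracks the session separately and can end it whenever its own policy says so, which would fit a "non-expiring" cookie still dying after a couple of weeks. To check what the server actually claims, here is a sketch using the standard library's `http.cookies` to read a `Set-Cookie` value (the header text in the example is made up):

```python
from http.cookies import SimpleCookie


def cookie_lifetimes(set_cookie_header):
    """Report what each cookie in a Set-Cookie value says about its lifetime.

    This only reflects the client-side attributes; the server's own session
    timeout is invisible here.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    report = {}
    for name, morsel in jar.items():
        # Morsel attributes default to "" when the server didn't send them.
        if morsel["max-age"]:
            report[name] = f"max-age={morsel['max-age']}s"
        elif morsel["expires"]:
            report[name] = f"expires {morsel['expires']}"
        else:
            report[name] = "session cookie (no expiry sent)"
    return report


print(cookie_lifetimes("sid=abc123; Path=/; Max-Age=3600; HttpOnly"))
```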
So what I want, right, is Ping authentication, so I can authenticate to that service using
regular password credentials without MFA and automate the whole thing.
Then the next step would be to do the same thing for CrowdStrike,
the same thing for Splunk, the same thing for any kind of web interface like Microsoft.
Essentially, whatever I can do with a browser, I could
do with my own API.
So if anybody knows any way to troubleshoot JSON, outgoing and incoming, parse it, and tell
you where the problems are and on what lines, let me know. I had a 56 meg Nmap conversion from XML
to JSON. There's a GitHub project that converts Nmap XML to JSON, something like extreme parser,
extreme Nmap parser; it eats XML and spits out more or less usable JSON that you can
parse further and send to Splunk.
Problem: it's like a 50 meg JSON file, and I posted this last month and it didn't go through,
I don't know why. So I go back and check, I run the commands and export it out: fine,
looks fine, no parsing errors, nothing unhappy. Go to push, do the post: no data,
error code 8, whatever. Oh great, my syntax is broken, something's broken, the whole
script is broken, I don't know what's going on.
So I pick one line of the JSON and push it through: perfectly fine, no problem. I pick a
random shuffle, `shuf -n 20`, grab 20 random lines and push those to Splunk using curl, and
it's instantaneous, we're talking milliseconds, not even seconds. Works fine.
So then I went, okay, that's weird. I try to push the whole file: nothing. Try to push one
line of the file: perfectly fine. Ten lines, or five lines, whatever I picked: perfectly fine.
And then I go to push the 50 megs again, and it works perfectly fine. Why? I don't know,
don't ask me why.
Now I start to second-guess myself. Do I need to create a loop that will go through all
10,000 events, 10,000 lines, or 100,000 lines, and post them one at a time? Then I've got
10,000 posts to the server, and that's not efficient; nobody wants to do that. So now I'd
have to do error checking to make sure each one passed, have some kind of threshold, and
maybe error out to a log file somewhere, or push an email alert, which I'm not going to do.
I'm just going to check the dates on my imports manually when I do my scripting.
All this to say, JSON is annoying.
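A middle ground between one 50 meg POST and 10,000 single-line POSTs is to send the file in size-capped batches and record which batches fail, so only those need a second look (or a split into smaller pieces). A sketch, assuming newline-delimited records and a caller-supplied `post(body)` that returns True on success; the 1 MB cap is an arbitrary placeholder, not a Splunk limit, and both helpers are hypothetical.

```python
def batched(lines, max_bytes=1_000_000):
    """Group newline-delimited records into chunks no bigger than max_bytes."""
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the joining newline
        if chunk and size + n > max_bytes:
            yield "\n".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield "\n".join(chunk)


def push_all(lines, post, max_bytes=1_000_000):
    """Send each chunk with post(body) -> bool; return indexes of failed chunks."""
    failed = []
    for i, body in enumerate(batched(lines, max_bytes)):
        if not post(body):
            failed.append(i)
    return failed
```

A single record larger than `max_bytes` still goes out alone in its own chunk rather than being dropped.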
It's nice once you get it into Python and you can manipulate it: you can push and pull
it, pull out key-value pairs, and search strings inside of key-value pairs. It's a beautiful
thing for doing some of that stuff. But when it comes to a 50 meg file that has syntax
errors, or you're pushing to some API and you have to fire up Burp Suite to figure out
how those requests are coming in and rewrite the whole thing, building your own APIs
gets difficult. Maybe there's a carriage return in there. I've seen APIs get angry at
newline characters, or newline and return, \r\n.
I think of it as "registered nurse": backslash r is the carriage return and backslash n
is the newline, so RN, registered nurse. That's how I always remember it.
Traditionally, Windows uses \r\n and Unix uses just \n, and there are programs you can
use, like dos2unix and unix2dos, that will convert those characters. I think they also do
some other things, like pulling out some weird Unicode characters, so they fix some
other stuff besides newlines and carriage returns.
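Between JSON tokens, `\r` and `\n` are legal whitespace, so a strict parser usually won't choke on CRLF there. The trouble tends to come from APIs or line-based ingest that split on `\n` and keep a stray `\r`, or from a raw newline landing inside a string value, which JSON forbids unless it's escaped. A rough Python stand-in for the common dos2unix case, CRLF to LF plus dropping a UTF-8 byte order mark, is below; it is not a full replacement for the tool.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to LF and strip a leading UTF-8 BOM."""
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte order mark
        data = data[3:]
    return data.replace(b"\r\n", b"\n")


# Typical use on a file before splitting it into lines:
# with open("export.json", "rb") as f:
#     clean = to_unix_newlines(f.read()).decode("utf-8")
```

Working on bytes keeps the conversion independent of whatever encoding guesses happen later.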
But I've seen APIs where I'm looking at Burp Suite, everything looks great, it's beautiful.
Copy and paste, push the request: invalid, don't understand. Repeat the original request:
works great. Copy and paste my request to replace what I replayed on the last post: error.
Okay, what's going on? I take it out of pretty-print mode and observe that it's one solid
JSON line, right? When you push that post to the server, it's one solid line, but when you
look at it in Burp Suite, it's all pretty-printed for you. So if you're importing new items,
or you want to change those key-value pairs or something in the data, and you paste it in,
it's got those carriage returns in there. So, like any software development and working
with any servers or services, things can get wonky when you start doing stuff on the back
end, writing your own APIs and whatever.
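When an edited body comes back pretty-printed with CRLFs, one way to rebuild the single-line version the server originally received is to parse it and re-serialize it compactly. A sketch (the helper name is made up); if the paste also put a raw newline inside a string value, `json.loads` will refuse it with an "Invalid control character" error, which at least points at the spot.

```python
import json


def compact_for_replay(pretty_body: str) -> str:
    """Turn a pretty-printed (possibly CRLF) JSON body back into one line."""
    doc = json.loads(pretty_body)  # CR/LF between tokens is legal whitespace
    # No spaces after separators; keep non-ASCII characters as-is.
    return json.dumps(doc, separators=(",", ":"), ensure_ascii=False)


body = '{\r\n  "user": "op",\r\n  "ids": [1, 2]\r\n}'
print(compact_for_replay(body))
```

The round trip through a real parser also catches the missing-comma kind of mistake before the request is ever sent.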
But yeah, for anything having to do with handling and troubleshooting JSON, I think
BeautifulSoup... I have some Python functions, and I'll try to remember to put them in
the show notes, that I am using to create APIs, and I'm trying to write one that will
basically emulate SAML, because I don't know how to do SAML in Python, at least the way
our server uses it. So I'm writing my own Python SAML module, which is extremely exciting,
because it's this base64-encoded, weird encrypted thing: you get a key, and then use that
key to encrypt something or other. It's quite the mess, so I'd rather not write my own,
but I think that's what I'm going to have to do if I want to jump over the
interoperability hurdles that I have with my current setup.
So anyways, I'm not sure I helped anyone, but I do need help. If you know any good JSON
parsers or JSON fixers, or ways to handle and/or observe cookies, let me know. I know Burp
Suite has a module that will basically record the traffic and then tell you which of the
cookies you acquired during the authentication process are valid or actually needed for
authentication.
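The idea behind that kind of tool can be approximated with a greedy reduction: drop one cookie at a time, replay the request, and keep the drop if the request still works. This is a sketch of that general technique, not the Burp extension mentioned (whose name operator couldn't recall); `still_works` is a caller-supplied replay check, and the result is a minimal set for this greedy order, not necessarily the smallest possible one.

```python
def required_cookies(cookies, still_works):
    """Greedily drop cookies that the authenticated request does not need.

    still_works(subset) should replay the request with only `subset`
    (a dict of name -> value) and return True if it still succeeds.
    """
    needed = dict(cookies)
    for name in list(cookies):
        trial = {k: v for k, v in needed.items() if k != name}
        if still_works(trial):
            needed = trial  # this cookie wasn't necessary
    return needed
```

Each cookie costs one replay, so even a login that sets dozens of cookies is quick to narrow down.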
When you log into any website or use an app on your phone, whatever, you're sending
tons and tons of data besides the authentication itself; you're probably sending three
to five times more traffic to the site than you need to actually log in.
So there's a Burp Suite plugin, I can't remember the name, and I haven't been able to get
it to work, but it's supposed to help you with the authentication piece, and that's how
I would more easily create my own, essentially, do-it-yourself SAML deal. I think the
SAML modules in Python don't actually do what I'm trying to do; they require a
certificate to be placed on the Ping server, but what I want to use is plain-text
credentials, because I know I can use plain-text credentials through Burp Suite to log
in to internal stuff without any of that. So anyways, that helps out.
If I do end up getting more stuff, more Python helper scripts, I'll add those
to the show notes, or I'll probably do another show. But right now it's a lot of BeautifulSoup
stuff. There's one called find cookie that scrapes out the cookies, and there's another one
that's pretty simple, pp simple, that will basically take any input and output that value.
The idea is that it will detect if it's JSON, and boom, it'll parse the JSON; if it's a
string, it'll parse the string; if it's got weird characters in there, like colons and
semicolons, or maybe it's separated by some weird almost-JSON thing, it'll make it look
viewable, so if there's a newline in there, it'll put the newline in.
So the idea is I want a single pretty-print function that will take any input, regardless
of whether it's a Boolean or a JSON blob or whatever, and spit it out, and maybe
even detect the length or size of it and say, oh, this is big, so I need to cut the
data up or summarize it, or show just the end and the beginning, things like that.
Eventually I want to tune that so I can easily observe and debug stuff in Python.
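A sketch of what a `pp_simple`-style helper could look like. The name comes from the show, but this implementation is hypothetical and not operator's actual code. It decodes bytes, tries to parse strings as JSON, pretty-prints dicts and lists, falls back to `str()` for everything else, and shows only the head and tail of very long output.

```python
import json


def pp_simple(value, max_chars=2000):
    """Print anything readably and return the printed text."""
    if isinstance(value, (bytes, bytearray)):
        value = value.decode("utf-8", errors="replace")
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            pass  # not JSON; print the string as-is
    if isinstance(value, (dict, list)):
        # default=str keeps things like datetimes from raising TypeError.
        text = json.dumps(value, indent=2, default=str)
    else:
        text = str(value)
    if len(text) > max_chars:
        half = max_chars // 2
        omitted = len(text) - 2 * half
        text = f"{text[:half]}\n... [{omitted} chars omitted] ...\n{text[-half:]}"
    print(text)
    return text
```

Returning the text as well as printing it makes the helper easy to drop into a log call later.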
I use an interpreter, which I've been forced to do because the tools I use require the code
to be scanned by some fancy, whizzy thing, and if I don't use it, I have to go through
it manually and clean it up anyways. So anyways, I'll try to post what I've got so far for
people to use as far as Python goes; I'm pretty much teaching myself Python and trying
to do actual real coding and consolidation. If you have an easy way to do input validation,
let me know, because I don't see one; as far as I can see, it's all manual, with people
writing their own security modules, which is a really horrible idea. Other than basic
inputs like specific UIDs, email addresses, phone numbers, you know, basic stuff like that,
I don't see any way to do input validation, which is kind of terrifying. So anyways,
I'd appreciate it if you let me know any tips on writing secure Python, or modules
that I can use that will help me idiot-proof my way to writing secure Python.
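For the basic cases mentioned (UIDs, emails, phone numbers), the usual pattern is allowlist validation: describe exactly what a good value looks like and reject everything else, rather than trying to strip out bad characters. A sketch with the standard `re` module; the patterns are hypothetical placeholders to tighten for real data, and `[0-9]` is used instead of `\d` so non-ASCII digits don't slip through.

```python
import re

# Hypothetical allowlist patterns; adjust them to match your real data.
PATTERNS = {
    "uid": re.compile(r"[a-z][a-z0-9_]{2,31}"),
    "email": re.compile(r"[^@\s]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?[0-9]{7,15}"),
}


def validate(kind, value):
    """Return value unchanged if it fully matches the allowlist, else raise."""
    pattern = PATTERNS.get(kind)
    if pattern is None:
        raise KeyError(f"no validator for {kind!r}")
    # fullmatch: the whole string must fit, not just a prefix of it.
    if not isinstance(value, str) or not pattern.fullmatch(value):
        raise ValueError(f"invalid {kind}: {value!r}")
    return value
```

Phone numbers with spaces or dashes fail this pattern on purpose; normalize them first (for example, strip everything but digits and a leading `+`) and validate the result.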
Cool, take it easy.
You have been listening to Hacker Public Radio at Hacker Public Radio dot org.
Today's show was contributed by an HPR listener like yourself. If you ever thought of recording
a podcast, then click on our contribute link to find out how easy it really is.
Hosting for HPR has been kindly provided by An Honest Host dot com, the Internet Archive and
rsync.net. Unless otherwise stated, today's show is released under a Creative Commons
Attribution 4.0 International license.