Episode: 3328
Title: HPR3328: Pandas Part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3328/hpr3328.mp3
Transcribed: 2025-10-24 20:52:28

---

This is Hacker Public Radio Episode 3328 for Wednesday, the 5th of May 2021.
Today's show is entitled "Pandas Part 2" and is part of the series A Little Bit
of Python. It is hosted by Enigma, is about 12 minutes long, and carries a clean flag.
The summary is: Enigma continues his discussion about his favorite Python module, Pandas.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
It's Wednesday and you know what that means.
I am your host Enigma and this is another episode of Hacker Public Radio.
Today I'm going to be talking about Pandas Part 2, and this is going to be part of a series
that I'm kind of renaming on the fly here. It used to be For the Love of Python;
I'm going to say it's For the Love of Data. I'm planning on doing more data-sciencey,
data-analytics-type things in this particular series, and I want it to be all-encompassing
and not tied to a particular language, because we might do some SQL, we might do some
other things as part of the series. So I did an intro to Pandas back in January of this year, 2021,
and this is a follow-up. For those that didn't listen to the first episode, it's 3253, I believe.
I'll leave a link to it in the show notes, but Pandas is a Python module that basically allows you
to create two-dimensional data structures in memory, and it allows you to do some manipulation
and any type of data cleaning or data wrangling that you want to do. You can write to an
output file, you can write to a database, you can do a lot of cool things with Pandas, and I use it
every day. So I wanted to talk more about a couple of topics. We're going to talk about
another way to apply a conditional field. We're going to create a data frame from a dictionary.
We're going to append a data frame to another data frame, so just basically concatenating two
data frames together with the same column names. We can get into more advanced topics at another time;
I'm just giving you kind of the high-level basics. We're going to talk about joining data frames with
merges and joins. They're different, and I use one more than the other, and we'll talk about that.
And then we're going to write an output file using CSV, and this is a one-liner; we'll briefly cover
that at the end of the show. So first I wanted to talk about the ways to apply a condition to a field
based on other values in the data frame. I talked in my last show about using numpy.select
for this, and you can go back and review that and review the code. I'll also have a working example
in the show notes so you can compare the differences. This approach is defining a function and then applying
that function to the data frame. So if we were going to hypothetically create a data frame that had
an integer value that was one through, let's say, 20, and we wanted to create basically a good/bad
flag or a true/false flag in the data set based on the values that were in that column, we could
do that using a function. Basically what you would do is you would define your function name
and then pass in the data frame, and then you would basically do an if statement to say, let's say,
if the df score was greater than 10, return "good"; else if it was, you know, less than 10, or you
could even just do an else, return "bad". And then, outside of the function, you
could say df, let's say "status", and you would put that in brackets, in quotes, equals, and then
df.apply, in parentheses your function, comma, axis equals 1, and then you would end your parentheses.
This would basically create a status field that would be good or bad based on the data in the
other column. I like this approach a little bit more than numpy.select only because it looks
cleaner. If someone has a plus or minus on it, if they're using both and they've found a pro/con approach to
this, I'd love to hear from you. Shoot me an email, leave me a comment, or get with me on Twitter.
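
As a rough sketch of the function-and-apply approach described above: the column names `score` and `status` and the function name `flag_score` are assumptions made for illustration, not taken from the show notes. With `axis=1`, pandas hands the function one row at a time rather than the whole data frame. A `numpy.select` version is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: integer scores 1 through 20.
df = pd.DataFrame({"score": range(1, 21)})


def flag_score(row):
    """Return 'good' for scores above 10, otherwise 'bad'."""
    if row["score"] > 10:
        return "good"
    return "bad"


# axis=1 calls flag_score once per row.
df["status"] = df.apply(flag_score, axis=1)

# The numpy.select approach from the earlier episode, for comparison.
conditions = [df["score"] > 10]
choices = ["good"]
df["status_np"] = np.select(conditions, choices, default="bad")
```

Both columns should end up identical here; the function version may read more clearly, while `numpy.select` avoids a Python-level call per row.
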
I'll have all that in the show notes. So the next thing I wanted to talk about would be basically
creating a data frame from a dictionary, and this is pretty easy. As long as you keep the
dictionary labels and the data frame labels the same, it's basically a one-line statement. So you're
going to create your data frame, so let's say df2 in this case, equal to pd.DataFrame. Remember
to capitalize the D and the F; I've tripped up on that many times. And then you're going to put
in parentheses your dictionary name, so my_dictionary, and then end your parentheses, obviously, and
this is going to create a data frame based on the dictionary. Pretty easy. The next thing is
talking about merges and joins. So there are two approaches to joining two data frames together,
and this would be basically like a SQL join, for those who are more familiar with SQL.
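
Going back to the dictionary step, a minimal sketch might look like this. The dictionary contents are made up for illustration; each key becomes a column label and each list becomes that column's values, so the lists need to be the same length.

```python
import pandas as pd

# Hypothetical dictionary: keys become column names, lists become column values.
my_dictionary = {
    "equipment_number": [1001, 1002, 1003],
    "model": ["excavator", "loader", "dozer"],
}

# Note the capital D and F in DataFrame.
df2 = pd.DataFrame(my_dictionary)
```
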
So if you use .join, so basically df equals df.join and then, in parentheses,
your other data frame, you're going to be joining those two objects based on their indexes.
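
A small sketch of the index-based `.join` just described, using made-up frames that share index labels. By default `DataFrame.join` performs a left join on the index, so rows of `df` with no match in `other` get NaN in the joined columns.

```python
import pandas as pd

# Two hypothetical frames indexed by the same kind of label.
df = pd.DataFrame(
    {"model": ["excavator", "loader", "dozer"]}, index=[1001, 1002, 1003]
)
other = pd.DataFrame({"price": [250000, 180000]}, index=[1001, 1003])

# .join lines rows up by index; it is a left join unless `how` says otherwise.
joined = df.join(other)
```
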
And this is assuming they have similar indexes. So .join is, I believe, the first
function that they introduced, and then merge was basically a replacement for that.
I don't know that for sure, but I use merge way more than I use join, and if someone has a
good use case for join, reach out to me. I'd love to know, because I use merge way more than
I'll ever use join. So merge gives you the ability to control how you merge the two data frames
together, so you can do an inner join, a left join, a right join. And what that means is,
basically, if you're doing a left join, whatever data frame you define first, so whatever's in front
of the .merge, so it would be df.merge and then, let's say, we were doing df2 as the merge item,
the df would be the left portion of the join. So you're essentially keeping everything in
the first data frame irrespective of the second data frame. So if you're creating, like, a df3, for
example, you would get everything in the first data frame and then join to any matching elements
in the second data frame. If you do an inner, you just basically get a cross section of both.
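
The left, inner, and right variants of merge can be sketched with two small made-up frames keyed on a shared `equipment_number` column (all names and values here are illustrative assumptions). `indicator=True` is an optional extra, not mentioned in the episode, that adds a `_merge` column showing which side each row came from.

```python
import pandas as pd

df = pd.DataFrame({
    "equipment_number": [1001, 1002, 1003],
    "model": ["excavator", "loader", "dozer"],
})
df2 = pd.DataFrame({
    "equipment_number": [1002, 1003, 1004],
    "serial_number": ["SN-B", "SN-C", "SN-D"],
})

# Left: keep every row of df, fill in df2 values where the key matches.
left = df.merge(df2, on="equipment_number", how="left", indicator=True)

# Inner: keep only keys present in both frames (this is the default `how`).
inner = df.merge(df2, on="equipment_number", how="inner")

# Right: keep every row of df2 instead.
right = df.merge(df2, on="equipment_number", how="right")

# Rows of df with no match in df2 are tagged "left_only".
unmatched = left[left["_merge"] == "left_only"]
```

Filtering on `_merge` is one way to list the rows that failed to match when two datasets are not fully aligned.
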
So they have to exist in both based on the columns you define. If you do a right, obviously it
would be whatever the second element is on the right join. So this has the power of giving
cross sections of data frames. I use it a lot when I'm trying to compare two
datasets or I'm trying to append to a dataset based on another dataset. I use it a lot in real life.
For example, I work for a heavy equipment manufacturer, and we were appending serial numbers
based on an equipment number, an internal equipment number, or prices based on an equipment number,
something like that, where your datasets might not be completely aligned. So I'll do the left
join to see the differences, or I'll do an inner to see the differences. You can also have
different column names. By default, if you're just doing it and you pass no arguments, it assumes
that the column names are the same. If you have that scenario, and I'll leave a note in the show notes
for this, if you do a left_on equals, you can define the column name, and you put that in
brackets just like you define any other column in pandas. You can define what column you're joining
on, and then same way with right_on. There are two elements that you can define
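
When the key columns are named differently, a sketch along these lines may help. The column names `equip_no` and `equipment_number` are hypothetical; `left_on` names the key in the calling frame and `right_on` names the key in the frame being merged in.

```python
import pandas as pd

prices = pd.DataFrame({"equip_no": [1001, 1002], "price": [250000, 180000]})
serials = pd.DataFrame({
    "equipment_number": [1001, 1002],
    "serial_number": ["SN-A", "SN-B"],
})

# Both key columns are kept in the result because their names differ.
merged = prices.merge(
    serials, left_on="equip_no", right_on="equipment_number", how="left"
)
```
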
there, so that's a little bit more complicated and I'll leave a detailed explanation in the show
notes for that. So append is basically another one-liner, pretty much, as long as your data frames line
up from a field-by-field perspective. In other words, if you're reading in two files that have
the exact same columns, it's pretty easy and pretty straightforward. So in this instance it would be
df equals df.append, and then you pass the second data frame, so df2. Pretty straightforward.
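
One caveat for anyone following along on a newer install: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch below uses `pd.concat`, which does the same row-wise stacking. The two frames are made up and share the same columns; `ignore_index=True` is an optional extra that renumbers the rows.

```python
import pandas as pd

df = pd.DataFrame({"equipment_number": [1001, 1002], "price": [250000, 180000]})
df2 = pd.DataFrame({"equipment_number": [1003], "price": [320000]})

# Stack df2's rows under df's, as the older df.append(df2) did,
# and renumber the index from 0.
df = pd.concat([df, df2], ignore_index=True)
```
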
The last thing I'm going to talk about is basically writing to an output file, and
there's multiple ways you can write to output in pandas, but I'm just going to cover a simple one.
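
The simple CSV one-liner might look like the sketch below. The frame is made up, and the file name `output.csv` follows the naming habit mentioned later in the episode. `index=False` is an optional extra, not mentioned in the episode, that leaves the row index out of the file.

```python
import pandas as pd

df = pd.DataFrame({"equipment_number": [1001, 1002], "price": [250000, 180000]})

# Write the frame to a CSV file; index=False leaves out the row index column.
df.to_csv("output.csv", index=False)

# With no path at all, to_csv returns the CSV text as a string instead.
csv_text = df.to_csv(index=False)
```
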
This one is the .to_csv, so "to underscore CSV", and this requires no parameters other than the output file name. This is
pretty much what I do when I'm just exploring data and I want to look at it in Excel or I want
to look at the output file. I pretty much just do a df.to_csv just to
get me an output. Pretty simple. And I'm bad at just naming my output output.csv, and then
if I have it open, it'll error, you know, whatever. So, long and short, this was a pretty short
episode. I wanted to do another follow-up with pandas, so there'll be a third, at least one more,
in this pandas series, or pandas sub-series, of my For the Love of Data. We'll be talking about
group-bys, and group-bys give you powerful Excel-like pivot capabilities within pandas, so stay
tuned for that. And as always, I'll leave a detailed explanation in the show notes for the purposes
of you following along to me rambling about pandas, and all my contact information will be
in the show notes as well. Otherwise, have a great day, guys, and take care of yourselves.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever
thought of recording a podcast, then click on our contribute link to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and it's part of the binary revolution at binrev.com. If you have comments on today's show,
please email the host directly, leave a comment on the website, or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike
3.0 license.