Episode: 3953
Title: HPR3953: Large language models and AI don't have any common sense
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3953/hpr3953.mp3
Transcribed: 2025-10-25 17:51:02

---

This is Hacker Public Radio Episode 3953 for Wednesday, the 27th of September 2023. Today's show is entitled "Large Language Models and AI Don't Have Any Common Sense". It is the first show by new host Hobs and is about 18 minutes long. It carries a clean flag. The summary is: learn how to load and run GPT-2 or Llama 2 to test it with common-sense questions.

This is Hobson Lane, with Greg Thompson. We will be trying to load a Hugging Face large language model so that you can do text generation on your own computer, without having to use somebody's proprietary API. Hugging Face has a bunch of models, including chat models and large language models, so you'll need to create a Hugging Face account first; we'll put the link in the show notes. It's huggingface.co/join, and that's where you want to go to sign up.

Then you'll need to get an access token if you want to use any of the supersized models from Meta, or any other company that hides them behind a business source license. They're not really open source, but they are sharing all the weights and all the data; you just can't use them for commercial purposes if you get big enough to compete with them. If you need a token, you'll get it from your profile on Hugging Face.

You can put that token in a .env file. That works with a popular Python library called dotenv, which is what you use to load environment variables: if you put the token in a .env file, dotenv will combine those values with your existing environment variables when you load it. A quick tip you definitely want to use: once you've done "import dotenv", you say dotenv.load_dotenv(). But you don't want to then say dotenv.dotenv_values(), because that will load a dictionary containing only the variable and value pairs from your .env file, and when you're running a server you typically want the mapping of all your environment variables, because there will be things like your PYTHONPATH, with your Python version, that kind of stuff you'll probably need if you're building a real web server. We ran into that problem when we were trying to configure our GitLab CI/CD pipeline, and then we hit it again when we went over to Render to deploy our chatbot software at qary.ai (that's q-a-r-y dot ai).

So, once you've got your .env loaded with the dotenv package, you then import os and say dict(os.environ). You're converting os.environ, which is a dict-like object, into a dictionary: you grab a copy of it, basically, and coerce it into a dict. So: dict, open parenthesis, os.environ, close parenthesis. You should be familiar with that if you've ever worked with environment variables. Now you've got it in a dictionary, which we call env as a variable, and then we can say env, square bracket, quote, HUGGINGFACE_ACCESS_TOKEN, or whatever you called your variable in that .env file.

Anyway, it turns out we're going to show you how to do it for smaller models. We tried to do it for Llama 2, but that's a four-gigabyte model, and it takes a long time to download, which is really hard when you're on a conference call with somebody in Korea, where Greg is located.
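Putting those steps together, here is a minimal sketch of the token-loading code described above. The variable name HUGGINGFACE_ACCESS_TOKEN is just an example; use whatever name you put in your .env file.

    # Minimal sketch: load a .env file and merge it with the existing environment.
    # Assumes the python-dotenv package is installed (pip install python-dotenv)
    # and that your .env file contains a line like:
    #   HUGGINGFACE_ACCESS_TOKEN=hf_...
    import os

    import dotenv

    dotenv.load_dotenv()    # merges the .env pairs into os.environ

    # Coerce the dict-like os.environ into a plain dict (a copy of it).
    env = dict(os.environ)

    # Grab the token by whatever name you gave it in your .env file.
    token = env['HUGGINGFACE_ACCESS_TOKEN']

    # Note: dotenv.dotenv_values() would return ONLY the pairs from the .env
    # file, dropping variables like PYTHONPATH that a real server usually needs.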
So, when you search for models: it's really hard to find models on Hugging Face, unfortunately, because there are so many, and people can describe them in a lot of different ways, so it's really hard to find what you're looking for. Don't ever just hit Enter after your search query. Instead, go to their full-text search, which will give you more of what you need, or click on the "see all model results" link (something like 3,358 model results for Llama 2). That's what we did in order to find the one we were looking for that could do chat. But like I said, we're going to skip that one and move on to a smaller one: GPT-2. Actually, it's not that much smaller; it's just that I already downloaded it, offline, several days ago. If you've already done this once, this process of downloading and creating a model that we're describing here, then you won't have to do it again and wait for the download. So, anyway, we're going to use one that I've already done this for.

If you do need that license, because you're trying to use Llama 2, you'll need to apply for it from Meta at meta.com; it's under /resources/models-and-libraries/llama-downloads. The show notes will tell you how to do that. But if you just want to use GPT-2, you don't need to, because that's two generations back from what OpenAI is building now: they're up to GPT-4 and are already working on GPT-5.

Let's see. Now, instead of the AutoModel classes that a lot of people use, we're going to use the transformers pipeline object from Hugging Face. The pipeline will include the model and the tokenizer, and help you do inference. You won't be able to retrain or fine-tune the model, but at least you can get it to generate some text.
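One note before the pipeline code: the Llama 2 weights are gated behind that Meta license, so if you go that route you'll also need to authenticate to the Hugging Face Hub with your access token before the download will work. The episode doesn't walk through this step, so here is a hedged sketch using the huggingface_hub package's login() function, with the token variable from the sketch above:

    # Hedged sketch: authenticate to the Hugging Face Hub so gated models
    # such as Llama 2 can be downloaded. Assumes huggingface_hub is installed
    # and `token` was loaded from the .env file as shown earlier.
    from huggingface_hub import login

    login(token=token)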
So you say from transformers import pipeline, and then generator = pipeline, open parenthesis, "text-generation", and you need to give it the model name with the keyword model, so you say comma, model equals "openai-gpt". That's openai-gpt, all lowercase, no spaces, just that hyphen in the middle between those two words.

Then you can ask it a question. This is a generative model, so it's only going to try to complete your sentence; it's not going to try to carry on a conversation with you. So if you're trying to ask a question, you probably want to precede it with the prompt "question:", then ask your question, probably put a newline after your question, and then "answer:". That should give it the hint it needs to try to answer your question. Another way you can do it, if you're just asking a math question, is to put an equals sign at the end, and it will try to complete the equation.

We're going to see if GPT-2 can do any kind of math, because large language models are famous for not being able to do math or common-sense reasoning, which is kind of surprising, since computers can do math quite well, and they certainly do logic very well too. But large language models are just trying to predict the next word, so you'll see how this one falls on its face when you ask it to do one plus one.

Put in your question as a string: just the three characters "1+1", and then a fourth character, the equals sign, and put that in quotes. The equals sign at the end is sort of like a question mark to the machine, or at least to a generative large language model that's just going to try to complete the formula. So then you're going to say responses = generator, open parenthesis (I've already said generator = pipeline, so you've already got your generator; you're just going to use that function), and you give it your string, those four characters, "1+1=", and it will return a bunch of answers.

You can set a max length; you want it to be bigger than the number of tokens you input, and because each one of those characters is an individual token, representing a piece of meaning in that phrase, you're going to have four tokens, so you need to give it at least five on your max length parameter. You say max_length=5, or 6, or 7; it will generate just enough tokens to end at the number you give it there. This is for GPT-2 in generative mode.

Then for num_return_sequences, you can give it another parameter if you'd like, for the number of guesses you would like it to take, the number of times you want it to try to generate an answer to the question. We gave it the number 10, just to see if it would have any chance of answering the question. When you've done that, close your parenthesis after num_return_sequences. Both num_return_sequences and max_length have underscores between the words; those are keyword arguments to the generator function, and your question is the positional argument at the beginning. Then you're good to go with responses equals that, and you can just print out all those responses if you'd like. The responses will include both your question and the answer.
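Here is a minimal sketch of that whole generation step, using the model name given in the episode. The do_sample flag is an assumption on my part: sampling is what lets the ten returned sequences differ from one another.

    # Minimal sketch of the "1+1=" experiment described above.
    from transformers import pipeline

    # The pipeline bundles the model and the tokenizer for inference.
    generator = pipeline('text-generation', model='openai-gpt')

    responses = generator(
        '1+1=',                   # the trailing '=' hints that an answer should follow
        do_sample=True,           # assumption: sample so the 10 guesses can differ
        max_length=7,             # must be bigger than the number of input tokens
        num_return_sequences=10,  # ten guesses at completing the equation
    )

    # Each response includes both the question and the generated completion.
    for response in responses:
        print(response['generated_text'])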
In our case, the very first response, the first generated text we got, was "1+1= 2 2 +". So it's going to keep going: let's see, one, two, three, four, five; if you give it six tokens, it stops at "2 +", and if you give it more than that, it keeps going and says "1+1= 2 + 5 = 1 +", and so on. It's just trying to complete some sort of equation or system of equations. Third down the list, though, we do see an answer that looks a lot closer: we see "1+1= 2," and then it says " = 1" and " = 2", so it does continue on beyond what looks like an answer. And many of the other answers are not even close: there's a "1+1= 6 times the speed of sound" and a "1+1= 1,". So out of the 10 answers it got one right; that'd be 10% on its exam. And you can't really even count that one as a right answer, because you'd have to pick and choose among the generated tokens; basically, you'd have to make it stop after the first token to get a good answer out of it.

Then we tried a more complicated question, using that prompting approach where you say "question:" and "answer:". We put in a question like the ones in the book Natural Language Processing in Action, a question about cows: "There are 2 cows and 2 bulls, how many legs are there?" That was our question, so we put it after the "question:" prompt, and then we had "answer:", and I think we gave it 30 tokens or so as our max length, so that it could answer the question, because there are about 25 tokens in there. If you look really closely and count up all the words and punctuation marks, you can see that, when you include the question and answer prompts, it ends up being about 25 tokens. It will give you that estimate as a warning if you set the number too low, saying, hey, you'd better give me some more tokens, I can't generate what you need.

As for the answers we got for that question about cows: let's see, the question-and-answer prompting did a better job when I limited the max length. When I set it smaller than the correct amount, smaller than the actual question, it got none of the 10 right, because the answers were things like the word "four", the word "only", the digit "2", the word "one", the digits "30", and so on. It didn't do very well when I underestimated the number of tokens. Then, when I gave it more tokens than it needed, it gave answers like "four." (f-o-u-r, with a period), then a carriage return, and then, quote, "let me see if I have this straight", so it looks like it's going to ask me a question after giving me the answer for the two cows and two bulls. So it doesn't know what legs are, or what I'm talking about with cows, male and female, because that's what it was counting up when it got that answer. The second most likely answer was only "three", and for three cows and two bulls the answer was bigger than that. That answer of "three" showing up so high on the list is kind of interesting.
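For reference, here is a sketch of the cow-question call described above, reusing the generator from the previous sketch. The exact wording of the prompt is my paraphrase of the question as read in the episode.

    # Sketch of the question/answer prompting approach with the cow question.
    prompt = (
        'question: There are 2 cows and 2 bulls, how many legs are there?\n'
        'answer: '
    )

    responses = generator(
        prompt,
        do_sample=True,           # assumption, as in the previous sketch
        max_length=30,            # ~25 prompt tokens plus room for an answer
        num_return_sequences=10,
    )

    for response in responses:
        print(response['generated_text'])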
Why is "three" interesting? Because there are a lot of trick questions that people have been asking ChatGPT, and that have been included in the training sets, that try to trick it on logic: you remove the legs of a couple of the cows or bulls in the question, so some of them only have three legs. That "three" might be showing up so high on the list because the model has memorized some text problems that are trying to fool it.

Anyway, many of the other answers, all the other answers, are incorrect. There's a "2, 2, 1,", there's a "one per cow", there's a "30.", there's a "1.". It's interesting that that number 30 keeps coming up. The "1." and "3." end with periods, like the end of a sentence, so it thinks it's giving me the full answer on some of those. And one of them says something like "three." and then "they need to be introduced to the cow population before"; I wish I'd let that one go on a little bit further.

Anyway, you can have some fun playing with large language models on Hugging Face. They're not going to be much use unless you do a really good job of prompt engineering, and perhaps train them on the kind of problem you need to solve. That's the kind of thing we're doing over on the qary project, an open-source project to build a chatbot that you can trust: it has a rule-based approach to managing the conversation, rather than a purely generative one, so you can keep it grounded in reality.

Anyway, I hope you've enjoyed this, my first-ever Hacker Public Radio podcast, and I hope you have too. Greg, do you have any questions or thoughts? "We spent a lot of time looking at all the different models, so it's worth exploring all the different sizes, tiny to big, and seeing which ones work for your use case." Indeed, yeah, that's a really good point. We had trouble finding one that was small enough for us to use live in this pair-programming session, but this was one model out of many, many thousands that you can choose from, so have fun searching around on Hugging Face and find yourself a model.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by an honesthost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.