Episode: 3953
Title: HPR3953: Large language models and AI don't have any common sense
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3953/hpr3953.mp3
Transcribed: 2025-10-25 17:51:02

---

This is Hacker Public Radio Episode 3953 for Wednesday, the 27th of September 2023. Today's show is entitled "Large Language Models and AI Don't Have Any Common Sense". It is the first show by new host Hobs and is about 18 minutes long. It carries a clean flag. The summary is: learn how to load and run GPT-2 or Llama 2 to test it with common-sense questions.

This is Hobson Lane, with Greg Thompson. We will be trying to load a Hugging Face large language model so that you can do text generation on your own computer, without having to use somebody's proprietary API. Hugging Face has a bunch of models, including chat models and large language models, so you'll need to create a Hugging Face account first; we'll put the link in the show notes. It's huggingface.co/join, and that's where you want to go to sign up.

Then you'll need to get an access token if you want to use any of the supersized models from Meta, or any other company that hides them behind a business source license. They're not really open source, but they are sharing all the weights and all the data; you just can't use them for commercial purposes if you get big enough to compete with them. If you need a token, you'll get it from your profile on Hugging Face.

You can put that token in a .env file. That works with a popular Python library called dotenv, which is what you use to load environment variables: if you put the token in a .env file, dotenv will combine those values with your existing environment variables when you load it. A quick tip you definitely want to use: once you've done "import dotenv", you say dotenv.load_dotenv(). But you don't want to then say dotenv.dotenv_values(), because that will load a dictionary containing only the variable and value pairs from your .env file, and when you're running a server you typically want the mapping of all your environment variables, because there will be things like your PYTHONPATH, with your Python version, that kind of stuff you'll probably need if you're building a real web server. We ran into that problem when we were trying to configure our GitLab CI/CD pipeline, and then we hit it again when we went over to Render to deploy our chatbot software at qary.ai (that's q-a-r-y dot ai).

So, once you've got your .env loaded with the dotenv package, you then import os and say dict(os.environ). You're converting os.environ, which is a dict-like object, into a dictionary: you grab a copy of it, basically, and coerce it into a dict. So: dict, open parenthesis, os.environ, close parenthesis. You should be familiar with that if you've ever worked with environment variables. Now you've got it in a dictionary, which we call env as a variable, and then we can say env, square bracket, quote, HUGGINGFACE_ACCESS_TOKEN, or whatever you called your variable in that .env file.

Anyway, it turns out we're going to show you how to do it for smaller models. We tried to do it for Llama 2, but that's a four-gigabyte model, and it takes a long time to download, which is really hard when you're on a conference call with somebody in Korea, where Greg is located.
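Putting those steps together, here is a minimal sketch of the token-loading code described above. The variable name HUGGINGFACE_ACCESS_TOKEN is just an example; use whatever name you put in your .env file.

    # Minimal sketch: load a .env file and merge it with the existing environment.
    # Assumes the python-dotenv package is installed (pip install python-dotenv)
    # and that your .env file contains a line like:
    #   HUGGINGFACE_ACCESS_TOKEN=hf_...
    import os

    import dotenv

    dotenv.load_dotenv()    # merges the .env pairs into os.environ

    # Coerce the dict-like os.environ into a plain dict (a copy of it).
    env = dict(os.environ)

    # Grab the token by whatever name you gave it in your .env file.
    token = env['HUGGINGFACE_ACCESS_TOKEN']

    # Note: dotenv.dotenv_values() would return ONLY the pairs from the .env
    # file, dropping variables like PYTHONPATH that a real server usually needs.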
So, when you search for models: it's really hard to find models on Hugging Face, unfortunately, because there are so many, and people can describe them in a lot of different ways, so it's really hard to find what you're looking for. Don't ever just hit Enter after your search query. Instead, go to their full-text search, which will give you more of what you need, or click on the "see all model results" link (something like 3,358 model results for Llama 2). That's what we did in order to find the one we were looking for that could do chat. But like I said, we're going to skip that one and move on to a smaller one: GPT-2. Actually, it's not that much smaller; it's just that I already downloaded it, offline, several days ago. If you've already done this once, this process of downloading and creating a model that we're describing here, then you won't have to do it again and wait for the download. So, anyway, we're going to use one that I've already done this for.

If you do need that license, because you're trying to use Llama 2, you'll need to apply for it from Meta at meta.com; it's under /resources/models-and-libraries/llama-downloads. The show notes will tell you how to do that. But if you just want to use GPT-2, you don't need to, because that's two generations back from what OpenAI is building now: they're up to GPT-4 and are already working on GPT-5.

Let's see. Now, instead of the AutoModel classes that a lot of people use, we're going to use the transformers pipeline object from Hugging Face. The pipeline will include the model and the tokenizer, and help you do inference. You won't be able to retrain or fine-tune the model, but at least you can get it to generate some text.
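One note before the pipeline code: the Llama 2 weights are gated behind that Meta license, so if you go that route you'll also need to authenticate to the Hugging Face Hub with your access token before the download will work. The episode doesn't walk through this step, so here is a hedged sketch using the huggingface_hub package's login() function, with the token variable from the sketch above:

    # Hedged sketch: authenticate to the Hugging Face Hub so gated models
    # such as Llama 2 can be downloaded. Assumes huggingface_hub is installed
    # and `token` was loaded from the .env file as shown earlier.
    from huggingface_hub import login

    login(token=token)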
So you say from transformers import pipeline, and then generator = pipeline, open parenthesis, "text-generation", and you need to give it the model name with the keyword model, so you say comma, model equals "openai-gpt". That's openai-gpt, all lowercase, no spaces, just that hyphen in the middle between those two words.

Then you can ask it a question. This is a generative model, so it's only going to try to complete your sentence; it's not going to try to carry on a conversation with you. So if you're trying to ask a question, you probably want to precede it with the prompt "question:", then ask your question, probably put a newline after your question, and then "answer:". That should give it the hint it needs to try to answer your question. Another way you can do it, if you're just asking a math question, is to put an equals sign at the end, and it will try to complete the equation.

We're going to see if GPT-2 can do any kind of math, because large language models are famous for not being able to do math or common-sense reasoning, which is kind of surprising, since computers can do math quite well, and they certainly do logic very well too. But large language models are just trying to predict the next word, so you'll see how this one falls on its face when you ask it to do one plus one.

Put in your question as a string: just the three characters "1+1", and then a fourth character, the equals sign, and put that in quotes. The equals sign at the end is sort of like a question mark to the machine, or at least to a generative large language model that's just going to try to complete the formula. So then you're going to say responses = generator, open parenthesis (I've already said generator = pipeline, so you've already got your generator; you're just going to use that function), and you give it your string, those four characters, "1+1=", and it will return a bunch of answers.

You can set a max length; you want it to be bigger than the number of tokens you input, and because each one of those characters is an individual token, representing a piece of meaning in that phrase, you're going to have four tokens, so you need to give it at least five on your max length parameter. You say max_length=5, or 6, or 7; it will generate just enough tokens to end at the number you give it there. This is for GPT-2 in generative mode.

Then for num_return_sequences, you can give it another parameter if you'd like, for the number of guesses you would like it to take, the number of times you want it to try to generate an answer to the question. We gave it the number 10, just to see if it would have any chance of answering the question. When you've done that, close your parenthesis after num_return_sequences. Both num_return_sequences and max_length have underscores between the words; those are keyword arguments to the generator function, and your question is the positional argument at the beginning. Then you're good to go with responses equals that, and you can just print out all those responses if you'd like. The responses will include both your question and the answer.
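Here is a minimal sketch of that whole generation step, using the model name given in the episode. The do_sample flag is an assumption on my part: sampling is what lets the ten returned sequences differ from one another.

    # Minimal sketch of the "1+1=" experiment described above.
    from transformers import pipeline

    # The pipeline bundles the model and the tokenizer for inference.
    generator = pipeline('text-generation', model='openai-gpt')

    responses = generator(
        '1+1=',                   # the trailing '=' hints that an answer should follow
        do_sample=True,           # assumption: sample so the 10 guesses can differ
        max_length=7,             # must be bigger than the number of input tokens
        num_return_sequences=10,  # ten guesses at completing the equation
    )

    # Each response includes both the question and the generated completion.
    for response in responses:
        print(response['generated_text'])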
In our case, the very first response, the first generated text we got, was "1+1= 2 2 +". So it's going to keep going: let's see, one, two, three, four, five; if you give it six tokens, it stops at "2 +", and if you give it more than that, it keeps going and says "1+1= 2 + 5 = 1 +", and so on. It's just trying to complete some sort of equation or system of equations. Third down the list, though, we do see an answer that looks a lot closer: we see "1+1= 2," and then it says " = 1" and " = 2", so it does continue on beyond what looks like an answer. And many of the other answers are not even close: there's a "1+1= 6 times the speed of sound" and a "1+1= 1,". So out of the 10 answers it got one right; that'd be 10% on its exam. And you can't really even count that one as a right answer, because you'd have to pick and choose among the generated tokens; basically, you'd have to make it stop after the first token to get a good answer out of it.

Then we tried a more complicated question, using that prompting approach where you say "question:" and "answer:". We put in a question like the ones in the book Natural Language Processing in Action, a question about cows: "There are 2 cows and 2 bulls, how many legs are there?" That was our question, so we put it after the "question:" prompt, and then we had "answer:", and I think we gave it 30 tokens or so as our max length, so that it could answer the question, because there are about 25 tokens in there. If you look really closely and count up all the words and punctuation marks, you can see that, when you include the question and answer prompts, it ends up being about 25 tokens. It will give you that estimate as a warning if you set the number too low, saying, hey, you'd better give me some more tokens, I can't generate what you need.

As for the answers we got for that question about cows: let's see, the question-and-answer prompting did a better job when I limited the max length. When I set it smaller than the correct amount, smaller than the actual question, it got none of the 10 right, because the answers were things like the word "four", the word "only", the digit "2", the word "one", the digits "30", and so on. It didn't do very well when I underestimated the number of tokens. Then, when I gave it more tokens than it needed, it gave answers like "four." (f-o-u-r, with a period), then a carriage return, and then, quote, "let me see if I have this straight", so it looks like it's going to ask me a question after giving me the answer for the two cows and two bulls. So it doesn't know what legs are, or what I'm talking about with cows, male and female, because that's what it was counting up when it got that answer. The second most likely answer was only "three", and for three cows and two bulls the answer was bigger than that. That answer of "three" showing up so high on the list is kind of interesting.
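For reference, here is a sketch of the cow-question call described above, reusing the generator from the previous sketch. The exact wording of the prompt is my paraphrase of the question as read in the episode.

    # Sketch of the question/answer prompting approach with the cow question.
    prompt = (
        'question: There are 2 cows and 2 bulls, how many legs are there?\n'
        'answer: '
    )

    responses = generator(
        prompt,
        do_sample=True,           # assumption, as in the previous sketch
        max_length=30,            # ~25 prompt tokens plus room for an answer
        num_return_sequences=10,
    )

    for response in responses:
        print(response['generated_text'])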
Why is "three" interesting? Because there are a lot of trick questions that people have been asking ChatGPT, and that have been included in the training sets, that try to trick it on logic: you remove the legs of a couple of the cows or bulls in the question, so some of them only have three legs. That "three" might be showing up so high on the list because the model has memorized some text problems that are trying to fool it.

Anyway, many of the other answers, all the other answers, are incorrect. There's a "2, 2, 1,", there's a "one per cow", there's a "30.", there's a "1.". It's interesting that that number 30 keeps coming up. The "1." and "3." end with periods, like the end of a sentence, so it thinks it's giving me the full answer on some of those. And one of them says something like "three." and then "they need to be introduced to the cow population before"; I wish I'd let that one go on a little bit further.

Anyway, you can have some fun playing with large language models on Hugging Face. They're not going to be much use unless you do a really good job of prompt engineering, and perhaps train them on the kind of problem you need to solve. That's the kind of thing we're doing over on the qary project, an open-source project to build a chatbot that you can trust: it has a rule-based approach to managing the conversation, rather than a purely generative one, so you can keep it grounded in reality.

Anyway, I hope you've enjoyed this, my first-ever Hacker Public Radio podcast, and I hope you have too. Greg, do you have any questions or thoughts? "We spent a lot of time looking at all the different models, so it's worth exploring all the different sizes, tiny to big, and seeing which ones work for your use case." Indeed, yeah, that's a really good point. We had trouble finding one that was small enough for us to use live in this pair-programming session, but this was one model out of many, many thousands that you can choose from, so have fun searching around on Hugging Face and find yourself a model.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by an honesthost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.