Episode: 4376
Title: HPR4376: Re-research
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4376/hpr4376.mp3
Transcribed: 2025-10-25 23:55:08
---
This is Hacker Public Radio Episode 4376 for Monday 12 May 2025.
Today's show is entitled Re-research.
It is hosted by Lee and is about 12 minutes long.
It carries an explicit flag.
The summary is, Lee talks about trying to do academic research.
Hello, I'm Lee.
Well, I should be revising for my imminent Foundation amateur radio exam.
Instead, I'm going to take a break from that and talk about trying and failing to do a research
module in computing. It's been almost 30 years since my first attempt, when I was still
an undergraduate, so I'm getting quite good at being very bad at it.
My first ever research module was back in the hazy, coked-up 90s, and yes, we did a lot of Coca-Cola
as well as Pepsi in those days. The project was about machine vision. A company I'd worked for
that summer wanted to use digital cameras to detect mail sorting errors, like when envelopes
got double-fed into the sorting machine. That was what I'd based my project on.
And while the plan was to work with the company in question on this,
I was experiencing considerable day-to-day problems living in rented accommodation with people
whose lifestyle was a little different from what I was used to. This had brought my
continuation as a student into question, to the extent that I moved back to living with my parents,
stopped going to the lectures, and did not have the inclination to attend supervision or correspond
with the company in question. So being officially still enrolled on my course and possessing both
intelligence and stupidity in equal measure, I tried to do the research entirely by myself,
which practically involved a few hours hacking each day with Borland C++.
And Borland had a graphics library called BGI, which stood for Borland Graphics Interface,
that was great for putting pixels on the screen. Of course, you were limited to 256 colors,
and I was doing everything in grayscale. This was absorbing work, and when I did finally present it,
if nothing else, my supervisor was impressed by the cool graphics. In terms of achieving the intended
goal, I got as far as filtering and applying a transform to detect edges irrespective of the
presented angle, with something called the Hough transform, that's spelled H-O-U-G-H, and also
being able to trace around an outline of what was on screen.
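As an aside, for anyone curious how that works: the Hough transform lets every edge pixel vote for
every straight line that could pass through it, and the lines that collect the most votes are the
detected edges, whatever angle they happen to sit at. Here is a minimal illustrative sketch of that
voting scheme in Python rather than the original Borland C++; the function name and parameters are
my own invention for this example, not anything recovered from that old project.

import numpy as np

def hough_lines(edge_map, num_angles=180):
    # Accumulate votes in (rho, theta) space for straight lines in a
    # binary edge map, where non-zero pixels are edge points.
    h, w = edge_map.shape
    diag = int(np.ceil(np.hypot(h, w)))               # longest possible rho
    thetas = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    rhos = np.arange(-diag, diag + 1)
    accumulator = np.zeros((len(rhos), len(thetas)), dtype=np.int64)

    ys, xs = np.nonzero(edge_map)                     # coordinates of edge pixels
    for x, y in zip(xs, ys):
        # each edge point votes once for every candidate angle
        rho_values = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.round(rho_values).astype(int) + diag
        accumulator[rho_idx, np.arange(num_angles)] += 1

    return accumulator, rhos, thetas

# A tiny synthetic test: a diagonal line of edge pixels.
img = np.zeros((50, 50), dtype=np.uint8)
for i in range(50):
    img[i, i] = 1
acc, rhos, thetas = hough_lines(img)
r, t = np.unravel_index(np.argmax(acc), acc.shape)
print("strongest line: rho =", rhos[r], "theta =", round(np.degrees(thetas[t])), "degrees")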
But when it came to applying machine learning to analyze whether a misfeed had occurred, I realised
this was a separate project, all of its own. And so I fudged together some algorithm that was
embarrassingly hacky and didn't work at all when presented with fresh data. Another issue was that
all my photos of envelopes I had turned into negatives to make them easier to see, but the
independent examiner of my presentation could not make head or tail of them, and there was no copy
of Paint Shop Pro on hand in the lab to convert
these back into positives to reassure him they really were envelopes. Anyway, long story short,
I did not get many marks for this, and was really lucky to come away from the course with any
qualification at all. That sort of research I think would be counted as applied research,
so you're actually producing something rather than just talking about it. The more recent type of
research I tried to do was more trying to find out some things that would be useful to people
working in the field, but not producing a computer product or anything like that. I wanted to know
about different types of database, the SQL ones and the NoSQL ones, and whether it was common
practice to successfully apply the NoSQL ones, particularly the graph ones, to problems other than
the much-hyped use cases such as big data and machine learning. For example, doing something mundane
like writing a web application. So my study on this module started well, but again it has not
ended well, and it's a bit hard to say exactly where things went wrong, so before I ramble on a
bit about some of what's involved in studying and say where things stand now, I'll start with a
fairly opinionated rant about something I did not fully realise going into it, which hit me in the
face like a ten-ton truck. First, academic people don't look kindly on someone just coming up with a theory
or question and then exploring it and gathering evidence about it. To them, it's worthless to do
that. Instead, you have to link what you're doing to what some other people have already done
and proven or investigated. And this is what I didn't get. It's literally worthless to them.
In fact, they will not even consider what you're doing to be research. They'll just
regard it as playing with yourself. So you have to read what other people have discovered or found
out, and by other people I mean other researchers who have followed a proper and rigorous process
then publish their findings in books and journals and research papers. Then you have to attack all
their methods and their findings until you can show there is a gap in what is currently understood
where your work will fit in. That and only that gives you the right to then conduct what will be
rightly called research of your own. And yes, what I'm saying is somewhat of a rant, so please take
it with a pinch of salt. Of course, I was generally aware of all this going into the module,
but how optimistic someone should be that their idea will find a firm foundation in what is
mysteriously called the literature is a matter of judgment, and probably experience too.
It seems less of a risk in hindsight to start off with that foundation than to try to fit it
retrospectively. Now, a problem which is increasingly faced by researchers is that
generative AI has become good. I mean really good. I defy anyone to load up some papers into Google
NotebookLM for the first time, listen to the very human-sounding audio discussion of those papers
generated on the fly and not be gobsmacked how this could be anything other than the recorded
conversation between two people who have spent a long time reviewing the literature and are now
really discussing it. It's like the first time someone watched moving pictures and had to be
convinced it was not a real horse and cart moving in front of them. It's that good.
Anyone who's been studying at this level for some time will hopefully learn to avoid plagiarising
the work of other authors. But when a generative AI is echoing back to you in a totally
unique way your own thoughts put eloquently and seamlessly blended with a wider body of knowledge,
it's a whole other skill altogether not to get confused what is your own work and what is not.
This leads me to a topic I've already touched on, that is, what we call academic literature.
There is lots of it out there: hundreds of thousands of papers and articles on every topic of
knowledge imaginable. Some stand alone, others are put into journals or presented in the proceedings of
conferences. If you try and actually read academic papers, though, you'll soon discover a substantial number
of them are behind one paywall or another. So you can only get those if you're registered with
a university or pay some company who offers access to these as a subscription or one-off payment.
On the other hand some papers are published openly and freely downloadable from certain
organisations. I was mentioning plagiarism a moment ago which is basically passing off someone
else's work as your own. So one of the main ways of avoiding that is to explicitly cite the
source of any material you have used in anything you write. Now, a citation is like a link. It takes
the reader to where they can access another document. A citation has a short form that goes directly
in your text, such as open brackets, Baxter, comma, 2012, close brackets, which is the name of the author
and the year it was published. Then there is a long form that sits in your references at the end of the
document, which includes when and where it was published, the full title and other metadata, or, for
example, if it came from the web, then the hyperlink and the date accessed. And the precise format
expected for this reference varies from one university to another, but a fairly common format is
the Harvard one, and that's what I've become used to using.
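Just to make those two forms concrete, here is roughly what a Harvard-style pair looks like; the
author, year and title below are invented purely to show the shape of the format, not a real source.
In the text it would appear as something like: a point also made in at least one earlier study (Baxter, 2012).
In the reference list it would expand to: Baxter, J. (2012) Choosing a database for the web. London: Example Press.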
Now, there are types of software designed specifically for collecting references, like Zotero, and
some students and researchers would swear by them. I've been tending to just use Google Scholar, which is fairly good for looking
up a paper by title or author and gives the reference in a number of formats. You can normally
follow a link to get an abstract, which is a few paragraphs telling anyone what the paper is about,
what they did, and what their findings were. I would file these abstracts in a notes app called
Joplin. As I mentioned, to get the PDF you'd normally go to the university's online library, which should
grant access to whatever third-party service is providing the full text. As well as Google Scholar,
there are tools like Connected Papers that help with searching for papers and seeing how they
link to each other, so which one references which other one, and how often a particular one is
cited by others. So you can get a feel of which ones are canonical to the subject in question,
which ones are just well cited, and which ones stand pretty much on their own. There's no judgment
implied in discussing the number of citations of a paper; just because a paper is not cited does not
mean it has not made a valuable contribution. It might just be on a fairly niche topic. In terms of
reading these papers, in the days before I got so lazy that I now get the AI to read them for me
and tell me what I want to know, I used to load the document onto one of the larger kinds of iPad
and then use a stylus to highlight keywords or passages in either yellow, green, pink or blue,
depending on some arbitrary categorisation I'd reinvent each time. My favourite iOS app for this
was called GoodNotes, and it's funny to talk about using an iPad as old-fashioned. In the old,
old days you would go searching for some section of the university library to find out what you
needed to read. Not that I ever really did that in anger, even in those days. Well, I would, but
just to serve my own curiosity rather than for study purposes. Anyway, so back to my present research,
or inability to do research. Well having immersed myself in my chosen topic for several months
and finally getting some insightful feedback from a tutor it seems that I'd missed the key part
of the puzzle. It's not so much what choices are out there for databases but the real question is
how and why people particularly software engineers make these choices in the first place. This is the
problem of problems. You think you know what the problem is and it turns out to be something else
entirely. So I've run aground, and since this study module was time-limited, it looks like I don't
have time to rework what I started so I've given it up at least for now. And it's a shame because
I was looking forward to scouring Stack Overflow and Reddit and GitHub for evidence that I could then
analyse and write a report about. But perhaps that kind of thing is best left for large language
models to do nowadays anyway. Well, thank goodness academic success is not the only value I get
from studying at this level. There is a wider community around this stuff, and it does tap into
opportunities to broaden horizons. I admit that coming away without a qualification yet again makes taking
a postgraduate module rather an expensive way to sit down and read a few books. I thought I might
have learned something about how to get things right in three decades but apparently I'm still
learning. So to conclude today rather than suggesting you might copy my example I suggest you learn
from some of my mistakes and do it better than I did. Bye for now.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was
contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click
on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided
by AnHonestHost.com, the Internet Archive and rsync.net. Unless otherwise stated, today's show
is released under a Creative Commons Attribution 4.0 International license.