Episode: 4376
Title: HPR4376: Re-research
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4376/hpr4376.mp3
Transcribed: 2025-10-25 23:55:08

---

This is Hacker Public Radio Episode 4376 for Monday 12 May 2025. Today's show is entitled Re-research. It is hosted by Lee and is about 12 minutes long. It carries an explicit flag. The summary is: Lee talks about trying to do academic research.

Hello, I'm Lee. Well, I should be revising for my imminent Foundation amateur radio exam. Instead I'm going to take a break from that and talk about trying and failing to do a research module in computing. It's been almost 30 years since my first attempt, when I was still an undergraduate, so I'm getting quite good at being very bad at it.

My first ever research module was back in the hazy, coked-up 90s, and yes, we did a lot of Coca-Cola as well as Pepsi in those days. The project was about machine vision. A company I'd worked for that summer wanted to use digital cameras to detect mail sorting errors, like when envelopes got double-fed into the sorting machine. That was what I'd based my project on.

And while the plan was to work with the company in question on this, I was experiencing considerable day-to-day problems living in rented accommodation with people whose lifestyle was a little different from what I was used to. This had brought my continuation as a student into question, to the extent that I moved back to living with my parents, stopped going to lectures, and didn't have the inclination to attend supervision or correspond with the company in question. So, being officially still enrolled on my course, and possessing both intelligence and stupidity in equal measure, I tried to do the research entirely by myself, which in practice involved a few hours of hacking each day with Borland C++.

And Borland had a graphics library called BGI, which stood for Borland Graphics Interface, that was great for putting pixels on the screen. Of course, you were limited to 256 colours, and I was doing everything in greyscale. This was absorbing work, and when I did finally present it, if nothing else, my supervisor was impressed by the cool graphics. In terms of achieving the intended goal, I got as far as filtering and applying a transform to detect edges irrespective of the presented angle, with something called the Hough transform, spelled H-O-U-G-H, and also being able to trace around an outline of what was on screen. But when it came to applying machine learning to analyse whether a misfeed had occurred, I realised this was a separate project all of its own.
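
To make the Hough transform mentioned above a little more concrete, here is a minimal Python sketch of the general technique. It is not the original Borland C++ code, which isn't shown. It assumes edge pixels have already been picked out by an earlier filtering step and are given as a list of (x, y) coordinates; the function name `hough_lines` and its parameters are illustrative choices. Each edge pixel votes for every straight line ρ = x·cos θ + y·sin θ passing through it, so a line collects many votes whatever its angle, and the strongest bins in the vote table are the detected lines.

```python
import math
from collections import Counter


def hough_lines(edge_points, n_theta=180, top_k=1):
    """Minimal Hough transform for straight lines (illustrative sketch).

    Each edge pixel (x, y) votes for every line through it, written in
    normal form rho = x*cos(theta) + y*sin(theta). Lines supported by many
    edge pixels collect many votes, whatever their angle.
    Returns up to top_k tuples of (rho, theta_in_degrees, votes).
    """
    # Candidate angles from 0 up to (but not including) 180 degrees.
    thetas = [math.pi * i / n_theta for i in range(n_theta)]
    trig = [(math.cos(t), math.sin(t)) for t in thetas]

    # Accumulator: (rho rounded to whole pixels, angle index) -> vote count.
    accumulator = Counter()
    for x, y in edge_points:
        for t_idx, (cos_t, sin_t) in enumerate(trig):
            rho = round(x * cos_t + y * sin_t)
            accumulator[(rho, t_idx)] += 1

    # The most-voted bins are the detected lines.
    return [
        (rho, 180.0 * t_idx / n_theta, votes)
        for (rho, t_idx), votes in accumulator.most_common(top_k)
    ]


if __name__ == "__main__":
    # Hypothetical test data: 100 edge pixels along the horizontal line y = 5.
    points = [(x, 5) for x in range(100)]
    print(hough_lines(points))  # [(5, 90.0, 100)]
```

Rounding ρ to whole pixels and stepping θ in 1-degree increments are arbitrary choices for the sketch; real implementations usually use a 2D array accumulator and threshold or smooth the peaks, and libraries such as OpenCV provide this as `cv2.HoughLines`.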

And so I fudged together some algorithm that was embarrassingly hacky and didn't work at all when presented with fresh data. Another issue was that I'd turned all my photos of envelopes into negatives to make them easier to see, but the independent examiner of my presentation could not make head or tail of them, and there was no copy of Paint Shop Pro to hand in the lab to convert them back into positives to reassure him they really were envelopes. Anyway, long story short, I did not get many marks for this, and was really lucky to come away from the course with any qualification at all.

That sort of research, I think, would be counted as applied research, where you're actually producing something rather than just talking about it. The more recent type of research I tried to do was more about finding out some things that would be useful to people working in the field, but not producing a computer product or anything like that. I wanted to know about different types of database, the SQL ones and the NoSQL ones, and whether it was common practice to successfully apply NoSQL ones, particularly the graph ones, to problems other than the much-hyped use cases such as big data and machine learning; for example, doing something mundane like writing a web application.

So my study on this module started well, but again it has not ended well, and it's a bit hard to say exactly where things went wrong. Before I ramble a bit about some of what's involved in studying and say where things stand now, I'll start with a fairly opinionated rant about something I did not fully realise going into it, and which hit me in the face like a ten-ton truck.

First, academic people don't look kindly on someone just coming up with a theory or question and then exploring it and gathering evidence about it. To them, it's worthless to do that. Instead, you have to link what you're doing to what some other people have already done and proven or investigated. And this is what I didn't get. It's literally worthless to them.

In fact, they will not even consider what you're doing to be research. They'll just regard it as playing with yourself. So you have to read what other people have discovered or found out, and by other people I mean other researchers who have followed a proper and rigorous process, then published their findings in books, journals and research papers. Then you have to attack all their methods and their findings until you can show there is a gap in what is currently understood where your work will fit in. That, and only that, gives you the right to then conduct what will rightly be called research of your own. And yes, what I'm saying is somewhat of a rant, so please take it with a pinch of salt. Of course, I was generally aware of all this going into the module, but how optimistic someone should be that their idea will find a firm foundation in what is mysteriously called "the literature" is a matter of judgement, and probably experience too.

In hindsight, it seems less of a risk to start off with that foundation than to try to fit it in retrospectively. Another problem, increasingly faced by researchers, is that generative AI has become good. I mean really good. I defy anyone to load some papers into Google NotebookLM for the first time, listen to the very human-sounding audio discussion of those papers generated on the fly, and not be gobsmacked that this could be anything other than a recorded conversation between two people who have spent a long time reviewing the literature and are now really discussing it. It's like the first time someone watched moving pictures and had to be convinced it was not a real horse and cart moving in front of them. Is that good?

Anyone who's been studying at this level for some time will hopefully have learned to avoid plagiarising the work of other authors. But when a generative AI is echoing back to you, in a totally unique way, your own thoughts put eloquently and seamlessly blended with a wider body of knowledge, it's a whole other skill altogether not to get confused about what is your own work and what is not.

This leads me to a topic I've already touched on, that is, what we call academic literature. There is lots of it out there, hundreds of thousands of papers and articles on every topic of knowledge imaginable, some standalone, others put into journals or presented in the proceedings of conferences. If you try to actually read academic papers, though, you'll soon discover that a substantial number of them are behind one paywall or another. So you can only get those if you're registered with a university or pay some company that offers access to them as a subscription or a one-off payment.

On the other hand, some papers are published openly and are freely downloadable from certain organisations. I mentioned plagiarism a moment ago, which is basically passing off someone else's work as your own. One of the main ways of avoiding that is to explicitly cite the source of any material you have used in anything you write. A citation is like a link: it takes the reader to where they can access another document. A citation has a short form that goes directly in your text, such as (Baxter, 2012), which is the name of the author and the year the work was published. Then there is a long form that sits in the references at the end of your document, which includes when and where the work was published, the full title and other metadata, or, if it came from the web, the hyperlink and the date accessed. The precise format expected for these references varies from one university to another, but a fairly common format is the Harvard one, and that's what I've become used to using.

Now, there are types of software designed specifically for collecting references, like Zotero, and some students and researchers swear by them. I've tended to just use Google Scholar, which is fairly good for looking up a paper by title or author and gives the reference in a number of formats. You can normally follow a link to get an abstract, which is a few paragraphs telling anyone what the paper is about, what the authors did and what their findings were. I would file these abstracts in a notes app called Joplin. As I mentioned, to get the PDF you'd normally go to your university's online library, which should grant access to whatever third-party service is providing the full text. As well as Google Scholar, there are tools like Connected Papers that help with searching for papers and seeing how they link to each other: which one references which other one, and how often a particular one is cited by others. So you can get a feel for which ones are canonical to the subject in question, which ones are just well cited, and which ones stand pretty much on their own. There's no judgement implied in discussing the number of citations of a paper. Just because a paper is not cited does not mean it has not made a valuable contribution; it might just be on a fairly niche topic.

In terms of reading these papers, in the days before I got so lazy that I now get the AI to read them for me and tell me what I want to know, I used to load the document onto one of the larger kinds of iPad and then use a stylus to highlight keywords or passages in yellow, green, pink or blue, depending on some arbitrary categorisation I'd reinvent each time. My favourite iOS app for this was GoodNotes, and it's funny to talk about using an iPad as old-fashioned. In the old, old days you would go searching through some section of the university library to find what you needed to read. Not that I ever really did that in anger, even in those days. Well, I would, but just to serve my own curiosity rather than for study purposes.

Anyway, back to my present research, or inability to do research. Having immersed myself in my chosen topic for several months and finally getting some insightful feedback from a tutor, it seems that I'd missed the key part of the puzzle. It's not so much what choices are out there for databases; the real question is how and why people, particularly software engineers, make these choices in the first place. This is the problem of problems: you think you know what the problem is, and it turns out to be something else entirely. So I've run aground, and since this study module was time-limited, it looks like I don't have time to rework what I started, so I've given it up, at least for now. And it's a shame, because I was looking forward to scraping Stack Overflow, Reddit and GitHub for evidence that I could then analyse and write a report about. But perhaps that kind of thing is best left to large language models nowadays anyway.

Well, thank goodness academic success is not the only value in studying at this level. There is a wider community around this stuff, and it does open up opportunities to broaden horizons. I admit that coming away without a qualification yet makes taking a postgraduate module rather an expensive way to sit down and read a few books. I thought I might have learned something about how to get things right in three decades, but apparently I'm still learning. So to conclude today, rather than suggesting you copy my example, I suggest you learn from some of my mistakes and do it better than I did. Bye for now.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.