Episode: 139
Title: HPR0139: Compiling a Kernel over the Nework with distcc
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0139/hpr0139.mp3
Transcribed: 2025-10-07 12:17:59
---
Alright.
Packer's over great.
I'm Klaatu, talking to you about kernel compiling over a distributed network with
a program called distcc.
Now this is the third part in a kernel compilation series.
The first one we covered how to compile the kernel, which I don't even remember how
to do anymore.
The second one we talked about how to patch a kernel, which no one does anyway.
But in this one, we're going to be talking about something that is infinitely more useful,
I think, than anything, because it's not just compiling a kernel over distcc, but compiling
anything over distcc.
So distcc, that's D-I-S-T-C-C, it's a distributed compiler.
distcc is basically, you could think of it as a front end for GCC, GCC being the compiler
that we all know and love, and distcc will work if you're compiling code that was written
in C, C++, Objective-C, Objective-C++.
So basically, if you think about it, if you set up distcc on your internal network,
it can become the default compiler for pretty much anything you're going to be compiling.
So if you're someone who finds yourself compiling a lot of software from source for whatever
reason, whether you're using Gentoo or whether you're using Slackware or whether you just
like to compile something from source, because you want it to be configured exactly for
your system, or maybe you just can't find it in your repo.
If you're finding yourself doing that a lot, you're going to find that just having, you
know, if you've got a lot of computers on your internal network anyway, they're all on
anyway.
Why not put them all into a distcc kind of setup so that you can compile things? It'll
cut compile times substantially.
And I used to think that maybe the benefit wasn't really that big of a deal, because,
you know, you have to think, well, it's compiling over a network.
And so you kind of think to yourself, well, the time that it takes for all that data to
get over your network is basically time just slowing things down, right?
It's just, why not just do it right on your localhost, on your, you know, on your one computer?
The data doesn't have to travel back and forth between computers, and surely it must be
approximately the same kind of deal.
But I can tell you for sure, simply because as part of the project I was working on, I
was monitoring, it was part of my job to monitor the network traffic, the internal network
traffic during compilation, some rendering, some video stuff.
And for sure, the network will absolutely fill itself up to maximum capacity if you're
doing things like these distributed workloads.
It's just, I mean, you might be monitoring your network while you're streaming something
from YouTube or something, and you're probably only seeing, you know, 25% of the load being
reached, not a big deal.
And I think that's probably what happens most.
But if you do a netstat, like netstat, space, dash i for your interface, so in my case it would be
wlan0, for you it might be eth0, whatever, space, dash d, space, 8, that's the delay in seconds
that this is going to refresh, space, dash c for a continuous netstat.
That will show you your traffic workload.
And if you watch that while you're, for instance, pinging some website, you'll see little
tiny little changes.
If you watch it while you're streaming video from someplace, you'll see a little bit of
a workload.
If you monitor that while you're doing a distcc, or some kind of cluster, like a Beowulf
cluster, like in a deepgeek episode where he was doing a Beowulf cluster to convert video,
you will absolutely see your internal network at 98, 97% of its capacity.
So on a 100-megabit network, that's not too shabby, and that doesn't cancel out the
benefit of distributing the compilation, that's actually worth it.
So you'll be amazed, I think.
Okay, so now that I've convinced you that you've got to do this, let's set it up.
So it's my impression that most distributions come with distcc, but if not, you can always install
it.
You're going to want to make sure that all your computers on your network are using the
same version of distcc and the same version of GCC.
If you're just going to use distcc once in a while, you could always just give it a
flag during compilation, so when you're compiling whatever you're about to compile, just
add at the end of the make line, just add CC, both capital C's, capital C
capital C equals distcc, all lowercase, d-i-s-t-c-c, and that will flip over and use distcc as
the compiler for that instance.
But I think more often than not, it's worth just having distcc as your default compiler.
Even if you're away from your network, it won't matter because your computer that you're
sitting at is going to be in the list of distcc computers, so it will only use your local
computer.
It's not like it's not going to work if you're not in your network around all the other
computers.
It's just not going to give you the benefit of having a distributed compilation process.
So let's assume we're going to set this up forever.
The way to do it would be to add a symlink of distcc in your user folder, so your tilde
slash bin directory.
So that's your local little binary directory, and you can add distcc and a symlink to gcc
and g++, and all that other good stuff within this little user slash bin directory.
And make sure that that's part of your path.
It should be, as far as I know, it usually is.
And then you also add it to your shell's rc file, so if you're using bash, it would be tilde
slash dot bashrc.
And just make sure that the symlinks for those, you know, the user slash bin is in your
path.
And make sure that distcc is defined as your first choice for a compiler.
So that would be CC equals distcc, right there in your dot bashrc file, just to make
sure that when you're compiling it defaults, it knows that the default compiler is distcc.
Okay?
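The symlink setup described above could be sketched roughly as follows. This is only an illustration: the function name setup_masquerade, the choice of symlink names, and the fallback path /usr/bin/distcc are assumptions, and the "user slash bin" directory is taken to mean ~/bin.

```shell
# Sketch only: the function name, symlink names, and fallback path are assumptions.
setup_masquerade() {
    masq_dir=$1
    # Use the installed distcc if there is one; otherwise fall back to a guessed path.
    distcc_bin=$(command -v distcc || echo /usr/bin/distcc)
    mkdir -p "$masq_dir" || return 1
    for name in gcc g++ cc c++; do
        # distcc looks at the name it was invoked under to decide which compiler to run.
        ln -sf "$distcc_bin" "$masq_dir/$name" || return 1
    done
}

# After running: setup_masquerade "$HOME/bin"
# lines of this shape would go in ~/.bashrc:
#   export PATH="$HOME/bin:$PATH"
#   export CC=distcc
```

For the symlinks to take effect, the directory holding them has to come earlier in PATH than the directory holding the real gcc.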
So that's setting it up as a default compiler on the host computer or the master computer.
That is the computer you're sitting at doing all your work.
What you're going to want to do is also go around each computer on your network, all
the little client computers or the slaves, and you're going to want to set that up.
You're going to set up a distcc daemon to run on those computers because your localhost,
your master computer, is going to need to call out to these computers.
They need to have a distcc daemon running. To start the daemon on the machines,
you can set it up to start automatically at boot time, which would be fine.
It needs to be, as far as I know, started as root, but you can then use it as any user.
So you can start it as root, for instance, at boot time, but then you go in and
you can say, okay, so distccd, space, dash dash user, space, klaatu, space, dash dash allow,
space, 192.168.x.x.
So you can limit it to whatever master computer IP address you're going to allow to use this
daemon.
You can also set that to be a range.
So if you wanted to say 192.168.x.0 slash 32, I don't know, whatever range of IP addresses
you want to allow, I usually just limit it to one computer.
I guess it depends on your workflow.
If you're compiling on a lot of different machines, I guess it might be helpful to have those
opened up to a whole range of computers.
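As a rough sketch, the daemon command line described above might be assembled like this. The account name and addresses are placeholders, and the snippet only prints the command rather than running it, since distccd has to be installed and is normally started as root. On the range question: in CIDR notation a /32 mask matches exactly one address, while a /24 such as 192.168.1.0/24 covers 192.168.1.0 through 192.168.1.255.

```shell
# Hypothetical values: substitute your own account name and master address.
run_as="klaatu"
allow_from="192.168.1.10"   # a single master; 192.168.1.0/24 would admit the whole subnet

# Build the command line and print it instead of running it.
distccd_cmd="distccd --daemon --user $run_as --allow $allow_from"
echo "$distccd_cmd"
```

The printed line is what would be run on each helper machine, by hand or from whatever boot-time mechanism the distribution provides.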
But I think it's easiest to go ahead and have it start up at boot time, and you can do
that with just whatever distribution you're using.
There's usually either a service manager in the GUI to start and stop services at boot,
or you can go into the init folders or the rc folders, whatever, to start
the init services upon boot time, and for a lot of good information on that kind of thing,
you can listen to the episode, I think it's like 110 or 112 or 114, something like that,
that Dann Washko did on that very subject of how to, you know, the init process, the boot,
the boot process, and how things are started and when they're started during the boot process.
So listen to that because he gives you a lot of great information, just depending on whichever
distribution you're using.
Okay, so now distcc should be compiled, I mean, installed and running on all your little
slave computers, and you've got it as the default compiler on your master computer.
So now on your master computer, your localhost, you're going to want to make it aware
of all the IP addresses that it is able to use.
And I should mention, you don't have to switch distcc over to a specific user.
You can just keep it running as root.
Like if it starts up at boot time, I think, I know it starts up as root, I think it switches
over to a distcc user on its own because it doesn't want to occupy the user ID of the root
user.
So I'm pretty sure it switches over anyway.
It's just that if you want it specifically to be running as a different user, you have
that option as well.
But otherwise, all you basically need to do is install the distcc daemon, or rather have
that up and running on all those computers one way or another.
And so they're set to go.
I just have mine set to come on at boot time so that I don't have to think about it.
Whenever I do a compilation, it's just kicking in.
It's just doing the compilation over the network per whatever's available.
Okay.
So now on your master computer, to make it aware of the IP addresses on your network, you're
going to want to add either the host names or the IP addresses to tilde slash dot distcc slash hosts.
So just do an ls -a in your home directory, and you'll find a .distcc directory.
And in there, there is a hosts file, and you're going to want to list all the host names
or the IP addresses in order of priority.
The priority being the more powerful computers should come at the top.
So if you've got 10 computers on your network, and two of those are really super powerful,
dual core, multiple chip computers, you want those at the top of the hosts list.
And then if you've got computers that are really fairly slow, you can put them towards
the bottom.
And the reason that you want to do it in order of priority is that your local computer
doesn't really have any way of knowing which is the most powerful computer.
So it's going to divvy out the jobs according to whatever you define it to do.
It's going to give the bulk of the jobs to the top listing, and then down as the workload
needs to be distributed.
So you want to make sure that you're putting the more powerful ones at the top.
They also need to be the same architecture.
You're not going to be able to use a PowerPC computer to help pitch in compiling something
on an x86 or an i386 computer.
So make sure that they're all the same architecture, and make sure that they're in order of
priority so that the more powerful ones will get the brunt of the workload.
Once you've got all that stuff added to the hosts file of the .distcc folder in your home
directory, it's all set up.
So you've got distcc as your default compiler, you've got your slave computers running
a daemon of distcc, and you've got your master computer aware that those little slave computers
are out there with IP addresses defined in the hosts file.
And don't remove, unless you mean to, do not remove localhost from the distcc list.
The only reason you'd want to do that is if you want the computer that you're working
at not to pitch in to the compilation process.
But otherwise leave that localhost in there because you'll want that to help out on the
compilation process.
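A hosts file along the lines described above might be written like this. It is a minimal sketch: the function name write_hosts and all four addresses are made up for illustration, with the two fastest helpers listed first, localhost kept in the list, and a slow machine at the bottom.

```shell
# Sketch only: write_hosts and the addresses below are hypothetical.
write_hosts() {
    hosts_file=$1
    mkdir -p "$(dirname "$hosts_file")" || return 1
    # One host per line, most powerful machines first; localhost stays in the list.
    cat > "$hosts_file" <<'EOF'
192.168.1.20
192.168.1.21
localhost
192.168.1.30
EOF
}
```

Calling write_hosts "$HOME/.distcc/hosts" would put the list where the episode says distcc looks for it. Note that it overwrites any existing file at that path.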
When you start compiling the code that you're going to compile, you're going to want to specify
how many jobs you want to create.
So instead of just saying, okay, compile this, make, you know, CC equals distcc,
you're going to want to tell the computer how many jobs it has to send out over the network.
The general rule of thumb seems to be the number of CPUs that exist on the network times
two and then maybe plus one per CPU.
So for instance, like if you've got two machines on your network and they both only have a single
core processor, you would use dash j for jobs, four, and then maybe add like one per processor
so it'd be six.
So dash j six for two computers with a single core processor each, or you could say, like
if you have two machines that have dual core processor chips in them, then you could use,
you could say dash j eight and then plus one per processor so it'd be ten.
So dash j space ten and so on.
So that's the general rule of thumb.
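The rule of thumb above can be written as a small calculation. This sketch reads it as two jobs per core plus one extra job per processor chip, which is the interpretation that reproduces both worked examples (six and ten). The function name distcc_jobs is made up for illustration.

```shell
# jobs = 2 * total cores on the network + 1 per processor chip
# (one reading of the rule of thumb; distcc_jobs is a hypothetical helper)
distcc_jobs() {
    total_cores=$1
    total_chips=$2
    echo $(( 2 * total_cores + total_chips ))
}

distcc_jobs 2 2   # two single-core machines: prints 6
distcc_jobs 4 2   # two dual-core machines: prints 10
```

The result would then be passed to make, for example make -j10 CC=distcc for the two dual-core machines.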
You can give more or less just kind of depending on what you know about your computers.
For instance, if I had a couple of really slow processors on the network, I probably wouldn't
give them an extra job.
I would give them just, I would assign one job per processor because I don't think, and
I could be wrong, but it's not my impression that they could really handle an extra job.
They're slow processors.
They're like 400 megahertz.
It's not going to do you any good to give them an extra job.
But then again, a dual core machine, those are pretty powerful, you can, you can throw
it an extra job.
It can handle it.
Now there's also an argument that you could even go higher if the processors are actual
separate processors.
So like a machine where you've got multiple CPU chips in it, because there is, I guess,
a school of thought that the single processor, the single core processors, multiple single
core processors are more efficient than, for instance, one multiple core processor.
And whether that's true or not, I'm not too sure, I haven't, I'm not, I couldn't say
for sure, but I've definitely heard a lot of arguments that lean in that direction.
And as you do it more and more, you'll get a feel for what your network, or
what your, you can play around with different settings.
It also obviously depends on what else those computers are doing, you know, if they're
not just being dedicated to compiling your software or whatever, then quite possibly you
don't want to give them as many jobs as you would if you know that they're just going
to be sitting around doing nothing otherwise.
To monitor the compilation process, you've got a tool that should, should be installed
along with distcc, called distccmon-text.
And this is just a little text tool that you can also use, you know, via SSH, if you're
not going to be at the, at the host computer, at the time of compilation, you can always
SSH into it and use this little application.
So it's distccmon-text and then you enter the seconds that you want per update.
So if you want to update every, I don't know, 10 seconds, then it would be distccmon-text
space 10.
And that just shows you a list of the computers, the IP addresses that are compiling and
what the workload for each of those is, it just kind of gives you an update on the status
and how quickly it's going.
So that's a handy little monitoring tool.
Very simple, obviously it's got a pretty low overhead, it's just a little text, you
know, terminal console program, whatever.
You can also just, if you need to, you know, if you're doing some kind of super secret
software compilation and you're nervous about people, you know, monitoring the compilation
process on your network,
you can actually do this all via SSH.
I've never done it over SSH. Rather than just entering the IP address of each computer in
the distcc hosts file, you would enter the IP address of each computer preceded by the
at symbol.
And that will tell it to run it via SSH.
You just need to let the hosts file know that you're going to be doing it over SSH and then
you're going to need to start certain things up via SSH.
And obviously for best results, I mean, make sure that you've generated all your keys
and everything like that.
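A host list using the at-symbol form might look like the sketch below. The addresses are placeholders. Instead of editing the hosts file, the snippet sets the DISTCC_HOSTS environment variable, which distcc also reads and which takes the same host syntax. The same entries could equally go in ~/.distcc/hosts.

```shell
# Hypothetical addresses; the leading @ asks distcc to reach that host over SSH.
ssh_hosts="localhost @192.168.1.20 @192.168.1.21"
export DISTCC_HOSTS="$ssh_hosts"
echo "$DISTCC_HOSTS"
```

This assumes key-based SSH logins to each helper already work without a password prompt, as the episode suggests, since distcc would otherwise stall waiting for input on every job.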
That's how to compile over a distributed network.
Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head
on over to C-A-R-O dot N-E-T.