Episode: 1164
Title: HPR1164: About git
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1164/hpr1164.mp3
Transcribed: 2025-10-17 20:51:01
---
Hello everybody, my name is Yolanda Ruta, I am from Belgium.
This is my first submission to Hacker Public Radio and I am going to talk about Git,
a version control system.
I have used other version control systems in the past, CVS long ago and Subversion, but
today Git is by far my favorite and this is why.
You can commit new revisions when your internet connection is down.
You can easily prevent just any developer from committing to the master branch or to the
release branches.
You can try out experimental things locally.
You can commit changes without having to create a branch in a central repository.
Creating feature branches and switching between branches is easy, even without a working
internet connection.
Git has way more features than Subversion or CVS, and of course Linus says if you are using
Subversion you are stupid and ugly. Yet there are problems when moving from Subversion
to Git.
First, Git works in a very different way than Subversion. It requires some effort to
fully understand what it does, and the easiest way to work with Git, in my opinion, is with
the command line.
For some users this is a drawback.
I moved a project from Subversion and Trac to Git and Redmine a bit more than half a year
ago.
I could use an existing server with Git, Gitolite and Redmine, so I didn't have to
bother about setting those things up.
The conversion from Subversion to Git went pretty smoothly.
We started using Git and it kind of worked, although we were not quite sure why it did.
I guess this is a common problem for new Git users.
I aim to explain things that I wanted to know when starting with Git.
Now we have some oddities in our repository which could probably have been avoided if only I had
known what I was doing.
Two disclaimers. I only discuss the concepts of using Git; you will not find specific
Git commands here, and this is intentional.
Possibly I will publish a more practical follow-up later on. And I make abstraction of
some of the technical stuff, just to restrict the length of this introduction.
First thing: Git is distributed.
Theoretically there is no central code repository and every developer has their own local copy
of the entire repository.
If you want a copy for yourself, you can just clone an existing one.
Your own repository typically contains references to remote repositories.
At the moment you clone a repository, a reference to the original is kept, which is usually called
origin.
A Git repository is basically nothing more than a directed acyclic graph of commits.
A commit represents a specific revision of your source code.
Each commit is identified by a SHA-1 hash, which serves as a unique checksum.
The hash is 40 characters long, which is tedious to type.
If there is no ambiguity, you can refer to a commit just using the first few characters
of the hash.
Usually 5 or 6 characters are sufficient.
Each commit, except for the initial one, has a reference to 1 or 2 parents.
If you pick a random commit, you can find its entire history following the parent links
to the very beginning.
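To make this concrete, here is a minimal Python sketch of such a commit graph. It is only an illustration, not how Git stores things: the names Commit, history and resolve are made up for this example, and the toy hash covers only the message and the parent hashes, whereas real Git hashes much more.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # zero, one or two parent Commit objects

    @property
    def sha(self) -> str:
        # Toy hash (assumption): real Git also hashes the tree, author and dates.
        data = self.message + "".join(p.sha for p in self.parents)
        return hashlib.sha1(data.encode()).hexdigest()


def history(commit):
    """Collect every commit reachable by following parent links."""
    seen, stack, out = set(), [commit], []
    while stack:
        c = stack.pop()
        if c.sha in seen:
            continue
        seen.add(c.sha)
        out.append(c)
        stack.extend(c.parents)
    return out


def resolve(prefix, commits):
    """Return the one commit whose hash starts with the given prefix."""
    matches = [c for c in commits if c.sha.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} commits")
    return matches[0]


root = Commit("initial commit")
second = Commit("add README", (root,))
third = Commit("fix typo", (second,))

print(len(third.sha))                        # 40 hexadecimal characters
print([c.message for c in history(third)])   # newest first, back to the start
print(resolve(third.sha[:6], history(third)).message)
```

Because each hash depends on the parent hashes, changing an old commit would change the hash of every commit that follows it.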
In the simplest case, a Git repository is just a sequence of commits, where each commit
has at most one parent and at most one child.
The history is, so to speak, one straight line.
But generally, it is not the case that all the revisions in a Git repository nicely follow
one after the other.
In the case of branching, a commit typically has several children.
In a merge operation, a commit can have 2 parents, but more on that later.
While programming, there is always one commit checked out.
This commit is the head of your repository.
The current source code, called the working copy, corresponds to the code of head with
certain modifications you made.
Files can be added, deleted or changed.
If you want to commit a new revision of your code, you need to inform Git about the changes
in the working copy that should be included in a new commit.
Git knows how your version of the code differs from the code in the last commit, but it
does not include by default all changes in a new commit.
If you want a change to be included in the next commit, you should explicitly add it
to the index.
This is called staging.
All staged changes will be part of the next commit.
Changes you did not stage stay as they are in your working copy, but they are kept
out of the commit.
When a new revision of your code is committed, this commit becomes the new head of the repository.
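The relation between working copy, index and head can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the Repo class and its methods are hypothetical, and every file is reduced to a path with a content string.

```python
class Repo:
    """Toy model of head, index and working copy as path -> content maps."""

    def __init__(self):
        self.head = {}      # files as recorded in the last commit
        self.index = {}     # what the next commit will contain
        self.working = {}   # files as they currently are on disk

    def stage(self, path):
        """Copy one path's current state from the working copy to the index."""
        if path in self.working:
            self.index[path] = self.working[path]
        else:
            self.index.pop(path, None)  # the file was deleted

    def commit(self):
        """Record the index as the new head; unstaged changes stay behind."""
        self.head = dict(self.index)

    def unstaged(self):
        """Paths whose working copy state differs from the index."""
        paths = set(self.working) | set(self.index)
        return sorted(p for p in paths if self.working.get(p) != self.index.get(p))


repo = Repo()
repo.working = {"main.c": "int main() {}", "notes.txt": "draft"}
repo.stage("main.c")      # only this change is staged
repo.commit()

print(repo.head)          # {'main.c': 'int main() {}'}
print(repo.unstaged())    # ['notes.txt']
```

The unstaged file is still in the working copy after the commit; it is simply not part of the committed revision.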
Git is distributed and typically the repositories talk to each other.
And when they do that, commits are moved from one repository to the other.
After some time you end up with a lot of commits and it becomes difficult to find your way.
Branches are the answer to this problem.
You might know the concept of branches from other source control systems.
In the easiest case, your code history is one straight line, one commit after the other,
from the initial commit to head.
This way there is only one branch in your source repository.
It is however possible that at some point, development is done in parallel.
After a certain commit, say, commit C, developer A can add new commits A1, A2, A3,
while developer B ignores these commits and adds other commits B1, B2 and B3 after C.
And now there are two branches, in which the code is diverging.
These branches may or may not be merged again at some point in the future.
I will tell more about this later.
Technically, a Git branch is nothing more than a pointer to a particular commit in the
repository, just like head is, as a matter of fact.
A branch is pointing to its most recent commit.
If you take two random branches in your repository, you can always find a commit where they diverged.
You start from the commits the branches are pointing to, and then keep following the
parent links.
At some point, you have to find a common ancestor, and this is the commit you are looking for.
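The search for the common ancestor can be sketched as follows, using the commits C, A1 to A3 and B1 to B3 from the example above. The dictionaries and the function names ancestors and common_ancestor are assumptions made for this illustration; Git's real algorithm is more refined.

```python
from collections import deque

# commit -> list of parent commits
parents = {
    "root": [],
    "C": ["root"],
    "A1": ["C"], "A2": ["A1"], "A3": ["A2"],
    "B1": ["C"], "B2": ["B1"], "B3": ["B2"],
}

# A branch is nothing more than a name pointing at a commit.
branches = {"a": "A3", "b": "B3"}


def ancestors(commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def common_ancestor(x, y):
    """Walk back from y, nearest commits first, until hitting an ancestor of x."""
    reachable = ancestors(x)
    queue, visited = deque([y]), set()
    while queue:
        c = queue.popleft()
        if c in reachable:
            return c
        if c not in visited:
            visited.add(c)
            queue.extend(parents[c])
    return None


print(common_ancestor(branches["a"], branches["b"]))   # C
```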
Just as with any other version control system, there is typically one branch checked out.
This is the branch you are working on.
Head is pointing to the same commit as the checked out branch.
When you commit a new revision,
not only does head move to the new commit, the branch pointer moves along as well.
Adding new branches is very easy.
You just add a new pointer to the repository.
The name you give to branches can be anything.
You choose the name, but typically there is one branch called master.
Master is pointing to the main line, the most up to date development revision.
Branches in your own copy of the repository are called local branches.
Git is also aware of branches in remote repositories:
remote branches.
When you fetch a remote branch, Git downloads all necessary commits to your repository, and
puts a pointer to the commit corresponding to the remote branch.
You cannot directly add commits to a remote branch.
Typically, you first fetch the remote branch, you link it to a local branch, and you commit
the new revision to the local branch.
Such a local branch that is linked to a remote branch is called a tracking branch.
If you are working in a tracking branch, Git knows where the original is.
Git makes it easy to download the latest commits in the remote branch, and Git will inform
you about the difference between the remote branch and your associated tracking branch.
For the rest, a tracking branch behaves just like an ordinary local branch.
If it is checked out and you create a new commit, the branch will move along with it.
Now let's talk about merging.
Suppose you have two branches, let's say A and B.
Those branches originate from a common ancestor commit C.
Merging branch B into branch A means incorporating into A all changes between the common ancestor
C and B.
In the simplest case, branch A itself is an ancestor of branch B.
So when working on branch A, you created a new branch B to which you added some commits.
The common ancestor is just the last commit of branch A.
So branch A has a last commit, you created a new branch B and you added new commits to
B. A stays where it is and the development continues in B.
Now when you merge branch B into the original branch A, Git will just move the pointer A,
so that it points to the same commit as B.
This kind of merging is called a fast forward merge, which is an important concept in the
world of Git.
A fast forward merge is a merge operation which comes down to moving the pointer of
the branch into which you are merging.
Such a fast forward merge is not always possible.
If A and B diverge from their common ancestor C, so if there are commits added both to A and
to B, simply moving a pointer does not work.
In this case, when merging B into A, the changes between the common ancestor and branch B
are applied to branch A.
If you are lucky and this doesn't cause any trouble, Git will create a new commit on A
containing the changes in branch B.
Now if both branch A and B modify the same part of your code, you cannot just apply the
changes from one branch to the other.
If this happens, Git marks the conflict and does not commit the result of the merge operation.
You first have to resolve the conflict before you can commit.
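The difference between a fast forward merge and a merge that needs a new commit can be sketched like this. It is a toy model under my own assumptions: the helper names is_ancestor, commit and merge are hypothetical, file contents are left out, and conflicts are therefore not modeled at all.

```python
parents = {"C": [], "B1": ["C"], "B2": ["B1"]}   # commit -> parent commits
branches = {"A": "C", "B": "B2"}                 # branch name -> commit


def is_ancestor(old, new):
    """True if `old` can be reached from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def commit(branch, name):
    """Add a commit on top of a branch and move the branch pointer along."""
    parents[name] = [branches[branch]]
    branches[branch] = name


def merge(target, source):
    """Merge branch `source` into branch `target`; return the kind of merge."""
    t, s = branches[target], branches[source]
    if is_ancestor(s, t):
        return "already up to date"
    if is_ancestor(t, s):
        branches[target] = s          # just move the pointer
        return "fast forward"
    merged = f"merge of {s} into {t}"
    parents[merged] = [t, s]          # a commit with two parents
    branches[target] = merged
    return "merge commit"


first = merge("A", "B")    # A is an ancestor of B
commit("A", "A1")          # now both branches get a commit of their own
commit("B", "B3")
second = merge("A", "B")   # the branches have diverged

print(first)                      # fast forward
print(second)                     # merge commit
print(parents[branches["A"]])     # ['A1', 'B3']
```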
So much for merging, that is, integrating changes from one branch into the other in the same repository.
Now we will consider push and pull operations.
This is about moving changes across different repositories.
So suppose you have checked out a tracking branch and you want to apply the latest
commits of the remote branch in the remote repository to your tracking branch in your
repository.
This is called a pull operation.
Git fetches the current state of the remote branch together with all necessary commits
and merges it into your tracking branch.
As with any other merge, it could be that this causes conflicts which you will have to resolve.
This will be the case if you committed local modifications which change the same part
of the code as the remote commits.
Conversely, you can also push the commits in your local branch to a branch in a remote
repository.
This can be either to a new remote branch or to an existing remote branch.
Git will upload the most recent commit of the local branch together with all necessary
ancestor commits to link it to the existing remote commits.
This way, you create a new remote branch.
If there was no existing branch, you are done.
But if the remote branch you are pushing to already existed, the newly created branch
will be merged into the existing branch.
In most configurations, this only works if the merge operation is a fast forward merge,
which is the case if no commits were added to the remote branch after your latest pull.
If a fast forward merge is not possible, that is, when the remote branch received commits
that you do not have locally, you will get an error message.
To resolve this kind of error, you will first have to fetch the remote branch and merge
it locally with your local tracking branch, which is in fact the pull operation.
This operation results in a new local commit with the latest commit from the remote repository
as one of its parents.
So if you now push the branch again, it will be fast forward merged into the remote repository
without a problem.
When branches diverge, merging is one way to get them together again, we've seen this
before.
A typical use case of merging is the synchronization of the same branches in different repositories.
That's what we covered in the discussion about push and pull.
However, there is also another way to integrate changes from one branch into another.
This is rebasing.
Suppose you have two branches, let's say A and B with common ancestor C, rebasing B onto
A can be seen as taking branch B from the point where it diverged from A, tearing it off,
taking it away and reattaching it to the current commit of branch A.
In more detail:
You created the branch B, which diverged from branch A. New commits were added to B, but to
A as well.
When you rebase B onto A, Git searches for the commit where the branches diverged, which
is C.
Now Git will iterate over the commits from C to B, determine the changes that have been
applied to the source code between each commit, and then start a new branch on A, creating
similar commits on there by replaying the same changes.
It is possible that conflicts occur, in particular if the same code was changed in branches
A and B. If so, you will have to resolve this conflict before the rebase process can continue.
When all commits from C to B are recreated on top of A, the new branch will take the place
of the original B branch.
The overall result will be that changes which were developed in parallel on branches A
and B now appear to be serial changes A first then B.
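As an illustration, the replaying step can be sketched in Python. This is a simplified model and not Git's implementation: the Commit class, the functions linear_history and rebase, and the file names are all made up, every commit has a single parent, a change is reduced to one path with its new content, and conflicts are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parent: "Commit | None"
    change: tuple   # (path, new content): the toy change this commit introduces


def linear_history(tip):
    """Return the commits from the very first one up to `tip`, oldest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = tip.parent
    return out[::-1]


def rebase(b_tip, a_tip):
    """Replay the commits that only B has on top of A; return the new tip of B."""
    on_a = {c.name for c in linear_history(a_tip)}
    to_replay = [c for c in linear_history(b_tip) if c.name not in on_a]
    new_tip = a_tip
    for c in to_replay:
        # A new commit with the same change but a different parent.
        new_tip = Commit(c.name + "'", new_tip, c.change)
    return new_tip


c = Commit("C", None, ("app.py", "v1"))
a1 = Commit("A1", c, ("bugfix.py", "fix"))
b1 = Commit("B1", c, ("feature.py", "part 1"))
b2 = Commit("B2", b1, ("feature.py", "part 2"))

new_b = rebase(b2, a1)
print([x.name for x in linear_history(new_b)])   # ['C', 'A1', "B1'", "B2'"]
```

The replayed commits are new commits with new parents. The original B1 and B2 are left untouched, which is why rebasing counts as changing the history of the branch.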
One should be careful while rebasing.
You should only rebase branches that nobody else is supposed to be tracking.
Rebasing changes the history of a branch, so if a colleague wants to push or pull commits
to a branch you rebased, you will probably end up in a lot of trouble.
There are many ways to organize your work with Git. At the moment I usually work as follows.
I have one master branch.
This branch contains the latest relevant code.
It may contain experimental features, but the idea is that the code in master compiles
and works.
Every time you want to implement a new feature, you create a feature branch.
In the feature branch you can commit non-functional code or even broken code.
This is not a problem, only the code in master is expected to work.
Now suppose you are working on a new feature and meanwhile a bug has been reported which urgently
needs a fix.
In that case you can rather easily switch back to master and create a new bug fix branch.
The changes you made in your half-finished feature branch will cause no trouble.
They are invisible in your bug fix branch.
When your bug fix is ready and nothing changed in master in the meantime, you can easily
fast forward merge the bug fix branch into the master branch.
After merging, the bug fix branch is of no more interest.
You can remove the pointer.
Then you check out your feature branch again and continue to work on the feature.
At some point hopefully your feature implementation is ready and has to be merged into master.
Now a fast forward merge is impossible because the bug fix you created has added new commits
in the master branch.
Now to avoid clutter in the history of your code, it is useful to rebase your feature
branch onto the master branch before merging.
That way it seems that the changes of your feature branch are applied after the bug fix and
not in parallel.
A feature branch is typically a branch on which you work alone and chances are high that
no one else is tracking it so rebasing is no problem.
Because of the rebase operation your feature seems to be completely developed after the
bug fix which results in a cleaner history of your project code.
If you had just merged your feature branch without rebasing you would end up with a commit
with two parents which would just make things more complicated than they should be.
Now when a new release of your project is approaching, you typically create a release
branch from master.
There are probably a number of bugs that still need to be fixed before release.
Meanwhile the normal development of the new features can continue in master.
Suppose you have a release critical bug to fix then you fix that bug in the release
branch.
However, you probably also want to apply the bug fix on the master branch.
At this point rebasing the release branch onto master is not an option because this would
make the new features you might have committed to the master branch part of the release
branch.
This is not what you want because these new features could be experimental or untested.
So in this particular case merging the release branch into the master branch is the way
to go.
After the merge operation you must not remove the release branch since you will need it
afterwards for other release critical bugs to be committed.
Then I still have one use case I want to discuss which is a major refactoring.
If you want to refactor your code in such a way that a lot has to be rewritten you also
create a branch.
This kind of refactoring usually takes some time and you typically want feedback from other
developers during the process.
If you are lucky other people are even willing to help you with refactoring.
So it is a good idea to make the refactoring branch publicly available.
Now suppose you want the new fixes from master to be incorporated in your refactoring
branch.
Rebasing your branch onto master is usually not a good idea because other developers have
probably pulled it.
They might even be working on it.
So in this case merging the master branch into your refactoring branch will do.
That's all.
This is my modest introduction into Git.
I made abstraction of some details because I wanted to keep it relatively short and of
course also because there are still things I don't understand myself.
The workflow that I described here seems to work fine for me.
I'm not sure whether it's really the best practice.
So if you have any feedback I'm certainly interested.
You can find me on my website, which is johanv.org, j-o-h-a-n-v.org.
It's a Dutch site, but you will find a contact link there, which tells you how you can contact
me.
I will also try to put online some show notes at Johanv.org slash note slash 200.
So thank you for listening everybody.
I hope this Git introduction was useful for at least some of you, and thank you to Hacker
Public Radio for hosting this podcast.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast then visit our website to find out how easy
it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club.
HPR is funded by the Binary Revolution at binrev.com, all binrev projects are proudly sponsored
by Lunar Pages.
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting
needs.
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike
3.0 license.