Introduction to Git

This is adapted from this tutorial (normally for University of Bristol CS students) to reflect what will be expected at The Pangean.

Why Git?

You are working on a software project. As your project grows, and you work on it together with other people, you will need some way of:

Git can do all these things. It takes a bit of getting used to, but it pays off to learn the basics of Git once and then use it again and again.

Git is not the only system that can do this, but it is the most popular one by far and knowing Git is pretty much a requirement for lots of technical job interviews as a developer. So you might as well learn Git.

On the other hand, Git’s manual is full of technical terms, some of which the authors made up themselves, and there are different ways of using Git (called “workflows”) that have manuals of their own. The aim of this tutorial is to get you to the point where you can use Git productively, and understand some of these manuals if you need to.

There are three ways to use Git:

All three ways are different interfaces on the same set of Git “verbs” (or commands) so this tutorial will focus on these, but I will mention the free graphical client Fork in this tutorial as that’s what I use myself.

Please do not let anyone tell you that “real programmers” only use the command line. It’s perfectly acceptable to use a graphical client if, like me, you’re more productive that way.

Installing Git and Fork

Unfortunately Fork doesn’t work on Linux. On your own machine, type git in a terminal and if you’re lucky you’ll get the help text, showing it’s installed already. If you need to install it yourself:

On Windows, if you want to use the Fork client then download and install it from git-fork.com. Fork comes with the command-line tool and there’s a button in the window to open a command line if you need it. If you want the command-line version of Git on its own then download and install Git for Windows.

On Mac, you first want to install Homebrew by opening a terminal and following the instructions on the page. Homebrew is a package manager, that is, a piece of software that allows you to install other software. This is particularly useful for developers. Once you have installed Homebrew, open a new terminal and type brew install git and press ENTER. Then you can install Fork from git-fork.com.

On Linux, you will not be able to install Fork, but you can install Git with the package manager provided with your operating system: sudo apt install git on Debian-based systems (including Mint and Ubuntu) and sudo yum install git on Red Hat-based systems.

When you install Fork, it asks for a username and e-mail address. This is a feature of Git, not of Fork itself. It is not to spam you - each “commit” in Git is linked to an author so you can tell who last changed a file, and the developers decided to use name+email for this. You can just put “-” as your e-mail address if you want and Fork will not complain. It doesn’t want to e-mail you, after all.
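On the command line, the same name and e-mail address are stored with git config. Here is a minimal sketch in a scratch folder; the name "Ada Example" and the address are placeholders I made up, so use your own. Adding --global after git config would store the values for every repository on your machine instead of just this one.

```shell
git --version                     # prints the installed version if Git is on your PATH

demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q                       # make an empty repository to try things out in

# Placeholder identity: replace with your own name and address
git config user.name "Ada Example"
git config user.email "ada@example.org"

git config user.name              # prints the stored name
git config user.email             # prints the stored address
```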

Online services

Git itself is a command-line tool that lets you manage repositories, which are like folders that track the complete history of everything that has ever happened in them. You always have a repository on your own machine, and you can have further repositories stored on a server.

For a typical project you will want a repository on a server, both as a backup of your code and as a way to share your code with others if it’s a team project.

There are three big online services that offer you free hosting of your own repositories, within reasonable limits:

All three services currently offer you an unlimited number (within reason) of private repositories, where only you or people that you allow can see the code. You can also make public repositories, for example for open-source projects or as part of your portfolio when you’re looking for jobs.

For the steps in this tutorial, and for hosting your first few projects, all three services are practically equivalent, and it’s easy enough to move a repository from one service to another later on so it does not matter which one you sign up with.

Kheeran: Please set this up using GitHub. I’ve altered the tutorial to reflect GitHub.

Theory: commits

You are writing some code, and you have some files:

some files

The code you’re currently working on is called your working copy in git-speak.

I will indicate the working copy with a folder symbol like this from now on:

folder

This image and others in this tutorial are taken from the gnome-colors-common icon set which is released under the GNU LGPL version 2.

A repository is a database with the complete history of your code. I represent it with this symbol:

repository

You work on your working copy with whatever tools you like, but you only talk to the repository through Git (and Fork, if you’re using it).

Copying code from the working copy to the repository is called a commit. You will do this a lot. Instead of creating a ZIP file of what you’ve been working on and e-mailing it to the rest of your team, you make a commit. Instead of copying the files to a USB stick to take them home, you make a commit.

The basic workflow is this: write or edit some code, commit to the repository, repeat. Your repository, or at least mine for the CNuT notes and exercises (Kheeran: FYI, this is a 2nd year computer science unit from the University of Bristol), then looks like this:

some commits

Each dot is a single commit, with the newest one on top. Git does not of course keep a full separate copy of your whole project for each commit; files that have not changed are shared between commits and the stored data is compressed, so it doesn’t waste space.

Sometimes you want to copy code from the repository back to your working copy, for example to undo a change that broke something. This is called a checkout.

Summary so far:

commit and checkout
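To make the two directions concrete, here is a small command-line sketch under some assumptions of mine: a scratch folder, a made-up file called notes.txt and a placeholder identity. It commits a file, breaks it in the working copy, then uses a checkout to copy the committed version back.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "working version" > notes.txt
git add notes.txt
git commit -q -m "Add notes"      # working copy -> repository

echo "oops, broke it" > notes.txt # a change we regret
git checkout -- notes.txt         # repository -> working copy, for this one file
cat notes.txt                     # the committed text is back
```

Note that the checkout throws away the uncommitted edit, so only do this when you really want the old version.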

Practice: make a repository and a commit

Normally it’s quickest to create an empty repository on an online service and then clone it to your own computer.

You have now got a working copy and a local copy of the repository in a folder on your disk (press the “curly arrow” Open in button in Fork’s toolbar to open the folder). Your repository, which actually lives in a folder called .git inside the working copy folder, currently has one commit called initial commit with the readme file.

a new repository

At the top you have the menu and toolbar. Repository / Open in Explorer, the shortcut Ctrl+Alt+O or, if you make the window wider, the “curly arrow” that appears in the toolbar opens a folder window for your working copy. Below the toolbar is the tab bar, with one tab per working copy - I currently have five tabs open.

On the left you have a list of various things - the top item “Changes” shows you if there are any changes in your working copy that are not in the repository yet. To the right is the main window which varies depending on what you select in the list to the left. Currently it is showing the details of the initial commit.

Here are some alternative ways to create repositories:

Let’s make a commit:

Fork now shows Changes (1) in the left bar to show that something has changed in the working copy. Click on that to see the change:

a changed file

Under Unstaged Changes you see all the files that have changed since your last commit. Selecting one of these files shows you the file on the right, with lines that you added in green and lines that you deleted in red.

a committed file

When you select a commit in the list, the bottom part of the window tells you the commit details. Each commit has a unique hash - a long random-looking number (computed from the contents of the commit) that you need in some cases to identify a particular commit (if you’re using the command line). Below that you see the commit message (“Added text to readme” in my case) and the file(s) affected in this commit. If you click on the triangle next to a file in the commit screen, it expands to show you the changes in this commit. This feature is useful for browsing the history of your code to see what line(s) relate to what commits.

In the top window, notice that the latest commit (on top) has something called a tag attached that says master, whereas the one below has a tag origin/master.

On the command line, the process for committing goes like this:
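A minimal sketch of that process, assuming a scratch repository created locally rather than cloned from GitHub (so here the initial commit is made by hand), with a placeholder identity:

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit"

echo "Some more text." >> README.md       # edit a file in the working copy
git status --short                # shows " M README.md": modified, not staged yet
git diff                          # the lines you added or removed
git add README.md                 # stage the change
git commit -q -m "Added text to readme"   # copy the staged change into the repository
git log --oneline                 # one line per commit, newest on top
```

Staging with git add corresponds to moving a file from Unstaged Changes to Staged Changes in Fork.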

General commit notes

Try and make commits

Here’s the example from my CNuT lecture notes again (which you’ll be able to look at in more detail in a moment):

some commits

This can be really useful later on to search for “how did I do that again?”. Imagine you at some point add a “reset” button to a form on a web application you’re making, then later on you have another form where you also need a reset button. If you can search for commits with the word “reset” in the description (with the magnifying glass icon to the very right of the toolbar) and find the last time you did this, then you can quickly apply the same technique to the new form you’re making.
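The command-line equivalent of that search is git log with the --grep option. This sketch uses three made-up commit messages (and empty commits, purely to keep the example short) to show the idea:

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

# Three commits with invented messages; --allow-empty skips the need for file changes
git commit -q --allow-empty -m "Add login form"
git commit -q --allow-empty -m "Add Reset button to login form"
git commit -q --allow-empty -m "Fix typo in footer"

# Search commit messages for "reset", ignoring upper/lower case
git log --oneline -i --grep="reset"
```

This only works well if your commit messages actually say what you did, which is one more reason to write descriptive ones.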

Theory: working with remote repositories

So far, you have a folder with your working copy (currently with one file README.md), a repository (in the subfolder .git) called the local repository and another repository hosted on GitHub called the remote repository.

When you make a commit, you copy files from the working copy to the local repository. This neither counts as a back-up, nor does it help you share code with others.

To work with the remote repository, we need to introduce two new terms:

push and fetch

The idea is that the local and remote repositories are always kept in sync. Fetching is a safe operation in that it cannot cause conflicts, even if someone else has changed code in the meantime. Pulling is not safe in this sense, so the recommended workflow for starting out with Git in “single-player mode” is:

  1. Before you start work each day, do a fetch. If you’ve done work somewhere else (e.g. you work from both your home PC and the lab machines) then this will get your last changes.
  2. If there were no changes, it’s safe to pull which makes sure everything is up to date. If there were any changes, deal with that now (details on this later on).
  3. Do your work. Remember to make small, frequent commits.
  4. Before logging off, commit your work, then push to the server.

As long as you always fetch before you start working and commit+push before you finish, you will never have a Git conflict as long as not more than one person is working on the same project at a time. For teamwork, see later in this tutorial.
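The four steps above can be sketched on the command line as follows. To keep the example self-contained I use a bare repository in a scratch folder (server.git, a name I made up) as a stand-in for the remote on GitHub; with a real remote only the clone URL changes.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q --bare server.git     # stand-in for the repository on GitHub
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git clone -q server.git work 2>/dev/null  # cloning an empty repository prints a warning
cd work
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in this tutorial
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master      # first push; -u remembers origin/master as the default

# A normal working day:
git fetch                         # 1. see what is new on the server
git status -sb                    # 2. compare master with origin/master
git pull -q                       #    nothing new, so pulling is safe
echo "More text." >> README.md    # 3. do your work, in small commits
git commit -q -a -m "Added text to readme"
git push -q                       # 4. before logging off
```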

Practice: fetch and push to the server

If you’ve followed the tutorial so far your window will have a part like this:

one commit to push

Under Branches, you currently have one default branch called master which you can interpret as “the latest commit”. You can see on the right that the master tag is on the latest commit. Under Remotes, you can see one entry with the default name origin - this is the repository on GitHub. On the right, you can see that the origin/master tag is still on the initial commit: the remote is one commit behind the local repository. This is also shown after the master branch entry with the one followed by an arrow pointing up (pronounced: “1 up”). If you saw for example “3 down” it would mean that the remote is 3 commits ahead, and if you see both for example “1 up 2 down” then it means that you have made commits to the local repository and someone else has meanwhile made commits to the remote one, a situation that we’ll deal with later on.
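The “1 up” count can also be read on the command line. This is a sketch under the same assumption as before, namely a local bare repository (server.git, a made-up name) standing in for GitHub, with one commit pushed and one commit not pushed yet:

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q --bare server.git     # stand-in for the repository on GitHub
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git clone -q server.git work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master

echo "More text." >> README.md
git commit -q -a -m "Added text to readme"    # committed locally, not pushed

git status -sb                    # prints: ## master...origin/master [ahead 1]
# Two numbers: commits only in master ("up"), then commits only in origin/master ("down")
git rev-list --left-right --count master...origin/master
```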

You can also use the fetch, pull and push options in the Repository menu, then select the branches you want in the dialog box that appears.

On the command line, the commands are git fetch, git push and git pull. If Git isn’t sure which branch or remote you mean, it will give you a message telling you how to select a default one. If you want to manually select a different one, you can name the remote and the branch as two separate words, for example git fetch origin master or git pull origin master, to tell Git exactly which ones you mean.

Practice: have a look at the CNuT repository

You can clone other people’s repositories if they have made them public. Select File / Clone in the Fork menu and put the following in the URL box: https://gitlab.com/david-bristol/coconut.git. Choose a name and folder anywhere you like. This will get you the sources and PDFs of all the material for the CNuT unit, as well as the complete history of how and when I developed it. Have a look at some of my commits.

Kheeran: Check out https://github.com/kheeran/COMS30005-HPC-Serial for an example of my commits. Click 82 commits.

You will be able to make changes to your working copy and your local repository, but you will not of course be able to push changes back to my Gitlab repository.

Note that the URL starts with https instead of git. When you clone someone else’s repository, this is fine as it means that you do not have to authenticate yourself or even have an account with the service. For your own repositories, although you could use https and enter your credentials each time, it is more secure (and less annoying) to use the git form of the URL (it starts with git@ and goes over SSH), which uses keys instead of passwords. See “Security” earlier on for details.
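You can look at and change the URL of a remote without contacting the server. In this sketch, example-user/my-project is a made-up repository name; substitute your own account and project.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q

# Hypothetical repository: replace example-user/my-project with your own
git remote add origin https://github.com/example-user/my-project.git
git remote -v                     # lists the fetch and push URLs for origin

# Switch the same remote to the key-based (SSH) form of the URL
git remote set-url origin git@github.com:example-user/my-project.git
git remote get-url origin         # prints the new URL
```

Neither command talks to GitHub, so nothing goes wrong until your next fetch or push if the URL has a typo in it.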

Teamwork, conflicts and merging

It can happen to the best of us: you’re working together on a project and two people edit their own copies of main.c at the same time. When it’s time to combine everyone’s work for the day, you risk one person’s changes overwriting the other person’s changes. Git calls this situation a conflict, and one of the reasons to use tools like Git in the first place is to help you solve code conflicts. (Git cannot unfortunately solve personal conflicts for you.)

Git solves conflicts by manipulating time. Instead of a time-line where one event happens after another, Git has a time-graph where different timelines can split and join again.

Conflicts that are not real

First, let’s imagine the following situation: two people are working on their own working copies of the same repository. Person one adds a file A.txt and commits it, person two adds a file B.txt and commits it. So far, everyone has just committed to their local repository and no-one is aware of the others’ changes yet.

Person one pushes her changes to GitHub (a.k.a. “origin”). So far so good - GitHub doesn’t know about the other change yet.

But when person two tries to push, they’ll get an error message:

error: failed to push some refs to 'git@gitlab.com:david-bristol/my-project.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally...

The first rule of “multiplayer Git” is that you can only push if no-one else has changed the repository since your last fetch. If you get this error, do a fetch instead:

conflict on different files

The timeline has split into two: after “updated README”, someone else made a commit “File A” and you made a commit “File B”. (The list on the left will show 1 up, 1 down to indicate this situation too.)

If you hit this situation, and the two commits do not directly conflict with each other (that is, you didn’t both edit the same file - you can see which files were changed by clicking on a commit and then looking at the bottom of the window), then what I recommend you do is

  1. pull, but tick the rebase instead of merge box in the window that pops up.
  2. push

I should warn you that some people on the internet have strong opinions about this, and that I will continue to ignore these people.

What rebase does is pretend that you’d done a fetch before making your changes, so the repository after the push looks like this:

after rebase and push

Everything’s fine again.
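Here is the whole story replayed on the command line, as a sketch. Everything happens in a scratch folder: server.git stands in for GitHub, and the folders one and two are the working copies of the two people (all names are my own inventions).

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q --bare server.git     # stand-in for the repository on GitHub
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Person one creates the project and pushes the first commit
git clone -q server.git one 2>/dev/null
cd one
git symbolic-ref HEAD refs/heads/master
git config user.name "Person One"
git config user.email "one@example.org"
echo "# My project" > README.md
git add README.md
git commit -q -m "updated README"
git push -q -u origin master
cd ..

# Person two clones the project
git clone -q server.git two
cd two
git config user.name "Person Two"
git config user.email "two@example.org"
cd ..

# Person one adds A.txt and pushes
cd one
echo "A" > A.txt
git add A.txt
git commit -q -m "File A"
git push -q
cd ..

# Person two adds B.txt and tries to push
cd two
echo "B" > B.txt
git add B.txt
git commit -q -m "File B"
git push -q 2>/dev/null || echo "push rejected, fetch instead"
git fetch -q                      # the timeline has now split: 1 up, 1 down
git pull -q --rebase              # replay "File B" on top of "File A"
git push -q                       # now the push goes through
git log --oneline --graph         # a single straight line of three commits
```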

Conflicts that are real

Kheeran: In this case, inform me and we will discuss how to merge the conflicts. Ultimately, I will do the merging unless I specifically say otherwise.

It is always safe to try a push. If it doesn’t work, do a fetch instead and see what happened. Clicking on a commit shows you which files were changed at the bottom of the window. If two people changed the same file, you have a real conflict.

In this case, try and pull but do not tick the rebase box.

If you try and pull (whether or not you tick the rebase box) and you have a real conflict - two people edited the same file - one of two things can happen. If Git thinks it can tell what’s going on, it will try and combine both your changes to the file. If Git doesn’t know what to do, you get an error with a line like

CONFLICT (content): Merge conflict in README.md

Close the error message and open the file in question. Git has indicated your edits with markers like this (you can search for <<<<< to find them):

<<<<<<< HEAD
edited by person two
=======
edited by person one
>>>>>>> b550ec8bac847dc5ef2b731e30ff18a359b2d582

This means that you, person two in this case (HEAD), added the line edited by person two but someone else added the line edited by person one (the id of their commit is provided in case you want to look it up on the console, but in Fork it’s obvious which commit is meant).

Fix the file, remove Git’s markers and go back to Fork. You get this warning:

merge warning

If you’re happy that you’ve fixed the conflict, Resolve takes you to the commit window, where you can stage your changes and commit (Git has created a default commit message for you, which you can edit if you like). Abort gets you back to before you tried to pull.

Your timeline now looks like this:

timeline after merge

You can now try and push again, and if no-one else has changed anything in the meantime, the push will go through and the green tags will move to the top commit.

When two timelines that have diverged join up in a single point again, this is called a merge.
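The same real conflict can be reproduced on the command line. This is a sketch in a scratch folder, with server.git standing in for GitHub and the folders one and two as the two people’s working copies (names of my own choosing). The fix I write into the file is just one possible resolution.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q --bare server.git     # stand-in for the repository on GitHub
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git clone -q server.git one 2>/dev/null
cd one
git symbolic-ref HEAD refs/heads/master
git config user.name "Person One"
git config user.email "one@example.org"
echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master
cd ..

git clone -q server.git two
cd two
git config user.name "Person Two"
git config user.email "two@example.org"
cd ..

# Both people edit the same place in the same file
cd one
echo "edited by person one" >> README.md
git commit -q -a -m "Person one edits README"
git push -q
cd ../two
echo "edited by person two" >> README.md
git commit -q -a -m "Person two edits README"

git pull --no-rebase || echo "Git could not merge README.md by itself"
cat README.md                     # shows the <<<<<<< ======= >>>>>>> markers

# Fix the file by hand (here: overwrite it with the combined text)
printf '# My project\nedited by person one and person two\n' > README.md
git add README.md                 # tell Git the conflict is resolved
git commit -q --no-edit           # merge commit with Git's default message
git push -q
git log --oneline --graph         # two timelines joining in the merge commit
```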

Some technical details

A bit of explanation on what’s going on here. A commit is a data structure with the following information:

Normally, most commits have exactly one parent, except the initial commit which has none. As long as you do not have any conflicts, your Git repository is basically a linked list of commits (in the Fork graph view, a dot is a commit and a line leaving a dot downwards is a parent pointer). New commits get added at the start (top) of the list. In general, the repository is a DAG (directed acyclic graph) of commits.
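If you are curious, you can ask Git to print a commit exactly as it is stored. This sketch makes two commits in a scratch repository (file name and identity are placeholders) and prints both: you should see a tree line (the snapshot of your files), a parent line, author and committer lines, and the commit message.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git commit -q -a -m "second"

git cat-file -p HEAD              # the latest commit: has one "parent" line
git cat-file -p HEAD~1            # the initial commit: no "parent" line at all
git rev-list --parents HEAD       # each commit hash followed by its parent hash(es)
```

A merge commit would show two parent lines in the same output.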

However, when two people work on the same repository independently of each other and then commit, you get two commits with the same parent.

Some people maintain that you should never change a node once it has been added to the commit graph. In this case, every time you get two nodes with the same parent, you need to make a new commit with two parents called a merge commit to get the same latest version of the code for everyone again.

I disagree - if you do this all the time, then in the worst case 1 in every 3 of your commits will be a merge commit. As long as you have two commits that didn’t edit the same file, there’s no reason not to use the rebase feature: what this does is undo your last commit, fetch from the remote repository and then apply your last commit again. This makes the repository easier to browse when you’re looking for something. Rebasing is completely safe and won’t lose your changes.

If you have a genuine conflict in a file, it’s worth using the more complicated “merge commit” strategy, as in the merge commit you might want to make changes to a file that weren’t in either of the previous commits. For example if person one adds edited by person one and person two adds edited by person two, then you might want the final line to read edited by person one and person two.

So I suggest using a merge commit (pull with rebase off) only when you really have a conflict because then the merge commit itself contains useful information.

It’s not illegal to do a rebase after resolving a conflict, but if your fix for the conflict was to delete something you wrote in the last commit then a rebase would lose this information, and you would be better off keeping it recorded in the repository in case you want to refer to it later.

Multiplayer Git workflow

For team projects, I recommend the following workflow:

  1. When you start a coding session: fetch all previous commits by yourself and other people. This is always safe, it cannot create conflicts.
  2. Write code, commit, repeat. A commit is always safe, as it only affects your local repository.
  3. Before ending your coding session, first do a fetch. A fetch is always safe to do, it won’t fail (unless the server is down).
  4. If no-one else has made any commits - it’s safe to push and you’re done.
  5. If there are other new commits, but no conflict - pull with rebase on. If this succeeds, you’re done.
  6. If you get a conflict during a pull-with-rebase, abort and go to the next step.
  7. If you spot a conflict after the fetch (or have just aborted a pull-with-rebase due to a conflict), do a pull-without-rebase. This will still get you a conflict warning, but now you can open the files affected and fix the conflict by hand.
  8. After fixing a conflict, use the Resolve button to create a merge commit, then try and push again (with rebase off). If this succeeds, you’re done, otherwise go back to step 5.

A reminder of the Git commands we’ve learnt so far:

git commands

Tags

It’s time for a closer look at those labels like master, which are called tags in Git-speak.

You can imagine that every repository has a table of tags, where each entry contains a tag name and a commit id. When you create your first commit, Git automatically creates a tag called master and points it at that commit; when you create a new commit then Git changes the master tag to point at that commit. We’ll see later on that this is not always true, but it’s close enough for now.

The only tag that Git requires in every repository is called HEAD and it points at whichever commit your working copy is based on - which so far just means the latest commit too. Fork doesn’t show the HEAD tag directly, instead it displays the HEAD commit of your local repository in bold. master is just a convention, but one employed by pretty much every repository out there.

Tags are part of a repository, so if you have a local and a remote repository then their tags can differ, as we saw on the last pages. Making commits, pushing and pulling (but not fetching) can move tags around too.

You can create your own tags, and Git won’t mess with them. For example, if you want to release version 0.1 of your project, you can right-click a commit and choose Create New Tag, and give it a name such as “v0.1” and an optional message. Your custom tags are basically human-readable names to refer to particular commits.

When you create a tag, it lives in your local repository. Tick the Push all tags box next time you push changes to copy it to the remote; a fetch will automatically fetch tags. (The reason for this convention is that on a group project, an individual member might want a “private” tag on their local repository that others can’t see, but tags that end up in the remote are assumed to apply to everyone.)

tags

In this image, the HEAD commit of the local repository is the one in bold (at the top), master is a local tag, 0.1 is a user-defined tag and the tags origin/HEAD and origin/master are from the remote repository called origin.
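Creating and pushing a tag of your own looks like this on the command line. As in the other sketches, server.git is a local stand-in for GitHub and the identity is a placeholder; v0.1 is the example tag name from above.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q --bare server.git     # stand-in for the repository on GitHub
git --git-dir=server.git symbolic-ref HEAD refs/heads/master
git clone -q server.git work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master

git tag -a v0.1 -m "First release"        # tag the current commit, with a message
git tag                           # lists your tags: v0.1
git push -q origin v0.1           # copy this one tag to the remote
git ls-remote --tags origin       # shows which tags the remote has
```

To push all your tags at once, the command-line counterpart of the Push all tags box is git push --tags.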

Tags can do much more than this, as we’ll see next.

Branches

Branches are what Git is really about. Git lets you do pretty much whatever you like with branches and there are lots of different workflows that use them in different ways. You don’t have to follow any of these until you’re experienced enough and have a project big enough that it makes sense for you.

There are two points to understand: what a branch is, and what a branch does.

What a branch is

A branch is simply a tag that Git moves for you when you create a new commit. master is, and always was, a branch tag.

At any point in time, your working copy is on exactly one branch. When you make a commit, Git moves the current branch tag to point at your new commit.

HEAD is a special tag in that it doesn’t point at a commit directly, it points at the current branch tag. So in a repository you have the linked list HEAD -> (current branch) -> (latest commit on this branch).
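You can follow this chain yourself. The sketch below assumes a scratch repository with one commit on master (placeholder identity) and asks Git first where HEAD points, then where master points:

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in this tutorial
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "hello" > file.txt
git add file.txt
git commit -q -m "initial commit"

git symbolic-ref HEAD             # prints refs/heads/master: HEAD -> current branch
git rev-parse master              # the hash of the commit that master points at
git rev-parse HEAD                # the same hash, reached by following HEAD -> master
```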

What a branch does

Branches let you work on different versions of your code at once without conflicts. Imagine a web project with some HTML files, some stylesheets and a server written in Java. Danny the designer is working on the HTML and CSS while at the same time Charlie the coder is adding new features to the server. With a single branch, every time someone wants to push to the repository they have to rebase or resolve conflicts manually, and worse still the commit history is a complete mix of Danny’s and Charlie’s work.

Instead, Danny can create a new branch called “design” and work on that, meanwhile Charlie creates a branch called “server” and works on that. After a few commits, the repositories look like this:

branches

In Danny’s view, design is the current branch (pink dot, latest commit (HEAD) in bold). Danny has made three commits on this branch, meanwhile Charlie has been working on the server branch and made a few commits there. The order of commits in the list is by date (latest on top) and commits on other branches than the current are shown in gray text. Charlie’s view of the repository is the same, but she’s currently on the server branch so the HEAD pointer on her local copy of the repository is on the “server now works” commit.

As long as Danny and Charlie work on different branches, they’ll never have a conflict, but at some point they will want to combine their code so they meet up at the end of the day to do this.

Merging branches

How and when to merge depends on how you’re using branches. You can have one branch per feature and merge them into master when they’re complete, or one branch per person (and occasionally further branches for trying things out), or many other things.

Danny and Charlie have decided to do their own work on branches called design and server and to merge the results into master every now and then.

Charlie does the same, and so the commit graph looks like this:

merged branches

This is Danny’s view on the design branch. The three branch tags are all on the same commit as the respective “origin” tags, showing that all local changes have been copied to the server. (To see if there are new changes on the server, Danny would do a fetch.)

Once the commit graph gets complicated, you can hover the mouse over an entry in the Branches section of the list on the left and two symbols will appear: the star marks a branch as “favourite” and the axe-thingy hides all other branches from view in the main window, so you can look at the commits on your current branch only (click the axe-thingy again to show all branches).

Merging into branches

One strategy to work with branches is called “feature branches”: for each new feature, you create a branch, implement the feature and then merge it into master when it’s done. This way you only ever have to merge from feature branches to master and not back again.

In our example, when Charlie continues working on her code, she might want to have it display Danny’s latest HTML page, so Charlie needs a way to get Danny’s work into her own “server” branch. She can do this by:

Charlie can now carry on coding the server. However, next time she merges into master, she might get a conflict if Danny’s updated the HTML files again - in this case she has to resolve the conflict by taking Danny’s latest version.

While merging from “work” branches into master is normal, merging from master into other branches is worth avoiding if you don’t have a good reason for it, but you can do it if you need to.

On the command line, git checkout OBJECT does a checkout, where OBJECT can either be a branch or the hash of a commit. To create a new branch on the command line, you use git checkout -b BRANCHNAME. The merge command is git merge OTHERBRANCH.
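Putting these commands together, here is a sketch of Danny’s and Charlie’s branches in a single scratch repository. For simplicity one person plays both roles and there is no remote; the file names and commit messages are made up.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in this tutorial
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

echo "# Web project" > README.md
git add README.md
git commit -q -m "initial commit"

git checkout -q -b design         # Danny's branch, created from master
echo "<h1>Hello</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

git checkout -q master
git checkout -q -b server         # Charlie's branch, also created from master
echo "class Server {}" > Server.java
git add Server.java
git commit -q -m "Add server skeleton"

git checkout -q master
git merge -q design               # master simply moves forward to Danny's commit
git merge -q -m "Merge server into master" server   # a real merge commit, two parents

git checkout -q server
git merge -q master               # bring Danny's work into the server branch
ls                                # README.md, Server.java and index.html
```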

General guidelines on merging

Ignoring files

Sometimes, you don’t want all your files to be included in the repository. For example, if you’re compiling a C program the traditional way, you definitely want your source files (e.g. main.c, main.h) in the repository, you might or might not want the executables such as main.exe in the repository, but you definitely don’t want temporary files such as main.o stored in the repository - these are created and updated by the compiler.

To ignore some files, make a file called .gitignore in your working copy - the first character in that filename really is a dot - and open it in any text editor; place one filename or pattern (with * to mean “anything”) per line in that file. For example, for a C program, place *.o in the ignore file.

If you’re working with TeX then there are so many kinds of temporary file that someone has helpfully written an ignore file for you.

You can ignore whole folders too - if you put a line private in your ignore file, and you have a folder of that name, then Git/Fork will consider the whole folder off limits and not show changes to files in that folder in the (un)staged files view. (This is why you won’t find the solutions to the CNuT exercises in that repository.)

The .gitignore file itself you want to stage and add to the repository along with your other project files.
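A short sketch of an ignore file in action, using the *.o pattern and the private folder from above; the file names are invented and the files are empty, as only their names matter here.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q
git config user.name "Ada Example"        # placeholder identity
git config user.email "ada@example.org"

printf '*.o\nprivate\n' > .gitignore      # one pattern per line
mkdir private
touch main.c main.o private/solutions.txt

git status --short                # lists only .gitignore and main.c
git check-ignore -v main.o        # shows which line of .gitignore matches main.o

git add .gitignore main.c         # the ignore file goes into the repository too
git commit -q -m "Add source and ignore file"
git ls-files                      # the files Git is tracking
```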

And finally

If you want to know more about Git, you can

The Git reference manual is written in a style that’s not everyone’s cup of tea, and lots of people find it a bit intimidating (the book, in my opinion, is more beginner-friendly). So someone has created a fake random Git manual page generator which perfectly emulates the tone of the real thing, while making no sense at all.

Kate Hudson’s “Git flight rules” is a good resource for Git FAQ.

And finally, a bit of programmer humour:

in case of fire

(Copied from http://abload.de/img/in_case_of_fireirrtb.jpg. This should be obvious, but as a fire warden I feel the need to point out that in case of a real fire alarm you must leave the building immediately by the nearest emergency exit.)

Altered from the tutorial by David Bernhard available at the University of Bristol’s CSS website