Using git (and version control) at the SCF

Overview of Version Control Systems and Git

Version control systems are designed to enable users (one or more) to easily track changes in their files and to allow multiple users to work on files without having to pass them back and forth or manually reconcile different versions. The basic idea is that copies of a given directory structure are stored on different machines and that, in each copy, the entire history of the directory structure and its files is stored in an incremental fashion.

Git is a very popular version control system. GitHub (github.com) is a widely used site that allows one to set up a remote repository. A repository there can be public, in which case it is available to anyone on the internet, or private, in which case only the people you authorize can see it.

The SCF has set up a location on the file system in which users can establish git repositories that can be accessed by other SCF users. See details under "Creating a Shared Git Repository on the SCF Filesystem" below.

Note that version control systems work best with text files, because they can often resolve conflicts between different versions of a text file automatically, line by line. With binary files, conflict resolution can only be done by choosing one version of the file over another.

We'll start by discussing how to create and use a simple repository local to a single machine and then discuss how to create a repository in the SCF git repository repository (GRR) that is accessible to other SCF users.

Working with a Repository on Your Local Machine

The Basics

First note that we can get help on git and on specific git commands as follows:

$ git --help
$ git commit --help

To start, create a directory that will hold your repository (we'll call it 'proj') and initialize the repository:

$ mkdir proj
$ cd proj
$ git init

Now create some files in the directory (or copy existing ones into it). Then tell git which files to track:

$ git add file.txt
$ git add pic.png

Here's how you check on the status of the repository; you can do this anytime:

$ git status

Commit your changes to the repository with an informative message that says what changes you've made:

$ git commit -am "added file.txt and pic.png"

Note that you can freely modify and delete files that git is already tracking: the -a flag tells 'git commit' to include all such modifications and deletions in the commit. The -a flag does not pick up new files, however. To add additional files to the repository, you need to use 'git add'.
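
As a minimal sketch of that distinction, the following script works in a throwaway directory and shows that a commit made with -a records a change to a tracked file but leaves a brand-new file untracked. The temporary directory, the file names and the placeholder author identity are assumptions made for the example only.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)          # throwaway repository
cd "$demo"
git init -q
echo "first draft" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# Modify a tracked file and create a new, untracked one.
echo "second draft" > file.txt
echo "notes" > new.txt

# -a picks up the change to file.txt, but new.txt was never added.
git commit -q -am "revised file.txt"
git status --short         # new.txt is still reported as untracked

# Adding it explicitly brings it under version control.
git add new.txt
git commit -q -m "added new.txt"
git ls-files               # lists both tracked files
```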

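To see the difference between the files as currently present in your directory and the versions git has recorded, do the following. Note that plain 'git diff' leaves out changes you have already staged with 'git add'; use 'git diff HEAD' to compare against the last commit regardless of staging.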

$ git diff

To see a log showing the history of changes to your repository:

$ git log
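
The following sketch, again in a throwaway directory with assumed file names and a placeholder identity, shows how the output of 'git diff' depends on whether a change has been staged, and ends with a compact one-line-per-commit form of the log.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)          # throwaway repository
cd "$demo"
git init -q
echo "line one" > file.txt
git add file.txt
git commit -q -m "added file.txt"

echo "line two" >> file.txt
git diff --stat            # the unstaged change to file.txt is reported
git add file.txt
git diff --stat            # prints nothing: the change is now staged
git diff --staged --stat   # the staged change, compared with the last commit
git diff HEAD --stat       # everything since the last commit, staged or not

git commit -q -m "added a second line"
git log --oneline          # one line per commit, newest first
```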

Using Branches

Branches are a useful way to try something out without altering your current files.

To make a branch of a project (in this case the branch is called 'trial'):

$ git branch trial

To check on the branches:

$ git branch

To switch to a branch:

$ git checkout trial

Now you can make changes on the branch, adding and committing as above.

To merge changes back into the master branch (the primary version of the repository; on newer git setups it may be called 'main' instead), do the following. When you do the merge, git may need you to resolve conflicts between the different versions.

$ git checkout master
$ git merge trial
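
To make the conflict case concrete, here is a small sketch in which the same line is edited differently on master and on trial, so the merge stops and asks for a resolution. The directory, file contents and placeholder identity are assumptions for the example; in real use you would edit the conflicted file by hand rather than overwrite it.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "original line" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# Change the same line differently on each branch.
git branch trial
echo "master version" > file.txt
git commit -q -am "edited on master"
git checkout -q trial
echo "trial version" > file.txt
git commit -q -am "edited on trial"

git checkout -q master
if ! git merge trial; then
    # git has left conflict markers (<<<<<<<, =======, >>>>>>>) in file.txt.
    # Here we simply write the text we want to keep, then mark it resolved.
    echo "combined version" > file.txt
    git add file.txt
    git commit -q -m "merged trial, resolving conflict in file.txt"
fi
git log --oneline --graph
```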

To delete a branch:

$ git branch -d trial
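
Putting the branch commands together, a complete cycle might look like the sketch below. It uses a throwaway directory, made-up file names and a placeholder identity; because master receives no new commits while the branch is in use, the merge here is a simple fast-forward.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "stable" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# Try something out on a branch.
git branch trial
git checkout -q trial
echo "experiment" > idea.txt
git add idea.txt
git commit -q -m "tried an idea"

# Back on master the new file is absent until the branch is merged.
git checkout -q master
ls
git merge -q trial
git branch -d trial
git branch                 # only master remains
```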

To see a graphical representation of changes and of the branching structure, you can use gitk, which is installed on the SCF Linux machines:

$ gitk --all

A text-based alternative is:

$ git log --graph --full-history --all --color --pretty=format:"%x1b[31m%h%x09%x1b[32m%d%x1b[0m%x20%s%x20%x1b[33m(%an)%x1b[0m"
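
If the format string above is more than you need, a shorter text view can be built from the standard --oneline and --decorate options. The sketch below sets up a small repository with two diverging branches (all names and contents are assumptions for the example) and prints its graph.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "base" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# One commit on master and one on trial, so the history forks.
git branch trial
echo "more" >> file.txt
git commit -q -am "work on master"
git checkout -q trial
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "work on trial"

# Compact text view of every branch, one commit per line.
graph=$(git log --graph --oneline --decorate --all)
printf '%s\n' "$graph"
```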

Creating a Shared Git Repository on the SCF Filesystem

First, email consult@stat to request that you be added to the 'repos' group, which will allow you to create a repository. If you want to have a specific group for your project (which is a good idea to avoid the possibility that someone not in your project might make changes to your repo), also provide the usernames of the people you want in your group.

Now, ssh to any SCF Linux machine and go to the git repository repository (GRR):

$ cd /scratch/repos

Make a directory for your new repo (we'll call it 'bigProj') and change the ownership to the group, where 'userName' is your SCF user name and 'groupName' is the name of the group requested above. If you haven't requested a group be set up for your project, just use 'repos' as the group name:

$ mkdir bigProj
$ chown -R userName:groupName bigProj

Initialize the repo and set up permissions:

$ git init --bare --shared bigProj
$ chgrp -R groupName bigProj
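
If you'd like to see what these commands produce before touching /scratch/repos, the sketch below creates a bare, shared repository in a temporary directory instead. The temporary location is an assumption of the example, and the chown and chgrp steps are left out because they depend on your SCF user and group names.

```shell
set -e
grr=$(mktemp -d)           # stand-in for /scratch/repos in this sketch
cd "$grr"
mkdir bigProj
git init -q --bare --shared bigProj

# A bare repository holds only the version history, with no working copy of the files.
git -C bigProj rev-parse --is-bare-repository   # prints "true"

# --shared records group sharing in the repository's own configuration.
git -C bigProj config core.sharedRepository

# Directory permissions, including group write access where the filesystem supports it.
ls -ld bigProj
```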

Now we'll initialize a local repository (on any machine you have access to, not just SCF machines), add files, and synchronize with the repo on /scratch/repos.

Go to the directory on the local machine and create the local repository:

$ mkdir bigProj
$ cd bigProj
$ git init
$ git add file.txt
$ git commit -am "added file.txt"

Next we'll set things up for easier communication with the GRR and synchronize the local master branch with the remote repository, where by convention we use origin as the name of the centralized remote repository:

$ git remote add origin ssh://userName@beren.berkeley.edu/scratch/repos/bigProj
$ git push origin master
$ git branch --set-upstream-to=origin/master master
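
The same sequence can be rehearsed entirely on one machine. In the sketch below a bare repository in a temporary directory stands in for the one under /scratch/repos, and a filesystem path replaces the ssh:// URL; both are assumptions of the example, as is the placeholder identity. It uses the --set-upstream-to form of the tracking option, which is the one current git versions expect.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare --shared "$work/remote.git"   # stands in for /scratch/repos/bigProj

mkdir "$work/bigProj"
cd "$work/bigProj"
git init -q
git symbolic-ref HEAD refs/heads/master          # make sure the first branch is named master
echo "data" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# A local path replaces the ssh:// URL in this sketch.
git remote add origin "$work/remote.git"
git push -q origin master
git branch --set-upstream-to=origin/master master

git remote -v                                    # origin is listed for fetch and push
git rev-parse --abbrev-ref 'master@{upstream}'   # prints origin/master
```

As a shortcut, 'git push -u origin master' pushes and records the upstream branch in a single step.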

Next we'll see how you can set up a local repository from a pre-existing remote repository, and then we'll see how we work with the remote repository.

Accessing an Existing Remote Repository

First, go to the parent directory in which you want your local directory. Then clone the repository:

$ cd parent
$ git clone ssh://userName@beren.berkeley.edu/scratch/repos/bigProj

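Cloning already sets up your local master branch to track origin/master, so no further setup is normally needed. If the tracking information is ever missing, you can set it explicitly:

$ git branch --set-upstream-to=origin/master master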
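
To see what a clone gives you, the sketch below builds a small remote in a temporary directory (standing in for the repository under /scratch/repos), clones it by filesystem path rather than by ssh:// URL, and then inspects the remote and the tracking branch. All paths, file names and the identity are assumptions for the example.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Build a small remote with one commit on master to clone from.
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "data" > file.txt
git add file.txt
git commit -q -m "added file.txt"
git push -q "$work/remote.git" master

# Clone into a parent directory, as in the steps above.
mkdir "$work/parent"
cd "$work/parent"
git clone -q "$work/remote.git" bigProj
cd bigProj
git remote -v                                    # origin points at the cloned repository
git rev-parse --abbrev-ref 'master@{upstream}'   # prints origin/master
```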

Working with a Remote Repository

First, always start by pulling any changes made to the repository, so that your local version is current:

$ git pull

Now make changes as desired with 'git add', 'git commit', etc. These will only change your local repo and not yet be reflected in the remote repository.

Next, do the 'git pull' again in case anyone changed anything while you were working locally. If anything has changed, git will try to merge things automatically but may tell you to manually resolve conflicts. If you do need to resolve any conflicts, you'll need to commit any changes made in the process of resolving the conflicts:

$ git pull

# If you've made changes while resolving conflicts
$ git commit -am "resolved conflicts"
# To check no new changes have been made to the remote repository while you were resolving conflicts
$ git pull

Now push your changes to the remote repository:

$ git push
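
The whole pull, commit, pull, push cycle can be tried out with two local clones playing the role of two collaborators. In the sketch below the names alice and bob, the temporary paths and the placeholder identity are all assumptions for the example, and no conflicting edits occur, so every pull is a simple fast-forward.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"            # stand-in for the shared repository
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

# First collaborator creates the project and publishes it.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
echo "alice's analysis" > file.txt
git add file.txt
git commit -q -m "added file.txt"
git remote add origin "$work/remote.git"
git push -q origin master
git branch --set-upstream-to=origin/master master

# Second collaborator clones, changes a file, and pushes.
git clone -q "$work/remote.git" "$work/bob"
cd "$work/bob"
git pull -q                       # start from the current state
echo "bob's addition" >> file.txt
git commit -q -am "extended file.txt"
git pull -q                       # check again before pushing
git push -q

# The first collaborator picks up the change.
cd "$work/alice"
git pull -q
cat file.txt
```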

That's it; your collaborators can now get your changes by doing their own 'git pull'. As an alternative to the approach outlined above, it's often better to make your changes on a branch, merge that branch into the master, and then push to the remote repository:

$ git pull
$ git branch trial
$ git checkout trial

Now you can make changes on the branch, adding and committing as above.

To merge changes into the master and push the result to the remote repository:

$ git checkout master
$ git pull
$ git merge trial
$ git push
# If you want to remove the branch now that things are merged
$ git branch -d trial

If you'd like the version history to appear as a single chain that doesn't show the temporary branching, you can rebase the branch onto the master before merging:

$ git checkout master
$ git pull
$ git checkout trial
$ git rebase master
$ git checkout master
$ git merge trial
$ git branch -d trial
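
The effect of the rebase is easiest to see in a small local example. The sketch below leaves out the remote (so there is no 'git pull') and uses assumed file names and a placeholder identity; master and trial each gain one commit, and after the rebase the merge is a fast-forward, so the history is a straight line.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "base" > file.txt
git add file.txt
git commit -q -m "added file.txt"

# master and trial each get their own commit, so the history forks.
git branch trial
echo "mainline work" > main.txt
git add main.txt
git commit -q -m "work on master"
git checkout -q trial
echo "experiment" > idea.txt
git add idea.txt
git commit -q -m "work on trial"

git rebase -q master       # replay trial's commit on top of master
git checkout -q master
git merge -q trial         # a fast-forward, so no merge commit is created
git branch -d trial
git log --oneline --graph  # a single chain of commits
```

Because rebasing rewrites the commits on the branch, it is generally best kept to branches that have not yet been pushed for others to use.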

Finally, note that 'git pull' actually does two things: it does a 'git fetch' to get the remote history information and then a 'git merge' to merge with your local repo. If there are many branches in the repo, you may want to use 'git fetch' and 'git merge' separately to manually deal with individual branches.
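
Here is a sketch of the two steps done separately, once more with two local clones and a temporary bare repository standing in for the shared one (the names alice and bob, the paths and the identity are assumptions for the example). After the fetch, the local master has not moved; only the merge brings it up to date.

```shell
set -e
# Placeholder identity so the sketch runs even where git is not yet configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
echo "one" > file.txt
git add file.txt
git commit -q -m "added file.txt"
git remote add origin "$work/remote.git"
git push -q origin master
git clone -q "$work/remote.git" "$work/bob"

# A second commit is published after the clone was made.
echo "two" >> file.txt
git commit -q -am "second line"
git push -q origin master

cd "$work/bob"
before=$(git rev-parse master)
git fetch -q origin                       # updates origin/master only
after_fetch=$(git rev-parse master)       # local master has not moved yet
git log --oneline master..origin/master   # the commit waiting to be merged
git merge -q origin/master                # now master catches up
```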