CommunityData:Git

Wondering why this is a topic to care about? See the "What is version control?" article.

Getting Access to the CDSC Git Repository
If you need access to the CDSC Git Repository, ask on the #communitydata IRC channel. If you need access to a specific repository only, mention which one. While you likely already know which repo you want access to, you can find the public ones on code.communitydata.science, and a complete list of all of them is in the configuration file in the gitolite-admin git repository. If you are a new CDSC member, mention that you need to be added to the collective group in Gitolite. Anybody in the collective who uses the Git repository will be able to add you.

Install Git
To get started, you will need to install git. Doing so requires different steps depending on your operating system. Basic instructions are available from the Git website.

You will also likely need to set it up so it knows what your name and email address are. You can do that with git config, by setting user.name and user.email.
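A minimal sketch of that setup. The name and address below are placeholders, not values from this page; substitute your own:

```shell
# Record the identity that git attaches to every commit you make.
# "Your Name" and "you@example.com" are placeholders.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Show what was stored, to confirm it took effect.
git config --global --get user.name
git config --global --get user.email
```

The --global flag writes to your per-user configuration, so the identity applies to every repository on the machine unless a repository overrides it.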

Note that RStudio also has Git integration now. Instructions and details are available via the RStudio support documentation.

Configuring Git for submodules
Once you've installed git, there are some configuration options which will make your life much easier. You can set them globally with the following commands:

git config --global alias.spull '!__git_spull() { git pull "$@" && git submodule sync --recursive && git submodule update --init --recursive; }; __git_spull'
git config --global status.submoduleSummary true

These two commands make git work a little better with submodules. Submodules are essentially git repositories that are nested inside other git repositories. For example, some of our project repositories currently use another repository as a submodule. If you're working in a repository like this, you'll want to use git spull instead of just git pull; it also checks for and pulls changes made in any of your submodules.

Gitolite Server
We have a private git server which uses gitolite to manage permissions for git repositories. It's like a private GitHub server that hosts our repositories, but just ours, and on our server.

SSH Keys
Once you've got git installed, you will also need a public SSH key. You can send your public key (usually ~/.ssh/id_rsa.pub) to a current administrator (see the list of administrators below on this page), and they can add you as a new user.
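A small sketch for checking whether you already have a key at the usual location. It assumes an RSA key at the default path named above; the 4096-bit size in the hint is a common choice, not a requirement stated on this page:

```shell
# Usual location of an RSA public key, as mentioned above.
keyfile="$HOME/.ssh/id_rsa.pub"

if [ -f "$keyfile" ]; then
    # Print the public key so it can be copied into a message to an administrator.
    cat "$keyfile"
else
    # No key yet: ssh-keygen creates the pair and prompts for a passphrase.
    echo "No key at $keyfile. Create one with: ssh-keygen -t rsa -b 4096"
fi
```

Only the .pub file should ever be shared; the file with the same name and no extension is the private key.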

Cloning a repository
"Cloning" a repository downloads the files, as well as the history, of a repository. It also creates a new git instance in that directory, so that you can commit changes to the code.

To clone a repository, run the following command:

git clone --recursive git@code.communitydata.science:REPOSITORY_NAME

Note that you need to use this SSH syntax rather than the git protocol (i.e., a git:// URL), which doesn't have write permissions.

Creating a new repository
To create a new repository, you will need to have admin rights. Currently, everyone in the collective group is an administrator.

First, you will need to clone the gitolite-admin repository:

$ git clone git@code.communitydata.science:gitolite-admin

Then edit the gitolite configuration file in that repository. To add a new project, simply create a new entry at the bottom of the file.

For example, an entry for foo that lists aaron and mako as admins and jdfoote as read-only would create a new repository at git@code.communitydata.science:foo with those permissions *once this file was saved, committed, and pushed*.
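A sketch of what such an entry might look like. It assumes standard gitolite syntax, where RW+ grants the full read, write, and rewind access that this page calls admin and R grants read-only access, and it assumes the file is conf/gitolite.conf, the usual location in a gitolite-admin repository:

```text
repo foo
    RW+ = aaron mako
    R   = jdfoote
```

The usernames on the right-hand side have to match the names of the public key files that gitolite knows about.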

In order to actually create the repository you need to:


 * 1) Save the file (i.e., with your text editor)
 * 2) Add the file (with git add)
 * 3) Commit the file (with git commit) (this will put you into a text editor where you can add a commit message)
 * 4) Push the file back to the server (with git push)
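The four steps can be sketched end to end as follows. So that it can be run safely, the sketch uses a throwaway local bare repository as a stand-in for the server. The path conf/gitolite.conf, the entry syntax, and the demo identity are assumptions based on a standard gitolite setup, and -m is used in place of the editor for the commit message. In a real gitolite-admin clone, only the last three git commands apply.

```shell
set -e

# Throwaway stand-ins so the sketch never touches the real server.
demo=$(mktemp -d)
git init --quiet --bare "$demo/server.git"        # plays the role of the gitolite server
git clone --quiet "$demo/server.git" "$demo/gitolite-admin"
cd "$demo/gitolite-admin"
git config user.name "Demo User"                  # local identity, for this demo only
git config user.email "demo@example.com"

# 1) Save the file: append the new entry (assumed path and syntax).
mkdir -p conf
printf 'repo foo\n    RW+ = aaron mako\n    R   = jdfoote\n' >> conf/gitolite.conf

# 2) Add the file.
git add conf/gitolite.conf

# 3) Commit the file (-m supplies the message without opening an editor).
git commit --quiet -m "Add foo repository"

# 4) Push the file back to the server.
git push --quiet origin HEAD
```

On the real server, gitolite reads the pushed configuration and creates the new repository at that point.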

Pushing data into a new repository on the server from a local git repository you already have
You could then go to wherever the files are that you would like to track, add this repository as a remote with git remote add, and push.
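A minimal sketch of adding the remote, using the foo repository from the example above. PROJECT_DIR is a hypothetical variable introduced here for illustration; if it is unset, the sketch falls back to a scratch directory so that it can be tried safely.

```shell
# Go to the project directory (hypothetical PROJECT_DIR; defaults to a scratch directory).
cd "${PROJECT_DIR:-$(mktemp -d)}"

# Only needed if the directory is not yet a git repository.
git init --quiet

# Register the server-side repository under the conventional name "origin".
git remote add origin git@code.communitydata.science:foo

# Confirm where the remote points.
git remote -v

# Once your files are added and committed, publish them (needs server access):
#   git add .
#   git commit -m "Initial commit"
#   git push -u origin HEAD
```

The -u flag on the first push records origin as the upstream, so later pushes and pulls need no extra arguments.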

If this project already exists in git, then it's even easier. Just change the remote, and push it.
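A sketch of changing the remote for an existing repository. The first three commands only build a scratch repository with a placeholder old remote so that the example runs on its own; in a real project, the set-url line is the one that matters.

```shell
# Scratch setup: a repository that already has a remote (placeholder URL).
cd "$(mktemp -d)"
git init --quiet
git remote add origin https://example.com/old/foo.git

# The actual change: point origin at the CDSC server instead.
git remote set-url origin git@code.communitydata.science:foo

# Confirm the new location.
git remote -v

# Then push the existing history (needs server access):
#   git push -u origin HEAD
```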

Adding new users
To add new users, simply add their public key to the key directory of the gitolite-admin repository, renamed to match their username. The person's username (as referenced in the configuration file) will be whatever the username in that filename is.
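A sketch of the key-adding step, assuming the standard gitolite layout in which keys live in keydir/ and are named USERNAME.pub. The scratch repository and the placeholder key text are stand-ins so that the example runs on its own; jdfoote is the username from the example above.

```shell
# Scratch stand-in for a gitolite-admin clone.
cd "$(mktemp -d)"
git init --quiet
mkdir -p keydir                          # assumed standard gitolite key directory

# Placeholder for the public key file the new user sent you.
received_key=$(mktemp)
echo "ssh-rsa AAAAplaceholderkey jdfoote@example.com" > "$received_key"

# Copy the key in under the new user's name, then stage it.
cp "$received_key" keydir/jdfoote.pub
git add keydir/jdfoote.pub

# Then commit and push as with any other change (needs server access):
#   git commit -m "Add key for jdfoote"
#   git push
```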

Using git-annex to manage large files in git
This is still experimental, and may go away. Don't put files in it without a backup.

Getting Set Up
Git is not a very good system for managing large files, which is a problem for us, since we often have large data files. Enter git-annex, a system that works in tandem with git and lets you store large files (but avoids using git as the data store). Our gitolite installation supports git-annex. To start using git-annex, install it on your computer. Most GNU/Linux distributions have git-annex packages. If you're on a Mac, in Terminal.app, try the instructions from Homebrew: https://formulae.brew.sh/formula/git-annex.

Setting Up Your Repo To Use Annex
Then, in your existing git repository, run the initialization command, git annex init.

This needs to be done only once. To add a file, run git annex add on it from within your repository, then commit.
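A sketch of initializing the annex and adding a file, run in a scratch repository so it can be tried safely. It assumes git-annex is installed. The file name, repository description, and demo identity are placeholders, and the commands for sending content to the server are shown only as comments because they need server access.

```shell
# Scratch repository standing in for your existing project.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Demo User"             # local identity, for this demo only
git config user.email "demo@example.com"

# One-time initialization of git-annex in this clone.
git annex init "demo clone"

# Placeholder for a large data file.
head -c 1024 /dev/urandom > big_data.bin

# Hand the file to git-annex instead of plain git, then commit as usual.
git annex add big_data.bin
git commit --quiet -m "Add big_data.bin to the annex"

# To send both the git history and the file content to the server:
#   git annex sync --content
```

After git annex add, git itself tracks only a small pointer to the file, while git-annex manages the content.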

You should encrypt the file if the data is not public. You can use GNU Privacy Guard to do the encryption, with all your collaborators as recipients for the file. Once it is encrypted, add the encrypted file to the annex and push it to the server.

Using an Existing Annex Repo
Once the file has been pushed successfully, your collaborators should be able to get it with git annex get (assuming that they have already initialized git-annex in their own clone).

Once you've encrypted non-public data, git-annex is easy to use through its webapp.