CommunityData:Git

From CommunityData
Latest revision as of 18:21, 30 October 2024

== Setting Up Git ==
Wondering why this is a topic to care about? Check out this [https://about.gitlab.com/topics/version-control what is version control?] article.

== Getting Access to the CDSC Git Repository ==

If you need access to the CDSC Git Repository, ask for it on the [[CommunityData:IRC|#communitydata IRC channel]]. If you only need access to a specific repository, mention which one. You probably already know which repository you want, but you can find the public ones on [https://code.communitydata.science/ code.communitydata.science], and a complete list of all of them in the <code>conf/gitolite.conf</code> file in the <code>gitolite-admin</code> git repository. If you are a new CDSC member, mention that you need to be added to the <code>@collective</code> group in Gitolite. Anybody in the collective who uses the Git repository will be able to add you.

== Install Git ==

To get started, you will need to '''install git'''. The steps depend on your operating system; basic instructions are available from [https://git-scm.com/downloads the Git website].

You will also need to tell git your name and email address, which it records in every commit you make. You can do that like this:

<source lang='bash'>
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
</source>
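To check which identity git will actually record, a small shell function like the one below can help. It is a minimal sketch: the name <code>check_git_identity</code> is hypothetical, and it only reads back the two settings above with <code>git config --get</code>.

```shell
# Hypothetical helper: print the name and email git will use for new commits,
# or fail with a hint if either one has not been set yet.
check_git_identity() {
  local name email
  # "|| true" keeps a missing setting from aborting scripts run with "set -e".
  name="$(git config --get user.name || true)"
  email="$(git config --get user.email || true)"
  if [ -z "$name" ] || [ -z "$email" ]; then
    echo "git identity incomplete; set user.name and user.email first" >&2
    return 1
  fi
  echo "$name <$email>"
}
```

After running the two commands above, <code>check_git_identity</code> should print <code>John Doe <johndoe@example.com></code>.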

RStudio also has Git integration. Instructions and details are available in the [https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN RStudio support documentation].

== Configuring Git for submodules ==

Once you've installed git, there are some configuration options which will make your life much easier. You can set them globally with the following commands:

<source lang='bash'>
git config --global alias.spull '!__git_spull() { git pull "$@" && git submodule sync --recursive && git submodule update --init --recursive; }; __git_spull'
git config --global status.submoduleSummary true
</source>

These two settings make git work better with submodules, which are git repositories nested inside other git repositories. For example, the <code>wikiresearch</code> repository currently uses the <code>RCommunityData</code> repository as a submodule. The first command defines a <code>git spull</code> alias that runs <code>git pull</code> and then syncs and updates every submodule (recursively), so in a repository like this you'll want to use <code>git spull</code> instead of plain <code>git pull</code>. The second makes <code>git status</code> include a summary of the commits in any submodules you have changed.
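If you cloned a repository without <code>--recursive</code>, its submodule directories will be empty. The sketch below runs the same two submodule commands that the <code>spull</code> alias runs after pulling. The function name <code>init_submodules</code> is hypothetical.

```shell
# Hypothetical helper: fetch and check out every submodule (including nested
# ones) of an existing clone. These are the same two commands the spull alias
# runs after "git pull".
# Usage: init_submodules [PATH_TO_REPO]   (defaults to the current directory)
init_submodules() {
  local repo="${1:-.}"
  git -C "$repo" submodule sync --recursive &&
    git -C "$repo" submodule update --init --recursive
}
```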

== Gitea ==

We have a private git server which uses [https://about.gitea.com/ Gitea] to manage permissions for git repositories. It works like a private GitHub that hosts only our repositories, on our own server. If you have a question about Gitea, please message cdsc-it or look in [https://docs.gitea.com/ Gitea's general documentation].

=== Making an Account/SSH Keys ===

You can make a new Gitea account on [https://gitea.communitydata.science/user/sign_up this webpage] using your email address and a password. Then connect your local git to your new Gitea account by adding your public SSH key (usually <code>~/.ssh/id_rsa.pub</code>) under "SSH/GPG Keys" on your user "Settings" page.
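If you don't have an SSH key yet, the sketch below creates one and prints the public half, which is what you paste into Gitea. It assumes an ed25519 key, a common modern choice. An existing <code>~/.ssh/id_rsa.pub</code> works just as well. The function name <code>ssh_public_key</code> is hypothetical. Without extra options, <code>ssh-keygen</code> will prompt you for a passphrase.

```shell
# Hypothetical helper: print the public key KEYFILE.pub, first creating an
# ed25519 key pair at KEYFILE if none exists there yet.
# Usage: ssh_public_key EMAIL KEYFILE [extra ssh-keygen options...]
ssh_public_key() {
  local email="$1" keyfile="$2"
  shift 2
  if [ ! -f "$keyfile" ]; then
    # -m 700 only applies if the directory has to be created.
    mkdir -p -m 700 "$(dirname "$keyfile")"
    # Remaining arguments (e.g. -N for a passphrase) are passed to ssh-keygen.
    ssh-keygen -t ed25519 -C "$email" -f "$keyfile" "$@" || return 1
  fi
  cat "$keyfile.pub"
}
```

For example, <code>ssh_public_key you@example.com ~/.ssh/id_ed25519</code> prints a line starting with <code>ssh-ed25519</code>.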

=== Cloning a repository ===

"Cloning" a repository downloads the files, as well as the history, of a repository. It also creates a new git instance in that directory, so that you can commit changes to the code.

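To clone a repository, run the following command, where ''USERNAME'' is the user or organization that owns the repository (<code>--recursive</code> also clones any submodules):

  git clone --recursive gitea@gitea.communitydata.science:''USERNAME''/''REPOSITORY_NAME''.git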

=== Creating a new repository ===

To create a new Gitea repository, click the ''Create'' button at the top of the homepage, immediately left of your user profile icon, and follow the instructions there.

=== Pushing data into a new repository on the server from a local git repository you already have ===

After creating the repository on Gitea, go to the directory of the local repository you want to track and add the new repository as a remote, like so:

<source lang='bash'>
$ cd foo
$ git remote add origin gitea@gitea.communitydata.science:USERNAME/REPOSITORY_NAME.git
$ git push --set-upstream origin main   # use your branch name if it isn't "main"
</source>

If your local repository already has an <code>origin</code> remote pointing somewhere else, it's even easier: just change the remote's URL and push.

<source lang='bash'>
$ git remote set-url origin gitea@gitea.communitydata.science:USERNAME/REPOSITORY_NAME.git
$ git push
</source>
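Both cases above can be combined into one helper. It adds <code>origin</code> if it is missing, updates it otherwise, and pushes whatever branch you currently have checked out, so it does not assume the branch is called <code>main</code>. This is a sketch, and <code>push_to_new_origin</code> is a hypothetical name.

```shell
# Hypothetical helper: point "origin" at URL (adding the remote if it does not
# exist yet) and push the currently checked-out branch with upstream tracking.
# Usage: push_to_new_origin URL   (run inside the local repository)
push_to_new_origin() {
  local url="$1" branch
  branch="$(git symbolic-ref --short HEAD)" || return 1
  if git remote get-url origin >/dev/null 2>&1; then
    git remote set-url origin "$url"
  else
    git remote add origin "$url"
  fi
  git push --set-upstream origin "$branch"
}
```

For example, <code>push_to_new_origin gitea@gitea.communitydata.science:USERNAME/REPOSITORY_NAME.git</code>. The same function works with a Gitolite URL.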

==== Adding new users to a repository ====

To give other people access, open the repository's home page, go to its Settings page, and add them under "Collaborators".

=== Setting the default branch ===

Gitea sometimes has problems because it expects the default branch to be called <code>main</code>, but some repositories use a different name for their default branch. If this causes trouble, change the repository's default branch under "Settings > Branches".
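Alternatively, if nothing depends on the old branch name, you can rename your local branch to <code>main</code> and push it. A minimal sketch, assuming the remote is called <code>origin</code>. The function name is hypothetical, and the old branch stays on the server until you delete it.

```shell
# Hypothetical helper: rename the current branch to "main" and push it to
# origin with upstream tracking. Does nothing if it is already "main".
rename_branch_to_main() {
  local old
  old="$(git symbolic-ref --short HEAD)" || return 1
  if [ "$old" = main ]; then
    return 0
  fi
  git branch -m "$old" main &&
    git push --set-upstream origin main
}
```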

== Gitolite Server ==

We have a private git server which uses [http://gitolite.com/gitolite/index.html gitolite] to manage permissions for git repositories. It works like a private GitHub that hosts only our repositories, on our own server.

=== SSH Keys ===

Once you've got git installed, you will also need a [https://help.github.com/articles/generating-ssh-keys/ public SSH key]. Send your public key (usually <code>~/.ssh/id_rsa.pub</code>) to a current administrator (see "Creating a new repository" below for who that is), and they can add you as a new user.

=== Cloning a repository ===

"Cloning" a repository downloads the files, as well as the history, of a repository. It also creates a new git instance in that directory, so that you can commit changes to the code.

To clone a repository, run the following command:

  git clone --recursive git@code.communitydata.science:''REPOSITORY_NAME''

Note that you need to use this SSH syntax rather than the [https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols git protocol] (e.g., <code>git://code.communitydata.science/repo_name</code>), which is read-only.

=== Creating a new repository ===

To create a new repository, you will need admin rights. Currently, everyone in the <code>@collective</code> group is an administrator.

First, clone the <code>gitolite-admin</code> repository:

  $ git clone git@code.communitydata.science:gitolite-admin

Then edit the file <code>conf/gitolite.conf</code>. To add a new project, create a new entry at the bottom of the file. For example,

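<source lang='text'>
repo foo
    RW+ = aaron mako
    R   = jdfoote
</source>

would create a new repository at <code>git@code.communitydata.science:foo</code> with aaron and mako as admins (<code>RW+</code>: read, write, and force-push) and give jdfoote read-only access (<code>R</code>), '''once this file is saved, committed, and pushed'''.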

In order to actually create the repository you need to:

# Save the file in your text editor.
# Stage it with <code>git add conf/gitolite.conf</code>.
# Commit it with <code>git commit</code> (this opens a text editor where you can write a commit message).
# Push it back to the server with <code>git push</code>.
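Appending the entry and steps 2 to 4 can be scripted together. The sketch below is meant to be run inside your clone of <code>gitolite-admin</code>. The function name and argument order are hypothetical, and it writes entries in the same format as the example above.

```shell
# Hypothetical helper; run it inside your clone of gitolite-admin.
# Appends a repo entry to conf/gitolite.conf, then stages, commits, and pushes
# it (gitolite creates the repository once the push arrives).
# Usage: gitolite_new_repo REPO "RW+ USERS" ["READ-ONLY USERS"]
gitolite_new_repo() {
  local repo="$1" writers="$2" readers="${3:-}"
  {
    printf '\nrepo %s\n' "$repo"
    printf '    RW+ = %s\n' "$writers"
    if [ -n "$readers" ]; then
      printf '    R   = %s\n' "$readers"
    fi
  } >> conf/gitolite.conf
  git add conf/gitolite.conf &&
    git commit -m "Add repository $repo" &&
    git push
}
```

With this, the example above becomes <code>gitolite_new_repo foo "aaron mako" jdfoote</code>.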

=== Pushing data into a new repository on the server from a local git repository you already have ===

Then go to the directory of the local repository you want to track and add the new repository as a remote, like so:

<source lang='bash'>
$ cd foo
$ git remote add origin git@code.communitydata.science:foo
$ git push --set-upstream origin main
</source>

If your local repository already has an <code>origin</code> remote, it's even easier: just change the remote's URL and push.

<source lang='bash'>
$ git remote set-url origin git@code.communitydata.science:foo
$ git push
</source>

=== Adding new users ===

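To add new users, add their public key to the <code>keydir/</code> directory of your <code>gitolite-admin</code> clone, renamed to <code>username.pub</code>. The person's username (as used in the <code>conf/gitolite.conf</code> file) will be whatever name you gave the file, minus the <code>.pub</code>. As with new repositories, the change takes effect once you commit and push it.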
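Here is a sketch of that procedure, to be run inside your clone of <code>gitolite-admin</code>. The function name is hypothetical, and so is the key path in the usage example.

```shell
# Hypothetical helper; run it inside your clone of gitolite-admin.
# Copies PUBKEY to keydir/USERNAME.pub, then commits and pushes the change.
# Usage: gitolite_add_user USERNAME PUBKEY
gitolite_add_user() {
  local user="$1" pubkey="$2"
  mkdir -p keydir
  cp "$pubkey" "keydir/$user.pub" &&
    git add "keydir/$user.pub" &&
    git commit -m "Add key for $user" &&
    git push
}
```

For example, <code>gitolite_add_user jdfoote ~/Downloads/id_rsa.pub</code> makes <code>jdfoote</code> usable in <code>conf/gitolite.conf</code>.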

== Using git-annex to manage large files in git ==

{{note}} This is still experimental, and may go away. Don't put files in it without a backup.

=== Getting Set Up ===

Git is not a very good system for managing large files, which is a problem for us, since we often have large data files. Enter [https://git-annex.branchable.com/walkthrough/ git-annex], a system that works in tandem with git and lets you store large files without using git itself as the data store. Our gitolite installation supports git-annex. To start using git-annex, install it on your computer. Most GNU/Linux distributions have git-annex packages. On a Mac, you can install it from Terminal.app with [https://formulae.brew.sh/formula/git-annex Homebrew] (<code>brew install git-annex</code>).

=== Setting Up Your Repo To Use Annex ===

In your existing git repository, run the following initialization command:

<source lang='bash'>
$ git annex init
</source>

This needs to be done only once in each clone of the repository. To add a file, first copy it into your repository:

<source lang='bash'>
$ mkdir data
$ cp ~/largedata.csv.bz2 data/
</source>

If the data is not public, you should encrypt the file. You can use GNU Privacy Guard (GPG) to do the encryption, with all of your collaborators as recipients. Once the file is encrypted, run the following commands to add it to the annex and push it to the server:

<source lang='bash'>
$ git annex add data/largedata.csv.bz2.gpg
$ git commit -m "Added data file"
$ git push --all
$ git annex copy --to origin
</source>
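The encryption step mentioned above has to happen before <code>git annex add</code>. Here is a minimal GPG sketch. It assumes you have already imported your collaborators' public keys, and <code>encrypt_for</code> is a hypothetical name. Without <code>--output</code>, <code>gpg --encrypt</code> writes to the input name plus <code>.gpg</code>, which gives the <code>data/largedata.csv.bz2.gpg</code> file used above.

```shell
# Hypothetical helper: encrypt FILE for every recipient listed after it,
# producing FILE.gpg next to the original.
# Usage: encrypt_for FILE RECIPIENT...
encrypt_for() {
  local file="$1" r
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: encrypt_for FILE RECIPIENT..." >&2
    return 1
  fi
  # Rebuild the arguments as "--recipient R1 --recipient R2 ...": each pass
  # appends one pair and drops one original recipient from the front.
  for r in "$@"; do
    set -- "$@" --recipient "$r"
    shift
  done
  gpg --encrypt "$@" "$file"
}
```

For example, <code>encrypt_for data/largedata.csv.bz2 alice@example.org bob@example.org</code> (hypothetical recipients). gpg may ask you to confirm keys you have not certified. The unencrypted original is left in place, so make sure it never gets added to git.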

=== Using an Existing Annex Repo ===

Once these commands succeed, your collaborators should be able to get the file by pulling the latest changes and then running the following command (assuming that they have already run <code>git annex init</code>):

<source lang='bash'>
$ git annex get data/largedata.csv.bz2.gpg
</source>
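After <code>git annex get</code>, the file is still encrypted. A collaborator whose key was among the recipients could decrypt it as shown below. <code>decrypt_annexed</code> is a hypothetical name, and it writes the plaintext next to the encrypted file, without the <code>.gpg</code> suffix.

```shell
# Hypothetical helper: decrypt FILE.gpg to FILE with your private key.
# Usage: decrypt_annexed data/largedata.csv.bz2.gpg
decrypt_annexed() {
  local enc="$1"
  case "$enc" in
    *.gpg) ;;
    *) echo "expected a .gpg file: $enc" >&2; return 1 ;;
  esac
  gpg --output "${enc%.gpg}" --decrypt "$enc"
}
```

As with encryption, keep the decrypted file out of git.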

Once any non-public data is encrypted, you can also manage the annex through git-annex's web app:

<source lang='bash'>
$ git annex webapp
</source>