Difference between revisions of "Getting started with GitHub"

(12 intermediate revisions by 4 users not shown)
<onlyinclude>
GitHub is a web-based hosting service for managing code work and tracking changes made to code. It is a useful collaborative tool through all stages of research and fieldwork. This page provides resources and links to resources on how to get started with GitHub.  
This page provides resources and links to resources on how to get started with GitHub. There are other Git-based alternatives to GitHub, but most of these resources apply to those alternatives as well. See, for example, [https://about.gitlab.com GitLab] and [https://bitbucket.org Bitbucket].
</onlyinclude>
The World Bank's GitHub repositories can be found at [https://github.com/worldbank https://github.com/worldbank].


== Read First ==
* Note that GitHub is meant to be used only for code files and other raw text files. This limitation and ways to work around it are discussed in more detail below.
*GitHub efficiently tracks changes to raw text files, including all code files in any programming language, but not to binary files. gitignore files can address these inefficiencies. 
* Combining GitHub with another syncing service like Dropbox or OneDrive is an effective method that requires a specific setup.
*There are many resources online on how to get started on GitHub.
== Overview ==
GitHub is a web-based hosting service for managing code work and tracking changes made to code. GitHub is an amazing system for code collaboration and is very efficient in tracking changes to raw text files, including all code files in any programming language in addition to .tex, .txt, and .csv files. However, GitHub does not have direct access to the text and numbers in binary files like .doc, .docx, .xls, .xlsx, or .pdf files. Thus, though GitHub can store any single binary file efficiently, it cannot track changes made to it in detail. Instead, it stores one full version of a binary file for each change made to it. Over many changes, this can become very inefficient. See the sections on [[Getting_started_with_GitHub#gitignore_files | ignore files]] and [[Getting_started_with_GitHub#Combining_GitHub_and_DropBox | combining GitHub and DropBox]] below for how to avoid related inefficiencies.


The World Bank's GitHub repositories can be found [https://github.com/worldbank here]. There are also a number of Git-based alternatives to GitHub. Most of the resources outlined in this page are applicable to these alternatives as well. See, for example, [https://about.gitlab.com GitLab] and [https://bitbucket.org Bitbucket].


== Best Practices ==


== What GitHub is good at and what it is less good at ==
=== gitignore files ===
Git was designed to manage code work by tracking changes made to code in great detail. This is why Git is an amazing tool for collaborating on code, but the drawback is that Git is only efficient at tracking changes to raw text files. All code files in any programming language are raw text files, and so are .tex, .txt, and .csv files. In contrast, .doc/.docx, .xls/.xlsx, and .pdf files and images are examples of binary files, which are not raw text files. Binary files are stored very efficiently, but Git does not have direct access to the text and numbers in those files and therefore cannot track changes in detail. Git therefore stores one full version of a binary file for each change made to it, which becomes very inefficient. See the sections on [[Getting_started_with_GitHub#gitignore_files | ignore files]] and [[Getting_started_with_GitHub#Combining_GitHub_and_DropBox | combining GitHub and DropBox]] below for how to handle this.
 
== Resources for absolute beginners ==
Since GitHub is used extensively outside the research community, there are a lot of resources online on how to get started on GitHub. Some of those resources expect technical skills, but the list below links to resources that do not:
* https://guides.github.com/ - GitHub's own guide on how to get started


=== Specific sections in GitHub's guide we recommend to researchers learning to use GitHub ===
A gitignore file is a very important tool to control what in your data work folder you will share in the cloud. This file ignores files added to your repository locally and does not sync them with the repository in the cloud. It ensures that you do not share data files with private data in the GitHub cloud and that you do not share binary files that would make your GitHub repository big and slow to work with.


Some topics discussed in the GitHub guide are not relevant to research, but we recommend that researchers read the topics described in the following sections and use them frequently.
See GitHub's own documentation on ignore files [https://help.github.com/articles/ignoring-files/ here]. The World Bank's DIME team has developed a [https://github.com/worldbank/DIMEwiki/tree/master/Topics/GitHub template gitignore file] with the needs of a researcher especially in mind. While in most cases you can use this file without modification, in some contexts you may need to edit it.
* [https://guides.github.com/features/issues/ issues]
* [https://guides.github.com/features/wikis/ documentation]


== Best practices for managing a research project using GitHub ==
=== GitHub and Dropbox ===
=== gitignore files ===


A gitignore file is a very important tool to control what in your data work folder you will share in the cloud. This file ignores (hence the name git''ignore'') files added to your repository locally and does not sync them with the repository in the cloud. This is a great way to make sure that you do not share data files with private data in the GitHub cloud, and that you do not share binary files that would otherwise make your GitHub repository big and slow to work with.
Researchers often use syncing services like DropBox or OneDrive in combination with GitHub. This is a great way to share data and binary files with team members without leaking private data in the GitHub cloud. Further, since GitHub tracks binary files inefficiently, combining it with Dropbox helps alleviate this problem.  


See GitHub's own documentation on ignore files [https://help.github.com/articles/ignoring-files/ here]; that page has links to more detailed reading. The World Bank's DIME team has developed a template gitignore file with the needs of a researcher especially in mind. In most cases you can use it as it is. In some contexts you might have to make some edits, but it is still a great starting point. You can find the template [https://github.com/worldbank/DIMEwiki/tree/master/Topics/GitHub here].
Combining GitHub with another syncing service requires a specific setup. See [https://github.com/kbjarkefur/GitHubDropBox this guide] for guidance.


=== Combining GitHub and DropBox ===
== Resources for Beginners ==
In research we often want to use a syncing service like DropBox, OneDrive etc. in combination with GitHub. This requires a specific setup, as GitHub is also a syncing service, although it works very differently compared to DropBox, OneDrive etc.


Combining GitHub and DropBox is a great way to share data and binary files across team members without leaking private data in the GitHub cloud. It also works around the fact that GitHub tracks binary files in a way that is very inefficient in terms of disk space. See [https://github.com/kbjarkefur/GitHubDropBox this guide] for how to combine GitHub and DropBox. This guide includes some slightly more technical steps, but it solves a big issue and is easy to maintain once it is set up.
Since GitHub is used extensively outside the research community, there are many resources online on how to get started on GitHub. While some of these resources assume technical skills, GitHub’s [https://guides.github.com/ guide] on how to get started, for example, does not. GitHub’s resources on [https://guides.github.com/features/issues/ issues] and [https://guides.github.com/features/wikis/ documentation] are also of use.


== Back to Parent ==
This article is part of the topic [[Data Management]]
 
== Additional Resources ==
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/git-1-intro.pdf Intro to GitHub]
*DIME Analytics’ guides to Using Git and GitHub, parts [https://github.com/worldbank/DIME-Resources/blob/master/git-2-github.pdf 1] and [https://github.com/worldbank/DIME-Resources/blob/master/git-3-flow.pdf 2]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/git-4-management.pdf Maintaining a GitHub Repository]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/onboarding-3-git.pdf Initializing and Synchronizing a Git Repo with GitHub Desktop]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/onboarding-4-gitflow.pdf Using Git Flow to Manage Code Projects with GitKraken]
*An [http://web.simmons.edu/~wilsonjd/LIS488/website/OverleafTutorial.pdf Introduction to Overleaf] by Alli Gofman and Jaclyn Wilson
[[Category: Technical Tools]]
[[Category: Data Management ]]

Latest revision as of 14:29, 13 April 2021

GitHub is a web-based hosting service for managing code work and tracking changes made to code. It is a useful collaborative tool through all stages of research and fieldwork. This page provides resources and links to resources on how to get started with GitHub.

Read First

  • GitHub efficiently tracks changes to raw text files, including all code files in any programming language, but not to binary files. gitignore files can address these inefficiencies.
  • Combining GitHub with another syncing service like Dropbox or OneDrive is an effective method that requires a specific setup.
  • There are many resources online on how to get started on GitHub.

Overview

GitHub is a web-based hosting service for managing code work and tracking changes made to code. GitHub is an amazing system for code collaboration and is very efficient in tracking changes to raw text files, including all code files in any programming language in addition to .tex, .txt, and .csv files. However, GitHub does not have direct access to the text and numbers in binary files like .doc, .docx, .xls, .xlsx, or .pdf files. Thus, though GitHub can store any single binary file efficiently, it cannot track changes made to it in detail. Instead, it stores one full version of a binary file for each change made to it. Over many changes, this can become very inefficient. See the sections on ignore files and combining GitHub and DropBox below for how to avoid related inefficiencies.
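
The distinction between raw text and binary files can be sketched in a few lines of Python. This is only an illustration, not part of any GitHub tooling: the function name classify and the example file names are hypothetical, and the extension lists contain only the file types named on this page. Code files are also raw text, but their extensions vary by language, so the sketch reports anything not listed as "unknown".

```python
from pathlib import Path

# Extensions named in the overview; real projects will need longer lists.
TEXT_EXTENSIONS = {".tex", ".txt", ".csv"}
BINARY_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".pdf"}


def classify(filename: str) -> str:
    """Return 'text', 'binary', or 'unknown' based only on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in BINARY_EXTENSIONS:
        return "binary"
    return "unknown"


# Hypothetical file names, for illustration only.
for name in ["analysis.tex", "survey.csv", "report.docx", "budget.XLSX", "clean.do"]:
    print(f"{name}: {classify(name)}")
```

Files reported as "binary" are the ones worth keeping out of the repository or sharing through a syncing service instead.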

The World Bank's GitHub repositories can be found here. There are also a number of Git-based alternatives to GitHub. Most of the resources outlined in this page are applicable to these alternatives as well. See, for example, GitLab and Bitbucket.

Best Practices

gitignore files

A gitignore file is a very important tool to control what in your data work folder you will share in the cloud. This file ignores files added to your repository locally and does not sync them with the repository in the cloud. It ensures that you do not share data files with private data in the GitHub cloud and that you do not share binary files that would make your GitHub repository big and slow to work with.

See GitHub's own documentation on ignore files here. The World Bank's DIME team has developed a template gitignore file with the needs of a researcher especially in mind. While in most cases you can use this file without modification, in some contexts you may need to edit it.
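
As a rough sketch of what a gitignore file contains, the Python snippet below builds one ignore pattern per binary file type named on this page and checks which file names those patterns would catch. It is not the DIME template: the helper names gitignore_lines and is_ignored are hypothetical, and fnmatch is used only as a simplified stand-in for Git's own matching rules, which are richer.

```python
from fnmatch import fnmatch

# Binary extensions named on this page; an illustrative list, not the DIME template.
BINARY_EXTENSIONS = ["doc", "docx", "xls", "xlsx", "pdf"]


def gitignore_lines(extensions):
    """Build one '*.ext' ignore pattern per extension."""
    return [f"*.{ext}" for ext in extensions]


def is_ignored(filename, patterns):
    """Rough check: does the bare file name match any pattern?

    fnmatch is a stand-in here; Git's own rules also cover negation,
    directory patterns, and more.
    """
    return any(fnmatch(filename, pattern) for pattern in patterns)


patterns = gitignore_lines(BINARY_EXTENSIONS)
print("\n".join(patterns))  # text that could be pasted into a .gitignore file
```

Calling is_ignored("report.pdf", patterns) returns True, while a code file such as a hypothetical "analysis.do" is not matched and would still be tracked.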

GitHub and Dropbox

Researchers often use syncing services like DropBox or OneDrive in combination with GitHub. This is a great way to share data and binary files with team members without leaking private data in the GitHub cloud. Further, since GitHub tracks binary files inefficiently, combining it with Dropbox helps alleviate this problem.

Combining GitHub with another syncing service requires a specific setup. See this guide for guidance.

Resources for Beginners

Since GitHub is used extensively outside the research community, there are many resources online on how to get started on GitHub. While some of these resources assume technical skills, GitHub’s guide on how to get started, for example, does not. GitHub’s resources on issues and documentation are also of use.

Back to Parent

This article is part of the topic Data Management

Additional Resources