Getting started with GitHub
GitHub is a web-based hosting service for managing code and tracking changes made to it. It is a useful collaborative tool through all stages of research and fieldwork. This page provides resources, and links to further resources, on how to get started with GitHub.
Read First
- GitHub efficiently tracks changes to raw text files, including code files in any programming language, but it handles changes to binary files inefficiently. gitignore files can address these inefficiencies.
- Combining GitHub with another syncing service like Dropbox or OneDrive is an effective method that requires a specific setup.
- There are many resources online on how to get started on GitHub.
Overview
GitHub is a web-based hosting service for managing code and tracking changes made to it. It is well suited to code collaboration and very efficient at tracking changes to raw text files, including code files in any programming language as well as .tex, .txt, and .csv files. However, GitHub does not have direct access to the text and numbers in binary files like .doc, .docx, .xls, .xlsx, or .pdf files. Thus, although GitHub can store binary files, it cannot track changes made to them in detail. Instead, it stores one full version of a binary file for each change made to it, which can become very inefficient. See the sections on gitignore files and on GitHub and Dropbox below for how to avoid these inefficiencies.
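The difference between text and binary files can be illustrated with a minimal Python sketch. It uses the standard library's difflib module to compare two versions of a text file line by line. This is only an illustration of line-based change tracking, not Git's own algorithm or storage format, and the function name changed_lines and the sample file contents are hypothetical.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two versions."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            lineterm="",
            n=0,  # no unchanged context lines around each change
        )
    )
    # The first two entries are the '---' and '+++' file headers;
    # lines starting with '@@' are hunk headers. Drop both kinds.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical script contents: only the second line differs.
old_version = "load data\nregress y x\nexport results\n"
new_version = "load data\nregress y x z\nexport results\n"

for line in changed_lines(old_version, new_version):
    print(line)
```

For a three-line text file with one edited line, the comparison reports just that one line as removed and re-added. A binary file such as a .docx has no comparable line structure, which is why each change to it is stored as a full new version.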
The World Bank's GitHub repositories can be found here. There are also a number of alternatives to GitHub that are likewise built on Git. Most of the resources outlined on this page apply to these alternatives as well. See, for example, GitLab and Bitbucket.
Best Practices
gitignore files
A gitignore file is an important tool for controlling which files in your data work folder are shared in the cloud. It tells Git to ignore the listed files in your local repository, so they are not synced with the repository in the cloud. This ensures that you do not share data files containing private data in the GitHub cloud, and that you do not share binary files that would make your GitHub repository large and slow to work with.
See GitHub's own documentation on ignore files here. The World Bank's DIME team has developed a template gitignore file with the needs of a researcher especially in mind. While in most cases you can use this file without modification, in some contexts you may need to edit it.
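As a rough illustration of how ignore patterns work, the Python sketch below builds the text of a small gitignore file and checks which paths its patterns would match. The patterns cover only the binary file types named on this page and are hypothetical; they are not the DIME template. The matching uses the standard library's fnmatchcase on the file name, which only approximates Git's actual matching rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns for illustration only; this is NOT the DIME template.
IGNORE_PATTERNS = ["*.doc", "*.docx", "*.xls", "*.xlsx", "*.pdf"]


def is_ignored(path: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Approximate check: does the file name match any simple glob pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def render_gitignore(patterns: list[str] = IGNORE_PATTERNS) -> str:
    """Return the text of a .gitignore file: one comment line, then one pattern per line."""
    lines = ["# Binary files kept out of the repository"]
    lines.extend(patterns)
    return "\n".join(lines) + "\n"


print(render_gitignore())
print(is_ignored("outputs/report.pdf"))  # a binary output file
print(is_ignored("code/analysis.py"))    # a plain-text code file
```

In a real project you would also add patterns for any folders or file types that hold private data, and you can confirm how Git treats a given file with the git check-ignore command.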
GitHub and Dropbox
Researchers often use syncing services like Dropbox or OneDrive in combination with GitHub. This is a good way to share data and binary files with team members without leaking private data to the GitHub cloud. Further, since GitHub tracks binary files inefficiently, keeping those files in Dropbox instead helps alleviate this problem.
Combining GitHub with another syncing service requires a specific setup. See this guide for details.
Resources for Beginners
Since GitHub is used extensively outside the research community, there are many resources online on how to get started with GitHub. While some of these resources assume technical skills, others, such as GitHub’s own guide on how to get started, do not. GitHub’s resources on issues and documentation are also useful.
Back to Parent
This article is part of the topic Data Management
Additional Resources
- DIME Analytics’ Intro to GitHub
- DIME Analytics’ guides 1 and 2 to Using Git and GitHub
- DIME Analytics’ Maintaining a GitHub Repository
- DIME Analytics’ Initializing and Synchronizing a Git Repo with GitHub Desktop
- DIME Analytics’ Using Git Flow to Manage Code Projects with GitKraken
- An Introduction to Overleaf by Alli Gofman and Jaclyn Wilson