Publishing Data

Latest revision as of 17:07, 21 June 2021

Data publication is the release of data and data documentation following data collection and analysis. It is an increasingly common standard that bolsters research transparency and reproducibility. Preparation for data publication begins in the early stages of research: effective data management and analytics throughout the project ensure that the research team can easily publish data when the time comes, and that outside users can access the data and use it to replicate the researchers' primary results. This page discusses preparing and publishing data, code, documentation, and directories.

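Read First

* Before publishing data, remove all personally-identifying information (PII) such as names, locations, or financial records.
* Accompany published data with proper documentation so that users understand the data.
* Clearly state who owns the data being published.
* Publish data within a comprehensive directory that includes all necessary data files, raw outputs, and code.
* GitHub, the Open Science Framework, and ResearchGate are all platforms on which researchers can publish data, code, and directories.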

Preparing for Release

Preparing Data

Released data should allow any user to replicate the research findings. Therefore, released data should be clean and well-labelled, contain all variables used in the analysis, and include the ID variables that uniquely identify each observation. Protect respondents' privacy by carefully de-identifying any sensitive or personally-identifying information (PII), such as names, locations, or financial records; it is not ethical to publish any of these.
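
As a rough illustration of this step, the Python sketch below drops a hypothetical set of PII columns from each record and checks that the ID variable is present and unique, so that users can still link observations. The column names are assumptions made for the example. Dropping columns is only part of de-identification: indirect identifiers, such as detailed locations, may also need to be coarsened or removed.

```python
# Minimal de-identification sketch. The column names below are hypothetical
# examples, not a standard list of PII variables.
PII_COLUMNS = {"respondent_name", "gps_lat", "gps_lon", "bank_account"}
ID_VARIABLE = "household_id"


def deidentify(rows, pii_columns=PII_COLUMNS, id_variable=ID_VARIABLE):
    """Return copies of the rows with PII columns dropped.

    Raises ValueError if the ID variable is missing or not unique, since
    users need it to identify and link observations.
    """
    seen = set()
    cleaned = []
    for row in rows:
        if id_variable not in row:
            raise ValueError(f"row is missing ID variable {id_variable!r}")
        key = row[id_variable]
        if key in seen:
            raise ValueError(f"duplicate {id_variable}: {key!r}")
        seen.add(key)
        # Keep every column except those flagged as PII.
        cleaned.append({k: v for k, v in row.items() if k not in pii_columns})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"household_id": 101, "respondent_name": "A. Example", "gps_lat": 1.23,
         "gps_lon": 4.56, "bank_account": "X-1", "consumption": 250.0},
        {"household_id": 102, "respondent_name": "B. Example", "gps_lat": 1.24,
         "gps_lon": 4.57, "bank_account": "X-2", "consumption": 310.5},
    ]
    for row in deidentify(raw):
        print(row)
```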

Preparing Data Documentation

Analysis datasets should be easy to understand for researchers trying to replicate results, so proper documentation, including variable dictionaries and survey instruments, should accompany the data release. See the Microdata Catalog Checklist for instructions on how to prepare data and documentation for primary data release.
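
Part of a variable dictionary can be generated from the data itself. The Python sketch below, using hypothetical variable names and labels, lists each variable with its label, the value types observed, and the number of missing values, and flags variables that still lack a label. It is only a starting point and does not replace the survey instrument or the Microdata Catalog Checklist.

```python
# Sketch: build a simple variable dictionary (codebook) from a list of records.
# The labels mapping is a hypothetical input; in practice labels would come
# from the survey instrument or the dataset's own variable labels.


def build_codebook(rows, labels):
    """Return one entry per variable: name, label, observed types, missing count."""
    # Collect variable names in first-seen order.
    variables = []
    for row in rows:
        for name in row:
            if name not in variables:
                variables.append(name)

    codebook = []
    for name in variables:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        codebook.append({
            "variable": name,
            "label": labels.get(name, ""),
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
        })
    return codebook


if __name__ == "__main__":
    data = [
        {"household_id": 101, "consumption": 250.0, "region": "North"},
        {"household_id": 102, "consumption": None, "region": "South"},
    ]
    labels = {"household_id": "Unique household identifier",
              "consumption": "Monthly household consumption"}
    for entry in build_codebook(data, labels):
        flag = "" if entry["label"] else "  <-- missing label"
        print(f"{entry['variable']:<14}{entry['label']:<32}{entry['missing']}{flag}")
```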

Preparing Code and Directory

For full reproducibility, release a structured directory that allows a user to run your code immediately after changing only the project directory path. If you have followed the DIME Wiki’s protocols and managed data effectively throughout the project, for example with an organized project folder and a master do-file, you will already have well-written, reproducible code within a well-structured directory.
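
The DIME workflow uses a Stata master do-file for this. As a language-neutral illustration of the same pattern, the Python sketch below keeps the project path in a single line and runs each step of the analysis in a fixed order, stopping at the first failure. The path, the code/ folder, the step names, and the PROJECT_ROOT environment variable are all hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path

# The only line a reviewer should need to edit (hypothetical path).
PROJECT_ROOT = Path("/path/to/replication-package")

# Hypothetical step scripts, run in this order from <root>/code/.
STEPS = ["01_cleaning.py", "02_construction.py", "03_analysis.py"]


def run_all(root=PROJECT_ROOT, steps=STEPS):
    """Run every step in order, stopping at the first failure."""
    root = Path(root)
    # Pass the root to each step so no script hard-codes its own paths.
    env = {**os.environ, "PROJECT_ROOT": str(root)}
    for step in steps:
        script = root / "code" / step
        print(f"Running {script.name}")
        # check=True raises CalledProcessError if the step exits with an error.
        subprocess.run([sys.executable, str(script)], check=True, cwd=root, env=env)
```

After editing PROJECT_ROOT, calling run_all() re-runs the whole pipeline. Each step script would read PROJECT_ROOT from its environment rather than hard-coding paths, which is what lets a reviewer change a single line.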

The folders should include all de-identified data necessary for the analysis, all code necessary for the analysis, and the raw outputs used in the paper. Using iefolder from DIME’s ietoolkit can help standardize your directory. In either the /dofiles/ folder or the root directory, include a master script (for example, a .do or .R file). The master script should allow the reviewer to change one line of code to set their directory path, and should then run the entire project and re-create all the raw outputs exactly as supplied. Check that all code runs completely on a new computer: install any required user-written commands in the master script, and make sure that settings such as version, matsize, and varabbrev are set. Every output should clearly correspond by name to an exhibit in the paper, and vice versa.
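
Some of these checks can be automated before release. The Python sketch below assumes hypothetical data/, code/, and outputs/ folders and a naming convention in which each output file's name (without extension) equals an exhibit ID such as table_1. It reports missing folders, outputs that do not appear in the paper, and exhibits with no matching output. It does not replace running the master script on a clean computer.

```python
from pathlib import Path

# Hypothetical folder names; iefolder and other templates use their own layout.
REQUIRED_FOLDERS = ["data", "code", "outputs"]


def check_package(root, exhibits, required_folders=REQUIRED_FOLDERS):
    """Return a list of human-readable problems found in the package."""
    root = Path(root)
    problems = [f"missing folder: {name}/" for name in required_folders
                if not (root / name).is_dir()]

    output_dir = root / "outputs"
    if output_dir.is_dir():
        # Assumed convention: an output's file stem equals its exhibit ID.
        stems = {p.stem for p in output_dir.iterdir() if p.is_file()}
        for stem in sorted(stems - set(exhibits)):
            problems.append(f"output not used in paper: {stem}")
        for exhibit in sorted(set(exhibits) - stems):
            problems.append(f"exhibit with no output: {exhibit}")
    return problems
```

An empty list means every required folder exists and outputs and exhibits match one-to-one by name.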

Publishing

A data publication platform must be able to handle structured directories and provide a stable, structured URL for your project.

DIME survey data is typically published and released through the Microdata Catalog.

GitHub, the Open Science Framework, and ResearchGate are often used for replication packages, as these platforms allow for publication of data, documentation, and code.

Author’s Preprint

Consider releasing an author’s copy or preprint, but check with your publisher before doing so: not all journals accept material that has already been released, so you may need to wait until acceptance is confirmed. You can post a preprint on a number of preprint websites, many of which are topic-specific. You can also host the file on GitHub and link to it directly from your personal website or wherever else you are sharing the preprint. Do not use Dropbox or Google Drive for this purpose: many organizations block access to these services, which would prevent their staff from accessing your material.

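Additional Resources

* Berkeley Initiative for Transparency in the Social Sciences, tips on preparing data for publication: https://www.bitss.org/2016/05/23/out-of-the-file-drawer-tips-on-prepping-data-for-publication/
* J-PAL, Guide to Publishing Research Data: https://www.povertyactionlab.org/sites/default/files/resources/J-PAL-guide-to-publishing-research-data.pdf
* International Aid Transparency Initiative, How to License Your Data: https://iatistandard.org/en/guidance/preparing-organisation/organisation-data-publication/how-to-license-your-data
* World Bank, example of a published World Bank directory for replication: https://github.com/worldbank/Water-When-It-Counts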