Difference between revisions of "Publishing Data"


Revision as of 16:44, 9 February 2018

Making data available to other researchers is central to research transparency and reproducibility. However, it is generally neither possible nor advisable to release raw data. Primary data usually contains personally-identifying information (PII) such as names, locations, or financial records that would be unethical to make public; secondary data is often owned by an entity other than the research team, so its public release may raise legal issues. It is therefore important to structure both data management and analysis so that the published data reproduces the researcher's primary results as closely as possible and is appropriately accessible.

Guidelines

Publishing Primary Data

Preparing data for release

The main challenge in releasing primary data is protecting the privacy of respondents. It is essential to carefully de-identify the dataset by removing or masking any sensitive or personally-identifying information it contains. Released datasets should be easy for users to understand, so documentation, including variable dictionaries and survey instruments, should be released with the data.
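
As a rough illustration of the two steps described above (de-identifying the data and producing a variable dictionary to release with it), here is a minimal Python sketch. The column names, the `study_id` field, and both helper functions are hypothetical and chosen only for this example; a real list of identifiers must come from reviewing the survey instrument. Dropping direct identifiers is only a first step, since combinations of other variables can still identify respondents.

```python
import secrets

# Hypothetical direct identifiers; the real list depends on the survey instrument.
DIRECT_IDENTIFIERS = {"name", "phone", "gps_lat", "gps_lon"}


def deidentify(rows, id_column="respondent_id"):
    """Drop direct identifiers and replace the original ID with a random study ID."""
    id_map = {}  # original ID -> random ID; this mapping must stay private
    released = []
    for row in rows:
        original_id = row[id_column]
        if original_id not in id_map:
            id_map[original_id] = secrets.token_hex(8)
        clean = {
            key: value
            for key, value in row.items()
            if key not in DIRECT_IDENTIFIERS and key != id_column
        }
        clean["study_id"] = id_map[original_id]
        released.append(clean)
    return released, id_map


def variable_dictionary(rows, labels):
    """List every released variable with its label, for the accompanying documentation."""
    variables = sorted({key for row in rows for key in row})
    return [{"variable": v, "label": labels.get(v, "")} for v in variables]


# Made-up example records, not real data.
raw = [
    {"respondent_id": "A17", "name": "Jane Doe", "phone": "555-0100", "income": 1200},
    {"respondent_id": "B02", "name": "John Roe", "phone": "555-0101", "income": 950},
]
released, id_map = deidentify(raw)
codebook = variable_dictionary(
    released,
    {"income": "Monthly household income", "study_id": "Randomly assigned respondent ID"},
)
```

The random ID keeps records linkable across released files without exposing the original identifier, provided the `id_map` itself is never published.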

DIME data releases

DIME survey data is released through the Microdata Catalog. However, access to the data may be restricted and some variables may be embargoed prior to publication.

Publishing Analysis Data

Some journals require the datasets used in data analysis to be released when a paper is published. This requirement is intended to make research more transparent and to allow readers to reproduce findings.

Preparing data for release

The objective of the data release is to allow users to reproduce the results in the paper. The released dataset therefore needs to contain all variables used in data analysis.
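
One way to check that a release dataset contains every variable the analysis uses is sketched below in Python. The function name `build_release`, the variable names, and the sample records are all hypothetical; the sketch simply keeps the listed analysis variables and fails loudly if any of them is absent from the data.

```python
def build_release(rows, analysis_variables):
    """Keep only the analysis variables, raising an error if any is missing."""
    missing = sorted(
        {v for v in analysis_variables for row in rows if v not in row}
    )
    if missing:
        raise KeyError(f"variables missing from analysis data: {missing}")
    return [{v: row[v] for v in analysis_variables} for row in rows]


# Made-up analysis records; "enumerator_note" is not used in the analysis.
analysis_data = [
    {"study_id": "a1", "treatment": 1, "income": 1200, "enumerator_note": "ok"},
    {"study_id": "b2", "treatment": 0, "income": 950, "enumerator_note": ""},
]
release = build_release(analysis_data, ["study_id", "treatment", "income"])
```

Listing the analysis variables explicitly, rather than dropping unwanted ones, means a variable added to the analysis later cannot be silently left out of the release.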

