Difference between revisions of "Microdata Catalog"

m
(Adding material from Cathrine)
Line 1: Line 1:
The [http://microdata.worldbank.org/index.php/home Microdata Library] is an online platform that offers free access to microdata produced not only by the World Bank, but also by other international organizations, statistical agencies and other actors in developing countries. It includes datasets from surveys implemented as part of impact evaluations and research on development, as well as administrative data.


== Read First ==
== Read first ==
* Data sets published in the Microdata Catalog are typically minimally processed survey data. The [[Checklist: Microdata Catalog submission|Microdata Catalog Checklist]] lists data format, documentation requirements and instructions on how to deposit data sets.
 
* When submitting data, it is recommended to include as much information about the study and the data set as possible. This reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.
* As part of the submission process, it is possible to choose among different access conditions under which the data will be shared.
* It is possible to change the access terms, as well as the data sets, after the initial submission.


== Guidelines for submission ==
Submission to the Microdata Catalog is done after the initial data cleaning for a round of data collection is finished. That means one impact evaluation may have several data sets in the catalog, for example baseline, midline and endline if the study has been completed, or just the baseline if no further data collection has happened yet. Data sets submitted to the [http://microdata.worldbank.org/index.php/home Microdata Library] must be de-identified and accompanied by data documentation and a study description. The [[Checklist: Microdata Catalog submission|Microdata Catalog Checklist]] lists data format, documentation requirements and instructions on how to deposit data sets.


Data sets can be submitted to the microdata catalog under different [http://microdata.worldbank.org/index.php/terms-of-use access conditions].
World Bank staff can deposit their data directly through the online [http://microdatalib.worldbank.org/index.php/home Data Deposit Application]. Depositors outside the World Bank can fill out a form and submit it by email.
 
=== Data sets ===
Data may be uploaded in different formats, including Stata, SPSS and SAS, and must be [[De-identification|de-identified]] and minimally [[Data Cleaning|cleaned]]. The required data cleaning aims to make clear what information is found in any given variable, so both variable and value labels must be present, including [[Data Cleaning#Survey Codes and Missing Values|labels for extended missing values]]. To protect the confidentiality of respondents, all [[De-identification#Personally Identifiable Information|Personally Identifiable Information]] (PII) must be removed. Variables containing sensitive information such as PII can be flagged in the "Data Distribution" section to indicate they should not be distributed.
 
=== Supporting documents ===
All relevant material that would allow users to better understand the data and interpret the results should be included. A non-comprehensive list of documents that may be relevant is included below. Note that some of the material in the list may contain sensitive information (for example in the form of options listed in the questionnaire), so it should also be checked and de-identified.
* Questionnaires (paper format equivalent is better than CAPI form)
* Enumerator manuals
* [[Data Documentation#Field work documentation|Field work documentation]]
* Methodology description
* [[Data Documentation#Data cleaning documentation|Data cleaning documentation]]
* [[Data Documentation#Construct documentation|Variables construction documentation]], if applicable
* Outputs such as reports, presentations, publications and papers
 
=== Study description ===
During submission, it is necessary to fill out a form collecting information on the survey (metadata). Not all fields are mandatory, but providing as much information as possible makes it easier for users to understand and explore the data. It also reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.
 
*Mandatory Fields:
** Title
** Country
** Dates of Data Collection
** Access policy
** Catalogue where the data should be published
 
* Recommended Fields:
** Abstract
** Geographic Coverage
** Primary Investigator
** Funding
** Sampling Procedure
** Weighting
 
=== Access conditions ===
The World Bank Microdata Library disseminates data under the [https://data.worldbank.org/summary-terms-of-use Microdata Terms of Use for the World Bank]. When submitting data, it is possible to indicate whether the datasets should be available only to World Bank staff or also to external users. It is also possible to embargo any submitted data for a certain period of time. To protect the confidentiality of individual information and to meet the requirements of the data owners who provide the microdata, the following principal [http://microdata.worldbank.org/index.php/terms-of-use types of access] may be applied:
 
* "Open access": this is the least restrictive access policy. Datasets and the related documentation are available to users for commercial and noncommercial purposes at no cost. Users do not need to be logged in to the application.
 
* "Direct access": relevant datasets and the related documentation are made freely available to registered and unregistered users for statistical and scientific research purposes only, and may not be distributed. Any publications employing this type of data must cite the source, in line with the citation requirement provided with the dataset.
 
* "Public Use Files": PUFs are available to anyone agreeing to respect a core set of easy-to-meet conditions. These data are made easily accessible because the risk of identifying individual respondents or data providers is considered to be low. Terms of use are the same as direct access, but users are required to register before obtaining the data sets.
 
* "Licensed files": files whose dissemination is restricted to bona fide users. Access is granted to authenticated users who have received authorization after submitting a documented application and signing an agreement governing the data's use. These users must be acting on behalf of an organization, which must take responsibility for the use. To release data under this license, a World Bank staff member must be designated as the point of contact for granting access to the data. That person will be contacted through the data catalogue manager, who works with the team to approve or reject requests.
 
* "External Repositories": The World Bank Microdata Library operates both as a data catalog for World Bank owned or licensed data and as a portal to data held in a number of external repositories. The aim of the Microdata Library is to provide users with the most comprehensive catalog of development-related microdata possible. To this end, studies conducted and owned by other institutions, as well as links to those studies, are listed in the Microdata Library Catalog. Datasets provided by external agencies are not owned or controlled by the World Bank and have their own conditions of use. When a user accesses external repositories, the terms of those repositories govern access to their data.
 
* "No access": some datasets have no access policy defined, or are not accessible. In limited situations, a small number of such datasets may be included for the sake of completeness and to provide access to questionnaires and reports.
 
== Releasing data before publication ==


One common concern among researchers is under which conditions to submit data from studies that are still ongoing and whose results have not yet been published.


== DIME Datasets on Microdata Catalog ==

Revision as of 22:04, 30 November 2017

The Microdata Library is an online platform that offers free access to microdata produced not only by the World Bank, but also by other international organizations, statistical agencies and other actors in developing countries. It includes datasets from surveys implemented as part of impact evaluations and research on development, as well as administrative data.

Read first

  • Data sets published in the Microdata Catalog are typically minimally processed survey data. The Microdata Catalog Checklist lists data format, documentation requirements and instructions on how to deposit data sets.
  • When submitting data, it is recommended to include as much information about the study and the data set as possible. This reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.
  • As part of the submission process, it is possible to choose among different access conditions under which the data will be shared.
  • It is possible to change the access terms, as well as the data sets, after the initial submission.

Guidelines for submission

Submission to the Microdata Catalog is done after the initial data cleaning for a round of data collection is finished. That means one impact evaluation may have several data sets in the catalog, for example baseline, midline and endline if the study has been completed, or just the baseline if no further data collection has happened yet. Data sets submitted to the Microdata Library must be de-identified and accompanied by data documentation and a study description. The Microdata Catalog Checklist lists data format, documentation requirements and instructions on how to deposit data sets.

World Bank staff can deposit their data directly through the online Data Deposit Application. Depositors outside the World Bank can fill out a form and submit it by email.

Data sets

Data may be uploaded in different formats, including Stata, SPSS and SAS, and must be de-identified and minimally cleaned. The required data cleaning aims to make clear what information is found in any given variable, so both variable and value labels must be present, including labels for extended missing values. To protect the confidentiality of respondents, all Personally Identifiable Information (PII) must be removed. Variables containing sensitive information such as PII can be flagged in the "Data Distribution" section to indicate they should not be distributed.
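The label and PII requirements above can be checked before upload. The following is only a minimal sketch in Python: the function name, the variable names and the list of PII name fragments are hypothetical and not part of the Microdata Library tools, and a name-based check is no substitute for a manual review.

```python
# Hypothetical name fragments that often indicate PII; adjust for each project.
PII_HINTS = ("name", "phone", "address", "gps", "email")


def check_variables(var_labels):
    """Return (unlabeled, possible_pii) for a {variable: label} mapping.

    unlabeled: variables whose label is missing or blank.
    possible_pii: variables whose name contains one of PII_HINTS.
    """
    unlabeled = sorted(
        var for var, label in var_labels.items() if not (label or "").strip()
    )
    possible_pii = sorted(
        var for var in var_labels if any(hint in var.lower() for hint in PII_HINTS)
    )
    return unlabeled, possible_pii


if __name__ == "__main__":
    # Made-up example variables, for illustration only.
    labels = {"hh_id": "Household ID", "resp_name": "Respondent name", "q1": ""}
    unlabeled, possible_pii = check_variables(labels)
    print("Variables without labels:", unlabeled)
    print("Variables to review for PII:", possible_pii)
```

Variables reported by the second list would then be removed or flagged in the "Data Distribution" section.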

Supporting documents

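All relevant material that would allow users to better understand the data and interpret the results should be included. A non-comprehensive list of documents that may be relevant is included below. Note that some of the material in the list may contain sensitive information (for example in the form of options listed in the questionnaire), so it should also be checked and de-identified.

  • Questionnaires (paper format equivalent is better than CAPI form)
  • Enumerator manuals
  • Field work documentation
  • Methodology description
  • Data cleaning documentation
  • Variables construction documentation, if applicable
  • Outputs such as reports, presentations, publications and papers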

Study description

During submission, it is necessary to fill out a form collecting information on the survey (metadata). Not all fields are mandatory, but providing as much information as possible makes it easier for users to understand and explore the data. It also reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.

  • Mandatory Fields:
    • Title
    • Country
    • Dates of Data Collection
    • Access policy
    • Catalogue where the data should be published
  • Recommended Fields:
    • Abstract
    • Geographic Coverage
    • Primary Investigator
    • Funding
    • Sampling Procedure
    • Weighting
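A team may want to confirm that the study description is complete before starting the deposit. The sketch below, in Python, assumes the metadata is kept in a plain dictionary; the key names are hypothetical stand-ins for the fields listed above and are not the names used by the deposit form.

```python
# Hypothetical keys standing in for the mandatory and recommended fields above.
MANDATORY = ("title", "country", "collection_dates", "access_policy", "catalogue")
RECOMMENDED = (
    "abstract",
    "geographic_coverage",
    "primary_investigator",
    "funding",
    "sampling_procedure",
    "weighting",
)


def review_metadata(meta):
    """Report which mandatory and recommended fields are missing or blank."""

    def is_empty(key):
        return not str(meta.get(key) or "").strip()

    return {
        "missing_mandatory": [key for key in MANDATORY if is_empty(key)],
        "missing_recommended": [key for key in RECOMMENDED if is_empty(key)],
    }


if __name__ == "__main__":
    # Made-up study description, for illustration only.
    draft = {
        "title": "Example Impact Evaluation, Baseline",
        "country": "Kenya",
        "collection_dates": "2017-03 to 2017-05",
        "access_policy": "Licensed files",
    }
    print(review_metadata(draft))
```

Missing mandatory fields block the submission, while missing recommended fields are only a prompt to add more detail.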

Access conditions

The World Bank Microdata Library disseminates data under the Microdata Terms of Use for the World Bank. When submitting data, it is possible to indicate whether the datasets should be available only to World Bank staff or also to external users. It is also possible to embargo any submitted data for a certain period of time. To protect the confidentiality of individual information and to meet the requirements of the data owners who provide the microdata, the following principal types of access may be applied:

  • "Open access": this is the least restrictive access policy. Datasets and the related documentation are available to users for commercial and noncommercial purposes at no cost. Users do not need to be logged in to the application.
  • "Direct access": relevant datasets and the related documentation are made freely available to registered and unregistered users for statistical and scientific research purposes only, and may not be distributed. Any publications employing this type of data must cite the source, in line with the citation requirement provided with the dataset.
  • "Public Use Files": PUFs are available to anyone agreeing to respect a core set of easy-to-meet conditions. These data are made easily accessible because the risk of identifying individual respondents or data providers is considered to be low. Terms of use are the same as direct access, but users are required to register before obtaining the data sets.
  • "Licensed files": files whose dissemination is restricted to bona fide users. Access is granted to authenticated users who have received authorization after submitting a documented application and signing an agreement governing the data's use. These users must be acting on behalf of an organization, which must take responsibility for the use. To release data under this license, a World Bank staff member must be designated as the point of contact for granting access to the data. That person will be contacted through the data catalogue manager, who works with the team to approve or reject requests.
  • "External Repositories": The World Bank Microdata Library operates both as a data catalog for World Bank owned or licensed data and as a portal to data held in a number of external repositories. The aim of the Microdata Library is to provide users with the most comprehensive catalog of development-related microdata possible. To this end, studies conducted and owned by other institutions, as well as links to those studies, are listed in the Microdata Library Catalog. Datasets provided by external agencies are not owned or controlled by the World Bank and have their own conditions of use. When a user accesses external repositories, the terms of those repositories govern access to their data.

  • "No access": some datasets have no access policy defined, or are not accessible. In limited situations, a small number of such datasets may be included for the sake of completeness and to provide access to questionnaires and reports.
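To compare what a user has to do under the first four access types, the descriptions above can be summarized in a small lookup table. This is an illustrative sketch in Python only: the enum, the dictionary and the function are hypothetical, and the authoritative conditions are those in the Microdata Library terms of use.

```python
from enum import Enum


class Access(Enum):
    OPEN = "Open access"
    DIRECT = "Direct access"
    PUBLIC_USE = "Public Use Files"
    LICENSED = "Licensed files"


# Summary of the descriptions above (an interpretation, not an official table).
REQUIREMENTS = {
    Access.OPEN: {"registration": False, "application": False},
    Access.DIRECT: {"registration": False, "application": False},
    Access.PUBLIC_USE: {"registration": True, "application": False},
    Access.LICENSED: {"registration": True, "application": True},
}


def steps_before_download(access):
    """List the steps a user must complete before obtaining the data."""
    required = REQUIREMENTS[access]
    steps = []
    if required["registration"]:
        steps.append("register or log in")
    if required["application"]:
        steps.append("submit a documented application and sign an agreement")
    return steps


if __name__ == "__main__":
    for access in Access:
        print(access.value, "->", steps_before_download(access) or ["none"])
```

External repositories and "No access" entries are left out because their conditions are set elsewhere or not defined.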

Releasing data before publication


One common concern among researchers is under which conditions to submit data from studies that are still ongoing and whose results have not yet been published.

DIME Datasets on Microdata Catalog


Additional Resources