Microdata Catalog
(Adding material from Cathrine)
Revision as of 22:35, 30 November 2017
The Microdata Library is an online platform that offers free access to microdata produced not only by the World Bank but also by other international organizations, statistical agencies and other actors in developing countries. It includes datasets from surveys implemented as part of impact evaluations and development research, as well as administrative data.
Read first
- Data sets published in the Microdata Library are typically minimally processed survey data. The Microdata Catalog Checklist lists the data format and documentation requirements and gives instructions on how to deposit data sets.
- When submitting data, it is recommended to include as much information about the study and the data set as possible. This reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.
- As part of the submission process, depositors can choose among different access conditions under which the data will be shared.
- Access terms, as well as the data sets themselves, can be changed after the initial submission.
Guidelines for submission
Data are submitted to the Microdata Catalog once the initial data cleaning for a round of data collection is finished. This means one impact evaluation may have several data sets in the catalog: for example, baseline, midline and endline data sets if the evaluation has been completed, or only the baseline if no further data collection has taken place yet. Data sets submitted to the Microdata Library must be de-identified and accompanied by data documentation and a study description. The Microdata Catalog Checklist lists the data format and documentation requirements and gives instructions on how to deposit data sets.
World Bank staff can deposit their data directly through the online Data Deposit Application. Depositors outside the World Bank can fill out a form and submit it by email.
Data sets
Data may be uploaded in different formats, including Stata, SPSS and SAS, and must be de-identified (see De-identification) and minimally cleaned (see Data Cleaning). The purpose of this cleaning is to make clear what information each variable contains, so both variable and value labels must be present, including labels for extended missing values (see Data Cleaning: Survey Codes and Missing Values). To protect the confidentiality of respondents, all Personally Identifiable Information (PII) must be removed (see De-identification: Personally Identifiable Information). Variables containing sensitive information such as PII can be flagged in the "Data Distribution" section to indicate that they should not be distributed.
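The labeling and PII requirements above can be checked before depositing. The sketch below is a minimal, hypothetical Python example. It assumes the variable metadata has already been read from the data file into a simple `Variable` record. The record, the `check_variables` function and the `PII_HINTS` list are illustrative names only and are not part of any Microdata Library tool. The check reports variables without a label, coded values without a value label (including extended missing values such as -99), and variable names that suggest PII. Name matching is only a rough heuristic and does not replace a proper de-identification review.

```python
from dataclasses import dataclass, field

# Hypothetical record describing one variable; in practice this metadata
# would be read from the Stata, SPSS or SAS file being deposited.
@dataclass
class Variable:
    name: str
    label: str = ""
    values: set = field(default_factory=set)          # distinct codes observed in the data
    value_labels: dict = field(default_factory=dict)  # code -> value label
    categorical: bool = False

# Illustrative name fragments that often indicate PII; not an exhaustive list.
PII_HINTS = ("name", "phone", "address", "gps", "email")

def check_variables(variables):
    """Return human-readable problems to fix before depositing the data."""
    problems = []
    for var in variables:
        if not var.label.strip():
            problems.append(f"{var.name}: missing variable label")
        if var.categorical:
            # Every observed code, including extended missing values, needs a label.
            for code in sorted(c for c in var.values if c not in var.value_labels):
                problems.append(f"{var.name}: value {code} has no value label")
        if any(hint in var.name.lower() for hint in PII_HINTS):
            problems.append(f"{var.name}: possible PII, remove or flag under Data Distribution")
    return problems


# Example: -99 is an extended missing value code that has no label.
example = [
    Variable("hh_id", "Household ID"),
    Variable("q1_crop", "Main crop", {1, 2, -99}, {1: "Maize", 2: "Beans"}, categorical=True),
    Variable("resp_name", "Respondent name"),
    Variable("plot_area", ""),
]
for problem in check_variables(example):
    print(problem)
```

In the example, the unlabeled -99 code, the `resp_name` variable and the unlabeled `plot_area` variable are each reported once. Real deposits would apply the same checks to metadata read from the actual file.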
Supporting documents
All relevant material that would help users better understand the data and interpret the results should be included. A non-exhaustive list of potentially relevant documents is given below. Note that some of this material may contain sensitive information (for example, in the answer options listed in the questionnaire), so it should also be checked and de-identified.
- Questionnaires (a paper-format version is preferable to the CAPI form)
- Enumerator manuals
- Field work documentation
- Methodology description
- Data cleaning documentation
- Variables construction documentation, if applicable
- Outputs such as reports, presentations, publications and papers
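Text versions of the supporting documents can be screened for obvious PII before they are uploaded. The following is only a rough Python sketch. The `find_possible_pii` function and its two regular expressions are assumptions made for illustration. They catch some e-mail addresses and phone-number-like digit runs, but they will miss names, locations and most other identifiers. Any match should be reviewed by hand rather than removed automatically.

```python
import re

# Illustrative patterns only: they catch some e-mail addresses and
# phone-number-like digit runs, and will miss names, locations and other PII.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def find_possible_pii(text):
    """Return (kind, matched_text) pairs that should be reviewed by hand."""
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits


# Hypothetical questionnaire item whose answer options contain contact details.
questionnaire = "Q12. Who should we contact? (a) j.doe@example.org (b) +254 712 345 678"
for kind, match in find_possible_pii(questionnaire):
    print(kind, match)
```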
Study description
During submission, it is necessary to fill out a form collecting information on the survey (metadata). Not all fields are mandatory, but providing as much information as possible makes it easier for users to understand and explore the data. This reduces the number of future queries received from both catalogue staff preparing the data and users trying to properly understand the survey process.
- Mandatory Fields:
  - Title
  - Country
  - Dates of Data Collection
  - Access policy
  - Catalogue where the data should be published
- Recommended Fields:
  - Abstract
  - Geographic Coverage
  - Primary Investigator
  - Funding
  - Sampling Procedure
  - Weighting
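Before filling out the form, a team may want to check that its study description covers these fields. The sketch below is a hypothetical Python example. The snake_case field keys, the `check_metadata` function and the sample study are illustrative assumptions and do not correspond to the Data Deposit Application's internal field names. Missing mandatory fields are treated as errors and missing recommended fields as warnings, following the two lists above.

```python
# Hypothetical field keys mirroring the form fields listed above.
MANDATORY = ("title", "country", "dates_of_data_collection", "access_policy", "catalogue")
RECOMMENDED = ("abstract", "geographic_coverage", "primary_investigator",
               "funding", "sampling_procedure", "weighting")

def is_blank(value):
    """Treat missing values and whitespace-only strings as not filled in."""
    return value is None or not str(value).strip()

def check_metadata(meta):
    """Missing mandatory fields are errors; missing recommended fields are warnings."""
    return {
        "errors": [k for k in MANDATORY if is_blank(meta.get(k))],
        "warnings": [k for k in RECOMMENDED if is_blank(meta.get(k))],
    }


# Hypothetical study description with all mandatory fields but few recommended ones.
study = {
    "title": "Baseline Household Survey",
    "country": "Kenya",
    "dates_of_data_collection": "2016-03 to 2016-06",
    "access_policy": "Licensed files",
    "catalogue": "Impact Evaluation Surveys",
    "abstract": "Baseline survey for an agricultural impact evaluation.",
}
print(check_metadata(study))
```

Here the study passes the mandatory check. The missing recommended fields are listed as warnings so the team can decide whether to complete them before submission.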
Access conditions
The World Bank Microdata Library disseminates data under the Microdata Terms of Use for the World Bank. When submitting data, it is possible to indicate whether the datasets should be available only to World Bank staff or also to external users. It is also possible to embargo any submitted data for a certain period of time. To protect the confidentiality of individual information and to meet the requirements of the data owners who provide the microdata, the following principal types of access may be applied:
- "Open access": this is the least restrictive access policy. Datasets and the related documentation are available to users for commercial and noncommercial purposes at no cost. Users do not need to be logged in to the application.
- "Direct access": relevant datasets and the related documentation are made freely available to registered and unregistered users for statistical and scientific research purposes only, and may not be distributed. Any publications employing this type of data must cite the source, in line with the citation requirement provided with the dataset.
- "Public Use Files": PUFs are available to anyone agreeing to respect a core set of easy-to-meet conditions. These data are made easily accessible because the risk of identifying individual respondents or data providers is considered to be low. Terms of use are the same as direct access, but users are required to register before obtaining the data sets.
- "Licensed files": dissemination of these files is restricted to bona fide users. Access is granted to authenticated users who have received authorization after submitting a documented application and signing an agreement governing the data's use. These users must be acting on behalf of an organization, which must take responsibility for the use. To release data under this license, a World Bank staff member must be designated as the point of contact who grants access to the data. That person will be contacted through the data catalogue manager, who works with the team to approve or reject requests.
- "External Repositories": the World Bank Microdata Library operates both as a data catalog for World Bank owned or licensed data and as a portal to data held in a number of external repositories. The aim of the Microdata Library is to provide users with the most comprehensive catalog of development-related microdata possible. To this end, studies conducted and owned by other institutions, as well as links to those studies, are listed in the Microdata Library Catalog. Datasets provided by external agencies are not owned or controlled by the World Bank and have their own conditions of use. When a user accesses external repositories, the terms governing use of those repositories govern access to their data.
- "No access": some datasets have no access policy defined, or are not accessible. In some situations, we may include a limited number of such datasets for the sake of completeness and to provide access to questionnaires and reports.
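The access types above differ mainly in who can obtain the files. The Python sketch below is a simplified, hypothetical reading of the descriptions above. The `Access` enum and `can_download` function are illustrative and do not describe how the Microdata Library actually enforces its policies. It only models download eligibility, not permitted uses such as the research-only restriction on direct access and public use files. It returns `None` for external repositories, because those repositories' own terms decide access.

```python
from enum import Enum

class Access(Enum):
    OPEN = "Open access"
    DIRECT = "Direct access"
    PUBLIC_USE = "Public Use Files"
    LICENSED = "Licensed files"
    EXTERNAL = "External Repositories"
    NO_ACCESS = "No access"

def can_download(policy, registered=False, approved=False):
    """Simplified eligibility check.

    registered: the user has registered or logged in.
    approved:   a licensed-file application has been authorized.
    Returns True/False, or None when an external repository decides.
    """
    if policy in (Access.OPEN, Access.DIRECT):
        return True                      # no registration required
    if policy is Access.PUBLIC_USE:
        return registered                # registration required
    if policy is Access.LICENSED:
        return registered and approved   # authenticated and authorized
    if policy is Access.EXTERNAL:
        return None                      # governed by the external repository's terms
    return False                         # no access


print(can_download(Access.LICENSED, registered=True))
```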
Collections available
The Microdata Library operates as a portal for datasets originating from the World Bank and other international, regional and national organizations. These contributions make up the Central Microdata Catalog, which can also be viewed and searched by collection. When submitting data to the Catalog, it is necessary to specify the collection in which it should be filed.
- World Bank catalogs
  - Global Financial Inclusion (Global Findex) Database
  - Service Delivery Facility Surveys
  - The STEP Skills Measurement Program
  - The World Bank Group Country Opinion Survey Program (COS)
  - Development Research Microdata
  - Enterprise Surveys
  - Impact Evaluation Surveys
  - Living Standards Measurement Study (LSMS)
  - Migration and Remittances Surveys
- External catalogs
  - Global Health Data Exchange (GHDx), Institute for Health Metrics and Evaluation (IHME)
  - Integrated Public Use Microdata Series (IPUMS) International
  - MEASURE DHS: Demographic and Health Surveys
  - Millennium Challenge Corporation (MCC)
  - UNICEF Multiple Indicator Cluster Surveys (MICS)
  - WHO’s Multi-Country Studies Programmes
  - DataFirst, University of Cape Town, South Africa
Releasing data before publication
One common concern among researchers is under which conditions to submit data from studies that are still ongoing and whose results have not yet been published. There are several options available.
First of all, we recommend submitting the data as soon as it is collected. The review process ensures that documentation is submitted, which reduces the risk of forgetting important details about how the data was processed if it is only used once the intervention is completed or endline data is collected. Furthermore, once deposited, the data is safely stored, which reduces the less likely but even more worrying risk of losing data altogether. Depositing data early therefore makes transitions between team members smoother and reduces the amount of information lost over time.
The different access conditions, together with the possibility of updating the data, can be used to withhold from release any information that may create issues if made public before publication. One possibility is to submit a data set and embargo any treatment assignment variables until results are published. In this case, it is important to indicate in the documentation that these variables have been removed and will be released at a future date. Alternatively, the whole data set can be embargoed by making it "no access".
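Embargoing treatment assignment variables amounts to splitting the data set into a public part and a withheld part that can be released later. The Python sketch below is a minimal, hypothetical example working on a list of records. The `split_for_release` function and the `hh_id`, `treatment` and `income` variables are illustrative assumptions. The ID variable is kept in both parts so the withheld variables can be merged back once results are published. The function also returns a short note listing the removed variables, for use in the documentation.

```python
def split_for_release(rows, embargoed, id_var):
    """Split records into a public part and an embargoed part.

    The ID variable is kept in both parts so the withheld variables can be
    merged back once results are published.
    """
    public, withheld = [], []
    for row in rows:
        public.append({k: v for k, v in row.items() if k not in embargoed})
        withheld.append({k: v for k, v in row.items() if k == id_var or k in embargoed})
    # Documentation note listing the variables that were actually removed.
    removed = sorted({k for row in rows for k in row if k in embargoed})
    note = ("Variables withheld until results are published: " + ", ".join(removed)
            if removed else "")
    return public, withheld, note


# Hypothetical records from a baseline data set.
rows = [
    {"hh_id": 1, "treatment": 1, "income": 250},
    {"hh_id": 2, "treatment": 0, "income": 310},
]
public, withheld, note = split_for_release(rows, {"treatment"}, "hh_id")
print(note)
```

The public part would be deposited now. The withheld part would be added to the catalog entry as an update once the results are out.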
DIME Datasets on Microdata Catalog