Difference between revisions of "Data Map"

A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime DIME] for organizing the three main aspects of '''data work''': [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]]. It consists of three components: a [[Data Linkage Table|data linkage table]], a [[Master Dataset|master dataset]], and [[Data Flow Charts|data flow charts]]. [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] recommends using '''data maps''' to organize the various components of your '''data work''' in order to improve the quality of both the data and the research.


== Read First ==
* A '''data map''' has three components: a [[Data Linkage Table|data linkage table]], a [[Master Dataset|master dataset]], and [[Data Flow Charts|data flow charts]].
* The best time to start creating a '''data map''' is before starting with [[Primary Data Collection|data collection]].
* The [[Impact Evaluation Team|research team]] should keep updating the '''data map''' as the project moves forward.
* The '''data map template''' is meant to act as a starting point for [[Data Management|data management]] within a '''research team'''.
* It is important to understand the underlying '''best practices''' for each component of a '''data map''' before discussing which components do not apply in a given situation.


== Overview ==
- Your Data Plan has three components: a Data Linkage Table, Master Dataset(s), and Data Flow Chart(s)
- While the best time to start your data plan is before you start acquiring data, it is never too late. You should keep updating your data plan as your project evolves.
- If you are in the middle or towards the end of your project and you spend more time linking your datasets than doing other data work, you should step back and create a data plan.
- As with all templates, you might need to add items to our Data Plan Template or you may find that some items do not apply. The template is meant to be the starting point for a conversation about your team’s data needs and organization. Make sure that you understand the underlying best practice for each item in the template before you decide that it does not apply.


 


Most of the details required for a data plan are not complex by themselves. For example, it is easy for the field coordinator to remember what the respondent ID is while a data collection activity is under way. However, maintaining a shared understanding across time and team members, so that details are used consistently throughout the project, is not straightforward. Because details seem obvious in the short run, project teams often do not spend enough time planning and organizing data work, which makes poor data planning a common source of error. Fortunately, the solution, a data plan, is quick and easy to implement.

Revision as of 19:53, 8 September 2020

A data map is a template designed by DIME for organizing the three main aspects of data work: data analysis, data cleaning, and data management. It consists of three components: a data linkage table, a master dataset, and data flow charts. DIME Analytics recommends using data maps to organize the various components of your data work in order to improve the quality of both the data and the research.

Read First

  • A data map has three components: a data linkage table, a master dataset, and data flow charts.
  • The best time to start creating a data map is before starting with data collection.
  • The research team should keep updating the data map as the project moves forward.
  • The data map template is meant to act as a starting point for data management within a research team.
  • It is important to understand the underlying best practices for each component of a data map before discussing which components do not apply in a given situation.

Overview

- If you are in the middle or towards the end of your project and you spend more time linking your datasets than doing other data work, you should step back and create a data plan.

- As with all templates, you might need to add items to our Data Plan Template or you may find that some items do not apply.

Most of the details required for a data plan are not complex by themselves. For example, it is easy for the field coordinator to remember what the respondent ID is while a data collection activity is under way. However, maintaining a shared understanding across time and team members, so that details are used consistently throughout the project, is not straightforward. Because details seem obvious in the short run, project teams often do not spend enough time planning and organizing data work, which makes poor data planning a common source of error. Fortunately, the solution, a data plan, is quick and easy to implement.

The DIME Analytics data plan template has three components: a data linkage table, one or several master datasets, and one or several data flow charts. The data linkage table lists all the datasets in your project and how they link to each other. For example, it describes how a student dataset can be merged with a school dataset, and which ID variable to use to do so. The data linkage table also includes meta-information, such as where the original version of each dataset is backed up. There should be only one data linkage table per project. See below for templates, examples, best practices and other details.
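The student and school example above can be sketched in code. This is only a minimal illustration in plain Python, not the DIME Analytics template itself: the dataset names, ID variables, column names, and backup locations are all hypothetical, and a real data linkage table would usually live in a spreadsheet rather than in a script.

```python
# Hypothetical data linkage table: one row per dataset in the project.
# Every name, ID variable, and backup location here is illustrative.
LINKAGE_TABLE = [
    {"dataset": "school_survey", "unit": "school", "id_var": "school_id",
     "links_to": None, "link_var": None,
     "backup": "encrypted_drive/raw/school"},
    {"dataset": "student_survey", "unit": "student", "id_var": "student_id",
     "links_to": "school_survey", "link_var": "school_id",
     "backup": "encrypted_drive/raw/student"},
]


def link_variable(table, source, target):
    """Return the ID variable recorded for merging `source` with `target`."""
    for row in table:
        if row["dataset"] == source and row["links_to"] == target:
            return row["link_var"]
    raise KeyError(f"no link recorded from {source} to {target}")


def merge_many_to_one(many_rows, one_rows, key):
    """Attach the matching record from `one_rows` to each row in `many_rows`."""
    lookup = {row[key]: row for row in one_rows}
    return [{**lookup[row[key]], **row} for row in many_rows]


# Toy data standing in for the two datasets.
schools = [
    {"school_id": 1, "district": "North"},
    {"school_id": 2, "district": "South"},
]
students = [
    {"student_id": 10, "school_id": 1},
    {"student_id": 11, "school_id": 2},
    {"student_id": 12, "school_id": 1},
]

# Look the merge variable up in the linkage table instead of hard-coding it.
key = link_variable(LINKAGE_TABLE, "student_survey", "school_survey")
merged = merge_many_to_one(students, schools, key)
```

Looking the merge variable up in the table, rather than typing it into each script, is one way to keep the information used consistently across team members.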

The master datasets are how you keep track of the units at each level of observation: each household if your unit of observation is households, each company if it is companies, and so on. Most importantly, the master dataset specifies the uniquely and fully identifying ID variable for each unit. The master dataset should also include variables related to the research design, such as sample and treatment assignment variables. The master dataset should be the authoritative source of all the information it includes. Many projects have multiple units of observation and require one master dataset for each unit of observation that is central to the research. See below for details on which units are central, and for other details and best practices for master datasets.
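A simple check on the ID variable of a master dataset might look like the sketch below. It reads "uniquely and fully identifying" as "no duplicate values and no missing values", which is an interpretation rather than a definition taken from the template. The household IDs and the variable names `hh_id`, `in_sample`, and `treatment` are hypothetical.

```python
def check_master_ids(rows, id_var):
    """Return a list of problems found in the ID variable of a master dataset."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        value = row.get(id_var)
        if value is None or value == "":
            # The ID does not fully identify the units if any unit lacks one.
            problems.append(f"row {i}: missing {id_var}")
        elif value in seen:
            # The ID does not uniquely identify the units if a value repeats.
            problems.append(f"row {i}: duplicate {id_var}={value}")
        else:
            seen.add(value)
    return problems


# Toy household master dataset with two deliberate errors.
master_households = [
    {"hh_id": "H001", "in_sample": 1, "treatment": 0},
    {"hh_id": "H002", "in_sample": 1, "treatment": 1},
    {"hh_id": "H002", "in_sample": 0, "treatment": 0},
    {"hh_id": "", "in_sample": 1, "treatment": 1},
]

problems = check_master_ids(master_households, "hh_id")
```

Running a check like this each time the master dataset is updated helps it remain the authoritative source for the ID variable.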

The final component in the DIME Analytics data plan template is the data flow charts. There should be one flow chart per analysis dataset in the project. Each data flow chart shows which datasets are needed to create the analysis dataset and how they are combined by appending or merging. All original datasets in a data flow chart should be listed in the data linkage table, and the information in the data flow chart, such as which variables to merge datasets on, should correspond to the information in the data linkage table. See below for more examples, best practices and other details related to data flow charts.
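The consistency requirement between a data flow chart and the data linkage table can also be checked mechanically. The sketch below is one possible representation, not part of the template. It assumes a flow chart stored as a list of append and merge steps, and makes the simplifying assumption that a merge uses the ID variable of the dataset being merged in. All dataset and variable names are hypothetical.

```python
# Hypothetical linkage table, reduced to the one field this check needs.
LINKAGE = {
    "student_baseline": {"id_var": "student_id"},
    "student_endline": {"id_var": "student_id"},
    "school_survey": {"id_var": "school_id"},
}

# Hypothetical flow chart for one analysis dataset.
FLOW_CHART = {
    "analysis_dataset": "student_panel",
    "steps": [
        {"action": "append", "inputs": ["student_baseline", "student_endline"]},
        {"action": "merge", "inputs": ["school_survey"], "on": "school_id"},
    ],
}


def check_flow_chart(chart, linkage):
    """Return a list of inconsistencies between a flow chart and the linkage table."""
    problems = []
    for step in chart["steps"]:
        for name in step["inputs"]:
            if name not in linkage:
                problems.append(f"{name} is not in the data linkage table")
            elif step["action"] == "merge" and step["on"] != linkage[name]["id_var"]:
                problems.append(
                    f"{name}: merged on {step['on']} but the linkage table "
                    f"records {linkage[name]['id_var']}"
                )
    return problems


issues = check_flow_chart(FLOW_CHART, LINKAGE)
```

An empty list means every dataset in the flow chart appears in the linkage table and every merge uses the recorded variable.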