https://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&feed=atom&action=historyData Linkage Table - Revision history2024-03-28T23:30:38ZRevision history for this page on the wikiMediaWiki 1.37.2https://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=8531&oldid=prevLuizaa: /* Explanation */2022-11-01T23:19:34Z<p><span dir="auto"><span class="autocomment">Explanation</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 23:19, 1 November 2022</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l32">Line 32:</td>
<td colspan="2" class="diff-lineno">Line 32:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''one_to_many_id.''' Which (if any) other project ID variables does this project variable merge one-to-many to? For example, if this dataset had the unit of observations “school” and the project also has student datasets, then this column should include the student ID variable as each school merge to many students. Each ID listed here should have a master dataset.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''one_to_many_id.''' Which (if any) other project ID variables does this project variable merge one-to-many to? For example, if this dataset had the unit of observations “school” and the project also has student datasets, then this column should include the student ID variable as each school merge to many students. Each ID listed here should have a master dataset.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''many_to_one_id.''' Which (if any) other project ID variables does this project variable merge many-to-one to? For example, if this dataset had the unit of observations “student” and the project also has school datasets, then this column should include the school ID variable as many students merge to the same school. All ID variables listed here must also exist in the master dataset for the unit of observation of the dataset. So in the students/school examples, the student master dataset must include the school ID variable. Each ID listed here should have a master dataset.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''many_to_one_id.''' Which (if any) other project ID variables does this project variable merge many-to-one to? For example, if this dataset had the unit of observations “student” and the project also has school datasets, then this column should include the school ID variable as many students merge to the same school. All ID variables listed here must also exist in the master dataset for the unit of observation of the dataset. So in the students/school examples, the student master dataset must include the school ID variable. Each ID listed here should have a master dataset.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* '''file_location.''' This is the file path in the projects shared file system for where the dataset is stored. If this data is not stored in a shared file system and, for example, is pulled from the internet each time the code runs, then the URL should be listed here together with other information related how to access the data.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* '''file_location.''' This is the <ins style="font-weight: bold; text-decoration: none;">[[</ins>file path<ins style="font-weight: bold; text-decoration: none;">]] </ins>in the projects shared file system for where the dataset is stored. If this data is not stored in a shared file system and, for example, is pulled from the internet each time the code runs, then the URL should be listed here together with other information related how to access the data.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''raw_backup_location_1''' and '''raw_backup_location_2.''' Where are these data sets backed up? Be as detailed as possible. The one day you may need this information you will be very thankful for any information that you will have available. List things as storage type (hard drive, cloud etc.), full file path and name. Remember that these files need to be encrypted if they include identifying information (which is almost always the case with raw data), so you should save decryption instructions here as well. (Although you should save the decryption key in a more secure location.)</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''raw_backup_location_1''' and '''raw_backup_location_2.''' Where are these data sets backed up? Be as detailed as possible. The one day you may need this information you will be very thankful for any information that you will have available. List things as storage type (hard drive, cloud etc.), full file path and name. Remember that these files need to be encrypted if they include identifying information (which is almost always the case with raw data), so you should save decryption instructions here as well. (Although you should save the decryption key in a more secure location.)</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''notes.''' A column with notes to a specific dataset. If the same type of note appears for many datasets, consider adding a new column for that type of information</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* '''notes.''' A column with notes to a specific dataset. If the same type of note appears for many datasets, consider adding a new column for that type of information</div></td></tr>
</table>Luizaahttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=8132&oldid=prevAvnish95 at 14:24, 13 April 20212021-04-13T14:24:14Z<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:24, 13 April 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l38">Line 38:</td>
<td colspan="2" class="diff-lineno">Line 38:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Related Pages ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Related Pages ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Special:WhatLinksHere/Data_Linkage_Table|Click here to see pages that link to this topic]].</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Special:WhatLinksHere/Data_Linkage_Table|Click here to see pages that link to this topic]].</div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category: Reproducible Research]]</ins></div></td></tr>
</table>Avnish95https://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7864&oldid=prevAvnish95: /* Template */2020-12-14T17:04:38Z<p><span dir="auto"><span class="autocomment">Template</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 17:04, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l18">Line 18:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has created this [https://github.com/Avnish95/econ-101/raw/master/Template.xlsx downloadable template] to create a '''data linkage table'''. Given below is an example of a '''data linkage table''' that has been filled out.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has created this [https://github.com/Avnish95/econ-101/raw/master/Template.xlsx downloadable template] to create a '''data linkage table'''. Given below is an example of a '''data linkage table''' that has been filled out.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[File:<del style="font-weight: bold; text-decoration: none;">Data Linkage Table</del>.png|1000px|thumb|left|'''Fig. : A sample template for a data linkage table''']]</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[File:<ins style="font-weight: bold; text-decoration: none;">linkagetable</ins>.png|1000px|thumb|left|'''Fig. : A sample template for a data linkage table''']]</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<!-- diff cache key dimeprod_new:diff::1.12:old-7862:rev-7864 -->
</table>Avnish95https://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7862&oldid=prevKbjarkefur: /* Overview */2020-12-14T15:55:59Z<p><span dir="auto"><span class="autocomment">Overview</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:55, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l13">Line 13:</td>
<td colspan="2" class="diff-lineno">Line 13:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The '''data linkage table''' should not be a static document after it has once been created. It should reflect the research teams best understanding at any given time of the datasets that will be used in the project. If the team learns something new about any of the data sources, or realizes that halfway through the project that a another data source is needed, then the data linkage table should be updated. The overview the table gives the team also makes it easy to spot if this new development has any consequence for any other dataset in the table and then that can promptly be addressed.</ins></div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td></tr>
</table>Kbjarkefurhttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7861&oldid=prevKbjarkefur: /* Overview */2020-12-14T15:48:32Z<p><span dir="auto"><span class="autocomment">Overview</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:48, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8">Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Overview ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Overview ==</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using [[Secondary Data Sources|secondary data]]. <del style="font-weight: bold; text-decoration: none;">For example, there might be two datasets with the same units - such as firms, health workers, or agricultural plots. However, there may not be a straightforward way to </del>'''<del style="font-weight: bold; text-decoration: none;">merge''' or '''append</del>''' the <del style="font-weight: bold; text-decoration: none;">datasets</del>. <del style="font-weight: bold; text-decoration: none;">In such cases, </del>the <del style="font-weight: bold; text-decoration: none;">'''research </del>team<del style="font-weight: bold; text-decoration: none;">''' might </del>have <del style="font-weight: bold; text-decoration: none;">to try performing a '''fuzzy''' match on string </del>variables, for <del style="font-weight: bold; text-decoration: none;">example</del>. <del style="font-weight: bold; text-decoration: none;">However, this can often be a time-consuming and error-prone process</del>, and <del style="font-weight: bold; text-decoration: none;">certainly cannot be scaled </del>up <del style="font-weight: bold; text-decoration: none;">when a large number of datasets are involved</del>.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using <ins style="font-weight: bold; text-decoration: none;">multiple </ins>[[Secondary Data Sources|secondary data]] <ins style="font-weight: bold; text-decoration: none;">sources</ins>. </div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The </ins>'''<ins style="font-weight: bold; text-decoration: none;">data linkage table</ins>''' <ins style="font-weight: bold; text-decoration: none;">allows the team to plan out the "data ecosystem" of </ins>the <ins style="font-weight: bold; text-decoration: none;">project before data has even started to be acquired</ins>. <ins style="font-weight: bold; text-decoration: none;">This helps </ins>the team <ins style="font-weight: bold; text-decoration: none;">make sure that the data sets that will be acquired </ins>have <ins style="font-weight: bold; text-decoration: none;">the ID </ins>variables <ins style="font-weight: bold; text-decoration: none;">needed combine multiple dataset</ins>, <ins style="font-weight: bold; text-decoration: none;">minimizing the risk that two data sets at all cannot be combined or cannot be combined accurately. It also makes the team agree on names </ins>for <ins style="font-weight: bold; text-decoration: none;">each dataset so two similar datasets are not confused when discussed by the team</ins>. <ins style="font-weight: bold; text-decoration: none;">Furthermore</ins>, <ins style="font-weight: bold; text-decoration: none;">it also documents important meta information such as where the raw data is stored in the project folder </ins>and <ins style="font-weight: bold; text-decoration: none;">where it is backed </ins>up.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td></tr>
<!-- diff cache key dimeprod_new:diff::1.12:old-7860:rev-7861 -->
</table>Kbjarkefurhttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7860&oldid=prevKbjarkefur: /* Read First */2020-12-14T15:16:45Z<p><span dir="auto"><span class="autocomment">Read First</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:16, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l2">Line 2:</td>
<td colspan="2" class="diff-lineno">Line 2:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Read First ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Read First ==</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize 3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize <ins style="font-weight: bold; text-decoration: none;">and document the </ins>3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A [[Data Linkage Table|data linkage table]] is one of three components in a [[Data Map|data map]]. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A [[Data Linkage Table|data linkage table]] is one of three components in a [[Data Map|data map]]. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' is the best way to communicate to current and future team members what data will be used and how it may be linked.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' is the best way to communicate to current and future team members what data will be used and how it may be linked.</div></td></tr>
</table>Kbjarkefurhttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7859&oldid=prevKbjarkefur: /* Read First */2020-12-14T15:16:08Z<p><span dir="auto"><span class="autocomment">Read First</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:16, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3">Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Read First ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Read First ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize 3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize 3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* <del style="font-weight: bold; text-decoration: none;">The '''data map template''' consists of three components: a </del>[[Data Linkage Table|data linkage table]]<del style="font-weight: bold; text-decoration: none;">, </del>a <del style="font-weight: bold; text-decoration: none;">[[Master Dataset|master dataset]], and </del>[[Data <del style="font-weight: bold; text-decoration: none;">Flow Charts</del>|data <del style="font-weight: bold; text-decoration: none;">flow charts</del>]]. </div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* <ins style="font-weight: bold; text-decoration: none;">A </ins>[[Data Linkage Table|data linkage table]] <ins style="font-weight: bold; text-decoration: none;">is one of three components in </ins>a [[Data <ins style="font-weight: bold; text-decoration: none;">Map</ins>|data <ins style="font-weight: bold; text-decoration: none;">map</ins>]]. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' is the best way to communicate to current and future team members what data will be used and how it may be linked.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' is the best way to communicate to current and future team members what data will be used and how it may be linked.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' helps resolve errors in linking large and complex datasets.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' helps resolve errors in linking large and complex datasets.</div></td></tr>
</table>Kbjarkefurhttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7858&oldid=prevKbjarkefur: /* Read First */2020-12-14T15:14:27Z<p><span dir="auto"><span class="autocomment">Read First</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:14, 14 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l4">Line 4:</td>
<td colspan="2" class="diff-lineno">Line 4:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize 3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A '''data map''' is a template designed by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] to organize 3 main aspects of [[Research Data Work|data work]]: [[Data Analysis|data analysis]], [[Data Cleaning|data cleaning]], and [[Data Management|data management]].</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data map template''' consists of three components: a [[Data Linkage Table|data linkage table]], a [[Master Dataset|master dataset]], and [[Data Flow Charts|data flow charts]]. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data map template''' consists of three components: a [[Data Linkage Table|data linkage table]], a [[Master Dataset|master dataset]], and [[Data Flow Charts|data flow charts]]. </div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* The '''data linkage table''' is the best way to communicate to current and future team members what data will be used and how it may be linked.</ins></div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' helps resolve errors in linking large and complex datasets.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* The '''data linkage table''' helps resolve errors in linking large and complex datasets.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
</table>Kbjarkefurhttps://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7846&oldid=prevAvnish95: /* Overview */2020-12-11T22:05:14Z<p><span dir="auto"><span class="autocomment">Overview</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 22:05, 11 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l9">Line 9:</td>
<td colspan="2" class="diff-lineno">Line 9:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using [[Secondary Data Sources|secondary data]]. For example, there might be two datasets with the same units - such as firms, health workers, or agricultural plots. However, there may not be a straightforward way to '''merge''' or '''append''' the datasets. In such cases, the '''research team''' might have to try performing a '''fuzzy''' match on string variables, for example. However, this can often be a time-consuming and error-prone process, and certainly cannot be scaled up when a large number of datasets are involved.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using [[Secondary Data Sources|secondary data]]. For example, there might be two datasets with the same units - such as firms, health workers, or agricultural plots. However, there may not be a straightforward way to '''merge''' or '''append''' the datasets. In such cases, the '''research team''' might have to try performing a '''fuzzy''' match on string variables, for example. However, this can often be a time-consuming and error-prone process, and certainly cannot be scaled up when a large number of datasets are involved.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">'''Note''': </del>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td></tr>
</table>Avnish95https://dimewiki.worldbank.org/index.php?title=Data_Linkage_Table&diff=7845&oldid=prevAvnish95: /* Overview */2020-12-11T22:04:59Z<p><span dir="auto"><span class="autocomment">Overview</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 22:04, 11 December 2020</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l9">Line 9:</td>
<td colspan="2" class="diff-lineno">Line 9:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using [[Secondary Data Sources|secondary data]]. For example, there might be two datasets with the same units - such as firms, health workers, or agricultural plots. However, there may not be a straightforward way to '''merge''' or '''append''' the datasets. In such cases, the '''research team''' might have to try performing a '''fuzzy''' match on string variables, for example. However, this can often be a time-consuming and error-prone process, and certainly cannot be scaled up when a large number of datasets are involved.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The purpose of a '''data linkage table''' is to allow the [[Impact Evaluation Team|research team]] to accurately and [[Reproducible Research|reproducibly]] link all datasets associated with the project. Errors in linking datasets are fairly common in development research, particularly when there are several rounds of [[Primary Data Collection|data collection]] involved, or while using [[Secondary Data Sources|secondary data]]. For example, there might be two datasets with the same units - such as firms, health workers, or agricultural plots. However, there may not be a straightforward way to '''merge''' or '''append''' the datasets. In such cases, the '''research team''' might have to try performing a '''fuzzy''' match on string variables, for example. However, this can often be a time-consuming and error-prone process, and certainly cannot be scaled up when a large number of datasets are involved.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">==== </del>Note <del style="font-weight: bold; text-decoration: none;">====</del></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">'''</ins>Note<ins style="font-weight: bold; text-decoration: none;">''': </ins>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>It is also important to note that the '''data linkage table''' should only list the original datasets, and not any datasets that are derived from them. For example, if the research team collects [[Primary Data Collection|primary data]], the data linkage table should only include the raw data, and not the [[Data Cleaning|cleaned version]] of the data. Similarly, in the case of [[Administrative Data|administrative]] or [[Data Acquisition|data acquired]] through '''web-scraping''', the table should not include combinations or reshaped versions of those datasets. As a '''best practice''', the research team should simultaneously [[Data Documentation|document the code]] which creates derivatives of the datasets that are listed in the data linkage table. This allows the research team to trace the derived datasets back to the original dataset.</div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Template ==</div></td></tr>
</table>Avnish95