Geo Spatial Data
Latest revision as of 19:36, 14 May 2019
Read First
The proliferation of remote sensing technology has created new opportunities to obtain high-resolution, affordable geospatial data, which have great potential as a source of data for impact evaluations.
Guidelines
Repositories of Spatial Data
The following repositories pull in spatial data from a variety of sources.
- Google Earth Engine (https://earthengine.google.com/datasets/): Stores petabytes of satellite imagery on Google's cloud.
- Socioeconomic Data and Applications Center (SEDAC) (http://sedac.ciesin.columbia.edu/): Provides links to a number of spatially referenced datasets.
- AidData geo.query (http://geoquery.org/): Allows users to extract data aggregated to administrative boundaries.
Satellite-Based Datasets
The following commonly used datasets consist of, or are derived from, satellite imagery.
Dataset | Spatial Resolution | Temporal Resolution | Description |
---|---|---|---|
Nighttime Lights: VIIRS | 300m | Monthly, 2012 to Present | Nighttime lights have increasingly been used as a proxy for local economic development. |
Nighttime Lights: DMSP-OLS | 750m | Annual, 1992-2013 | For studies that need a long time series of nighttime lights, DMSP-OLS data can be combined with VIIRS data. However, VIIRS has several improvements over DMSP-OLS (http://journals.sfu.ca/apan/index.php/apan/article/view/7/pdf_7), including higher spatial resolution and less light saturation in urban areas. |
Landsat | 30m | Every 16 days, 1972 to Present | Landsat images capture the earth across multiple spectral bands, including wavelengths invisible to the human eye. Different combinations of these bands emphasize different aspects of the earth's surface. One of the most common indices is the Normalized Difference Vegetation Index (NDVI), which provides a proxy for vegetation biomass. A list of common indices is available in the ArcGIS Pro indices gallery (http://pro.arcgis.com/en/pro-app/help/data/imagery/indices-gallery.htm). |
ESA Land Cover | 300m | Annual, 1992 to 2015 | Classifies land cover into one of 22 land cover types. |
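NDVI, mentioned in the Landsat row, is a normalized ratio of the near-infrared (NIR) and red bands: NDVI = (NIR - Red) / (NIR + Red). It ranges from -1 to 1, and dense green vegetation gives high values. The sketch below is a minimal pure-Python illustration. It assumes both bands have already been read into equal-length lists of reflectance values; in practice you would read the rasters with GIS software or a raster library. The pixel values are made up, and the band numbers in the comment apply to Landsat 8 OLI only.

```python
from typing import List


def ndvi(nir: List[float], red: List[float]) -> List[float]:
    """Compute NDVI = (NIR - Red) / (NIR + Red) pixel by pixel.

    Inputs are flat lists of reflectance values for the same pixels.
    Pixels where NIR + Red == 0 (e.g. no-data) are returned as NaN.
    """
    if len(nir) != len(red):
        raise ValueError("NIR and Red bands must have the same number of pixels")
    out = []
    for n, r in zip(nir, red):
        total = n + r
        out.append(float("nan") if total == 0 else (n - r) / total)
    return out


# Hypothetical reflectance values for four pixels (not real Landsat data).
# For Landsat 8 OLI, NIR is band 5 and Red is band 4.
nir_band = [0.45, 0.30, 0.10, 0.0]
red_band = [0.05, 0.20, 0.10, 0.0]
print(ndvi(nir_band, red_band))
```

The first pixel, with much higher NIR than red reflectance, gets the highest NDVI. The last pixel has no signal in either band and is flagged as NaN rather than raising a division error.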
Georeferenced Data Sources
- AidData: AidData has geocoded foreign aid projects from a number of different donors and countries, including all approved World Bank projects from 1995 to 2014, Chinese official finance from 2000 to 2014, and African Development Bank projects approved in 2009-2010.
- Afrobarometer: Afrobarometer has surveyed attitudes on democracy, governance and society across 36 countries in Africa in 6 survey rounds from 1999 to 2015. In partnership with AidData, Afrobarometer has recently geocoded the surveys.
- Demographic and Health Surveys (DHS): DHS surveys from USAID are georeferenced at the enumeration area level. However, DHS randomly displaces the geographic coordinates to protect respondent confidentiality.
- Living Standards Measurement Study (LSMS): Most LSMS datasets are geocoded at the enumeration area level.
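Because DHS coordinates are displaced, linking a cluster to features near its reported location can misattribute them. The DHS displacement documentation states that urban clusters are moved up to 2 km and rural clusters up to 5 km, with 1% of rural clusters moved up to 10 km. One approach is to summarize other spatial data within a buffer sized to that maximum displacement. The sketch below does this for point features using great-circle (haversine) distance. The names `Cluster` and `facilities_within_buffer`, the coordinates, and the clinic data are hypothetical, and a real analysis would typically use GIS software.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0

# Buffer radii set to the maximum DHS displacement for most clusters
# (assumption: 2 km urban, 5 km rural; 1% of rural clusters can move up to 10 km).
BUFFER_KM = {"urban": 2.0, "rural": 5.0}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Cluster:
    cluster_id: int
    lat: float
    lon: float
    residence: str  # "urban" or "rural"


def facilities_within_buffer(cluster: Cluster, facilities: List[Tuple[float, float]]) -> int:
    """Count facilities inside a buffer sized to the cluster's maximum displacement."""
    radius = BUFFER_KM[cluster.residence]
    return sum(
        1 for lat, lon in facilities
        if haversine_km(cluster.lat, cluster.lon, lat, lon) <= radius
    )


# Hypothetical clusters and clinic locations (illustrative coordinates only).
clusters = [Cluster(1, 0.0, 0.0, "urban"), Cluster(2, 0.0, 0.0, "rural")]
clinics = [(0.0, 0.01), (0.0, 0.03), (0.0, 0.1)]
for c in clusters:
    print(c.cluster_id, facilities_within_buffer(c, clinics))
```

Two clusters at the same reported location can get different counts because the rural buffer is wider, reflecting the larger uncertainty in where rural respondents actually are.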
Impact Evaluation with Geospatial Data
The emergence of georeferenced data has created opportunities to evaluate foreign investments at lower cost than traditional RCTs. These evaluations have been dubbed Geospatial Impact Evaluations (GIEs). A working paper from AidData (http://docs.aiddata.org/ad4/pdfs/wps44_a_primer_on_geospatial_impact_evaluation_methods_tools_and_applications.pdf) describes methods and applications for GIEs and reviews a number of studies that have conducted them. It also highlights two R packages that implement methods relevant to geospatial data: (1) geoMATCH (https://github.com/itpir/geoMatch), which performs matching while accounting for geographic spillover from treatment to control units, and (2) geoSIMEX (https://github.com/itpir/geoSIMEX), which allows users to account for spatial imprecision in their analysis.
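The idea of matching while accounting for spillover can be illustrated with a simplified sketch. Control units within a chosen spillover radius of any treated unit are excluded, and each treated unit is then matched to its nearest neighbor on a covariate. This is not geoMATCH's algorithm or API. The `Unit` type, the 10 km radius, the single covariate, and the planar (projected) coordinates are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    uid: str
    x_km: float       # projected easting in km (assumption: planar coordinates)
    y_km: float       # projected northing in km
    covariate: float  # e.g. baseline nighttime lights (hypothetical)
    treated: bool


def eligible_controls(units: List[Unit], spillover_km: float) -> List[Unit]:
    """Drop control units lying within `spillover_km` of any treated unit."""
    treated = [u for u in units if u.treated]
    return [
        c for c in units
        if not c.treated
        and all(math.hypot(c.x_km - t.x_km, c.y_km - t.y_km) > spillover_km for t in treated)
    ]


def match_nearest(units: List[Unit], spillover_km: float) -> Dict[str, str]:
    """1-to-1 nearest-neighbor matching (with replacement) on the covariate."""
    controls = eligible_controls(units, spillover_km)
    if not controls:
        raise ValueError("no control units outside the spillover radius")
    return {
        t.uid: min(controls, key=lambda c: abs(c.covariate - t.covariate)).uid
        for t in units if t.treated
    }


units = [
    Unit("T1", 0.0, 0.0, 5.0, True),
    Unit("C1", 1.0, 0.0, 5.1, False),   # most similar, but next door to T1
    Unit("C2", 30.0, 0.0, 5.5, False),
    Unit("C3", 60.0, 0.0, 9.0, False),
]
print(match_nearest(units, spillover_km=10.0))
```

Without the spillover rule, T1 would be matched to C1, its most similar but potentially contaminated neighbor. With a 10 km radius, C1 is excluded and T1 is matched to C2 instead.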
Using Intersections to Produce Usable Data for Stata
- For a project with a spatial component, you first need geographic polygons that represent your project, for instance, polygons showing the boundaries of your villages.
- For some themes, such as the locations of hospitals and clinics or rainfall, data are often available only in GIS format. These data come with coordinates that position each observation in space.
- After importing these data into GIS software, overlay them on the polygon layer that represents your project.
- Then intersect the two layers and compute a summary statistic for each polygon, such as a mean, a maximum, or a count. In QGIS, for instance, many of these spatial operations are available under Vector > Geoprocessing Tools.
- These tools also let you subtract polygons from one another (the Difference tool in QGIS). Intersecting the resulting polygons with your data layer afterwards is often very useful.
- Finally, export the newly generated data, for example as a CSV file that Stata can import.
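The intersect-and-summarize workflow can be sketched without GIS software for the simple case of point data (such as rainfall stations) and polygon boundaries. This minimal pure-Python version assumes simple polygons without holes and planar coordinates. It uses a ray-casting point-in-polygon test and writes the result as CSV for Stata. Village names, coordinates, and readings are hypothetical. For real projects, QGIS or a library such as geopandas is the more robust choice.

```python
import csv
import io
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray-casting test: True if pt lies inside the simple polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray to the right of pt cross edge (x1,y1)-(x2,y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize(villages: Dict[str, List[Point]],
              stations: List[Tuple[Point, float]]) -> List[dict]:
    """Count, average, and take the max of station values inside each polygon."""
    rows = []
    for name, poly in villages.items():
        values = [v for pt, v in stations if point_in_polygon(pt, poly)]
        rows.append({
            "village": name,
            "n_points": len(values),
            "mean_value": mean(values) if values else "",  # blank -> missing in Stata
            "max_value": max(values) if values else "",
        })
    return rows


# Hypothetical village polygons and rainfall readings (illustrative only).
villages = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
stations = [((2, 3), 100.0), ((8, 8), 120.0), ((15, 5), 80.0), ((25, 5), 60.0)]

rows = summarize(villages, stations)
buf = io.StringIO()  # use open("villages.csv", "w", newline="") to write a file
writer = csv.DictWriter(buf, fieldnames=["village", "n_points", "mean_value", "max_value"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

In Stata, the resulting file can then be loaded with `import delimited`.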
Data Interpolation
Interpolation generates estimates in between spatial measurements. In many settings you will not want to do this, since the granularity of your original data is itself valuable. For some themes, however, such as the level of a groundwater table or the level of ground contamination, interpolation is useful because the phenomenon itself (e.g., contamination) varies continuously in space.
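- The most important thing to know is that the interpolation method you choose has a large impact on your results.
- When in doubt (which is often the case), kriging is a sensible default. Kriging estimates the spatial correlation structure of your measurements from the data themselves and provides a prediction variance for each interpolated value. Without a nugget effect it is also an exact interpolator: the interpolated surface passes through the measurement points, so its error at each measurement point is zero. Exactness alone does not distinguish kriging, however, since simpler methods such as inverse distance weighting are exact as well.
- GIS software usually includes modules for interpolation, and often for dynamic modelling as well (how the state of your variables evolves in space over time).
- Interpolating your data produces a continuous surface, which is often displayed as a heat map.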
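The sketch below shows concretely how much the method matters. It uses inverse distance weighting (IDW), which is simpler than kriging and swapped in here because kriging requires fitting a variogram model. Like kriging without a nugget effect, IDW reproduces the measured value exactly at each measurement point. Its estimates elsewhere shift with a single tuning parameter (the power). The well locations and depths are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, measured value)


def idw(x: float, y: float, samples: List[Sample], power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y).

    Returns the measured value exactly when (x, y) coincides with a sample,
    so IDW is an exact interpolator.
    """
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical groundwater-table depths (m) at three wells.
wells = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
for p in (1.0, 2.0, 4.0):
    print(p, round(idw(2.0, 2.0, wells, power=p), 2))
```

At the same location, raising the power pulls the estimate toward the nearest well, so three defensible settings give three different answers. This is the sensitivity to method choice described above.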
Heat Maps
- When producing heat maps, note that mathematical tools exist that can enhance the spatial patterns in your data.
- These tools are transformations applied to your surface, such as first differences or Fourier transforms. They can sharpen the definition of your results and even reveal patterns you might have missed without them.
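A first difference is the simplest of these transformations to sketch. Subtracting neighboring cells of an interpolated grid highlights where the surface changes sharply, such as the edge of a hot spot. The pure-Python sketch below computes forward differences in both directions and combines them into a gradient magnitude. The grid values are hypothetical, and Fourier-based filtering (for example with `numpy.fft`) is not shown.

```python
from typing import List

Grid = List[List[float]]


def gradient_magnitude(grid: Grid) -> Grid:
    """Forward first differences in x and y, combined into a gradient magnitude.

    The output has one fewer row and column than the input. Large values mark
    places where the surface changes quickly (edges of hot spots).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows - 1):
        row = []
        for j in range(cols - 1):
            dx = grid[i][j + 1] - grid[i][j]  # change to the right
            dy = grid[i + 1][j] - grid[i][j]  # change downward
            row.append((dx ** 2 + dy ** 2) ** 0.5)
        out.append(row)
    return out


# Hypothetical interpolated surface with a plateau on the right-hand side.
surface = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
]
for row in gradient_magnitude(surface):
    print(row)
```

The flat areas on either side become zero, and only the boundary of the plateau stands out. This is how a difference filter can make a shape visible that is easy to overlook in the raw heat map.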
Examples of Papers
Many influential papers using these types of data have been published in journals, including:
- J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. Measuring Economic Growth from Outer Space. American Economic Review, 102(2): 994-1028.
- Dave Donaldson and Adam Storeygard. 2016. The View from Above: Applications of Satellite Data in Economics. Journal of Economic Perspectives, 30(4): 171-198.
Back to Parent
This article is part of the topic Secondary Data Sources.
Additional Resources
- QGIS documentation, which covers many topics in great detail: https://docs.qgis.org/2.2/en/docs/index.html
- DIME Analytics' Geospatial Data with spmap (https://github.com/worldbank/DIME-Resources/blob/master/stata-gis.pdf)