Data visualization
Data visualization is a method of expressing descriptive statistics and [[Data Analysis | analytical]] results through visual representations of the data (e.g., charts and [[Checklist: Reviewing Graphs | graphs]]). It can also be a useful tool for the research team during exploratory analysis to better understand the data. During or after data analysis, the research team may use data visualizations to present results to a broader audience. Choosing the right format for data visualization is critical: good visualizations can often communicate results and persuade audiences more effectively than text. This page provides resources on which type of data visualization to use, which colors to use, and which features to include.


== Read First ==
* [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has prepared a [[Checklist:_Reviewing_Graphs|checklist on reviewing graphs]] before [[Dissemination|dissemination]].
*Use color strategically to differentiate groups or highlight trends; remember that color choices matter and can influence how effectively the visualization communicates information.
*The title, annotation, and citation of a visualization depend on where the visualization is used.
==Deciding on a Data Visualization==
The best format for data visualization depends on the type of data, the results you wish to display, and the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. To help decide which data visualization to use, [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree, while the [http://www.visual-literacy.org/periodic_table/periodic_table.html Periodic Table of Visualization] provides a catalogue of data visualization types with visual examples. Further, Gapminder’s [https://www.gapminder.org/tools/#$chart-type=bubbles interactive visualization tools] provide beautiful examples of effective visualizations.
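As a rough illustration of how a decision tree like Data to Viz narrows the options, the Python sketch below maps a simplified description of the data (how many numeric and categorical variables, and whether the data are ordered in time) to candidate chart types. The function name `suggest_charts` and its rules are hypothetical and much coarser than the actual Data to Viz tree; treat the output as a starting point for discussion, not a recommendation.

```python
def suggest_charts(n_numeric: int, n_categorical: int,
                   ordered_in_time: bool = False) -> list[str]:
    """Suggest candidate chart types for a simplified description of the data.

    Hypothetical, deliberately coarse rules; this is NOT the Data to Viz tree.
    """
    if n_numeric < 0 or n_categorical < 0:
        raise ValueError("variable counts must be non-negative")
    # Numeric values observed over time are usually read as trends.
    if ordered_in_time and n_numeric >= 1:
        return ["line chart", "area chart"]
    if (n_numeric, n_categorical) == (1, 0):
        # Distribution of a single numeric variable.
        return ["histogram", "density plot", "box plot"]
    if (n_numeric, n_categorical) == (0, 1):
        # Counts per category.
        return ["bar chart", "lollipop chart"]
    if (n_numeric, n_categorical) == (1, 1):
        # One numeric variable compared across groups (e.g., treatment vs. control).
        return ["box plot by group", "violin plot", "bar chart of group means"]
    if (n_numeric, n_categorical) == (2, 0):
        # Relationship between two numeric variables.
        return ["scatter plot", "2D density plot"]
    if n_numeric == 3:
        # A third numeric variable can be encoded as marker size,
        # as in Gapminder's bubble charts.
        return ["bubble chart"]
    return ["no simple rule here; consult a full decision tree"]
```

For example, `suggest_charts(n_numeric=1, n_categorical=1)` returns the group-comparison options, which is the typical case when comparing an outcome between treatment and control.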
== Colors ==
Colors not only set the mood for your visualization but can also draw attention to certain features of it. Digital interfaces let us visualize quantitative data more dynamically, and in that setting colors can encode a great deal of information if we use them intelligently. For a general guide about colors, see [https://blog.datawrapper.de/colors/ What to Consider When Choosing Colors for Data Visualization] and [https://blog.datawrapper.de/colorguide/ Your Friendly Guide to Colors in Data Visualization]. In general, follow these tips when deciding whether and how to use colors in visualizations:
* Only add colors if they add useful information by, for example, differentiating groups or highlighting information.
* For numeric variables, color can show relative differences, but it hides absolute values.
* If many different colors are necessary to display what you want, consider using a different way to display information: adding too many colors will make the graph difficult to read.
* Use colors consistently: for example, if you use one color for treatment and another for control, use the same two colors in all graphs in the same document. This saves readers both time and confusion.
* Don't forget to add legends indicating what the colors mean.
* Pay attention to [https://snook.ca/technical/colour_contrast/colour.html#fg=BDBDBD,bg=E7E7E7/ color contrast]: enough contrast is what makes colors easy to tell apart.
* Use intuitive colors to save readers time: good is green, blue is water, and darker shades represent higher values than lighter shades.
* Use sequential or diverging color scales to represent numeric variables.
* Use distinctive colors to represent categorical variables.
* Account for color blindness and for how the colors will look when converted to grayscale (for example, when printed in black and white).
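The contrast and grayscale tips above can be checked numerically. The Python sketch below computes the WCAG 2.x contrast ratio between two hex colors (from 1:1 for identical colors up to 21:1 for black on white) and flags palette pairs whose approximate gray levels are close, using Rec. 601 luma weights as a simple model of black-and-white printing. The function names and the `min_gap=40` threshold are illustrative assumptions. The grayscale check does not simulate color blindness; a dedicated color-vision-deficiency simulator is better suited for that.

```python
def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert '#BDBDBD' or 'BDBDBD' to an (R, G, B) tuple of 0-255 integers."""
    h = hex_code.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def relative_luminance(hex_code: str) -> float:
    """WCAG 2.x relative luminance: linearize each sRGB channel, then weight."""
    def linearize(channel: int) -> float:
        s = channel / 255
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in hex_to_rgb(hex_code))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(hex_a: str, hex_b: str) -> float:
    """WCAG contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21."""
    la, lb = relative_luminance(hex_a), relative_luminance(hex_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def gray_level(hex_code: str) -> float:
    """Approximate gray level (0-255) after grayscale conversion (Rec. 601 luma)."""
    r, g, b = hex_to_rgb(hex_code)
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_collisions(palette: list[str], min_gap: float = 40.0) -> list[tuple[str, str]]:
    """Pairs of palette colors whose gray levels differ by less than min_gap.

    min_gap is a hypothetical threshold; tune it for your output medium.
    """
    grays = {color: gray_level(color) for color in palette}
    pairs = []
    for i, a in enumerate(palette):
        for b in palette[i + 1:]:
            if abs(grays[a] - grays[b]) < min_gap:
                pairs.append((a, b))
    return pairs
```

For the two grays used in the contrast link above (BDBDBD and E7E7E7), `contrast_ratio` returns roughly 1.5:1, a low contrast. A pure red (#FF0000) and a mid green (#008000) map to almost the same gray level, so `gray_collisions` flags them even though they look very different on screen.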
If you are unsure which colors to use, ready-made color palettes can be a handy option. The [https://color.adobe.com/create/color-wheel/ color wheel] lets you create color palettes and gives the hex code of each color (a code that identifies the exact color, so you can easily reuse it). This [http://tools.medialab.sciences-po.fr/iwanthue/index.php free color generator] even includes code for generating color palettes in Java.
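Once you have picked endpoint colors (for example, hex codes from a palette tool), a sequential or diverging scale can be built by interpolating between them. The Python sketch below does this with straight-line interpolation in RGB, which is a simplification: perceptually uniform color spaces such as CIELAB usually give more evenly spaced steps. It also shows one way to keep colors consistent across graphs by defining a single group-to-color mapping. All function names, the `GROUP_COLORS` dictionary, and its hex values are hypothetical examples.

```python
def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' or 'RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    h = hex_code.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """Convert an (R, G, B) tuple back to an uppercase '#RRGGBB' hex code."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def interpolate(hex_a: str, hex_b: str, t: float) -> str:
    """Color a fraction t (0 to 1) of the way from hex_a to hex_b, in RGB."""
    start, end = hex_to_rgb(hex_a), hex_to_rgb(hex_b)
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return rgb_to_hex((r, g, b))


def sequential_palette(light: str, dark: str, n: int) -> list[str]:
    """n colors from light (low values) to dark (high values)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [interpolate(light, dark, i / (n - 1)) for i in range(n)]


def diverging_palette(low: str, mid: str, high: str, n: int) -> list[str]:
    """n colors (n odd) running from low through a neutral midpoint to high."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number >= 3")
    half = n // 2
    left = sequential_palette(low, mid, half + 1)    # low -> mid
    right = sequential_palette(mid, high, half + 1)  # mid -> high
    return left + right[1:]  # drop the duplicated midpoint


# Hypothetical mapping defined once and reused for every graph in a document,
# so treatment and control always get the same colors.
GROUP_COLORS = {"Treatment": "#1B9E77", "Control": "#7570B3"}
```

For example, `diverging_palette("#0000FF", "#FFFFFF", "#FF0000", 5)` runs from blue through white to red, which suits a variable with a meaningful midpoint such as a change from baseline.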


==Features==
===Titles and Annotations===
Data visualizations should be intuitive: audiences should grasp what the data aims to convey within the first 20 seconds of seeing a visualization. Thus, it is important to adjust some features of a visualization to where it will be used. Usually, a visualization in an academic report or research paper does not need detailed titles or annotations, because the paper itself explains the visualization. Here is an [https://medium.com/@yanhann10/the-state-of-data-visualization-6de999f9e386/ example] of what a visualization looks like in an academic report.


However, if a visualization is used on a website or in a presentation without supporting materials, it should include detailed titles and annotations. In other words, it should guide the audience to the key takeaway points, at minimum through an annotative lead-in sentence. Here is an [http://www.journalism.org/2014/10/21/political-polarization-media-habits/pj_14-10-21_mediapolarization-08/ example] of a standalone visualization with detailed titles and annotations.


===Citations===
Data, like literature, should be cited. However, as with titles and annotations, the data source citation is not necessarily included in the visualization itself. If you use the visualization in an academic report or research paper, you will cite the data source in the references, so you do not need to repeat that citation in the visualization. Here is a [https://libguides.mit.edu/c.php?g=176032&p=1159520/ guide to data citation].


If a visualization is used in a presentation or on a website and you are NOT using primary data, it is important to include the citation in the visualization. Usually, this is a footnote in the bottom-right or bottom-left corner of the visualization in the format "Source: Data Name and Date". Here is [https://www.washingtonpost.com/news/politics/wp/2018/08/09/new-data-makes-it-clear-nonvoters-handed-trump-the-presidency/?noredirect=on&utm_term=.edcfc37c0eda/ an example] of a visualization with a data source citation.


Lastly, if you are using primary data, then there is no need to cite the data source.
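The rules on titles, annotations, and citations can be summarized as a small decision function. The Python sketch below is one hypothetical way to encode them; the names `FeaturePlan` and `plan_features`, the two medium labels, and the exact footnote wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeaturePlan:
    detailed_title: bool        # descriptive title stating the takeaway
    lead_in_annotation: bool    # at least one annotative lead-in sentence
    source_note: Optional[str]  # footnote text for a bottom corner, or None


def plan_features(medium: str, primary_data: bool,
                  data_name: str = "", date: str = "") -> FeaturePlan:
    """Decide which features a visualization needs.

    medium is 'paper' (academic report or research paper) or
    'standalone' (website or presentation without supporting materials).
    """
    if medium == "paper":
        # The paper text explains the figure and its reference list cites the data.
        return FeaturePlan(detailed_title=False, lead_in_annotation=False,
                           source_note=None)
    if medium == "standalone":
        if primary_data:
            # Primary data: no need to cite a data source.
            note = None
        else:
            if not data_name:
                raise ValueError("data_name is required to cite secondary data")
            note = f"Source: {data_name}, {date}"
        return FeaturePlan(detailed_title=True, lead_in_annotation=True,
                           source_note=note)
    raise ValueError("medium must be 'paper' or 'standalone'")
```

For instance, a chart of secondary data shown in a presentation gets a detailed title, a lead-in sentence, and a footnote such as `Source: Example Survey, 2020` (a placeholder data name).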


== Data Visualization in R ==
R has many options for data visualization. Here are some useful packages:


* [https://ggplot2.tidyverse.org/ ggplot2]: this is the go-to package for static plots. Here is a list of [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html 50 ggplot2 visualizations with full R code].
* [https://plot.ly/r/ plotly]: creates interactive graphs and is integrated with ggplot2.
* [https://gganimate.com/ gganimate]: allows users to create animated GIFs from ggplot plots.
* [http://jkunst.com/highcharter/ Highcharter]: this package is a wrapper for the Highcharts JavaScript library.
* [https://rstudio.github.io/leaflet/ Leaflet]: an R wrapper for Leaflet, one of the most popular open-source JavaScript libraries for interactive maps.
* [https://rstudio.github.io/r2d3/ R2D3]: a wrapper for the D3 JavaScript library that can be used to create animated graphs.


== Back to Parent ==
This article is part of the topic [[Data Analysis]]
== Additional Resources ==
* DIME Analytics’ [https://worldbank.github.io/Stata-IE-Visual-Library Stata Visual Library for Impact Evaluation] shows examples of graphs and provides the codes used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library here].
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/r-5-visualization.pdf Data Visualization in R]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/stata2-7-visualization.pdf Data Visualization in Stata]
* Harvard Business Review article on [https://hbr.org/2016/06/visualizations-that-really-work/ Visualizations that Really Work]
* Stata cheat sheets on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Plots_2016_June-REV.pdf data visualization] and [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Syntax_2016_June-REV.pdf customizing data visualization] are useful reminders of relevant Stata code.
 
* [https://research.tableau.com/sites/default/files/p110-mackinlay.pdf Automating the Design of Graphical Presentations of Relational Information] introduces how to achieve effective graphical designs and make intelligent use of human visual abilities.


[[Category: Data Analysis]]

Latest revision as of 18:45, 1 July 2021
