Difference between revisions of "Data visualization"

Line 1: Line 1:
Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. This page discusses general principles for data visualization.
Data visualization is a method of expressing descriptive statistics and [[Data Analysis | analytical]] results through visual representations of the data (e.g., charts and [[Checklist: Reviewing Graphs | graphs]]). It can also be a useful tool for the research team during exploratory analysis to better understand the data. During or after data analysis, the research team may use data visualizations to present results to a broader audience. Choosing the right format for data visualization is critical: good visualizations can often communicate results and persuade audiences more effectively than text. This page provides resources on which data visualization to use, which colors to use, and which features to include.


== Read First ==
* The DIME Analytics team has created a [https://worldbank.github.io/Stata-IE-Visual-Library Stata Visual Library for Impact Evaluation] which shows examples of graphs and provides the codes used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library on our github].


* Specific code for data visualization is available on the software-specific tools (e.g. [[iegraph]]).  
*Use color strategically to differentiate groups or highlight trends; remember that color choices matter and can influence how effectively the visualization communicates information.
*The title, annotations, and citation of a visualization depend on where the visualization is used.


== Guidelines ==
==Deciding on a Data Visualization==
The best format for data visualization depends on the type of data, the results you wish to display, and the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. To help decide which data visualization to use [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree, while The [http://www.visual-literacy.org/periodic_table/periodic_table.html Periodic Table of Visualization] provides a catalogue of all data visualization types with visual examples. Further, Gapminder’s  [https://www.gapminder.org/tools/#$chart-type=bubbles|interactive visualization tools] provide beautiful examples of effective visualizations.


== Colors ==
Colors not only set the mood for your visualization, but can also draw attention to certain features of your visualization. Thanks to digital interfaces, we can visualize quantitative data in a more dynamic way. In this case, colors can encode a great deal of information if we make intelligent use of them. For a general guide about colors, see [https://blog.datawrapper.de/colors/ What to Consider When Choosing Colors for Data Visualization] and [https://blog.datawrapper.de/colorguide/ Your Friendly Guide to Colors in Data Visualization]. In general, follow these tips when deciding whether to use colors in visualizations:


=== What type of data visualization should I use? ===
* Only add colors if they add useful information by, for example, differentiating groups or highlighting information.
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles.
* For numeric variables, color can show differences, but it hides absolute values.
 
* If many different colors are necessary to display what you want, consider using a different way to display information: adding too many colors will make the graph difficult to read.
* [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree.
* Be consistent in your use of color: for example, if you are using two colors, one for treatment and one for control, use the same colors for all graphs in the same document. This will save readers both time and confusion.
 
* Don't forget to add legends indicating what the colors mean.
* The [http://www.visual-literacy.org/periodic_table/periodic_table.html Periodic Table of Visualization] provides a catalogue of all data visualization types with visual examples.
* Consider that [https://snook.ca/technical/colour_contrast/colour.html#fg=BDBDBD,bg=E7E7E7/ color contrast] is important: it will make the difference when telling colors apart.
 
* Using intuitive colors saves time: good is green, blue is water, darker shades are higher values than lighter shades.
* Gapminder.org  [https://www.gapminder.org/tools/#$chart-type=bubbles|interactive visualization tools] provide beautiful examples of effective visualizations.
*Use sequential or diverging color scales to represent numeric variables.
 
* [https://research.tableau.com/sites/default/files/p110-mackinlay.pdf Automating the Design of Graphical Presentations of Relational Information] introduces how to achieve effective graphical designs and make intelligent use of human visual abilities.
 
=== Fun with colors ===
Thanks to digital interfaces, we can visualize quantitative data in a more dynamic way. In this case, colors can encode much information if we make intelligent use of them. Colors not only can set the mood for your visualization but also can draw attention to certain features of your visualization. This is a general guide about [https://blog.datawrapper.de/colors/ What to Consider When Choosing Colors for Data Visualization].
 
General Guides for Colors in Visualizations:
 
* Only add colors if they are adding useful information.
* Ways you can use color to add information: a) Differentiate groups b) Highlight information
* For numeric variables, color can be useful to show differences, but they hide absolute values.
* If many different colors are necessary to display what you want, then you should consider using a different way to display information, as adding too many colors will make the graph difficult to read.
* Be consistent with your use of colors: for example, if you are using two colors, one for treatment and one for control, use the same colors for all graphs in the same document. This will save time for readers, and doing the opposite can be very confusing.
* Don't forget to add legends indicating what colors mean.
 
Tips for Picking Colors:
* [https://snook.ca/technical/colour_contrast/colour.html#fg=BDBDBD,bg=E7E7E7/ Color Contrast] is important: it will make the difference when telling colors apart.
* Using intuitive colors will save time: good is green, blue is water, darker shades are higher values than lighter shades.
* Use sequential or diverging colors scales to represent numeric variables.
* Use distinctive colors to represent categorical variables.
* Take color blindness and transition to gray scale into account.


If you are uncertain which colors to use to make your visualization fancier, then color palettes can be a handy option. This is the [https://color.adobe.com/create/color-wheel/ Color Wheel]where you can create color palettes, with specific color hex (like a color code so that you can easily find it). And here is a [http://tools.medialab.sciences-po.fr/iwanthue/index.php free color generator], which even includes codes for generating color palettes in Java.
If you are uncertain which colors to use, a ready-made color palette can be a handy option. This [https://color.adobe.com/create/color-wheel/ color wheel] lets you create color palettes and gives the hex code of each color so that you can easily reuse it. This [http://tools.medialab.sciences-po.fr/iwanthue/index.php free color generator] even includes code for generating color palettes in Java.
 
=== Adapting visualizations to different contexts ===


==Features==


* When To Use Titles/Annotations?
===Titles and Annotations===
Data visualizations should be intuitive, which means that audiences can grasp whatever the data try to convey in the first 20 seconds when they see a visualization. Thus, it is important to adjust some details of visualizations based on where they will be used. Usually, if a visualization is used in an academic report or research paper, we may not need to add detailed titles or annotations, because there must be explanations for the graph in the paper. Here is an [https://medium.com/@yanhann10/the-state-of-data-visualization-6de999f9e386/ example] of what a visualization looks like in an academic report.  
Data visualizations should be intuitive: audiences should grasp whatever the data aims to convey in the first 20 seconds when they see a visualization. Thus, it is important to adjust some features of a visualization based on where it will be used. Usually, if a visualization is used in an academic report or research paper, it does not need detailed titles or annotations because the paper itself will include explanations for the visualization. Here is an [https://medium.com/@yanhann10/the-state-of-data-visualization-6de999f9e386/ example] of what a visualization looks like in an academic report.  


However, if a visualization is used in a website or presentation without supporting materials, then we should include detailed titles and annotations. In other words, we should help the audience on the key takeaway points by including at least an annotative lead-in sentence. Here is an [http://www.journalism.org/2014/10/21/political-polarization-media-habits/pj_14-10-21_mediapolarization-08/ example] of an independent visualization with detailed titles and annotations.
However, if a visualization is used in a website or presentation without supporting materials, then it should include detailed titles and annotations. In other words, it should help the audience with the key takeaway points by offering at least an annotative lead-in sentence. Here is an [http://www.journalism.org/2014/10/21/political-polarization-media-habits/pj_14-10-21_mediapolarization-08/ example] of an independent visualization with detailed titles and annotations.


* Don’t Forget Citations
===Citations===
Data, like literatures, should be cited too. However, like titles and annotations, data source citation will not necessarily be included in the visualization. If you use the visualization in an academic report or research paper, you will cite data source in the reference page, so that you no longer need to include that citation again in the visualization. Here is a [https://libguides.mit.edu/c.php?g=176032&p=1159520/ guide of data citation].  
Data, like literature, should be cited. However, like titles and annotations, data source citation will not necessarily be included in the visualization. If you use the visualization in an academic report or research paper, you will cite data source in the reference page, so that you no longer need to include that citation again in the visualization. Here is a [https://libguides.mit.edu/c.php?g=176032&p=1159520/ guide of data citation].  


If a visualization is used in presentation or website and you are NOT using primary data, it is important to include the citation in your visualization. Usually, we will add a footnote at right bottom or left bottom corner of the visualization in this format - Source: Data Name and Date. Here is [https://www.washingtonpost.com/news/politics/wp/2018/08/09/new-data-makes-it-clear-nonvoters-handed-trump-the-presidency/?noredirect=on&utm_term=.edcfc37c0eda/ an example] of visualization with a data source citation in it.  
If a visualization is used in a presentation or website and you are NOT using primary data, it is important to include the citation in your visualization. Usually, we will add a footnote at right bottom or left bottom corner of the visualization in this format - Source: Data Name and Date. Here is [https://www.washingtonpost.com/news/politics/wp/2018/08/09/new-data-makes-it-clear-nonvoters-handed-trump-the-presidency/?noredirect=on&utm_term=.edcfc37c0eda/ an example] of visualization with a data source citation in it.  


Lastly, if you are using primary data, then there is no need to cite the data source.
Line 68: Line 51:
== Back to Parent ==
This article is part of the topic [[Data Analysis]]
== Additional Resources ==
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/stata2-7-visualization.pdf Data Visualization]
* DIME Analytics’ [https://worldbank.github.io/Stata-IE-Visual-Library Stata Visual Library for Impact Evaluation] shows examples of graphs and provides the codes used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library here].
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/r-5-visualization.pdf Data Visualization in R]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/stata2-7-visualization.pdf Data Visualization in Stata]
* Harvard Business Review Article on [https://hbr.org/2016/06/visualizations-that-really-work/| Visualizations that Really Work]
* Stata Cheat sheets on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Plots_2016_June-REV.pdf Data visualization] and [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Syntax_2016_June-REV.pdf customizing data visualization] are useful reminders of relevant stata code.  
* Stata Cheat sheets on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Plots_2016_June-REV.pdf data visualization] and [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Syntax_2016_June-REV.pdf customizing data visualization] are useful reminders of relevant Stata code.  
* General Rules about Colors in Data Visualization: [https://blog.datawrapper.de/colorguide/ Your Friendly Guide to Colors in Data Visualization].
* [https://research.tableau.com/sites/default/files/p110-mackinlay.pdf Automating the Design of Graphical Presentations of Relational Information] introduces how to achieve effective graphical designs and make intelligent use of human visual abilities.
 


[[Category: Data Analysis]]

Revision as of 23:31, 4 June 2019

Data visualization is a method of expressing descriptive statistics and analytical results through visual representations of the data (e.g., charts and graphs). It can also be a useful tool for the research team during exploratory analysis to better understand the data. During or after data analysis, the research team may use data visualizations to present results to a broader audience. Choosing the right format for data visualization is critical: good visualizations can often communicate results and persuade audiences more effectively than text. This page provides resources on which data visualization to use, which colors to use, and which features to include.

Read First

  • Use color strategically to differentiate groups or highlight trends; remember that color choices matter and can influence how effectively the visualization communicates information.
  • The title, annotations, and citation of a visualization depend on where the visualization is used.

Deciding on a Data Visualization

The best format for data visualization depends on the type of data, the results you wish to display, and the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. To help decide which data visualization to use, Data to Viz provides a handy decision tree, while the Periodic Table of Visualization provides a catalogue of data visualization types with visual examples. Further, Gapminder’s visualization tools provide beautiful examples of effective visualizations.

Colors

Colors not only set the mood for your visualization, but can also draw attention to certain features of it. Digital interfaces let us visualize quantitative data more dynamically, and in that setting colors can encode a great deal of information if we use them intelligently. For a general guide about colors, see What to Consider When Choosing Colors for Data Visualization and Your Friendly Guide to Colors in Data Visualization. In general, follow these tips when deciding whether and how to use colors in visualizations:

  • Only add colors if they add useful information by, for example, differentiating groups or highlighting information.
  • For numeric variables, color can show differences, but it hides absolute values.
  • If many different colors are necessary to display what you want, consider using a different way to display information: adding too many colors will make the graph difficult to read.
  • Be consistent in your use of color: for example, if you are using two colors, one for treatment and one for control, use the same colors for all graphs in the same document. This will save readers both time and confusion.
  • Don't forget to add legends indicating what the colors mean.
  • Consider that color contrast is important: it will make the difference when telling colors apart.
  • Using intuitive colors saves time: good is green, blue is water, darker shades are higher values than lighter shades.
  • Use sequential or diverging color scales to represent numeric variables.
  • Use distinctive colors to represent categorical variables.
  • Take color blindness and transition to gray scale into account.
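
As a rough illustration of the contrast and gray scale tips above, the sketch below computes the contrast ratio between two hex colors using the WCAG 2.x relative-luminance formula. It is written in plain Python for illustration only; the function names and the two group colors are assumptions made for this example and are not taken from any tool referenced on this page.

```python
# Illustrative group colors (hypothetical choice): define them once and
# reuse them for every graph in the same document.
GROUP_COLORS = {"treatment": "#1B9E77", "control": "#D95F02"}


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' (or 'RRGGBB') to a tuple of ints in 0-255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_color):
    """Relative luminance of an sRGB color (0 = black, 1 = white), per WCAG 2.x."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio: 1 for identical colors, 21 for black on white."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    ratio = contrast_ratio(GROUP_COLORS["treatment"], GROUP_COLORS["control"])
    print(f"treatment vs. control contrast ratio: {ratio:.2f}")
```

For instance, contrast_ratio("#000000", "#FFFFFF") is 21, the maximum, while the light gray pair #BDBDBD on #E7E7E7 scores about 1.5 and is hard to tell apart. Comparing the relative_luminance values of two colors also gives a quick check of whether they would remain distinguishable after conversion to gray scale.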

If you are uncertain which colors to use, a ready-made color palette can be a handy option. This color wheel lets you create color palettes and gives the hex code of each color so that you can easily reuse it. This free color generator even includes code for generating color palettes in Java.
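
To make the idea of a sequential palette of hex codes concrete, here is a minimal sketch that builds one by interpolating linearly between a light and a dark color. The function names are hypothetical, and plain RGB interpolation is a simplification chosen for brevity: it is not perceptually uniform, so a dedicated palette tool will usually give better results.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' (or 'RRGGBB') to a tuple of ints in 0-255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three ints in 0-255 to an uppercase '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def sequential_palette(light_hex, dark_hex, n):
    """Return n hex colors evenly spaced from light_hex to dark_hex.

    Lighter colors come first, so higher values can be mapped to darker shades.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    start, end = hex_to_rgb(light_hex), hex_to_rgb(dark_hex)
    palette = []
    for i in range(n):
        t = i / (n - 1)  # position along the scale, from 0 to 1
        rgb = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
        palette.append(rgb_to_hex(rgb))
    return palette


if __name__ == "__main__":
    # Five shades from white to black, lightest first.
    print(sequential_palette("#FFFFFF", "#000000", 5))
```

A diverging scale can be sketched the same way by joining two such palettes that share a neutral midpoint color.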

Features

Titles and Annotations

Data visualizations should be intuitive: audiences should grasp what the data aim to convey within the first 20 seconds of seeing a visualization. Thus, it is important to adjust some features of a visualization based on where it will be used. Usually, if a visualization is used in an academic report or research paper, it does not need detailed titles or annotations because the paper itself will include explanations for the visualization. Here is an example of what a visualization looks like in an academic report.

However, if a visualization is used on a website or in a presentation without supporting materials, then it should include detailed titles and annotations. In other words, it should point the audience to the key takeaways by offering at least an annotative lead-in sentence. Here is an example of an independent visualization with detailed titles and annotations.

Citations

Data, like literature, should be cited. However, like titles and annotations, the data source citation will not necessarily be included in the visualization. If you use the visualization in an academic report or research paper, you will cite the data source in the reference list, so you do not need to repeat that citation in the visualization. Here is a guide to data citation.

If a visualization is used in a presentation or on a website and you are NOT using primary data, it is important to include the citation in your visualization. Usually, we add a footnote in the bottom-right or bottom-left corner of the visualization in this format: Source: Data Name and Date. Here is an example of a visualization with a data source citation in it.

Lastly, if you are using primary data, then there is no need to cite the data source.
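
The citation rules above can be summarized as a small decision helper. This is only a sketch: the function name, the medium labels, and the comma between the data name and the date are assumptions made for illustration, and the dataset name in the usage example is hypothetical.

```python
# Media where a visualization stands alone, without a reference list.
STANDALONE_MEDIA = {"presentation", "website"}


def source_footnote(data_name, date, medium, primary_data=False):
    """Return the footnote text to place on a visualization, or None.

    No footnote is returned for primary data, or when the visualization
    appears in a paper or report that already cites the data source.
    """
    if primary_data or medium not in STANDALONE_MEDIA:
        return None
    return f"Source: {data_name}, {date}"


if __name__ == "__main__":
    # Hypothetical secondary data source shown on a website.
    print(source_footnote("Example Household Survey", "2018", "website"))
```

The returned text can then be passed to whatever footnote or caption option your plotting tool provides.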

Data Visualization in R

R has many options for data visualization. Here are some useful packages:

  • ggplot2: this is the go-to package for static plots. Here is a list of 50 ggplot2 visualizations with full R code.
  • plotly: creates interactive graphs and integrates with ggplot2.
  • gganimate: allows users to create animated GIFs from ggplot plots.
  • Highcharter: this package is a wrapper for the Highcharts JavaScript library.
  • Leaflet: an R wrapper to one of the most popular open-source libraries for interactive maps.
  • R2D3: a wrapper for JavaScript's D3 library that creates animated graphs.

Back to Parent

This article is part of the topic Data Analysis

Additional Resources