Data visualization
Data visualization is the creation of a visual representation of your data, for example a chart or a graph. Choosing the right format to visualize your data is critical to communicating the results of your study effectively. Good visualizations can be more memorable and persuasive than text alone. This page discusses general principles for data visualization.
Read First
- The DIME Analytics team has created a Stata Visual Library for Impact Evaluation, which shows examples of graphs and provides the code used to create them. You can contribute to the library on our GitHub.
- Specific code for data visualization is available on the pages for software-specific tools (e.g. iegraph).
Guidelines
What type of data visualization should I use?
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles.
- Data to Viz provides a handy decision tree.
- The Periodic Table of Visualization provides a catalogue of all data visualization types with visual examples.
- Gapminder.org visualization tools provide beautiful examples of effective visualizations.
- Automating the Design of Graphical Presentations of Relational Information explains how to design effective graphics that make intelligent use of human visual abilities.
Fun with colors
Thanks to digital interfaces, we can visualize quantitative data more dynamically. Used intelligently, colors can encode a great deal of information. Colors can set the mood of your visualization and can also draw attention to particular features of it. For a general guide, see What to Consider When Choosing Colors for Data Visualization.
General Guidelines for Colors in Visualizations:
- Only add colors if they are adding useful information.
- Use color to add information, for example to (a) differentiate groups or (b) highlight key information.
- For numeric variables, color can show relative differences, but it hides absolute values.
- If you need many different colors to display what you want, consider a different way to display the information: too many colors make a graph difficult to read.
- Be consistent in your use of colors: for example, if you use one color for treatment and another for control, use the same two colors in every graph in the same document. This saves readers time, while switching colors between graphs can be very confusing.
- Don't forget to add legends indicating what colors mean.
Tips for Picking Colors:
- Color contrast is important: it determines how easily readers can tell colors apart.
- Using intuitive colors saves time: green for good, blue for water, darker shades for higher values than lighter shades.
- Use sequential or diverging color scales to represent numeric variables.
- Use distinctive colors to represent categorical variables.
- Take color blindness and conversion to gray scale (for example, black-and-white printing) into account.
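The contrast and gray-scale tips above can be checked numerically before a palette is used. The sketch below is a minimal illustration, assuming that the WCAG 2 relative-luminance and contrast-ratio formulas are a reasonable proxy for how distinguishable two colors remain once converted to gray scale. The helper names (`hex_to_rgb`, `contrast_ratio`, `low_contrast_pairs`), the 1.5 threshold, and the example hex codes are hypothetical choices for illustration, not part of any DIME tool. Note that this only checks lightness differences; it does not simulate specific types of color blindness.

```python
from __future__ import annotations

from itertools import combinations


def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a 6-digit hex color such as '#1f77b4' to (R, G, B) ints in 0-255."""
    code = hex_code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected a 6-digit hex code, got {hex_code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_code: str) -> float:
    """WCAG 2 relative luminance: 0.0 for black, 1.0 for white."""
    def linearize(channel: int) -> float:
        # Undo the sRGB gamma curve for one 0-255 channel.
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in hex_to_rgb(hex_code))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(hex_a: str, hex_b: str) -> float:
    """WCAG 2 contrast ratio, from 1.0 (identical lightness) to 21.0 (black vs white)."""
    la, lb = relative_luminance(hex_a), relative_luminance(hex_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_pairs(palette: dict[str, str], min_ratio: float = 1.5):
    """Return (name, name, ratio) for every pair of palette colors below min_ratio.

    The default min_ratio of 1.5 is an arbitrary assumption; tune it for your medium.
    """
    flagged = []
    for (name_a, hex_a), (name_b, hex_b) in combinations(palette.items(), 2):
        ratio = contrast_ratio(hex_a, hex_b)
        if ratio < min_ratio:
            flagged.append((name_a, name_b, round(ratio, 2)))
    return flagged


if __name__ == "__main__":
    # Illustrative treatment/control palette; replace with your own hex codes.
    palette = {"treatment": "#1b9e77", "control": "#d95f02", "comparison": "#7570b3"}
    print(low_contrast_pairs(palette))
```

Any pair that is flagged is likely to look similar in a black-and-white printout, so it is worth changing one of the two colors or adding a second cue such as line style or marker shape.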
If you are unsure which colors to use, ready-made color palettes can be a handy option. The Color Wheel lets you create color palettes and gives the hex code of each color (a code that identifies the exact color so that you can easily reuse it). There is also a free color generator, which even includes code for generating color palettes in Java.
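Because palettes are specified as hex codes, the sequential and diverging scales recommended above for numeric variables can be sketched by interpolating between hex endpoints. The example below is a minimal sketch, assuming simple linear interpolation in RGB space; the function names and hex codes are hypothetical. RGB interpolation is not perceptually uniform, so established palettes (for example ColorBrewer or viridis) are generally the better choice for publication graphs.

```python
from __future__ import annotations


def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a 6-digit hex color such as '#08519c' to (R, G, B) ints in 0-255."""
    code = hex_code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected a 6-digit hex code, got {hex_code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_hex(start: str, end: str, n: int) -> list[str]:
    """Sequential scale: n colors evenly spaced in RGB from start to end (inclusive)."""
    if n < 2:
        raise ValueError("need at least two colors")
    start_rgb, end_rgb = hex_to_rgb(start), hex_to_rgb(end)
    palette = []
    for i in range(n):
        t = i / (n - 1)  # position along the scale, from 0.0 to 1.0
        channels = [round(c0 + (c1 - c0) * t) for c0, c1 in zip(start_rgb, end_rgb)]
        palette.append("#" + "".join(f"{c:02x}" for c in channels))
    return palette


def diverging_palette(low: str, mid: str, high: str, n: int) -> list[str]:
    """Diverging scale: two sequential halves that meet at a neutral midpoint color.

    n must be odd so that the midpoint (e.g. zero change) gets its own color.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number of at least 3")
    half = n // 2 + 1
    left = interpolate_hex(low, mid, half)
    right = interpolate_hex(mid, high, half)
    return left + right[1:]  # drop the duplicated midpoint


if __name__ == "__main__":
    # Light-to-dark blues: darker shades for higher values.
    print(interpolate_hex("#deebf7", "#08519c", 5))
    # Blue (below zero) to white (zero) to red (above zero).
    print(diverging_palette("#2166ac", "#f7f7f7", "#b2182b", 7))
```

A sequential scale suits variables that run from low to high, while a diverging scale suits variables with a meaningful midpoint, such as a change relative to baseline.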
Adapting visualizations to different contexts
- When To Use Titles/Annotations?
Data visualizations should be intuitive: audiences should be able to grasp what the data convey within the first 20 seconds of seeing a visualization. It is therefore important to adjust some details of a visualization to the context in which it will be used. If a visualization is used in an academic report or research paper, we usually do not need detailed titles or annotations, because the paper itself explains the graph. Here is an example of what a visualization looks like in an academic report.
However, if a visualization is used on a website or in a presentation without supporting materials, we should include detailed titles and annotations. In other words, we should help the audience identify the key takeaway by including at least an annotated lead-in sentence. Here is an example of a stand-alone visualization with detailed titles and annotations.
- Don’t Forget Citations
Data, like literature, should be cited. However, as with titles and annotations, the data source citation does not always need to appear in the visualization itself. If you use the visualization in an academic report or research paper, you will cite the data source in the references, so you do not need to repeat the citation in the visualization. Here is a guide to data citation.
If a visualization is used in a presentation or on a website and you are NOT using primary data, it is important to include the citation in the visualization itself. Usually, we add a footnote in the bottom-right or bottom-left corner of the visualization in this format: Source: Data Name and Date. Here is an example of a visualization with a data source citation.
Lastly, if you are using primary data, then there is no need to cite the data source.
Data Visualization in R
R has many options for data visualization. Here are some useful packages:
- ggplot2: this is the go-to package for static plots. Here is a list of 50 ggplot2 visualizations with full R code.
- plotly: creates interactive graphs and is integrated with ggplot2.
- gganimate: allows users to create animated GIFs from ggplot2 plots.
- Highcharter: this package is a wrapper for the Highcharts JavaScript library.
- Leaflet: an R wrapper for one of the most popular open-source libraries for interactive maps.
- R2D3: a wrapper for the D3 JavaScript library, used to create interactive and animated graphs.
Back to Parent
This article is part of the topic Data Analysis
Additional Resources
- DIME Analytics' Data Visualization
- Harvard Business Review Article on Visualizations that Really Work
- Stata cheat sheets on data visualization and on customizing data visualizations are useful reminders of relevant Stata code.
- General Rules about Colors in Data Visualization: Your Friendly Guide to Colors in Data Visualization.