Data visualization

Revision as of 20:12, 30 January 2019 by Yuchen

Data visualization is the creation of a visual representation of your data, for example a chart or a graph. Choosing the right format is critical to communicating the results of your study effectively, and good visualizations can be more memorable and persuasive than text alone. This page discusses general principles for data visualization.

Read First

  • Code for specific visualizations is available on the pages for software-specific tools (e.g. iegraph).

Guidelines

What type of data visualization should I use?

The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles.

Fun with colors

Digital interfaces let us visualize quantitative data more dynamically, and color can encode a great deal of information when used deliberately. Color can both set the mood of a visualization and draw attention to specific features. For example, red tends to evoke emergency, warning, or blood, while blue tends to evoke peace, rightness, or sadness. Here is an example that uses red to represent victims killed in a drone strike in Pakistan.

How you use color in a visualization depends on the type of data you have. For continuous data, color gradients generally work best: a deeper shade represents a higher value and a lighter shade a lower value. For example, New York City has a larger population than Washington, DC, so a population map could show New York in deep blue and Washington in light blue. For categorical data, use clearly distinct colors. A common example in the United States is using blue to represent Democrats and red to represent Republicans.
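As a rough illustration of the two approaches, the sketch below maps continuous values to a light-to-dark blue gradient and assigns distinct colors to categories. It uses only the Python standard library. The function names (`gradient_color`, `categorical_colors`), the hex codes, and the sample values are all assumptions made for this example, not part of any particular plotting tool.

```python
def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of integers."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def gradient_color(value, vmin, vmax, light="#deebf7", dark="#08519c"):
    """Continuous data: interpolate linearly from a light to a dark shade.

    The default hex codes are illustrative choices (light blue to deep blue).
    """
    if vmax == vmin:
        t = 0.0  # all values equal: use the lightest shade
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside [vmin, vmax]
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    rgb = [round(a + (b - a) * t) for a, b in zip(lo, hi)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def categorical_colors(categories,
                       palette=("#1f77b4", "#d62728", "#2ca02c", "#ff7f0e")):
    """Categorical data: give each distinct category its own color.

    Colors are assigned in order of first appearance. The palette is an
    illustrative assumption; substitute any set of clearly distinct colors.
    """
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if len(unique) > len(palette):
        raise ValueError("more categories than palette colors")
    return dict(zip(unique, palette))


# Made-up values for illustration only (not real population figures).
values = {"City A": 100, "City B": 550, "City C": 1000}
vmin, vmax = min(values.values()), max(values.values())
shades = {name: gradient_color(v, vmin, vmax) for name, v in values.items()}

groups = categorical_colors(["Group 1", "Group 2", "Group 1"])
```

Here `shades` gives the lowest value the lightest blue and the highest value the deepest blue, while `groups` maps each distinct label to one palette color. The resulting hex codes can be passed to whichever plotting tool you use.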

This article shows how distinctive colors and color gradients are used to represent gender data.

If you are unsure which colors to use, ready-made color palettes are a handy option. Here is a website that collects various color palettes, each with its hex codes (the codes that identify a specific color, so that you can reproduce it easily). And here is a free color generator, if you would like to create your own palettes.

Adapting visualizations to different contexts

  • When To Use Titles/Annotations?

Data visualizations should be intuitive: the audience should be able to grasp what the data convey within the first 20 seconds of seeing a visualization. It is therefore important to adjust the details of a visualization to the context in which it will be used. If a visualization appears in an academic report or research paper, detailed titles and annotations are usually unnecessary, because the surrounding text explains the graph. Here is an example of what a visualization looks like in an academic report.

However, if a visualization is used on a website or in a presentation without supporting materials, it should include detailed titles and annotations. In other words, guide the audience to the key takeaways by including at least an annotative lead-in sentence. Here is an example of a standalone visualization with detailed titles and annotations.

  • Don’t Forget Citations

Data, like literature, should be cited. However, as with titles and annotations, the data source citation does not always need to appear in the visualization itself. If the visualization is part of an academic report or research paper, the data source is cited in the reference list, so you do not need to repeat it in the visualization. Here is a guide to data citation.

If a visualization is used in a presentation or on a website and you are NOT using primary data, it is important to include the citation in the visualization. The usual approach is a footnote in the bottom-right or bottom-left corner of the visualization, in the format "Source: Data Name and Date". Here is an example of a visualization that includes a data source citation.

Lastly, if you are using primary data, there is no need to cite the data source.

Data Visualization in R

R has many options for data visualization. Here are some useful packages:

  • ggplot2: this is the go-to package for static plots. Here is a list of 50 ggplot2 visualizations with full R code.
  • plotly: creates interactive graphs and integrates with ggplot2.
  • gganimate: allows users to create animated GIFs from ggplot2 plots.
  • Highcharter: this package is a wrapper for the Highcharts JavaScript library.
  • Leaflet: an R wrapper to one of the most popular open-source libraries for interactive maps.
  • R2D3: a wrapper for JavaScript's D3 library that creates animated graphs.

Back to Parent

This article is part of the topic Data Analysis


Additional Resources