Data visualization

Revision as of 19:08, 4 February 2019 by Yuchen (talk | contribs)

Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. This page discusses general principles for data visualization.

Read First

  • Specific code for data visualization is available on the pages for software-specific tools (e.g. iegraph).


What type of data visualization should I use?

The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles.

Fun with colors

Digital interfaces allow quantitative data to be visualized more dynamically, and color can encode a great deal of information when used deliberately. Colors can both set the mood of a visualization and draw attention to specific features. For a general guide, see What to Consider When Choosing Colors for Data Visualization.

General Guides for Colors in Visualizations:

  • Only add colors if they are adding useful information.
  • Ways you can use color to add information:
  a. Differentiate groups
  b. Highlight information
  • For numeric variables, color can be useful for showing differences, but it hides absolute values.
  • If many different colors are necessary to display what you want, then you should consider using a different way to display information, as adding too many colors will make the graph difficult to read.
  • Be consistent with your use of colors: for example, if you are using two colors, one for treatment and one for control, use the same colors for all graphs in the same document. This will save time for readers, and doing the opposite can be very confusing.
  • Don't forget to add legends indicating what colors mean.
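
The consistency rule above can be enforced by defining the group-to-color mapping once and reusing it for every graph in a document. The following is a minimal Python sketch; the group names, hex values, and the helper functions `color_for` and `legend_entries` are illustrative assumptions, not part of any plotting library.

```python
# Hypothetical palette: group names and hex values are illustrative only.
GROUP_COLORS = {
    "treatment": "#1b9e77",
    "control": "#d95f02",
}


def color_for(group: str) -> str:
    """Return the fixed color for a group, failing loudly on unknown groups."""
    try:
        return GROUP_COLORS[group]
    except KeyError:
        raise ValueError(f"No color defined for group {group!r}") from None


def legend_entries() -> list:
    """List (label, color) pairs so every graph can show the same legend."""
    return [(group.capitalize(), color) for group, color in GROUP_COLORS.items()]
```

Passing `color_for("treatment")` to whichever plotting tool is in use, instead of typing a hex code in each script, keeps the colors identical across graphs, and `legend_entries()` gives each graph the same legend labels.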

Tips for Picking Colors:

  • Color contrast is important: it determines how easily colors can be told apart.
  • Intuitive colors save readers time: green for good, blue for water, darker shades for higher values.
  • Use sequential or diverging color scales to represent numeric variables.
  • Use distinctive colors to represent categorical variables.
  • Take color blindness and conversion to grayscale into account.
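
One way to check contrast before settling on a pair of colors is to compute a contrast ratio from their hex codes. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG accessibility guidelines; this choice of measure is an assumption made for illustration, not something this page prescribes, and the function names are hypothetical.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#rrggbb' to three integers in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance: 0.0 for black, about 1.0 for white."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, about 21 for black on white."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

Two colors with different hues but nearly equal luminance will have a ratio close to 1 and are likely to be hard to distinguish once a graph is converted to grayscale, so a low ratio is a signal to pick a lighter or darker alternative.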

If you are unsure which colors to use, ready-made color palettes can be a handy option. The Color Wheel lets you create color palettes and gives the hex code of each color, so that you can easily reuse it. There is also a free color generator, which even includes code for generating color palettes in Java.
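
Hex codes also make it possible to build a simple palette in code. As a rough sketch, the hypothetical function below creates a sequential palette by interpolating linearly between two hex colors. Linear interpolation in RGB is a simplification: the resulting steps are not perceptually uniform, so palettes from a dedicated tool will usually look more even.

```python
def sequential_palette(start_hex: str, end_hex: str, steps: int) -> list:
    """Return `steps` hex colors evenly spaced from start_hex to end_hex."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    # Split each '#rrggbb' code into its red, green, and blue channels.
    start = [int(start_hex.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    end = [int(end_hex.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    palette = []
    for step in range(steps):
        t = step / (steps - 1)  # 0.0 at the first color, 1.0 at the last
        channels = [round(a + (b - a) * t) for a, b in zip(start, end)]
        palette.append("#{:02x}{:02x}{:02x}".format(*channels))
    return palette
```

Calling it with a light start color and a dark end color produces a scale in which darker shades correspond to higher values.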

Adapting visualizations to different contexts

  • When To Use Titles/Annotations?

Data visualizations should be intuitive: the audience should be able to grasp what the data convey within the first 20 seconds of seeing a visualization. It is therefore important to adjust some details of a visualization to where it will be used. If a visualization appears in an academic report or research paper, detailed titles or annotations are usually unnecessary, because the paper itself explains the graph. Here is an example of what a visualization looks like in an academic report.

However, if a visualization is used on a website or in a presentation without supporting materials, it should include detailed titles and annotations. In other words, we should point the audience to the key takeaways by including at least an annotative lead-in sentence. Here is an example of an independent visualization with detailed titles and annotations.

  • Don’t Forget Citations

Data, like literature, should be cited. However, as with titles and annotations, the data source citation does not always need to appear in the visualization itself. If you use the visualization in an academic report or research paper, you will cite the data source on the reference page, so you do not need to repeat the citation in the visualization. Here is a guide to data citation.

If a visualization is used in a presentation or on a website and you are NOT using primary data, it is important to include the citation in the visualization. Usually, this is a footnote in the bottom-right or bottom-left corner of the visualization in the format "Source: Data Name and Date". Here is an example of a visualization with a data source citation in it.

Lastly, if you are using primary data, then there is no need to cite the data source.
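
The citation rules above can be summarized as a small decision function. This is only a sketch: the function name, the two context labels, and the comma used between the data name and the date are assumptions made for illustration.

```python
def source_footnote(data_name: str, date: str, context: str, primary_data: bool):
    """Return the footnote text to place on the figure, or None if not needed.

    context: "paper" (academic report or research paper) or
             "standalone" (presentation or website).
    """
    if context not in ("paper", "standalone"):
        raise ValueError(f"Unknown context {context!r}")
    if primary_data:
        return None  # primary data: no source citation needed
    if context == "paper":
        return None  # the source is cited on the reference page instead
    return f"Source: {data_name}, {date}"
```

For example, `source_footnote("Example Survey", "2018", "standalone", False)` returns the text for the corner footnote, where "Example Survey" is a placeholder name rather than a real dataset.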

Data Visualization in R

R has many options for data visualization. Here are some useful packages:

  • ggplot2: this is the go-to package for static plots. Here is a list of 50 ggplot2 visualizations with full R code.
  • plotly: creates interactive graphs, and is integrated with ggplot2.
  • gganimate: allows users to create animated GIFs from ggplot2 plots.
  • Highcharter: this package is a wrapper for the Highcharts JavaScript library.
  • Leaflet: an R wrapper to one of the most popular open-source libraries for interactive maps.
  • R2D3: a wrapper for JavaScript's D3 library that creates animated graphs.

Back to Parent

This article is part of the topic Data Analysis

Additional Resources