Charts are the most common type of data visualisation, and certainly the most varied. However, many readers will be confused by chart types that they have not seen before, so charts should be as easy to consume as possible. In general, a chart is used to present or compare data and a plot is used in the analysis of variables.
The most widely used charts, which are also easy to create and read, are column charts, bar charts, pie charts, scatter plots and histograms.
When designing a data visualisation you must think carefully about the story the dataviz tells your reader, or equivalently, what questions it answers. For simplicity, let’s use the excellent Financial Times’ Visual Vocabulary to create four types of story we can tell:
To demonstrate the different types of stories we can tell with charts, let’s consider a public dataset: the results of the United Kingdom’s Brexit Referendum. In the referendum, voters were asked to choose one of two options: Remain a member of the European Union or Leave the European Union.
Chart selection is heavily dependent on what types of data you have. Many dataviz tools automatically recommend charts to you based on the data type definitions below. Let’s consider an example dataset:
You’ve collected exam results from an exam with 100 questions, taken by 200 students. In the data you have three columns: grade, number of correct responses, and percentage of correct responses.
Grades: This is a categorical and discrete variable as there is a limited set of available values. Because the order of these values is important (i.e. “Fail” should always be displayed before “Pass”) this is also an ordinal variable.
Number of correct responses: This is a discrete variable as students can only answer an integer number of questions correctly.
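Percentage of correct responses: In principle this is a continuous variable, as results can vary between 0% and 100%. In practice, however, it is a discrete variable: with 100 questions, a student’s percentage can only take one of 101 whole-number values.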
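To see why the percentage column is discrete in practice, here is a minimal Python sketch. The function name `percentage_correct` is hypothetical; the 100-question exam comes from the example dataset above.

```python
def percentage_correct(num_correct, total_questions=100):
    """Convert a count of correct answers into a percentage."""
    return 100 * num_correct / total_questions

# Every score a student could achieve on a 100-question exam.
possible_percentages = sorted({percentage_correct(n) for n in range(101)})

# The "continuous" percentage column can only ever hold this many values.
n_distinct = len(possible_percentages)

# Each of those values is a whole number, so no student can score 42.5%.
all_whole = all(p == int(p) for p in possible_percentages)
```

With a different number of questions the set of possible values changes, but it stays finite, which is what makes the variable discrete.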
A common issue with discrete variables masquerading as continuous variables is weird-looking histograms that are often not fit for purpose; the charts below (histogram, violin chart, column chart) all display the distribution of exam result percentages differently. The “best” chart of the three is dependent on the story the chart is telling.
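One way such odd-looking histograms arise is when the bin edges do not line up with the values the variable can actually take. The sketch below is illustrative only: `bin_counts` is a hypothetical helper, and the data is an artificial, perfectly flat set of scores (one student per whole-number score from 0 to 99) rather than real exam results.

```python
def bin_counts(values, n_bins, low=0.0, high=100.0):
    """Count how many values fall into each of n_bins equal-width bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value equal to `high` lands in the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Artificial, perfectly flat distribution: one student per score 0..99.
scores = list(range(100))

# Bins of width 10 align with the whole-number scores: every bar is equal.
aligned = bin_counts(scores, 10)

# Bins of width 12.5 do not align: bars alternate in height even though
# the underlying distribution is flat.
misaligned = bin_counts(scores, 8)
```

The uneven bars in the second case are produced by the binning, not by the data, which is why the bin width deserves attention whenever a discrete variable is drawn as a histogram.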
In general, this is sufficient knowledge about variables to get the most from other more detailed resources. It’s important to be careful with ordinal variables as their intrinsic order must be well presented in the dataviz.
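As a small illustration of preserving that order, the Python sketch below counts grades and returns them in scale order rather than alphabetical order. Only “Fail” and “Pass” come from the example above; the “Merit” and “Distinction” grades, the `GRADE_ORDER` list and the `ordered_counts` helper are assumptions made for the sake of the example.

```python
from collections import Counter

# Assumed grade scale, lowest first. Only "Fail" and "Pass" appear in the
# example dataset; "Merit" and "Distinction" are hypothetical additions.
GRADE_ORDER = ["Fail", "Pass", "Merit", "Distinction"]

def ordered_counts(grades):
    """Return (grade, count) pairs following the scale's intrinsic order."""
    counts = Counter(grades)
    return [(grade, counts.get(grade, 0)) for grade in GRADE_ORDER]

sample = ["Pass", "Fail", "Merit", "Pass", "Distinction", "Pass"]

# Default alphabetical sorting puts "Distinction" before "Fail".
alphabetical = sorted(Counter(sample).items())

# Scale order keeps "Fail" first and "Distinction" last.
in_scale_order = ordered_counts(sample)
```

Passing the pairs to a charting tool in scale order keeps the bars in the sequence a reader expects.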
As mentioned above, many charting tools automatically choose the most appropriate charts based on the type of data you have. This is a fairly mature field of research, commonly called graphical perception theory. Cleveland and McGill’s seminal 1984 paper [DOI:10.2307/2288400] was the first to systematically explore how the human perception system processes graphical data, and it continues to be relevant today.
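To make the idea of type-based recommendation concrete, here is a deliberately simple Python sketch. It is not how any particular tool works: the `recommend_chart` function and its rules are hypothetical, and real tools draw on far richer models of graphical perception.

```python
NUMERIC = {"discrete", "continuous"}
CATEGORICAL = {"categorical", "ordinal"}

def recommend_chart(*kinds):
    """Suggest a chart from the kinds of one or two variables (toy rules)."""
    for kind in kinds:
        if kind not in NUMERIC | CATEGORICAL:
            raise ValueError(f"unknown variable kind: {kind!r}")
    if len(kinds) == 1:
        # One numeric variable: show its distribution.
        # One categorical variable: show a count per category.
        return "histogram" if kinds[0] in NUMERIC else "column chart"
    if len(kinds) == 2:
        numeric = [k for k in kinds if k in NUMERIC]
        if len(numeric) == 2:
            return "scatter plot"   # numeric against numeric
        if len(numeric) == 1:
            return "column chart"   # a numeric value per category
    return "no suggestion"

# Example: grade (ordinal) against number of correct responses (discrete).
suggestion = recommend_chart("ordinal", "discrete")
```

Even a toy like this shows why the variable definitions matter: labelling a column with the wrong kind changes the chart that is suggested.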
In addition to automatically recommending charts, tools like ReVision [DOI:10.1145/2047196.2047247] can even use computer vision to take existing charts and redesign them to be more accessible and more easily read by humans.