What Donald Knuth is to computer programming, Edward Tufte is to data visualisation. He coined the terms “chartjunk” and “data-to-ink ratio”, and his book The Visual Display of Quantitative Information is widely regarded as one of the main reference works on the principles of creating good graphs and charts.
There are many striking examples of efficient, informative, and just plain beautiful data visualisation techniques in Tufte’s books, but little help with the technicalities of implementing them. Tufte himself uses graphics software like Adobe Illustrator for creating his graphs. While this undoubtedly allows very fine control over the appearance of the graphs, it takes a lot of time and is all but impossible to automate. In a setting where producing data visualisations is only one of many tasks, such as when writing scientific articles or business reports, hand-crafting graphs is not a feasible option.
Enter pgfplots, a LaTeX package for creating graphs within the LaTeX document itself. It can plot data or mathematical functions, and integrates tightly with the rest of the document, both in terms of workflow and appearance. With the default settings, a graph can be created very quickly: to plot data from a `.csv` file, all it takes is the line

```latex
\tikz{\begin{axis} \addplot table {data.csv}; \end{axis}}
```

at the point in the document where the graph is required.
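For completeness, here is a minimal sketch of a document around that line. It assumes that `data.csv` contains two whitespace-separated numeric columns and sits next to the `.tex` file; if the file really is comma-separated, `\addplot table[col sep=comma] {data.csv};` should read it instead. The `compat` line assumes pgfplots 1.18 or newer; it can be lowered for older versions, or dropped at the cost of a compatibility warning.

```latex
\documentclass{article}
\usepackage{pgfplots}     % loads TikZ as well
\pgfplotsset{compat=1.18} % assumption: pgfplots 1.18 or newer
\begin{document}
% Assumes data.csv (two numeric, whitespace-separated columns) exists
\tikz{\begin{axis} \addplot table {data.csv}; \end{axis}}
\end{document}
```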
In my opinion, the real strength of this package lies in its flexibility. Almost every graphical aspect of a plot can be controlled using styles, and since the package is open source and written entirely in TeX, almost any conceivable feature can be added with limited effort. In this blog post, I’ll demonstrate step by step how to create a plot style presented in Edward Tufte’s The Visual Display of Quantitative Information: a bar chart with little chartjunk and a high data-to-ink ratio. By the end of this post, you’ll have a style `tufte ybar` that lets you create such a chart with a single axis option.
Assume we have data in a file `dataA.csv` that looks like this:

```
1 8.5
2 12
3 6.5
4 7
5 3
6 17.5
7 13
8 8.5
9 6
10 11
11 5
12 10
```
To create a vertical bar plot (a column plot) of this data in pgfplots, you would write

```latex
\begin{tikzpicture}
  \begin{axis}[ybar]
    \addplot table {dataA.csv};
  \end{axis}
\end{tikzpicture}
```
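To follow along with everything in a single file, the data can also be written out from within the document. The sketch below assumes a LaTeX kernel from 2019 or later, where the `filecontents*` environment is built in and accepts the `[overwrite]` option, and pgfplots 1.18 or newer for the `compat` line; it then produces the plain column plot described above.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18} % assumption: pgfplots 1.18 or newer
% Write dataA.csv next to the .tex file, replacing any older copy
\begin{filecontents*}[overwrite]{dataA.csv}
1 8.5
2 12
3 6.5
4 7
5 3
6 17.5
7 13
8 8.5
9 6
10 11
11 5
12 10
\end{filecontents*}
\begin{document}
\begin{tikzpicture}
  \begin{axis}[ybar]
    % no header row: the first column is used for x, the second for y
    \addplot table {dataA.csv};
  \end{axis}
\end{tikzpicture}
\end{document}
```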
The first thing to do to get closer to Tufte’s bar chart style is to switch off the axis lines and tick marks. The x axis can be removed completely by setting `hide x axis`. For the y axis, we’ll merely make the axis line invisible by setting `axis line style={opacity=0}` (with the x axis hidden, this only affects the y axis), and remove the tick marks by setting `major tick style={draw=none}`. With bar charts like this, it’s almost always a good idea to set `ymin=0` to make sure that the whole length of the bars is shown.
```latex
\begin{tikzpicture}
  \begin{axis}[
    ybar,
    hide x axis,
    axis line style={opacity=0},
    major tick style={draw=none},
    ymin=0,
  ]
    \addplot table {dataA.csv};
  \end{axis}
\end{tikzpicture}
```
The bars in Tufte’s chart are a bit slimmer, and there’s a white horizontal grid on top of the bars. We can specify the bar width using `bar width=0.7em` (it’s usually a good idea to specify lengths in terms of `ex` and `em`, which scale with the surrounding text size). The horizontal grid is switched on using `ymajorgrids`, set to a white line colour with `major grid style=white`, and put on top of the bars using `axis on top`. In the code sample below, only the new options are shown. They’re all added to the optional argument of `\begin{axis}[...]`.
```latex
...
bar width=0.7em,
ymajorgrids,
major grid style=white,
axis on top,
...
```
In Tufte’s chart, the columns don’t have a border. We could change the way the columns are drawn by adding styles to the `\addplot [...]` options, but this would be tedious if we want to apply the same look to other graphs. Instead, we’re going to define a new `cycle list`, which is a list of style sets separated by `\\`. To use a non-standard colour, it’s a good idea to define it in the preamble of the document using, for example, `\definecolor{tufte1}{rgb}{0.7,0.7,0.55}` for a grey with a hint of yellow. This colour can then also be used for the line that takes the place of the x axis.
```latex
\definecolor{tufte1}{rgb}{0.7,0.7,0.55}
\begin{tikzpicture}
...
cycle list={
  fill=tufte1, draw=none\\
}
...
```
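A cycle list really pays off once there is more than one `\addplot`: each plot without its own options takes the next entry, wrapping around at the end. As a sketch, the document below adds a hypothetical second colour `tufte2` and two series whose coordinates are placeholder values for illustration only, not data from this article.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18} % assumption: pgfplots 1.18 or newer
\definecolor{tufte1}{rgb}{0.7,0.7,0.55}
\definecolor{tufte2}{rgb}{0.45,0.45,0.35} % hypothetical second colour
\begin{document}
\begin{tikzpicture}
  \begin{axis}[
    ybar,
    ymin=0,
    % one style set per entry, each terminated by \\
    cycle list={
      fill=tufte1, draw=none\\
      fill=tufte2, draw=none\\
    },
  ]
    % placeholder values, for illustration only
    \addplot coordinates {(1,3) (2,5) (3,4)}; % uses the first entry
    \addplot coordinates {(1,2) (2,4) (3,6)}; % uses the second entry
  \end{axis}
\end{tikzpicture}
\end{document}
```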
Instead of labelling the origin of the y axis with “0”, Tufte puts a thick line at the bottom of the bars. An easy way to do this would be to switch the x axis line back on. However, it’s very hard to make the axis line start and end exactly at the edges of the outermost bars. Instead, we’ll use a `\draw` command to place the zero line.
We can specify normal TikZ/PGF commands to be executed at the end of the axis by using the key `after end axis/.code={...}`. To draw the line, we’ll use the command

```latex
\draw ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmin,0})
      ++(-0.5*\pgfkeysvalueof{/pgf/bar width},0pt)
  -- ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmax,0})
  -- ++(0.5*\pgfkeysvalueof{/pgf/bar width},0pt);
```
That’s a bit intimidating, so let’s break it down. `rel axis cs:0,0` refers to the bottom left corner of the plot area, and `axis cs:\pgfplots@data@xmin,0` is the point with the x coordinate of the first data point and the y coordinate 0. (`\pgfplots@data@xmin` and `\pgfplots@data@xmax` are internal pgfplots macros holding the x range of the data; because their names contain `@`, code that uses them has to be read with `\makeatletter` in effect.) The coordinate expression `(A-|B)` specifies the point that lies at the intersection of a horizontal line through `A` and a vertical line through `B`, or in other words, the point with the y coordinate of `A` and the x coordinate of `B`. So `({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmin,0})` is the point in the middle of the bottom edge of the first bar. In this example, using `rel axis cs:0,0` isn’t strictly necessary, because the bottom edge of the plot area coincides with the line y=0, but in situations where the plot starts above y=0, this expression is very useful.
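The `-|` syntax is plain TikZ and works outside pgfplots too. This small sketch (the coordinate names `A` and `B` are arbitrary) marks two points and the point `(A -| B)`, which takes its y coordinate from `A` and its x coordinate from `B`.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  \coordinate (A) at (0,1);
  \coordinate (B) at (3,3);
  \fill (A) circle[radius=2pt] node[left] {A};
  \fill (B) circle[radius=2pt] node[above] {B};
  % horizontal line through A meets vertical line through B at (3,1)
  \draw[dashed] (A) -- (A -| B) -- (B);
  \fill[red] (A -| B) circle[radius=2pt] node[below right] {\texttt{(A -| B)}};
\end{tikzpicture}
\end{document}
```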
Now, because we want the line to start at the edge of the first column, not its centre, we add `++(-0.5*\pgfkeysvalueof{/pgf/bar width},0pt)`, which moves, without drawing anything, to the point half a bar width to the left of the current position, so we end up at the left edge of the column. We then draw a straight line (`--`) to the centre of the bottom of the rightmost column (`({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmax,0})`) and continue the line half a bar width to the right (`-- ++(0.5*\pgfkeysvalueof{/pgf/bar width},0pt)`). Voilà: a line that always runs from the bottom left corner of the first column to the bottom right corner of the last column, independent of the bar width, plot size and axis limits.
```latex
...
after end axis/.code={
  \draw [very thick, tufte1]
    ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmin,0})
    ++(-0.5*\pgfkeysvalueof{/pgf/bar width},0pt)
    -- ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmax,0})
    -- ++(0.5*\pgfkeysvalueof{/pgf/bar width},0pt);
},
...
```
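To see the `++` arithmetic in isolation, here is a sketch in plain TikZ that mimics the pgfplots code: three bars of width 0.7em centred at x = 1, 2 and 3 cm, and a baseline that starts half a bar width left of the first centre and ends half a bar width right of the last. The macro `\barwidth` and the bar heights are hypothetical stand-ins for `\pgfkeysvalueof{/pgf/bar width}` and real data.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % hypothetical stand-in for \pgfkeysvalueof{/pgf/bar width}
  \def\barwidth{0.7em}
  % bars centred at x = 1, 2, 3 cm; heights are placeholders
  \foreach \x/\h in {1/1.5, 2/2.5, 3/1} {
    \fill[gray] (\x cm - 0.5*\barwidth, 0pt) rectangle ++(\barwidth, \h cm);
  }
  % start at the first centre, move half a bar width left (nothing drawn),
  % draw to the last centre, then continue half a bar width right
  \draw[very thick] (1cm,0pt) ++(-0.5*\barwidth,0pt)
    -- (3cm,0pt) -- ++(0.5*\barwidth,0pt);
\end{tikzpicture}
\end{document}
```

Because the offsets are expressed in terms of the bar width rather than fixed lengths, changing `\barwidth` moves both ends of the baseline together with the bar edges, which is exactly the property the pgfplots version relies on.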
Now all that’s left to do is add percentage signs to the y tick labels and remove the tick label for y=0. To add the percentage signs, we set `yticklabel=\pgfmathprintnumber{\tick}\,\%`. The macro `\tick` contains the current tick value, and `\pgfmathprintnumber` prints numbers in a pretty way, for example by rounding long decimals or removing trailing zeroes. To label only the values 5, 10, 15 and so on, we set `ytick={5,10,...,100}`. It doesn’t matter that we’re specifying more tick positions than we actually need: positions outside the axis range are simply not drawn, and the extra values keep the code flexible in case the data changes to include larger values.
```latex
...
yticklabel=\pgfmathprintnumber{\tick}\,\%,
ytick={5,10,...,100}
...
```
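To check how `\pgfmathprintnumber` formats particular values before relying on it for tick labels, a throwaway test document can help. This is a sketch: the sample values are arbitrary, and `fixed` and `precision` are standard pgf number-format options passed through the optional argument.

```latex
\documentclass{article}
\usepackage{pgfplots} % provides \pgfmathprintnumber (via pgf)
\begin{document}
% Same suffix as in the yticklabel key: thin space plus percent sign
\pgfmathprintnumber{5}\,\%,
\pgfmathprintnumber{12.50}\,\%,
\pgfmathprintnumber{3.14159}\,\%

% Per-call options, here a fixed-point format with one decimal place
\pgfmathprintnumber[fixed, precision=1]{3.14159}\,\%
\end{document}
```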
Finally, to make the code reusable in other plots, we’ll create a new style called `tufte ybar` that contains all the keys that we added to the `axis` options. That way, we can just say `\begin{axis}[tufte ybar] ... \end{axis}` to create a new plot.
```latex
% \pgfplots@data@xmin and \pgfplots@data@xmax have @ in their names,
% so the definition has to be read with \makeatletter in effect
\makeatletter
\pgfplotsset{
  tufte ybar/.style={
    ybar,
    hide x axis,
    axis line style={opacity=0},
    major tick style={draw=none},
    ymin=0,
    bar width=0.7em,
    ymajorgrids,
    major grid style=white,
    axis on top,
    cycle list={
      fill=tufte1, draw=none\\
    },
    after end axis/.code={
      \draw [very thick, tufte1]
        ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmin,0})
        ++(-0.5*\pgfkeysvalueof{/pgf/bar width},0pt)
        -- ({rel axis cs:0,0}-|{axis cs:\pgfplots@data@xmax,0})
        -- ++(0.5*\pgfkeysvalueof{/pgf/bar width},0pt);
    },
    yticklabel=\pgfmathprintnumber{\tick}\,\%,
    ytick={5,10,...,100}
  }
}
\makeatother
```
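One way to reuse the style across documents is to keep the `\pgfplotsset{tufte ybar/.style={...}}` definition above in a separate file and `\input` it. In the sketch below, `tufte-ybar.tex` is a hypothetical file name for that definition, `dataA.csv` is assumed to be the data file shown earlier, and the `compat` line assumes pgfplots 1.18 or newer. Wrapping the `\input` in `\makeatletter`/`\makeatother` lets the file use the internal `@` macros; since they are internals, they may change between pgfplots versions.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18} % assumption: pgfplots 1.18 or newer
\definecolor{tufte1}{rgb}{0.7,0.7,0.55}
% tufte-ybar.tex (hypothetical name) holds the tufte ybar definition;
% \makeatletter lets it refer to \pgfplots@data@xmin and \pgfplots@data@xmax
\makeatletter
\input{tufte-ybar}
\makeatother
\begin{document}
\begin{tikzpicture}
  % keys given after the style extend or override it, e.g. the height
  \begin{axis}[tufte ybar, height=5cm]
    \addplot table {dataA.csv}; % assumes dataA.csv as shown earlier
  \end{axis}
\end{tikzpicture}
\end{document}
```

Keys placed after `tufte ybar` in the axis options are applied after the style, so individual plots can still adjust sizes or limits without touching the shared definition.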