Python Visualization Comparison

Goals for a Visualization Library

Visualization has long been one of the weakest areas of the open source Python data science toolset. Sarkar published his book on Lattice, a groundbreaking R visualization library, in 2008; a full decade later, Python is only just catching up to Lattice's sophistication, and there are no books on any of the leading options. The Python community instead spent a long time working with imperative graphing libraries, apparently unaware that the state of the art had moved on.

For exploratory data analysis, we need a high-level graphing library: you describe which data feature should map to which type of visual representation, and the library handles the rest. We do not want a library where you have to say "give me a subplot with an axis object, use this axis object to draw these points here, with this line type and this color. Now draw these points with this other color." That kind of library fills too much of your working memory with the mechanics of making the plot, and it encourages you to reuse the same types of graph over and over, which is not the right kind of thinking.
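To make the complaint concrete, here is a minimal sketch of the imperative style described above, written against Matplotlib. The toy records, the color choices, and the variable names are all assumptions made up for illustration; the point is only how much of the code is bookkeeping rather than a description of the plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data: (x, y, group) records.
records = [
    (1.0, 2.0, "a"), (2.0, 3.5, "a"), (3.0, 5.0, "a"),
    (1.0, 1.0, "b"), (2.0, 1.5, "b"), (3.0, 2.5, "b"),
]
# The group-to-color mapping has to be maintained by hand.
colors = {"a": "tab:blue", "b": "tab:orange"}

# "Give me a subplot with an axis object..."
fig, ax = plt.subplots()

# "...use this axis object to draw these points here, with this color.
# Now draw these points with this other color."
for group, color in colors.items():
    xs = [x for x, _, g in records if g == group]
    ys = [y for _, y, g in records if g == group]
    ax.scatter(xs, ys, color=color, label=group)

# Labels and legend are also wired up manually.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="group")
```

Splitting the data by group, picking the colors, and building the legend are all done by the caller here. A declarative library would take a single statement saying that group maps to color.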

Here is another person's overview: https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair/ (note that it evaluates a different, older port of ggplot, not plotnine).

Good Options

To be rated "good", a library must have a high-level descriptive language which is consistent between types of plots, must support all or nearly all of the plots I commonly use, and it must be well documented online or in a book.

As of October 2018, I have not been able to find any options which meet all of my criteria for being "good".

Acceptable Options

Altair

Altair is a newish library being developed and endorsed by Jake VanderPlas, the author of one of the main Python books. It has a proper high-level API and is surprisingly pleasant to use, with a Tableau-like feel. It creates charts by emitting JSON that the Vega-Lite library then renders, an approach that is very much in fashion today and allows for interactivity. Altair seems to be what plotly should have been. There is no book, but the online documentation seems to be pretty good. On the downside, some common plot types are not supported, open bugs bit me even on my fairly simple examples, and it does not handle large data well.

Plotnine

Plotnine is a port of ggplot to Python. ggplot has an excellent level of abstraction and expressiveness; the biggest weakness of plotnine is the shortage of documentation and tutorials, so one has to read the ggplot documentation and translate it to Python. I found this translation to be straightforward, and I didn't run into any bizarre failures. Python enthusiasts may object to the "non-Pythonic" syntax of adding elements and settings to a plot with +.

Since the plotnine documentation is not very good or complete, here is some R documentation you can follow along with:

Marginal Options

seaborn (with “figure-level” API from 0.9)

Seaborn is a large package with multiple APIs. The aspect of it which I consider acceptable is what they call the "figure-level" graphs, introduced in 0.9, such as sns.relplot. As of October 2018, some common functionality such as heatmaps is still not available in this API. In addition to being incomplete, it seems to suffer from trying to tie together existing functionality without an overarching design for how plots ought to be specified.
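For reference, a figure-level call looks roughly like the sketch below. The DataFrame is hypothetical; `sns.relplot` and its `hue`, `col`, and `kind` parameters are part of the 0.9 API mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [2.0, 3.5, 5.0, 1.0, 1.5, 2.5],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# One call maps columns to position, color, and facet columns.
grid = sns.relplot(
    data=df,
    x="x",
    y="y",
    hue="group",   # color by group
    col="group",   # one panel per group
    kind="scatter",
)

# The result is a FacetGrid wrapping ordinary Matplotlib axes.
print(grid.axes.shape)
```

The returned FacetGrid still exposes the underlying Matplotlib objects, which is what keeps this API compatible with existing Matplotlib code.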

I think Seaborn is a reasonable option for people who already know Matplotlib or who have a strong external requirement to be Matplotlib-compatible.

My Seaborn Example Notebook

Poor Options

matplotlib

Matplotlib is the lowest common denominator for Python graphics; although popular, it is much too low-level for what we want. Chapter 4 of the Python Data Science Handbook covers Matplotlib.

plotly

Plotly may be even lower level than Matplotlib, requiring manual specification of each marker and its attributes. Plotly produces nice results in interactive webapps, but its API is not what we want for data science.

bokeh

Bokeh is also much too low-level; look how much code it takes just to make a boxplot or a histogram.
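As an illustration, here is one way a histogram can be assembled in Bokeh. This is a sketch with randomly generated data, not the only possible approach: the binning is done by hand with NumPy and each bar is drawn as a rectangle with the `quad` glyph.

```python
import numpy as np
from bokeh.plotting import figure

# Hypothetical data: 500 draws from a standard normal distribution.
values = np.random.default_rng(0).normal(size=500)

# Bin the data manually.
counts, edges = np.histogram(values, bins=20)

# Draw one rectangle per bin.
p = figure(title="Histogram of values")
renderer = p.quad(
    top=counts,
    bottom=0,
    left=edges[:-1],
    right=edges[1:],
    fill_color="navy",
    line_color="white",
)
p.xaxis.axis_label = "value"
p.yaxis.axis_label = "count"

# Displaying the figure would take a further call such as
# bokeh.plotting.show(p).
```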
