How do you make figures?

Continuing on my stats and figure theme from last week, I’m curious as to how most of our readers make figures. I drafted this post before those posts appeared, and had no idea how common my approach of moving figures from a stats program into another program for final touch-ups was. It seems like something that people mostly don’t talk about, though the few times I’ve mentioned doing that to other scientists, they’ve generally been really relieved to hear they weren’t the only person who does this. The comments on those posts from last week suggest there’s a lot of variation in how people do things, but that it might be pretty common for people to do some or a lot of processing of a figure in a program like PowerPoint or Illustrator.

So, I think it would be interesting to poll our readers. As an example, here’s a figure from a recent article by Wolak, Roff, and Fairbairn, entitled “Are we underestimating the genetic variances of dimorphic traits?”:

[Figure: multi-panel figure from Wolak, Roff, and Fairbairn]

(I have no connection to this paper – I picked the figure because I think it shows the sorts of data that ecologists often want to plot, and has multiple panels.)

Let’s assume this was your manuscript and you wanted to make this figure. How would you do it? (Feel free to ignore panel B if that helps. Here, I’m interested in how you’d plot data you collected.)

 

On to the next question:

*I’m just going with options I can think of people mentioning off the top of my head. I’m sure I’m forgetting some. When I was first coming up with the list, I thought of CricketGraph, then realized that’s more than a tad outdated.

 

Finally, let’s assume a reviewer asked you to remake the figure excluding all the data from one of your study sites. How long would this take you (assuming you would be doing this 6 months or so after making the original figure)?

 

I will be very interested in seeing what people do!

 

53 thoughts on “How do you make figures?”

    • I would probably use text() after setting xpd = NA parameter via a call to par(). That way you can use the coordinate system of each panel and then push the label some % of the range of the x and y axes outside the plotting region. I find this easier to get precise placing, though I would normally tend towards putting the panel label inside the plotting region but using the same idea of offsetting the label from the edge of the plotting frame by some % of the range of the axes.

      There is a function corner.label() in the plotrix package, which will do this for you, though I tend to use my own home-grown code for this (unfortunately, not packaged or on GitHub; most likely languishing in some long-forgotten folder that I have to search for each time I want to do this…)
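
The placement idea in the comment above, offsetting a panel label by a fraction of each axis range, is not specific to R. What follows is a minimal Python sketch of the arithmetic only, not the commenter's own code; the function name `panel_label_position` and the default offsets are assumptions chosen for illustration.

```python
def panel_label_position(xlim, ylim, dx=-0.10, dy=0.05):
    """Return (x, y) in data coordinates for a panel label.

    dx and dy are fractions of the axis range: the label sits
    dx * x-range from the left edge and dy * y-range above the top
    edge. A negative dx pushes the label left of the plotting region.
    """
    x_min, x_max = xlim
    y_min, y_max = ylim
    x = x_min + dx * (x_max - x_min)
    y = y_max + dy * (y_max - y_min)
    return x, y


# Example: a panel spanning 0-50 on the x axis and 2-12 on the y axis.
x, y = panel_label_position((0.0, 50.0), (2.0, 12.0))
print(x, y)
```

For a label inside the plotting region, a positive `dx` and a negative `dy` give the same kind of offset from the frame. In R, the resulting coordinates would be passed to text() with xpd = NA set via par(), as the comment describes.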

  1. I mostly make figures in R but then pull them into PowerPoint to make multi-panel figures, add panel labels (a), (b), etc., fiddle with the axis labels, etc. Though for journals that expect you to do a lot of the typesetting yourself (e.g., PLOS ONE), I’ve had trouble conforming to their formatting requirements using this approach.

    I make a few figures in Excel. For Price equation stuff, I usually do the calculations in Excel spreadsheets, then make the figures (which are usually just simple scatterplots) in Excel, pulling them into PowerPoint if I need to make a multipanel figure. (Aside to Excel haters: I know what I’m doing and have lots of checksums etc. in my spreadsheets, and the calculations are actually more transparent and easier to follow this way. At least for me, which is what matters most.)

    Oh, and I’m crap with R, so I make simple figures by clicking buttons in R Commander. It’s great for when all you need is something simple like a plot of means and standard errors. And it shows you the code you would’ve typed to do what the button clicking did, so it’s a good tool for learning graphing in R as well. And while I’m sure my approach here will appall readers who care deeply about reproducibility, the truth is that most of my figures are so simple that reproducibility isn’t compromised by my approach here.

    • This is not directed specifically at you, Jeremy, but you sparked a thought.

      A point that I have been trying to get across to one of my colleagues is that although there is an initial (and substantial) time investment to learning a scripting language like R/Python/Matlab/etc., increases in efficiency with continued use should, in the long term, more than make up for the initial time investment. Based on my own experiences using and teaching R, increases in scripting skill proficiency are nonlinear because understanding tends to compound, so the perceived cost in time to learn the scripting language often is greater than the time required.

      I think that the long game is what we all should be looking at with anything related to work efficiency and productivity. Unless you’re close to retirement. In which case, please consider retiring early so some poor postdoc can finally move into a nice tenure-track position! (We’re a very hungry, poor lot.)

      I co-taught a course on R, and one of the topics I covered was basic plotting using the built-in R graphics facilities. It seems like a lot of the plotting issues that folks here are discussing are relatively simple things that would be covered by one of my lab example scripts. So, to increase the efficiency of Science, I give to you, a link to some of my example scripts, along with one of the figures that the scripts generate.*

      https://drive.google.com/a/vt.edu/folderview?hl=en&id=0B0otGHl-eevbcmtMS0JQYVgyaHc&tid=0B7JHuXiePi43Q3Y5VE5kSlRHSHc#list

      * Don’t judge the figures based on any biological relevance. The data just happens to be convenient and the plots are just to show graphic capabilities.

      • “A point that I have been trying to get across to one of my colleagues is that although there is an initial (and substantial) time investment to learning a scripting language like R/Python/Matlab/etc., increases in efficiency with continued use should, in the long term, more than make up for the initial time investment.”

        Judging when those up front investments are worth it in the long term, and when they’re not, is hard. It’s a hard judgement call to make in part because the experience of others isn’t always a good guide.

        For instance, building an audience for a blog requires an up front investment–you typically have to post often and keep it up for months before you get the payoff of an audience. But here’s the thing: the people who tend to make that investment are a non-random subset of all people. Specifically, they’re people who like blogging and are good at it. So the up front investment is less costly for them to make (because they like blogging), and is more likely to pay off (because they’re good at blogging). If you don’t have those attributes, making an up front investment in blogging is probably a bad idea.

        I have an old post on this, arguing that often it makes sense for people to stick with whatever way of doing things works for them: https://dynamicecology.wordpress.com/2013/03/25/advice-why-some-academics-shouldnt-read-blogs-or-use-twitter-or-facebook/

        I think the key thing is to not get stuck in an unthinking rut. Don’t keep doing things the way you’ve always done them purely out of habit, or because you didn’t know there was any alternative. Always keep an eye out for better ways of doing things, and tentatively try some of them out to see if they might work for you (posts and comment threads like this one are great for that!) And pay special attention to advice from people who are similar to you in the relevant ways.

      • @ Jeremy

        I completely agree with you that making a decision on whether or not to invest in making a change is a difficult, and personal, judgement. There’s LOTS of things I’d like to do, professionally, but there are not enough hours in the week.

        With regard to “building an audience for a blog requires an up front investment–you typically have to post often and keep it up for months before you get the payoff of an audience” – my most recent post was on that very topic, with a graph, made in Excel 🙂

        https://jeffollerton.wordpress.com/2015/02/24/building-a-blog-readership-takes-time/

      • Omnigraffle is beautiful. Personally I wouldn’t use it for labels (that’s kind of like using a KitchenAid blender as a sledgehammer to kill mosquitoes), but, for what it’s good for – dynamically laid-out graph/flowchart-like-things – it’s a great time-saver, with enough design sense that it “makes the right choice” often enough for many results to be used without further tweaking.

    • Ah, yes, Sigmaplot would have been a good one to add. I’ll see if I can edit it to add that in. I almost added SAS as a joke option, because I don’t know anyone who actually makes figures in SAS. But maybe some people do?

      • I couldn’t do the above figure that Meg shows in R in under a minute; maybe 15 minutes if the data for the plot was prepared and I had a good idea of how I wanted the final figure to look. But I could recreate the figure with altered data in seconds or re-arrange the panels accurately in seconds if something had to be changed. I’m reasonably certain that in the (not-so) long run I will win out using R over using something like SigmaPlot because of this, and not because R is faster to create plots in from scratch.

    • Another for SigmaPlot, even if I do the vast majority of my actual analyses in R. Learned it as an MS student; generally like it; have been too lazy to invest time in making better R figures. Someday, lack of a licence will force a change.

  2. I think the example is quite simple and I would try to do it using R only. However, when things turn out to be more complex I usually start with R, export the graph as SVG and use Inkscape for fine tuning.
    Inkscape is cool: elements are independent, text can be edited, it’s free, easy to use, light and easy to install (not like Illustrator…)

  3. I used to use SigmaPlot, which was then usually exported for tweaking, but a couple of years ago I started using ggplot2 in R and have never looked back. It requires an initial time investment, but once you understand *how* it works, it’s a massive time-saver – I’m one of the people that answered < 5 for the last question 😛 It's super powerful and you can make really complex plots, without exporting, etc.! I often have to look up details when doing something new, but Google/StackOverflow are awesome for that.

  4. I arrange my data in Excel, then import and make all my graphs in Prism, where you have a huge amount of flexibility, especially for arranging panels and adding labels etc very easily, plus exporting is very clean (auto-trimming borders, selecting resolution). I used to use PowerPoint and it was TERRIBLE for that. It really isn’t well designed for that purpose.

  5. It’s been a while since I used PowerPoint (or related software to create slide decks), preferring LaTeX/Beamer or HTML slidedecks these days, but for those of you moving figures from R to PowerPoint for editing and enhancement, what format are you saving the figure as from within R? The last time I tried to move anything from R to an Office product and have it remain editable, I had to use metafiles, which invariably was broken, seemingly on the MS Office side of the equation. Has this changed now?

    • I can easily export a plot from RStudio to PowerPoint by saving it to the clipboard, but only when working on a PC. This doesn’t work on my Mac, which is annoying.

      • So it is probably still going as some form of metafile then. I wondered if you could pull PDF figures in to PowerPoint now and edit them as native graphics elements. That this doesn’t work on a Mac therefore is not surprising at all.

    • I find that exporting from R to PDF gives higher quality images than R to JPG, so I generally export to PDF then copy/paste into Paint to make a JPG, which I can then use with Office. However, I’ve just learned about RMarkdown, which can export directly to MS Word format (or PDF, or HTML).

      • If you are preparing a figure for a print journal, never use JPG or PNG. Ideally use a vector-based format (like PDF, but many journals won’t accept PDF so you may need to convert to EPS later), or if the figure contains any raster elements or is very complex (lots of elements) a high-res TIFF is a reasonable alternative.

      • Unless the figure has raster elements or is complex, with many elements, I use vector-based formats like PDF, EPS, or even SVG if I need to go to Inkscape for some final tweaking. TIFF is a last resort if the figures are rasters or very complex. I never use PNG unless I’m preparing for the web, and that is only for blog posts or HTML slide decks.

    • For publications, I save the R figure as a pdf and then do any fine tuning in Illustrator. If I want to include a figure in a talk, I just import the pdf directly into PowerPoint. That way the file can remain small without becoming overly-pixellated after resizing.

    • If I need the figure in a format other than PDF (e.g. journal requirements), I save my figures as SVG and convert them via Inkscape to the desired format.

  6. I arrange and archive my data in Excel, I use Igor Pro and Prism for making graphs, then use Illustrator to pretty them up, change colors, etc. Sometimes I make the entire graph and layout in Igor since it is the most flexible. PowerPoint sucks for making figure layouts, and Excel sucks for graphing. Some lab peeps use Matlab for more complex data sets, but for the simple graphs in the example you show Matlab is total overkill; they could easily be handled entirely via Igor’s GUI. And if I needed to make a simple change to remove a dataset it would take all of 2 minutes. If I was using an Igor – Illustrator combo, then I would change the plot in Igor and replace the Illustrator layer that has the relevant data. So one extra step. I’ve never used R.

      • Igor is another one I’ve never heard of! The comments on this post have me wondering how people develop their figure-making routine. Is it based on what their advisor did? Feedback from colleagues? Blog posts?

      • @Meg: In my case I decided a long time ago to do exactly the opposite of what my PhD advisor and colleagues did when preparing their figures. That has served me well over the years 🙂

  7. In general, I don’t think post-processing as raster with programs such as PowerPoint or Photoshop is a good idea. Outputting or converting R plots to raster will nearly always lead to a loss of quality, and typically increase file size as well. Better to use the PDF output of R, and then a vector graphics program such as Illustrator or Inkscape.

    Sadly, even if one does that, many lower IF journals will still take the nice vector file and transfer it to a larger and ugly .jpg.

    • If you bring the figure in from R as a metafile and then pull that into Word after editing in PowerPoint, you’ll be using a vector-based format, either a metafile or an enhanced metafile (I forget what is used now). Those can reproduce with good quality if using MS Office tools to generate the final outputs.

      • OK, if you can keep everything as emf and everyone is working in an MS office environment, that’s fine I guess. I had the impression from the discussion that many people convert to tiff or another raster format though. In general I think that pdf is currently the most widely readable, reusable and changeable file format across the different OS, followed by eps and svg.

  8. While I know a lot of people use R to build figures, I had no idea the response would have that kind of distribution. (More difficult to answer question: how much does the Dynamic Ecology poll-answering readership represent the field? What would the distributions look like with time-since-started-grad school crosstabs?)

    • Answers would be: It doesn’t. And very different! I have no idea why some people think Excel is no good for graphs; for the kind of data I collect I’ve yet to encounter a graph that I couldn’t make in Excel. And certainly all of the graphs in that panel could be done in minutes (and better).

      Perhaps it’s my age…..

  9. Goal – generate the entire Figure in R

    One point I haven’t seen is that your way of thinking about how to communicate a result graphically will be constrained by your tools. I don’t like this constraint, because I often let plots drive the way I think about an analysis. R (or Matlab or Python or whatever) frees your mind! But this means that I need to be able to create novel plots or modify existing plot types. An example: I like box plots but don’t find the whiskers intuitive. So I created “error interval” plots using the tools of ggplot2 (Any figure from this paper: file:///Users/jeffwalker1/Downloads/Evolution-2014-Walker-Confounding.pdf). You may or may not like them but the point is that I wasn’t constrained by what is available in a canned graphics program.

    Indeed, I love looking at the different plots people create using R or ggplot2. I have a paper in review now that uses a double-direction heat map to illustrate the magnitude of the correlations in a 10 x 10 correlation matrix (near zero is white, closer to -1 is dark blue, closer to +1 is dark red). This is so much more effective than a table of numbers for seeing patterns. I wouldn’t have done this unless I had played around with heat maps for fun non-science stuff (like a calendar heat map of my running miles or something equally stupid).
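
The colour scheme described above (near zero is white, closer to -1 is dark blue, closer to +1 is dark red) amounts to a two-sided linear colour ramp. Here is a minimal Python sketch of that mapping, assuming simple linear interpolation in RGB; the function name `correlation_to_hex`, the exact end colours, and the small example matrix are illustrative assumptions and are not taken from the paper.

```python
def correlation_to_hex(r):
    """Map a correlation in [-1, 1] to a hex colour.

    0 maps to white, -1 to dark blue and +1 to dark red, with linear
    interpolation in RGB between white and the end colour.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    white = (255, 255, 255)
    # End colours are illustrative choices, not taken from the paper.
    end = (139, 0, 0) if r > 0 else (0, 0, 139)
    t = abs(r)
    rgb = [round(w + t * (e - w)) for w, e in zip(white, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# A hypothetical 3 x 3 correlation matrix, for illustration only.
corr = [
    [1.0, 0.5, -1.0],
    [0.5, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
]
colours = [[correlation_to_hex(r) for r in row] for row in corr]
for row in colours:
    print(" ".join(row))
```

Each cell of the matrix then gets a fill colour, so strong positive and strong negative correlations stand out against the near-white cells.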

    ggplot2 is a significant advance from how I used to do things! For example, Fig. 3 here: http://jeb.biologists.org/content/201/7/981.full.pdf (and the other boxplots) was coded in Pascal because I didn’t like the boxplot routines of any canned graphics program available for the Mac in the late 1990s. I then touched it up in Freehand, which was awesome but ultimately lost out to Illustrator.

    Once I used Freehand (or Illustrator or Inkscape) to make illustrations or touch up figures, I could never use something like PowerPoint or Excel to make or even touch up a graph. They are tedious and cumbersome, and if they are flexible I don’t know how to find that flexibility.

    • It’s fun to go back to old issues of Science and see hand drawn lettering. But mechanical tools for lettering, splines, lines, regular shapes, etc. have been around for what, centuries if not millennia? My wife’s first job was creating charts for the biomedical scientists at Penn using all these mechanical tools. When the department bought the very first Mac (version 1) it was revolutionary! Another blast from the past is a 3-D scatterplot published in a paper by Sokal and/or Rohlf in which they built a truly 3-D diorama using styrofoam board, styrofoam balls, and stiff wire to position the balls, and then snapped a photo of that and published it!

  10. Years ago: Excel
    The past several years: R (regular plot and ggplot2) followed by touch-up in Paint.NET
    Now: Python with matplotlib

    Each move has been a huge step up. Why spend 5-30 minutes redrawing a figure?! (Have baby, no time for that.) All-in-one program. Drop the data and redraw.

  11. I avoid postediting a figure if at all humanly possible. The ability to go back to a graph creating script, tweak one thing the reviewers requested and have the whole figure redrawn in 30 seconds (including all the label font resizing etc) is hard to beat. But then I hate twiddling with small things like alignment and fontsize in graphing programs.

    With the basic graphics in R there was no question that Matlab (or the very similar Python/matplotlib, which is lifted from Matlab) was a superior environment for fully scripting publication quality graphs. With ggplot the playing field has gotten more even, but I still find Matlab/matplotlib more able to do the kinds of tweaks most people are doing in Illustrator/Inkscape, etc.

  12. Processing for representing the data, Adobe InDesign for the layout and labels – but data representation is my research area, so optimizing the plotting routines is half the purpose of what I’m doing anyway.

    I would /strongly/ recommend that people currently using Illustrator for labels/etc, consider looking into InDesign instead. Illustrator is optimized for creating vector art, its layout and labeling features are add-on convenience functions. InDesign is optimized for layout, and its workflow is better optimized for this than Illustrator’s.

  13. I used to do 90% in R, and then touch-up using Inkscape. For one of my last papers, however, I made an effort to properly learn the formatting options of ggplot2, specifically to make the entire figure within R. It was for Global Change Biology, which also has very specific figure guidelines (such as “Scale/tick marks on graphs should be inside the axes”, “Only 5-7 ticks should be labeled per axis”, or “Symbols should be 3 mm across. Data lines should be 0.5 mm thick”). Nothing you’d hang on your wall, but I quite liked the result: http://onlinelibrary.wiley.com/store/10.1111/gcb.12308/asset/image_n/gcb12308-fig-0002.png?v=1&t=i6nwp63r&s=61ce8ba365dda0aa66ecfd0f702536a367399d75.

    I think that the strict guidelines actually encouraged me to do it, as I thought “if I need to be this specific, I’d rather do it only once”. One nice thing about ggplot2 is that I could create an object “gcb.opts” with all the formatting parameters, and then just add “+ gcb.opts” after each plot command. This means I had to spend time once to implement all the guidelines as ggplot options, and have it immediately applied to all six plots in the paper. And I’m pretty sure you can do the same thing using par() on base graphics.
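
The "define the formatting once, reuse it for every plot" approach described above carries over to other scripting tools. As a hedged illustration in Python rather than ggplot2, the sketch below converts the millimetre sizes quoted from the guidelines into points and collects them in one shared dictionary. The name `JOURNAL_STYLE` is hypothetical, and the keys follow matplotlib's rcParams naming, which is an assumption about the plotting library since the comment itself uses ggplot2.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0


def mm_to_points(mm):
    """Convert a length in millimetres to typographic points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


# Hypothetical shared style, written once and reused for every plot.
# The keys follow matplotlib's rcParams naming; using matplotlib here
# is an assumption, since the comment describes ggplot2.
JOURNAL_STYLE = {
    "lines.linewidth": mm_to_points(0.5),   # data lines 0.5 mm thick
    "lines.markersize": mm_to_points(3.0),  # symbols 3 mm across
    "xtick.direction": "in",                # tick marks inside the axes
    "ytick.direction": "in",
}

for key, value in JOURNAL_STYLE.items():
    print(key, value)
```

With matplotlib installed, a dictionary like this could be applied around the plotting code with `matplotlib.rc_context(JOURNAL_STYLE)`, so a change in the guidelines only has to be made in one place.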

    Nowadays, the only thing I don’t use ggplot2 for is plots with two Y axes. It’s not possible to do it because Hadley Wickham is explicitly against it (http://stackoverflow.com/questions/3099219/how-to-use-ggplot2-make-plot-with-2-y-axes-one-y-axis-on-the-left-and-another).

    I agree with some of the criticism, but I still think they’re useful if you want to show how two time series co-vary in time. Last time I wanted to do one of those, I used base R.

  14. As a PhD student (of biology/ecology at the University of Turku, Finland) we were taught to do figures with Sigmaplot. So I’ve used it ever since. It is a somewhat complicated program at times, but you get pretty pictures with it.

  15. I’m going to stick my nose in and make a controversial suggestion: It might be appropriate, in many situations, rather than “finding the right software/approach” for making a figure, to instead find a collaborator in the field of data visualization and analytics and enlist their aid in conveying your data.

    Personally, I’m fond of trying to be a “renaissance person”. I completely understand the drive to be able to do everything oneself, and nothing I write should be considered to be a recommendation _not_ to learn how to do data-presentation as effectively as possible but:

    Imagine running across a paper from a Visualization scientist, presenting ecological data, where the Vis scientist had clearly never spoken to an ecologist. Unless you’re quite unlike the microbiologists I’m usually telling this to, that idea (rightly) provokes some eye-rolling and groans. That reaction is true from the other side of the coin as well. (Interestingly, the Vis/Analytics folks would almost never believe that their training prepared them to adequately do ecology or microbiology, but the reverse is not true. I blame the chapter on data analysis in the back of every basic-science-field-X book for this phenomenon. The concept of a Computer science book with a chapter in the back on Microbiology or Ecology, is ludicrous… Anyway, that’s a topic for a different post.)

    There are people who get advanced graduate degrees in data visualization and presentation, and they don’t get those degrees just for knowing where all the right buttons are in Excel, or other canned graph-producing package 🙂 The kicker is that they _desperately_ need you, to provide context for what they do. All the perfectly-designed visualization in the world doesn’t amount to a hill of beans if it’s not visualizing something meaningful, and communicating the content to people who need better insight into the data.

    So, the next time you’re thinking “how should I make this figure”, consider thinking “maybe I shouldn’t. Maybe I should collaborate”. There are people out there who would love to help you present your science more effectively, you’d be helping them advance their own science at the same time, and, you’d both get to concentrate on the areas where you’ve invested extraordinary amounts of effort in exquisite training.
