Comparison of ways of visualizing individual-level Likert data: line plots and heat maps and mosaic plots, oh my!

Last week, I wrote a post about how my training in evolutionary ecology led me to try reaction norms (that is, paired line plots) for plotting paired Likert data. I had already tried a few other options but didn’t include them in that post, and feedback on the post gave me more ideas. There was also a request for the code used to generate those plots. So, this post shows four different ways of visualizing individual-level responses to paired Likert-scale questions (paired line plots, dot plots, mosaic plots, and heat maps). It does that for two different comparisons, which led me to conclude that the type of plot that works best depends on your data. I’d love to hear which ones you think work best; there are polls where you can vote for your favorite! And, if you’re working on similar data and want to see code, there’s an associated GitHub repo, with the disclaimer that my code is good enough, but definitely not elegant.

Do you think climate change is happening? Overall class responses

I’ve been mostly interested in exploring how to look at individual-level paired responses, but I’ll start with some plots showing summaries of Likert-scale data. Everything I present in this post is based on a collaborative project with Susan Cheng and JW Hammond that seeks to understand how student understanding of climate change shifts over the course of introductory biology. As part of the survey for that project, we asked students:

Do you think that climate change is happening?

– Yes, and I’m extremely sure

– Yes, and I’m very sure

– Yes, and I’m somewhat sure

– Yes, but I’m not at all sure

– No, and I’m extremely sure

– No, and I’m very sure

– No, and I’m somewhat sure

– No, but I’m not at all sure

– I don’t know

We asked this question at both the beginning and the end of the semester. We were interested in understanding both what students think at the beginning of the semester and how those views change (or don’t) over the course of the semester.

Here was the first plot I made to look at the data:

[Figure: Two bar plots, beginning of the semester on top and end of the semester on the bottom. Most students chose “Yes, and I’m extremely sure” both times, but the distribution is much more skewed toward that answer at the end.]

Not bad, and we could definitely make those x-axis labels neater, but we’re probably not going to use this plot, so let’s move on. Would it be easier to compare if we plot things side-by-side?

[Figure: The same data as in the previous plot, but with the beginning and end bars side by side in different colors.]

Again, that could use more work if we were going to use it, but we’re not (I actually think the first one looks better), so let’s move on.

I knew, based on earlier work I had done, that I would probably want to use the excellent likert package for R. Let’s try that:

[Figure: Horizontal stacked bars, beginning of the semester on top and end of the semester on the bottom. The largest section of each is dark green, indicating “Yes, and I’m extremely sure”.]

Pretty! That’s definitely the best of the visualizations of the aggregate data! The likert package plots put all the “yes” answers on the right in green (with darker shades of green indicating more sure responses). The “no” responses are on the left of the plot, but there are so few that you can barely see them. The “I don’t know” responses are in gray in the middle. The percentages give the total percent of students in the “no” categories (on the left), in “I don’t know” (in the middle), and in the “yes” categories (on the right). So, in short: students entered the course overwhelmingly agreeing that climate change is occurring, and became more sure of that over the course of the semester.
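The base-R sketch below shows the arithmetic behind those three percentages. It is not the likert package’s code. It collapses the nine options into “No”, “I don’t know”, and “Yes”, then computes row percentages for each time point. The responses are made up with sample() and are not the study’s data.

```r
# Made-up responses for illustration only; these are NOT the study's data.
cc_levels <- c("No, and I'm extremely sure", "No, and I'm very sure",
               "No, and I'm somewhat sure", "No, but I'm not at all sure",
               "I don't know",
               "Yes, but I'm not at all sure", "Yes, and I'm somewhat sure",
               "Yes, and I'm very sure", "Yes, and I'm extremely sure")
set.seed(1)
n <- 100
pre  <- sample(cc_levels, n, replace = TRUE,
               prob = c(1, 1, 1, 1, 3, 2, 10, 30, 51))
post <- sample(cc_levels, n, replace = TRUE,
               prob = c(0, 1, 0, 1, 2, 0, 5, 20, 71))

# Collapse the nine options into the three groups whose totals a likert-style
# plot prints: "No" (left), "I don't know" (middle), "Yes" (right).
collapse_cc <- function(x) {
  grp <- ifelse(startsWith(x, "No"), "No",
                ifelse(startsWith(x, "Yes"), "Yes", "I don't know"))
  factor(grp, levels = c("No", "I don't know", "Yes"))
}

survey_time <- factor(rep(c("Beginning", "End"), each = n),
                      levels = c("Beginning", "End"))
group <- collapse_cc(c(pre, post))

# Row percentages: each time point (row) sums to 100.
pct <- 100 * prop.table(table(survey_time, group), margin = 1)
print(round(pct, 1))
```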

But remember that one of the things we are interested in is comparing responses of individuals. We have no idea how individual student views shifted (or didn’t). Do some students change a lot? Do some become less sure that climate change is happening? We can’t tell that from the previous plots. So, let’s move on to try to get a sense for how views of individual students changed, comparing four different options.

Do you think climate change is happening? Comparison of four ways of visualizing paired, individual-level, Likert-scale data

As I wrote about in my earlier post, my first thought was to use paired line plots, based on reaction norms in evolutionary ecology. Here’s a paired line plot for this data:

[Figure: Paired line plot with lines running from “Beginning” on the left to “End” on the right. Most lines sit near the top (yes, and very or extremely sure), and most are flat or slope upward.]

For this paired line plot, student responses at the beginning and end of the semester are connected by a line. Lines are partially transparent and slightly jittered, so lines that appear darker and thicker represent more common combinations of responses. This figure lets us see that most students became more sure or stayed equally sure (that is, most lines go up or are flat). Some students make very large leaps (a line with a large positive slope), and a few students become less certain (as indicated by a line with a negative slope).
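The paired line plot described above joins each student's two answers with transparent, slightly jittered lines. Here is a minimal base-R sketch of it. It is a stand-in, not the code behind the figure; the post's other figures appear to have been made with ggplot2, and the real code is in the GitHub repo.

Assumptions to note:
- The responses are made up.
- Answers are coded 1 to 9, from “No, and I’m extremely sure” to “Yes, and I’m extremely sure”.
- The jitter and transparency amounts are arbitrary choices.

```r
# Made-up paired responses, NOT the study's data. Responses are coded 1-9,
# from 1 = "No, and I'm extremely sure" to 9 = "Yes, and I'm extremely sure".
cc_levels <- c("No, and I'm extremely sure", "No, and I'm very sure",
               "No, and I'm somewhat sure", "No, but I'm not at all sure",
               "I don't know",
               "Yes, but I'm not at all sure", "Yes, and I'm somewhat sure",
               "Yes, and I'm very sure", "Yes, and I'm extremely sure")
set.seed(42)
n <- 60
pre_score <- sample(1:9, n, replace = TRUE,
                    prob = c(1, 1, 1, 1, 3, 2, 10, 30, 51))
# Hypothetical change: most students stay put or move up; clamp to 1-9.
shift <- sample(c(-1, 0, 1, 2), n, replace = TRUE, prob = c(1, 5, 3, 1))
post_score <- pmin(9, pmax(1, pre_score + shift))

# Jitter both ends slightly so identical pairs don't draw exactly on top of
# each other (the amounts are arbitrary).
y_pre  <- jitter(pre_score,  amount = 0.15)
y_post <- jitter(post_score, amount = 0.15)
x_pre  <- jitter(rep(1, n), amount = 0.03)
x_post <- jitter(rep(2, n), amount = 0.03)

old_par <- par(mar = c(3, 14, 1, 1))
plot(NA, type = "n", xlim = c(0.8, 2.2), ylim = c(0.5, 9.5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = 1:2, labels = c("Beginning", "End"))
axis(2, at = 1:9, labels = cc_levels, las = 1, cex.axis = 0.6)
# Semi-transparent lines: common pairs overlap and so look darker and thicker.
segments(x_pre, y_pre, x_post, y_post,
         col = adjustcolor("black", alpha.f = 0.2), lwd = 2)
par(old_par)
```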

I like that way of visualizing the data, but wanted to explore others. I originally tried making a heat map. My first attempt at that failed (more on heat maps below), but led me to this plot instead, which I also like:

[Figure: Dot plot of the same data, with the beginning-of-semester response on the x-axis and the end-of-semester response on the y-axis. Most points are in the upper right corner.]

In this plot, each student is a dot, and the dots are partially transparent and jittered. In my opinion, this plot makes it pretty clear that most students start out very sure or extremely sure and end up extremely sure that climate change is occurring. One thing I like about this visualization is that I think it makes it clearer that there’s a ceiling on how high the responses go. I don’t think that was as clear from the earlier paired line plot.

(I’d started this plot before my first blog post on paired line plots, but didn’t get it into that post. I was encouraged to see this style of plot recommended on Twitter by C. Savio Chan!)
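Here is a minimal base-R sketch of this kind of dot plot, using made-up responses coded 1 to 9 as in a paired line plot. Each student is a jittered, semi-transparent point, so a pair of answers that many students gave forms a small cluster rather than a single dot.

Two choices in the sketch are worth pointing out:
- The axis limits and labels are fixed, so options nobody chose still appear on both axes.
- A dashed 1:1 line is added. Points above it are students whose end-of-semester answer sits higher on the scale than their beginning answer.

The jitter amount and colors are arbitrary. This is not the code behind the figure above.

```r
# Made-up paired responses, NOT the study's data (coded 1-9 as above).
cc_levels <- c("No, and I'm extremely sure", "No, and I'm very sure",
               "No, and I'm somewhat sure", "No, but I'm not at all sure",
               "I don't know",
               "Yes, but I'm not at all sure", "Yes, and I'm somewhat sure",
               "Yes, and I'm very sure", "Yes, and I'm extremely sure")
set.seed(42)
n <- 60
pre_score <- sample(1:9, n, replace = TRUE,
                    prob = c(1, 1, 1, 1, 3, 2, 10, 30, 51))
shift <- sample(c(-1, 0, 1, 2), n, replace = TRUE, prob = c(1, 5, 3, 1))
post_score <- pmin(9, pmax(1, pre_score + shift))

# Jitter each point within its cell and make points semi-transparent, so a
# pair of responses given by many students shows up as a dense cluster.
x_jit <- jitter(pre_score,  amount = 0.25)
y_jit <- jitter(post_score, amount = 0.25)

old_par <- par(mar = c(12, 12, 1, 1))
# Fixed limits plus axes with all nine labels keep unchosen options visible.
plot(x_jit, y_jit, pch = 16, col = adjustcolor("navy", alpha.f = 0.35),
     xlim = c(0.5, 9.5), ylim = c(0.5, 9.5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = 1:9, labels = cc_levels, las = 2, cex.axis = 0.6)
axis(2, at = 1:9, labels = cc_levels, las = 1, cex.axis = 0.6)
mtext("Beginning of semester", side = 1, line = 10.5)
mtext("End of semester", side = 2, line = 10.5)
# Dashed 1:1 line: points above it moved toward "Yes, and I'm extremely sure".
abline(a = 0, b = 1, lty = 2, col = "grey50")
par(old_par)
```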

In response to my earlier post, Hadley Wickham asked on Twitter if I’d tried a mosaic plot. I hadn’t heard of them before, but they looked worth trying. Here’s the result:

[Figure: Mosaic plot of rectangles of different sizes, where each rectangle’s size indicates the number of students who gave that pair of responses. Rectangles are colored by the end-of-semester response.]

This also shows that the “very” and “extremely” sure options dominated at the beginning of the semester, and that the “extremely sure” option dominated at the end, but I find it less intuitive to read than the dot plot. It’s also a little awkward that a bunch of the options (especially at the end of the semester) weren’t chosen by any students, leading to all those axis labels getting compressed. I didn’t really invest time in trying to figure out a solution for that issue, though, since I wanted to get back to heat maps.
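If you want to try a mosaic plot without extra packages, base R’s mosaicplot() draws one from a cross-tabulation. Here is a minimal sketch on made-up data, not the code behind the figure above. It keeps all nine options on both axes, which reproduces the squashed-label problem. It also shows droplevels() as one way to remove options nobody chose.

```r
# Made-up responses, NOT the study's data.
cc_levels <- c("No, and I'm extremely sure", "No, and I'm very sure",
               "No, and I'm somewhat sure", "No, but I'm not at all sure",
               "I don't know",
               "Yes, but I'm not at all sure", "Yes, and I'm somewhat sure",
               "Yes, and I'm very sure", "Yes, and I'm extremely sure")
set.seed(3)
n <- 100
pre  <- factor(sample(cc_levels, n, replace = TRUE,
                      prob = c(1, 1, 1, 1, 3, 2, 10, 30, 51)),
               levels = cc_levels)
# Options 1, 3 and 6 get probability 0, so nobody picks them at the end.
post <- factor(sample(cc_levels, n, replace = TRUE,
                      prob = c(0, 1, 0, 1, 2, 0, 5, 20, 71)),
               levels = cc_levels)

# Rows = beginning response, columns = end response (all nine levels kept).
tab <- table(Beginning = pre, End = post)

# Column widths show how common each beginning response was; within a column,
# tile heights show how those students answered at the end. A vector of colors
# shades the tiles by the levels of the last variable (End).
mosaicplot(tab, color = colorRampPalette(c("grey90", "darkgreen"))(9),
           las = 2, cex.axis = 0.5, main = "")

# Alternative: drop options nobody chose, to avoid empty, squashed labels.
tab_dropped <- table(Beginning = droplevels(pre), End = droplevels(post))
```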

I was still stuck on heat maps, but my collaborator Susan came to my rescue, resulting in this heat map:

[Figure: Heat map that is mostly white, with dark blue in the upper right.]

That is, um, pretty boring. Part of this is because we let (some) of the other plots drop unused levels from the axes, but forced all levels onto this one (and the mosaic plot).

(The sharp-eyed R folks might realize that the heat map isn’t made in ggplot. Susan is a lattice person, and I haven’t invested the time in trying to make it in ggplot.)
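The empty-level issue comes up with every one of these plots, so here is a minimal base-R sketch of a heat map that forces all nine options onto both axes. It does this by fixing the factor levels before cross-tabulating. This is not Susan’s lattice code, which isn’t shown here; it uses image() instead, and the data are made up.

```r
# Made-up responses, NOT the study's data.
cc_levels <- c("No, and I'm extremely sure", "No, and I'm very sure",
               "No, and I'm somewhat sure", "No, but I'm not at all sure",
               "I don't know",
               "Yes, but I'm not at all sure", "Yes, and I'm somewhat sure",
               "Yes, and I'm very sure", "Yes, and I'm extremely sure")
set.seed(3)
n <- 100
# factor(..., levels = cc_levels) forces all nine levels, even unchosen ones.
pre  <- factor(sample(cc_levels, n, replace = TRUE,
                      prob = c(1, 1, 1, 1, 3, 2, 10, 30, 51)),
               levels = cc_levels)
post <- factor(sample(cc_levels, n, replace = TRUE,
                      prob = c(0, 1, 0, 1, 2, 0, 5, 20, 71)),
               levels = cc_levels)

# Counts as a plain numeric matrix: rows = beginning, columns = end.
tab <- table(Beginning = pre, End = post)
z <- matrix(as.numeric(tab), nrow = nrow(tab))

old_par <- par(mar = c(12, 12, 1, 1))
# image() puts rows of z along the x-axis and columns along the y-axis;
# x = 1:9 and y = 1:9 are treated as cell centers.
image(x = 1:9, y = 1:9, z = z,
      col = colorRampPalette(c("white", "navy"))(20),
      xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = 1:9, labels = cc_levels, las = 2, cex.axis = 0.6)
axis(2, at = 1:9, labels = cc_levels, las = 1, cex.axis = 0.6)
par(old_par)
```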

I gave some of my thoughts above, and will give more below, but first I want to ask: which of these four do you find most compelling?

I thought I was going to prefer the paired line plot but, in the end, I think I prefer the dot plot for this data.

When I first made the paired line plots, I got a little paired-line-plot happy and started making them for all sorts of things. They seemed so great for the first things I plotted that they suddenly became my favorite visualization tool. Except then I moved on to a different question…

Are the students who change their views during the course the ones who thought they could change their views at the beginning?

At the beginning of the course, we asked students:

How much do you agree or disagree with the following statement? “I could easily change my mind about climate change.”

– Strongly agree

– Somewhat agree

– Neither agree nor disagree

– Somewhat disagree

– Strongly disagree

At the end of the semester, we asked them:

This course changed how I think about climate change:

– Strongly agree

– Somewhat agree

– Neither agree nor disagree

– Somewhat disagree

– Strongly disagree

One of the things we wondered was whether the students who reported that the course changed their thinking were the ones who thought they could easily change their minds. So, I immediately busted out a paired line plot…

[Figure: Paired line plot in which lines move up, move down, and stay flat, spanning all possible responses.]

…and my head promptly exploded. I had no idea what to make of that, and thought maybe it just meant it was all random. This was actually what led to the plan to try heat maps. In the end, I used all the other approaches plotted above.

Here’s the dot plot:

[Figure: Dot plot with much more scatter than the first dot plot.]

 

And the mosaic plot:

[Figure: Mosaic plot of rectangles of different sizes. Most are blue, indicating “Somewhat agree” at the end of the semester.]

And the heat map:

[Figure: Heat map in which most cells below the diagonal are white and most cells above the diagonal are various shades of blue.]

For this comparison, which do you think works best?

Personally, I think the heat map works best for this comparison. It makes clear three things:
- At the beginning of the semester, students tended to disagree that they could easily change their minds.
- At the end of the semester, most students agreed that the course changed how they think about climate change.
- There’s a lot of variation in individual-level pairings within those broad patterns.

In case you’re wondering about this comparison: at first it seems a little odd that students said they couldn’t easily change their minds but then reported that the course changed their thinking. I think this is probably driven by students interpreting the two questions differently. My guess is that most students read the beginning-of-semester statement as something like “I could easily change my mind about whether climate change is occurring.” If my guess is right, then I think they were correct: they don’t change their views on that. But I suspect they read the end-of-semester statement along the lines of “I know a lot more about climate change now and/or think differently about the ways it’s happening than I did before.” I’m glad they agree with that!

If you have thoughts on these visualizations (or the data), we’d love to hear them!

 

Note: we had IRB approval for this work, which was more extensive than what I am presenting above. One notable difference is that the analyses above are just for data collected in one semester, whereas the full study includes data from two semesters. We’re working on the manuscript now!

22 thoughts on “Comparison of ways of visualizing individual-level Likert data: line plots and heat maps and mosaic plots, oh my!”

  1. I actually think that the paired bar plot with peach and turquoise bars tells the story clearest. Keep it simple! The fact that the size of the bars changes pre-post tells you automatically that some students changed their views. For further analysis of viewpoint changes I would use an actual table, because frankly, I find all the other plot-based devices confusing (of course your choice partly depends on the audience).

    • Hmm. I’m with Meghan on this one–if forced to choose between the two histogram plots, I’d choose the one-above-the-other plot rather than the two-interdigitated-histograms plot. I want to be able to see each histogram separately in order to compare them. Interdigitating the bars interferes with my ability to do that.

      If you’re going to put two histograms on the same graph, I think it’s best to do them as two line charts rather than as interdigitated bars.

      • p.s. I say this as someone who has published numerous bar graphs with interdigitated bars over the years. I now wish I could go back and change some of those figures.

    • My understanding is that tables are the norm in the education world, so I’ve wondered whether we should have both. But this manuscript on student views on climate change already has too many figures, so I’m not sure my idea to present things in figure *and* table form is actually a good one.

  2. The dot plots are good, but I think you could get more out of them with a slightly different approach by summarizing the data and using a scatter plot with color illustrating the degree of change and size indicating the number of responses:

    Transform the data with a row for each possible outcome before and after. If your categories are: “strongly disagree,” “disagree,” “don’t know,” “agree,” “strongly agree,” you’ll have 25 possible outcomes.

    Rank each category by the degree of change, negative to positive (-4 to +4)

    Count the number of responses for each category

    Create a scatterplot with:
    – diverging color by degree of change
    – scale marks by number of responses

    For my money, the number of possible outcomes for the first question is a bit high. I would rank by grouping some responses, e.g., combining “No, and I’m somewhat sure” and “No, but I’m not at all sure” into a single response or rank, but as Jeremy says, your mileage may vary.

    I made a plot of this in Tableau that’s cool, but there are some problems with the way Tableau does the labels that I can’t get around right at the moment. If I get that figured out I’ll post a link.

    • Not a Tableau expert, but here’s my take on the general plot style:

      https://public.tableau.com/profile/james4740#!/vizhome/LikertData_0/Sheet1

      It says the same thing as your dot plots, but according to data viz lore the general rule is to associate colors with categories (change rank) and quantitative values with size or position (number of responses in each category). The ranks are fixed by position so the color isn’t strictly necessary, but I think it’s easier to recognize quickly and has the advantage of providing a legend.

      I dunno, I guess it’s one other way of doing it.

      would be interesting to see how it would work with a larger number of responses

      • Thanks for sharing this! I hadn’t heard of this style of plot before. It took me some time to figure out what was going on, but once I did I could see the appeal of these plots.

        I spent time this morning working on making one in R. I suspect I’m most of the way there, but I also need to move on to some other things now, so don’t have time to complete it. Part of the issue is the number of comparisons. As you noted, there are a lot of groupings. That is because the question was based on the one asked in the Yale Climate Survey (so that we could compare it with their findings).

      • I tried making one in R too; I’ve lost it now but if I recall correctly it “worked” technically but not so great from the presentation perspective. I’m not very proficient at R though.

      • One thing that bugs me about the Tableau plot is that the way size scales with the number of responses seems misleading. I feel like the size difference between 1 and 3 responses seems much greater than the size difference between 3 and 6 responses.

  3. I’m also in the midst of trying to create a viz for paired Likert data responses from a survey (which is why I’m guessing your article showed up in my feed, thanks to the all-knowing, ever-listening Google lol). I’m working in Power BI, not straight R, and am wondering: given that both scales are discrete and ordinal, how did you get your dot plot to look like mini clusters rather than just one dot or sized bubble? Thanks in advance for any insight!

  4. I like the paired line, but also like the dot plot (being able to tell how people move based on whether they are above or below the 1:1 line). However, I think there is a bug in your dot plot because “Yes, but not sure” is missing on the y-axis, which throws off how we view the one to one line. I’d also add the 1:1 line to that plot to emphasize how people move.

    • Good point about the 1:1 line! I can add that.

      It’s dropping “Yes, but not sure” in this case because no students chose it on the post-survey. If this were the final plot, I would need to figure out how to add it. But the full dataset does include someone who picked it, so it’s not an issue for the manuscript, fortunately!

  5. I was thinking about ways that ecologists visualize bipartite interaction data, too, e.g., in plant-pollinator interactions. The thing I liked was showing the frequencies of each response and also the frequencies of changes between responses; I didn’t work to make this graph pretty at all. Below is R code for Meg’s data from GitHub:

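    ### What about a bipartite graph? (see the igraph package too)
    # install.packages("bipartite")  # run once if bipartite isn't installed
    library(bipartite)

    # Survey data from the post's GitHub repo: one row per student
    dat <- read.csv("ccsurvey_F17_v2.csv")

    # Cross-tabulate responses: rows = pre-semester, columns = post-semester
    web <- table(dat$CC_ChangeMind_pre, dat$CC_ChangeMind_post)

    # Draw pre and post responses as two sets of boxes joined by links;
    # link width reflects how many students gave each pre/post pair
    plotweb(web, arrow = "down")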

  6. The first dot plot has 5 categories on the Y axis and 6 on the X, which I find very confusing. Was that intentional? (i.e. at the end of the semester no one had chosen that other category?).
    Love the post in general – so great to think about how to visualize this info and see all the plots.

    • Correct, in this dataset (from the first semester where we did this survey), no one chose that option. In the second semester, a few students chose that option, so the plot of the full dataset has the same choices on the x- and y-axes, fortunately!

  7. Pingback: Should we aim to leave students feeling empowered to tackle climate change? Should we try to give them hope? And, if so, how? | Dynamic Ecology
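The second comment above lays out four steps:
1. One row per possible before/after outcome.
2. A change rank from -4 to +4.
3. A count for each outcome.
4. A scatterplot with diverging color by change and symbol size by count.

Those steps can be sketched in base R. This is a minimal illustration on made-up data using the generic five-point scale from that comment; it is not the Tableau version linked there. One design choice may help with the concern in that thread that sizes look misleading. R’s cex scales a symbol’s diameter, so setting it proportional to the square root of the count makes symbol area proportional to the count.

```r
# Made-up responses, NOT the study's data, on a generic five-point scale.
lik5 <- c("strongly disagree", "disagree", "don't know",
          "agree", "strongly agree")
set.seed(7)
n <- 80
pre  <- factor(sample(lik5, n, replace = TRUE), levels = lik5)
post <- factor(sample(lik5, n, replace = TRUE), levels = lik5)

# Steps 1 and 3: one row per possible (pre, post) outcome, 5 x 5 = 25 rows,
# with the number of students in each (zero counts included) in column Freq.
outcomes <- as.data.frame(table(pre = pre, post = post))

# Step 2: degree of change, from -4 (strongly agree -> strongly disagree)
# to +4 (strongly disagree -> strongly agree).
outcomes$change <- as.integer(outcomes$post) - as.integer(outcomes$pre)

# Step 4: scatterplot of outcomes that actually occurred. Fill color diverges
# with the change rank; symbol diameter grows with sqrt(count), so symbol
# area is proportional to count.
pal <- colorRampPalette(c("firebrick", "grey85", "steelblue"))(9)
shown <- outcomes[outcomes$Freq > 0, ]

old_par <- par(mar = c(8, 9, 1, 1))
plot(as.integer(shown$pre), as.integer(shown$post),
     pch = 21, col = "grey30", bg = pal[shown$change + 5],
     cex = 5 * sqrt(shown$Freq / max(shown$Freq)),
     xlim = c(0.5, 5.5), ylim = c(0.5, 5.5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(1, at = 1:5, labels = lik5, las = 2, cex.axis = 0.6)
axis(2, at = 1:5, labels = lik5, las = 1, cex.axis = 0.6)
mtext("Before", side = 1, line = 6.5)
mtext("After", side = 2, line = 7.5)
par(old_par)
```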
