Last week, I wrote a post about how my training in evolutionary ecology led me to try reaction norms (that is, paired line plots) for plotting paired Likert data. I had already tried a few other options but didn’t include them in that post, and the feedback I got on it gave me more ideas. There was also a request for the code that generates those plots. So, this post shows four different ways of visualizing individual-level responses to paired Likert-scale questions: paired line plots, dot plots, mosaic plots, and heat maps. It does that for two different comparisons, which leads me to the conclusion that the type of plot that works best will depend on your data. I’d love to hear which ones you think work best; there are polls where you can vote for your favorite! And, if you’re working on similar data and want to see code, there’s an associated GitHub repo, but it comes with the disclaimer that my code is good enough, but definitely not elegant.
Do you think climate change is happening? Overall class responses
I’ve been mostly interested in exploring how to look at individual-level paired responses, but I’ll start with some plots showing summaries of Likert-scale data. Everything I present in this post is based on a collaborative project with Susan Cheng and JW Hammond that seeks to understand how student understandings of climate change shift over the course of introductory biology. As part of that project’s survey, we asked students:
Do you think that climate change is happening?
– Yes, and I’m extremely sure
– Yes, and I’m very sure
– Yes, and I’m somewhat sure
– Yes, but I’m not at all sure
– No, and I’m extremely sure
– No, and I’m very sure
– No, and I’m somewhat sure
– No, but I’m not at all sure
– I don’t know
We asked that at the beginning of the semester and the end of the semester. We were interested in understanding both what students think at the beginning of the semester, and how those views change (or don’t) over the course of the semester.
Here was the first plot I made to look at the data:
Not bad, and we could definitely make those x-axis labels neater, but we’re probably not going to use this plot, so let’s move on. Would it be easier to compare if we plot things side-by-side?
Again, that could use more work if we were going to use it, but we’re not (I actually think the first one looks better), so let’s move on.
I knew, based on earlier work I had done, that I was likely to want to use the really excellent Likert package. Let’s try that:
Pretty! That’s definitely the best of the visualizations of the aggregate data! The way the Likert package plots work is by putting all the “yes” answers off on the right in green (with darker shades of green indicating the more sure responses). The “no” responses are on the left of the plot, but there are so few you can barely see them. The “I don’t know” responses are in gray in the middle. The percentages indicate the total percent of students in the “no” categories (on the left), in “don’t know” (in the middle), and in the “yes” categories (on the right). So, in short: students entered the course overwhelmingly agreeing that climate change is occurring, and became more sure of that over the course of the semester.
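For anyone curious about the bookkeeping behind those three percentages, here is a minimal Python sketch. It is not the code from the repo (the plots in this post were made in R), and the responses below are invented for illustration, not the study’s data. The function names `side` and `side_percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical responses, invented for illustration (not the study's data).
responses = [
    "Yes, and I'm extremely sure",
    "Yes, and I'm extremely sure",
    "Yes, and I'm very sure",
    "Yes, and I'm somewhat sure",
    "I don't know",
    "No, and I'm somewhat sure",
    "Yes, and I'm very sure",
    "Yes, and I'm extremely sure",
]


def side(response):
    """Collapse a full response into 'no', 'dont_know', or 'yes'."""
    if response.startswith("Yes"):
        return "yes"
    if response.startswith("No"):
        return "no"
    return "dont_know"


def side_percentages(all_responses):
    """Percent of responses on each side, in left-to-right plot order."""
    counts = Counter(side(r) for r in all_responses)
    total = len(all_responses)
    return {k: 100 * counts[k] / total for k in ("no", "dont_know", "yes")}


percentages = side_percentages(responses)
print(percentages)
```

Running the same function separately on the beginning-of-semester and end-of-semester responses would give the two rows of percentages that a plot like the one above summarizes.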
But remember that one of the things we are interested in is comparing responses of individuals. We have no idea how individual student views shifted (or didn’t). Do some students change a lot? Do some become less sure that climate change is happening? We can’t tell that from the previous plots. So, let’s move on to try to get a sense for how views of individual students changed, comparing four different options.
Do you think climate change is happening? Comparison of four ways of visualizing paired, individual-level, Likert-scale data
As I wrote about in my earlier post, my first thought was to use paired line plots, based on reaction norms in evolutionary ecology. Here’s a paired line plot for this data:
For this paired line plot, student responses at the beginning and end of the semester are connected by a line. Lines are partially transparent and slightly jittered, so lines that appear darker and thicker represent more common combinations of responses. This figure lets us see that most students became more sure or stayed equally sure (that is, most lines go up or are flat). Some students make very large leaps (a line with a large positive slope), and a few students become less certain (as indicated by a line with a negative slope).
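Reading slopes off a plot can be backed up with a quick tally. The Python sketch below is an illustration rather than the repo’s R code: it assumes the nine options can be ranked from “No, extremely sure” up to “Yes, extremely sure” with “I don’t know” in the middle, uses shortened labels, and works on invented pairs. The names `LEVELS`, `slope`, and `direction` are hypothetical.

```python
from collections import Counter

# Assumed ordering (least to most sure that climate change is happening).
# Labels are shortened versions of the survey options.
LEVELS = [
    "No, extremely sure",
    "No, very sure",
    "No, somewhat sure",
    "No, not at all sure",
    "I don't know",
    "Yes, not at all sure",
    "Yes, somewhat sure",
    "Yes, very sure",
    "Yes, extremely sure",
]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical (beginning, end) pairs, one per student; not the study's data.
pairs = [
    ("Yes, very sure", "Yes, extremely sure"),
    ("Yes, extremely sure", "Yes, extremely sure"),
    ("I don't know", "Yes, very sure"),
    ("Yes, very sure", "Yes, somewhat sure"),
    ("Yes, somewhat sure", "Yes, extremely sure"),
]


def slope(start, end):
    """Number of levels moved; positive means the line goes up."""
    return RANK[end] - RANK[start]


def direction(start, end):
    """Label a student's line as 'up', 'flat', or 'down'."""
    change = slope(start, end)
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


slopes = [slope(s, e) for s, e in pairs]
direction_counts = Counter(direction(s, e) for s, e in pairs)
print(slopes)
print(direction_counts)
```

The ranking is the main judgment call here: treating “I don’t know” as the midpoint is one reasonable choice, but a different ordering would change which lines count as going up or down.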
I like that way of visualizing the data, but wanted to explore others. I originally tried making a heat map. My first attempt at that failed (more on heat maps below), but led me to this plot instead, which I also like:
In this plot, each student is a dot, and the dots are partially transparent and jittered. In my opinion, this plot makes it pretty clear that most students start out very sure or extremely sure and end up extremely sure that climate change is occurring. One thing I like about this visualization is that I think it makes it clearer that there’s a ceiling on how high the responses go. I don’t think that was as clear from the earlier paired line plot.
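The jitter is what keeps students with identical answers from landing on exactly the same spot. As a rough Python sketch of the idea (not the repo’s R code), each point gets a small random offset around its category position. The `jitter` helper, the `width` of 0.2, and the rank values are assumptions made for illustration.

```python
import random


def jitter(values, width=0.2, seed=0):
    """Add a uniform random offset in [-width, width] to each value."""
    rng = random.Random(seed)  # seeded so the plot is reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical numeric positions for five students (not the study's data):
# 7 stands for "very sure" and 8 for "extremely sure" on an assumed scale.
start_ranks = [7, 7, 7, 8, 8]
end_ranks = [8, 8, 8, 8, 8]

# Different seeds so the horizontal and vertical offsets are independent.
x = jitter(start_ranks, seed=1)
y = jitter(end_ranks, seed=2)
points = list(zip(x, y))

for px, py in points:
    print(round(px, 3), round(py, 3))
```

Keeping the width well under 0.5 means a jittered point never drifts closer to a neighboring category than to its own, so the clusters stay readable.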
(I’d started this before my first blog post on paired line plots, but didn’t get the above into that post. I was encouraged to see this style of plot recommended on Twitter by C. Savio Chan!)
In response to my earlier post, Hadley Wickham asked on Twitter if I’d tried a mosaic plot. I hadn’t heard of them before, but they looked worth trying. Here’s the result:
This also shows that the “very” and “extremely” sure options dominated at the beginning of the semester, and that the “extremely sure” option dominated at the end, but I find it less intuitive to read than the dot plot. It’s also a little awkward that a bunch of the options (especially at the end of the semester) weren’t chosen by any students, leading to all those axis labels getting compressed. I didn’t really invest time in trying to figure out a solution for that issue, though, since I wanted to get back to heat maps.
I was still stuck with heat maps, but my collaborator Susan came to my rescue, resulting in this heat map:
That is, um, pretty boring. Part of this is because we let (some) other plots drop missing levels from the axes but forced them onto this one (and the mosaic plot).
(The sharp-eyed R folks might realize that the heat map isn’t made in ggplot. Susan is a lattice person, and I haven’t invested the time in trying to make it in ggplot.)
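Both the mosaic plot and the heat map are drawn from a table of counts for each beginning/end combination, and the missing-levels issue comes down to which rows and columns that table keeps. The Python sketch below illustrates the difference. It is not the lattice code Susan wrote, the pairs are invented, the level ordering and shortened labels are assumptions, and `crosstab` and `observed_levels` are hypothetical helper names.

```python
from collections import Counter

# Assumed ordering of the nine options, with shortened labels.
LEVELS = [
    "No, extremely sure",
    "No, very sure",
    "No, somewhat sure",
    "No, not at all sure",
    "I don't know",
    "Yes, not at all sure",
    "Yes, somewhat sure",
    "Yes, very sure",
    "Yes, extremely sure",
]

# Hypothetical (beginning, end) pairs; not the study's data.
pairs = (
    [("Yes, very sure", "Yes, extremely sure")] * 3
    + [("Yes, extremely sure", "Yes, extremely sure")] * 2
    + [("Yes, somewhat sure", "Yes, very sure")]
)


def crosstab(response_pairs, levels):
    """Count matrix: rows are beginning responses, columns are end responses."""
    counts = Counter(response_pairs)
    return [[counts[(start, end)] for end in levels] for start in levels]


def observed_levels(response_pairs, levels):
    """Levels that at least one student chose, kept in their original order."""
    seen = {response for pair in response_pairs for response in pair}
    return [level for level in levels if level in seen]


full = crosstab(pairs, LEVELS)    # every level forced onto both axes
kept = observed_levels(pairs, LEVELS)
trimmed = crosstab(pairs, kept)   # unused levels dropped

print(len(full), "x", len(full[0]))
print(kept)
print(trimmed)
```

With these made-up pairs, the full table is 9 by 9 and almost entirely zeros, while the trimmed one is 3 by 3. That is roughly why a heat map with every level forced onto the axes can look so empty.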
I gave some of my thoughts above, and will give more below, but first I want to ask: which of these four do you find most compelling?
I thought I was going to prefer the paired line plot but, in the end, I think I prefer the dot plot for this data.
When I first made the paired line plots, I got a little paired line plot happy, and started making them for all sorts of things. They seemed so great for the first things I plotted that they suddenly became my favorite visualization tool. Except then I moved on to a different question…
Are the students who change their views during the course the ones who thought they could change their views at the beginning?
At the beginning of the course, we asked students:
How much do you agree or disagree with the following statement? “I could easily change my mind about climate change.”
– Neither agree nor disagree
At the end of the semester, we asked them:
This course changed how I think about climate change:
– Neither agree nor disagree
One of the things we wondered was whether the students who reported that the course changed their thinking were the ones who thought they could easily change their minds. So, I immediately busted out a paired line plot…
…and my head promptly exploded. I had no idea what to make of that, and thought maybe it just meant it was all random. This was actually what led to the plan to try heat maps. In the end, I used all the other approaches plotted above.
Here’s the dot plot:
And the mosaic plot:
And the heat map:
For this comparison, which do you think works best?
Personally, I think the heat map works best for this comparison. It makes it clear that, at the beginning of the semester, students tended to disagree that they could easily change their minds, while at the end of the semester most students agreed that the course had changed how they think. It also shows that there’s a lot of variation in individual-level pairings within those broad patterns.
In case you’re wondering about this comparison: at first, it seems a little funny that students said they couldn’t change their minds but then did change their minds, but I think it’s probably driven by them interpreting those two questions differently. My guess is most students interpreted the question at the beginning of the semester as something like “How much do you agree or disagree with the following statement? ‘I could easily change my mind about whether climate change is occurring.’” If my guess is right, then I think they were right – they don’t change their views on that. But my guess is that they interpreted the end-of-semester question along the lines of “I know a lot more about climate change now and/or think differently about the ways it’s happening than I did before.” I’m glad they agree with that!
If you have thoughts on these visualizations (or the data), we’d love to hear them!
Note: we had IRB approval for this work, which was more extensive than what I am presenting above. One notable difference is that the analyses above are just for data collected in one semester, whereas the full study includes data from two semesters. We’re working on the manuscript now!