A little while back we invited you to ask us anything. Here’s the next question, and our answers. Questions have been edited for clarity and brevity; see the comments on the linked post for the original questions.

**How do you help undergraduate researchers in your lab learn how to analyze data from independent research projects? I’m thinking particularly of students who haven’t yet had a stats course, and who need to understand the statistics conceptually as well as learn a particular stats package. (from snsheth)**

Jeremy: I’ve never had to deal with this problem because I’ve never taken on an independent study student who’d never had a stats course. I couldn’t foresee myself ever doing so. At Calgary, any student experienced enough to be eligible to do an independent study course will already have had at least intro biostats. I think that’s a good thing. If you don’t have sufficient background knowledge, you can’t do an “independent” research project (though you could be a research assistant who’s mostly just asked to follow instructions).

Which doesn’t mean I expect my independent study students to already know the stats they’ll be using! Far from it, in fact. But it’s *much* easier to teach a student the concepts behind some advanced statistical approach, as well as how to do it in your software package of choice, if they’ve already learned some basic statistical concepts and had some practice using a stats package.

Brian: I guess I am a bit like Jeremy. I’m not sure I should teach an undergrad a specific stats concept in the heat of a project. I guess it depends on the undergrad’s role. If they’re doing an honors project, or otherwise need to do an independent start-to-finish project, I would hesitate to do that with them if they hadn’t taken statistics (or weren’t at least taking it concurrently). But if they’re part of a larger project (e.g. a field tech) I might argue that maybe that is not the time/place to teach statistics. (Jeremy adds: in my lab, a complete experiment takes 2-3 months, with data coming in daily. So I do exploratory and preliminary analyses as I go and then share with the summer undergrads who are collecting the data. I don’t teach them R commands or the theory behind any stats I do. I just explain enough of the stats to give them the gist. Although what I consider to be “the gist” probably involves more explanation than it would for some.)

Meg: I have many students in my lab who have not had a formal course in stats. I really like having students start early in their undergraduate careers, if possible, and often they haven’t had stats yet at that point. And, even when they have had stats, it seems like we often need to review a lot when going through the analyses for a project. How much I walk things through with the student depends on what their role is on the project (are they mainly helping someone else out on the project vs. doing an honors thesis?). At the most basic level, I try to have all students understand that we can’t measure things perfectly, and that we use statistics to help us compare our treatments. We always start by looking through the figures of the data: having the students take a first stab at plotting their data is always very useful in terms of getting them to think through their analyses more. Usually this is something that we discussed earlier, but it’s great to review. In some cases, I then do the actual analyses myself, and then walk through the interpretation with them. For example, if it was a two-way ANOVA, we’ll talk about what a significant interaction means (and doesn’t mean). In other cases, I have the students do the analysis, and then we compare our results. And sometimes it’s in between, with us jointly working on doing the analysis.
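To make the interaction idea concrete, here is a minimal Python sketch of the kind of thing one might walk through with a student. The numbers are made up purely for illustration (they are not data from any real experiment), and the factor names are hypothetical. It only computes cell means and the "difference of differences" that an interaction term describes; it does not run an ANOVA or test significance.

```python
from statistics import mean

# Hypothetical, made-up measurements for a 2x2 design (illustration only).
# Factor 1: food quality (low/high). Factor 2: exposure (unexposed/exposed).
data = {
    ("low", "unexposed"): [10.0, 12.0, 11.0],
    ("low", "exposed"): [6.0, 7.0, 8.0],
    ("high", "unexposed"): [14.0, 15.0, 16.0],
    ("high", "exposed"): [13.0, 14.0, 15.0],
}

# Mean response in each of the four treatment combinations.
cell_means = {cell: mean(values) for cell, values in data.items()}


def exposure_effect(food):
    """Exposed minus unexposed mean, within one food quality level."""
    return cell_means[(food, "exposed")] - cell_means[(food, "unexposed")]


effect_low = exposure_effect("low")
effect_high = exposure_effect("high")

# The interaction contrast is the difference between those two effects.
# If it is zero, the effect of exposure is the same at both food levels.
interaction_contrast = effect_high - effect_low

print(f"Exposure effect at low food:  {effect_low}")
print(f"Exposure effect at high food: {effect_high}")
print(f"Interaction contrast:         {interaction_contrast}")
```

With these invented numbers, exposure lowers the response more at low food quality than at high food quality, which is what "the effect of one factor depends on the level of the other" looks like. A two-way ANOVA then asks whether a contrast of that size is large relative to the noise in the measurements; a nonzero contrast on its own says nothing about significance.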

One thing I’ve found is that the way students present data tends to be very different than the way scientific papers do. It’s really common that the first draft of their results reads something like “Figure 1 shows the results of the food quality experiment.” We then talk through how we normally present results, and hopefully end up with something like “Food quality significantly affected infection levels (stats; Figure 1).” It’s been interesting to me how consistent students are in presenting results this way, even when they’ve read a fair number of scientific papers.

Thank you all for your responses to my question. In my limited previous experience, most of the undergrads who have conducted independent research with me either 1) have not taken a statistics course, often because they’ve heard from more senior students that what is offered is not all that useful in practice, or 2) have taken a statistics course but did not retain much because they did not grasp the relevance or potential applications at the time they were taking the course. I’ve mostly taken Meg’s approach of running the analysis myself and walking students through what I did, but I find students are increasingly interested in learning R, and I feel that part of my mentoring should be to at least teach them how to learn R on their own. This becomes challenging because often the students learn R quickly, but do not understand the underlying statistical concepts. It sounds like in many cases the best compromise may be for students to learn how to bring their data into R and make graphs without getting into the details of the statistical analyses, unless the student’s previous background permits otherwise. Thanks again!

I’m no longer an undergrad, but despite the number of manuscripts/papers I’ve written and helped write, I *still* screw up what Meg’s flagged in the last paragraph…but in the opposite direction. Every time a draft that I’ve written crosses the desk of one of my supervisors, seems like half of my “X significantly affects Y1 and Y2” type comments turn into “X is significant (see Figure 1).”

On student vs. professional writing. I had a colleague’s student send me a draft of a co-authored paper and she wrote like a professional. I was impressed. But the problem I had with her description is that she’s learned many (many!) bad practices. That is, most professionals in many fields of biology emphasize exactly the wrong things (p-values and significance) without really giving much thought to the importance of the effect size relative to some loss function, or to realistic errors that account for bias, etc., or they make statements that confuse statistical and biological significance. This criticism has been around for 70 years and is widely and prominently published, but it doesn’t seem to have made much of an impact on the practice of professional writing in Results sections. It has a little. But not much.