Also this week: tenured history professor fired for serial sexual misconduct, a data-based profile of polisci Twitter, and more.
From Jeremy:
UC Santa Cruz has fired tenured history professor Gopal Balakrishnan after finding him guilty of sexual misconduct. The firing follows a months-long investigation into multiple formal complaints against Balakrishnan, and an even longer period of student protests against him. One of the complainants, former UC Santa Cruz student Anneliese Harlander, is still suing Balakrishnan in Superior Court for sexual assault. Like the complainants and the protestors, I’m glad UC Santa Cruz did the right thing, while remaining dismayed that it took so many brave complaints and years of protests in order for the right thing to happen. (ht @jtlevy)
The Canadian Institute of Ecology and Evolution is calling for working group proposals. Deadline Oct. 14. You can also apply for funding for training workshops any time.
Nature profiles ecologist Thomas Crowther. Lots to think about here, in terms of where (some of) ecology is at and where it’s going.
A data-based profile of political science Twitter. Very interesting. Someone should do this for ecology Twitter.
A 2018 Nature paper that found that the oceans were warming much faster than predicted by climate models has been retracted. The data analysis contained errors that greatly inflated the precision of the relevant parameter estimates. The errors were originally pointed out by a prominent critic of climate science. Kudos to the authors for doing the right thing: they quickly looked into the errors and publicly acknowledged them.
Related to a link from last week, here’s an article on how, thanks to climate change, insurance losses due to “secondary perils” (risks like hail and wildfire) now exceed losses due to “primary perils” like earthquakes and hurricanes.
The dangers of reusing the same “natural experiments” to address many questions. Interesting. Curious what Andrew Gelman would think of this.
The latest on the move of the USDA’s Economic Research Service to Kansas City, a move widely seen as an attempt by the Trump administration to gut USDA scientific and technical expertise. Only 19 of 280 employees chose to move; 88 left the agency and 50 retired, with the remainder granted exemptions to remain in Washington, DC. A USDA memo says the agency’s work will be “significantly delayed” due to staff shortages.
Dan Bolnick with a very good post (esp. for PhD students) on how to choose a research project. Related posts from me here and here.
The 2019 MacArthur Fellows have been announced. Congratulations to all recipients, including marine ecologist Stacy Jupiter, evolutionary anthropologist Jenny Tung, crop plant geneticist Zachary Lippman, and paleoclimatologist Andrea Dutton.
The multiple comparisons problem is a real one, although the scale at which we should be correcting for it (if at all) is never clear to me. More disconcerting to me is that it’s always treated as a false positive problem when it’s both a false positive and a false negative problem. If you do 100 tests, your alpha gives you an idea of how many might be significant by chance, but the beta associated with those tests gives you an idea of how many might be non-significant by chance. Who knows how many real effects we’ve missed in those natural experiments.
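To make that point concrete, here is a back-of-the-envelope sketch in Python. The split between truly-null and real effects, the alpha, and the power are all invented for illustration, not taken from the linked post:

```python
# Expected outcomes among 100 independent tests (illustrative numbers only):
# suppose 60 of the tested effects are truly null and 40 are real.
alpha, beta = 0.05, 0.20          # type I error rate; type II error rate (power = 0.80)
n_null, n_real = 60, 40

false_positives = alpha * n_null   # nulls that come up "significant" by chance
false_negatives = beta * n_real    # real effects that get missed
print(false_positives, false_negatives)   # ~3 expected false positives, ~8 expected misses
```

Under these made-up numbers, the missed real effects outnumber the spurious "discoveries."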
I do think that part of the multiple comparison problem pointed out in the link is specific to “natural experiments” analyzed using an instrumental variables approach. This is the key graf:
“There is a second more subtle problem. If more than one of the effects are real it calls into question the exclusion restriction. To identify the effect of X on Y1 we need to assume that X influences Y1 along only one path. But if X also influences Y2 that suggests that there might be multiple paths from X to Y1. Morck and Yeung made this point many years ago, likening the reuse of the same instrumental variables to a tragedy of the commons.”
Re: the scale at which to correct for multiple comparisons, it’s not always clear to me either.
Right. If you’re making causal inferences, it would be easy to confuse direct effects, indirect effects, and no causal link at all if you hadn’t considered all potential causal pathways. Although it’s not completely clear to me why that’s more of a problem after researchers have examined many of the relationships than before any of the other relationships had been examined. But I may be missing something.
This isn’t a multiple comparisons problem, in other words; adjustment doesn’t make it go away. But the fact that multiple groups are using the same system with the same instrumental variable highlights that it’s really hard for a single group to think outside its own sphere of knowledge about how the assumptions of a model (the exclusion restriction) may be bogus.
Here is my interpretation of the problem. A research group uses Z -> X -> Y, where Z is the instrumental variable. The exclusion restriction assumption is that the only causal path from Z to Y is through X (it doesn’t matter if the path is direct, as I’ve drawn it, or indirect, that is, mediated by other variables). But if other studies are using the instrument Z with a different X (that is, there is a path from Z to this new X independent of the old X), then the exclusion restriction is obviously not a good assumption.
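Here is a minimal simulation of that interpretation in Python. The coefficients and the variable names (X1 for the "old" X, X2 for the second path) are invented for illustration; the point is just that when Z also reaches Y through X2, the usual ratio-of-covariances (Wald) IV estimate of the X1 effect no longer recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical coefficients, purely for illustration.
a, c = 1.0, 0.5        # Z -> X1 and Z -> X2
b1, b2 = 2.0, 3.0      # X1 -> Y and X2 -> Y

Z = rng.normal(size=n)
X1 = a * Z + rng.normal(size=n)
X2 = c * Z + rng.normal(size=n)   # second path from Z to Y: violates the exclusion restriction for X1
Y = b1 * X1 + b2 * X2 + rng.normal(size=n)

# Simple IV (Wald) estimate of the X1 effect, using Z as the instrument:
iv_estimate = np.cov(Z, Y)[0, 1] / np.cov(Z, X1)[0, 1]
print(iv_estimate)   # ~ b1 + b2*c/a = 3.5, not the true b1 = 2.0
```

The estimate converges to b1 + b2*c/a rather than b1, which is exactly the exclusion-restriction failure described above.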
I see the point, Jeffrey, but what I don’t get is why it’s a bigger problem after the other research has been done than before. Isn’t the real problem that you are assuming there is only one path from Z to Y, through X1, when in fact there is another path from Z to Y through X2? If that is the problem (and I’m not sure I have this right), then why does it matter what other researchers are doing? Isn’t the problem that your model structure is wrong? I suspect I’m missing something here.
Jeff Leek has raised this same concern about the cost in type II errors. That said, I disagree that the multiple comparisons problem here is a real one; this example highlights the absurdities that emerge from multiple testing adjustment when we think of p-values as *the* evidence for confirming something. If we ask a question like “is there any gene associated with x?”, multiple comparisons are an issue if we measure 100,000 associations. But if we have a good, working causal model, we don’t need to adjust for what other people are doing. Do we also adjust for all the future work on the same data? No, it’s the wrong way to think about the work the p-value is doing (which isn’t much; adjustment for multiple testing only addresses the sampling contribution to inference. The real work is conceiving of and executing different experiments that probe the working causal model).
“But if we have a good, working causal model, we don’t need to adjust for what other people are doing.”
I like that; I think that’s a better way to put the concern about reuse of the same instrumental variable. That the same instrumental variable keeps getting reused to test so many different causal hypotheses suggests that many of those tests don’t have very good working causal models behind them. And if they don’t have very good working causal models, well, then that’s the worry, not multiple comparisons.
I’m not sure I totally agree, Jeffrey. Here’s the thought experiment: imagine 1000 scientists design a good, working causal model of some phenomenon, and they test that same causal model 1000 times, setting a statistical threshold for deciding whether the data support or refute the hypothesis (say 0.05, though it wouldn’t have to be). If the model is wrong, you will still find support for it in about 50 of the 1000 tests. If the model is correct and the power of each test is 0.95, you will still fail to find support for the model in about 50 of the 1000 tests (see the sketch after this comment). It doesn’t matter how much thought went into constructing the model.
The next place people go is: well, we shouldn’t set a single threshold and use it to decide. Setting a single statistical threshold probably does exacerbate the problem, but the fact remains that if we do thousands of tests, sometimes we will get data that no reasonable person, using any reasonable method, could interpret as support for the hypothesis even though the hypothesis is true. By the same token, sometimes we will get data that no reasonable person could interpret as evidence against the hypothesis even though it is false. I think multiple comparisons are a real problem, but no single study can correct for them.
Having said that, it is never defensible to make a post-hoc multiple comparisons correction if (1) you don’t know the power of your test AND (2) you haven’t taken a position on the relative costs of Type I and II errors.
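For what it’s worth, the thought experiment is easy to simulate. Here is a sketch in Python; the sample size and effect size are chosen arbitrarily so that each test has power of roughly 0.95 at a two-sided 0.05 threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tests, n, z_crit = 1000, 25, 1.96   # 1000 replicate tests, n = 25 observations per test

def run_tests(true_mean):
    """Two-sided z-test of 'mean = 0' on n draws from N(true_mean, 1), repeated n_tests times."""
    data = rng.normal(true_mean, 1.0, size=(n_tests, n))
    z = np.sqrt(n) * data.mean(axis=1)   # known sigma = 1
    return np.abs(z) > z_crit            # True = "support for an effect"

# Model wrong (no real effect): expect ~50 of the 1000 tests to find "support" anyway.
print("false positives:", run_tests(0.0).sum())

# Model right, effect sized so power ~0.95: expect ~50 of the 1000 tests to miss it.
print("misses:", (~run_tests(0.72)).sum())
```

No matter how carefully the model was constructed, roughly 5% of the tests land on the wrong side of the threshold in each scenario.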
Re: the MacArthur fellowships, turns out that EEB is the single most common area of science in which MacArthurs are awarded: https://twitter.com/kjhealy/status/1178044608720379905