An ongoing theme of some of my posts has been the notion of statistical machismo. As noted recently, statistical machismo is not really about using (or not using) complex statistics. It is about using more complex statistics for bad reasons (e.g. to impress people), or forcing other people to use more complex statistics, again for bad reasons or out of the ill-conceived notion that there is always one correct, best way to do statistics. The discussions on the last posts raised interesting questions about whether statistical machismo is really a problem or whether it occurs just as often in the other direction (forcing people to use simple statistics). So of course that called for a poll. I am going to report on the results here.

You can examine the results here. Long story short, statistical machismo is not made up. People feel pushed to use more complicated statistics inappropriately much more often than they feel pushed to use simpler statistics. And people believe nearly every flavor of reputation accrues to those who use more complicated statistics even though people think more complex statistics don’t really change the science much.

There were 405 respondents to the poll (now closed). I watched it closely through the day to see if there were big swings (a sign of somebody trying to get a group to skew the results) and didn’t see it. So I think the results are a good sample of readers of Dynamic Ecology who like to answer polls. How well that represents your world of interest may vary. We had about 40% saying basic research, 20% saying applied, and 40% saying both. We had about 60% male, 35% female, 5% did not say or other. And there were about 1/4 for each career stage (graduate, postdoc, early or established permanent jobs) and even two undergraduates. We did not get a wide diversity of fields: 85% said ecology, 10% said evolution, and 5% spanned other fields.

Our poll takers skewed towards what I have to assume is stats-savvy (only 4% said they had less stats knowledge than a typical field ecologist and 38% said they were typical, leaving 58% saying they knew more stats than a typical field ecologist). Only 10% said they skipped the methods sections of stats-heavy papers, and over half said they fully understood such sections. Assuming accurate self-reporting, this is definitely a stats-savvy group in comparison to, e.g., the typical graduate committees I sit on. With one exception related to career stage noted below, none of the demographic factors that I could find seemed to really influence perceptions of statistical machismo. So I don’t bother to present crosstabs on these, but I did look at several prior hypotheses I had about them. And I think the results have to be interpreted in the context of this being a more stats-friendly group than the average ecologist.

65% of respondents reported being forced to add more complex statistics for bad reasons, with over 30% saying it happens sometimes or frequently. While the reverse (being forced to use simpler statistics) does occur, it is definitely the less common scenario: only 42% had ever seen it happen and only 17% said it happens sometimes or frequently. Similarly, about 50% reported making their statistics more complex before the paper was even sent out for review out of fear of what the reviewers would say, while only 24% reported simplifying out of fear of reviewers. So, very broad brush – people felt inappropriately pushed to complexify their stats about twice as often as to simplify. The only meaningful cross-tab I identified is that established researchers were more likely to report being forced to complexify their statistics inappropriately than postdocs and early-career researchers, who in turn reported it more than graduate students.

As far as reputation goes, people clearly thought that having complicated statistics improved the odds of a paper getting into a high-profile journal, and especially that using complex statistics enhanced the professional reputation of the people using them. But people were much more neutral, with only slight leans towards complex statistics improving the odds of a paper being impactful or improving the scientific quality of the paper. If people’s perceptions are accurate, there does appear to be room for gamesmanship in statistical machismo – complex stats improve paper positioning and individual reputation much more than they actually improve the impact of a paper or the quality of its science. In a more direct question, 27% said more complex statistics never or rarely change the scientific conclusions, 13% said they usually or always change the scientific conclusions, and 60% said they sometimes change the conclusions.

And as for which areas reviewers felt were pushed inappropriately:

The data largely speak for themselves (the x-axis is the proportion of people responding to this question – N=317 – who checked each box, so e.g. 41% said AIC had been inappropriately pushed on them), but a few observations of mine on this:

- I’m no fan of how AIC is used in ecology today. Nor am I a fan of overly complex mixed models. But I perceive these as techniques that are rapidly increasing in usage in ecology and that are among the favorite techniques of ecologists. So I was surprised to see them at the top of the techniques that were pushed too much.
- I was also surprised to see GLM and complex multivariate methods (e.g. RDA & CCA) so high up – it seems to me the use cases where they are or are not needed are fairly clear and distinct. But apparently lots of people still disagree.
- Over 30% of people think Bayesian is pushed inappropriately while 15% think frequentist is pushed inappropriately.
- The most fascinating to me was the relative ranks of phylogenetic regression and spatial regression (both pretty high on the culprit list) vs. time series tools (very low). Time series tools is a broad category, but it includes using GLS when you regress Y vs. time (e.g. abundance vs. year) to see if there is a trend. These three issues all have the identical problem (overestimated degrees of freedom due to non-independence of data) and the identical solution (using GLS with a metric for the distance between points). And in fact using GLS on time series is way easier than using GLS on phylogenetic data (because you know how far apart different dates are, but you have to generate a phylogeny to know how far apart species are). Yet nobody worries about having to use GLS on time series data (I am not sure I’ve ever heard of anybody being forced to do it). Yet it is apparently – for ecologists – desperately important to do phylogenetic regression, and apparently people are kind of fed up with having this pushed all the time (and to a slightly lesser degree spatial regression).
- Detection probabilities were fairly high up on the list, but the majority of respondents do basic research and may not run into them. The proportion of people who perceive them as pushed inappropriately was highest among people who do both basic and applied work, next highest in applied, and lowest in basic.
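To make the time series point in the list above concrete, here is a minimal sketch (in Python/numpy rather than any particular R package; all the numbers are invented for illustration) of why a trend regression on autocorrelated data overstates its degrees of freedom, and how GLS with a distance-based error covariance fixes it. Temporal distance |i − j| plays exactly the role that phylogenetic or spatial distance plays in phylogenetic or spatial regression.

```python
import numpy as np

# Illustrative only: regress Y on year when errors follow an AR(1) process.
n, rho, sigma2 = 50, 0.6, 1.0
t = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), t])          # intercept + year

# AR(1) error covariance: Sigma[i, j] = sigma2 * rho**|i - j|
Sigma = sigma2 * rho ** np.abs(np.subtract.outer(t, t))

XtX_inv = np.linalg.inv(X.T @ X)

# Naive OLS variance of the slope (pretends errors are independent)
var_ols_naive = sigma2 * XtX_inv[1, 1]

# True sampling variance of the OLS slope under Sigma
# (the sandwich: (X'X)^-1 X' Sigma X (X'X)^-1)
var_ols_true = (XtX_inv @ X.T @ Sigma @ X @ XtX_inv)[1, 1]

# GLS slope variance: (X' Sigma^-1 X)^-1 -- the efficient estimator
var_gls = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))[1, 1]

print(var_ols_naive < var_ols_true)  # naive OLS understates uncertainty
print(var_gls <= var_ols_true)       # GLS is at least as precise
```

With positive autocorrelation, the naive OLS standard error is too small (inflated effective degrees of freedom), while GLS both reports honest uncertainty and is at least as efficient. The only ingredient is a distance matrix, which is trivial for dates and laborious for phylogenies.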

So, I’m not sure a poll will settle all debate (in fact I’m sure it won’t). But there does seem to be some solid empirical evidence for the notion of statistical machismo pushing people to use more complex statistics than necessary. Two last thoughts. The respondents of this poll seem more reasonable and measured (e.g. choosing “sometimes” as the most common answer for whether more complex methods change the scientific answer) than reviewers, who seem to push much more firmly (overconfidently). Or maybe people just remember the few extreme reviewers? And there seems to be a pretty widespread belief among this group of fairly statistically sophisticated poll respondents that reviewers regularly and erroneously push statistical techniques (some complex, some just trendy). We would all do well to be a little more humble and remember that there is almost never only one right approach to statistical analysis. In particular, it seems that trendy might be almost as important a reason behind statistical machismo as complex.

What do you think? Did the poll surprise you? Am I overinterpreting it as evidence of statistical machismo? What would it take to get people to be a little more humble and less overconfident that there is only one right way to do statistics?

The poll results were fascinating. What would be interesting, though a lot of work, would be to actually interview poll participants to find out where their views came from. For example, I’ll confess that I am suspicious of mixed effects models, in part because a) I have no formal training in their use, and b) I have used them but have usually struggled to understand and interpret their results. And actually I have witnessed similar struggles among academics who were trained in the area and who should have been able to interpret results clearly. But hey, every study is different, right?

Another reason I am suspicious of mixed effects is that the specification of random terms often seems to be rather slippery and even sometimes a matter of opinion. So along with your main effects you get estimates of “random” variation due to repeated measurements, nestedness or some such, but the details of model specification (random intercepts, random slopes, and tigers, Oh My!) can be tricky for inexperienced users. And in the end, what is any of this really telling us that a box plot would not tell us?

I am increasingly interested in the anthropology of stats – study of how culture influences stats. So I agree it would be a lot of work but it would also be very interesting to conduct those one-on-one interviews (a very anthropological approach by the way).

Net net, the best use of mixed models is when you analyze variance instead of changes to means. The second-best use is when you control for known sources of variance like site effects, which you could also do with fixed effects, but you lose fewer degrees of freedom and so increase the power of your study. The worst-case use is when lots of people who don’t really know what they’re doing seemingly randomly throw down mixed effect terms in overly complicated models. All three scenarios exist, although the first is rare in ecology (but common in genetics).

And one could ask, if you need the power gained by random effects over fixed, whether you’re really studying something with a big enough effect size to be interesting/important. Or whether you’re just gaining power to make tiny effect sizes significant. And one definitely should ask that question, but that is probably a whole other post topic.

Interesting choice of words: suspicious. You say you have no formal training in these methods and (I guess consequently) find them difficult to interpret. So what is it you distrust: the method itself, or its (mis|ab)use by ecologists (whether trained in their use or otherwise)?

Given that boxplots do not allow one to do inference[*], I’d say mixed models are telling us quite a lot more than boxplots do.

* Perhaps boxplots can do inference if you turn on the notches option in some software. These make some significant assumptions, notably equal sample sizes, that are often why one is fitting a mixed model in the first place.

Or we could understand that the term ‘random effects’ simply means estimating a batch of coefficients with a variance component (i.e. B ~ N(mu, sigma)), whereas a ‘fixed effect’ means treating the among-coefficient variance as infinite (B ~ N(mu, Inf)). Whenever there is any kind of ‘grouping structure’ in your data, your hackles should be raised much more by the latter than the former (although in practice, as Brian would say, there is sometimes not much of a difference, and none of this should be seen as obviating the need for quality study design, etc.). This is a perfect instance of what I have mentioned here before: what *seems* more intuitive and simple ain’t necessarily so…
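A toy numeric sketch of the distinction above (all numbers invented; this is plug-in shrinkage done by hand, not what lme4 or any real mixed-model package does internally): a finite among-group variance shrinks each group estimate toward the grand mean, while a fixed effect (infinite among-group variance) leaves the raw group means untouched.

```python
import numpy as np

# Simulate 8 groups with 5 noisy observations each (numbers are made up).
rng = np.random.default_rng(0)
n_groups, n_per = 8, 5
true_means = rng.normal(10.0, 2.0, n_groups)
y = true_means[:, None] + rng.normal(0.0, 3.0, (n_groups, n_per))

group_means = y.mean(axis=1)   # "fixed effect": B ~ N(mu, Inf), no pooling
grand_mean = y.mean()

# Plug-in variance components (assumed known here for simplicity)
tau2, sigma2 = 4.0, 9.0
w = tau2 / (tau2 + sigma2 / n_per)   # shrinkage weight, strictly in (0, 1)

# "Random effect": B ~ N(mu, tau2) -> partial pooling toward the grand mean
shrunk = w * group_means + (1 - w) * grand_mean

# Every partially pooled estimate lies between its group mean and the grand mean
between = np.minimum(group_means, grand_mean) <= shrunk
between &= shrunk <= np.maximum(group_means, grand_mean)
print(between.all())
```

The only modelling choice is how much to pool, governed by the ratio of among-group to within-group variance; the fixed-effect answer is the limiting case of no pooling at all.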

Good stuff Brian. Particularly striking that a group of pretty statistically-sophisticated respondents would respond as they did.

Devil’s advocate question regarding the perception that doing fancier stats improves your reputation and helps you publish in leading journals: could this be one of those things that everyone *thinks* helps your reputation/chances, but that actually doesn’t? At least not as much as many people think it does?

I’m thinking back to Peter Adler’s old post on how many (most?) fundamental researchers, including me, bullshit about the applied relevance of their work in their papers and grants, presumably because they think reviewers and editors expect and want it. But I bet if you did a poll, you’d find that few reviewers and editors take that bullshit seriously. Not a perfectly analogous case, because here you also have data showing that people really are often asked to complexify their stats in ways they see as unhelpful. So their perceptions of statistical machismo as enhancing reputation and improving the odds of publication are grounded in reality. But playing devil’s advocate, perhaps perceptions of the frequency or rewards of statistical machismo are mismatched to the reality?

Personally my best guess is that fancy statistics give a veneer of novelty/innovation and do help individual articles place in higher journals. But I suspect, like you and unlike the poll average, that using advanced stats doesn’t really advance one’s overall career much (unless of course your niche is as an ecostats person). For an empirical or theoretical ecologist, I trust the poll results that say advanced stats don’t really change the impact or science of an article much. In the end, most people’s reputations are based on the science they do rather than the stats that they use. But shorter term, at the grain size of a paper, I think it does confer an unjustified benefit. That’s my belief anyway.

A few comments via Twitter:

(minor aside: as I replied to Joe Nocera, important to keep in mind that poll respondents aren’t a random sample of ecologists. Quite possible that 58% of “Dynamic Ecology readers who respond to polls on statistical machismo” are more statistically-savvy than the average or median ecologist.)

One thing that seems weird about the whole poll / discussion is that respondents and reviewers are actually the same people, assuming DE samples representatively from the ecology community. In all likelihood, people that have answered this poll have also reviewed each other’s work.

So, if reviewers and respondents are the same people, why do people have the feeling that reviewers have the tendency to ask for more complicated methods?

Brian, I’d venture that most of the effect we see here is an effect of decision asymmetry, paired with attention bias. What I mean by decision asymmetry is that, if I see people use a non-parametric test where they could also have used a parametric alternative with higher power, I will usually file this under “their choice” and not ask them to switch to the simpler alternative. Same with PGLS etc. – it might not always be necessary, but it usually doesn’t hurt, so if you do it, I wouldn’t ask you to change it.

On the other hand, if a t-test is not appropriate, you will point it out. And, if you do this as a reviewer, it seems perfectly reasonable to you, and you forget it immediately. If it happens to you as an author, you are outraged because it doesn’t make a difference.

If we play this through, everyone will remember the occasions where they were asked to move to more complicated tests, even if people have relatively homogeneous opinions.

Interesting point, could be something to it. Though you also have to keep in mind that the respondents are not a representative sample of ecologists as a whole.

As Brian said in the post, respondents seem numerous enough and representative enough of some larger set of ecologists to be worth talking about. But they’re surely not a random sample of ecologists. As indicated for instance by their high level of statistical expertise relative to the average ecologist (assuming they’ve evaluated their own expertise accurately, of course). So poll respondents aren’t necessarily the “same” people as reviewers.

Sure, I get this, but I don’t think that’s the explanation – I’m pretty sure you would get the same results if you polled people at a conference.

Hi Florian – I don’t think the survey could prove or disprove your interpretation vs. my interpretation.

The original genesis of the survey was people who felt like reviewers pushed simpler statistics as often or more often than they pushed complex statistics and I think the poll is valid in rejecting that. As a side benefit it got a lot of perceptions about reputation.

I don’t know how you even could test your hypothesis. People would have to agree on when reviewers pushing a different technique were or weren’t doing it for good reasons. Or correctly vs incorrectly. And I don’t personally believe in a world where there is such a thing as one correct way to do statistical analysis. There are always multiple reasonable ways that involve trade-offs (with the trade-offs existing inside of statistical issues but also outside of statistics).

I’ve had enough personal experiences, and know enough other people I trust, that I am convinced the phenomenon is real. When I make a reasonable counterargument which the reviewer then completely ignores in their response, instead kind of spiraling into arguing by authority, that is a pretty big tip-off to me that it is not about the right technique. When I go ask a stats expert about an issue and they tell me the reviewer is being dumb, that is another tip-off.

I do think the asymmetry of the review process is an important point, but I think it cuts both ways. Reviewers can easily hide behind their anonymity and power and not engage in a meaningful discussion. Ultimately authors have to justify themselves but reviewers don’t (unless the AE is paying attention).

I think another form of asymmetry is that one usually has two or three reviewers, but it only takes one ill-informed reviewer to cause problems. But that doesn’t mean it’s not having an effect on science.

Hi Brian – I’m not questioning the existence of reviewers that ask for unnecessarily complex methods. Sure, they exist, and I have experienced this too.

Btw., when reading your remarks above, I was just thinking of Friston, K. (2012) Ten ironic rules for non-statistical reviewers. NeuroImage 61:1300-1310, who notes: “Rule number one: dismiss self doubt […] You have been asked to provide comments as an expert reviewer and, operationally, this is now your role. By definition, what you say is the opinion of the expert reviewer and cannot be challenged. […]”

What I wonder, though, is whether the problem of too-complex methods is really systemic. See my arguments above; it’s just a bit suspicious to me that everyone feels that the others are more stupid than themselves.

The danger of such a perception is that it might easily reinforce a dismissive attitude towards reasonable statistical concerns. Because, to be honest, I don’t think that ecology suffers from too-conservative methods – rather the opposite – so I’d rather add a random effect than remove one (that is in response to Andrew Park’s comments against mixed models).

Brian, as an editor, do you really get the feeling that reviewers systematically push authors into wrong / overcomplicated methods? That hasn’t been my experience. As the stats person in our faculty, I regularly have people asking me for advice about reviewer comments / concerns, and my feeling is that more often than not the reviewers are actually correct, despite the original author insisting that the old approach was just fine and that everyone in their right mind can see that there is an effect.

Hi Florian – important question. No. I think the vast majority of reviews are sincere and on target in their advice.

I think a key test is whether a reviewer is absolutely positively sure they are right but cannot even really explain why, or engage in a meaningful discussion of the trade-offs involved in their advice/demands (which gets to your humorous quote!).

But they are definitely a minority of reviewers. Just a very annoying group. And they have really stifled science in some cases.

I like Florian’s point here. If the population in this poll matches the population of peer reviewers, then wouldn’t some poll respondents complaining about statistical machismo as authors be guilty themselves of statistical machismo as reviewers? My hypothesis is maybe a variant of Florian’s: it’s quite easy to suggest more complicated tests as a reviewer without thinking much about how much more work they would be for the authors, or thinking really critically about whether they are needed for the paper. Even a good reviewer is going to occasionally make some dumb suggestions. But for authors, responding to peer reviews is more work and is therefore more memorable than the act of reviewing. Any time a peer reviewer asks authors to perform an additional complicated analysis, especially when it isn’t needed, that is really going to stick in the authors’ minds. Whereas if a peer reviewer requests an additional analysis that is simple and fast, this will be less memorable for an author. The fact that there are two or more peer reviewers multiplies the chances that authors will confront and remember a difficult or unjustified critique that slows a paper down. In other words, authors remember the hoops that are hardest to jump through, and especially those that seem arbitrary. If peer reviewers are wrong about something, they may be less likely to realize it or remember it. That being said, I’m not trying to discount Brian’s larger point; I’m sure there is a degree of statistical machismo/dogmatism among some ecologists.

Your explanations all make sense to me.

But I think Jeremy and I are not as convinced as many readers are that this poll captured a “typical” ecologist rather than a narrower subset who read Dynamic Ecology. That’s probably because we’ve seen the biases play out in surveys of readers and in the blog conversations. I think the 60-35 male-female split is just one example. And I do think the data showing that the respondents are more statistically sophisticated than average are probably real.

So I don’t take it at all as a given that this poll is representative of the pool of people who have reviewed the papers authored by this group.

And it isn’t in the data of this poll, and maybe I’m just obsessing about a small fraction of reviewers. But the ones that really bug me and stick in my mind are the ones who insist that a statistical technique must be used, and then, when you write a really respectful, rational response with detailed explanations of why there are reasons to go in a different direction – or even why their approach is impossible for the kinds of data available for the question you are pursuing – they ignore everything you say and just repeat “well you have to do it my way to be correct”. I.e. it takes multiple rounds of review to truly flush out a statistical machismo reviewer. I think both you & Florian give those people more benefit of the doubt than they deserve. That said, I wouldn’t begin to claim this poll quantified how frequent those people are. I have pretty carefully stuck to within-group comparisons (push complex > push simple) or to one subset’s opinions/guesses about reputation.

Via Twitter:

True – it cannot. As I said to Florian, I am not sure how you even could decide this in an objective sense.

The title of this post should be “Oh, no! Reviewers are demanding that we do our analyses rigorously and correctly!”

I’m sure you are superhuman and always right in every review you do. But surely you can see that reviewers can push back correctly or incorrectly – in which case your argument is not really an argument. If you’re not going to take the time to actually read what I’ve said and engage, then I’m not going to bother to take the time to rebut your comment seriously.

Regarding the “anthropology” or “psychology” of reviewers insisting on more complicated methods even when they wouldn’t change the scientific conclusions but might inhibit reader understanding, I wonder how common Joe Mihaljevic’s point of view is:

I respectfully disagree that it is important, or even desirable, for authors to change their stats because the conclusion *might* have changed, or because it *would* change in some *other* dataset. I think that point of view implicitly assumes that all else is equal–that there’s no harm in doing things “right” so you might as well do things “right”. When in fact all else often is not equal (e.g., cost to reader understanding if you do some complicated mixed model rather than a simpler fixed effects model).

I emphasize that I’m not trying to single anyone out or pick on anyone here; no personal criticism is intended. And I recognize that a tweet doesn’t allow for elaboration and nuance so I don’t want to read too much into Joe’s tweet and hope I’m not misinterpreting it. I link to this only because I think this tweet articulates a common line of thinking with which I (and I assume Brian) disagree.

Brian, I’m really concerned here about how we have defined the term ‘too complex’ when talking about which statistical techniques are appropriate. It seems the working definition applied here is that a statistical test is ‘too complex’ if it would have produced the same result (such as significance of a particular effect) as a model with fewer terms.

Now, even if we for a moment leave aside the problem that simpler tests often have assumptions that are frequently violated by common ecological data types (making the interpretation of their p-values dangerous), this is an incredibly low bar. Even in very well-implemented experimental setups, confounding factors are a real thing, and it is the job of the author to demonstrate effectively that their result is not simply due to these confounding factors (in mixed-effect models these often express themselves as random effects). It is perfectly reasonable for a reviewer to ask authors to demonstrate that they have considered these factors. Yes, in previous generations the types of tests reviewers could ask for were limited and, as our discipline has got more quantitative, we have a greater arsenal of tests available to us. This is a good thing. Whilst I know it stings when you’re the author in these situations (I’m sure in future years I’ll be complaining bitterly that a reviewer no longer accepts my Bayesian mixed-effects model and wants me to use the new hyperloop-super-toroidal-megamodel or whatever it gets called), this is just how science progresses.

As Florian pointed out earlier in this discussion, if there was a feeling that the statistical tests that we use in ecology are too conservative and that too much important theoretical understanding was being lost to type-II error then you might have a point about ‘machismo’. However, our discipline publishes more than ever, with noisy data, and hazy inference. In that climate, I don’t think a push by reviewers to further justify our claims can be reasonably described as ‘machismo’.

Brian has an old post addressing this issue: https://dynamicecology.wordpress.com/2015/02/05/how-many-terms-in-your-model-before-statistical-machismo/

A further thought: In general, I’m uneasy with the argument that, because practices in a field are trending in a certain way, and there are some good reasons for this, that the trends therefore must constitute progress and anybody who’s against them will be seen as wrong in the eyes of history. Yes, the general trend of any scientific field is usually in a progressive direction, but not always. It is possible for entire fields or subfields to go off the rails for extended periods.

Think for instance of how economics widely embraced instrumental variables as a way to rigorously infer causality from observational data, for reasons that seemed like good ones. But it turns out that *in practice* the approach is actually *worse* than ordinary least squares regression: it just reduces the precision of your estimates without actually making them less biased, while lulling you into a likely-false sense of security that it’s doing the opposite.

True, it is possible for disciplines to go off the rails, but I don’t really see any evidence that this is what is happening here. One complex model that simultaneously takes into account the multiple sources of variation is *more likely* to stop you from making spurious claims of causality than the much-maligned p-hacking approach, where you fit multiple simple models one at a time and only publish/highlight those that demonstrate significant relationships.

The post of Brian’s that you link to says that many confounding variables can be removed with good experimental design, and I agree with that. However, where I disagree is with the assertion in the post that you shouldn’t use lots of explanatory variables even in the case where you suspect there may be many contributing (and measurable) factors influencing your response variable.

The argument seems to be that a model with only a few effects has coefficient values and associated p-values that are easier to interpret, and so should always be preferred. However (putting aside for the moment the debate currently surrounding p-values as a reliable inference tool), the p-values of a mis-specified model can be dangerously misleading. If we have a genuinely complex system (and I have seen many of these in ecology – even with well-thought-out experimental design) then, whichever way you model it, you are going to have untrustworthy p-values, either because:

1. You made a ‘too complex’ model (such as a GLMM) and, as such, there are technical challenges in even calculating p-values (which is why GLMM packages in R tend not to provide p-values by default).

2. You used a model that was too simple to capture the characteristics of the system you are modelling, resulting in biased coefficient values and erroneous associated p-values (bear in mind that missing covariates tend to express themselves in mis-specified models as biases in the least conservative direction).

Obviously neither situation is good (and in these types of situations we probably have to think of better methods of inference), but let’s not kid ourselves that option 2 is better here.

Even on the grounds of ‘interpretability’ I disagree. Basically your conclusion from a ‘significant’ coefficient value in a simple model is “if I assume covariate X is the only thing driving Y, then X has a non-zero effect on Y”. Is this really an intelligible result if I suspect that covariates L, M, and N also strongly influence Y and, for whatever reason, have also varied throughout the dataset?
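Point 2 above (the omitted-covariate bias) is easy to demonstrate numerically. A made-up sketch: Y is driven by both X and a correlated covariate Z, so the too-simple model’s coefficient for X absorbs Z’s effect – here with no noise in Y at all, so the bias is purely from mis-specification.

```python
import numpy as np

# Invented data: Z is correlated with X; Y depends on both.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
z = 0.7 * x + rng.normal(scale=0.5, size=n)   # Z correlated with X
y = 1.0 * x + 2.0 * z                          # true partial effect of X is 1.0

# Full model: regress y on [x, z] -> recovers the true coefficients
Xfull = np.column_stack([x, z])
b_full = np.linalg.lstsq(Xfull, y, rcond=None)[0]

# Simple model: regress y on x alone
# -> slope is 1.0 + 2.0 * cov(z, x)/var(x), badly biased upward
b_simple = (x @ y) / (x @ x)

print(np.allclose(b_full, [1.0, 2.0]))  # full model is right
print(b_simple > 1.5)                   # simple model overstates X's effect
```

The simple model’s coefficient is not a noisier version of the truth; it is an estimate of a different (and here misleading) quantity.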

Joe, a couple of thoughts:

” It is perfectly reasonable for a reviewer to ask authors to demonstrate that they have considered these factors. … I don’t think a push by reviewers to further justify our claims can be reasonably described as ‘machismo’.”

Yes, absolutely I agree. Anytime there is a respectful conversation recognizing complexity and trade-offs, I’m happy. But I’ve seen a lot of cases where, when you explain in great detail why you made a different choice, you get back “you have to because”.

“simpler tests often have assumptions that are frequently violated with common ecological data types”

And very often complex tests have even more assumptions that are even more frequently violated. So where do you stop? And these are assumptions that people have even less awareness of the need to test for, less ability to test for, and less understanding of the implications of violating. I’m pretty sure your “hyperloop-super-toroidal-megamodel” is going to make even more assumptions than today’s models.

Finally, my statistical machismo antennae go up when I only hear people talk about the statistical issues. Statistics are not the center point of good science, and statistical choices have implications outside of statistics. The two that concern me the most are: 1) the extra work. That’s why I find it laughable that ecologists are OK with doing temporal regression without GLS corrections (which are trivial to do) but won’t let a paper be published these days without a phylogenetic regression, even though phylogenetic regression requires enormous work, rarely changes the answer, and introduces a bunch of new assumptions (e.g. that the phylogeny is error-free – how often is that true? – and yes, I know you can do Monte Carlo on randomized phylogenies and so on – but if you just want a simple macroecological regression, is the work worth it?). 2) the loss of understanding. As highlighted above, the loss of understanding by practitioners is a pretty serious issue. But so is the loss of understanding by readers: average reviewers are less able to critique the statistics, and average readers are less able to form independent judgments of the validity of the work done.

More complex statistics is not in any way, shape, or form just “more correct”. First, it is often just adding more assumptions that are invalid. How is that more correct? But it also comes at real costs in loss of understanding and extra work. It is a trade-off. It is not a one-way gain. Sometimes the answer is one end of the trade-off. Sometimes it’s the other. And I’m happy if anybody wants to have a conversation and wants me to justify my opinions in that context. When the context is “it’s better so you have to do it”, meh! Often it’s not even better.

Thank you Brian and Jeremy for your replies on this.

I don’t want to take up too much of your time, but I think one of the reasons why this has been a popular/unpopular post is that many have taken it, rightly or wrongly, to mean that you are saying complex = machismo. The fact that you call out specific techniques as examples of machismo is obviously going to upset those who use those techniques.

Honestly, as an author, if the only criticism of my paper by a reviewer is statistical then I’m pretty happy: if the reviewer is wrong you have a chance to argue your case, and if the reviewer is right then my paper is all the better for it. Compare this with criticism of the experimental design or the interpretation. Issues there are not so easy to fix and will likely sink the paper entirely. The worst-case scenario is that you have to do an unnecessary test: if the result doesn’t change then the only harm is the extra time spent doing it (which is relatively minor when you compare it to running a new experiment), and if the results do change, then (rightly) you, as the author, have to reflect and ask yourself how robust your results really are (and be prepared to argue for your particular approach).

Yeah, a reviewer can be an arsehole and dig in, but they can do that over any aspect of the paper, and I’m not convinced that the statistical corner has more than its fair share of arseholes. In fact I find needlessly obstructive criticism of the statistical aspects of my paper the easiest to bat away in my response to the editor.

Thanks Joe. On the issue of statistical machismo being a behavior, not a condemnation of specific techniques, let me quote my post a couple of weeks ago:

“If you read my original post you will see that statistical machismo is not an absolute judgement on any particular statistical technique. And seriously, if you are doubting me on that point, go read the first few paragraphs of my original post. Any technique can be used with statistical machismo, even ANOVA. And even though I named some candidate techniques, I quite explicitly stated that all of the techniques I named had very valid usages (many of which I have used myself). … There is no technique that is in itself statistical machismo. They’re just used sometimes in bad ways.

The bottom line is this. Statistical machismo is not a set of complex statistical techniques. Statistical machismo is an attitude.”

As for your point that statistical suggestions are among the easiest to adopt, well it depends. Let me propose the following sequence:

1) Use Poisson regression instead of OLS, or use GLS for time-series regression – easy to do, definitely cannot hurt.

2) Use AIC instead of likelihood or frequentist approaches, or abandon a study of abundance so you can use detection probabilities to study occupancy – probably not that hard to do, but really dumb and likely making the paper worse. Hence really annoying.

3) Use phylogenetic regression or spatial regression – theoretically possible but really hard. I have to generate a phylogeny (and all those people who say this is now just a few lines of R code are technically correct but being insincere – building a GOOD phylogeny is not easy; that’s why people spend their whole careers doing it). Or spatial – easy enough to do spatial GLS on a tiny little dataset. But on a big raster, it is either going to take a supercomputer for a week or, worse, may never run to completion. This scales badly. So now we’re into advice that at a minimum amounts to the equivalent of running another paper. Advice to use N-mixture models of detection is in this category too.

4) Use detection probabilities for a continental scale study. Impossible. The only continental scale datasets are things like the BBS and CBC and eBird which were not designed with repeat visits in mind (and which would be prohibitively expensive at that scale).

I feel a little trapped here. People falsely and contentiously equate statistical machismo with any push back on stats (like #1) when that is not at all what I’m talking about – I’m talking about categories #2 (dumb), #3 (really hard) and #4 (literally impossible). So then I name specific examples (all of which I know of from the real world). And people don’t like that I name techniques, even though I am careful to say that the techniques themselves aren’t bad and definitely have valid and important uses. At some point people have to take some responsibility for actually reading what I say if they want to critique it! (Which I am definitely not including you in, Joe – although we disagree, you have engaged very constructively.)
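[To put rough, hypothetical numbers on the claim in #3 that spatial GLS “scales badly”: a dense n × n spatial covariance matrix needs O(n²) memory and roughly n³/3 floating-point operations to factor. This back-of-the-envelope arithmetic is mine, not from the post.]

```python
# Illustrative scaling arithmetic for dense spatial GLS: an n x n
# covariance matrix over raster cells costs O(n^2) memory to store
# and roughly n^3 / 3 flops to Cholesky-factor.

def gls_cost(n_cells):
    """Return (memory in GB for a dense double-precision matrix,
    approximate flops to factor it)."""
    mem_gb = n_cells ** 2 * 8 / 1e9   # 8 bytes per double
    flops = n_cells ** 3 / 3          # rough Cholesky operation count
    return mem_gb, flops

# A small plot-level dataset vs. a modest raster vs. a big raster:
for n in (1_000, 100_000, 1_000_000):
    mem_gb, flops = gls_cost(n)
    print(f"n = {n:>9,}: {mem_gb:12,.3f} GB, {flops:.1e} flops")
```

For a million-cell raster that works out to about 8 TB just to hold the covariance matrix and on the order of 10¹⁷ operations to factor it, which matches the “supercomputer for a week, or never runs to completion” description (sparse or approximate methods exist, but they are exactly the kind of extra specialist work being described).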

Via Twitter:

Gavin is a regular commenter here (and commented above), and if memory serves (which it may not) Andrew and Richard have commented here in the past as well. I hope that they, and others who feel as they do, will comment further over here. Although I note the comments (copied below) indicating that some folks don’t feel it’s worth commenting because they feel the poll was badly structured and that the topic isn’t worth talking about (or worth talking about further).

Some other comments from that Twitter conversation:

Don – I really have to call you out and ask where the questions “assumed” that more complex was always wrong. They don’t. And I don’t think that, as I’ve said repeatedly.

Really – Twitter is not doing much to attract me. Short, confrontational, and tending towards fact-free is not appealing to me.

Mark Brewer has now joined the Twitter convo:

One question this Twitter thread raised in my mind: perhaps one of the ways in which the poll is a non-random sample of ecologists is “people who disagree with Brian on this issue were less likely to take the poll”.

Since Mark has been mentioned here I thought I’d just point out his excellent MEE blog article on tips to reviewers for reviewing statistically-involved manuscripts: https://methodsblog.wordpress.com/2015/06/03/reviewing_statistics/

I think if all reviewers followed these guidelines then this whole discussion of statistical machismo would be moot.

Thanks Joe – I hadn’t seen that but it’s great.

Richard I assume you mean “best” as defined by you? Most full time statisticians I know hesitate to throw around words like that.

Unless they are frequentists and straight-up put “best” in the title of their methods, see e.g. BLUP and BLUE. Kidding…mostly 🙂

Well – at least they are specifying their criteria for what is best in an objective fashion!

Only on average under hypothetical repeated sampling of statisticians (subject to various regularity conditions) 🙂

I think that the conceptualization of statistical machismo needs to die and be reborn–or rather, I think the framing of the discussion needs to change for the discussion itself to have any value. I found the original description interesting, but I think the key was in Brian’s previous post: statistical machismo is really a set of *behaviors* (and in turn, I’d say just a subset of behaviors that might just be plain machismo in science).

I have to admit I was a little disappointed by the poll, because it presented a bestiary of horrible *techniques* rather than *behaviors*. Who wakes up in the middle of the night and shakes a fist at the fact that a t-test or any other statistical tool exists? The discomfort results from the reviewer demanding the t-test or the author presenting the convoluted model! But to me, the poll, and as I think about it, maybe most of the previous case-posts have really focused on specific *techniques*. Brian, maybe I’m interpreting your interests incorrectly, but I tend to think that discussing the specific areas in the parameter space in which some specific type of model breaks down (e.g., discussion of the Welch and Guillera-Arroita papers) or even publicizing that these spaces exist is not what you’re trying to accomplish with this. If it is, I’m not sure this is a really rigorous/appropriate forum for it, nor am I sure that it’s a very interesting topic. Any framework is good or bad in certain circumstances, and Dynamic Ecology is and should be better than WordPress hosting a Shiny app where the drop-down selection results in a blog post and 100 comments. Furthermore, framing the problem as statistical rather than behavioral just seems like it will inevitably end up in argument with users of said tool. I don’t imagine Brian enjoys debating said users, I don’t think they enjoy it, and I don’t imagine readers really get much out of it (I don’t, at least).

To me, the bigger problem is that I’m not even sure that readers have a consistent definition of SM–I actually get the sense that Brian has not fully defined it beyond knowing it when seeing it. Lacking a general set of descriptors, I’m kind of concerned that SM has become a catch-all phrase for selective statistical suspicion. Indeed, some of the comments across the SM threads just horrify me. I’m exaggerating and know it isn’t what Brian intends and know many of the comments are tongue in cheek, but it’s like statistical machismo has become some sort of alt-stats or fake-stats rallying cry (“I can’t trust a slippery glmm!”). The juxtaposition of responses to the previous SM post with the exciting data/methodologies post was pretty stark, and further concerning. Folks, that iPhone species ID algorithm is not based on an ANOVA–would you be satisfied with it if it was, or are the macho statistics just the ones that somebody asks you to perform that you don’t want to?

I’d rather see a poll focusing on which specific behaviors readers of the blog consider to be statistical machismo, and what the perceived prevalence of these behaviors is, and what things might improve the process (e.g., how can an author make a reviewer or reader understand a complicated model easier? What are things that a reviewer can or can’t ask for within reason?) This seems (to me, at least) to be the heart of the issue and a more productive topic of discussion.

John – lots of careful thought in there. Thank you.

I take your point about the end of the poll focusing on techniques, and I fully agree with your basic point that statistical machismo is a behavior, not a technique – something I’ve said in the 1st or 2nd paragraph of every post I’ve written on statistical machismo. But I get how listing techniques could be seen as contradictory. Personally, though, I’m glad I included it, as I was surprised by a lot of the answers. It doesn’t give a definitive final answer to anything. And certainly you are right that it is not a defining aspect of statistical machismo.

Really, the only intention of the poll came from the fact that several commenters on my last post said they get pushed to simplify statistics a lot more often than people are pushed to complexify. The poll was just to see what people’s experiences were (while clearly recognizing this is not a scientific sample – but hey, this is not my day job). I suppose you could argue that the same mentality/attitude/behavior could be behind either. Probably true on some levels. But I also think there are some real differences between the two. The poll was targeted at that question, and I just asked a few related things I’ve always wondered about while I was at it.

As far as defining statistical machismo, I don’t actually think I’ve been unclear between my first post https://dynamicecology.wordpress.com/2012/09/11/statistical-machismo/ and my last (prior to the poll) https://dynamicecology.wordpress.com/2017/11/14/taking-statistical-machismo-back-out-of-twitter-bellicosity/

I’ve kind of given up bothering to follow what people choose to make up on Twitter. It’s not something I recognize. And it often puts words in my mouth that directly contradict what I’ve said. But I’m more than happy to have a conversation until the cows come home in a longer format (yes, Dynamic Ecology, but for example I had a good conversation with Tom Webb on his blog that was cleverly poking fun at the notion of statistical machismo).

Interesting point about the Welch discussion under the statistical machismo label. I do think those kind of statistical discussions and opinions are something readers of this blog are interested in. But I have increasingly thought myself that I should separate those kind of discussions from the statistical machismo label and discussions for the same reasons you articulate.

I wonder if other people would be interested in a poll about behaviors. Regardless it would probably be a good exercise for me to see if I could come up with a worthy list.

Via Twitter:

Via Twitter. We’ve already discussed the point about “how do you know who’s ‘right’ about which statistical methods to use?” at length. But wanted to record the ongoing Twitter discussion for posterity here.

I found this string of comments and discussion very interesting. I am one of those ecologists who has a medium-to-decent grasp of all the statistical and mathematical tools… but I am a user, not a developer – somebody else has to make it for me! For the last 15 years I have been working with camera traps, estimating abundance/density/occupancy of species living at low density that are difficult to detect. I think in my world there is a certain level of “fashion”, in which a few model makers dictate what direction is currently good, many other people follow, and there is a lack of dynamic communication and alternatives. A new method appears, then lots of workshops on how to use the latest program and R code. Progress seems to be slow and linear (everyone uses this now… until the next improvement). You have to use spatially explicit capture–recapture models and occupancy models as the latest thing, and we have all gone through these paradigm shifts before. One of the big problems is that we simply don’t know what is out there, and as such all the models we use have never been validated in any manner. It therefore becomes one of those things where reviewers indicate, to a certain extent, “why are you not following the current fashionable trend?”. I even see some papers in which I am not sure what question is being asked and what the point is. It is just a new application of a variant of the new paradigm, and the outcome is actually something very straightforward and boring… “but the road towards it was so cool and complex!”

With better computational ability, the models have become more complex. E.g. I completely agree that current spatially explicit capture–recapture (SECR) models are much more elegant compared to the old non-spatial CR models. However, when you really read the main justification for why they are an improvement, it does not go further than: the density estimates are lower and we deem this more realistic; equally, they are more precise and that suits us. In essence, it is nothing more than “we like it”. As we never have any sense of validation for many of these, you now get extrapolations that are completely baffling. Fieldwork-wise we just keep doing the same (not enough, and mainly dictated by what is logistically possible). We just use more sophistication to seemingly convince ourselves we are making progress.

I just thought I’d throw out these random thoughts… in the end we are just waiting for the heat-seeking drones that simply count all individuals without the need for complex models and stats. Happy to hear what people think.