Taking statistical machismo back out of twitter bellicosity

The post last week on readability of statistics made me realize that the term “statistical machismo” has grown and morphed quite a bit from my original intent. One blog commenter noted that he now hears the phrase statistical machismo thrown at him when he works on developing new statistical methods. And one twitter commenter implied that statistical machismo was equated with “mocking complex statistics”. Both of these usages horrify me. Which has led me to a new word coinage: twitterized – verb – to become overly simplistically black and white as in the phrase “statistical machismo has become twitterized way past its original meaning”. (NB: apparently the word twitterized is already used in another sense but I of course prefer mine).

So I know just as in science, where you don’t have full control over how a paper is perceived once you release it into the wild, I have no control over how the term “statistical machismo” is used. But I at least have to try …

If you read my original post you will see that statistical machismo is not an absolute judgement on any particular statistical technique. And seriously if you are doubting me on that point go read the first few paragraphs of my original post. Any technique can be used with statistical machismo, even ANOVA. And even though I named some candidate techniques, I quite explicitly stated that all of the techniques I named had very valid usages (many of which I have used myself). I did name techniques to start a conversation and because it is my perception that those techniques are more prone to be pushed in a machismo way, but every technique I have ever mentioned in the context of statistical machismo is a perfectly good technique. There is no technique that is in itself statistical machismo. They’re just used sometimes in bad ways.

I am probably guilty of getting careless about this distinction between technique and machismo attitude myself in some of my later posts on the topic. Especially in the titles, although I was generally pretty careful (but I’m sure not perfect) in the text. For example, in the first paragraph of my post on detection probabilities I clearly stated “I at no point said these techniques were bad or should never be used. But I did say that we had in many cases reached a point where the techniques had become sine qua non of publishing – reviewers wouldn’t let papers pass if these techniques weren’t applied, even if applying them was very costly and unlikely to change the results.”

The bottom line is this. Statistical machismo is not a set of complex statistical techniques. Statistical machismo is an attitude. Many users of advanced statistics don’t have it. And many users of some pretty basic statistical methods do have it.

The two key components of the statistical machismo attitude are:
1) My way is the only correct way – statistics is all about shades of gray and judgement. Have you ever run a test assuming normality when the data didn’t fall perfectly on a line in a Q-Q plot? Most people have. That’s because statistics are messy. Ecological data is messy. It is rare to get perfect conformance with all assumptions. And in many cases (like the normality example) there are simulations showing that it doesn’t matter very much as long as the data is not skewed too badly. Statistical machismo is a reviewer who suggests a particular method, and then when the author provides a carefully reasoned explanation of why they didn’t do it that way, the reviewer doubles down with imperious language about “it has to be done that way” without even recognizing that there is a legitimate discussion to be had. Statistical machismo is also about not recognizing that many methods have significant costs in terms of extra work (e.g. generating a phylogeny, running computations that take weeks) or limiting the scope of questions that can be asked because the techniques are only tractable at certain scales. Bottom line is that statistical machismo is not recognizing that it is a question of judgment and there are multiple valid answers.
2) Malfeasance of motive – statistical machismo is typically driven by something other than doing a good job analyzing the data. Statistical machismo motives include:

  1. An author trying to impress people and distract from the ecology. If a statistical method appears in the title of a paper and it is not a methods paper, that is a bad sign.
  2. A reviewer trying to gatekeep – using statistical methods as a way of saying no to other people and feeling smug about being part of the “in group”.
  3. Being unable to have a conversation about when a technique should and shouldn’t be used. If you think a technique should ALWAYS be used that is statistical machismo.
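
As an aside on the normality example in point 1, here is a minimal sketch of the kind of simulation being referred to. It is an illustration only and is not taken from any particular study: the sample size (30), the gamma distribution used for the skewed case, the number of replicates and the function name are all assumptions made for the example. It draws samples for which the null hypothesis is true, runs a one-sample t-test at the 5% level on each, and counts how often the test falsely rejects.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 29 degrees of freedom
# (i.e. for samples of size 30). Hard-coded to keep the sketch stdlib-only.
T_CRIT_DF29 = 2.045


def false_positive_rate(draw, true_mean, n=30, reps=2000, seed=1):
    """Fraction of one-sample t-tests that reject a null that is actually true.

    draw      -- hypothetical helper: function taking a random.Random and
                 returning one observation
    true_mean -- the real mean of the distribution being sampled
    n         -- sample size; must stay 30 to match T_CRIT_DF29
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        t = (statistics.mean(sample) - true_mean) / se
        if abs(t) > T_CRIT_DF29:
            rejections += 1
    return rejections / reps


# Normal data: the assumption holds, so the rate should sit near 0.05.
normal_rate = false_positive_rate(lambda r: r.gauss(0.0, 1.0), true_mean=0.0)

# Moderately right-skewed data: gamma with shape 4 and scale 1 (mean 4).
skewed_rate = false_positive_rate(lambda r: r.gammavariate(4.0, 1.0), true_mean=4.0)

print("false positive rate, normal data:", normal_rate)
print("false positive rate, skewed data:", skewed_rate)
```

If the two rates come out close to each other and to the nominal 0.05, the test is tolerating that degree of skew. Making the data more skewed or the sample smaller is where one would expect them to drift apart, which is exactly the judgement call being described.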

A related issue that I brought up in my original post is whether it will change the ecological conclusions. There are many cases where more complex techniques do change the conclusion in an important way and are more correct for it. But there are also many cases where it doesn’t. The number of macroecology papers I have seen that run a regression with and without phylogenetic regression and get exactly the same answers has to number into the 100s. Now admittedly sometimes it’s hard to know the outcome in advance until you try, and if trying is cost free go for it. But other times trying comes at a cost in time and can be known in advance to be very likely not to have an effect and is just not worth it. If you are not open to that last argument (for any statistical method), you are committing statistical machismo.

The final issue I will briefly raise, but save a full post for another day, is that the more complex the statistics, the more assumptions and the more complex the assumptions that need to be verified and validated. And I worry that we move from a world where most people know how to assess the assumptions to a world where people don’t even have a clue that they should be assessing newer, more complicated assumptions, let alone how to do it.

The bottom line is if you are open to a conversation about trade-offs pro and con of multiple techniques, you’re probably not committing statistical machismo. If as an author you hype the technique you use more than the biology that comes out, or if, as a reviewer, you are absolutely convinced that there is no other acceptable way to do it despite many rational arguments given by the authors, you are committing statistical machismo. If you are so attached to a method you think there is no valid reason for not using it then you are committing statistical machismo. In my experience, most stats experts are not the ones committing statistical machismo. They’re acutely aware that no technique is perfect and every technique has limitations and trade-offs. It’s the people who have struggled to learn a technique and often genuinely don’t know the assumptions and limitations of the technique that are most likely to commit statistical machismo. Or in other words, if you are 100% sure you are right in statistics, you’re probably wrong. And you’re probably practicing statistical machismo.

So let’s untwitterize “statistical machismo”. Let’s keep it to a description of an unconstructive, inflexible, superior, gatekeeping attitude. And not to a critique of statistical sophistication in itself.

What do you think? Has statistical machismo morphed in meaning since the original post? Has it become twitterized? Has it outlived its usefulness? Can it be meaningfully applied as a description of an attitude rather than a statistical technique? Or do you think I’m full of it and trying to have it both ways?


45 thoughts on “Taking statistical machismo back out of twitter bellicosity”

  1. I think that this post shows that your posts can become quite influential (at least to a certain degree) among ecologists (biologists). Do you ever think about that when you write, e.g. to be careful to be understood correctly?

    • That’s an interesting question that Jeremy, Meghan and I have discussed. Back in the early days when I wrote the original machismo post, no, it never crossed my mind that one little blog post would have much influence. As time has progressed it’s become more clear to me (and I think all of us) that we have a certain (still fairly limited) power and we have to be careful to use it for good. Certainly part of that is being careful not to punch down, but when you take on a whole topic, be it stats or IDH, it is hard to know whether you are punching up, down, or across. Then the question becomes how do you respect that while still writing a punchy, edgy, thought-challenging, interesting blog. It is a fine line to walk. And I’m sure I get it wrong a lot.

      But I’m not complaining. I’m lucky (and grateful to Jeremy) to have the DE platform to express my thoughts.

      • Oh and as for the write carefully part of the question, I always write carefully. I don’t think I fail to say what I mean and mean what I say. But the internet universe doesn’t always take time to parse nuance and complexity. So it’s a double-edged sword – influence gained by entering the internet sphere comes at a cost of losing control of ideas and message and possibly being perceived in a one-dimensional cartoon-fashion. That is why I personally draw the line at blogging and have not entered twitter. But again, I’m not complaining. I made my choices eyes wide open.

    • We do indeed have influence, but it’s hard to say how much. Certainly not as much as some people seem to think (see, e.g., the conversation starting here: https://dynamicecology.wordpress.com/2014/05/13/is-the-notion-that-species-interactions-are-stronger-and-more-specialized-in-the-tropics-a-zombie-idea-guest-post/#comment-26843). For instance, probably only a few hundred people, certainly less than 1000, read most or all of our posts. More read some and a huge number have read just one or two. Is that a lot of people? Well, depends what you compare it to. It’s a lot more regular readers than most academic science blogs have. But it’s not that many people compared to, say, the 10,000 or so members that the Ecological Society of America alone has. Only a small minority of ecologists read Dynamic Ecology more than occasionally.

      I think it’s fairly unusual for single posts (or short series of posts) to have any appreciable influence. Brian’s statistical machismo posts are a rare exception even for us. Meghan’s “80 hours/week myth” post is another (https://dynamicecology.wordpress.com/2014/02/04/you-do-not-need-to-work-80-hours-a-week-to-succeed-in-academia/). I like to make the analogy to Aesop’s fable of the crow and the pitcher. As a blogger, you’re like a crow dropping pebbles (=posts) into a pitcher of water. Few if any individual pebbles are going to make any appreciable difference to the water level on their own. But if you spend enough time dropping enough pebbles, eventually you have some influence (=raise the water level). If you’re going to blog, you’re basically making a bet that that’s a good analogy–that you’re not actually more like someone tossing pebbles into the ocean, or against the windows of abandoned buildings, or etc. 🙂

      We do try to write individual posts carefully, though it’s possible to worry about that too much. If you worry too little about your writing, you’ll be too likely to write something you’ll really regret. But worrying too much about whether you’ve been totally clear or whether you’ll upset anyone can become paralyzing. Your reputation depends much more on your body of work than on any one post. So in the long run it’s best to just try to write each post reasonably well and make sure you don’t write anything really terrible.

      I try to gauge my level of writing effort to how important/controversial/widely read I think the post will be. For instance, I put a *lot* of effort into phrasing every sentence in this post: https://dynamicecology.wordpress.com/2017/10/02/newly-hired-tenure-track-n-american-asst-professors-of-ecology-are-59-women/. Because I thought the data were very important, and because I was very worried that I’d upset people and draw a social media pile-on unless I chose my words carefully. And as it turned out the post didn’t draw as many readers or comments as I thought it would and the response it did draw was uniformly positive (one troll aside). Hard to say if the uniformly positive response was because I wrote the post so carefully, or if I needn’t have worried so much.

      Conversely, I didn’t worry about my brief linkfest item last week asking whether the dilution effect is a “zombie idea” and linking to a popsci article about the heated controversy over the dilution effect. In retrospect, perhaps I should’ve thought harder, because several folks on Twitter saw my use of “zombie idea” in that context as (i) clickbait rhetoric that has no place in science, even on a blog, (ii) punching down at a grad student whose research was mentioned in the article, and (iii) causing imposter syndrome in students. Although no matter how much care you take in using rhetoric like “statistical machismo” or “zombie idea”, you will always upset at least a few people. Rhetoric in scientific writing is something on which people disagree (and that disagreement is perfectly reasonable), so there’s no pleasing everybody.

      And yes to everything Brian said.

  2. Dear Brian,
    I fear that the expression “statistical machismo” will be heavily recycled for bad and good, because it translates something that many of us have experienced; a form of stubbornness on how ecological data should be analysed. I occasionally use the expression, although not on twitter or blog posts, to discuss whether advanced statistical skills are needed to understand and interpret ecological data properly. Many graduate students have come to believe that they cannot be good ecologists if they do not master the most advanced statistical approaches and mathematical tools out there. Such an attitude towards statistics comes, too often, at the price of a poor understanding of the biology.

  3. I use your original post as a reading with my students (introductory stats course for grad students in ecology), with the intention of showing that more complex stats are not always needed. But the students seem to focus more on the particular examples, maybe because it’s easier than to think of the general idea. Now I’ll give this new post alongside that one to make things clearer.
    I think the expression is really useful, but prone to twitterization… As so much else in ecology! And I also think that most, if not all, analyses in ecology are wrong at some level, and nonetheless most of the time the conclusions are valid.

    So, a question: I tend to use complicated analyses for two reasons: I’d hate it if my results were statistical artifacts (and for this reason I often recommend more complicated analyses); and it’s fun to work on the edge of my statistical knowledge! Would you count this as statistical machismo?

    And one last comment: last week, at the project examinations here, a Masters student had a nice categorical sampling design which to me just asked for an ANOVA, which is what she was going to use. But an examiner told her to use GLMs, because they are more robust and without any other reason. That seemed a lot like statistical machismo to me, and often I’m the one to recommend GLMs to people. 🙂

    • I would say the last paragraph is good evidence you’re not committing statistical machismo! And you make a good point, some people use more complicated statistics just because they can due to their quantitative skills and because they like to (i.e. it’s fun to them). There is nothing wrong with that. In fact, it’s good for the field to have people pushing the envelope. The problem is those people who then start to judge others who don’t feel the same way and are using simpler but perfectly defensible approaches.

      • + 1. I think you did a good job clarifying and I love your neologism “twitterize”. This whole subject reminds me of the Zen saying, “Before Zen, mountains are mountains and rivers are rivers. During Zen, mountains are not mountains and rivers are not rivers. After Zen, mountains are just mountains, and rivers are just rivers.” I too get frustrated by blanket recommendations of more complicated methods, when the key is to understand the assumptions linking your models to data and the question you want to pose. Big complicated models become necessary to the extent that you have big, diverse and/or messy data and/or you are trying to address more sophisticated questions (e.g. fitting process models rather than testing for linear “effects”, etc.).

  4. IMO the term’s escape from captivity and subsequent transformation probably reflects a general frustration with the use, abuse, and (to non-statisticians) opacity of statistical methods

    Plus it’s just a great phrase!

      • All of you have written, I think, about improving education about stats. But I wonder if the problem w/ understanding them is deeper than math. My experience both in teaching and working is that people generally want things to give Y/N answers, so they torture whatever method is at hand to get the simple answer. Perhaps students need more work at all levels on more complex scenarios that require balancing conflicting lines of evidence and assessing the validity of different approaches to tackling problems.

      • I think you make a good point.

        For sure, a certain level of math is important. You have to understand calculus and probability to write and solve a likelihood. But once you cross that bar, it is pretty straightforward in most cases.

        But yes, I like your point that maybe a deeper problem is the lack of training or lack of buy-in to the philosophy of scientific inference which causes people to bend and distort statistics.

        I think a related point is the belief that statistics are black and white, correct or incorrect. One will never find a situation in ecology where every assumption is perfectly met. Yet we all go ahead. I think this goes back to your point about inferential frameworks. Stats should be seen as summarizing complex information into simple notions like effect size, variance explained, and odds that the signal could be created by noise (i.e. p-values) that should inform a quantitative judgment. Yet exactly as you said, we so often try to use stats to give us an “am I right or wrong” answer.

        Great point.

  5. I like your rejoinder here (and I immensely enjoyed the original post and resultant maelstrom). To me, statistical machismo is one end of a continuum, the other end of which might be labeled statistical humility. Statistical humility is not to be confused with statistical incompetence; it’s the conscious decision to keep things simple because simple is sufficient and more widely comprehensible.

    In 1979, Doug Johnson published a paper in Auk where he develops a maximum likelihood estimator for nest survival, but also points out that Harold Mayfield’s quick-and-dirty “assume the unknown event occurred half way between 2 occasions” works just as well in nearly all cases and can be done on the back of an envelope using addition and division. There are many good reasons to use modern sophisticated survival models, but unknown failure time is rarely one of them.
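
For readers who haven't met it, the back-of-the-envelope calculation can be sketched as below. This is the textbook form of Mayfield's estimator as it is usually described, not code from Johnson's paper: the function name, the tuple layout and the four nests are made up purely for illustration. Each nest contributes exposure days; a nest that failed between two visits is assumed to have failed half way through that interval, and daily survival is one minus failures divided by total exposure days.

```python
def mayfield_daily_survival(nests):
    """Mayfield-style daily survival rate from nest-visit records.

    nests -- list of (days_known_active, final_interval_days, failed) tuples:
             days_known_active   days from discovery to the last visit on
                                 which the nest was still active
             final_interval_days days between that visit and the next one
             failed              True if the nest was found failed on that
                                 next visit, False if it was still going
    """
    exposure_days = 0.0
    failures = 0
    for days_known_active, final_interval_days, failed in nests:
        if failed:
            # Unknown failure date: assume the midpoint of the final interval.
            exposure_days += days_known_active + final_interval_days / 2.0
            failures += 1
        else:
            # Nest survived the whole final interval, so count all of it.
            exposure_days += days_known_active + final_interval_days
    return 1.0 - failures / exposure_days


# Four made-up nests: 14 + 8 + 14 + 5 = 41 exposure days, 2 failures.
example_nests = [
    (10, 4, False),
    (6, 4, True),
    (12, 2, False),
    (3, 4, True),
]

dsr = mayfield_daily_survival(example_nests)

# Chance of surviving a hypothetical 20-day nesting period, assuming the
# daily rate is constant over the period.
success_20 = dsr ** 20

print("daily survival rate:", dsr)
print("20-day nest success:", success_20)
```

The whole estimator is a running sum and one division, which is the point of the anecdote: the midpoint assumption stands in for the likelihood machinery.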

    • Great story about Doug Johnson. I’ll have to look the paper up. He sounds like my kind of statistician!

      And I like the statistical humility-machismo spectrum.

  6. Brian, I think the concept of statistical machismo is very useful and whether people are referencing it correctly or not does not degrade its value. The conversation should be had and it’s great that you started it.

    The problem that many had with the “detection probabilities” post was that it used an antagonistic headline and then continued with more antagonistic narrative. Your frustrations (valid, no doubt) with peer review experiences concerning BBS data were clear as day! What made things worse was that some of the technical details were incorrect in ways that seemed to further support your argument. Few things generate as much internet furor as being aggressively wrong about a subject.

    If you had used the title, “Is advocating detection probabilities sometimes a case of statistical machismo?” and toned down the attack, the response would have been different. Maybe not as entertaining or engaging though.

    Importantly, this headline would have made clear that you were referencing the machismo of advocacy for a method (i.e., attitude), and not that of the method itself.

    • Dan, speaking from personal experience, I think you’re *very* unusual if you’re totally fine with rhetoric like “statistical machismo” so long as it’s phrased as a question, but not fine with it otherwise! In my extensive experience, if a post title contains rhetoric like “statistical machismo” or “zombie ideas”, people who see such rhetoric as over-aggressive (or clickbait, or whatever) are mostly going to do so no matter how the title is phrased. As evidenced for instance by a recent linkfest item of mine with a title phrased just as you suggest–that really upset a number of regular readers whom I highly respect.

      I’d also suggest that the great conversation Brian started likely wouldn’t have gotten started at all, and *certainly* wouldn’t have continued as long as it did or involved as many people, had Brian not been as provocative and forceful as he was. For better or worse (and on balance, I think it’s more the latter), people only comment at length, and then keep on coming back to comment, when they have something they *really* want to say. Which is when they either *really* disagree with a post, or *really* agree. (though often, they don’t comment even if they do really agree or really disagree, of course). And no doubt rhetoric like “statistical machismo” discourages some commenters too.

      It is just really, really hard to consistently write blog posts that are sufficiently provocative or otherwise engaging that you build an audience of readers who want to comment, but not so provocative (or provocative in the wrong ways) as to annoy or upset lots of people. Especially if you want the comments to include those that disagree with the post. It’s a very fine and fuzzy line to walk. And it’s getting harder to walk, because the population of people who might want to have a serious extended conversation with strangers on the internet about *anything* was never that large, and is shrinking for various reasons (e.g., it’s a pain to type on phones). Honestly, in the long run I expect our comment section to die no matter how we write our posts.

      I don’t have any good answers here, and I certainly wouldn’t say we’re perfect. But I think we have a pretty good track record overall.

    • If there were one headline I would rewrite it would probably have been that one. A single word change would have done it (at least for me):

      Is requiring (rather than using) detection probabilities a case of statistical machismo?

      I definitely don’t want to re-litigate detection probabilities here, but I think saying I was wrong is a bit strong. There were lots of disagreements about interpretations of importance, but I just reread the comments. Nothing there looks objectively wrong rather than subjectively disagreed about.

      • Dan – based on your comments about “antagonistic narrative” I just reread the post carefully, as I care a lot about civility.

        The bottom line is I was pretty flippant/dramatic about how dire the state of not being able to publish a paper in wildlife journals without detection probabilities had gotten. This was about two early and one middlish paragraph out of dozens of paragraphs. And frankly, I still think it was a pretty dire situation that had gotten that way from a lot of aggressive behavior by advocates of detection probability methods. Strong, attention getting dialog was appropriate. And my description was accurate. Honestly, if that small part is what people objected to, it smacks of hypocrisy to me. Of course reasonable people will disagree on this point.

        Beyond that there was a lot of text describing technicalities of detection probabilities and a brief and I believe constructive recommendation about moving forward. I could have improved the title (see above) and I really regret a juxtaposition of sarcasm with a link to a paper by Michael McCarthy (which I apologized for in the comments). Otherwise it was mostly either factual or my opinions. The two exceptions were the sarcastic use of the phrases “whopper” and “gee really” about some of the assumptions of detection probabilities. But that was two words out of over 2000 and they were two points that I intentionally emphasized because I think they had been ignored a lot. What I think was really going on was a lot of people just didn’t like the Welsh paper or my opinions about the relative importance of different issues and conflated that with wrong or antagonistic. It might have felt uncomfortable or surprising to be disagreed with because it had become such a settled issue. But I’m as entitled to state my opinions as they are. And at least I wasn’t rejecting their papers while doing it (I’ve actually reviewed several detection methods papers and recommended eventual acceptance on all of them).

        I was very clear at multiple points that detection probabilities were sometimes an important issue that had to be addressed and that my issue was with rejecting papers without detection probabilities, not the technique itself, which I was glad to see (to bring it back to the topic of this post).

        Bottom line – after careful consideration, I am going to respectfully disagree with you whether that post relied on rhetoric that was over a line or disproportionate to the situation. I doubt I’ve convinced you or others which is fine – we can just agree to disagree.

        OK – end of self examination – beginning of push back (aka reinforcement of the theme of this post with statistical machismo being about attitudes and treatment of others).

        In the comments of that post I got called ignorant among other things. I didn’t do anything approaching that. I think that is fairly representative of how over the top a lot of people perceive the detection probability community as being and the kind of aggression that I was explicitly criticizing. And always rejecting papers because they don’t use your favorite statistical method may not be rhetorical, but it is really, really aggressive. If people in the detection community are worried about the rhetoric being used against them, they (being a subset, but the subset people notice unfortunately) really need to do a little soul searching in my opinion. If they don’t like it, they should change what they put out in the world. I realize this is pretty strong push back (again against a certain subset), but I think pretty strong push back against the actions of a subset is still appropriate.

        That said, Dan, I appreciate your feedback that prompted me to look. It’s a good exercise, so thanks for the prompt. And certainly you’re not in that subset I just mentioned. I appreciate hearing your perspective. And I know you and I deeply disagree on some things statistically related, but I have always appreciated your civil and productive comments on the blog and learned a lot from them. (I only wish you had time for more!)

    • Ben Bolker gave a nice plenary at the 2014 International Statistical Ecology Conference where he petitioned the audience as a community of editors and reviewers to stop demanding advanced statistical methods from authors when such methods were not critical to the objectives of a paper. He specifically referenced statistical machismo from Brian’s blog entry. The plenary was very well received and, again, plenty of folks (myself included) agreed that statistical machismo is a problem. And it is much appreciated that DE serves as a place for ecologists to have these conversations.

      Being provocative to get attention at DE is understandable. But in this era of clickbait and fake news, it is frustrating to watch misinformation spread like wildfire and have detrimental effects in society. The stakes are much bigger when it comes to something like American politics than whether an observational study accounted for detection error while modeling the distribution of a species. That does not mean experts (scientific and otherwise) should remain silent while a topic is being misrepresented.

      Brian, while you may have only sprinkled in jabs at detection probabilities throughout the blog entry, it was enough to get a response because the overall theme was disparaging and the concept was at times distorted. The notion that there is a “detection community” is somewhat insulting, as opposed to recognizing that a large collection of applied ecologists appreciate the problem of observational error and ways to overcome it using design-based and model-based approaches. So much effort has been put into developing and applying hierarchical models because they can address fundamental issues associated with trying to observe and understand highly mobile and/or cryptic organisms across space and time. This is not a methodological conspiracy simply perpetuated by gatekeeping.

      I recognize you do not want to rehash details, but if you’re reflecting on the post then here are some things many would agree you got right:

      Gatekeeping in peer review is a problem. This is not just with methods, it obviously occurs with interpretations of novelty and paradigms of thinking.

      Detection error is not always a problem. Yes, there are instances where accounting for detection error does not change inferences greatly. And historical data sets are not without value simply because repeated sampling was not used. The BBS is exhibit A.

      Model estimation can be tricky with sparse data. Logistic regression is notoriously data hungry and an occupancy model is essentially 2 logistic regression models, so estimation in the face of small sample sizes and complex model structures will be difficult. Even if you can generate estimates, there is no guarantee they will be accurate for any one sample.

      Some things that many would agree you got wrong:

      Detection is estimated because peer review demands it. As I said earlier, the body of literature exists for very many valid reasons that are directly pertinent to ecological understanding. And applied ecologists have increasingly appreciated these reasons.

      Occupancy model assumptions include some “whoppers”. In the same paragraph, you worry about the constant detection assumption but then mention that detection does not have to be modeled as constant (and rarely is). False positives are not nearly as pervasive as false negatives, and that assumption can also be handled with advanced models. And the closure assumption, while sometimes tenuous, addresses a very difficult problem that applies to any observational study on species distribution.

      The problem of abundance-induced variation in detection was only recently recognized. This is simply incorrect as there are methods dating back to when the original occupancy models were developed that exploit this relationship to correct the problem.

      The Welsh et al. paper proves that accounting for detection introduces errors at the same rate as ignoring detection. This paper had some interesting analyses and highlighted potential problems with fitting occupancy models, but it had many flaws that have already been pointed out (here and in the literature). Their conclusions were conditional on a set of scenarios they established for simulation, some of which had weak relationships with reality.

      Distance sampling has not made it to the United States. This is simply incorrect, and distance-based functions of detection are often used in scenarios where they can be applied. Distance sampling has other assumptions and limitations that do not make it a panacea for all problems of detection error.

      You can say these are my opinions but it is clear from the comments – and the literature – that they are shared by many.

      In summary, I agree that statistical machismo is an interesting and relevant concept. I just thought the application of it to detection probability was a bit sloppy and motivated mostly by frustration (which again, was admittedly justified). Either way, the goal of stimulating conversation was achieved.

      • Thanks for the thoughtful reply.

        On many points we may have to agree to disagree (e.g. whether the assumptions required for detection probabilities are a whopper is a matter of opinion). I wrote my post from the perspective of reviewing lots of papers and sitting on the PhD committees of many wildlife students and seeing how detection probabilities are typically practiced by typical field oriented biologists – not what could be done or the exemplary studies by methods developers.

        The only thing I’m going to outright disagree with you on (and really it’s my central point) is whether “Detection is estimated because peer review demands it.” Of course sometimes it is used because it is important to the question at hand and authors are excited to have the tool at hand. I noted that several times in my post. But a non-trivial amount of the time (I don’t know if it’s 30% or 70% but it’s not 5%) detection probabilities are used when the authors don’t really want to because they’re afraid of what reviewers will say. I draw that from specific conversations with authors publishing in wildlife journals. I really don’t see how you can argue that detection probabilities are not overpushed in peer review sometimes. Shoot. I mean much as I would have hated it I would have gone ahead and done detection analysis on the BBS if I could have. But I couldn’t and they got rejected. And that was over six different submissions of three papers. If that’s not evidence that overreach has happened and people are smart to fear what reviewers will say I don’t know what is.

        Really out of the whole discussion, the only point that really matters to me is whether we can agree that detection probabilities are forced inappropriately on authors some meaningful fraction of the time (either in peer review or prior to peer review for fear of peer review).

      • I don’t know much about the many different ways of taking detection probability into account into statistical models. Therefore my very naive question: What are the biological implications if one fails to do so? Said otherwise, is my estimate of effect size and direction entirely flawed by avoiding the extra layer of modeling detection probabilities explicitly. For instance, although fitting a linear regression to a binary (0,1) response variable is statistically incorrect, in many cases it will give an effect size that is in the right ballpark (e.g., weakly positive). This is not an argument in favor of sloppy statistics, but my humble attempt to weigh consequences.

      • Sorry one more point/question.

        “You can say these are my opinions but it is clear from the comments – and the literature – that they are shared by many.”

        Yes sure. And many people don’t share those opinions (including me). But how do we move from opinions on which reasonable people disagree to language of “right/wrong”? And not to make you accountable for another person, but how do we move from a domain that is clearly opinion to Marc Kery calling me and everybody who disagrees with him “p-ignorant” in the comments on the original post?

        To me the essence of statistical machismo is failing to recognize that statistics is a domain where judgment plays a big role. Almost nothing (actually done by trained practitioners) is definitively right or wrong in real world applied statistics.

        I just have to be honest that I find statistical humility lacking among some of the major advocates of detection probabilities more than almost anywhere else in ecological statistics. Phylogenetic regression advocates probably come in second, but to my mind they’re a pretty distant second. I think it is unfortunate because it may cause short term gains, but it hurts the cause in the long run.

      • I would agree that estimating detection error is forced inappropriately some fraction of the time. But yeah, I suppose we disagree on what that fraction is and how much of a problem it represents. I’d also disagree with the notion that it is only sometimes used due to the question at hand – I think it is actually often. In my experience, issues with availability (whether an individual is available to be observed, given that it is present at a site) are generally more problematic than detectability, especially for mobile organisms. Occupancy models don’t solve this, but they do force you to think critically about the observation process. The growth of camera trapping has highlighted this.

        I don’t think Marc’s “p-ignorant” comment was meant as a personal attack on anyone that ignores detection. I think it was meant literally – by ignoring detection your analysis is ignorant about p.

        I think the attitude among folks like Marc is that the issue of detection error seems pretty obvious. If you want to look at where and why a species is present where it is present, logistic regression is an obvious choice. Now if somebody told you that many of your 0s were actually 1s, wouldn’t you be concerned with trying to estimate the occurrence probability with such contamination? Observational studies suffer enough with uncertainty while attempting to make inferences on ecological relationships. It seems prudent to make sure your data are not misleading.

      • Thanks Dan. As long as we’re both saying the answer is in the interior (some excessive forcing of detection probabilities occurs), I think it is OK to have differences of opinion.

        ” If you want to look at where and why a species is present where it is present, logistic regression is an obvious choice. Now if somebody told you that many of your 0s were actually 1s, wouldn’t you be concerned with trying to estimate the occurrence probability with such contamination?”

        This is really the nub of it. The answer is maybe. Every statistical model already has an error term. Every logistic analysis done on any binary variable has erroneous zeros and ones. That is why we use an error model (binomial in this case). We could but should not usually pull out and separately model 10 different potential sources of error. Rather the error term can account for it. Where we cannot do this is when the errors covary with our independent variable (e.g. veg height or elevation in a typical detection model). This is one of the core assumptions necessary to derive the OLS formula (or any fancier model). If the direction of errors across all error sources is independent (or in practice nearly independent) of the X variable (which can sometimes happen even if individual sources of error are correlated), then absolutely we can just go ahead. I would have bigger error bars but still be unbiased. And people make that choice every day. The goal is never to eliminate every source of error so that my error bars are zero. It would take 20 years to do a study. And we would be like the physicists trying to estimate the gravitational constant to 3 more decimal places. Does detection covary across space with my variable of interest (say temperature or presence/absence of a predator)? Maybe. Maybe not. Often yes but only weakly. And often much more weakly than the effect size of the factor I am really interested in.
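
The distinction between error that is independent of the X variable and error that covaries with it can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the discussion above: a continuous response, a hypothetical true slope of 2.0, and an error made of pure noise plus an optional term proportional to x. The names `ols_slope` and `simulate_slope` are invented for the illustration.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope(n, true_slope, error_x_coupling, seed):
    """Fit OLS to data whose error is noise plus a term that scales with x.

    error_x_coupling = 0 gives error independent of x (hypothetical setup);
    a non-zero value makes the error covary with x.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, 1.0) + error_x_coupling * x for x in xs]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    print("independent error:", simulate_slope(20000, 2.0, 0.0, seed=1))
    print("error covaries with x:", simulate_slope(20000, 2.0, -1.0, seed=1))
```

With independent error the fitted slope should land near the assumed true value of 2.0, only with wider uncertainty than a noise-free fit. With the coupled error it should drift toward 1.0, which is the kind of bias that a larger sample does not remove.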

        And then you have the Welsh paper (and nobody talks about it but the supplemental material of the rebuttal paper). In exactly the regions where you should be most worried about detection, the identifiability problem can actually worsen your estimates of occupancy! That is not a big incentive to explicitly model the error.

        If you could just analyze the same dataset with or without occupancy, then sure account for it; it’s just a few watts of computer time. And then carefully eliminate the possibility you are in a zone where you just worsened your answer by computational issues (which I rarely see people do even though they claim to worry a lot about error). But that is not the reality on the ground. You have choices, some of which are costly, to enable detection analysis.

        So a long winded answer, the short version of which is maybe.

  7. Just wanted to leave a positive note here saying thanks for both the original post and this one. I agree that this is an important issue to point out and discuss.
    I especially agree with your link to the readability of papers. Statistics is fun for me and I love being able to use it as a tool to answer complex questions in ecology, but if a question can be answered with a simple statistic like a t-test, that everyone understands, I strongly feel that this should be the way to go. It doesn’t help anyone if the paper can only be truly understood by a small number of people because the statistics just got too complicated.

    • Thanks Taina – the point you highlight about communication and understandability is I think perhaps my most important point, but it seems to be one that is relatively ignored.

  8. @Raphael – this is exactly the conversation I think we should be having around detection probabilities.

    I think the first thing is that detection probabilities matter when you are looking at occupancy. They can have spill over effects on abundance or richness but those fields have had their own ways of dealing with the issues (e.g. index of abundance instead of absolute abundance, mark-recapture to estimate absolute abundance, rarefaction curves or Chao estimators for richness).

    So the central biological question is if I visit 10 sites and I see my species at 6, what is the occupancy? 60% is an obvious answer. But it could be wrong. It could be 50% if I overobserved and misidentified another species. But more likely it could be 70% or 80% if I missed the species at some of the sites I visited (underobserving or detection error). I can try to fix this by going back to the site 2-3 times and if I see it just once I can say it was there. In fact I can do better and estimate how often I am missing the species and apply this correction factor to all my data. Of course, it could also be that the species really wasn’t there and then a colonization event happened, or it was there and an extinction event happened and my observations were perfect. The challenge is that we cannot directly observe detection vs. presence – in statistical terms there is an identifiability problem. So we have to bring in or assume auxiliary information (like no colonizations or extinctions). So when does this matter?

    Well if I care about the absolute occupancy for the one species/time/location I am measuring, for example in a management context when there are regulatory thresholds on occupancy, it is probably important to correct, just as one needs to use mark-recapture instead of an index of abundance when one wants to know true abundance rather than relative abundance. This is where it quickly gets complicated though. The extremely careful Welsh paper (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0052015) and the supplemental material to the rebuttal paper (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099571), to my read and the read of the authors of the original paper, suggest that in many (realistic) scenarios the computational issues can overwhelm the goal and reduce the accuracy of estimating occupancy. Those papers formally analyze the questions you ask about error sizes vs. effect sizes, etc. Long story short, it depends on the region of parameter space: in some regions estimates of absolute occupancy are better when you take detection into account, but in others they are actually worse. And of course you don’t necessarily know which region you will end up in a priori. But me personally, if I knew absolute occupancy is what I cared about I would design the data to estimate it and then very carefully scope out my computations and communicate my uncertainties. This has never been a goal of my research though.
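
The arithmetic behind the 10-site example can be sketched in a few lines. This is a deliberately simplified illustration, not an occupancy model: it assumes the per-visit detection probability is known and constant, that sites are closed (no colonization or extinction between visits), and that there are no false positives. Real occupancy models estimate detection and occupancy jointly, which is where the identifiability and computational issues come from. The function names and the value p = 0.5 are hypothetical.

```python
def prob_detected_at_least_once(p: float, visits: int) -> float:
    """Chance an occupied site yields at least one detection in `visits` visits."""
    return 1.0 - (1.0 - p) ** visits


def corrected_occupancy(naive_occupancy: float, p: float, visits: int) -> float:
    """Scale the naive (observed) occupancy up by the chance of ever detecting.

    Assumes p is known, constant across sites and visits, and that there
    are no false positives. The result is capped at 1.0.
    """
    return min(1.0, naive_occupancy / prob_detected_at_least_once(p, visits))


if __name__ == "__main__":
    # Species seen at 6 of 10 sites, 2 visits per site, assumed p = 0.5 per visit.
    print(corrected_occupancy(6 / 10, p=0.5, visits=2))
```

Under these assumptions an occupied site is detected at least once with probability 0.75, so an observed 60% corresponds to a true occupancy of about 80%.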

    Where it gets tricky is if we only care about relative occupancy. We are comparing occupancy across species, or more typically across space (what are good habitats) or across time (is it trending up or down). This is where the opinions will vary. What is true is that detection probability could vary along the axis of comparison and potentially become a confounding factor. But it could very well not. My conventional wisdom is that when you are comparing across species one should worry a lot about detection covarying with species and messing up the relative occupancies. But you can cook up scenarios where it wouldn’t be a big effect (e.g. you are surveying ducks swimming on well defined ponds, or you are comparing trees >10cm dbh in exhaustively surveyed sites). Also if you are comparing across species that are similar in habitat but vary in orders of magnitude by abundance (as happens frequently such as a study of grassland sparrows), then abundance variation is probably driving most of the variation in detection and could safely be ignored in relative comparisons. Conversely, in comparisons within one species across sites or at the same site across time, my first order suspicion would be that the effects are real and not an artifact of detection (or at least the effect sizes of detection on occupancy would be small compared to the other ecological factors). But of course I can cook up contrary examples here too (e.g. comparing birds across grassland and forest sites or comparing birds over time in a site going through succession). In that last succession example, you could end up saying the bird was declining when in reality it was just getting harder to detect as the vegetation layers grow thicker and it actually was increasing, just seen less often. In short, I think for relative occupancies (comparisons across space/time/taxa) judgment really comes into play.
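
The succession example can be made concrete with made-up numbers. The sketch below assumes a single visit per site, no false positives, and hypothetical values for true occupancy and detection early and late in succession. None of these numbers come from a real dataset.

```python
def expected_naive_occupancy(true_occupancy: float, detection: float) -> float:
    """Expected fraction of sites where the species is seen on a single visit."""
    return true_occupancy * detection


if __name__ == "__main__":
    # Hypothetical values: occupancy rises while thickening vegetation
    # lowers the chance of seeing the bird where it is present.
    early = expected_naive_occupancy(true_occupancy=0.5, detection=0.8)
    late = expected_naive_occupancy(true_occupancy=0.6, detection=0.5)
    print("early succession, observed:", early)
    print("late succession, observed:", late)
```

With these illustrative values the observed fraction falls from about 0.4 to about 0.3 even though true occupancy rose from 0.5 to 0.6. If detection had stayed at 0.8 in both periods, the observed trend would have had the same sign as the true one, which is the case where ignoring detection does little harm to a relative comparison.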

    You can and many do argue that if it could possibly be an issue you have to measure it to see.
    But I would say that is a standard science is never able to rise to. We always have many known potential issues hanging around. We think through what the big ones are and address those. That’s why we train for a decade to become a scientist instead of just putting down a recipe in a science cookbook. Sometimes I see detection being one of the big issues, sometimes I don’t.

    Further weighing into that discussion is that estimating detection probabilities comes at a cost. You have to have repeat visits to the same site (usually twice, which would cut your number of sites in half for a given level of effort). And valuable historical datasets like the breeding bird survey don’t have this design, so if you insist on detection probabilities it is tantamount to only answering questions you can answer with modern, shorter-time, smaller space datasets. When you throw in the above mentioned computational complexities, which have tangible results on accuracy, and the fact that detection probabilities are no more perfect than other approaches (i.e. make assumptions that are sometimes questionable), well it gets to be a matter of judgment and opinion – a weighing of issues and priorities given the question you want to answer.

    So if you ask me should you use detection probabilities, my answer is a great big it depends. It depends on whether you are looking at occupancies or abundance. It depends on whether you are looking at absolute or relative occupancies. It depends on whether you think detection is going to be an important covariate relative to the effect size of other factors along your axis of comparison. It depends on how seriously you take the computational issues. It depends on whether you can answer your question with data that exists or could be collected using a repeated visit (or distance estimate) design.

    But others will say there is no judgment involved. You should always use detection probabilities no matter what. Just in case. You absolutely shouldn’t publish a paper on occupancy or abundance without them, even if it is a comparative analysis of abundance (not occupancy) within one species across space. In fact others have said that to me repeatedly (6 separate review contexts) during reviews of papers using the BBS. So I’m probably just bitter 🙂 I just wish this was a topic on which reasonable people could discuss the trade-offs and occasionally agree to disagree but not think the other person is an idiot.

  9. I don’t know, it seemed entirely clear to me the first time that you were addressing the underlying motivation(s) behind the use of complex techniques, and not their inherent technical legitimacy. It’s fairly sad the whole thing has to be addressed in the first place frankly.

  10. I received the following comment on this post via email and was asked to post it on behalf of the commenter, who wishes to remain anonymous to others:
    I agree that statistical machismo is a useful term. I don’t think it means that the use of more complex approaches is a bad thing. In my opinion, as long as one carefully justifies and explains the approach so that others can understand it there is no problem using more complex approaches or implementing an alternative statistical framework to advance knowledge. It seems to me that an alternate way of labeling “statistical machismo” in the way that Brian intends (he can correct me if I am wrong) is “statistical bullying” as that is what I took away from Brian’s posts. Not all users of more complex approaches or a particular statistical framework are guilty of this. Nonetheless, I too believe that some argue more than they should about the almighty power of more complicated approaches or alternate statistical frameworks and scold/reprimand/reject/critique others who don’t use their particular approach and argue how all students should be taught these other approaches at the expense of learning simpler approaches.

    I don’t think statistical machismo is recognized by the individuals being macho just as bullies don’t see themselves as bullies. It is also easy for these attitudes to be reinforced in anonymous peer review when editors fail to call out reviewers that are being statistically macho for any number of reasons. The fact that some people don’t seem to recognize that others experience some form of persecution that stems from a biased/subjective opinion on what statistical approach should be implemented to analyze data is disturbing.

    I really do hope that Dynamic Ecology conducts a poll on how many individuals felt that they have observed or experienced someone acting in a statistical machismo way to others. It would also be interesting to learn what fraction of our community is being impacted by these attitudes and whether some groups (e.g., grad students versus late-career faculty, users of particular statistical frameworks or software packages, level of complexity that one tends to include in their analyses) are impacted more than others.

    • Thank you for the comments.

      I agree, in the extreme statistical machismo does cross to statistical bullying.

      And I think it has become clear to me that you are right and we do need a survey on this topic.

  11. Many thanks Brian for your sharp and concise summary. I will have to let it sink in for a while. It sounds like yet another version of the sampling effort vs. precision trade-off. I hear it a lot in the context of meta-analyses. Some reviewers will claim that it is a waste of time to conduct a meta-analysis on questions already covered in 15 published studies (the “we already know“ argument), whereas others will claim that 15 published studies is not enough to have a good overview of the question (the “we don’t know enough” argument). It is a lose-lose situation. But the argument I find most depressing is when reviewers argue that the conclusions (of a meta-analysis) cannot be valid because it relies on invalid studies. It makes me realize that “statistical machismo” is a sensitive issue because it transposes to a scientific attitude in a system where it is increasingly difficult to be heard. You probably know that ca 2,000,000 peer-reviewed papers were published in the natural sciences just last year, and that the doubling time is about 15 years? That is depressing to me.

    • I think you are right that there is a healthy dose of we should give our best available answer for now vs. the answer is not perfect so we shouldn’t give it (slightly different than your axis but I think related).

      Your point about meta-analyses supposedly not working when individual studies have problems reveals the profound misunderstanding many people have of the nature of statistics and how it incorporates error. Which is directly relevant to statistical machismo. In a weird way statistics embraces error, or at least works with it, yet most people spend their lives fearing error.

      And yes I think in the end statistical machismo is a way of abusing peer review to give precedence to your own field of work disguised under the mask of sophistication and caution. That’s probably not most people’s conscious motive. But that’s where it ends up. That’s part of why I call it out so hard. At my career stage a couple of papers more or less don’t matter as much as I whine about them. But statistical machismo is gaming the system. It is distorting science.

      But don’t lose faith! That 2,000,000 papers is divided into an awful lot of subfields. Overwhelmed as I am by the literature these days even compared to my graduate career 20 years ago, I still think I manage to find a majority of the good papers in fields of interest to me (certain aspects of biodiversity, traits, spatial ecology, etc). Good work does get noticed!

  12. Pingback: Poll on experiences with statistical machismo | Dynamic Ecology

  13. Pingback: Recommended reads #117 | Small Pond Science

  14. Thanks for a useful concept. I think the key is that practitioners ask themselves “am I engaging in statistical machismo?” rather than “is John Doe author engaging in statistical machismo?”

  15. Pingback: Poll results on statistical machismo | Dynamic Ecology

  16. Pingback: Machismo estatístico (tradução) – Mais Um Blog de Ecologia e Estatística
