A novel check on causal inference methods: test ridiculous causal hypotheses (UPDATED)

Just ran across an interesting paper from international relations (Chaudoin et al. 2016), with potential application to ecology.* It’s about the problem of “selection on unobservables”, also known as the problem of shared causes. For instance, you can’t tell that joining an international human rights treaty causes countries to respect human rights, because some possibly-unobserved causal factor that drives compliance with the treaty might also drive the initial decision to join. So that countries that join the treaty are those that would’ve respected human rights anyway. I’m sure you can think of analogous scenarios in ecology. Various methods have been proposed to deal with this and allow causal inferences from observational data (e.g., matched observations, statistical control using covariates, structural equation models, instrumental variables). But do those methods work in practice?

The linked paper takes an interesting approach to answer that question: it uses a standard causal inference method to estimate if joining the World Trade Organization or the Convention on International Trade in Endangered Species (CITES) has a “causal” effect on variables that nobody thinks are causally affected by international trade or trade in endangered species. For instance, the paper asks if joining CITES causes a country to have a legislature. The authors find that membership in both treaties is estimated to have statistically and substantively “significant” effects on irrelevant variables an alarmingly high fraction of the time. Which suggests that standard methods of causal inference from observational data have alarmingly high false positive rates when applied to real world data (or else that researchers’ hypotheses about what causes what are completely useless).
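To make the logic concrete, here is a minimal sketch in R on simulated data. It is not Chaudoin et al.'s actual analysis: the function name, variable names, sample sizes, and effect sizes are all assumptions chosen for illustration. An unobserved factor drives both "membership" and a placebo outcome that membership does not affect at all, and we count how often a naive regression calls the membership effect "significant".

```r
# Placebo-outcome check on simulated data. All names and parameter values are
# illustrative assumptions, not taken from Chaudoin et al. 2016.
set.seed(1)

placebo_false_positive_rate <- function(n_sims = 500, n = 200, confounding = 1) {
  p_values <- replicate(n_sims, {
    u <- rnorm(n)                                         # unobserved shared cause
    joined <- as.numeric(confounding * u + rnorm(n) > 0)  # "treaty membership"
    placebo <- confounding * u + rnorm(n)                 # outcome NOT affected by joining
    fit <- lm(placebo ~ joined)
    summary(fit)$coefficients["joined", "Pr(>|t|)"]
  })
  mean(p_values < 0.05)   # share of simulations with a "significant" null effect
}

rate_confounded <- placebo_false_positive_rate(confounding = 1)
rate_clean <- placebo_false_positive_rate(confounding = 0)
print(c(confounded = rate_confounded, clean = rate_clean))
```

With `confounding = 0` the false positive rate should sit near the nominal 5%. With a shared cause it should be far higher, even though membership has no effect on the placebo outcome by construction.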

I think it’d be very interesting to take a similar approach using ecological data and other methods of causal inference. For instance, if you fit structural equation models with some ridiculous causal structure to real ecological data, how often do you find “significant” and “strong” causal effects? And how well do the resulting SEMs fit observed ecological data, relative to the fit of SEMs based on “plausible” causal hypotheses? Has anyone ever done this in ecology? If not, it seems to me like low-hanging fruit well worth picking.**

Off the top of my head, I can think of a few ecology papers in the same spirit. For instance, Petchey et al. (2004) and Wright et al. (2006) tested whether conventional classifications of plant species into “functional groups” (e.g., C3 plants, C4 plants, forbs, etc.) are biologically meaningful. They did this by randomly reshuffling the functional groups into which real species were classified, and then checked whether the resulting ridiculous functional group classifications result in a significant relationship between functional diversity and ecosystem function. The answer is yes: randomly classifying species into biologically-meaningless functional groups often results in a “significant” relationship between functional group richness and ecosystem function, even after controlling for effects of species richness. And the relationship often is just as strong as the relationship with “real” functional groups. Which suggests that “real” functional groups aren’t so real after all. Ok, the Petchey et al./Wright et al. approach is slightly different than the one discussed above, in that it uses randomized data on possibly-relevant variables rather than non-randomized data on obviously-irrelevant variables. But the spirit is the same.
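The reshuffling procedure can be sketched in R roughly as follows. This is a toy simulation, not the analysis from Petchey et al. or Wright et al.: the species pool, group sizes, plot compositions, and the assumed effect of "real" functional group richness are all made up. Here the real groups matter by construction, so reshuffled groupings should rarely do as well. The worrying empirical result described above is that, with real data, they often did.

```r
# Toy reshuffling test for functional group (FG) classifications.
# Every number below is an illustrative assumption.
set.seed(5)
n_species <- 24
n_groups <- 4
n_plots <- 300

real_group <- rep(seq_len(n_groups), length.out = n_species)  # "real" classification
plots <- lapply(seq_len(n_plots),
                function(i) sample(n_species, sample(2:8, 1)))  # species in each plot
sp_rich <- lengths(plots)

# Number of distinct groups present in each plot under a given classification
fg_richness <- function(groups) {
  sapply(plots, function(sp) length(unique(groups[sp])))
}

# Simulated ecosystem function: depends on species richness and real FG richness
ecosystem_fn <- 0.3 * sp_rich + 1.0 * fg_richness(real_group) + rnorm(n_plots)

# t value for FG richness after controlling for species richness (as a factor)
fg_t_value <- function(groups) {
  fg <- fg_richness(groups)
  fit <- lm(ecosystem_fn ~ factor(sp_rich) + fg)
  summary(fit)$coefficients["fg", "t value"]
}

observed_t <- fg_t_value(real_group)
shuffled_t <- replicate(200, fg_t_value(sample(real_group)))  # reshuffle species among groups

share_significant <- mean(abs(shuffled_t) > 1.96)             # "significant" random groupings
share_as_strong <- mean(abs(shuffled_t) >= abs(observed_t))   # as strong as the real grouping
```

If `share_as_strong` came out large for a real data set, that would be the signal that the conventional grouping is doing no better than an arbitrary one.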

UPDATE: In the comments, Sarah Cobey reminds us that she and Ed Baskerville recently used the Chaudoin et al. approach to test a causal inference method known as convergent cross mapping. It failed badly.

I think the same approach could be much more widely used in ecology. Don’t just use causal inference on observational data to detect causes that seem like they might be real. Make sure your approach doesn’t detect causes that definitely should not be real.

*One of the best parts of being me is that I get to type weird sentences like that one.

**And if this is a really stupid idea, hopefully Jim Grace will stop by in the comments and say so. 🙂 One operational definition of “blog” is “place where you can share half-baked ideas, so that people who know better than you can tell you why they’re only half-baked.”

67 thoughts on “A novel check on causal inference methods: test ridiculous causal hypotheses (UPDATED)”

  1. Of course this will be the same in Ecology. How can it not? A more systematic exploration of the kind of confounding infecting observational designs for causal inference in ecology and evolution is “The effect of unmeasured confounders on the ability to estimate a true performance or selection gradient (and other partial regression coefficients).” Evolution 68.7 (2014): 2128-2136 (sorry, I can’t help myself). Increasing sample size does not reduce this error as this only reduces sampling error. And SEM does not magically make the problem disappear – the assumptions are just more transparent. Instrumental variables (Mendelian randomization for example could be used in E&E) can reduce or even eliminate bias but at a high (high!) cost of variance (and also have some strong assumptions). There are some clever causal discovery algorithms although I’m skeptical of their real-world efficacy (but I haven’t pursued this – so maybe I should be optimistic until proven otherwise).

    I think a major conclusion of my paper is that a goal going forward should not be to abandon observational designs but to implement better (more realistic?) models of standard errors (naive models only assume sampling error so the SEs are waaaaaaay too small).
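A small simulation illustrates the point about sample size and too-small standard errors. This is a sketch under assumed parameter values, not the analysis from the Evolution paper: a single unmeasured confounder `u` with unit effects on both `x` and `y`, and a true effect of 0.5. Under these assumptions the naive slope converges to 1.0, because cov(x, y) / var(x) = (0.5 * 2 + 1) / 2.

```r
# Confounding bias does not shrink with n; only the standard error does.
# Parameter values are illustrative assumptions.
set.seed(2)
true_effect <- 0.5

confounded_fit <- function(n) {
  u <- rnorm(n)                          # unmeasured confounder
  x <- u + rnorm(n)
  y <- true_effect * x + u + rnorm(n)
  est <- summary(lm(y ~ x))$coefficients["x", c("Estimate", "Std. Error")]
  c(n = n, estimate = est[["Estimate"]], std_error = est[["Std. Error"]])
}

results <- t(sapply(c(100, 1000, 10000, 100000), confounded_fit))
print(results)
```

The estimate stays near 1.0 rather than 0.5 at every sample size, while the reported standard error keeps shrinking, so the naive interval becomes ever more confidently wrong.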

    • “Various methods have been proposed to deal with this and allow causal inferences from observational data (e.g., matched observations, statistical control using covariates, structural equation models, instrumental variables). But do those methods work in practice?”

      Matched observations and statistical control only help if any unmeasured confounders are very small – a strong assumption (the consequence of omnisciently adding the right covariates was explored in the paper cited above). It is a misconception that SEM deals with unmeasured confounding. It doesn’t and I don’t think any SEM guru claims it does. And as I said earlier, instrumental variables also have strong assumptions. I think the causal discovery algorithms should be explored more.
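The bias-for-variance trade with instrumental variables can be sketched the same way. The setup is assumed for illustration: a hypothetical instrument `inst` that affects `x`, is independent of the confounder, and has no direct path to `y`, estimated with the simple ratio estimator cov(inst, y) / cov(inst, x). Medians and interquartile ranges are used for the summary because the ratio estimator can have heavy tails.

```r
# OLS vs. a simple instrumental variable (ratio) estimator under confounding.
# The instrument and all parameter values are illustrative assumptions.
set.seed(7)
true_effect <- 0.5

one_draw <- function(n = 500, instrument_strength = 0.5) {
  u <- rnorm(n)                                   # unmeasured confounder
  inst <- rnorm(n)                                # instrument: independent of u
  x <- instrument_strength * inst + u + rnorm(n)
  y <- true_effect * x + u + rnorm(n)             # no direct inst -> y path
  c(ols = coef(lm(y ~ x))[["x"]],
    iv = cov(inst, y) / cov(inst, x))
}

draws <- t(replicate(1000, one_draw()))
summary_table <- rbind(median = apply(draws, 2, median),
                       iqr = apply(draws, 2, IQR))
print(summary_table)
```

In this toy setup the OLS estimates cluster tightly around the wrong value, while the IV estimates centre near the true effect but are spread much more widely.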

  2. Woooo causality! I have a few thoughts – and I’ll start with the toplevel one. One comment in their paper really struck me:

    “Finally, our strongest emphasis is on the relationship between theoretical knowledge and empirical models. Each and every facet of the problem of false positives, its existence, severity, solution, and assessment, requires the researcher to think carefully about the underlying data generating process and what she theoretically believes about it”

    Basically, build a damn good well justified model before you go and start looking at associations. I like to tell my stats students that this is the “Biology First” part of data analysis. If you have a poor model that is not well thought out and does not accurately capture a plausible causally structured data generating process, then, no, do not try and make causal statements. Towards that end, path diagrams are enormously helpful – let them get as crazy as you will with as many concepts as you can think about, and that begins to show you the depths of potential problems in any simpler associative analysis you are about to do. It’s sobering.

    So, point the first, even if you’re going to look for a bivariate association, you must think heavily about the multicausal data generating process if you want to have any honesty in your discussion of any modeled results. Maybe I’m naive, but I think we’re OK at this in Ecology.

    • This is exactly right. The authors theorized (then simulated) a data-generating process with correlated errors that could cause confounding bias, mis-specified their statistical model by using simple regression (i.e. neglected to include estimates of error correlations), then showed that indeed, failing to account for common cause confounding gives bias (stats 101). This paper is not a cautionary tale about causal (or causal network) modeling, it’s yet another cautionary tale about model mis-specification and the unthoughtful use of traditional bivariate methods.

      • how does one include estimates of error correlations if one is ignorant of what confounders are missing? Confounding can be modeled more generally (I’ve modeled it but see especially papers by Greenland). Also, I’m not sure that the issue of confounding is addressed in stats 101 at least for textbooks in bio classes. I think Schluter & Whitlock do at least mention confounding but I don’t think they outline solutions. I didn’t see any mention in Quinn and Keough 2002, Gotelli and Ellison 2012 or Sokal and Rohlf 2012. The last 3 textbooks are commonly used at the grad level. Consequently, there is very little tradition of addressing confounding in biology at least compared to fields like economics or epidemiology.

  3. Hi Jeremy, Thanks for the interesting post. I have been thinking a lot about causality in ecological research lately, in large part thanks to Jim Grace and Jarrett Byrnes. With respect to your hypothesis that spurious correlations might be misinterpreted as causal relationships, I think this ignores two points that are central to SEM.

    First is that SEM does not promote the testing of illogical causal models or a brute-force, model selection-type approach. In other words, SEM requires the careful specification of directed relationships as to minimize both nonsensical correlations (e.g., temperature driving latitude) and to enhance the interpretation of causation (e.g., the ‘back-door hypothesis’ to constrain the probability of causation). A lot of this is driven by existing theory and experience with the system, further constraining the possibility of erroneously inferring causation. It is even possible to model correlations among error variance, which further gets at the idea of an unspecified underlying driver (of course, that requires thought about whether such a driver might exist, most likely a consequence of the kinds of variables that can be easily measured). It wouldn’t surprise me to find lots of potential configurations that reproduce the data well (or well enough), but only a very misguided researcher would seriously consider those as a viable reproduction of the natural world.

    Second is that SEM does not infer ultimate causation. In other words, X may cause Y but not all the time and everywhere, which is an important caveat to keep in mind when wading into the realm of ‘causal modeling.’ There may in fact be a whole suite of variables that modify this relationship, both directly and indirectly. Examination of the error variance and invocation of existing theory and experiments could be used to validate the likelihood of that causal assumption having been captured by the data. But it is silly to think that even very complex SEMs even come close to reflecting reality (except in cases like Sewall Wright’s guinea pigs where the phenomenon is very constrained).

    Jim made some eloquent points along this line in his responses to your 2012 blog post on SEM that emphasizes the utility of the method for testing causal hypotheses under a given set of data, and then using it as a platform to engage with new data to retest and refine the hypothesis (i.e., the process of doing science!). So a formerly spurious correlation may be reinterpreted in light of new evidence, progressing our understanding of the system in question. Is it incorrect to apply SEM under these earlier circumstances? Perhaps, but that is up to the individual researcher to decide what they believe based on the history of evidence and their own intuition (and yes we may be wrong lots, but we are also not operating in a vacuum!).

    Keep up the interesting posts on the topic!

    Cheers,

    Jon

    • “In other words, SEM requires the careful specification of directed relationships as to minimize both nonsensical correlations (e.g., temperature driving latitude) and to enhance the interpretation of causation (e.g., the ‘back-door hypothesis’ to constrain the probability of causation).”

      In other words, you have to already know what causes what. SEM just estimates the strength of the causal relationships that you already know for sure exist.

      Because if you don’t know that, and you’re using SEM (or whatever) at least in part as a tool of causal *discovery*, then what makes you so sure that your carefully specified causal hypotheses are better than random guesses? My point in the post is that you had better check this, and that one way to do it is by checking obviously-silly causal hypotheses. Because if you don’t check it, how do you know you’re not in the same position as all the BEF researchers who were (and maybe still are!) confident that the usual “functional groups” are biologically meaningful?

      “It wouldn’t surprise me to find lots of potential configurations that reproduce the data well (or well enough), but only a very misguided researcher would seriously consider those as a viable reproduction of the natural world.”

      You and I have very different levels of trust in ecologists’ causal intuitions. I’m very worried by the suggestion that our data will not warn us when our causal hypotheses aren’t sensible, but that’s ok, because we don’t need the data to tell the difference between correct (or correct-enough-to-be-useful) causal hypotheses and incorrect ones. No, no ecologist is ever going to be so silly as to think that changes in starfish movement behavior cause global climate change or whatever, even if that hypothesis fits the data well (or as well as plausible causal hypotheses). But ecologists are constantly faced with the task of distinguishing among non-silly hypotheses, with the aid of data. Why should we be confident that our data will reliably distinguish among non-silly causal hypotheses, if it can’t reliably distinguish between silly and non-silly ones?

      And if you say, as Jarrett does, that if our data don’t reliably distinguish silly from non-silly causal hypotheses, that’s on you as an investigator because your path diagram wasn’t sufficiently well thought-out and finely resolved, well, that’s absolutely fair enough. I’d only add that science is done by people, not hypothetical idealized people. I for one would be curious to know how good the typical published ecological SEM is, as compared to a silly one, using something like the Chaudoin et al. 2016 approach. Wanting to know that isn’t an attack on SEMs as an approach. If that exercise revealed that ecologists’ published SEMs are way better than silly ones, great–that’d be reassuring. If that exercise revealed that ecologists’ published SEMs are no better than silly ones, well presumably that’d be a “teachable moment” that could be used to drive home to ecologists the need to “raise their SEM game”.

      • “In other words, you have to already know what causes what.” Not necessarily. If you’re doing exploratory SEM, just be honest about it. And its limitations. Just like if you’re doing exploratory bivariate modeling, be honest about it, and its limitations. SEM is just another technique for reifying biological theories. At the end of the day, all models are wrong, and some are useful, regardless of technique. And once you do it, follow up with something confirmatory using your tested model on a novel data set! Then you’ve moved into confirmatory analysis, and that’s a whole different more stringent ballgame.

        But you are right, we are human. While SEM allows for one to deal with some of the issues of solely bivariate models, welp, Rumsfeld’s unknown unknowns are always there. That’s not a solvable problem ever. We try and skirt it as best we can by doing the best science we can and thinking as deeply as possible about multicausal systems. But someone can always come along and say, “Well, there’s this one thing that even in your great care, you missed!” That’s science! Our hope as an enterprise is that as time goes by in any given field, that happens less and less frequently.

        But no technique ever is going to be resistant to using it with a poorly thought-out causal model of how the world works. The fortunate thing is that the solution there is not better or different statistics, but making sure we all think deeply and read widely before taking the plunge into any complex modeling exercise.

      • Yup! If you end up with a bunch of equally plausible causal models, now is the time to go out, perform experiments, gather more data, and test them. Or perhaps even consider respecifying or collapsing the model to integrate that new information. Or, barring that effort, put them out there and ask others to challenge them.

  4. Points the second and third from the world of SEM (you knew I was going there) is that there are good criteria for building causal models. I found https://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl/dp/1119186846 by Pearl and colleagues to be a nice succinct summary. The two elements of a multicausal model (that you should build even if you are doing an associative study in order to keep you honest – see above) that Pearl pushes are the so-called back-door and front-door criteria (hence points two and three).

    To boil it down, the back-door criterion essentially states that for any pair of variables, any joint driver (either that drives both, or is driven by one and drives another) has to be accounted for in order to produce causal inference. This is largely what is being talked around in the paper you linked. I’m a hair surprised that a country fixed effect didn’t handle the issue as a ‘garbage collector’, but, given the number of variables that even conflict in causal direction that are wrapped up in that variable – some of which are backdoor drivers some of which are not, I’m not that surprised. This is why it’s hard to pick a good instrument. And this is why building a hairy multicausal model is essential so that you can keep yourself honest. In essence, you have to know what your known unknowns are!
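A minimal R sketch of back-door adjustment, on simulated data with made-up coefficients: a shared driver `w` affects both `x` and `y`, and the true effect of `x` on `y` is set to zero. The sketch assumes `w` is known and measured, which is exactly the part that cannot be taken for granted with real data.

```r
# Back-door adjustment on simulated data. The shared driver w is assumed to be
# measured; all coefficients are illustrative assumptions.
set.seed(3)
n <- 5000
w <- rnorm(n)                      # shared driver of x and y
x <- 0.8 * w + rnorm(n)
y <- 0 * x + 1.2 * w + rnorm(n)    # true effect of x on y is zero

naive <- coef(lm(y ~ x))[["x"]]         # back door through w left open
adjusted <- coef(lm(y ~ x + w))[["x"]]  # back door closed by conditioning on w
print(c(naive = naive, adjusted = adjusted))
```

Under these assumed coefficients the naive slope should land near 0.96 / 1.64, roughly 0.59, despite there being no effect at all, while the adjusted slope should be close to zero.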

    This is where the front-door criterion jumps in as a helper. In essence, let’s say you have two variables that you suspect are associated. Let’s say herbivore abundance and plant abundance. You suspect a causal connection, but there’s so much else you cannot measure. The front-door criterion would say find something that is **only** driven by your predictor that affects your response and has no additional back-doors. In looking at the system (because you’ve made a great multicausal diagram), you pull out herbivory damage as your “front-door” variable. Now, no matter what co-drives plants and herbivores, you can establish if there is a causal connection between the two going from herbivores to plants. (n.b. I’m sure someone will come up with some biological reason this is a bad front-door, but, hey, usually the example is smoking, cancer, and tar as the front-door, and I thought this more relevant). This is slightly different than an instrumental variable approach (it’s damned hard to find a good instrument in Ecology), but, related conceptually in terms of pulling out clean causality with more “independent” variables (note the front-door is endogenous to the system whereas instruments are exogenous).

    So, in sum, it all comes down to thinking like a scientist first! What is the range of possibilities of multicausal systems that could be present!? What does theory and the literature tell you? What about natural history? And wherever possible, think this way **before** you start data collection, as if you suddenly realize you have a confound that trashes any causal inference after you collect your data, you’ll just be sad. We’ve all been there.

    • “I’m sure someone will come up with some biological reason this is a bad front-door” – I’ll give it a shot (not being an ecologist) – the extent of herbivore damage is probably a function of the plant’s ability to allocate resources to defend itself and so a function of nutrient levels, water levels, competition, allelopathy, soil microbes, light, temperature, etc. etc. These are all of course also factors affecting plant abundance and so common causes of both herbivore damage and plant abundance.

      • I think the notion is that herbivores damage plants to gain nutrients, therefore herbivore abundance and plant abundance can be causally connected. Again, not to say herbivores are the only things that damage plants, or that the scope of the damage is not modulated by other factors (e.g., defenses, nutrient content, etc.) but one can infer *some* degree of causation (not ultimate) between plant abundance and herbivore abundance through herbivory.

      • I knew someone was going to take a potshot! *sigh* Bites? I guess I really think about this in terms of snails and kelp. The snails I work with leave perfect little holes in the kelp. You know it’s them. It’s nothing else. And allelopathy in my system is minimal relative to others.

      • Jon – Set a path from herbiv abund -> herbiv damage -> plant abund with effect coefficients beta1 and beta2. Then the effect beta of herbiv abund on plant abund is beta=beta1*beta2. Assuming confounders in U that have paths to herbiv abund and plant abund BUT NOT to plant damage, this is consistently estimated using front door. BUT if there is a path from U to plant damage (that is, plant damage and plant abundance have a common cause) then the front door estimation is biased. My comment gave a list of reasonable paths from U to herbivore damage. For example, rainfall has a (+) path to plant abundance but a (-) path to herbivore damage (because healthy plants have more energy to repair damage, so less damage is measured). Here is some code to simulate the front door and the bias if the path from U to Z != 0.

        # simulation of front door for estimating effect of herbivore abundance (X)
        # on plant abundance (Y) using herbivore damage (Z) as the mediator
        # Jeffrey A. Walker
        # October 11, 2017
        # front door assumes uz = 0, if not the result is biased
        n <- 10^4
        u <- rnorm(n)
        ux <- 1 # effect of confounder on herbivore abundance
        uz <- 0 # effect of confounder on damage
        uy <- 1 # effect of confounder on plant abundance
        beta1 <- 1.3 # effect of herbivore abundance on damage
        beta2 <- 2.4 # effect of damage on plant abundance
        sigma.x <- 1
        sigma.z <- 1
        sigma.y <- 1
        x <- ux*u + rnorm(n)*sigma.x
        z <- beta1*x + uz*u + rnorm(n)*sigma.z
        y <- beta2*z + uy*u + rnorm(n)*sigma.y
        fit1 <- coefficients(summary(lm(y~x)))
        fit2 <- coefficients(summary(lm(y~z + x)))
        b <- fit1['x','Estimate']
        b.z <- fit2['x','Estimate']
        # results
        b - b.z # front door estimate of effect of herbivore abundance on plant abundance
        beta1*beta2 # E(effect of herbivore abundance on plant abundance)

      • “but one can infer *some* degree of causation (not ultimate) between plant abundance and herbivore abundance through herbivory”

        Set either beta1 or beta2 to zero in my script and as long as uz != 0 the front door will suggest an effect of insect abundance on plant abundance when the true effect is zero (beta1=0 seems biologically implausible but not beta2).
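The scenario in this reply can be sketched by wrapping the same data-generating process in a function. This is a paraphrase for illustration, not the commenter's own code: the function name `front_door_estimate` is hypothetical, and the estimate is written in the two-step form (effect of x on z, times effect of z on y holding x fixed), which for ordinary least squares equals the `b - b.z` difference in the script above.

```r
# Front-door estimate with and without a path from the confounder U to the
# mediator Z. Function name and default values are illustrative assumptions.
set.seed(4)

front_door_estimate <- function(n = 1e4, beta1 = 1.3, beta2 = 2.4,
                                ux = 1, uz = 0, uy = 1) {
  u <- rnorm(n)                            # unmeasured confounder
  x <- ux * u + rnorm(n)                   # herbivore abundance
  z <- beta1 * x + uz * u + rnorm(n)       # herbivore damage
  y <- beta2 * z + uy * u + rnorm(n)       # plant abundance
  x_to_z <- coef(lm(z ~ x))[["x"]]         # step 1: effect of x on z
  z_to_y <- coef(lm(y ~ z + x))[["z"]]     # step 2: effect of z on y, holding x fixed
  x_to_z * z_to_y
}

valid <- front_door_estimate(uz = 0)                 # assumption holds; truth is 1.3 * 2.4
violated <- front_door_estimate(beta2 = 0, uz = 1)   # true effect is zero
print(c(valid = valid, violated = violated))
```

With `uz = 0` the estimate should sit close to `beta1 * beta2`. With `beta2 = 0` and `uz = 1` it should come out clearly nonzero even though herbivore abundance has no effect on plant abundance in the simulated data.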

  5. IF
    A bunch of possible causal structures (some plausible, some ridiculous) are all consistent with the data.
    AND
    Confidence in the final conclusions depends strongly on writing down the “right” model(s) based on how we think nature works (seems like a strong emphasis on this point in comments so far).
    THEN
    Isn’t SEM (or some related method) highly susceptible to confirmation bias (i.e., nature will tend to appear to work as we thought)? Or, a “softer” conclusion would be that the sophistication of the method perhaps leads us to overestimate the degree of objectivity and underestimate the degree of subjectivity underlying final inferences.

    • “..sophistication of the method perhaps leads us to overestimate the degree of objectivity and underestimate the degree of subjectivity underlying final inferences.”
      I would respectfully argue that the opposite is true. Using some flavor of causal network model (SEM or otherwise) puts the subjectivity issue of potential alternative models at the forefront, so that they are transparent and can be discussed.

      A common workflow goes something like this:
      1) list fully your multivariate causal hypothesis, defending the inclusion and direction of each relationship with prior evidence (ie. logic, previous work, mathematical constraints)

      2) Identify the relationships you would like to estimate, and use the causal diagram to inform what data you must collect to get an unbiased estimate.

      3) fit the model whose structure is consistent with your causal hypothesis…this gives feedback on the response variables (what is “significant” and what is not) AND gives you feedback on your hypothesized structure (“the structure implies that these two variables should be conditionally independent, they are not. Oops, something in the structure is wrong.”)

      4) Either A) Publish B) use the feedback from your fit to update your causal hypothesis, knowing that, since it was informed by looking at data, that a strict test of the new structure requires evaluation with different data. We also must recognize that for any given causal structure, we can list all those that are statistically equivalent (this is knowable before fitting). If some of those are also scientifically reasonable, they could be worth considering.
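The structural feedback in step 3 can be sketched with a single implied conditional independence. The example is hypothetical: the hypothesized structure is a chain x -> m -> y, which implies that x and y are independent given m, and the check here is simply whether x still predicts y once m is in the model. Dedicated SEM software reports such tests more systematically; this only shows the logic on simulated data with made-up coefficients.

```r
# Checking one conditional independence claim implied by a hypothesized chain
# x -> m -> y. Variable names and coefficients are illustrative assumptions.
set.seed(6)
n <- 2000
x <- rnorm(n)
m <- 0.7 * x + rnorm(n)
y_chain <- 0.6 * m + rnorm(n)               # data generated by the hypothesized chain
y_direct <- 0.6 * m + 0.4 * x + rnorm(n)    # data with an extra direct path x -> y

# p-value for x after conditioning on m; a small value contradicts the chain
independence_p <- function(y) {
  summary(lm(y ~ m + x))$coefficients["x", "Pr(>|t|)"]
}

p_chain <- independence_p(y_chain)
p_direct <- independence_p(y_direct)
print(c(chain = p_chain, direct = p_direct))
```

A very small p-value for the second data set is the "Oops, something in the structure is wrong" signal from step 3. A large p-value, of course, does not prove the structure is right.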

      The feature of scientific interest is the causal hypothesis (which is just our ideas about how the world works encoded). We can test if the data are consistent with that (complex, multivariate) hypothesis. If somebody has a problem with that hypothesis, at least we have made the hypothesis clear and they are free to suggest an alternative (i.e. “Hey, you missed a confounder due to X that is causing bias in your estimate”). Thus, science progresses as a clear debate driven by our ideas of how the world works.

      This strategy is MUCH more low risk and conservative than one in which we pretend that we have no ideas about how the world works and try to use statistical/machine learning techniques to identify relationships and make sense of them post-hoc. Which is, I think, the only way one can claim to be clear of a risk of confirmation bias.

    • Yup, CCM is another good technique, although it, too, has limitations that are similar to the above if common external drivers or intermediaries are present. Or so is my impression from talking to some folk who use it. Again, think about the structure of the system before you leap and where your uncertainties lie!

      • Oh, I had not. Also, boo on me for reading a comment but not the paper. Given a review I’m collaborating on right now this is… helpful? If troubling?

      • Thank you all for this very detailed discussion! Still wrapping my head around the implications of it all in PhD work. While reading up on this, I saw that George Sugihara and group wrote a response to the Baskerville and Cobey paper linked above which discusses how to go about the interpretation of causality when dealing with synchronous variables. Link here – http://www.pnas.org/content/114/12/E2272.full

      • I think it’s worth noting that CCM is particularly vulnerable to violations of *some* of its assumptions, just like any other technique. In particular, CCM assumes that systems must be weakly coupled (i.e. not synchronous). The reply from the Sugihara lab posted by Siddharth includes a nice explanation of how to identify synchrony, and how to go about analyzing systems where it appears. And, in fairness, strong synchrony will mess up pretty much any method that’s meant to separate causal drivers (even if you are applying some kind of an instrumental variable approach, you need to be able to detect separate information from the two processes).

        With respect to other violations of assumptions – I was initially super skeptical when I ran into the approach, but have since found that CCM performs surprisingly well. For example, in a paper we wrote back a few years ago, we found that its false positive and false negative rates given realistic amounts of process noise and observation error were more or less in line with those for standard linear regression models (http://onlinelibrary.wiley.com/doi/10.1890/14-1479.1/full).

        Some of the other caveats noted by Cobey and Baskerville – especially the problem of a secular trend in model parameters (which they call “transience”) – are definitely food for thought. In theory, you’d expect whatever drives those changes to be internalized somewhere in the system dynamics – but in practice, I can certainly imagine a trend (e.g. in competition coefficients caused by anthropogenic nitrogen deposition) that was not well-accounted for by the empirically observed manifold. I’ve actually been discussing this problem with a somewhat more algorithm-savvy friend of mine in hopes of finding a solution – perhaps a way to identify different time windows where causal forcing is strong vs. weak. But again, I don’t know of any other methods that are able to deal with a secular shift in the strength of causal forcing, short of already knowing the form of the model a priori.

        Any thoughts on the above? I’m certainly less tuned in to these methods than some other folks are, and it may be that there are other methods out there that do a better job dealing with synchrony etc. in complex systems.

    • I try and inundate my students with this. I’ve been thinking a lot about our model of a scientific workflow with respect to data analysis and where does the hard crunching of stats come in. Particularly after reading McElreath’s book, I’ve kind of come around to 1) Dig deep in theory and natural history and derive a question. 2) Based on that question and the system, conceive of how you would model the system. 3) Use that model to design a study – experimental or observational. 4) With that data in hand and your model of the system, fit the relevant statistical model. 5) Query that model to answer your question. This framework embodies a lot of situations, but at its core is science in #1, 2, and to some extent 5. The rest is mechanics. So, where do you spend your time?

  6. A thought inspired by Don’s comment above (https://dynamicecology.wordpress.com/2017/10/11/a-novel-check-on-causal-inference-methods-test-ridiculous-causal-hypotheses/#comment-62660): if you think the particular causal inference method evaluated by Chaudoin et al. 2016 is rubbish, well, that’s really a criticism of the field of international relations rather than of Chaudoin et al. Because Chaudoin et al. weren’t knocking down a straw man, they were evaluating a popular causal inference method in their field.

    Which is a sobering thought, if you think that method is rubbish. Because people working in international relations aren’t stupid or lazy or incompetent or ignorant. I’m sure that, as a group, they’re just as smart, hardworking, competent, and knowledgeable as people in any other discipline. So if you think they’re all using a method that’s just rubbish, then you’re admitting that it’s possible for an entire discipline full of smart, hardworking, competent, knowledgeable people to go off the rails and use a rubbish method. This thought should, I think, make you wonder if there are any ways in which ecology has gone off the rails, that are mostly or entirely invisible to ecologists but much more visible to outsiders.

  7. I think my view on using models to make inferences is very similar to Jarrett’s – in any practical setting, it’s impossible to account for all possibilities (maybe more so in ecology where there are likely to be many unobserved possible confounders). I view this as being not too dissimilar from the “no free lunch” problem in machine learning (https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization#Interpretations_of_NFL_results).

    A good measure of utility for modeling papers is probably not whether the tested model(s) fit the data, but whether the model makes testable predictions that can be followed up on in the future or by others. Hopefully the trend towards more open sharing of data and analysis code makes this easier in the future.

  8. One of our greatest spoofs in science, directed at systems ecology but really at the idea of using ‘statistics’ alone to sort out causal relations, is the ‘paper’ by Isadore Nabi [pseudonym for Richard Levins and Dick Lewontin] called “On the Tendencies of Motion”, on what the laws of motion would look like if a ‘big data combined with big statistics’ approach was used to find them.
    It is here: http://danny.oz.au/danny/quotes/isidore_nabi.html

    enjoy.
    ric

      • Hi Ben; this is worth pointing out. The paper is influential, and it is available free: click on it at Lipson’s GS profile [http://scholar.google.com/citations?user=F_Go4V4AAAAJ&hl=en].
        BUT one should consider how many physics/math concepts one must put into the program to get the output for the well chosen experimental systems.
        The answer is lots and lots.
        The spoof above lacked all of these; indeed they were all invented/discovered because of Newton’s laws/math, etc.
        I was also struck by how much faster the discovery process was if one began with knowledge about simpler systems as inputs.

      • Of course I was predisposed to like the paper because of its first paragraph, which is here:
        “Mathematical symmetries and invariants underlie nearly all physical laws in nature (1), suggesting that the search for many natural laws is inseparably a search for conserved quantities and invariant equations (2, 3).”
        {This is of course just the beginning of what goes into their data search process.}

        More or less describes my own research program over the last 25 yrs. at least for life histories.
        eric

  9. I agree with everything said here about how statistics are not a magic wand. Good hard thinking by good scientists is the only valid alternative.

    But I would like to point out that it is more general than that. NO one method and no one analysis is going to decisively determine causality. It is the accumulation of evidence that is mutually reinforcing that leads to conclusions.

    And in particular it’s not just statistics that has a hard time with causality. Even the gold standard of experiments has real limitations. OK, if you follow Koch’s postulates for confirming which bug causes a disease you’re pretty sure. But that is way more rigorous than most ecological experiments. And maybe even if you do test tube experiments with everything else completely controlled like Gause (or Jeremy) you can have pretty good confidence about the direction of causality.

    But the typical field experiment? They are also at best suggestive about causality. To take a hypothetical example (chosen because I don’t know anybody who does this kind of work, so honestly it’s not a specific critique): say you think seed predators like sparrows influence plant abundance. Set up control and treatment patches with a treatment of a mistnet cage that excludes birds. Say abundance rises in the mistnet cage. Is that suggestive of causality? Yes. But even there it takes biological knowledge. E.g., maybe it is a bird defecation fertilizer effect (but that should cause abundance to increase). Maybe it is a bird defecation poisoning (too alkaline or whatever). That is logically possible. Not biologically likely. And more plausibly maybe it is nothing to do with granivory but transport by a bird of a disease or arthropod herbivore that is really depressing abundance.

    Causality is an exceedingly fickle pursuit. And outside of extremely rigorous and controlled experiments that are rare if not impossible in ecology, experimentation, like statistical analysis, is just suggestive, and multiple rigorous confirmatory pieces of evidence are needed to draw a strong conclusion.

    • That’s not a good example, as Grace and Pugesek replied beautifully that Smith et al. were simply not using the technique correctly and had misunderstood the difference between standardized and unstandardized coefficients – Grace, J. B., and B. H. Pugesek. 1998. On the use of path analysis and related procedures for the investigation of ecological problems. American Naturalist 152:151–159.

      I actually use this back and forth when I teach multigroup modeling, which is super cool if you have a question/dataset where you can use it.

  10. Step one in causal inference is to do “partial identification.” I can’t access the above paper, but from the description, it sounds like the authors don’t do it. In partial identification one goes through many steps to decrease the width of bounds of an effect in exchange for decreasing credibility. Most of the methods on this list are way down the list of decreasing credibility. I’d argue that any paper deploying “matching” or “structural equation modelling” without doing some partial identification isn’t really doing causal inference. But that said, the results are quite interesting nonetheless, especially considering a recent paper I contributed to on using causal thinking for more credible conservation projections.

    http://www.sciencedirect.com/science/article/pii/S0006320717305219
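
For readers who haven't met partial identification, here is a minimal sketch of the trade-off described above, using worst-case (Manski-style) bounds on an average treatment effect for a binary outcome. The function name, the toy data, and the choice of "monotone treatment response" as the added assumption are all illustrative assumptions, not anything taken from the linked paper.

```python
def ate_bounds(treated, outcome, monotone=False):
    """Bounds on the average treatment effect when outcomes lie in [0, 1].

    With no assumptions, each unobserved counterfactual mean can be anywhere
    in [0, 1], so the bounds always have width 1. Setting monotone=True adds
    the (stronger, less credible) assumption that treatment never lowers
    anyone's outcome, which raises the lower bound to 0.
    """
    y_t = [y for d, y in zip(treated, outcome) if d == 1]
    y_c = [y for d, y in zip(treated, outcome) if d == 0]
    if not y_t or not y_c:
        raise ValueError("need both treated and untreated units")
    p1 = len(y_t) / len(treated)      # share of units treated
    p0 = 1.0 - p1
    m1 = sum(y_t) / len(y_t)          # observed mean outcome among treated
    m0 = sum(y_c) / len(y_c)          # observed mean outcome among untreated
    # Worst case for the effect: untreated would score 0 if treated,
    # treated would score 1 if untreated. Best case: the reverse.
    lower = p1 * m1 - (p0 * m0 + p1)
    upper = (p1 * m1 + p0) - p0 * m0
    if monotone:
        lower = max(lower, 0.0)
    return lower, upper


# Made-up data: 4 treated units (3 successes), 6 untreated (1 success).
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

no_assumption = ate_bounds(treated, outcome)
with_monotone = ate_bounds(treated, outcome, monotone=True)
print("no assumptions:   ", no_assumption)
print("monotone response:", with_monotone)
```

The no-assumption interval includes zero, so the data alone cannot sign the effect; the narrower interval is bought entirely with the extra assumption, which is the "width versus credibility" exchange in a nutshell.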

    • Ok, I googled and skimmed a set of slides that came up. Interesting.

      I’m reminded of Simon Wood’s idea of partially specified or “semiparametric” models. The idea is that you don’t specify the functional form for any bit of the model for which you’re unsure of the correct parametric form, but instead “let the data do the talking” by estimating the functional form using a smoothing spline or some other very flexible function. For instance, fitting a predator-prey model to time series data of predator and prey abundances, but estimating the shape of the predator functional response with a spline rather than assuming Type I or Type II or whatever and just estimating the parameters of that assumed shape.

      • That reminds me of a half-baked idea I once had – can one do non-parametric SEM with a distance-based correlation matrix (or other non-parametric correlation technique). I still like it conceptually, but it’s still totally half-baked (I admit this!)

  11. Hi all,
    As a one-time user of SEM, and from an ecologist’s point of view, I’d like to build on the idea of field experiment/observation and the need to clarify causal assumptions.
    I think SEM can be very useful, especially when you want to pool together all the knowledge into the same framework. For instance, we worked on bird migration, and you can find a lot of studies involving 1 or 2 factors on breeding success or the effect of 1 variable measured in 1 season on another variable in the subsequent season. As Jeremy pointed out, it can seem like “you know the causality before using a SEM approach” (here or in another discussion about causality and SEM), but I think of SEM as a nice framework to regroup such works and see if everything together is still supported by the data.
    So, we used only observational data, relied on existing 2×2 variable relationships, and tried to figure out if they can still fit together. The results were quite surprising at first, but not so much when thinking with a larger perspective than the one used for 2×2 relationships.
    Here is the paper : http://onlinelibrary.wiley.com/doi/10.1111/oik.04247/full

    It’s not a mathematical paper, nor a theoretical one, but I think it can be an example of what can be done in ecology, and we really hope that new datasets will provide a deeper insight into bird migration with a SEM framework, or not.

  12. Bottom line: all approaches make assumptions, but not all assumptions are equally strong or equally likely to be met or violated in a given study and data context. In ecology, the norm is not talking about the assumptions and their validity in the context of our datasets and analyses, which I hope is something that can be changed. We should be probing the robustness of our results to those assumptions, and examining the extent to which they may or may not be reasonable with our data and topic. A big part of this is the design (have we ruled out rival mechanisms that remain unobserved in the error?), not the methods themselves.

    Are people here familiar with Rubin’s potential outcomes framework (which Matt’s post mentions), and designing counterfactuals to rule out rival explanations? This is something that is really emphasized in Econometrics and quasi-experimental approaches, but not widely considered in Ecology (i.e., isolating and estimating a single causal effect, typically the focus of policy analysis). Can anyone here explain how CCM relates to this type of causal inference thinking? I had thought CCM was focused on prediction, which is not the same aim as estimating an unbiased coefficient estimate.

    The Pearl approach uses some of this thinking but ignores the other assumptions of endogeneity and identification that need to be met for SEM to be causal (as Matt points out). The “back door” is similar to using instrumental variables (with added assumptions, see below). The front door is essentially selection on observables, making the assumption that none of the observables in the model are correlated with anything unobserved, and the additional assumption that all paths (and error) are correctly specified.

    • P.S. To relate this back to the post (haha), testing ridiculous causal hypotheses is one form of falsification test, for testing the robustness of results and their causal interpretation, so I think this is an interesting suggestion, and it is certainly used in other fields like Econ and public health.

      • RE your question on effect sizes: neat paper by Ethan Deyle on how one might go about estimating effect sizes using some of the same theoretical insights as CCM – short answer is that it is a different, but related approach.

        http://rspb.royalsocietypublishing.org/content/283/1822/20152258

        In general, I’d say this method is less focused on identifying unbiased effect sizes, and more focused on dealing with the fact that your effect sizes are context specific (e.g. effect of X on Y will be different depending on the state of Z). But, as a disclaimer, I didn’t work on that paper and don’t totally understand the methods.

  13. Hi everyone,

    Jim Grace pointed me to this blog during an email exchange, and I’m really glad he did! I was excited to see our international relations paper discussed in a different field, and I was struck by (A) the many similarities in the challenges to inference across political science and ecology and (B) all the sharp, challenging comments.

    Two pieces of background are relevant for where I’m coming from:

    First, poli sci scholars are excellent at hoovering up new methodological approaches and implementing them. To use a more ecological analogy, we’re kind of like hyenas. We’re good at ranging around and finding food from tons of different sources. This can be good – it means we’re versatile, and we’re not bound by some of the theoretical dogma of other fields. But it can be bad – we might quickly start deploying some empirical method because it’s shiny, without always interrogating its assumptions and how they relate to the contexts we want to study. (For me, in graduate school, matching/potential outcomes were all the rage, and the former was sold unquestioningly as a solution to selection problems/shared causation/backdoor colliders).

    Second, in poli sci, causality has become a very blurry concept. For some, it means “using potential outcomes notation.” For others, it means “you have to have an instrumental variable or regression discontinuity approach.” For others, it means “an experiment.”

    I disagree with almost all of these definitions. To me, causality is never a property of an estimator or research design alone. It’s a property of the relationship between the research design/empirical approach and our theoretical knowledge. A cross-tab can be “causal” if you can use logic or ancillary knowledge to convince me that the comparison likely supports the claim you’re making. The shiniest regression discontinuity paper can be not-causal if I can think of problems with the assumptions behind the approach.

    (I was heavily influenced by Judea Pearl’s book on this. In the appendix, 11.5.2, he has a snarky hypothetical dialogue between a PhD examiner and a PhD student who used SEM. It lays out how the persuasiveness of research is a combination of the empirical results and the assumptions that generated them.)

    What does this mean for SEM, since that was an important part of the post and discussions? To me, SEM is neither causal nor non-causal. What makes it causal or not is how persuasively we can argue “This is why I chose this model… Here’s why I think I’d still reach the same conclusions with a different model… etc.”

    So for me, the biggest strength of SEM is its transparency. We can all look at the model and see some of the most important assumptions – namely what relationships are included, and equally important, which ones are left out.

    In poli sci/our paper, we wanted to hammer home the value of transparency. My ideal world would be one where poli sci people were up front about the assumptions behind their empirical approach, so that we can debate them and probe empirical results accordingly. (Laura Dee made this point eloquently above.)

    Sorry for the mile-long post! I was just excited to read this discussion.

    Stephen Chaudoin

    PS My apologies to Sarah Cobey – that’s a neat paper. I’m sorry that Jude, Raymond, and I hadn’t seen it before ours went to the publishers, otherwise we would have cited it.

    • Thanks for commenting Stephen!

      I’m interested in your remark that poli sci scholars hoover up lots of methods from other fields. I don’t think that happens nearly as much in ecology, though it does happen. There’s a critical mass of ecologists who use SEMs, and ecologists took up meta-analysis from medicine in a big way starting in the early 90s. But instrumental variables is almost entirely unknown in ecology, perhaps in part because finding good instruments is hard. And I’ve never seen an ecology paper use matched observations or differences-in-differences. Causal identification just has never been a big area of emphasis in ecology; I’m not sure why.

      • We’ll eat anything =)

        My hunch on the explanation is that the contexts we study in poli sci are full of interconnected strategic decisions. It’s often hard to tell what’s exogenous and what’s endogenous, so poli sci people wanted to look for estimation approaches or research designs that could help them deal with those situations.

        I think the emphasis on more credible approaches also coincided with a growing discontent with big, garbage can regressions which were prominent in the 1990’s and 2000’s.

        As a discipline sidebar, SEM to many poli sci and econ people means something specific. A lot of people use that term to refer to a set of regression equations derived directly from a game theoretic model, not just a set of equations implied by a graph of relationships.

      • “As a discipline sidebar, SEM to many poli sci and econ people means something specific. A lot of people use that term to refer to a set of regression equations derived directly from a game theoretic model, not just a set of equations implied by a graph of relationships.”

        That’s a big difference from ecology. Our SEMs are basically never derived from an explicit mathematical model. That approach of building a simple model that captures the essence of the problem and then trying to identify/parameterize it from data is standard in economics and other social scientific fields but it’s never done in ecology. Whether it could be is an interesting question.

        In part, this may be down to ecology not having many “stylized facts” that are susceptible to explanation via simple stylized models. Or maybe not, I’m not sure. Some discussion: https://dynamicecology.wordpress.com/2017/01/26/stylized-facts-in-ecology/

      • Thanks, Stephen! Jeremy, I think a key difference here is that randomized experiments (in theory) are a way to get causal estimates, and have a history in ecology but are not typically feasible or ethical in many other fields (but growing in use, for instance in Econ). As a consequence (or at least in the ecological stats courses I’ve taken vs the econometric ones), many of the tools are taught or developed with experimental data in mind. This all changes when dealing with observational data and causal inference from observational data.

        Related, it seems like ecologists are quite focused on modeling error in a way that is more focused on inference, rather than designing analyses to estimate causal and unbiased coefficient effects (like an IV, diff-in-diff or regression discontinuity, and panel regression approaches). Perhaps this is due to a history of smaller datasets?

        So I think it is deeper than finding a good instrument. Similarly, in Econometrics, the first thing that is taught is omitted variables bias and what that does to coefficient estimates. I’ve never seen that or endogeneity and simultaneity discussed in any ecological or biostats course I’ve taken. In my opinion, such considerations (and associated assumptions for observational data) will need to be more discussed in ecology for these methods to transfer over.

      • @Laura Dee:
        “Jeremy, I think a key difference here is that randomized experiments (in theory) are a way to get causal estimates, and have a history in ecology but are not typically feasible or ethical in many other fields ”

        Good point. I’m sure that’s the main reason why a lot of these observational approaches for causal inference haven’t ever been taken up in a big way in ecology. And yes, it’s reflected in the stats we teach. Ecologists teach a lot of GLMs and GzLMs (or special cases thereof, like ANOVA and linear regression), often with an associated emphasis on experimental design. Ok, we ecologists teach lots of other stats too. But that’s the core that pretty much every ecologist gets taught before possibly going on to learn other stuff.

      • Laura Dee – I think you’re right about the history of experiments. But there are real limits to the small sample, complex, noisy, poorly-controlled experiments in the field that ecologists use. See my comment above.

        Also I wonder about the omitted variable biases. I seriously thought about teaching them in my grad stats class but I increasingly became convinced that there was not a lot one could do about it, and then, since the biases with many potential unknown omitted variables were essentially centered around zero, one could ignore them. I’m sure a lot of econometricians would argue with that. But I think it is interesting how different fields worry much more about different specific violations. Ecologists are obsessed with pseudoreplication and non-normality for example (with some good reasons, but we go way over the top).

      • Hi Brian,
        Yes, I agree with that re: experiments, constraints still make it impossible to do the “idealized” experiment to address causal questions, and they are subject to the same external validity issues too.

        There are approaches for dealing with omitted variables bias, particularly with panel/longitudinal data (I’d be happy to follow up with you on that and talk offline; please contact me!).

        Laura

      • @ Brian:

        Yeah, ecologists worry way too much about conformity with the assumption of normality. More broadly, they worry way too much about conforming to distributional assumptions, relative to all the other statistical things one should worry about.

      • Brian: “Also I wonder about the omitted variable biases…the biases with many potential unknown omitted variables were essentially centered around zero one could ignore them.” This is true, but any SE of the effects will be…optimistic. And the million dollar question is “how optimistic?” (how much are we underestimating our SEs?). My little bit of work exploring this (citation above) suggests we are severely underestimating, even using an optimistic model of missing variables. This result really bothers me but I cannot find a good argument against it.

      • Jeff, I agree that omitted variables bias is a huge and under-appreciated issue in ecology. The main issue is not with the standard errors but with the coefficient estimate itself being statistically biased. Depending on how the omitted variable is correlated with the variable of interest, omitting the variable can even lead to the wrong sign of the coefficient estimate we are studying: https://en.wikipedia.org/wiki/Omitted-variable_bias

        De-meaning variables only removes time-invariant differences that are unobserved across units, so it doesn’t completely solve the problem unfortunately.

        I’m interested in reading your paper; could you please send it to me?
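
The sign-flip point can be checked with a small simulation. This is only a sketch on made-up data: the true effect of x on y is set to +1, an omitted variable z has effect −3 and is positively correlated with x, and all names and parameter values are assumptions chosen for illustration.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def slope(x, y):
    """OLS slope from regressing y on a single predictor x."""
    return cov(x, y) / cov(x, x)


def residualize(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    b, mv, mz = slope(z, v), mean(v), mean(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


rng = random.Random(1)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]                 # the confounder
x = [zi + rng.gauss(0, 1) for zi in z]                  # x is correlated with z
y = [1.0 * xi - 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Regression that omits z: slope absorbs z's effect via the x-z correlation.
slope_omitting_z = slope(x, y)

# Regression that controls for z, via Frisch-Waugh-Lovell: the coefficient
# on x equals the slope of y on the part of x not explained by z.
slope_controlling_z = slope(residualize(x, z), y)

print("omitting z:   ", round(slope_omitting_z, 2))
print("controlling z:", round(slope_controlling_z, 2))
```

In this setup the expected slope when z is omitted is 1 + (−3)(0.5) = −0.5, so the estimate has the wrong sign even with a large sample; controlling for z recovers a slope near +1.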

      • @Jeff – I agree omitted variable problems are an SE and thence a p-value problem, not an estimation problem. At which point I fall back on the ways in which ecologists use p-values are so wrong that omitted variable problems are probably not the biggest (or just rolled into the general issue of excessive reliance on p-values)

        @Laura – yes you can cook up specific examples where things look very bad. But from a practical matter in the real world, there are often many variables being omitted and we know nothing about their correlation structure. In the end you are essentially back to being unbiased (albeit over confident as Jeff notes). I’m curious what you think of Clarke 2005 The Phantom Menace: Omitted Variable Bias in Econometric Research? I find the arguments pretty persuasive.

      • Hi Laura – Yes that was too hurried a response. Indeed, the collective effects of unmeasured confounders will result in some bias. Here is my point about SE…There will be bias, but we don’t have access to the direction or magnitude. So we have some unknown error of some unknown amount that could be in either direction. So I took the approach of simulating what this distribution of unknown error due to OVB might look like using different models of missing variables, from a few to many, with a distribution of effects expected from what we see in selection studies. It is the standard deviation of this distribution that my paper addresses. I am treating bias as a random variable centered at zero with some unknown standard deviation. Any actual bias of course will not be zero. Email me at walker@maine.edu for the pdf.
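
To make the "bias as a random variable centered at zero" idea concrete, here is a toy sketch. It is not the simulation from the paper mentioned above: it assumes a single focal predictor, omitted variables that are independent of one another, and normally distributed effects and associations with made-up standard deviations. Under those assumptions the bias from each omitted variable is (its effect on y) × (its regression slope on x), and the total bias is the sum.

```python
import random
import statistics


def simulate_ovb(n_omitted, effect_sd=0.1, assoc_sd=0.2, n_sims=5000, seed=0):
    """Mean and SD of total omitted-variable bias across simulated worlds.

    effect_sd: SD of each omitted variable's effect on the response
               (hypothetical value).
    assoc_sd:  SD of each omitted variable's slope on the focal predictor
               (hypothetical value).
    """
    rng = random.Random(seed)
    biases = []
    for _ in range(n_sims):
        total = sum(
            rng.gauss(0, effect_sd) * rng.gauss(0, assoc_sd)
            for _ in range(n_omitted)
        )
        biases.append(total)
    return statistics.mean(biases), statistics.stdev(biases)


mean_few, sd_few = simulate_ovb(n_omitted=1)
mean_many, sd_many = simulate_ovb(n_omitted=25)
print("1 omitted variable:   mean bias %.4f, SD %.4f" % (mean_few, sd_few))
print("25 omitted variables: mean bias %.4f, SD %.4f" % (mean_many, sd_many))
```

In this toy setup the average bias stays near zero however many variables are omitted, but its spread grows roughly with the square root of their number. That spread is extra uncertainty that a reported standard error does not include.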

  14. Just came across this Twitter thread from development economist Chris Blattman, which is in part about why he’s skeptical of instrumental variables as an approach in that field.
