Ordinarily I’d save this for a Friday link, but it seemed sufficiently bloggable that I’m giving it its own post. Here’s econometrician (i.e. economics statistician) David Giles’ list of 10 things for econometricians to keep in mind when they analyze their data. I think it’s a really good list for ecology too.
- Always, but always, plot your data. [See the first sketch following this list.]
- Remember that data quality is at least as important as data quantity.
- Always ask yourself, "Do these results make economic/common sense?"
- Check whether your "statistically significant" results are also "numerically/economically significant". [See the second sketch following this list.]
- Be sure that you know exactly what assumptions are used/needed to obtain the results relating to the properties of any estimator or test that you use.
- Just because someone else has used a particular approach to analyse a problem that looks like yours, that doesn’t mean they were right!
- "Test, test, test!" (David Hendry). But don't forget that "pre-testing" raises some important issues of its own.
- Don’t assume that the computer code that someone gives to you is relevant for your application, or that it even produces correct results.
- Keep in mind that published results will represent only a fraction of the results that the author obtained but is not publishing.
- Don’t forget that “peer-reviewed” does NOT mean “correct results”, or even “best practices were followed”.
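Giles' first point is the easiest one to demonstrate. Here's a minimal Python sketch (numpy assumed; the data are Anscombe's famous quartet, not anything from Giles' post): four datasets whose summary statistics and fitted regression lines are essentially identical, even though scatterplots would reveal a clean line, a curve, an outlier, and a single leverage point.

```python
import numpy as np

# Anscombe's quartet: four x/y datasets with near-identical summary statistics.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares regression line
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: mean(y)={y.mean():.2f}, var(y)={y.var(ddof=1):.2f}, "
          f"r={r:.3f}, fit: y = {intercept:.2f} + {slope:.3f}x")
# All four print essentially the same numbers (r ~ 0.82, y ~ 3 + 0.5x),
# yet the scatterplots tell four completely different stories.
```

Identical numbers, four different datasets; only the plot tells you which story you're in.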
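And his fourth point drops straight out of a simulation: with a large enough sample, even a scientifically trivial effect is "statistically significant". A sketch, assuming numpy and scipy; the effect size and sample size are just illustrative numbers I picked.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000_000  # an enormous sample

# Two groups whose true means differ by a scientifically trivial 0.01 SD.
a = rng.normal(loc=0.00, scale=1.0, size=n)
b = rng.normal(loc=0.01, scale=1.0, size=n)

res = stats.ttest_ind(a, b)
print(f"difference in means: {b.mean() - a.mean():.4f}")
print(f"p-value: {res.pvalue:.2e}")  # comfortably "significant" despite a negligible effect
```

Whether a difference of 0.01 SD matters is a scientific question, and no p-value can answer it for you.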
Economist Mark Thoma adds a couple more:
- Don't take econometric techniques in search of questions. Instead, start with the important questions and then develop the econometrics needed to answer them.
- Model the process that generates the data. [A minimal simulation illustrating this follows below.]
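One concrete habit Thoma's second point implies: write your assumed data-generating process down as a simulation, and confirm that your estimator recovers the parameters you built in. Here's a minimal sketch (the toy ecological example and all of the numbers are mine, not Thoma's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: write down the process you believe generated the data.
# Purely as an illustration: abundance declines linearly with
# temperature, with independent Gaussian noise.
true_intercept, true_slope, noise_sd = 50.0, -2.0, 4.0
temperature = rng.uniform(5, 25, size=200)
abundance = (true_intercept + true_slope * temperature
             + rng.normal(0, noise_sd, size=200))

# Step 2: fit the estimator whose assumptions match that process
# (here, ordinary least squares) and check parameter recovery.
slope, intercept = np.polyfit(temperature, abundance, 1)
print(f"true slope {true_slope}, estimated {slope:.2f}")
print(f"true intercept {true_intercept}, estimated {intercept:.2f}")
# If your estimator can't recover the parameters from data you simulated
# yourself, it certainly can't be trusted on real data.
```

Writing the simulation forces you to be explicit about assumptions you'd otherwise smuggle in unexamined.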
I’ll add a few more:
- If any of your data-analytic choices are data-dependent, there's a very good chance you're compromising the validity of your inferences. Best practice is to pre-specify everything. (This may be a restatement of Giles' #7, but I'm not sure, because I don't know exactly what he means by "pre-testing".)
- Multiple comparisons are a real problem. (The simulation following this list illustrates this point and the previous one.)
- The best analysis is the simplest, easiest-to-understand analysis adequate for addressing the question asked. (I think of this as Brian’s “no statistical machismo” rule)
- Overfitting is just as bad as underfitting. (See the polynomial-fitting sketch following this list.)
- These guidelines can conflict. For instance, the desire to model all of the processes that generated the data is one source of statistical machismo. And it's not always obvious how to apply these guidelines. For instance, in ecology one can often dream up a plausible-sounding post hoc explanation for any statistical result, thereby rendering David Giles' guideline #3 useless. This means that doing statistics well is a matter of making judgment calls about how to apply the "rules", and about which rules to break. Good judgment is built on a foundation of both technical knowledge and experience.
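To make my first two additions concrete, here's a little simulation (numpy and scipy assumed; all the settings are illustrative choices of mine) of an analyst who measures many outcome variables and then reports whichever one "worked". The truth is pure noise, yet "significant" results show up most of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_experiments, n_outcomes, n = 1000, 20, 30

false_positive_experiments = 0
for _ in range(n_experiments):
    # Two groups with NO true difference on any of the 20 outcome variables.
    a = rng.normal(size=(n_outcomes, n))
    b = rng.normal(size=(n_outcomes, n))
    pvalues = stats.ttest_ind(a, b, axis=1).pvalue
    # The data-dependent choice: report whichever outcome looks best.
    if pvalues.min() < 0.05:
        false_positive_experiments += 1

print(f"at least one 'significant' result in "
      f"{false_positive_experiments / n_experiments:.0%} of experiments")
# With 20 independent tests, expect roughly 1 - 0.95**20, i.e. about 64%, not 5%.
```

Pre-specifying the single outcome of interest, or correcting for the number of tests, is what keeps that error rate near the nominal 5%.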
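And a quick sketch of the overfitting point: fit polynomials of increasing degree to a few noisy points from a quadratic process, then score them on fresh data from the same process. The degrees and sample sizes below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n):
    """Quadratic signal plus noise -- the 'true' process."""
    x = rng.uniform(0, 1, n)
    return x, 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.2, n)

x_train, y_train = simulate(10)
x_test, y_test = simulate(500)  # fresh data from the same process

for degree in (1, 2, 9):
    # (numpy may warn that the degree-9 fit is poorly conditioned -- fitting)
    coefs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: test MSE = {mse:.3g}")
# Degree 1 underfits the curvature; degree 9 threads every noisy training
# point exactly. Both predict fresh data worse than the honest degree-2 model.
```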
So, what would you add to this list?
HT Economist’s View