Friday links: sea pigs, the challenge of reproducibility, and more

Also this week: lots of advice for faculty, and lots of reasons to quit worrying about whether you’re doing the things you “have” to do to become faculty. Oh, and presentation advice from Goldfinger.

From Brian:

In a really interesting piece from rOpenSci, the authors get serious about what full reproducibility of analyses really entails.

And this comic from xkcd has the best one-panel description of the magnitude of climate change we are facing that I’ve seen. Although the details are actively debated, it definitely gets the ballpark right. I used a conceptually similar slide in my lectures, but of course this one is much better executed!

From Meg:

Here’s an essay on how to be a good faculty mentor to junior faculty. Among other things, it says “Tenure-track faculty members are pretty clear about what they want: tenure and a life. It really is that simple…The good news is that the pathways to productivity and balance are well-studied and well-documented. The bad news is that it means learning behaviors that are the opposite of how most academics have been socialized in graduate school.” Some of the points in there link with points I raised in my post saying you do not need to work 80 hours/week to succeed in academia. I was recently on a panel of junior faculty, and was pleasantly surprised that all four of us panelists said a 40-hour work week is typical for us.

Tenure, She Wrote had an eye-opening post on what life is like for an assistant professor who is simultaneously trying to get her lab up and running, teach, and deal with her husband’s severe depression. Her biggest regret is that she didn’t open up to her department chair sooner, since her chair was able to help with support, a flexible schedule, and connecting them with resources at the university.

And here’s another important post from Tenure, She Wrote that focuses on the challenges women can face when deciding whether it’s safe to go in the field, and the frustration of potentially having to turn down a great research opportunity because of concerns about safety. This reminded me of work by Kate Clancy and colleagues on sexual harassment and rape while doing field work.

Robert Talbert at Casting Out Nines had a post on the four things he wishes he’d known about the flipped classroom before he started using it. It includes that “the flipped classroom entails significantly more work at the beginning than a traditional classroom.”

From Jeremy:

Female Science Professor asks why grant reviewers now seem to expect PIs with long track records of mentoring success to propose elaborate mentoring plans for their trainees. Anyone else run into this? Because I’m with FSP on this one: given the context, these reviews seem rather unfair to me.

A debunking of the myth that papers in top journals are essential to getting hired and promoted at research universities. Publishing in more highly-regarded journals does help, but there’s no sharp break between a few top journals and all the rest. Not definitive, since there’s always lots of collinearity in these sorts of data, other predictors you could include/exclude, etc. But still: just because lots of people say that papers in top journals are essential doesn’t mean they are! See this old post for more debunkings of academic myths.

Relatedly: in this old post Emilio Bruna notes that separating correlation from causality when trying to predict scientists’ future career prospects is really hard. So don’t get too excited/upset/scared/depressed the next time you hear somebody make some overly precise or overly confident statement about what you “have” to do to have an academic career.

Trendy new news media startup Vox has an interview with Stuart Pimm about currently-high rates of species extinctions and what can be done about them.

How to name animals in German. Step 1: Does it look like a pig? 🙂 (ht Marginal Revolution)

And finally, this is nothing to do with ecology, but I thought the following quote from the piece was really funny. Even though I myself have never felt the urge it expresses! 🙂

There’s a scene in which Goldfinger stages an elaborate Powerpoint presentation to a roomful of henchmen, and then immediately gasses them all to death. As someone who has to give a lot of public talks, I found this progression of events curiously appealing.

27 thoughts on “Friday links: sea pigs, the challenge of reproducibility, and more”

  1. That xkcd cartoon has some issues.

    Leaving aside the question of the uncertainty of delta T between the Wisconsin glaciation minimum global T and “The 20th century norm”, these include:

    1. Their ~ 0.84 C increase in global T relative to “The 20th century norm” is exaggerated. That value is ~ what they estimate with their “where we are today” timeline pointer (eyeballing it, 0.75 x 0.25 x 4.5 C = 0.84 C). The 20th century norm, i.e. mean, is slightly below the 1961-1990 climatology, above which we were about 0.6 C as of 2012 (AR5 WG1 ch 2, Figure 2.20 & Table 2.7).

    2. Even if we assume a doubling of emissions, and corresponding atmospheric CO2 increase, over the next 86 years, from the current ~ 2 ppm/year, that’s an average of 3 ppm per year over the 86 years, or a total increase of 258 ppm. Add 90 ppm since 1950 and that’s a ~ 350 ppm total increase by 2100. That equates to 5.35 x ln(350/280) = 1.19 W/m^2 increase in radiative forcing. Using a middle of the road estimate of eq. climate sensitivity of 2.75 degrees C per W/m^2, that comes out to about 3.3 degrees C, not 4.5. And again that’s assuming a doubling of global CO2 emissions over that time.

    3. Their lead statement says “…the earth will likely warm by 4-5 C by the century’s end” which sort of implies relative to right now. If you subtract the 0.6 that’s already occurred, then you’re down to 2.7 from right now, so their exaggeration is even larger.

    • Scrap that, I did the calculations for point 2 wrong.

      It should be 5.35 x ln(660/310) = 4.04 W/m^2, where 660 (value in yr 2100) = 310 (starting value, 1950) + 350 (increase).

      More simply, since 660/310 = 2.13 that’s just over a doubling of CO2, so using 2.75 degree C per doubling as the middle road equilibrium sensitivity estimate, that’s going to be about (2.13/2.0) x 2.75 = 2.93 C total increase. So their exaggeration is even worse than I stated above.
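The back-of-the-envelope arithmetic in this exchange can be checked with a few lines of Python. This is a minimal sketch that takes the commenters' inputs as given (the simplified forcing expression ΔF = 5.35 ln(C/C0) W/m^2, ~310 ppm in 1950, ~660 ppm in 2100, and 2.75 C per doubling of CO2); it is not a climate model. It also shows one small wrinkle: scaling the sensitivity by the number of doublings, ln(C/C0)/ln 2, rather than by the plain ratio (C/C0)/2, gives about 3.0 C instead of 2.93 C. Either figure is still well below 4.5 C.

```python
import math

# Simplified logarithmic CO2 forcing used in the comment:
# delta_F = 5.35 * ln(C / C0), in W/m^2.
ALPHA = 5.35


def co2_forcing(c_ppm: float, c0_ppm: float) -> float:
    """Radiative forcing (W/m^2) from raising CO2 from c0_ppm to c_ppm."""
    return ALPHA * math.log(c_ppm / c0_ppm)


def warming_linear_ratio(c_ppm: float, c0_ppm: float, sens_per_doubling: float) -> float:
    """Shortcut used in the comment: sensitivity scaled by (C/C0) / 2."""
    return sens_per_doubling * (c_ppm / c0_ppm) / 2.0


def warming_log_scaling(c_ppm: float, c0_ppm: float, sens_per_doubling: float) -> float:
    """Sensitivity times the number of CO2 doublings, ln(C/C0) / ln 2."""
    return sens_per_doubling * math.log(c_ppm / c0_ppm) / math.log(2.0)


# Emissions path from the comment: ~2 ppm/yr now rising roughly linearly to
# ~4 ppm/yr, i.e. an average of ~3 ppm/yr over 86 years, plus ~90 ppm since 1950.
added_since_1950 = 90 + 3 * 86   # 348, rounded to ~350 in the comment
c_2100 = 310 + 350               # ~660 ppm, starting from ~310 ppm in 1950

print(f"forcing: {co2_forcing(c_2100, 310):.2f} W/m^2")                            # 4.04
print(f"linear-ratio shortcut: {warming_linear_ratio(c_2100, 310, 2.75):.2f} C")   # 2.93
print(f"log scaling: {warming_log_scaling(c_2100, 310, 2.75):.2f} C")              # 3.00
```

Because forcing grows with the logarithm of concentration, counting doublings is the more consistent way to apply a per-doubling sensitivity; for a ratio this close to 2 the two approaches differ by less than 0.1 C.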

      • Hi Jim – fair enough, and I don’t dispute your calculations (and yes, I have seen everything from 4-7 C between the last glacial maximum and today). But I did say “ballpark” correct. I am pretty sure they were using 2000 as “now” and 2100 as the future, which, 15 years in, is perhaps not correct, but a good deal of the numbers in the literature are centered around these endpoints.

        As for temperature change, it all depends on future CO2 emissions, which the economists don’t have a good handle on. But I personally think a pessimistic view is justified. I recall that 10 years ago I presented some future projections using the A1FI scenario (because when I read the descriptions of the scenarios it sounded most likely to me) and was told I was grandstanding and taking the extreme scenario. But unless I’m mistaken, we’ve actually done worse (more CO2 emissions) than A1FI in that time. And I haven’t seen rapid action taken by treaty, nor have I seen any hope that China and India are going to take a less fossil-fuel-based path to economic well-being than the US and Europe did (which is a core assumption of the traditional moderate CO2 scenarios), to inspire me that we’re now going to snap back to a more moderate scenario. I increasingly think the high end of the original IPCC range of 2-5 degrees C between 2000 and 2100 looks most accurate.

        But in the end, the reason I like the graph is that it puts numbers like 2 or 4 degrees C, whose impact is so hard to get a sense of, into context. My bottom line is that in the next 100 years we will see change comparable to somewhere between 50% and 100% of the change since the last glacial maximum, when, where many of us live, we would have been under 1-2 km of ice. And where we’re headed in 100 years is, ballpark, at least halfway, maybe more, to the highest temperatures ever documented, in the early Cenozoic hothouse. Or: 100 years of forthcoming change = 50-100% of the difference between ice age and now, and equally 50-100% of the difference between now and palm trees above the Arctic Circle. And how come we only ever talk about 100 years? It may be the only economically relevant scenario, but things look a lot worse in 200 or 300 years, which is a blink of an eye in earth systems and ecology.

        The last paragraph, I think, is the level of communication we need if we are to have an impact on the public debate, not debating 2.9 vs 3.3 and a 2000 vs 2015 baseline. And, coincidentally, that is also, I think, the level of accuracy we have as scientists (building on top of economists and paleo data). I am pretty sure that your temperature of 2.93 degrees C has at least two too many significant digits (if you build in CO2 and paleo uncertainties)!

      • Thanks Brian.

        Out of time right now, but there are so many really important concepts embedded in that cartoon’s message that I’m going to either write a long response here or post about it specifically at my blog. I believe it’s only even “ballpark” correct if you assume really high emission rate increases over the next 86 years. Those are possible, but it’s still an assumption the cartoon didn’t spell out. More later, I hope.

      • Hi Jim,

        Without wanting to dispute your calculations, I think you may be expecting more precision and rigor from a cartoon than can be expected. Just my two cents.

      • Hi Jeremy, I can see how you might think that, because I led with the least important of my points. But there are real issues here, not the least of which is that if you aren’t able to include info on critical assumptions, then you probably shouldn’t try to cartoon it, because you’re going to mislead people. Look at the number of people who “retweeted” and “favorited” that; lots of wowing and high-fiving going on there. As Brian said, the emissions scenario assumed is huge in these predictions, and they completely neglect any mention of it. There are some other things, though. They’re making definite claims there on some very major concepts.

      • I just felt like it was clear enough that the cartoon was based on a “business as usual” scenario, Jim. Plus, much of xkcd’s stock in trade is back of the envelope calculations. I guess I felt like it was clear enough that it was a back of the envelope calculation, and that as such it was fine.

      • Jim – I guess my key question (which I am looking forward to your opinion on in your detailed response) is “the emissions scenarios are huge” in what sense?

        The only sense I know of is that they are outside the center and at the high end of the range of scenarios economists cooked up circa 1996, which the IPCC adopted in 2000 and climate scientists have kind of considered the gold standard. But all of the middle-of-the-road scenarios assume “the rapid introduction of new and more efficient technologies”. They also have emissions peaking at the same time as population peaks, in ~2050, which, watching China (and India and Brazil, which together with China make up >1/3 of the world population), I just don’t see happening. Those countries are still going to have rapidly growing per capita GDP in 2050, and only an extreme optimist would now predict that 35 years from now renewable energy will be as cheap as fossil fuels (in both operating and capital investment costs), such that fossil fuels will not be a big part of their per capita growth. Those scenarios also assume a political will to address CO2 emissions which we now know seems completely lacking. We are now almost 20 years into these scenarios, and it is time to revisit them. As I noted, over the 2000-2009 period we emitted CO2 at the rate of the worst-case, business-as-usual A1FI scenario, and that is even with the “luck” of a massive recession in 2008-2009 that noticeably reduced energy consumption in the endpoint years.

        I personally think it is irresponsible as a climate scientist to continue to embrace 1996 middle-of-the-road scenarios as dogma given the intervening reality.

        But I know you have thought about this a lot more than I have, so I am really genuinely looking forward to hearing your opinions on this.

      • Thanks for asking Brian, because I worded that badly. In that sentence, by “huge” I meant that the importance of the assumed CO2 increase is a huge factor when computing the expected future T change by 2100 (i.e. agreeing with your point at x:58 yesterday), not that the emissions themselves were huge.

        Having said that, though, it is possible (probable, I think) that the cartoonist is in fact assuming that emissions themselves will be what I will call “huge”, relative to what they are in 2014. The other two major possibilities are (1) that they are assuming an ECS > 2.75, or (2) that they are assuming a very positive carbon feedback that will manifest itself before 2100, such as CH4 clathrate or permafrost melt. But I doubt that they’re doing either of those, because those are not really consensus views per AR5. But one of the three has to hold, because that’s the only way you can get to a 4-5 deg C increase by 2100, either from 1950 or 2000.

        The actual RCP followed to 2100 is indeed hugely important, I agree with that. Keep in mind, though, that in AR5 even RCP8.5 (8.5 W/m^2 GHG forcing), the most extreme of the four RCPs, generates a range of only 2.6-4.8 degrees C increase by 2100.

        Anyway, this discussion is a very good thing because it forces me to lay everything out very systematically and I’m not sure I’ve ever done that, and I also don’t remember a detailed climate change discussion here before either.

      • I agree Jim – I have learned a lot from this conversation with you. And I agree with you that his 4.5 is hopefully based on high CO2, not something more tenuous (and I agree that it would be nice if the assumptions were documented somewhere outside the cartoon, but I don’t think he does that).

        Something I didn’t notice on previous visits to the cartoon is that the mouseover notes that we could keep the rise to 2C with aggressive action. So a lot of our discussion about what to realistically assume about human behavior is built into the cartoon in a subtle (too subtle?) way (although fans of xkcd know that sometimes the mouseover is more important than the original cartoon).

        I just grow increasingly, depressingly convinced that some of the assumptions of action (technical and political) built into the most commonly used scenarios are no longer realistic and fear a lot of people don’t know those assumptions are part of the scenarios.

      • Brian, I spent a good chunk of yesterday coding up (with notes/comments) some basic radiative forcing expectations based on the four CMIP5/AR5 RCP scenarios and standard GHG RFs for transient and equilibrium sensitivity. I also read various parts of the AR5 WG1 report to check on various assumptions made in the CMIP5 modeling, etc. There’s lots to check on, especially since the effect of various assumptions on the accuracy of claims is really the main point I’m raising here, and this particular cartoon is just a jumping-off point for that.

        As for the socio-economics of energy policy and changes thereto, I doubt that I have any more knowledge or optimism on that issue than you do, maybe less in some respects, due to worrying considerations that stem from the difference between transient vs equilibrium T sensitivity. I’m mostly interested in, and focusing on, the expected effects, physical and biological.
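The difference between transient and equilibrium sensitivity mentioned here can be illustrated with a crude scaling: warming ≈ sensitivity per doubling × F / F_2x, where F_2x = 5.35 ln 2 ≈ 3.7 W/m^2. This minimal sketch assumes each RCP's nominal 2100 forcing (the number in its name) is the total forcing, uses the thread's 2.75 C per doubling as the equilibrium sensitivity, and uses a hypothetical 1.8 C per doubling as the transient response. It ignores ocean heat uptake dynamics, non-CO2 details, and baseline periods, so it only shows why transient numbers come out well below equilibrium ones for the same forcing; it is not how the CMIP5 projections were produced.

```python
import math

# Forcing for a doubling of CO2 under the simplified expression 5.35 * ln(C/C0).
F_2X = 5.35 * math.log(2.0)   # ~3.71 W/m^2

# Nominal 2100 forcing (W/m^2) given by each RCP's name.
RCP_FORCING = {"RCP2.6": 2.6, "RCP4.5": 4.5, "RCP6.0": 6.0, "RCP8.5": 8.5}

ECS = 2.75  # equilibrium warming per doubling (the thread's middle-of-the-road value)
TCR = 1.8   # HYPOTHETICAL transient warming per doubling, for illustration only


def warming(forcing_wm2: float, sens_per_doubling: float) -> float:
    """Crude warming estimate (C): sensitivity scaled by forcing / F_2X."""
    return sens_per_doubling * forcing_wm2 / F_2X


for name, forcing in RCP_FORCING.items():
    print(f"{name}: transient ~{warming(forcing, TCR):.1f} C, "
          f"equilibrium ~{warming(forcing, ECS):.1f} C")
```

Under these assumptions the equilibrium figure for RCP8.5 is roughly 6.3 C, versus roughly 4.1 C for the transient one: the gap is the warming still "in the pipeline" when forcing stops rising.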

  2. Re: that rOpenSci piece Brian: how much does it worry you that an ecologist like me has only a vague idea what they and the many commenters are talking about? knitr, caching of dependencies, version control, github, continuous integration, packrat…not only have I never done/used any of those things, I don’t even have much idea what many of them are.

    • It doesn’t worry me that an ecologist such as yourself doesn’t know much or anything about these things. (I lament that these things weren’t taught more widely at grad school in the past, and think it’d be wonderful if more ecology profs knew about this stuff.) I’d be more worried if ecologists in grad school now aren’t at least familiar with version control and embedding reproducible research methods into their workflows.

      Once you get past some of the terminology, most of those things are just automated ways of recording stuff about your work (settings, options, etc) and recording how you changed things and why.

      • “I’d be more worried if ecologists in grad school now aren’t at least familiar with version control and embedding reproducible research methods into their workflows.”

        Then you should probably worry. If most of their supervisors don’t know about those things, their students mostly won’t learn them. Well, except the most computer-savvy ones, who teach it to themselves.

        Don’t get me wrong, times are changing, and I’m sure the up-and-coming generation of ecologists will know more about these tools than the current generation. But I think it’ll be a long time before all the stuff discussed in that post is a routine, universal part of graduate training in ecology.

      • Perhaps, but there are far more opportunities now for grad students whose mentors don’t know about these things to attend workshops, take online courses or even fly in some Software Carpentry people to train en masse.

        (and I do worry about this 🙂)

    • Halfway worried?

      Caching of dependencies and packrat both relate to the idea that if you try to rerun something 5 years down the road, you might have new versions of the software installed that could in principle give you different answers. While this is nice to fix, I don’t think it’s a must-fix.
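packrat is an R package; as a language-neutral illustration of the idea, here is a minimal Python sketch that records the interpreter and package versions used for a run, so a rerun years later can at least detect that something changed. It uses only the standard library's importlib.metadata; the package names and the versions.json filename are hypothetical placeholders. This is a weaker guarantee than packrat-style caching, which keeps copies of the packages themselves rather than just their version numbers.

```python
import json
import sys
from importlib import metadata


def snapshot_versions(packages):
    """Record the Python version and the installed version of each package."""
    versions = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions


def changed_since(recorded, current):
    """Names whose recorded version differs from the current one."""
    return sorted(k for k in recorded if recorded[k] != current.get(k))


if __name__ == "__main__":
    pkgs = ["numpy", "pandas"]  # HYPOTHETICAL analysis dependencies
    with open("versions.json", "w") as fh:
        json.dump(snapshot_versions(pkgs), fh, indent=2)

    # Years later: reload the record and compare with what is installed now.
    with open("versions.json") as fh:
        recorded = json.load(fh)
    print(changed_since(recorded, snapshot_versions(pkgs)))  # empty if unchanged
```

Saving the version record next to the results is cheap, which fits the "nice to fix, not a must-fix" view: it won't recreate the old environment, but it tells you where to look if numbers shift.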

      Version control & github both refer to sharing your code to allow others to replicate the results from the code. This I think is important, but there are a lot of less fancy ways to share code than version control (of which github is an example). Supplemental online files in journals are an example. To me, version control is much more a tool for when you have many people working on the same project, but it has gotten kind of wrapped up with the idea of open code, which is a bit of a confoundment in my mind.

      knitr is a tool that lets you wrap narrative text (and formulas) around R code and have the results of the R code automatically inserted in the document. This is not central to my idea of replication (although if you really think your submitted ms should be able to rerun the analyses with a push of a “button” in the ms, you need it).

      To me the core essence is that you ought to have a program/script that can go from raw input files (e.g. what you download from the internet or the first order typing in of lab/field notes) to the data that goes in the final tables and to the final graphs. This way you fully document and can reproduce and can share with others any data cleaning you do, any analyses you do, and any transformations, subsetting etc for figures. Writing this single script and putting it in the online materials would make me completely happy (which is not to say the extra things they did aren’t nice or important too, but I’m a realist about what progress could happen and what won’t). But even my basic requirement is rarely met.
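A minimal sketch of the kind of single script described above, in Python using only the standard library. The raw data, column names (site, species, count), cleaning rules, and output filename are all hypothetical; the point is that every step from the untouched raw file to the final table lives in code that can be rerun and shared. Figures would be produced the same way at the end of the script (e.g. with a plotting library); that step is omitted to keep the example dependency-free.

```python
import csv
import io
import statistics
from collections import defaultdict

# HYPOTHETICAL raw field data, exactly as first typed in from field notes.
# In practice this would be an untouched raw file read from disk.
RAW_CSV = """site,species,count
A,Poa annua,12
A,Poa annua,
B,Poa annua,7
B,poa annua ,9
A,Carex sp.,-1
"""


def load_raw(text):
    """Parse the raw CSV into a list of dicts, without modifying anything."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """All data-cleaning decisions live here, in code, not in hand-edited files."""
    cleaned = []
    for r in rows:
        if not r["count"].strip():
            continue  # drop records with a missing count
        n = int(r["count"])
        if n < 0:
            continue  # drop impossible negative counts (data-entry errors)
        species = r["species"].strip().capitalize()  # fix case/whitespace typos
        cleaned.append({"site": r["site"].strip(), "species": species, "count": n})
    return cleaned


def summary_table(rows):
    """Final table: number of records and mean count per site and species."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["site"], r["species"])].append(r["count"])
    return [(site, sp, len(v), statistics.mean(v))
            for (site, sp), v in sorted(groups.items())]


def write_table(table, path="table1.csv"):
    """Write the final table exactly as it would appear in the paper."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["site", "species", "n", "mean_count"])
        writer.writerows(table)


if __name__ == "__main__":
    table = summary_table(clean(load_raw(RAW_CSV)))
    for row in table:
        print(row)
    write_table(table)
```

Keeping the raw file read-only and doing every fix inside `clean` means anyone can see, and disagree with, each cleaning decision.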

      • “To me the core essence is that you ought to have a program/script that can go from raw input files (e.g. what you download from the internet or the first order typing in of lab/field notes) to the data that goes in the final tables and to the final graphs. This way you fully document and can reproduce and can share with others any data cleaning you do, any analyses you do, and any transformations, subsetting etc for figures. Writing this single script and putting it in the online materials would make me completely happy (which is not to say the extra things they did aren’t nice or important too, but I’m a realist about what progress could happen and what won’t). But even my basic requirement is rarely met.”

        That sounds reasonable to me. Although I don’t think I’ve ever lived up to it myself! I just failed to live up to it this morning, in fact. We are all sinners, as the saying goes.

      • Oh, and I forgot to mention (before somebody else does) that version control also has the nice benefit of, well, version control. As in: save one version of your code that matches what you used for your first presentation, one version that matches what you used in your initial submission, and one version that matches your final submission of a ms. Version control is rather elegant at doing this (and has tools to compare versions to see what is different, etc.). But personally, I have to confess (I’m losing cred with the ecoinformatics community at a rapid rate here) that I don’t find the overhead of version control for this purpose alone worth the trade-off. I can just save a copy of my script as thescript_Jun14.R, then another version as thescript_first_submission_Aug18_2014.R, and then thescript_final.R. The only time I personally find version control worthwhile is when multiple people are changing the code at the same time (it documents who is working on a file at a given time, lets you recombine different people’s edits, lets you split off branches of your code for software release vs. maintenance, etc.). But don’t tell anybody I said this or I’ll never be able to show my face in public at an ecoinformatic gathering 🙂
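Comparing dated copies of a script is possible without a version control system too. This minimal sketch uses Python's standard difflib module to produce a unified diff between two hypothetical saved copies of a script (the R contents are invented for illustration); `git diff` gives the same kind of output, along with the history, authorship, and merging features mentioned above.

```python
import difflib


def diff_versions(old_text: str, new_text: str, old_name: str, new_name: str) -> str:
    """Unified diff between two saved copies of the same script."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


# HYPOTHETICAL contents of two dated copies of an R analysis script.
first_submission = (
    "dat <- read.csv('raw.csv')\n"
    "fit <- lm(y ~ x, data = dat)\n"
)
final = (
    "dat <- read.csv('raw.csv')\n"
    "dat <- subset(dat, y > 0)\n"
    "fit <- lm(log(y) ~ x, data = dat)\n"
)

print(diff_versions(first_submission, final,
                    "thescript_first_submission_Aug18_2014.R",
                    "thescript_final.R"))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added between the two copies; what the dated-filename approach can't tell you is why each change was made, which is what commit messages add.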

      • Supplementary material is a terrible place for code to languish. More often than not people and journals just don’t do this right; Word files full of R code for example! It’s also really difficult to build off code or keep it up to date there; whilst you may have a record of the code that was actually run that doesn’t help you if that code no longer runs and most journals don’t allow the supplementary materials to be updated.

        Version control is just as essential for anyone working on code (well, anything that is in ASCII/text files), even if it is just you. It forces you to document and account for changes, and it provides a track record of how you developed your analysis and the changes you made to your script/document/etc. as you worked on it.

        People are taking a broader definition of reproducibility than just “here is a document with code go re-run it”. Version control lets you record the decisions leading up to the final code used in a paper and any changes made subsequently. These tools won’t allow one to reproduce the entire endeavour, but if used from the get go will cover a good part of the process. Good lab and field notebooks would go a long way to covering the rest.

      • “But don’t tell anybody I said this or I’ll never be able to show my face in public at an ecoinformatic gathering”

        Apparently it’s a good thing hardly anybody reads the comments. 🙂

      • @ucfagls – I don’t entirely appreciate your tone. I have used version control software since 1989, 25 years, probably before some of the people reading this blog were born. And as a development manager I used to force people to use version control software appropriately when they thought it was a waste of time.

        So my opinion is not an uninformed one. It is just different than yours. And I explicitly acknowledged that it wasn’t a universal opinion in a rather light hearted fashion and owned it as just my own opinion. Which is something you don’t seem willing to allow room for.

  3. A meta-thought: I think this thread is a good illustration of how everybody has their own hangups. Jim is annoyed enough with xkcd, and thinks the issue important enough, that he’s planning to write a long post on it. Whereas Brian and I are much more sanguine. Brian’s worried about the issues raised in that rOpenSci piece, and ucfagls is very worried about them, but again I’m fairly sanguine. And of course, there are other things that I worry a lot about that haven’t come up in this post (zombie ideas, researcher degrees of freedom…), which others are much more sanguine about.

    And while each of us can offer good reasons for worrying about the things we worry about, it’s much harder to make the case for worrying about those things *more* than the things other people worry about. For instance, natural history-oriented folks argue that ecology grad students would be better off if they learned more natural history. Computer-oriented folks argue that ecology grad students would be better off knowing more programming. And they’re both right! Ecology grad students would indeed be better off if they knew more about both of those things, and many other things too. But there’s only so much time available for training. And nobody’s arguments actually tell us how to spend that time, because nobody’s arguments tell us about trade-offs and opportunity costs. As far as I know, nobody who laments the decline in natural history training ever tries to weigh the costs of that loss against the benefits of increased training in things like statistics and programming. Conversely, nobody who argues for training ecology students even more heavily in statistics or programming ever tries to weigh the benefits of that against the costs of further-reduced training in natural history.

    I freely admit I have no answers here! Arguing for the optimal allocation among various alternatives is really hard. I guess I just wish the issue was explicitly acknowledged more often, at least in pieces in which it’s argued that we ought to allocate more of some limited resource (class time, funding agency budgets, whatever) towards X, and in which the alternatives to X are fairly limited in number.

    • You captured it well Jeremy. In a perfect world my graduate students would know everything about everything! Including how to use github, do a hierarchical mixed model, identify birds, plants and insects, put on a radio collar and run a releve, program preferably in several languages so as to use the best available language for each job, and know every paper in their subdiscipline for the last 10 years. But time is limited, we have to make trade-offs.

      And I for one am glad the field of ecology is populated with people who make different choices along the trade-off axis. I believe it definitely benefits the field as a whole and makes people happier.

  4. Pingback: Global temperature change computations using transient sensitivity, in comparison to AR5 | Ecologically Orientated
