Friday links: the culture of dish-doing, death to inferences, #hipsterscience, and more (UPDATED)

Also this week: the mission (impossible) of public research universities, who cares if the evidence for X is “growing”, journal editor pet peeves, timely advice for grad school recruitment visits, and more.

From Meg:

This is a fantastic fake article on tearoom culture and cleanliness, purported to be by Lotta Washinup and Duya Dishez, but actually by Jennifer Upchurch (@jamandcrumpets). She has posted the pdf in case you need to do a little passive aggressive prodding of fellow tea/break/coffee room users to encourage neatness. :)

Psych journal bans p-values. (ht: Tuomas Aivelo) (Jeremy adds: Hey, that was my link! [grabs link from Meg] See below for my comments.)

Eugenie Clark, an expert on sharks, has died at the age of 92. She did her last dive when she was 88, which is mind-blowing to me!

From Jeremy:

UPDATE: Just found this one and it’s so good I couldn’t wait until next week to share it. Loretta Jackson-Hayes nails it: why liberal arts training is so valuable for scientists.

Scitrigrrl with a nice post on how her third year on the tenure track is kicking her butt–and what she’s doing about it.

Current ecology grad students on questions they wish they’d asked during grad school recruitment visits. Glad they have some advice, because I don’t have any. I visited prospective supervisors individually; their departments didn’t have recruitment weekends as far as I know. I hadn’t realized until recently just how unusual this makes me.

The journal Nature is offering authors the option of double blind peer review. Hard to say how much difference it will make, since it’s not a controlled randomized experiment. And since it’s optional, the mere fact that an author’s chosen it might provide some information to reviewers as to the author’s identity or attributes.

Speaking of peer review, Brian Enquist (an editor at a couple of ecology journals) has some good advice for reviewers in the form of “editors’ pet peeves”.

Scientists often write that a “small but growing body of evidence” suggests X. I’ve probably done it myself at some point. But as Andrew Gelman points out, it’s odd phrasing. Shouldn’t we only care about the current state of the evidence, not about whether there’s more or less evidence than there used to be? And if we do care about the rate of change in the amount of evidence, how come you never hear anyone refer to the “large but shrinking body of evidence” for X, or the “small and unchanging body of evidence” for X, or etc.? I think Andrew’s right to be suspicious of the phrase–it often just functions as a way of making the existing evidence for X sound more compelling than it is.

In what sense is undergraduate teaching “central to the mission” of flagship public research universities in the US? And if you wanted to make it more central, what would be the most feasible way to do it? Of course, given that the “flagship public research universities” are a tiny fraction of all institutions of higher education in the US, and that different institutions have different missions, I don’t know that we should want to greatly adjust the mission of flagship public research universities.

Rich Lenski continues to answer my questions about his long-term evolution experiment. Thanks Rich!

As I feared, people are waaaay overinterpreting results from text mining Rate My Professor reviews (warning: it’s clickbait, don’t bother). Mathematician Courtney Gibbons reminds you to stop and think for two frickin’ seconds before you text mine Rate My Professor for anything other than entertainment purposes.

High-achieving, low-income high school students often have misconceptions that prevent them from applying to selective colleges that would give them generous financial aid and offer the curricula and peers they seek. (ht Brad DeLong)

This week in Huh?: a social psychology journal is banning inferential statistics. As in, all inferential statistics–any attempt to make an inference from sample to population. Which leaves it hilariously unclear why they also will be expecting authors to have larger sample sizes than in the past. Click through in the same spirit that you would rubberneck a car crash. On the other hand, I suppose you could argue that it doesn’t do much harm for a single specialized journal to undertake a radical experiment (better them than Science). There’s value in crazy experiments that aren’t likely to work. On the third hand, there is such a thing as too crazy. And it’s not as if the world lacks for debate about appropriate statistical practice, so I don’t think this experiment has much value even just as a way to prompt discussion. Anyway, the first effect of the new policy has been to cause the entire internet to badger Andrew Gelman for his opinion. If this was trolling, it sure worked! (ht Stephen Heard)

A while back I was invited by a colleague to submit a paper to a special feature in Frontiers in Microbiology. At the time, Frontiers was a newish open access publisher I didn’t know anything about. Based on my mixed experience with them, I wasn’t sure whether to publish with them again. Now I’m sure I wouldn’t, and in retrospect I regret doing so. They’re a poor operation at best. For instance, they’re happy to make money off HIV denialists and to have editors double as peer reviewers. Further troubling examples in the post and comments here. Yes, these are only anecdotes and I’m sure there are negative anecdotes about all publishers. But with Frontiers, the nature and frequency of the negative anecdotes makes me uncomfortable. And Frontiers is old enough now that “growing pains” and “they’re experimenting and trying new things” are no longer excuses, if they ever were. I wouldn’t go so far as to call them a predatory publisher, and I recognize that some reputable academics have had positive experiences with them. But given that Plos One, Ecology and Evolution, and other reputable open access journals focused primarily on technical soundness exist, I don’t see any reason to consider publishing in Frontiers. Just my two cents.

#hipsterscience. “I use Excel for graphs, but I do it ironically.” :-)

And finally: how to get people to open boring work-related emails. Too bad this trick only works once (ht Brad DeLong). :-)

Hoisted from the comments:

Jeff Ollerton with some sensible advice to anyone just starting out as a blogger: it’s going to take months (at least) to build an audience, even if you post lots of good stuff.

The secret recipe for successful working group meetings?

As Meg noted recently, science increasingly involves working groups. This is the big science aspect I discussed a while back in a lengthy footnote (and distinct from big data). Although the original (at least in ecology) synthesis center, NCEAS, is no longer funded by NSF (but is still very much alive, funded by conservation NGOs), there are three other synthesis centers in the US (NESCent, NIMBioS, SESYNC), a somewhat differently functioning synthesis center (iPlant), and centers in Canada, Australia, France, Germany and many other countries (http://synthesis-consortium.org/). And I’m increasingly seeing work done in “working group format” even when it is not tied to a center. The NSF RCN (Research Coordination Network) grant program is an example, but quite a few PIs on regular NSF grants or NGO/conservation grants are also choosing the working group format.

I am a self-confessed working group junkie. I take (undue?) pride in the fact that I’ve been to groups at all five US centers (and led working groups at two of them), been part of an RCN, and been to meetings at Germany’s sDiv and (although it’s not an official synthesis center) the UK’s Kavli meetings; I will be at Canada’s CIEE in May and, if funded, at France’s CESAB soon. That leaves Australia as the only big miss on my list (at least for an ecologist), and even there I participated remotely in an NGO-funded working group.

Working groups are a personal preference. Some people like them more than others. And some people are better at being part of them than others too! There is no best way to do science. But I think they’re a great format for doing a number of things, including: addressing both sides of a debate and building consensus, reviewing a field, doing meta-analyses or assembling and analyzing large datasets, and coalescing ideas and energy at key points in the trajectory of a field (including at its launch and at its coming down from bandwagon status). Certainly they have been influential – NCEAS is one of the most cited institutions in ecology.

But working groups are not a format people are trained to work in, let alone lead. Our whole PhD training focuses primarily on solo work with a few interactions. Most “regular” papers have 1-5 authors. Then we throw people into a group of 15-20, with social dynamics an order of magnitude more complex, and no training. What follows is my distillation of the key success factors of working groups. Unfortunately, despite the title, they do not come together into a magic recipe that guarantees success. And of course there is some variation depending on goals. But in my experience, if you get all of the following ingredients you’ve got a good shot at success.

During the working group proposal process

  1. Group composition #1 – personalities matter – Working groups are first and foremost social enterprises (I will be repeating this sentence several times). And with the competing demands on everyone’s time and only a week to pull things together, you are on the edge of failure right from the start. So it may be tempting to get the biggest name in the field, but if they’re a colossal ego who doesn’t play well with others, avoid the temptation. One bad apple really can spoil the barrel. Indeed, only invite people you know, either personally or indirectly through a colleague, to be good collaborators. Twice I’ve been part of groups where the goal was explicitly to bring in people from opposing camps – but even there considerable effort was expended to bring in only people who could be part of a civil give-and-take dialogue, and some of the extremists were intentionally left out.
  2. Group composition #2 – career stages – In my experience the ideal working group has a pyramid shape, with the largest group being postdocs, the next largest being early tenure track, and a much smaller number of senior ecologists. I’ve never actually seen a truly pyramidal group, so maybe a more realistic goal is rectangular – equal representation of postdocs, early career, and senior. But definitely think about this.
  3. Meet for 5 days per session – There is a wide variety of opinion on this, and I’ve been part of 2-day meetings that were successful. But if you’re going to fly in people from around the world who are giving up 2-3 days to travel and jet lag, why would you meet for less than 4-5 days? Also, in my experience it really does take that long for the social processes and buy-in to a common goal to develop. It may be efficient to have small subset groups that meet for shorter periods (or for extensions to the 5 days). And if everybody already knows each other, so the social processes and goals are well worked out, sometimes fewer days works. But in most cases 5 days is optimal in my experience. And if people can’t commit the 5 days, they’re not going to be big contributors anyway. The working group process is a slow one. It has many other advantages, but speed is not one of them.
  4. Who will do the work between meetings? – This is one of the motivations for #2 – everybody will leave a group meeting with good intentions. But who will actually spend more than 5 hours moving the project forward (i.e. wrangling data, running simulations and analyses, writing)? If the PIs of the working group aren’t going to do this (and if they aren’t prepared to, they probably shouldn’t be the PIs) and there aren’t any postdocs looking for good projects, then odds are nobody will. There are some exceptions I’ve seen: where, say, the goal was a meta-analysis, and during the meeting everybody was assigned 10 papers to code before the next meeting. That kind of discrete chunk can reasonably be expected between meetings. And I’ve seen plenty of meetings where somebody stepped up unplanned to carry a load (but they were almost always postdocs or occasionally early career researchers).

Beginning of meeting

  5. Do a Powerpoint death march on the first day – This is my tongue-in-cheek name for the idea of letting everybody in the group stand up and give a presentation about their work related to the topic. This is oft-debated, with many arguing it is a waste of time. But in my experience, if you don’t give people a clear window to get their opinion out, they will spend the whole rest of the meeting slipping it in edgewise. I have seen this happen more than once, and it can be really annoying when the whole group is converging and somebody keeps going on about their preconceived idea of how to do it – better to get it out of the way on the first day. It is in the long run more efficient to spend a day doing this. That said, the PIs can make this efficient or painful. Give people very clear instructions on what you want them to present on. And give them absolute time limits (typically 10 or 15 minutes). Then ENFORCE the time limits rigidly. It’s fine for conversation about a presentation to run over a little, since conversation is the point of a working group. But DO NOT let anybody deliver a monologue one minute over their planned time. This only needs to be done the first time a group meets.
  6. Do a regrouping and group agenda-setting after the Powerpoint death march – After everybody has been heard from, spend some time setting the agenda for the rest of the meeting. Many times the PIs will have a pretty clear idea. Other times, the goal really is to brainstorm the agenda together. But either way, put it on a white board, talk it out a bit as a group, and be open to changes. This will get you buy-in on, and understanding of, the agenda. It will also get you the sum-is-greater-than-the-parts synergy that you are hoping for from a working group.
  7. PIs need to take their role as cruise director seriously – Working groups are first and foremost social enterprises (I promised you that sentence would come back). I’ve never seen a successful working group that didn’t spend a lot of time going out to dinners. The PIs need to take the lead to make sure these are organized by early afternoon so everybody knows the plan, and they need to set the example that this is an expected activity. There is an age-old debate between group members who want to go to dinner right after work stops and those who want a couple of hours to go exercise first; some compromise is needed. Some of the best working groups I’ve been part of have actually knocked off early one afternoon and gone for a hike or field trip. It might seem a waste of time, but trust me, it pays off.
  8. Lead a discussion about authorship expectations early – There is no right or wrong answer about who should be a co-author on papers from the group. But setting expectations in a group discussion up front is essential. Most groups I’ve been part of have decided that everybody present should be a co-author on the core synthesis or review paper(s). You want to create an attitude where everybody is chipping in and not holding back their best ideas, and authorship is the best way to do this. Authorship rules on more subsidiary papers vary, but they should be collectively agreed up front.

Middle part of the meeting (e.g. days 2-4)

  9. Do the work – This is of course the end goal. But it’s the hardest to give generic advice about, because the nature of the work varies. It may be finding and coding papers for a meta-analysis or assembling datasets. It might be a fairly large group discussion about the consensus state of the field. It might be simulations. It might be a mixture of these things. But it probably occupies the bulk of the meeting – especially the middle days. And it probably involves breaking out into subgroups with different tasks or topics to cover.
  10. Regroup once or twice a day – Even if much of the work will happen in breakout groups (and it almost certainly will), bring the whole group back for 30 minutes before lunch and 30 minutes before dinner and have each group report in. This keeps everybody rowing in the same direction. It is also where much of the magic of working groups happens, as recurring themes and areas of disagreement emerge.
  11. Follow a diamond trajectory – This is true really of any brainstorming or group process. The goal in the beginning is to broaden out – open up minds, create crazy ideas, capture every thought. Then, when things have gotten impossibly wide, it is time to pivot and turn energies to focusing and narrowing down. A key to a good working group is for the PIs to have the nerve to let things broaden out for a while (often several days) and then have the leadership to firmly rein it back into a focus.
  12. Know when to force a turning of the corner to writing – Closely related to #11. In no case should you start writing immediately. And one or two people will probably do the bulk of the writing, probably after you go home. But you should definitely start writing (or at least detailed outlining) before you scatter. You might even assign sections and end up writing a whole draft while you’re at the working group. But this is another key decision point for the leaders – when to stop the talking/analyzing and start the writing. It should start (again, at a minimum to outline stage) before you leave.
  13. Pace yourself – It is tempting to treat working group time as so precious that you should work 12-hour days. But this is a big mistake. Aside from the great importance of social bonding (#7), you are doing a creative activity that requires fresh, bright minds. Many of your members will have flown 12-24 hours to get there and be jet-lagged. And the rest will be exhausted by an intense pace long before the week is over. I’ve personally found that keeping the working group to 9-5 with at least an hour for lunch (plus joint dinners that are social) keeps things productive through day 5, while anything more leads to severe drooping by the end.
  14. Manage the email and phone calls – Everybody will want/need to keep up on email and may make an occasional phone call to their lab manager, other collaborations, etc. In my experience the best way is to tackle this head on by building in time for it and then setting a pretty clear expectation of full focus on the meeting the rest of the time. I usually allow 60 minutes for lunch (this is a social enterprise…) and then a good 30-45 minutes immediately after lunch for phone calls and catching up on email. That way people can run a little long on lunch or end a little early and have more time for email as they wish. And you can expect (and demand) full attention the rest of the time.

End of the meeting (e.g. Day 5)

  15. When the meeting really ends – If you tell people the meeting ends at noon, they will book flights out at 9. If you tell people the meeting ends at 5, they will book flights out at 12 or 1. So tell them it ends at 5 and secretly (don’t let on your real plan) know that you really will end at 1:00 PM. But don’t forget that long-distance travellers will usually not fly out until the next day. You can still get some work done, and have one last dinner; you just won’t have everybody. As a PI you should definitely plan to stay until the day after the meeting is officially over and lead this tail end.
  16. Leave with clear assignments – Well before people start peeling out – i.e. the morning of the last day – put a list of tasks, deadlines, and the 1-2 people responsible for each on the projector or white board (5 names attached is the same as no names attached). Discuss this with the whole group.
  17. Accountability – Find a way to keep work flowing between meetings. Emails with reminders of tasks are a good way to do this. Circulating draft versions of papers or working versions of datasets is a good way too. In my experience, scheduling a monthly phone call is also a good idea. Having somebody set up to be a “nagger” (either a PI or a postdoc) who keeps track of timelines is important too.

So – being a good leader of a working group just requires staying on top of 17 different things! If it sounds like leading a working group is exhausting – it is! Being a participant at a working group is exhausting, but being a leader and riding herd on the whole process is a whole other level of exhausting.

Obviously my 17 points are not a magic formula. They’re just the wisdom I’ve pieced together over a couple of dozen working group meetings. And a couple of them, like #11 and #12, require serious judgement on the PIs’ part – all I can do is highlight the question. And some will disagree with my list – I know from discussions I’ve had that #3 and #5 are definitely not universally agreed upon.

What are your experiences? What are the ingredients in your secret recipe to a successful working group? What works and doesn’t work?

How do you make figures?

Continuing on my stats and figures theme from last week, I’m curious as to how most of our readers make figures. I drafted this post before those posts appeared, and had no idea how common my approach of moving figures from a stats program into another program for final touch-ups was. It seems like something people mostly don’t talk about, though the few times I’ve mentioned doing it to other scientists, they’ve generally been really relieved to hear they weren’t the only one. The comments on those posts from last week suggest there’s a lot of variation in how people do things, but that it might be pretty common to do some or a lot of processing of a figure in a program like Powerpoint or Illustrator.

So, I think it would be interesting to poll our readers. As an example, here’s a figure from a recent article by Wolak, Roff, and Fairbairn, entitled “Are we underestimating the genetic variances of dimorphic traits?”:

[Figure: multi-panel figure from Wolak et al.]

(I have no connection to this paper – I picked the figure because I think it shows the sorts of data that ecologists often want to plot, and has multiple panels.)

Let’s assume this was your manuscript and you wanted to make this figure. How would you do it? (Feel free to ignore panel B if that helps. Here, I’m interested in how you’d plot data you collected.)
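(To give just one concrete example of the kind of answer I mean – a base-R sketch using a built-in dataset, not the actual Wolak et al. data – a two-panel figure along these general lines can be drawn entirely in R:)

```r
# A two-panel figure drawn entirely in base R, using the built-in iris
# data as a stand-in for real field data
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(Sepal.Length ~ Petal.Length, data = iris,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
mtext("A", side = 3, adj = 0, font = 2)
boxplot(Sepal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal length (cm)")
mtext("B", side = 3, adj = 0, font = 2)
```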

[Poll]

On to the next question:

*I’m just going with options I can think of off the top of my head that I’ve heard people mention. I’m sure I’m forgetting some. When I was first coming up with the list, I thought of CricketGraph, then realized that’s more than a tad outdated.

[Poll]

Finally, let’s assume a reviewer asked you to remake the figure excluding all the data from one of your study sites. How long would this take you (assuming you would be doing this 6 months or so after making the original figure)?

[Poll]

I will be very interested in seeing what people do!


How often are ecologists mentioned in the news?

More specifically, how often are they mentioned in the NY Times? Turns out the Times has an online tool to tell you! It gives you a time series of the % of all NY Times articles containing a specified word or phrase. Here’s the time series for “ecologist” (which seems not to be case sensitive):

[Graph: % of NY Times articles mentioning “ecologist” over time]

The general trend is upward, though with ups and downs that may or may not be interpretable.

Ecologists are mentioned less often than members of many other scientific fields. We’re about on par with or a bit above “geneticist”, “geologist”, “demographer”, “paleontologist”, and “linguist”, but dwarfed by “biologist”, “sociologist”, “anthropologist”, “psychologist”, “historian”, and especially “economist” (interactive graph).

You can also look at the names of fields, which reveals rather different patterns. For instance, mentions of “ecology” in the NY Times skyrocketed starting in the mid-1960s and peaked in 1972, at the height of the US environmental movement. Then they declined precipitously, rebounded a bit in the run-up to the 20th Earth Day in 1990, and have held steady or drifted slowly downward ever since. But mentions of ecology are dwarfed by, and show different temporal trends than, related words like “environmental” (interactive graph). So I suggest not reading too much into the trends for any single word.

Must…resist…urge…to…spend…all…day…playing…with…this! :-)

Related posts:

Fun with Google Ngrams: what’s the most popular subfield of ecology?

More fun with Google Ngrams: the ups and downs of ecological ideas, societies, and journals

Weird, and unwise, things to include on your cv (UPDATED)

So, what’s the weirdest thing you’ve ever seen someone include on their cv?

Probably the weirdest I’ve heard of is someone who listed their IQ. Apparently IQ tests don’t measure “good judgement about what to put on your cv.”

I’ve also seen or heard of a couple of faculty who list their high school, or high school achievements. Which I wouldn’t say is weird, but does seem a bit odd to me. Not to the extent that it would ever affect, say, a hiring decision, I don’t think (it’s rare for any one little thing like that to derail anyone’s job application). But a bit odd.

I have a few things on my cv that might be considered slightly weird. Or maybe not? I’m not sure. You be the judge:

  • My cv notes that my undergraduate degree is magna cum laude, which I believe is Latin for “I went to a fancy liberal arts college or Ivy League university and did reasonably well, and I am still proud of it.” ;-) Although then again, it’s pretty common to list Latin honors. So maybe I’m just overly self-conscious in worrying that listing Latin honors might look a little weird?
  • I list all the working groups, symposia, and editorial boards I was invited to join but declined. I list them because such invitations are one line of evidence of my standing in the field. But I’m probably at the point in my career where I should just drop them for the sake of brevity. And you don’t see many other people listing declined invitations, so it might look a little weird that I do.
  • I have a paragraph on my blogging right at the end, in its own section. I actually don’t think many people would consider that weird, at least not after they read it. But I suppose a few people might. I’m not worried about this possibility, but it is possible.

There are other things you sometimes see people include on their cv’s that aren’t weird, in that they don’t make the reader think “Why would you list that?”, but that nevertheless give a bad impression:

  • “In prep” papers, unless you are a grad student or postdoc. The only two reasons to list “in prep” papers are (i) to help convey what you work on, and (ii) to show that you are indeed actively working on something. But once you’re past the postdoc stage, you should have enough of a track record that you shouldn’t need in prep publications to convey (i) or (ii). Nobody reading your cv gives you even a smidgen of “partial credit” for “in prep” publications. No, not even if you say they’re to be submitted in the next three months and you specify the target journals, and not even if your target journal is Science or Nature. You can probably leave off submitted publications too unless you’re a grad student or postdoc, for the same reason. By the way, I learned this fairly late, and listed in prep and submitted publications on my cv until a couple of years ago. In retrospect, that was a (minor) mistake.*
  • Listing anything other than peer reviewed papers in the same section of your cv as peer reviewed papers. Papers in prep, online preprints (even those that have received “post-publication review” in some form), letters to the editor of Nature and Science, invited papers that weren’t peer reviewed, particularly witty tweets…If you list any of that in the same section of your cv as peer-reviewed papers, people will think you’re trying to pass that other stuff off as peer reviewed papers.**
  • Continuing to list retracted papers. I’ve heard of people doing this. You don’t want to be one of those people. No matter what the reason for the retraction.

And of course, there are other things on which you’ll get conflicting advice as to whether to list them. So the floor is open. Got questions or advice on what not to include on your cv? Fire away!

*Until very recently I also listed papers on which a revision had been invited, but I’ve stopped now. Listing “in press” papers is fine, of course. If you do list submitted, in review, or in press publications, provide some sort of identifying information that could in principle be checked–the doi if there is one, or else the ms tracking number.

**And don’t put that other stuff in an earlier section of your cv, before your peer-reviewed papers. In general, you should first list your degrees and employment. The remaining sections should be in rough order of their importance to whatever position you’re applying for. You don’t want whoever’s evaluating your application to have to dig for the information they care most about.

UPDATE: This post describes some N. American norms of cv construction. Norms vary between countries. In general, I’m of the view that you should follow the local norms, so that those reading your cv can read it easily and don’t raise an eyebrow. See the comments for some discussion of norms in other countries.

Friday links: sharks and pythons and unicorns, oh my!

Also: articles about women in science, economics as the most confusing subject, and risk-reward tradeoffs in grants

Mammal March Madness is about to return! The four divisions are Mighty Mini Mammals, Mythical Mammals (the photo seems to indicate unicorns will be a contender!), Critically Endangered Mammals, and Sexy Beasts. My 4-year-old knows what sugar gliders are thanks to last year’s MMM. She will be very excited to fill out a bracket again this year!

This is a great, accessible summary of the problem of Burmese pythons in the Everglades. I love the drawings!

More very impressive animal pancakes, including butterflies and sharks. (ht: David Shiffman)

SWEEET (the Symposium for Women Entering Ecology & Evolution Today) has a webpage listing recent peer-reviewed articles about women in science. This looks like a great resource! (ht: Susan Cheng)

From Brian:

Terry McGlynn has a great poll on the risk-reward tradeoff for grants. Go answer the poll. This is a really important conversation to be having right now.

And Rich Lenski has a 3rd round of question answering!

From Jeremy (even though he’s traveling!):

Hoisted from the comments

In my post yesterday on how to learn new skills in R, I admitted that I want to download the stats- and R-knowledge from Ben Bolker’s brain. That prompted Noam Ross to come up with this excellent comment:

Ben Bolker’s Brain has been scattered into bits
Among SO, R-sig-ME and a thousand different lists
Now I’ve asked him all my questions, and I hope my model fits.
His code is marching on.

(With apologies to Ben, and the many authors of “John Brown’s Body”)

How do you learn new skills in R?

As I wrote about yesterday, I have slowly shifted from using Systat and SAS to using R. I now do all of my analyses and make my figures in R, but still regularly bump up against things I don’t know how to do. These things generally fall into one of three categories:

  1. manipulating a dataframe,
  2. trying to figure out how to do an analysis that I haven’t done in R before,
  3. trying to make pretty figures.

This has me wondering how to best learn new skills in R. I know I am not alone in trying to figure this out! So, please let us know in the comments what approaches have worked for you and/or people you know!

As my lab was initially shifting to R, we had a series of stats boot camps at lab meeting, where we learned how to import data to R and some of the basics of working with data in R. We then also had different lab members teach everyone else how to do some analyses in R that we were all likely to need at some point (e.g., survival analysis). That worked really well at first, but now we’ve run into the problem of having had some turnover in the lab. As new people join the lab, how do we get them up to speed? And what about things that not everyone needs to know how to do?

As I’ve learned R, my general approach to trying to learn new things has been (roughly in order):

  1. Look back through old code if I think I maybe have done something similar before,
  2. Search on something like Cookbook for R (especially if my question relates to graphics),
  3. Look in Crawley or Zuur,
  4. Wish I could just download all the R- and stats knowledge from Ben Bolker’s brain into mine,
  5. Consult Dr. Google, which often leads to Stack Overflow,
  6. If still stuck, ask on twitter (usually remembering to add the #rstats hashtag),
  7. Email someone who might be able to help me. (I try hard not to do this last step, though, because I don’t want to bother other folks.)

Based on this tweet from Hadley Wickham:

I am definitely doing it right when learning new things in R!

I’ve also been trying to keep this in mind:

As came up in the comments on yesterday’s post, yes, sometimes it’s a battle to figure out how to make a figure in R, but that knowledge is useful in the future.

Usually, I can figure out what I need, but it sometimes takes a really long time. Sometimes I give up and resort to a less elegant approach. With dataframe manipulation, that less elegant approach is usually brute-forcing things. For example, I recently wanted to assign a unique ID to each infection-lake-year category, so that I could make one big box plot containing data from all of them. I couldn’t figure out how to do this and it was nearing the end of the day, so I just manually went in and told R that rows 1:20 should be “A”, 21:39 should be “B”, etc. It worked, but it means that if something about the data changes, I will need to remember to change the row indexing. And it means I can’t easily reuse that code for a similar purpose in the future. For figures, the brute-force approach for me generally involves moving things into Powerpoint and rearranging figure panels or centering labels there. I will come back to the specific topic of figures in a post next week, but my ideal would be to not need to move to another program at all. I’m getting closer to that, but I’m not all the way there. (Comments on yesterday’s post suggest maybe not all advanced R users view this as something to aim for.)
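Something like base R’s interaction() would have been a more robust route here (a sketch – the data frame and column names are hypothetical):

```r
# Build the unique infection-lake-year ID from the data itself, so the
# IDs survive changes to the data (all names here are hypothetical)
dat$group_id <- interaction(dat$infection, dat$lake, dat$year, drop = TRUE)

# One big box plot, with one box per infection-lake-year combination
boxplot(prevalence ~ group_id, data = dat, las = 2)
```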

As I’ve thought about how to learn these techniques, I’ve wondered how others learn how to program, especially in R. And, more specifically, I wonder what I could be doing differently to pick up R faster.

One idea I’ve considered would be to have an R lounge – a reserved room where people can come and work on analyses, with the idea that they could interrupt others (or get interrupted) to ask about a problem they’re running into. But I don’t think this would really be useful. It would only work if some people who know a lot about R were generous with their time and came and worked there. And, when I am trying to figure something out, I want to know the answer approximately 10 minutes ago, so waiting until others came by would drive me up the wall.

Another option would be to post to Stack Overflow. I certainly often find helpful suggestions by looking through posts there. But I feel like there’s a culture to it that I haven’t learned, and that makes me hesitant to wade in. (For example, sometimes the reply to a post is a curt indication that the question has already been asked and answered elsewhere, or an admonishment for selecting an answer too quickly.) Plus, something about posting there feels a little too public to me (which, yes, might seem weird for someone who blogs and tweets to say!). I tend to feel like any specific problem I post would seem so incredibly basic.

In the end, I haven’t come up with a better option than slowly battling through, task-by-task. It still feels incredibly slow sometimes, but maybe that’s just the nature of the beast.

How did you learn R? What would you recommend to people who are complete R novices? (When I mentioned writing this post on twitter, Zhian Kamvar recommended swirlstats, which looks great.) What about to people who’ve mastered the basics but are trying to learn more?

Some general resources I’ve found helpful:
1. RStudio cheatsheets (currently for data wrangling and R markdown)
2. Beautiful plotting in R: a ggplot2 cheatsheet, by Zev Ross
3. Cookbook for R

The biggest benefit of my switch to R? Reproducibility

Your primary collaborator is yourself 6 months from now, and your past self doesn’t answer emails

The quote above*, which is commonly used in Software Carpentry workshops, is a great, succinct way of reminding people to annotate code well, and to give thought to how they organize (and name!) data files and code. It is similar to something that I emphasize to all people in my lab regarding lab notebooks: write things in there in so much detail that it pains you, because, in six months, you will have completely forgotten** all the things that seem totally obvious now. But it’s probably not something I’ve emphasized enough in terms of analyses, and I should fix that. For me, the biggest unanticipated bonus of my shift to R has been how much easier it has made redoing analyses and remaking figures.
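As a toy illustration of the level of annotation and naming I have in mind (every file and variable name here is hypothetical):

```r
# 2015-02_daphnia-survival_figures.R
# Purpose: remake the survival figures for the 2014 field survey
# Input:  data/field-survey-2014_clean.csv
# Output: figs/survival-by-lake.pdf
# Notes:  densities are log10(x + 1)-transformed before plotting

survey <- read.csv("data/field-survey-2014_clean.csv")
survey$log_density <- log10(survey$density + 1)  # see Notes above
```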

As I whine about on twitter periodically, I have slowly shifted from using Systat and SAS as my primary stats and graphics tools to using R. The main motivation for this shift was that it seemed obvious to me that my students should learn R, given that it is powerful and open source. My students working in R meant that I felt like I needed to learn R, too. It’s been a sloooooow process for me to shift to R (it is so frustrating to have to learn how to do something in R when I know I could have the results after 5 minutes in SAS!), but I finally view R as my default stats program. When I first made the shift, I mainly just wanted to get to the point where I could do the same things in R that I could already do in Systat and SAS. But I now see that a huge advantage of the shift is that my analyses and figures are much more easily reproduced in R.

Prior to shifting to R, my basic approach was to enter data in Excel, import the data into Systat, and then use the command line to do a bunch of manipulations there. I generally viewed those as one-off manipulations (e.g., calculating log density), and, while I could have saved a command file for those things, I didn’t. This meant that, if I discovered an error in the original Excel file, I needed to redo all of that manually. For making figures, I would again use the command line in Systat; in this case, I would save those command files. I would then paste the figure and the command file I had used to make it into Powerpoint, and would get the figure to publication quality in Powerpoint. (For example, the tick marks on Systat figures never show up as straight once the figure has been exported, so I would manually draw a new tick mark over the old one so that it appeared straight. That sort of thing was clearly really tedious.) For analyses, if it was really straightforward (e.g., a correlation), I would do it in Systat (again, usually saving the command file). But if the analysis was more complicated, I would go to SAS. That meant importing the Excel file there, and then doing the analyses there. I would then paste the output of those analyses into a Word file, along with the code (which I also saved separately).

Overall, on a scale from completely hopeless to completely reproducible, my analyses were somewhere in the middle. I at least had the command files (making it more reproducible than if I had used the GUI to do everything), but I would end up with a folder full of a whole ton of different command files, different Systat data files plus the original Excel files, and with some results in a Word file, some in a Powerpoint file, and some just left as output in Systat. And, if I later needed to change something (or if a reviewer asked for a change in an analysis), it took a lot of effort to figure out which was the relevant command and data file, and I would have to go back through and manually redo a whole lot of the work. One paper of mine has a ton of figures in the supplement. I no longer recall exactly what change a reviewer wanted, but that change meant that I had to remake all of them. It took days. And, yes, surely some of this could have been improved if I’d come up with a better workflow for those programs, but it certainly wasn’t something that arose naturally for me.

Now, with R, I can much more easily reproduce my analyses. I think I do a pretty good job of annotating my code so that I can figure out in the future what I was doing (and so that others who are looking at the code can figure out what I was doing). I recently was doing a long series of analyses on field data and, after working on them for a while, realized I had forgotten an important early filtering step.*** With my old system, this would have been immensely frustrating and resulted in me having to redo everything. With my new system, I just went back, added one line of code, and reran everything. It was magical.
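(That one line was the filtering step described in the third footnote below; in dplyr it looks something like this – a sketch, with hypothetical column names, and assuming one row per animal:)

```r
library(dplyr)

# Drop lake-date-host combinations in which fewer than 20 animals were
# analyzed for infection (hypothetical names; assumes one row per animal)
field <- field %>%
  group_by(lake, sampling_date, host_species) %>%
  filter(n() >= 20) %>%
  ungroup()
```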

But I still haven’t reached full reproducibility yet, and I am surely far from many people’s ideal for reproducibility. (For starters, I haven’t used GitHub or something similar yet.) For the manuscript I’m working on now, I still exported figures to Powerpoint to arrange them into panels. I know that, in theory, I can do this in R, and I could get the basic arrangement worked out in there, but I couldn’t figure out how to get it to arrange them in a way that didn’t include a lot more white space between the panels than I wanted. I imagine that, with enough time, I could have figured that out. But, at the time, it didn’t seem worth the effort. Of course, if something comes up and I need to remake all of them, I might come to a different conclusion about whether it would have been worthwhile! (Update: I decided to add two new panels to the figure, and spent all day Monday working on it. I got to the point where I could get them arranged nicely in R, but never did figure out why two of the y-axis labels weren’t centering properly. So, that last step still happened in Powerpoint. Sigh. I was so close!)
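For anyone fighting the same white-space battle, one thing to try (a sketch using ggplot2 and gridExtra; the plots themselves are just placeholders) is shrinking each panel’s plot.margin before combining the panels:

```r
library(ggplot2)
library(grid)      # for unit()
library(gridExtra) # for grid.arrange()

# Shrink each panel's margins; this is where extra white space between
# combined panels usually comes from
tight <- theme(plot.margin = unit(c(1, 1, 1, 1), "mm"))

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + tight
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot() + tight

grid.arrange(p1, p2, ncol = 1)  # stack the two panels
```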

That brings me to the topic of my post for tomorrow: how do you learn new analyses and programming skills? I’m looking forward to hearing what other people do! And next week I’ll come back to the topic of making figures.


* This is the version that Christie Bahlai used when she taught at the Women in Science and Engineering Software Carpentry workshop at UMich in early January. The quote has become a standard in SWC workshops. In an email thread that included an amusingly expanding number of SWC instructors, Paul Wilson pointed me to this tweet by Karen Cranston as the original motivation for the quote:

** It is embarrassing to me how often I forget not just details of experiments, but entire experiments. For example, for the manuscript I am working on now, I forgot that we had done an experiment to test for vertical transmission of the parasite. Fortunately, the undergrad who has been working on the project remembered and had it in his writeup!

*** I remove lake-date-host species combinations where we analyzed fewer than 20 individuals for infection. Our goal is to analyze at least 200 individuals of each host species from each lake on each sampling date, but sometimes there are fewer than 200 individuals in the sample. If we have fewer than 20, I exclude that lake-date-host combination because infection prevalence is based on so few animals that it is impossible to have much confidence in the estimate.

In praise of slow science

It’s a rush-rush world out there. We expect to be able to talk to (or text) anybody anytime, anywhere. When we order something from half a continent away, we expect it on our doorstep in a day or two. We’re even walking faster than we used to.

Science is no exception. The number of papers being published is still growing exponentially, at a rate of over 5% per year (i.e. doubling every 10 years or so). Statistics on growth in the number of scientists are harder to come by – the last good analysis I can find is a book by Derek de Solla Price in 1963 (summarized here) – but it appears the doubling time of the number of scientists, while also fast, is a bit longer than the doubling time of the number of papers. This means the individual rate of publication (papers/year) is going up. Students these days are being pressured to have papers out as early as their second year*. Before anxiety sets in, it should be noted that very few students meet this expectation, and it is probably more of a tactic to ensure publications are coming out in year 4 or so. But even that is a speed-up from publishing a thesis in year 6 or so and then whipping the chapters into shape for publication, which seemed to be the norm when I was in grad school. I’ve already talked about the growing number of grant submissions.
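(As a quick sanity check on that doubling-time parenthetical, here’s the arithmetic in R – at exactly 5% annual growth the doubling time is closer to 14 years; a rate nearer 7% gives the 10-year doubling:)

```r
# Doubling time at annual growth rate r is log(2) / log(1 + r)
log(2) / log(1.05)  # ~14.2 years at 5% per year
log(2) / log(1.07)  # ~10.2 years at 7% per year
```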

Some of this is modern life. Some of this a fact of life of being in a competitive field (and there are almost no well paying, intellectually stimulating jobs that aren’t highly competitive).

But I fear we’re losing something. My best science has often been tortuous, with seemingly as many steps back as forward. My first take on what my results mean is often wrong, and much less profound than my 3rd or 4th iteration. The first listed hypothesis of my NSF postdoc proposal turned out to be false (tested in 2003-2004); I think I’ve finally figured out what is going on 10 years later. My first two papers did not come out until the last year of my PhD (thankfully I did not have an adviser who believed in hurry-up science). But both of them had been churning around for several years, and in both cases I felt like my understanding and my message greatly improved with the extra time. The first of these evolved from a quick and dirty test of neutral theory into some very heavy thinking about what it means to build models and test theory in ecology. This caused the second paper (co-authored with Cathy Collins) to evolve from a single-prediction paper to a many-prediction paper. It also led to a paper in its own right. And it influences my thinking to this day. And in a slightly different vein, since it was an opinion paper, my most highly cited paper was the result of more than 6 months of intense (polite, but literally 100s of emails) back-and-forth debate among the four authors, which I have no doubt resulted in a much better paper.

I don’t think I’m alone in appreciating slow science. There is even a “slow science” manifesto, although it doesn’t seem to have taken off. I won’t share the stories of colleagues without permission, but I have heard plenty of stories of a result that took 2-3 years to make sense of. And I’ve always admired the people who took that time; in my opinion they’ve almost always gotten much more important papers out of it. I don’t think it’s a coincidence that Ecological Monographs is cited more frequently than Ecology – Monographs papers are often magnum-opus-type studies that come together over years. Darwin spent 20 years polishing and refining On the Origin of Species. Likewise, Newton developed and refined the ideas and presentation behind the Principia for over a decade after the core insight came.

Hubbell’s highly influential neutral theory was first broached in 1986 but he then worked on the details in private for a decade and a half before publishing his 2001 book. Would his book have had such high impact if he hadn’t ruminated, explored, followed dead ends, followed unexpected avenues that panned out, combined math with data and literature and ecological intuition and generally done a thorough job? I highly doubt it.

I want to be clear that this argument for “slow science” is not a cover for procrastination, nor for the fear of writing or the fear of releasing one’s ideas into print (although I confess the latter influenced some of the delay in one of my first papers, and probably had a role with Darwin too). Publication IS the sine qua non of scientific communication – it’s just a question of when something is ready to write up. There are plenty (a majority) of times I collect data and run an analysis and I’m done; it’s obvious what it means. Time to write it up! So not all science is or should be slow science. Nor is this really the same as the fact that challenges and delays sometimes happen along the way in executing the data collection (as Meg talked about yesterday).

But there are those other times, after the data are already collected, when there is this nagging sense that I’m on to something big but haven’t figured it out yet. Usually this is because I’ve gotten an unexpected result and there is an intuition that it’s not just noise or a bad experiment or a bad idea, but a deeper signal of something important. Often there is a pattern in the data – just not what I expected. In the case of the aforementioned paper I’ve been working on for a decade, I got a negative correlation when I (and everybody else) expected a positive one (and the negative correlation was very consistent and indubitably statistically and biologically different from zero). Those are the times to slow down. And the goal is not procrastination nor fear. It is a recognition that truly big ideas are creative, and creative processes don’t run on schedules. They’re the classic examples of solutions that pop into your head while you’re taking a walk, not even thinking about the problem. They’re also the answers that come when you try your 34th different analysis of the data. These can’t be scheduled. And these require slow science.

Of course one has to be career-conscious even when practicing slow science. My main recipe for that is to have lots of projects in the pipeline. When something needs slowing down, you can put it on the back burner and spend time on something else. That way you’re still productive. You’re actually more productive, because while you’re working on that simpler paper, your subconscious mind is churning away on the complicated slow one too.

What is your experience? Do you have a slow science story? Do you feel it took your work from average to great? Is there still room for slow science in this rush-rush world? Or is this just a cop-out from publishing?


*I’m talking about the PhD schedule here. Obviously a Masters runs on a different schedule, but the same general principle applies.

Science is hard: culturing problems edition

Science is hard. That’s not exactly a newsflash to any of the readers of this blog, but it’s a point that Science has been reminding me of recently. This has been reminding me of an earlier Science Is Hard episode that my lab went through. I’ve been reminding myself that we got through that one (and that one was definitely worse), and we’ll get through this one, too.

Both of these Science Is Hard episodes have involved culturing problems, and, in both cases, I feel like we did really good science to figure out the cause of the problem. But it’s the sort of science that goes completely unreported. In many ways, it fits as the story behind the paper – really, for a whole number of papers, because none of them would have existed if we hadn’t figured out the problem.

The first major culture problem occurred when I was at Georgia Tech, after I’d been there for about a year. For the first semester or so, I was mainly ordering things and setting up the lab. And then, in my second semester, we started doing research. There were definitely stumbling blocks (it took us a long time to get our algae chemostats really going, for example), but things were moving along.

Until they weren’t. At some point, we started having lots of problems with animals dying. It was so frustrating, because we had felt like we were about to be going full steam, only to find ourselves unable to do any experiments. So, of course, we started trying to figure out why.

My grad student Rachel was in her first year in the lab, and had recently started doing some experiments. She was really worried that maybe she was doing something wrong in the lab that was causing the problems. And, frankly, the timing was a little suspicious, as the problems had started right around when she really started to do work in the lab. So, she and I set up an experiment side-by-side, doing everything at the same time. We both had tons of animals die. That ruled out the Rachel hypothesis (which was a relief to Rachel and to me!), but didn’t get us much further toward figuring out what was going on.

Rachel ended up having the key insight that got us moving on the right track: she was the one who first noticed that the deaths were a beaker-level phenomenon. Either all of the animals died in a beaker or none died. Based on that observation, we put some beakers in the acid bath, rinsed them well with DI water, and then set up animals in them. None died. Breakthrough!

So, it was something on the beakers. But what? And when was it getting on there? To get at that, we first did an experiment where we acid washed a bunch of beakers, and then rinsed half with DI water and put the other half through our normal dishwashing process (which involves scrubbing with soapy water, then rinsing with tap water, then putting them in the dishwasher for a tap rinse followed by DI rinses. We are serious about getting our beakers clean.) The animals in the beakers that had only been rinsed with DI water all lived. The ones that had gone through the regular dishwashing process died. More progress!

So, then we needed to figure out which part of the dishwashing process was the problem. We had the people who ran our water system come and put in a way for us to draw water directly from the DI tanks that fed the dishwasher, between the tanks and the dishwasher. We then used the DI water from that new feed to rinse the dishes (after washing them with soapy water and rinsing with tap water), and compared those to beakers that were rinsed on a DI cycle in the dishwasher. Again, the animals in beakers we’d rinsed by hand did great; the ones in beakers that had gone in the dishwasher died.

By this point we’d been troubleshooting for months, but we had at least made lots of progress. Then Al Dove from the Georgia Aquarium heard about our problems and very kindly offered to run some water samples for us. We put a beaker in the dishwasher upright to collect water and sent it over. The copper in the water was 74 µg/L. As my colleague Terry Snell pointed out, the copper LC50 for the rotifer Brachionus is 30 µg/L.

At that point, I was so ready to buy a new dishwasher and put the problem behind us! I had been using a dishwasher purchased at Lowe’s, because that’s what the lab I’d been in as a grad student had done, and there weren’t any problems. But I decided that, given that the problem was with the dishwasher, I needed to get a fancy lab-grade dishwasher. So, I did. When the new dishwasher came and they took the old one out to replace it, they found that the DI line had been attached with a copper fitting. This is a huge no-no, since DI water is very pure and leaches the copper from the pipe into the water. But, it explained why we had so much copper in the water!

At that point, our problem was identified, but not solved. We have a LOT of glassware in my lab (thousands of beakers), and we had no way of knowing which had gone through the dishwasher while the copper contamination was occurring. So, we concluded that we had to acid wash every piece of glassware in the lab. That took weeks of work, mostly done by my excellent technician, Jessie.

In the end, we lost about a semester of work due to the copper problem. We haven’t figured out the source of our current problem yet. One thing that I find interesting is that, when I run into a problem like this, the first person I contact is my PhD advisor, Alan Tessier. I’d like to think I’m a grown-up scientist now, but I still really value advice from Alan!

For now, we’re going through all the trouble-shooting. Acid washing beakers didn’t help, nor did using brand new beakers. So, it doesn’t seem to be a glassware problem. Now we’re on to testing whether it’s an issue with the water. One possibility is that something about the water has changed as we stored it over the winter. (We culture our Daphnia in filtered lake water.) Perhaps some compound the Daphnia really like has broken down over the winter. So, we’ll go out and get new water and see if that solves the problem. I was dragging my heels on going out and breaking through the ice, since it seems like a major pain, but my lab is really excited about our upcoming winter limnology expedition. And we’re all really excited about the prospect of getting this problem solved soon!

We’ll get through it. But, boy, science is hard.


Related posts:
1. System envy and experiment failures
2. Tractable != easy