Why am I a scientist again? – The concept of a data present

Posted on March 30, 2016 by Brian McGill

(This is a guest post from Isla Myers-Smith, early-ish career academic at the University of Edinburgh, with a conversation at the end with Gergana Daskalova, an undergraduate in her lab)

Sometimes I like to worry about why I have chosen a scientific career path and the meaning of life and big esoteric questions that really have no particular answer. I have wondered many times why I push myself so hard to succeed in science. I know the pipeline is leaky for early career scientists and many choose to leave the Ivory Tower to make different contributions with their careers, but at least for now, I have stuck with the halls of academia and here is why.

Way back when I was a first year MSc student at the University of Alaska Fairbanks, I took a course in stable isotope biogeochemistry taught by one Matthew Wooller. In that course, each of us had to do a small research project to answer a question with a few samples that could be run at the Alaska Stable Isotope Facility. My project involved looking at deuterium and oxygen isotopes in Sphagnum leaves from a peat core at my field research site to test for changes in soil moisture/water table depth over time – if I remember correctly that is. We prepared our samples and then a few weeks later the data came back from the lab.

Surprisingly, Mat handed out our printed results wrapped up in wrapping paper with ribbon – our “data present”! And then he told us (I am paraphrasing from memory): “If, when you are unwrapping this data present you feel the same excitement that you felt as a child at your birthday party unwrapping presents, then you know that science is the career for you. If on the other hand, you feel no real excitement at revealing these data, then perhaps this isn’t your best career option.”

Mat had introduced us all to the “data present”, a concept that sticks with me to this day. A data present is the moment when a dataset is analyzed for the first time and the result is revealed. It could be that moment when you press run on your R code once all the bugs are fixed and you finally get that model to work or that figure to print. It could be when your student presents an amazing analysis at lab meeting and reveals their preliminary findings on a manuscript they are working on.

My students and collaborators know that I just can’t resist a “data present”! I will stay up late at night to reveal one or, if I don’t bother staying up to finish it, I will lose sleep thinking about what an analysis might show. I even have to hold myself back from revealing other people’s data presents for them, when collaborating on analyses, as some of my colleagues will no doubt be aware. I am a total “data present” geek. I love them!

And it is this passion for “data presents” that is very reaffirming for me. Every time I feel that excitement when I get a data present – when I see a question answered for the very first time – I know that science is the right career for me. If I am ever feeling overwhelmed by the stresses of academia, I try to devote some time to getting back to analyzing data. And then I usually quickly regain my love for the at-times-all-consuming job of being an ecologist.

Like Mat, I have been sharing the excitement of data presents with the early career researchers that I mentor and teach. I try to emphasize that it is really important to take a pause sometimes, and ponder why we do the science that we do. For me, the concept of a data present is an incredibly powerful way to remind myself of my own personal scientific motivation and to answer that recurring question: why am I a scientist again?

Response from 4th-year undergraduate student, Gergana Daskalova, University of Edinburgh:

The data present idea is similar to ‘data kittens’. FemaleScienceProfessor discusses the disappointment after getting a kitten (result) that is just not as cute as she hoped it would be. I am not sure how to feel about being disappointed in results – they are what they are, not ‘good’ or ‘bad’. Obviously some results would make us happier than others, some results have wider implications, etc. As a conservation scientist in training, I would be sad to find out that the recovery plan for my favorite birds is not working. I hope that when that time comes, I will be able to put my emotions aside, ask ‘why?’ and carry on working.

If we continue with the data present analogy, another question to ask ourselves would be with what, if any, expectations do we unwrap a data present. As scientists, we are (or aim to be) unbiased, but then we are also people, with personal interests and passions. So are we the kid who didn’t make a wish list and is just happy to get a present, or the kid who knew exactly what they wanted, and upon unwrapping a present that wasn’t on the wish list, quickly moved on to unwrapping the next? Wish lists seem like a rather dangerous thing to make in science, as I can see how there could be a thin line between really wanting a present and really wanting a certain kind of present outcome.

I haven’t had many presents to unwrap thus far in my career, but some, especially the most recent ones, have indeed felt like my birthday – but if a birthday present doesn’t live up to its expectations, how do you deal with that disappointment? I would imagine that your attitude towards ‘the day after your birthday’, as well as towards ‘your birthday itself’, has an impact on what kind of a scientist you are, and whether being one is really the right thing for you.

Reply from Isla:

For me the excitement is there regardless of the result, and the only disappointment is if the answer to my research question is still not clear. Data presents are about learning something new that potentially no one else in the world has ever known, not so much whether the answer meets my preconceived expectations, or at least I hope that is the case. I was chatting with one of my PhD students this week about the key parts of a manuscript or proposal, and one of these is the place that sets out the anticipation for the soon-to-be-revealed data present: “If we find this… this will mean this, if we find that… this will mean that”. It wasn’t until I was postdoc-ing with Mark Vellend four years ago that I finally understood how setting up your results in your overall pitch can allow others to share in your “data present” enthusiasm, whether your findings do or do not support your hypothesis.

What do you think of the concept of a data present? Is the first time you analyze data like opening a present for you? Do you feel this excitement is part of the reason why you became a scientist/or why you have stuck with science as a career? What was the best data present that you ever opened?

25 thoughts on “Why am I a scientist again? – The concept of a data present”

Jeremy Fox on March 30, 2016 at 8:54 am said:

This is a big reason I like working in microcosms. The data presents come often, because the experiments are short (a few months or even less). And we enter and plot the data as we go, so we don’t have to wait until Christmas to open our presents, as it were. Entering and plotting the data as we go is actually kind of like ripping the wrapping paper off a present slowly. You tear off a bit, and so can start to guess what the present will be. Then you tear off some more, hoping the present is what you think it is. 🙂

- Mark Vellend on March 30, 2016 at 3:46 pm said:
  
  Interesting. As much as it requires will power to do so, I actually try not to unwrap data presents until (almost) all the data are in, just so I don’t get too excited about this or that outcome that then turns out not to happen. When that awesome Lego set turns out to be school supplies in an old Lego box, it’s a serious bummer! More seriously, it counters any unconscious and subtle bias with respect to adjusting criteria for inclusion/exclusion of subsequent data/trials based on things that “don’t work”, for example.
  
  - Jeremy Fox on March 30, 2016 at 5:10 pm said:
    
    Good point. For us, plotting data as we go doesn’t dictate our choice of which data to include or what analyses to do or whatever. As far as possible, those choices aren’t data-dependent for us.
    
    Yes, I have had cases where it looked like an experiment was going to turn out one way, but in the end it turned out some other way. That’s life.
    
    One benefit of plotting data as we go is that it lets us decide when to cut our losses on an experiment that’s totally not working. For instance, if it’s an experiment about effects of species diversity on some ecosystem process, and we lose most of the species within a couple of weeks, the experiment is a lost cause. We’re obviously not going to be able to get any answer to the question of interest (neither the answer we “wanted” nor any other answer), so we cut our losses and stop the experiment.
Lee J Rickard on March 30, 2016 at 8:59 am said:

I think that noticing a sense of disappointment when you open your data present can be really important. If you start to think about why you feel that way… what was it that you were expecting and didn’t get?… why do you expect that?… more important, why is your present different from what you expected?… how much more interesting the world can be than what we expect from it!

- Jeremy Fox on March 30, 2016 at 9:22 am said:
  
  I’d differentiate a couple of scenarios: unexpected results that are still interesting, and uninteresting results. For instance, my lab just did a microcosm experiment to test a hypothesis about the spatial synchrony of predator-prey cycles. But the predators and prey in question didn’t cycle–indeed, the predators never really established populations at all. So an unexpected result, but unfortunately not interesting–we can’t even use it to address a question other than the one we originally set out to address.
  
  Sometimes you open your data present, and it’s the scientific equivalent of a pair of socks or a book you already own.
  
Jeremy Fox on March 30, 2016 at 9:05 am said:

The best data present I know of was received by my labmate Christina Kaunzinger and supervisor Peter Morin back in grad school. Christina had just finished the experiment that became Kaunzinger and Morin 1998 Nature. As I recall, she stayed late in the lab processing and plotting her data, printed out the key figure (a *perfect* match to the theoretical prediction) and taped it to Peter’s door with “Beautiful, isn’t it?” written on it. Peter found it the next morning when he came in to the office.

Meghan Duffy on March 30, 2016 at 12:26 pm said:

I often think of that FSP post on data kittens! I love them. The one that stands out most to me is when we spent years working on a study that was a follow-up to some studies I’d done as a grad student. It was a huge amount of field work, followed by a huge amount of lab work. And, in the end, it all could be plotted on one very simple figure which would show if we’d been correct. When I first plotted the figure and saw the relationship, I was so excited that I printed it off and ran down the hall to show it to a colleague. He had no idea what I was talking about, but shared in my excitement anyway. 🙂

Carlos Aguilar-Trigueros on April 4, 2016 at 12:14 pm said:

Hello, my name is Carlos. I have been reading these posts with great interest for a long time, and this time I could not resist making a comment!

I also spend several hours/days wondering why I decided to become a scientist/ecologist (I am new here, and assuming someone is reading: I’m an early stage post-doc doing community ecology with soil fungi).

The point is that I did my undergraduate in El Salvador, a country with virtually no contribution to science. So for me, and many of my fellows in the University, “being a scientist” was a very surreal/abstract career path. For a long time, the reason I wanted to be a scientist was because I wanted to “study” interactions among living things and their environment. The realization that data analysis was a big component of the whole process came way, way later in my career.

During an internship in Panama, I had a better glimpse at how science is done. Two aspects fascinated me completely during that stay: a) that something as messy as tropical forests is predictable (or so I was led to believe…) b) that a lot of those patterns in species distributions/abundance were due to interaction of soil fungi (just some explanation, I was working with people studying soil and plant symbiotic fungi). It was not until my first year of PhD that I stumbled on data analysis.

Maybe because of that context, for me the main reason for staying in science is the process of hypothesis formulation. Experimental/Sampling design and data analysis are “just” the process that follows this first part. I know, without data analysis one could not test/refine hypotheses. Maybe I’m just building a circular argument here, but I would have never said that the main reason why I chose/stayed in science is “excitement to analyze data”.

CA

- Brian McGill on April 4, 2016 at 3:18 pm said:
  
  Thanks for commenting! A very interesting point. I would have to think long and hard if I’m a scientist more because of forming hypotheses or because of analyzing data/testing it.
  
  - imyerssmith on April 8, 2016 at 4:54 am said:
    
    For me, both data presents and hypothesis development are important and major motivators. When I am asked why I am a scientist, which happens from time to time, particularly during science outreach or the odd media interview, I try to be really honest. One reason is the adventure – I work mostly in the Arctic and I love it up there! And the second is that we ecologists are like detectives, which is awesome!
    
    Hypotheses – Part of the process is coming up with the questions and hypotheses – I love that part too!
    
    Design – Then there is designing the experiment/data collection – sometimes more painful, but it feels great when you have your experiment all set up or your last data point collected! Though, I can’t wait for time machines, so that we can one day get to the final data collection right away without that long painful waiting period – which is particularly long in tundra ecosystems!
    
    Analyses – There is the designing of the analyses – a process that usually goes on throughout the entire period, but should really occur at the very start – and something that I am more and more stoked about as I progress through my career.
    
    Data presents! – Then there is the data present opening – that point at which you get the answer, which is still the pinnacle of the ecological experience for me – but is definitely just one step in the process.
    
    And back to the top – And then finally there is the point where you reassess how that finding changes your understanding of this system and what the next most important step might be – which often happens for me in the process of writing manuscripts or proposals or putting together talks – which is a part of the process that I am only relatively recently realizing is really fun too!
    
    So I guess this is just the hypothetico-deductive scientific process, in which I am a big believer! But it is more fun to call it ecological detective work. Why does an ecosystem change the way it does? What will this landscape look like in the future? Which part of an ecological community is responding the fastest to climate warming? Those are the types of questions I get stoked about trying to answer day to day.
imyerssmith on April 8, 2016 at 4:37 am said:

Further reflections on data presents…

Thanks for the comments everyone! I love the concept of data presents that are “the scientific equivalent of a pair of socks or a book you already own”; I guess we all open those from time to time! But it is still a data present as you have the answer to the question you asked, even if it wasn’t as exciting as you hoped – and sometimes those socks are merino wool socks, which are very useful! And I have definitely had the experience of the “awesome Lego set turns out to be school supplies in an old Lego box” data present. Though I guess we should really be most circumspect about the data presents that seem to be awesome Lego sets, as the unexpected result is also the most ecologically improbable. It is a real challenge to incorporate all the appropriate uncertainties into your analyses to test your scientific questions with some level of confidence.

As I work with larger and larger datasets, the process of “unwrapping” a data present becomes more and more of a process. The week that the blog was posted, I was working with my postdoc Anne to unwrap our latest data presents from a biome-wide analysis of spatial and temporal patterns of tundra plant traits. Each trait’s pattern is produced from a series of Bayesian models that take around 24 hours to run, and that is after as much optimization on our parts as we could muster. Once we have got the model structure figured out and the models converge, then we can plot the predictions, but it still takes me hours to wrap my brain around what those predictions might mean. Does no change mean nothing has changed at the community level? Or does it mean that changes for some species are offsetting changes for other species? Then we found ourselves going back to the raw data and plotting away until we could convince ourselves of the results that the hierarchical models show – and this process has been taking days if not weeks, even though I am super anxious to know and understand what the final result might be. Not exactly like the seconds it takes to rip the paper off of a birthday present.

Gergana and I were chatting at lunch last week about how sometimes you can go back much later to data presents that you had previously unwrapped, and unwrap them again with a different perspective or different analytical technique that you have gained in the intervening period. I have recently been working on a manuscript from my PhD (I know, shouldn’t I have published that years ago!!!) and in this case, I am glad that I have had the chance to form a somewhat new perspective on those data. One day, I hope that papers in the literature won’t be so static and that we will be able to go back and add to datasets, reanalyze data and allow our understanding of the science to shift over time. Perhaps the future of data presents is that they will keep on giving and giving in different ways across our careers as datasets grow and science advances.

- imyerssmith on April 8, 2016 at 4:58 am said:
  
  Here is an excellent post on unexpected results, false positives and why the design of scientific analyses should occur at the beginning of the scientific process in ecology and evolution by Jarrod Hadfield:
  https://methodsblog.wordpress.com/2015/11/26/madness-in-our-methods/
  