When I first thought about switching to R and doing reproducible data analysis, the idea was daunting. As a grad student, I couldn’t even figure out how to get my data into R. How would I figure out that, plus mixed model analyses, plus how to make figures in ggplot, with version control and a beautiful GitHub repo for all of my work?! What I eventually accepted is: it’s okay to start small. Or, as a colleague of mine suggests: for any given project, aim to do one thing in R that you couldn’t do before.
I’m not sure why I set the bar so high for initially learning R. When I was first learning how to knit (actually knit, with yarn and needles, not the R version of knit), I knit a square washcloth, not a sweater. So when learning R, why was I expecting I’d be able to start out with the coding version of knitting a sweater with multiple colors, a fancy pattern, and buttons?
The “do one thing you couldn’t do before” advice was really freeing for me. If I was working on an analysis and realized life would be easier if I renamed a data column, I wasn’t cheating by doing that in Excel and then reimporting the data. I was trying to keep projects moving along while learning new skills, but I didn’t need to learn them all at once. And, while my goal is ultimately fully reproducible data analysis, I still sometimes have projects where I end up taking the data frame from R, saving it as a CSV file, doing some manipulation in Excel that I haven’t figured out yet in R, and then going back to R.* I don’t do that nearly as often as I used to, but I still do sometimes, and I’ve come to accept that that’s okay. I don’t have to figure it all out today.
So, to people learning R (or some other new technical skill): it’s okay to start small. Consider this your official permission to aim for improvement, not perfection.
*Interestingly, in one case this saved me from making an error that I’m not sure I would have noticed. At first, I didn’t realize that if you convert a factor to a numeric variable with just as.numeric, it gives you the internal integer codes of the factor levels, not the values you see printed, which is not what you want! Instead, you need to do as.numeric(as.character(factor)). I discovered this while working on an analysis with an undergrad student. She was trying to do a lot in R but did this particular step in Excel, whereas I did it in R. We got different results and, in troubleshooting, I discovered this problem. So, my undergrad doing something in Excel led to me learning an important R lesson! When I realized the problem, I panicked that it might have messed up an earlier analysis that I’d already published. But I checked and realized I hadn’t done that step in R for that analysis; I had done it in Excel instead. So, in that case, my kludgy approach to data analysis ended up saving me.
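For anyone who wants to see the factor pitfall in action, here is a minimal sketch. The vector x and its values are made up for illustration and are not from the analysis described above; the last conversion is an equivalent alternative mentioned in R’s own help page for factor.

```r
# Hypothetical example: numbers that ended up stored as a factor
# (for instance, after being read in as text)
x <- factor(c("10", "20", "20", "5"))

# as.numeric() on a factor returns the internal integer codes of the levels.
# The levels are sorted as text: "10", "20", "5"
codes <- as.numeric(x)
print(codes)   # 1 2 2 3

# Converting to character first recovers the values you actually see
values <- as.numeric(as.character(x))
print(values)  # 10 20 20 5

# Equivalent alternative: convert the levels, then index by the factor
values_alt <- as.numeric(levels(x))[x]
print(values_alt)  # 10 20 20 5
```

Note that the wrong version runs without any warning, which is why it is easy to miss unless you compare the result against the original column.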