As I wrote about yesterday, I have slowly shifted from using Systat and SAS to using R. I now do all of my analyses and make my figures in R, but still regularly bump up against things I don’t know how to do. These things generally fall into one of three categories:
- manipulating a dataframe,
- trying to figure out how to do an analysis that I haven’t done in R before,
- trying to make pretty figures.
This has me wondering how to best learn new skills in R. I know I am not alone in trying to figure this out! So, please let us know in the comments what approaches have worked for you and/or people you know!
As my lab was initially shifting to R, we had a series of stats boot camps at lab meeting, where we learned how to import data to R and some of the basics of working with data in R. We then also had different lab members teach everyone else how to do some analyses in R that we were all likely to need at some point (e.g., survival analysis). That worked really well at first, but now we’ve run into the problem of having had some turnover in the lab. As new people join the lab, how do we get them up to speed? And what about things that not everyone needs to know how to do?
As I’ve learned R, my general approach to trying to learn new things has been (roughly in order):
- Look back through old code if I think I maybe have done something similar before,
- Search on something like Cookbook for R (especially if my question relates to graphics),
- Look in Crawley or Zuur,
- Wish I could just download all the R- and stats knowledge from Ben Bolker’s brain into mine,
- Consult Dr. Google, which often leads to Stack Overflow,
- If still stuck, ask on twitter (usually remembering to add the #rstats hashtag),
- Email someone who might be able to help me. (I try hard not to do this last step, though, because I don’t want to bother other folks.)
Based on this tweet from Hadley Wickham:
I am definitely doing it right when learning new things in R!
I’ve also been trying to keep this in mind:
As came up in the comments on yesterday’s post, yes, sometimes it’s a battle to figure out how to make a figure in R, but that knowledge is useful in the future.
Usually, I can figure out what I need, but it sometimes takes a really long time. Sometimes I give up and resort to a less elegant approach. With dataframe manipulation, that less elegant approach is usually brute forcing things. For example, I recently wanted to assign a unique ID to each infection-lake-year category, so that I could make one big box plot containing data from all of them. I couldn’t figure out how to do this and it was nearing the end of the day, so I just manually went in and told R that rows 1:20 should be “A”, 21:39 should be “B”, etc. It worked, but it means that if something about the data changes, I will need to remember to change the row indexing. And it means I can’t easily use that code again for a similar purpose in the future. For figures, the brute force approach for me generally involves moving things into Powerpoint and rearranging figure panels or centering labels there. I will come back to the specific topic of figures in a post next week, but my ideal would be to not need to move to another program at all. I’m getting closer to that, but I’m not all the way there. (Comments on yesterday’s post suggest maybe not all advanced R users view this as something to aim for.)
As I’ve thought about how to learn these techniques, I’ve wondered how others learn how to program, especially in R. And, more specifically, I wonder what I could be doing differently to pick up R faster.
One idea I’ve considered would be to have an R lounge – a reserved room where people can come and work on analyses, with the idea that they could interrupt others or get interrupted by others to ask about a problem they’re running into. But I don’t think this would be that useful. It would only work if some people who know a lot in R were generous with their time and came and worked there. And, when I am trying to figure something out, I want to know the answer approximately 10 minutes ago, so waiting until others come by would drive me up the wall.
Another option would be to try posting to Stack Overflow. I certainly often find helpful suggestions by looking through posts there. But I feel like there’s a culture to it that I haven’t learned, and that makes me hesitant to wade in. (For example, sometimes the reply to a post is a curt indication that the question has already been asked and answered elsewhere, or an admonishment for selecting an answer too quickly.) Plus, something about posting there feels a little too public to me (which, yes, might seem weird for someone who blogs and tweets to say!). I tend to feel like any specific problem I post would seem so incredibly basic.
In the end, I haven’t come up with a better option than slowly battling through, task-by-task. It still feels incredibly slow sometimes, but maybe that’s just the nature of the beast.
How did you learn R? What would you recommend to people who are complete R novices? (When I mentioned writing this post on twitter, Zhian Kamvar recommended swirlstats, which looks great.) What about to people who’ve mastered the basics but are trying to learn more?
Some general resources I’ve found helpful:
1. RStudio cheatsheets (currently for data wrangling and R markdown)
2. Beautiful plotting in R: a ggplot2 cheatsheet, by Zev Ross
3. Cookbook for R
These are perennial problems in a lot of fields where our first layer of expertise is not in coding or perhaps even statistics. I’ve found that hanging out on Stack Overflow and starting to read and answer other people’s questions that are similar to issues you have dealt with can be very educational. These people often use neat tricks that hadn’t occurred to you (me), and you can often help them in return with your (my) own trial-and-error learning results. Additionally, you often encounter expert solutions that may open up new ways of thinking about things. It also seems to make the knowledge a much more coherent whole than my earlier problem-based R learning did.
It’s definitely an issue to talk about, and I’ll be looking forward to other ideas that may be posted. Keep up writing the blog! 🙂
I like the idea of reading more broadly on Stack Overflow (instead of when I am just desperately searching for a solution!) I should give this a try!
I’ll be rooting for your first badges! 🙂 You can add a few tags to your favourites and start following new posts under them. There’s a fairly constant stream, with an active bunch of responders that even provides a bit of a competitive element. I only became an active user a month ago, and I’m already feeling the difference, with some memorable Aha! moments along the way.
This comment from yesterday fits well with the theme of today’s post: https://dynamicecology.wordpress.com/2015/02/18/the-biggest-benefit-of-my-shift-to-r-reproducibility/comment-page-1/#comment-38809
This is in the context of genomics, not ecology, but our usual way to get new lab members up to speed is to give them a tutorial to work through at their own pace, with the opportunity to ask questions. The tutorial is one we use as part of a week long course in R for computational biology, so it’s written by lab members and updated every year. When I joined I was also asked “reproduce this figure from this paper”, and that was a great way for me to learn.
Datacamp is supposed to be good for R novices.
As for learning new skills after the very beginning stage, mostly I’ve learnt through Bioconductor vignettes and, e.g., dplyr tutorials; both of these are aimed at learning to use specific packages, though, rather than at improving general R skills. Hadley Wickham’s Advanced R book is on my reading list for that!
See also http://www.rseek.org/, which is a direct search for R.
“I recently wanted to assign a unique ID to each infection-lake-year category, so that I could make one big box plot containing data from all of them”. If I understand this correctly, here is some data.table code
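A minimal sketch of the kind of data.table code described here, assuming a data.table `dat` with columns `infection`, `lake` and `year`; the toy values are made up for illustration:

```r
library(data.table)

# Toy data (hypothetical values) with the three grouping columns
dat <- data.table(
  infection = c("yes", "yes", "no", "no"),
  lake      = c("lakeA", "lakeA", "lakeB", "lakeB"),
  year      = c(2013, 2014, 2013, 2014)
)

# := adds the ID column by reference, without copying dat
dat[, ID := paste(infection, lake, year, sep = "_")]
dat
```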
This creates a new column called ‘ID’ in the data.table ‘dat’, containing the values of the columns infection, lake, and year concatenated with an underscore between them. (Hyphens don’t work so well in data.table because it thinks the label is a function.)
`interaction()` was what jumped to my mind as I was reading Meghan’s post.
A data frame version would be:
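A sketch of that base R data frame version, assuming a data frame `dat` with the same three columns (toy values, made up for illustration); `paste()` is vectorised, so one call labels every row:

```r
# Toy data (hypothetical values)
dat <- data.frame(
  infection = c("yes", "yes", "no", "no"),
  lake      = c("lakeA", "lakeA", "lakeB", "lakeB"),
  year      = c(2013, 2014, 2013, 2014),
  stringsAsFactors = FALSE
)

# One label per row, built from the row's own values rather than its position
dat$ID <- paste(dat$infection, dat$lake, dat$year, sep = "_")
dat
```

Because the labels come from the columns themselves, adding or reordering rows cannot put them out of step, unlike hard-coded row ranges such as `1:20`.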
or using the `interaction()` function, which was designed for this sort of thing:
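A sketch with `interaction()`, again on made-up toy data. `sep = "_"` mimics the `paste()` labels, and `drop = TRUE` keeps only the combinations that actually occur (the default, `drop = FALSE`, keeps every possible combination):

```r
# Toy data (hypothetical values)
dat <- data.frame(
  infection = c("yes", "yes", "no", "no"),
  lake      = c("lakeA", "lakeA", "lakeB", "lakeB"),
  year      = c(2013, 2014, 2013, 2014)
)

# interaction() returns a factor; with drop = TRUE its levels are the observed combinations
dat$ID <- interaction(dat$infection, dat$lake, dat$year, sep = "_", drop = TRUE)
levels(dat$ID)
```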
It’s a bit trickier, and less useful, to then reduce these to A, B, C, etc., as Meghan originally did.
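One way to do that reduction without hard-coding row ranges, as a sketch: number each group by its first appearance in the data with `match()` and `unique()`, then look up a letter. The `ID` vector here is made-up toy input, and the built-in `LETTERS` vector only covers 26 groups:

```r
# Toy group labels (hypothetical), in the order the rows appear in the data
ID <- c("yes_lakeA_2013", "yes_lakeA_2013", "yes_lakeA_2014",
        "no_lakeB_2013", "no_lakeB_2013")

# match() returns each label's position in unique(ID): 1, 1, 2, 3, 3
group_number <- match(ID, unique(ID))

# LETTERS is "A".."Z", so this only works for up to 26 groups
group_letter <- LETTERS[group_number]
group_letter
```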
To be honest, if the need was just a set of boxplots for the combinations, I’d have used `ggplot()` and not worried about creating a redundant variable. Say something like:
if you wanted the lakes stacked one on top of another with time running along the x axis.
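A sketch of that kind of plot, assuming ggplot2 is installed and using made-up toy data with a hypothetical numeric `response` column. `facet_wrap(~ lake, ncol = 1)` stacks one panel per lake, with year along the x axis:

```r
library(ggplot2)

# Toy data (hypothetical): 5 replicate values per infection-lake-year combination
set.seed(1)
dat <- expand.grid(
  infection = c("yes", "no"),
  lake      = c("lakeA", "lakeB"),
  year      = 2013:2014,
  rep_id    = 1:5
)
dat$response <- rnorm(nrow(dat))

# Boxes grouped by year and filled by infection status, one panel per lake
p <- ggplot(dat, aes(x = factor(year), y = response, fill = infection)) +
  geom_boxplot() +
  facet_wrap(~ lake, ncol = 1) +
  labs(x = "Year", y = "Response")
print(p)
```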
Bam! Visiting R is like visiting Boston. Ask 10 Bostonians how to get to the Symphony Hall and you get 15 answers. Same with R!
@Jeff 🙂 Normally I wouldn’t have bothered (here), but data.table is not something a novice R user should be using (IMHO) at the point that they are just starting to climb the learning curve.
FWIW, you need to be a little cautious with interaction() because it defaults to drop = FALSE, which generates all possible combinations (and hence it’s easy to run out of memory).
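A small illustration of why that default matters, using three hypothetical factors that each declare 20 levels but have only two observed rows:

```r
# Each factor declares 20 levels, but only 2 values are observed
f1 <- factor(c("x1", "x2"), levels = paste0("x", 1:20))
f2 <- factor(c("y1", "y2"), levels = paste0("y", 1:20))
f3 <- factor(c("z1", "z2"), levels = paste0("z", 1:20))

# Default drop = FALSE: every possible combination becomes a level
all_combos <- interaction(f1, f2, f3)
nlevels(all_combos)   # 8000 (20 * 20 * 20)

# drop = TRUE: only combinations present in the data are kept
observed <- interaction(f1, f2, f3, drop = TRUE)
nlevels(observed)     # 2
```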
@Hadley Yeah – I got bitten by that very thing when checking I was writing the code correctly. I forgot to edit in the `drop = TRUE` bit. But thanks for pointing this out!
As I said, I'd prefer to just generate the plot a different way rather than create a variable in my data that provides no useful information beyond what is already there.
Really, what I probably need to do for this particular figure is read more about small multiples. But, for now, what I was trying to do was order them all in a particular way so that I had all of them on one box plot in an order that made sense. The fastest way I could think of doing that was to just assign them letters in the order in which I wanted them to appear. But that’s definitely a kluge, as my father-in-law would put it.
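A sketch of an alternative to the letter codes, assuming base R graphics: give the group label an explicit factor level order, and `boxplot()` draws the boxes in that order. The data and the `box_order` vector are made up for illustration:

```r
# Toy data (hypothetical): a response value and a group label per row
dat <- data.frame(
  ID       = c("no_lakeB_2014", "yes_lakeA_2013", "no_lakeB_2013",
               "yes_lakeA_2014", "yes_lakeA_2013", "no_lakeB_2014"),
  response = c(0.4, 1.2, 0.9, 2.1, 1.5, 0.6),
  stringsAsFactors = FALSE
)

# The desired box order, written once and by name instead of by row number
box_order <- c("yes_lakeA_2013", "yes_lakeA_2014", "no_lakeB_2013", "no_lakeB_2014")
dat$ID <- factor(dat$ID, levels = box_order)

# boxplot() draws one box per factor level, in level order
boxplot(response ~ ID, data = dat)
```

Any label missing from `box_order` becomes `NA`, which doubles as a check that no group was left out.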
I need to get better at concatenating data, so I need to figure out paste and interaction, too. I am definitely succeeding today at getting frustrated and trying to learn things that will help long term. 🙂
I found SO overwhelming at first, and also felt that imposter-syndrome-ness of being public about my ignorance. But then I asked a few questions, and once I learned the style, really got to enjoy the back-and-forth, and got answers to some very difficult questions.
Two resources which I find useful:
http://www.r-bloggers.com/ – this is an aggregator of folk who blog about R (assuming they’ve put some information into the system). Not only is it inspirational (I check it almost daily) and helps you learn new tricks and the hottest new packages out there, but, searching it often produces blog posts that are spot-on for topics you are looking for, like publication quality graphs (I admit, I nearly always use ggplot2 for everything now).
My student Jillian also just recommended Hadley Wickham’s Advanced R to me – http://adv-r.had.co.nz/ for online, http://www.amazon.com/Advanced-Chapman-Hall-Hadley-Wickham/dp/1466586966/ref=sr_1_2?ie=UTF8&qid=1424353056&sr=8-2&keywords=advanced+R for book form. I’m so far finding that it has a lot of things I’ve figured out (over many many years, so, it will bootstrap you up quite fast) and many new tricks that have already sped things up for me.
I also sometimes browse http://rpubs.com/ for when folk publish interesting methods, and in particular http://rpubs.com/bbolker/ (seriously, his piece on making your mixed models converge? It saved my bacon a month ago).
And along those lines, this isn’t a journal we often read, but like Methods in Ecology & Evolution, it’s one of the most useful ones out there – the Journal of Statistical Software – http://www.jstatsoft.org/ – it’s where a lot of folk publish papers about their R packages, so, you’ll find the original plyr, ggplot, and other papers there. Reading these sorts of papers can often reveal many tips and tricks to highly useful packages you may have otherwise missed. Heck, I just found one I need to read right now! (About the spTimer package http://www.jstatsoft.org/v63/i15/paper)
We had a 1 credit (or was it 2?) “R Seminar” graduate course that was helpful. Each week, the professors would provide a step-by-step exercise to introduce the day’s functions/concepts and then we’d all work on some open-ended coding challenge(s) related to the day’s topic. Students had all levels of prior R experience, and I think everyone found it beneficial. If you tried all of the googling and digging in R help files that you could stand and were still stuck, you could just ask your neighbor for help. And if you were already proficient in the day’s topic, you could just dedicate the time to your own analyses.
I think that idea would work well as an informal R group, rather than a class, where everyone gets together at a certain time each week to code. It’s like an R Lounge, but no waiting around for someone to turn up.
I’m certainly no ninjaR, but have muddled my way to a reasonable amount of self-sufficiency with R through a combination of the following (in no order):
1. dedicating time to working through relevant chapters of Crawley,
2. completing short courses (e.g. there’s a Data Science (?) course on Coursera, run by Johns Hopkins University)
3. Dr Google is an absolute life-saver – often pointing me to StackOverflow or Quick-R
[I’ve never posted on S.O. for the exact same reasons as Meghan: it’s a jungle out there…]
4. Going through the daily summary emails from R-bloggers often gives some interesting reads/alternative approaches to existing problems (similar to Peeter’s notion of reading broadly on Stack Overflow)
5. Dedicating time specifically to learning a tool/R component. I’m currently trying to learn ggplot2, something I’ve avoided til now…
Also, with all the discussion of Stack Overflow, I’d be remiss not to mention Cross Validated (http://stats.stackexchange.com/), an SO for statistics (and a LOT of R). It’s a bit gentler, and I’ve learned a ton from that site.
And lest SO and CV seem intimidating, remember, sometimes they’re great places for accidental wacky R things – http://stackoverflow.com/questions/12675147/how-can-we-make-xkcd-style-graphs-in-r
This alone is a solid reason to learn R 🙂
Yes, this is fantastic!
I’m not sure whether this means the internet has too much free time on its hands or not enough. 🙂
As a reasonably prolific contributor to the r tag on Stack Overflow, I’d like to push back a bit on the notion that, as @chrisceaser put it, “it’s a jungle out there”, or the impression some might get from Meghan’s post that curtness or admonishment is par for the course.
In my experience (and I’ve contributed to SO for several years), the r tag is a reasonably friendly place, and you’ll find no better source of expert R knowledge short of kidnapping some or all of the members of R Core and locking them up in your lab.
You’ll only get a curt response or admonishment if you blatantly haven’t done your homework about the question or the SO site. By asking a question you are implicitly asking someone with the knowledge to contribute their time to help you out. This behoves those asking questions to put in some effort themselves. Search SO or use Google to see if your question has been asked before. Include a reproducible example so we can recreate the problem, or include a small example of the expected output so we can see what you want to achieve, not just what you wrote down in words (which can be ambiguous, especially when English is not the first language of everyone contributing to SO). Read the R FAQ or scan the Frequent tab for the r tag: http://stackoverflow.com/questions/tagged/r?sort=frequent&pageSize=50 which contains the most popular Q&As (in terms of being linked to). Read the help page for the function you are stuck on. Don’t ask stats questions on SO; use the CrossValidated sister site for that.
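As a sketch of one common way to include a reproducible example: base R’s `dput()` prints code that recreates an object, so a small slice of the data can be pasted straight into a question. The data frame is made-up toy data, and `my_data` in the last comment is a hypothetical name:

```r
# Toy data (hypothetical), trimmed to just what shows the problem
small <- data.frame(
  lake = c("lakeA", "lakeB", "lakeA"),
  year = c(2013, 2013, 2014),
  stringsAsFactors = FALSE
)

# Prints R code that rebuilds `small`; paste this output into the question
dput(small)

# For a large data set, share only a few rows, e.g. dput(head(my_data))
```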
If, as an Asker, you do that, you’ll have no trouble on SO. If you ask a question that’s been asked and answered countless times before, then expect a curt response; people need to clean up your mess! We don’t want SO to be some uncurated, incoherent pile of rubbish on the internet, and hence users that have accrued sufficient reputation are expected to help maintain the site.
This is not to excuse outright rudeness, which has no place on SO and should be flagged as such to be dealt with by the Mods.
I suspect I’m (perhaps overly) sensitive to the potential for running afoul of cultural norms based on an experience with wikipedia last year. I originally had that in the post, but then took it out because I was worried it was getting overly rambly. I certainly really value the information on SO! I probably need to spend a bit more time lurking there before feeling like wading in.
I’m a little confused about the Frequent tab, though. Are those frequently asked questions? If so, how do they get tagged that way?
If you’ve put in effort and demonstrate that (reproducible example, example of what you want to achieve, answer isn’t the first google hit for reasonable search terms) you’ll be fine. If you don’t see the obscure duplicate on SO and someone closes your Q don’t worry about it. See if the duplicate really answers your question and if not, edit your own Q to reflect the differences as you perceive them.
I think this perception arises because some people do get given a hard time on SO. They’re most often repeat offenders, users that don’t follow up on comments asking for clarification, etc. The regular users spot these people, and their continued abuse of the SO site and of people’s time often results in snarky comments when handling their mess. New users very rarely get treated the same way, because for the most part we’re genuinely nice folk over on the SO r tag 🙂
The Frequent tab is generated by number of links to a question. So the How to create a great reproducible example question is at the top because we link to it all the time when people don’t provide an example that illustrates the problem. I don’t think you can tag a question such that it appears in that list.
We do curate an [r-faq] tag: http://stackoverflow.com/questions/tagged/r-faq which the community of R users on SO maintains as a list of what we see as being the most FAQs.
Thanks for the explanation! I can see how there could be additional backstory to some of the curt replies (especially in cases where people are abusing the site).
I’ll echo @jebyrnes’ student’s recommendation of Hadley Wickham’s Advanced R book. And the online version is great, but I really recommend the hard copy. I just finished reading it cover to cover, and it is a surprisingly easy read for a programming book. It builds on itself very well, and was full of “ah-ha” moments for me where I found many opportunities to improve my coding practices.
We’ve been doing similar R workshops in my lab. Although we’re only a few weeks in, I’ve been trying to keep an annotated R script for each session, so that when new students come along in the future, they should be able to use those as tutorials to at least get the basics down for common analyses we do in the lab.
A few thoughts from someone whose work relies heavily on programming, in R and other languages:
1) Google, google, google. A friend of mine who is a pro software engineer (Microsoft, a startup, Google, and now Facebook) once recorded how many times he googled for help in a day, and came up with something like 150 times. My advisor wonders how he ever learned to code without SO. There’s no shame!
2) If something seems impossible, or you can’t figure it out, take a step back and see if you can re-phrase your question or re-conceive your problem. You are probably not the first to attempt this data analysis, and so the trick–which gets easier with practice–is to figure out the terms other people are using for your problem.
3) The plyr, reshape, and ggplot packages are really worth learning. I’m convinced they have saved me thousands of lines of code over the past few years, not to mention hours of effort and bugs. They also allow/force you to think about your analysis at a much higher level than you might otherwise.
4) I get the sense a lot of ecologists learn R at first as a set of routines and only later as a programming language. “Learning R” is one skill, and “learning to program” is a related, but more general skill. Just a little bit of general computer science knowledge can make things a lot clearer. Some good things to have a cursory knowledge of: data types (e.g. integers, floats, characters), data structures (e.g. vectors, arrays, lists), classes/methods, and functions.
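A few lines of R that touch on those basics, as a sketch; `typeof()` and `class()` are base R, and the values are arbitrary:

```r
typeof(3L)          # "integer"
typeof(3.5)         # "double" (R's floating-point type)
typeof("three")     # "character"

# Atomic vectors hold one type, so mixing types coerces everything
typeof(c(1, "a"))   # "character"

# Lists can hold elements of different types
x <- list(3L, "three", TRUE)
sapply(x, typeof)   # "integer" "character" "logical"

# A data frame is a list of equal-length columns with a class attribute
df <- data.frame(a = 1:2, b = c("p", "q"))
is.list(df)         # TRUE
class(df)           # "data.frame"
```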
5) Learn another language–really! Learning a second programming language is way easier than learning the first, and it will make you a better programmer. Even if you just take a couple of hours to work through an online tutorial and never use it again. I’d suggest Python for anyone, Julia for the adventurous, and Matlab for those interested in paying through the nose for an obsolete language (Ha ha, only serious!).
So many excellent points here. In previous discussions here and elsewhere, I usually see the problem of teaching statistics with R as a trade-off between teaching statistics and teaching R. I actually think it’s a three-way competition: 1) teaching statistics, 2) teaching algorithmic thinking (necessary for programming), and 3) teaching the specifics of R scripting.
And then, as Brian and I discussed on one of his earlier posts, there’s also a need to train in experimental design (or study design, more broadly), which is sometimes rolled in with a course on statistics and sometimes not. My impression is that those experimental design courses have fallen out of favor and been replaced by programming based courses, but I don’t have data on that.
that is a really good point
For several years in grad school, my husband ran a weekly “R group.” He started it because, as our department’s R expert, he’d regularly get emails asking for his help with the same types of problems, over and over again. He realized he’d spend less time repeating himself if he spent an hour a week leading an R help group, and forced people who wanted his help to come to that first. People trying to learn R or using R regularly came. We’d ask questions, share newfound R knowledge, and occasionally tackle research problems together to learn new skills. We started by going around the room and sharing what we each had done in R that week – often we’d all learn something new and useful during this time. This sort of thing requires someone willing to lead it, but that person need not be a Bolker-level R genius. Our group became a place for users of R to swap tips, advice, and help. You quickly learn that there’s rarely one right way to do anything when programming, and so we all learned new functions and ways to approach the language from each other. I can’t recommend this sort of weekly help session enough. (Also, it was as a result of his R group that my husband and I first started dating. R can be romantic!)
I used to go to one of these too, and I’ll second this. It was really, really helpful…even if I didn’t find a spouse there…
It’s great to hear that the lounge approach can work (based on Noam’s comment below, too)!
+1 to the “lounge” idea, and @dinoverm’s suggestion of fixed meeting times. This is the model we adopted at Davis with our users’ group. We also supplement the meeting times with a local listserv run through Google Groups, which works like R-help or Stack Overflow, but since it’s smaller it doesn’t need such stringent standards for posting and can be more welcoming/accessible to new users.
I wrote up a bit about the users’ group model here: http://software-carpentry.org/blog/2014/11/users-groups-for-ongoing-learning.html
+1 for tutorials and one-credit graduate courses specifically on programming!
I think it is so, so important to start from the start. It’s tempting to jump right into manipulating your own data, doing analyses, or making publication-quality plots, but each of these is a far more complicated task than they seem on the surface. I think moving too fast toward an advanced application makes it much harder to approach future coding problems more generally, and it makes debugging a nightmare.
Much better, in my opinion, to first spend time learning about common data structures and how basic functions work (mantra: what arguments does it take in?, what does it do with those arguments?, what one thing does it return?). Everything builds on these fundamentals and I think the best tutorials spend time introducing these concepts first.
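A sketch of that mantra applied in base R: `args()` and `formals()` show what a function takes in, and `str()` on the result shows what comes back. `standardise()` is a made-up function for illustration:

```r
# What does sd() take in? (arguments and their defaults)
args(sd)

# A small made-up function: takes a numeric vector x, rescales it to
# mean 0 and standard deviation 1, and returns one numeric vector
standardise <- function(x, digits = 2) {
  z <- (x - mean(x)) / sd(x)
  round(z, digits)
}

names(formals(standardise))   # "x" "digits"
out <- standardise(c(1, 2, 3))
str(out)                      # num [1:3] -1 0 1
```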
I learned best by seeking out R puzzles: How can I recreate that figure? How can I slice these data differently? I talked to a lot of other graduate students in my cohort when I was learning to see what kinds of basic problems they were trying to solve, and then tried to solve them also.
I liked the idea of giving a set of those most-helpful puzzles to new students as a motivating example.
I just wanted to say that I love the idea of downloading Ben Bolker’s brain. Preferably in the form of an R package: bbbrain. 🙂
The only obstacle I can see is that the package’s dependencies would include “Ben Bolker’s body”, and as far as I’m aware there’s no cran mirror for three-dimensional objects. 🙂
Ben Bolker’s Brain has been scattered into bits
Among SO, R-sig-ME and a thousand different lists
Now I’ve asked him all my questions, and I hope my model fits.
His code is marching on.
(With apologies to Ben, and the many authors of “John Brown’s Body”)
Wait, are you indicating that Ben’s brain isn’t 3D?
And, Noam: 🙂
“Wait, are you indicating that Ben’s brain isn’t 3D?”
Touché. I meant the content of Ben’s brain.
I’m now envisioning redubbing that scene in The Matrix when Neo first learns kung fu etc. Except instead of martial arts, he learns lme4. “I’m going to learn…mixed models?” 🙂
@ Noam: I thought I had the thread won, but I bow to your Ben Bolker’s brain joke superiority. 🙂
Not yet mentioned:
1. Maintain a reference card for yourself. Every time you figure out one of those weird and puzzling things, take the extra 30 seconds to make quick notes about the procedure on your reference card. I keep mine in a word-processing document. Much easier than trying to hunt down that one script…
2. I find myself referring back to _Data Manipulation With R_ (Phil Spector) frequently, so I recommend a hard copy.
3. I toyed around with the idea of running a “case studies” series at the university I just left, to help with medium-large challenges. The idea is to get a users’ group going (university-wide R-specific listserv), then have people submit specific case studies based on what they want to learn, summarize and present the case studies to the list to see if anyone can self-identify as an expert (and gauge interest in different topics), then have the expert give a short (1-hour) presentation on the outcome. My thinking was to make this a once-a-month series so it wouldn’t be overly burdensome, but would encourage continued skill-sharing and development. I think this works best in an institution where there’s an ongoing introductory workshop helping to get new users on board.
It’s already been said, but post on Stack Overflow when you are really stuck and Google cannot answer your question! There’s nothing more satisfying than getting your question upvoted because you asked a good question and it’s a unique problem. I also made my fair share of dum-dum posts; people yell at you, and you learn how to rephrase your post. No big deal. Just nerds correcting other nerds on the internet.
One thing that I have yet to see mentioned is simply to __read other people’s code__. Yes, there is the concept that hell is other people’s code, and R is no exception to the rule, but R has so many functions that it’s nearly impossible to know them all. Looking into code from packages to see how they run a procedure can really help your understanding. A great resource for looking through code is the read-only [CRAN](https://github.com/cran) mirror on GitHub.
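As a sketch of how to start reading other people’s code from inside R itself: typing a function’s name without parentheses prints its R source (primitives and functions implemented in C show only a short stub). `sd()` is just a convenient example:

```r
# Print the source of sd() from the stats package
sd

# Just the body of the function
body(sd)

# Which package namespace it comes from
environmentName(environment(sd))   # "stats"
```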
Something pointed out by @noamross is to search across github for how certain functions are used in R:
We have an internal R users LISTSERV at UF (which may have even been started by Ben before he ABANDONED US….yeah, I went there!). It is very n00b friendly, responses come quickly, and all reciprocate when they can – all of which might be byproducts of the fact that users might be your students, your professors, or people you run into at seminar or committee meetings, and we all need help sometimes (or often). I tend to go there after exhausting Google but before posting to SO.
(lucky for us Ben seems to have lost the instructions to unsubscribe, because he still answers questions once in a while.)
Do you troll Ben about the weather, too? 😉
I would if he would take the bait!
As someone with a degree and professional experience in computer science, I want it on the Dynamic Ecology record that R is an awful, awful programming language. But I use it. Just because it’s awful, doesn’t mean it’s not useful.
But it really is terrible. It’s hacked together by everyone and their cousin, with no standard syntax or naming conventions. Things that should be easy to do are hard and things that should be impossible to do (so you don’t accidentally shoot yourself in the foot) can be done fairly easily. Sometimes there are 20 functions to do a single thing spread across as many packages. Sometimes there is something that every other language can do simply that has exactly 0 functions in R, and the only way to do it is to write several lines of code. Things that no other language would consider (such as using "." as a regular character, or assigning a value to something that should be a function (colnames(x) <- c("a","b") comes to mind)) are done in R, which means that people who learn R first are getting into bad programming habits and thinking that they are bad at programming.
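To make the `colnames(x) <- ...` point concrete, a sketch: R treats an assignment like this as a call to a "replacement function" whose name ends in `<-`, here `` `colnames<-` ``, and then reassigns `x`. The data frames are arbitrary:

```r
x <- data.frame(1:2, 3:4)
colnames(x) <- c("a", "b")

# The same thing written as an explicit call to the replacement function
y <- data.frame(1:2, 3:4)
y <- `colnames<-`(y, value = c("a", "b"))

identical(x, y)   # TRUE
```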
What this means is that if you have trouble learning R — or remembering things in R — it is not you that is the problem; it is R. R is awful to learn and awful to remember. If you learn R first and think to yourself, "I'm no good at programming," stop! And remember, it's not you, it's R.
As I said, though, I do use R. It's free. It's open. It does some simple stats stuff quickly and easily. And "everyone" in ecology uses it.
I highly recommend "aRrgh: a newcomer's (angry) guide to R" to anyone who learned another programming language first and is trying to learn R. It explains all the vagaries of the basic R framework, and includes such tips as "Sequence indexing is base-one. Accessing the zeroth element does not give an error but is never useful."
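A quick sketch of the base-one indexing point, plus a related surprise for people coming from Python (negative indices drop elements instead of counting from the end):

```r
x <- c(10, 20, 30)

x[1]    # 10: the first element
x[0]    # numeric(0): no error, just an empty vector
x[-1]   # 20 30: drops the first element
```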
I am a big believer in code reuse, so I heavily comment my R code (although the comments sometimes contain expletives because R is just awful) and reuse code as much as I can. I make sure to always use scripts and never code directly into the interpreter.
I use StackOverflow a lot, usually by searching in Google for what I want to do. I do this both to learn how to do new things, and also (and more frequently) to remember the syntax that a particular package/function likes best. Meg, please do try to post questions there. You can make an anonymous account. The worst that happens is that someone tags it as a duplicate and links to the duplicate question; you get your answer just as quick. I learned to program before StackOverflow existed, and it is perhaps one of the most wonderful things on the internet. I don't post questions often, as I usually find my question has already been asked, but for the few I have posted, I've gotten invaluable advice. Protip: start off your question with "This is my first question on StackOverflow and I've searched the archives for a half-hour, but haven't found the answer to my question." Anyone who treats you poorly will be chastised by one of the majority of StackOverflow users for beating up a newbie. Instead you should get some kind tips on how to better write your question. (This happened to me on my first question.)
My general rule of thumb is to work on a programming problem for about 20 minutes. If I can't make headway in that time (including Googling the Internet), I need to ask someone for help. StackOverflow, my favorite R guru du jour, whatever. It's good for productivity.
My go-to documentation for graphing in R has been Quick-R:
It looks like they have nice documentation/tutorials for other parts of basic R, as well.
But like I mentioned yesterday, I'm leaving R's plot() and ggplot2 and eloping with Python's matplotlib library. Because it's easy and elegant and very flexible and doesn't make me want to stick a hot poker in my eye every time I make a graph.
P.S. Today I learned that in the package 'arm' (for hierarchical modeling), they've decided not to use the $ character to sub-reference, but rather the @ character. They *used* to use $. But the package creators decided to change it for some reason (breaking any code previously written with the package). I'm going to add to my reasons why R is awful: the code that ran fine yesterday may no longer run today.
I’m printing this out and giving a copy to everyone in the lab. The 20 minute rule is great advice I am going to start taking as of tomorrow morning.
Yes, I also like the 20 minute rule! I need to adopt it (I say after spending way too long this morning trying to figure out some things in R).
This is patently not true; the only people that can contribute to R are the 20-odd people that were or are part of R Core. If, by this comment, you are referring to CRAN, well, that’s what you get when you let anyone write R code. You’ll find the same varying quality in Python libraries or Matlab packages, for example, where these are created at will by a user community.
That said, yes, R has its inconsistencies. Naming conventions for functions are a mess, such that we have seq_along() and saveRDS(), for example. There are inconsistencies in commonly used functions like the apply family. Some of these are being addressed by the user community, with plyr, etc. Some are historical artefacts arising from the original S language and how it developed, and the legacy of S-PLUS.
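Here is one small, concrete example of the apply-family inconsistency. sapply() guesses the shape of its result, so the same call can return a vector or a list depending on the input. vapply(), by contrast, makes you declare the type of each result. This is a minimal sketch, and square() is just a throwaway helper.

```r
# Throwaway helper used by every call below
square <- function(i) i^2

sapply(1:3, square)                     # a numeric vector: 1 4 9
sapply(integer(0), square)              # an empty list(), not numeric(0)
vapply(integer(0), square, numeric(1))  # numeric(0): the result type is declared up front
```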
Why do you hate replacement functions (colnames(x) <- c("a","b"))? I love these things. I suppose a computer scientist would have some
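For anyone wondering what a replacement function actually is: colnames(x) <- c("a","b") is shorthand for calling a function literally named `colnames<-` and assigning the result back to x. The sketch below shows that equivalence. It also defines a user-written replacement function. The name `second<-` is made up for illustration and is not part of base R.

```r
# colnames(x) <- value is shorthand for x <- `colnames<-`(x, value)
x <- data.frame(1:2, 3:4)
colnames(x) <- c("a", "b")

y <- data.frame(1:2, 3:4)
y <- `colnames<-`(y, c("a", "b"))

identical(x, y)  # TRUE

# A hypothetical replacement function; the last argument must be named `value`
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 99  # runs v <- `second<-`(v, value = 99)
v                # 1 99 3
```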
Starting indexing at zero is stupid. There, I said it ;-) No-one except proper programmers thinks like that. Most people using R aren't programmers. If I want the first row of my object, I don't want to have to think, "ah now, I need to subtract 1 from this to get the thing I want". Spreadsheets don't start at row 0. Data analysis languages shouldn't either.
Is R a nice general-purpose language from a computer science point of view? No, I doubt it. Is R code a succinct and natural way to work with and analyse data? I think so; it is very expressive for that sort of work. You computer science types can keep your Javas; I'll stick with R (and some Python) for data analysis, thank you very much ;-)
@ is for accessing the slots of an S4-classed object. I can only assume they've moved from using S3 classes to using S4 ones, as that allowed them greater control over method dispatch. You should be shouting at them for not providing accessor or extractor functions, so that you didn't need to know which $foo component to extract to get what you wanted (or perhaps shouting at yourself for not using their accessor/extractor functions, such that you got bitten when they changed the internal representation of the returned objects 🙂). [I have no idea whether arm had any of these extractor functions, so the blame may reside fully with the developers.] By the way, they've been using S4 methods (for at least some of the package) since 2009. There is no indication in the Changelog that they suddenly switched to S4.
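Below is a minimal sketch of that difference. It uses a made-up S4 class rather than anything from arm: the class Fit, its slots, and the accessor getCoef() are all hypothetical. Slots are read with @, and $ fails on a plain S4 object. An accessor function hides the internal layout, so calling code doesn't break when that layout changes.

```r
library(methods)

# A hypothetical S4 class standing in for a fitted-model object
setClass("Fit", slots = c(coef = "numeric", se = "numeric"))

# A hypothetical accessor generic and its method for "Fit"
setGeneric("getCoef", function(object) standardGeneric("getCoef"))
setMethod("getCoef", "Fit", function(object) object@coef)

f <- new("Fit", coef = c(a = 1.5, b = -0.2), se = c(0.1, 0.05))

f@coef      # direct slot access with @
getCoef(f)  # the same values through the accessor

# $ is not defined for this S4 class, so this captures the error condition
res <- tryCatch(f$coef, error = function(e) e)
conditionMessage(res)
```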
As for the last line of that PS: This is a problem not just for R.
I knew I’d get some R defenders in reply. 🙂 True enough about core R being a small group of people. I meant “R broadly” in the sense that ecologists use it, i.e. including the various packages. Core R isn’t terribly useful except for very simple things. And ecological analyses are rarely simple. (But shame on the core R group for not doing a better job to document/enforce syntax conventions for package creators.)
> colnames(x)
> Starting indexing at zero is stupid.
Hmm…. I’m willing to concede that one-indexing might be tolerable for data analysis, as you’ve made some good points. But it’s inexcusable not to throw an out-of-bounds error when a user tries to index the zeroth element. (And really, since every other language an ecologist is likely to learn indexes starting at zero, it sets up bad habits.)
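For what it's worth, the double-bracket operator [[ ]] does raise an error for a zeroth or out-of-range position, so it can serve as a stricter way to pull out a single element. This is a minimal sketch. try_index() is a hypothetical helper that returns the error message instead of halting the script, so all three calls can run.

```r
x <- c(10, 20, 30)

# Hypothetical helper: returns the error message instead of stopping the script
try_index <- function(v, i) {
  tryCatch(v[[i]], error = function(e) conditionMessage(e))
}

try_index(x, 2)  # 20
try_index(x, 0)  # an error message: [[ ]] refuses position 0
try_index(x, 4)  # an error message: position 4 is out of bounds
```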
> Is R code a succinct and natural way to work with and analyse data?
I think not. I have to look up syntax every time I use R. That doesn’t feel very “natural” to me. Likewise, there are some succinct ways to do things in R, but sometimes the syntax is so contorted, I write something longer just so I know I’ll read it correctly when I come back to it in six months.
> As for the last line of that PS: This is a problem not just for R.
Fair enough. But usually there’s reasonable centralized documentation when things aren’t backwards compatible in other (common) languages.
I agree that if someone wants to learn programming, R is not the language to learn first (nor is Haskell, for that matter). A better start would be Python. For people like ecologists or population geneticists who simply want analysis done and don’t want to have to use 10 different standalone programs, R is a wonderful resource. Basically, while the argument that R is a terrible programming language in the realm of programming languages has validity, it is a non-sequitur regarding the topic of switching from SAS to R.
+1 – I got a major chuckle out of this.
My conversation with a student literally just yesterday.
Her: It keeps reading my temperature column as a factor and as.numeric doesn’t convert it back (summary of a long conversation debugging her problem).
Me: I hate R. The default behaviors are terrible.
Her: Then why do you teach something you hate?
Me: Because everybody else uses it.
I still don’t use R for my own research. I have used Matlab and am working on switching over to Python.
And the lack of backwards compatibility (and the seeming total lack of concern for this among the developer community) is a major pain. Every time I teach my stats/R course I have to rerun all my code because a good 10% of it is broken. And it’s not just the packages. It’s the core too. R people act like it’s natural. But I can run 10-year-old Matlab code without a problem (or, in the very rare instances when there is a problem, I get a warning message about using something deprecated). And it’s not because Matlab hasn’t innovated at a high rate. There are some very simple approaches to ensure upward compatibility. The majority of the R community just doesn’t care about it.
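On the package-author side, one of those simple approaches already exists in base R. .Deprecated() lets an old function name keep working while emitting a warning that points to its replacement, which is roughly the Matlab behaviour described here. This is a minimal sketch, and mean_no_na() and old_mean() are hypothetical names.

```r
# The new, preferred function (hypothetical name)
mean_no_na <- function(x) mean(x, na.rm = TRUE)

# The old name keeps working but warns, instead of silently disappearing
old_mean <- function(x) {
  .Deprecated("mean_no_na")  # warning: 'old_mean' is deprecated, use 'mean_no_na'
  mean_no_na(x)
}

old_mean(c(1, NA, 3))  # returns 2, with a deprecation warning
```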
I have little doubt that ecologists (and most academics) have gotten sucked into a local optimum that is well below the global optimum. I don’t know what it will take to get out (decades I fear).
At the end of this rant it is probably important to note that there are elegant corners of R (vegan and Raster and Wickham’s packages being the only ones I’ve had the pleasure to encounter).
Oh and some of Ben Bolker’s stuff too (bbmle, lmer)
I think the data.table package belongs among the elegant corners of R.
I find it really useful; just check out the awesome fread function. Or how about the nice feature that calling your data.table in the console will only print the top and bottom few lines (instead of filling your console with stuff, as a data.frame does)? Or the fact that you can now rename columns without the whole object being copied internally? Or how about… Yes, I really like the data.table package – check it out 🙂
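@ Brian – as.numeric(as.character(data.frame$factorColumn)) works in a pinch: as.numeric() on a factor returns the underlying integer level codes, but as.character() first gives back the level labels (the actual values), which as.numeric() can then convert.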
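To make the factor problem concrete: on a factor, as.numeric() returns the internal integer codes of the levels, not the numbers printed on screen. The sketch below uses made-up temperature values. The as.numeric(levels(f))[f] form is the one the R FAQ suggests. Two related points are also worth knowing. Since R 4.0.0, read.csv() and data.frame() no longer convert strings to factors by default (stringsAsFactors = FALSE). A column usually ends up non-numeric in the first place because of a stray entry such as a typo or "n/a", and that entry becomes NA (with a warning) on conversion.

```r
# A temperature column that was read in as a factor (made-up values)
temp <- factor(c("12.5", "15.0", "9.8", "15.0"))

as.numeric(temp)                # the integer level codes, not temperatures: 1 2 3 2
as.numeric(as.character(temp))  # 12.5 15.0 9.8 15.0
as.numeric(levels(temp))[temp]  # same values; converts each distinct level only once

# A stray non-numeric entry becomes NA (the warning is suppressed here)
bad <- factor(c("12.5", "n/a", "9.8"))
suppressWarnings(as.numeric(as.character(bad)))  # 12.5 NA 9.8
```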
Maybe it was mentioned before, but another great source of help is the R-sig-ecology mailing list (https://stat.ethz.ch/mailman/listinfo/r-sig-ecology), especially if you are a community ecologist using the vegan package, since Jari Oksanen and Gavin Simpson are very active there.
+1 Jari Oksanen, Gavin Simpson, Bob O’Hara (& others that don’t spring to mind) are very active and, most importantly, very patient with n00Bz 😉
These are all really great suggestions! Though, of course, emblematic of the problem with R — there’s 90 million different ways to do anything, but people become very loyal to “their way,” and, while fascinating from a cultural diffusion standpoint, it makes it very hard for a novice to decode what’s out there!
My department tried an “R lounge” type set-up a few years ago, aimed at grad students but open to anyone, and it ended up being just one or two people sorting out everyone else’s problems, which was extremely annoying for all involved. What does work well in our department, though, is a set of weeklong R stats courses — one for total beginners to R, then a few weeks later one for people who kind of get R, or kind of get stats, but don’t really have the confidence to push forward with anything particularly complicated. Not only do these provide practical training in R and in more general “how to use helpfiles,” but they allow the attendees to form small groups of R users who can then help each other after the workshops are over.
Two suggestions from the previous comments that I want to highlight: reproducible code and learning to program. If you’re going to ask me for help, include some dummy data (or, heck, your real data), so that I can run the thing for myself and see exactly what’s going on. Next, and this applies more to students than to faculty, but, if you can, learn a programming language. Doesn’t really matter which one. Doesn’t have to be anything fancy. Just learn it. I think the reason why so many people hate R stems from the fact that “learning R” often means “simultaneously learning stats, how to think like a computer, and how to use R itself,” which is just far too much for most people to handle. I was a geeky programmer kid, learned R as a programming language for an undergraduate applied maths course, and only when I started doing biological research did I realize that you could do *stats* in R (at which point my stats was already pretty solid, having come from a maths background). I think this was an ideal way to be introduced to R, and I feel so sorry for my undergrads now, who don’t know what a dataframe is, don’t know what a t-test is, and yet are somehow expected to write and understand beautiful code on their first try.
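Here is a minimal sketch of the kind of dummy data that makes a question easy to answer. set.seed() makes the random numbers repeatable. dput() prints R code that recreates the object, so whoever is helping can paste it straight into their own session. The site and temp columns are made up for illustration.

```r
set.seed(1)  # so everyone who runs this gets the same random numbers

# Small, made-up data with the same structure as the real problem
dummy <- data.frame(
  site = factor(rep(c("A", "B"), each = 5)),
  temp = round(rnorm(10, mean = 15, sd = 2), 1)
)

# dput() prints R code that rebuilds the object; paste its output into the question
dput(dummy)
```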
Meg, we started doing code review as one of our normal lab meeting formats. A student or I will post ~100 lines of code and each of us (but not the presenter) will decipher what each line does. We do this for about 45 min and see where we get. It’s turned out to be a great way to share methods, see what others are doing, and confirm that people are doing things well. It’s very simple and I’m shocked at how helpful people have found it so far.
Out of curiosity, do you use code that relies only on existing R functions, or self-written loops/functions/etc. as well?
My old institution just created an R mail list where anyone subscribed can ask questions, share new packages and methods, etc. It’s only been working for a few days, so I cannot tell you if it will be useful, but it seems promising, kind of the ‘lounge’ idea but without needing to be physically there. That’s maybe something to do more at the department level than at the lab level.
I teach an introductory biostats course using R. Most of my students have never used R before and are quite intimidated by the code. I have found that practice is the best solution – I give them weekly problems to build confidence and skill.
For myself, practice is also important. If I get away from R for a couple of weeks or more, I find I get rusty. When I need to learn new things I talk to my students. They are generally more familiar with the latest packages and trade a lot of knowledge among themselves. Asking them how they would do “x” generally leads to slick new packages I was not aware of.
The best recent suggestion has been Inkscape https://inkscape.org/en/. While not an R package, it is a great image editor that is just what is needed to complete the last 5% of publication-quality graphics formatting.
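For the Inkscape route, it helps to export the figure in a vector format rather than as a PNG, so that text and lines stay editable. Below is a minimal sketch using base R's pdf() device. Inkscape can open PDF files, and svg() is another option where R was built with cairo support. The file name and plot are placeholders.

```r
# Write the figure to a vector-format file (a temporary path here; use your own)
out <- file.path(tempdir(), "figure1.pdf")

pdf(out, width = 6, height = 4)  # width and height are in inches
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared", pch = 19)
dev.off()                        # close the device so the file is actually written

file.exists(out)
```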
So how I learned R?
I remember that my experience with R started one cold January, sitting alone in a dark hotel room in Uppsala and being extremely frustrated over just getting my data into R!
Oh man, it feels like a long time ago 🙂
At that time it was pretty much trial and error (and google of course!)
Pretty quickly I found the Quick-R page, which helped me a lot. I then moved towards using books quite a bit. Particularly useful were the Zuur et al. books, Hadley Wickham’s ggplot2 book, and Bivand et al.’s ASDAR book.
What would I recommend to those who are complete novices?
There’s already lots of good stuff mentioned in previous comments; I’d give +1 for:
- A basic R course of some sort (would have saved me lots of time!)
- R-SIG mailing lists (I used to skim the daily digest of R-SIG-GEO, which was really helpful)
- Books (Zuur et al., ggplot2, Bivand et al.’s, Wickham’s Advanced R)
- R-Bloggers (just skimming the headlines and reading whatever looks interesting)
- Vignettes (when available, I find these to be a very good way to learn new stuff)
- Reading other people’s code (mostly functions in packages)
One thing I haven’t seen mentioned in the comments (or I missed it, sorry) is the importance of a proper editor. This I think is very important. I sometimes see people new to R using the editor that ships with R! On Windows this is a disaster: it is not helping you in any way, no syntax highlighting, no function hints, no nothing! So that should be a first step for anyone new to R: get yourself a decent editor. I always recommend RStudio because it has a lot of helpful features for R programming. Personally I use Sublime Text for almost all of my coding, but still use RStudio when compiling R Markdown documents. Taking some time to actually learn to use the editor you end up with can save you an awful lot of time.
PS, the weird formatting of that last section was unintended. Is there a way to edit your comment?
Related to ecologists learning R and maybe of interest to folks on this thread – Gavin Simpson, Noam Ross, Andrew Tredennick, Andrew MacDonald, Leah Wasser, and I are all teaching one or more R workshops at ESA in Baltimore this year (there are at least 5 that I know about scheduled for the Saturday and Sunday before the conference), including some start-at-zero introductions and some more advanced topics (Gavin’s leading one on advanced vegan, for example). Might be helpful for those new folks in your lab, Meg–if they can wait that long 🙂