UPDATE: I’ve modified the title of this post a bit to clarify what I was really thinking when I wrote it. What I was really thinking was which programming language to choose to teach some fellow researchers how to get into the absolute basics of programming, out of the very limited set of languages I know. The tasks they need to do need only a minimal understanding of programming, and of R, so many of the issues that can be experienced won’t even come up for them. To put things into context, it only took two days for me to work out how to do everything that I need to do in R going from scratch, so it’s not as if I’m writing packages or doing anything particularly fancy myself, and these people who I will be teaching will be doing stuff that is less complicated than what I needed to do.
That being said, I’d like to thank those who commented for pointing out why R isn’t a great language for people starting out with programming. I’m still new to using R, so obviously don’t have the depth of experience with potential problems that others do, so it’s helpful to learn from others’ experience (or should that be “misery”?)! Python, Pascal and Ruby all sound like great options for getting into programming. I’m going to leave my initial post, with all its inaccuracies, intact below: first, because I think it’s good to have a record of what I have said so I can look back at how daft I was in the future, and second because, as people took the time to post comments, I don’t want the time they spent making comments and correcting me to have gone to waste. If I deleted most of what I said or removed the post, then their comments would seem odd or incorrect.
—
I’m helping some colleagues learn programming, starting from zero experience with it in any shape or form. It’s quite a daunting task in some senses, because, well, it may not be easy! They are researchers, so they’ll need it for processing data and generating output, and perhaps processing BIG DATA at some point too.
After some debate about the best way to go ahead, I’ve settled on R as my weapon of choice to train these lucky individuals. The choices were as follows – note that I don’t know that many programming languages, so it’s not a huge list. I thought it would be worth sharing the pros and cons of each.
PHP
Pros: Dead easy to use. Nice and easy integration with databases which can be used to deal with data processing. Can be extended to, for example, generate images (a plus for these people who study visual cognition, so often need to make pretty pictures to show to participants in experiments). There’s also an immense number of tutorials and guides on the net, and people who aren’t into research can help you out just by knowing their PHP.
Cons: Probably overkill. Running a webserver all the time can be a pain, even if XAMPP is used. It’s not easy (or even possible, as far as I am aware) to run statistical tests using PHP or any classes that can be added in.
Python
Pros: Forces users to write clean code, and again it’s very easy to use. Possible to integrate with databases to churn through datasets. Like PHP, it can be used to generate images for use in experiments (pygame), and again there are plenty of examples and tutorials. Plenty of extensions to do stats and plot graphs (NumPy and Matplotlib). Oh, and it’s named after Monty Python. Ni.
Cons: again, probably overkill. Forcing people to worry about indentation can get horribly confusing when they are barely aware of what they are doing, and they can get tripped up. Just a personal issue I guess, but I’ve not quite managed to get to grips with OOP in python. Maybe that’s because I did it first in PHP and never could do more than crash my computer when trying to learn Java. Ho hum.
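To make the indentation concern concrete, here is a minimal sketch (the scores are made up for illustration) of how moving a single line in or out changes what a Python program does, without producing any error message:

```python
# Made-up scores, purely for illustration.
scores = [3, 8, 5]

# Version A: the append is indented under the for line,
# so it belongs to the loop and runs once per score.
inside_loop = []
for s in scores:
    doubled = s * 2
    inside_loop.append(doubled)

# Version B: the append is not indented, so it sits outside
# the loop and runs only once, after the loop has finished.
outside_loop = []
for s in scores:
    doubled = s * 2
outside_loop.append(doubled)

print(inside_loop)   # every doubled score
print(outside_loop)  # only the last doubled score
```

Both versions run without complaint, which is exactly why a beginner who is barely aware of what the code is doing can get tripped up.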
JavaScript
Pros: Easy syntax, and its power is growing with the new HTML5 specifications. I mention it because I recently saw this illustration of basic programming and it seemed worth considering. There’s no need to compile anything, which is often good for beginners too.
Cons: not really intended for churning through big datasets and the kind of things I have in mind. Quite a few of the decent libraries out there have to be paid for.
R
Pros: syntax is very simple, with few gotchas present in other languages (e.g., ending lines with a semicolon or forcing tabs in lines and so on). As it’s loosely typed, this can be both a blessing and a curse. It’s a blessing because users don’t have to worry about declaring variables. It’s a curse because they can slip into bad habits and not understand variable types properly. Oh, and I don’t need to say that it can work on all sorts of databases, churn through data very rapidly, generate images, run statistical tests and plot graphs that are of publication quality.
Cons: Had to really think about this, but I guess that R is a nightmare to google for any kind of help when you’re stuck. I think it’s a fundamental issue relating to the fact that calling something a letter of the alphabet probably doesn’t help SEO rankings all that much. The official documentation would benefit from being a bit more like the PHP documentation (though maybe there is a site like that for R, I’ve just not found it), with users able to comment and give better examples than those provided initially. That being said, there are more blogs on R than you can shake even a very large proverbial stick at, which more than make up for it. I always search the legendary R-bloggers.com search box before googling anything to do with R now. I’ve never had to look any further than that.
Is R an ideal language to teach the fundamentals of programming to beginners?
I think the answer is “yes”. The beginners I have in mind are researchers and have specific needs regarding data processing, and it would benefit them to learn how to run stats in R, opening up future possibilities as well (e.g., LMEs). I’ve not mentioned Matlab, which I know is a favourite for researchers, because (1) it’s a gigantic monster to download and install, (2) I don’t know it that well and (3) it’s prohibitively expensive. I was also tempted to evaluate the use of LOLCODE to see if there was any mileage in using it (“IM IN YR LOOP UPPIN YR VAR TIL BOTH SAEM VAR”).
I myself first dabbled in programming in the old days when I had a Sinclair, and we did some very basic BASIC at primary school. Later on, I used BASIC to make emulators that mimicked my friends’ phrases and behaviour. Some of them were spot on! I guess I’ve always been trying to model human behaviour. I’ll post up the material I use to teach my colleagues to help them out and have a permanent copy of the material we go through.
That’s it for now, please feel free to share any other languages you may have found to be good for beginners. I’m sure there are some things that I have missed.
The R community at http://stackoverflow.com/questions/tagged/r is alive and well. Don’t forget to stop by the R chatroom.
I use R daily, it’s terrific for statistical analysis, but the ideal language for beginners? You have got to be kidding.
The syntax is not simple; it’s downright ugly. The error messages and inbuilt documentation are terse, obtuse and unhelpful. And whereas other languages share many features which are quite obvious with a little inspection (e.g. arrays, strings, control, loops), R stands apart as quite different. “Everything is a vector” is not intuitive. On top of that, other languages are general-purpose; R is primarily for doing statistics.
I don’t want to get into “language X over language Y” arguments but many people find Ruby easy (and fun) to learn. Think Python without the boring, rigid attitude.
As much as I like R, I have to question your judgement. Here are three reasons why R is not suited for beginners. First, its syntax is not overly intuitive, and not a good template for learning other languages. It is a functional language without the purity of LISP. Its object-oriented features are a mess, as proven by the fact that there are at least 4 OOP frameworks (5 if you count reference classes) and some of them are still evolving. I can’t think of a worse introduction to OOP (maybe MATLAB). Second, the language has plenty of gotchas. Check out “R Inferno” for a list of some of them. Third, it’s a domain-specific language. Sure, if your colleagues need to do data analysis exclusively, then R is great, but if they need to branch out/scale up, then they’re better off with something else.
I believe that the best beginner language (and a great advanced language as well) is Python. So do quite a few schools that teach their introductory classes in Python, including MIT, which replaced Scheme with Python. It has clean, intuitive syntax, a huge community, great documentation and major corporate backing. One can learn Python bit by bit, and indentation is a help for beginners, by the way.
Hi Neil,
You make a good point! I don’t know Ruby so can’t comment on it. Is it worth getting into? My only knowledge of it is in use with web apps.
I think I made a mistake in explaining the purpose of my post – will correct it now. What I meant to suggest was whether it was an ideal starting language for researchers to get into, rather than for everyone. More than that, I was thinking mostly that it would be easy to illustrate the absolute basics – i.e., loops, variables and conditionals using R. I’ll modify my post now.
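For readers wondering what those absolute basics amount to, here is a minimal sketch of a variable, a loop and a conditional working together. It is written in Python rather than R purely for illustration, and the reaction times and the 1000 ms cut-off are made-up assumptions, not real data:

```python
# Hypothetical reaction times in milliseconds (made-up values).
reaction_times = [512, 430, 1250, 610, 389]

# A variable holding a cut-off; 1000 ms is an arbitrary assumption.
threshold_ms = 1000

# An empty list that will collect the trials we decide to keep.
kept = []

# Loop: visit each reaction time in turn.
for rt in reaction_times:
    # Conditional: keep only the trials faster than the cut-off.
    if rt < threshold_ms:
        kept.append(rt)

# Average of the trials that survived the filter.
mean_kept = sum(kept) / len(kept)

print(kept)
print(mean_kept)
```

The same three ideas carry over to R almost one for one; only the punctuation changes.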
Thanks for the comment! I agree with you, and have attempted to clarify my points by updating the post a bit. Python is definitely an awesome choice, and I would have gone for that as a second option – I use it really often. The colleagues I’m aiming at are pretty much only going to be using their programming skills for data analysis, so it does seem sensible to stick with R to get the basics through to them!
Realised I should have hit “reply” to your comment, but the site appears to be sluggish at the moment…!
I agree with @gappy on this, not because there’s anything inherently wrong with the S language, but because the implementation details require too much awareness of what’s going on under the hood. (And why the thing you think should work happens not to, and can’t be fixed at the risk of breaking the scripts of thousands of statisticians.) That said, the underlying S language, which is based on Scheme, uses Fortran’s vectors as the basic unit of data and includes attributes that can label anything, would actually make a pretty good first language for people interested in statistical programming. But the fact that libraries are half in C and Fortran, and the weird quirks, make it hard to argue for the existing R platform/implementation being a great start, unfortunately. I’d have to go with Python or Ruby first, R second, for people interested in a data-scientist or similar career path.
You forgot Pascal. It’s an excellent beginner’s language and feeds into R and C. Pascal teaches arrays in a simpler fashion than R and C, doesn’t have the dataframe/zoo complexity better saved for advanced users, and allows newbie development of numerical computation skill sets. Ideally they could learn how to Sum, Diff, and basic stats before jumping into R.
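As a rough sketch of the kind of exercise this suggests, here are a sum, a diff and a mean written out by hand with plain loops. It is in Python rather than Pascal, the function names and data are hypothetical, and "Diff" is assumed to mean the differences between consecutive values:

```python
def total(values):
    """Add up the values one at a time with an explicit loop."""
    running = 0
    for v in values:
        running += v
    return running


def consecutive_diff(values):
    """Differences between each value and the one before it (assumed meaning of 'Diff')."""
    result = []
    for i in range(1, len(values)):
        result.append(values[i] - values[i - 1])
    return result


def mean(values):
    """Arithmetic mean, built on the hand-written total above."""
    return total(values) / len(values)


# Made-up data for illustration.
data = [2, 5, 9, 14]

print(total(data))
print(consecutive_diff(data))
print(mean(data))
```

Writing these out once makes it clearer what the built-in equivalents are doing behind the scenes.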
R is not even on my list of languages that I would use to teach programming, because it is not even a little bit like the other programming languages. Teach R if your audience is going to use R, needs R, and will never have to ‘program’ anything else. Do not teach R if it’s actually about programming.
I also agree. I am using both Python and R, and for beginners Python is far better than R. The vectorised operations of R will be confusing to many beginners.
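One way to see the difference, as a small sketch with made-up values: in R, `c(1, 2, 3) * 2` doubles every element at once, whereas a beginner working with plain Python lists has to spell out the element-by-element step, and the superficially similar `values * 2` does something else entirely.

```python
# Made-up values for illustration.
values = [1, 2, 3]

# Explicit loop: each step is visible, one element at a time.
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# List comprehension: the same element-by-element idea in one line.
doubled_comprehension = [v * 2 for v in values]

# Multiplying a plain Python list repeats it rather than doubling
# its elements, unlike the element-wise result R gives for a vector.
repeated = values * 2

print(doubled_loop)
print(doubled_comprehension)
print(repeated)
```

Whether the explicit loop or the vectorised form is easier to grasp first is a judgement call; the sketch only shows that the two styles ask a beginner to think about data differently.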
The big question is probably how much programming skill those “beginners” need. If that’s not much and all they will ever do is cookbook-style analysis, R might be enough, so you can start with that. If not, start with Python and then make them learn R. Python is designed a lot more coherently, and there is good support for statistics and math in Python, even up to computer-aided algebra, and if all else fails there’s RPy2. Python has repeatedly been hailed as the easiest computer language to learn and one of the most productive, and it is very well accepted in science and web programming.