Python versus R: Who is choosing what?
This is obviously a somewhat controversial topic, one which has been covered quite extensively, both on LinkedIn and elsewhere. But I want to tackle this subject in a slightly different way.
Instead of looking at each language in terms of what it can do and what its strengths are, I want to explore what is actually happening out there in the real world. How often is each language downloaded each day? How many people contribute to online forums for each? How many projects are submitted in each language to online data science competitions? The figures should speak for themselves, should they not?
So I have done a bit of digging around and have come across a number of websites that track these things.
The first is the Tiobe (The Importance of Being Earnest) Index. This is an indicator of the relative popularity of programming languages, and it tracks quite a few of them across a number of channels, including job postings and search queries.
Here is the top 20 list for May 2017 compared to May 2016:
I’ve highlighted Python and R, both of which appear to have improved their relative positions over the past year. Whilst Python sits 10 places higher than R in the ranking, their ratings are separated by less than 1.5 percentage points.
It is probably also prudent to see how each language has tracked over time, to gain a greater insight into the trends. This can be seen below in the extracts I’ve pulled from the Tiobe website.
For Python:
Highest Position (since 2001): #4 in May 2017
Lowest Position (since 2001): #13 in Feb 2003
Language of the Year: 2007, 2010
For R:
Highest Position (since 2001): #12 in May 2015
Lowest Position (since 2001): #73 in Dec 2008
It seems that Python has come to a kind of plateau, whereas R appears to be rapidly gaining in popularity – something to keep in mind.
We can also turn to the RedMonk rankings, which plot each language's popularity on GitHub against how much it is discussed on Stack Overflow. This gives a relatively good point-in-time view of the activity around the two languages. I have reproduced the chart for the first quarter of 2017, and it seems to pretty much align with the Tiobe index.
As you can see, they are two of the most used and talked-about languages out there!
Finally, I’ve reproduced below the latest chart published by PYPL (the PopularitY of Programming Language Index), which closely aligns with the other rankings. PYPL bases its ratings on Google search trends data.
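To make the idea of a search-based rating concrete, here is a minimal sketch of how raw search counts could be turned into percentage shares. It is only an illustration of the general approach, not PYPL's actual method or code: the function name `search_shares` is my own, and the counts are made up purely for the example.

```python
def search_shares(counts):
    """Convert raw search counts per language into percentage shares of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total search count must be positive")
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Made-up counts for illustration only; these are NOT real PYPL or Google figures.
hypothetical_counts = {"Python": 300, "R": 100, "Other": 600}

shares = search_shares(hypothetical_counts)

# Print the languages from the largest share to the smallest.
for lang, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang}: {share:.1f}%")
```

Because every share is a fraction of the same total, a language can gain share only if the others collectively lose it. That is worth keeping in mind when reading any of these popularity charts.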
Kaggle, the data science competition hosting platform, has mentioned that it sees a pretty even spread of R and Python entries, which appears to reflect the findings above.
So it appears that both Python and R are up there. Whichever language you choose, get good at it and join the data science community. It’s an awesome adventure!