R Moves Up From #9 to #6, But What Does It Mean to Really Be Proficient in a Language?

There's been a lot of noise in the data science community this past week about IEEE Spectrum's 2015 language rankings, where R moved up three notches from #9 in 2014 to #6 in 2015. The Spectrum post pays some lip service to needing to know a domain in addition to the language itself. Here I drill down into what it means to really know a language.

  • API for the standard library. I was first introduced to this concept about 18 years ago, when I was transitioning from C++ and interviewing for jobs that used Java. In one interview, I said Java was an easy step from C++, and the interviewer responded, "But it's knowing all of the API that's the bigger step." So while polyglot programmers may be familiar with structured, object-oriented, and maybe even functional and object-functional languages, really knowing a language also means knowing its standard library well.
  • Third-party libraries. If you don't know the landscape of available third-party libraries, and don't actually know how to use a few of them, you'll be at a disadvantage compared to someone with actual experience in the language.
  • Process tools. Every programming language has its preferred process tools: build, deploy, revision control, bug tracking, unit testing, etc. Without experience in those preferred tools, you'll need a lot of hand-holding at first.
  • Deployment/Systems Engineering. Although I listed deploy tools as a process tool above, I decided to break out deployment as its own bullet here because it is tied in with systems engineering, which is its own area of expertise. Each programming language has its own expected systems engineering expertise. For C, embedded engineering. For Java, web/application servers. For IPython Notebook/Jupyter or R, it's doubly complicated because there are two distinct ways such a programmer's work is published:
    • Publishing reports. This involves soft skills, such as "telling a story" and targeting the audience, more than hard systems engineering.
    • Online processing. Typically this involves Big Data engineering, and possibly complex event processing (CEP)/streaming data engineering. We're beyond the "webmaster" phase for data scientists: a data scientist is now just one member of an overall data team that also includes, for example, Big Data engineers. Even so, data scientists still need to be aware of the overall principles of Big Data or CEP.
  • Domain technical knowledge. For C/embedded, it might be SPI and I2C communication protocols. For Java, it might be server load balancing. For R, obviously it's all of data science, which is vast: data cleansing, ETL, regression, statistics, probability distributions, machine learning, visualization.
  • Idioms & patterns. For object-oriented languages like C++ and Java, there are the well-known object patterns. For functional programming languages like Scala, there are patterns and idioms specific to functional programming in general, and to each functional programming language specifically. For R, there are fewer idioms & patterns, but knowing these is still important when it comes to being able to read example and library source code.
  • Where to get info. Not just documentation, but also where to get help and, perhaps most important of all, how to stay current. With technology moving so fast, having yesterday's knowledge is only a fraction as helpful as having today's.
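
To make the standard-library and idioms bullets a bit more concrete, here is a minimal sketch in Python (picked purely for illustration; the same gap shows up in R or any other language). Both functions count words. The first is roughly what someone fluent in C++ or Java might write on day one; the second leans on the standard library's collections.Counter. The function names and sample data are hypothetical.

```python
from collections import Counter


def word_counts_transliterated(words):
    """Count words with an index-based loop, C/Java style."""
    counts = {}
    i = 0
    while i < len(words):
        w = words[i]
        if w in counts:
            counts[w] = counts[w] + 1
        else:
            counts[w] = 1
        i += 1
    return counts


def word_counts_idiomatic(words):
    """Count words by letting the standard library do the work."""
    return dict(Counter(words))


# Hypothetical sample data, for illustration only.
sample = ["r", "python", "r", "java", "r"]

if __name__ == "__main__":
    print(word_counts_transliterated(sample))
    print(word_counts_idiomatic(sample))
```

Both versions give the same answer. The difference is that the second is what you would expect to meet when reading example and library source code, which is where knowing the API and the idioms starts to pay off.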

So, do you really know R?