What is the best programming language for Machine Learning?

I’m new to data science; what language should I learn? What’s the best language for machine learning?
There’s an abundance of articles attempting to answer these questions, based either on personal experience or on job-offer data. However, there’s far more activity in machine learning than job offers in the West can capture, and peer opinions, while valuable, are often conflicting and can confuse novices. We turned instead to our hard data from 2,000+ data scientists and machine learning developers who responded to our latest survey about which languages they use and what projects they’re working on, along with many other interesting aspects of their machine learning activities and training. Then, being data scientists ourselves, we couldn’t help but run a few models to see which factors are most strongly correlated with language selection. We compared the top-5 languages, and the results show that there is no simple answer to the “which language?” question. It depends on what you’re trying to build, what your background is and why you got involved in machine learning in the first place.

Which machine learning language is the most popular overall?

First, let’s look at the overall popularity of machine learning languages. Python leads the pack, with 57% of data scientists and machine learning developers using it and 33% prioritising it for development. Little wonder, given how much the Python deep learning frameworks have evolved over the past 2 years, including the release of TensorFlow and a wide selection of other libraries. Python is often compared to R, but the two are nowhere near comparable in popularity: R comes fourth in overall usage (31%) and fifth in prioritisation (5%). R in fact has the lowest prioritisation-to-usage ratio of the five languages, with only 17% of the developers who use it prioritising it. This means that in most cases R is a complementary language, not a first choice. The same ratio for Python is 58%, by far the highest of the five, a clear indication that Python’s usage pattern is the exact opposite of R’s: not only is Python the most widely used language, it is also the primary choice for the majority of its users. C/C++ is a distant second to Python in both usage (44%) and prioritisation (19%). Java follows C/C++ very closely, while JavaScript comes fifth in usage, although with slightly better prioritisation than R (7%). We also asked respondents about other languages used in machine learning, including the usual suspects of Julia, Scala, Ruby, Octave, MATLAB and SAS, but they all fall below 5% in prioritisation and below 26% in usage. We therefore focused our attention on the top-5 languages.

Python is prioritised in applications where Java is not.

Our data shows that the most decisive factor when selecting a language for machine learning is the type of project you’ll be working on: your application area. In our survey we asked developers about 17 different application areas, while also giving respondents the option to tell us they’re still exploring options and not yet actively working in any area. Here we present the top and bottom three areas per language: the ones where developers prioritise each language the most and the least.
Machine learning scientists working on sentiment analysis prioritise Python (44%) and R (11%) more, and JavaScript (2%) and Java (15%) less, than developers working in other areas. In contrast, Java is prioritised more by those working on network security / cyber attacks and fraud detection, the two areas where Python is the least prioritised. Network security and fraud detection algorithms are mostly built or consumed in large organisations, especially financial institutions, where Java is a favourite of most internal development teams. In areas that are less enterprise-focused, such as natural language processing (NLP) and sentiment analysis, developers opt for Python, which, thanks to its extensive collection of specialised libraries, offers an easier and faster way to build high-performing algorithms.
Artificial Intelligence (AI) in games (29%) and robot locomotion (27%) are the two areas where C/C++ is favoured the most, given the level of control, performance and efficiency they require. Here a lower-level language such as C/C++, which comes with highly sophisticated AI libraries, is a natural choice, while R, designed for statistical analysis and visualisation, is deemed mostly irrelevant: AI in games (3%) and robot locomotion (1%) are the two areas where R is prioritised the least, followed by speech recognition, where the situation is similar.
Besides sentiment analysis, R is also prioritised relatively highly in bioengineering and bioinformatics (11%), an area where neither Java nor JavaScript is favoured. Given R’s long-standing use in biomedical statistics, both inside and outside academia, it’s no surprise that this is one of the areas where it is prioritised the most. Finally, our data shows that developers new to data science and machine learning who are still exploring options prioritise JavaScript more than others do (11%) and Java less (13%). In many cases these are developers experimenting with machine learning by calling a third-party machine learning API from a web application.