Less than 10 years ago, with a few notable exceptions, there was little content available online about “cutting-edge” machine learning.
There was no GitHub, no Kaggle, no cloud services, very little educational content on YouTube, few vibrant developer communities and few company-sponsored open source libraries. The statistical software packages used by professionals were largely MATLAB, SAS, SPSS and Stata, and these were mostly unavailable to the general public.
So much has changed since then. For the past couple of years, it has been an amazing, dazzling and awesome time to be learning, experimenting and working in Machine Learning! You can start a course, MOOC or lecture today with little prior knowledge and by the end of the day, you will likely be able to train a useful image classifier or apply powerful language processing!
Libraries and Platforms
The competition for developer mindshare between the massive data companies has led to an overabundance of open source libraries, pre-trained models and extremely affordable web services and ML platforms.
Between Google, Amazon, Microsoft, IBM, Facebook and Baidu, to name just a few, there are hundreds of services, libraries and training datasets available for most machine learning tasks that you might want to tackle.
Compute and Storage
Compute and storage have become incredibly accessible and affordable. Services such as AWS EC2, AWS S3, Google Bigtable and their equivalents, together with extremely powerful pre-trained models, put large-scale compute and storage within reach of almost everyone for only a few dollars.
While some see a “paradox of choice” in the available options for any given task, I see instead a competitive landscape that mostly empowers developers and data scientists and prevents (for the time being) a monopolization by any of the large players. Looking at Machine Learning frameworks alone, there is a staggering array of mature and performant general-purpose (largely open source) options available: TensorFlow, Torch, MXNet, Caffe, scikit-learn, Spark MLlib, Mahout, Weka, Chainer, Amazon AML, Microsoft CNTK, Deeplearning4j, H2O, BigDL, etc.
The breakthroughs, popularization and applications of Deep Learning together with the availability of powerful, affordable GPUs have revolutionized previously relatively stagnant fields of research such as speech recognition/synthesis, computer vision, natural language processing, machine translation and reinforcement learning.
Training Data and Pre-Trained Models
Training and test data are often among the most difficult resources to acquire; in many cases, individual developers cannot acquire them at all. However, these days large companies and other organisations make much of their training data, or the resulting pre-trained models, publicly available. Models such as InceptionNet, SyntaxNet, DeepMask, WaveNet and SentimentNeuron can then be extended and repurposed through transfer learning.
Never before has so much data been available to the general public. Many services and websites, such as Twitter, GitHub, GDELT, Stack Overflow, Hacker News, Reddit, Wikipedia, Common Crawl, GeoNames and OpenStreetMap, make their live data, or at least large subsets of it, available through APIs.
In addition, many datasets can be found on AWS Open Data and Google Public Datasets. This data availability together with cloud compute capabilities makes it possible to analyze terabytes of data and to build your own powerful models.
At the other end of the spectrum, the amount of educational material available online about Data Science, Machine Learning, Deep Learning and Natural Language Processing is astounding:
- The Kaggle competitions and forums are probably the best places for practitioners to learn from other practitioners
- Platforms like edX, Coursera and Udacity offer (free) high-quality courses and MOOCs that provide most of the skills needed to become proficient at Machine Learning
- Leading universities and organizations like Microsoft, Nvidia, OpenAI and fast.ai are making very high-quality learning resources available for free online
Where to go from here?
The time is ripe for digital disruption using Machine Learning, Deep Learning and AI. All the stars are aligned to enable people all over the world, in their bedrooms rather than their garages, to upskill in Machine Learning, to apply existing libraries and models to new domains and to leverage all the available training data.
I believe that many of the innovations and disruptions of the coming decade will not be driven exclusively by the Frightful Five of Amazon, Apple, Facebook, Microsoft and Google, but rather by startups, entrepreneurs and individuals who are able to leverage all the resources at their disposal and who recognize the burning business needs and opportunities that can be addressed using the already-existing Machine Learning capabilities.
In this spirit, I am proud to support the AI Awards, to celebrate the AI and ML activities already taking place in Ireland and to encourage even more startups, companies and individuals to engage in the exciting and rewarding field of Machine Learning.
By Johannes Ahlmann
CEO of fluquid.com, engaging in large-scale Web Crawling, Data Mining and Natural Language Processing. He is also the founder of webdata.org where he is working to create a community around making Machine Learning resources even more available and approachable for individuals and Small and Medium Enterprises.