We keep gathering more and more data. At the moment, the total amount of stored data doubles almost every year. Innovations such as cheap data storage, cloud computing, Big Data, APIs and the Internet of Things make it ever cheaper and easier to store more data. However, data alone is of little use. Data only becomes useful when you can turn it into information.
This is exactly where machine learning can help us. Machine learning often sounds threatening: it evokes the idea of an uncontrollable, self-learning computer network taking over the world and all of humanity... just think of the infamous scenes of A. Schwarzenegger in the film The Terminator. Six Terminator films later, we have still not, as far as we know, been taken over by robots or computers. Although Elon Musk and Stephen Hawking have expressed legitimate concerns about killer robots, it cannot do much harm to dive a little deeper into what machine learning actually is... as long as we use it for conversion optimization rather than artificially intelligent weaponry. At least after reading this blog post you'll know what hit you when the killer robots do take over.
In this blog post we'll introduce basic machine learning concepts and the different types of machine learning algorithms, and we'll share some real-world example applications.
Think of a traditional computer program as a sort of box. At one end of the box you put in data. This input is processed inside the box so that the desired output comes out at the other end. The developer is responsible for the logic inside the box that ensures the input is correctly turned into output.
Machine learning turns this classic paradigm around. With machine learning we feed both the input and the matching output into the box. The machine learning algorithm inside the box then tries to work out “independently” which “rules” could have turned that input into that output. The algorithm can then use these rules to predict the output for new input. Increasingly, the dataset is divided into a training set and a test set: the algorithm learns from the training set, and the test set, which it has not seen during learning, shows whether what it learned also holds for new data.
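To make the contrast concrete, here is a minimal Python sketch under hypothetical assumptions: the input is the number of minutes a visitor spends on a website and the output is whether they converted. `handwritten_rule` stands for the traditional program, where the developer writes the logic. `learn_threshold` is a very simple learner (a one-split decision stump) that picks the threshold with the fewest mistakes on the training set. The data, the function names and the "every fourth example" split are illustrative only; in practice the split is usually random.

```python
# Hypothetical data: (minutes on site, converted?)
examples = [
    (1, False), (2, False), (3, False), (4, False),
    (5, False), (6, False), (7, False), (8, False),
    (11, True), (12, True), (13, True), (14, True),
    (15, True), (16, True), (17, True), (18, True),
]

def handwritten_rule(minutes):
    # Traditional program: the developer chooses the rule by hand.
    return minutes > 5

def learn_threshold(train):
    # Machine learning: derive the rule from input/output pairs.
    # Candidate thresholds lie halfway between consecutive observed values.
    values = sorted({x for x, _ in train})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    # Keep the threshold that misclassifies the fewest training examples.
    return min(candidates, key=lambda t: sum((x > t) != y for x, y in train))

def accuracy(predict, data):
    # Fraction of examples for which the prediction matches the actual output.
    return sum(predict(x) == y for x, y in data) / len(data)

# Deterministic hold-out split for this sketch: every fourth example is kept for testing.
train = [ex for i, ex in enumerate(examples) if i % 4 != 3]
test = [ex for i, ex in enumerate(examples) if i % 4 == 3]

threshold = learn_threshold(train)
print("learned threshold:", threshold)                                        # → 9.0
print("hand-written rule, test accuracy:", accuracy(handwritten_rule, test))  # → 0.75
print("learned rule, test accuracy:", accuracy(lambda x: x > threshold, test))  # → 1.0
```

The test set matters here: both rules look reasonable, but only the held-out examples reveal that the hand-picked threshold misclassifies a visitor who spent 8 minutes on the site.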
For example, imagine we have a dataset containing demographic data about users together with their income. A machine learning algorithm can learn from this dataset how to predict a person's income from their demographic profile.
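A minimal sketch of this idea, assuming a single hypothetical demographic feature (age) and made-up incomes: ordinary least-squares linear regression learns a slope and an intercept from the examples and then predicts the income of a new person. A real model would use many more features and a dedicated library; this standard-library version only shows the principle.

```python
def fit_linear_regression(xs, ys):
    # Ordinary least squares for one feature: choose the line that
    # minimises the sum of squared prediction errors.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: age and yearly income (in euros) of four users.
ages = [20, 30, 40, 50]
incomes = [25_000, 38_000, 44_000, 57_000]

slope, intercept = fit_linear_regression(ages, incomes)
predicted = slope * 60 + intercept
print(f"income ≈ {slope:.0f} * age + {intercept:.0f}")   # → income ≈ 1020 * age + 5300
print(f"predicted income at age 60: {predicted:.0f}")   # → 66500
```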
Whilst this may seem like only a small change, it has huge implications. Before, in order to solve a certain task, we had to understand exactly what was causing the underlying problem and know how to solve it. For a given problem, you would first have to figure out which factors do or do not influence the solution. Next, you needed to test hypotheses about the factors that could have an influence, to find out exactly what each of them affects. The big difference, and also the benefit, of machine learning is that it is no longer necessary to first find out what the original cause of the problem is. Instead, the algorithm learns the relationship between input and output directly from large amounts of data, which is now easy to gather and process.
Machine learning algorithms, such as decision trees, can therefore support technical as well as statistical analysis and offer a simple way to test many possible combinations of factors. By learning from historical data, these algorithms can predict outputs from the input variables without having any knowledge or understanding of why the rules they found deliver the correct results. So “what” instead of “why”: in other words, correlation rather than causality. For example, a doctor would no longer have to investigate different hypotheses in order to predict whether someone is likely to develop a certain illness. Instead, a computer program could analyse a large set of historical data and use it to estimate the probability of that happening.
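The “what instead of why” idea can be illustrated with the simplest possible data-driven estimate: count how often the illness occurred among historical patients with the same profile. The records, the two profile features and the function name below are hypothetical. A real model would generalise across profiles rather than only look up exact matches, but the principle is the same: the estimate reflects correlation in the data, not a causal explanation.

```python
# Hypothetical historical records: (smoker, age group, developed the illness)
records = [
    (True, "50+", True),
    (True, "50+", True),
    (True, "50+", False),
    (True, "under 50", False),
    (False, "50+", False),
    (False, "50+", True),
    (False, "under 50", False),
    (False, "under 50", False),
]

def estimate_probability(records, smoker, age_group):
    """Share of matching historical patients who developed the illness."""
    outcomes = [ill for s, age, ill in records if s == smoker and age == age_group]
    if not outcomes:
        return None  # no comparable patients in the data
    return sum(outcomes) / len(outcomes)

print(estimate_probability(records, smoker=True, age_group="50+"))        # 2 of 3 matches, about 0.67
print(estimate_probability(records, smoker=False, age_group="under 50"))  # → 0.0
```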
Machine learning algorithms can broadly be divided into two types: supervised and unsupervised algorithms.
A supervised algorithm is one that you guide by providing both the input data and the actual outputs. With an unsupervised algorithm you do not provide the outputs; the algorithm draws its own conclusions based on the input data alone. Along with each predicted output, many algorithms also report the probability that the output is correct.
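Supervised algorithms can be further divided into classification algorithms, which predict a category (for example, whether a lead will convert), and regression algorithms, which predict a numeric value (for example, a person's income).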
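One algorithm that can do both is k-nearest neighbours, shown here as a minimal sketch chosen for illustration. It makes a prediction for a new customer by looking at the k most similar known customers: for classification it takes a majority vote over their labels, for regression it averages their values. The customer features (age, visits per month), labels and amounts are hypothetical.

```python
import math
from collections import Counter

def nearest(train, query, k):
    # The k training examples closest to the query (Euclidean distance).
    return sorted(train, key=lambda example: math.dist(example[0], query))[:k]

def knn_classify(train, query, k=3):
    # Classification: majority vote over the neighbours' labels.
    labels = [label for _, label in nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    # Regression: average of the neighbours' numeric values.
    values = [value for _, value in nearest(train, query, k)]
    return sum(values) / len(values)

# Hypothetical customers: (age, visits per month)
customers = [(25, 1), (30, 2), (28, 1), (45, 8), (50, 9), (48, 7)]
converted = ["no", "no", "no", "yes", "yes", "yes"]        # a category
monthly_spend = [20.0, 35.0, 25.0, 120.0, 150.0, 130.0]   # a number

new_customer = (47, 8)
print(knn_classify(list(zip(customers, converted)), new_customer))   # → yes
print(knn_regress(list(zip(customers, monthly_spend)), new_customer))
```

The only difference between the two tasks is how the neighbours' outputs are combined, which is why the same idea shows up in both categories.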
Unsupervised algorithms are designed to draw conclusions from the data without “help”. Suppose, for example, that you have an Excel sheet with client details: each column holds a different detail and each row is a client. An unsupervised algorithm could independently divide these clients into groups of similar clients based on one or more of these details. This is known as clustering, and it allows you to use your client data more effectively.
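Clustering can be sketched with k-means, one common clustering algorithm chosen here for illustration. It alternates two steps: assign every client to the nearest group centre, then move each centre to the average of its clients. The two client details (orders per year, average order value in hundreds of euros), the data and the simple “first k clients” initialisation are assumptions for this sketch; in practice you would scale the features and use a smarter initialisation.

```python
import math

def kmeans(points, k, iterations=10):
    # Simple deterministic initialisation for this sketch: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[closest].append(p)
        # Step 2: move each centroid to the mean of its assigned points.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    assignments = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, assignments

# Hypothetical clients: (orders per year, average order value in hundreds of euros)
clients = [(2, 1.0), (30, 8.0), (3, 1.5), (28, 7.5), (1, 0.5), (35, 9.0)]
centroids, groups = kmeans(clients, k=2)
print(groups)  # → [0, 1, 0, 1, 0, 1]
```

No labels were provided: the algorithm found the occasional small buyers and the frequent large buyers on its own.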
There is more than one way to implement and use each of these types of algorithms: there are different decision tree algorithms, probabilistic networks, neural networks, genetic algorithms, etc. We also often combine different algorithms in order to make even stronger predictions.
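Combining algorithms can be as simple as letting several models vote. The sketch below assumes three hypothetical, deliberately crude lead-scoring rules and combines them by majority vote. Real ensembles (for example random forests or boosting) combine many learned models, but the voting principle is similar.

```python
from collections import Counter

# Three hypothetical, deliberately simple "models" that each predict
# whether a lead will convert.
def visited_pricing_page(lead):
    return lead["visited_pricing"]

def long_session(lead):
    return lead["minutes_on_site"] > 5

def came_from_newsletter(lead):
    return lead["from_newsletter"]

MODELS = [visited_pricing_page, long_session, came_from_newsletter]

def ensemble_predict(lead, models=MODELS):
    # Majority vote: the prediction most models agree on wins.
    votes = Counter(model(lead) for model in models)
    return votes.most_common(1)[0][0]

lead = {"visited_pricing": True, "minutes_on_site": 3, "from_newsletter": True}
print([model(lead) for model in MODELS])  # → [True, False, True]
print(ensemble_predict(lead))             # → True
```

With an odd number of yes/no voters there can be no tie, which is one reason small ensembles are often given an odd size.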
Because conditions change over time, it is important to keep providing the model with new data. We call this retraining. Besides retraining with new data, it is also important to feed back the cases where the system's prediction was wrong, together with the correct outcome. Unlike people, a machine learning algorithm systematically learns from the mistakes it is shown, which makes repeating them less likely, although not impossible. In this way there is a constant feedback loop and the algorithms keep improving their predictions.
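The feedback loop can be illustrated with the perceptron, one of the simplest learning rules, used here purely as an illustration. It changes its weights only when a prediction turns out to be wrong, so every mistake fed back into the system shifts the model. The two binary features and the feedback stream are hypothetical. On data that a straight line can separate, the perceptron is known to stop making mistakes after a finite number of updates; on messier data it keeps making some.

```python
# Hypothetical feedback stream: features are [visited pricing page, opened newsletter]
# (1 = yes, 0 = no); the label records whether the lead actually converted.
feedback = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]

def predict(weights, bias, features):
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0

def learn_from_feedback(weights, bias, features, actual, learning_rate=1.0):
    # Perceptron rule: the model only changes when its prediction was wrong.
    error = actual - predict(weights, bias, features)
    if error != 0:
        weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        bias += learning_rate * error
    return weights, bias

weights, bias = [0.0, 0.0], 0.0
mistakes_per_pass = []
for _ in range(2):  # feed the same feedback twice, as in periodic retraining
    mistakes = 0
    for features, actual in feedback:
        if predict(weights, bias, features) != actual:
            mistakes += 1
        weights, bias = learn_from_feedback(weights, bias, features, actual)
    mistakes_per_pass.append(mistakes)

print(mistakes_per_pass)  # → [4, 0]
```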
In the previous section, we briefly explained what machine learning is and the different types of algorithms that exist. Now it’s high time to discover how we can use machine learning.
A common application of machine learning is the recommendation engine. Think, for example, of restaurant suggestions based on a user's profile and preferences. Probably the best-known and one of the most successful recommendation engines is the “Buyers who bought this, also bought:...” feature on Amazon. This recommendation engine has been reported to increase Amazon's revenue by 29%. In the past, these complex algorithms could only be used by internet giants such as Amazon, but now the barrier to entry for these techniques is low, thanks to the rapid adoption of the cloud.
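The core idea behind a “Buyers who bought this, also bought” list can be sketched with simple co-occurrence counting: for every pair of products bought together, increase a counter, and recommend the products most often bought alongside a given item. This is a simplification for illustration, not a description of Amazon's actual system; the orders and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_purchases(orders):
    # co_counts[a][b] = number of orders containing both product a and product b.
    co_counts = defaultdict(Counter)
    for order in orders:
        unique_items = list(dict.fromkeys(order))  # de-duplicate, keep order
        for a, b in permutations(unique_items, 2):
            co_counts[a][b] += 1
    return co_counts

def also_bought(co_counts, item, n=2):
    # Products most frequently bought together with `item`.
    return [other for other, _ in co_counts[item].most_common(n)]

orders = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
co_counts = build_co_purchases(orders)
print(also_bought(co_counts, "tent"))  # → ['sleeping bag', 'lantern']
```

Ties (here `lantern` and `stove`, each bought once with a tent) are returned in the order they were first seen, because that is how `Counter.most_common` orders equal counts.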
The following are some short concrete examples of how Machine Learning can be used:
Unlike a cohort analysis, where you have to define the target groups yourself, machine learning works out the target groups on its own, based on data rather than assumptions.
Classify the leads that arrive on your website in real time in order to optimize the conversion rate.
Based on historical data, a machine learning algorithm can predict whether a price is likely to increase or decrease.
With machine learning you can adjust the product offering in real time, based on the user profile, in order to maximize the chance of conversion. Multi-armed bandits can perform better than A/B tests in complicated scenarios, because they shift traffic towards the best-performing variants while the test is still running.
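A multi-armed bandit can be sketched with the epsilon-greedy strategy, one of several bandit algorithms, chosen here because it is the simplest. Most of the time it shows the variant with the best conversion rate observed so far, and with a small probability `epsilon` it tries a random variant to keep learning. Unlike a classic A/B test with a fixed traffic split, it sends more visitors to the better variant while the test is running. The conversion rates are hypothetical and only used to simulate visitors; the algorithm itself never sees them.

```python
import random

def epsilon_greedy(conversion_rates, rounds=5000, epsilon=0.1, seed=42):
    # conversion_rates are hypothetical "true" rates per page variant. The
    # algorithm never reads them directly; they only simulate visitor behaviour.
    rng = random.Random(seed)
    n_variants = len(conversion_rates)
    shown = [0] * n_variants          # how often each variant was shown
    estimates = [0.0] * n_variants    # observed conversion rate per variant
    for _ in range(rounds):
        if rng.random() < epsilon:
            variant = rng.randrange(n_variants)                            # explore
        else:
            variant = max(range(n_variants), key=lambda v: estimates[v])  # exploit
        converted = 1.0 if rng.random() < conversion_rates[variant] else 0.0
        shown[variant] += 1
        # Update the running average conversion rate of the chosen variant.
        estimates[variant] += (converted - estimates[variant]) / shown[variant]
    return shown, estimates

shown, estimates = epsilon_greedy([0.03, 0.05, 0.10])
print("times shown per variant:", shown)
print("estimated conversion rates:", [round(e, 3) for e in estimates])
```

The value of `epsilon` is the design trade-off: a higher value learns faster about weaker variants but shows them to more visitors.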
In this blog post we have explained in a simple way what machine learning is and why it is relevant in today's world. Has this blog post caught your interest? Are you curious about how machine learning can help your organization achieve its growth ambitions? Come by our Rotterdam office for a cup of coffee. Based on historical data sets, we can predict whether you take your coffee with or without milk and sugar ;-).