Bernard Marr* says Machine Learning and Big Data are increasingly heard in business speak, but often their respective functions are misunderstood.
Big Data and Machine Learning are two exciting applications of technology that are often mentioned in the same breath.
In reality, there are important distinctions that need to be understood when we are making decisions about our business data strategy.
Today, data is often described as the fuel (or oil) of the information age.
It’s what powers our ability to build tools and platforms that can change the world through analytics and increasingly accurate modelling and forecasting.
For an easy example, look at the speed at which pharmaceutical companies were able to create entirely new vaccines against COVID-19.
At the start of the pandemic, we regularly heard that it was usual to take up to 10 years to develop a new vaccine.
The far faster pace achieved during 2020 was largely due to advances over the last decade in our ability to collect and process data.
If the pandemic had broken out as recently as 2010, it would have taken far longer to crack the problem.
Was it Big Data that made it possible, or Machine Learning? In truth, it was a bit of both.
Let’s start by defining what each term refers to, then look at how you can decide which one will work best for you.
Big Data is something of a catch-all term that refers to the vast increase in information that’s being created and pumped into the world, as well as the tools, techniques, and methodologies that have been developed to make use of it.
Big Data was first identified as a powerful force for change around the time the internet started to become a tool for everyday life, rather than a niche project.
A key concept to understand in order to ‘get’ what is meant when we talk about Big Data is that it’s about far more than the size of the data.
An early attempt at defining it suggested that there were three ‘V’s that had to be present for a data project to be considered Big Data.
These were volume (size), variety (the data will be of different types), and velocity (the dataset is quickly growing or changing).
Other important concepts to understand include the difference between structured data (information such as numbers that fits into database tables) and unstructured data (information like pictures, video, and speech, that doesn’t).
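The difference is easy to see in a few lines of Python. The sketch below is only an illustration: the customer table and the email text are made-up examples, not data from any real system. The structured records can be queried by field name straight away, while the free text has no fields to query.

```python
import csv
import io

# Structured data: every record has the same named fields, so it fits a table.
# (Hypothetical customer records, invented for this illustration.)
structured_csv = "customer_id,age,monthly_spend\n1,34,120.50\n2,51,89.99\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
average_spend = sum(float(row["monthly_spend"]) for row in rows) / len(rows)

# Unstructured data: free text has no fixed fields to look up.
# The most we can do without further analysis is treat it as a run of words.
unstructured_text = "Hi, my order arrived late and the box was damaged. Please advise."
word_count = len(unstructured_text.split())

print(f"Average monthly spend: {average_spend:.2f}")
print(f"Words in the email: {word_count}")
```

Answering a question such as "is this customer unhappy?" from the email text needs extra processing, which is where the techniques described next come in.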
Machine Learning, on the other hand, refers to a type of computer algorithm that can be thought of as a subset of another loosely defined term, artificial intelligence (AI).
Machine learning is specifically concerned with creating programs that can get better at performing a task as they are fed increasing amounts of information.
An important concept to understand is the difference between supervised and unsupervised learning.
Supervised learning involves training algorithms with labelled data, so they can immediately ‘know’ whether they carried out a particular operation correctly.
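As a minimal sketch of supervised learning, the Python below trains a very simple classifier (a nearest-centroid rule, chosen here only because it is short) on labelled examples. The data, the "short" and "long" labels, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

def train_nearest_centroid(examples):
    """Average the values seen for each label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in examples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Return the label whose average is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical labelled data: (email length in words, label).
labelled = [(12, "short"), (15, "short"), (18, "short"),
            (210, "long"), (250, "long"), (300, "long")]
centroids = train_nearest_centroid(labelled)

# Because the correct labels are known, predictions can be checked at once.
correct = sum(predict(centroids, value) == label for value, label in labelled)
accuracy = correct / len(labelled)
print(f"Accuracy on the labelled examples: {accuracy:.0%}")
```

The key point is the last step: with labels available, the program can measure how often it was right.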
Unsupervised learning involves data that is not labelled, so the algorithm is never explicitly told whether its output is correct or incorrect.
All decisions are made based on what the algorithm can determine from the data itself, and its relationship to other pieces of data the algorithm has been fed.
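A rough sketch of the unsupervised case is shown below, using a stripped-down version of k-means clustering with two groups. The spending figures and the function name are hypothetical, and the code assumes the list holds at least two different values. No labels are supplied, so the groups come only from how the numbers relate to one another.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups without using any labels.

    Assumes the list contains at least two distinct values.
    """
    low, high = min(values), max(values)  # starting guesses for the two centres
    for _ in range(iterations):
        # Assign each value to whichever centre is nearer.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group

# Hypothetical monthly spend figures with no labels attached.
spend = [20, 22, 25, 180, 200, 210]
small, large = two_means(spend)
print("Group 1:", small)
print("Group 2:", large)
```

Nothing tells the program that these groups are "right"; deciding whether they mean anything for the business is left to a person.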
The truth is probably that you will get the best results by understanding and choosing the most relevant processes and practices from both disciplines.
It’s perfectly possible to use Big Data techniques and tools to extract insights and meaning from information and then use it to drive business growth and improve decision-making.
On the other hand, if you’re using machine learning methods, it’s most likely that your work will tick many of the boxes that qualify it as Big Data.
Most likely, you will be working with datasets that have volume, velocity, and variety.
This is because Machine Learning algorithms need to be trained on data, and in order to become effective, they need access to a lot of it.
Another way to think of it is that if you’re not working with Big Data, it’s unlikely that you’ll need to use Machine Learning.
The main benefit of Machine Learning is that it helps to extract value from datasets that are too complicated for ‘traditional’ computer and statistical analysis.
If your dataset is static, structured, and of a manageable size (such as fitting comfortably into an Excel sheet), then Machine Learning, which often requires a large amount of compute power, might be overkill.
Ultimately, Big Data and Machine Learning are two highly interdependent fields, but it’s important to remember that, by default, Big Data doesn’t necessarily mean ‘smart’.
Unlike Machine Learning, it doesn’t necessarily ‘learn’ anything, and the same algorithm will give you the same result again and again.
However, Machine Learning needs Big Data in order to work. Without that data, an algorithm is never going to ‘learn’ anything.
A final concept that can help you decide where to focus your efforts is automation.
This means creating processes that carry out tasks automatically, with no (or minimal) need for human input.
Setting up an out-of-office auto-reply email is an example of automation that doesn’t need any form of Machine Learning or AI.
However, if you want to set up more complex automations, such as varying the reply depending on the content of the email, you might want to look into Machine Learning.
With Machine Learning, it would be quite possible to create a program that would scan the contents of the email (unstructured data) and then send an appropriate response.
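The contrast between the two kinds of automation can be sketched in Python. Everything here is an assumption for illustration: the training emails, the "billing" and "support" categories, the reply texts, and the choice of a small naive Bayes word-count classifier. A real system would need far more training data.

```python
import math
from collections import Counter, defaultdict

def out_of_office_reply(_email_text):
    """Plain automation: the same reply whatever the email says."""
    return "I am out of the office and will reply on my return."

def train(examples):
    """Count how often each word appears under each label (naive Bayes)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    """Pick the label under which the email's words are most likely."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled emails and canned replies.
examples = [
    ("invoice payment overdue", "billing"),
    ("refund for my invoice", "billing"),
    ("cannot log in to my account", "support"),
    ("password reset not working", "support"),
]
replies = {
    "billing": "Thanks, your message has been passed to our accounts team.",
    "support": "Thanks, a support engineer will look into this.",
}
model = train(examples)

email = "question about my invoice payment"
print(out_of_office_reply(email))
print(replies[classify(model, email)])
```

The first function never changes its answer, while the second chooses a reply based on what it has learned from the example emails.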
You don’t always need both, but Big Data together with Machine Learning makes a very powerful combination.
*Bernard Marr is a bestselling business author and is recognised as an expert in strategy, performance management, analytics, KPIs and big data. He is the founder of Bernard Marr & Co and can be contacted at bernardmarr.com.
This article first appeared on LinkedIn.