To get the most out of our upcoming blogs about TensorFlow and BigQuery, it’s important to understand the concept of Machine Learning (ML). Machine Learning is a well-known branch of artificial intelligence (AI) that involves deriving meaning from available data. That data can be text, images, videos or speech, and analysing it can make life much easier by surfacing observations, predictions and trends. While humans can interpret data and come to relevant conclusions, the volume of data produced every day makes it nearly impossible for people to analyse it all and reach sound conclusions in a short period of time. This is why ML is the new go-to and plays such a vital role!
ML automates the process of analysing data and making forecasts, and it can adapt to continuously changing datasets with ease! Current applications of ML include text, image, speech and even voice recognition, fraud detection, recommendation systems and optimisation. There are also various tools that make integrating ML into different applications relatively easy. These include Google’s TensorFlow and BigQuery, which we’ll tell you all about in our next two blogs!
There are, however, several components that are necessary for Machine Learning to be integrated with an application:
- Firstly, data needs to be gathered. This can be data that is generated and collected over time, or data that is already available. It needs to be stored in one place, and it must be plentiful and of good quality so that the observations made using ML are sound and relevant. This speaks to a common computer science term, Garbage In, Garbage Out (GIGO): if poor-quality inputs are used, poor-quality outputs will result. An ML dataset consists of two key components: a target feature, which is what we are trying to predict or identify, and descriptive features, which are the properties and observations associated with that target feature.
- Secondly, after the data has been gathered, the dataset must undergo preparation and exploration. This involves checking the dataset for missing values and can include visualising it to see if there are any biases (skewness) or outliers. Missing values, biases and outliers can then be dealt with using ML best practices such as imputation, statistical methods and clamp transformations.
- The third component of ML is to split your dataset into a training set and a test set. The proportion of data in each depends on the type and amount of data in your dataset. The training set is used to train your ML model, which can be a neural network, k-nearest neighbours, a regression model and so on. An ML model is a representation of what is learnt from the dataset using a specific algorithm. The test set is used to evaluate the model’s predictions against data that hasn’t been used to train it; in essence, it shows how the model would perform in real life. You can then choose whichever ML model or algorithm you would like to use. Each model has a set of parameters that determine how it makes predictions, for example how to predict the target feature based on certain descriptive features.
Through training the model, appropriate values for these parameters can be found, which improves the performance of the model. That performance can then be evaluated using the test set. Finally, the settings that are chosen before training rather than learnt from the data, known as hyperparameters (the number of neighbours in k-nearest neighbours, for example), can be refined through a process called hyperparameter tuning. This is an experimental process in which different values are tried and adjusted according to the model’s performance. These values are unique to your model; the good news is that there is an extensive amount of research currently available on how to initialise them!
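To make the steps above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data is made up, the helper names (`impute_median`, `clamp`, `split`, `knn_predict`, `accuracy`, `run_demo`) are hypothetical, and k-nearest neighbours was picked only because it is short to write. The sketch also holds out a separate validation set for hyperparameter tuning so that the test set stays untouched until the final check, which is one common way to organise the experiment rather than the only one.

```python
import random
import statistics


def impute_median(values):
    """Fill missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def clamp(values, lower, upper):
    """Clamp transformation: pull outliers back into the [lower, upper] range."""
    return [min(max(v, lower), upper) for v in values]


def split(rows, fraction, seed):
    """Shuffle the rows and hold out `fraction` of them; returns (kept, held_out)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_held = max(1, round(len(shuffled) * fraction))
    return shuffled[n_held:], shuffled[:n_held]


def knn_predict(train, x, k):
    """Predict the target as the majority label among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    labels = [label for _, label in nearest]
    # Sorting the distinct labels keeps tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)


def accuracy(train, rows, k):
    """Fraction of rows whose target is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


def run_demo():
    # Made-up toy data: one descriptive feature and a yes/no target feature.
    features = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, None, 7.0, 8.0, 8.5, 9.0, 90.0]
    targets = ["no"] * 6 + ["yes"] * 6

    # Preparation: fill the missing value, then clamp the outlier (90.0).
    features = clamp(impute_median(features), 0.0, 10.0)
    rows = list(zip(features, targets))

    # Hold out a test set, then a validation set from what remains.
    rest, test = split(rows, 0.25, seed=1)
    train, validation = split(rest, 0.33, seed=2)

    # Hyperparameter tuning: keep the k that scores best on the validation set.
    best_k = max((1, 3, 5), key=lambda k: accuracy(train, validation, k))

    # Final check on data that played no part in training or tuning.
    return best_k, accuracy(train + validation, test, best_k)


if __name__ == "__main__":
    chosen_k, test_accuracy = run_demo()
    print(f"chosen k = {chosen_k}, test accuracy = {test_accuracy:.2f}")
```

With only twelve made-up rows the resulting accuracy says nothing about a real problem; the point is the order of the steps. In practice an ML library would take care of most of this for you.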
After training and testing, the ML model can be used in real-life applications to answer business questions, make predictions and tell cases apart swiftly and conveniently. ML can be integrated into any application for which an appropriate dataset is available, helping you make sound observations and well-informed business decisions that ultimately benefit your company!
Contact us to get your company started with ML for smarter, more convenient observations and informed business decisions. E-mail web@piidigital.co.za.