Modified on
02 Jan 2023 06:02 pm
Skill-Lync
Ensemble learning is a broad meta-approach to machine learning that aims to improve predictive performance by combining the predictions from several models.
Although you can construct a seemingly unlimited number of ensembles for any predictive modelling problem, three techniques dominate the field. Ensemble learning is a topic of study in its own right, one that has given rise to many specialised approaches rather than a single algorithm.
For any given dataset, there are multiple models that can be fit and used for predictions. Sometimes we may not know which model performs the best.
Instead of spending all that time fitting each candidate model and tuning its hyperparameters to find the single best one, we can train several models on the dataset and combine their predictions. Since an ensemble of models is involved, the approach is called ensemble learning.
Boosting, bagging, and stacking are three popular methods of combining machine learning models.
In boosting, multiple models are added sequentially: the second model corrects the errors of the first, the third corrects the errors of the second, and so on. Because each model keeps correcting its predecessor, the ensemble can eventually overfit, so training is usually stopped early. AdaBoost works with weighted datasets: the emphasis falls on the samples the previous models got wrong rather than on the ones they got right.
The base learner in AdaBoost is a decision tree with a single split; it is called a decision stump.
Gradient boosting extends AdaBoost by introducing an explicit loss function that is minimised to reduce error and control overfitting. XGBoost and LightGBM are two further methods that combine sequential boosting with loss functions.
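The boosting idea above can be sketched with scikit-learn, whose AdaBoostClassifier uses a depth-1 decision stump as its default base learner. The toy dataset and all parameter values here are illustrative assumptions, not part of the original article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative toy dataset (assumed for the sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost fits stumps sequentially; each new stump is trained on a
# reweighted dataset that emphasises previously misclassified samples.
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))
```

Keeping `n_estimators` modest is one simple way to stop the sequential correction before it overfits.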
In bagging, random samples of rows are drawn from the dataset and given as inputs to multiple decision trees, with each sample returned to the original dataset before the next draw. This sampling with replacement is called bootstrapping. The predictions of all the decision trees are then combined into a final decision using statistical techniques such as averaging (for regression) or voting (for classification).
A random forest extends bagging by bootstrapping features as well as rows: the first tree is trained on a random subset of rows and features, which are returned to the original dataset, and a second random subset (which may or may not overlap with the first) is drawn for the second tree. The process is repeated for n trees, and the final decision is again made by averaging or voting over the outputs of all trees. In both bagging and random forests, the models are trained in parallel.
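A minimal sketch of both techniques, assuming scikit-learn is available; the dataset and parameter choices are illustrative, not from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset (assumed for the sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagging: each tree sees a bootstrap sample (rows drawn with replacement).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        bootstrap=True, random_state=0).fit(X, y)

# Random forest: bootstraps rows AND considers a random subset of
# features at each split ("sqrt" of the feature count here).
rf = RandomForestClassifier(n_estimators=25, max_features="sqrt",
                            random_state=0).fit(X, y)

print("bagging:", bag.score(X, y), "random forest:", rf.score(X, y))
```

Because the trees are independent, both ensembles can be trained in parallel (e.g. via the `n_jobs` parameter).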
In voting, we take a majority rule over the predictions from multiple models. Several models are trained on the same dataset, each makes a prediction or classification, and the class with the maximum votes is the output; this is termed hard voting. Soft voting applies to classification problems where each model outputs a probability for every class: the label with the largest sum of probabilities across all models is the final output.
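The hard/soft distinction can be sketched with scikit-learn's VotingClassifier; the member models and dataset below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset (assumed for the sketch).
X, y = make_classification(n_samples=300, random_state=1)

members = [("lr", LogisticRegression(max_iter=1000)),
           ("dt", DecisionTreeClassifier(random_state=1)),
           ("knn", KNeighborsClassifier())]

# Hard voting: each model casts one vote; the majority class wins.
hard = VotingClassifier(members, voting="hard").fit(X, y)
# Soft voting: class probabilities are summed; the largest sum wins.
soft = VotingClassifier(members, voting="soft").fit(X, y)

print("hard:", hard.score(X, y), "soft:", soft.score(X, y))
```

Soft voting requires every member to expose class probabilities, which all three models here do.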
In stacking, a new ML algorithm is set up whose input is nothing but the output of various other ML algorithms: the predictions of the ensemble members are fed as input to this final model, which makes the decision. The final model could be linear regression in the case of a prediction (regression) task or logistic regression in the case of classification, although it is not a hard and fast rule to use these.
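A minimal stacking sketch, again assuming scikit-learn; the choice of base models here is illustrative, while the logistic-regression final estimator follows the common choice the text mentions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset (assumed for the sketch).
X, y = make_classification(n_samples=300, random_state=2)

# The base models' predictions become the features for the final
# estimator, which learns how to combine them into one decision.
stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=2)),
                ("svc", SVC(probability=True, random_state=2))],
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)

print("stacking accuracy:", stack.score(X, y))
```

Internally, scikit-learn generates the base models' outputs with cross-validation so the final estimator is not trained on predictions the base models made for data they already saw.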
Author
Navin Baskar