
Bias Variance Tradeoff – Clearly Explained


https://www.machinelearningplus.com/machine-learning/bias-variance-tradeoff/

The bias-variance tradeoff is a design consideration when training a machine learning model. Certain algorithms inherently have high bias and low variance, and vice-versa. In this article, the concept of the bias-variance tradeoff is clearly explained so you can make an informed decision when training your ML models.

Contents

  1. Introduction
  2. What exactly is Bias?
  3. What is the Variance Error?
  4. Example of High Bias and Low Variance
  5. Example of Low Bias and High Variance
  6. Bias – Variance Tradeoff
  7. How to fix bias and variance problems?

Introduction

A machine learning model's performance is evaluated based on how accurate its predictions are and how well it generalizes to an independent dataset it has not seen before.

The errors in a machine learning model can be broken down into 2 parts:

  1. Reducible Error
  2. Irreducible Error

Irreducible error is error that cannot be reduced no matter which machine learning model you use.

Reducible error, on the other hand, can be further broken down into the square of the bias and the variance. It is this bias and variance that cause a machine learning model to either overfit or underfit the given data. I will discuss both in detail in this article.
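For squared-error loss, this is the standard textbook decomposition (stated here explicitly; the article gives it only in words). For a model \( \hat{f} \) estimating a true function \( f \) with noise variance \( \sigma^2 \):

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
\]

The expectation is taken over training sets: bias measures how far the average fit is from the truth, while variance measures how much the fit wobbles from one training set to another.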

What exactly is Bias?

Bias is the inability of a machine learning model to capture the true relationship between the data variables. It is caused by the erroneous assumptions that are inherent to the learning algorithm. For example, in linear regression, the relationship between the X and the Y variable is assumed to be linear, when in reality the relationship may not be perfectly linear.

Let's look at an example with an artificial dataset whose variables are study hours and marks.

This graph shows the original relationship between the variables. Notice that there is a limit to the marks you can get on the test: even if you study an extraordinary amount of time, there is always a certain 'maximum mark' you can score. You can see the line flattening beyond a certain value on the X-axis, so the relationship is only piecewise linear. This sort of relationship will not be captured by a vanilla linear regression model.

You can expect an algorithm like linear regression to have high bias error, whereas an algorithm like a decision tree has lower bias. Why? Because decision trees don't make such hard assumptions about the form of the relationship. The same is true of algorithms like k-Nearest Neighbours, Support Vector Machines, etc.

[Figure: scatter plot of study hours vs. marks, rising linearly and then flattening after a point (piecewise-linear relationship)]
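To make this concrete, here is a minimal sketch, assuming scikit-learn and NumPy (the study-hours/marks numbers are invented for illustration), that fits both a straight line and a shallow decision tree to piecewise-linear data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical study-hours/marks data: marks rise roughly linearly with
# study hours, then flatten near a maximum score (piecewise linear).
hours = rng.uniform(0, 12, 80).reshape(-1, 1)
marks = np.minimum(10 * hours.ravel(), 90) + rng.normal(0, 3, 80)

# A straight line cannot bend at the flattening point (high bias).
lin = LinearRegression().fit(hours, marks)

# A shallow decision tree makes no linearity assumption (lower bias).
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(hours, marks)

print("linear MSE:", mean_squared_error(marks, lin.predict(hours)))
print("tree MSE:  ", mean_squared_error(marks, tree.predict(hours)))
```

The straight line misses the flattening region entirely, while the tree approximates it with stepwise splits, so the tree's training error comes out lower.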

In general,

High bias indicates more assumptions in the learning algorithm about the relationships between the variables; low bias indicates fewer assumptions.

What is the Variance Error?

Variance error is nothing but the model overfitting a particular dataset. If a model learns to fit the points of one dataset very closely, then when it is used to predict on another dataset it may not be as accurate as it was on the first.

Variance is the difference in the fits between different datasets.

Generally, nonlinear machine learning algorithms like decision trees have a high variance. It is even higher if the branches are not pruned during training.

Low-variance ML algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis.

High-variance ML algorithms: Decision Trees, k-NN, and Support Vector Machines.

Let's look at the same dataset and try to fit the training data better, using more complex functions to reduce the error.

[Figure: a complex, wiggly curve fitting the training points almost perfectly]

Notice that we have nearly zero error on the training data. Now let's apply this curve to the test data.

[Figure: the same wiggly curve on the test data, with visibly larger errors]

The errors on the test data are larger in this case. If the errors differ greatly between datasets, the model has high variance. At the same time, this type of curvy model has low bias, because it is able to capture the relationships in the training data, unlike the straight line.
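A quick way to see this numerically is to compare training and test error of a high-degree polynomial fit. This is a sketch assuming scikit-learn, using the same invented study-hours/marks setup as before:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
hours = rng.uniform(0, 12, 60).reshape(-1, 1)
marks = np.minimum(10 * hours.ravel(), 90) + rng.normal(0, 3, 60)

X_train, X_test, y_train, y_test = train_test_split(
    hours, marks, test_size=0.3, random_state=1)

# A high-degree polynomial chases the training points very closely.
wiggly = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
wiggly.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, wiggly.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, wiggly.predict(X_test)))
# A large gap between train and test error is the signature of high variance.
```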

Example of High Bias and Low Variance: Linear Regression Underfitting the Data

If a model has high bias, it implies that the model is too simple and does not capture the relationship between the variables. This is called underfitting the data. You can think of fitting a straight line, as linear regression does, as underfitting the data.

Bias – Variance Tradeoff

[Figure: bias-variance tradeoff — as model complexity grows, bias falls while variance rises]

Let’s summarize:

If a model uses a simple machine learning algorithm, as in the case of the linear model above, it will have high bias and low variance (underfitting the data).

If a model uses a complex machine learning algorithm, it will have high variance and low bias (overfitting the data).

You need to find a good balance between the bias and the variance of the model you have used. This tradeoff in complexity is what is referred to as the bias-variance tradeoff. An optimally balanced model should neither overfit nor underfit the data.

This tradeoff applies to all forms of supervised learning: classification, regression, and structured output learning.
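One common way to locate the balance is to sweep model complexity and watch the validation error. This is a sketch, again assuming scikit-learn, with cross-validated MSE standing in for "error on an independent dataset":

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
hours = rng.uniform(0, 12, 120).reshape(-1, 1)
marks = np.minimum(10 * hours.ravel(), 90) + rng.normal(0, 3, 120)

# Low degrees underfit (high bias); high degrees overfit (high variance).
# The degree with the lowest cross-validated MSE sits near the sweet spot.
for degree in (1, 2, 3, 5, 8, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, hours, marks,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"degree {degree:2d}: CV MSE = {mse:.1f}")
```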

How to fix bias and variance problems?

Fixing High Bias

Adding more input features will help the model fit the data better.

Add more polynomial features to increase the complexity of the model.

Decrease the regularization term: less regularization lets the model fit the data more flexibly, reducing bias at the cost of some variance, as the sketch below shows.
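Here is a sketch of the regularization knob, assuming scikit-learn's Ridge (the alpha values are arbitrary): large alpha pushes the model toward high bias, small alpha toward high variance.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(0, 12, 80).reshape(-1, 1)
y = np.minimum(10 * X.ravel(), 90) + rng.normal(0, 3, 80)

# alpha controls the bias-variance balance of a flexible model:
# large alpha -> heavy shrinkage (more bias), small alpha -> more variance.
for alpha in (100.0, 1.0, 0.01):
    model = make_pipeline(PolynomialFeatures(8), StandardScaler(),
                          Ridge(alpha=alpha))
    mse = -cross_val_score(model, X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"alpha={alpha:>6}: CV MSE = {mse:.1f}")
```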

Fixing High Variance

Reduce the number of input features, using only the features with the highest feature importance, to limit overfitting.

Getting more training data will also help, because a high-variance model will not work well on an independent dataset if it was trained on very little data; see the sketch below.
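As a sketch of the "more data" fix, assuming scikit-learn's learning_curve (the training-set sizes are illustrative): for a high-variance model, cross-validated error usually falls as the training set grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(0, 12, 400).reshape(-1, 1)
y = np.minimum(10 * X.ravel(), 90) + rng.normal(0, 3, 400)

model = make_pipeline(PolynomialFeatures(8), LinearRegression())
sizes, _, val_scores = learning_curve(
    model, X, y, train_sizes=[40, 100, 200, 320],
    scoring="neg_mean_squared_error", cv=5)

# As the training set grows, the flexible model's fit depends less on
# the noise in any particular sample, so validation error drops.
for n, s in zip(sizes, -val_scores.mean(axis=1)):
    print(f"n={n:3d}: CV MSE = {s:.1f}")
```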
