Overfitting in machine learning
Overfitting refers to a model that fits the training data too closely. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.
For example, decision trees are a nonparametric machine learning algorithm that is very flexible and therefore prone to overfitting the training data. This problem can be addressed by pruning a tree after it has been trained, in order to remove some of the detail it has picked up.
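To make the pruning idea concrete, here is a minimal sketch in plain Python. It assumes a toy one-feature dataset with one deliberately mislabeled point, and it uses reduced-error pruning (collapse a subtree into a leaf whenever that does not increase the error on a separate validation set). That is only one of several pruning methods, picked here because it is short. The helper names (build_tree, prune and so on) are made up for this illustration.

```python
from collections import Counter


def majority(points):
    """Most common label among (x, label) pairs."""
    return Counter(label for _, label in points).most_common(1)[0][0]


def misclassified(points, label):
    """Number of points whose label differs from the given label."""
    return sum(1 for _, y in points if y != label)


def build_tree(points):
    """Grow a tree on one feature until every leaf is pure (this overfits on purpose)."""
    node = {"label": majority(points)}
    xs = sorted({x for x, _ in points})
    if len({y for _, y in points}) == 1 or len(xs) == 1:
        return node
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        errors = (misclassified(left, majority(left))
                  + misclassified(right, majority(right)))
        if best is None or errors < best[0]:
            best = (errors, t, left, right)
    _, t, left, right = best
    node.update(threshold=t, left=build_tree(left), right=build_tree(right))
    return node


def predict(node, x):
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["label"]


def count_leaves(node):
    if "threshold" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


def prune(node, validation):
    """Reduced-error pruning, bottom-up: keep a split only if it helps on validation data."""
    if "threshold" not in node:
        return node
    t = node["threshold"]
    node["left"] = prune(node["left"], [p for p in validation if p[0] <= t])
    node["right"] = prune(node["right"], [p for p in validation if p[0] > t])
    subtree_errors = sum(1 for x, y in validation if predict(node, x) != y)
    leaf_errors = misclassified(validation, node["label"])
    if leaf_errors <= subtree_errors:
        return {"label": node["label"]}  # collapse the subtree into a single leaf
    return node


# Toy data (an assumption for this sketch): label is 1 when x >= 5,
# except for one noisy training point at x = 2.
train = [(float(x), 1 if x >= 5 else 0) for x in range(10)]
train[2] = (2.0, 1)
validation = [(x + 0.2, 1 if x >= 5 else 0) for x in range(10)]

tree = build_tree(train)
leaves_before = count_leaves(tree)
errors_before = sum(1 for x, y in validation if predict(tree, x) != y)

tree = prune(tree, validation)
leaves_after = count_leaves(tree)
errors_after = sum(1 for x, y in validation if predict(tree, x) != y)

print("leaves:", leaves_before, "->", leaves_after)
print("validation errors:", errors_before, "->", errors_after)
```

The unpruned tree carves out a small region just to fit the noisy point, and pruning removes that region because it does not help on the validation data.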
Techniques to reduce overfitting:-
There are several techniques to reduce overfitting in machine learning.
1) Cross Validation
2) L1/L2 regularization
3) Feature Selection
4) Hold Out
5) Dropout
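1) Cross Validation:- One of the most powerful techniques to reduce overfitting is cross validation. The idea behind this is to use the initial training data to generate several mini train-test splits and then use these splits to tune your model. In k-fold cross validation, for example, the data is divided into k folds, and each fold is used once for validation while the model is trained on the remaining folds.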
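The mini train-test splits described above can be sketched in plain Python as follows. This is only an illustration: the helper names (k_fold_indices, cross_val_mse) and the toy "line through the origin" model are assumptions made up for this example, and the data is not shuffled, which you would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_val_mse(xs, ys, k, fit, predict):
    """Average validation mean squared error over the k folds."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in val_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical toy model: a line through the origin, y = slope * x.
def fit_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict_slope(slope, x):
    return slope * x


xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]

folds = list(k_fold_indices(len(xs), k=5))
score = cross_val_mse(xs, ys, 5, fit_slope, predict_slope)
print("first fold (train, validation):", folds[0])
print("average validation MSE:", score)
```

To tune a model, you would compute this score for each candidate setting and keep the setting with the lowest average validation error.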
2) L1/L2 regularization:- Regularization is a technique to constrain our network from learning a model that is too complex, which may therefore overfit. In L1 or L2 regularization, we add a penalty term to the cost function that pushes the estimated coefficients towards zero (so that they do not take more extreme values). L1 penalizes the sum of the absolute values of the coefficients, while L2 penalizes the sum of their squares.
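Here is a small sketch of what the penalty term looks like in code, assuming a plain linear model with a sum-of-squared-errors cost. The function names and the toy data are made up for illustration. For a single weight, the L2-penalized cost sum((y - w*x)^2) + lam * w^2 has the closed-form minimizer w = sum(x*y) / (sum(x*x) + lam), which makes the "push towards zero" easy to see.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def penalized_cost(weights, rows, targets, lam, penalty):
    """Sum of squared errors of a linear model plus a penalty on its weights."""
    sse = 0.0
    for row, target in zip(rows, targets):
        prediction = sum(w * x for w, x in zip(weights, row))
        sse += (prediction - target) ** 2
    return sse + penalty(weights, lam)


def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Toy data (an assumption for this sketch), roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger lam gives a stronger penalty and a slope closer to zero.
for lam in (0.0, 1.0, 10.0, 100.0):
    print("lam =", lam, "slope =", ridge_slope(xs, ys, lam))
```

The value lam controls how strong the penalty is: with lam = 0 we get the ordinary least squares fit, and as lam grows the coefficient shrinks towards zero.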
3) Feature Selection:- If we have only a limited number of training samples, each with a large number of features, we should select only the most important features for training, so that our model doesn't have to learn from so many features and eventually overfit.
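There are many ways to decide which features are "most important". As one simple sketch, the code below ranks features by the absolute Pearson correlation between each feature and the target and keeps the top k. This choice of score, the function names and the toy columns are all assumptions for the example, not the only way to do feature selection.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_top_features(columns, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy dataset (an assumption for this sketch): one useful feature,
# one weakly related feature and one constant feature.
columns = {
    "noise": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

print(select_top_features(columns, target, k=1))
print(select_top_features(columns, target, k=2))
```

A correlation filter like this only looks at one feature at a time, so it can miss features that are useful only in combination with others.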
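4) Hold Out:- Rather than using all of our data for training, we can simply split our dataset into two sets: training and testing. A common split ratio is 80% for training and 20% for testing. The model is trained on the training set and then evaluated on the testing set, which it has never seen, to check how well it generalizes.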
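A hold-out split like the 80/20 one above can be sketched in a few lines of plain Python. The function name holdout_split and its parameters are made up for this example; the data is shuffled with a fixed seed so the split is reproducible.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train, test = holdout_split(data)
print(len(train), "training samples,", len(test), "testing samples")
```

Shuffling before splitting matters when the data is stored in some order (for example sorted by class), because otherwise the testing set may not look like the training set.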
5) Dropout:- Dropout is a form of regularization applied to the layers of a network. During training, we ignore a subset of the units of our network with a set probability. Using dropout, we can reduce interdependent learning among units, which may otherwise lead to overfitting.
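To show what "ignoring units with a set probability" means, here is a minimal sketch of dropout on a list of activations in plain Python. It assumes the common "inverted dropout" variant, where the units that are kept are scaled by 1 / (1 - drop_prob) during training so that nothing has to change at prediction time. The function name and arguments are made up for this example; deep learning libraries ship their own dropout layers.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob and rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At prediction time every unit is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10
print(dropout(hidden, 0.5, rng))                  # some units zeroed, others scaled up
print(dropout(hidden, 0.5, rng, training=False))  # unchanged at prediction time
```

Because a different random subset of units is dropped on every training step, no unit can rely on a particular other unit always being present.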