- An imbalanced dataset can skew a machine learning model's decision boundary toward the majority class.
- The distribution of the target class matters!
- Why is class imbalance bad? See Source 1.
- Sometimes an imbalanced dataset is expected, or even "good", such as in fraud detection, where the rare class is exactly what we care about. It is important to find those anomalies.
- "Imbalanced class puts accuracy out of business". It is important not to choose accuracy as the metric, because the model can cheat by always guessing the majority class and still achieve high accuracy. The high accuracy in this case is an illusion. Instead, evaluate the model with the confusion matrix and the trade-off between precision and recall.
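
To make the accuracy illusion concrete, here is a minimal sketch in plain Python. The 95/5 class split and the always-majority "model" are made-up assumptions for illustration, and `confusion_counts` is a hypothetical helper, not a library function.

```python
# Toy labels (made-up data): 1 = fraud (minority), 0 = legitimate (majority).
y_true = [0] * 95 + [1] * 5

# A "cheating" model that always predicts the majority class.
y_pred = [0] * 100


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when the model never predicts the positive class.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.95 precision=0.00 recall=0.00
```

The model looks 95% accurate, yet its recall on the fraud class is zero: it never finds a single anomaly.
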
Related Concepts:
- Imbalanced datasets
- Confusion matrix
- Resampling
- Random under-sampling
- Random over-sampling
- Python imbalanced-learn module
- Random under-sampling and over-sampling with imbalanced-learn
- Under-sampling: Tomek links
- Under-sampling: Cluster Centroids
- Over-sampling: SMOTE
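
The two random resampling ideas in the list above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data is made up, and `random_under_sample` and `random_over_sample` are hypothetical helpers written for this note. In practice, the imbalanced-learn package offers `RandomUnderSampler` and `RandomOverSampler` for the same purpose.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, t in enumerate(y) if t == label]
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate samples so every class has as many as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    keep = list(range(len(y)))  # keep all original samples
    for label, n in counts.items():
        idx = [i for i, t in enumerate(y) if t == label]
        keep.extend(rng.choices(idx, k=n_max - n))  # sample with replacement
    return [X[i] for i in keep], [y[i] for i in keep]


# Made-up dataset: 90 majority samples (class 0) and 10 minority samples (class 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)

print(Counter(y_under))  # both classes reduced to 10 samples
print(Counter(y_over))   # both classes raised to 90 samples
```

Under-sampling throws away majority-class information, while over-sampling repeats minority samples and can encourage overfitting. Resampling is usually applied to the training split only, so the evaluation data keeps the original class distribution.
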
Source 4
- Source 1 https://medium.com/data-science-bootcamp/class-imbalanced-explained-machine-learning-data-science-basics-22caaeb81133
- Source 2 Wikipedia https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
- Source 3 https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
- Source 4