- 1. Model evaluation and validation
- 1.1 STATISTICAL ANALYSIS
- 1.2 DATA MODEL
- 1.3 EVALUATION AND VALIDATION
- 1.4 MANAGING ERROR AND COMPLEXITY
- 1.5 PROJECT
- 1.3 EVALUATION AND VALIDATION
- 1.3.1 TRAINING AND TESTING
- 1.3.1.1 Benefit of testing
- 1.3.1.2 Train / Test Split in sklearn
- Useful concepts: train_test_split function
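A minimal sketch of the split; the iris data, the 25% test size, and the random_state value are placeholder choices, not from the notes. In current scikit-learn the function lives in sklearn.model_selection (older releases exposed it from sklearn.cross_validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in dataset for illustration only
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```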
- 1.3.2 EVALUATION METRICS
- 1.3.2.1 Metrics
- 1.3.2.2 Classification and Regression
- Useful concepts: Categorical data vs continuous data
- 1.3.2.3 Classification metrics
- Useful concepts: discrete predictions
- 1.3.2.4 Accuracy
- Useful concepts: accuracy = number of items classified or labeled correctly / total number of items; in sklearn, my_model.score(X_test, y_test). Shortcomings of accuracy: it is misleading when the data is skewed toward one class, or when you need to err on the side of innocence or guilt (e.g., the Enron dataset contains only a small number of guilty people, so always predicting "innocent" still scores well)
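A short sketch of computing accuracy in sklearn; the GaussianNB classifier and the iris data are placeholders, not the setup from the notes:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GaussianNB().fit(X_train, y_train)

# Both lines report the same number: the fraction of test items labeled correctly
print(clf.score(X_test, y_test))
print(accuracy_score(y_test, clf.predict(X_test)))
```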
- Picking the Most Suitable Metric
- Concept: information asymmetry
- Confusion Matrix
- Concept: if you care about asymmetry between error types, you may want to shift the decision boundary up or down to include or exclude certain results
- Decision Tree: confusion matrix
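A sketch of producing a confusion matrix for a decision tree; the breast-cancer dataset and default hyperparameters are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels; for a binary problem:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_test, y_pred))
```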
- Precision and Recall
- Equation for Precision
- Concept: precision = true positives / (true positives + false positives)
- Equation for Recall
- Concept: recall = true positives / (true positives + false negatives)
- Precision vs Recall
- F1 Score
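All three metrics are available as sklearn functions; a sketch, again with a placeholder decision tree on placeholder data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_pred = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict(X_test)

# precision = TP / (TP + FP): of everything flagged positive, how much really is positive
print(precision_score(y_test, y_pred))
# recall = TP / (TP + FN): of all true positives, how many were found
print(recall_score(y_test, y_pred))
# F1 = 2 * precision * recall / (precision + recall), the harmonic mean of the two
print(f1_score(y_test, y_pred))
```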
- Regression metrics
- Mean Absolute Error
- Mean Squared Error
- Regression Scoring Function
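A sketch of the regression metrics; the diabetes data and the linear model are placeholder choices. Note that reg.score(X_test, y_test) on an sklearn regressor returns R², which is the default regression scoring function:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

# MAE: average absolute difference between predictions and true values
print(mean_absolute_error(y_test, y_pred))
# MSE: average squared difference; penalizes large errors more heavily
print(mean_squared_error(y_test, y_pred))
# R^2: same value as reg.score(X_test, y_test)
print(r2_score(y_test, y_pred))
```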
- 1.4 MANAGING ERROR AND COMPLEXITY
- Cause of Error
- Error due to Bias
- Linear Learner, Quadratic Data (programming learning curve)
- Error due to Variance - Precision and Overfitting
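A sketch of the linear-learner-on-quadratic-data idea; the synthetic data and the polynomial degrees are made up for illustration. A degree-1 model lacks the representative power for quadratic data (error due to bias, underfitting), while a very high degree can chase the noise (error due to variance, overfitting):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = 1.5 * X.ravel() ** 2 - 2 * X.ravel() + rng.normal(scale=1.0, size=200)  # quadratic + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 2, 12):
    # degree 1 underfits (high bias); higher degrees add representative power
    # but eventually risk overfitting (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, mean_squared_error(y_test, model.predict(X_test)))
```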
- Representative Power of a Model
- 1.1 Curse of Dimensionality
- 1.2 Curse of Dimensionality Two
- Learning Curves and Model Complexity
- 1.1 Learning Curves
- 1.2 Learning Curves II
- 1.3 Ideal Learning Curves
- 1.4 Model Complexity
- 1.5 Learning Curves and Model Complexity
- 1.6 Practical Use of Model Complexity
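A sketch of computing a learning curve with sklearn's learning_curve utility; the digits data, the decision tree, and its max_depth are assumptions. Comparing training and cross-validation scores as the training set grows is what distinguishes high-bias from high-variance behaviour:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Train on growing subsets of the data, cross-validating at each size
train_sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

for n, tr, val in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{int(n):5d} training examples: train={tr:.2f}, cv={val:.2f}")
```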
- Section Syllabus
- Supervised Learning
- Regression
- Continuous supervised learning