- Import dependencies: numpy, pandas, sklearn, matplotlib
- Data cleaning:
- Convert all data to numeric values: encode binaries as 0 and 1, and scale continuous values to the range -1 to 1 or 0 to 1 (normalization).
- Replace yes/no binary answers with 1/0
- Replace categorical data (A, B, C) with dummy columns |A|B|C|, using 1 if true and 0 if false
- Split the data into features and the target (aka label)
- Perform initial exploration: load the CSV data into a pandas.DataFrame
- Compute summary stats: mean, counts, etc.
- from sklearn import model #placeholder: import the model class you need
- clf = model() #specify the classifier
- clf.fit( ... ) #fit the model with the training features and target
- clf.predict( ... ) #make predictions on new feature rows
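- R^2 (R squared): a good metric for linear regression; usually between 0 and 1, with 1 being the best (it can go negative when the model fits worse than always predicting the mean)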
- This list is under construction
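
The steps in the list above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the toy table, its column names (`size`, `garden`, `zone`, `price`), the choice of `LinearRegression`, and the train/test split are all made up for the example and are not part of the original notes. In practice the DataFrame would come from `pd.read_csv(...)`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data standing in for a CSV loaded with pd.read_csv(...)
df = pd.DataFrame({
    "size":   [50, 55, 60, 70, 75, 80, 90, 95, 100, 110, 120, 130],
    "garden": ["yes", "no", "no", "yes", "yes", "no",
               "yes", "no", "yes", "no", "yes", "yes"],
    "zone":   ["A", "B", "C", "A", "C", "B", "B", "A", "C", "A", "B", "C"],
    "price":  [160, 150, 155, 200, 195, 190, 240, 230, 255, 270, 300, 320],
})

# Initial exploration: summary stats (mean, counts, etc.)
print(df.describe())
print(df["zone"].value_counts())

# Data cleaning: yes/no -> 1/0, categories -> dummy columns
df["garden"] = df["garden"].map({"yes": 1, "no": 0})
df = pd.get_dummies(df, columns=["zone"], dtype=int)

# Split into features (X) and target aka label (y)
X = df.drop(columns=["price"])
y = df["price"]

# Hold out some rows for evaluation (an assumption; the notes only mention X/y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Normalization: scale every feature to 0..1 using the training rows only
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Specify the model, fit it, make predictions
clf = LinearRegression()
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)

# Evaluate with R^2 (1 is the best possible score)
r2 = r2_score(y_test, y_pred)
print("R^2 on held-out rows:", r2)
```

The scaler is fitted on the training rows only, so nothing about the held-out rows leaks into preprocessing. For a classification problem, a classifier such as `LogisticRegression` and a metric such as `accuracy_score` would take the place of `LinearRegression` and `r2_score`.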
|Sklearn machine learning model cheat sheet|
What are the best algorithms to use for each machine learning problem?
Classification versus regression
Supervised versus unsupervised