My Little Green Book of Machine Learning and Deep Learning, Artificial Intelligence

Friday, March 27, 2020

My Little Green Book of Machine Learning and Deep Learning, Artificial Intelligence

Data pre-processing

Turn Complex Data into Numbers

Turn data into features. Turn data into feature vectors. Machine Learning models can only take numeric data. All input data must be represented numerically. For example, words need to be converted to word embeddings in some Natural Language Processing tasks.

Training vs Inference Models

There are two major tasks in machine learning 1. build and train a model 2. deploy a model for inference. Part 1 takes known data, uses it to tune parameters of the model such as weights. Part 2 takes in unknown data, real world data or test data and calls a dot predict method on the new data.

Normalization, Scaling Data

Normalize data, need to scale data to bound it. For example in Machine Learning, an error term can be arbitrarily large because the model can be arbitrarily bad, causing the lower bound of error term for f(x) = wx+b to be essentially unbounded. Bounding the error term by scaling the features numeric value can make the result easier to compute and make the search space easier for gradient descent.

Bias Variance Tradeoff

High bias may refer to underfitting, where the model is too simple, not complex enough to make accurate predictions. It can also mean when the model is practically ignoring the data.

High variance may refer to overfitting. That's when the model overfits, hence cannot generate to future data well.

Silicon Vanity | Tech lifestyle in Silicon Valley

Ad

Friday, March 27, 2020