How does machine learning differ from procedural (traditional) programming? In traditional programming, we write the logic step by step, line by line, including any control flow, so the program is told exactly what to do. In machine learning, we choose a suitable algorithm and supply training data to train and tune it. The result is a model that can be used for prediction. More data points are often better.
Put another way, in traditional programming we tell the computer exactly what the formula or function is and how to calculate the output. In machine learning, we give the algorithm many examples so that it can approximate the formula or function.
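To make the contrast concrete, here is a minimal sketch using Celsius-to-Fahrenheit conversion as a stand-in problem. The function names (`fahrenheit_rule`, `fit_line`, `learned_rule`) and the example values are illustrative assumptions, and ordinary least squares is just one simple way to "learn" a formula from examples.

```python
# Traditional programming: we write the formula ourselves.
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: we only supply (input, output) examples and let a
# fitting procedure estimate the formula. Here the procedure is an
# ordinary least squares fit of y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training examples (made up for illustration).
celsius_examples = [-40.0, 0.0, 10.0, 25.0, 37.0, 100.0]
fahrenheit_examples = [fahrenheit_rule(c) for c in celsius_examples]

slope, intercept = fit_line(celsius_examples, fahrenheit_examples)


def learned_rule(celsius):
    # The "model": a formula recovered from the examples alone.
    return slope * celsius + intercept


print(slope, intercept)    # close to 1.8 and 32
print(learned_rule(20.0))  # close to fahrenheit_rule(20.0)
```

The learned rule never saw the formula, only the examples. With clean, exactly linear data it recovers the slope and intercept almost perfectly; with noisy real-world data it would only approximate them.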
Loss functions: A loss function measures how far the model's predictions are from the expected outputs on the input data. The lower the loss, the better the model fits. There are many viable loss functions, each with strengths and weaknesses. Like much else in machine learning, the choice is often a trade-off.
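As an illustration of that trade-off, the sketch below compares two common regression losses, mean squared error (MSE) and mean absolute error (MAE). These two are chosen only as examples, and the numbers are made up. MSE punishes a single large miss much more heavily than MAE does.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences: large errors dominate.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences: every unit of error counts equally.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [1.0, 2.0, 3.0, 4.0]
small_misses = [1.1, 1.9, 3.2, 3.8]   # slightly off everywhere
one_big_miss = [1.0, 2.0, 3.0, 14.0]  # perfect except one outlier

for name, preds in [("small misses", small_misses), ("one big miss", one_big_miss)]:
    mse = mean_squared_error(targets, preds)
    mae = mean_absolute_error(targets, preds)
    print(f"{name}: MSE={mse:.3f} MAE={mae:.3f}")
```

If outliers in the data are mostly noise, a loss like MAE may be the better fit. If large errors are genuinely costly, MSE's heavier penalty can be what you want.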
Gradient Descent: Machine learning models often use gradient descent to decide how to update weights and parameters. The gradient of the loss points in the direction of steepest increase, so each update takes a small step in the opposite direction to decrease the loss.
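Here is a minimal sketch of gradient descent, assuming a one-feature linear model `y = w * x + b` trained with mean squared error. The learning rate, step count, and data are illustrative choices, not recommendations.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # start from arbitrary initial parameters
    n = len(xs)
    for _ in range(steps):
        # Prediction error for each example under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient so that the loss decreases.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(w, b)  # approaches 2 and 1
```

The learning rate controls the step size. If it is too large the updates can overshoot and diverge; if it is too small, training takes many more steps.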
Some data is readily available, as mentioned above. Image data, for example, is easy to obtain: it is estimated that 95 million photos are shared on Instagram each day. Other data, such as financial and health data, is expensive and hard to collect.
Labeled Data vs Unlabeled Data
Supervised vs Unsupervised Learning
One question to ask is whether the data is labeled. Supervised learning requires labeled data. In classification, each example is labeled with a category, such as cat or dog, and the categories should not overlap. Supervised learning can also be regression, where the label is a numeric value. Unsupervised learning finds natural groupings among data points that do not have labels. In centroid-based clustering such as k-means, the number of centers (centroids) is a hyperparameter that needs to be chosen and tuned.
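The unsupervised case can be sketched with a bare-bones k-means on unlabeled 2D points. This is a simplified illustration: the points are made up, and seeding the centroids with the first `k` points is an assumption made to keep the example deterministic (real implementations usually use random or smarter initialization).

```python
def k_means(points, k, iterations=20):
    # Simplifying assumption: use the first k points as starting centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled points: no "cat" or "dog" tag is attached to any of them.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]

# k is the hyperparameter: we must choose how many groups to look for.
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

With `k=2` the algorithm finds one group near the lower-left points and one near the upper-right points. Choosing a different `k` would force a different grouping onto the same data, which is why `k` has to be tuned.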