Wednesday, February 8, 2017

Machine Learning: K Nearest Neighbors (KNN) Algorithm

At a high level, KNN looks for the most similar existing instances (similarity defined by a chosen distance function) and retrieves their labels, then uses a voting method to decide the final label from the k votes. The voting mechanism can be a simple majority, or it may weigh instances unevenly. A commonly used distance function is Euclidean, e.g. the default in sklearn. FYI: Euclidean distance is a special case of Minkowski distance with p=2.
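As a quick check, here is a minimal sketch (assuming NumPy) of the Minkowski distance, showing that p=2 recovers the ordinary Euclidean distance:

import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance: (sum(|a_i - b_i|^p))^(1/p)."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print(minkowski(a, b, p=2))    # 5.0 -- Euclidean distance
print(np.linalg.norm(a - b))   # 5.0 -- same result
print(minkowski(a, b, p=1))    # 7.0 -- Manhattan distance (p=1)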

You can modify the weights parameter: 'uniform' treats all neighbors equally, while 'distance' weighs neighbors by distance as measured by the specified distance function, so closer neighbors count more. You can also supply a custom weighting function.
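A minimal sklearn sketch of the weights options; the toy data and the inverse_square function are made up for illustration:

from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# 'uniform': every neighbor gets one equal vote.
uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)

# 'distance': closer neighbors get larger votes (inverse of the distance).
weighted = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

# Custom callable: receives an array of distances, returns same-shape weights.
def inverse_square(distances):
    return 1.0 / (distances ** 2 + 1e-9)   # epsilon avoids division by zero

custom = KNeighborsClassifier(n_neighbors=3, weights=inverse_square).fit(X, y)

print(uniform.predict([[3]]), weighted.predict([[3]]), custom.predict([[3]]))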

- KNN: the k nearest neighbors to a given point
- k: the number of neighbors
- n: the number of data points; the data is sorted to speed up the algorithm
- distance: Euclidean distance (the shortest line connecting the points), or a custom function that defines the distance
- visualize: a scatter plot where each point is a circle with a radius that includes a certain number of neighbors
- KNN: a query-intensive algorithm, not a learning-intensive one
- performance: Big O notation; log(n) is binary search, 1 is constant, n is linear
- intuition: KNN stores all the data, then performs a binary search on the data when querying (see the sketch after this list). Linear regression only stores the model y = mx + b. Key concepts: LEARN vs QUERY
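Here is a minimal sketch of the LEARN vs QUERY split for 1-NN on a sorted 1-dimensional list, using Python's bisect for the log(n) query; the nearest_neighbor helper is just for illustration:

import bisect

# "Learning" is just storing the sorted data: constant work, n space.
data = [1, 2, 4, 7, 8, 9]

def nearest_neighbor(sorted_data, q):
    """Return the value in sorted_data closest to q via binary search."""
    i = bisect.bisect_left(sorted_data, q)            # O(log n)
    # The nearest neighbor is either the item at i or the one just before it.
    candidates = sorted_data[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - q))

print(nearest_neighbor(data, 5))   # 4
print(nearest_neighbor(data, 6))   # 7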
Running time and space:

1-NN (1-nearest neighbor, on a sorted 1-dimensional list, e.g. [1 2 4 7 8 9]):
- learning: running time 1, space n. KNN moves all the data to storage without learning, so the running time is constant (1 in Big O notation) and the storage space is n for the n data points.
- query: running time log(n), space 1. Binary search finds the one nearest point.

k-NN:
- learning: running time 1, space n
- query: running time log(n) + k, space 1. Binary search (log(n)) finds one point, then the k items next to it in the sorted list are collected.

Linear regression:
- learning: running time n, space 1
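A similar sketch for the k-NN query, matching the log(n) + k running time above: binary search finds the insertion point, then a two-pointer scan collects the k closest items around it (k_nearest is a hypothetical helper and assumes k <= n):

import bisect

def k_nearest(sorted_data, q, k):
    """Return the k values closest to q; assumes k <= len(sorted_data)."""
    i = bisect.bisect_left(sorted_data, q)   # O(log n)
    lo, hi = i - 1, i
    out = []
    while len(out) < k:                      # O(k) scan outward from i
        if hi >= len(sorted_data) or (lo >= 0 and q - sorted_data[lo] <= sorted_data[hi] - q):
            out.append(sorted_data[lo]); lo -= 1
        else:
            out.append(sorted_data[hi]); hi += 1
    return out

print(k_nearest([1, 2, 4, 7, 8, 9], 5, k=3))   # [4, 7, 2]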
