
Wednesday, February 8, 2017

Machine Learning: K Nearest Neighbors (KNN) Algorithm

At a high level, KNN looks for the most similar existing instances (similarity is defined by a chosen distance function) and retrieves their labels, then uses a voting method to decide the final label from the k votes. The voting mechanism can be a simple majority, or it may weigh instances unevenly (e.g. closer neighbors count more). The most commonly used distance function is Euclidean, which is the default in sklearn. FYI: Euclidean is the special case of Minkowski distance with p=2.
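A minimal sketch of this with sklearn's KNeighborsClassifier; the toy data is made up for illustration, and the metric arguments shown just spell out sklearn's defaults:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 2 features per instance, binary labels.
X_train = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# k=3 neighbors, simple majority vote, Euclidean distance
# (minkowski with p=2 is Euclidean, and is sklearn's default).
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)
knn.fit(X_train, y_train)

# The 3 nearest neighbors of [2, 1] all have label 0,
# so the majority vote predicts 0.
print(knn.predict([[2, 1]]))  # -> [0]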

You can modify the weights parameter: 'uniform' treats all neighbors equally, while 'distance' weighs each neighbor by the inverse of its distance as measured by the chosen distance function. You can also pass a custom function.
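A sketch of the three options in sklearn; the Gaussian-style weighting function is a made-up example of a custom callable, not anything built in:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# 'uniform': every neighbor gets one equal vote.
knn_uniform = KNeighborsClassifier(n_neighbors=5, weights="uniform")

# 'distance': closer neighbors get larger votes (1 / distance).
knn_distance = KNeighborsClassifier(n_neighbors=5, weights="distance")

# Custom: any callable mapping an array of distances to an array of
# weights works. This Gaussian-style decay is purely illustrative.
def gaussian_weights(distances):
    return np.exp(-distances ** 2)

knn_custom = KNeighborsClassifier(n_neighbors=5, weights=gaussian_weights)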

KNN: the k nearest neighbors to a given point
k: the number of neighbors
n: the number of data points; the data is sorted to speed up the algorithm
distance: Euclidean distance (the shortest line connecting two points), OR a custom function that defines the distance
visualize: a scatter plot in which each query point is a circle with a radius that includes a certain number of neighbors
KNN: a query-intensive algorithm, not a learning-intensive one
performance: Big O notation, where log(n) is binary search, 1 is constant, and n is linear
intuition: KNN stores all the data, then performs a binary search on the (sorted) data when querying; linear regression only stores the model y = mx + b. Key concepts: LEARN vs QUERY (see the sketch after this list)
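A minimal sketch of that intuition for a sorted 1-dimensional list, using Python's bisect module: "learning" is just storing the data, and a query is a binary search plus a comparison of the two surrounding values:

import bisect

# "Learning" = storing the sorted data: constant work, O(n) space.
data = [1, 2, 4, 7, 8, 9]

def nearest_neighbor(data, query):
    # Binary search for the insertion point: O(log n).
    i = bisect.bisect_left(data, query)
    # The nearest neighbor is one of the two values around that point.
    candidates = data[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - query))

print(nearest_neighbor(data, 5))  # -> 4 (distance 1, vs 7 at distance 2)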
Running time and space, assuming a sorted 1-dimensional list, e.g. [1 2 4 7 8 9]:

                             Running Time   Space
1-NN               learning  1              n
                   query     log(n)         1
k-NN               learning  1              n
                   query     log(n) + k     1
linear regression  learning  n              1
                   query     1              1

Learning for 1-NN and k-NN just commits all the data to storage without learning anything, so the running time is 1 (constant in Big O notation) and the space is n, the number of data points. A 1-NN query is a binary search, log(n), to find the one closest point. A k-NN query is log(n) + k: a binary search to find one point, plus the k items next to it in the sorted list. Linear regression spends n at learning time but stores only the model y = mx + b, so both the query time and the space are constant.
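A sketch of the k-NN query row under the same sorted-1-dimensional-list assumption: one binary search (log n), then an outward scan over the k closest items (k). Ties are broken toward the left here, which is an arbitrary choice:

import bisect

def k_nearest(data, query, k):
    # Binary search: O(log n). Assumes k <= len(data).
    i = bisect.bisect_left(data, query)
    lo, hi = i - 1, i
    neighbors = []
    # Expand outward, taking the closer side at each step: O(k).
    while len(neighbors) < k:
        if hi >= len(data) or (lo >= 0 and query - data[lo] <= data[hi] - query):
            neighbors.append(data[lo])
            lo -= 1
        else:
            neighbors.append(data[hi])
            hi += 1
    return neighbors

print(k_nearest([1, 2, 4, 7, 8, 9], 5, 3))  # -> [4, 7, 2]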
