"Machine Learning Algorithms and Implementations - Python Programming and Application Examples" - Second Post: The K-Nearest Neighbor Algorithm
This post was last edited by a media student on 2024-7-15 00:51
Introduction:
Chapter 4 of this book explains the k-nearest neighbor algorithm, which also appears frequently in algorithm design competitions such as ACM contests.
What is the k-Nearest Neighbors Algorithm?
The kNN (k-Nearest Neighbor) algorithm is one of the most basic algorithms in machine learning and can be used for both classification and regression. The core idea: compute the feature distance between a given sample and every sample in the training set, take the k training samples closest to the given sample, and assign the category that occurs most often among those k neighbors as the estimated category of the sample.
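The core idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's implementation; the function and variable names (`knn_classify`, `euclidean`) are my own.

```python
from collections import Counter
import math

def euclidean(a, b):
    # feature distance between two points (Euclidean)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, X_train, y_train, k=3):
    # sort training indices by distance to the query sample
    order = sorted(range(len(X_train)), key=lambda i: euclidean(query, X_train[i]))
    # majority vote among the k nearest neighbors
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

X = [(1, 1), (1, 2), (2, 2), (8, 8), (8, 9), (9, 8)]
y = ['A', 'A', 'A', 'B', 'B', 'B']
print(knn_classify((2, 1), X, y, k=3))  # 'A' - all three nearest points are class A
```

Note that this brute-force version scans the entire training set for every query, which foreshadows the complexity problem discussed below.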
K nearest neighbor principle:
In a classification problem, the category of a new sample is determined from the categories of its nearest neighbors. Because the k-nearest neighbor algorithm decides the category using only a limited number of nearby samples, rather than a global decision boundary between categories, it is better suited than many other methods to sample sets whose categories overlap or intersect heavily.
Problems:
When the classes are imbalanced, misclassification may occur; this can be improved by distance weighting (neighbors closer to the sample receive larger voting weights).
It is only practical when the amount of data is small: the brute-force algorithm must compute the distance to every training sample for each query, so the computational cost grows with the size of the training set.
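The distance-weighting remedy mentioned above can be sketched as follows. This is an illustrative example under my own naming (`weighted_knn`); the inverse-distance weight `1/(d + eps)` is one common choice, not the only one.

```python
from collections import defaultdict
import math

def weighted_knn(query, X_train, y_train, k=3):
    # pair each training sample with its distance to the query, nearest first
    ranked = sorted(
        ((math.dist(query, x), label) for x, label in zip(X_train, y_train)),
        key=lambda t: t[0],
    )
    # closer neighbors contribute a larger vote: weight = 1 / (distance + eps)
    scores = defaultdict(float)
    for d, label in ranked[:k]:
        scores[label] += 1.0 / (d + 1e-9)
    return max(scores, key=scores.get)

# Imbalanced set: 2 samples of 'A' near the query, 3 samples of 'B' far away.
X = [(0, 0), (0.5, 0), (5, 0), (5.2, 0), (5.4, 0)]
y = ['A', 'A', 'B', 'B', 'B']
# With k=5, a plain majority vote would pick 'B' (3 votes to 2),
# but the two much-closer 'A' neighbors dominate the weighted vote.
print(weighted_knn((1, 0), X, y, k=5))  # 'A'
```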
Feature distance calculation:
Common distance metrics in machine learning include the Manhattan distance, the Euclidean distance, and others. The Euclidean distance commonly used in the kNN algorithm is d(i,j) = √((X1 − X2)² + (Y1 − Y2)²), where (X1, Y1) and (X2, Y2) are two points on a two-dimensional plane.
Manhattan distance, also known as city block distance or taxi distance, measures the distance between two points in a standard coordinate system as the distance in the north-south direction plus the distance in the east-west direction. The name comes from cities laid out as square blocks, such as Manhattan, where the shortest driving path (ignoring one-way streets and diagonal avenues) is the sum of the straight-line distances traveled north-south and east-west.
The calculation formula of the Manhattan distance is d(i,j) = |X1 − X2| + |Y1 − Y2|, where (X1, Y1) and (X2, Y2) are two points on a two-dimensional plane. This formula directly reflects the definition of Manhattan distance: the distance between the two points in the north-south direction plus the distance in the east-west direction.
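The two distance formulas can be compared side by side in a short snippet. This is a sketch with my own function names (`manhattan`, `euclidean`), generalized to any number of coordinates.

```python
import math

def manhattan(p, q):
    # |X1 - X2| + |Y1 - Y2| (summed over all coordinates)
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # sqrt((X1 - X2)^2 + (Y1 - Y2)^2) (summed over all coordinates)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 7   -> |1-4| + |2-6| = 3 + 4
print(euclidean(p, q))  # 5.0 -> sqrt(3^2 + 4^2)
```

As expected, the Manhattan distance is never smaller than the Euclidean distance between the same pair of points, since the straight-line path is the shortest one.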