"Machine Learning Algorithms and Implementations - Python Programming and Application Examples" - Second Post: The K-Nearest Neighbor Algorithm
This post was last edited by a media student on 2024-7-15 00:51
Introduction:
Chapter 4 of this book explains the k-nearest neighbor algorithm, which also appears frequently in algorithm design competitions such as ACM contests.
What is the k-Nearest Neighbors Algorithm?
The kNN (k-Nearest Neighbor) algorithm is one of the most basic algorithms in machine learning and can be used for both classification and regression. The core idea: compute the feature distance between a given sample and every sample in the training set, take the k training samples closest to the given sample, and assign the category that occurs most often among those k neighbors as the estimated category of the sample.
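The core idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's implementation; the function and variable names (`knn_classify`, `euclidean`) are my own.

```python
from collections import Counter
import math

def euclidean(a, b):
    # feature distance between two points (Euclidean)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, X_train, y_train, k=3):
    # sort training indices by distance to the query sample
    order = sorted(range(len(X_train)), key=lambda i: euclidean(query, X_train[i]))
    # majority vote among the k nearest neighbors
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

X = [(1, 1), (1, 2), (2, 2), (8, 8), (8, 9), (9, 8)]
y = ['A', 'A', 'A', 'B', 'B', 'B']
print(knn_classify((2, 1), X, y, k=3))  # 'A' - all three nearest points are class A
```

Note that this brute-force version scans the entire training set for every query, which foreshadows the complexity problem discussed below.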
K nearest neighbor principle:
In a classification problem, the category of a new sample is determined from the categories of its nearest neighbors. Because the k-nearest neighbor algorithm decides the category using only a limited number of nearby samples, rather than a global decision boundary between categories, it is better suited than many other methods to sample sets whose categories overlap or intersect heavily.
Problems:
When the classes are imbalanced, misclassification may occur; this can be improved by distance weighting (neighbors closer to the sample receive larger voting weights).
It is only practical when the amount of data is small: the brute-force algorithm must compute the distance to every training sample for each query, so the computational cost grows with the size of the training set.
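The distance-weighting remedy mentioned above can be sketched as follows. This is an illustrative example under my own naming (`weighted_knn`); the inverse-distance weight `1/(d + eps)` is one common choice, not the only one.

```python
from collections import defaultdict
import math

def weighted_knn(query, X_train, y_train, k=3):
    # pair each training sample with its distance to the query, nearest first
    ranked = sorted(
        ((math.dist(query, x), label) for x, label in zip(X_train, y_train)),
        key=lambda t: t[0],
    )
    # closer neighbors contribute a larger vote: weight = 1 / (distance + eps)
    scores = defaultdict(float)
    for d, label in ranked[:k]:
        scores[label] += 1.0 / (d + 1e-9)
    return max(scores, key=scores.get)

# Imbalanced set: 2 samples of 'A' near the query, 3 samples of 'B' far away.
X = [(0, 0), (0.5, 0), (5, 0), (5.2, 0), (5.4, 0)]
y = ['A', 'A', 'B', 'B', 'B']
# With k=5, a plain majority vote would pick 'B' (3 votes to 2),
# but the two much-closer 'A' neighbors dominate the weighted vote.
print(weighted_knn((1, 0), X, y, k=5))  # 'A'
```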
Feature distance calculation:
Common distance metrics in machine learning include the Manhattan distance, the Euclidean distance, and others. The Euclidean distance commonly used in the kNN algorithm is d(i,j) = √((X1 − X2)² + (Y1 − Y2)²), where (X1, Y1) and (X2, Y2) are two points on a two-dimensional plane.
Manhattan distance, also known as city block distance or taxi distance, measures the distance between two points in a standard coordinate system as the distance in the north-south direction plus the distance in the east-west direction. The name comes from cities laid out as square blocks, such as Manhattan, where the shortest driving path (ignoring one-way streets and diagonal avenues) is the sum of the straight-line distances traveled north-south and east-west.
The calculation formula of the Manhattan distance is d(i,j) = |X1 − X2| + |Y1 − Y2|, where (X1, Y1) and (X2, Y2) are two points on a two-dimensional plane. This formula directly reflects the definition of Manhattan distance: the distance between the two points in the north-south direction plus the distance in the east-west direction.
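The two distance formulas can be compared side by side in a short snippet. This is a sketch with my own function names (`manhattan`, `euclidean`), generalized to any number of coordinates.

```python
import math

def manhattan(p, q):
    # |X1 - X2| + |Y1 - Y2| (summed over all coordinates)
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # sqrt((X1 - X2)^2 + (Y1 - Y2)^2) (summed over all coordinates)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 7   -> |1-4| + |2-6| = 3 + 4
print(euclidean(p, q))  # 5.0 -> sqrt(3^2 + 4^2)
```

As expected, the Manhattan distance is never smaller than the Euclidean distance between the same pair of points, since the straight-line path is the shortest one.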