Peter teaches you how to talk about love and AI | 11 Support Vector Machine (Part 2) - Using the Lagrangian to solve the SVM primal problem
Visualize the constraints on a function
If we plot f(x, y) in a three-dimensional Cartesian coordinate system, that is, if we graph z = f(x, y), we get a surface in three dimensions; such a function expresses a relationship between x, y, and z.
The constraint on the function f(x, y) is: g(x, y) = 0.
So how does a two-dimensional curve constrain a three-dimensional surface? As shown in the figure:
In this example, we can see that f(x, y) has a maximum value. Because of the constraint g(x, y) = 0, if our goal is

$$\max_{x, y} f(x, y) \quad \text{s.t.} \quad g(x, y) = 0,$$

the corresponding point must lie on the curve obtained by projecting g(x, y) = 0 onto the surface z = f(x, y), such as point B in the figure above. Although point B is not necessarily the maximum of z = f(x, y), it is the maximum value that satisfies the constraint g(x, y) = 0!
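To make the idea concrete, here is a minimal numeric sketch in Python. The surface f and the constraint g below are made-up examples (they are not the exact functions drawn in the figure); the point is simply that the best value along the constraint curve is generally smaller than the unconstrained maximum.

```python
import numpy as np

# Hypothetical surface and constraint, chosen only for illustration:
#   f(x, y) = 4 - x^2 - y^2   (a paraboloid with an unconstrained max of 4 at the origin)
#   g(x, y) = x + y - 2 = 0   (the constraint curve, a straight line in the xy-plane)
def f(x, y):
    return 4.0 - x**2 - y**2

# Walk along the constraint curve: points satisfying x + y - 2 = 0, i.e. y = 2 - x.
xs = np.linspace(-3.0, 3.0, 601)
ys = 2.0 - xs
vals = f(xs, ys)

best = np.argmax(vals)
print(xs[best], ys[best], vals[best])
# -> roughly x = 1, y = 1, f = 2: the best value *on the curve* (the analogue of point B),
#    which is smaller than the unconstrained maximum f(0, 0) = 4.
```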
Lagrange multiplier method
Objective function conversion:
The previous section derived the objective of the linearly separable SVM: maximize the margin, i.e.

$$\max_{w, b} \frac{2}{\|w\|} \quad \text{s.t.} \quad y_i\left(w^{T} x_i + b\right) \ge 1, \quad i = 1, \dots, N.$$

We can take it a step further and rewrite it equivalently as

$$\min_{w, b} \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i\left(w^{T} x_i + b\right) \ge 1, \quad i = 1, \dots, N.$$

This constraint can in turn be written as

$$g_i(w, b) = 1 - y_i\left(w^{T} x_i + b\right) \le 0, \quad i = 1, \dots, N.$$
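As a small illustration of this inequality form, the following sketch evaluates g_i(w, b) = 1 - y_i(w·x_i + b) on a toy dataset with a hand-picked (w, b); both the data and the parameters are hypothetical and only meant to show what it looks like when every margin constraint is satisfied.

```python
import numpy as np

# Toy, hand-made data (not from the article):
X = np.array([[2.0, 2.0], [3.0, 3.0],    # positive class, y = +1
              [0.0, 0.0], [-1.0, 0.0]])  # negative class, y = -1
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([1.0, 1.0])   # hypothetical separating direction
b = -2.0                   # hypothetical offset

g = 1.0 - y * (X @ w + b)  # one constraint value g_i per sample
print(g)                   # all entries <= 0 means (w, b) satisfies every margin constraint
print(np.all(g <= 0))
```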
Because inequality constraints are a little more difficult, we start with equality constraints.
Equality constraints:
Our objective function becomes an equality-constrained problem of the form:

$$\min_{x, y} f(x, y) \quad \text{s.t.} \quad g(x, y) = 0.$$
Now we draw the contour lines of f(x,y) and g(x,y) in a single graph (the result of projecting the three-dimensional figure onto a two-dimensional plane), as shown in the figure:
The green line is the curve g(x,y) = 0, and the blue lines are contour lines of f(x,y). The two blue lines in the figure correspond to f(x,y) = d1 and f(x,y) = d2, where d1 and d2 are constants equal to the z-axis heights of the two blue circles in the figure above; here d2 < d1.
Suppose the red point (the point of tangency in the figure) has coordinates $(x^*, y^*)$. Then the gradient $\nabla f(x^*, y^*)$ is perpendicular to the tangent line of $f(x, y) = d_2$ at $(x^*, y^*)$, and the gradient $\nabla g(x^*, y^*)$ is perpendicular to the tangent line of $g(x, y) = 0$ at $(x^*, y^*)$.
And because the blue curve $f(x, y) = d_2$ and the green curve $g(x, y) = 0$ are tangent at $(x^*, y^*)$, the two curves share the same tangent line there, so the gradients of $f$ and $g$ at that point are either in the same direction or in opposite directions.
Therefore, there must exist a constant $\lambda$ such that:

$$\nabla f(x^*, y^*) + \lambda\, \nabla g(x^*, y^*) = 0.$$
At this point we call λ the Lagrange multiplier !
Define the Lagrangian function

$$L(x, y, \lambda) = f(x, y) + \lambda\, g(x, y),$$

where $\lambda$ is the Lagrange multiplier.
The Lagrangian function combines the original objective function and its constraint into a single function, so the original constrained optimization problem can be transformed into an unconstrained optimization problem on the Lagrangian. Setting the partial derivatives of $L$ with respect to $x$, $y$, and $\lambda$ to zero recovers exactly the tangency condition above together with the constraint $g(x, y) = 0$.
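As a concrete sketch of the method, the snippet below solves a small equality-constrained problem symbolically with SymPy; the particular f and g are invented for the example and are not the ones from the figure.

```python
import sympy as sp

# Hypothetical example: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 2 = 0.
x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = x + y - 2

# Lagrangian: L = f + lam * g
L = f + lam * g

# Stationary point: all partial derivatives of L vanish (tangency condition + constraint).
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)             # [{x: 1, y: 1, lam: -2}]
print(f.subs(solutions[0]))  # constrained minimum value: 2
```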
Inequality constraints
Now that we understand the situation where the constraints are equalities, let's look at the situation where the constraints are inequalities.
The original (primal) problem is:

$$\min_{w} f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \quad i = 1, \dots, k.$$

First, we still construct the Lagrangian function, introducing one multiplier $\alpha_i \ge 0$ per inequality constraint:

$$L(w, \alpha) = f(w) + \sum_{i=1}^{k} \alpha_i\, g_i(w).$$

Let:

$$\theta_P(w) = \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha).$$

If $w$ violates some constraint ($g_i(w) > 0$), the maximization can push $\alpha_i \to \infty$, so $\theta_P(w) = +\infty$; if $w$ satisfies all constraints, the best choice is $\alpha_i = 0$, so $\theta_P(w) = f(w)$. Then the original problem is equivalent to:

$$p^* = \min_{w} \theta_P(w) = \min_{w}\ \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha).$$
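A quick numeric sketch of why this works: for a toy one-dimensional problem (hypothetical f and g, not the SVM objective), θ_P(w) approximated over a finite grid of α values equals f(w) at feasible points and blows up at infeasible ones.

```python
import numpy as np

# Hypothetical problem: f(w) = w^2 with one inequality constraint g(w) = 1 - w <= 0 (i.e. w >= 1).
def f(w):
    return w ** 2

def g(w):
    return 1.0 - w

def theta_P(w, alpha_grid=np.linspace(0.0, 100.0, 10001)):
    """Approximate theta_P(w) = max_{alpha >= 0} [ f(w) + alpha * g(w) ] on a finite alpha grid."""
    return np.max(f(w) + alpha_grid * g(w))

# Feasible point (g <= 0): theta_P equals f, because the best alpha is 0.
print(theta_P(2.0))   # ~4.0 == f(2.0)
# Infeasible point (g > 0): theta_P grows with alpha (capped here only by the finite grid).
print(theta_P(0.5))   # large value; tends to +inf as the alpha grid is extended
```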
Lagrangian dual problem
Now we have transformed the inequality-constrained problem into the problem of computing $p^*$.
But it is still hard to solve directly, because we must first solve the inner maximization over $\alpha$ under the inequality constraints, and only then minimize over $w$. What if we could swap the order: first minimize over $w$, and then maximize over $\alpha$? That is:

$$d^* = \max_{\alpha:\ \alpha_i \ge 0}\ \min_{w} L(w, \alpha).$$

This $d^*$ is the dual form of $p^*$, i.e., the dual of the original problem. Unfortunately, the dual is not necessarily equal to the primal, because:

$$d^* = \max_{\alpha:\ \alpha_i \ge 0}\ \min_{w} L(w, \alpha)\ \le\ \min_{w}\ \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha) = p^*.$$
This inequality is called the weak duality property: the smallest of the maximum values is always greater than or equal to the largest of the minimum values, which is understandable from a common-sense point of view. The difference $p^* - d^*$ is called the duality gap.
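The "min of the maxes is at least the max of the mins" fact is easy to check numerically; the short sketch below verifies it for a random matrix, which plays the role of $L(w, \alpha)$ evaluated on a finite grid of $w$'s (rows) and $\alpha$'s (columns).

```python
import numpy as np

# Weak duality intuition on a finite example: for any matrix A,
#   min_i max_j A[i, j]  >=  max_j min_i A[i, j].
rng = np.random.default_rng(0)
A = rng.integers(-10, 10, size=(5, 7))

min_of_maxes = A.max(axis=1).min()   # analogue of p* = min_w max_alpha L
max_of_mins = A.min(axis=0).max()    # analogue of d* = max_alpha min_w L

print(min_of_maxes, max_of_mins)
assert min_of_maxes >= max_of_mins   # weak duality always holds
```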
In convex optimization theory there is Slater's condition. When it is satisfied (together with convexity of the problem), the duality gap vanishes, that is:

$$d^* = p^*.$$

This is called strong duality. Fortunately, the SVM optimization problem is convex and satisfies Slater's condition, so strong duality holds.
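As a sanity check of strong duality on a tiny convex problem (again a made-up example, not the SVM itself), the sketch below compares p* and d* for minimizing x² subject to x ≥ 1 and finds them numerically equal.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical convex problem:
#   primal:  min x^2   s.t.  g(x) = 1 - x <= 0        -> p* = 1 at x = 1
#   dual:    max_{alpha >= 0} q(alpha),  where q(alpha) = min_x [ x^2 + alpha * (1 - x) ]

def q(alpha):
    # Inner minimization over x (closed form would be x = alpha / 2).
    inner = minimize_scalar(lambda x: x**2 + alpha * (1.0 - x))
    return inner.fun

alphas = np.linspace(0.0, 10.0, 2001)
d_star = max(q(a) for a in alphas)
p_star = 1.0  # minimum of x^2 over the feasible set x >= 1

print(p_star, d_star)  # both approximately 1: the duality gap is (numerically) zero
```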
【Recommended reading】
Peter teaches you how to talk about love and AI | 01 Introduction
Peter teaches you how to talk about love and AI | 02 What is machine learning
Peter teaches you how to talk about love and AI | 03 The three elements of machine learning
Peter teaches you how to talk about love and AI | 04 The gradient descent method
Peter teaches you how to talk about love and AI | 05 Using gradient descent to fit a linear regression model
Peter teaches you how to talk about love and AI | 06 The naive Bayes classifier