Peter teaches you how to talk about love and AI | 11 Support Vector Machine (Part 2) - Using the Lagrangian to solve the SVM primal problem
Visualize the constraints on a function
If we plot f(x, y) in a three-dimensional Cartesian coordinate system, that is, if we graph z = f(x, y), we get a surface in three dimensions; such a function expresses a relationship between x, y, and z.
The constraint on the function f(x, y) is: g(x, y) = 0.
So how does a two-dimensional curve constrain a three-dimensional surface? As shown in the figure:
In this example, we can see that f(x, y) has a maximum value. Because of the constraint g(x, y) = 0, if our goal is

$$\max_{x, y} f(x, y) \quad \text{s.t.} \quad g(x, y) = 0,$$

the corresponding point must lie on the curve obtained by projecting g(x, y) = 0 onto the surface z = f(x, y), such as point B in the figure above. Although point B is not necessarily the maximum of z = f(x, y), it is the maximum value that satisfies the constraint g(x, y) = 0!
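To make the idea concrete, here is a minimal numeric sketch in Python. The surface f and the constraint g below are made-up examples (they are not the exact functions drawn in the figure); the point is simply that the best value along the constraint curve is generally smaller than the unconstrained maximum.

```python
import numpy as np

# Hypothetical surface and constraint, chosen only for illustration:
#   f(x, y) = 4 - x^2 - y^2   (a paraboloid with an unconstrained max of 4 at the origin)
#   g(x, y) = x + y - 2 = 0   (the constraint curve, a straight line in the xy-plane)
def f(x, y):
    return 4.0 - x**2 - y**2

# Walk along the constraint curve: points satisfying x + y - 2 = 0, i.e. y = 2 - x.
xs = np.linspace(-3.0, 3.0, 601)
ys = 2.0 - xs
vals = f(xs, ys)

best = np.argmax(vals)
print(xs[best], ys[best], vals[best])
# -> roughly x = 1, y = 1, f = 2: the best value *on the curve* (the analogue of point B),
#    which is smaller than the unconstrained maximum f(0, 0) = 4.
```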
Lagrange multiplier method
Objective function conversion:
The previous section derived the objective of the linearly separable SVM: maximize the margin, i.e.

$$\max_{w, b} \frac{2}{\|w\|} \quad \text{s.t.} \quad y_i\left(w^{T} x_i + b\right) \ge 1, \quad i = 1, \dots, N.$$

We can take it a step further and rewrite it equivalently as

$$\min_{w, b} \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i\left(w^{T} x_i + b\right) \ge 1, \quad i = 1, \dots, N.$$

This constraint can in turn be written as

$$g_i(w, b) = 1 - y_i\left(w^{T} x_i + b\right) \le 0, \quad i = 1, \dots, N.$$
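As a small illustration of this inequality form, the following sketch evaluates g_i(w, b) = 1 - y_i(w·x_i + b) on a toy dataset with a hand-picked (w, b); both the data and the parameters are hypothetical and only meant to show what it looks like when every margin constraint is satisfied.

```python
import numpy as np

# Toy, hand-made data (not from the article):
X = np.array([[2.0, 2.0], [3.0, 3.0],    # positive class, y = +1
              [0.0, 0.0], [-1.0, 0.0]])  # negative class, y = -1
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([1.0, 1.0])   # hypothetical separating direction
b = -2.0                   # hypothetical offset

g = 1.0 - y * (X @ w + b)  # one constraint value g_i per sample
print(g)                   # all entries <= 0 means (w, b) satisfies every margin constraint
print(np.all(g <= 0))
```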
Because inequality constraints are a little more difficult, we start with equality constraints.
Equality constraints:
Our objective function becomes an equality-constrained problem of the form:

$$\min_{x, y} f(x, y) \quad \text{s.t.} \quad g(x, y) = 0.$$
Now we draw the contour lines of f(x,y) and g(x,y) in a single graph (the result of projecting the three-dimensional figure onto a two-dimensional plane), as shown in the figure:
The green line is the curve g(x,y) = 0, and the blue lines are contour lines of f(x,y). The two blue lines in the figure correspond to f(x,y) = d1 and f(x,y) = d2, where d1 and d2 are constants equal to the z-axis heights of the two blue circles in the figure above; here d2 < d1.
Suppose the red point (the point of tangency in the figure) has coordinates $(x^*, y^*)$. Then the gradient $\nabla f(x^*, y^*)$ is perpendicular to the tangent line of $f(x, y) = d_2$ at $(x^*, y^*)$, and the gradient $\nabla g(x^*, y^*)$ is perpendicular to the tangent line of $g(x, y) = 0$ at $(x^*, y^*)$.
And because the blue curve $f(x, y) = d_2$ and the green curve $g(x, y) = 0$ are tangent at $(x^*, y^*)$, the two curves share the same tangent line there, so the gradients of $f$ and $g$ at that point are either in the same direction or in opposite directions.
Therefore, there must exist a constant $\lambda$ such that:

$$\nabla f(x^*, y^*) + \lambda\, \nabla g(x^*, y^*) = 0.$$
At this point we call λ the Lagrange multiplier !
Define the Lagrangian function

$$L(x, y, \lambda) = f(x, y) + \lambda\, g(x, y),$$

where $\lambda$ is the Lagrange multiplier.
The Lagrangian function combines the original objective function and its constraint into a single function, so the original constrained optimization problem can be transformed into an unconstrained optimization problem on the Lagrangian. Setting the partial derivatives of $L$ with respect to $x$, $y$, and $\lambda$ to zero recovers exactly the tangency condition above together with the constraint $g(x, y) = 0$.
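As a concrete sketch of the method, the snippet below solves a small equality-constrained problem symbolically with SymPy; the particular f and g are invented for the example and are not the ones from the figure.

```python
import sympy as sp

# Hypothetical example: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 2 = 0.
x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = x + y - 2

# Lagrangian: L = f + lam * g
L = f + lam * g

# Stationary point: all partial derivatives of L vanish (tangency condition + constraint).
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)             # [{x: 1, y: 1, lam: -2}]
print(f.subs(solutions[0]))  # constrained minimum value: 2
```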
Inequality constraints
Now that we understand the situation where the constraints are equalities, let's look at the situation where the constraints are inequalities.
The original (primal) problem is:

$$\min_{w} f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \quad i = 1, \dots, k.$$

First, we still construct the Lagrangian function, introducing one multiplier $\alpha_i \ge 0$ per inequality constraint:

$$L(w, \alpha) = f(w) + \sum_{i=1}^{k} \alpha_i\, g_i(w).$$

Let:

$$\theta_P(w) = \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha).$$

If $w$ violates some constraint ($g_i(w) > 0$), the maximization can push $\alpha_i \to \infty$, so $\theta_P(w) = +\infty$; if $w$ satisfies all constraints, the best choice is $\alpha_i = 0$, so $\theta_P(w) = f(w)$. Then the original problem is equivalent to:

$$p^* = \min_{w} \theta_P(w) = \min_{w}\ \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha).$$
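A quick numeric sketch of why this works: for a toy one-dimensional problem (hypothetical f and g, not the SVM objective), θ_P(w) approximated over a finite grid of α values equals f(w) at feasible points and blows up at infeasible ones.

```python
import numpy as np

# Hypothetical problem: f(w) = w^2 with one inequality constraint g(w) = 1 - w <= 0 (i.e. w >= 1).
def f(w):
    return w ** 2

def g(w):
    return 1.0 - w

def theta_P(w, alpha_grid=np.linspace(0.0, 100.0, 10001)):
    """Approximate theta_P(w) = max_{alpha >= 0} [ f(w) + alpha * g(w) ] on a finite alpha grid."""
    return np.max(f(w) + alpha_grid * g(w))

# Feasible point (g <= 0): theta_P equals f, because the best alpha is 0.
print(theta_P(2.0))   # ~4.0 == f(2.0)
# Infeasible point (g > 0): theta_P grows with alpha (capped here only by the finite grid).
print(theta_P(0.5))   # large value; tends to +inf as the alpha grid is extended
```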
Lagrangian dual problem
Now we have transformed the inequality-constrained problem into the problem of computing $p^*$.
But it is still hard to solve directly, because we must first solve the inner maximization over $\alpha$ under the inequality constraints, and only then minimize over $w$. What if we could swap the order: first minimize over $w$, and then maximize over $\alpha$? That is:

$$d^* = \max_{\alpha:\ \alpha_i \ge 0}\ \min_{w} L(w, \alpha).$$

This $d^*$ is the dual form of $p^*$, i.e., the dual of the original problem. Unfortunately, the dual is not necessarily equal to the primal, because:

$$d^* = \max_{\alpha:\ \alpha_i \ge 0}\ \min_{w} L(w, \alpha)\ \le\ \min_{w}\ \max_{\alpha:\ \alpha_i \ge 0} L(w, \alpha) = p^*.$$
This inequality is called the weak duality property: the smallest of the maximum values is always greater than or equal to the largest of the minimum values, which is understandable from a common-sense point of view. The difference $p^* - d^*$ is called the duality gap.
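The "min of the maxes is at least the max of the mins" fact is easy to check numerically; the short sketch below verifies it for a random matrix, which plays the role of $L(w, \alpha)$ evaluated on a finite grid of $w$'s (rows) and $\alpha$'s (columns).

```python
import numpy as np

# Weak duality intuition on a finite example: for any matrix A,
#   min_i max_j A[i, j]  >=  max_j min_i A[i, j].
rng = np.random.default_rng(0)
A = rng.integers(-10, 10, size=(5, 7))

min_of_maxes = A.max(axis=1).min()   # analogue of p* = min_w max_alpha L
max_of_mins = A.min(axis=0).max()    # analogue of d* = max_alpha min_w L

print(min_of_maxes, max_of_mins)
assert min_of_maxes >= max_of_mins   # weak duality always holds
```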
In convex optimization theory there is Slater's condition. When it is satisfied (together with convexity of the problem), the duality gap vanishes, that is:

$$d^* = p^*.$$

This is called strong duality. Fortunately, the SVM optimization problem is convex and satisfies Slater's condition, so strong duality holds.
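As a sanity check of strong duality on a tiny convex problem (again a made-up example, not the SVM itself), the sketch below compares p* and d* for minimizing x² subject to x ≥ 1 and finds them numerically equal.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical convex problem:
#   primal:  min x^2   s.t.  g(x) = 1 - x <= 0        -> p* = 1 at x = 1
#   dual:    max_{alpha >= 0} q(alpha),  where q(alpha) = min_x [ x^2 + alpha * (1 - x) ]

def q(alpha):
    # Inner minimization over x (closed form would be x = alpha / 2).
    inner = minimize_scalar(lambda x: x**2 + alpha * (1.0 - x))
    return inner.fun

alphas = np.linspace(0.0, 10.0, 2001)
d_star = max(q(a) for a in alphas)
p_star = 1.0  # minimum of x^2 over the feasible set x >= 1

print(p_star, d_star)  # both approximately 1: the duality gap is (numerically) zero
```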
【Recommended reading】
Peter teaches you how to talk about love and AI | 01 Introduction
Peter teaches you how to talk about love and AI | 02 What is machine learning
Peter teaches you how to talk about love and AI | 03 The three elements of machine learning
Peter teaches you how to talk about love and AI | 04 The gradient descent method
Peter teaches you how to talk about love and AI | 05 Using gradient descent to fit a linear regression model
Peter teaches you how to talk about love and AI | 06 The naive Bayes classifier