Moving to multiple dimensions – Gradient Descent

Our prior discussion was of a function of one variable, $g(w)$, where the derivative is taken with respect to that one variable: $\frac{dg(w)}{dw}$. In fact, even in our simple linear equation we have two variables, $w_0$ and $w_1$. So we need to generalize the notion of the derivative to the notion of the gradient to support more than one variable.

The Greek letter nabla, $\nabla$, is the symbol for the gradient.

$\nabla g(w)$ is the gradient of a function of the vector $w$ (don't let the $g$ confuse you – it could be any letter; it is not specific to the gradient).

$w$ is a vector of length $p + 1$. Each element is one coefficient, named from $w_0$ to $w_p$.

$$\nabla g(w) = \begin{bmatrix} \frac{\partial g(w)}{\partial w_0} \\ \frac{\partial g(w)}{\partial w_1} \\ \vdots \\ \frac{\partial g(w)}{\partial w_p} \end{bmatrix}$$

The result is a vector of length $p + 1$, where each element is the partial derivative with respect to one element of $w$, from $w_0$ to $w_p$, respectively. In a partial derivative, say $\frac{\partial g(w)}{\partial w_1}$, we take the derivative with respect to $w_1$ and treat all other elements of $w$ as constants.

So, in effect, the gradient is a set of derivatives, each with respect to one coefficient of the function. The resulting gradient vector has the same length as the input vector $w$.
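To make this concrete, here is a minimal Python sketch. The function $g(w) = w_0^2 + 3w_1^2$ is a made-up example, not anything from the text; the point is only that the gradient stacks the partial derivatives into a vector the same length as $w$:

```python
import numpy as np

def g(w):
    # Made-up example function of two variables: g(w) = w0^2 + 3*w1^2
    return w[0] ** 2 + 3 * w[1] ** 2

def gradient_g(w):
    # Each element is the partial derivative of g with respect to one w_j,
    # treating the other elements as constants:
    #   dg/dw0 = 2*w0   (w1 held constant)
    #   dg/dw1 = 6*w1   (w0 held constant)
    return np.array([2 * w[0], 6 * w[1]])

w = np.array([1.0, 2.0])
print(g(w))           # 13.0
print(gradient_g(w))  # [ 2. 12.] -- same length as w
```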

So our hill descent algorithm becomes a gradient descent algorithm by using the gradient rather than the derivative:

$$w^{(t+1)} = w^{(t)} - \eta \, \nabla g(w^{(t)})$$

Written out more fully:

$$\begin{bmatrix} w_0^{(t+1)} \\ w_1^{(t+1)} \\ \vdots \\ w_p^{(t+1)} \end{bmatrix} = \begin{bmatrix} w_0^{(t)} \\ w_1^{(t)} \\ \vdots \\ w_p^{(t)} \end{bmatrix} - \eta \begin{bmatrix} \frac{\partial g(w^{(t)})}{\partial w_0} \\ \frac{\partial g(w^{(t)})}{\partial w_1} \\ \vdots \\ \frac{\partial g(w^{(t)})}{\partial w_p} \end{bmatrix}$$
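In code, one update step is just a vector operation. This sketch assumes the `gradient_g` and `w` from the example above, and a hand-picked step size $\eta$ (the value 0.1 is an illustrative choice, not a recommendation):

```python
eta = 0.1  # step size (hypothetical value; eta must be tuned in practice)

# One gradient descent step: every coefficient w_j moves opposite its
# partial derivative, all at once.
w = w - eta * gradient_g(w)
```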

Convergence in this case is when the magnitude of the gradient goes to zero, or, in practical terms, when the magnitude of the gradient is within some tolerance $\epsilon$ of zero:

$$\|\nabla g(w)\| < \epsilon$$

The magnitude of a vector is the square root of the sum of the squares of its elements. So for the vector $w$ with $p + 1$ elements the magnitude is given by:

$$\|w\| = \sqrt{w_0^2 + w_1^2 + \cdots + w_p^2} = \sqrt{\sum_{j=0}^{p} w_j^2}$$
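Putting the pieces together, a minimal gradient descent loop might look like the sketch below. The step size, tolerance, starting point, and iteration cap are all illustrative assumptions, and the gradient is the made-up `gradient_g` example from earlier:

```python
import numpy as np

def gradient_g(w):
    # Gradient of the made-up example g(w) = w0^2 + 3*w1^2 from above.
    return np.array([2 * w[0], 6 * w[1]])

def magnitude(v):
    # Square root of the sum of the squares of the elements
    # (equivalent to np.linalg.norm(v)).
    return np.sqrt(np.sum(v ** 2))

def gradient_descent(gradient, w0, eta=0.1, tolerance=1e-6, max_iter=10_000):
    # Repeat the update until the gradient's magnitude is within
    # `tolerance` of zero (with an iteration cap as a safety net).
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        grad = gradient(w)
        if magnitude(grad) < tolerance:
            break  # converged
        w = w - eta * grad
    return w

print(gradient_descent(gradient_g, w0=[1.0, 2.0]))  # approaches [0, 0]
```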
