Approach 2: Gradient Descent of RSS(w)
Remember the gradient for our residual sum of squares function is given by:

$$\nabla RSS(w_0, w_1) = \begin{bmatrix} -2\sum_{i=1}^{N}\left[y_i - (w_0 + w_1 x_i)\right] \\ -2\sum_{i=1}^{N}\left[y_i - (w_0 + w_1 x_i)\right]x_i \end{bmatrix}$$

$y_i$ is our observed value; in our house example, it is the actual price of house $i$. $w_0 + w_1 x_i$ is the function of the line that produces a prediction given a value for $x_i$; in our house example, $x_i$ would be the square footage of house $i$.
We can make this notation more succinct by substituting the line equation with $\hat{y}_i$, the notation for the predicted value of $y$:

$$\hat{y}_i = w_0 + w_1 x_i$$

or, for the $t$-th iteration of gradient descent, the prediction is:

$$\hat{y}_i(w^{(t)}) = w_0^{(t)} + w_1^{(t)} x_i$$
So, rewriting the gradient of the RSS function:

$$\nabla RSS(w^{(t)}) = \begin{bmatrix} -2\sum_{i=1}^{N}\left[y_i - \hat{y}_i(w^{(t)})\right] \\ -2\sum_{i=1}^{N}\left[y_i - \hat{y}_i(w^{(t)})\right]x_i \end{bmatrix}$$
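To make the formula concrete, here is a minimal sketch of evaluating this gradient with NumPy. The square footages, prices, and current weights are made-up illustrative values (square footage in thousands of square feet, price in hundreds of thousands of dollars), not data from this article.

```python
import numpy as np

# Hypothetical data: square footage in 1,000s of sq ft (x) and
# observed sale price in $100,000s (y) for N = 4 houses
x = np.array([1.0, 1.5, 2.0, 2.5])
y = np.array([2.0, 2.9, 4.1, 5.0])

w0, w1 = 0.0, 1.0    # an arbitrary current intercept and slope
y_hat = w0 + w1 * x  # predicted price for each house

# The two components of the RSS gradient
grad_w0 = -2.0 * np.sum(y - y_hat)
grad_w1 = -2.0 * np.sum((y - y_hat) * x)
print(grad_w0, grad_w1)  # roughly -14.0 and -27.1: both negative, so both weights should increase
```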
So the gradient descent algorithm is:

While not converged:

$$w^{(t+1)} \leftarrow w^{(t)} - \eta \, \nabla RSS(w^{(t)})$$

where $\eta$ is the step size.
We can move the constant $-2$ out of each sum, where it combines with the $-\eta$ in the update to give a positive factor of $2\eta$, and get the final formulation of the gradient descent calculation.
Simple Linear Regression Gradient Descent

While not converged:

$$w_0^{(t+1)} \leftarrow w_0^{(t)} + 2\eta \sum_{i=1}^{N}\left[y_i - \hat{y}_i(w^{(t)})\right]$$

$$w_1^{(t+1)} \leftarrow w_1^{(t)} + 2\eta \sum_{i=1}^{N}\left[y_i - \hat{y}_i(w^{(t)})\right]x_i$$
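Putting it all together, below is a minimal sketch of the full algorithm in NumPy. The step size, convergence tolerance, iteration cap, and data are assumed values chosen so the loop converges on this toy example; "not converged" is interpreted here as the magnitude of the gradient still exceeding the tolerance.

```python
import numpy as np

def simple_linear_regression_gd(x, y, eta=0.05, tolerance=1e-3, max_iter=100_000):
    """Fit y ~ w0 + w1*x by gradient descent on RSS (illustrative defaults)."""
    w0, w1 = 0.0, 0.0
    for _ in range(max_iter):
        errors = y - (w0 + w1 * x)           # y_i - y_hat_i at iteration t
        grad_w0 = -2.0 * np.sum(errors)      # dRSS/dw0
        grad_w1 = -2.0 * np.sum(errors * x)  # dRSS/dw1
        if np.hypot(grad_w0, grad_w1) < tolerance:  # "while not converged"
            break
        # Equivalent to w0 + 2*eta*sum(errors) and w1 + 2*eta*sum(errors * x)
        w0 -= eta * grad_w0
        w1 -= eta * grad_w1
    return w0, w1

# Same hypothetical house data as above (1,000s of sq ft vs. $100,000s)
x = np.array([1.0, 1.5, 2.0, 2.5])
y = np.array([2.0, 2.9, 4.1, 5.0])
w0, w1 = simple_linear_regression_gd(x, y)
print(w0, w1)  # roughly the least-squares intercept and slope for this data
```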