Hello i_write_for_me! How’s your Friday night going?
alpha * slope yields Δx (not a Y value). You are right that we should subtract “X” from theta, though.
I guess now your question would be: why does alpha * slope yield Δx?
Δx is a step towards the minimum point. To figure out the best direction in which to minimize f, we take its gradient ∇f. The gradient points uphill (the direction in which f increases fastest), so we step the opposite way, which is why we subtract. Multiplying the slope by alpha (the step size) approximates how far we should move towards the optimum point.
Think of the graph of y = x², and let’s say we start at x = 4 with alpha = 0.01. (Draw it on paper yourself. It helps!) Alpha is constant throughout the process. The slope here is 2x, so as x moves from 4 towards 0, the slope becomes smaller, and slope * alpha (which is Δx) becomes smaller as well. That’s how it converges.
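If you'd rather see the numbers than draw them, here is a minimal Python sketch of the same example. The function name `gradient_descent` and its parameters are just names I made up for illustration, and it assumes the slope of y = x² is 2x.

```python
def gradient_descent(x0=4.0, alpha=0.01, steps=5):
    """Run a few gradient descent steps on y = x**2 (hypothetical helper)."""
    x = x0
    history = []
    for _ in range(steps):
        slope = 2 * x            # derivative of x**2 at the current x
        delta_x = alpha * slope  # the step: alpha * slope
        x = x - delta_x          # move against the slope, towards the minimum
        history.append((slope, delta_x, x))
    return history


for slope, delta_x, x in gradient_descent():
    print(f"slope={slope:.4f}  delta_x={delta_x:.4f}  new x={x:.4f}")
```

The first step has slope 8 and Δx = 0.08, so x moves from 4 to 3.92. Every step after that has a slightly smaller slope and therefore a slightly smaller Δx, even though alpha never changes.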