Math
The process of adjusting an image in the direction that improves the result is called "gradient descent". Adjusting the image really means adjusting the RGB values of each pixel. In mathematics, the gradient is the direction in which a scalar function of multiple variables increases the fastest; it is the vector of the partial derivatives of the function with respect to each of its variables. In our case, the variables are the RGB values of each pixel. When training the network itself, however, the variables are the network's parameters (filters), and the function is the "loss" (or "cost") function, which measures the difference (or distance) between the desired output and the network's actual output at the current step.
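To make this concrete, here is a minimal sketch of a single descent step on the pixel values themselves, written in plain NumPy. The loss is a deliberately simple stand-in (mean squared distance to an all-zero target image) so its partial derivatives can be written out by hand; the image size, step size, and loss function are illustrative assumptions, not the setup used elsewhere in this article. With a real network, the gradient would come from backpropagation instead.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))        # height x width x RGB: these are the variables
target = np.zeros((64, 64, 3))         # hypothetical desired image

def loss(img):
    # Scalar function of all the pixel values.
    return ((img - target) ** 2).mean()

# The gradient is the vector of partial derivatives of the loss with respect
# to every pixel value; for this mean-squared loss it is 2 * (img - target) / N.
grad = 2 * (image - target) / image.size

step_size = 0.1
image -= step_size * grad              # step opposite the gradient: descent
print(loss(image))                     # slightly smaller than before the step
```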
The algorithm that computes the gradient in a reasonable amount of time is called "backpropagation". Because each layer's output is a function of the previous layer's output, the algorithm uses the "chain rule" of calculus: it starts from the last layer and works backwards through the layers, multiplying partial derivatives at each step. Hence the name "backpropagation". Once the gradient is computed, all that is left is to adjust the parameters. And because the gradient points in the direction of fastest increase of the cost function, we adjust the parameters in the opposite direction, by subtracting the gradient. Hence the name "gradient descent".
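As an illustration of the chain rule at work, here is a toy sketch of backpropagation through a two-layer "network" with just two parameters and a single scalar input. The choice of functions (tanh, squared error), the parameter names, and the learning rate are made up for the example; they are not the architecture discussed above.

```python
import numpy as np

x, target = 0.5, 0.8
w1, w2 = 0.3, -0.2          # the network's parameters
lr = 0.1                    # learning rate (step size)

for step in range(500):
    # Forward pass: each layer's output is a function of the previous one.
    h = np.tanh(w1 * x)     # layer 1
    y = w2 * h              # layer 2
    loss = (y - target) ** 2

    # Backward pass: start at the loss and multiply local derivatives
    # layer by layer (the chain rule), moving from the last layer back.
    dloss_dy = 2 * (y - target)
    dloss_dw2 = dloss_dy * h                    # d(loss)/d(w2)
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1 - h ** 2) * x     # d(tanh(z))/dz = 1 - tanh(z)^2

    # Gradient descent: step against the gradient to reduce the loss.
    w1 -= lr * dloss_dw1
    w2 -= lr * dloss_dw2

print(loss)   # shrinks toward 0 as the steps accumulate
```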