A generalized form of the delta rule, developed by D.E. Rumelhart, G.E. Hinton, and R.J. Williams, is needed for networks with hidden layers. They showed that this method works for the class of semilinear activation functions (non-decreasing and differentiable).
Generalizing the ideas of the delta rule, consider a hierarchical network with an input layer, an output layer and a number of hidden layers. We will consider only the case where there is one hidden layer. The network is presented with input signals which produce output signals that act as input to the middle layer. Output signals from the middle layer in turn act as input to the output layer to produce the final output vector. This vector is compared to the desired output vector. Since both the output and the desired output vectors are known, the delta rule can be used to adjust the weights in the output layer. Can the delta rule be applied to the middle layer? Both the input signal to each unit of the middle layer and the output signal are known. What is not known is the error generated from the output of the middle layer since we do not know the desired output. To get this error, backpropagate through the middle layer to the units that are responsible for generating that output. The error genrated from the middle layer could be used with the delta rule to adjust the weights.
Copyright 1996 by Ingrid Russell.