src package#
Submodules#
src.activation module#
- class src.activation.LeakyReLU(alpha=0.01)[source]#
Bases:
Module
Leaky ReLU activation function.
\[\begin{split}\text{LeakyReLU}(x) = \max(\alpha x, x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \alpha \times x, & \text{ otherwise } \end{cases}\end{split}\]- backward_delta(input, delta)[source]#
- \[\begin{split}\frac{\partial M}{\partial z^h} = \begin{cases} 1 & \text{if } x>0, \\ \alpha & \text{otherwise}. \end{cases}\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
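A minimal NumPy sketch of the LeakyReLU forward pass and delta rule above (standalone functions for illustration, not the module's actual code):

import numpy as np

def leaky_relu_forward(X, alpha=0.01):
    # max(alpha * x, x): identity for x >= 0, alpha * x otherwise
    return np.where(X >= 0, X, alpha * X)

def leaky_relu_backward_delta(X, delta, alpha=0.01):
    # elementwise derivative (1 if x > 0, alpha otherwise), chained with the incoming delta
    return delta * np.where(X > 0, 1.0, alpha)

X = np.array([[-2.0, -0.5, 0.0, 1.5]])
print(leaky_relu_forward(X))                          # values: [-0.02, -0.005, 0.0, 1.5]
print(leaky_relu_backward_delta(X, np.ones_like(X)))  # values: [0.01, 0.01, 0.01, 1.0]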
- class src.activation.LogSoftmax[source]#
Bases:
Module
LogSoftmax activation function.
\[\text{LogSoftmax}(x_{i}) = \log \left( \frac{\exp(x_i)}{\sum_j \exp(x_j)} \right)\]- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.ReLU[source]#
Bases:
Module
ReLU (rectified linear unit) activation function.
\[\text{ReLU}(x) = x^+ = \max(0, x)\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = 1 \text{ if } x > 0 \text{ else } 0.\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.Sigmoid[source]#
Bases:
Module
Sigmoid activation function.
\[\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = \sigma(z^h) * (1 - \sigma(z^h))\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.Softmax[source]#
Bases:
Module
Softmax activation function. Commonly used along with a cross entropy loss. See [Softmax and cross-entropy loss](https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/) and [Derivative of Cross Entropy Loss with Softmax](https://www.parasdahal.com/softmax-crossentropy)
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M(x_i)}{\partial x_i} = M^h(x_i) * (1 - M^h(x_i))\]
More precisely,
\[\begin{split}\frac{\partial M^h(x_i)}{\partial x_j} = \begin{cases} M^h(x_i) * ( 1 - M^h(x_j) ) &\text{if } i = j \\ - M^h(x_j) M^h(x_i) &\text{if } i \neq j \\ \end{cases}\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- forward(X)[source]#
Implemented using the log-sum-exp trick to avoid NaNs. See [Computing softmax and numerical stability](https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/).
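A sketch of that numerically stable forward pass (shift by the row maximum before exponentiating); the function name is illustrative, not the class API:

import numpy as np

def stable_softmax(X):
    # subtracting the row-wise max makes the largest exponent exp(0) = 1
    shifted = X - X.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

X = np.array([[1000.0, 1001.0, 1002.0]])  # naive exp(1000) would overflow and produce NaN
print(stable_softmax(X))                  # values: [0.09003057, 0.24472847, 0.66524096]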
- class src.activation.Softplus[source]#
Bases:
Module
Smooth approximation of the ReLU activation function.
\[\text{Softplus}(x) = \ln(1 + e^x)\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = \sigma (x) = \frac{1}{1 + e^{-x}}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.StableSigmoid[source]#
Bases:
Module
Numerically stable Sigmoid activation function.
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
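One standard way to obtain a numerically stable sigmoid is to exponentiate only non-positive values, using a different closed form on each sign; this is a hedged sketch of that idea, not necessarily StableSigmoid's exact implementation:

import numpy as np

def stable_sigmoid(X):
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x))
    # both branches exponentiate only non-positive values, so exp never overflows
    out = np.empty_like(X, dtype=float)
    pos = X >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-X[pos]))
    e = np.exp(X[~pos])
    out[~pos] = e / (1.0 + e)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # values: [0.0, 0.5, 1.0]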
- class src.activation.TanH[source]#
Bases:
Module
Hyperbolic Tangent activation function.
\[\begin{split}\begin{align*} \text{TanH}(x) &= \tanh(x) \\ &= \frac{\sinh x}{\cosh x} \\ &= \frac{\exp(x) - \exp(-x)} {\exp(x) + \exp(-x)} \\ &= \frac{e^{2x} - 1 }{e^{2x} + 1} \end{align*}\end{split}\]- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
src.convolution module#
We tried to vectorize our convolutions as much as possible, prioritizing performance.
This means creating special views of our arrays with the numpy.lib.stride_tricks functions. sliding_window_view is the easiest to understand, though perhaps not the fastest compared to as_strided (but probably less risky too).
The calculations are done using np.einsum, which is relatively easy to understand and use. The key lies in understanding the shapes of your inputs/outputs.
Shape#
- Reminder for 1D:
input : ndarray (batch, length, chan_in)
d_out : ndarray (batch, length, chan_in) == input.shape
X_view : ndarray (batch, out_length, chan_in, self.k_size)
delta : ndarray (batch, out_length, chan_out)
_gradient["weight"] : ndarray (k_size, chan_in, chan_out)
_parameters["weight"] : ndarray (k_size, chan_in, chan_out)
Notes#
- Notation used for np.einsum:
b : batch_size
w : width (2D) / length (1D)
h : height (2D)
o : out_width (2D) / out_length (1D)
p : out_height (2D)
c : chan_in
d : chan_out
k : k_size (ij for 2D)
Examples#
Quick demonstration of sliding_window_view in 1D:
>>> batch, length, chan_in, k_size = 1, 8, 1, 3
>>> input = np.random.randn(batch, length, chan_in)
>>> input
array([[[-0.41982262],
[ 1.10111123],
[-0.41115195],
[ 1.18733225],
[-1.93463567],
[-0.22472025],
[-0.30581971],
[ 0.40578667]]])
>>> window = np.lib.stride_tricks.sliding_window_view(input, (1, k_size, chan_in))
>>> window
array([[[[[[-0.41982262],
[ 1.10111123],
[-0.41115195]]]],
[[[[ 1.10111123],
[-0.41115195],
[ 1.18733225]]]],
...
How to deal with stride != 1?
>>> stride = 3
>>> window = np.lib.stride_tricks.sliding_window_view(input, (1, k_size, chan_in))[::1, ::stride, ::1]
>>> window
array([[[[[[-0.41982262],
[ 1.10111123],
[-0.41115195]]]],
[[[[ 1.18733225],
[-1.93463567],
[-0.22472025]]]]]])
Then it is just a matter of reshaping to drop the unnecessary dimensions, with out_length = (length - k_size) // stride + 1 = 2 here, e.g.:
>>> window = window.reshape(batch, out_length, chan_in, k_size)
>>> window
array([[[[-0.41982262, 1.10111123, -0.41115195]],
[[ 1.18733225, -1.93463567, -0.22472025]]]])
And voilà!
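With the notation table above, the forward pass of a 1D convolution then becomes a single contraction of this window view with a weight tensor. A sketch continuing the example (chan_out and the random weights are chosen purely for illustration; the actual Conv1D code may differ):

>>> chan_out = 4
>>> out_length = (length - k_size) // stride + 1    # = 2 here
>>> W = np.random.randn(k_size, chan_in, chan_out)
>>> out = np.einsum("bock,kcd->bod", window, W)     # sum over chan_in (c) and k_size (k)
>>> out.shape
(1, 2, 4)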
- class src.convolution.AvgPool1D(k_size, stride)[source]#
Bases:
Module
1D average pooling.
- Parameters:
k_size (int) – Size of the pooling window.
stride (int, optional, default=1) – Stride of the pooling window.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(x, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
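In the same stride-trick spirit, average pooling can be sketched as a mean over a sliding window view (a hedged illustration of the approach, not the class's exact code):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def avg_pool1d(X, k_size, stride):
    # X: (batch, length, chan_in) -> windows: (batch, out_length, chan_in, k_size)
    windows = sliding_window_view(X, k_size, axis=1)[:, ::stride]
    # mean over the kernel axis -> (batch, (length - k_size) // stride + 1, chan_in)
    return windows.mean(axis=-1)

X = np.arange(8, dtype=float).reshape(1, 8, 1)
print(avg_pool1d(X, k_size=2, stride=2)[..., 0])  # [[0.5 2.5 4.5 6.5]]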
- class src.convolution.Conv1D(k_size: int, chan_in: int, chan_out: int, stride: int = 1, bias: bool = False, init_type: Literal['normal', 'uniform', 'zeros', 'ones', 'he_normal', 'he_uniform', 'xavier_normal', 'xavier_uniform'] = 'xavier_normal')[source]#
Bases:
Module
1D convolution.
- Parameters:
k_size (int) – Size of the convolving kernel.
chan_in (int) – Number of channels in the input image.
chan_out (int) – Number of channels produced by the convolution.
stride (int, optional, default=1) – Stride of the convolution.
bias (bool, optional, default=False) – If True, adds a learnable bias to the output.
init_type (str, optional, default="xavier_normal") – Change the initialization of parameters.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
Weight : ndarray (k_size, chan_in, chan_out)
Bias : ndarray (chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
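A hypothetical end-to-end shape check with the documented constructor (assuming Conv1D exposes the same forward(X) method as the other modules; values are random):

import numpy as np
from src.convolution import Conv1D

batch, length, chan_in, chan_out, k_size, stride = 2, 32, 3, 8, 5, 2
conv = Conv1D(k_size, chan_in, chan_out, stride=stride, bias=True)

X = np.random.randn(batch, length, chan_in)
out = conv.forward(X)  # assumed interface
print(out.shape)       # expected (batch, (length - k_size) // stride + 1, chan_out) = (2, 14, 8)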
- class src.convolution.Flatten[source]#
Bases:
Module
Flatten an output.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, length * chan_in)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
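A minimal sketch of what Flatten's forward and backward_delta amount to, i.e. a reshape and its inverse (hedged, since only the shapes are documented here):

import numpy as np

def flatten_forward(X):
    # (batch, length, chan_in) -> (batch, length * chan_in)
    return X.reshape(X.shape[0], -1)

def flatten_backward_delta(X, delta):
    # route the gradient back to the original (batch, length, chan_in) layout
    return delta.reshape(X.shape)

X = np.random.randn(4, 6, 2)
print(flatten_forward(X).shape)                             # (4, 12)
print(flatten_backward_delta(X, flatten_forward(X)).shape)  # (4, 6, 2)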
- class src.convolution.MaxPool1D(k_size, stride)[source]#
Bases:
Module
1D max pooling.
- Parameters:
k_size (int) – Size of the pooling window.
stride (int, optional, default=1) – Stride of the pooling window.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
src.encapsulation module#
- class src.encapsulation.Optim(network: Sequential, loss: Loss, eps: float)[source]#
Bases:
object
- SGD(X, y, batch_size: int, epochs: int, network: Sequential | None = None, shuffle: bool = True, seed: int = 42)[source]#
- SGD_eval(X, y, batch_size: int, epochs: int, test_size: float, patience: int = 10, network: Sequential | None = None, shuffle_train: bool = True, shuffle_test: bool = False, seed: int = 42, return_dataframe: bool = False, online_plot: bool = False)[source]#
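A hypothetical training run wired together from the documented signatures. The Sequential container, its constructor, and the one-hot label encoding are assumptions made for illustration, not guaranteed by this page:

import numpy as np
from src.activation import TanH
from src.encapsulation import Optim, Sequential  # Sequential assumed to live here
from src.linear import Linear
from src.loss import CELogSoftmax

# toy data: 100 samples, 20 features, 3 classes (assumed one-hot encoded)
X = np.random.randn(100, 20)
y = np.eye(3)[np.random.randint(0, 3, size=100)]

# assumed: Sequential takes modules in forward order and ends on raw scores for CELogSoftmax
net = Sequential(Linear(20, 16), TanH(), Linear(16, 3))

optim = Optim(net, CELogSoftmax(), eps=1e-3)
optim.SGD(X, y, batch_size=10, epochs=50)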
src.linear module#
- class src.linear.Linear(input_size: int, output_size: int, bias: bool = True, init_type: Literal['normal', 'uniform', 'zeros', 'ones', 'he_normal', 'he_uniform', 'xavier_normal', 'xavier_uniform'] = 'he_normal')[source]#
Bases:
Module
Linear module.
- Parameters:
input_size (int) – Size of input sample.
output_size (int) – Size of output sample.
bias (bool, optional, default=True) – If True, adds a learnable bias to the output.
init_type (str, optional, default="he_normal") – Change the initialization of parameters.
Shape#
Input : ndarray (batch, input_size)
Output : ndarray (batch, output_size)
Weight : ndarray (input_size, output_size)
Bias : ndarray (1, output_size)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- forward(X)[source]#
Forward pass.
Notes
X @ w = (batch, input_size) @ (input_size, output_size) = (batch, output_size)
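A concrete check of that note with hypothetical sizes:

import numpy as np

batch, input_size, output_size = 4, 10, 3
X = np.random.randn(batch, input_size)
w = np.random.randn(input_size, output_size)
b = np.random.randn(1, output_size)

out = X @ w + b   # (4, 10) @ (10, 3) + (1, 3) -> (4, 3); the bias broadcasts over the batch
print(out.shape)  # (4, 3)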
src.loss module#
- class src.loss.CELogSoftmax[source]#
Bases:
Loss
\[\text{CE}(y, \hat{y}) = - \log \frac {e^{\hat{y}_y}} {\sum_{i=1}^{K} e^{\hat{y}_i}} = -\hat{y}_y + \log \sum_{i=1}^{K}e^{\hat{y}_i}\]
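A sketch of that loss for integer class labels, using the same rearrangement stabilised with the max trick (names are illustrative, not the class's API):

import numpy as np

def ce_logsoftmax(y, y_hat):
    # y: (batch,) integer class indices, y_hat: (batch, K) raw scores
    # CE = -y_hat[y] + log(sum_i exp(y_hat_i)), computed with a shifted log-sum-exp
    m = y_hat.max(axis=1, keepdims=True)
    lse = np.log(np.exp(y_hat - m).sum(axis=1)) + m[:, 0]
    return -y_hat[np.arange(len(y)), y] + lse

y_hat = np.array([[2.0, 1.0, 0.1]])
print(ce_logsoftmax(np.array([0]), y_hat))  # about [0.417], i.e. -log softmax(y_hat)[0]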
src.module module#
- class src.module.Module[source]#
Bases:
object
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
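To make the two generic formulas above concrete, here is what they reduce to for a plain linear layer z^h = z^{h-1} W + b (the textbook matrix form, not necessarily the library's exact code):

import numpy as np

batch, in_size, out_size = 8, 5, 3
X = np.random.randn(batch, in_size)        # z^{h-1}
W = np.random.randn(in_size, out_size)
delta = np.random.randn(batch, out_size)   # dL/dz^h

grad_W = X.T @ delta       # backward_update_gradient: dL/dW = (z^{h-1})^T delta, shape (5, 3)
delta_prev = delta @ W.T   # backward_delta: dL/dz^{h-1} = delta W^T, shape (8, 5)

print(grad_W.shape, delta_prev.shape)      # (5, 3) (8, 5)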