src package#
Submodules#
src.activation module#
- class src.activation.LeakyReLU(alpha=0.01)[source]#
Bases:
Module
Leaky ReLU activation function.
\[\begin{split}\text{LeakyReLU}(x) = \max(\alpha x, x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \alpha \times x, & \text{ otherwise } \end{cases}\end{split}\]- backward_delta(input, delta)[source]#
- \[\begin{split}\frac{\partial M}{\partial z^h} = \begin{cases} 1 & \text{if } x>0, \\ \alpha & \text{otherwise}. \end{cases}\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
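A minimal NumPy sketch of the LeakyReLU forward pass and delta rule above (standalone functions for illustration, not the module's actual code):

import numpy as np

def leaky_relu_forward(X, alpha=0.01):
    # max(alpha * x, x): identity for x >= 0, alpha * x otherwise
    return np.where(X >= 0, X, alpha * X)

def leaky_relu_backward_delta(X, delta, alpha=0.01):
    # elementwise derivative (1 if x > 0, alpha otherwise), chained with the incoming delta
    return delta * np.where(X > 0, 1.0, alpha)

X = np.array([[-2.0, -0.5, 0.0, 1.5]])
print(leaky_relu_forward(X))                          # values: [-0.02, -0.005, 0.0, 1.5]
print(leaky_relu_backward_delta(X, np.ones_like(X)))  # values: [0.01, 0.01, 0.01, 1.0]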
- class src.activation.LogSoftmax[source]#
Bases:
Module
LogSoftmax activation function.
\[\text{LogSoftmax}(x_{i}) = \log \left( \frac{\exp(x_i)}{\sum_j \exp(x_j)} \right)\]- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.ReLU[source]#
Bases:
Module
ReLU (rectified linear unit) activation function.
\[\text{ReLU}(x) = x^+ = \max(0, x)\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = 1 \text{ if } x > 0 \text{ else } 0.\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.Sigmoid[source]#
Bases:
Module
Sigmoid activation function.
\[\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = \sigma(z^h) * (1 - \sigma(z^h))\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.Softmax[source]#
Bases:
Module
Softmax activation function. Commonly used along with a cross entropy loss. See [Softmax and cross-entropy loss](https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/) and [Derivative of Cross Entropy Loss with Softmax](https://www.parasdahal.com/softmax-crossentropy)
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M(x_i)}{\partial x_i} = M^h(x_i) * (1 - M^h(x_i))\]
More precisely,
\[\begin{split}\frac{\partial M^h(x_i)}{\partial x_j} = \begin{cases} M^h(x_i) * ( 1 - M^h(x_j) ) &\text{if } i = j \\ - M^h(x_j) M^h(x_i) &\text{if } i \neq j \\ \end{cases}\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- forward(X)[source]#
Implemented using the log-sum-exp trick to avoid NaNs. See [Computing softmax and numerical stability](https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/).
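A sketch of that numerically stable forward pass (shift by the row maximum before exponentiating); the function name is illustrative, not the class API:

import numpy as np

def stable_softmax(X):
    # subtracting the row-wise max makes the largest exponent exp(0) = 1
    shifted = X - X.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

X = np.array([[1000.0, 1001.0, 1002.0]])  # naive exp(1000) would overflow and produce NaN
print(stable_softmax(X))                  # values: [0.09003057, 0.24472847, 0.66524096]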
- class src.activation.Softplus[source]#
Bases:
Module
Smooth approximation of the ReLU activation function.
\[\text{Softplus}(x) = \ln(1 + e^x)\]- backward_delta(input, delta)[source]#
- \[\frac{\partial M}{\partial z^h} = \sigma (x) = \frac{1}{1 + e^{-x}}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- class src.activation.StableSigmoid[source]#
Bases:
Module
Numerically stable Sigmoid activation function.
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
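One standard way to obtain a numerically stable sigmoid is to exponentiate only non-positive values, using a different closed form on each sign; this is a hedged sketch of that idea, not necessarily StableSigmoid's exact implementation:

import numpy as np

def stable_sigmoid(X):
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x))
    # both branches exponentiate only non-positive values, so exp never overflows
    out = np.empty_like(X, dtype=float)
    pos = X >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-X[pos]))
    e = np.exp(X[~pos])
    out[~pos] = e / (1.0 + e)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # values: [0.0, 0.5, 1.0]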
- class src.activation.TanH[source]#
Bases:
Module
Hyperbolic Tangent activation function.
\[\begin{split}\begin{align*} \text{TanH}(x) &= \tanh(x) \\ &= \frac{\sinh x}{\cosh x} \\ &= \frac{\exp(x) - \exp(-x)} {\exp(x) + \exp(-x)} \\ &= \frac{e^{2x} - 1 }{e^{2x} + 1} \end{align*}\end{split}\]- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
src.convolution module#
We tried to vectorize our convolutions as much as possible, prioritizing performance.
This means creating special views of our arrays with the numpy.lib.stride_tricks functions. sliding_window_view is the easiest to understand, though perhaps not the fastest compared to as_strided (but probably less risky too).
The calculations are done using np.einsum, which is relatively easy to understand and use. The key lies in understanding the shapes of your inputs/outputs.
Shape#
- Reminder for 1D:
input : ndarray (batch, length, chan_in)
d_out : ndarray (batch, length, chan_in) == input.shape
X_view : ndarray (batch, out_length, chan_in, self.k_size)
delta : ndarray (batch, out_length, chan_out)
_gradient["weight"] : ndarray (k_size, chan_in, chan_out)
_parameters["weight"] : ndarray (k_size, chan_in, chan_out)
Notes#
- Notation used for np.einsum:
b : batch_size
w : width (2D) / length (1D)
h : height (2D)
o : out_width (2D) / out_length (1D)
p : out_height (2D)
c : chan_in
d : chan_out
k : k_size (ij for 2D)
Examples#
Quick demonstration of sliding_window_view in 1D:
>>> batch, length, chan_in, k_size = 1, 8, 1, 3
>>> input = np.random.randn(batch, length, chan_in)
>>> input
array([[[-0.41982262],
[ 1.10111123],
[-0.41115195],
[ 1.18733225],
[-1.93463567],
[-0.22472025],
[-0.30581971],
[ 0.40578667]]])
>>> window = np.lib.stride_tricks.sliding_window_view(input, (1, k_size, chan_in))
>>> window
array([[[[[[-0.41982262],
[ 1.10111123],
[-0.41115195]]]],
[[[[ 1.10111123],
[-0.41115195],
[ 1.18733225]]]],
...
How to deal with stride != 1?
>>> stride = 3
>>> window = np.lib.stride_tricks.sliding_window_view(input, (1, k_size, chan_in))[::1, ::stride, ::1]
>>> window
array([[[[[[-0.41982262],
[ 1.10111123],
[-0.41115195]]]],
[[[[ 1.18733225],
[-1.93463567],
[-0.22472025]]]]]])
Then it is just a matter of reshaping to drop the unnecessary dimensions, with out_length = (length - k_size) // stride + 1 = 2 here, e.g.:
>>> window = window.reshape(batch, out_length, chan_in, k_size)
>>> window
array([[[[-0.41982262, 1.10111123, -0.41115195]],
[[ 1.18733225, -1.93463567, -0.22472025]]]])
And voilà!
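With the notation table above, the forward pass of a 1D convolution then becomes a single contraction of this window view with a weight tensor. A sketch continuing the example (chan_out and the random weights are chosen purely for illustration; the actual Conv1D code may differ):

>>> chan_out = 4
>>> out_length = (length - k_size) // stride + 1    # = 2 here
>>> W = np.random.randn(k_size, chan_in, chan_out)
>>> out = np.einsum("bock,kcd->bod", window, W)     # sum over chan_in (c) and k_size (k)
>>> out.shape
(1, 2, 4)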
- class src.convolution.AvgPool1D(k_size, stride)[source]#
Bases:
Module
1D average pooling.
- Parameters:
k_size (int) – Size of the pooling window.
stride (int, optional, default=1) – Stride of the pooling window.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(x, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
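In the same stride-trick spirit, average pooling can be sketched as a mean over a sliding window view (a hedged illustration of the approach, not the class's exact code):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def avg_pool1d(X, k_size, stride):
    # X: (batch, length, chan_in) -> windows: (batch, out_length, chan_in, k_size)
    windows = sliding_window_view(X, k_size, axis=1)[:, ::stride]
    # mean over the kernel axis -> (batch, (length - k_size) // stride + 1, chan_in)
    return windows.mean(axis=-1)

X = np.arange(8, dtype=float).reshape(1, 8, 1)
print(avg_pool1d(X, k_size=2, stride=2)[..., 0])  # [[0.5 2.5 4.5 6.5]]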
- class src.convolution.Conv1D(k_size: int, chan_in: int, chan_out: int, stride: int = 1, bias: bool = False, init_type: Literal['normal', 'uniform', 'zeros', 'ones', 'he_normal', 'he_uniform', 'xavier_normal', 'xavier_uniform'] = 'xavier_normal')[source]#
Bases:
Module
1D convolution.
- Parameters:
k_size (int) – Size of the convolving kernel.
chan_in (int) – Number of channels in the input image.
chan_out (int) – Number of channels produced by the convolution.
stride (int, optional, default=1) – Stride of the convolution.
bias (bool, optional, default=False) – If True, adds a learnable bias to the output.
init_type (str, optional, default="xavier_normal") – Change the initialization of parameters.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
Weight : ndarray (k_size, chan_in, chan_out)
Bias : ndarray (chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
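A hypothetical end-to-end shape check with the documented constructor (assuming Conv1D exposes the same forward(X) method as the other modules; values are random):

import numpy as np
from src.convolution import Conv1D

batch, length, chan_in, chan_out, k_size, stride = 2, 32, 3, 8, 5, 2
conv = Conv1D(k_size, chan_in, chan_out, stride=stride, bias=True)

X = np.random.randn(batch, length, chan_in)
out = conv.forward(X)  # assumed interface
print(out.shape)       # expected (batch, (length - k_size) // stride + 1, chan_out) = (2, 14, 8)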
- class src.convolution.Flatten[source]#
Bases:
Module
Flatten an output.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, length * chan_in)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
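A minimal sketch of what Flatten's forward and backward_delta amount to, i.e. a reshape and its inverse (hedged, since only the shapes are documented here):

import numpy as np

def flatten_forward(X):
    # (batch, length, chan_in) -> (batch, length * chan_in)
    return X.reshape(X.shape[0], -1)

def flatten_backward_delta(X, delta):
    # route the gradient back to the original (batch, length, chan_in) layout
    return delta.reshape(X.shape)

X = np.random.randn(4, 6, 2)
print(flatten_forward(X).shape)                             # (4, 12)
print(flatten_backward_delta(X, flatten_forward(X)).shape)  # (4, 6, 2)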
- class src.convolution.MaxPool1D(k_size, stride)[source]#
Bases:
Module
1D max pooling.
- Parameters:
k_size (int) – Size of the pooling window.
stride (int, optional, default=1) – Stride of the pooling window.
Shape#
Input : ndarray (batch, length, chan_in)
Output : ndarray (batch, (length - k_size) // stride + 1, chan_out)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
src.encapsulation module#
- class src.encapsulation.Optim(network: Sequential, loss: Loss, eps: float)[source]#
Bases:
object
- SGD(X, y, batch_size: int, epochs: int, network: Sequential | None = None, shuffle: bool = True, seed: int = 42)[source]#
- SGD_eval(X, y, batch_size: int, epochs: int, test_size: float, patience: int = 10, network: Sequential | None = None, shuffle_train: bool = True, shuffle_test: bool = False, seed: int = 42, return_dataframe: bool = False, online_plot: bool = False)[source]#
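A hypothetical training run wired together from the documented signatures. The Sequential container, its constructor, and the one-hot label encoding are assumptions made for illustration, not guaranteed by this page:

import numpy as np
from src.activation import TanH
from src.encapsulation import Optim, Sequential  # Sequential assumed to live here
from src.linear import Linear
from src.loss import CELogSoftmax

# toy data: 100 samples, 20 features, 3 classes (assumed one-hot encoded)
X = np.random.randn(100, 20)
y = np.eye(3)[np.random.randint(0, 3, size=100)]

# assumed: Sequential takes modules in forward order and ends on raw scores for CELogSoftmax
net = Sequential(Linear(20, 16), TanH(), Linear(16, 3))

optim = Optim(net, CELogSoftmax(), eps=1e-3)
optim.SGD(X, y, batch_size=10, epochs=50)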
src.linear module#
- class src.linear.Linear(input_size: int, output_size: int, bias: bool = True, init_type: Literal['normal', 'uniform', 'zeros', 'ones', 'he_normal', 'he_uniform', 'xavier_normal', 'xavier_uniform'] = 'he_normal')[source]#
Bases:
Module
Linear module.
- Parameters:
input_size (int) – Size of input sample.
output_size (int) – Size of output sample.
bias (bool, optional, default=True) – If True, adds a learnable bias to the output.
init_type (str, optional, default="he_normal") – Change the initialization of parameters.
Shape#
Input : ndarray (batch, input_size)
Output : ndarray (batch, output_size)
Weight : ndarray (input_size, output_size)
Bias : ndarray (1, output_size)
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- forward(X)[source]#
Forward pass.
Notes
X @ w = (batch, input_size) @ (input_size, output_size) = (batch, output_size)
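A concrete check of that note with hypothetical sizes:

import numpy as np

batch, input_size, output_size = 4, 10, 3
X = np.random.randn(batch, input_size)
w = np.random.randn(input_size, output_size)
b = np.random.randn(1, output_size)

out = X @ w + b   # (4, 10) @ (10, 3) + (1, 3) -> (4, 3); the bias broadcasts over the batch
print(out.shape)  # (4, 3)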
src.loss module#
- class src.loss.CELogSoftmax[source]#
Bases:
Loss
\[\text{CE}(y, \hat{y}) = - \log \frac {e^{\hat{y}_y}} {\sum_{i=1}^{K} e^{\hat{y}_i}} = -\hat{y}_y + \log \sum_{i=1}^{K}e^{\hat{y}_i}\]
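A sketch of that loss for integer class labels, using the same rearrangement stabilised with the max trick (names are illustrative, not the class's API):

import numpy as np

def ce_logsoftmax(y, y_hat):
    # y: (batch,) integer class indices, y_hat: (batch, K) raw scores
    # CE = -y_hat[y] + log(sum_i exp(y_hat_i)), computed with a shifted log-sum-exp
    m = y_hat.max(axis=1, keepdims=True)
    lse = np.log(np.exp(y_hat - m).sum(axis=1)) + m[:, 0]
    return -y_hat[np.arange(len(y)), y] + lse

y_hat = np.array([[2.0, 1.0, 0.1]])
print(ce_logsoftmax(np.array([0]), y_hat))  # about [0.417], i.e. -log softmax(y_hat)[0]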
src.module module#
- class src.module.Module[source]#
Bases:
object
- backward_delta(input, delta)[source]#
Calculates the derivative of the error and the next delta (derivative of the module with respect to the inputs).
\[\begin{split}\delta_j^{h-1}=\frac{\partial L}{\partial z_j^{h-1}}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial z_j^{h-1}}, \text { i.e. } \nabla_{\mathbf{z}^{h-1}} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial z_1^{h-1}} & \frac{\partial z_2^h}{\partial z_1^{h-1}} & \cdots \\ \frac{\partial z_1^h}{\partial z_2^{h-1}} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
- backward_update_gradient(input, delta)[source]#
Updates the gradient of the module's parameters.
\[\begin{split}\frac{\partial L}{\partial w_i^h}=\sum_k \frac{\partial L}{\partial z_k^h} \frac{\partial z_k^h}{\partial w_i^h}=\sum_k \delta_k^h \frac{\partial z_k^h}{\partial w_i^h}, \text { i.e. } \nabla_{\mathbf{w}^h} L=\left(\begin{array}{ccc} \frac{\partial z_1^h}{\partial w_1^h} & \frac{\partial z_2^h}{\partial w_1^h} & \cdots \\ \frac{\partial z_1^h}{\partial w_2^h} & \ddots & \\ \vdots & \end{array}\right) \nabla_{\mathbf{z}^h} L\end{split}\]
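To make the two generic formulas above concrete, here is what they reduce to for a plain linear layer z^h = z^{h-1} W + b (the textbook matrix form, not necessarily the library's exact code):

import numpy as np

batch, in_size, out_size = 8, 5, 3
X = np.random.randn(batch, in_size)        # z^{h-1}
W = np.random.randn(in_size, out_size)
delta = np.random.randn(batch, out_size)   # dL/dz^h

grad_W = X.T @ delta       # backward_update_gradient: dL/dW = (z^{h-1})^T delta, shape (5, 3)
delta_prev = delta @ W.T   # backward_delta: dL/dz^{h-1} = delta W^T, shape (8, 5)

print(grad_W.shape, delta_prev.shape)      # (5, 3) (8, 5)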