Neural Network implementation from scratch

In this notebook we will implement a neural network to model the XOR function. We will also create the metric functions and discuss the theory behind neural networks (backpropagation, various activation functions, etc.). At the end, we will compare the results with a Keras model.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import math
from tensorflow import keras
from tensorflow.keras import layers, metrics

Metrics Functions

$$
\begin{aligned}
\text{MAE} &= \frac{1}{n}\sum \left| y_{\text{true}} - y_{\text{pred}} \right| \\
\text{MSE} &= \frac{1}{n}\sum \left( y_{\text{true}} - y_{\text{pred}} \right)^2 \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum \left( y_{\text{true}} - y_{\text{pred}} \right)^2} \\
R^2~\text{Score} &= 1 - \frac{\sum \left( y_{\text{true}} - y_{\text{pred}} \right)^2}{\sum \left( y_{\text{true}} - y_{\text{mean}} \right)^2}
\end{aligned}
$$

In [ ]:
def calc_metrics(y_true, y_pred):
  """
  Calculates and prints mean absolute error, mean squared error, root mean squared error, and r2 score
  @param y_true: Target labels
  @param y_pred: Target predictions
  @return:
  """
  # Residuals
  if len(y_true) != len(y_pred):
    raise ValueError("Mismatched input lengths")

  n = len(y_true)
  residuals_squared = []
  residuals_absolute = []
  sum_squares = 0
  y_mean = np.mean(y_true)
  for t, p in zip(y_true, y_pred):
    residuals_squared.append(math.pow(t - p, 2))
    residuals_absolute.append(abs(t - p))
    sum_squares += math.pow(t - y_mean, 2)

  mae = sum(residuals_absolute) / n
  mse = sum(residuals_squared) / n
  rmse = math.pow(mse, 0.5)
  r2 = 1 - (sum(residuals_squared) / sum_squares)

  print(f"Mean Absolute Error     = {mae:.5f}")
  print(f"Mean Squared Error      = {mse:.5f}")
  print(f"Root Mean Squared Error = {rmse:.5f}")
  print(f"R2 Score                = {r2:.5f}")

Theory

[Figure: XOR network diagram with inputs A and B, a 4-unit hidden layer (h1 to h4) connected by weights w1 to w8, and a single output o1 connected by weights w9 to w12]

| A | B | Q |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

The dataset is fixed. To model the XOR function, we want to create a neural network that memorizes the dataset; in other words, a model that overfits the dataset.

The more neurons there are, the more prone to overfitting the model is.

The model architecture that will be used is:

  1. Input Layer: (A, B)

  2. Hidden Layer: Dense (4 units)

  3. Output Layer: (Q)

Loss function: Mean squared error

Optimizer: Gradient descent

Different activation functions will be tested.

Total Number of Parameters = 12
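
As a quick check, the count follows from the two weight matrices (no bias terms are used):

$$
\text{Parameters} = \underbrace{2 \times 4}_{\text{input} \to \text{hidden}} + \underbrace{4 \times 1}_{\text{hidden} \to \text{output}} = 12
$$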

Forward Propagation

Each neuron has two halves. The left half is denoted $net$, which is computed from the incoming edges and the outputs of the previous layer. The right half is denoted $out$, which equals $a(net)$, where $a_i$ is the activation function of layer $i$.

Hidden Layer 1

$$
\begin{aligned}
net_{h1} &= x_0 \times w_1 + x_1 \times w_2 \\
out_{h1} &= a_1(net_{h1}) \\[4pt]
net_{h2} &= x_0 \times w_3 + x_1 \times w_4 \\
out_{h2} &= a_1(net_{h2}) \\[4pt]
net_{h3} &= x_0 \times w_5 + x_1 \times w_6 \\
out_{h3} &= a_1(net_{h3}) \\[4pt]
net_{h4} &= x_0 \times w_7 + x_1 \times w_8 \\
out_{h4} &= a_1(net_{h4})
\end{aligned}
$$

Output Layer

$$
\begin{aligned}
net_{o1} &= out_{h1} \times w_9 + out_{h2} \times w_{10} + out_{h3} \times w_{11} + out_{h4} \times w_{12} \\
out_{o1} &= a_2(net_{o1})
\end{aligned}
$$
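
As an illustration, here is a minimal NumPy sketch of this forward pass, assuming the hidden weights are stacked into a 4×2 matrix `W1` (rows $(w_1,w_2),\dots,(w_7,w_8)$), the output weights into a vector `w2_vec` ($w_9 \dots w_{12}$), and sigmoid for both $a_1$ and $a_2$; these names and the activation choice are assumptions for this sketch only, not part of the implementation below.

In [ ]:
import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weight layout for this sketch only
W1 = np.random.rand(4, 2)      # input -> hidden weights (w1..w8)
w2_vec = np.random.rand(4)     # hidden -> output weights (w9..w12)

x = np.array([0.0, 1.0])       # inputs (A, B)

net_h = W1 @ x                 # net_h1 .. net_h4
out_h = sigmoid(net_h)         # out_h1 .. out_h4, with a1 = sigmoid
net_o1 = w2_vec @ out_h        # net_o1
out_o1 = sigmoid(net_o1)       # out_o1, with a2 = sigmoid
print(out_o1)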

Backpropagation

Each weight needs to be updated depending on how much error it has contributed to the total error of the model. The error of each weight can be calculated by using backpropagation.

The rule to update each weight is

$$
w_i = w_i - \alpha \times \frac{\partial E_{o1}}{\partial w_i}
$$

Forward propagation must be run to update the $net$ and $out$ states of the network, as backpropagation will use those values. The algorithm implemented uses a batch size of 1, so the weights are updated after each training sample.

Error of the output (only 1 output)

$$
\begin{aligned}
E_{o1} &= \tfrac{1}{2}\left( y_{\text{true}} - out_{o1} \right)^2 \\
\frac{\partial E_{o1}}{\partial out_{o1}} &= -\left( y_{\text{true}} - out_{o1} \right)
\end{aligned}
$$

Updating the weights of the output layer

For $w_9$:

$$
\frac{\partial E_{o1}}{\partial w_9} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_9}
$$

Finding each partial derivative

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial out_{o1}} &= -\left( y_{\text{true}} - out_{o1} \right) \\
\frac{\partial out_{o1}}{\partial net_{o1}} &= a_2'(net_{o1}) \\
\frac{\partial net_{o1}}{\partial w_9} &= out_{h1}
\end{aligned}
$$

Therefore, for $w_9$,

$$
\frac{\partial E_{o1}}{\partial w_9} = \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ out_{h1} \right]
$$

Similarly, for the other weights to the output layer,

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial w_{10}} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ out_{h2} \right] \\
\frac{\partial E_{o1}}{\partial w_{11}} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ out_{h3} \right] \\
\frac{\partial E_{o1}}{\partial w_{12}} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ out_{h4} \right]
\end{aligned}
$$

Updating the weights of the hidden layer

The output error is backpropagated along the path from the output back to the edge that is being updated.

For $w_1$,

$$
\frac{\partial E_{o1}}{\partial w_1} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_1}
$$

Finding each partial derivative,

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} &= \text{known from updating the output layer} \\
\frac{\partial net_{o1}}{\partial out_{h1}} &= w_9 \\
\frac{\partial out_{h1}}{\partial net_{h1}} &= a_1'(net_{h1}) \\
\frac{\partial net_{h1}}{\partial w_1} &= \text{Input } A
\end{aligned}
$$

Therefore, for $w_1$,

$$
\frac{\partial E_{o1}}{\partial w_1} = \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_9 \right] \times \left[ a_1'(net_{h1}) \right] \times \left[ \text{Input } A \right]
$$

Similarly, for $w_2$, also on neuron $h_1$, the partial derivatives are the same, except for $\frac{\partial net_{h1}}{\partial w_2}$, which equals Input $B$:

$$
\frac{\partial E_{o1}}{\partial w_2} = \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_9 \right] \times \left[ a_1'(net_{h1}) \right] \times \left[ \text{Input } B \right]
$$

Likewise, for all the other weight pairs connected to the hidden neurons,

$h_2$

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial w_3} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{10} \right] \times \left[ a_1'(net_{h2}) \right] \times \left[ \text{Input } A \right] \\
\frac{\partial E_{o1}}{\partial w_4} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{10} \right] \times \left[ a_1'(net_{h2}) \right] \times \left[ \text{Input } B \right]
\end{aligned}
$$

$h_3$

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial w_5} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{11} \right] \times \left[ a_1'(net_{h3}) \right] \times \left[ \text{Input } A \right] \\
\frac{\partial E_{o1}}{\partial w_6} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{11} \right] \times \left[ a_1'(net_{h3}) \right] \times \left[ \text{Input } B \right]
\end{aligned}
$$

$h_4$

$$
\begin{aligned}
\frac{\partial E_{o1}}{\partial w_7} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{12} \right] \times \left[ a_1'(net_{h4}) \right] \times \left[ \text{Input } A \right] \\
\frac{\partial E_{o1}}{\partial w_8} &= \left[ -\left( y_{\text{true}} - out_{o1} \right) \right] \times \left[ a_2'(net_{o1}) \right] \times \left[ w_{12} \right] \times \left[ a_1'(net_{h4}) \right] \times \left[ \text{Input } B \right]
\end{aligned}
$$
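
As a vectorized sketch of these gradients and the update rule for a single sample, using the same assumed layout and sigmoid activations as the forward-pass sketch above (`W1`, `w2_vec`, and `sigmoid` are illustrative names, not part of the implementation), the computation could look like this; the class below computes the same quantities weight by weight.

In [ ]:
import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
  s = sigmoid(z)
  return s * (1.0 - s)

alpha = 0.5                        # learning rate
W1 = np.random.rand(4, 2)          # hidden weights (w1..w8), one row per neuron
w2_vec = np.random.rand(4)         # output weights (w9..w12)
x, y_true = np.array([0.0, 1.0]), 1.0   # one training sample (A, B) and label Q

# Forward pass (stores the net/out values used by backpropagation)
net_h = W1 @ x
out_h = sigmoid(net_h)
net_o1 = w2_vec @ out_h
out_o1 = sigmoid(net_o1)

# Backward pass, following the chain-rule factors above
d_out_o1 = -(y_true - out_o1)                       # dE/d out_o1
d_net_o1 = d_out_o1 * sigmoid_prime(net_o1)         # dE/d net_o1
grad_w2 = d_net_o1 * out_h                          # dE/d w9..w12
d_net_h = d_net_o1 * w2_vec * sigmoid_prime(net_h)  # dE/d net_h1..net_h4
grad_W1 = np.outer(d_net_h, x)                      # dE/d w1..w8

# Gradient descent update with batch size 1
w2_vec -= alpha * grad_w2
W1 -= alpha * grad_W1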

Activation Functions

Multiple activation functions are implemented so they can be plugged into the XOR model interchangeably.

Linear

$$
\begin{aligned}
a(x) &= x \\
a'(x) &= 1
\end{aligned}
$$

Rectified Linear Unit (ReLU)

$$
\begin{aligned}
a(x) &= \max(0, x) \\
a'(x) &= \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
$$

Sigmoid

$$
\begin{aligned}
a(x) &= \frac{1}{1 + e^{-x}} \\
a'(x) &= a(x)\left( 1 - a(x) \right)
\end{aligned}
$$

Tanh

$$
\begin{aligned}
a(x) &= \tanh(x) \\
a'(x) &= 1 - a^2(x)
\end{aligned}
$$

Softplus

$$
\begin{aligned}
a(x) &= \ln(1 + e^x) \\
a'(x) &= \frac{1}{1 + e^{-x}}
\end{aligned}
$$

Gaussian

$$
\begin{aligned}
a(x) &= e^{-x^2} \\
a'(x) &= -2x\,e^{-x^2}
\end{aligned}
$$

In [ ]:
class ActivationFn(object):
  """Linear (identity) activation; also serves as the base class for the others."""
  def compute(self, x):
    return x
  def derivative(self, x):
    return 1

class ReluActivationFn(ActivationFn):
  def compute(self, x):
    if x < 0:
      return 0
    return x
  def derivative(self, x):
    if x < 0:
      return 0.0
    return 1.0

class SigmoidActivationFn(ActivationFn):
  def compute(self, x):
    return 1.0 / (1.0 + np.exp(-1*x))
  def derivative(self, x):
    return self.compute(x) * (1 - self.compute(x))

class TanhActivationFn(ActivationFn):
  def compute(self, x):
    return np.tanh(x)
  def derivative(self, x):
    return 1.0 - np.power(self.compute(x), 2)

class SoftplusActivationFn(ActivationFn):
  def compute(self, x):
    return np.log(1 + np.exp(x))
  def derivative(self, x):
    return 1.0 / (1.0 + np.exp(-1*x))

class GaussianActivationFn(ActivationFn):
  def compute(self, x):
    return np.exp(-1*np.power(x, 2))
  def derivative(self, x):
    return -2*x*np.exp(-1*np.power(x, 2))
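
As an optional sanity check (not part of the original notebook), the analytic derivatives can be compared against a central finite-difference estimate; `check_derivative` is a helper name introduced only for this example.

In [ ]:
def check_derivative(fn, x, eps=1e-6):
  """Compare fn.derivative(x) with a central finite-difference approximation."""
  numeric = (fn.compute(x + eps) - fn.compute(x - eps)) / (2 * eps)
  print(f"{type(fn).__name__:>22}: analytic={fn.derivative(x):.6f}, numeric={numeric:.6f}")

for fn in [SigmoidActivationFn(), TanhActivationFn(),
           SoftplusActivationFn(), GaussianActivationFn()]:
  check_derivative(fn, 0.7)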

Neural Network Implementation

In [ ]:
class XORModel(object):
  """
  Represents XOR Neural Network model in the figure above
  =============
  Model Summary
  =============

  input layer:
    input_a
    input_b

  hidden layer 1 neurons. Activation function = hidden_layer_activation
    h1
    h2
    h3
    h4

  output layer neuron. Activation function = output_layer_activation
    o1

  Total Number of Parameters = 12
  """
  def __init__(self, hidden_layer_activation: ActivationFn, output_layer_activation: ActivationFn, learning_rate=0.01):
    self.n_input_units = 2
    self.n_dense_units = 4
    self.n_outputs = 1
    self.learning_rate = learning_rate
    self.hidden_layer_activation = hidden_layer_activation
    self.output_layer_activation = output_layer_activation

    np.random.seed(23)
    self.h1_weights = np.random.rand(self.n_input_units)
    np.random.seed(13)
    self.h2_weights = np.random.rand(self.n_input_units)
    np.random.seed(8962)
    self.h3_weights = np.random.rand(self.n_input_units)
    np.random.seed(65486)
    self.h4_weights = np.random.rand(self.n_input_units)

    self.layer01_weights = np.array([self.h1_weights, self.h2_weights, self.h3_weights, self.h4_weights])

    np.random.seed(42)
    self.o1_weights = np.random.rand(self.n_dense_units)
    self.layer12_weights = np.array([self.o1_weights])

    self.inputs_ab = np.array([0, 0])

  def predict(self, a, b):
    """
    Predict
    @param a: Input A value
    @param b: Input B value
    @return: Out o1
    """
    self.inputs_ab[0] = a
    self.inputs_ab[1] = b
    self.forward_propagation()
    return self.out_o1

  def fit(self, X, y, epochs=1):
    """
    Train the model using gradient descent with batch size = 1
    @param X: Training samples
    @param y: Labels
    @param epochs: Number of epochs to run
    @return: Loss history
    """
    history = []

    for i in range(epochs):
      epoch_error = 0
      for x, y_true in zip(X, y):
        self.inputs_ab[0] = x[0]
        self.inputs_ab[1] = x[1]
        self.y_true = y_true

        # Update model error state
        self.backpropagation()

        # Update weights
        self.layer01_weights[0][0] = self.layer01_weights[0][0] - self.learning_rate*self.derror_o1_dw1
        self.layer01_weights[0][1] = self.layer01_weights[0][1] - self.learning_rate*self.derror_o1_dw2

        self.layer01_weights[1][0] = self.layer01_weights[1][0] - self.learning_rate*self.derror_o1_dw3
        self.layer01_weights[1][1] = self.layer01_weights[1][1] - self.learning_rate*self.derror_o1_dw4

        self.layer01_weights[2][0] = self.layer01_weights[2][0] - self.learning_rate*self.derror_o1_dw5
        self.layer01_weights[2][1] = self.layer01_weights[2][1] - self.learning_rate*self.derror_o1_dw6

        self.layer01_weights[3][0] = self.layer01_weights[3][0] - self.learning_rate*self.derror_o1_dw7
        self.layer01_weights[3][1] = self.layer01_weights[3][1] - self.learning_rate*self.derror_o1_dw8

        self.layer12_weights[0][0] = self.layer12_weights[0][0] - self.learning_rate*self.derror_o1_dw9
        self.layer12_weights[0][1] = self.layer12_weights[0][1] - self.learning_rate*self.derror_o1_dw10
        self.layer12_weights[0][2] = self.layer12_weights[0][2] - self.learning_rate*self.derror_o1_dw11
        self.layer12_weights[0][3] = self.layer12_weights[0][3] - self.learning_rate*self.derror_o1_dw12

        # Get new error
        self.forward_propagation()

        epoch_error += ((self.y_true - self.out_o1)**2)

      history.append(epoch_error / len(X))
    return history

  def forward_propagation(self):
    """
    Update the state of this instance by forward propagation
    @return: None
    """
    self.net_h1 = np.tensordot(self.layer01_weights[0], self.inputs_ab, axes=1)
    self.out_h1 = self.hidden_layer_activation.compute(self.net_h1)

    self.net_h2 = np.tensordot(self.layer01_weights[1], self.inputs_ab, axes=1)
    self.out_h2 = self.hidden_layer_activation.compute(self.net_h2)

    self.net_h3 = np.tensordot(self.layer01_weights[2], self.inputs_ab, axes=1)
    self.out_h3 = self.hidden_layer_activation.compute(self.net_h3)

    self.net_h4 = np.tensordot(self.layer01_weights[3], self.inputs_ab, axes=1)
    self.out_h4 = self.hidden_layer_activation.compute(self.net_h4)

    self.out_h1234 = np.array([self.out_h1, self.out_h2, self.out_h3, self.out_h4])

    self.net_o1 = np.tensordot(self.out_h1234, self.layer12_weights[0], axes=1)
    self.out_o1 = self.output_layer_activation.compute(self.net_o1)

  def backpropagation(self):
    """
    Update the error states of this instance by backpropagation
    @return: None
    """
    ############
    # layer 12 #
    ############
    self.forward_propagation()

    derror_o1_dout_o1   = -(self.y_true - self.out_o1)
    dout_o1_dnet_o1     = self.output_layer_activation.derivative(self.net_o1)
    dnet_o1_dw9         = self.out_h1
    dnet_o1_dw10        = self.out_h2
    dnet_o1_dw11        = self.out_h3
    dnet_o1_dw12        = self.out_h4

    self.derror_o1_dw9  = derror_o1_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw9
    self.derror_o1_dw10 = derror_o1_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw10
    self.derror_o1_dw11 = derror_o1_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw11
    self.derror_o1_dw12 = derror_o1_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw12

    ############
    # layer 01 #
    ############
    derror_o1_dnet_o1   = derror_o1_dout_o1 * dout_o1_dnet_o1
    # w1, w2
    dnet_o1_dout_h1     = self.layer12_weights[0][0] # w9
    dout_h1_dnet_h1     = self.hidden_layer_activation.derivative(self.net_h1)
    dnet_h1_dw1         = self.inputs_ab[0] # a
    dnet_h1_dw2         = self.inputs_ab[1] # b

    self.derror_o1_dw1  = derror_o1_dnet_o1 * dnet_o1_dout_h1 * dout_h1_dnet_h1 * dnet_h1_dw1
    self.derror_o1_dw2  = derror_o1_dnet_o1 * dnet_o1_dout_h1 * dout_h1_dnet_h1 * dnet_h1_dw2

    # w3, w4
    dnet_o1_dout_h2     = self.layer12_weights[0][1] # w10
    dout_h2_dnet_h2     = self.hidden_layer_activation.derivative(self.net_h2)
    dnet_h2_dw3         = self.inputs_ab[0] # a
    dnet_h2_dw4         = self.inputs_ab[1] # b

    self.derror_o1_dw3  = derror_o1_dnet_o1 * dnet_o1_dout_h2 * dout_h2_dnet_h2 * dnet_h2_dw3
    self.derror_o1_dw4  = derror_o1_dnet_o1 * dnet_o1_dout_h2 * dout_h2_dnet_h2 * dnet_h2_dw4

    # w5, w6
    dnet_o1_dout_h3     = self.layer12_weights[0][2] # w11
    dout_h3_dnet_h3     = self.hidden_layer_activation.derivative(self.net_h3)
    dnet_h3_dw5         = self.inputs_ab[0] # a
    dnet_h3_dw6         = self.inputs_ab[1] # b

    self.derror_o1_dw5  = derror_o1_dnet_o1 * dnet_o1_dout_h3 * dout_h3_dnet_h3 * dnet_h3_dw5
    self.derror_o1_dw6  = derror_o1_dnet_o1 * dnet_o1_dout_h3 * dout_h3_dnet_h3 * dnet_h3_dw6

    # w7, w8
    dnet_o1_dout_h4     = self.layer12_weights[0][3] # w12
    dout_h4_dnet_h4     = self.hidden_layer_activation.derivative(self.net_h4)
    dnet_h4_dw7         = self.inputs_ab[0] # a
    dnet_h4_dw8         = self.inputs_ab[1] # b

    self.derror_o1_dw7  = derror_o1_dnet_o1 * dnet_o1_dout_h4 * dout_h4_dnet_h4 * dnet_h4_dw7
    self.derror_o1_dw8  = derror_o1_dnet_o1 * dnet_o1_dout_h4 * dout_h4_dnet_h4 * dnet_h4_dw8

Training

In this section the model will be trained with different activation functions in order to select the best one.

In [ ]:
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

def plot_xor_model_error(e):
  """
  Plot error of XOR model
  @param e: Loss history
  @return: None
  """
  fig, ax = plt.subplots()
  ax.set_title('Loss (Mean squared error)')
  ax.plot(e)
  ax.grid(visible=True)
  ax.set_ylabel("Loss")
  ax.set_xlabel("Epoch")

def test_xor_model(model_under_test, epochs):
  """
  Utility function to run various tests on an XOR model with the method "predict"
  @param model_under_test: The model with the predict method
  @param epochs: Number of epochs to train
  @return: None
  """
  y_pred_1 = []
  y_pred_2 = []

  print("Initial predictions")
  print(f"X | Y | Q True   | Q Predicted")
  print(f"------------------------------")
  for x, y_true in zip(X, y):
    y_pred = model_under_test.predict(x[0], x[1])
    y_pred_1.append(y_pred)
    print(f"{x[0]} | {x[1]} | {y_true}        | {y_pred}")
  calc_metrics(y, y_pred_1)
  error_m = model_under_test.fit(X, y, epochs=epochs)
  print("===========================")
  print("Predictions after training")
  print(f"X | Y | Q True   | Q Predicted")
  print(f"------------------------------")
  for x, y_true in zip(X, y):
    y_pred = model_under_test.predict(x[0], x[1])
    y_pred_2.append(y_pred)
    print(f"{x[0]} | {x[1]} | {y_true}        | {y_pred}")
  calc_metrics(y, y_pred_2)
  plot_xor_model_error(error_m)

Gaussian activation function

In [ ]:
m = XORModel(learning_rate=0.01, hidden_layer_activation=GaussianActivationFn(), output_layer_activation=GaussianActivationFn())
test_xor_model(m, 5000)
Initial predictions
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.0008640834548181034
0 | 1 | 1        | 0.023038474655564935
1 | 0 | 1        | 0.019038121277748808
1 | 1 | 0        | 0.3781749380990913
Mean Absolute Error     = 0.58424
Mean Squared Error      = 0.51494
Root Mean Squared Error = 0.71759
R2 Score                = -1.05976
===========================
Predictions after training
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.02830903562522035
0 | 1 | 1        | 0.9869397507295415
1 | 0 | 1        | 0.9888564478317782
1 | 1 | 0        | 0.020690989391717766
Mean Absolute Error     = 0.01830
Mean Squared Error      = 0.00038
Root Mean Squared Error = 0.01952
R2 Score                = 0.99848
[Plot: loss (mean squared error) vs. epoch]

Other activation functions

Tanh Model

In [ ]:
m = XORModel(learning_rate=0.5, hidden_layer_activation=TanhActivationFn(), output_layer_activation=TanhActivationFn())
test_xor_model(m, 400)
Initial predictions
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.0
0 | 1 | 1        | 0.8294150947494272
1 | 0 | 1        | 0.8366625823802332
1 | 1 | 0        | 0.962012506115427
Mean Absolute Error     = 0.32398
Mean Squared Error      = 0.24531
Root Mean Squared Error = 0.49529
R2 Score                = 0.01875
===========================
Predictions after training
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.0
0 | 1 | 1        | 0.9693165049621585
1 | 0 | 1        | 0.967214884205445
1 | 1 | 0        | -0.1289160080290679
Mean Absolute Error     = 0.04810
Mean Squared Error      = 0.00466
Root Mean Squared Error = 0.06826
R2 Score                = 0.98136
[Plot: loss (mean squared error) vs. epoch]

Softplus Model

In [ ]:
m = XORModel(learning_rate=0.7, hidden_layer_activation=SoftplusActivationFn(), output_layer_activation=SoftplusActivationFn())
test_xor_model(m, 5000)
Initial predictions
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 1.9882063689622802
0 | 1 | 1        | 2.728134832244606
1 | 0 | 1        | 2.695628231054407
1 | 1 | 0        | 3.6223532831710012
Mean Absolute Error     = 2.25858
Mean Squared Error      = 5.73400
Root Mean Squared Error = 2.39458
R2 Score                = -21.93601
===========================
Predictions after training
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.0317827685033709
0 | 1 | 1        | 1.103553898571937
1 | 0 | 1        | 1.2122441621677822
1 | 1 | 0        | 0.13469700774795554
Mean Absolute Error     = 0.12057
Mean Squared Error      = 0.01873
Root Mean Squared Error = 0.13686
R2 Score                = 0.92508
[Plot: loss (mean squared error) vs. epoch]

Compare to Keras tanh Model

In [ ]:
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
model5 = keras.models.Sequential()
model5.add(layers.InputLayer(input_shape=(2,)))
model5.add(layers.Dense(4, activation="tanh", use_bias=False))
model5.add(layers.Dense(1, activation="tanh", use_bias=False))
model5.compile(loss="mean_squared_error", optimizer="Adam", metrics=[metrics.MeanSquaredError()])
model5_history = model5.fit(x, y, epochs=10000, verbose=False)
plot_xor_model_error(model5_history.history['loss'])
y_pred = []
print(f"X | Y | Q True   | Q Predicted")
print(f"------------------------------")
for test_x, y_true in zip(x, y):
  y_ = model5.predict(np.array([test_x]), verbose=False)[0][0]
  y_pred.append(y_)
  print(f"0 | 0 | {y_true}        | {y_}")
calc_metrics(y, y_pred)
X | Y | Q True   | Q Predicted
------------------------------
0 | 0 | 0        | 0.0
0 | 1 | 1        | 0.9880496263504028
1 | 0 | 1        | 0.9868031740188599
1 | 1 | 0        | 0.00022530555725097656
Mean Absolute Error     = 0.00634
Mean Squared Error      = 0.00008
Root Mean Squared Error = 0.00890
R2 Score                = 0.99968
[Plot: loss (mean squared error) vs. epoch]