Implementing Deep learning model in Python

L layer model implementation


In this blog we will implement a binary classification problem in Python. For this example we will use the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn's datasets module. A copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset can also be downloaded from: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

From the given breast cancer dataset, we will try to classify whether a sample is malignant or benign.

My previous blog, https://ml-world.hashnode.dev/single-layer-neural-network, discussed the equations that we will now implement in Python.

Let's import the packages in Python:

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
np.random.seed(1)

Now we load the breast cancer data using scikit-learn and take a quick look at what it contains:

data = load_breast_cancer()

print (data.feature_names)
print (data.target_names)
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
['malignant' 'benign']

The data contains 30 features, and the Y label tells whether the cancer is malignant or benign. We divide the data into a training set and a test set, with the test set being 20% of the available samples.

x_data = data.data      # feature matrix
y_lable = data.target   # labels: 0 = malignant, 1 = benign
shape of x_data=(569, 30)
shape of y_lable=(569,)
train_x.shape=(455, 30)
test_x.shape=(114, 30)

Now we have a training sample of shape (455, 30): 455 rows of data with 30 features. The test sample has shape (114, 30): 114 rows with 30 features. The training labels have shape (1, 455) and the test labels have shape (1, 114).
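The splitting code itself is not shown here. A minimal sketch, assuming scikit-learn's train_test_split with test_size=0.2 and an arbitrary random_state (both are assumptions, as are the transposes that put one sample per column), reproduces the shapes above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
x_data, y_lable = data.data, data.target  # shapes (569, 30) and (569,)

# Hold out 20% of the samples for testing; random_state is an arbitrary choice.
train_x, test_x, train_y, test_y = train_test_split(
    x_data, y_lable, test_size=0.2, random_state=1
)
print(train_x.shape, test_x.shape)  # (455, 30) (114, 30)

# The network works with one sample per column, so transpose the features
# and turn the label vectors into row vectors.
train_x, test_x = train_x.T, test_x.T
train_y, test_y = train_y.reshape(1, -1), test_y.reshape(1, -1)
print(train_x.shape, train_y.shape)  # (30, 455) (1, 455)
```

With a different random_state the shapes stay the same, but the samples that land in each set (and therefore the reported accuracies) will differ.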

In simple terms, the architecture for this implementation can be represented as shown in the picture below.

  • The input is a matrix of shape (30, 455): each of the 455 training samples is a column in which the 30 features are stacked vertically.

  • The corresponding vector is then multiplied by the weight matrix \(W\), and the bias \(b\) is added. The result is called the linear unit.

  • Next, we take the ReLU of the linear unit. This process is repeated for each of the first \(L-1\) layers.

  • Finally, we take the sigmoid of the final linear unit. If it is greater than 0.5, the sample is classified as class 1, which is benign in scikit-learn's encoding of this dataset (0 is malignant).

Steps

We will follow the usual deep learning methodology to build the model. The steps involved are:

  1. Initialize the parameters (weights w and bias b) and define the hyperparameters

    Then loop for the number of epochs, carrying out:

  2. Forward propagation

  3. Cost computation

  4. Backward propagation

  5. Updating Parameters

  6. Train the model

  7. Use trained parameters to predict labels from test data set

1. Initialize parameters

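  • Weight matrices use random initialization: np.random.randn(d0, d1, ..., dn), scaled by \(1/\sqrt{n^{l-1}}\), where \(n^{l-1}\) is the size of the previous layer (multiplying by 0.01 instead is a common alternative).

  • Biases are initialized to zeros.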

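params = {}
L = len(layers_dimention)  # number of layers, counting the input layer
for l in range(1, L):
    # W has shape (units in layer l, units in layer l-1), scaled by 1/sqrt(fan-in)
    params['W' + str(l)] = np.random.randn(layers_dimention[l], layers_dimention[l-1]) / np.sqrt(layers_dimention[l-1]) #*0.01
    params['b' + str(l)] = np.zeros((layers_dimention[l], 1))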
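As a self-contained illustration, the initialization loop can be wrapped in a function and checked against the layer sizes used later in this post. The function name initialize_parameters and the seed argument are assumptions made for this sketch, not necessarily what the notebook uses:

```python
import numpy as np

def initialize_parameters(layers_dimention, seed=1):
    """Return a dict holding W1..Wn and b1..bn for the given layer sizes."""
    np.random.seed(seed)  # hypothetical argument, only for reproducibility
    params = {}
    L = len(layers_dimention)  # counts the input layer as well
    for l in range(1, L):
        n_prev = layers_dimention[l - 1]
        # Random weights scaled by 1/sqrt(fan-in), zero biases.
        params['W' + str(l)] = np.random.randn(layers_dimention[l], n_prev) / np.sqrt(n_prev)
        params['b' + str(l)] = np.zeros((layers_dimention[l], 1))
    return params

params = initialize_parameters([30, 3, 2, 1])
for name, value in params.items():
    print(name, value.shape)
```

Each W has one row per unit in its own layer and one column per unit in the previous layer, which is what makes the product of W and the previous activation well defined.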

2. Forward Propagation

Forward propagation includes the calculation of the linear unit for each layer \(l\), where \(A^{0}\) is the input

\(Z^{l}=W^{l}A^{l-1}+b^{l}\)

 Z = W.dot(previous_A) + b

and the activation function: ReLU, \( ReLU(Z) = \max(0, Z) \), for the first \(L-1\) layers and

  A = np.maximum(0,Z)

sigmoid, \(\sigma(Z) = \frac{1}{1+e^{-Z}}\), for layer \(L\)

A = 1/(1+np.exp(-Z))

All the parameters and activation values are saved in a cache to be used in backward propagation.
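Put together, the forward pass could look like the sketch below. The function name forward_propagation and the layout of the cache tuples are assumptions for illustration; the example runs on random inputs rather than the real data:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, params):
    """ReLU for layers 1..L-1, sigmoid for layer L; returns AL and the caches."""
    caches = []
    A = X
    L = len(params) // 2  # number of layers that have weights
    for l in range(1, L + 1):
        previous_A = A
        W, b = params['W' + str(l)], params['b' + str(l)]
        Z = W.dot(previous_A) + b
        A = sigmoid(Z) if l == L else relu(Z)
        caches.append((previous_A, W, b, Z))  # reused in backward propagation
    return A, caches

# A small random network with layer sizes [30, 3, 2, 1] applied to 5 random samples.
np.random.seed(1)
sizes = [30, 3, 2, 1]
params = {}
for l in range(1, len(sizes)):
    params['W' + str(l)] = np.random.randn(sizes[l], sizes[l - 1]) / np.sqrt(sizes[l - 1])
    params['b' + str(l)] = np.zeros((sizes[l], 1))

X = np.random.randn(30, 5)
AL, caches = forward_propagation(X, params)
print(AL.shape)  # (1, 5)
```

The output AL holds one value between 0 and 1 per sample, which is later thresholded at 0.5 to obtain a label.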

3. Cost computation

The cross-entropy cost is computed using the formula \( \begin{flalign*} & J(w,b)=\frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)},y^{(i)})=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right] &\\ \end{flalign*}\)

cost = (1./m) * (-np.dot(Y,np.log(AL).T) - np.dot(1-Y, np.log(1-AL).T))
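To see the formula at work on concrete numbers, here is a small sketch. The function name compute_cost and the clipping with eps are assumptions added for this example; the clipping only guards against taking log(0) when a prediction saturates:

```python
import numpy as np

def compute_cost(AL, Y, eps=1e-12):
    """Mean binary cross-entropy for predictions AL and labels Y, both of shape (1, m)."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)  # assumption: avoid log(0)
    cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T))
    return float(np.squeeze(cost))  # turn the (1, 1) array into a plain float

Y = np.array([[1, 0, 1]])
AL = np.array([[0.9, 0.2, 0.6]])
print(compute_cost(AL, Y))  # about 0.2798
```

The value is the average of -log(0.9), -log(0.8) and -log(0.6): confident, correct predictions contribute little, while the uncertain third prediction contributes the most.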

4. Backward propagation

Backward propagation is used to calculate the gradient of the loss function with respect to the parameters, the weights and biases. Just like forward propagation, backward propagation is calculated in multiple steps:

  • Calculating the derivative of the loss with respect to the activation, \(\frac{dL}{dA}\), for the last layer \(L\)
    # Initializing the backpropagation, derivative of loss with respect to activation dL/dA
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
  • Calculating dZ, the derivative of the loss with respect to Z, \(\frac{dL}{dZ}\), as \(dZ^{l}=dA^{l}* g^{'}(Z^{l})\), where \(g^{'}\) is the derivative of the activation function: sigmoid for the last layer, ReLU otherwise

For the ReLU layers 1 to \(L-1\), \(dZ^{l}\) is computed as:

 dZ = np.array(dA, copy=True) # just converting dz to a correct object.

 # When z <= 0, you should set dz to 0 as well. 
 dZ[Z <= 0] = 0
  • For the sigmoid of the last layer \(L\), \(dZ^{l}\) is computed as:
 a = 1/(1+np.exp(-Z))
 dZ = dA * a * (1-a)
  • With dZ calculated, we now compute the derivatives of the loss with respect to \(W^{l}\) and \(b^{l}\), and with respect to the activation of the previous layer, \(A^{l-1}\)

  • \(dW^{l}=\frac{1}{m}dZ^{l}A^{(l-1)T}\)

  • \(db^{l}=\frac{1}{m}\sum_{i=1}^{m}dZ^{l(i)}\)

  • \(dA^{l-1}=W^{(l)T}dZ^{l}\)

 dW = 1./m * np.dot(dZ,A_prev.T)
 db = 1./m * np.sum(dZ, axis = 1, keepdims = True)
 dA_prev = np.dot(W.T,dZ)
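One way to gain confidence in these formulas is a numerical gradient check. The sketch below is not part of the notebook as far as this post shows; it uses a single sigmoid layer with random inputs (an assumption for illustration), computes dW with the formulas above, and compares it with a finite-difference estimate of the cost:

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def cost(W, b, A_prev, Y):
    """Cross-entropy cost of a single sigmoid layer."""
    AL = sigmoid(W.dot(A_prev) + b)
    m = Y.shape[1]
    return float(-(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)).sum() / m)

np.random.seed(1)
m = 4
A_prev = np.random.randn(3, m)
Y = np.array([[1, 0, 1, 0]])
W = np.random.randn(1, 3)
b = np.zeros((1, 1))

# Analytic gradients, following the formulas above.
Z = W.dot(A_prev) + b
AL = sigmoid(Z)
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
dZ = dAL * AL * (1 - AL)  # for sigmoid plus cross-entropy this equals AL - Y
dW = 1. / m * np.dot(dZ, A_prev.T)
db = 1. / m * np.sum(dZ, axis=1, keepdims=True)

# Central finite differences for each entry of W.
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(W.shape[1]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    dW_num[0, j] = (cost(W_plus, b, A_prev, Y) - cost(W_minus, b, A_prev, Y)) / (2 * eps)

print(np.allclose(dZ, AL - Y))
print(np.allclose(dW, dW_num, atol=1e-6))
```

If the analytic and numerical gradients disagree, the usual suspects are a missing 1/m factor or a transposed matrix in the dW or dA_prev computation.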

5. Updating parameters

We update the parameters using gradient descent on every \(W^{l}\) and \(b^{l}\) for \(l=1,2,\ldots,L\)

\(\begin{flalign*} &W^{l}=W^{l}-\alpha dW^{l} &\\ & b^{l}=b^{l}-\alpha db^{l} \end{flalign*}\)

where \(\alpha\) is the learning rate.

L = len(parameters) // 2  # number of layers: each layer has one W and one b
for l in range(L):
    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
    parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]

6. Train the model

Let's define the constants for training our model:

### CONSTANTS ###
layers_dimention = [30, 3, 2, 1] # 30 input features, hidden layers of 3 and 2 units, 1 output unit
learning_rate = 0.01

Now we train our model by calling the function model:

parameters, costs = model(train_x, train_y, layers_dimention,learning_rate, epoch = 20000, print_cost = True)
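The body of model is not reproduced in this post. The following is a minimal sketch of what such a function could look like, assembled from steps 1 to 5; the fixed seed, the clipping of AL, and recording the cost every 1000 iterations are assumptions, and the usage example runs on synthetic toy data rather than the breast cancer dataset:

```python
import numpy as np

def model(X, Y, layers_dimention, learning_rate=0.01, epoch=3000, print_cost=False):
    """Train an L-layer ReLU/sigmoid network with batch gradient descent."""
    np.random.seed(1)  # assumption: fixed seed for reproducibility
    m = X.shape[1]
    L = len(layers_dimention) - 1  # number of layers that have weights
    parameters, costs = {}, []
    for l in range(1, L + 1):
        n_prev = layers_dimention[l - 1]
        parameters['W' + str(l)] = np.random.randn(layers_dimention[l], n_prev) / np.sqrt(n_prev)
        parameters['b' + str(l)] = np.zeros((layers_dimention[l], 1))

    for i in range(epoch):
        # Forward propagation, caching (A_prev, Z) for every layer.
        A, caches = X, []
        for l in range(1, L + 1):
            Z = parameters['W' + str(l)].dot(A) + parameters['b' + str(l)]
            caches.append((A, Z))
            A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
        AL = np.clip(A, 1e-12, 1 - 1e-12)  # assumption: avoid log(0)

        # Cost computation.
        cost = float(-(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)).sum() / m)

        # Backward propagation.
        grads = {}
        dZ = AL - Y  # output layer: dAL * sigmoid'(Z) simplifies to AL - Y
        for l in range(L, 0, -1):
            A_prev = caches[l - 1][0]
            grads['dW' + str(l)] = 1. / m * np.dot(dZ, A_prev.T)
            grads['db' + str(l)] = 1. / m * np.sum(dZ, axis=1, keepdims=True)
            if l > 1:
                dA_prev = np.dot(parameters['W' + str(l)].T, dZ)
                Z_prev = caches[l - 2][1]
                dZ = np.array(dA_prev, copy=True)
                dZ[Z_prev <= 0] = 0  # ReLU derivative

        # Gradient descent update.
        for l in range(1, L + 1):
            parameters['W' + str(l)] -= learning_rate * grads['dW' + str(l)]
            parameters['b' + str(l)] -= learning_rate * grads['db' + str(l)]

        if i % 1000 == 0 or i == epoch - 1:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration {}: {}".format(i, cost))
    return parameters, costs

# Synthetic toy data: label 1 when the two features sum to a positive number.
rng = np.random.RandomState(0)
X = rng.randn(2, 200)
Y = (X[0:1] + X[1:2] > 0).astype(float)
parameters, costs = model(X, Y, [2, 3, 1], learning_rate=0.1, epoch=2000)
print(costs[0], costs[-1])
```

All gradients are computed before any parameter is changed, so every layer's backward step sees the same weights that were used in the forward pass.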

We print the iteration number and the cost every 1000 iterations:

Cost after iteration 0: 0.702932778627873
Cost after iteration 1000: 0.2679702110551074
Cost after iteration 2000: 0.21951025802491067
Cost after iteration 3000: 0.20751873459457024
Cost after iteration 4000: 0.2054742445528246
Cost after iteration 5000: 0.18955923724215862
Cost after iteration 6000: 0.18457018072319623
Cost after iteration 7000: 0.1805046963942877
Cost after iteration 8000: 0.1769964493345927
Cost after iteration 9000: 0.17448193948857382
Cost after iteration 10000: 0.17253399062967198
Cost after iteration 11000: 0.16886680377190746
Cost after iteration 12000: 0.16859260037770446
Cost after iteration 13000: 0.16699795347321172
Cost after iteration 14000: 0.16560844330274763
Cost after iteration 15000: 0.16432314394048772
Cost after iteration 16000: 0.16322862127537052
Cost after iteration 17000: 0.1623925528244854
Cost after iteration 18000: 0.16117369207763999
Cost after iteration 19000: 0.1601388641555328
Cost after iteration 19999: 0.1604861479844001

As we can see, the cost changes only slowly after about 11000 iterations; training beyond that point brings little improvement and increases the risk of overfitting.

Let's define a function that plots the cost against the iteration number, and call it:

def plot_costs(costs, learning_rate=0.0075):
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

plot_costs(costs, learning_rate)

Let's compute the accuracy on the training data:

pred_train = predict(train_x, train_y, parameters)
Accuracy: 0.9296703296703294

Let's run prediction on the test data:

pred_test = predict(test_x, test_y, parameters)
Accuracy: 0.8859649122807017
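The predict function is likewise not shown in this post. A minimal sketch, assuming it runs a forward pass with the trained parameters, thresholds the sigmoid output at 0.5, and prints the fraction of correct labels, could look like this (the hand-built parameters below are made up for the example):

```python
import numpy as np

def predict(X, y, parameters):
    """Forward pass with the given parameters; threshold at 0.5 and print accuracy."""
    L = len(parameters) // 2
    A = X
    for l in range(1, L + 1):
        Z = parameters['W' + str(l)].dot(A) + parameters['b' + str(l)]
        A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
    p = (A > 0.5).astype(float)
    print("Accuracy: " + str(np.mean(p == y)))
    return p

# Hand-built single-layer example: predicts 1 when x0 - x1 > 0.
parameters = {'W1': np.array([[1.0, -1.0]]), 'b1': np.zeros((1, 1))}
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.5]])
y = np.array([[1, 0, 0]])
pred = predict(X, y, parameters)
print(pred)  # [[1. 0. 1.]]
```

Here the third sample is misclassified, so the printed accuracy is two out of three.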

Let's print the predicted labels:

array([[0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1.,
        0., 0., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
        0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1.,
        0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 1.,
        1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
        0., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,
        0., 1.]])

In scikit-learn's encoding of this dataset, 0 stands for malignant and 1 stands for benign.

This model can be further fine-tuned by adjusting the hyperparameters, the number of layers, and the number of hidden units in each layer to see if we can get better accuracy.

The model is available as a Jupyter notebook on GitHub at https://github.com/learner14/deeplearning
