Implementing a Deep Learning Model in Python
L-layer model implementation

In this blog we will implement a binary classification problem in Python. For this example we will use the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset, which ships with the scikit-learn datasets module. A copy of the dataset can also be downloaded from: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic
From this breast cancer dataset, we will try to classify whether a cancer is malignant or benign.
My previous blog https://ml-world.hashnode.dev/single-layer-neural-network discussed the equations that we will now implement in Python.
Let's import the packages in Python
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
np.random.seed(1)
Now we load the Breast Cancer data using scikit-learn and take a quick look at what it contains
data = load_breast_cancer()
print(data.feature_names)
print(data.target_names)
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
['malignant' 'benign']
The data contains 30 features, and the Y label tells whether the cancer is malignant or benign. We divide the data into training and test sets, with the test set being 20% of the available samples.
x_data = data.data    # feature matrix
y_lable = data.target # labels
shape of x_data=(569, 30)
shape of y_lable=(569,)
train_x.shape=(455, 30)
test_x.shape=(114, 30)
Now we have a training sample of shape (455, 30): 455 rows of data with 30 features. The test sample has shape (114, 30): 114 rows with 30 features. The training labels have shape (1, 455) and the test labels have shape (1, 114).
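The split itself is not shown above, so here is a minimal NumPy-only sketch of how an 80/20 split with these shapes could be produced. The function name split_train_test, the seed and the random placeholder arrays are assumptions for illustration; the notebook may well use scikit-learn's train_test_split instead.

```python
import numpy as np

# Placeholder arrays with the same shapes as the real dataset
# (569 samples, 30 features); swap in data.data and data.target.
rng = np.random.default_rng(1)
x_data = rng.normal(size=(569, 30))
y_lable = rng.integers(0, 2, size=569)

def split_train_test(x, y, test_fraction=0.2, seed=1):
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    m = x.shape[0]
    n_test = int(np.ceil(test_fraction * m))          # 114 of 569 samples
    order = np.random.default_rng(seed).permutation(m)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

train_x, test_x, train_y, test_y = split_train_test(x_data, y_lable)

# Labels become row vectors of shape (1, m) to match the network output.
train_y = train_y.reshape(1, -1)
test_y = test_y.reshape(1, -1)

print(train_x.shape, test_x.shape)   # (455, 30) (114, 30)
print(train_y.shape, test_y.shape)   # (1, 455) (1, 114)
```

Since the network expects one sample per column, train_x would be transposed to shape (30, 455) before being fed to the first layer.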
The architecture for this implementation can be represented in simple terms as shown in the picture below

The input is a matrix of shape (30, 455), where the 30 features of each sample are stacked vertically in one column.
Each input vector is multiplied by the weight matrix, and then we add the bias. The result is called the linear unit.
Next, we take the ReLU of the linear unit. This process is repeated for each of the first L-1 layers.
Finally, we take the sigmoid of the final linear unit. If it is greater than 0.5, we classify the sample as class 1, which is benign in this dataset (target_names is ['malignant' 'benign'], so 0 is malignant and 1 is benign).
Steps
We will follow the usual deep learning method to build the model. The steps involved are:
- Initialize parameters (weights w and bias b) and define hyperparameters
- Loop for the number of epochs, carrying out:
  - Forward propagation
  - Cost computation
  - Backward propagation
  - Updating parameters
- Train the model
- Use the trained parameters to predict labels for the test data set
1. Initialize parameters
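Weight matrices use random initialization: values drawn with np.random.randn(d0, d1, ..., dn) are divided by the square root of the previous layer's size (the code keeps the simpler * 0.01 scaling as a comment). Biases are initialized to zeros.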
for l in range(1, L):
    params['W' + str(l)] = np.random.randn(layers_dimention[l], layers_dimention[l-1]) / np.sqrt(layers_dimention[l-1]) #*0.01
    params['b' + str(l)] = np.zeros((layers_dimention[l], 1))
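To make the resulting shapes concrete, the loop above can be wrapped in a small function. This is a sketch under assumptions: the function name initialize_parameters, the seed argument, and taking L as len(layers_dimention) are illustrative choices, not necessarily what the notebook does.

```python
import numpy as np

def initialize_parameters(layers_dimention, seed=1):
    """Return a dict holding W1..Wn and b1..bn for the given layer sizes."""
    rng = np.random.default_rng(seed)
    params = {}
    L = len(layers_dimention)   # number of entries, including the input layer
    for l in range(1, L):
        n_prev, n_curr = layers_dimention[l - 1], layers_dimention[l]
        # Scale by 1/sqrt(n_prev) so the linear unit keeps a moderate variance.
        params['W' + str(l)] = rng.standard_normal((n_curr, n_prev)) / np.sqrt(n_prev)
        params['b' + str(l)] = np.zeros((n_curr, 1))
    return params

params = initialize_parameters([30, 3, 2, 1])
for name, value in params.items():
    print(name, value.shape)
# W1 (3, 30)
# b1 (3, 1)
# W2 (2, 3)
# b2 (2, 1)
# W3 (1, 2)
# b3 (1, 1)
```

Each weight matrix has one row per unit in its own layer and one column per unit in the previous layer, which is what lets W.dot(previous_A) work in forward propagation.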
2. Forward Propagation
Forward propagation computes the linear unit for each layer \(l\), where \(A^{0}\) is the input \(X\)
\(Z^{l}=W^{l}A^{l-1}+b^{l}\)
Z = W.dot(previous_A) + b
followed by the activation function: ReLU, \( ReLU(Z) = \max(0, Z) \), for the first L-1 layers
A = np.maximum(0,Z)
and sigmoid, \( \sigma(Z) = \frac{1}{1+e^{-Z}} \), for layer L
A = 1/(1+np.exp(-Z))
All the parameters and activation values are saved in a cache to be used in backward propagation.
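Putting these pieces together, a full forward pass might look like the sketch below. The function name forward_propagation, the tuple layout of each cache entry and the random toy input are assumptions made for illustration.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, params):
    """ReLU on layers 1..L-1, sigmoid on layer L. Returns AL and the caches."""
    caches = []
    A = X
    L = len(params) // 2            # each layer contributes one W and one b
    for l in range(1, L + 1):
        A_prev = A
        W, b = params['W' + str(l)], params['b' + str(l)]
        Z = W.dot(A_prev) + b       # linear unit
        A = sigmoid(Z) if l == L else relu(Z)
        caches.append((A_prev, W, b, Z))   # kept for backward propagation
    return A, caches

# Toy check with the layer sizes used in this post and 5 random samples.
rng = np.random.default_rng(0)
dims = [30, 3, 2, 1]
params = {}
for l in range(1, len(dims)):
    params['W' + str(l)] = rng.standard_normal((dims[l], dims[l - 1])) / np.sqrt(dims[l - 1])
    params['b' + str(l)] = np.zeros((dims[l], 1))

X = rng.standard_normal((30, 5))
AL, caches = forward_propagation(X, params)
print(AL.shape)      # (1, 5)
print(len(caches))   # 3
```

The output AL holds one probability per sample, and there is one cache entry per layer.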
3. Cost computation
Cross-entropy cost is computed using the formula \( \begin{flalign*} & J(w,b)=\frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{i},y^{i})=-\frac{1}{m}\sum_{i=1}^{m} \left[ y^{i} \log\hat{y}^{i} +(1-y^{i})\log(1-\hat{y}^{i}) \right] &\\ \end{flalign*}\)
cost = (1./m) * (-np.dot(Y,np.log(AL).T) - np.dot(1-Y, np.log(1-AL).T))
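As a small worked example, the cost line can be wrapped in a function and evaluated on three hand-picked predictions. The function name compute_cost and the eps clipping, which guards against log(0) when a prediction saturates at exactly 0 or 1, are additions for this sketch and not part of the original code.

```python
import numpy as np

def compute_cost(AL, Y, eps=1e-12):
    """Cross-entropy cost averaged over the m samples (columns)."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)   # assumption: avoid log(0)
    cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T))
    return float(np.squeeze(cost))   # turn the (1, 1) array into a scalar

Y = np.array([[1, 0, 1]])
AL = np.array([[0.9, 0.2, 0.6]])

# By hand: -(log 0.9 + log 0.8 + log 0.6) / 3, which is roughly 0.28
print(compute_cost(AL, Y))
```

Confident correct predictions (0.9 for a 1, 0.2 for a 0) contribute little to the cost, while the hesitant 0.6 contributes the most.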
4. Backward propagation
Backward propagation is used to calculate the gradient of the loss function with respect to the parameters (weights and biases). Just like forward propagation, backward propagation is calculated in multiple steps
- Calculating the derivative of the loss with respect to the activation, \(\frac{dL}{dA}\), for the last layer L
# Initializing the backpropagation, derivative of loss with respect to activation dL/dA
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
- Calculating dZ, the derivative of the loss with respect to Z: \(\frac{dL}{dZ}\), or \(dZ^{l}=dA^{l}* g^{'}(Z^{l})\), where \(g^{'}\) is the derivative of the activation function: sigmoid for the last layer, ReLU otherwise
- For ReLU in layers 1 to L-1, \(dZ^{l}\) is
dZ = np.array(dA, copy=True) # just converting dz to a correct object.
# When z <= 0, you should set dz to 0 as well.
dZ[Z <= 0] = 0
- For sigmoid in the last layer L, \(dZ^{l}\) is
a = 1/(1+np.exp(-Z))
dZ = dA * a * (1-a)
With dZ calculated, we now compute the derivatives of the loss with respect to the weights, \(dW^{l}\), the biases, \(db^{l}\), and the activation of the previous layer, \(dA^{l-1}\)
\(dW^{l}=\frac{1}{m}dZ^{l}A^{(l-1)T}\)
\(db^{l}=\frac{1}{m}\sum_{i=1}^{m}dZ^{l(i)}\)
\(dA^{l-1}=W^{(l)T}dZ^{l}\)
dW = 1./m * np.dot(dZ,A_prev.T)
db = 1./m * np.sum(dZ, axis = 1, keepdims = True)
dA_prev = np.dot(W.T,dZ)
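One way to gain confidence in these formulas is a numerical gradient check. The sketch below does this for a single sigmoid layer only, comparing the analytic dW with a central finite difference of the cost. All function names, the toy sizes and the step h are assumptions made for illustration.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def cost_fn(W, b, A_prev, Y):
    """Cross-entropy cost of a single sigmoid layer."""
    AL = sigmoid(W.dot(A_prev) + b)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m)

def analytic_grads(W, b, A_prev, Y):
    """Gradients from the backward propagation formulas for the last layer."""
    m = Y.shape[1]
    AL = sigmoid(W.dot(A_prev) + b)
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dZ = dAL * AL * (1 - AL)                    # sigmoid derivative
    dW = 1. / m * np.dot(dZ, A_prev.T)
    db = 1. / m * np.sum(dZ, axis=1, keepdims=True)
    return dW, db

def numeric_grad_W(W, b, A_prev, Y, h=1e-6):
    """Central finite difference of the cost for every entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += h
        W_minus[idx] -= h
        grad[idx] = (cost_fn(W_plus, b, A_prev, Y) - cost_fn(W_minus, b, A_prev, Y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 6))       # 4 inputs, 6 samples
Y = rng.integers(0, 2, size=(1, 6))
W = rng.standard_normal((1, 4)) * 0.5
b = np.zeros((1, 1))

dW, db = analytic_grads(W, b, A_prev, Y)
max_diff = np.max(np.abs(dW - numeric_grad_W(W, b, A_prev, Y)))
print(max_diff)   # should be tiny if the formulas agree
```

If the analytic and numerical gradients differed noticeably, that would point to a mistake in the dZ or dW step.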
5. Updating parameters
We update the parameters using gradient descent on every \(W^{l}\) and \(b^{l}\) for \(l=1,2,\dots,L\)
\(\begin{flalign*} &W^{l}=W^{l}-\alpha dW^{l} &\\ & b^{l}=b^{l}-\alpha db^{l} \end{flalign*}\)
where \(\alpha\) is the learning rate
for l in range(L):
    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
    parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]
6. Train the model
Let's define the constants for training our model
### CONSTANTS ###
layers_dimention = [30, 3, 2, 1] # 3-layer model: 30 inputs -> 3 -> 2 -> 1 output
learning_rate = 0.01
Now we train our model by calling the function model
parameters, costs = model(train_x, train_y, layers_dimention,learning_rate, epoch = 20000, print_cost = True)
The iteration number and the cost are printed every 1000 iterations
Cost after iteration 0: 0.702932778627873
Cost after iteration 1000: 0.2679702110551074
Cost after iteration 2000: 0.21951025802491067
Cost after iteration 3000: 0.20751873459457024
Cost after iteration 4000: 0.2054742445528246
Cost after iteration 5000: 0.18955923724215862
Cost after iteration 6000: 0.18457018072319623
Cost after iteration 7000: 0.1805046963942877
Cost after iteration 8000: 0.1769964493345927
Cost after iteration 9000: 0.17448193948857382
Cost after iteration 10000: 0.17253399062967198
Cost after iteration 11000: 0.16886680377190746
Cost after iteration 12000: 0.16859260037770446
Cost after iteration 13000: 0.16699795347321172
Cost after iteration 14000: 0.16560844330274763
Cost after iteration 15000: 0.16432314394048772
Cost after iteration 16000: 0.16322862127537052
Cost after iteration 17000: 0.1623925528244854
Cost after iteration 18000: 0.16117369207763999
Cost after iteration 19000: 0.1601388641555328
Cost after iteration 19999: 0.1604861479844001
As we can see, the cost drops quickly at first and changes only slowly after about 11000 iterations. Training much longer brings little benefit and increases the risk of overfitting.
Let's define a function that plots the cost against the iterations and call it
def plot_costs(costs, learning_rate=0.0075):
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
plot_costs(costs, learning_rate)

Let's compute the accuracy on the training data
pred_train = predict(train_x, train_y, parameters)
Accuracy: 0.9296703296703294
Let's run prediction on the test data
pred_test = predict(test_x, test_y, parameters)
Accuracy: 0.8859649122807017
Let's print the predicted labels
array([[0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1.,
0., 0., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1.,
0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 1.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
0., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,
0., 1.]])
0 stands for malignant and 1 stands for benign, following the order of data.target_names
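The predict function is not shown in this post, so here is a minimal sketch of its final step: thresholding the sigmoid outputs at 0.5 and comparing them with the true labels. The function name predict_from_probabilities and the toy numbers are assumptions; the real predict also runs forward propagation first.

```python
import numpy as np

def predict_from_probabilities(AL, Y, threshold=0.5):
    """Turn sigmoid outputs into 0/1 labels and measure accuracy against Y."""
    predictions = (AL > threshold).astype(float)
    accuracy = float(np.mean(predictions == Y))
    return predictions, accuracy

# Toy example: four samples, the third one is misclassified.
AL = np.array([[0.92, 0.15, 0.60, 0.40]])
Y = np.array([[1, 0, 0, 0]])

predictions, accuracy = predict_from_probabilities(AL, Y)
print(predictions)             # [[1. 0. 1. 0.]]
print("Accuracy:", accuracy)   # Accuracy: 0.75
```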
This model can be further fine-tuned by changing the hyperparameters, the number of layers, and the number of hidden units in each layer to see if we can get better accuracy.
The model is available as a Jupyter notebook on GitHub at https://github.com/learner14/deeplearning




