Getting Started With Neural Networks

from pathlib import Path

DATA_DIR = Path("/kaggle/input")
if (DATA_DIR / "ucfai-core-fa19-nns").exists():
    DATA_DIR /= "ucfai-core-fa19-nns"
elif DATA_DIR.exists():
    # no-op to keep the proper data path for Kaggle
    pass
else:
    # You'll need to download the data from Kaggle and place it in the `data/`
    #   directory beside this notebook.
    # The data should be here:
    DATA_DIR = Path("data")

Before we get started

We need to import some libraries. PyTorch is imported as just torch; we’ve seen everything else before.

import numpy as np
import pandas as pd
import torch 
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch import optim
import time


Tensors live at the heart of PyTorch.

You can think of tensors as an $N$-dimensional data container similar to the containers that exist in numpy.

Below we have some magical tensor stuff going on to show you how to make some tensors using the built-in tensor generating functions.

# create a tensor
new_tensor = torch.Tensor([[1, 2], [3, 4]])

# create a 2 x 3 tensor without initializing it (it holds whatever was in memory)
empty_tensor = torch.Tensor(2, 3)

# create a 2 x 3 tensor with random values drawn uniformly between -1 and 1
uniform_tensor = torch.Tensor(2, 3).uniform_(-1, 1)

# create a 2 x 3 tensor with random values from a uniform distribution on the interval [0, 1)
rand_tensor = torch.rand(2, 3)

# create a 2 x 3 tensor of zeros
zero_tensor = torch.zeros(2, 3)

To see what’s inside of the tensor, put the name of the tensor into a code block below and run it.

Notebook environments display the value of the last expression in a cell, which makes debugging easy. This will not work in a Python script run from the command line; there you would need print() instead.

new_tensor
tensor([[1., 2.],
        [3., 4.]])

You can replace elements in tensors with indexing. It works a lot like arrays you will see in many programming languages.

new_tensor[0, 0] = 5
new_tensor
tensor([[5., 2.],
        [3., 4.]])

A tensor's type, shape, and number of dimensions are going to be important, so torch has some built-in methods for finding out this information about the tensor you are working with.

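# type of a tensor
print(new_tensor.type())

# shape of a tensor
print(new_tensor.shape)
print(new_tensor.size())

# dimension of a tensor
print(new_tensor.dim())
torch.FloatTensor
torch.Size([2, 2])
torch.Size([2, 2])
2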

Coming from Numpy

Much of your data manipulation will be done in either pandas or numpy. To feed that manipulated data into a Tensor for use in torch, you will have to use the torch.from_numpy() function.

np_ndarray = np.random.randn(2,2)
array([[-1.78410271, -0.98539235],
       [-0.06980782, -1.69461514]])
# NumPy ndarray to PyTorch tensor
to_tensor = torch.from_numpy(np_ndarray)

tensor([[-1.7841, -0.9854],
        [-0.0698, -1.6946]], dtype=torch.float64)
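Two details of from_numpy are worth knowing: the tensor shares memory with the original array, and it keeps NumPy's default float64 dtype, whereas PyTorch layers use float32 by default. The sketch below illustrates both; the variable names are made up for this example.

```python
import numpy as np
import torch

arr = np.zeros((2, 2))              # float64 by default
shared = torch.from_numpy(arr)      # shares memory with `arr`

# A change made through the array is visible through the tensor
arr[0, 0] = 3.0

# .float() returns a float32 copy, the default dtype of nn.Linear weights
as_float32 = shared.float()

# .numpy() goes the other way (for tensors that live on the CPU)
back_to_numpy = as_float32.numpy()

print(shared.dtype, as_float32.dtype, back_to_numpy.dtype)
```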

Checking for CUDA (NVIDIA GPUs only)

CUDA will speed up the training of your Neural Network greatly.

Your notebook should already have CUDA enabled, but you can call torch.cuda.is_available() to check for it.

TL;DR: CUDA rocks for NNs
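As a rough sketch of how that check is typically used: pick a device once, then move tensors onto it. The names has_cuda and sample are only for illustration.

```python
import torch

# True only when PyTorch can see a CUDA-capable NVIDIA GPU
has_cuda = torch.cuda.is_available()

# Fall back to the CPU when no GPU is available
device = torch.device("cuda:0" if has_cuda else "cpu")

# .to(device) returns the tensor on the chosen device
sample = torch.ones(2, 2).to(device)

print(has_cuda, sample.device)
```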


Defining Networks

In the example below, we are going to build a simple Neural Network on a randomly generated dataset to show how the pieces fit together. This will be a simple network with one hidden layer.

First, we need to set some placeholder variables to define how we want the network to be set up.

n_in, n_h, n_out, batch_size = 10, 5, 1, 10

Next, we are going to generate our lovely randomised dataset.

We are not expecting any insights to come from this network as the data is generated randomly.

x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])

Next, we are going to define what our model looks like. The Linear() part applies a linear transformation to the incoming data, with Sigmoid() being the activation function that we use for that layer.

So, for this network, we have two fully connected layers with a sigmoid as the activation function. This looks a lot like the network we saw in the slide deck with one input layer, one hidden layer, and one output layer.

model = nn.Sequential(
    nn.Linear(n_in, n_h),
    nn.Linear(n_h, n_out),
    nn.Sigmoid()
)

Next, let’s define our loss function.

For this example, we are going to use Mean Squared Error, but there are many different loss functions we could use.

criterion = nn.MSELoss()
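To make the loss less of a black box, here is a small sketch that computes Mean Squared Error by hand and compares it with nn.MSELoss. The prediction and target values are made up for the example.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.9], [0.2], [0.6]])
target = torch.tensor([[1.0], [0.0], [1.0]])

# MSE is the mean of the squared differences
manual_mse = ((pred - target) ** 2).mean()

# nn.MSELoss averages over all elements by default
builtin_mse = nn.MSELoss()(pred, target)

print(manual_mse.item(), builtin_mse.item())
```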

The optimizer determines how the network's weights are updated during training.

We are going to be using a standard gradient descent method in this example.

We will have a learning rate of 0.01, which is pretty standard too. You are going to want to keep this learning rate fairly low, as a learning rate that is too high can make each update overshoot, so the loss bounces around or grows instead of shrinking.

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
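For intuition, plain gradient descent just moves each weight a small step against its gradient. The sketch below does one such step by hand on a toy loss; it is an illustration of the update rule, not how torch.optim.SGD is implemented internally.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
lr = 0.01

# Toy loss: sum of squares, whose gradient is 2 * w
loss = (w ** 2).sum()
loss.backward()

with torch.no_grad():
    # The gradient descent update: w <- w - lr * gradient
    w -= lr * w.grad
    # Clear the gradient so it does not accumulate into the next step
    w.grad.zero_()

print(w)
```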

Now, let’s train!

To train, we combine all the different parts that we defined into one for loop.

for epoch in range(50):
    # Forward Propagation
    y_pred = model(x)
    # Compute and print loss
    loss = criterion(y_pred, y)
    print('epoch: ', epoch,' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()
    # perform a backward pass (backpropagation)
    loss.backward()
    # Update the parameters
    optimizer.step()
epoch:  0  loss:  0.24477167427539825
epoch:  1  loss:  0.24474850296974182
epoch:  2  loss:  0.24472537636756897
epoch:  3  loss:  0.24470224976539612
epoch:  4  loss:  0.24467918276786804
epoch:  5  loss:  0.24465613067150116
epoch:  6  loss:  0.24463315308094025
epoch:  7  loss:  0.24461016058921814
epoch:  8  loss:  0.2445872277021408
epoch:  9  loss:  0.24456433951854706
epoch:  10  loss:  0.2445414811372757
epoch:  11  loss:  0.24451863765716553
epoch:  12  loss:  0.24449582397937775
epoch:  13  loss:  0.24447302520275116
epoch:  14  loss:  0.24445028603076935
epoch:  15  loss:  0.24442759156227112
epoch:  16  loss:  0.2444048821926117
epoch:  17  loss:  0.24438223242759705
epoch:  18  loss:  0.2443595826625824
epoch:  19  loss:  0.24433700740337372
epoch:  20  loss:  0.24431447684764862
epoch:  21  loss:  0.24429190158843994
epoch:  22  loss:  0.2442694455385208
epoch:  23  loss:  0.2442469298839569
epoch:  24  loss:  0.24422451853752136
epoch:  25  loss:  0.24420210719108582
epoch:  26  loss:  0.24417972564697266
epoch:  27  loss:  0.2441573590040207
epoch:  28  loss:  0.2441350668668747
epoch:  29  loss:  0.2441127598285675
epoch:  30  loss:  0.2440905123949051
epoch:  31  loss:  0.24406829476356506
epoch:  32  loss:  0.24404609203338623
epoch:  33  loss:  0.24402391910552979
epoch:  34  loss:  0.24400177597999573
epoch:  35  loss:  0.24397964775562286
epoch:  36  loss:  0.24395756423473358
epoch:  37  loss:  0.24393554031848907
epoch:  38  loss:  0.24391348659992218
epoch:  39  loss:  0.24389147758483887
epoch:  40  loss:  0.24386951327323914
epoch:  41  loss:  0.2438475638628006
epoch:  42  loss:  0.24382564425468445
epoch:  43  loss:  0.2438037395477295
epoch:  44  loss:  0.24378187954425812
epoch:  45  loss:  0.24376006424427032
epoch:  46  loss:  0.24373821914196014
epoch:  47  loss:  0.24371647834777832
epoch:  48  loss:  0.2436947077512741
epoch:  49  loss:  0.24367299675941467

In this example, we printed out the loss each time we completed an epoch.

This ran very quickly, but with more complex models, training can take hours if not days! Those outputs are then going to be very important for checking on how your network is doing along the way.

More likely than not, you’re going to see that this network is not converging, which is to be expected with random data.

In our next example, we’re going to be building a proper model with an awesome dataset.

Diabetes dataset

With this dataset, we are going to see if some basic medical data about a person can help us predict if someone is diabetic or not using magical neural networks.

First though, let’s get that dataset and see what’s inside.

dataset = pd.read_csv(DATA_DIR / "train.csv", header=None).values


What are we looking at?

This is a fairly small dataset that includes some basic information about an individual’s health.

Using this information, we should be able to make a model that will allow us to determine if a person has diabetes or not.

The last column, Outcome, is a single digit that tells us if an individual has diabetes.

We need to clean up the data a bit, so let’s get rid of the first row, which holds the column labels.

dataset = np.delete(dataset, 0, 0)


Alright, now let’s split our data into training and test sets. Later on, PyTorch will turn those sets into tensors for us; the code below does the split and picks the device we’ll train on.

# split into x and y sets
X = dataset[:,:-1].astype(np.float32)

Y = dataset[:,-1].astype(np.float32)

# Needed to make PyTorch happy
Y = np.expand_dims(Y, axis = 1)

# Test-Train split
from sklearn.model_selection import train_test_split

split = train_test_split(X, Y, test_size=0.1)
xTrain, xTest, yTrain, yTest = split

# Here we're defining what component we'll use to train this model
# We want to use the GPU if available, if not we use the CPU
# If your device is not cuda, check the GPU option in the Kaggle Kernel

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

PyTorch Dataset

Our next step is to create PyTorch Datasets for our training and validation sets. torch.utils.data.Dataset is an abstract class that represents a dataset and has several handy attributes we’ll utilize from here on out.

To create one, we simply need to create a class which inherits from PyTorch’s Dataset class and override the constructor, as well as the __len__() and __getitem__() methods.

class PyTorch_Dataset(Dataset):
    def __init__(self, data, outputs):
        self.data = data
        self.outputs = outputs

    def __len__(self):
        'Returns the total number of samples in this dataset'
        return len(self.data)

    def __getitem__(self, index):
        'Returns a row of data and its output'
        x = self.data[index]
        y = self.outputs[index]

        return x, y

With the class written, we can now create our training and validation datasets by passing the corresponding data to our class.

train_dataset = PyTorch_Dataset(xTrain, yTrain)
val_dataset = PyTorch_Dataset(xTest, yTest)

datasets = {'Train': train_dataset, 'Validation': val_dataset}
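Overriding __len__() and __getitem__() is what lets len() and square-bracket indexing work on a dataset. The sketch below shows this on a tiny stand-in class; ToyDataset and its data are hypothetical and only mirror the structure of the class above.

```python
import numpy as np
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in with the same three methods as the class above."""

    def __init__(self, data, outputs):
        self.data = data
        self.outputs = outputs

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.outputs[index]

# Three rows with four features each, plus one label per row
features = np.arange(12, dtype=np.float32).reshape(3, 4)
labels = np.array([[0.0], [1.0], [0.0]], dtype=np.float32)

toy = ToyDataset(features, labels)
n_rows = len(toy)       # calls __len__
row, label = toy[1]     # calls __getitem__(1)

print(n_rows, row, label)
```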

PyTorch Dataloaders

It’s quite inefficient to load an entire dataset into RAM at once, so PyTorch uses DataLoaders to load up batches of data on the fly. We pass a batch size of 16, so in each iteration the loaders will load 16 rows of data and return them to us.

For the most part, Neural Networks are trained on batches of data, so these DataLoaders greatly simplify the process of loading and feeding data to our network. Each batch of inputs returned by the DataLoader is a rank 2 tensor of size (16, 8): 16 rows with 8 features each. The final batch may be smaller if the dataset size is not a multiple of 16.

dataloaders = {
    x: DataLoader(datasets[x], batch_size=16, shuffle=True, num_workers=4)
    for x in ['Train', 'Validation']
}
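To see the batch shapes for yourself, you can loop over a loader and record what comes out. This sketch uses made-up random data (40 rows, 8 features) wrapped in PyTorch's TensorDataset rather than the diabetes data, so the names here are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 40 random rows with 8 features each, and one 0/1 label per row
toy_x = torch.randn(40, 8)
toy_y = torch.randint(0, 2, (40, 1)).float()

loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=16, shuffle=True)

# Collect the shape of every batch in one pass over the data
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
print(batch_shapes)
```

With 40 rows and a batch size of 16, the last batch holds the 8 leftover rows, which is why the training loop weights each batch's loss by inputs.size(0).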

PyTorch Model

We need to define how we want the neural network to be structured, so let’s set those hyper-parameters and create our model.

inputSize    =  8    # Number of input features per row
hiddenSize   = 15    # Number of units in the hidden layer
numClasses   =  1    # One output unit is enough for two classes
numEpochs    = 20    # How many training cycles
learningRate = 0.01  # Learning rate

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    def forward(self, x):
        x = F.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))
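A quick way to sanity-check a model definition is to push a random batch through it and look at the output shape. The sketch below does that with a copy of the same two-layer structure; TinyNet is a hypothetical name so it does not clash with the class above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical copy of the two-layer structure, for a shape check."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))

net = TinyNet(8, 15, 1)

# A fake batch of 16 rows with 8 features each
out = net(torch.randn(16, 8))

# Weights and biases of both layers: (8*15 + 15) + (15*1 + 1)
n_params = sum(p.numel() for p in net.parameters())

print(out.shape, n_params)
```

The sigmoid on the last layer keeps every output between 0 and 1, so each value can be read as the predicted probability of the positive class.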

PyTorch Training

Now we create an instance of this NeuralNet() class and define the criterion and optimizer we’ll use to train our model. In our case we’ll use Binary Cross Entropy Loss, a commonly used loss function for binary classification problems.

For the optimizer we’ll use Adam, an easy to apply but powerful optimizer which is an extension of the popular Stochastic Gradient Descent method. We need to pass it all of the trainable parameters with model.parameters() and the learning rate we’ll use.

# Creating our model
model = NeuralNet(inputSize, hiddenSize, numClasses).to(device)  # keep the model on the same device as the batches
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr = learningRate)
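For reference, Binary Cross Entropy averages -[y * log(p) + (1 - y) * log(1 - p)] over the batch, where p is the predicted probability and y is the true label. The sketch below checks that formula against nn.BCELoss on a few made-up values.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels
probs = torch.tensor([[0.9], [0.2], [0.6]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

# The formula written out by hand
manual_bce = -(targets * torch.log(probs)
               + (1 - targets) * torch.log(1 - probs)).mean()

# nn.BCELoss expects probabilities, which is why the model ends in a sigmoid
builtin_bce = nn.BCELoss()(probs, targets)

print(manual_bce.item(), builtin_bce.item())
```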

At this point we’re finally ready to train our model! In PyTorch we have to write our own training loops before getting to actually train the model. This can seem daunting at first, so let’s break up each stage of the training process.

The bulk of the function is handled by a nested for loop, the outer looping through each epoch and the inner looping through all of the batches of rows in our dataset. Each epoch has a training and validation phase, where batches are served from their respective loaders. Both phases begin by feeding a batch of inputs into the model, which implicitly calls the forward() function on the input. Then we calculate the loss of the outputs against the true labels of the batch.

If we’re in training mode, here is where we perform back-propagation and adjust our weights. To do this, we first zero the gradients, then perform backpropagation by calling .backward() on the loss variable. Finally, we call optimizer.step() to adjust the weights of the model in accordance with the calculated gradients.

The remaining portion of one epoch is the same for both training and validation phases, and simply involves calculating and tracking the accuracy achieved in both phases. A nifty addition to this training loop is that it tracks the highest validation accuracy and only saves weights which beat that accuracy, ensuring that the best performing weights are returned from the function.

def run_epoch(model, dataloaders, device, phase):
    # NOTE: criterion, optimizer and datasets are read from the notebook's global scope
    running_loss = 0.0
    running_corrects = 0

    # Put the model in the right mode for this phase
    if phase == 'Train':
        model.train()
    else:
        model.eval()

    # Looping through batches
    for i, (inputs, labels) in enumerate(dataloaders[phase]):
        # ensures we're doing this calculation on our GPU if possible
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Zero parameter gradients
        optimizer.zero_grad()

        # Calculate gradients only if we're in the training phase
        with torch.set_grad_enabled(phase == 'Train'):
            # This calls the forward() function on a batch of inputs
            outputs = model(inputs)

            # Calculate the loss of the batch
            loss = criterion(outputs, labels)

            # Adjust weights through backpropagation if we're in training phase
            if phase == 'Train':
                loss.backward()
                optimizer.step()

        # Get binary predictions
        preds = torch.round(outputs)

        # Document statistics for the batch
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels)

    # Calculate epoch statistics
    epoch_loss = running_loss / len(datasets[phase])
    epoch_acc = running_corrects.double() / len(datasets[phase])
    return epoch_loss, epoch_acc

import copy

def train(model, criterion, optimizer, num_epochs, dataloaders, device):
    start = time.time()

    # Make sure the model lives on the same device as the batches
    model.to(device)

    # Deep copy, because state_dict() returns references to the live weights
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    print('| Epoch\t | Train Loss\t| Train Acc\t| Valid Loss\t| Valid Acc\t|')
    print('-' * 73)
    # Iterate through epochs
    for epoch in range(num_epochs):
        # Training phase
        train_loss, train_acc = run_epoch(model, dataloaders, device, 'Train')
        # Validation phase
        val_loss, val_acc = run_epoch(model, dataloaders, device, 'Validation')
        # Print statistics after the validation phase
        print("| {}\t | {:.4f}\t| {:.4f}\t| {:.4f}\t| {:.4f}\t|".format(epoch + 1, train_loss, train_acc, val_loss, val_acc))

        # Copy and save the model's weights if it has the best accuracy thus far
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())

    total_time = time.time() - start
    print('-' * 73)
    print('Training complete in {:.0f}m {:.0f}s'.format(total_time // 60, total_time % 60))
    print('Best validation accuracy: {:.4f}'.format(best_acc))

    # load best model weights and return the model
    model.load_state_dict(best_model_wts)
    return model
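One subtlety when saving the best weights: the tensors in state_dict() share storage with the model, so they keep changing as training continues. A snapshot therefore needs copy.deepcopy. The sketch below demonstrates this on a single hypothetical nn.Linear layer rather than the full model.

```python
import copy
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

live_ref = layer.state_dict()                  # shares storage with the layer
snapshot = copy.deepcopy(layer.state_dict())   # independent copy of the weights

# Simulate a later training update
with torch.no_grad():
    layer.weight.add_(1.0)

# The plain reference followed the update, the deep copy did not
ref_changed = torch.equal(live_ref["weight"], layer.weight)
snapshot_kept = not torch.equal(snapshot["weight"], layer.weight)

# Restore the saved weights into the layer
layer.load_state_dict(snapshot)
restored = torch.equal(snapshot["weight"], layer.weight)

print(ref_changed, snapshot_kept, restored)
```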

Now, let’s train the model!

model = train(model, criterion, optimizer, numEpochs, dataloaders, device)
# Function which generates predictions, given a set of inputs
def test(model, inputs, device):
    inputs = torch.tensor(inputs).to(device)
    outputs = model(inputs).cpu().detach().numpy()
    preds = np.where(outputs > 0.5, 1, 0)
    return preds
preds = test(model, xTest, device)

Now that our model has made some predictions, let’s find the Matthews Correlation Coefficient:

# import functions for matthews and confusion matrix
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# scikit-learn metrics take the true labels first, then the predictions
matthews_corrcoef(yTest, preds)

Let’s check the confusion matrix

# rows are the true classes, columns are the predicted classes
confusion_matrix(yTest, preds)
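The two metrics are closely related: the Matthews Correlation Coefficient can be computed directly from the four cells of a binary confusion matrix. Here is a sketch on made-up labels (not the diabetes predictions) that works it out by hand and compares it with scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Made-up true labels and predictions
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# For binary labels, ravel() yields the cells in this order
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
manual_mcc = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(manual_mcc, matthews_corrcoef(y_true, y_pred))
```

A value of 1 means perfect predictions, 0 is no better than chance, and -1 means every prediction is wrong.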

Ehhhhhh, that’s not bad…

There’s probably a bunch of things we could do to improve accuracy.

Why don’t we have you give it a shot?

Make this model better!

There is no right or wrong way to optimise this model. Use your understanding of Neural Networks as a launching point. You can use the previous code-cells to save some time.

There are many aspects of this model that can be changed to increase accuracy, such as the number of hidden units and layers, the number of training epochs, and the learning rate.

Just Do It

# TODO, make a better model!

inputSize =  8         # Number of input features per row
hiddenSize = 15        # Number of units in each hidden layer
numClasses = 1         # One output unit is enough for two classes
numEpochs = 69         # How many training cycles
learningRate = .01     # Learning rate

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size) 
        self.fc3 = nn.Linear(hidden_size, num_classes)  
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

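# Re-create the model, loss, and optimizer so the new class and settings are actually used
model = NeuralNet(inputSize, hiddenSize, numClasses).to(device)
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=learningRate)

model = train(model, criterion, optimizer, numEpochs, dataloaders, device)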

predictions = test(model, xTest, device)
# Run this to generate the submission file for the competition!
### Make sure to name your model variable "model" ###

# load in test data:
test_data = pd.read_csv(DATA_DIR / "test.csv", header=None).values
# remove row with column labels:
test_data = np.delete(test_data, 0, 0)

# convert to float32 values
X = test_data.astype(np.float32)
# get indices for each entry in test data
indices = list(range(len(X)))

# generate predictions
preds = test(model, X, device)

# create our pandas dataframe for our submission file. Squeeze removes dimensions of 1 in a numpy matrix Ex: (161, 1) -> (161,)
preds = pd.DataFrame({'Id': indices, 'Class': np.squeeze(preds)})

# save submission csv
preds.to_csv('submission.csv', header=['Id', 'Class'], index=False)

