In a previous post we looked at the theory behind Perceptron. In this tutorial we will implement the algorithm using Python.

Note: We will implement the algorithm from scratch, so that we get a better understanding of how the algorithm actually works. If you want to use it to solve a problem, it’s better if you used an external library that already implements the algorithm (like scikit-learn for Python).

Read Data

Feel free to skip this section if you don’t care how to read data from a file.

First we need to read the data from a file. Each line in the file represents an item, and the first line holds the feature names (+ the keyword “Class” at the end). The file follows this format:


You can find an example data file here.

Let’s jump right into the implementation. We need to get the number of features, which can be found on the first line.

featuresNumber = len(lines[0].split(','));

We also need to store the item information and the different classes that appear in the data set. For that, we will use three lists, items, classes and features. The second and third ones will hold the names of the classes and features respectively. The first one is a bit more complicated though. It will hold dictionaries, with each dictionary representing an item. The keys to these dictionaries are the feature names and will hold the values the item has for the feature/key, “Class” to hold the class the item is categorized in, and “Bias” which will hold a value of 1. With Bias we are essentially augmenting the item/feature vector with a value of 1.

For each line in the file, we will split it to get the features, storing them into a temporary array before adding them to the items list. At the end, we will shuffle the list to make sure the data set is properly randomized.

Auxiliary Functions

First we will need functions to add and subtract dictionaries (d1 and d2). We operate at a rate (eg. instead of subtracting the whole of d2, we subtract its third), which is given as a parameter.

We also need a function to calculate the confidence (the dot product) of an item and a weight. We can calculate the dot product by multiplying the values for each feature of the weight and item vectors and adding them together.


Now we move onto the meat of the algorithm.

We will create the function to train the perceptron (aka calculating the weights). We will name this function CalculateWeights. It takes as input the training data, the learning rate (otherwise known as correction rate), the classes and features, and the number of epochs (aka how many times we will loop through the data).

First we want to initialize the weights. As we are building a mulit-class Perceptron using Kesler’s Construction, each class has its own weights for the features. To implement this, we will use a dictionary of dictionaries. The keys of the outer dictionary are the classes and for each class-dictionary, the keys are the features. We will initialize the values of each weight to 0.

weights = {};

for c in classes:
    weights[c] = {"Bias":0};
    for f in features:
        weights[c][f] = 0;

To get a better feel of how the dictionary weights is laid out, we will quickly read data from it:

print weights[c][f]; #Prints the weight of feature f for class c.
print weights[c]; #Prints the feature weights for class c.

Having initialized our weights, we move to adjusting them. The algorithm in pseudocode:

For each epoch:
    For each item i in training set:
        Find confidence of i for all classes
        Pick max confidence
            c is the class corresponding to max confidence

        Compare c with actual class of i
        If they are the same, the guess was correct
            Move to next item
        If they are different, do the following:
            For the weights of c, subtract the item values
            For the weights of the correct class, add item values

Let’s implement the above:

Bringing all the parts of the function together we have:

With the above function we can train the weights, now we need to write a function to classify an item. We will call this function Perceptron. Given an item and the weights, first it will augment the item vector with the bias, then it will find the maximum confidence and it will classify the item in the class that produced that max confidence.


To use the Perceptron we will first read our data from a text file, we will set the parameters of the Perceptron (often called hyper-parameters and they are the learning rate, epochs, etc.) then we will calculate the weights and finally we will classify the new item.

items, classes, features = ReadData('data.txt');

lRate = 0.1;
epochs = 300;

item = {'PW' : 1.4, 'PL' : 4.7, 'SW' : 3.2, 'SL' : 7.0};
print Perceptron(item,weights);

With that the code is complete. You can find the complete implementation on my github. It includes two evaluation functions.

Perceptron Theory