2019-09-11
What are the reasons a perceptron is not able to learn?
ai.stackexchange
Question

I'm just starting to learn about neural networks, and I decided to begin by studying a simple 3-input perceptron. I'm also using only binary inputs so I can gain a full understanding of how the perceptron works. I'm having difficulty understanding why some training outputs work and others do not. I'm guessing it has to do with the linear separability of the input data, but it's unclear to me how that can easily be determined. I'm aware of the graphing line test, but it's unclear to me how to plot the input data to fully understand what will work and what won't.

There is quite a bit of information that follows, but it's all very simple. I'm including all of it to be crystal clear about what I'm doing and what I'm trying to understand and learn.

Here is a schematic graphic of the simple 3-input perceptron I'm modeling.

3 Input Perceptron

Because it has only 3 inputs and they are binary (0 or 1), there are only 2^3 = 8 possible input combinations. Each of those 8 inputs can be given a target output of 0 or 1, which allows for 2^8 = 256 possible training sets. In other words, the perceptron can be trained to recognize more than one input configuration.

Let's call the inputs 0 through 7 (all the possible configurations of a 3-input binary system). We can train the perceptron to recognize more than just one input; for example, we can train it to fire for any input from 0 to 3 and not for inputs 4 through 7. All of those possible combinations add up to the 256 possible training states.
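To make that count concrete, here is a small sketch (separate from my emulation program below, with purely illustrative names) that enumerates both the 8 input patterns and the 256 possible target assignments:

from itertools import product

input_patterns = list(product([0, 1], repeat=3))   # the 8 possible (x3, x2, x1) inputs
target_sets = list(product([0, 1], repeat=8))      # one 0/1 target per input pattern

print(len(input_patterns))   # 8
print(len(target_sets))      # 256 = 2**8 possible training sets
print(target_sets[6])        # (0, 0, 0, 0, 0, 1, 1, 0) -- the same targets as Test #6 below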

Some of these training input states work, and others do not. I'm trying to learn how to determine which training sets are valid and which are not.

I've written the following Python program to emulate this perceptron across these training states (the run shown below covers the first 25 of the 256).

Here is the code for this emulation:

import numpy as np
np.set_printoptions(formatter={'float': '{: 0.1f}'.format})

# Perceptron math functions.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Note: this expects x to already be the sigmoid's output,
# i.e. d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)).
def sigmoid_derivative(x):
    return x * (1 - x)
# END Perceptron math functions.

# The first column of 1s is used as the bias input.
# The other 3 columns are the actual inputs: x3, x2, and x1 respectively.
training_inputs = np.array([[1, 0, 0, 0],
                         [1, 0, 0, 1],
                         [1, 0, 1, 0],
                         [1, 0, 1, 1],
                         [1, 1, 0, 0],
                         [1, 1, 0, 1],
                         [1, 1, 1, 0],
                         [1, 1, 1, 1]])

# Setting up the training outputs data set array
num_array = np.zeros((1, 8), dtype=int)

for num in range(25):  # the first 25 of the 256 possible training sets; use range(256) to sweep them all
    # Convert the test number into an 8-bit binary string: one 0/1 target per input pattern.
    bnum = bin(num).replace('0b', "").rjust(8, "0")
    for i in range(8):
        num_array[0, i] = int(bnum[i])

    # training_outputs is the column vector num_array.T, with shape (8, 1).
    training_outputs = num_array.T
    # END of setting up training outputs data set array

    # -------  BEGIN Perceptron functions ----------
    np.random.seed(1)
    # Random starting weights in [-1, 1), one per input column (bias, x3, x2, x1).
    synaptic_weights = 2 * np.random.random((4, 1)) - 1
    for iteration in range(20000):
        input_layer = training_inputs
        # Forward pass: weighted sum of the inputs pushed through the sigmoid.
        outputs = sigmoid(np.dot(input_layer, synaptic_weights))
        # Weight update: error scaled by the sigmoid's slope, accumulated over all 8 inputs.
        error = training_outputs - outputs
        adjustments = error * sigmoid_derivative(outputs)
        synaptic_weights += np.dot(input_layer.T, adjustments)
    # -------  END Perceptron functions ----------


    # Convert to clean output 0, 0.5, or 1 instead of the messy calculated values.
    # This is to make the printout easier to read.
    # This also helps with testing analysis below.
    for i in range(8):
        if outputs[i] <= 0.25:
            outputs[i] = 0
        if (outputs[i] > 0.25 and outputs[i] < 0.75):
            outputs[i] = 0.5
        if outputs[i] > 0.75:
            outputs[i] = 1
    # End convert to clean output values.

    # Begin Testing Analysis
    # This is to check to see if we got the correct outputs after training.
    evaluate = "Good"
    test_array = training_outputs
    for i in range(8):
        # Evaluate for a 0.5 error.
        if outputs[i] == 0.5:
            evaluate = "The 0.5 Error"
            break
        # Evaluate for incorrect output
        if outputs[i] != test_array[i]:
            evaluate = "Wrong Answer"
    # End Testing Analysis

    # Printout routine starts here:
    print_array = test_array.T
    print("Test#: {0}, Training Data is: {1}".format(num, print_array[0]))
    print("{0}, {1}".format(outputs.T, evaluate))
    print("") 

And when I run this code I get the following output for the first 25 training tests.

Test#: 0, Training Data is: [0 0 0 0 0 0 0 0]
[[ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]], Good

Test#: 1, Training Data is: [0 0 0 0 0 0 0 1]
[[ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0]], Good

Test#: 2, Training Data is: [0 0 0 0 0 0 1 0]
[[ 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0]], Good

Test#: 3, Training Data is: [0 0 0 0 0 0 1 1]
[[ 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0]], Good

Test#: 4, Training Data is: [0 0 0 0 0 1 0 0]
[[ 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0]], Good

Test#: 5, Training Data is: [0 0 0 0 0 1 0 1]
[[ 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0]], Good

Test#: 6, Training Data is: [0 0 0 0 0 1 1 0]
[[ 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5]], The 0.5 Error

Test#: 7, Training Data is: [0 0 0 0 0 1 1 1]
[[ 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0]], Good

Test#: 8, Training Data is: [0 0 0 0 1 0 0 0]
[[ 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0]], Good

Test#: 9, Training Data is: [0 0 0 0 1 0 0 1]
[[ 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5]], The 0.5 Error

Test#: 10, Training Data is: [0 0 0 0 1 0 1 0]
[[ 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0]], Good

Test#: 11, Training Data is: [0 0 0 0 1 0 1 1]
[[ 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0]], Good

Test#: 12, Training Data is: [0 0 0 0 1 1 0 0]
[[ 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0]], Good

Test#: 13, Training Data is: [0 0 0 0 1 1 0 1]
[[ 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0]], Good

Test#: 14, Training Data is: [0 0 0 0 1 1 1 0]
[[ 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0]], Good

Test#: 15, Training Data is: [0 0 0 0 1 1 1 1]
[[ 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0]], Good

Test#: 16, Training Data is: [0 0 0 1 0 0 0 0]
[[ 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0]], Good

Test#: 17, Training Data is: [0 0 0 1 0 0 0 1]
[[ 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0]], Good

Test#: 18, Training Data is: [0 0 0 1 0 0 1 0]
[[ 0.0 0.0 0.5 0.5 0.0 0.0 0.5 0.5]], The 0.5 Error

Test#: 19, Training Data is: [0 0 0 1 0 0 1 1]
[[ 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0]], Good

Test#: 20, Training Data is: [0 0 0 1 0 1 0 0]
[[ 0.0 0.5 0.0 0.5 0.0 0.5 0.0 0.5]], The 0.5 Error

Test#: 21, Training Data is: [0 0 0 1 0 1 0 1]
[[ 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0]], Good

Test#: 22, Training Data is: [0 0 0 1 0 1 1 0]
[[ 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0]], Wrong Answer

Test#: 23, Training Data is: [0 0 0 1 0 1 1 1]
[[ 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0]], Good

Test#: 24, Training Data is: [0 0 0 1 1 0 0 0]
[[ 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0]], Wrong Answer

For the most part, it appears to be working. But there are situations where it clearly does not work.

I have labeled the errors in two different ways.

The first type of error is "The 0.5 Error", which is easy to spot: the perceptron should never return an output of 0.5 in this situation, since everything should be binary. The second type, "Wrong Answer", is when it returns clean binary outputs that nevertheless don't match what it was trained to recognize.

I would like to understand the cause of these errors. I'm not interested in trying to correct them, as I believe they are valid errors. In other words, these are situations that the perceptron is simply incapable of being trained on. And that's OK.

What I want to learn is why these cases are invalid. I suspect they have something to do with the input data not being linearly separable in these situations. But if that's the case, how do I go about determining which cases are not linearly separable? If I could understand how to do that, I would be very happy.

Also, are the reasons why it doesn't work in specific cases the same? In other words, are both types of errors caused by linear inseparability of the input data? Or is there more than one condition that causes a perceptron to fail in certain training situations?

Any help would be appreciated.

Answer

UPDATE:

I found the answers on my own.

To begin with, I figured out how to plot the input data and the training outputs on a scatter plot using matplotlib. Once I was able to do that, I could instantly see exactly what's going on.
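If you want to reproduce that kind of plot, here is a minimal sketch of the idea (my actual plotting script isn't reproduced here, so treat the names and styling as illustrative): it draws the eight binary input points in 3D and colors each one by its training target.

import numpy as np
import matplotlib.pyplot as plt
from itertools import product

# The 8 binary input points in (x3, x2, x1) order and one example target vector.
points = np.array(list(product([0, 1], repeat=3)))
training_outputs = np.array([0, 0, 0, 0, 1, 1, 1, 0])  # the "good" example shown below

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # needs a reasonably recent matplotlib
colors = ['red' if t == 1 else 'blue' for t in training_outputs]
ax.scatter(points[:, 0], points[:, 1], points[:, 2], c=colors, s=80)
ax.set_xlabel('x3')
ax.set_ylabel('x2')
ax.set_zlabel('x1')
plt.show()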

When the answers are correct, the input data is indeed linearly separable with respect to what the training output is asking it to recognize.

The "0.5 error" occurs when a single violation of linear separability occurs.

The "Wrong Answer" error occurs when linear separation is violated twice. In other words, there are conditions where linear separation is violated in two separate planes. Or when the data can be separated by planes but it would require more than one plane to do this. (see graphic examples below)

I suspected that there would be a difference between these different types of errors and the answer is, yes, there is a difference.

So I have solved my own question. Thanks to anyone who may have been working on this.

If you'd like to see some of my graphs, here are some specific examples:

A graph of all possible binary inputs

All Possible Inputs

An example of a good training set, training_outputs = np.array([0, 0, 0, 0, 1, 1, 1, 0]). As you can see in this graph, the red points are linearly separable from the blue points.

Good Training Example

An example of a 0.5 error, training_outputs = np.array([0, 0, 0, 1, 0, 0, 1, 0]). The points are not linearly separable by a single plane.

0.5 error example

An example of a 2-plane wrong answer error, training_outputs = np.array([0, 0, 0, 1, 0, 1, 1, 0]). You can see that linear separability is violated in two separate planes.

Wrong answer 2 planes violation

An example of a different kind of wrong answer, training_outputs = np.array([0, 0, 0, 1, 1, 0, 0, 0]). In this case the data can be separated by planes, but it would require 2 different planes to do so, which a single perceptron cannot handle.

wrong answer requiring 2 planes to separate

So this covers all possible error conditions. Aren't graphs great!
