2/23/17

Neural networks are complicated and difficult. They involve all sorts of fancy mathematics. While this is all fascinating (and incredibly important to scientific research), a lot of the techniques are not very practical in the world of building interactive, animated Processing sketches. Not to mention that in order to cover all this material, we would need a whole book—or more likely, a series of books.
So instead, we’ll begin our last hurrah in the nature of code with the simplest of all neural networks, in an effort to understand how the overall concepts are applied in code. 

The Perceptron

Invented in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, a perceptron is the simplest neural network possible: a computational model of a single neuron. A perceptron consists of one or more inputs, a processor, and a single output.
Figure 10.3: The perceptron
A perceptron follows the “feed-forward” model, meaning inputs are sent into the neuron, are processed, and result in an output. In the diagram above, this means the network (one neuron) reads from left to right: inputs come in, output goes out.
Let’s follow each of these steps in more detail.
Step 1: Receive inputs.
Say we have a perceptron with two inputs—let’s call them x1 and x2.
Input 0: x1 = 12
Input 1: x2 = 4
Step 2: Weight inputs.
Each input that is sent into the neuron must first be weighted, i.e. multiplied by some value (often a number between -1 and 1). When creating a perceptron, we’ll typically begin by assigning random weights. Here, let’s give the inputs the following weights:
Weight 0: 0.5
Weight 1: -1
We take each input and multiply it by its weight.
Input 0 * Weight 0 ⇒ 12 * 0.5 = 6
Input 1 * Weight 1 ⇒ 4 * -1 = -4
Step 3: Sum inputs.
The weighted inputs are then summed.
Sum = 6 + -4 = 2
Step 4: Generate output.
The output of a perceptron is generated by passing that sum through an activation function. In the case of a simple binary output, the activation function is what tells the perceptron whether to “fire” or not. You can envision an LED connected to the output signal: if it fires, the light goes on; if not, it stays off.
Activation functions can get a little bit hairy. If you start reading one of those artificial intelligence textbooks looking for more info about activation functions, you may soon find yourself reaching for a calculus textbook. However, with our friend the simple perceptron, we’re going to do something really easy. Let’s make the activation function the sign of the sum. In other words, if the sum is a positive number, the output is 1; if it is negative, the output is -1.
Output = sign(sum) ⇒ sign(2) ⇒ +1
Let’s review and condense these steps so we can implement them with a code snippet.
The Perceptron Algorithm:
  1. For every input, multiply that input by its weight.
  2. Sum all of the weighted inputs.
  3. Compute the output of the perceptron based on that sum passed through an activation function (the sign of the sum).
Let’s assume we have two arrays of numbers, the inputs and the weights. For example:

float[] inputs  = {12 , 4};
float[] weights = {0.5,-1};
“For every input” implies a loop that multiplies each input by its corresponding weight. Since we need the sum, we can add up the results in that very loop.

// Steps 1 and 2: Add up all the weighted inputs.
float sum = 0;
for (int i = 0; i < inputs.length; i++) {
  sum += inputs[i]*weights[i];
}

Once we have the sum we can compute the output.

// Step 3: Pass the sum through an activation function.
float output = activate(sum);
 
// The activation function
int activate(float sum) {
  // Return a 1 if positive, -1 if negative.
  if (sum > 0) return 1;
  else return -1;
}

Simple Pattern Recognition Using a Perceptron

Now that we understand the computational process of a perceptron, we can look at an example of one in action. We stated that neural networks are often used for pattern recognition applications, such as facial recognition. Even simple perceptrons can demonstrate the basics of classification, as in the following example.
Figure 10.4
Consider a line in two-dimensional space. Points in that space can be classified as living on either one side of the line or the other. While this is a somewhat silly example (since there is clearly no need for a neural network; we can determine on which side a point lies with some simple algebra), it shows how a perceptron can be trained to recognize points on one side versus another.
Let’s say a perceptron has 2 inputs (the x- and y-coordinates of a point). Using a sign activation function, the output will either be -1 or 1—i.e., the input data is classified according to the sign of the output. In the above diagram, we can see how each point is either below the line (-1) or above (+1).
The perceptron itself can be diagrammed as follows:
Figure 10.5
We can see how there are two inputs (x and y), a weight for each input (weightx and weighty), as well as a processing neuron that generates the output.
There is a pretty significant problem here, however. Let’s consider the point (0,0). What if we send this point into the perceptron as its input: x = 0 and y = 0? What will the sum of its weighted inputs be? No matter what the weights are, the sum will always be 0! But this can’t be right—after all, the point (0,0) could certainly be above or below various lines in our two-dimensional world.
To avoid this dilemma, our perceptron will require a third input, typically referred to as a bias input. A bias input always has the value of 1 and is also weighted. Here is our perceptron with the addition of the bias:
Figure 10.6
Let’s go back to the point (0,0). Here are our inputs:
0 * weight for x = 0
0 * weight for y = 0
1 * weight for bias = weight for bias
The output is the sum of the above three values, 0 plus 0 plus the bias’s weight. Therefore, the bias, on its own, answers the question as to where (0,0) is in relation to the line. If the bias’s weight is positive, (0,0) is above the line; negative, it is below. It “biases” the perceptron’s understanding of the line’s position relative to (0,0).
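To see this in code, here is a minimal standalone snippet, with made-up weights chosen purely for illustration, that pushes the point (0,0) plus the bias input through the weighted sum. Whatever the other weights are, the sign of the result (and therefore the classification) comes entirely from the bias weight.

// Hypothetical weights, chosen only for this illustration
float[] weights = {0.5, -1, 0.3};
// The point (0,0) plus the bias input, which is always 1
float[] inputs = {0, 0, 1};

float sum = 0;
for (int i = 0; i < inputs.length; i++) {
  sum += inputs[i]*weights[i];
}
// sum is 0 + 0 + 0.3, so the output is the sign of the bias weight: +1
println((sum > 0) ? 1 : -1);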

Coding the Perceptron

We’re now ready to assemble the code for a Perceptron class. The only data the perceptron needs to track are the input weights, and we could use an array of floats to store these.

class Perceptron {
  float[] weights;
The constructor could receive an argument indicating the number of inputs (in this case three: x, y, and a bias) and size the array accordingly.

  Perceptron(int n) {
    weights = new float[n];
    for (int i = 0; i < weights.length; i++) {
      // The weights are picked randomly to start.
      weights[i] = random(-1,1);
    }
  }
A perceptron needs to be able to receive inputs and generate an output. We can package these requirements into a function called feedforward(). In this example, we’ll have the perceptron receive its inputs as an array (which should be the same length as the array of weights) and return the output as an integer.

  int feedforward(float[] inputs) {
    float sum = 0;
    for (int i = 0; i < weights.length; i++) {
      sum += inputs[i]*weights[i];
    }
    // Result is the sign of the sum, -1 or +1. Here the perceptron is making
    // a guess. Is it on one side of the line or the other?
    return activate(sum);
  }
Presumably, we could now create a Perceptron object and ask it to make a guess for any given point.
Figure 10.7

// Create the Perceptron.
Perceptron p = new Perceptron(3);
// The input is 3 values: x, y, and bias.
float[] point = {50, -12, 1};
// The answer!
int result = p.feedforward(point);
Did the perceptron get it right? At this point, the perceptron has no better than a 50/50 chance of arriving at the right answer. Remember, when we created it, we gave each weight a random value. A neural network isn’t magic. It’s not going to be able to guess anything correctly unless we teach it how to!
To train a neural network to answer correctly, we’re going to employ the method of supervised learning that we described in section 10.1.
With this method, the network is provided with inputs for which there is a known answer. This way the network can find out if it has made a correct guess. If it’s incorrect, the network can learn from its mistake and adjust its weights. The process is as follows:
  1. Provide the perceptron with inputs for which there is a known answer.
  2. Ask the perceptron to guess an answer.
  3. Compute the error. (Did it get the answer right or wrong?)
  4. Adjust all the weights according to the error.
  5. Return to Step 1 and repeat!
Steps 1 through 4 can be packaged into a function. Before we can write the entire function, however, we need to examine Steps 3 and 4 in more detail. How do we define the perceptron’s error? And how should we adjust the weights according to this error?
The perceptron’s error can be defined as the difference between the desired answer and its guess.
ERROR = DESIRED OUTPUT - GUESS OUTPUT
The above formula may look familiar to you. In Chapter 6, we computed a steering force as the difference between our desired velocity and our current velocity.
STEERING = DESIRED VELOCITY - CURRENT VELOCITY
This was also an error calculation. The current velocity acts as a guess and the error (the steering force) tells us how to adjust the velocity in the right direction. In a moment, we’ll see how adjusting the vehicle’s velocity to follow a target is just like adjusting the weights of a neural network to arrive at the right answer.
In the case of the perceptron, the output has only two possible values: +1 or -1. This means there are only three possible errors.
If the perceptron guesses the correct answer, then the guess equals the desired output and the error is 0. If the correct answer is -1 and we’ve guessed +1, then the error is -2. If the correct answer is +1 and we’ve guessed -1, then the error is +2.
Desired  Guess  Error
-1       -1      0
-1       +1     -2
+1       -1     +2
+1       +1      0
The error is the determining factor in how the perceptron’s weights should be adjusted. For any given weight, what we are looking to calculate is the change in weight, often called Δweight (or “delta” weight, delta being the Greek letter Δ).
NEW WEIGHT = WEIGHT + ΔWEIGHT
Δweight is calculated as the error multiplied by the input.
ΔWEIGHT = ERROR * INPUT
Therefore:
NEW WEIGHT = WEIGHT + ERROR * INPUT
To understand why this works, we can again return to steering. A steering force is essentially an error in velocity. If we apply that force as our acceleration (Δvelocity), then we adjust our velocity to move in the correct direction. This is what we want to do with our neural network’s weights. We want to adjust them in the right direction, as defined by the error.
With steering, however, we had an additional variable that controlled the vehicle’s ability to steer: the maximum force. With a high maximum force, the vehicle was able to accelerate and turn very quickly; with a lower force, the vehicle would take longer to adjust its velocity. The neural network will employ a similar strategy with a variable called the “learning constant.” We’ll add in the learning constant as follows:
NEW WEIGHT = WEIGHT + ERROR * INPUT * LEARNING CONSTANT
Notice that a high learning constant means the weight will change more drastically. This may help us arrive at a solution more quickly, but with such large changes in weight it’s possible we will overshoot the optimal weights. With a small learning constant, the weights will be adjusted slowly, requiring more training time but allowing the network to make very small adjustments that could improve the network’s overall accuracy.
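For a sense of scale, here is a quick worked example with made-up numbers: a weight of 0.5, an error of +2, an input of 50, and a learning constant of 0.01.

float weight = 0.5;   // hypothetical starting weight
float error = 2;      // desired (+1) minus guess (-1)
float input = 50;     // hypothetical input value
float c = 0.01;       // learning constant

// 0.5 + 0.01 * 2 * 50 = 1.5
weight += c * error * input;
println(weight);

With c = 1.0 instead, the same update would jump the weight from 0.5 to 100.5, which is exactly the sort of overshooting described above.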
Assuming the addition of a variable c for the learning constant, we can now write a training function for the perceptron following the above steps.

// A new variable is introduced to control the learning rate.
float c = 0.01;

// Step 1: Provide the inputs and known answer.
// These are passed in as arguments to train().
void train(float[] inputs, int desired) {

  // Step 2: Guess according to those inputs.
  int guess = feedforward(inputs);

  // Step 3: Compute the error (difference between answer and guess).
  float error = desired - guess;

  // Step 4: Adjust all the weights according to the error and learning constant.
  for (int i = 0; i < weights.length; i++) {
    weights[i] += c * error * inputs[i];
  }
}
We can now see the Perceptron class as a whole.

class Perceptron {
  // The Perceptron stores its weights and the learning constant.
  float[] weights;
  float c = 0.01;
 
  Perceptron(int n) {
    weights = new float[n];
    // Weights start off random.
    for (int i = 0; i < weights.length; i++) {
      weights[i] = random(-1,1);
    }
  }
 
  // Return an output based on inputs.
  int feedforward(float[] inputs) {
    float sum = 0;
    for (int i = 0; i < weights.length; i++) {
      sum += inputs[i]*weights[i];
    }
    return activate(sum);
  }
 
  // Output is a +1 or -1.
  int activate(float sum) {
    if (sum > 0) return 1;
    else return -1;
  }
 
  // Train the network against known data.
  void train(float[] inputs, int desired) {
    int guess = feedforward(inputs);
    float error = desired - guess;
    for (int i = 0; i < weights.length; i++) {
      weights[i] += c * error * inputs[i];
    }
  }
}
To train the perceptron, we need a set of inputs with a known answer. We could package this up in a class like so:

class Trainer {
 
A "Trainer" object stores the inputs and the correct answer.
  float[] inputs;
  int answer;
 
  Trainer(float x, float y, int a) {
    inputs = new float[3];
    inputs[0] = x;
    inputs[1] = y;
    // Note that the Trainer has the bias input built into its array.
    inputs[2] = 1;
    answer = a;
  }
}
Now the question becomes, how do we pick a point and know whether it is above or below a line? Let’s start with the formula for a line, where y is calculated as a function of x:
y = f(x)
In generic terms, a line can be described as:
y = ax + b
Here’s a specific example:
y = 2*x + 1
We can then write a Processing function with this in mind.

// A function to calculate y based on x along a line
float f(float x) {
  return 2*x+1;
}

So, if we make up a point:

float x = random(width);
float y = random(height);
How do we know if this point is above or below the line? The line function f(x) gives us the y value on the line for that x position. Let’s call that yline.

// The y position on the line
float yline = f(x);
If the y value we are examining is above the line, it will be less than yline.
Nature of Code Image
Figure 10.8

if (y < yline) {
  // The answer is -1 if y is above the line.
  answer = -1;
} else {
  answer = 1;
}
We can then make a Trainer object with the inputs and the correct answer.

Trainer t = new Trainer(x, y, answer);
Assuming we had a Perceptron object ptron, we could then train it by sending the inputs along with the known answer.

ptron.train(t.inputs,t.answer);
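To tie everything together, here is one possible way to structure a full training sketch. It assumes the Perceptron and Trainer classes above along with the f() function for the line; the window size, the number of trainers (300), and the drawing style are arbitrary choices made just for this sketch. The setup() builds an array of random training points with known answers, and draw() trains the perceptron on one point per frame while drawing every point according to the perceptron's current guess.

Perceptron ptron;
Trainer[] training = new Trainer[300];
// Which training point to use next
int count = 0;

void setup() {
  size(400, 400);
  ptron = new Perceptron(3);
  for (int i = 0; i < training.length; i++) {
    float x = random(width);
    float y = random(height);
    // Is the point above (-1) or below (+1) the line?
    int answer = (y < f(x)) ? -1 : 1;
    training[i] = new Trainer(x, y, answer);
  }
}

void draw() {
  background(255);
  // Train with one point per frame.
  ptron.train(training[count].inputs, training[count].answer);
  count = (count + 1) % training.length;

  // Draw every point, filled according to the perceptron's current guess.
  for (int i = 0; i < training.length; i++) {
    int guess = ptron.feedforward(training[i].inputs);
    if (guess > 0) noFill();
    else fill(0);
    ellipse(training[i].inputs[0], training[i].inputs[1], 8, 8);
  }
}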
