Neural Network Fundamentals
Part 4: The Perceptron - First Prediction
The Brain's Decision Committee - Chapter 4
Previously: In Parts 1-3, our committee member learned:
- How to read images as numbers (matrices)
- How to weigh evidence and apply personal thresholds (weights & bias)
- How to cast meaningful votes (activation functions)
Today's Mission: Our committee member is now fully equipped. It's time for their first real attempt at classifying lines! We'll build a complete Perceptron - the original neural network from 1958 - and watch it make predictions.
Spoiler: It won't go well at first. And that's exactly the point.
What You'll Learn in Part 4
By the end of this notebook, you will:
- Understand the Perceptron - The first working neural network (Rosenblatt, 1958)
- Generate a dataset - Create V/H line examples on-the-fly
- Implement the forward pass - Input → Weighted Sum → Activation → Output
- Build a Perceptron class - Clean, reusable code
- Make predictions - Watch the untrained network guess
- Understand why it fails - Random weights = random guesses
Prerequisites
Make sure you've completed:
- Part 0 & 1: Welcome & Matrices (neural_network_fundamentals.ipynb)
- Part 2: The First Committee Member (part_2_single_neuron.ipynb)
- Part 3: Activation Functions (part_3_activation_functions.ipynb)
Setup: Import Dependencies
Let's import our tools and recreate the building blocks from previous notebooks.
    # =============================================================================
    # PART 4: THE PERCEPTRON - SETUP
    # =============================================================================

    import numpy as np
    import matplotlib.pyplot as plt
    from IPython.display import display, clear_output

    # Try to import ipywidgets for interactive features
    try:
        import ipywidgets as widgets
        WIDGETS_AVAILABLE = True
    except ImportError:
        WIDGETS_AVAILABLE = False
        print("Note: ipywidgets not installed. Interactive features will be limited.")

    # Set up matplotlib style
    style_options = ['seaborn-v0_8-whitegrid', 'seaborn-whitegrid', 'ggplot', 'default']
    for style in style_options:
        try:
            plt.style.use(style)
            break
        except OSError:
            continue

    plt.rcParams['figure.figsize'] = [10, 6]
    plt.rcParams['font.size'] = 12
    np.random.seed(42)  # For reproducible random numbers

    # =============================================================================
    # RECREATE OUR CANONICAL LINE IMAGES (from Parts 1-3)
    # =============================================================================

    # Vertical line: bright pixels in the middle column
    vertical_line = np.array([
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0]
    ])

    # Horizontal line: bright pixels in the middle row
    horizontal_line = np.array([
        [0, 0, 0],
        [1, 1, 1],
        [0, 0, 0]
    ])

    # Flattened versions (9 pixels as a 1D array)
    vertical_flat = vertical_line.flatten()
    horizontal_flat = horizontal_line.flatten()

    print("Setup complete!")
    print("="*60)
    print("\nOur canonical images (as 3x3 matrices):")
    print(f"\nVertical Line:      Horizontal Line:")
    print(f"  {vertical_line[0]}           {horizontal_line[0]}")
    print(f"  {vertical_line[1]}           {horizontal_line[1]}")
    print(f"  {vertical_line[2]}           {horizontal_line[2]}")
    print(f"\nAs flattened vectors (9 pixels):")
    print(f"  Vertical:   {vertical_flat}")
    print(f"  Horizontal: {horizontal_flat}")
4.1 What is a Perceptron?
The Perceptron is the original neural network, invented by Frank Rosenblatt in 1958. It's the simplest possible neural network - just a single neuron!
Why Start with the Perceptron?
Before diving in, let's understand why the Perceptron matters:
| Question | Answer |
|---|---|
| What problem does it solve? | Binary classification (yes/no, cat/dog, vertical/horizontal) |
| Why is it fundamental? | ALL neural networks are built from Perceptron-like units |
| Why learn it first? | Simple enough to understand completely, complex enough to be useful |
The Key Insight: Once you understand ONE neuron, you understand the building block of ALL deep learning. Everything else is just more neurons connected together!
Historical Significance
The Perceptron was revolutionary. For the first time, a machine could learn to classify patterns without being explicitly programmed. Expectations ran high: a 1958 New York Times report on Rosenblatt's work said the Navy expected the machine would eventually "be able to walk, talk, see, write, reproduce itself and be conscious of its existence."
(Spoiler: We're still working on most of those.)
Why This Architecture?
The Perceptron's design is inspired by biological neurons:
| Biological Neuron | Perceptron Equivalent | Purpose |
|---|---|---|
| Dendrites (receive signals) | Inputs (x) | Receive information |
| Synapses (connection strength) | Weights (w) | Determine importance |
| Cell body (integrates) | Weighted sum (Σ) | Combine all inputs |
| Axon hillock (threshold) | Bias (b) | Decision threshold |
| Axon (fires/doesn't fire) | Activation (f) | Output a decision |
This isn't just an analogy - it's the actual inspiration! Rosenblatt was trying to model how real neurons make decisions.
The Architecture
A Perceptron is exactly what we built in Parts 2-3:
    INPUTS (x)         WEIGHTS (w)               SUM              ACTIVATION      OUTPUT

    ┌─────┐              ┌─────┐
    │ x₁  │──────────────│ w₁  │─────┐
    └─────┘              └─────┘     │
    ┌─────┐              ┌─────┐     │         ┌─────┐              ┌─────┐
    │ x₂  │──────────────│ w₂  │─────┼────────▶│  Σ  │──────────────│ f() │────────▶ ŷ
    └─────┘              └─────┘     │         │+bias│              └─────┘
    ┌─────┐              ┌─────┐     │         └─────┘
    │ x₃  │──────────────│ w₃  │─────┘
    └─────┘              └─────┘
The Complete Formula (Everything Together!)
$$\hat{y} = f\left(\sum_{i=1}^{n} w_i \cdot x_i + b\right) = f(\mathbf{w} \cdot \mathbf{x} + b)$$
Where:
- x = input vector (our flattened 9-pixel image)
- w = weight vector (9 weights, one per pixel)
- b = bias (the personal threshold)
- Σ = weighted sum (dot product + bias)
- f() = activation function (sigmoid for us)
- ŷ = predicted output (probability it's a vertical line)
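To make the formula concrete, here is a minimal sketch that evaluates it for a three-input neuron. The input, weight, and bias values are made up for illustration and are not taken from the notebook; the sketch only checks that the summation form and the dot-product form produce the same z.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration
x = np.array([1.0, 0.0, 1.0])    # inputs
w = np.array([0.5, -0.3, 0.2])   # one weight per input
b = -0.2                         # bias

# Summation form: sum_i (w_i * x_i) + b
z_sum = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Dot-product form: w . x + b
z_dot = np.dot(w, x) + b

y_hat = sigmoid(z_dot)
print(f"z (summation form)   = {z_sum:.4f}")
print(f"z (dot-product form) = {z_dot:.4f}")
print(f"y_hat = sigmoid(z)   = {y_hat:.4f}")
```

Both forms give z = 0.5, so the compact notation w · x + b is just shorthand for the explicit sum.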
Committee Analogy: The First Working Committee Member
"Our committee member is now fully trained in procedure. They can:
- Read the evidence (input)
- Weigh each piece by importance (weights)
- Apply their personal threshold (bias)
- Cast a meaningful vote (activation)
Now it's time for their first REAL case!"
4.2 Generating Our Dataset
To test our Perceptron, we need examples to classify. Instead of loading a dataset from a file, we'll generate one on-the-fly. This is a powerful technique!
First, What IS a Dataset?
A dataset is a collection of examples used to train or test a machine learning model. Each example has:
- Features (X): The input data (for us, 9 pixel values)
- Label (y): The correct answer (for us, 0 or 1)
This is called supervised learning because we "supervise" the model by giving it the right answers to learn from.
| Term | Meaning | Our Example |
|---|---|---|
| Sample | One example (input + label) | One 3x3 image + whether it's vertical |
| Feature | One piece of input data | One pixel value |
| Label | The correct answer | 0 (horizontal) or 1 (vertical) |
| Dataset | Collection of samples | 100 images with their labels |
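As a small illustration of these terms, the sketch below builds a two-sample dataset by hand from the canonical vertical and horizontal lines. The variable names are chosen for this example only; the point is the shape convention, with one row of X per sample and one entry of y per label.

```python
import numpy as np

# Two hand-written samples: one vertical line, one horizontal line
vertical = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])
horizontal = np.array([[0, 0, 0],
                       [1, 1, 1],
                       [0, 0, 0]])

# Features: each row of X is one flattened 3x3 image (9 pixel values)
X = np.array([vertical.flatten(), horizontal.flatten()])

# Labels: 1 = vertical, 0 = horizontal
y = np.array([1, 0])

print("X shape:", X.shape)   # (n_samples, n_features)
print("y shape:", y.shape)   # (n_samples,)
print("Sample 0 -> features:", X[0], "label:", y[0])
```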
Why Do We Need Datasets?
Machine learning models learn by example, not by rules:
| Traditional Programming | Machine Learning |
|---|---|
| Human writes rules | Human provides examples |
| "If middle column is bright, it's vertical" | Model sees 50 vertical + 50 horizontal lines |
| Rules are explicit | Model discovers patterns itself |
| Hard to handle edge cases | Learns from variety in data |
The magic: Instead of us figuring out the rules, the model discovers them from data!
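For contrast, here is a sketch of the traditional-programming column of the table: the rule "if the middle column is bright, it's vertical" written out by hand. The function name `rule_based_classifier` and the 0.5 brightness cutoff are assumptions made for this example.

```python
import numpy as np

def rule_based_classifier(image):
    """Hand-written rule: a fully bright middle column means 'vertical' (1)."""
    image = np.asarray(image).reshape(3, 3)
    middle_column_bright = np.all(image[:, 1] > 0.5)
    return 1 if middle_column_bright else 0

center_vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
left_vertical   = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
horizontal      = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

print(rule_based_classifier(center_vertical))  # 1
print(rule_based_classifier(left_vertical))    # 0 (the rule misses this vertical line)
print(rule_based_classifier(horizontal))       # 0
```

The left-column line is the kind of edge case the table mentions: the rule has to be rewritten by a human every time a new variation shows up.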
Our Classification Task
| Image Type | Label (y) | Meaning |
|---|---|---|
| Vertical Line | 1 | "This is a vertical line" |
| Horizontal Line | 0 | "This is a horizontal line" |
Dataset Requirements
For a proper machine learning experiment, we need:
- Multiple examples - Not just 2 images, but many variations
- Balanced classes - Equal numbers of vertical and horizontal
- Some variety - Lines in different positions
- Optional noise - To make the problem harder (later)
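The "balanced classes" requirement is easy to check once labels exist. A minimal sketch, using a made-up label array rather than the notebook's data:

```python
import numpy as np

# Hypothetical labels for illustration: 1 = vertical, 0 = horizontal
y = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# counts[0] = number of horizontal samples, counts[1] = number of vertical samples
counts = np.bincount(y, minlength=2)
is_balanced = counts[0] == counts[1]

print(f"Horizontal: {counts[0]}, Vertical: {counts[1]}, balanced: {is_balanced}")
```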
The Dataset Generator Function
We'll create a function that generates any number of V/H line examples:
    # =============================================================================
    # DATASET GENERATOR: Create V/H Line Examples On-The-Fly
    # =============================================================================

    def generate_line_dataset(n_samples=100, noise_level=0.0, seed=None):
        """
        Generate a dataset of vertical and horizontal line images.

        Parameters:
        -----------
        n_samples : int
            Total number of samples (will be split evenly between V and H)
        noise_level : float (0.0 to 0.5)
            Amount of random noise to add (0.0 = clean, 0.3 = noisy)
        seed : int or None
            Random seed for reproducibility

        Returns:
        --------
        X : numpy array of shape (n_samples, 9)
            Flattened 3x3 images
        y : numpy array of shape (n_samples,)
            Labels: 1 for vertical, 0 for horizontal
        """
        if seed is not None:
            np.random.seed(seed)

        X = []  # Will hold all images (as flattened arrays)
        y = []  # Will hold all labels

        # Generate n_samples/2 vertical lines and n_samples/2 horizontal lines
        for i in range(n_samples):
            if i < n_samples // 2:
                # ----- VERTICAL LINE (label = 1) -----
                # Pick a random column (0, 1, or 2) for variety
                col = np.random.randint(0, 3)

                # Create blank 3x3 image
                image = np.zeros((3, 3))

                # Fill the chosen column with 1s
                image[:, col] = 1

                # Add noise if requested
                if noise_level > 0:
                    image = image + np.random.randn(3, 3) * noise_level
                    image = np.clip(image, 0, 1)  # Keep values in [0, 1]

                X.append(image.flatten())
                y.append(1)  # Label: Vertical

            else:
                # ----- HORIZONTAL LINE (label = 0) -----
                # Pick a random row (0, 1, or 2) for variety
                row = np.random.randint(0, 3)

                # Create blank 3x3 image
                image = np.zeros((3, 3))

                # Fill the chosen row with 1s
                image[row, :] = 1

                # Add noise if requested
                if noise_level > 0:
                    image = image + np.random.randn(3, 3) * noise_level
                    image = np.clip(image, 0, 1)

                X.append(image.flatten())
                y.append(0)  # Label: Horizontal

        # Convert to numpy arrays
        X = np.array(X)
        y = np.array(y)

        # Shuffle the dataset (so V and H are mixed, not grouped)
        shuffle_idx = np.random.permutation(n_samples)
        X = X[shuffle_idx]
        y = y[shuffle_idx]

        return X, y

    print("Dataset generator function created!")
    print("="*60)
    # =============================================================================
    # GENERATE AND VISUALIZE OUR DATASET
    # =============================================================================

    # Generate 20 clean examples for visualization
    X_small, y_small = generate_line_dataset(n_samples=20, noise_level=0.0, seed=42)

    print("DATASET GENERATED!")
    print("="*60)
    print(f"\nDataset shape: X = {X_small.shape}, y = {y_small.shape}")
    print(f"  - {X_small.shape[0]} total samples")
    print(f"  - Each sample has {X_small.shape[1]} features (3x3 = 9 pixels)")
    print(f"\nLabel distribution:")
    print(f"  - Vertical lines (y=1): {np.sum(y_small == 1)} samples")
    print(f"  - Horizontal lines (y=0): {np.sum(y_small == 0)} samples")

    # Show first few samples
    print("\n" + "="*60)
    print("FIRST 6 SAMPLES:")
    print("="*60)

    for i in range(6):
        image = X_small[i].reshape(3, 3)
        label = y_small[i]
        label_name = "VERTICAL" if label == 1 else "HORIZONTAL"
        print(f"\nSample {i}: Label = {label} ({label_name})")
        print(f"  {image[0]}")
        print(f"  {image[1]}")
        print(f"  {image[2]}")
    # =============================================================================
    # VISUALIZE SAMPLE IMAGES FROM OUR DATASET
    # =============================================================================

    # Show a grid of 10 sample images
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))

    for i, ax in enumerate(axes.flat):
        image = X_small[i].reshape(3, 3)
        label = y_small[i]
        label_name = "VERTICAL" if label == 1 else "HORIZONTAL"

        ax.imshow(image, cmap='Blues', vmin=0, vmax=1)
        ax.set_title(f"{label_name}\n(y={label})", fontsize=10)
        ax.axis('off')

        # Add grid lines
        for j in range(4):
            ax.axhline(j - 0.5, color='gray', linewidth=0.5)
            ax.axvline(j - 0.5, color='gray', linewidth=0.5)

    plt.suptitle('Sample Images from Our Generated Dataset', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

    print("\nNotice: The lines can appear in different positions (left/center/right columns,")
    print("top/center/bottom rows). This variety makes our dataset more realistic!")
4.3 The Forward Pass: Step-by-Step
The forward pass is how a neural network makes a prediction. Information flows forward from input to output.
What is the Forward Pass?
The term "forward pass" comes from the direction information flows:
    INPUT  →  WEIGHTS × INPUT  →  ADD BIAS  →  ACTIVATION  →  OUTPUT
      x    →       w · x       →    + b     →     f(z)     →    ŷ
| Term | Meaning |
|---|---|
| Forward | Information flows left-to-right, input-to-output |
| Pass | One complete journey through the network |
| Inference | Another name for making predictions (vs. training) |
Why "Forward"? Later in Part 5, we'll see the backward pass where error flows in the opposite direction. Together, they form the complete learning process!
Forward Pass vs Training
It's important to understand when each happens:
| Forward Pass (Inference) | Training |
|---|---|
| Make a prediction | Learn from mistakes |
| Uses current weights | Updates the weights |
| Fast (one direction) | Slower (forward + backward) |
| Used after training | Used to create the model |
| "What do I think this is?" | "How can I do better?" |
Right now, we're just doing the forward pass - making predictions. Training comes in Part 5!
The Four Steps of a Forward Pass
| Step | Operation | Formula | Purpose |
|---|---|---|---|
| 1 | Receive Input | x | Get the flattened image (9 values) |
| 2 | Weighted Sum | z = w · x | Compute dot product with weights |
| 3 | Add Bias | z = z + b | Add the personal threshold |
| 4 | Apply Activation | ŷ = f(z) | Convert score to meaningful output |
Let's trace through this step-by-step with actual numbers.
Committee Analogy
"The forward pass is the committee member reading a case file:
- They receive the evidence (input)
- They multiply each piece by their priority (weights)
- They add their personal standard (bias)
- They cast their vote (activation)"
Let's see this in code with EVERY step shown:
    # =============================================================================
    # THE FORWARD PASS: Step-by-Step Walkthrough
    # =============================================================================

    # Define the sigmoid activation function (from Part 3)
    def sigmoid(z):
        """Sigmoid activation: squashes any value to range (0, 1)."""
        return 1 / (1 + np.exp(-z))

    # Let's use our canonical vertical line as the input
    x = vertical_flat.copy()

    # Create some random weights (as if the Perceptron is untrained)
    np.random.seed(123)           # For reproducibility
    w = np.random.randn(9) * 0.5  # 9 random weights
    b = np.random.randn() * 0.1   # 1 random bias

    print("="*70)
    print("FORWARD PASS: Step-by-Step with Real Numbers")
    print("="*70)

    # ----- STEP 1: Receive Input -----
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ STEP 1: Receive Input                                               │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nInput image (as 3x3 grid):")
    print(f"  {x.reshape(3,3)[0]}")
    print(f"  {x.reshape(3,3)[1]}")
    print(f"  {x.reshape(3,3)[2]}")
    print(f"\nFlattened input vector x:")
    print(f"  x = {x}")

    # ----- STEP 2: Weighted Sum (Dot Product) -----
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ STEP 2: Weighted Sum (Dot Product)                                  │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nWeights vector w:")
    print(f"  w = [{', '.join([f'{wi:.3f}' for wi in w])}]")

    # Show element-wise multiplication
    print(f"\nElement-wise products (x[i] × w[i]):")
    products = x * w
    print(f"  = [{', '.join([f'{p:.3f}' for p in products])}]")

    # Sum the products
    dot_product = np.sum(products)
    print(f"\nSum of products (the dot product):")
    print(f"  w · x = {dot_product:.4f}")

    # ----- STEP 3: Add Bias -----
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ STEP 3: Add Bias                                                    │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nBias value:")
    print(f"  b = {b:.4f}")
    print(f"\nPre-activation value z:")
    z = dot_product + b
    print(f"  z = (w · x) + b")
    print(f"  z = {dot_product:.4f} + {b:.4f}")
    print(f"  z = {z:.4f}")

    # ----- STEP 4: Apply Activation -----
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ STEP 4: Apply Activation (Sigmoid)                                  │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nApplying sigmoid to z = {z:.4f}:")
    y_hat = sigmoid(z)
    print(f"  ŷ = sigmoid(z) = 1 / (1 + e^(-z))")
    print(f"  ŷ = 1 / (1 + e^(-{z:.4f}))")
    print(f"  ŷ = {y_hat:.4f}")

    # ----- FINAL RESULT -----
    print("\n" + "="*70)
    print("FORWARD PASS COMPLETE!")
    print("="*70)
    print(f"\nFinal output: ŷ = {y_hat:.4f}")
    print(f"\nInterpretation: The Perceptron is {y_hat*100:.1f}% confident this is a VERTICAL line.")
    print(f"\nPrediction: {'VERTICAL (y=1)' if y_hat >= 0.5 else 'HORIZONTAL (y=0)'}")
    print(f"Actual label: VERTICAL (y=1)")
    print(f"{'✓ Correct!' if y_hat >= 0.5 else '✗ Wrong!'}")
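The walkthrough above handles one image at a time. As a sketch, the same four steps can also be applied to a whole batch of images with a single matrix-vector product. `forward_batch` is a hypothetical helper name introduced here, and the weights are random placeholders drawn with NumPy's `default_rng`, not the values used in the walkthrough.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any value to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_batch(X, w, b):
    """Forward pass for many samples; X has shape (n_samples, n_features)."""
    z = X @ w + b        # steps 2 and 3: one pre-activation value per row of X
    return sigmoid(z)    # step 4: one probability per sample

# Step 1: the inputs, stacked as rows
X = np.array([
    [0, 1, 0, 0, 1, 0, 0, 1, 0],   # canonical vertical line
    [0, 0, 0, 1, 1, 1, 0, 0, 0],   # canonical horizontal line
], dtype=float)

rng = np.random.default_rng(0)     # placeholder weights, untrained
w = rng.normal(size=9) * 0.5
b = 0.0

batch_out = forward_batch(X, w, b)
loop_out = np.array([sigmoid(np.dot(w, x) + b) for x in X])

print("Batched outputs:", batch_out)
print("Matches one-at-a-time results:", np.allclose(batch_out, loop_out))
```

The batched version gives the same numbers as the loop; it simply lets NumPy do the per-sample dot products in one call.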
4.4 Building the Perceptron Class
Now let's package everything into a clean, reusable Perceptron class. This is how real neural networks are implemented - as modular, reusable code.
Why Use a Class?
In programming, a class is a blueprint for creating objects. For neural networks, classes help us:
| Benefit | Explanation |
|---|---|
| Organization | Keep weights, bias, and methods together |
| Reusability | Create multiple Perceptrons easily |
| State | Remember weights between method calls |
| Readability | perceptron.predict(x) is clearer than raw math |
What Our Perceptron Needs
| Component | What It Does |
|---|---|
| `__init__()` | Initialize the weights (randomly) and the bias (to zero) |
| `forward()` | Compute the forward pass (returns probability) |
| `predict()` | Make a binary decision (0 or 1) |
Why Random Initialization?
Before training, we need some starting values for weights. Why random?
| Alternative | Problem |
|---|---|
| All zeros | All neurons would output the same thing! |
| All ones | Would overwhelm the activation function |
| Same value everywhere | All weights would update identically |
| Random small values | ✓ Breaks symmetry, allows diverse learning |
Key Insight: The SPECIFIC random values don't matter much - training will adjust them. But they must be:
- Small (here, drawn from a normal distribution with standard deviation 0.1) to avoid saturating the sigmoid
- Different from each other to allow diverse learning
Multiplying the random values by 0.1 keeps initial outputs near 0.5 (the middle of the sigmoid), where the curve is steepest and learning is fastest.
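The effect of the initialization scale can be checked numerically. The sketch below is an illustration, not part of the original notebook: it draws 1000 random weight vectors at two scales (0.1, and 5.0 as an arbitrary "too large" value) and measures how far the sigmoid output for the canonical vertical line lands from 0.5 on average.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any value to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)  # canonical vertical line

mean_distance = {}
for scale in (0.1, 5.0):
    outputs = []
    for _ in range(1000):
        w = rng.normal(size=9) * scale       # random weights at this scale
        outputs.append(sigmoid(np.dot(w, x)))  # bias is 0, as in the class
    # Average distance of the output from the sigmoid's midpoint
    mean_distance[scale] = float(np.mean(np.abs(np.array(outputs) - 0.5)))
    print(f"scale={scale}: average distance from 0.5 = {mean_distance[scale]:.3f}")
```

With the small scale the outputs stay close to 0.5; with the large scale most of them are pushed toward 0 or 1, into the flat ends of the sigmoid.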
The Core Math (Keep It Simple!)
All the math fits in just three lines:
Forward pass: z = np.dot(weights, x) + bias
Activation: output = 1 / (1 + np.exp(-z))
Prediction: 1 if output >= 0.5 else 0
Understanding the Threshold (0.5)
The sigmoid outputs a probability between 0 and 1. To make a decision, we need a threshold:
| Output | Decision Rule | Prediction |
|---|---|---|
| Below 0.5 | "Probably NOT vertical" | 0 (Horizontal) |
| 0.5 or above | "Probably IS vertical" | 1 (Vertical) |
Why 0.5? It's the natural midpoint - if the model is at least 50% confident it's vertical, we call it vertical.
Note: In some applications, you might use a different threshold (e.g., 0.7 for "high confidence only"). But 0.5 is the standard starting point.
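A short sketch of how the threshold turns probabilities into decisions. The probabilities are invented for this example, and `decide` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np

# Hypothetical sigmoid outputs for five images
probabilities = np.array([0.12, 0.49, 0.50, 0.65, 0.93])

def decide(probabilities, threshold=0.5):
    """Return 1 where the probability reaches the threshold, else 0."""
    return (probabilities >= threshold).astype(int)

print(decide(probabilities))        # [0 0 1 1 1]
print(decide(probabilities, 0.7))   # [0 0 0 0 1]
```

Raising the threshold to 0.7 flips the two middling cases (0.50 and 0.65) to class 0, which is the "high confidence only" behaviour described in the note.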
    # =============================================================================
    # THE PERCEPTRON CLASS: Clean, Reusable Implementation
    # =============================================================================

    class Perceptron:
        """
        A single-layer Perceptron for binary classification.

        This is the simplest possible neural network - just one neuron!

        Attributes:
            n_inputs (int): Number of input features (9 for our 3x3 images)
            weights (array): One weight per input feature
            bias (float): The threshold/offset term
        """

        def __init__(self, n_inputs):
            """
            Initialize the Perceptron with random weights and a zero bias.

            Parameters:
                n_inputs: Number of input features (pixels in our image)
            """
            # Random weights, small values centered around 0
            self.weights = np.random.randn(n_inputs) * 0.1

            # Bias starts at 0
            self.bias = 0.0

            # Store for reference
            self.n_inputs = n_inputs

            # Storage for debugging/visualization
            self.last_z = None       # Pre-activation value
            self.last_output = None  # Final output

        def forward(self, x):
            """
            Compute the forward pass - make a prediction.

            Parameters:
                x: Input array (can be 2D image or 1D flattened)

            Returns:
                float: Probability between 0 and 1
            """
            # STEP 1: Receive input and ensure it is a 1D array
            x = np.array(x).flatten()

            # STEPS 2 & 3: Weighted sum + bias
            # Formula: z = w · x + b
            self.last_z = np.dot(self.weights, x) + self.bias

            # STEP 4: Apply sigmoid activation
            # Formula: output = 1 / (1 + e^(-z))
            self.last_output = 1 / (1 + np.exp(-self.last_z))

            return self.last_output

        def predict(self, x):
            """
            Make a binary prediction (0 or 1).

            Parameters:
                x: Input array

            Returns:
                int: 0 (horizontal) or 1 (vertical)
            """
            probability = self.forward(x)
            return 1 if probability >= 0.5 else 0

        def __repr__(self):
            return f"Perceptron(inputs={self.n_inputs})"


    # Create our Perceptron!
    print("="*60)
    print("PERCEPTRON CLASS CREATED!")
    print("="*60)

    # Instantiate a Perceptron for 9 inputs (3x3 = 9 pixels)
    perceptron = Perceptron(n_inputs=9)

    print(f"\nOur Perceptron: {perceptron}")
    print(f"\nInitial weights (random, untrained):")
    print(f"  Shape: {perceptron.weights.shape}")
    print(f"  Values: [{', '.join([f'{w:.3f}' for w in perceptron.weights])}]")
    print(f"\nInitial bias: {perceptron.bias}")
    print("\nThe Perceptron is ready, but completely UNTRAINED!")
    print("Its weights are random - it doesn't know what a vertical line looks like.")
4.5 Initial Predictions: The Confused Perceptron
Now the moment of truth! Let's see how our untrained Perceptron performs.
What is Accuracy?
Accuracy is the simplest way to measure how well a model performs:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \times 100\%$$
For example:
- 80 correct out of 100 = 80% accuracy
- 50 correct out of 100 = 50% accuracy
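The accuracy formula translates almost directly into NumPy. In this sketch, `accuracy_percent` is a hypothetical helper name and the labels and predictions are made up for illustration.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # (y_true == y_pred) is True for each correct prediction;
    # the mean of those booleans is the fraction correct.
    return np.mean(y_true == y_pred) * 100

y_true = [1, 0, 1, 1, 0]   # hypothetical correct answers
y_pred = [1, 0, 0, 1, 1]   # hypothetical model predictions

print(f"Accuracy: {accuracy_percent(y_true, y_pred):.1f}%")   # 3 of 5 correct
```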
The Baseline: What's "Random Guessing"?
For any classification task, there's a baseline accuracy - what you'd get by guessing randomly:
| Task Type | Classes | Random Baseline |
|---|---|---|
| Binary (yes/no) | 2 | 50% |
| 3-way choice | 3 | 33% |
| 10-way choice | 10 | 10% |
Our task is binary (vertical vs horizontal), so random guessing gives 50%.
Why this matters: If your model gets 50% on binary classification, it's learned NOTHING. It's no better than flipping a coin!
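The 50% baseline can be checked with a quick simulation. This is a sketch under stated assumptions: the labels are a synthetic, perfectly balanced set of 10,000 values, and the "model" is a coin flip from NumPy's `default_rng`.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
y_true = np.repeat([0, 1], n // 2)        # balanced labels: half 0s, half 1s
guesses = rng.integers(0, 2, size=n)      # coin flips: each guess is 0 or 1

baseline = np.mean(guesses == y_true) * 100
print(f"Random-guess accuracy on balanced binary labels: {baseline:.1f}%")
```

The result lands near 50%, with a little sampling noise. On a small test set the noise is larger, which is why an untrained model can score a few points above or below 50% by luck.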
What We Expect
Since the weights are random, the Perceptron has no idea what it's doing. It's like asking someone who's never seen a line before to classify them.
Expected accuracy: Around 50% (random guessing for binary classification)
Committee Analogy
"Our committee member has been trained in procedure, but has never seen an actual case. They're about to make judgments based on completely arbitrary priorities. The results won't be pretty..."
    # =============================================================================
    # TESTING THE UNTRAINED PERCEPTRON ON OUR CANONICAL EXAMPLES
    # =============================================================================

    print("="*70)
    print("TESTING UNTRAINED PERCEPTRON")
    print("="*70)

    # Test on our canonical vertical line
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ Test 1: VERTICAL LINE                                               │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nImage (3x3):")
    print(f"  {vertical_line[0]}")
    print(f"  {vertical_line[1]}")
    print(f"  {vertical_line[2]}")

    prob_vertical = perceptron.forward(vertical_flat)
    pred_vertical = perceptron.predict(vertical_flat)
    actual_vertical = 1

    print(f"\nForward pass calculation:")
    print(f"  z = w · x + b = {perceptron.last_z:.4f}")
    print(f"  output = sigmoid(z) = {prob_vertical:.4f}")
    print(f"\nPrediction: {pred_vertical} ({'VERTICAL' if pred_vertical == 1 else 'HORIZONTAL'})")
    print(f"Actual: {actual_vertical} (VERTICAL)")
    print(f"Result: {'CORRECT!' if pred_vertical == actual_vertical else 'WRONG!'}")

    # Test on our canonical horizontal line
    print("\n┌─────────────────────────────────────────────────────────────────────┐")
    print("│ Test 2: HORIZONTAL LINE                                             │")
    print("└─────────────────────────────────────────────────────────────────────┘")
    print(f"\nImage (3x3):")
    print(f"  {horizontal_line[0]}")
    print(f"  {horizontal_line[1]}")
    print(f"  {horizontal_line[2]}")

    prob_horizontal = perceptron.forward(horizontal_flat)
    pred_horizontal = perceptron.predict(horizontal_flat)
    actual_horizontal = 0

    print(f"\nForward pass calculation:")
    print(f"  z = w · x + b = {perceptron.last_z:.4f}")
    print(f"  output = sigmoid(z) = {prob_horizontal:.4f}")
    print(f"\nPrediction: {pred_horizontal} ({'VERTICAL' if pred_horizontal == 1 else 'HORIZONTAL'})")
    print(f"Actual: {actual_horizontal} (HORIZONTAL)")
    print(f"Result: {'CORRECT!' if pred_horizontal == actual_horizontal else 'WRONG!'}")
    # =============================================================================
    # TESTING ON THE FULL DATASET: Calculate Accuracy
    # =============================================================================

    # Generate a larger dataset for proper testing
    X_test, y_test = generate_line_dataset(n_samples=100, noise_level=0.0, seed=99)

    print("="*70)
    print("FULL DATASET EVALUATION")
    print("="*70)
    print(f"\nDataset: {len(y_test)} samples ({sum(y_test)} vertical, {len(y_test) - sum(y_test)} horizontal)")

    # Make predictions on all samples
    predictions = []
    correct = 0

    for i in range(len(X_test)):
        pred = perceptron.predict(X_test[i])
        predictions.append(pred)
        if pred == y_test[i]:
            correct += 1

    accuracy = correct / len(y_test) * 100

    # Display results table (first 10 samples)
    print("\n" + "-"*70)
    print("FIRST 10 PREDICTIONS:")
    print("-"*70)
    print(f"{'Sample':<8} {'Actual':<12} {'Predicted':<12} {'Result':<10}")
    print("-"*70)

    for i in range(10):
        actual_name = "VERTICAL" if y_test[i] == 1 else "HORIZONTAL"
        pred_name = "VERTICAL" if predictions[i] == 1 else "HORIZONTAL"
        result = "Correct" if predictions[i] == y_test[i] else "WRONG"
        symbol = "+" if predictions[i] == y_test[i] else "X"
        print(f"  {i:<6} {actual_name:<12} {pred_name:<12} {symbol} {result}")

    # Summary
    print("\n" + "="*70)
    print("ACCURACY SUMMARY")
    print("="*70)
    print(f"\n  Total samples: {len(y_test)}")
    print(f"  Correct:       {correct}")
    print(f"  Wrong:         {len(y_test) - correct}")
    print(f"\n  ACCURACY: {accuracy:.1f}%")
    print(f"\n  Expected (random guessing): ~50%")
    print(f"  Difference from random: {abs(accuracy - 50):.1f}%")

    if accuracy > 55:
        print("\n  Hmm, slightly better than random - got lucky with the random weights!")
    elif accuracy < 45:
        print("\n  Worse than random! The weights are actually hurting performance.")
    else:
        print("\n  As expected: basically random guessing. The Perceptron is CONFUSED!")
4.6 Why It's Wrong: Understanding the Problem
Our Perceptron scored close to 50% accuracy, which is basically coin-flipping. Why?
Understanding What Weights Actually DO
The weights are the Perceptron's knowledge. Each weight answers the question:
"How important is this input for making the decision?"
| Weight Value | Meaning |
|---|---|
| Large positive (+1.0) | "This input STRONGLY suggests class 1" |
| Small positive (+0.1) | "This input slightly suggests class 1" |
| Near zero (0.0) | "This input doesn't matter" |
| Small negative (-0.1) | "This input slightly suggests class 0" |
| Large negative (-1.0) | "This input STRONGLY suggests class 0" |
What We WANT the Perceptron to Learn
For detecting our canonical vertical line (the one in the middle column), the ideal weights would encode this knowledge:
"Pixels in the middle column = IMPORTANT for vertical detection"
"Pixels outside the middle column = NOT important (or negative) for vertical detection"
In weight terms:
- Middle column pixels → HIGH positive weights (vertical lines have these lit up)
- Other pixels → LOW or NEGATIVE weights (don't indicate verticality)
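As a sketch of what such weights do, the example below scores three images with hand-picked values: +1.0 in the middle column, -0.5 elsewhere, and a bias of -1.5. These numbers are the ones used later in this notebook's "ideal weights" preview; they are chosen by hand for illustration, not learned.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any value to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights: positive in the middle column, negative elsewhere
weights = np.array([
    [-0.5, 1.0, -0.5],
    [-0.5, 1.0, -0.5],
    [-0.5, 1.0, -0.5],
]).flatten()
bias = -1.5

center_vertical = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0])
horizontal      = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])
left_vertical   = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0])

z_center = np.dot(weights, center_vertical) + bias   # 3.0 - 1.5 =  1.5
z_horiz  = np.dot(weights, horizontal) + bias        # 0.0 - 1.5 = -1.5
z_left   = np.dot(weights, left_vertical) + bias     # -1.5 - 1.5 = -3.0

for name, z in [("center vertical", z_center),
                ("horizontal", z_horiz),
                ("left vertical", z_left)]:
    print(f"{name:<16} z = {z:+.1f}  output = {sigmoid(z):.3f}")
```

The middle-column line scores well above 0.5 and the horizontal line well below it. The left-column vertical line also scores low, because these particular weights only reward the middle column, so they should not be expected to classify every line position in the generated dataset correctly.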
The Problem: Random Weights = No Knowledge
Our current weights are random - they encode NO knowledge about vertical lines:
- Some weights are positive when they should be negative
- Some weights are large when they should be small
- There's no pattern that matches "vertical line detection"
Feature Detection: What the Perceptron is Trying to Become
A feature detector is a model that responds strongly to specific patterns. Our goal:
| Input Pattern | Ideal Perceptron Response |
|---|---|
| Vertical line (any column) | High output (close to 1.0) |
| Horizontal line (any row) | Low output (close to 0.0) |
Right now: The Perceptron is NOT a feature detector - it's just random noise.
After training: It WILL become a vertical line feature detector!
The Problem: Random Weights = Random Decisions
Let's visualize what our random weights actually look like:
    # =============================================================================
    # VISUALIZING THE PROBLEM: Random Weights vs Ideal Weights
    # =============================================================================

    # What ideal weights for a vertical detector should look like
    ideal_weights = np.array([
        [-1, 2, -1],  # Top row: look for middle
        [-1, 2, -1],  # Middle row: look for middle
        [-1, 2, -1]   # Bottom row: look for middle
    ]).flatten() * 0.5

    # Our actual (random) weights
    actual_weights = perceptron.weights

    # Visualize
    fig, axes = plt.subplots(1, 3, figsize=(14, 4))

    # Plot 1: Random weights (what we have)
    ax1 = axes[0]
    weights_grid = actual_weights.reshape(3, 3)
    im1 = ax1.imshow(weights_grid, cmap='RdBu', vmin=-0.5, vmax=0.5)
    ax1.set_title('Our Random Weights\n(Untrained)', fontsize=12, fontweight='bold')
    for i in range(3):
        for j in range(3):
            ax1.text(j, i, f'{weights_grid[i,j]:.2f}', ha='center', va='center', fontsize=10)
    plt.colorbar(im1, ax=ax1, label='Weight value')

    # Plot 2: Ideal weights (what we need)
    ax2 = axes[1]
    ideal_grid = ideal_weights.reshape(3, 3)
    im2 = ax2.imshow(ideal_grid, cmap='RdBu', vmin=-0.5, vmax=0.5)
    ax2.set_title('Ideal Weights\n(What we need)', fontsize=12, fontweight='bold')
    for i in range(3):
        for j in range(3):
            ax2.text(j, i, f'{ideal_grid[i,j]:.2f}', ha='center', va='center', fontsize=10)
    plt.colorbar(im2, ax=ax2, label='Weight value')

    # Plot 3: A vertical line (what we're trying to detect)
    ax3 = axes[2]
    im3 = ax3.imshow(vertical_line, cmap='Blues', vmin=0, vmax=1)
    ax3.set_title('Vertical Line\n(What we detect)', fontsize=12, fontweight='bold')
    for i in range(3):
        for j in range(3):
            ax3.text(j, i, f'{vertical_line[i,j]}', ha='center', va='center', fontsize=10)
    plt.colorbar(im3, ax=ax3, label='Pixel value')

    plt.tight_layout()
    plt.show()

    # Show the key insight
    print("\nKEY INSIGHT: Why Random Weights Fail")
    print("="*60)
    print("""
    IDEAL weights for vertical detection should have:
      - HIGH values in the middle column (where vertical lines are)
      - LOW or NEGATIVE values elsewhere

    Our RANDOM weights have no pattern - they're just noise!

    The Perceptron doesn't KNOW what vertical lines look like yet.
    It needs to LEARN the right weights through TRAINING.
    """)
# =============================================================================
# WHAT IF WE HAD IDEAL WEIGHTS? (Sneak Preview)
# =============================================================================

print("="*70)
print("WHAT IF WE HAD THE RIGHT WEIGHTS? (A Preview)")
print("="*70)

# Create a new Perceptron and give it ideal weights
ideal_perceptron = Perceptron(n_inputs=9)
ideal_perceptron.weights = ideal_weights.copy()
ideal_perceptron.bias = -1.5  # A good threshold

print("\nIdeal weights (as 3x3 grid):")
print(f"  {ideal_perceptron.weights.reshape(3,3)[0]}")
print(f"  {ideal_perceptron.weights.reshape(3,3)[1]}")
print(f"  {ideal_perceptron.weights.reshape(3,3)[2]}")
print(f"\nBias: {ideal_perceptron.bias}")

# Test on the same dataset
correct_ideal = 0
for i in range(len(X_test)):
    if ideal_perceptron.predict(X_test[i]) == y_test[i]:
        correct_ideal += 1

accuracy_ideal = correct_ideal / len(y_test) * 100

print("\n" + "-"*70)
print("COMPARISON:")
print("-"*70)
print(f"\n  Random weights accuracy: {accuracy:.1f}%")
print(f"  Ideal weights accuracy:  {accuracy_ideal:.1f}%")
# The :+ format prints the sign itself, so a negative change is not shown as "+-"
print(f"\n  Improvement: {accuracy_ideal - accuracy:+.1f} percentage points")

print("\n" + "="*70)
print("THE BIG QUESTION:")
print("="*70)
print("""
How do we get from RANDOM weights to IDEAL weights?

We don't want to hand-design them (that defeats the purpose!).
We want the Perceptron to LEARN them automatically.

This is what TRAINING does - and it's the topic of Part 5!
""")
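To see why these hand-picked weights separate the two canonical images, it can help to compute the weighted sum by hand. The sketch below is self-contained: it redefines the images and weights instead of relying on notebook state, and the helper name `weighted_sum` is hypothetical, not part of the notebook's `Perceptron` class.

```python
import numpy as np

# Canonical 3x3 images from the setup cell, flattened to 9 pixels
vertical = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]).flatten()
horizontal = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]).flatten()

# Hand-designed weights and bias, same values as in the preview cell
ideal_weights = np.array([[-1, 2, -1]] * 3).flatten() * 0.5
bias = -1.5


def weighted_sum(x, weights, bias):
    """Hypothetical helper: z = w . x + b (the score before the sigmoid)."""
    return float(np.dot(weights, x) + bias)


z_vertical = weighted_sum(vertical, ideal_weights, bias)
z_horizontal = weighted_sum(horizontal, ideal_weights, bias)

print(f"vertical:   z = {z_vertical:+.1f}")
print(f"horizontal: z = {z_horizontal:+.1f}")
```

The vertical line overlaps all three positive weights (3 x 1.0 = 3.0), which clears the -1.5 bias. The horizontal line picks up -0.5 + 1.0 - 0.5 = 0, so the bias pushes its score negative. After the sigmoid, a positive score lands above 0.5 and a negative one below it.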
Part 4 Summary: What We've Learned
Key Concepts Mastered
| Concept | What It Is | Why It Matters |
|---|---|---|
| Perceptron | Single-neuron neural network | Simplest possible NN, building block for larger networks |
| Dataset Generation | Creating training examples | We can test our models without external data |
| Forward Pass | Input → Output computation | This is how predictions are made |
| Random Initialization | Starting with random weights | The beginning state before learning |
The Complete Perceptron Formula
$$\hat{y} = \sigma(\mathbf{w} \cdot \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$$
Or in code:
z = np.dot(weights, x) + bias
output = 1 / (1 + np.exp(-z))
prediction = 1 if output >= 0.5 else 0
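As a rough illustration of the three steps, the sketch below wraps them in a function and runs it on a made-up 3-input example. The names `sigmoid` and `forward`, along with the input, weights, and bias, are assumptions chosen for illustration and do not come from the notebook. It also checks a handy consequence of the formula: thresholding the output at 0.5 is equivalent to checking whether `z` is non-negative.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))


def forward(x, weights, bias):
    """Return (z, output, prediction) for a single input vector."""
    z = np.dot(weights, x) + bias            # weighted sum plus bias
    output = sigmoid(z)                      # activation
    prediction = 1 if output >= 0.5 else 0   # threshold at 0.5
    return z, output, prediction


# Made-up values, chosen only for illustration
x = np.array([1.0, 0.0, 1.0])
weights = np.array([0.4, -0.2, 0.1])
bias = -0.3

z, output, prediction = forward(x, weights, bias)
print(f"z = {z:.2f}, output = {output:.3f}, prediction = {prediction}")

# sigmoid(0) == 0.5 and sigmoid is strictly increasing, so
# "output >= 0.5" and "z >= 0" always agree.
for z_test in np.linspace(-3, 3, 13):
    assert (sigmoid(z_test) >= 0.5) == (z_test >= 0)
```

Because of this equivalence, the sign of the weighted sum alone decides the class; the sigmoid adds a graded confidence on top of that decision.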
Committee Analogy Progress
| Part | What Happened |
|---|---|
| Part 1 | Committee learned to read evidence (matrices) |
| Part 2 | First member learned to weigh evidence (weights/bias) |
| Part 3 | Member learned to cast meaningful votes (activation) |
| Part 4 | Member attempted their first case - and FAILED! |
| Part 5 | (Next) Member learns from their mistakes |
Key Insight
Random weights = Random guessing
An untrained Perceptron has no knowledge. Its weights are just noise. To become useful, it must learn the right weights by seeing examples and adjusting based on its mistakes.
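One way to check this claim empirically is to average over many random initializations. The sketch below makes several assumptions that are not taken from the notebook: weights are drawn from a small Gaussian, the bias starts at zero, and the test set is just the two canonical images (one vertical, one horizontal).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Balanced toy test set: one vertical line (label 1), one horizontal (label 0)
X = np.array([
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
])
y = np.array([1, 0])

n_trials = 2000
accuracies = np.empty(n_trials)
for t in range(n_trials):
    # Assumed initialization: small Gaussian weights, zero bias
    weights = rng.normal(0.0, 0.1, size=9)
    bias = 0.0
    output = 1 / (1 + np.exp(-(X @ weights + bias)))   # forward pass per row
    predictions = (output >= 0.5).astype(int)
    accuracies[t] = np.mean(predictions == y)

print(f"Mean accuracy over {n_trials} random initializations: "
      f"{accuracies.mean() * 100:.1f}%")
```

Any single untrained Perceptron scores 0%, 50%, or 100% on this two-image set. Only the average across initializations sits near 50%, which is what "random guessing" means here.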
Knowledge Check
# =============================================================================
# KNOWLEDGE CHECK - Part 4
# =============================================================================

print("KNOWLEDGE CHECK - Part 4: The Perceptron")
print("="*60)
print("\nAnswer these questions to test your understanding:\n")

questions = [
    {
        "q": "1. What are the steps of a forward pass (in order)?",
        "options": [
            "A) Activation -> Weighted Sum -> Output",
            "B) Weighted Sum -> Add Bias -> Activation -> Output",
            "C) Input -> Output -> Activation",
            "D) Bias -> Weights -> Sigmoid"
        ],
        "answer": "B",
        "explanation": "The forward pass is: (1) compute weighted sum of inputs, (2) add bias, (3) apply activation function, (4) get output."
    },
    {
        "q": "2. Why does an untrained Perceptron get ~50% accuracy?",
        "options": [
            "A) Because sigmoid always outputs 0.5",
            "B) Because the dataset is unbalanced",
            "C) Because random weights give random predictions",
            "D) Because the bias is always 0"
        ],
        "answer": "C",
        "explanation": "Random weights have no meaningful pattern, so the Perceptron essentially guesses randomly. For binary classification, random guessing gives ~50% accuracy."
    },
    {
        "q": "3. What does the forward pass output for binary classification?",
        "options": [
            "A) Always 0 or 1 exactly",
            "B) A probability between 0 and 1",
            "C) Any real number",
            "D) The raw weighted sum"
        ],
        "answer": "B",
        "explanation": "The sigmoid activation squashes the output to a probability between 0 and 1. We then threshold at 0.5 to get a binary prediction."
    },
    {
        "q": "4. For a vertical line detector, where should the weights be highest?",
        "options": [
            "A) In the corners",
            "B) In the middle column",
            "C) In the middle row",
            "D) Equally everywhere"
        ],
        "answer": "B",
        "explanation": "Vertical lines appear in columns. High weights in the middle column will give high scores when vertical pixels align with them."
    },
    {
        "q": "5. Who invented the Perceptron?",
        "options": [
            "A) Geoffrey Hinton",
            "B) Frank Rosenblatt",
            "C) Yann LeCun",
            "D) Alan Turing"
        ],
        "answer": "B",
        "explanation": "Frank Rosenblatt invented the Perceptron in 1958 at Cornell. It was the first neural network that could learn!"
    }
]

for q in questions:
    print(q["q"])
    for opt in q["options"]:
        print(f"  {opt}")
    print()

print("\n" + "="*60)
print("Scroll down for answers...")
print("="*60)
# =============================================================================
# ANSWERS - Knowledge Check Part 4
# =============================================================================

print("ANSWERS - Part 4 Knowledge Check")
print("="*60)

for i, q in enumerate(questions, 1):
    print(f"\n{i}. Answer: {q['answer']}")
    print(f"   Explanation: {q['explanation']}")

print("\n" + "="*60)
print("How did you do?")
print("  5/5: Perceptron Expert!")
print("  4/5: Great understanding!")
print("  3/5: Review the sections you missed")
print("  <3: Re-read Part 4 before continuing")
print("="*60)
What's Next?
You've completed Part 4! Our Perceptron is built but confused - it makes random guesses because its weights are random.
Coming Up in Part 5: Training - Learning from Mistakes
In Part 5, we'll cover:
- Loss Functions - Measuring "how wrong" a prediction is
- Gradient Descent - Finding better weights
- Backpropagation - How errors flow backward
- The Training Loop - Iteratively improving weights
- Watch It Learn - See accuracy improve from 50% to 90%+!
Continue to Part 5: part_5_training.ipynb
"The Perceptron is ready. The data is ready. Now it's time to LEARN."
The Brain's Decision Committee - Learning to See, One Step at a Time