Neural Network Fundamentals
Part 9: Full Implementation - Mastery
The Brain's Decision Committee - Chapter 9
The Complete Journey
We've come a long way! From understanding matrices to building neurons, from single perceptrons to multi-layer networks, from training basics to handling deep learning challenges - now it's time to bring everything together.
"The complete, trained committee works in harmony. All the lessons learned, all the challenges overcome, unified into one elegant solution."
What You'll Learn in Part 9
By the end of this notebook, you will have:
A Complete Neural Network Class - All concepts unified in clean, documented code
A Full Data Pipeline - Train/validation/test splits with proper handling
A Robust Training Pipeline - With validation monitoring and early stopping
Complete Evaluation - All metrics, confusion matrix, and saliency visualization
Interactive Dashboard - Experiment with hyperparameters in real-time
The Final V/H Classifier - Our mission accomplished!
Prerequisites
This is the culmination notebook - you should have completed:
Part 0-1: Matrices and fundamentals
Part 2: Single neurons
Part 3: Activation functions
Part 4: The Perceptron
Part 5: Training
Part 6: Evaluation
Part 7: Hidden layers
Part 8: Deep learning challenges
Concepts We're Unifying
| Part | Concept | How We'll Use It |
|---|---|---|
| 1 | Matrices, dot product | Data representation, weight operations |
| 2 | Neuron anatomy | Building blocks of our network |
| 3 | Activation functions | ReLU for hidden, sigmoid for output |
| 4 | Forward pass | Making predictions |
| 5 | Loss, gradients, backprop | Learning from mistakes |
| 6 | Metrics, saliency | Evaluating and understanding |
| 7 | Hidden layers | Multiple specialists |
| 8 | Overfitting prevention | Early stopping, proper sizing |
Setup: Import Dependencies
# =============================================================================
# PART 9: FULL IMPLEMENTATION - SETUP
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Try to import ipywidgets for interactive features
try:
    import ipywidgets as widgets
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("Note: ipywidgets not installed. Interactive features will be limited.")

# Set up matplotlib style (fall back through older style names)
style_options = ['seaborn-v0_8-whitegrid', 'seaborn-whitegrid', 'ggplot', 'default']
for style in style_options:
    try:
        plt.style.use(style)
        break
    except OSError:
        continue

plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12

print("=" * 70)
print("PART 9: FULL IMPLEMENTATION")
print("The Complete V/H Line Classifier")
print("=" * 70)
9.1 The Complete Neural Network Class
This is the unified implementation incorporating everything we've learned:
| Feature | Part Learned | Implementation |
|---|---|---|
| Activation functions | Part 3 | ReLU for hidden, Sigmoid for output |
| Forward propagation | Parts 4, 7 | Matrix operations through layers |
| Loss function | Part 5 | Binary Cross-Entropy |
| Backpropagation | Parts 5, 7 | Chain rule through all layers |
| Validation monitoring | Part 8 | Track train/val metrics |
| Early stopping | Part 8 | Stop when val loss stops improving for `patience` epochs |
Why This Architecture?
Input (9) → Hidden (8, ReLU) → Output (1, Sigmoid)
| Layer | Size | Activation | Why? |
|---|---|---|---|
| Input | 9 | None | One neuron per pixel (3×3 = 9) |
| Hidden | 8 | ReLU | Enough specialists without overfitting; ReLU prevents vanishing gradients |
| Output | 1 | Sigmoid | Binary classification needs probability in (0,1) |
Why Two Different Initializations?
We use different initialization strategies for different activations:
| Initialization | Formula | Used For | Why? |
|---|---|---|---|
| He | $w \sim N(0, \sqrt{2/n_{in}})$ | ReLU layers | ReLU "kills" half the neurons (negative z), so we need 2× variance |
| Xavier | $w \sim N(0, \sqrt{1/n_{in}})$ | Sigmoid/Tanh | These are symmetric around 0, so standard variance works |

Here the second argument of $N$ is the standard deviation, matching the `np.sqrt(...)` scaling in the code, so the variances are $2/n_{in}$ and $1/n_{in}$.
Using the wrong initialization can cause:
Too small: Signals shrink through layers (vanishing)
Too large: Signals explode through layers (exploding)
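As a rough illustration of these two failure modes, the sketch below pushes random inputs through a stack of ReLU layers and compares the activation scale under a too-small weight scale, He scaling, and a too-large scale. The layer width, depth, batch size, and the helper name `signal_std_after_layers` are assumptions chosen for the demo; they are not part of the notebook's class.

```python
import numpy as np

def signal_std_after_layers(scale_fn, n_units=256, n_layers=10, seed=0):
    """Return the std of activations after n_layers ReLU layers.

    scale_fn(n_in) gives the standard deviation used to draw each weight matrix.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((512, n_units))        # batch of random inputs
    for _ in range(n_layers):
        W = rng.standard_normal((n_units, n_units)) * scale_fn(n_units)
        a = np.maximum(0, a @ W.T)                 # linear step followed by ReLU
    return a.std()

too_small = signal_std_after_layers(lambda n: 0.01)              # vanishes
he = signal_std_after_layers(lambda n: np.sqrt(2.0 / n))         # stays near 1
too_large = signal_std_after_layers(lambda n: 1.0)               # explodes

print(f"too small: {too_small:.2e}, He: {he:.2f}, too large: {too_large:.2e}")
```

With He scaling the activations keep roughly the same magnitude from layer to layer, while the other two scales shrink or grow by a constant factor per layer.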
# =============================================================================
# THE COMPLETE NEURAL NETWORK CLASS
# =============================================================================

class NeuralNetwork:
    """
    Complete Neural Network implementation for binary classification.

    This class unifies all concepts from Parts 1-8:
    - Matrix operations (Part 1)
    - Neuron anatomy (Part 2)
    - Activation functions (Part 3)
    - Forward propagation (Part 4)
    - Training with backprop (Part 5)
    - Evaluation metrics (Part 6)
    - Hidden layers (Part 7)
    - Overfitting prevention (Part 8)

    Architecture: Input → Hidden (ReLU) → Output (Sigmoid)
    """

    # =========================================================================
    # ACTIVATION FUNCTIONS (Part 3)
    # =========================================================================

    @staticmethod
    def sigmoid(z):
        """Sigmoid: maps to (0, 1) - used for output layer (Part 3.3)"""
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    @staticmethod
    def sigmoid_derivative(z):
        """Derivative of sigmoid: σ(z) * (1 - σ(z)) (Part 3.3.1)"""
        s = NeuralNetwork.sigmoid(z)
        return s * (1 - s)

    @staticmethod
    def relu(z):
        """ReLU: max(0, z) - used for hidden layers (Part 3.5)"""
        return np.maximum(0, z)

    @staticmethod
    def relu_derivative(z):
        """Derivative of ReLU: 1 if z > 0, else 0 (Part 3.5)"""
        return (z > 0).astype(float)

    # =========================================================================
    # INITIALIZATION (Part 7 - Xavier/He initialization)
    # =========================================================================

    def __init__(self, n_inputs, n_hidden, n_outputs=1, seed=None):
        """
        Initialize the neural network.

        Parameters:
            n_inputs: Number of input features (9 for 3x3 images)
            n_hidden: Number of hidden neurons (the "specialists")
            n_outputs: Number of outputs (1 for binary classification)
            seed: Random seed for reproducibility
        """
        if seed is not None:
            np.random.seed(seed)

        self.n_inputs = n_inputs
        self.n_hidden = n_hidden
        self.n_outputs = n_outputs

        # He initialization for ReLU layers (Part 8 - proper initialization)
        self.W1 = np.random.randn(n_hidden, n_inputs) * np.sqrt(2.0 / n_inputs)
        self.b1 = np.zeros(n_hidden)

        # Xavier initialization for sigmoid output
        self.W2 = np.random.randn(n_outputs, n_hidden) * np.sqrt(1.0 / n_hidden)
        self.b2 = np.zeros(n_outputs)

        # Cache for forward pass (needed for backprop)
        self.cache = {}

        # Training history
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []

        # Best model weights (for early stopping)
        self.best_weights = None
        self.best_val_loss = float('inf')
        self.best_epoch = 0

    # =========================================================================
    # FORWARD PROPAGATION (Parts 4, 7)
    # =========================================================================

    def forward(self, X):
        """
        Forward pass: Input → Hidden (ReLU) → Output (Sigmoid)

        The "Committee Meeting" - each specialist examines the evidence,
        then the final decision maker combines their opinions.
        """
        # Ensure X is 2D
        X = np.atleast_2d(X)

        # Layer 1: Input → Hidden (with ReLU - Part 3.5)
        self.cache['X'] = X
        self.cache['Z1'] = np.dot(X, self.W1.T) + self.b1   # (batch, n_hidden)
        self.cache['A1'] = self.relu(self.cache['Z1'])      # ReLU activation

        # Layer 2: Hidden → Output (with Sigmoid - Part 3.3)
        self.cache['Z2'] = np.dot(self.cache['A1'], self.W2.T) + self.b2  # (batch, n_outputs)
        self.cache['A2'] = self.sigmoid(self.cache['Z2'])   # Sigmoid for probability

        return self.cache['A2']

    def predict(self, X):
        """Make binary predictions (0 or 1)."""
        probs = self.forward(X)
        return (probs >= 0.5).astype(int).flatten()

    # =========================================================================
    # LOSS FUNCTION (Part 5.3 - Binary Cross-Entropy)
    # =========================================================================

    def compute_loss(self, y_true, y_pred):
        """
        Binary Cross-Entropy loss (Part 5.3)

        Measures "surprise" - how unexpected the predictions are.
        """
        epsilon = 1e-15  # Prevent log(0)
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        y_true = y_true.reshape(-1, 1)

        loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return loss

    # =========================================================================
    # BACKPROPAGATION (Parts 5.6, 5.7, 7.4)
    # =========================================================================

    def backward(self, y_true, learning_rate):
        """
        Backpropagation: Compute gradients and update weights.

        The "Blame Assignment" - tracing errors back through the committee.
        """
        m = len(y_true)
        y_true = y_true.reshape(-1, 1)

        # Output layer gradients (Part 5.6)
        dZ2 = self.cache['A2'] - y_true                  # (batch, n_outputs)
        dW2 = np.dot(dZ2.T, self.cache['A1']) / m
        db2 = np.mean(dZ2, axis=0)

        # Hidden layer gradients (Part 7.4 - chain rule)
        dA1 = np.dot(dZ2, self.W2)
        dZ1 = dA1 * self.relu_derivative(self.cache['Z1'])
        dW1 = np.dot(dZ1.T, self.cache['X']) / m
        db1 = np.mean(dZ1, axis=0)

        # Update weights (Gradient Descent - Part 5.4)
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    # =========================================================================
    # EVALUATION (Part 6)
    # =========================================================================

    def evaluate(self, X, y):
        """Compute loss and accuracy on a dataset."""
        y_pred = self.forward(X)
        loss = self.compute_loss(y, y_pred)
        predictions = (y_pred >= 0.5).astype(int).flatten()
        accuracy = np.mean(predictions == y)
        return loss, accuracy

    def confusion_matrix(self, X, y):
        """Compute confusion matrix (Part 6.3)."""
        predictions = self.predict(X)
        TP = np.sum((predictions == 1) & (y == 1))
        TN = np.sum((predictions == 0) & (y == 0))
        FP = np.sum((predictions == 1) & (y == 0))
        FN = np.sum((predictions == 0) & (y == 1))
        return {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN}

    # =========================================================================
    # TRAINING WITH EARLY STOPPING (Parts 5.8, 8.2)
    # =========================================================================

    def train(self, X_train, y_train, X_val=None, y_val=None,
              learning_rate=0.1, epochs=100, early_stopping_patience=10,
              verbose=True):
        """
        Train the neural network with optional early stopping.

        Parameters:
            X_train, y_train: Training data
            X_val, y_val: Validation data (for early stopping)
            learning_rate: Step size for gradient descent (Part 5.5)
            epochs: Maximum training iterations
            early_stopping_patience: Stop if val loss doesn't improve (Part 8.2)
            verbose: Print progress
        """
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []

        patience_counter = 0

        for epoch in range(epochs):
            # Forward pass
            self.forward(X_train)

            # Backward pass (learning)
            self.backward(y_train, learning_rate)

            # Evaluate training
            train_loss, train_acc = self.evaluate(X_train, y_train)
            self.train_loss_history.append(train_loss)
            self.train_acc_history.append(train_acc)

            # Evaluate validation (if provided)
            if X_val is not None:
                val_loss, val_acc = self.evaluate(X_val, y_val)
                self.val_loss_history.append(val_loss)
                self.val_acc_history.append(val_acc)

                # Early stopping check (Part 8.2)
                if val_loss < self.best_val_loss:
                    self.best_val_loss = val_loss
                    self.best_epoch = epoch
                    self.best_weights = {
                        'W1': self.W1.copy(), 'b1': self.b1.copy(),
                        'W2': self.W2.copy(), 'b2': self.b2.copy()
                    }
                    patience_counter = 0
                else:
                    patience_counter += 1

                if patience_counter >= early_stopping_patience:
                    if verbose:
                        print(f"\n  Early stopping at epoch {epoch+1}!")
                        print(f"  Best epoch was {self.best_epoch+1} with val_loss={self.best_val_loss:.4f}")
                    self._restore_best_weights()
                    break

            # Progress output
            if verbose and (epoch + 1) % 20 == 0:
                msg = f"  Epoch {epoch+1:3d}: Train Loss={train_loss:.4f}, Train Acc={train_acc*100:.1f}%"
                if X_val is not None:
                    msg += f", Val Loss={val_loss:.4f}, Val Acc={val_acc*100:.1f}%"
                print(msg)

        if verbose:
            final_acc = self.train_acc_history[-1]
            print(f"\nTraining complete! Final train accuracy: {final_acc*100:.1f}%")
            if X_val is not None:
                print(f"Best validation loss: {self.best_val_loss:.4f} at epoch {self.best_epoch+1}")

        return self

    def _restore_best_weights(self):
        """Restore weights from best epoch."""
        if self.best_weights is not None:
            self.W1 = self.best_weights['W1']
            self.b1 = self.best_weights['b1']
            self.W2 = self.best_weights['W2']
            self.b2 = self.best_weights['b2']


print("NeuralNetwork class defined!")
print("This combines ALL concepts from Parts 1-8.")

Understanding Key Implementation Details
Why do we use a cache dictionary?
During backpropagation, we need values from the forward pass:
X - the input, needed to compute gradients for W1
Z1 - pre-activation of hidden layer, needed for ReLU derivative
A1 - hidden activations, needed to compute gradients for W2
Z2, A2 - output layer values for computing output gradients
Without caching, we'd have to recompute the forward pass during the backward pass (wasteful!).
Why save best_weights separately?
Early stopping works by:
Training for many epochs
Saving weights whenever validation loss improves
Restoring the best weights when early stopping triggers
If we only kept current weights, we'd lose the best model when we continue training past the optimal point.
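The bookkeeping behind this can be sketched on a plain list of validation losses, without any network. The function name `early_stopping_epoch` and the loss values are made up for illustration; the class's `train` method applies the same counter logic while also copying the weights.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) under patience-based early stopping.

    Epochs are 0-indexed. stop_epoch is the epoch at which training halts,
    or the last epoch if patience is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = 0
    patience_counter = 0
    stop_epoch = len(val_losses) - 1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss = loss
            best_epoch = epoch
            patience_counter = 0
        else:
            # No improvement: spend one unit of patience
            patience_counter += 1
            if patience_counter >= patience:
                stop_epoch = epoch
                break
    return best_epoch, stop_epoch

# Hypothetical validation losses: improving, then drifting upward
val_losses = [0.70, 0.55, 0.48, 0.45, 0.47, 0.46, 0.49, 0.52]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)
```

Training halts three epochs after the best one, which is why the weights from the best epoch have to be kept in a separate copy.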
Why use np.atleast_2d(X)?
This ensures our math works for both:
Single sample: shape (9,) → (1, 9)
Batch of samples: shape (batch, 9) → unchanged
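The rest of the code assumes a (batch, features) layout (transposes, per-column means, `[0, 0]` indexing), so promoting a single sample to shape (1, 9) lets one code path handle both cases gracefully.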
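A quick shape check makes this concrete. The arrays below are placeholders (all zeros and ones) with the same shapes the class uses; only the shapes matter here.

```python
import numpy as np

W1 = np.zeros((8, 9))        # (n_hidden, n_inputs), same layout as the class
single = np.ones(9)          # one flattened 3x3 image
batch = np.ones((5, 9))      # five flattened images

single_2d = np.atleast_2d(single)   # promoted to (1, 9)
batch_2d = np.atleast_2d(batch)     # already 2D, left unchanged

z_raw = single @ W1.T        # 1D input gives a 1D result
z_single = single_2d @ W1.T  # 2D input keeps the batch axis
z_batch = batch_2d @ W1.T

print(single_2d.shape, batch_2d.shape)
print(z_raw.shape, z_single.shape, z_batch.shape)
```

Keeping the batch axis means downstream code can always index results as `[sample, output]`, whether it was given one image or many.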
9.2 The Complete Data Pipeline
A proper data pipeline includes:
| Step | Purpose | Part Referenced |
|---|---|---|
| Data Generation | Create V/H line images | Part 4 |
| Train/Val/Test Split | Separate data for different purposes | Part 6, 8 |
| Shuffling | Prevent order-based patterns | Part 5 |
Why Three Splits?
| Split | Purpose | Used For |
|---|---|---|
| Training (60%) | Learn patterns | Backpropagation |
| Validation (20%) | Tune hyperparameters | Early stopping, model selection |
| Test (20%) | Final evaluation | Report true performance |
Key Rule: NEVER use test data during training or tuning!
Why These Specific Percentages?
60/20/20 is a common starting point, but it depends on your data:
| Dataset Size | Recommended Split | Reasoning |
|---|---|---|
| Small (<500) | 60/20/20 | Need enough validation/test for reliable estimates |
| Medium (500-10K) | 70/15/15 | Can afford more training data |
| Large (>10K) | 80/10/10 | Even 10% gives over a thousand test samples |
For our 300 samples:
180 training (60%) → Enough to learn V/H patterns
60 validation (20%) → Enough to detect overfitting
60 test (20%) → Enough for reliable accuracy estimate
Why Shuffle the Data?
Without shuffling, disaster can strike!
Imagine our data is generated in order:
Samples 1-150: All VERTICAL
Samples 151-300: All HORIZONTAL
If we split 60/20/20 without shuffling:
Training (1-180): 150 vertical, 30 horizontal (imbalanced!)
Validation (181-240): 0 vertical, 60 horizontal (all one class!)
Test (241-300): 0 vertical, 60 horizontal (all one class!)
The model would learn wrong patterns and evaluation would be meaningless!
Shuffling ensures each split has a representative mix of both classes.
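The counts above can be reproduced with a few lines of NumPy. This is a sketch on labels only; the helper name `class_counts_per_split` and the seed are assumptions for the example.

```python
import numpy as np

def class_counts_per_split(y, n_train, n_val):
    """Return [(n_vertical, n_horizontal), ...] for the train/val/test slices of y."""
    splits = (y[:n_train], y[n_train:n_train + n_val], y[n_train + n_val:])
    return [(int(s.sum()), int(len(s) - s.sum())) for s in splits]

# 150 vertical (label 1) followed by 150 horizontal (label 0), as in the text
y_ordered = np.array([1] * 150 + [0] * 150)
unshuffled = class_counts_per_split(y_ordered, n_train=180, n_val=60)

# Same labels, shuffled before splitting
rng = np.random.default_rng(42)
y_shuffled = y_ordered[rng.permutation(len(y_ordered))]
shuffled = class_counts_per_split(y_shuffled, n_train=180, n_val=60)

print("unshuffled:", unshuffled)
print("shuffled:  ", shuffled)
```

The unshuffled split leaves validation and test with a single class, while the shuffled split puts a mix of both classes in every slice.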
# =============================================================================
# THE COMPLETE DATA PIPELINE
# =============================================================================

def generate_line_dataset(n_samples=100, noise_level=0.0, seed=None):
    """
    Generate vertical (1) and horizontal (0) line images.

    This is the dataset we've been working with throughout the series.
    Our "mission" from Part 0: classify these images correctly!

    Parameters:
        n_samples: Total number of images to generate
        noise_level: Amount of random noise (0.0 = clean, 0.3 = noisy)
        seed: Random seed for reproducibility

    Returns:
        X: Array of flattened 3x3 images, shape (n_samples, 9)
        y: Labels (1=vertical, 0=horizontal), shape (n_samples,)
    """
    if seed is not None:
        np.random.seed(seed)

    X, y = [], []

    for i in range(n_samples):
        image = np.zeros((3, 3))

        if i < n_samples // 2:
            # Vertical line - can be in ANY column
            col = np.random.randint(0, 3)
            image[:, col] = 1
            label = 1
        else:
            # Horizontal line - can be in ANY row
            row = np.random.randint(0, 3)
            image[row, :] = 1
            label = 0

        # Add noise if specified
        if noise_level > 0:
            image = np.clip(image + np.random.randn(3, 3) * noise_level, 0, 1)

        X.append(image.flatten())  # Flatten to 1D (Part 2)
        y.append(label)

    X, y = np.array(X), np.array(y)

    # Shuffle (Part 5)
    shuffle_idx = np.random.permutation(n_samples)
    return X[shuffle_idx], y[shuffle_idx]


def create_train_val_test_split(n_total=300, noise_level=0.1, seed=42):
    """
    Create proper train/validation/test splits.

    Split ratios: 60% train, 20% validation, 20% test
    """
    np.random.seed(seed)

    # Generate all data
    X, y = generate_line_dataset(n_total, noise_level=noise_level, seed=seed)

    # Calculate split indices
    n_train = int(n_total * 0.6)
    n_val = int(n_total * 0.2)

    # Split
    X_train, y_train = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
    X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]

    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Create our datasets
print("=" * 70)
print("CREATING THE COMPLETE DATASET")
print("=" * 70)

(X_train, y_train), (X_val, y_val), (X_test, y_test) = create_train_val_test_split(
    n_total=300, noise_level=0.15, seed=42
)

print(f"\nDataset created with 15% noise:")
print(f"  Training:   {len(X_train)} samples ({sum(y_train)} vertical, {len(y_train)-sum(y_train)} horizontal)")
print(f"  Validation: {len(X_val)} samples ({sum(y_val)} vertical, {len(y_val)-sum(y_val)} horizontal)")
print(f"  Test:       {len(X_test)} samples ({sum(y_test)} vertical, {len(y_test)-sum(y_test)} horizontal)")
print(f"\nTotal: {len(X_train) + len(X_val) + len(X_test)} samples")
# =============================================================================
# VISUALIZE SAMPLE IMAGES FROM OUR DATASET
# =============================================================================

fig, axes = plt.subplots(2, 5, figsize=(12, 5))

# Show 5 vertical and 5 horizontal examples
v_indices = np.where(y_train == 1)[0][:5]
h_indices = np.where(y_train == 0)[0][:5]

for i, idx in enumerate(v_indices):
    ax = axes[0, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Blues', vmin=0, vmax=1)
    ax.set_title('VERTICAL', fontsize=10)
    ax.axis('off')

for i, idx in enumerate(h_indices):
    ax = axes[1, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Oranges', vmin=0, vmax=1)
    ax.set_title('HORIZONTAL', fontsize=10)
    ax.axis('off')

plt.suptitle('Our Mission: Classify These 3x3 Images\n(With 15% Noise)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
OUR MISSION (from Part 0):
════════════════════════════════════════════════════════════════════════

Build a neural network that can correctly classify these images as:
  • VERTICAL (1) - line goes up-down
  • HORIZONTAL (0) - line goes left-right

The challenge: Noise makes the patterns harder to detect!
The committee must learn to see through the noise.
""")
9.3 Training the Complete Network
Now we train our neural network using everything we've learned:
| Setting | Value | Why (Part Reference) |
|---|---|---|
| Hidden neurons | 8 | Enough for patterns, not too many (Part 8 - overfitting) |
| Learning rate | 0.5 | Fast but stable (Part 5) |
| Epochs | 200 | Enough to learn, with early stopping (Part 8) |
| Early stopping patience | 20 | Stop if no improvement for 20 epochs |
| Activation (hidden) | ReLU | Prevents vanishing gradients (Parts 3, 8) |
| Activation (output) | Sigmoid | Gives probability (Part 3) |
How We Chose These Values
Hidden neurons = 8:
Our data has 9 inputs and 2 classes. Rule of thumb:
Minimum: 2-4 (can represent basic patterns)
Our choice: 8 (room for multiple pattern detectors)
Maximum: ~20 for 180 training samples (avoid overfitting)
Why 8 works: We need neurons to detect "left column", "middle column", "right column" for vertical, plus "top row", "middle row", "bottom row" for horizontal. 6-8 neurons can capture these patterns.
Learning rate = 0.5:
| Learning Rate | Behavior |
|---|---|
| Too low (0.001) | Very slow, may not converge in 200 epochs |
| Good (0.1 - 1.0) | Learns quickly, stable |
| Too high (5.0) | Overshoots, unstable, may diverge |
For small networks with BCE loss, 0.5 is often a good starting point.
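The same three regimes show up on a far simpler problem. The sketch below runs plain gradient descent on f(w) = w², a stand-in chosen purely for illustration and not the network's BCE loss, so the specific thresholds differ from the ranges quoted for our network.

```python
def gradient_descent_on_quadratic(learning_rate, steps=20, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2*w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2 * w   # each step multiplies w by (1 - 2*lr)
    return abs(w)

too_low = gradient_descent_on_quadratic(0.001)   # barely moves from w0
good = gradient_descent_on_quadratic(0.3)        # shrinks quickly towards 0
too_high = gradient_descent_on_quadratic(1.5)    # overshoots further every step

print(f"too low: {too_low:.3f}, good: {good:.2e}, too high: {too_high:.2e}")
```

On this toy problem each step multiplies w by (1 - 2·lr), so anything above lr = 1 makes the iterate grow instead of shrink; real networks have no such clean threshold, which is why the learning rate is tuned empirically.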
Epochs = 200 with patience = 20:
200 is a maximum "budget" of training steps
Patience of 20 means: "Stop if validation doesn't improve for 20 epochs"
This combination lets us train long enough to converge, but stops early if we're overfitting
Understanding Parameter Count
Total parameters = (input × hidden) + hidden + (hidden × output) + output
= (9 × 8) + 8 + (8 × 1) + 1
= 72 + 8 + 8 + 1 = 89 parameters
Rule of thumb: You want at least 10× more training samples than parameters.
We have 180 training samples
We have 89 parameters
Ratio: 180/89 ≈ 2× (well short of the 10× guideline, which is why we rely on early stopping!)
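The arithmetic generalizes to any one-hidden-layer network. The helper name `count_parameters` below is an assumption for this example, not a method of the class.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases for a network with a single hidden layer."""
    hidden_layer = n_inputs * n_hidden + n_hidden     # W1 and b1
    output_layer = n_hidden * n_outputs + n_outputs   # W2 and b2
    return hidden_layer + output_layer

n_params = count_parameters(9, 8, 1)
samples_per_param = 180 / n_params

print(f"{n_params} parameters, {samples_per_param:.2f} training samples per parameter")
```

Trying `count_parameters(9, 20, 1)` shows how quickly the count grows with hidden size, which is one reason to keep the hidden layer small on a 180-sample training set.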
# =============================================================================
# TRAIN THE COMPLETE NETWORK
# =============================================================================

print("=" * 70)
print("TRAINING THE NEURAL NETWORK")
print("=" * 70)

# Create the network
model = NeuralNetwork(
    n_inputs=9,    # 3x3 image = 9 pixels
    n_hidden=8,    # 8 specialists in our committee
    n_outputs=1,   # Binary output (V or H)
    seed=42
)

print(f"\nNetwork Architecture:")
print(f"  Input layer:  {model.n_inputs} neurons (one per pixel)")
print(f"  Hidden layer: {model.n_hidden} neurons (ReLU activation)")
print(f"  Output layer: {model.n_outputs} neuron (Sigmoid activation)")
# Show the sum once, then its value (the original printed the value twice)
print(f"  Total parameters: 9*8 + 8 + 8*1 + 1 = {9*8 + 8 + 8*1 + 1}")

print("\n" + "-" * 70)
print("Training with early stopping...")
print("-" * 70)

# Train!
model.train(
    X_train, y_train,
    X_val, y_val,
    learning_rate=0.5,
    epochs=200,
    early_stopping_patience=20,
    verbose=True
)

print("\n" + "=" * 70)
print("TRAINING COMPLETE!")
print("=" * 70)

How Training Works: The Complete Flow
Here's what happens during each training epoch:
┌─────────────────────────────────────────────────────────────────────┐
│ ONE TRAINING EPOCH │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. FORWARD PASS (Make predictions) │
│ Input X → [W1×X + b1] → ReLU → [W2×H + b2] → Sigmoid → Output │
│ ↓ ↓ │
│ Cache Z1, A1 Cache Z2, A2 │
│ │
│ 2. COMPUTE LOSS │
│ BCE = -mean(y×log(ŷ) + (1-y)×log(1-ŷ)) │
│ │
│ 3. BACKWARD PASS (Compute gradients) │
│ ∂L/∂W2 ← output error × hidden activations (from cache) │
│ ∂L/∂W1 ← hidden error × input (chain rule through ReLU) │
│ │
│ 4. UPDATE WEIGHTS │
│ W1 ← W1 - lr × ∂L/∂W1 │
│ W2 ← W2 - lr × ∂L/∂W2 │
│ │
│ 5. EVALUATE │
│ Compute train loss/accuracy │
│ Compute val loss/accuracy │
│ │
│ 6. EARLY STOPPING CHECK │
│ If val_loss improved → save weights │
│ If no improvement for `patience` epochs → stop & restore best │
│ │
└─────────────────────────────────────────────────────────────────────┘
This process repeats until:
Maximum epochs reached, OR
Early stopping triggers (no validation improvement)
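One way to gain confidence in step 3 of the flow above is a numerical gradient check: nudge each weight, recompute the loss, and compare the finite-difference slope to the backprop formula. The sketch below re-implements the forward pass and the dW1/dW2 formulas as standalone functions on tiny random data. The names `forward_loss`, `analytic_grads`, and `numerical_grad` are assumptions for this example, not methods of the class.

```python
import numpy as np

def forward_loss(params, X, y):
    """Forward pass (ReLU hidden, sigmoid output); returns BCE loss and cache."""
    W1, b1, W2, b2 = params
    Z1 = X @ W1.T + b1
    A1 = np.maximum(0, Z1)
    Z2 = A1 @ W2.T + b2
    A2 = 1 / (1 + np.exp(-Z2))
    A2c = np.clip(A2, 1e-15, 1 - 1e-15)
    y = y.reshape(-1, 1)
    loss = -np.mean(y * np.log(A2c) + (1 - y) * np.log(1 - A2c))
    return loss, (Z1, A1, A2)

def analytic_grads(params, X, y):
    """Backprop gradients for W1 and W2 (output error, then chain rule through ReLU)."""
    W1, b1, W2, b2 = params
    _, (Z1, A1, A2) = forward_loss(params, X, y)
    m = len(y)
    dZ2 = A2 - y.reshape(-1, 1)          # output error
    dW2 = dZ2.T @ A1 / m                 # output error x hidden activations
    dZ1 = (dZ2 @ W2) * (Z1 > 0)          # hidden error through the ReLU mask
    dW1 = dZ1.T @ X / m                  # hidden error x input
    return dW1, dW2

def numerical_grad(params, X, y, index, eps=1e-6):
    """Central-difference gradient of the loss with respect to params[index]."""
    grad = np.zeros_like(params[index])
    for pos in np.ndindex(grad.shape):
        original = params[index][pos]
        params[index][pos] = original + eps
        loss_plus, _ = forward_loss(params, X, y)
        params[index][pos] = original - eps
        loss_minus, _ = forward_loss(params, X, y)
        params[index][pos] = original    # restore the weight
        grad[pos] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.random((6, 9))                   # six fake 3x3 images
y = np.array([1, 0, 1, 0, 1, 0])
params = [rng.standard_normal((8, 9)) * 0.5, np.zeros(8),
          rng.standard_normal((1, 8)) * 0.5, np.zeros(1)]

dW1, dW2 = analytic_grads(params, X, y)
max_err_W1 = np.max(np.abs(dW1 - numerical_grad(params, X, y, 0)))
max_err_W2 = np.max(np.abs(dW2 - numerical_grad(params, X, y, 2)))
print(f"max |analytic - numerical|: W1={max_err_W1:.2e}, W2={max_err_W2:.2e}")
```

If the two gradients disagree by more than rounding noise, the backward pass has a bug; a check like this is usually run once on a small batch rather than during training, because it needs two forward passes per weight.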
# =============================================================================
# VISUALIZE TRAINING PROGRESS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

epochs = range(1, len(model.train_loss_history) + 1)

# Plot 1: Loss curves
ax = axes[0]
ax.plot(epochs, model.train_loss_history, 'b-', label='Training Loss', linewidth=2)
ax.plot(epochs, model.val_loss_history, 'r-', label='Validation Loss', linewidth=2)
ax.axvline(x=model.best_epoch + 1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Loss (BCE)', fontsize=12)
ax.set_title('Training Progress: Loss', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 2: Accuracy curves
ax = axes[1]
ax.plot(epochs, [a * 100 for a in model.train_acc_history], 'b-',
        label='Training Accuracy', linewidth=2)
ax.plot(epochs, [a * 100 for a in model.val_acc_history], 'r-',
        label='Validation Accuracy', linewidth=2)
ax.axvline(x=model.best_epoch + 1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Accuracy (%)', fontsize=12)
ax.set_title('Training Progress: Accuracy', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(40, 105)

plt.tight_layout()
plt.show()

print("""
TRAINING INSIGHTS:
════════════════════════════════════════════════════════════════════════

• Training and validation curves should stay close (no overfitting!)
• Early stopping saved the best model before potential overfitting
• The committee learned the V/H pattern effectively
""")
9.4 Complete Evaluation
Now we evaluate our trained model on the test set - data it has NEVER seen during training or validation. This is the true measure of generalization.
Evaluation Metrics (Part 6)
| Metric | What It Measures |
|---|---|
| Accuracy | Overall correctness |
| Precision | Of predicted positives, how many are correct? |
| Recall | Of actual positives, how many did we find? |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | Detailed breakdown of TP, TN, FP, FN |
What Do "Good" Values Look Like?
| Metric | Poor | Okay | Good | Excellent |
|---|---|---|---|---|
| Accuracy | <60% | 60-75% | 75-90% | >90% |
| F1 Score | <0.5 | 0.5-0.7 | 0.7-0.9 | >0.9 |
For our V/H classifier:
With 15% noise, >85% accuracy is quite good
Balanced precision/recall indicates no systematic bias
Similar train/val/test accuracy indicates good generalization
Reading the Confusion Matrix for Insights
The confusion matrix tells us not just HOW MANY errors, but WHAT KIND:
| Scenario | Meaning | Possible Cause |
|---|---|---|
| High FP (false alarm) | Saying "vertical" too often | Model is too sensitive to vertical patterns |
| High FN (misses) | Missing vertical lines | Model isn't detecting vertical patterns well |
| Balanced errors | FP ≈ FN | Model is "confused" by noise, not biased |
Ideal: Most values on the diagonal (TN, TP), minimal off-diagonal (FP, FN).
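To see how the four counts turn into the metrics listed earlier, here is a small standalone sketch. The function name `classification_metrics` and the counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not results from this notebook.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)            # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a 60-sample test set
example_metrics = classification_metrics(tp=27, tn=26, fp=4, fn=3)
for name, value in example_metrics.items():
    print(f"{name:>9s}: {value:.3f}")
```

With FP = 4 and FN = 3 the errors are roughly balanced, so precision and recall land close to each other and F1 sits between them.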
# =============================================================================
# COMPLETE EVALUATION ON TEST SET
# =============================================================================

print("=" * 70)
print("FINAL EVALUATION ON TEST SET")
print("(Data the model has NEVER seen!)")
print("=" * 70)

# Get predictions
test_predictions = model.predict(X_test)

# Confusion matrix
cm = model.confusion_matrix(X_test, y_test)

# Calculate metrics (guarding against division by zero)
accuracy = (cm['TP'] + cm['TN']) / len(y_test)
precision = cm['TP'] / (cm['TP'] + cm['FP']) if (cm['TP'] + cm['FP']) > 0 else 0
recall = cm['TP'] / (cm['TP'] + cm['FN']) if (cm['TP'] + cm['FN']) > 0 else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

print(f"\n📊 PERFORMANCE METRICS:")
print("-" * 40)
print(f"  Accuracy:  {accuracy*100:.1f}%")
print(f"  Precision: {precision*100:.1f}%")
print(f"  Recall:    {recall*100:.1f}%")
print(f"  F1 Score:  {f1*100:.1f}%")

print(f"\n📋 CONFUSION MATRIX:")
print("-" * 40)
print(f"                  Predicted")
print(f"                HORIZ   VERT")
print(f"  Actual HORIZ    {cm['TN']:3d}    {cm['FP']:3d}")
print(f"  Actual VERT     {cm['FN']:3d}    {cm['TP']:3d}")

print(f"\n  True Negatives (TN):  {cm['TN']:3d} - Correctly identified horizontal")
print(f"  True Positives (TP):  {cm['TP']:3d} - Correctly identified vertical")
print(f"  False Positives (FP): {cm['FP']:3d} - Horizontal wrongly called vertical")
print(f"  False Negatives (FN): {cm['FN']:3d} - Vertical wrongly called horizontal")
# =============================================================================
# VISUALIZE EVALUATION RESULTS
# =============================================================================

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Confusion Matrix Heatmap
ax = axes[0]
cm_matrix = np.array([[cm['TN'], cm['FP']], [cm['FN'], cm['TP']]])
im = ax.imshow(cm_matrix, cmap='Blues')
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_yticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_xlabel('Predicted', fontsize=12)
ax.set_ylabel('Actual', fontsize=12)
ax.set_title('Confusion Matrix', fontsize=14, fontweight='bold')

# Add text annotations
for i in range(2):
    for j in range(2):
        text = ax.text(j, i, cm_matrix[i, j], ha='center', va='center',
                       fontsize=20, fontweight='bold',
                       color='white' if cm_matrix[i, j] > cm_matrix.max() / 2 else 'black')

# Plot 2: Metrics Bar Chart
ax = axes[1]
metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
values = [accuracy * 100, precision * 100, recall * 100, f1 * 100]
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c']
bars = ax.bar(metrics, values, color=colors)
ax.set_ylim(0, 105)
ax.set_ylabel('Percentage (%)', fontsize=12)
ax.set_title('Performance Metrics', fontsize=14, fontweight='bold')
for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
            f'{val:.1f}%', ha='center', fontsize=11, fontweight='bold')

# Plot 3: Sample Predictions
ax = axes[2]
ax.axis('off')

# Show some predictions
sample_text = "SAMPLE PREDICTIONS:\n" + "=" * 40 + "\n\n"
for i in range(min(6, len(X_test))):
    actual = "VERT" if y_test[i] == 1 else "HORIZ"
    predicted = "VERT" if test_predictions[i] == 1 else "HORIZ"
    prob = model.forward(X_test[i:i + 1])[0, 0]
    status = "✓" if actual == predicted else "✗"
    sample_text += f"  {status} Actual: {actual:5s} Predicted: {predicted:5s} (prob={prob:.2f})\n"

ax.text(0.05, 0.5, sample_text, fontsize=11, family='monospace',
        verticalalignment='center', transform=ax.transAxes,
        bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.9))

plt.tight_layout()
plt.show()
9.5 Saliency: What Did the Network Learn?
Let's peek inside the trained committee's brain - what features do the hidden neurons look for?
Each hidden neuron learned to detect specific patterns. By visualizing their weights (reshaped to 3x3), we can see what they're "looking for."
How to Read the Saliency Visualizations
Each 3×3 grid shows ONE hidden neuron's "template":
| Color | Weight | Meaning |
|---|---|---|
| Red | Positive | "I get excited when this pixel is bright" |
| Blue | Negative | "I get suppressed when this pixel is bright" |
| White/Gray | Near zero | "I don't care about this pixel" |
Patterns to Look For
Good learning: Hidden neurons specialize in different features:
| Pattern Type | What You'll See | What It Detects |
|---|---|---|
| Column detector | One column red, others blue | Vertical lines in that column |
| Row detector | One row red, others blue | Horizontal lines in that row |
| Edge detector | Mixed red/blue pattern | Edges or transitions |
| General detector | Mostly red or mostly blue | Overall brightness level |
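To make the "column detector" row concrete, here is a minimal sketch of a hand-made template responding to the two line types. The template values (2.0 and -1.5), the helper names, and the missing bias term are assumptions chosen for illustration; a trained neuron's weights will look messier.

```python
import numpy as np

# Hypothetical hand-made template: middle column positive (red),
# everything else negative (blue). Not taken from the trained model.
column_detector = np.array([
    [-1.5, 2.0, -1.5],
    [-1.5, 2.0, -1.5],
    [-1.5, 2.0, -1.5],
])

vertical_line = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

horizontal_line = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
])


def pre_activation(template, image):
    """Weighted sum of pixels: the neuron's input before ReLU (bias omitted)."""
    return float(np.sum(template * image))


def relu(z):
    """ReLU keeps positive inputs and zeroes out negative ones."""
    return max(0.0, z)


z_vert = pre_activation(column_detector, vertical_line)     # 2.0 + 2.0 + 2.0
z_horiz = pre_activation(column_detector, horizontal_line)  # -1.5 + 2.0 - 1.5

print(f"Vertical line:   z = {z_vert:+.1f}, ReLU output = {relu(z_vert):.1f}")
print(f"Horizontal line: z = {z_horiz:+.1f}, ReLU output = {relu(z_horiz):.1f}")
```

The vertical line lands entirely on positive weights, so the neuron fires; the horizontal line picks up two negative weights for every positive one, so ReLU silences it.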
Signs of good learning:
Different neurons have different patterns (diversity!)
Some neurons clearly detect vertical patterns
Some neurons clearly detect horizontal patterns
The W2 weights show which neurons "vote" for which class
Signs of poor learning:
All neurons look similar (no specialization)
Random-looking patterns (didn't converge)
All weights near zero (a possible sign of vanishing gradients)
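Two of the warning signs above can be checked numerically rather than by eye. The sketch below is one rough way to do it; the function name and both thresholds are assumptions, and it expects a weight matrix with one row per hidden neuron and at least two neurons. It does not try to detect "random-looking" patterns.

```python
import numpy as np


def diagnose_hidden_weights(W1, similarity_threshold=0.95, tiny_threshold=1e-3):
    """Heuristic checks on hidden-layer weights (thresholds are illustrative)."""
    # Normalize each neuron's weight vector, guarding against all-zero rows.
    norms = np.linalg.norm(W1, axis=1)
    safe_norms = np.where(norms > 0, norms, 1.0)
    unit_rows = W1 / safe_norms[:, None]

    # Cosine similarity between every pair of distinct neurons.
    cosine = unit_rows @ unit_rows.T
    upper = np.triu_indices(W1.shape[0], k=1)
    mean_similarity = float(np.mean(cosine[upper]))

    return {
        "mean_pairwise_cosine": mean_similarity,
        "neurons_look_alike": mean_similarity > similarity_threshold,
        "weights_near_zero": bool(np.max(np.abs(W1)) < tiny_threshold),
    }


# Three made-up weight matrices (4 neurons x 9 inputs) to exercise the checks.
pattern = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
look_alike = np.tile(pattern, (4, 1))   # every neuron has the same template
diverse = np.eye(4, 9)                  # every neuron watches a different pixel
tiny = np.full((4, 9), 1e-5)            # weights that are almost zero

report_look_alike = diagnose_hidden_weights(look_alike)
report_diverse = diagnose_hidden_weights(diverse)
report_tiny = diagnose_hidden_weights(tiny)

for name, report in [("look_alike", report_look_alike),
                     ("diverse", report_diverse),
                     ("tiny", report_tiny)]:
    print(name, report)
```

On the trained model this could be called as `diagnose_hidden_weights(model.W1)`, assuming `W1` keeps the (n_hidden, n_inputs) layout used in the saliency code.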
# =============================================================================
# SALIENCY: VISUALIZE WHAT THE NETWORK LEARNED
# =============================================================================

print("="*70)
print("INSIDE THE COMMITTEE'S BRAIN: What Each Specialist Looks For")
print("="*70)

# Get the input-to-hidden weights
W1 = model.W1  # Shape: (n_hidden, n_inputs) = (8, 9)

# Get the hidden-to-output weights (tells us how each specialist contributes to final decision)
W2 = model.W2.flatten()  # Shape: (8,)

# The 2x4 grid assumes n_hidden == 8
fig, axes = plt.subplots(2, 4, figsize=(14, 7))

for i in range(model.n_hidden):
    ax = axes[i // 4, i % 4]

    # Reshape this neuron's weights to 3x3
    weights = W1[i].reshape(3, 3)

    # Visualize (the color scale is symmetric around zero and set per neuron)
    im = ax.imshow(weights, cmap='RdBu_r',
                   vmin=-np.abs(weights).max(), vmax=np.abs(weights).max())

    # Title with contribution direction
    direction = "→VERT" if W2[i] > 0 else "→HORIZ"
    ax.set_title(f'Specialist {i+1} {direction}\n(W2={W2[i]:.2f})', fontsize=10)
    ax.axis('off')

    # Add a colorbar to the last panel of the top row
    # (its scale applies to that panel only, since each panel is scaled separately)
    if i == 3:
        plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

plt.suptitle('Hidden Neuron Weights: Red = positive, Blue = negative\n'
             '→VERT means this neuron votes for VERTICAL, →HORIZ for HORIZONTAL',
             fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
INTERPRETATION:
════════════════════════════════════════════════════════════════════════

Each 3x3 heatmap shows what ONE hidden neuron "looks for":
  • RED pixels: This neuron gets EXCITED when these pixels are bright
  • BLUE pixels: This neuron gets INHIBITED when these pixels are bright

The "→VERT" or "→HORIZ" shows how this specialist votes in the final decision:
  • →VERT specialists contribute to "vertical" prediction when activated
  • →HORIZ specialists contribute to "horizontal" prediction when activated

Look for patterns! Some specialists might look for:
  • Vertical column patterns (bright red in one column)
  • Horizontal row patterns (bright red in one row)
  • Edge detectors (mixed red/blue patterns)
""")
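The "votes" can also be made numeric: each specialist's contribution to the output is its W2 weight times its ReLU activation, and the sigmoid of the summed votes is the predicted probability of VERT. The sketch below uses a hypothetical two-specialist committee with hand-picked weights and no biases (all assumptions, not the trained model) so the arithmetic stays readable.

```python
import numpy as np


def line_template(axis, index, on=2.0, off=-1.5):
    """Hypothetical 3x3 template: one row (axis=0) or column (axis=1) is positive."""
    template = np.full((3, 3), off)
    if axis == 0:
        template[index, :] = on
    else:
        template[:, index] = on
    return template.reshape(-1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def specialist_votes(x, W1, W2):
    """Per-neuron contribution to the output logit: W2[i] * ReLU(W1[i] . x)."""
    hidden = np.maximum(0.0, W1 @ x)
    return W2 * hidden


# Specialist 1 looks for the middle column, specialist 2 for the middle row.
W1 = np.stack([line_template(axis=1, index=1), line_template(axis=0, index=1)])
W2 = np.array([1.0, -1.0])  # positive votes for VERT, negative votes for HORIZ

x_vert = np.zeros((3, 3))
x_vert[:, 1] = 1.0
x_horiz = np.zeros((3, 3))
x_horiz[1, :] = 1.0

votes_vert = specialist_votes(x_vert.reshape(-1), W1, W2)
votes_horiz = specialist_votes(x_horiz.reshape(-1), W1, W2)
prob_vert = float(sigmoid(votes_vert.sum()))
prob_horiz = float(sigmoid(votes_horiz.sum()))

print("Votes for a vertical line:  ", votes_vert, f"-> P(VERT) = {prob_vert:.3f}")
print("Votes for a horizontal line:", votes_horiz, f"-> P(VERT) = {prob_horiz:.3f}")
```

A specialist whose ReLU output is zero casts no vote at all, which is why the sign of W2 alone only tells you the direction a neuron pushes when it is active.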
9.6 Interactive Dashboard: Experiment Yourself!
Try different hyperparameters and see how they affect performance.
| Hyperparameter | What It Controls | Trade-off |
|---|---|---|
| Hidden neurons | Model complexity | More = can learn more, but risk overfitting |
| Learning rate | Step size | Higher = faster but less stable |
| Noise level | Data difficulty | Higher = harder to learn |
Experiments to Try
Experiment 1: Varying Model Complexity
| Hidden Neurons | Expected Result |
|---|---|
| 2 | May underfit - not enough capacity |
| 8 | Good balance - our default |
| 32 | May overfit - watch train/val gap |
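One way to see why hidden size drives model complexity is to count trainable parameters. The sketch below assumes a 9 → h → 1 network with one bias per neuron; the helper name is hypothetical.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases for a network with a single hidden layer."""
    hidden_layer = n_inputs * n_hidden + n_hidden    # W1 and b1
    output_layer = n_hidden * n_outputs + n_outputs  # W2 and b2
    return hidden_layer + output_layer


parameter_counts = {h: count_parameters(9, h, 1) for h in (2, 8, 32)}

for n_hidden, total in parameter_counts.items():
    print(f"{n_hidden:2d} hidden neurons -> {total} trainable parameters")
```

With only a few hundred training images, the 32-neuron model has more adjustable numbers than the 2-neuron model by more than an order of magnitude, which is what gives it room to memorize noise.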
Experiment 2: Varying Learning Rate
| Learning Rate | Expected Result |
|---|---|
| 0.01 | Very slow convergence |
| 0.5 | Fast, stable (our default) |
| 2.0 | May oscillate or diverge |
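The three regimes in the table can be reproduced on a toy problem. The sketch below runs plain gradient descent on a one-dimensional bowl, f(w) = 0.5 * c * w², where the curvature c = 1.2 is an arbitrary assumption. It is not the V/H network's loss surface, so the exact thresholds will differ there, but the qualitative behavior is the same.

```python
def run_gradient_descent(learning_rate, curvature=1.2, start=1.0, steps=20):
    """Minimize f(w) = 0.5 * curvature * w**2 and return every visited w."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = curvature * w          # derivative of f at the current w
        w = w - learning_rate * gradient  # standard gradient descent update
        path.append(w)
    return path


final_w = {lr: run_gradient_descent(lr)[-1] for lr in (0.01, 0.5, 2.0)}

for lr, w in final_w.items():
    print(f"lr={lr:<4} -> w after 20 steps = {w:.3g} (the minimum is at w = 0)")
```

Each step multiplies w by (1 - lr * c): close to 1 for a tiny learning rate (slow), well inside (-1, 1) for a moderate one (fast), and below -1 for a large one, so w flips sign and grows on every step.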
Experiment 3: Varying Noise Level
| Noise Level | Expected Result |
|---|---|
| 0.0 | Near-perfect accuracy (too easy!) |
| 0.15 | Challenging but learnable |
| 0.4 | Very difficult, accuracy drops |
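To get a feel for what these noise levels do to a 3×3 image, here is a small sketch. It assumes noise means zero-mean Gaussian noise with standard deviation `noise_level`, clipped back to the [0, 1] pixel range; the data generator used in this notebook may apply noise differently, and `add_pixel_noise` is a hypothetical helper.

```python
import numpy as np


def add_pixel_noise(images, noise_level, rng):
    """ASSUMPTION: Gaussian pixel noise, then clipping to the valid [0, 1] range."""
    noisy = images + rng.normal(0.0, noise_level, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


clean = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
batch = np.repeat(clean[None, :, :], 1000, axis=0)  # 1000 copies of one image
rng = np.random.default_rng(0)

mean_distortion = {}
for level in (0.0, 0.15, 0.4):
    noisy = add_pixel_noise(batch, level, rng)
    # Average absolute change per pixel caused by the noise.
    mean_distortion[level] = float(np.mean(np.abs(noisy - batch)))
    print(f"noise={level:<4} -> mean pixel change = {mean_distortion[level]:.3f}")
```

At the highest level, a noticeable share of background pixels end up brighter than some line pixels, which is why accuracy is expected to drop.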
What to Watch For
Healthy training: Train and val curves decrease together, then flatten
Overfitting: Train keeps improving, val gets worse (gap grows)
Underfitting: Both curves stay high and flat
Instability: Curves jump around wildly (reduce learning rate)
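The four situations above can be turned into a rough automatic check on the loss histories. This is a heuristic sketch, not a standard diagnostic: the function name and all three thresholds are assumptions. The `high_loss=0.6` default is chosen because a binary classifier that always predicts 0.5 has a cross-entropy loss of ln 2 ≈ 0.693.

```python
def diagnose_curves(train_loss, val_loss, gap_tol=0.1, high_loss=0.6, up_fraction=0.3):
    """Label a pair of loss curves using rough, illustrative heuristics."""
    # Instability: training loss goes UP in a large share of epochs.
    steps = list(zip(train_loss, train_loss[1:]))
    ups = sum(1 for before, after in steps if after > before)
    if steps and ups / len(steps) > up_fraction:
        return "instability"

    # Overfitting: validation loss is well above its best value,
    # while training loss kept improving after that best epoch.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    val_got_worse = val_loss[-1] - val_loss[best_epoch] > gap_tol
    train_kept_improving = train_loss[-1] < train_loss[best_epoch]
    if val_got_worse and train_kept_improving:
        return "overfitting"

    # Underfitting: both curves end close to chance-level loss.
    if train_loss[-1] > high_loss and val_loss[-1] > high_loss:
        return "underfitting"
    return "healthy"


# Made-up loss histories, one per situation.
healthy = diagnose_curves([0.70, 0.50, 0.35, 0.25, 0.20, 0.18, 0.17],
                          [0.72, 0.55, 0.40, 0.30, 0.26, 0.25, 0.25])
overfit = diagnose_curves([0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.02],
                          [0.72, 0.55, 0.40, 0.38, 0.45, 0.55, 0.65])
underfit = diagnose_curves([0.70, 0.69, 0.69, 0.68, 0.68, 0.68, 0.68],
                           [0.70, 0.70, 0.69, 0.69, 0.69, 0.69, 0.69])
unstable = diagnose_curves([0.70, 0.90, 0.60, 1.10, 0.50, 1.00, 0.65],
                           [0.75, 0.95, 0.70, 1.20, 0.60, 1.10, 0.70])

print(healthy, overfit, underfit, unstable)
```

With the dashboard's model, the inputs would be `exp_model.train_loss_history` and `exp_model.val_loss_history`. Plotting the curves remains the more reliable check; the labels here are only a first hint.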
# =============================================================================
# INTERACTIVE DASHBOARD: EXPERIMENT WITH HYPERPARAMETERS
# =============================================================================

def run_experiment(n_hidden=8, learning_rate=0.5, noise_level=0.15, n_samples=300, seed=42):
    """Run a complete experiment with given hyperparameters."""
    print("="*70)
    print(f"EXPERIMENT: hidden={n_hidden}, lr={learning_rate}, noise={noise_level}")
    print("="*70)

    # Create data
    (X_tr, y_tr), (X_v, y_v), (X_te, y_te) = create_train_val_test_split(
        n_total=n_samples, noise_level=noise_level, seed=seed
    )

    # Create and train model
    exp_model = NeuralNetwork(n_inputs=9, n_hidden=n_hidden, n_outputs=1, seed=seed)
    exp_model.train(X_tr, y_tr, X_v, y_v,
                    learning_rate=learning_rate, epochs=200,
                    early_stopping_patience=20, verbose=False)

    # Evaluate
    test_loss, test_acc = exp_model.evaluate(X_te, y_te)

    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    epochs = range(1, len(exp_model.train_loss_history) + 1)

    ax = axes[0]
    ax.plot(epochs, exp_model.train_loss_history, 'b-', label='Train')
    ax.plot(epochs, exp_model.val_loss_history, 'r-', label='Val')
    ax.axvline(exp_model.best_epoch+1, color='g', linestyle='--',
               label=f'Best: {exp_model.best_epoch+1}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title(f'Training Progress\nFinal Test Acc: {test_acc*100:.1f}%', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)

    ax = axes[1]
    ax.plot(epochs, [a*100 for a in exp_model.train_acc_history], 'b-', label='Train')
    ax.plot(epochs, [a*100 for a in exp_model.val_acc_history], 'r-', label='Val')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy (%)')
    ax.set_title(f'Accuracy Progress\nStopped at epoch {len(epochs)}', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(40, 105)

    plt.tight_layout()
    plt.show()

    return test_acc


# Interactive widgets (if available)
if WIDGETS_AVAILABLE:
    print("Interactive dashboard available! Adjust sliders and click 'Run Experiment'.\n")

    hidden_slider = widgets.IntSlider(value=8, min=2, max=32, step=2, description='Hidden:')
    lr_slider = widgets.FloatSlider(value=0.5, min=0.01, max=2.0, step=0.1, description='Learn Rate:')
    noise_slider = widgets.FloatSlider(value=0.15, min=0.0, max=0.5, step=0.05, description='Noise:')

    def on_button_click(b):
        clear_output(wait=True)
        display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
        run_experiment(hidden_slider.value, lr_slider.value, noise_slider.value)

    run_button = widgets.Button(description='Run Experiment')
    run_button.on_click(on_button_click)

    display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
else:
    print("Widgets not available. Running preset experiments instead.\n")

    # Run a few preset experiments
    print("\n" + "="*70)
    print("PRESET EXPERIMENTS")
    print("="*70)

    experiments = [
        {"n_hidden": 4, "learning_rate": 0.5, "noise_level": 0.1, "desc": "Simple model, low noise"},
        {"n_hidden": 16, "learning_rate": 0.5, "noise_level": 0.3, "desc": "Complex model, high noise"},
    ]

    for exp in experiments:
        print(f"\n>>> {exp['desc']}")
        run_experiment(exp['n_hidden'], exp['learning_rate'], exp['noise_level'])
Part 9 Summary: The Complete Journey
Mission Accomplished!
We set out in Part 0 to build a neural network that could classify vertical and horizontal lines. Now we have:
| Component | Implementation | Part Referenced |
|---|---|---|
| Data representation | 3x3 images → 9-element vectors | Part 1 (Matrices) |
| Network architecture | 9 → 8 (ReLU) → 1 (Sigmoid) | Parts 2, 3, 7 |
| Forward propagation | Matrix operations + activations | Parts 4, 7 |
| Loss function | Binary Cross-Entropy | Part 5 |
| Training | Backpropagation + Gradient Descent | Part 5 |
| Evaluation | Accuracy, Confusion Matrix, F1 | Part 6 |
| Overfitting prevention | Early stopping + proper sizing | Part 8 |
| Interpretability | Weight visualization | Part 6 |
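As a reminder of what the loss row means in practice, here is a minimal sketch of binary cross-entropy on three hand-made prediction sets. The function name and the clipping constant `eps` are assumptions; the class in this notebook may guard against log(0) differently.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; clipping keeps log() away from zero."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


y_true = np.array([1.0, 0.0])  # one VERT example, one HORIZ example

confident_right = binary_cross_entropy(y_true, np.array([0.9, 0.1]))
coin_flip = binary_cross_entropy(y_true, np.array([0.5, 0.5]))
confident_wrong = binary_cross_entropy(y_true, np.array([0.1, 0.9]))

print(f"Confident and right: {confident_right:.3f}")
print(f"Coin flip:           {coin_flip:.3f}")
print(f"Confident and wrong: {confident_wrong:.3f}")
```

The coin-flip value is ln 2, a useful reference point: a validation loss stuck near it means the model is doing no better than guessing.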
The Committee Analogy Complete
| Part | Committee Story |
|---|---|
| 0 | Introduced the committee concept |
| 1 | Learned the language (matrices) |
| 2 | First committee member joins |
| 3 | Member learns to vote (activation) |
| 4 | First attempt at decisions |
| 5 | Learning from mistakes |
| 6 | Evaluating performance |
| 7 | Full committee assembled |
| 8 | Growing pains addressed |
| 9 | Complete, working committee! |
Key Takeaways
Neural networks are simple at their core - Just matrix multiplications and non-linear functions
Training is optimization - Find weights that minimize loss on training data
Generalization is the goal - Performance on unseen data is what matters
Architecture matters - Right-sized models with proper activations work best
Monitoring is essential - Track train AND validation metrics
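The first takeaway fits in a few lines of NumPy. The sketch below is a stand-alone forward pass for a 9 → 8 → 1 network with random, untrained weights; the weight scale, the zero biases, and the function name are assumptions, and the weight layout follows the (n_hidden, n_inputs) shape noted in the saliency code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained parameters for a 9 -> 8 -> 1 network.
W1 = rng.normal(0.0, 0.5, size=(8, 9))   # hidden weights: (n_hidden, n_inputs)
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(1, 8))   # output weights: (n_outputs, n_hidden)
b2 = np.zeros(1)


def forward(X):
    """Two matrix multiplications and two non-linear functions."""
    hidden = np.maximum(0.0, X @ W1.T + b1)   # ReLU hidden layer
    logits = hidden @ W2.T + b2               # linear output layer
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> probability of VERT


X = rng.random((4, 9))   # four made-up flattened 3x3 images
probs = forward(X)

print("Output shape:", probs.shape)
print("Probabilities:", probs.ravel().round(3))
```

Everything else in the full class (loss, backpropagation, early stopping) exists to find good values for the four parameter arrays used here.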
Common Mistakes to Avoid
| Mistake | Consequence | How to Avoid |
|---|---|---|
| No validation set | Can't detect overfitting | Always split your data |
| Using test data to tune | Overly optimistic results | Keep test data completely separate |
| Wrong activation for output | Invalid predictions | Sigmoid for binary, softmax for multi-class |
| Too large model for data | Overfitting | Start small, increase if underfitting |
| No shuffling | Biased splits | Always shuffle before splitting |
| Ignoring learning curves | Miss problems | Plot train/val loss every time |
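The "no shuffling" and "no validation set" rows are easy to get right with a few lines of NumPy. This is a generic sketch, not the notebook's own split function: the name `shuffled_split` and the 70/15/15 proportions are assumptions.

```python
import numpy as np


def shuffled_split(X, y, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffled row indices

    n_test = int(round(len(X) * test_fraction))
    n_val = int(round(len(X) * val_fraction))

    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]

    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))


# Made-up data stored in label order: 50 HORIZ rows followed by 50 VERT rows.
# Slicing this WITHOUT shuffling would put only VERT rows in the last 30%.
X = np.arange(100 * 9, dtype=float).reshape(100, 9)
y = np.concatenate([np.zeros(50), np.ones(50)])

(X_train, y_train), (X_val, y_val), (X_test, y_test) = shuffled_split(X, y)

print("Sizes:", len(X_train), len(X_val), len(X_test))
print("VERT share per split:", y_train.mean(), y_val.mean(), y_test.mean())
```

The test split is set aside first and never touched again until the final evaluation, which covers the "using test data to tune" row as well.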
The Complete Neural Network Checklist
Before Training:
During Training:
After Training:
Knowledge Check
# =============================================================================
# KNOWLEDGE CHECK - Part 9 (Final Review)
# =============================================================================

print("FINAL KNOWLEDGE CHECK - Complete Neural Network Understanding")
print("="*70)

questions = [
    {
        "q": "1. In our complete network (9→8→1), what does the '8' represent?",
        "options": [
            "A) The number of training examples",
            "B) The number of hidden neurons (specialists)",
            "C) The learning rate",
            "D) The number of epochs"
        ],
        "answer": "B",
        "explanation": "The 8 represents hidden neurons - the 'specialists' in our committee who detect different patterns in the input."
    },
    {
        "q": "2. Why do we use ReLU for hidden layers and Sigmoid for output?",
        "options": [
            "A) Random choice - they're interchangeable",
            "B) ReLU helps prevent vanishing gradients; Sigmoid gives probability output",
            "C) Sigmoid is faster than ReLU",
            "D) ReLU only works for hidden layers"
        ],
        "answer": "B",
        "explanation": "ReLU (derivative=1 when active) helps prevent vanishing gradients in deep networks. Sigmoid maps to (0,1) which we interpret as probability."
    },
    {
        "q": "3. What is the purpose of the validation set?",
        "options": [
            "A) Extra training data",
            "B) Final performance evaluation",
            "C) Tune hyperparameters and detect overfitting",
            "D) Test the code works"
        ],
        "answer": "C",
        "explanation": "Validation set is used during training to tune hyperparameters and detect overfitting (early stopping). Test set is for final evaluation."
    },
    {
        "q": "4. What does early stopping prevent?",
        "options": [
            "A) Underfitting",
            "B) Overfitting",
            "C) Slow training",
            "D) Memory issues"
        ],
        "answer": "B",
        "explanation": "Early stopping halts training once validation loss has stopped improving for a set number of epochs (the patience), preventing the model from memorizing training data (overfitting)."
    },
    {
        "q": "5. In the saliency visualization, what do red pixels in a hidden neuron's weights mean?",
        "options": [
            "A) Errors in that pixel",
            "B) The neuron is broken",
            "C) The neuron gets excited when those pixels are bright",
            "D) Those pixels are ignored"
        ],
        "answer": "C",
        "explanation": "Positive (red) weights mean the neuron responds strongly when those input pixels are bright. Negative (blue) weights mean inhibition."
    },
    {
        "q": "6. What's the complete pipeline for using a neural network?",
        "options": [
            "A) Train → Test → Deploy",
            "B) Data → Train → Evaluate → Deploy",
            "C) Code → Train → Done",
            "D) Data (split) → Train (with val monitoring) → Evaluate (on test) → Interpret"
        ],
        "answer": "D",
        "explanation": "The complete pipeline: Split data (train/val/test), train with validation monitoring, evaluate on test set, then interpret/deploy."
    }
]

for q in questions:
    print(f"\n{q['q']}")
    for opt in q["options"]:
        print(f"   {opt}")

print("\n" + "="*70)
print("Scroll down for answers...")
print("="*70)

# ANSWERS
print("ANSWERS - Final Knowledge Check")
print("="*70)
for i, q in enumerate(questions, 1):
    print(f"\n{i}. Answer: {q['answer']}")
    print(f"   {q['explanation']}")
What's Next?
Congratulations! You've completed the full implementation of a neural network from scratch!
You now understand:
How neural networks represent and process data
How they learn through backpropagation
How to evaluate and interpret their decisions
How to prevent common pitfalls like overfitting
Coming Up in Part 10: The Future
The final notebook will explore:
What other problems can neural networks solve?
CNNs - Convolutional Neural Networks for images
RNNs - Recurrent Neural Networks for sequences
Transformers - The architecture behind modern AI
Resources for continued learning
Continue to Part 10: part_10_whats_next.ipynb
Congratulations!
You've built a working neural network from absolute scratch!
🎉 MISSION ACCOMPLISHED! 🎉
From matrices to mastery in 9 parts:
Part 0: The Mission → Introduced the problem
Part 1: Matrices → The language of data
Part 2: Single Neuron → The building block
Part 3: Activations → Making decisions
Part 4: Perceptron → First predictions
Part 5: Training → Learning from mistakes
Part 6: Evaluation → Measuring success
Part 7: Hidden Layers → The full committee
Part 8: Challenges → Overcoming obstacles
Part 9: Implementation → COMPLETE SYSTEM!
You are now ready for deep learning frameworks
like PyTorch and TensorFlow!
"The Brain's Decision Committee is fully operational."