AgenticWorks

A community for developers awakening to agentic AI. Hands-on lessons, enterprise-grade context engineering, and a forum that earns its quiet.

© 2026 AgenticWorks. Built in public.

Track 1 · ML foundations

Brain's Decision Committee
  1. The first neuron
  2. A single neuron
  3. Activation functions
  4. The perceptron
  5. Training
  6. Evaluation
  7. Hidden layers
  8. Deep learning challenges
  9. Full implementation
  10. What's next

End to end
Part 9 · 65 min · advanced

Mastery

Assemble the complete neural network class, data pipeline, training loop, evaluation, and dashboard.


Neural Network Fundamentals

Part 9: Full Implementation - Mastery

The Brain's Decision Committee - Chapter 9


The Complete Journey

We've come a long way! From understanding matrices to building neurons, from single perceptrons to multi-layer networks, from training basics to handling deep learning challenges - now it's time to bring everything together.

"The complete, trained committee works in harmony. All the lessons learned, all the challenges overcome, unified into one elegant solution."


What You'll Learn in Part 9

By the end of this notebook, you will have:

  1. A Complete Neural Network Class - All concepts unified in clean, documented code
  2. A Full Data Pipeline - Train/validation/test splits with proper handling
  3. A Robust Training Pipeline - With validation monitoring and early stopping
  4. Complete Evaluation - All metrics, confusion matrix, and saliency visualization
  5. Interactive Dashboard - Experiment with hyperparameters in real-time
  6. The Final V/H Classifier - Our mission accomplished!

Prerequisites

This is the culmination notebook - you should have completed:

  • Part 0-1: Matrices and fundamentals
  • Part 2: Single neurons
  • Part 3: Activation functions
  • Part 4: The Perceptron
  • Part 5: Training
  • Part 6: Evaluation
  • Part 7: Hidden layers
  • Part 8: Deep learning challenges

Concepts We're Unifying

| Part | Concept | How We'll Use It |
|------|---------|------------------|
| 1 | Matrices, dot product | Data representation, weight operations |
| 2 | Neuron anatomy | Building blocks of our network |
| 3 | Activation functions | ReLU for hidden, sigmoid for output |
| 4 | Forward pass | Making predictions |
| 5 | Loss, gradients, backprop | Learning from mistakes |
| 6 | Metrics, saliency | Evaluating and understanding |
| 7 | Hidden layers | Multiple specialists |
| 8 | Overfitting prevention | Early stopping, proper sizing |

Setup: Import Dependencies

cell 003

# =============================================================================
# PART 9: FULL IMPLEMENTATION - SETUP
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Try to import ipywidgets for interactive features
try:
    import ipywidgets as widgets
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("Note: ipywidgets not installed. Interactive features will be limited.")

# Set up matplotlib style (fall back through older style names)
style_options = ['seaborn-v0_8-whitegrid', 'seaborn-whitegrid', 'ggplot', 'default']
for style in style_options:
    try:
        plt.style.use(style)
        break
    except OSError:
        continue

plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12

print("="*70)
print("PART 9: FULL IMPLEMENTATION")
print("The Complete V/H Line Classifier")
print("="*70)

9.1 The Complete Neural Network Class

This is the unified implementation incorporating everything we've learned:

| Feature | Part Learned | Implementation |
|---------|--------------|----------------|
| Activation functions | Part 3 | ReLU for hidden, Sigmoid for output |
| Forward propagation | Parts 4, 7 | Matrix operations through layers |
| Loss function | Part 5 | Binary Cross-Entropy |
| Backpropagation | Parts 5, 7 | Chain rule through all layers |
| Validation monitoring | Part 8 | Track train/val metrics |
| Early stopping | Part 8 | Stop when val loss has not improved for `patience` epochs |

Why This Architecture?

Input (9) → Hidden (8, ReLU) → Output (1, Sigmoid)

| Layer | Size | Activation | Why? |
|-------|------|------------|------|
| Input | 9 | None | One neuron per pixel (3×3 = 9) |
| Hidden | 8 | ReLU | Enough specialists without overfitting; ReLU helps avoid vanishing gradients |
| Output | 1 | Sigmoid | Binary classification needs probability in (0,1) |

Why Two Different Initializations?

We use different initialization strategies for different activations:

| Initialization | Formula | Used For | Why? |
|----------------|---------|----------|------|
| He | $w \sim N(0, \sqrt{2/n_{in}})$ | ReLU layers | ReLU "kills" half the neurons (negative z), so we need 2× variance |
| Xavier | $w \sim N(0, \sqrt{1/n_{in}})$ | Sigmoid/Tanh | These are symmetric around 0, so standard variance works |

Using the wrong initialization can cause:

  • Too small: Signals shrink through layers (vanishing)
  • Too large: Signals explode through layers (exploding)
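
The effect of scale can be illustrated with a rough sketch. It assumes a stack of equal-width ReLU layers with random weights; the helper name `signal_scale_after`, the width of 64, and the depth of 10 are illustrative choices, not part of the lesson's network.

```python
import numpy as np

def signal_scale_after(n_layers, width, weight_std, seed=0):
    """Push random inputs through n_layers ReLU layers; return the mean squared activation."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((1000, width))          # 1000 random input vectors
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * weight_std
        a = np.maximum(0, a @ W)                    # linear step followed by ReLU
    return float(np.mean(a ** 2))

width = 64
too_small = signal_scale_after(10, width, weight_std=0.01)
he_scaled = signal_scale_after(10, width, weight_std=np.sqrt(2.0 / width))
too_large = signal_scale_after(10, width, weight_std=1.0)

print(f"std=0.01       -> {too_small:.3e}  (signal vanishes)")
print(f"std=sqrt(2/n)  -> {he_scaled:.3e}  (signal stays on a similar scale)")
print(f"std=1.0        -> {too_large:.3e}  (signal explodes)")
```

With only one hidden layer, as in this notebook, the effect is much milder, but the same scaling logic applies.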
cell 005

# =============================================================================
# THE COMPLETE NEURAL NETWORK CLASS
# =============================================================================

class NeuralNetwork:
    """
    Complete Neural Network implementation for binary classification.

    This class unifies all concepts from Parts 1-8:
    - Matrix operations (Part 1)
    - Neuron anatomy (Part 2)
    - Activation functions (Part 3)
    - Forward propagation (Part 4)
    - Training with backprop (Part 5)
    - Evaluation metrics (Part 6)
    - Hidden layers (Part 7)
    - Overfitting prevention (Part 8)

    Architecture: Input → Hidden (ReLU) → Output (Sigmoid)
    """

    # =========================================================================
    # ACTIVATION FUNCTIONS (Part 3)
    # =========================================================================

    @staticmethod
    def sigmoid(z):
        """Sigmoid: maps to (0, 1) - used for output layer (Part 3.3)"""
        # Clip z so np.exp cannot overflow for extreme inputs
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    @staticmethod
    def sigmoid_derivative(z):
        """Derivative of sigmoid: σ(z) * (1 - σ(z)) (Part 3.3.1)"""
        s = NeuralNetwork.sigmoid(z)
        return s * (1 - s)

    @staticmethod
    def relu(z):
        """ReLU: max(0, z) - used for hidden layers (Part 3.5)"""
        return np.maximum(0, z)

    @staticmethod
    def relu_derivative(z):
        """Derivative of ReLU: 1 if z > 0, else 0 (Part 3.5)"""
        return (z > 0).astype(float)

    # =========================================================================
    # INITIALIZATION (Part 7 - Xavier/He initialization)
    # =========================================================================

    def __init__(self, n_inputs, n_hidden, n_outputs=1, seed=None):
        """
        Initialize the neural network.

        Parameters:
            n_inputs: Number of input features (9 for 3x3 images)
            n_hidden: Number of hidden neurons (the "specialists")
            n_outputs: Number of outputs (1 for binary classification)
            seed: Random seed for reproducibility
        """
        if seed is not None:
            np.random.seed(seed)

        self.n_inputs = n_inputs
        self.n_hidden = n_hidden
        self.n_outputs = n_outputs

        # He initialization for ReLU layers (Part 8 - proper initialization)
        self.W1 = np.random.randn(n_hidden, n_inputs) * np.sqrt(2.0 / n_inputs)
        self.b1 = np.zeros(n_hidden)

        # Xavier initialization for sigmoid output
        self.W2 = np.random.randn(n_outputs, n_hidden) * np.sqrt(1.0 / n_hidden)
        self.b2 = np.zeros(n_outputs)

        # Cache for forward pass (needed for backprop)
        self.cache = {}

        # Training history
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []

        # Best model weights (for early stopping)
        self.best_weights = None
        self.best_val_loss = float('inf')
        self.best_epoch = 0

    # =========================================================================
    # FORWARD PROPAGATION (Parts 4, 7)
    # =========================================================================

    def forward(self, X):
        """
        Forward pass: Input → Hidden (ReLU) → Output (Sigmoid)

        The "Committee Meeting" - each specialist examines the evidence,
        then the final decision maker combines their opinions.
        """
        # Ensure X is 2D
        X = np.atleast_2d(X)

        # Layer 1: Input → Hidden (with ReLU - Part 3.5)
        self.cache['X'] = X
        self.cache['Z1'] = np.dot(X, self.W1.T) + self.b1  # (batch, n_hidden)
        self.cache['A1'] = self.relu(self.cache['Z1'])      # ReLU activation

        # Layer 2: Hidden → Output (with Sigmoid - Part 3.3)
        self.cache['Z2'] = np.dot(self.cache['A1'], self.W2.T) + self.b2  # (batch, n_outputs)
        self.cache['A2'] = self.sigmoid(self.cache['Z2'])                  # Sigmoid for probability

        return self.cache['A2']

    def predict(self, X):
        """Make binary predictions (0 or 1)."""
        probs = self.forward(X)
        return (probs >= 0.5).astype(int).flatten()

    # =========================================================================
    # LOSS FUNCTION (Part 5.3 - Binary Cross-Entropy)
    # =========================================================================

    def compute_loss(self, y_true, y_pred):
        """
        Binary Cross-Entropy loss (Part 5.3)

        Measures "surprise" - how unexpected the predictions are.
        """
        epsilon = 1e-15  # Prevent log(0)
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        y_true = y_true.reshape(-1, 1)
        loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return loss

    # =========================================================================
    # BACKPROPAGATION (Parts 5.6, 5.7, 7.4)
    # =========================================================================

    def backward(self, y_true, learning_rate):
        """
        Backpropagation: Compute gradients and update weights.

        The "Blame Assignment" - tracing errors back through the committee.
        """
        m = len(y_true)
        y_true = y_true.reshape(-1, 1)

        # Output layer gradients (Part 5.6)
        # For sigmoid + BCE, dL/dZ2 simplifies to (prediction - target)
        dZ2 = self.cache['A2'] - y_true  # (batch, n_outputs)
        dW2 = np.dot(dZ2.T, self.cache['A1']) / m
        db2 = np.mean(dZ2, axis=0)

        # Hidden layer gradients (Part 7.4 - chain rule)
        dA1 = np.dot(dZ2, self.W2)
        dZ1 = dA1 * self.relu_derivative(self.cache['Z1'])
        dW1 = np.dot(dZ1.T, self.cache['X']) / m
        db1 = np.mean(dZ1, axis=0)

        # Update weights (Gradient Descent - Part 5.4)
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    # =========================================================================
    # EVALUATION (Part 6)
    # =========================================================================

    def evaluate(self, X, y):
        """Compute loss and accuracy on a dataset."""
        y_pred = self.forward(X)
        loss = self.compute_loss(y, y_pred)
        predictions = (y_pred >= 0.5).astype(int).flatten()
        accuracy = np.mean(predictions == y)
        return loss, accuracy

    def confusion_matrix(self, X, y):
        """Compute confusion matrix (Part 6.3)."""
        predictions = self.predict(X)
        TP = np.sum((predictions == 1) & (y == 1))
        TN = np.sum((predictions == 0) & (y == 0))
        FP = np.sum((predictions == 1) & (y == 0))
        FN = np.sum((predictions == 0) & (y == 1))
        return {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN}

    # =========================================================================
    # TRAINING WITH EARLY STOPPING (Parts 5.8, 8.2)
    # =========================================================================

    def train(self, X_train, y_train, X_val=None, y_val=None,
              learning_rate=0.1, epochs=100, early_stopping_patience=10,
              verbose=True):
        """
        Train the neural network with optional early stopping.

        Parameters:
            X_train, y_train: Training data
            X_val, y_val: Validation data (for early stopping)
            learning_rate: Step size for gradient descent (Part 5.5)
            epochs: Maximum training iterations
            early_stopping_patience: Stop if val loss doesn't improve (Part 8.2)
            verbose: Print progress
        """
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []

        patience_counter = 0

        for epoch in range(epochs):
            # Forward pass
            self.forward(X_train)

            # Backward pass (learning)
            self.backward(y_train, learning_rate)

            # Evaluate training
            train_loss, train_acc = self.evaluate(X_train, y_train)
            self.train_loss_history.append(train_loss)
            self.train_acc_history.append(train_acc)

            # Evaluate validation (if provided)
            if X_val is not None:
                val_loss, val_acc = self.evaluate(X_val, y_val)
                self.val_loss_history.append(val_loss)
                self.val_acc_history.append(val_acc)

                # Early stopping check (Part 8.2)
                if val_loss < self.best_val_loss:
                    self.best_val_loss = val_loss
                    self.best_epoch = epoch
                    self.best_weights = {
                        'W1': self.W1.copy(), 'b1': self.b1.copy(),
                        'W2': self.W2.copy(), 'b2': self.b2.copy()
                    }
                    patience_counter = 0
                else:
                    patience_counter += 1

                if patience_counter >= early_stopping_patience:
                    if verbose:
                        print(f"\n  Early stopping at epoch {epoch+1}!")
                        print(f"  Best epoch was {self.best_epoch+1} with val_loss={self.best_val_loss:.4f}")
                    break

            # Progress output
            if verbose and (epoch + 1) % 20 == 0:
                msg = f"  Epoch {epoch+1:3d}: Train Loss={train_loss:.4f}, Train Acc={train_acc*100:.1f}%"
                if X_val is not None:
                    msg += f", Val Loss={val_loss:.4f}, Val Acc={val_acc*100:.1f}%"
                print(msg)

        # Restore the best validation weights at the end, whether training
        # stopped early or used the full epoch budget
        if X_val is not None:
            self._restore_best_weights()

        if verbose:
            final_acc = self.train_acc_history[-1]
            print(f"\nTraining complete! Final train accuracy: {final_acc*100:.1f}%")
            if X_val is not None:
                print(f"Best validation loss: {self.best_val_loss:.4f} at epoch {self.best_epoch+1}")

        return self

    def _restore_best_weights(self):
        """Restore weights from best epoch."""
        if self.best_weights is not None:
            # Copy so later in-place updates cannot modify the saved snapshot
            self.W1 = self.best_weights['W1'].copy()
            self.b1 = self.best_weights['b1'].copy()
            self.W2 = self.best_weights['W2'].copy()
            self.b2 = self.best_weights['b2'].copy()

print("NeuralNetwork class defined!")
print("This combines ALL concepts from Parts 1-8.")

Understanding Key Implementation Details

Why do we use a cache dictionary?

During backpropagation, we need values from the forward pass:

  • X - the input, needed to compute gradients for W1
  • Z1 - pre-activation of hidden layer, needed for ReLU derivative
  • A1 - hidden activations, needed to compute gradients for W2
  • Z2, A2 - output layer values for computing output gradients

Without caching, we'd have to recompute the forward pass during the backward pass, which doubles the work for every update.

Why save best_weights separately?

Early stopping works by:

  1. Training for many epochs
  2. Saving weights whenever validation loss improves
  3. Restoring the best weights at the end

If we only kept current weights, we'd lose the best model when we continue training past the optimal point.
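
The bookkeeping behind these three steps can be sketched in isolation. The function name `early_stopping_trace` and the validation losses below are made up for illustration; the real loop also snapshots the weights at each new best epoch.

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 0-based, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0   # weights would be saved here
        else:
            waited += 1
        if waited >= patience:
            return best_epoch, epoch                          # stop and restore best weights
    return best_epoch, len(val_losses) - 1                    # ran out of epochs

# Hypothetical validation losses: improve for four epochs, then drift upward
losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52]
best, stopped = early_stopping_trace(losses, patience=3)
print(f"best epoch: {best + 1}, stopped at epoch: {stopped + 1}")
```

Here training stops three epochs after the minimum, and the weights from the minimum are the ones worth keeping.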

Why use np.atleast_2d(X)?

This ensures our math works for both:

  • Single sample: shape (9,) → (1, 9)
  • Batch of samples: shape (batch, 9) → unchanged

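NumPy can multiply a 1D vector by a matrix, but the rest of the class (for example `dZ2.T` in `backward`) assumes a leading batch axis. Promoting every input to 2D lets the same code handle both cases.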
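
A quick shape check makes this concrete. The arrays below are placeholders filled with zeros and ones, using the same `(n_hidden, n_inputs)` weight layout as the class.

```python
import numpy as np

W1 = np.zeros((8, 9))            # placeholder weights, layout (n_hidden, n_inputs)
single = np.ones(9)              # one flattened 3x3 image
batch = np.ones((5, 9))          # five flattened images

single_2d = np.atleast_2d(single)
batch_2d = np.atleast_2d(batch)

print(single_2d.shape)               # (1, 9)
print(batch_2d.shape)                # (5, 9)
print((single_2d @ W1.T).shape)      # (1, 8)
print((batch_2d @ W1.T).shape)       # (5, 8)
```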


9.2 The Complete Data Pipeline

A proper data pipeline includes:

| Step | Purpose | Part Referenced |
|------|---------|-----------------|
| Data Generation | Create V/H line images | Part 4 |
| Train/Val/Test Split | Separate data for different purposes | Part 6, 8 |
| Shuffling | Prevent order-based patterns | Part 5 |

Why Three Splits?

| Split | Purpose | Used For |
|-------|---------|----------|
| Training (60%) | Learn patterns | Backpropagation |
| Validation (20%) | Tune hyperparameters | Early stopping, model selection |
| Test (20%) | Final evaluation | Report true performance |

Key Rule: NEVER use test data during training or tuning!

Why These Specific Percentages?

60/20/20 is a common starting point, but it depends on your data:

| Dataset Size | Recommended Split | Reasoning |
|--------------|-------------------|-----------|
| Small (<500) | 60/20/20 | Need enough validation/test for reliable estimates |
| Medium (500-10K) | 70/15/15 | Can afford more training data |
| Large (>10K) | 80/10/10 | Even 10% gives hundreds of test samples |

For our 300 samples:

  • 180 training (60%) → Enough to learn V/H patterns
  • 60 validation (20%) → Enough to detect overfitting
  • 60 test (20%) → Enough for reliable accuracy estimate
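
The arithmetic can be checked with a small helper. `split_sizes` is a hypothetical function written for this example; it takes whole-number percentages to avoid floating-point rounding, whereas the notebook's own split code multiplies by 0.6 and 0.2.

```python
def split_sizes(n_total, train_pct=60, val_pct=20):
    """Return (n_train, n_val, n_test); the test set takes whatever remains."""
    n_train = n_total * train_pct // 100
    n_val = n_total * val_pct // 100
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test

print(split_sizes(300))                 # (180, 60, 60)
print(split_sizes(1000, 70, 15))        # (700, 150, 150)
print(split_sizes(20000, 80, 10))       # (16000, 2000, 2000)
```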

Why Shuffle the Data?

Without shuffling, disaster can strike!

Imagine our data is generated in order:

Samples 1-150:   All VERTICAL
Samples 151-300: All HORIZONTAL

If we split 60/20/20 without shuffling:

  • Training (1-180): 150 vertical, 30 horizontal (imbalanced!)
  • Validation (181-240): 0 vertical, 60 horizontal (all one class!)
  • Test (241-300): 0 vertical, 60 horizontal (all one class!)

The model would learn wrong patterns and evaluation would be meaningless!

Shuffling ensures each split has a representative mix of both classes.
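
The scenario above can be reproduced with labels alone. This sketch uses NumPy's `default_rng` with an arbitrary seed, and `vertical_counts` is a helper invented for the example.

```python
import numpy as np

# Ordered labels as in the scenario: 150 vertical (1), then 150 horizontal (0)
y_ordered = np.array([1] * 150 + [0] * 150)

def vertical_counts(y):
    """Count vertical samples in the 60/20/20 train, validation, and test slices."""
    return int(y[:180].sum()), int(y[180:240].sum()), int(y[240:].sum())

print("unshuffled:", vertical_counts(y_ordered))    # (150, 0, 0)

rng = np.random.default_rng(42)                     # arbitrary seed
y_shuffled = y_ordered[rng.permutation(len(y_ordered))]
print("shuffled:  ", vertical_counts(y_shuffled))   # roughly (90, 30, 30)
```

Shuffling makes each split approximately balanced, not exactly balanced; stratified splitting would be needed for exact class proportions.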

cell 008

# =============================================================================
# THE COMPLETE DATA PIPELINE
# =============================================================================

def generate_line_dataset(n_samples=100, noise_level=0.0, seed=None):
    """
    Generate vertical (1) and horizontal (0) line images.

    This is the dataset we've been working with throughout the series.
    Our "mission" from Part 0: classify these images correctly!

    Parameters:
        n_samples: Total number of images to generate
        noise_level: Amount of random noise (0.0 = clean, 0.3 = noisy)
        seed: Random seed for reproducibility

    Returns:
        X: Array of flattened 3x3 images, shape (n_samples, 9)
        y: Labels (1=vertical, 0=horizontal), shape (n_samples,)
    """
    if seed is not None:
        np.random.seed(seed)

    X, y = [], []

    for i in range(n_samples):
        image = np.zeros((3, 3))

        if i < n_samples // 2:
            # Vertical line - can be in ANY column
            col = np.random.randint(0, 3)
            image[:, col] = 1
            label = 1
        else:
            # Horizontal line - can be in ANY row
            row = np.random.randint(0, 3)
            image[row, :] = 1
            label = 0

        # Add noise if specified (Gaussian, clipped back to the [0, 1] pixel range)
        if noise_level > 0:
            image = np.clip(image + np.random.randn(3, 3) * noise_level, 0, 1)

        X.append(image.flatten())  # Flatten to 1D (Part 2)
        y.append(label)

    X, y = np.array(X), np.array(y)

    # Shuffle (Part 5)
    shuffle_idx = np.random.permutation(n_samples)
    return X[shuffle_idx], y[shuffle_idx]


def create_train_val_test_split(n_total=300, noise_level=0.1, seed=42):
    """
    Create proper train/validation/test splits.

    Split ratios: 60% train, 20% validation, 20% test
    """
    np.random.seed(seed)

    # Generate all data (already shuffled by generate_line_dataset)
    X, y = generate_line_dataset(n_total, noise_level=noise_level, seed=seed)

    # Calculate split indices
    n_train = int(n_total * 0.6)
    n_val = int(n_total * 0.2)

    # Split
    X_train, y_train = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:n_train+n_val], y[n_train:n_train+n_val]
    X_test, y_test = X[n_train+n_val:], y[n_train+n_val:]

    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Create our datasets
print("="*70)
print("CREATING THE COMPLETE DATASET")
print("="*70)

(X_train, y_train), (X_val, y_val), (X_test, y_test) = create_train_val_test_split(
    n_total=300, noise_level=0.15, seed=42
)

print(f"\nDataset created with 15% noise:")
print(f"  Training:   {len(X_train)} samples ({sum(y_train)} vertical, {len(y_train)-sum(y_train)} horizontal)")
print(f"  Validation: {len(X_val)} samples ({sum(y_val)} vertical, {len(y_val)-sum(y_val)} horizontal)")
print(f"  Test:       {len(X_test)} samples ({sum(y_test)} vertical, {len(y_test)-sum(y_test)} horizontal)")
print(f"\nTotal: {len(X_train) + len(X_val) + len(X_test)} samples")

cell 009

# =============================================================================
# VISUALIZE SAMPLE IMAGES FROM OUR DATASET
# =============================================================================

fig, axes = plt.subplots(2, 5, figsize=(12, 5))

# Show 5 vertical and 5 horizontal examples
v_indices = np.where(y_train == 1)[0][:5]
h_indices = np.where(y_train == 0)[0][:5]

for i, idx in enumerate(v_indices):
    ax = axes[0, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Blues', vmin=0, vmax=1)
    ax.set_title('VERTICAL', fontsize=10)
    ax.axis('off')

for i, idx in enumerate(h_indices):
    ax = axes[1, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Oranges', vmin=0, vmax=1)
    ax.set_title('HORIZONTAL', fontsize=10)
    ax.axis('off')

plt.suptitle('Our Mission: Classify These 3x3 Images\n(With 15% Noise)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
OUR MISSION (from Part 0):
════════════════════════════════════════════════════════════════════════

Build a neural network that can correctly classify these images as:
  • VERTICAL (1) - line goes up-down
  • HORIZONTAL (0) - line goes left-right

The challenge: Noise makes the patterns harder to detect!
The committee must learn to see through the noise.
""")

9.3 Training the Complete Network

Now we train our neural network using everything we've learned:

| Setting | Value | Why (Part Reference) |
|---------|-------|----------------------|
| Hidden neurons | 8 | Enough for patterns, not too many (Part 8 - overfitting) |
| Learning rate | 0.5 | Fast but stable (Part 5) |
| Epochs | 200 | Enough to learn, with early stopping (Part 8) |
| Early stopping patience | 20 | Stop if no improvement for 20 epochs |
| Activation (hidden) | ReLU | Helps avoid vanishing gradients (Parts 3, 8) |
| Activation (output) | Sigmoid | Gives probability (Part 3) |

How We Chose These Values

Hidden neurons = 8:

Our data has 9 inputs and 2 classes. Rule of thumb:

  • Minimum: 2-4 (can represent basic patterns)
  • Our choice: 8 (room for multiple pattern detectors)
  • Maximum: ~20 for 180 training samples (avoid overfitting)

Why 8 works: We need neurons to detect "left column", "middle column", "right column" for vertical, plus "top row", "middle row", "bottom row" for horizontal. 6-8 neurons can capture these patterns.

Learning rate = 0.5:

| Learning Rate | Behavior |
|---------------|----------|
| Too low (0.001) | Very slow, may not converge in 200 epochs |
| Good (0.1 - 1.0) | Learns quickly, stable |
| Too high (5.0) | Overshoots, unstable, may diverge |

For small networks with BCE loss, 0.5 is often a good starting point.
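
The three regimes in the table can be seen on a toy problem. This sketch runs plain gradient descent on the one-dimensional function f(w) = w², which is an assumption made for illustration; the exact thresholds for "too low" and "too high" depend on the loss surface, so the numbers do not transfer directly to the network.

```python
def descend(lr, steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2 (gradient 2*w); returns the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)

for lr in (0.001, 0.1, 5.0):
    print(f"lr={lr:<6} -> |w| after 20 steps: {descend(lr):.3e}")
```

With the smallest rate the weight barely moves from its starting value, with 0.1 it shrinks steadily toward the minimum at 0, and with 5.0 every step overshoots and the weight grows without bound.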

Epochs = 200 with patience = 20:

  • 200 is a maximum "budget" of training steps
  • Patience of 20 means: "Stop if validation doesn't improve for 20 epochs"
  • This combination lets us train long enough to converge, but stops early if we're overfitting

Understanding Parameter Count

Total parameters = (input × hidden) + hidden + (hidden × output) + output
                 = (9 × 8) + 8 + (8 × 1) + 1
                 = 72 + 8 + 8 + 1 = 89 parameters

Rule of thumb: You want at least 10× more training samples than parameters.

  • We have 180 training samples
  • We have 89 parameters
  • Ratio: 180/89 ≈ 2× (borderline, which is why we use early stopping!)
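
The same count can be computed for any fully connected layout. `count_parameters` is a hypothetical helper for this example, not part of the `NeuralNetwork` class.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network, e.g. [9, 8, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

params = count_parameters([9, 8, 1])
print(params)                               # 89
print(f"{180 / params:.2f} samples per parameter")
print(count_parameters([9, 20, 1]))         # 221, the ~20-neuron upper bound above
```

At 20 hidden neurons the network would have more parameters than training samples, which is one way to see why the hidden layer is kept small here.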
cell 011

# =============================================================================
# TRAIN THE COMPLETE NETWORK
# =============================================================================

print("="*70)
print("TRAINING THE NEURAL NETWORK")
print("="*70)

# Create the network
model = NeuralNetwork(
    n_inputs=9,      # 3x3 image = 9 pixels
    n_hidden=8,      # 8 specialists in our committee
    n_outputs=1,     # Binary output (V or H)
    seed=42
)

print(f"\nNetwork Architecture:")
print(f"  Input layer:  {model.n_inputs} neurons (one per pixel)")
print(f"  Hidden layer: {model.n_hidden} neurons (ReLU activation)")
print(f"  Output layer: {model.n_outputs} neuron (Sigmoid activation)")
# Show the formula as text and the evaluated total after the equals sign
print(f"  Total parameters: 9*8 + 8 + 8*1 + 1 = {9*8 + 8 + 8*1 + 1}")

print("\n" + "-"*70)
print("Training with early stopping...")
print("-"*70)

# Train!
model.train(
    X_train, y_train,
    X_val, y_val,
    learning_rate=0.5,
    epochs=200,
    early_stopping_patience=20,
    verbose=True
)

print("\n" + "="*70)
print("TRAINING COMPLETE!")
print("="*70)

How Training Works: The Complete Flow

Here's what happens during each training epoch:

┌─────────────────────────────────────────────────────────────────────┐
│                        ONE TRAINING EPOCH                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. FORWARD PASS (Make predictions)                                │
│     Input X → [W1×X + b1] → ReLU → [W2×H + b2] → Sigmoid → Output │
│                    ↓                      ↓                        │
│               Cache Z1, A1            Cache Z2, A2                 │
│                                                                     │
│  2. COMPUTE LOSS                                                   │
│     BCE = -mean(y×log(ŷ) + (1-y)×log(1-ŷ))                        │
│                                                                     │
│  3. BACKWARD PASS (Compute gradients)                              │
│     ∂L/∂W2 ← output error × hidden activations (from cache)       │
│     ∂L/∂W1 ← hidden error × input (chain rule through ReLU)       │
│                                                                     │
│  4. UPDATE WEIGHTS                                                 │
│     W1 ← W1 - lr × ∂L/∂W1                                         │
│     W2 ← W2 - lr × ∂L/∂W2                                         │
│                                                                     │
│  5. EVALUATE                                                       │
│     Compute train loss/accuracy                                    │
│     Compute val loss/accuracy                                      │
│                                                                     │
│  6. EARLY STOPPING CHECK                                           │
│     If val_loss improved → save weights                            │
│     If no improvement for `patience` epochs → stop & restore best │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

This process repeats until:

  • Maximum epochs reached, OR
  • Early stopping triggers (no validation improvement)
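
One way to gain confidence in step 3 of the flow is a numerical gradient check. The sketch below rebuilds a tiny 9 → 8 → 1 network outside the class on made-up data and compares the analytic weight gradients with central finite differences. The function names `forward_pass`, `bce`, and `numeric_grad`, the seed, and the data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                    # arbitrary seed
X = rng.random((6, 9))                            # 6 made-up flattened 3x3 images
y = np.array([1, 0, 1, 0, 1, 0]).reshape(-1, 1)
W1 = rng.standard_normal((8, 9)) * np.sqrt(2.0 / 9)
b1 = np.zeros(8)
W2 = rng.standard_normal((1, 8)) * np.sqrt(1.0 / 8)
b2 = np.zeros(1)

def forward_pass(W1, W2):
    Z1 = X @ W1.T + b1
    A1 = np.maximum(0, Z1)                        # ReLU
    A2 = 1 / (1 + np.exp(-(A1 @ W2.T + b2)))      # sigmoid
    return Z1, A1, A2

def bce(W1, W2):
    A2 = forward_pass(W1, W2)[2]
    return -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))

# Analytic gradients, using the same formulas as the backward pass
Z1, A1, A2 = forward_pass(W1, W2)
m = len(y)
dZ2 = A2 - y
dW2 = dZ2.T @ A1 / m
dZ1 = (dZ2 @ W2) * (Z1 > 0)
dW1 = dZ1.T @ X / m

def numeric_grad(W, eps=1e-6):
    """Central differences; W must be the W1 or W2 array itself (perturbed in place)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        original = W[idx]
        W[idx] = original + eps
        loss_plus = bce(W1, W2)
        W[idx] = original - eps
        loss_minus = bce(W1, W2)
        W[idx] = original                         # restore before the next entry
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

err_W1 = np.max(np.abs(numeric_grad(W1) - dW1))
err_W2 = np.max(np.abs(numeric_grad(W2) - dW2))
print(f"max |numeric - analytic| for W1: {err_W1:.2e}")
print(f"max |numeric - analytic| for W2: {err_W2:.2e}")
```

Differences far below the size of the gradients themselves suggest the chain-rule formulas are implemented consistently; a large gap usually points to a transposed matrix or a missing 1/m factor.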
cell 013

# =============================================================================
# VISUALIZE TRAINING PROGRESS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

epochs = range(1, len(model.train_loss_history) + 1)

# Plot 1: Loss curves
ax = axes[0]
ax.plot(epochs, model.train_loss_history, 'b-', label='Training Loss', linewidth=2)
ax.plot(epochs, model.val_loss_history, 'r-', label='Validation Loss', linewidth=2)
ax.axvline(x=model.best_epoch+1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Loss (BCE)', fontsize=12)
ax.set_title('Training Progress: Loss', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 2: Accuracy curves
ax = axes[1]
ax.plot(epochs, [a*100 for a in model.train_acc_history], 'b-',
        label='Training Accuracy', linewidth=2)
ax.plot(epochs, [a*100 for a in model.val_acc_history], 'r-',
        label='Validation Accuracy', linewidth=2)
ax.axvline(x=model.best_epoch+1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Accuracy (%)', fontsize=12)
ax.set_title('Training Progress: Accuracy', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(40, 105)

plt.tight_layout()
plt.show()

print("""
TRAINING INSIGHTS:
════════════════════════════════════════════════════════════════════════

• Training and validation curves should stay close (no overfitting!)
• Early stopping saved the best model before potential overfitting
• The committee learned the V/H pattern effectively
""")

9.4 Complete Evaluation

Now we evaluate our trained model on the test set - data it has NEVER seen during training or validation. This is the true measure of generalization.

Evaluation Metrics (Part 6)

| Metric | What It Measures |
|--------|------------------|
| Accuracy | Overall correctness |
| Precision | Of predicted positives, how many are correct? |
| Recall | Of actual positives, how many did we find? |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | Detailed breakdown of TP, TN, FP, FN |

What Do "Good" Values Look Like?

| Metric | Poor | Okay | Good | Excellent |
|--------|------|------|------|-----------|
| Accuracy | <60% | 60-75% | 75-90% | >90% |
| F1 Score | <0.5 | 0.5-0.7 | 0.7-0.9 | >0.9 |

For our V/H classifier:

  • With 15% noise, >85% accuracy is quite good
  • Balanced precision/recall indicates no systematic bias
  • Similar train/val/test accuracy indicates good generalization

Reading the Confusion Matrix for Insights

The confusion matrix tells us not just HOW MANY errors, but WHAT KIND:

| Scenario | Meaning | Possible Cause |
|----------|---------|----------------|
| High FP (false alarm) | Saying "vertical" too often | Model is too sensitive to vertical patterns |
| High FN (misses) | Missing vertical lines | Model isn't detecting vertical patterns well |
| Balanced errors | FP ≈ FN | Model is "confused" by noise, not biased |

Ideal: Most values on the diagonal (TN, TP), minimal off-diagonal (FP, FN).
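The three scenarios in the table can be checked mechanically. The sketch below counts the four outcomes from labels and predictions, then applies a rough rule of thumb to name the error pattern. The helper names and the `ratio=2.0` cutoff are assumptions chosen for illustration; the notebook's own `model.confusion_matrix` may be implemented differently.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = vertical, 0 = horizontal)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        else:  # actual == 1 and predicted == 0 (labels are assumed to be 0 or 1)
            counts["FN"] += 1
    return counts


def describe_errors(counts, ratio=2.0):
    """Name the error pattern; `ratio` is an arbitrary illustrative cutoff."""
    fp, fn = counts["FP"], counts["FN"]
    if fp + fn == 0:
        return "no errors"
    if fp >= ratio * max(fn, 1):
        return "biased toward vertical (many false alarms)"
    if fn >= ratio * max(fp, 1):
        return "biased toward horizontal (many misses)"
    return "balanced errors"


# Hypothetical labels and predictions: two horizontal images called vertical.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, "->", describe_errors(counts))
```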

cell 015
# =============================================================================
# COMPLETE EVALUATION ON TEST SET
# =============================================================================

print("="*70)
print("FINAL EVALUATION ON TEST SET")
print("(Data the model has NEVER seen!)")
print("="*70)

# Get predictions
test_predictions = model.predict(X_test)

# Confusion matrix
cm = model.confusion_matrix(X_test, y_test)

# Calculate metrics
accuracy = (cm['TP'] + cm['TN']) / len(y_test)
precision = cm['TP'] / (cm['TP'] + cm['FP']) if (cm['TP'] + cm['FP']) > 0 else 0
recall = cm['TP'] / (cm['TP'] + cm['FN']) if (cm['TP'] + cm['FN']) > 0 else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

print(f"\n📊 PERFORMANCE METRICS:")
print("-"*40)
print(f"  Accuracy:  {accuracy*100:.1f}%")
print(f"  Precision: {precision*100:.1f}%")
print(f"  Recall:    {recall*100:.1f}%")
print(f"  F1 Score:  {f1*100:.1f}%")

print(f"\n📋 CONFUSION MATRIX:")
print("-"*40)
print(f"                  Predicted")
print(f"              HORIZ    VERT")
print(f"  Actual HORIZ  {cm['TN']:3d}     {cm['FP']:3d}")
print(f"  Actual VERT   {cm['FN']:3d}     {cm['TP']:3d}")

print(f"\n  True Negatives (TN):  {cm['TN']:3d} - Correctly identified horizontal")
print(f"  True Positives (TP):  {cm['TP']:3d} - Correctly identified vertical")
print(f"  False Positives (FP): {cm['FP']:3d} - Horizontal wrongly called vertical")
print(f"  False Negatives (FN): {cm['FN']:3d} - Vertical wrongly called horizontal")
cell 016
# =============================================================================
# VISUALIZE EVALUATION RESULTS
# =============================================================================

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Confusion Matrix Heatmap
ax = axes[0]
cm_matrix = np.array([[cm['TN'], cm['FP']], [cm['FN'], cm['TP']]])
im = ax.imshow(cm_matrix, cmap='Blues')
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_yticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_xlabel('Predicted', fontsize=12)
ax.set_ylabel('Actual', fontsize=12)
ax.set_title('Confusion Matrix', fontsize=14, fontweight='bold')

# Add text annotations
for i in range(2):
    for j in range(2):
        text = ax.text(j, i, cm_matrix[i, j], ha='center', va='center',
                       fontsize=20, fontweight='bold',
                       color='white' if cm_matrix[i, j] > cm_matrix.max()/2 else 'black')

# Plot 2: Metrics Bar Chart
ax = axes[1]
metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
values = [accuracy*100, precision*100, recall*100, f1*100]
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c']
bars = ax.bar(metrics, values, color=colors)
ax.set_ylim(0, 105)
ax.set_ylabel('Percentage (%)', fontsize=12)
ax.set_title('Performance Metrics', fontsize=14, fontweight='bold')
for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
            f'{val:.1f}%', ha='center', fontsize=11, fontweight='bold')

# Plot 3: Sample Predictions
ax = axes[2]
ax.axis('off')

# Show some predictions
sample_text = "SAMPLE PREDICTIONS:\n" + "="*40 + "\n\n"
for i in range(min(6, len(X_test))):
    actual = "VERT" if y_test[i] == 1 else "HORIZ"
    predicted = "VERT" if test_predictions[i] == 1 else "HORIZ"
    prob = model.forward(X_test[i:i+1])[0, 0]
    status = "✓" if actual == predicted else "✗"
    sample_text += f"  {status} Actual: {actual:5s}  Predicted: {predicted:5s}  (prob={prob:.2f})\n"

ax.text(0.05, 0.5, sample_text, fontsize=11, family='monospace',
        verticalalignment='center', transform=ax.transAxes,
        bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.9))

plt.tight_layout()
plt.show()

9.5 Saliency: What Did the Network Learn?

Let's peek inside the trained committee's brain - what features do the hidden neurons look for?

Each hidden neuron learned to detect specific patterns. By visualizing their weights (reshaped to 3x3), we can see what they're "looking for."

How to Read the Saliency Visualizations

Each 3×3 grid shows ONE hidden neuron's "template":

| Color | Weight | Meaning |
|---|---|---|
| Red | Positive | "I get excited when this pixel is bright" |
| Blue | Negative | "I get suppressed when this pixel is bright" |
| White/Gray | Near zero | "I don't care about this pixel" |

Patterns to Look For

Good learning: Hidden neurons specialize in different features:

| Pattern Type | What You'll See | What It Detects |
|---|---|---|
| Column detector | One column red, others blue | Vertical lines in that column |
| Row detector | One row red, others blue | Horizontal lines in that row |
| Edge detector | Mixed red/blue pattern | Edges or transitions |
| General detector | Mostly red or mostly blue | Overall brightness level |
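To make the "column detector" row concrete, the sketch below builds one by hand and applies it to two clean images. These weights are hand-picked for illustration, not taken from the trained model, and the row-major pixel order (index = 3 * row + col) is an assumption that matches how `reshape(3, 3)` lays out a 9-element vector.

```python
def column_detector(col, on=1.0, off=-0.5):
    """Hand-built 9-element template: positive in one column, negative elsewhere."""
    return [on if (i % 3) == col else off for i in range(9)]


def pre_activation(weights, image):
    """Weighted sum of pixels, i.e. the neuron's input before ReLU (bias omitted)."""
    return sum(w * x for w, x in zip(weights, image))


vertical_middle = [0, 1, 0,
                   0, 1, 0,
                   0, 1, 0]
horizontal_middle = [0, 0, 0,
                     1, 1, 1,
                     0, 0, 0]

w = column_detector(col=1)

# Print the template as a 3x3 grid, the same view the heatmaps give.
for row in range(3):
    print(" ".join(f"{w[3 * row + col]:+.1f}" for col in range(3)))

z_vert = pre_activation(w, vertical_middle)
z_horiz = pre_activation(w, horizontal_middle)
print("vertical line:  ", z_vert)
print("horizontal line:", z_horiz)
```

The vertical line hits three positive weights, while the horizontal line hits one positive and two negative weights that cancel out. That cancellation is what the "one column red, others blue" pattern buys the neuron.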

Signs of good learning:

  • Different neurons have different patterns (diversity!)
  • Some neurons clearly detect vertical patterns
  • Some neurons clearly detect horizontal patterns
  • The W2 weights show which neurons "vote" for which class

Signs of poor learning:

  • All neurons look similar (no specialization)
  • Random-looking patterns (didn't converge)
  • All weights near zero (little was learned, possibly due to vanishing gradients)
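Two of these warning signs can be measured rather than eyeballed. The following sketch flags pairs of neurons whose weight vectors point in nearly the same direction and checks whether the weights are tiny overall. The function names and both thresholds are assumptions picked for illustration; `W1` here is a toy list of rows, standing in for `model.W1`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weight_diagnostics(W1, similar_threshold=0.95, tiny_threshold=1e-3):
    """Report near-duplicate neurons and whether weights are near zero on average."""
    n = len(W1)
    similar_pairs = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if cosine_similarity(W1[i], W1[j]) > similar_threshold
    ]
    n_weights = sum(len(row) for row in W1)
    mean_abs = sum(abs(w) for row in W1 for w in row) / n_weights
    return {
        "similar_pairs": similar_pairs,
        "mean_abs_weight": mean_abs,
        "near_zero": mean_abs < tiny_threshold,
    }


# Toy weights: neurons 0 and 1 are identical column detectors, neuron 2 is a row detector.
column = [-0.5, 1.0, -0.5, -0.5, 1.0, -0.5, -0.5, 1.0, -0.5]
row = [-0.5, -0.5, -0.5, 1.0, 1.0, 1.0, -0.5, -0.5, -0.5]
report = weight_diagnostics([column, list(column), row])
print(report)
```

A long list of similar pairs suggests the specialists did not diversify. A `near_zero` flag suggests the weights never grew into usable templates.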
cell 018
# =============================================================================
# SALIENCY: VISUALIZE WHAT THE NETWORK LEARNED
# =============================================================================

print("="*70)
print("INSIDE THE COMMITTEE'S BRAIN: What Each Specialist Looks For")
print("="*70)

# Get the input-to-hidden weights
W1 = model.W1  # Shape: (n_hidden, n_inputs) = (8, 9)

# Get the hidden-to-output weights (tells us how each specialist contributes to final decision)
W2 = model.W2.flatten()  # Shape: (8,)

# The 2x4 grid assumes the default of 8 hidden neurons
fig, axes = plt.subplots(2, 4, figsize=(14, 7))

for i in range(model.n_hidden):
    ax = axes[i // 4, i % 4]

    # Reshape this neuron's weights to 3x3
    weights = W1[i].reshape(3, 3)

    # Visualize (symmetric color scale so that zero maps to white)
    im = ax.imshow(weights, cmap='RdBu_r', vmin=-np.abs(weights).max(), vmax=np.abs(weights).max())

    # Title with contribution direction
    direction = "→VERT" if W2[i] > 0 else "→HORIZ"
    ax.set_title(f'Specialist {i+1} {direction}\n(W2={W2[i]:.2f})', fontsize=10)
    ax.axis('off')

    # Add a colorbar to the last panel of the top row only
    # (each panel uses its own color scale, so it applies to that panel)
    if i == 3:
        plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

plt.suptitle('Hidden Neuron Weights: Red = positive, Blue = negative\n'
             '→VERT means this neuron votes for VERTICAL, →HORIZ for HORIZONTAL',
             fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
INTERPRETATION:
════════════════════════════════════════════════════════════════════════

Each 3x3 heatmap shows what ONE hidden neuron "looks for":
  • RED pixels: This neuron gets EXCITED when these pixels are bright
  • BLUE pixels: This neuron gets INHIBITED when these pixels are bright

The "→VERT" or "→HORIZ" shows how this specialist votes in the final decision:
  • →VERT specialists contribute to "vertical" prediction when activated
  • →HORIZ specialists contribute to "horizontal" prediction when activated

Look for patterns! Some specialists might look for:
  • Vertical column patterns (bright red in one column)
  • Horizontal row patterns (bright red in one row)
  • Edge detectors (mixed red/blue patterns)
""")

9.6 Interactive Dashboard: Experiment Yourself!

Try different hyperparameters and see how they affect performance.

| Hyperparameter | What It Controls | Trade-off |
|---|---|---|
| Hidden neurons | Model complexity | More = can learn more, but risk overfitting |
| Learning rate | Step size | Higher = faster but less stable |
| Noise level | Data difficulty | Higher = harder to learn |

Experiments to Try

Experiment 1: Varying Model Complexity

| Hidden Neurons | Expected Result |
|---|---|
| 2 | May underfit - not enough capacity |
| 8 | Good balance - our default |
| 32 | May overfit - watch train/val gap |

Experiment 2: Varying Learning Rate

| Learning Rate | Expected Result |
|---|---|
| 0.01 | Very slow convergence |
| 0.5 | Fast, stable (our default) |
| 2.0 | May oscillate or diverge |
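The three behaviours in this table show up even on a one-parameter toy problem. The sketch below runs plain gradient descent on f(w) = 0.5 * w**2, whose gradient is simply w. This toy function is an assumption chosen because its behaviour can be worked out by hand. The exact learning rate at which the real network becomes unstable depends on its loss surface and will differ.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = 0.5 * w**2 by gradient descent; returns every visited w."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = w  # derivative of 0.5 * w**2
        w = w - learning_rate * grad
        history.append(w)
    return history


for lr in (0.01, 0.5, 2.0):
    final_w = gradient_descent(lr)[-1]
    print(f"lr={lr}: w after 20 steps = {final_w:+.6f}")
```

Each update multiplies w by (1 - learning_rate). With 0.01 the factor is 0.99, so w creeps toward the minimum at 0. With 0.5 it halves every step. With 2.0 the factor is -1, so w flips between +1 and -1 forever without getting any closer. Above 2.0 on this toy problem, the magnitude grows each step.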

Experiment 3: Varying Noise Level

| Noise Level | Expected Result |
|---|---|
| 0.0 | Near-perfect accuracy (too easy!) |
| 0.15 | Challenging but learnable |
| 0.4 | Very difficult, accuracy drops |

What to Watch For

  • Healthy training: Train and val curves decrease together, then flatten
  • Overfitting: Train keeps improving, val gets worse (gap grows)
  • Underfitting: Both curves stay high and flat
  • Instability: Curves jump around wildly (reduce learning rate)
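These four patterns can be turned into a rough automatic check on the loss histories. The function below is a heuristic sketch, not part of the notebook's `NeuralNetwork` class: the name `diagnose_curves`, the 5-epoch window, and every threshold are assumptions you would tune for your own runs.

```python
def diagnose_curves(train_loss, val_loss, window=5, tol=1e-3, high_loss=0.5):
    """Label the last `window` epochs of a run using simple rules of thumb."""
    if len(train_loss) < window or len(val_loss) < window:
        return "too few epochs to judge"

    recent_train = train_loss[-window:]
    recent_val = val_loss[-window:]
    train_change = recent_train[-1] - recent_train[0]
    val_change = recent_val[-1] - recent_val[0]

    # Instability: validation loss reverses direction at every epoch, by a visible amount.
    steps = [b - a for a, b in zip(recent_val, recent_val[1:])]
    flips = sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)
    if flips >= len(steps) - 1 and max(abs(s) for s in steps) > 10 * tol:
        return "unstable"

    # Overfitting: training loss still falling while validation loss rises.
    if train_change < -tol and val_change > tol:
        return "overfitting"

    # Flat curves: underfitting if the loss is still high, otherwise converged.
    if abs(train_change) <= tol and abs(val_change) <= tol:
        return "underfitting" if recent_train[-1] > high_loss else "converged"

    return "still improving"


# Hypothetical loss histories, one per pattern.
print(diagnose_curves([0.7, 0.6, 0.5, 0.4, 0.3, 0.2], [0.7, 0.62, 0.53, 0.44, 0.35, 0.27]))
print(diagnose_curves([0.6, 0.5, 0.4, 0.3, 0.25, 0.2], [0.6, 0.5, 0.45, 0.47, 0.5, 0.55]))
print(diagnose_curves([0.69] * 6, [0.70] * 6))
print(diagnose_curves([0.5, 0.8, 0.4, 0.9, 0.3, 0.7], [0.5, 0.9, 0.4, 1.0, 0.3, 0.8]))
```

After a run in this notebook, the same check could be applied as `diagnose_curves(model.train_loss_history, model.val_loss_history)`. Treat the label as a prompt to look at the plot, not a verdict.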
cell 020 · full lab recommended
# =============================================================================
# INTERACTIVE DASHBOARD: EXPERIMENT WITH HYPERPARAMETERS
# =============================================================================

def run_experiment(n_hidden=8, learning_rate=0.5, noise_level=0.15, n_samples=300, seed=42):
    """Run a complete experiment with given hyperparameters."""

    print("="*70)
    print(f"EXPERIMENT: hidden={n_hidden}, lr={learning_rate}, noise={noise_level}")
    print("="*70)

    # Create data
    (X_tr, y_tr), (X_v, y_v), (X_te, y_te) = create_train_val_test_split(
        n_total=n_samples, noise_level=noise_level, seed=seed
    )

    # Create and train model
    exp_model = NeuralNetwork(n_inputs=9, n_hidden=n_hidden, n_outputs=1, seed=seed)
    exp_model.train(X_tr, y_tr, X_v, y_v,
                    learning_rate=learning_rate, epochs=200,
                    early_stopping_patience=20, verbose=False)

    # Evaluate
    test_loss, test_acc = exp_model.evaluate(X_te, y_te)

    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    epochs = range(1, len(exp_model.train_loss_history) + 1)

    ax = axes[0]
    ax.plot(epochs, exp_model.train_loss_history, 'b-', label='Train')
    ax.plot(epochs, exp_model.val_loss_history, 'r-', label='Val')
    ax.axvline(exp_model.best_epoch+1, color='g', linestyle='--', label=f'Best: {exp_model.best_epoch+1}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title(f'Training Progress\nFinal Test Acc: {test_acc*100:.1f}%', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)

    ax = axes[1]
    ax.plot(epochs, [a*100 for a in exp_model.train_acc_history], 'b-', label='Train')
    ax.plot(epochs, [a*100 for a in exp_model.val_acc_history], 'r-', label='Val')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy (%)')
    ax.set_title(f'Accuracy Progress\nStopped at epoch {len(epochs)}', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(40, 105)

    plt.tight_layout()
    plt.show()

    return test_acc

# Interactive widgets (if available)
if WIDGETS_AVAILABLE:
    print("Interactive dashboard available! Adjust sliders and click 'Run Experiment'.\n")

    hidden_slider = widgets.IntSlider(value=8, min=2, max=32, step=2, description='Hidden:')
    lr_slider = widgets.FloatSlider(value=0.5, min=0.01, max=2.0, step=0.1, description='Learn Rate:')
    noise_slider = widgets.FloatSlider(value=0.15, min=0.0, max=0.5, step=0.05, description='Noise:')

    def on_button_click(b):
        clear_output(wait=True)
        display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
        run_experiment(hidden_slider.value, lr_slider.value, noise_slider.value)

    run_button = widgets.Button(description='Run Experiment')
    run_button.on_click(on_button_click)

    display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
else:
    print("Widgets not available. Running preset experiments instead.\n")

    # Run a few preset experiments
    print("\n" + "="*70)
    print("PRESET EXPERIMENTS")
    print("="*70)

    experiments = [
        {"n_hidden": 4, "learning_rate": 0.5, "noise_level": 0.1, "desc": "Simple model, low noise"},
        {"n_hidden": 16, "learning_rate": 0.5, "noise_level": 0.3, "desc": "Complex model, high noise"},
    ]

    for exp in experiments:
        print(f"\n>>> {exp['desc']}")
        run_experiment(exp['n_hidden'], exp['learning_rate'], exp['noise_level'])

Part 9 Summary: The Complete Journey

Mission Accomplished!

We set out in Part 0 to build a neural network that could classify vertical and horizontal lines. Now we have:

| Component | Implementation | Part Referenced |
|---|---|---|
| Data representation | 3x3 images → 9-element vectors | Part 1 (Matrices) |
| Network architecture | 9 → 8 (ReLU) → 1 (Sigmoid) | Parts 2, 3, 7 |
| Forward propagation | Matrix operations + activations | Parts 4, 7 |
| Loss function | Binary Cross-Entropy | Part 5 |
| Training | Backpropagation + Gradient Descent | Part 5 |
| Evaluation | Accuracy, Confusion Matrix, F1 | Part 6 |
| Overfitting prevention | Early stopping + proper sizing | Part 8 |
| Interpretability | Weight visualization | Part 6 |
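As a quick refresher on the loss function row, here is a minimal sketch of binary cross-entropy in plain Python. The clipping constant `eps` is an assumption added to avoid log(0); the notebook's own implementation may handle that differently.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean BCE over a batch; probabilities are clipped to (eps, 1 - eps)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
print("undecided (0.5, 0.5):     ", binary_cross_entropy(labels, [0.5, 0.5]))
print("confident and right:      ", binary_cross_entropy(labels, [0.9, 0.1]))
print("confident and wrong:      ", binary_cross_entropy(labels, [0.1, 0.9]))
```

A model that always outputs 0.5 scores ln(2), roughly 0.693, so a loss curve that never drops below that level has learned nothing useful. Confident wrong answers are punished far more heavily than confident right answers are rewarded.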

The Committee Analogy Complete

| Part | Committee Story |
|---|---|
| 0 | Introduced the committee concept |
| 1 | Learned the language (matrices) |
| 2 | First committee member joins |
| 3 | Member learns to vote (activation) |
| 4 | First attempt at decisions |
| 5 | Learning from mistakes |
| 6 | Evaluating performance |
| 7 | Full committee assembled |
| 8 | Growing pains addressed |
| 9 | Complete, working committee! |

Key Takeaways

  1. Neural networks are simple at their core - Just matrix multiplications and non-linear functions
  2. Training is optimization - Find weights that minimize loss on training data
  3. Generalization is the goal - Performance on unseen data is what matters
  4. Architecture matters - Right-sized models with proper activations work best
  5. Monitoring is essential - Track train AND validation metrics
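The first takeaway fits in a few lines of plain Python. The sketch below runs one forward pass through a 9 → 8 → 1 network with ReLU hidden units and a sigmoid output. The weights are random and untrained, the Gaussian scale of 0.5 is arbitrary, and the function names are assumptions; the notebook's `NeuralNetwork.forward` does the same job with NumPy matrices.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """One forward pass: weighted sums + ReLU, then a weighted sum + sigmoid."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z_out = sum(w * h for w, h in zip(W2, hidden)) + b2
    return sigmoid(z_out)


rng = random.Random(0)
W1 = [[rng.gauss(0.0, 0.5) for _ in range(9)] for _ in range(8)]  # shape (8, 9)
b1 = [0.0] * 8
W2 = [rng.gauss(0.0, 0.5) for _ in range(8)]                      # shape (8,)
b2 = 0.0

vertical_image = [0, 1, 0,
                  0, 1, 0,
                  0, 1, 0]
p_vertical = forward(vertical_image, W1, b1, W2, b2)
print(f"Untrained network says P(vertical) = {p_vertical:.3f}")
```

Everything training does is adjust the numbers in `W1`, `b1`, `W2`, and `b2` so that this probability moves toward the correct label.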

Common Mistakes to Avoid

| Mistake | Consequence | How to Avoid |
|---|---|---|
| No validation set | Can't detect overfitting | Always split your data |
| Using test data to tune | Overly optimistic results | Keep test data completely separate |
| Wrong activation for output | Invalid predictions | Sigmoid for binary, softmax for multi-class |
| Too large model for data | Overfitting | Start small, increase if underfitting |
| No shuffling | Biased splits | Always shuffle before splitting |
| Ignoring learning curves | Miss problems | Plot train/val loss every time |
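The "no shuffling" and "no validation set" rows are easiest to appreciate with a dataset that arrives sorted by label. Here is a minimal sketch of a shuffled three-way split using only the standard library. The function name and the 70/15/15 fractions are assumptions for illustration; the internals of this notebook's `create_train_val_test_split` are not shown here and may differ.

```python
import random


def shuffle_and_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle indices with a fixed seed, then carve out test, val, and train sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    n_test = int(round(len(samples) * test_frac))
    n_val = int(round(len(samples) * val_frac))

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]

    def pick(idx):
        return [samples[i] for i in idx]

    return pick(train_idx), pick(val_idx), pick(test_idx)


# Hypothetical dataset sorted by label: 50 horizontal (0) then 50 vertical (1).
samples = [(i, 0) for i in range(50)] + [(i, 1) for i in range(50, 100)]
train, val, test = shuffle_and_split(samples)
print(len(train), len(val), len(test))
print("vertical share in train:", sum(label for _, label in train) / len(train))
```

Without the shuffle, slicing the last 15 samples off this sorted list would give a test set made entirely of vertical lines.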

The Complete Neural Network Checklist

Before Training:

  • Data shuffled and split (train/val/test)
  • Model architecture chosen (appropriate size)
  • Activation functions set (ReLU hidden, sigmoid output)
  • Weights initialized (He for ReLU, Xavier for sigmoid)
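The last checklist item refers to two common schemes for choosing the spread of the initial random weights. The sketch below uses the usual textbook formulas: He uses a standard deviation of sqrt(2 / fan_in), and the normal variant of Xavier (Glorot) uses sqrt(2 / (fan_in + fan_out)). Whether this notebook's `NeuralNetwork` class uses exactly these variants is an assumption, and the helper names are hypothetical.

```python
import math
import random


def he_std(fan_in):
    """He initialization: suited to ReLU layers."""
    return math.sqrt(2.0 / fan_in)


def xavier_std(fan_in, fan_out):
    """Xavier/Glorot initialization (normal variant): suited to sigmoid or tanh layers."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def init_matrix(n_out, n_in, std, rng):
    """Draw an (n_out, n_in) weight matrix from a zero-mean Gaussian."""
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(42)
W1 = init_matrix(8, 9, he_std(9), rng)          # hidden layer, ReLU
W2 = init_matrix(1, 8, xavier_std(8, 1), rng)   # output layer, sigmoid

print(f"He std for 9 inputs:        {he_std(9):.4f}")
print(f"Xavier std for 8 -> 1 layer: {xavier_std(8, 1):.4f}")
```

Both formulas shrink the weights as layers get wider, which keeps the size of the signal roughly steady from layer to layer at the start of training.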

During Training:

  • Monitoring both train AND validation loss
  • Early stopping configured
  • Learning rate reasonable (start with 0.1-1.0)
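"Early stopping configured" comes down to a patience counter on the validation loss. The sketch below replays a recorded list of validation losses and reports where training would have stopped. It is a simplified stand-in for the logic behind `early_stopping_patience`, not the notebook's actual code. The `min_delta` parameter is a hypothetical extra, and epochs are 0-based here, as `model.best_epoch` appears to be in the plots.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-based, for a validation-loss history."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it (a real loop would also save the weights here).
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical history: the loss bottoms out at epoch 2, then drifts upward.
history = [0.70, 0.60, 0.50, 0.52, 0.51, 0.53, 0.55]
best, stopped = early_stopping_epoch(history, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stopped}")
```

The model you keep is the one from `best`, not from `stopped`. That is why the training plots mark the best epoch with a vertical line well before the curves end.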

After Training:

  • Evaluate on TEST set (not validation!)
  • Check confusion matrix for error patterns
  • Visualize learned features if possible
  • Compare train/val/test accuracy for overfitting signs

Knowledge Check

cell 022
# =============================================================================
# KNOWLEDGE CHECK - Part 9 (Final Review)
# =============================================================================

print("FINAL KNOWLEDGE CHECK - Complete Neural Network Understanding")
print("="*70)

questions = [
    {
        "q": "1. In our complete network (9→8→1), what does the '8' represent?",
        "options": [
            "A) The number of training examples",
            "B) The number of hidden neurons (specialists)",
            "C) The learning rate",
            "D) The number of epochs"
        ],
        "answer": "B",
        "explanation": "The 8 represents hidden neurons - the 'specialists' in our committee who detect different patterns in the input."
    },
    {
        "q": "2. Why do we use ReLU for hidden layers and Sigmoid for output?",
        "options": [
            "A) Random choice - they're interchangeable",
            "B) ReLU prevents vanishing gradients; Sigmoid gives probability output",
            "C) Sigmoid is faster than ReLU",
            "D) ReLU only works for hidden layers"
        ],
        "answer": "B",
        "explanation": "ReLU (derivative=1 when active) prevents vanishing gradients in deep networks. Sigmoid maps to (0,1) which we interpret as probability."
    },
    {
        "q": "3. What is the purpose of the validation set?",
        "options": [
            "A) Extra training data",
            "B) Final performance evaluation",
            "C) Tune hyperparameters and detect overfitting",
            "D) Test the code works"
        ],
        "answer": "C",
        "explanation": "Validation set is used during training to tune hyperparameters and detect overfitting (early stopping). Test set is for final evaluation."
    },
    {
        "q": "4. What does early stopping prevent?",
        "options": [
            "A) Underfitting",
            "B) Overfitting",
            "C) Slow training",
            "D) Memory issues"
        ],
        "answer": "B",
        "explanation": "Early stopping stops training when validation loss starts increasing, preventing the model from memorizing training data (overfitting)."
    },
    {
        "q": "5. In the saliency visualization, what do red pixels in a hidden neuron's weights mean?",
        "options": [
            "A) Errors in that pixel",
            "B) The neuron is broken",
            "C) The neuron gets excited when those pixels are bright",
            "D) Those pixels are ignored"
        ],
        "answer": "C",
        "explanation": "Positive (red) weights mean the neuron responds strongly when those input pixels are bright. Negative (blue) weights mean inhibition."
    },
    {
        "q": "6. What's the complete pipeline for using a neural network?",
        "options": [
            "A) Train → Test → Deploy",
            "B) Data → Train → Evaluate → Deploy",
            "C) Code → Train → Done",
            "D) Data (split) → Train (with val monitoring) → Evaluate (on test) → Interpret"
        ],
        "answer": "D",
        "explanation": "The complete pipeline: Split data (train/val/test), train with validation monitoring, evaluate on test set, then interpret/deploy."
    }
]

for q in questions:
    print(f"\n{q['q']}")
    for opt in q["options"]:
        print(f"   {opt}")

print("\n" + "="*70)
print("Scroll down for answers...")
print("="*70)
cell 023
# ANSWERS
print("ANSWERS - Final Knowledge Check")
print("="*70)
for i, q in enumerate(questions, 1):
    print(f"\n{i}. Answer: {q['answer']}")
    print(f"   {q['explanation']}")

What's Next?

Congratulations! You've completed the full implementation of a neural network from scratch!

You now understand:

  • How neural networks represent and process data
  • How they learn through backpropagation
  • How to evaluate and interpret their decisions
  • How to prevent common pitfalls like overfitting

Coming Up in Part 10: The Future

The final notebook will explore:

  • What other problems can neural networks solve?
  • CNNs - Convolutional Neural Networks for images
  • RNNs - Recurrent Neural Networks for sequences
  • Transformers - The architecture behind modern AI
  • Resources for continued learning

Continue to Part 10: part_10_whats_next.ipynb


Congratulations!

You've built a working neural network from absolute scratch!

              🎉 MISSION ACCOMPLISHED! 🎉
    
    From matrices to mastery in 9 parts:
    
    Part 0: The Mission          → Introduced the problem
    Part 1: Matrices             → The language of data
    Part 2: Single Neuron        → The building block
    Part 3: Activations          → Making decisions
    Part 4: Perceptron           → First predictions
    Part 5: Training             → Learning from mistakes
    Part 6: Evaluation           → Measuring success
    Part 7: Hidden Layers        → The full committee
    Part 8: Challenges           → Overcoming obstacles
    Part 9: Implementation       → COMPLETE SYSTEM!
    
    You are now ready for deep learning frameworks
    like PyTorch and TensorFlow!

"The Brain's Decision Committee is fully operational."

Illustrated step

Data pipeline

concept

Organized case intake

Training, validation, and test examples each serve a different purpose.

Early stopping

concept

Stop before memorizing

Training halts when validation performance stops improving.

Dashboard

concept

Committee control room

Parameters become knobs learners can test directly.
